Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTRL)
Abstract
This paper addresses distributional offline continuous-time reinforcement learning (DOCTRL) with stochastic policies for high-dimensional optimal control. A soft distributional version of the classical Hamilton-Jacobi-Bellman (HJB) equation is given by a semilinear partial differential equation (PDE). This `soft HJB equation' can be learned from offline data without assuming that the latter correspond to a previous optimal or near-optimal policy. A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML). The suggested approach, dubbed `SciPhy RL', thus reduces DOCTRL to solving neural PDEs from data. Our algorithm called Deep DOCTRL converts offline high-dimensional data into an optimal policy in one step by reducing it to supervised learning, instead of relying on value iteration or policy iteration methods. The method enables a computable approach to the quality control of obtained policies in terms of both their expected returns and uncertainties about their values.
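To make the physics-informed idea concrete, the following minimal sketch (not the paper's code, and a linear stand-in for its neural-network ansatz) fits a polynomial surrogate so that the residual of a simple differential equation, u'(x) + u(x) = 0 with u(0) = 1, vanishes at collocation points. This residual-minimization step is the same mechanism a PINN uses to learn a PDE solution such as the soft HJB equation from sampled points.

```python
import numpy as np

# Hypothetical minimal sketch of physics-informed residual fitting:
# approximate u(x) ~ sum_k c_k x^k so that the equation residual
# u'(x) + u(x) = 0 is minimized at collocation points, plus the
# boundary condition u(0) = 1. Exact solution: u(x) = exp(-x).
deg = 8
xs = np.linspace(0.0, 1.0, 50)          # collocation points

# Design matrices for u(x) and u'(x) in the monomial basis.
U = np.vander(xs, deg + 1, increasing=True)   # columns: x^k
dU = np.zeros_like(U)
for k in range(1, deg + 1):
    dU[:, k] = k * xs ** (k - 1)              # columns: d/dx x^k

# Stack residual rows (u' + u = 0) with one boundary row (u(0) = 1).
A = np.vstack([dU + U, np.vander([0.0], deg + 1, increasing=True)])
b = np.concatenate([np.zeros(len(xs)), [1.0]])

# Least-squares residual minimization, i.e. "supervised" fit to physics.
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
u = U @ coef
err = np.max(np.abs(u - np.exp(-xs)))
print(err)  # close to zero: surrogate matches exp(-x)
```

In the paper's setting the polynomial basis is replaced by a deep network and the residual is that of the semilinear soft HJB PDE evaluated on offline trajectories, but the reduction to a supervised residual-minimization problem is the same.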
Publication: arXiv e-prints
Pub Date: April 2021
arXiv: arXiv:2104.01040
Bibcode: 2021arXiv210401040H
Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Physics - Computational Physics; Quantitative Finance - Computational Finance; I.2.6; I.2.8
E-Print: 24 pages, 5 figures