Optimal control theory combined with deep learning: software agents learn to act within an environment, making sequences of decisions that maximize a cumulative reward. The problem is typically formulated as a Markov decision process, and the agent must balance exploration (trying uncharted territory) against exploitation (using current knowledge). Broadly, a reverse engineering of various psychological learning processes. [1] [2] [3]
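
A minimal sketch of the exploration/exploitation balance, using tabular Q-learning with an epsilon-greedy rule on a toy Markov decision process. It is a simplification: a lookup table stands in for the deep network. The five-state chain environment, the hyperparameter values, and the names `step`, `choose_action`, `train_q_learning`, and `greedy_policy` are all assumptions made for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Deterministic chain dynamics: reward 1 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon or q[state][0] == q[state][1]:
        return rng.choice(ACTIONS)   # explore (also breaks ties at random)
    return 0 if q[state][0] > q[state][1] else 1   # exploit current estimates


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q[state][action] from sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

In this sketch `epsilon` sets how often the agent tries a random action instead of the one it currently believes is best, and `gamma` discounts future reward in the cumulative return. After training, the greedy policy should step right in every non-terminal state, since that is the shortest path to the only reward.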