open the offcanvas menu Derwen home page graph technologies

search and discovery

Optimal control theory mixed with deep learning where software agents learn to take actions within an environment and make sequences of decisions to maximize a cumulative reward -- typically stated in terms of markov decision process -- finding a balance between exploration (uncharted territory) and exploitation (current knowledge). Generally a reverse engineering of various psychological learning processes. [1] [2] [3]