Profile

2010 – 2014, University of Cambridge, BA, MEng
2014 – 2020, Okinawa Institute of Science and Technology Graduate University, PhD
2020 – 2024, Kyoto University, Program-Specific Assistant Professor
2024 – present, University of Tokyo, Project Assistant Professor

Paavo Parmas holds BA and MEng degrees from the University of Cambridge (advisor: Carl Edward Rasmussen) and a PhD from the Okinawa Institute of Science and Technology (advisor: Kenji Doya). During his PhD he was a visiting student at the University of Cambridge in Carl Edward Rasmussen’s lab, at TU Darmstadt in Jan Peters’ lab, and at ATR in Jun Morimoto’s lab. He was also a research intern at RIKEN AIP in Masashi Sugiyama’s lab and at DeepMind Paris in Remi Munos’ lab.

Research
Keywords: Model-based reinforcement learning, Monte Carlo gradient estimation, policy gradient reinforcement learning, world models, deep learning, robotics
Paavo is broadly interested in everything related to gradients: how to compute or estimate them, and how to apply them to optimize various objectives in machine learning. A key theme in his research has been the application of derivative-based optimization algorithms to reinforcement learning. One of his notable findings is that gradients become ill-behaved in chaotic systems [1]. Such issues are prevalent throughout deep learning. To overcome them, he proposed new gradient computation schemes that combine zeroth-order, derivative-free gradient estimators with first-order, derivative-based gradient estimators. In later work he proposed a unified view of such gradient estimators [2], developed software that automatically implements these algorithms [3], and scaled the algorithms to world-model-based reinforcement learning [4].