◼︎ Bibliography
Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, and Shixiang Shane Gu. “Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning”, International Conference on Machine Learning 2021 (ICML 2021). July 2021.
◼︎ Overview
Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments. In particular, we still do not have agreeable ways to measure the difficulty or solvability of a task, given that each has fundamentally different actions, observations, dynamics, rewards, and can be tackled with diverse RL algorithms. This work proposes policy information capacity (PIC) – the mutual information between policy parameters and episodic return – and policy-optimal information capacity (POIC) – the mutual information between policy parameters and episodic optimality – as two environment-agnostic, algorithm-agnostic quantitative metrics of task difficulty. Evaluating these metrics across toy environments as well as continuous control benchmark tasks from OpenAI Gym and DeepMind Control Suite, the authors empirically demonstrate that these information-theoretic metrics have higher correlations with normalized task solvability scores than a variety of alternatives. Lastly, they show that these metrics can also be used for fast and compute-efficient optimization of key design parameters such as reward shaping, policy architectures, and MDP properties for better solvability by RL algorithms, without ever running full RL experiments.
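As a concrete illustration, the sketch below shows one simple way PIC = I(Θ; R) could be estimated: sample policy parameters from a prior, collect episodic returns for each sample, discretize the returns into shared bins, and compute I(Θ; R) = H(R) − E_θ[H(R | θ)]. The `sample_returns` callable, bin count, and sample sizes are illustrative assumptions for a toy setting, not the authors' exact experimental setup.

```python
# Minimal sketch of a histogram-based plug-in estimate of PIC = I(Theta; R).
# All names and defaults here are illustrative assumptions.
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs))

def estimate_pic(sample_returns, n_params=64, n_episodes=32, n_bins=20, seed=0):
    """Estimate I(Theta; R) given `sample_returns(rng, n_episodes)`, a callable
    that draws one set of policy parameters from a prior and returns that many
    episodic returns under those parameters."""
    rng = np.random.default_rng(seed)
    # returns[i] holds the episodic returns for the i-th sampled parameter vector.
    returns = np.array([sample_returns(rng, n_episodes) for _ in range(n_params)])
    # Shared bins over all observed returns.
    edges = np.histogram_bin_edges(returns, bins=n_bins)
    # Marginal entropy H(R), pooling returns over all sampled parameters.
    marginal, _ = np.histogram(returns, bins=edges)
    h_marginal = entropy(marginal / marginal.sum())
    # Conditional entropy H(R | theta), averaged over sampled parameters.
    h_conditional = 0.0
    for per_theta in returns:
        counts, _ = np.histogram(per_theta, bins=edges)
        h_conditional += entropy(counts / counts.sum())
    h_conditional /= n_params
    return h_marginal - h_conditional  # PIC estimate in nats

# Toy usage: each sampled "parameter" shifts the mean return, so I(Theta; R) > 0.
pic = estimate_pic(lambda rng, k: rng.normal(rng.normal(0.0, 1.0), 0.5, size=k))
print(f"estimated PIC: {pic:.3f} nats")
```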