Our paper has been accepted for publication in the Transactions of the Japanese Society for Artificial Intelligence.
◼︎ Bibliographic Information
 Xin Zhang, Tatsuya Matsushima, Yutaka Matsuo, Yusuke Iwasawa: M3IL: Multi-Modal Meta-Imitation Learning, Transactions of the Japanese Society for Artificial Intelligence, Volume 38 Number 2 J-STAGE (2022)
◼︎ Overview
 Imitation Learning (IL) is expected to help realize intelligent robots, since it allows users to teach robots a variety of tasks easily. In particular, Few-Shot Imitation Learning (FSIL) aims to infer and adapt quickly to unseen tasks from a small amount of data. However, every time we want to teach the robot a new task, we need to execute that task ourselves in order to specify it. Inspired by the fact that humans can specify tasks with language instructions without executing them, we propose a multi-modal FSIL setting in this work. The model leverages image and language information in the training phase, and uses either both image and language information or only language information in the testing phase. We also propose Multi-Modal Meta-Imitation Learning (M3IL), which can infer with only image or only language information. Our results show the effectiveness of M3IL and the importance of language instructions in the FSIL setting.


