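Yuya Kobayashi, Masahiro Suzuki, Yutaka Matsuo: Scene Interpretation Using Background Information with Deep Generative Models (深層生成モデルによる背景情報を利用したシーン解釈), Transactions of the Japanese Society for Artificial Intelligence, Vol. 38, No. 3, 2023.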
The ability to understand the surrounding environment compositionally, by decomposing it into its individual components, is an important cognitive ability. Humans decompose arbitrary entities into parts based on their semantics or functionality and recognize those parts as "objects". This kind of object recognition ability is fundamental to planning. Recently, research on "scene interpretation" has used deep generative models to build models that can recognize an environment compositionally. The objective of this paper is to extend scene interpretation methods. Existing methods are restricted to simple images and cannot handle complex images such as real-world images or heavily textured images. This is because previous work is fully unsupervised and its objective function only minimizes reconstruction error. Unlike models that leverage supervised information or inductive biases, such models therefore have no clues about what constitutes an object. In this research, we propose a method that decomposes scenes as intended by using minimal auxiliary information to identify objects. We build a model that uses the background as auxiliary information to separate the representations of background and foreground, and we show that our method can handle datasets that are difficult for existing methods.
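To make the abstract's two points concrete (a reconstruction-only objective, and separating foreground from a known background), here is a minimal sketch. It assumes a common mask-based composition, x̂ = m·f + (1 − m)·b, where f is a foreground image, m a per-pixel foreground mask, and b the background supplied as auxiliary information. The functions `composite` and `reconstruction_error` are hypothetical illustrations, not the authors' model, and real systems would produce f and m with a deep generative model rather than by hand.

```python
# Hypothetical sketch (not the paper's model): compose a foreground over a
# given background with a per-pixel mask, and score it by reconstruction error.
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def composite(foreground: Image, mask: Image, background: Image) -> Image:
    """Pixel-wise x_hat = m * f + (1 - m) * b."""
    return [
        [m * f + (1.0 - m) * b for f, m, b in zip(f_row, m_row, b_row)]
        for f_row, m_row, b_row in zip(foreground, mask, background)
    ]


def reconstruction_error(image: Image, reconstruction: Image) -> float:
    """Sum of squared pixel differences between image and reconstruction."""
    return sum(
        (x - y) ** 2
        for x_row, y_row in zip(image, reconstruction)
        for x, y in zip(x_row, y_row)
    )


if __name__ == "__main__":
    background = [[0.2, 0.2], [0.2, 0.2]]  # assumed known background
    image = [[0.2, 0.9], [0.2, 0.2]]       # one bright "object" pixel

    # Intended decomposition: the mask covers only the object pixel.
    fg = [[0.0, 0.9], [0.0, 0.0]]
    mask = [[0.0, 1.0], [0.0, 0.0]]
    print(reconstruction_error(image, composite(fg, mask, background)))

    # Degenerate decomposition: everything is "foreground".
    all_fg = [[1.0, 1.0], [1.0, 1.0]]
    print(reconstruction_error(image, composite(image, all_fg, background)))
```

Both decompositions reconstruct the image perfectly, which illustrates why minimizing reconstruction error alone gives a model no clue about which pixels belong to an object. How the paper uses the background to rule out such solutions is not specified in this abstract.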