■Bibliographic Information
Koya Sakamoto, Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Shu Morikuni, Naoya Chiba, Motoaki Kawanabe, Yusuke Iwasawa, Yutaka Matsuo: E3VS-Bench: A Benchmark for Viewpoint-Dependent Active Perception in 3D Gaussian Splatting Scenes, Proceedings of the 19th European Conference on Computer Vision (ECCV 2026), September 2026
■Abstract
Visual search in 3D environments requires embodied agents to actively explore their surroundings and acquire task-relevant evidence. However, existing visual search and embodied AI benchmarks rely on static observations or limited egocentric motion, and therefore cannot evaluate viewpoint-dependent phenomena that arise in real-world 3D environments, such as occlusion, depth ambiguity, and hidden object attributes. To address this limitation, we introduce E3VS-Bench, a benchmark for embodied 3D visual search where agents must control their viewpoints in 6-DoF to gather viewpoint-dependent evidence for question answering. E3VS-Bench consists of 99 high-fidelity 3D scenes reconstructed using 3D Gaussian Splatting and 2,014 question-driven episodes. 3D Gaussian Splatting enables photorealistic free-viewpoint rendering that preserves fine-grained visual details (e.g., small text and subtle attributes) often degraded in mesh-based simulators, enabling questions that cannot be answered from a single view and require active inspection in 6-DoF. We evaluate multiple state-of-the-art VLMs and compare their performance with humans. Despite strong 2D reasoning ability, all models exhibit a substantial gap from humans, revealing fundamental limitations in active perception and coherent viewpoint planning in real-world 3D environments.
—
A paper from our lab has been accepted to ECCV 2026
