Our paper was accepted for NeurIPS 2023

    ◼︎Bibliographic information
    Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu. “DreamSparse: Escaping from Plato’s Cave with 2D Diffusion Model Given Sparse Views.” Neural Information Processing Systems (NeurIPS 2023).
    ◼︎Overview
    Synthesizing novel-view images from only a few views is a challenging but practical problem. In such few-view settings, existing methods often struggle to produce high-quality results or require per-object optimization, because the input views provide insufficient information. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models to synthesize novel-view images. However, 2D diffusion models lack 3D awareness, which leads to distorted images and compromises identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and texture-consistent novel-view images. Specifically, DreamSparse incorporates a geometry module that captures 3D features from sparse views as a 3D prior. A conditional model then converts these 3D feature maps into spatial guidance for the generative process, and this guidance steers the pre-trained diffusion model toward geometrically consistent images. By leveraging the strong image priors of the pre-trained diffusion model, DreamSparse can synthesize high-quality novel views for both object-level and scene-level images and generalizes to open-set images. Experimental results demonstrate that our framework effectively synthesizes novel-view images from sparse views and outperforms baselines on images from both trained and open-set categories.
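
    The overview gives only the high-level data flow: a geometry module, a conditional (guidance) model, and a frozen pre-trained diffusion model. The toy Python sketch below illustrates that flow and nothing more. Every name in it (View, geometry_module, GuidanceModel, FrozenDenoiser, synthesize), the angle-based weighting, and the blending update are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class View:
    """A toy context view (hypothetical): flattened pixels plus a camera angle."""
    pixels: List[float]
    angle: float  # camera azimuth in radians, standing in for a full pose


def geometry_module(context: List[View], target_angle: float) -> List[float]:
    """Aggregate sparse views into a feature map for the target pose.

    Assumption: views whose angle is closer to the target get more weight.
    A real geometry module would reason about 3D structure instead.
    """
    weights = [math.exp(-abs(v.angle - target_angle)) for v in context]
    total = sum(weights)
    n = len(context[0].pixels)
    return [
        sum(w * v.pixels[i] for w, v in zip(weights, context)) / total
        for i in range(n)
    ]


class GuidanceModel:
    """Stand-in for the trainable model that maps features to spatial guidance."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale  # the only "trainable" parameter in this toy

    def __call__(self, features: List[float]) -> List[float]:
        return [self.scale * f for f in features]


class FrozenDenoiser:
    """Stand-in for the pre-trained diffusion model; its parameter is never updated."""

    def __init__(self, blend: float = 0.5) -> None:
        self.blend = blend

    def __call__(self, x: List[float], guidance: List[float]) -> List[float]:
        # Toy update: each step pulls the sample toward the guidance signal.
        return [(1.0 - self.blend) * xi + self.blend * gi
                for xi, gi in zip(x, guidance)]


def synthesize(context: List[View], target_angle: float,
               guidance_model: GuidanceModel, denoiser: FrozenDenoiser,
               steps: int = 20, seed: int = 0) -> List[float]:
    """Run the three stages: features -> guidance -> guided iterative denoising."""
    rng = random.Random(seed)
    guidance = guidance_model(geometry_module(context, target_angle))
    x = [rng.gauss(0.0, 1.0) for _ in guidance]  # start from pure noise
    for _ in range(steps):
        x = denoiser(x, guidance)
    return x
```

    In this toy, calling synthesize with one bright and one dark context view yields a brighter output when the target angle is nearer the bright view, while the denoiser's own parameter stays untouched throughout. That mirrors the division of labor described above: only the modules around the diffusion model adapt, and the diffusion model itself stays frozen.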