Matsuo Lab Research Internship Experience Report vol.11: [Algorithm] Research on Distillation Methods for Diffusion Models

    At the Matsuo-Iwasawa Laboratory, under the mission of "creating intelligence," we work across a wide range of research areas: the development of foundational deep learning technologies such as world models and what lies beyond them, robotics, large language models, and the real-world deployment of algorithms.

    To further expand these activities, we held a research internship, in which 15 people participated.

    ▼ Research Internship Overview
    https://weblab.t.u-tokyo.ac.jp/news/20240417/

    ▼ Introduction of Internship Themes and Mentors
    https://weblab.t.u-tokyo.ac.jp/news/20240426/

    In this article, we share an experience report from one of the members who participated in the research internship.


    Self-Introduction

    My name is Alex Matsumura, and I was a second-year undergraduate student at Waseda University during this internship. I grew up in Cupertino, California, and moved to Japan two years ago for university. While growing up in America, I attended a Japanese school every Saturday, which ended up being extremely useful during my time in Japan, especially since much of the verbal communication in this lab was conducted in Japanese.

    I first became interested in deep learning during the second semester of my first year, after learning fundamentals such as multivariable calculus and linear algebra. The idea of backpropagation, of making cold machine systems learn and interpret abstract concepts, was mind-blowing to me. I then self-studied a lot of math, focusing especially on probability theory, while reading foundational deep learning papers such as ResNet and the Transformer. During my first internship I was given the chance to pretrain a language model, as well as to train my own diffusion model from scratch. After getting a feel for the fundamentals of the different areas of deep learning, I eventually settled on generative models, because I felt that the math underlying their theory was extremely elegant. Once I had become familiar with generative models, I started looking for internships or research opportunities related to diffusion models. Unfortunately, not many research opportunities in generative models are available in Japan, and even fewer were willing to give a chance to a second-year undergraduate student. However, Matsuo Lab's research internship not only aligned directly with some of my biggest interests, but also offered a very friendly, supportive, and open environment.

    About the Research

    The internship track that I chose was called "Research on Distillation Methods for Diffusion Models", and as the name suggests, my mentor and I developed a new distillation method that allows for one-step generation. At the start of the internship, my mentor suggested an idea that he thought might work. After tweaking and refining this initial idea, we decided to try training the model. Though the initial results seemed promising and yielded relatively good performance, they weren't enough to reach the state of the art. What we noticed was that many existing methods look very similar, differing only slightly in one way or another.

    Our first realization was that all of the diffusion distillation methods out there can be divided into two categories: instance-based and distribution-based distillation. We built upon an idea introduced by a recent paper named "Diffusion models are innate one-step generators", as its hypothesis was compelling. The paper claimed that instance-based methods perform worse and converge more slowly than distribution-based methods because they require the student model to copy the trajectories of the teacher model. Since the teacher model has much more expressivity (in terms of diffusion steps) than the student model, their loss landscapes differ, meaning that enforcing the exact trajectories is suboptimal for the student model. This paper, however, did not give an exact definition of distribution-based distillation, so we started from there.

    After defining distribution-based distillation, we found that many of the SoTA methods fit our definition. What was especially interesting was that our method turned out to be exactly the same as the current SoTA data-free distillation method under a particular hyperparameter setting, even though the two were derived in completely different ways (they happened to arrive at theirs empirically). We realized that there is a deeper connection among all of these methods than we had initially thought. We identified the leading methods, gathered what we could about the divergences they correspond to and why they work, and built a distribution-based objective of our own to test our hypothesis. We found that our method does in fact converge using fewer training images than almost all SoTA methods in the literature. Although we have yet to conduct a long training run to beat the current SoTA, the results seem promising.
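    To make the instance-based versus distribution-based distinction concrete, here is a minimal, hypothetical sketch in PyTorch. It is not our actual objective: the networks, the noise schedule, and the `teacher.sample` API are assumptions made purely for illustration, and the second loss roughly follows the score-difference surrogate popularized by Distribution Matching Distillation (DMD), standing in as one published member of the distribution-based family.

```python
import torch
import torch.nn.functional as F

def add_noise(x0, noise, t, alpha_bar):
    # Assumed VP forward process on 4-D image batches:
    # x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps.
    a = alpha_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def instance_based_loss(student, teacher, x_T):
    # Instance-based: the one-step student must reproduce the teacher's
    # multi-step sample for the *same* starting noise, i.e. copy trajectories.
    with torch.no_grad():
        x0_teacher = teacher.sample(x_T)  # hypothetical multi-step solver API
    x0_student = student(x_T)             # single forward pass
    return F.mse_loss(x0_student, x0_teacher)

def distribution_based_loss(student, real_score, fake_score, x_T, t, alpha_bar):
    # Distribution-based: only the *distribution* of student samples is pushed
    # toward the teacher's. `fake_score` is an auxiliary network trained online
    # on student samples; no per-sample teacher trajectory is enforced.
    x0 = student(x_T)
    x_t = add_noise(x0, torch.randn_like(x0), t, alpha_bar)
    with torch.no_grad():
        # Approximate gradient of the reverse KL at the noisy sample.
        grad = fake_score(x_t, t) - real_score(x_t, t)
    # Surrogate whose gradient w.r.t. x_t equals `grad` (target is detached).
    return 0.5 * F.mse_loss(x_t, (x_t - grad).detach())
```

    The contrast lies in what is matched: the first loss ties each student sample to one specific teacher output, while the second only nudges the pooled sample distribution toward the teacher's, which is exactly why the student's loss landscape is not constrained by the teacher's trajectories.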

    Closing

    Besides the actual research, the environment and the people there were what made my experience so enjoyable. For the first half of my internship, I went to the lab every day and had the chance to spend time with the members there. They were all very open and friendly to me, and I felt very accepted despite having such a different background from them. In particular, my mentor, Taniguchi-san, was extremely kind, and I had a fun time working with him. His English was very good, and he could understand what I was saying even when I spoke to him in English. Our everyday conversations and Zoom meetings were mainly in Japanese, although I often sprinkled in English when I could not find the words to express myself in Japanese. All Slack and written communication, on the other hand, was conducted in English. He was also always open to listening to my ideas, and we often bounced ideas back and forth during the internship. I really appreciate his kindness and guidance, and I learned so much from him during these seven weeks. I attended the kickoff BBQ event at the beginning, and also attended a talk by Jonathan Frankle about training large language models. Both were very unique opportunities, and I enjoyed attending them. All in all, this internship was a good experience, and I gained a lot of invaluable skills and knowledge.


    What did you think?
    The Matsuo Lab is actively recruiting researchers. If you are interested, please see the link below!
    https://weblab.t.u-tokyo.ac.jp/joinus/career/