Pixyz, a library for implementing deep generative models, was released last fall amid growing interest in artificial intelligence research in “generative models,” which can generate images, documents, and music.
We interviewed Masahiro Suzuki, a researcher at Matsuo Lab, who played a central role in the development of Pixyz.
I had always dreamed of “realizing the brain of a robot through artificial intelligence,” and I first encountered machine learning when I was a junior in college.
After graduating from Hokkaido University’s Department of Information Science, I joined the Matsuo Laboratory in my first year of doctoral studies. I was drawn to the Matsuo Lab because I agreed with Professor Matsuo, who argued that deep learning had great potential at a time when it was not attracting as much attention as it does now.
Currently, as a researcher at the Matsuo Lab, I am not only conducting research on multimodal learning and deep generative models, but I am also involved in many educational activities, such as the DeepLearning Basic Course.
Multimodal learning: Machine learning that takes multiple types of data as input and processes them in an integrated manner.
What is “Pixyz”?
Pixyz is a library for implementing “deep generative models,” a framework within deep learning, easily and in a general-purpose way.
Deep generative models are generative models composed of a deep neural network, so we first explain the generative model.
Simply put, a generative model is a framework that asks “how was the data we have now created?” and models that process of data generation. Most deep learning research to date has focused on “separating” (classifying) data, and the generative model is a contrasting approach.
If a generative model is properly trained from data, it can “generate” new data that closely resembles the real data. Also, because the generative model has learned the process that generates the data, it can perform “anomaly detection” and “noise reduction.”
Generative models are usually designed as probabilistic models, but recently deep neural networks have come to be used as these probabilistic models, and thanks to the expressive power of the networks, it is now possible to learn higher-dimensional and larger data. This is the deep generative model.
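To make the idea concrete, here is a minimal sketch that assumes a one-dimensional Gaussian as the probabilistic model in place of a deep neural network. It does not use Pixyz, and every function name in it is hypothetical, chosen only for illustration. It shows the first two uses mentioned above: fitting the model to data and then sampling new data from it, and flagging a point as an anomaly when the model assigns it a low likelihood. Noise reduction is not shown.

```python
import math
import random
import statistics


def fit_gaussian(data):
    """'Train' the toy generative model: estimate mean and standard deviation."""
    return statistics.fmean(data), statistics.stdev(data)


def generate(mu, sigma, n, rng):
    """Draw n new data points from the fitted model."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def log_likelihood(x, mu, sigma):
    """Log density of x under the fitted Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def is_anomaly(x, mu, sigma, threshold):
    """Flag x when the model considers it less likely than the threshold."""
    return log_likelihood(x, mu, sigma) < threshold


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stand-in for "the data we have now": 1000 points around 5.0.
observed = [rng.gauss(5.0, 1.0) for _ in range(1000)]

mu, sigma = fit_gaussian(observed)
samples = generate(mu, sigma, 5, rng)

# Assumed rule of thumb: anything less likely than a point three
# standard deviations from the mean counts as an anomaly.
threshold = log_likelihood(mu + 3 * sigma, mu, sigma)

print("generated samples:", samples)
print("5.2 anomalous?", is_anomaly(5.2, mu, sigma, threshold))
print("12.0 anomalous?", is_anomaly(12.0, mu, sigma, threshold))
```

A deep generative model follows the same pattern, with the Gaussian's two parameters replaced by a probabilistic model whose distributions are parameterized by neural networks.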
Before the release of Pixyz, I did not expect much of a response, as I thought the number of people who would use it would be very limited. However, the response after the announcement on Twitter far exceeded our expectations, and we were very surprised.
We wanted to create a library that would simplify the implementation of the latest deep generative models.
Deep generative models have attracted a great deal of attention because of their ability to generate high quality images. However, recent methods have become increasingly complex, making it difficult to implement them using conventional deep learning libraries. Against this background, we decided to develop Pixyz.
The direct impetus for the development of Pixyz came from Tars, which we developed two years ago. Pixyz is an extension of Tars, and allows for more complex and various types of deep generative models to be implemented in a concise manner.
In the beginning, I developed the software alone as a hobby, so I had to do everything by myself, which was difficult. Now, however, I am developing it together with others in my laboratory who also use it.
Recently, many deep generative models have been proposed, and as a further development of this line of research, work is also underway on “world models,” in which the environment itself is learned from images and other data.
World models are expected to make possible artificial intelligence that understands the structure of the world and can go on to predict and imagine, that is, to simulate.
Development with “democratization” in mind
On the other hand, such world models are built from very complex deep generative models, which makes them hard to understand and implement, and hard for non-specialists to use.
In this sense, we believe that Pixyz can contribute to the “democratization” of deep generative models and world models. Currently, I feel that Pixyz is still insufficiently developed as a library, but in the future I would like to make it a library that can be used by many researchers.
Regarding world models, the research called GQN (Generative Query Network) published by DeepMind in Science last year attracted a lot of attention.
This is a research project in which several viewpoints of a room and images of the scenery seen from those viewpoints are given to an artificial intelligence, which then infers information about what kind of room it is and can generate images from unseen viewpoints of the same room.
Using Pixyz, which was developed against this background, Mr. Taniguchi, a member of the Matsuo Lab and a fourth-year undergraduate student, successfully reproduced and implemented GQN.
This implementation was introduced on Twitter by Ali Eslami, the first author of the GQN paper, along with Pixyz, and became a hot topic.
In addition to this, we are also working on implementing various state-of-the-art deep generative models and world models in Pixyz. These are available on a page (repository) named “Pixyzoo”, so please visit this page as well.
I myself believe that research on deep generative models and world models is important for knowledge processing in robots and for the realization of general-purpose artificial intelligence like the human brain, and I would be happy if Pixyz can help realize this goal.
Message to students studying deep learning
As a researcher who is also involved in education, I feel that now is a very good time to study machine learning and deep learning, but at the same time, it is no longer enough to study only those subjects.
In my case, while thinking about “how to achieve human-like intelligence,” I also studied the human brain. Although I am still at an early stage as a researcher, I feel that this knowledge is now being put to good use.
So I hope you will keep in mind that absorbing a wide range of knowledge, rather than studying only one thing, may look like a detour, but in the end it is a shortcut to finding your own strengths.
[Profile]
Masahiro Suzuki Project Researcher, Graduate School of Engineering, The University of Tokyo
March 2013 Graduated from Faculty of Engineering, Hokkaido University (Academic Excellence Award)
March 2015 Completed Graduate School of Information Science and Technology, Hokkaido University
March 2018 Completed Ph.D., Graduate School of Engineering, The University of Tokyo
Ph.D. thesis: Research on multimodal learning with deep learning and generative models (Dean’s Award (Research), Graduate School of Engineering)
April 2018 Project Researcher, Graduate School of Engineering, The University of Tokyo
Research Interests
Transfer learning (zero-shot learning), deep generative models (VAE), multimodal learning