Special Project Researcher Takeshi Kojima joined the Matsuo-Iwasawa Lab in 2020 as a Ph.D. student and has remained with the lab since receiving his Ph.D., working primarily on research and development of large-scale language models (LLMs). His paper on the discovery that adding "Let's think step by step" to a prompt raises the rate of correct AI responses has been cited more than 2,000 times (as of May 24, 2024) in less than two years since its publication, attracting the attention of AI researchers around the world. We interviewed Dr. Kojima, who also led the development of a 10-billion-parameter large-scale language model and leads the development support team for a project adopted by METI's GENIAC, about his latest LLM research and the appeal of the Matsuo-Iwasawa Lab.
Co-authoring papers with prominent AI researchers
-Your representative paper, commonly known as the "step-by-step paper," has been cited more than 2,000 times in less than two years since its publication. Could you give us an overview?
This is the paper titled "Large Language Models are Zero-Shot Reasoners," published in May 2022 when I was a doctoral student in the Matsuo-Iwasawa Laboratory. It shows that LLMs (large-scale language models) are capable of zero-shot multistep reasoning.
Zero-shot multistep reasoning means that an LLM can perform a given complex task without any few-shot examples in the prompt (instructions). The conventional view was that LLMs could not do this. However, I believed that by providing special prompts to LLMs, which hold vast amounts of knowledge, zero-shot reasoning would be possible.
For more information, click here>>
Large Language Models are Zero-Shot Reasoners
Specifically, I showed that by giving the prompt "Let's think step by step," LLMs can perform logical reasoning on new tasks for which they have not been specifically trained. The LLM used in the experiment was InstructGPT, the successor to GPT-3; we had it solve problems from MultiArith (a dataset that assesses mathematical reasoning ability) and compared accuracy before and after adding the "Let's think step by step" prompt. We observed similar behavior with other LLMs, such as Google's PaLM.
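To make the procedure concrete, here is a minimal sketch of the two-stage zero-shot CoT prompting described above: the first call elicits step-by-step reasoning, and the second call extracts the final answer. The `llm_generate` function is a hypothetical placeholder for any text-completion API (such as a client call to InstructGPT); only the two trigger phrases come from the paper itself.

```python
# Minimal sketch of zero-shot chain-of-thought prompting.
# `llm_generate` is a hypothetical stand-in for a text-completion API;
# plug in your own LLM client.

def llm_generate(prompt: str) -> str:
    """Return the model's continuation of `prompt` (hypothetical client)."""
    raise NotImplementedError("plug in a real LLM client here")

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit step-by-step reasoning with the trigger phrase.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm_generate(reasoning_prompt)

    # Stage 2: feed the reasoning back and ask for the final answer.
    answer_prompt = (
        f"{reasoning_prompt}{reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return llm_generate(answer_prompt).strip()

# Example usage (with a real client plugged in):
# zero_shot_cot("A juggler has 16 balls. Half are golf balls, and half of "
#               "the golf balls are blue. How many blue golf balls are there?")
```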
I got the idea from a paper published in January 2022 by Google researchers titled "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." I thought the same thing could be done zero-shot. It took me about a month or two to write the paper up, partly because it was an area no one had worked on yet.
For more information, click here>>
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
-How has the publication of this paper changed your life?
After the publication of this paper, ChatGPT appeared at the end of November 2022, and "prompt engineering" became a major topic of conversation. We are proud that we were able to present one of the foundational works in the LLM-utilization research discussed in that context. The paper has also been cited by many researchers around the world, and I have had more and more opportunities to use it as a calling card at domestic and international conferences, where it has sparked lively conversations. Looking back, though, I realize I came up with this idea only because I was in the Matsuo-Iwasawa Laboratory.
When I came up with the idea for the step-by-step paper, I had reported on a research topic I was mulling over during our weekly online meeting, and Dr. Iwasawa shared the related prior work with me via Slack. Then Dr. Machel Reid (currently affiliated with Google Brain) shared a recently published paper on few-shot CoT (Chain-of-Thought) on Slack. Reading it was a shock, and it settled the direction of the step-by-step paper.
From there, it was just a lot of trial and error. At first, I tried to design the complex reasoning format by hand (i.e., manually combining zero-shot prompts to solve a problem), but every idea I tried failed. When there was nothing left to try, I thought it would be better to let the LLM itself, rather than a human, work out the complex thought process. The phrase "Let's think step by step" suddenly came to me, I tried it without much hope, and it worked (laughs).
After that, when I reported the results of the initial experiments at our weekly online meeting, Dr. Iwasawa found them interesting, and he also approached Dr. Shane Gu, who was affiliated with OpenAI at the time (and is now with Google DeepMind), and we quickly began talking about co-authoring a paper. Because I was in daily communication with world-class researchers in this way, I was able to encounter the hottest papers at the most important time for me as a fledgling researcher. As the saying goes, "chance favors the prepared mind"; this experience taught me the importance of being well prepared and positioning yourself well.
I have a liberal arts background with an undergraduate degree in international relations and a master’s degree in economics.
-Please tell us about your career before joining the Matsuo-Iwasawa Lab.
I joined the Matsuo-Iwasawa Laboratory in 2020 as a Ph.D. student. At that time, I was a company employee at Peach Aviation Corporation, working mainly on database development as an AI engineer.
During my undergraduate years, I majored in international relations; I actually have a liberal arts background. After graduating, I took an SE position at an NEC affiliate, where I worked mainly on in-house database management using SQL and other tools. That is where my career as an engineer began. After that, I entered a master's program at Kyoto University in 2014, again in a liberal arts field: economics. I did empirical analysis using econometric models and wrote my master's thesis on the Chinese economy using data analysis. That is where I learned data science methods such as multiple regression analysis.
-What was your first encounter with deep learning?
Toward the end of my master's program. My reaction was something like, "Hey, there's an interesting technology out there." After that, out of personal interest, I taught myself deep learning while working at the company.
The first time I spoke with Dr. Matsuo was in 2019, at the Japan Society for Artificial Intelligence (JSAI) National Conference held in Niigata that year, where I was attending a presentation. I knew of Dr. Matsuo, of course, so I greeted him at the reception and told him I was interested in basic research on deep learning. He said, "Well, in that case," and introduced me to Dr. Iwasawa on the spot, and I even ended up joining them at the after-party ... that is how I remember it. Later, I told him again that I wanted to do research in the doctoral program, and after passing the required examinations, I was officially accepted to participate in basic research as a doctoral student.
Abundant computational resources and a research environment that draws inspiration from other fields
-What was your impression of the Matsuo-Iwasawa Laboratory when you were a doctoral student?
I had the impression that many students were strongly motivated to solve social issues through new technologies. Also, as someone from a liberal arts background, I had thought up until my master's program that research was something you did alone, but the Matsuo-Iwasawa Lab was completely different. Research is basically conducted in teams, and there are many opportunities for communication among members using tools such as Slack. The enthusiasm that came through in that communication was amazing. I was also able to make contacts with senior members who had founded ventures out of the Matsuo Lab; the lab was a daily treasure trove of stimulation for me.
In terms of physical resources, I was surprised by the abundance of computing resources. We have on-premise servers in the lab and a dedicated infrastructure team. There is also a project that uses AIST's ABCI (AI Bridging Cloud Infrastructure). I don't think many university laboratories can do this much at the lab level.
-Why did you decide to stay in the laboratory after completing the doctoral program?
I wanted to continue basic research simply because it is interesting. Of course, joining a company and doing AI research there was an option, but I felt the strength of academia is that you can pursue research out of pure curiosity. I was also attracted by the sense of camaraderie in taking on challenges with familiar members, and I became a specially-appointed researcher at the Matsuo-Iwasawa Laboratory in April 2023.
I believe one of the strengths of the Matsuo-Iwasawa Laboratory lies in its "broad tolerance." What I mean is that we have not only basic research on deep learning but also research teams in robotics and cognitive science, and the environment exposes us to a wide range of research, from natural language processing to image recognition and deep generative models.
A typical laboratory specializes in a single area, such as "natural language processing" or "image generation." The Matsuo-Iwasawa Lab, by contrast, allows interdisciplinary research within a single laboratory, and for some researchers this is a great advantage. For example, I was at first mainly engaged in natural language processing research, but inspired by a presentation on image processing by a member working next to me, I started a new research project on image recognition and was able to produce results.
This may sound abstract, but researchers tend to dig themselves into one hole, and there are cases where a breakthrough is found by taking the opportunity to broaden one's perspective. It is important to allow yourself detours. That kind of thing doesn't happen if you are doing AI research at a company that seeks only efficiency.
Developing a 10-billion-parameter Japanese LLM with our own hands
-What kind of missions have you been working on since you became a project researcher in the Matsuo-Iwasawa Laboratory in April 2023?
In April 2023, I was suddenly asked to develop "Weblab-10B." This was a project to create an LLM originating from the Matsuo Lab: a large-scale Japanese language model with 10 billion parameters, supporting both Japanese and English, at the highest level in Japan (at the time).
To be honest, I was puzzled when I heard about the project. I had been doing research using LLMs, and now I was suddenly being asked to create one. But my feeling of "this sounds interesting" outweighed the apprehension, and I sensed that building an LLM myself might help me understand things I could not grasp just by using LLMs.
However, as it turns out, even when you create an LLM, you still don't know what's inside (laughs). I realized that "creating" and "understanding" are two different things. From my perspective as a researcher, an LLM is like a new organism from an unknown planet. So, first of all, I am trying to understand how it realizes complex, human-like thought by observing it in detail.
-What new knowledge have you gained from creating LLMs with your own hands?
As I said, "in the end, we don't know what's inside," but there are many things we did learn. For example, with Weblab-10B we trained the model on two languages, Japanese and English, simultaneously, and then fine-tuned it mainly in English. The results showed improved performance on Japanese as well as English tasks, confirming knowledge transfer between the languages. These findings motivated our later research on the internal behavior of multilingual large-scale language models, which I will discuss below.
There is also a lot of knowledge gained on the technical side. As a researcher, it is of great value to have experienced the full pipeline (the series of tasks) for building a modern LLM. Although I had experience with machine learning on large servers during my doctoral program, writing programs that connect multiple servers and perform multi-node training was a completely different challenge.
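To give a flavor of what connecting multiple servers involves, here is a minimal sketch of multi-node data-parallel training with PyTorch's DistributedDataParallel, assuming a `torchrun` launch. The model and data are toy placeholders, not the actual Weblab-10B pipeline, which at 10-billion-parameter scale would also require techniques such as tensor or pipeline parallelism.

```python
# Minimal sketch of multi-node data-parallel training with PyTorch DDP.
# Launch with: torchrun --nnodes=N --nproc-per-node=G train.py
# (torchrun sets the RANK / LOCAL_RANK / WORLD_SIZE environment variables).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL backend handles GPU-to-GPU communication across nodes.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # toy model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 1024, device=local_rank)  # toy batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across all ranks here
        opt.step()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```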
In fact, developing a 10-billion-parameter LLM from scratch is very costly. Tokyo Institute of Technology's Swallow (a Japanese LLM), for example, is based on Meta's Llama 2, so for the same model size the cost differs by a factor of several times. I think the ability to take on research at such a huge scale is one of the major advantages of the Matsuo-Iwasawa Laboratory.
Tackling new LLM research themes such as multilingual processing and token compression
-Since the beginning of 2024, you have been participating in projects adopted by GENIAC, haven’t you?
GENIAC (Generative AI Accelerator Challenge) is a project led by the Ministry of Economy, Trade and Industry and NEDO to develop foundational AI models. I provide technical support as leader of the development support team for the 50-billion-parameter LLM development project adopted by GENIAC, and I consider it my mission to pass on the knowledge gained from developing Weblab-10B to the next generation.
I have also recently started a research group called Understanding and Controlling LLM (UCLLM) and am engaged in research activities with many members, including intern students. We currently have about 15 members (as of May 2024). The majority are students, and some are working professionals who participate alongside their day jobs.
Here, we are engaged in both research and development related to LLMs. On the research side, we are promoting work focused mainly on understanding the principles behind the dramatic performance gains of LLMs, on social risks such as unlearning, bias, and hallucination*, and on using large-scale language models to develop large-scale language models. This activity was launched in earnest around December 2023, and some of the results have already been submitted to international conferences (awaiting results).
*Unlearning refers to techniques for removing part of what an already trained AI has learned; bias refers to the problem of human biases in what an AI learns; and hallucination refers to the problem of AI outputting content that is not true. New issues like these are emerging that AI researchers must face.
On the development side, this is what the aforementioned development support in the GENIAC project was about: we built and publicly released standardized code that allows LLM development to be carried out end to end. Several teams in the competition have developed models based on this standardized code.
A variety of people are involved in the GENIAC project: students from the University of Tokyo, students from other universities, researchers, and professionals working at companies. Many of the intern students participating in UCLLM are likewise not members of our laboratory. If you are interested in this research, I encourage you to apply for an internship. It is easy to become insular when you stay inside one laboratory, so the abundant opportunities to get to know such excellent people outside the lab are a major attraction of doing research at the Matsuo-Iwasawa Laboratory.
-What are your goals for the future as an LLM researcher?
Since the development of Weblab-10B, I have become very interested in multilingual LLMs. Modern LLMs such as GPT-4 can produce excellent output across many languages. We have found that behind this ability, language-specific neurons are concentrated in the first and last layers of the neural network underlying the language model. In our latest paper, we analyze six languages (English, German, French, Spanish, Chinese, and Japanese) and show through various experiments what role each language's neurons play in text generation.
For more information, click here>>
On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
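As a rough illustration of this kind of analysis, here is a hedged sketch of identifying language-specific neurons by comparing mean activations across languages. The activations are random placeholders, and the scoring rule is a simplification for illustration, not the exact method of the paper.

```python
# Hedged sketch: find neurons that fire for one language but not others,
# by comparing mean activations. Random tensors stand in for real model
# activations, which in practice would come from forward hooks on a decoder.
import torch

n_neurons, n_texts = 4096, 100
languages = ["en", "de", "fr", "es", "zh", "ja"]

# Placeholder: per-neuron activations over sample texts in each language.
acts = {lang: torch.rand(n_texts, n_neurons) for lang in languages}
mean_act = {lang: a.mean(dim=0) for lang, a in acts.items()}

def language_specific_neurons(target: str, top_k: int = 50) -> torch.Tensor:
    """Score each neuron by (mean activation on target language)
    minus (max mean activation on any other language)."""
    others = torch.stack([mean_act[l] for l in languages if l != target])
    score = mean_act[target] - others.max(dim=0).values
    return score.topk(top_k).indices  # most target-specific neurons

ja_neurons = language_specific_neurons("ja")
print(f"Top Japanese-specific neuron indices: {ja_neurons[:10].tolist()}")
```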
The other theme is whether LLM prompts, which have grown long-winded, can be compressed. While the step-by-step paper found a new way to utilize LLMs, it also introduced a style of writing long prompts for step-by-step processing. For example, picture a prompt that asks the model to read a 1,000-character passage. We believe there may be a way to compress that into the equivalent of 3 characters and have the model read that instead, making the computation much more efficient. Token compression techniques are one of the current trends in LLM research.
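As one hedged sketch of what token compression can look like, the following toy example learns three trainable soft embeddings that are distilled to reproduce the effect of a 200-token prompt on a frozen model's predictions (in the spirit of soft-prompt or "gisting"-style compression, not a method taken from this interview). The tiny mean-pooled "model" is a deliberate stand-in; with a real LLM you would pass the learned vectors via embedding inputs and keep the model frozen.

```python
# Hedged toy sketch of soft-prompt token compression via distillation.
import torch
import torch.nn.functional as F

vocab, dim, K = 1000, 64, 3                      # K = compressed "token" count
embed = torch.nn.Embedding(vocab, dim)
head = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.ReLU(),
                           torch.nn.Linear(dim, vocab))
for p in list(embed.parameters()) + list(head.parameters()):
    p.requires_grad_(False)                      # the "LLM" stays frozen

def logits(prefix_embeds, query_ids):
    ctx = prefix_embeds.mean(dim=0)              # crude stand-in for attention
    return head(embed(query_ids) + ctx)          # prefix conditions the query

long_prompt = torch.randint(0, vocab, (200,))    # placeholder 200-token prompt
query = torch.randint(0, vocab, (20,))           # placeholder query
target = F.softmax(logits(embed(long_prompt), query), dim=-1).detach()

soft = torch.nn.Parameter(torch.randn(K, dim) * 0.02)
opt = torch.optim.Adam([soft], lr=1e-2)
for step in range(300):
    # Distill: make the K soft tokens reproduce the full prompt's effect.
    loss = F.kl_div(F.log_softmax(logits(soft, query), dim=-1),
                    target, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final KL: {loss.item():.4f} (200 prompt tokens -> {K} soft tokens)")
```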
-Finally, do you have a message for your future colleagues?
LLM development is now a battle of "scaling laws." A scaling law is a simple empirical law stating that as parameters, training data, and computation increase, LLM performance improves correspondingly. In other words, we are in a world where corporate organizations like Google and OpenAI fight with overwhelming amounts of capital. However, there are always things that can only be done in academia. If there are people with the mentality to think through their own questions and research problems and never give up until they are solved, I would love to do research with them. It doesn't matter whether you have a liberal arts or science background; what matters is passion. Let's pursue new possibilities together as researchers in the Matsuo-Iwasawa Laboratory.
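For reference, one widely cited form of the scaling law (the parameterization of Hoffmann et al., 2022, given here as background rather than taken from this interview) expresses the expected loss in terms of parameter count N and training tokens D:

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022):
% expected loss as a function of parameters N and training tokens D,
% with empirically fitted constants E (irreducible loss), A, B, alpha, beta.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Loss falls smoothly as N and D grow, which is why capital-intensive scaling has come to dominate industrial LLM development.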
In this way, the Matsuo Lab is promoting LLM research to realize our vision of "Creating Intelligence." If you are at all interested, please come and talk with us in a casual interview.
Click here to apply for a specially-appointed researcher, specially-appointed assistant professor, or specially-appointed lecturer position at the Matsuo Lab.