Mr. Jonathan Frankle, who is famous for “The Lottery Ticket Hypothesis” and is currently Chief Neural Network Scientist at Databricks, gave a lecture at the University of Tokyo.
Approximately 50 people, including researchers and students of the Matsuo-Iwasawa Laboratory, attended the lecture on “Training Modern LLMs from Scratch.”
Speaker biography: Mr. Jonathan Frankle is currently the Chief Neural Network Scientist at Databricks. He also leads the Mosaic Research lab, where his team of more than 30 research scientists conducts empirical research on how neural networks learn, with the goal of making modern generative AI models such as LLMs and diffusion models more efficient. He completed his PhD in Computer Science at the Massachusetts Institute of Technology in 2023.
Title: Training Modern LLMs from Scratch
Abstract: This lecture describes the process of training contemporary LLMs from scratch, based on my experience doing so at scale in industry with models like DBRX and MPT. It begins by explaining the fundamental design decisions that go into building a model and the cost of doing so, and concludes with the logistics of training it, fine-tuning it, and aligning it with human preferences. Databricks believes in open science, so I will openly share details about how we train industrial-grade LLMs.
The contents of the lecture can also be viewed at the following links:
Session overview: “In the Trenches with DBRX: Building a State-of-the-Art Open-Source Model”
YouTube: “In the Trenches with DBRX: Building a State-of-the-Art Open-Source Model”
The Matsuo-Iwasawa Laboratory offers many opportunities to actively exchange ideas with people outside the laboratory.
We hope to continue this kind of active exchange across laboratory boundaries in the future.
Mr. Frankle, thank you very much for visiting the Matsuo Lab. We hope to see you again in the future.