- Introducing Flan-UL2 20B: The Latest Addition to the Open-Source Flan Models
- 2023/03/03
- Duration: 3 minutes
- Podcast
Summary
Synopsis & Commentary
Researchers have released a new open-source Flan 20B model trained on top of the previously open-sourced UL2 20B checkpoint. The checkpoints have been uploaded to GitHub, and the technical details have been added in an update to the UL2 paper. The Flan series of models is trained on a collection of diverse datasets phrased as instructions, with the goal of generalisation across many tasks. The Flan datasets have now been open-sourced in "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning" (Longpre et al.). The researchers have also released a series of T5 models, ranging from 200M to 11B parameters, that have been instruction-tuned with Flan, as described in "Scaling Instruction-Finetuned Language Models" (Chung et al.), also known as the Flan2 paper.

What is Flan Instruction Tuning?
The key idea of Flan instruction tuning is to train a large language model on a collection of datasets phrased as instructions so that the model can generalise across diverse tasks. While Flan has been trained primarily on academic tasks, the researchers plan to expand the scope of the model to other areas in the future.

What's New with Flan-UL2 20B?
The new Flan-UL2 20B checkpoint is designed to improve the "usability" of the original UL2 model, which was trained exclusively on the C4 corpus. The UL2 objective trains the model on a mixture of denoisers combining diverse span-corruption and prefix language modelling tasks. Two major updates have been made to the UL2 20B model with Flan. First, the receptive field has been extended from the original 512 tokens to 2048, making the model far more usable for few-shot in-context learning. Second, the mode-switch tokens that were previously required for good performance have been removed: the researchers continued training UL2 for an additional 100k steps (with a small batch size) so the model would forget the mode tokens before applying Flan instruction tuning.

Comparison to Other Models in the Flan Series
Flan-UL2 20B outperforms Flan-T5 XXL on all four evaluation setups, with an overall relative improvement of +3.2%. The gains on the chain-of-thought (CoT) versions of the MMLU and BBH tasks are much larger: +7.4% on MMLU and +3.1% on BBH compared with Flan-T5 XXL.

Limitations of Flan
While Flan models are cost-friendly, compact, and free, they have some limitations. Flan is instruction-tuned primarily on academic tasks, where outputs are typically short, academic, and conventional, so the models are mostly useful for academic-style tasks.

Conclusion
Flan-UL2 20B is a significant addition to the Flan series, expanding the size ceiling of the current Flan-T5 models by approximately 2x. The new model improves the usability of the original UL2 model and shows a substantial improvement in CoT capabilities. The researchers are excited to see what the community does with this model, which is currently the best open-source model on Big-Bench Hard and MMLU.
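
To make "tasks phrased as instructions" concrete, here is a minimal, illustrative sketch. The real templates come from the Flan Collection (Longpre et al.); the phrasings and field names below are invented purely for illustration.

```python
# Illustrative examples of instruction-formatted training data (not the actual
# Flan Collection templates): each task is rewritten as a natural-language
# instruction paired with a target answer.
instruction_examples = [
    {
        "input": "Translate the following sentence to German: 'The weather is nice today.'",
        "target": "Das Wetter ist heute schön.",
    },
    {
        "input": (
            "Premise: 'A dog is sleeping on the couch.' "
            "Hypothesis: 'An animal is resting indoors.' "
            "Does the premise entail the hypothesis? Answer yes or no."
        ),
        "target": "yes",
    },
]

# Instruction tuning trains the model to map each input to its target so that,
# at inference time, it can follow unseen instructions of a similar form.
for example in instruction_examples:
    print(example["input"], "->", example["target"])
```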
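
For readers who want to try the checkpoint, here is a minimal inference sketch. It assumes the released weights are also available on the Hugging Face Hub under the id "google/flan-ul2"; the post itself only mentions GitHub, so that model id and loading path are assumptions.

```python
# Minimal inference sketch for Flan-UL2 20B via Hugging Face transformers.
# Assumption: the checkpoint is mirrored on the Hub as "google/flan-ul2".
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
# A 20B model needs substantial memory; device_map="auto" requires `accelerate`.
model = T5ForConditionalGeneration.from_pretrained("google/flan-ul2", device_map="auto")

# No mode-switch tokens are needed: the prompt is just a plain instruction, and
# the 2048-token receptive field leaves room for few-shot examples.
prompt = (
    "Answer the following question, reasoning step by step: "
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```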