『Interconnects』のカバーアート

Interconnects

Interconnects

著者: Nathan Lambert
無料で聴く

このコンテンツについて

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories.

www.interconnects.aiInterconnects AI, LLC
科学
エピソード
  • Olmo 3: America’s truly open reasoning models
    2025/11/20
    We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents:* The best 32B base model.* The best 7B Western-origin thinking & instruct models.* The first 32B (or larger) fully open reasoning model.This is a big milestone for Ai2 and the Olmo project. These aren’t huge models (more on that later), but it’s crucial for the viability of fully open-source models that they are competitive on performance – not just replications of models that came out 6 to 12 months ago. As always, all of our models come with full training data, code, intermediate checkpoints, training logs, and a detailed technical report. All are available today, with some more additions coming before the end of the year.As with OLMo 2 32B at its release, OLMo 3 32B is the best open-source language model ever released. It’s an awesome privilege to get to provide these models to the broader community researching and understanding what is happening in AI today.Paper: https://allenai.org/papers/olmo3 Artifacts: https://huggingface.co/collections/allenai/olmo-3Demo: https://playground.allenai.org/ Blog: https://allenai.org/blog/olmo3 Base models – a strong foundationPretraining’s demise is now regularly overstated. 2025 has marked a year where the entire industry rebuilt their training stack to focus on reasoning and agentic tasks, but some established base model sizes haven’t seen a new leading model since Qwen 2.5 in 2024. The Olmo 3 32B base model could be our most impactful artifact here, as Qwen3 did not release their 32B base model (likely for competitive reasons). We show that our 7B recipe competes with Qwen 3, and the 32B size enables a starting point for strong reasoning models or specialized agents. Our base model’s performance is in the same ballpark as Qwen 2.5, surpassing the likes of Stanford’s Marin and Gemma 3, but with pretraining data and code available, it should be more accessible to the community to learn how to finetune it (and be confident in our results).We’re excited to see the community take Olmo 3 32B Base in many directions. 32B is a loved size for easy deployment on single 80GB+ memory GPUs and even on many laptops, like the MacBook I’m using to write this on.A model flow – the lifecycle of creating a modelWith these strong base models, we’ve created a variety of post-training checkpoints to showcase the many ways post-training can be done to suit different needs. We’re calling this a “Model Flow.” For post-training, we’re releasing Instruct versions – short, snappy, intelligent, and useful especially for synthetic data en masse (e.g. recent work by Datology on OLMo 2 Instruct), Think versions – thoughtful reasoners with the performance you expect from a leading thinking model on math, code, etc. and RL Zero versions – controlled experiments for researchers understanding how to build post-training recipes that start with large-scale RL on the base model.The first two post-training recipes are distilled from a variety of leading, open and closed, language models. At the 32B and smaller scale, direct distillation with further preference finetuning and reinforcement learning with verifiable rewards (RLVR) is becoming an accessible and highly capable pipeline. Our post-training recipe follows our recent models: 1) create an excellent SFT set, 2) use direct preference optimization (DPO) as a highly iterable, cheap, and stable preference learning method despite its critics, and 3) finish up with scaled up RLVR. All of these stages confer meaningful improvements on the models’ final performance.Instruct models – low latency workhorsesInstruct models today are often somewhat forgotten, but the likes of Llama 3.1 Instruct and smaller, concise models are some of the most adopted open models of all time. The instruct models we’re building are a major polishing and evolution of the Tülu 3 pipeline – you’ll see many similar datasets and methods, but with pretty much every datapoint or training code being refreshed. Olmo 3 Instruct should be a clear upgrade on Llama 3.1 8B, representing the best 7B scale model from a Western or American company. As scientists we don’t like to condition the quality of our work based on its geographic origins, but this is a very real consideration to many enterprises looking to open models as a solution for trusted AI deployments with sensitive data.Building a thinking modelWhat people have most likely been waiting for are our thinking or reasoning models, both because every company needs to have a reasoning model in 2025, but also to clearly open the black box for the most recent evolution of language models. Olmo 3 Think, particularly the 32B, are flagship models of this release, where we considered what would be best for a reasoning model at every stage of training.Extensive effort (ask me IRL about more war stories) went into every stage of the post-training of the Think models. We...
    続きを読む 一部表示
    11 分
  • Why AI writing is mid
    2025/11/17
    First, on the topic of writing, the polished, and more importantly printed, version of my RLHF Book is available for pre-order. It’s 50% off for a limited time, you can pre-order it here! Like a lot of writing, I’ve been sitting on this piece for many months thinking it’s not contributing enough, but the topic keeps coming up — most recently via Jasmine Sun — and people seem to like it, so I hope you do too!It’s no longer a new experience to be struck by just how bad AI models are at writing good prose. They can pull out a great sentence every now and then, particularly models like GPT-5 Pro and other large models, but it’s always a quick comment and never many sustained successive sentences. More importantly, good AI writing feels like a lucky find rather than the result of the right incantation. After spending a long time working training these models, I’m fairly convinced that this writing inhibition is a structural limitation to how we train these models today and the markets they’re designed to serve.If we're making AIs that are soon to be superhuman at most knowledge work, that are trained primarily to predict text tokens, why is their ability to create high quality text tokens still so low? Why can’t we make the general ChatGPT experience so much more refined and useful for writers while we’re unlocking entirely new ways of working with them every few months — most recently the CLI agents like Claude Code. This gap is one of my favorite discussions of AI because it’s really about the definition of good writing is in itself.Where language models can generate beautiful images from random noise, they can't reliably generate a good few sentences from a couple bullet points of information. What is different about the art form of writing than what AI can already capture?I'm coming to believe that we could train a language model to be a great writer, but it goes against so many of the existing training processes. To list a few problems at different stages of the stack of varying severity in terms of their handicapping of writing:* Style isn’t a leading training objective. Language models all go through preference training where many aspects from helpfulness, clarity, honesty, etc. are balanced against each other. Many rewards make any one reward, such as style, have a harder time standing out. Style and writing quality is also far harder to measure, so it is less likely to be optimized vis-a-vis other signals (such as sycophancy, which was easier to capture).* Aggregate preferences suppress quirks. Language model providers design models with a few intended personalities, largely due to the benefits of predictability. These providers are optimizing many metrics for "the average user." Many users will disagree on what their preference for “good writing” is.* Good writing’s inherent friction. Good writing often takes much longer to process, even when you’re interested in it. Most users of ChatGPT just want to parse the information quickly. Doubly, the people creating the training data for these models are often paid per instance, so an answer with more complexity and richness would often be suppressed by subtle financial biases to move on.* Writing well is orthogonal to training biases. Throughout many stages of the post-training process, modern RLHF training exploits subtle signals for sycophancy and length-bias that aren't underlying goals of it. These implicit biases go against the gradient for better writing. Good writing is pretty much never verbose.* Forced neutrality of a language model. Language models are trained to be neutral on a variety of sensitive topics and to not express strong opinions in general. The best writing unabashedly shares a clear opinion. Yes, I’d expect wackier models like Grok to potentially produce better writing, even if I don’t agree with it. This leads directly to a conflict directly in something I value in writing — voice.All of these create models that are appealing to broad audiences. What we need to create a language model that can write wonderfully is to give it a strong personality, and potentially a strong "sense of self" — if that actually impacts a language model's thinking. The cultivation of voice is one of my biggest recommendations to people trying to get better at writing, only after telling them to find something they want to learn about. Voice is core to how I describe my writing process.When I think about how I write, the best writing relies on voice. Voice is where you process information into a unique representation — this is often what makes information compelling.Many people have posited that base models make great writers, such as when I discussed poetry with Andrew Carr on his Interconnects appearance, but this is because base models haven’t been squashed to the narrower style of post-trained responses. I’ve personally been thinking about this sort of style induced by post-training recently as we ...
    続きを読む 一部表示
    8 分
  • Interview: Ant Group's open model ambitions
    2025/11/12
    This is the first of a handful of interviews I’m doing with teams building the best open language models of the world. In 2025, the open model ecosystem has changed incredibly. It’s more populated, far more dominated by Chinese companies, and growing. DeepSeek R1 shocked the world and now there are a handful of teams in China training exceptional models. The Ling models, from InclusionAI — Ant Group’s leading AI lab — have been one of the Chinese labs from the second half of the year that are releasing fantastic models at a rapid clip. This interview is primarily with Richard Bian, who’s official title is Product & Growth Lead, Ant Ling & InclusionAI (on LinkedIn, X), previously leading AntOSS (Ant Group’s open source software division). Richard spent a substantial portion of his career working in the United States, with time at Square, Microsoft, and an MBA from Berkeley Haas, before returning to China and work at Ant.Also joining are two leads of the Ant Ling technical team, Chen Liang (Algorithm Engineer), and Ziqi Liu (Research Lead).This interview focuses on many topics of the open language models, such as:* Why is the Ant Group — known for the popular fintech app AliPay — investing so much in catching up to the frontier of AI?* What does it take to rapidly gain the ability to train excellent models?* What decisions does one make when deciding a modeling strategy? Text-only or multimodal? What size of models?…* How does the Chinese AI ecosystem prioritize different directions than the West?And many more topics. Listen on Apple Podcasts, Spotify, YouTube, and where ever you get your podcasts. For other Interconnects interviews, go here.Some more references & links:* InclusionAI’s homepage, highlighting their mission.* AntLingAGI on X (models, research, etc.), InclusionAI on X (overall initiative), InclusionAI GitHub, or their Discord community.* Ling 1T was highlighted in “Our Picks” for our last open model roundup in October.* Another interview with Richard at State of Open Conference 2025.* Over the last few months, our coverage of the Chinese ecosystem has taken off, such as our initial ranking of 19 open Chinese AI labs (before a lot of the models we discuss below), model roundups, and tracking the trajectory of China’s ecosystem. An overview of Ant Ling & Inclusion AIAs important context for the interview, we wanted to present an overview of InclusionAI, Ant’s models, and other efforts that emerged onto the scene just in the last 6-9 months. To start — branding.Here’s a few screenshots of InclusionAI’s new website. It starts with fairly standard “open-source AI lab messaging.”Then I was struct by the very distinct messaging which is surprisingly rare in the intense geopolitical era of AI — saying AI is shared for humanity.I expect a lot of very useful and practical messaging from Chinese open-source labs. They realize that Western companies likely won’t pay for their services, so having open models is their only open door to meaningful adoption and influence.Main models (Ling, Ring, & Ming)The main model series is the Ling series, their reasoning models are called Ring, and their Multimodal versions are called Ming. The first public release was Ling Plus, 293B sparse MoE in April. They released the paper for their reasoning model in June and have continued to build on their MoE-first approach.Since then, the pace has picked up significantly. Ling 1.5 came in July.Ling (and Ring) 2.0 came in September of this year, with a 16B total, 2B active mini model, an 100B total, 6B active flash model, and a big 1T total parameter 50B active primary model. This 1T model was accompanied by a substantial tech report on the challenges of scaling RL to frontier scale models. The rapid pace that Chinese companies have built this knowledge (and shared it clearly) is impressive and worth considering what it means for the future.Eval scores obviously aren’t everything, but they’re the first step to building meaningful adoption. Otherwise, you can also check out their linear attention model (paper, similar to Qwen-Next), some intermediate training checkpoints, or multimodal models.Experiments, software, & otherInclusionAI has a lot of projects going in the open source space. Here are some more highlights:* Language diffusion models: MoEs, sizes similar to Ling 2.0 mini and flash (so they likely used those as base). Previous versions exist. * Agent-based models/fine-tunes, Deep Research models, computer-use agentic models.* GroveMoE, MoE arch experiments.* RL infra demonstrations (Interestingly, those are dense models)* AWorld: Training + general framework for agents (RL version, paper)* AReal: RL training suite Interconnects is a reader-supported publication. Consider becoming a subscriber.Chapters* 00:00:00 A frontier lab contender in 8 months* 00:07:51 Defining AGI with metaphor* 00:20:16 How the lab was born* 00:23:30 Pre-training paradigms* 00:40:25 Post training...
    続きを読む 一部表示
    1 時間 18 分
まだレビューはありません