Interconnects

Author: Nathan Lambert

Overview

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories.

www.interconnects.ai
Interconnects AI, LLC
Science
Episodes
  • Notes from inside China's AI labs
    2026/05/07
    Staring out the window of a new high-speed train from Hangzhou to Shanghai, I’m gifted with views of dramatic ridgelines speckled with wind turbines, silhouetted against the setting sun. The mountains cast a backdrop to a mix of sprawling fields and clustered skyscrapers. I’m returning from China with great humility. It’s a very warming, human experience to go somewhere so foreign and be so welcomed. I had the honor of meeting so many people in the AI ecosystem whom I knew from afar, and they greeted me with big smiles and cheer, reminding me how global my work and the AI ecosystem are.

    The mentality of Chinese researchers

    The Chinese companies building language models are set up as the perfect fast-followers for the technology, building on long-standing cultural traditions in education and work, along with subtly different approaches to building technology companies. When you look at the outputs (the latest, biggest models enabling agentic workflows) and the ingredients (excellent scientists, large-scale data, and accelerated computing), the Chinese and American labs look largely similar. The lasting differences emerge in how these are organized and conditioned.

    I’ve long thought that one reason the Chinese labs are so good at catching up with and keeping up with the frontier is that they’re culturally aligned for this task, but without talking to people directly I felt it wasn’t my place to attribute substantial influence to this hunch. Speaking with many wonderful, humble, and open scientists at the leading Chinese labs has crystallized a lot of my beliefs.

    So much of building the best LLMs today comes down to meticulous work across the entire stack, from data to architecture details and RL algorithm implementations. All parts of the model can give some improvements, and fitting them together is a complex process where the work of some brilliant individuals needs to get shelved in favor of the overall model maximizing a multi-objective optimization.

    While American researchers are obviously also brilliant at solving the individual components, there’s more of a culture of speaking up for yourself in the U.S. As a scientist, you’re more successful when you advocate for your work, and modern culture is opening a new path to fame for “leading AI scientists.” This results in direct conflict. The Llama organization is heavily rumored to have collapsed under the political weight of these interests embedding themselves in a hierarchical organization. I’ve heard that at other labs it can be necessary to pay off a top researcher to get them to stop complaining about their idea not making it into the final model. Whether or not that’s exactly true, the idea is clear: ego and desires for career advancement do get in the way of making the best models. A small, directional shift in this sort of culture between the U.S. and China can have a meaningful impact on the final outputs.

    Some of this has to do with who is building the models in China. There’s an immediate reality at all of the labs that a large proportion of the core contributors are active students. The labs are quite young, and it reminds me of our setup at Ai2, where students are seen as peers and directly integrated into the LLM team. This is incredibly different from the top labs in the U.S., where the likes of OpenAI, Anthropic, Cursor, etc. simply don’t offer internships. Other companies like Google nominally have internships related to Gemini, but there’s a lot of concern about whether your internship will be siloed away from anything real.

    To summarize how the slight change in culture can improve the ability to build models:

    * More willingness to do non-flashy work in order to improve the final model,
    * People new to building AI can be free of prior phases of AI hype cycles, allowing them to adapt to modern techniques faster (in fact, one of the Chinese scientists I talked to latched onto this strength),
    * Less ego, which lets org charts scale slightly, as there’s less gaming of the system, and
    * Abundant talent well-suited to solving problems with a proof of concept elsewhere, etc.

    This slight inclination towards skills that complement building today’s language models stands in contrast to a known stereotype that Chinese researchers tend to produce less creative, field-spawning, 0-to-1 academic-style research. Among the more academic lab visits on our trip, many leaders talked about cultivating this more ambitious research culture. At the same time, some technical leaders we talked to were skeptical about whether such a rewiring of the approach to science is likely in the near term, because it’ll take a redesign of the education and incentive systems that is too big to happen within the current economic equilibrium. This culture seems to be training students and engineers that are excellent at the LLM ...
    17 min
  • The distillation panic
    2026/05/04
    “Distillation attacks” is a horrible term for what is happening right now. Yes, some Chinese labs are hacking or jailbreaking APIs to attempt to extract more signal from model APIs — stopping this is important to maintain the U.S.’s lead in AI capabilities. But referring to this as a “distillation attack” is going to irrevocably associate all distillation with this behavior, and distillation generally is a core technique needed to diffuse AI capabilities broadly through academic and economic activities.

    We went through this sort of language transition with the open-source vs. open-weight debate. All the terms just reduced to “open models” — very few people in the large AI community know exactly how open-source differs from open-weights. And terminology matters, as the less informed people who still care about — and influence — the technology are bound by the terms they use. If we’re not careful with the discourse around distillation, many people could come to associate this broad technique, used for research and development of new models, with an act at the boundary of corporate manipulation and crime.

    I’ve recently written a more technical piece estimating how impactful state-of-the-art distillation methods are on leading Chinese models, and this piece follows it to push for caution in any hasty actions to target the methods with policy. To set the stage, recall Anthropic’s recent blog post where they detailed “distillation attacks” made by three Chinese labs:

    These labs used a technique called “distillation,” which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.

    This is a clever paragraph, where they normalize distillation generally and explain how a few people can use it illicitly, without detailing how illicit use often involves other, more explicit behavior like jailbreaking, hacking, or identity spoofing of the API.

    Distillation itself is an industry standard. It’s used extensively, primarily in post-training, by smaller players to create specialized or smaller models. In my book coming this summer, I describe it as follows:

    The term distillation has been the most powerful form of discussion around the role of synthetic data in language models. Distillation as a term comes from the technical definition of teacher-student knowledge distillation in the deep learning literature. Distillation colloquially refers to using the outputs from a stronger model to train a smaller model. In post-training, this general notion of distillation takes two common forms:

    * As a data engine to use across wide swaths of the post-training process: completions for instructions, preference data (or Constitutional AI), or verification for RL.
    * To transfer specific skills from a stronger model to a weaker model, which is often done for skills such as mathematical reasoning or coding.

    (A minimal code sketch of the data-engine form follows this description.)

    With this definition, it’s easy to see how distillation takes many forms. Of course, if you just take the outputs from GPT-5.5 and train a recent open-weight base model with them to host a competitive product, that’s one thing. But a lot of the things that fall under the bucket of distillation are complex, multi-stage processes that muddle the exact impact of the model you distilled from.

    A modern LLM pipeline could look like using a GPT API to build an initial batch of synthetic data for a specialized small data-processing model. A good example is a model like olmOCR (or many other models in this category) that is trained to convert PDFs to clean text. This specialized model would then be used to create large amounts of data. Finally, you train another model (often from scratch) on the new data you created. Is this final model distilled from GPT?

    When done via a closed, API-based model, distillation sits in the grey area of the terms of service you agree to when signing up for the Claude or GPT platform. They generally forbid the use of the API to create competing language model products, but this term has largely gone unenforced. The open-source community used to worry deeply about being cut off from these cutting-edge APIs for doing research or creating public datasets, but to date only one prominent case of corporate accounts being restricted exists (at least until the recent Chinese companies).

    This is all to say that distillation is an industry-standard technique, and the use of closed APIs to perform distillation has always been a grey area. Nvidia’s latest Nemotron models, as some of the only models with open post-training datasets, are technically in large part distilled from Chinese,...
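    Below is a minimal sketch of the “data engine” form of distillation described above. It assumes a hypothetical teacher_generate() stand-in for a call to a stronger model; a real pipeline would batch requests against an actual API or local open-weight model, then run ordinary supervised fine-tuning on the resulting file.

```python
# Sketch of "distillation as a data engine": collect completions from a
# stronger teacher model, then fine-tune a student on them with ordinary
# supervised learning. teacher_generate() is a hypothetical placeholder,
# not any specific provider's API.
import json


def teacher_generate(prompt: str) -> str:
    # Placeholder: in practice this would call the teacher model's API
    # (or run a local open-weight teacher) and return its completion.
    return f"<teacher completion for: {prompt}>"


def build_sft_dataset(prompts: list[str], path: str) -> None:
    """Write (prompt, completion) pairs as JSONL, a standard input
    format for supervised fine-tuning (SFT) of a student model."""
    with open(path, "w") as f:
        for prompt in prompts:
            record = {"prompt": prompt,
                      "completion": teacher_generate(prompt)}
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    build_sft_dataset(
        ["Explain chain-of-thought prompting in one sentence."],
        "distill_sft.jsonl",
    )
```

    The multi-stage pipelines described above (e.g. an olmOCR-style data-processing model) chain several rounds of this loop, with new models trained on each generation of data, which is exactly what muddles attribution back to the original teacher.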
    9 min
  • My bets on open models, mid-2026
    2026/04/15
    We’re living through the period when we’ll learn whether open models can keep up with closed labs. The obvious answer is that no, they won’t; this is a way of saying they won’t keep up in every area. This framing closes off a popular prediction where the open models completely catch up, as in all models saturate and open and closed models become increasingly similar. Living through this, it’s evidently very unclear when the longer-term stable balance of capabilities will solidify.

    This is a very complex dynamic, where the core signal we monitor is the capability gap between models. At the same time, this gap is intertwined with evolving dynamics in the funding of open models, who builds open models, how techniques like distillation that enable fast-following translate to new application domains, potential regulation hampering the open-source AI ecosystem, and of course who actually uses open models. The capabilities gap is one signal in a complex sea of forces pushing supply and demand into different shapes. In many cases the demand — where obviously tons of individuals, organizations, and sovereigns want, or need, open models — is largely separated from supply. Supply is fully dictated by economics. The question of which business strategies support releasing open models is still at stake.

    With this complexity, I wanted to distill my key beliefs down into a clear list. These are downstream of 10+ pieces I’ve written or recorded on open models this spring (which are linked throughout).

    * It’s surprising that the top closed models did not show a growing capability margin over open models, given the compute differences for training and research, especially in the second half of 2025 and through today.
    * Open model labs are technically very strong at keeping pace on well-established benchmarks. This will continue, and it reflects a balance of abundant talent and sufficient computing power.
    * Chinese open-weight labs focus slightly more on benchmark scores than comparable closed labs in the U.S. Distillation helps the Chinese LLM companies do so, but it’s not a panacea. Changes in the distillation dynamic (e.g. regulation) will not be a determining factor in the balance of capabilities. This increased focus is a natural evolution of their incentives: keeping alive the narrative of keeping up with the frontier is crucial to fundraising and adoption.
    * To date, closed models tend to be more robust and generally useful than similarly scoring open models. Closed models have certain hard-to-measure qualities that are not well captured in current or past benchmarks. This will be key to enabling closed models to dominate in markets where an individual user constantly presents new challenges, i.e. supporting knowledge workers as a direct assistant.
    * The open vs. closed model race, as monitored through benchmarks, will largely be a game of economic staying power and fast-following until the market structure constricts. I expect Chinese open-weight labs to face funding difficulties first, as soon as later this year. Funding difficulties will show up as different capability trajectories 3-9 months later.
    * The RL-dominated training era has increased the relevance of distribution to real-world use cases as a key factor in continued capability improvements. These are tasks where users directly use tools like Claude Code or Codex to solve problems in their job with agents. This is the first clear technical area where closed labs can dominate open-weight models on capabilities, potentially leveraging online RL directly based on user feedback.
    * Open models will be increasingly adopted for repetitive automation tasks across the ecosystem, as measured in the relative share of the API market. This takes the form of many new AI-native applications, business backend automation, etc. The success of this will drive more investment in domain-specific, efficient open models.

    This is a complex picture, where the long-term trajectory is more of an economics question than an ability one. Many other outlets can paint a far more simplistic narrative that “China will assuredly catch us in AI” and get more distribution because it is a simple story. The reality is complex. Only real AI revenue begets more investment, and eventually that’ll be linked to the ability to keep improving models at a rapid rate. Economic realities have not yet impacted scaling open models as a general category.

    This economics-focused angle relates to my positions on the open model ecosystem more broadly.

    * Recurring calls to ban certain types of open models will continue to come, but they are in practice impossible to implement. Training strong AI models (i.e. near but not at the frontier) is a relatively small cost compared to large-scale deployments. E.g. if the U.S. ...
    7 min