Interviewing Eugene Vinitsky on self-play for self-driving and what else people do with RL
- 2025/03/12
- Duration: 1 hour 9 minutes
- Podcast
Summary
Eugene Vinitsky is a professor a New York University department of Civil and Urban Engineering. He’s one of my original reinforcement learning friends from when we were both doing our Ph.D.’s in RL at UC Berkeley circa 2020. Eugene has extensive experience in self-driving, open endedness, multi-agent reinforcement learning, and self-play with RL. In this conversation we focus on a few key topics:* His latest results on self-play for self-driving and what they say about the future of RL,* Why self-play is confusing and how it relates to the recent takeoff of RL for language models, and* The future of RL in LMs and elsewhere.This is a conversation where we take the time to distill very cutting edge research directions down into the core essences. I felt like we were learning in real time what recent developments mean for RL, how RL has different scaling laws for deep learning, and what is truly salient about self-play.The main breakthrough we discuss is scaling up self-play techniques for large-scale, simulated reinforcement learning. Previously, scaling RL in simulation has become economical in single-agent domains. Now, the door is open to complex, multi-agent scenarios where more diversity is needed to find solutions (in this case, that’s what self play does).Eugene’s Google Scholar | Research Lab | Linkedin | Twitter | BlueSky | Blog (with some great career advice).Listen on Apple Podcasts, Spotify, YouTube, and where ever you get your podcasts. For other Interconnects interviews, go here.Show outline & linksWe cover many papers in this podcast. Also, as an experiment, here’s a Deep Research report on “all the papers that appeared in this podcast transcript.”In this episode, we cover:* Self-play for self-driving, mostly around the recent paper Robust Autonomy Emerges from Self-Play (Cusumano-Towner et al. 2025). The simulator they built powering this is Gigaflow. More discussion on HackerNews.(Here’s another self-play for self-driving paper and another from Eugene from earlier this year).A few highlights:“All simulated agents use the same neural net with the same weights, albeit with randomized rewards and conditioning vector to allow them to behave as different types of vehicles with different types of aggressiveness. This is like driving in a world where everyone is different copies of you, but some of your copies are in rush while others are patient. This allows backprop to optimize for a sort of global utility across the entire population.”“The resulting policy simulates agents that are human-like, even though the system has never seen humans drive.”* Large Language Models are In-context Preference Learners — how language models can come up with reward functions that will be applied to RL training directly. Related work from Stanford.* Related literature from Interconnects! The first includes literature we mention on the learning locomotion for quadrupeds with deep RL (special shoutout as usual to Marco Hutter’s group).* Recent and relevant papers Value-based RL Scales Predictably, Magnetic control of tokamak plasmas through deep reinforcement learning.* Other things we mention:* Cruise, Tesla, and Waymo’s autonomy stacks (speculation) and how the self-driving industry has changed since we were / were considering working in it.* Evo 2 foundation model for biology.* Eugene is working with a new startup on some LLM and RL stuff. If you’re interested in this episode, ping eugene@aitco.dev. 
Chapters

* 00:00:00 Introduction & RL Fundamentals
* 00:11:27 Self-Play for Self-Driving Cars
* 00:31:57 RL Scaling in Robotics and Other Domains
* 00:44:23 Language Models and In-Context Preference Learning
* 00:55:31 Future of RL and Grad School Advice

Transcript

I attempted to generate the transcript with ElevenLabs' new Scribe tool, but found the formatting annoying and reverted back to Alessio's smol-podcaster. If you're interested in working part-time as an editorial aide to Interconnects, please get in touch.

Nathan Lambert [00:01:27]: Hey, Eugene. Welcome to the show.

Eugene Vinitsky [00:01:29]: Hey, Nathan. Thanks for having me. Excited to be here.

Nathan Lambert [00:01:32]: Yeah, so I'll have said this in the intro as well, but we definitely go well back, all the way to Berkeley days and RL days, I think.

I will embarrass you a little bit now on the live read, which is: you were one of the people, when I was switching into RL, who it seemed had figured out how to get into AI from a different background, and that's what I was trying to do in 2017 and 2018.

So that was kind of fun, and now we're just friends, which is good.

Eugene Vinitsky [00:02:01]: Yeah, we were both figuring it out. If I had any lead over you there, I was also frantically trying to figure it out, because I was coming from a weird background.

Nathan Lambert [00:02:11]: There are definitely a lot of people that do that now and over-attribute small time deltas to ...