Powered by RND
PodcastsWetenschapInterconnects

Interconnects

Nathan Lambert
Interconnects
Nieuwste aflevering

Beschikbare afleveringen

5 van 116
  • On China's open source AI trajectory
    Hello everyone! I’m coming back online after two weeks of vacation. Thankfully it coincided with some of the slowest weeks of the year in the AI space. I’m excited to get back to writing and (soon) share projects that’ll wrap up in the last months of the year.It seemed like a good time to remind people of the full set of housekeeping for Interconnects. * Many people love the audio version of the essays (read by me, not AI). You can get them in your podcast player here. Paid subscribers can add private podcast feeds under “manage your subscription” where voiceover is available for paywalled posts.* The Interconnects Discord for paid subscribers continues to get better, and is potentially the leading paid perk amid the fragmentation of Twitter etc.* We’re going to be rolling out more perks for group subscriptions and experimental products this fall. Stay tuned, or get in touch if group discounts are super exciting for your company. For the time being, I’m planning trips and meetups across a few conferences in October. I’ll be speaking at The Curve (Oct. 3-5, Berkeley), COLM (Oct. 7-10, Montreal, interest form), and the PyTorch Conference (Oct. 21-24, SF) on open models, Olmo, and the ATOM Project, so stay tuned for meetups and community opportunities. On to the post!China is maneuvering to double down on its open AI ecosystem. Depending on how the U.S. and its allies change culture and mobilize investment, this could make the dominance of Chinese AI models this summer, from Qwen, Kimi, Z.ai, and DeepSeek, looks like foreshadowing rather than the maximum gap in open models between the U.S. and China. Until the DeepSeek moment, AI was likely a fringe issue to the PRC Government. The central government will set guidelines, rules, budgets, and focus areas that will be distributed and enforced across the decentralized government power structures. AI wasn’t a political focus and the strategy of open-source was likely set by companies looking to close the gap with leading American competitors and achieve maximum market share in the minimum time. I hear all the time that most companies in the U.S. want to start with open models for IT and philosophical reasons, even when spinning up access to a new API model is almost effortless, and it’s likely this bias could be even higher internationally where spending on technology services is historically lower.Most American startups are starting with Chinese models. I’ve been saying this for a while, but a more official reference for this comes from a recent quote from an a16z partner, Martin Casado, another vocal advocate of investment in open models in America. He was quoted in The Economist with regards to his venture portfolio companies:“I’d say 80% chance [they are] using a Chinese open-source model.”The crucial question for the next few years in the geopolitical evolution of AI is whether China will double down on this open-source strategy or change course. The difficulty with monitoring this position is that it could look like nothing is happening and China maintains its outputs, even when the processes for creating them are far different. Holding a position is still a decision.It’s feasible in the next decade that AI applications and open models are approached with the same vigor that China built public infrastructure over the last few decades (Yes, I’m reading Dan Wang’s new book Breakneck). It could become a new area that local officials compete in to prove their worth to the nation — I’m not sure even true China experts could make confident predictions here. A large source of uncertainty is whether the sort of top-down, PRC edicts can result in effective AI models and digital systems, where government officials succeeded in the past with physical infrastructure.At the same time as obvious pro-AI messaging, Chinese officials have warned of “disorderly competition” in the AI space, which is an indirect signal that could keep model providers releasing their models openly. Open models reduce duplicative costs of training, help the entire ecosystem monitor best practices, and force business models that aren’t reliant on simple race-to-the-bottom inference markets. Open model submarkets are emerging for every corner of the AI ecosystem, such as video generation or robotic action models, (see our coverage of open models, Artifacts Logs) with a dramatic evolution from research ideas to mature, stable models in the last 12-18 months.China improving the open model ecosystem looks like the forced adoption of Chinese AI chips, further specialization of companies’ open models to evolving niches, and expanded influence on fundamental AI research shared internationally. All of these directions have early signs of occurring.If the PRC Government wanted to exert certain types of control on the AI ecosystem — they could. This Doug Guthrie excerpt from Apple in China tells the story from the perspective of international companies. Guthrie was a major player in advising on culture changes in Cupertino to better adapt Apple’s strategy to the Chinese market.“When you stake your life, your identity, on and around certain ideas, you sort of fight for them,” Guthrie says. “Xi Jinping kind of broke my heart… I was sitting there, in China, in my dream job, and I’m watching Xinjiang’s internment camps. I’m watching China tearing up a fifty-year agreement over Hong Kong.”Apple, meanwhile, had become too intertwined with China. Guthrie had been hired to help understand the country and to navigate it. And Apple had followed through—very successfully. But it had burned so many boats, as the saying goes, that Guthrie felt its fate was married to China’s and there was no way out. “The cost of doing business in China today is a high one, and it is paid by any and every company that comes looking to tap into its markets or leverage its workforce,” he later wrote in a blog. “Quite simply, you don’t get to do business in China today without doing exactly what the Chinese government wants you to do. Period. No one is immune. No one.”China famously cracked down on its largest technology companies in late 2020, stripping key figures of power and dramatic amounts of market value off the books. AI is not immune to this.The primary read here is that the PRC leadership will decide on the role they want to have in the open-source AI ecosystem. The safe assumption has been that it would continue because the government picked up a high-impact national strategy when it first started focusing on the issue, already seeded with international influence. To formalize these intentions, the Chinese government has recently enacted an “AI+” plan that reads very similarly to the recent White House AI Action Plan when it comes to open models. The AI+ plan idea was first proposed in March 2024 and was just approved in its full text on July 31, 2025. The AI+ plan, when enacted by local officials, lays out goals for the AI industry in how many open models to have at different tiers of performance and some funding mechanisms for nurturing them. This is right in line with other comments from party officials. Chinese Premier Li Qiang, second-ranking member of the Politburo Standing Committee, made comments in March directly supporting open-source models. From the Wall Street Journal:Li pledged that China would boost support for applications of large-scale AI models and AI hardware, such as smartphones, robots, and smart cars.China’s top economic planning body also said Wednesday that the country aimed to develop a system of open-source models while continuing to invest in computing power and data for AI.An excerpt from Beijing’s city plan as part of the overall AI+ initiative, translated by GPT-5 Pro, has interesting, specific goals:By end-2025: implement 5 benchmark application projects at a world-leading level; organize 10 demonstration application projects that lead the nation; and promote a batch of commercializable results. Strive to form 3–5 advanced, usable, and self-controllable base large-model products, 100 excellent industry large-model products, and 1,000 industry success cases. Take the lead in building an AI-native city, making Beijing a globally influential AI innovation source and application high ground.The goal of this is to:Encourage open-source, high-parameter, ‘autonomous and controllable’ base foundation models, and support building cloud hosting platforms for models and datasets to facilitate developer sharing and collaboration.Beyond the minor translation bumpiness, the intentions are clear. The goal of the A+ plan is clear with multiple mentions of both open-source models and an open ecosystem with them where the models can be adopted widely. The ecosystem of models can make the impact of any one individual model greater than it would be alone.The Chinese government having centralized power has more direct levers to enact change than the White House, but this comes with the same trade-offs as all initiatives face when comparing the U.S. vs. China’s potential. I won’t review all of the differences in the approaches here.Where the Chinese Government enacts top level edicts that’ll be harder to follow from the West, there are numerous anecdotes and interactions that highlight in plain terms the mood of the AI ecosystem in China. I’ve routinely been impressed by the level of direct engagement I have received from leading Chinese AI companies and news outlets. Interconnects’ readership has grown substantially in China.Chinese companies are very sensitive to how their open contributions are viewed — highlighting great pride in both their work and approach. The latest case was via our China open model rankings that got direct engagement from multiple Chinese AI labs and was highlighted by a prominent AI news outlet in China — 机器之心/Synced. They described Interconnects as a “high-quality content platform deeply focused on frontier AI research.” (This Synced post was translated and discussed in the latest ChinaAI Newsletter)When intellectuals, influencers, and analysts I follow talk directly to technical members of the AI workforce in China, they sound like what we would expect — people who want to build a great technology. Jasmine Sun had a great writeup on her trip that had some anecdotes on AI in China. She asked “Do you guys worry about AI safety?”“We don’t think about risks at all.” …Continuing from Jasmine:This was the first of several conversations that gave us a distinct impression of the Chinese tech community. Spirits are high, and decoupling policies like export controls only fuel their patriotic drive.At the same time, America still represents a covetable life, despite the current political tumult:To be clear, our researcher friend made clear that working at a top US AI lab was still the most desirable option.In so many ways, trying to precisely map China’s next steps in AI is extremely challenging. Can they convert their lead in energy infrastructure to more total AI compute? Can they build their own AI chips? Will they take the frontier of performance with their talented population and a different approach? All of this is up for debate. The intrigue here is exemplified by the abundant interest in sparse news stories on how DeepSeek is training some AI model with Huawei chips. In many ways, these new chips working would be a bigger story than the original DeepSeek model, but all signs point to expected experiments with domestic chips, where China’s leading AI models are likely to be trained on Nvidia and other Western chips for the foreseeable future. I do not expect DeepSeek R2 to be trained on Huawei’s hardware.China’s hardware investment will take a lot longer to play out than open model strategies, but if China pulls it off — along with its other investments, such as self-driving cars and robots — their practical lead in AI could come for more areas. Open models could be China’s beachhead in a bigger technological resurgence with AI.Without major changes to Western investment in open models, we’re approaching a status quo in 2026 and beyond where:* Chinese open models would continue to increase their lead in performance (and adoption) over American counterparts. This will manifest in many ways. One example is how startups in Silicon Valley built on stronger Chinese models will be offering services that compete with entrenched, handicapped Fortune 500 companies wary of adopting these models in their services. This could make some subareas of AI disruption feel particularly intense. * The Chinese open ecosystem’s density of knowledge and sharing would translate into increased scientific and academic impact. China’s share of conference papers at leading AI conferences is already rapidly on the rise, and having an ecosystem built around substantially better models than their Western counterparts could lead this numerous research growing also to be impactful. Better base models allow more interesting RL and agentic research today, and the list of areas reliant on high-performance models is likely to only grow longer with time.* A proliferation of strong open models would make it difficult to restrict the presence or availability of many forms of AI. We do not have the government tools, incentives, nor culture to successfully prevent digital goods from China (or elsewhere) entering the U.S. economy. Many forms of AI governance and regulation in the United States and the rest of the world may need to be reconsidered, where many jurisdictions have looked to control and understand the development of “frontier AI.” Regulation needs to be approached for a world enmeshed in powerful AI models, rather than trying to control access or the releases of a few.These realities all paint a clear picture that bends the association of open models from “soft power” to just “power.” Continuously releasing strong open AI models could allow Chinese companies to shape the technology interfaces, services and reality around the world. Where 2024 was about research on open models, and 2025 the professionalization of them, 2026 could be where we begin to see clear impacts of their power through endless distribution. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    13:37
  • Ranking the Chinese Open Model Builders
    The Chinese AI ecosystem has taken the AI world by storm this summer with an unrelenting pace of stellar open model releases. The flagship releases that got the most Western media coverage are the likes of Qwen 3, Kimi K2, or Zhipu GLM 4.5, but there is a long-tail of providers close behind in both quality and cadence of releases.In this post we rank the top 19 Chinese labs by the quality and quantity of contributions to the open AI ecosystem — this is not a list of raw ability, but outputs — all the way from the top of DeepSeek to the emerging open research labs. For a more detailed coverage of all the specific models, we recommend studying our Artifacts Log series, which chronicles all of the major open model releases every month. We plan to revisit this ranking and make note of major new players, so make sure to subscribe.At the frontierThese companies rival Western counterparts with the quality and frequency of their models.DeepSeekdeepseek.com | 🤗 deepseek-ai | X @DeepSeek_AIDeepSeek needs little introduction. Their V3 and R1 models, and their impact, are still likely the biggest AI stories of 2025 — open, Chinese models at the frontier of performance with permissive licenses and the exposed model chains of thought that enamored users around the world.With all the attention following the breakthrough releases, a bit more has been said about DeepSeek in terms of operations, ideology, and business model relative to the other labs. They are very innovative technically and have not devoted extensive resources to their consumer chatbot or API hosting (as judged by higher than industry-standard performance degradation).Over the last 18 months, DeepSeek was known for making “about one major release a month.” Since the updated releases of V3-0324 and R1-0528, many close observers have been surprised by their lack of contributions. This has let other players in the ecosystem close the gap, but in terms of impact and actual commercial usage, DeepSeek is still king.An important aspect of DeepSeek’s strategy is their focus on improving their core models at the frontier of performance. To complement this, they have experiments using their current generation to make fundamental research innovations, such as theorem proving or math models, which ultimately get used for the next iteration of models. This is similar to how Western labs operate. First, you test a new idea as an experiment internally, then you fold it into the “main product” that most of your users see.DeepSeekMath, for example, used DeepSeek-Coder-Base-v1.5 7B and introduced the now famous reinforcement learning algorithm Group Relative Policy Optimization (GRPO), which is one of the main drivers of R1. The exception to this (at least today) is Janus, their omni-modal series, which has not been used in their main line.Qwenqwenlm.ai | 🤗 Qwen | X @Alibaba_QwenTongyi Qianwen, the primary AI lab within Alibaba’s cloud division, is by far and away most known for their open language model series. They have been releasing many models across a range of sizes (quite similar to Llama 1 through 3) for years. Recently, their models from Qwen 2.5 and Qwen 3 have had accelerating market share among AI research and startup development.Qwen is closer to American Big Tech companies than to other Chinese AI labs in terms of releases: They are covering the entire stack, from VLMs to embedding models, coding models, image and video generation, and so on.They also cater to all possible customers (or rather every part of the open community) by releasing capable models of all sizes. Small dense models are important for academia to run experiments and for small/medium businesses to power their applications, so it comes to no surprise that Qwen-based models are exploding in popularity.On top of model releases for everyone, they also focused on supporting the (Western) community, releasing MLX and GGUF versions of their models for local usage or a CLI for their coding models, which includes a generous amount of free requests.Unlike some American companies, the core team seems to have stayed relatively small in terms of headcount, in line with other Chinese AI labs: Qwen3 has 177 contributors, whereas Llama 3 has thrice the amount, while Gemini 2.5 has over 3,000 people as part of the model. Close competitorsThese companies have recently arrived at the frontier of performance and we will see if they have the capability to consistently release great models at a pace matching Qwen or DeepSeek.Moonshot AI (Kimi)moonshot.cn | 🤗 moonshotai | X @Kimi_MoonshotMoonshot AI is one of the so-called “AI tigers”, a group of hot Chinese AI startups determined by Chinese media and investors. This group consists of Baichuan, Zhipu AI, Moonshot AI, MiniMax, StepFun, and 01.AI — most of which have attracted investments by tech funds and other tech grants. For example, Alibaba is seen as a big winner in the AI space by having their own models and by being a lead investor in Moonshot, sort of like how big tech companies in the U.S. are investing in fundraising rounds for newer AI labs.While their first models, K1 and K1.5, were closed and available on their API, they started releasing open models after the R1 release with experimental models using the Muon optimizer. Similar to DeepSeek, they focus on a single model line, with small experiments eventually feeding back into the main model. K2 is their “moonshot run,” a.k.a. yolo run, and quickly became a hit similar to R1 (see our report from the release).Further reading on Kimi can be found on ChinaTalk.Zhipu / Z.AIz.ai | 🤗 zai-org | X @Zai_orgZhipu, known in the west as Z.ai, is a startup spinoff of Tsinghua University with considerable investments by Chinese companies and VCs. Currently, they are even considering an IPO, which would make them the first AI tiger to do so.In terms of models, they are mostly known for their recent release of GLM-4.5 and GLM-4.5V, which are all very capable for their sizes (both of which are fairly large mixture of expert models). However, they are not just releasing LLMs, but also image and video generation models, setting them apart from pure-LLM companies and labs.NoteworthyThese companies are transitioning to open releases, have open models with inferior capabilities, or slightly different foci than the text-centric labs pushing the frontiers of intelligence.StepFunstepfun.ai | 🤗 stepfun-ai | X @StepFun_aiStepFun first started as a closed model provider, but pivoted to open model releases after DeepSeek R1 shook up the industry. They are mostly focusing on multi-modal model releases, with Step3 being their flagship VLM. They also have image, audio and video generation models.Tencent (Hunyuan)hunyuan.tencent.com | 🤗 Tencent | X @TencentHunyuanHunyuan is mostly known for HunyuanVideo and Hunyuan3D. While they have released three series of different LLMs, their releases come with very strict licenses, which is unusual for Chinese companies and dampens excitement when combined with performance levels that can be found elsewhere.RedNote (Xiaohongshu)xiaohongshu.com | 🤗 rednote-hilabThe Chinese version of Instagram, RedNote, recently joined the ranks of Chinese companies releasing open models. Especially their capable character recognition / OCR model surprised many (see our coverage). Similar to Xiaomi and Baidu, it remains to be seen what their overall open strategy will be in the near and distant future and they have not competed in the large, frontier model space.MiniMaxminimaxi.com | 🤗 MiniMaxAI | X @MiniMax__AIMiniMax is another of the AI tigers and also started as a closed company. After the release of R1, they changed their strategy and released the weights of Minimax-Text-01, following up with reasoning models building upon it. The unique selling point of these models are the 1M context window achieved with hybrid attention.These text models are not the only thing they are focusing on — they also have image and video generation models, but those remain closed and only available on their API. They are also promoting their consumer platform heavily as they eye an IPO.OpenGVLab / InternLMinternlm.intern-ai.org.cn | 🤗 InternLM | X @opengvlabInternLM & OpenGVLab have deep ties to the Shanghai AI Laboratory, with InternLM focusing on the language models, while OpenGVLab releases vision models. While they release a range of models such as S1 or InternLM-Math, the orgs are mostly known for the strong InternVL series. While the first versions mostly used their own InternLM pretrained models, later releases (such as InternVL3) rely on Qwen as the language backend. Skyworkskywork.ai | 🤗 Skywork | X @Skywork_AIThe Singaporean Skywork first started out as an online karaoke company (yes, really) before they pivoted to AI and being a competitor to Manus, with their platform focusing on agents for work-related tasks, such as slide generation.Their LLM journey started with them releasing their own pretrained dense and MoE models. However, they stopped pre-training their own models and instead started to fine-tune existing models: Their OR1 reasoning model builds on top of DeepSeek-R1-Distill-Qwen-32B, R1V3 uses InternVL3 (which itself uses Qwen2.5 as its LLM backend).Aside from LLMs, they have a wide range of other models, from world models, image and video generation models, and reward models. Similar to their LLMs, they mostly build on top of other models. Unlike many labs, Skywork has released some datasets with their models, such as preference and reasoning training data.On the riseThese companies are either just getting their toes wet with open models or operating as more of academic research organizations than labs pushing the performance of models.ByteDance Seedseed.bytedance.com | 🤗 ByteDance-SeedSeed is the R&D arm of ByteDance and eerily similar to Meta’s FAIR division: Diverse models with interesting research, with their papers garnering a ton of attention in the community. However, it remains to be seen whether they shoot for a Llama-style model release or continue to release research artifacts.Here are some recent papers:* Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference* Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving* Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters* Seedance 1.0: Exploring the Boundaries of Video Generation Models* SeedEdit 3.0: Fast and High-Quality Generative Image Editing* Seed1.5‑VL Technical Report* Mogao: An Omni Foundation Model for Interleaved Multi‑Modal Generation* Seed1.5‑Thinking: Advancing Superb Reasoning Models with Reinforcement Learning* VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks* Seed LiveInterpret 2.0: End‑to‑end Simultaneous Speech‑to‑speech Translation with Your VoiceOpenBMBopenbmb.ai | 🤗 openbmb | X @OpenBMBOpenBMB is an open-source community (comparable to BigScience) from Tsinghua University NLP Lab (the very same university where Zhipu was spun off from) with support from the Beijing Academy of Artificial Intelligence (BAAI) and ModelBest.They are mostly focusing on small multi-modal models for the edge, such as MiniCPM-V-4. However, the license is rather restrictive, which is surprising given the community-driven origins of the group. Aside from model releases, they also release frameworks and specialized kernels to make sure their models run on low-end hardware.Xiaomi (MiMo)mi.com | 🤗 XiaomiMiMoXiaomi started releasing a bunch of small, capable models, ranging from LLMs to VLMs and audio models. Xiaomi updating the models quickly after an initial launch and releasing multiple variants of the models show that it is not a one-off foray into open models. However, it remains to be seen whether those are mostly research artifacts or whether they are serious about potentially pushing the frontier or competing for adoption.Baidu (ERNIE)yiyan.baidu.com | 🤗 baidu | X @Baidu_IncBaidu, one of the original names in the Chinese AI space, has only released the weights of ERNIE 4.5. It remains to be seen whether they continue to release weights of newer releases as well.Honorable MentionsThe rest of the labs that we are watching.Multimodal Art Projectionm-a-p.ai | 🤗 m-a-pAn open research community, releasing all kinds of models (including a truly open 7B language model with data, etc.). Now, they’re mostly known for the music generation model YuE.Alibaba International Digital Commerce Groupaidc-ai.com | 🤗 AIDC-AIAnother R&D arm of Alibaba, mostly releasing niche models building upon Qwen.Beijing Academy of Artificial Intelligence (BAAI)baai.ac.cn | 🤗 BAAI | X @BAAIBeijingAs a university, the Beijing Academy of Artificial Intelligence has a high diversity of projects. They are mostly known for BGE, which are capable embedding models.inclusionAI🤗 inclusionAI | X @InclusionAI666The open weight arm from the Ant Group (an affiliate of Alibaba handling mobile payments and some financial industries), responsible for Ling Lite, a series of LLMs.Pangu (Huawei)huaweicloud.com | X @HuaweiCloud1Huawei is working on AI accelerators to threaten the market share of Nvidia GPUs, which are often targeted by regulations, both from the US and China. Their model releases are mostly to show what’s possible with their cards, but not without drama accusing them of upcycling Qwen models and not stating it. We would expect them to continue to release more models in the near future. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    12:41
  • Contra Dwarkesh on Continual Learning
    Dwarkesh Patel’s now well-read post on why he is extending his AI timelines focuses on the idea of continual learning. If you ask me, what we have already is AGI, so the core question is: Is continual learning a bottleneck on AI progress?In this post, I argue that continual learning as he describes it actually doesn’t matter for the trajectory of AI progress that we are on. Continual learning will eventually be solved, but in the sort of way that a new type of AI will emerge from it, rather than continuing to refine what it means to host ever more powerful LLM-based systems. Continual learning is the ultimate algorithmic nerd snipe for AI researchers, when in reality all we need to do is keep scaling systems and we’ll get something indistinguishable from how humans do it, for free.To start, here’s the core of the Dwarkesh piece as a refresher for what he means by continual learning.Sometimes people say that even if all AI progress totally stopped, the systems of today would still be far more economically transformative than the internet. I disagree. I think the LLMs of today are magical. But the reason that the Fortune 500 aren’t using them to transform their workflows isn’t because the management is too stodgy. Rather, I think it’s genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.I like to think I’m “AI forward” here at the Dwarkesh Podcast. I’ve probably spent over a hundred hours trying to build little LLM tools for my post production setup. And the experience of trying to get them to be useful has extended my timelines. I’ll try to get the LLMs to rewrite autogenerated transcripts for readability the way a human would. Or I’ll try to get them to identify clips from the transcript to tweet out. Sometimes I’ll try to get them to co-write an essay with me, passage by passage. These are simple, self contained, short horizon, language in-language out tasks - the kinds of assignments that should be dead center in the LLMs’ repertoire. And they're 5/10 at them. Don’t get me wrong, that’s impressive.But the fundamental problem is that LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human's. But there’s no way to give a model high level feedback. You’re stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn’t produce anything even close to the kind of learning and improvement that human employees experience.The core issue I have with this argument is the dream of making the LLMs we’re building today look more like humans. In many ways I’m surprised that Dwarkesh and other very AGI-focused AI researchers or commentators believe this — it’s the same root argument that AI critics use when they say AI models don’t reason. The goal to make AI more human is constraining the technological progress to a potentially impossible degree. Human intelligence has long been the inspiration for AI, but we have long surpassed it being the mirror we look to for inspiration. Now the industry is all in on the expensive path to make the best language models it possibly can. We’re no longer trying to build the bird, we’re trying to transition the Wright Brothers’ invention into the 737 in the shortest time frame possible.To put it succinctly. My argument very much rhymes with some of my past writing. Do language models reason like humans? No. Do language models reason? Yes. Will language model systems continually learn like humans? No.Will language model systems continually learn? Of course.Interconnects is a reader-supported publication. Consider becoming a subscriber.Dwarkesh writes “Rather, I think it’s genuinely hard to get normal humanlike labor out of LLMs.” This is because we’re still early on the buildout of the technology. Human labor takes an immense amount of context and quick thinking, both of which we’re starting to unlock with our language models. On top of this, human labor may not be what we want to create — we want to augment it. Using LLMs as drop in replacements for humans is not a requirement for AGI nor is what Dwarkesh describes a fundamental limitation on AI progress. Francois Chollet cleverly poked at this weakness in his recent conversation with Dwarkesh at an ARC-AGI event:Well, how do you define the difference between the ability to adapt to a new task and learning on the fly? It's, it sounds like the same thing to me.Language models can already pick up subtle context extremely fast. ChatGPT’s memory feature has gotten far better for me. When we’re using the far more powerful models we can expect in the next 18 months this’ll already start to appear magical. Language models are extremely apt at inferring context even without us giving it to them. Soon we’ll be unlocking that subtle connection engine by providing immense, explicit context. I don’t know of anyone who has actually thoroughly digitized all the relevant context of their job and formatted it in a way that is easily readable by an LLM. GPT-5 Pro estimates that all of the writing on Interconnects would be only 500K tokens. That would fit into an existing LLM with no extra system, but I’ve never tried it.The problem that Dwarkesh is facing is that we’re still using LLMs primarily in a single generation manner, which got far better with the introduction of reasoning models, but the economically useful way to use current tools in more complex intellectual domains will require a deep-research style approach over all of your recent work interactions. No one is giving language models that kind of context. None of the tools we use are set up properly to accumulate this type of context.I expect this to change rapidly. ChatGPT, Claude, and the likes are all adding memory features across chats and countless connectors to other pieces of information in your professional life. These memory features will be omnimodal and essential to extracting the type of value Dwarkesh wants. Without them, I agree language models in their current form are hopeless at solving continual learning.This is what I would expect the rumored $2000/month ChatGPT level subscriptions to work with. Each of these bespoke tasks needs to absorb a ton of context and reasoning tokens in order to make a directionally right output. If someone built the Claude Code equivalent for my Substack, with every post tagged by topic and performance metrics, I bet the AI could easily make useful suggestions on how to format my content.Continual learning in how Dwarkesh presents it is a systems problem rather than a learning problem. I expect better context management over my information ecosystem to exist in 2026, but more work to be needed for the AI companies to know how best to reference it and unlock in-context learning that feels like rapid adaptation. Call that 2027.The models that have been released in 2025 will make this far more tractable in the near future. Reasoning models have made in-context learning far more powerful, resulting in rapid progress on held-out and complex domains such as ARC-AGI. These models also have come with massive improvements in context length. Claude and Gemini have 1M+ token context lengths and GPT-5’s is at 400K — they’re all growing steadily. What is important with the context length numbers is that evaluations are showing that these are meaningful improvements that the models can leverage intelligently.With these reasoning models and smart retrieval of context, the systems we are building will look indistinguishable from continual learning. This will definitely be multiple LLMs working together and will operate very differently than the first versions of ChatGPT we were given (and often still use today).The path to continual learning is more context and more horsepower. This is directly in line with the direction AI investment is going. This doesn’t feel like a bottleneck, rather another product problem that we are going to solve. This sort of continual learning may not enable the type of raw intelligence and autonomy that many vocal leaders in AI describe as “superintelligence.” Training models to be smarter on even more complex tasks — e.g. novel biological research — requires mastering agentic behaviors that need to be learned from scratch, as discussed in my post on “What comes next with RL”. There’s no internet scale pretraining data for such agentic tasks. My point is that not all jobs that require continual learning will require the frontiers of intelligence. I’m excited to write blog posts with the bliss of my ChatGPT 6 co-editor.This technology coming soon will not be without its challenges. My first reaction to the continual learning post was more in line with “society isn’t ready for this” rather than commentary on its feasibility. I’ll repeat my warning:For a long time I’ve written that AI models have a higher risk potential in terms of social outcomes because the modalities they interact with us in are far more personal… As AI is going to be so powerful as a standalone entity, breaking some of the symbiotic links will be good for adding friction that makes the technology easier to steer towards good outcomes. In short, be wary of wishing for end-to-end (reinforcement) learning when you’re part of the environment.2 It’s a destiny to dystopia.What we have today is a form of AGI and it’ll soon get much better with better context and memory. The industrialization of language models is giving us incredible improvements across a wide swath of use-cases. These will blow past many basic primitives of intelligence in humans that have motivated AI for decades. First was models reasoning, then will come systems with continual learning. This is exactly what most AI companies are actually building — regardless of what their superintelligence messaging is.Comments are open on this post, please continue the debate! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    10:04
  • GPT-5 and the arc of progress
    If you want a video version of this, check out the last 20 minutes of the livestream reaction (edit, fixed link) I did with Will Brown of Prime Intellect and Swyx of Smol AI & Latent Space.GPT-5 was set up to fail on some of the narratives it was expected to satisfy. The two central themes it had to decide between were the AGI (or superintelligence) narrative that Sam Altman & co. have been using to fundraise and the fact that ChatGPT is one of the fastest-growing consumer technologies of all time. To fulfill both, GPT-5 needed to be AGI while also being cheap enough to serve as the most-used AI system in the world. Business and technological realities made it inevitable that GPT-5’s primary impact would be to solidify OpenAI’s market position, even if it raises a lot of eyebrows for the long-term trajectory of AI.The reactions online capture this as well. The OpenAI live streams have historically catered to AI insiders, but the product speaks entirely to a different audience. The people discussing this release on Twitter will be disappointed in a first reaction, but 99% of people using ChatGPT are going to be so happy about the upgrade. Confusingly enough, this includes many of the critics. GPT-5 is a good AI system. It’s right in line with best-in-class across pretty much every evaluation, while being cheap enough to serve the whole world. OpenAI is largely fixing its product offering with an announcement that was hyped to be one of the biggest AI news cycles of the year. AI news being loud is defined by narratives being different more-so than technology being better. OpenAI releasing an open model again will likely be pinpointed as just as important a day for the arc of AI as the GPT-5 release. In many ways GPT-5 was set up to fail and that is very off-putting for those expecting maximum AI progress in the near term.I’m not going to dwell on it, but oh boy, that was a messy release. GPT-5 being announced and rolled out like this is very odd. Countless plots were mislabeled, live demos had bugs, and the early rollout is doing some weird stuff. This reinforces how OpenAI was torn about the release and backed into a corner with their messaging. They knew they needed to improve the experience with strong competition in the industry, but releasing GPT-5 needed to make a splash after how long they’ve waited (and already parked the GPT 4.5 name).The core question we track in this post is: What does it mean for the next 6-18 months of AI progress if GPT-5 is just as good as all the best models out there, e.g., Claude Sonnet for coding or o3 for search, funneled into one, super cheap package? If AGI was a real goal, the main factor on progress would be raw performance. GPT-5 shows that AI is on a somewhat more traditional technological path, where there isn’t one key factor, it is a mix of performance, price, product, and everything in between. Interconnects is a reader-supported publication. Consider becoming a subscriber.GPT-5’s performanceThere are a few places that we can see that GPT-5 represents a solid step on the performance trend line, but nothing like a step change. First, on LMArena, GPT-5 is fantastic, sweeping the board to #1 on all categories. The last model to claim #1 in pretty much every category was Gemini 2.5 Pro — and that was the biggest step change in Elo since GPT-4 Turbo skyrocketed past the first Claude.Second, GPT-5 is the top model on the ArtificialAnalysis composite benchmark.These two, LMArena & ArtificialAnalysis, represent two coarse evaluations — community vibes and raw benchmarks. Both of these can be gamed, but are still correlated with real-world use. You can also see in OpenAI’s shared results how much the smaller versions improve on the likes of GPT-4.1 mini and o4-mini.In many ways, the march of progress on evals has felt slowed for a while because model releases are so frequent and each individual step is smaller. Lots of small steps make for big change. The overall trend line is still very positive, and multiple companies are filling in the shape of it. My post on “what comes next” from earlier this summer all but called this type of release, where the numbers aren’t shocking but the real world use cases are great, becoming more common.This is a different path for the industry and will take a different form of messaging than we’re used to. More releases are going to look like Anthropic’s Claude 4, where the benchmark gains are minor and the real world gains are a big step. There are plenty of more implications for policy, evaluation, and transparency that come with this. It is going to take much more nuance to understand if the pace of progress is continuing, especially as critics of AI are going to seize the opportunity of evaluations flatlining to say that AI is no longer working.To say it succinctly: Abilities will develop more slowly than products.The product overhang is being extended with each release. We’re still building untapped value with AI models and systems faster than we’re capturing it.Another way to see this incremental push out in models or systems is through OpenAI’s update to the famous METR plot of time to completion for humans of various tasks AI systems can solve 50% of the time. GPT-5 is leading, but also just in line with trends.All of this is to say comprehensively that AI progress is very alive and well, as long as you don’t subscribe to the exponential takeoff in ability. Those arguments are very strained by this GPT-5 release.Yes, AI progress on intelligence and “raw ability” is certainly going to continue at a solid pace for a long time, but how will this translate into recursive self-improvement?GPT-5’s detailsIf you’re reading closely, you may have noticed that this post uses the word system instead of model. All of the leading chat systems have been adding more components onto them like safety checkers and so on, but this is the first one to use different architectures and weights for the primary generation of content across similar queries. GPT-5 is the first in what is to come, mostly to better balance cost and give better user experiences. From the system card:GPT‑5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time.Along with this, they shipped many product improvements, such as how the model has a 400K context window in the API with great performance, reduced hallucinations, and new personalities. Primarily, I worry as a power user about the router. I sense that for now I’ll default to GPT-5 Thinking, and sometimes upgrade to Pro mode, while downgrading to standard GPT-5 only for benign queries (depending on its search behavior — if it is search-heavy like o3 without thinking, then it should still work well). Thankfully, the thinking mode has a “get an early answer” button, so I don’t see any reason to start elsewhere. If I need an answer fast, I’ll get one. If not, I want the best responses possible.As for prices, here’s a comparison. GPT-5’s top-level model is cheaper than Claude Sonnet and far better than any OpenAI model has been before at coding — one of the core details of this release. Matching Gemini Pro’s pricing when considering Google’s infrastructure advantage is a substantial accomplishment.* OpenAI — GPT-5 (API sizes)* GPT-5: input $1.25, output $10.00. (OpenAI)* GPT-5 mini: input $0.25, output $2.00. (OpenAI)* GPT-5 nano: input $0.05, output $0.40. (OpenAI)* OpenAI — o3 (reasoning)* o3: input $2.00, output $8.00. (OpenAI Platform)* o3-mini: input $1.10, output $4.40. (cached input $0.55) (OpenAI Platform)* Anthropic — Claude 4 family* Claude Sonnet 4: input $3.00, output $15.00. (Anthropic)* Claude Opus 4.1: input $15.00, output $75.00. (Anthropic)* Google — Gemini 2.5* Gemini 2.5 Pro: input $1.25 (≤200k prompt) / $2.50 (>200k); output $10.00 (≤200k) / $15.00 (>200k). (Google AI for Developers)* Gemini 2.5 Flash: input $0.30 (text/image/video) or $1.00 (audio); output $2.50 (includes thinking tokens). (Google AI for Developers)* Gemini 2.5 Flash-Lite: input $0.10 (text/image/video) or $0.30 (audio); output $0.40. (Google AI for Developers)Cheaper, thinking models that work well in applications are far more useful than scaling (as GPT-4.5 has shown us).GPT-5’s impactIt seems like most people in all walks of life are going to love this model — from AI researchers all the way to people who are learning of ChatGPT for the first time today. This is very in line with my expectations for how AI will proceed, as a long, steady march of progress. The fact that the models are getting way cheaper rather than way more expensive definitely signals that we cannot just brute-force scale our way to much stronger systems. Scaling helps, but it is now one of many considerations, and all the laboratories are showing us that much bigger models have diminishing returns in value to customers. At the same time, models being cheaper could be just what we need for Jevons paradox to kick in and provide another boost in AI adoption.Many people will claim that the GPT-5 release was a flop and the bubble will pop for AI. This is downstream of the industry generally making totally unrealistic promises. As someone whose core through-line when covering frontier models is tracking the pace of progress, I translate this as “AI capabilities on benchmarks will proceed a bit more slowly, but we aren’t reaching any clear walls in performance.” The AI performance hills we’re climbing up as an industry do put up some more resistance as the obvious low hanging fruit is gone, but we have the tools to overcome it consistently for the next 6 to 18 months. For companies that have been fundraising on promises of AGI, such as Anthropic and OpenAI, closing the next rounds could be harder. Of course, this depends on whether the messaging of the rounds was a key part of the fundraising. This fundraising inspires capital expenditures across the industry, e.g. TSMC developing the next node for NVIDIA to build new chips, and so on. The AGI narrative and the fundraising it has enabled have been good for the U.S. in terms of building out valuable, raw infrastructure. This could be the beginning of the money train slowing down, but that’s very different from a derailment and a stock market crash. As raw infrastructure spend slows, there will be even more pressure to deliver valuable products to users. A key trend for 2025 has been many of those appearing — Deep Research and Claude Code being the paradigms that everyone has copied. GPT-5 makes these applications better and makes it easier and cheaper for the next viral AI products to hit the market. I’m still excited for what is to come. But first, I’m going to sign off and go play with GPT-5. It’s a good day to build something for the fun of it. As I use it more, I’ll have more to say.Extra GPT-5 linksFor more specifics on the model from people who got early access, I recommend Tyler Cowen, Every.to, or Simon Willison (or Swyx soon, on Latent.Space).Livestream link: https://openai.com/gpt-5/ Research blog post: https://openai.com/index/introducing-gpt-5/ Developer blog post: https://openai.com/index/introducing-gpt-5-for-developers Enterprise blog post: https://openai.com/index/gpt-5-new-era-of-work GPT-5 landing page: https://openai.com/gpt-5/ System Card: https://openai.com/index/gpt-5-system-card/ Coding examples: https://openai.github.io/gpt-5-coding-examples/What would you say if you could talk to a future OpenAI model https://progress.openai.com/Finally, I’ll plug again the video I did with Will Brown and Swyx: Send me the most interesting things you find on GPT-5! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    10:41
  • gpt-oss: OpenAI validates the open ecosystem (finally)
    OpenAI released two open-weight, text-only reasoning models today, both mixture of experts (MoE) sized to run efficiently on a range of hardware from consumer GPUs to the cloud. These models have the Apache 2.0 license, so they’re available for distillation into other reasoning models, deployment into commercial products, and are free of downstream restrictions. These two models, the smaller gpt-oss-20B with 3.6B active parameters and 21B total and the larger gpt-oss-120B with 5.1B active parameters, follow the trends we’ve seen with the other leading open models in architecture choices. Where this release shines is in the dramatic change in open model performance and strategy that comes with the leading name in AI releasing an open model that undercuts some of their own API products.We’ll get to the technical details on the model later, but the main point of this post is how much OpenAI has changed by releasing their first open language model since GPT-2. The larger 120B model “achieves near-parity with OpenAI o4 mini on core reasoning benchmarks‬” and is a major moment for the ecosystem:* OpenAI has released an open model at the frontier of current open model performance — highlighting how major concerns over open models that OpenAI leadership mentioned in 2023 were overblown. The marginal risks of open models have been shown to not be as extreme as many people thought (at least for text only — multimodal is far riskier). Once other organizations, particularly Meta and China showed OpenAI that there was no risk here, the path was opened to release a model.* OpenAI has revealed far more of their technical stack than any release to date. This blog post has light details on many things in the model, but community tinkering will begin to better understand what is going on here. This includes basic things like our first time seeing a raw chain of thought (CoT) for an OpenAI reasoning model, but also more interesting things like how this model is trained to use tools in the CoT like their o3 model. Other details include researchers being able to play with OpenAI’s instruction hierarchy in raw weights (where pieces of it are untouchable in the API), a new “harmony” prompt format, the same “reasoning efforts” of low, medium & high from the API, a huge proof of concept on how far basic, community standard architectures with MoEs can be pushed, and other small details for the AI community to unpack.* OpenAI has initiated a scorched earth policy on the API market, undercutting their own offerings and unleashing an extremely strong, trusted model brand with a permissive license. While adoption of any open model is much slower than an API due to testing, additional configuration, etc., this is set up to go about as fast as it can. Any API model that competes with current models like OpenAI o4 mini, Claude Haiku, Gemini Flash, DeepSeek R1 etc. are all going to have to compete with this model. OpenAI’s o4 mini model is currently served at $1.1 per million input tokens and $4.4 per million output. Serving this open model will likely cost at least 10x less. There are many potential strategic reasons for this, all of which paint OpenAI as having a clearer vision of what makes it valuable. What OpenAI hasn’t touched with this model is interesting too — “For those seeking multimodal support, built-in tools, and‬ seamless integration with our platform, models available through our API platform remain the‬ best option.” These are dropped for reasons above, and “headaches” discussed later in the post.Together, these paint a much clearer vision by OpenAI on how they’ll control the AI ecosystem. The top potential reasons on my mind are:* OpenAI could be trying to make all API models potentially obsolete on cost ahead of the GPT-5 release, which they hope to capture the top end of the market on. Or,* OpenAI could be realizing that models are no longer their differentiation, as ChatGPT users continue to steadily climb — and they’ll soon pass 1 billion weekly actives.There are plenty of other reasons, such as the politics alluded to at the end of the blog post, but OpenAI tends to only act when it serves them directly — they’ve always been a focused company on their goals.There’s also a long list of head scratchers or in-between the lines points that illuminate OpenAI’s strategy a bit more. OpenAI of course didn’t release training data, code, or a technical report, as expected. OpenAI is trying to make a big splash with the name that captures more of the enterprise market, but in doing so takes some collateral damage in the research and true “open source” AI communities. These future questions include:* The naming is bad — a mixture of cringe, confusion-inducing, and still useful for their marketing goals. For anyone following open-source AI for a long time it won’t be new that a major company is blurring the association of the term open-source with the community accepted definitions. I understand why OpenAI did this, but the naming conflict further enforces that the true open source AI community isn’t the target of this release — it’s people that want to try an “open source AI model” for their business, and OpenAI has made the target too big to miss for enterprises.* OpenAI did not release the base models. Anyone following the space would’ve expected this, but it matters substantially for researchers. These two sparse, low numerical precision MoE models won’t be easy for researchers to use. The best model for researchers and tinkerers are dense, base models from 1 to 7 billion parameters. These are much “longer term” artifacts in the open community that will still be using almost only Qwen.I need to take a second before the “unknowns” section and comment on the architecture. These models are reinforcing trends we’re seeing in modeling across the industry. Recent frontier open models are all very sparse MoEs inspired by the DeepSeek architecture. DeepSeek V3 had 37B active and 671B total parameters. Kimi K2 had 32B active and 1T total parameters. With 5B active and 121B total, the sparsity factor fits right in with normal. Sparsity in MoEs is totally king right now. The smaller gpt-oss is a bit less sparse than Qwen’s 3B active, 30B total smaller MoE, but expect the sparsity of these models to continue to increase.Some things we need more testing to know the impact of include:* The model has been quantized for release to MXFP4 (4 bit floating point). It’s not clear exactly who will be impacted here, but this could make it benefit people most with the newest hardware, cause minor issues across Torch/Cuda versions, or even make some of the behaviors weird relative to the trained version internal to OpenAI. This could also be a plus, depending on performance, as the bigger model is quantized to 4 bit precision to enable it to be run on GPUs with 80GB of memory, such as the A/H100 line from NVIDIA.* Safety measures have been taken to change how finetunable the model is. With, or soon after, this release OpenAI is releasing a research paper on new methods to make it so you can’t “finetune the safety away” from a released instruct model. This is a very long-standing issue that people have concerns with over releasing open models. The main question here is if the models OpenAI releases are still able to be finetuned or not for productive use-cases. OpenAI claims they can be in their blog post, but this will be left up to the community to decide. Is finetuning the safety away actually a feature of an easy to use model?For example, Gemma has been tougher for people to finetune historically because it uses a different attention implementation and has a different parameter space from being distilled. Open finetuning stacks are still tuned for Llama and Qwen — this takes a long time to change.Many people will take the “we made it impossible to un-censor this model” as a challenge, which will be interesting to follow in the jailbreaking research community. There is a substantial market for modifiable models.* The model was trained to expect tools, but open model tool use is a mess. One of the biggest problems I worry about in designing an OLMo model with native o3-style tool use is that I need to make it seamless for users to use the same tools from training time at inference time. An early tester in my network mentioned that the model would hallucinate tool calls from training (sort of like what was mentioned around o3’s full release). I don’t expect this to be an unsolvable issue, but it could slow adoption. It could also allow people to reverse engineer the tools that OpenAI uses during training, we’ll see!* We need to re-benchmark the model on open infrastructure. OpenAI did a good job for this release integrating it everywhere, but we need to confirm that the community can easily replicate their evaluation scores. Evaluation at closed labs has increasingly become bespoke to suit their internal needs, which is a logical decision, but this comes at a cost of friction when an open model is released. This is me saying loud and clear that this isn’t a model performance review in a nuanced sense, but a summary of the importance of OpenAI’s approach (and where the opportunity is for the rest of us). Not all good models are easy to use. Some models benchmark well and are useful — e.g. Qwen. Some models benchmark well and are forgotten. Regardless of scores, I expect this to be a useful model.Overall, I would give OpenAI a very strong grade on their first open release in a while — they definitely listened to the feedback given by the community. The path to earning goodwill with the open community, especially with researchers, is to embrace more risk in making models that are easier to modify (and potentially even more revealing), such as the base models for these checkpoints. Open models from the U.S. labs were in such a dire spot that we need any step back in the right direction. As the rollout of the model begins and we have more understanding of it, we’ll include more updates on Interconnects, such as in the next Artifacts Log issue.Interconnects is a reader-supported publication. Consider becoming a subscriber.So, OpenAI is the new open champion, right? There’s no more risk vis-a-vis China? We don’t need Llama anymore? Not quite, let me explain.OpenAI, ATOM, and national championsIt’s a phenomenal step for the open ecosystem, especially for the West and its allies, that the most known brand in the AI space has returned to openly releasing models. This is momentum and could be the start of the turning point of adoption and impact of open models relative to China. The open ecosystem moves fast in some ways and slow in others. Many workflows and expertise is now built on Qwen models due to their frequent, accessible releases. Some of these will try OpenAI the next time they want to make a change, but it’s far from the fact that everyone will immediately switch to OpenAI’s model now that it’s out. To me, OpenAI dropping a strong model has switched the second derivative on the open model scales. The U.S. and its allies will no longer be falling further and further behind, which was the main story of 2025, but we need to build on this momentum if we want to have competitive open models for all use cases in the order of months rather than years.There’s a lot of uncertainty in the incentives for open models. Some of the best China analysts I know share how China is sensing that releasing open models is a successful strategy for them and are doubling down. This is a very reasonable take. The retort is that if we use it as a weakness of the American ecosystem that it is so reliant on Meta’s Llamas, or now GPT OSS, the same could happen for Qwen. So then, what happens if Alibaba decides Qwen’s stellar releases no longer serve them?In this case, there would be a large opportunity in the series of small models from 1 to 70B parameters, but there’s so much competition from China at the larger scales. These are currently the big mixture of experts (MoE) models like DeepSeek V3/R1, Z.ai’s / Zhipu’s GLM 4.5, Kimi K2, and so on. China has more models that are close to this performance level, such as MiniMax or Tencent.All of these companies have uncertainty, but there’s a strength in numbers that reinforces standard practice and sets standards. Releasing strong, large, open models is now the standard in China. We’re back in the precarious period of establishing standards for American companies, who are exposed to the legal risk of not being able to un-release models with many open lawsuits, such as in areas like copyright.These two sides of the open ecosystem are at very different stages and need very different actions. In many ways, we shared The ATOM Project when we did because we could tell this was a local (and hopefully global) minimum in terms of the distance between Western contributions to the open science of AI compared to any point in the recent past and near future. OpenAI’s release is a step in the right direction, but it is still a precarious position. Many people make noise about creating open models, from the AI Action Plan to venture capitalists and academics. What all of these parties have in common is that its not their number one goal. The goal of The ATOM Project is to give an outlet for people like myself that want to make this project their number one priority. This is why we need to keep nurturing entrants into the open model space that are releasing their best models there. It is what made the early versions of Llama great, and is what will be the defining factor of the outputs of ATOM. Models that are designed from first principles to be modifiable, interpretable, and extendable is what will enable a new decade of AI research to be born. This needs base models, training details, convenient sizes, and other little details that are missing from many recent open model releases, including OpenAI’s. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    13:36

Meer Wetenschap podcasts

Over Interconnects

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai
Podcast website

Luister naar Interconnects, StarTalk Radio en vele andere podcasts van over de hele wereld met de radio.net-app

Ontvang de gratis radio.net app

  • Zenders en podcasts om te bookmarken
  • Streamen via Wi-Fi of Bluetooth
  • Ondersteunt Carplay & Android Auto
  • Veel andere app-functies

Interconnects: Podcasts in familie

Social
v7.23.7 | © 2007-2025 radio.de GmbH
Generated: 9/14/2025 - 9:07:04 AM