Powered by RND
PodcastsTechnologieAXRP - the AI X-risk Research Podcast
Luister naar AXRP - the AI X-risk Research Podcast in de app
Luister naar AXRP - the AI X-risk Research Podcast in de app
(2.067)(250 021)
Favorieten opslaan
Wekker
Slaaptimer

AXRP - the AI X-risk Research Podcast

Podcast AXRP - the AI X-risk Research Podcast
Daniel Filan
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper,...

Beschikbare afleveringen

5 van 50
  • 38.6 - Joel Lehman on Positive Visions of AI
    Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go form here? In this episode, I talk with Joel Lehman about these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/24/episode-38_6-joel-lehman-positive-visions-of-ai.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/   Topics we discuss, and timestamps:  01:12 - Why aligned AI might not be enough 04:05 - Positive visions of AI 08:27 - Improving recommendation systems   Links: Why Greatness Cannot Be Planned: https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237 We Need Positive Visions of AI Grounded in Wellbeing: https://thegradientpub.substack.com/p/beneficial-ai-wellbeing-lehman-ngo Machine Love: https://arxiv.org/abs/2302.09248 AI Alignment with Changing and Influenceable Reward Functions: https://arxiv.org/abs/2405.17713   Episode art by Hamish Doodles: hamishdoodles.com
    --------  
    15:28
  • 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
    Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/20/episode-38_5-adria-garriga-alonso-detecting-ai-scheming.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/   Topics we discuss, and timestamps: 01:04 - The Alignment Workshop 02:49 - How to detect scheming AIs 05:29 - Sokoban-solving networks taking time to think 12:18 - Model organisms of long-term planning 19:44 - How and why to study planning in networks   Links: Adrià's website: https://agarri.ga/ An investigation of model-free planning: https://arxiv.org/abs/1901.03559 Model-Free Planning: https://tuphs28.github.io/projects/interpplanning/ Planning in a recurrent neural network that plays Sokoban: https://arxiv.org/abs/2407.15421   Episode art by Hamish Doodles: hamishdoodles.com
    --------  
    27:41
  • 38.4 - Shakeel Hashim on AI Journalism
    AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2025/01/05/episode-38_4-shakeel-hashim-ai-journalism.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/   Topics we discuss, and timestamps: 01:31 - The AI media ecosystem 02:34 - Why not more AI news? 07:18 - Disconnects between journalists and the AI field 12:42 - Tarbell 18:44 - The Transformer newsletter   Links: Transformer (Shakeel's substack): https://www.transformernews.ai/ Tarbell: https://www.tarbellfellowship.org/   Episode art by Hamish Doodles: hamishdoodles.com
    --------  
    24:14
  • 38.3 - Erik Jenner on Learned Look-Ahead
    Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/12/12/episode-38_3-erik-jenner-learned-look-ahead.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/   Topics we discuss, and timestamps: 00:57 - How chess neural nets look into the future 04:29 - The dataset and basic methodology 05:23 - Testing for branching futures? 07:57 - Which experiments demonstrate what 10:43 - How the ablation experiments work 12:38 - Effect sizes 15:23 - X-risk relevance 18:08 - Follow-up work 21:29 - How much planning does the network do?   Research we mention: Evidence of Learned Look-Ahead in a Chess-Playing Neural Network: https://arxiv.org/abs/2406.00877 Understanding the learned look-ahead behavior of chess neural networks (a development of the follow-up research Erik mentioned): https://openreview.net/forum?id=Tl8EzmgsEp Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT: https://arxiv.org/abs/2310.07582   Episode art by Hamish Doodles: hamishdoodles.com
    --------  
    23:46
  • 39 - Evan Hubinger on Model Organisms of Misalignment
    The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge". Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/12/01/episode-39-evan-hubinger-model-organisms-misalignment.html   Topics we discuss, and timestamps: 0:00:36 - Model organisms and stress-testing 0:07:38 - Sleeper Agents 0:22:32 - Do 'sleeper agents' properly model deceptive alignment? 0:38:32 - Surprising results in "Sleeper Agents" 0:57:25 - Sycophancy to Subterfuge 1:09:21 - How models generalize from sycophancy to subterfuge 1:16:37 - Is the reward editing task valid? 1:21:46 - Training away sycophancy and subterfuge 1:29:22 - Model organisms, AI control, and evaluations 1:33:45 - Other model organisms research 1:35:27 - Alignment stress-testing at Anthropic 1:43:32 - Following Evan's work   Main papers: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training: https://arxiv.org/abs/2401.05566 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models: https://arxiv.org/abs/2406.10162   Anthropic links: Anthropic's newsroom: https://www.anthropic.com/news Careers at Anthropic: https://www.anthropic.com/careers   Other links: Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research: https://www.alignmentforum.org/posts/ChDH335ckdvpxXaXX/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1 Simple probes can catch sleeper agents: https://www.anthropic.com/research/probes-catch-sleeper-agents Studying Large Language Model Generalization with Influence Functions: https://arxiv.org/abs/2308.03296 Stress-Testing Capability Elicitation With Password-Locked Models [aka model organisms of sandbagging]: https://arxiv.org/abs/2405.19550   Episode art by Hamish Doodles: hamishdoodles.com
    --------  
    1:45:47

Meer Technologie podcasts

Over AXRP - the AI X-risk Research Podcast

AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Podcast website

Luister naar AXRP - the AI X-risk Research Podcast, Tech Update | BNR en vele andere podcasts van over de hele wereld met de radio.net-app

Ontvang de gratis radio.net app

  • Zenders en podcasts om te bookmarken
  • Streamen via Wi-Fi of Bluetooth
  • Ondersteunt Carplay & Android Auto
  • Veel andere app-functies
Social
v7.6.0 | © 2007-2025 radio.de GmbH
Generated: 2/5/2025 - 4:56:44 PM