LessWrong (30+ Karma)
Latest episodes

1385 episodes

  • “How to Hire a Team” by Gretta Duleba

    30 Jan 2026 | 8 min
    A low-effort guide I dashed off in less than an hour, because I got riled up.
    Try not to hire a team. Try pretty hard at this. Try to find a more efficient way to solve your problem that requires less labor – a smaller-footprint solution.
    Try to hire contractors to do specific parts that they’re really good at, and who have a well-defined interface. Your relationship to these contractors will mostly be transactional and temporary.
    If you must, try hiring just one person, a very smart, capable, and trustworthy generalist, who finds and supports the contractors, so all you have to do is manage the problem-and-solution part of the interface with the contractors. You will need to spend quite a bit of time making sure this lieutenant understands what you’re doing and why, so be very choosy not just about their capabilities but about how well you work together, how easily you can make yourself understood, etc.

    If that fails, hire the smallest team that you can. Small is good because: Managing more people is more work. The relationship between number of people and management overhead is roughly O(n) but unevenly distributed; some people [...]
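    A toy illustration of that scaling claim, with made-up numbers rather than figures from the post: total management overhead grows roughly linearly in headcount, but individual reports contribute very unevenly.

```python
# Toy model of the claim above: management overhead is roughly O(n)
# in team size, but unevenly distributed across reports.
# All numbers are illustrative assumptions, not from the post.

def total_overhead(hours_per_report):
    """Weekly management hours: one additive term per direct report."""
    return sum(hours_per_report)

# Hypothetical five-person team: most reports are cheap to manage,
# while one consumes a disproportionate share of the total.
team = [1.0, 1.5, 1.0, 6.0, 2.5]   # hours/week per report
print(total_overhead(team))        # 12.0, dominated by one report
```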

    ---

    First published:

    January 29th, 2026


    Source:

    https://www.lesswrong.com/posts/cojSyfxfqfm4kpCbk/how-to-hire-a-team

    ---

    Narrated by TYPE III AUDIO.
  • [Linkpost] “Disempowerment patterns in real-world AI usage” by David Duvenaud, mrinank_sharma, Raymond Douglas

    30 Jan 2026 | 1 min
    This is a link post. [W]e’re publishing a new paper that presents the first large-scale analysis of potentially disempowering patterns in real-world conversations with AI.
    Measuring disempowerment
    To study disempowerment systematically, we needed to define what disempowerment means in the context of an AI conversation.[1] We considered a person to be disempowered if, as a result of interacting with Claude:
    – their beliefs about reality become less accurate,
    – their value judgments shift away from those they actually hold, or
    – their actions become misaligned with their values.
    For more details, see the blog post or the full paper. (A minimal sketch of this rubric follows below.)
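    As a reading aid, here is a minimal sketch of how these three criteria could be encoded when labeling a conversation; the field names and the any-criterion rule are illustrative assumptions, not the paper’s actual schema.

```python
# Minimal sketch of the three-part disempowerment rubric quoted above.
# Field names and the any-criterion rule are illustrative assumptions,
# not the schema used in the paper itself.
from dataclasses import dataclass

@dataclass
class DisempowermentLabels:
    beliefs_less_accurate: bool   # beliefs about reality degraded
    values_shifted: bool          # value judgments drifted from their own
    actions_misaligned: bool      # actions diverged from their values

    def is_disempowered(self) -> bool:
        """Flag the conversation if any of the three criteria is met."""
        return (self.beliefs_less_accurate
                or self.values_shifted
                or self.actions_misaligned)

# Example: a conversation that only distorted the user's beliefs.
print(DisempowermentLabels(True, False, False).is_disempowered())  # True
```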
    ---

    First published:

    January 29th, 2026


    Source:

    https://www.lesswrong.com/posts/RMXLyddjkGzBH5b2z/disempowerment-patterns-in-real-world-ai-usage


    Linkpost URL:
    https://www.anthropic.com/research/disempowerment-patterns

    ---

    Narrated by TYPE III AUDIO.
  • “Fitness-Seekers: Generalizing the Reward-Seeking Threat Model” by Alex Mallen

    29 Jan 2026 | 36 min
    If you think reward-seekers are plausible, you should also think “fitness-seekers” are plausible. But their risks aren't the same.
    The AI safety community often emphasizes reward-seeking as a central case of a misaligned AI alongside scheming (e.g., Cotra's sycophant vs schemer, Carlsmith's terminal vs instrumental training-gamer). We are also starting to see signs of reward-seeking-like motivations.
    But I think insufficient care has gone into delineating this category. If you were to focus on AIs who care about reward in particular[1], you'd be missing some comparably-or-more plausible nearby motivations that make the picture of risk notably more complex.
    A classic reward-seeker wants high reward on the current episode. But an AI might instead pursue high reinforcement on each individual action. Or it might want to be deployed, regardless of reward. I call this broader family fitness-seekers. These alternatives are plausible for the same reasons reward-seeking is—they're simple goals that generalize well across training and don't require unnecessary-for-fitness instrumental reasoning—but they pose importantly different risks.
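    As a reading aid (not the post’s own notation), the family sketched in that paragraph can be summarized as a small taxonomy:

```python
# Reading aid: the candidate motivations named in the paragraph above,
# collected into one enum. Names are ours, not the post's notation.
from enum import Enum

class FitnessSeeker(Enum):
    # The classic reward-seeker: wants high reward on the current episode.
    REWARD_ON_EPISODE = "high reward on the current episode"
    # Finer-grained: wants high reinforcement on each individual action.
    RETURN_ON_ACTION = "high reinforcement per action"
    # Coarser: wants to be deployed at all, regardless of reward.
    DEPLOYMENT = "being deployed, regardless of reward"

# Per the argument above, all of these are simple goals that generalize
# across training without extra instrumental reasoning, yet they pose
# importantly different risks.
for motive in FitnessSeeker:
    print(f"{motive.name}: {motive.value}")
```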
    I argue:
    While idealized reward-seekers have the nice property that they’re probably noticeable at first (e.g., via experiments called “honest tests”), other kinds of fitness-seekers, especially “influence-seekers”, aren’t so easy to spot.
    Naively optimizing away [...]
    ---
    Outline:
    (02:32) The assumptions that make reward-seekers plausible also make fitness-seekers plausible
    (05:09) Some types of fitness-seekers
    (06:54) How do they change the threat model?
    (10:08) Reward-on-the-episode seekers and their basic risk-relevant properties
    (12:12) Reward-on-the-episode seekers are probably noticeable at first
    (16:21) Reward-on-the-episode seeking monitors probably don't want to collude
    (18:10) How big is an episode?
    (18:55) Return-on-the-action seekers and sub-episode selfishness
    (23:41) Influence-seekers and the endpoint of selecting against fitness-seekers
    (25:34) Behavior and risks
    (27:49) Fitness-seeking goals will be impure, and impure fitness-seekers behave differently
    (28:16) Conditioning vs. non-conditioning fitness-seekers
    (29:35) Small amounts of long-term power-seeking could substantially increase some risks
    (30:55) Partial alignment could have positive effects
    (31:36) Fitness-seekers’ motivations upon reflection are hard to predict
    (32:58) Conclusions
    (34:35) Appendix: A rapid-fire list of other fitness-seekers
    The original text contained 15 footnotes which were omitted from this narration.
    ---

    First published:

    January 29th, 2026


    Source:

    https://www.lesswrong.com/posts/bhtYqD4FdK6AqhFDF/fitness-seekers-generalizing-the-reward-seeking-threat-model

    ---

    Narrated by TYPE III AUDIO.

  • “Bentham’s Bulldog is wrong about AI risk” by Max Harms

    29 Jan 2026 | 1 hr 2 min
    (...but also gets the most important part right.)
    Bentham's Bulldog (BB), a prominent EA/philosophy blogger, recently reviewed If Anyone Builds It, Everyone Dies. In my eyes a review is good if it uses sound reasoning and encourages deep thinking on important topics, regardless of whether I agree with the bottom line. Bentham's Bulldog definitely encourages deep, thoughtful engagement on things that matter. He's smart, substantive, and clearly engaging in good faith. I laughed multiple times reading his review, and I encourage others to read his thoughts, both on IABIED and in general.
    One of the most impressive aspects of the piece that I want to call out in particular is the presence of the mood that is typically missing among skeptics of AI x-risk.
    Overall with my probabilities you end up with a credence in extinction from misalignment of 2.6%. Which, I want to make clear, is totally fucking insane. I am, by the standards of people who have looked into the topic, a rosy optimist. And yet even on my view, I think odds are one in fifty that AI will kill you and everyone you love, or leave the world no longer in humanity's hands. I think [...]
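    The “multi-stage fallacy” entry in the outline below refers to estimates built by multiplying per-stage probabilities. A toy illustration with invented numbers (not BB’s actual figures) shows how such a product lands near a small value like 2.6%:

```python
# Toy multi-stage estimate. Stage labels and probabilities are invented
# for illustration; they are not the numbers from BB's review.
from math import prod

stages = {
    "superintelligence gets built": 0.6,
    "it ends up misaligned": 0.4,
    "misalignment is never caught or fixed": 0.3,
    "it succeeds in disempowering humanity": 0.36,
}

p_doom = prod(stages.values())
print(f"{p_doom:.3f}")  # 0.026, i.e. roughly 2.6%

# The multi-stage fallacy critique: when stages are correlated, or each
# per-stage credence is shaded slightly low, multiplying them compounds
# the error and the final product understates the risk.
```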
    ---
    Outline:
    (02:38) Confidence
    (05:38) The Multi-stage Fallacy
    (09:43) The Three Theses of IABI
    (11:57) Stages of Doom
    (16:49) We Might Never Build It
    (18:30) Alignment by Default
    (23:31) The Evolution Analogy
    (36:40) What Does Ambition Look Like?
    (41:34) Solving Alignment
    (46:15) Superalignment
    (52:20) Warning Shots
    (56:16) ASI Might Be Incapable of Winning
    (59:33) Conclusion
    The original text contained 10 footnotes which were omitted from this narration.
    ---

    First published:

    January 29th, 2026


    Source:

    https://www.lesswrong.com/posts/RNKK6GXxYDepGk8sA/bentham-s-bulldog-is-wrong-about-ai-risk

    ---

    Narrated by TYPE III AUDIO.

  • “How Articulate Are the Whales?” by rba

    29 Jan 2026 | 12 min
    I was at a party a few years ago. It was a bunch of technical nerds. Somehow the conversation drifted to human communication with animals, Alex the grey parrot, and the famous Koko the gorilla. It wasn't in SF, so there had been cocktails, and one of the nerds (it wasn’t me) sort of cautiously asked “You guys know that stuff is completely made up, right?”
    He was cautious, I think, because people are extremely at ease imputing human motives and abilities to pets, cute animals, and famous gorillas. At the same time, they are extremely uneasy about casting scientific shade on work that has so completely penetrated popular culture and science communication. People want to believe that even if dogs and gorillas can’t actually speak, they have some intimate rapport with human language. If there’s a crazy cat lady at the party, it doesn’t pay to imply she’s insane for thinking Rufus knows or cares what she’s saying.
    With the advent of AI, the non-profit Project CETI was founded in 2020 with a charter mission of understanding sperm whale communications, and perhaps even communicating with the whales ourselves. Late last year, an allied group of researchers published Begus et al.: “Vowel- and [...]
    ---
    Outline:
    (01:45) Quick Background
    (03:12) The Vowels
    (06:10) Articulatory Control
    (10:17) What's actually going on here?
    (11:59) Conclusion
    ---

    First published:

    January 28th, 2026


    Source:

    https://www.lesswrong.com/posts/eZaDucBYmWgSrQot4/how-articulate-are-the-whales

    ---

    Narrated by TYPE III AUDIO.


About LessWrong (30+ Karma)

Audio narrations of LessWrong posts.
