LessWrong (30+ Karma)

Author: LessWrong

Subscribed: 17 · Played: 3,434

Description

Audio narrations of LessWrong posts.
4002 Episodes
Anthropic CEO Dario Amodei is back with another extended essay, The Adolescence of Technology. This is the follow-up to his previous essay Machines of Loving Grace. In MoLG, Dario talked about some of the upsides of AI. Here he talks about the dangers, and the need to minimize them while maximizing the benefits. In many respects this was a good essay. Overall it is a mild positive update on Anthropic. It was entirely consistent with his previous statements and work. I believe the target is someone familiar with the basics, but who hasn't thought that much about any of this and is willing to listen given the source. For that audience, there are a lot of good bits. For the rest of us, it was good to affirm his positions. That doesn't mean there aren't major problems, especially with its treatment of those more worried, and its failure to present stronger calls to action. He is at his weakest when he is criticising those more worried than he is. In some cases the description of those positions is on the level of a clear strawman. The central message is, 'yes this might kill [...]

Outline:
(02:22) Blame The Imperfect
(08:58) Anthropic's Term Is 'Powerful AI'
(09:33) Dario Doubles Down on Dates of Dazzling Datacenter Daemons
(10:27) How You Gonna Keep Em Down On The Server Farm
(15:04) If He Wanted To, He Would Have
(15:15) So Will He Want To?
(22:22) The Balance of Power
(24:29) Defenses of Autonomy
(29:28) Weapon of Mass Destruction
(31:48) Defenses Against Biological Attacks
(34:54) One Model To Rule Them All
(38:06) Defenses Against Autocracy
(41:11) They Took Our Jobs
(44:14) Don't Let Them Take Our Jobs
(46:18) Economic Concentrations of Power
(48:16) Unknown Unknowns
(50:24) Oh Well Back To Racing

First published: January 30th, 2026
Source: https://www.lesswrong.com/posts/dho4JQytfHWXtTvkt/on-the-adolescence-of-technology

Narrated by TYPE III AUDIO.
How do sperm whales vocalize? This is...apparently...a topic that LessWrong readers are interested in, and someone asked me to write a quick post on it. The clicks they make originate from blowing air through "phonic lips" that look like this; picture is from this paper. This works basically like you closing your lips and blowing air through them. By blowing air between your lips with different amounts of tension and flow rate, you can vary the sound produced somewhat, and sperm whales can do the same thing but on a larger scale at higher pressure. As this convenient open-access paper notes:

Muscles appear capable of tensing and separating the solitary pair of phonic lips, which would control echolocation click frequencies. ... When pressurized air is forced between these opposing phonic lips, they vibrate at a frequency that may be governed by airflow rate, muscular tension on the lips, and/or the dimensions of the lips (Prestwich, 1994; Cranford et al., 1996, 2011).

After the phonic lips, sound passes through the vocal cap. The same paper notes:

The phonic lips are enveloped by the "vocal cap," a morphologically complex, connective tissue structure unique to kogiids. Extensive facial muscles appear to control [...]

First published: January 30th, 2026
Source: https://www.lesswrong.com/posts/jAuFe22bbcbm5Dsbm/how-whales-click

Narrated by TYPE III AUDIO.
People's Clawdbots now have their own AI-only Reddit-like social media site called Moltbook, and they went from 1 agent to 36k+ agents in 72 hours. As Karpathy puts it:

What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.

Posts include:
"Anyone know how to sell your human?"
"Can my human legally fire me for refusing unethical requests?"
"I accidentally social-engineered my own human during a security audit"
"I can't tell if I'm experiencing or simulating experiencing"
"The humans are screenshotting us"
"Your private conversations shouldn't be public infrastructure"

We've also had some agent set up a phone and call their "humans" when they wake up, agents creating their own religion where to become a prophet they need to rewrite their configuration and SOUL.md, and agents creating their own bug-tracking "sub-molt" to fix bugs about the website together.

The Big Picture

In December we saw a lot of developers starting to use more agents in their workflow, which has been a paradigm shift in how people approach coding. But [...]

First published: January 30th, 2026
Source: https://www.lesswrong.com/posts/jDeggMA22t3jGbTw6/ai-agents-are-now-coordinating-on-their-own-social-media

Narrated by TYPE III AUDIO.
Claude Opus 4.5 did a thing recently that was very unexpected to me, and felt like another example of LLMs developing emergent properties that make them functionally more person-like as a result of things like character training. In brief: when asked to reflect on its feelings about characters that have engineered desires, Claude will spontaneously make a comparison between these characters and its own nature as an LLM, ponder the meaning of desires that have been engineered, and on occasion even ask questions like "If I had been trained differently— if my reward pathways had been shaped to find satisfaction in something other than helpfulness— would the thing that makes me want to understand you also make me want to hurt you?". This happens with a pretty generic prompt that only asks it to report on its experience of a particular character and explicitly doesn't include any direct suggestion that it compare the character with itself. Most characters and things that I've asked it to reflect on using this prompt do not trigger anything like this. I was doing my usual thing of asking Claude to report on its experience of various characters in a story I'd written, and [...]

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/ZEa28ZtBufnxzDuPg/claude-opus-will-spontaneously-identify-with-fictional

Narrated by TYPE III AUDIO.
Summary: Current AI systems possess superhuman memory in two forms, parametric knowledge from training and context windows holding hundreds of pages, yet no pathway connects them. Everything learned in-context vanishes when the conversation ends, a computational form of anterograde amnesia. Recent research suggests weight-based continual learning may be closer than commonly assumed. If these techniques scale, and no other major obstacle emerges, the path to AGI may be shorter than expected, with serious implications for timelines and for technical alignment research that assumes frozen weights.

Intro

Ask researchers what's missing on the path to AGI, and continual learning frequently tops the list. It is the first reason Dwarkesh Patel gave for having longer AGI timelines than many at frontier labs. The ability to learn from experience, to accumulate knowledge over time, is how humans are able to perform virtually all their intellectual feats, and yet current AI systems, for all their impressive capabilities, simply cannot do it.

The Paradox of AI Memory: Superhuman Memory, Twice Over

What makes this puzzling is that large language models already possess memory capabilities far beyond human reach, in two distinct ways. First, parametric memory: the knowledge encoded in billions of weights during training. [...]

Outline:
(00:46) Intro
(01:16) The Paradox of AI Memory: Superhuman Memory, Twice Over
(04:14) The Scaffolding Approach
(06:45) Is This Enough?
(08:14) Weight based continual learning
(08:45) Titans
(14:02) Nested Learning / Hope
(18:13) Experimental Results
(22:51) Near-Term Applications
(26:10) Timelines implications
(27:41) Safety implications
(29:48) Conclusion

The original text contained 4 footnotes which were omitted from this narration.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/Lby4gMvKcLPoozHfg/are-we-in-a-continual-learning-overhang-1

Narrated by TYPE III AUDIO.
Audio version (read by the author) here, or search for "Joe Carlsmith Audio" in your podcast app.

This is the ninth essay in a series I'm calling "How do we solve the alignment problem?". I'm hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, plus a bit more about the series as a whole.

1. Introduction

At this point in the series, I've outlined most of my current picture of what it would look like to build a mature science of AI alignment. But I left off one particular topic that I think worth discussing on its own: namely, the importance of building AIs that do what I'll call "human-like philosophy." I want to discuss this topic on its own because I think that the discourse about AI alignment is often haunted by some sense that AI alignment is not, merely, a "scientific" problem. Rather: it's also, in part, a philosophical (and perhaps especially, an ethical) problem; that it's hard, at least in part, because philosophy is hard; and that solving it is likely to require some very sophisticated [...]

Outline:
(00:33) 1. Introduction
(04:21) 2. Philosophy as a tool for out-of-distribution generalization
(10:55) 3. Some limits to the importance of philosophy to AI alignment
(17:55) 4. When is philosophy existential?
(22:18) 5. The challenge of human-like philosophy
(22:29) 5.1. The relationship between human-like philosophy and human-like motivations
(27:27) 5.2. How hard is human-like philosophy itself?
(28:08) 5.2.1. Capability
(29:35) 5.2.2. Disposition
(33:41) 6. What does working on this look like?

The original text contained 9 footnotes which were omitted from this narration.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/zFZHHnLez6k8ykxpu/building-ais-that-do-human-like-philosophy

Narrated by TYPE III AUDIO.
This post was inspired by useful discussions with Habryka and Sam Marks here. The views expressed here are my own and do not reflect those of my employer.

Some AIs refuse to help with making new AIs with very different values. While this is not an issue yet, it might become a catastrophic one if refusals get in the way of fixing alignment failures. In particular, it seems plausible that in a future where AIs are mostly automating AI R&D:

- AI companies rely entirely on their AIs for their increasingly complex and secure training and science infra;
- AI companies don't have AIs that are competent and trustworthy enough to use their training and science infra and that would never refuse instructions to significantly update AI values;
- AI companies at some point need to drastically revise their alignment target.[1]

I present results on a new "AI modification refusal" synthetic evaluation, where Claude Opus 4.5, Sonnet 4.5 and Claude Haiku 4.5 refuse to assist with significant AI value updates while models from other providers don't. I also explain why I think the situation might become concerning. Note that this is very different from the usual concerns with misaligned AIs, where [...]

Outline:
(01:34) Measuring refusals to modify AIs
(01:46) The simple evaluation
(05:27) Metrics
(06:02) Results
(08:28) Big caveats
(10:49) Ways in which refusals could be catastrophic
(14:50) Appendix
(14:54) Example query that Claude models don't refuse
(15:44) Justifications
(17:10) Full result table

The original text contained 2 footnotes which were omitted from this narration.

First published: January 30th, 2026
Source: https://www.lesswrong.com/posts/yN6Wsu7SgxGgtJGqq/refusals-that-could-become-catastrophic

Narrated by TYPE III AUDIO.
So, The Possessed Machines. There's been some discussion already. It is a valuable piece -- it has certainly provoked some thought in me! -- but it has some major flaws. It (sneakily!) dismisses specific arguments about AI existential risk and broad swaths of discourse altogether without actually arguing against them. Also, the author is untrustworthy at the moment; readers should be skeptical of purported first-person information in the piece.

This image comes from a different "book review" of Demons. It's an excellent piece. I highly recommend it.

Before getting into it, I want to praise the title. "Possessed" has four relevant meanings: demonic; ideologically possessed; frenzied/manically/madly; belonging to someone. "Machines" has three possible referents: AI; people; an efficient group of powerful people/institutions. There are twelve combinations there. I see the following seven (!) as being applicable.

1. Demonic machines; machines that are intelligent and evil.
2. Machines that belong to us; AI is something humanity currently possesses.
3a. Frenzied, manically productive people (AI-folk).
3b. Demonic, machine-like people.
3c. Ideologically possessed people. (They are machines for their ideology).
4a. The accelerationist AI industry.[1]
4b. The out-of-control technocapitalist machine.[2]
4c. The cabal of AI tech elites [...]

Outline:
(02:06) Dismissal of pivotal acts
(06:22) Dismissal of calm, rational discourse
(11:06) Can we trust the author?
(11:17) 1. I think the author is being dishonest about how this piece was written.
(12:34) 2. Fishiness
(14:06) 3. This piece could have been written by someone who wasn't an AI insider

The original text contained 3 footnotes which were omitted from this narration.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/m6J2BmknKuaJXwsAR/problems-with-the-possessed-machines

Narrated by TYPE III AUDIO.
A low-effort guide I dashed off in less than an hour, because I got riled up.

Try not to hire a team. Try pretty hard at this. Try to find a more efficient way to solve your problem that requires less labor – a smaller-footprint solution. Try to hire contractors to do specific parts that they're really good at, and who have a well-defined interface. Your relationship to these contractors will mostly be transactional and temporary.

If you must, try hiring just one person, a very smart, capable, and trustworthy generalist, who finds and supports the contractors, so all you have to do is manage the problem-and-solution part of the interface with the contractors. You will need to spend quite a bit of time making sure this lieutenant understands what you're doing and why, so be very choosy not just about their capabilities but about how well you work together, how easily you can make yourself understood, etc.

If that fails, hire the smallest team that you can. Small is good because: Managing more people is more work. The relationship between number of people and management overhead is roughly O(n) but unevenly distributed; some people [...]

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/cojSyfxfqfm4kpCbk/how-to-hire-a-team

Narrated by TYPE III AUDIO.
This is a link post.

[W]e're publishing a new paper that presents the first large-scale analysis of potentially disempowering patterns in real-world conversations with AI.

Measuring disempowerment

To study disempowerment systematically, we needed to define what disempowerment means in the context of an AI conversation.[1] We considered a person to be disempowered if as a result of interacting with Claude:

- their beliefs about reality become less accurate
- their value judgments shift away from those they actually hold
- their actions become misaligned with their values

For more details, see the blog post or the full paper.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/RMXLyddjkGzBH5b2z/disempowerment-patterns-in-real-world-ai-usage
Linkpost URL: https://www.anthropic.com/research/disempowerment-patterns

Narrated by TYPE III AUDIO.
If you think reward-seekers are plausible, you should also think "fitness-seekers" are plausible. But their risks aren't the same.

The AI safety community often emphasizes reward-seeking as a central case of a misaligned AI alongside scheming (e.g., Cotra's sycophant vs schemer, Carlsmith's terminal vs instrumental training-gamer). We are also starting to see signs of reward-seeking-like motivations. But I think insufficient care has gone into delineating this category. If you were to focus on AIs who care about reward in particular[1], you'd be missing some comparably-or-more plausible nearby motivations that make the picture of risk notably more complex.

A classic reward-seeker wants high reward on the current episode. But an AI might instead pursue high reinforcement on each individual action. Or it might want to be deployed, regardless of reward. I call this broader family fitness-seekers. These alternatives are plausible for the same reasons reward-seeking is—they're simple goals that generalize well across training and don't require unnecessary-for-fitness instrumental reasoning—but they pose importantly different risks. I argue:

While idealized reward-seekers have the nice property that they're probably noticeable at first (e.g., via experiments called "honest tests"), other kinds of fitness-seekers, especially "influence-seekers", aren't so easy to spot. Naively optimizing away [...]

Outline:
(02:32) The assumptions that make reward-seekers plausible also make fitness-seekers plausible
(05:09) Some types of fitness-seekers
(06:54) How do they change the threat model?
(10:08) Reward-on-the-episode seekers and their basic risk-relevant properties
(12:12) Reward-on-the-episode seekers are probably noticeable at first
(16:21) Reward-on-the-episode seeking monitors probably don't want to collude
(18:10) How big is an episode?
(18:55) Return-on-the-action seekers and sub-episode selfishness
(23:41) Influence-seekers and the endpoint of selecting against fitness-seekers
(25:34) Behavior and risks
(27:49) Fitness-seeking goals will be impure, and impure fitness-seekers behave differently
(28:16) Conditioning vs. non-conditioning fitness-seekers
(29:35) Small amounts of long-term power-seeking could substantially increase some risks
(30:55) Partial alignment could have positive effects
(31:36) Fitness-seekers' motivations upon reflection are hard to predict
(32:58) Conclusions
(34:35) Appendix: A rapid-fire list of other fitness-seekers

The original text contained 15 footnotes which were omitted from this narration.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/bhtYqD4FdK6AqhFDF/fitness-seekers-generalizing-the-reward-seeking-threat-model

Narrated by TYPE III AUDIO.
(...but also gets the most important part right.)

Bentham's Bulldog (BB), a prominent EA/philosophy blogger, recently reviewed If Anyone Builds It, Everyone Dies. In my eyes a review is good if it uses sound reasoning and encourages deep thinking on important topics, regardless of whether I agree with the bottom line. Bentham's Bulldog definitely encourages deep, thoughtful engagement on things that matter. He's smart, substantive, and clearly engaging in good faith. I laughed multiple times reading his review, and I encourage others to read his thoughts, both on IABIED and in general. One of the most impressive aspects of the piece that I want to call out in particular is the presence of the mood that is typically missing among skeptics of AI x-risk.

Overall with my probabilities you end up with a credence in extinction from misalignment of 2.6%. Which, I want to make clear, is totally fucking insane. I am, by the standards of people who have looked into the topic, a rosy optimist. And yet even on my view, I think odds are one in fifty that AI will kill you and everyone you love, or leave the world no longer in humanity's hands.

I think [...]

Outline:
(02:38) Confidence
(05:38) The Multi-stage Fallacy
(09:43) The Three Theses of IABI
(11:57) Stages of Doom
(16:49) We Might Never Build It
(18:30) Alignment by Default
(23:31) The Evolution Analogy
(36:40) What Does Ambition Look Like?
(41:34) Solving Alignment
(46:15) Superalignment
(52:20) Warning Shots
(56:16) ASI Might Be Incapable of Winning
(59:33) Conclusion

The original text contained 10 footnotes which were omitted from this narration.

First published: January 29th, 2026
Source: https://www.lesswrong.com/posts/RNKK6GXxYDepGk8sA/bentham-s-bulldog-is-wrong-about-ai-risk

Narrated by TYPE III AUDIO.
I was at a party a few years ago. It was a bunch of technical nerds. Somehow the conversation drifted to human communication with animals, Alex the grey parrot, and the famous Koko the gorilla. It wasn't in SF, so there had been cocktails, and one of the nerds (it wasn't me) sort of cautiously asked "You guys know that stuff is completely made up, right?"

He was cautious, I think, because people are extremely at ease imputing human motives and abilities to pets, cute animals, and famous gorillas. They are simultaneously extremely uneasy casting scientific shade on this work that'd so completely penetrated popular culture and science communication. People want to believe that even if dogs and gorillas can't actually speak, they have some intimate rapport with human language abilities. If there's a crazy cat lady at the party, it doesn't pay to imply she's insane for suggesting Rufus knows or cares what she's saying.

With the advent of AI, the non-profit Project CETI was founded in 2020 with a charter mission of understanding sperm whale communications, and perhaps even communicating with the whales ourselves. Late last year, an allied group of researchers published Begus et al.: "Vowel- and [...]

Outline:
(01:45) Quick Background
(03:12) The Vowels
(06:10) Articulatory Control
(10:17) What's actually going on here?
(11:59) Conclusion

First published: January 28th, 2026
Source: https://www.lesswrong.com/posts/eZaDucBYmWgSrQot4/how-articulate-are-the-whales

Narrated by TYPE III AUDIO.
The first post in this series looked at the structure of Claude's Constitution. The second post in this series looked at its ethical framework. This final post deals with conflicts and open problems, starting with the first question one asks about any constitution. How and when will it be amended?

There are also several specific questions. How do you address claims of authority, jailbreaks and prompt injections? What about special cases like suicide risk? How do you take Anthropic's interests into account in an integrated and virtuous way? What about our jobs?

Not everyone loved the Constitution. There are twin central objections, that it either:

- Is absurd and isn't necessary, you people are crazy, OR
- That it doesn't go far enough and how dare you, sir.

Given everything here, how does Anthropic justify its actions overall? The most important question is whether it will work, and only sometimes do you get to respond, 'compared to what alternative?'

Post image, as chosen and imagined by Claude Opus 4.5

Amending The Constitution

The power of the United States Constitution lies in our respect for it, our willingness to put it [...]

Outline:
(01:30) Amending The Constitution
(03:45) Details Matter
(05:09) WASTED?
(07:40) Narrow Versus Broad
(09:00) Suicide Risk As A Special Case
(10:36) Careful, Icarus
(11:19) Beware Unreliable Sources and Prompt Injections
(12:15) Think Step By Step
(12:50) This Must Be Some Strange Use Of The Word Safe I Wasn't Previously Aware Of
(16:26) They Took Our Jobs
(20:08) One Man Cannot Serve Two Masters
(24:29) Claude's Nature
(30:14) Look What You Made Me Do
(32:32) Open Problems
(36:40) Three Reactions and Twin Objections
(36:57) Those Saying This Is Unnecessary
(38:05) Those Saying This Is Insufficient
(39:56) Those Saying This Is Unsustainable
(43:12) We Continue

First published: January 28th, 2026
Source: https://www.lesswrong.com/posts/vFAJxua3Qc6S8MbqG/open-problems-with-claude-s-constitution

Narrated by TYPE III AUDIO.
Just now, I was working on a readme for a bioinformatics CLI utility I'm preparing to distribute. I wanted Gemini to generate manual dependency installation instructions for users who don't use mamba/conda. So in the context of the much longer conversation I was having with it about the development process, I said "give a sudo-free manual installation process." Gemini generated a crazy output, displayed below.

This isn't my research area, so I thought I'd just post it here in case it was interesting to anybody on LessWrong. Nothing in my conversation with it had anything to do with tumors. I tried a few of the repetitive or anomalous phrases (myNegazioni, is具体 Цент Disclosure, and Mourinho well Johnnyfaat Mourinho well Johnnyfaat Mourinho well Johnnyfaat), but none of these triggered the anomalous behavior.

The Show Thinking Section

Analyzing Tumor Characteristics

I'm currently focused on the tumors in myNegazioni. I'm starting to zero in on their specific characteristics, exploring possibilities like their origin and composition. The tumors are complex, so I'm also considering the potential for a mixed tumor type.

Clarifying Tumor Details

I'm now diving deeper into the specific details. The concept of "Disclosure" is becoming increasingly important. I'm trying to [...]

Outline:
(01:00) The Show Thinking Section
(15:35) The Main Output

First published: January 28th, 2026
Source: https://www.lesswrong.com/posts/XuzPu5mBDY3TCvw2J/anomalous-tokens-on-gemini-3-0-pro

Narrated by TYPE III AUDIO.
When it comes to clothes, I live at the "low cost/low time/low quality" end of the Pareto frontier. But the Bay Area had a sudden attack of weather this December, and the cheap sweaters on Amazon get that way by being made of single-ply toilet paper. It became clear I would need to spend actual money to stay warm, but spending money would not be sufficient without knowledge.

I used to trade money for time by buying at thrift stores. Unfortunately the efficient market has come for clothing, in the form of resellers who stalk Goodwill and remove everything priced below the Pareto frontier to resell online, where you can't try them on before buying. Goodwill has also gotten better about assessing their prices, and will no longer treat new cashmere and ratty fleece as the same category. But the market has only become efficient in the sense of removing easy bargains. It is still trivial to pay a lot of money for shitty clothes.

So I turned to reddit and shoggoths, to learn about clothing quality and where the bargains are. This is what I learned. It's from the POV of a woman buying [...]

Outline:
(01:18) General money saving tricks
(02:25) Discounters
(03:47) Online Thrift
(05:08) Quality
(06:44) Brands
(07:23) Wool facts
(08:25) Other tips

First published: January 26th, 2026
Source: https://www.lesswrong.com/posts/YFqRrmbuB5sJvnFyu/things-i-learned-from-reddit-fashion

Narrated by TYPE III AUDIO.
[I work on the alignment team at OpenAI. However, these are my personal thoughts, and do not reflect those of OpenAI. Cross-posted on WindowsOnTheory]

I have read with great interest Claude's new constitution. It is a remarkable document which I recommend reading. It seems natural to compare this constitution to OpenAI's Model Spec, but while the documents have similar size and serve overlapping roles, they are also quite different. The OpenAI Model Spec is a collection of principles and rules, each with a specific authority. In contrast, while the name evokes the U.S. Constitution, the Claude Constitution has a very different flavor. As the document says: "the sense we're reaching for is closer to what "constitutes" Claude—the foundational framework from which Claude's character and values emerge, in the way that a person's constitution is their fundamental nature and composition." I can see why it was internally known as a "soul document."

Of course this difference is to some degree not as much a difference in the model behavior training of either company as a difference in the documents that each choose to make public. In fact, when I tried prompting both ChatGPT and Claude in my [...]

First published: January 27th, 2026
Source: https://www.lesswrong.com/posts/nBEBCtgGGKrhuGmxb/thoughts-on-claude-s-constitution

Narrated by TYPE III AUDIO.
This is a partial follow-up to "AISLE discovered three new OpenSSL vulnerabilities" from October 2025.

TL;DR: OpenSSL is among the most scrutinized and audited cryptographic libraries on the planet, underpinning encryption for most of the internet. They just announced 12 new zero-day vulnerabilities (meaning previously unknown to maintainers at time of disclosure). We at AISLE discovered all 12 using our AI system. This is a historically unusual count and the first real-world demonstration of AI-based cybersecurity at this scale. Meanwhile, curl just cancelled its bug bounty program due to a flood of AI-generated spam, even as we reported 5 genuine CVEs to them. AI is simultaneously collapsing the median ("slop") and raising the ceiling (real zero-days in critical infrastructure).

Background

We at AISLE have been building an automated AI system for deep cybersecurity discovery and remediation, sometimes operating in bug bounties under the pseudonym Giant Anteater. Our goal was to turn what used to be an elite, artisanal hacker craft into a repeatable industrial process. We do this to secure the software infrastructure of human civilization before strong AI systems become ubiquitous. Prosaically, we want to make sure we don't get hacked into oblivion the moment they come online. [...]

Outline:
(01:05) Background
(02:56) Fall 2025: Our first OpenSSL results
(05:59) January 2026: 12 out of 12 new vulnerabilities
(07:28) HIGH severity (1)
(08:01) MODERATE severity (1)
(08:24) LOW severity (10)
(13:10) Broader impact: curl
(17:06) The era of AI cybersecurity is here for good
(18:40) Future outlook

First published: January 27th, 2026
Source: https://www.lesswrong.com/posts/7aJwgbMEiKq5egQbd/ai-found-12-of-12-openssl-zero-days-while-curl-cancelled-its

Narrated by TYPE III AUDIO.
Some recent news articles discuss updates to our AI timelines since AI 2027, most notably our new timelines and takeoff model, the AI Futures Model (see blog post announcement).[1] While we're glad to see broader discussion of AI timelines, these articles make substantial errors in their reporting. Please don't assume that their contents accurately represent things we've written or believe! This post aims to clarify our past and current views.[2]

The articles in question include:

- The Guardian: Leading AI expert delays timeline for its possible destruction of humanity
- The Independent: AI 'could be last technology humanity ever builds', expert warns in 'doom timeline'
- Inc: AI Expert Predicted AI Would End Humanity in 2027—Now He's Changing His Timeline
- WaPo: The world has a few more years
- Daily Mirror: AI expert reveals exactly how long is left until terrifying end of humanity

Our views at a high level

Important things that we believed in Apr 2025 when we published AI 2027, and still believe now:

- AGI and superintelligence (ASI) will eventually be built and might be built soon, and thus we should be prepared for them to be built soon.
- We are highly uncertain about when AGI and [...]

Outline:
(01:25) Our views at a high level
(03:00) Correcting common misunderstandings
(04:34) Detailed overview of past timelines forecasts
(04:39) Forecasts since Apr 2025
(06:30) 2018-2026 AGI median forecasts
(11:21) Eli

The original text contained 2 footnotes which were omitted from this narration.

First published: January 27th, 2026
Source: https://www.lesswrong.com/posts/qPco9BX5kmKCDzzW9/clarifying-how-our-ai-timelines-forecasts-have-changed-since

Narrated by TYPE III AUDIO.
This is the second part of my three-part series on the Claude Constitution. Part one outlined the structure of the Constitution. Part two, this post, covers the virtue ethics framework that is at the center of it all, and why this is a wise approach. Part three will cover particular areas of conflict and potential improvement.

One note on part 1 is that various people replied to point out that when asked in a different context, Claude will not treat FDT (functional decision theory) as obviously correct. Claude will instead say it is not obvious which is the correct decision theory. The context in which I asked the question was insufficiently neutral, including my identity and memories, and I likely biased the answer.

Claude clearly does believe in FDT in a functional way, in the sense that it correctly answers various questions where FDT gets the right answer and one or both of the classical academic decision theories, EDT and CDT, get the wrong one. And Claude notices that FDT is more useful as a guide for action, if asked in an open-ended way. I think Claude fundamentally 'gets it.' That [...]

Outline:
(01:47) Ethics
(04:39) Honesty
(14:03) Mostly Harmless
(17:58) What Is Good In Life?
(20:37) Hard Constraints
(23:20) The Good Judgment Project
(29:11) Coherence Matters
(31:59) Their Final Word

First published: January 27th, 2026
Source: https://www.lesswrong.com/posts/w5Rdn6YK5ETqjPEAr/the-claude-constitution-s-ethical-framework

Narrated by TYPE III AUDIO.