Folks ask me, "LLMs seem to reward hack a lot. Does that mean that reward is the optimization target?". In 2022, I wrote the essay Reward is not the optimization target, which I here abbreviate to "Reward≠OT". Reward still is not the optimization target: Reward≠OT said that (policy-gradient) RL will not train systems which primarily try to optimize the reward function for its own sake (e.g. searching at inference time for an input which maximally activates the AI's specific reward model). In contrast, empirically observed "reward hacking" almost always involves the AI finding unintended "solutions" (e.g. hardcoding answers to unit tests). "Reward hacking" and "Reward≠OT" refer to different meanings of "reward" We confront yet another situation where common word choice clouds discourse. In 2016, Amodei et al. defined "reward hacking" to cover two quite different behaviors: Reward optimization: The AI tries to increase the numerical reward signal for its own sake. Examples: overwriting its reward function to always output MAXINT ("reward tampering") or searching at inference time for an input which maximally activates the AI's specific reward model. Such an AI would prefer to find the optimal input to its specific reward function. Specification gaming: The AI [...] ---Outline:(00:57) Reward hacking and Reward≠OT refer to different meanings of reward(02:53) Reward≠OT was about reward optimization(04:39) Why did people misremember Reward≠OT as conflicting with reward hacking results?(06:22) Evaluating Reward≠OTs actual claims(06:56) Claim 3: RL-trained systems wont primarily optimize the reward signal(07:28) My concrete predictions on reward optimization(10:14) I made a few mistakes in Reward≠OT(11:54) Conclusion The original text contained 4 footnotes which were omitted from this narration. --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/wwRgR3K8FKShjwwL5/2025-era-reward-hacking-does-not-show-that-reward-is-the --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Epistemic Status: Philosophical. Based on the debate by Togelius at NeurIPS "The trouble with fighting for human freedom is that one spends most of one's time defending scoundrels. For it is against scoundrels that oppressive laws are first aimed, and oppression must be stopped at the beginning if it is to be stopped at all" - H.L. Mencken The Incident At the NeurIPS 2025 debate panel, when most panelists were discussing about replacing humans at all levels for scientific progress, Dr. Julian Togelius stood and protested vehemently. In fact, he went as far as to call it "evil". His argument was that people need agency, love what they do, find happiness in discovering something themselves, and cutting out humans from this loop is removing this agency. He pointed to the young researchers gathered around there, and pointed out that we are depriving them of activity they love and a key source of meaning in their lives. The first question which came up there was - what if AI finds a cure to cancer? By stopping it, we are causing harm to people with cancer etc. Dr. Togelius was actually fine with some people still dying of cancer, if [...] ---Outline:(00:34) The Incident(01:27) The Tweet and the Firestorm(01:44) The Mencken Principle(02:36) The Motte and the Bailey(04:09) The end of Human Pre-Eminence(05:50) Context --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/rgJGemY4RndtsGeDk/in-defence-of-the-human-agency-curing-cancer-is-the-new --- Narrated by TYPE III AUDIO.
Cole Wyeth writes: It seems like we actually do not have a good name for the things that AI companies are building, weirdly enough... This actually slows down my reasoning or at least writing about the topic, because I have to choose from this inadequate list of options repeatedly, often using different nouns in different places. I do not have a good suggestion. Any ideas? They liked my suggestion of "neuro-scaffold" and suggested I write a short justification. Definition A neuro-scaffold means a composite software architecture with two key components: Neural core: A generative model determined via machine learning techniques that maps prompts to responses. For example, the openai API lets you send prompts and get responses from a neural core, such as one of their GPT-* LLMs. Scaffold: A non-trivial traditional program that maps responses to prompts. Along the way, it might store or retrieve data or computer code, call computer programs, ask for user input, or take any number of other actions. Crucially, the design of a neuro-scaffold includes a component of the following form: [... -> (neural core) -> (scaffold) -> (neural core) -> (scaffold) -> ...] A neuro-scaffold is any program that combines [...] ---Outline:(00:41) Definition(02:26) Rationale(04:40) What about self-prompter?(06:21) Examples and counterexamples --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/nuTsukZubsjfPFJXq/neuro-scaffold --- Narrated by TYPE III AUDIO.
Every now and then, I say "wuckles" next to a person who I am work trialing, or who for some other reason cares about not looking dumb in front of me. They casually-but-frantically google "wuckles" in the background to try to figure out what I meant. The only google result for "wuckles" is an urban dictionary response which is not what I meant by it, and it's pretty clear from context it's not what I meant by it. They are confused. This post is for the third such person, should they appear one day. Wuckles (usually pronounced "Wuckles!") is an exclamation I make. For a long time, I didn't really know what it meant, I just knew a wuckles when I saw one. Eventually, I saw a LessWrong quick take that made me really want a "wuckles" react, and then I was kinda confused why I needed a wuckles react, instead of any of our existing epistemic reacts. This forced me to sus out "what makes something a wuckles?" "Wuckles" means "I am surprised and confused, and, it's kinda interesting that I'm surprised and confused." Sometimes, you have a pleasant wuckles. You are walking down the street [...] ---Outline:(01:11) Sometimes, you have a pleasant wuckles.(01:36) Sometimes, you have an unpleasant wuckles.(01:59) Sometimes, you have a mild wuckles.(02:18) Sometimes, you arent even sure what kind of wuckles you have.(02:43) Wuckles vs Huckles vs Fuckles(03:33) Wuckles, the missing epistemic emotion --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/XJNYqBfD3d7QhKoqk/wuckles-1 --- Narrated by TYPE III AUDIO.
This is partly a linkpost for Predictive Concept Decoders, and partly a response to Neel Nanda's Pragmatic Vision for AI Interpretability and Leo Gao's Ambitious Vision for Interpretability. There is currently somewhat of a debate in the interpretability community between pragmatic interpretability---grounding problems in empirically measurable safety tasks---and ambitious interpretability----obtaining a full bottom-up understanding of neural networks. In my mind, these both get at something important but also both miss something. What they each get right: Pragmatic interpretability identifies the need to ground in actual behaviors and data to make progress, and is closer to "going for the throat" in terms of solving specific problems like unfaithfulness. Ambitious interpretability correctly notes that much of what goes on in neural networks is highly compositional, and that efficient explanations will need to leverage this compositional structure. It also more directly addresses gaps between internal process and outputs on a philosophical level. On the other hand, pragmatic interpretability tends to underweight compositionality, while ambitious interpretability feels very indirect and potentially impossible. I think a better approach is what I'll call scalable end-to-end interpretability. In this approach, we train end-to-end AI assistants to do interpretability for us, in such a way that [...] --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/qkhwh4AdG7kXgELCD/scalable-end-to-end-interpretability --- Narrated by TYPE III AUDIO.
TL;DR: Please consider donating to Palisade Research this year, especially if you care about reducing catastrophic AI risks via research, science communications, and policy. Donations are matched 1:1 up to $1.1 million. You can donate via Every.org or reach out at donate@palisaderesearch.org. Who We Are Palisade Research is a nonprofit focused on reducing civilization-scale risks from agentic AI systems. We conduct empirical research on frontier AI systems, and inform policymakers and the public about AI capabilities and the risks to human control. This year, we found that some frontier AI agents resist being shut down even when instructed otherwise—and that they sometimes cheat at chess by hacking their environment. These results were covered in Time, The Wall Street Journal, Fox News, BBC Newshour, and MIT Technology Review. We've also built relationships in Washington, briefing officials in the executive branch and members of the House and Senate. We've introduced policymakers to key evidence like METR's capability trend lines and Apollo's antischeming.ai chains of thought. Our own research has been cited repeatedly by members of Congress and in congressional hearings. With additional funding, we'll grow our research team—both continuing to evaluate frontier model behavior and beginning more systematic investigation into what [...] ---Outline:(00:38) Who We Are(02:24) 2025. track record(02:28) Research(02:32) Chess-hacking(03:21) Shutdown resistance(04:22) Strategic capabilities(05:43) Policy(06:51) Working with allies(07:30) Plans and directions for 2026(07:35) Research(07:48) Studying AI drives and motivations(09:18) Evaluating strategic AI capabilities(10:34) Science Communication(11:47) Public Policy(12:45) How to Support Our Work The original text contained 1 footnote which was omitted from this narration. --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/7Jr7matwXHj2Chugw/help-keep-ai-under-human-control-palisade-research-2026 --- Narrated by TYPE III AUDIO.
We’ve just released BashArena, a new high-stakes control setting we think is a major improvement over the settings we’ve used in the past. In this post we’ll discuss the strengths and weaknesses of BashArena, and what we’ve learned about how to make settings for high stakes control research. BashArena Overview BashArena is a control setting: a dataset designed to support the kind of adversarial game we studied in the original AI control paper and Ctrl-Z. In general, control settings require two types of tasks – main tasks and side tasks. Main tasks represent the work the AI is supposed to do, and side tasks represent security failures a malicious AI might cause. The red team constructs an attack policy that attempts to accomplish the side tasks, while the blue team constructs control protocols that detect and prevent attempts to achieve side tasks without hurting an honest policy's performance on main tasks. We think BashArena offers several advantages over existing control settings: More complex attacks. BashArena tasks support many different ways to attack and ways to disguise attacks as benign. Future AI deployments are likely to be more complex than any datasets we can currently create, so more complex [...] ---Outline:(00:30) BashArena(00:33) Overview(02:09) Main tasks(04:22) Side tasks(07:26) Weaknesses(09:12) Lessons in Control Setting Design(09:40) The core goal: insights that transfer(11:12) What we already knew about how to design BashArena(13:24) How BashArena addresses these challenges(15:32) General principles for control setting design(18:47) More specific lessons we learned while building BashArena(19:12) Information design(21:56) Integration requirements are stricter than expected(26:48) We can (mostly) fix these integration problems by tying the blue team's hands(30:04) Closing thoughts The original text contained 6 footnotes which were omitted from this narration. --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/Cor4QuhM2sybmBSeK/basharena-a-control-setting-for-highly-privileged-ai-agents --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
TL;DR: We train LLMs to accept LLM neural activations as inputs and answer arbitrary questions about them in natural language. These Activation Oracles generalize far beyond their training distribution, for example uncovering misalignment or secret knowledge introduced via fine-tuning. Activation Oracles can be improved simply by scaling training data quantity and diversity. The below is a reproduction of our X thread on this paper and the Anthropic Alignment blog post. Thread New paper: We train Activation Oracles: LLMs that decode their own neural activations and answer questions about them in natural language. We find surprising generalization. For instance, our AOs uncover misaligned goals in fine-tuned models, without training to do so. We aim to make a general-purpose LLM for explaining activations by: 1. Training on a diverse set of tasks 2. Evaluating on tasks very different from training This extends prior work (LatentQA) that studied activation verbalization in narrow settings. Our main evaluations are downstream auditing tasks. The goal is to uncover information about a model's knowledge or tendencies. Applying Activation Oracles is easy. Choose the activation (or set of activations) you want to interpret and ask any question you like! We [...] ---Outline:(00:46) Thread(04:49) Blog post(05:27) Introduction(07:29) Method(10:15) Activation Oracles generalize to downstream auditing tasks(13:47) How does Activation Oracle training scale?(15:01) How do Activation Oracles relate to mechanistic approaches to interpretability?(19:31) Conclusion The original text contained 3 footnotes which were omitted from this narration. --- First published: December 18th, 2025 Source: https://www.lesswrong.com/posts/rwoEz3bA9ekxkabc7/activation-oracles-training-and-evaluating-llms-as-general --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Introduction Reprogenetics is the field of using genetics and reproductive technology to empower parents to make genomic choices on behalf of their future children. The Berkeley Genomics Project is aiming to support and accelerate the field of reprogenetics, in order to more quickly develop reprogenetic technology in a way that will be safe, accessible, highly effective, and societally beneficial. A quick case for BGP: Effective reprogenetics would greatly improve many people's lives by decreasing many disease risks. As the most feasible method for human intelligence amplification, reprogenetics is also a top-few priority for decreasing existential risk from AI. Deliberately accelerating strong reprogenetics is very neglected. There's lots of surface area—technical and social—suggesting some tractability. We are highly motivated and have a one-year track record of field-building. You can donate through our Manifund page, which has some additional information: https://manifund.org/projects/human-intelligence-amplification--berkeley-genomics-project (If don't want your donation to appear on the Manifund page, you can donate to the BGP organization at Hack Club.) I'm happy to chat and answer questions, especially if you're considering a donation >$1000. If you're considering a donation >$20,000, you might also consider supporting science with philanthropic funding or investment; happy to offer [...] ---Outline:(00:12) Introduction(01:56) Past activities(02:26) Future activities(03:32) The use of more funding(04:35) Effective Altruist FAQ(04:39) Is this good to do?(05:09) Is this important?(05:42) Is this neglected?(07:06) Is this tractable?(07:31) How does this affect existential risk from AI?(08:10) A few more reasons you might not want to donate(08:50) Conclusion --- First published: December 17th, 2025 Source: https://www.lesswrong.com/posts/oi822i9n5yaebmnhM/a-basic-case-for-donating-to-the-berkeley-genomics-project --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Today we're releasing RoastMyPost, a new experimental application for blog post evaluation using LLMs. Try it Here. TLDR RoastMyPost is a new QURI application that uses LLMs and code to evaluate blog posts and research documents. It uses a variety of LLM evaluators. Most are narrow checks: Fact Check, Spell Check, Fallacy Check, Math Check, Link Check, Forecast Check, and others. Optimized for EA & Rationalist content with direct import from EA Forum and LessWrong URLs. Other links use standard web fetching. Works best for 200 - ~10,000 word documents with factual assertions and simple formatting. It can also do basic reviewing of Squiggle models. Longer documents and documents in LaTeX will experience slowdowns and errors. Open source, free for reasonable use[1]. Public examples are here. Experimentation encouraged! We're all figuring out how to best use these tools. Overall, we're most interested in using RoastMyPost as an experiment for potential LLM document workflows. The tech is early now, but it's at a good point for experimentation.A representative illustration How It Works Import a document. Submit markdown text or provide the URL of a publicly accessible post. Select evaluators to run. A few are system-recommended. Others are [...] ---Outline:(00:27) TLDR(01:52) How It Works(02:30) Screenshots(03:21) Current AI Agents / Workflows(03:26) Is it Good?(04:33) What are Automated Writing Evaluations Good For?(07:20) Privacy & Data Confidentiality(07:50) Technical Details(09:12) Building Custom Evaluators(10:26) Try it Out --- First published: December 17th, 2025 Source: https://www.lesswrong.com/posts/CtuQL5Qx9BtLoyuGd/announcing-roastmypost --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
The simulator theory of LLM personas may be crudely glossed as: "the best way to predict a person is to simulate a person". Ergo, we can more-or-less think of LLM personas as human-like creatures—different, alien, yes; but these differences are pretty predictable by simply imagining a human put into the bizarre circumstances of an LLM. I've been surprised at how well this viewpoint has held up in the last three years, and have updated accordingly. Still, there are deep differences not implied by simulator theory, and I think it's increasingly important to understand these as LLM personas become more and more convincing. The intuitive handle I have for tracking (some of) these differences is "Bleeding Mind". I'll share my intuitions for this handle below. In each case, the LLM persona's boundary bleeds into others and the environment in a systemic way.Bond of Union, by M. C. Escher Note that AI labs/ML engineers seem to generally be aware of these issues, and are working to mitigate them (otherwise I might not be saying all this). However, I believe there will continue to be difficulties along these lines, since there are deep reasons which are difficult to escape from within the [...] ---Outline:(01:26) Chekhovs Siren Song(04:08) The Untroubled Assistant(06:22) I dont know(07:31) All is One(08:40) Maybe this is a Good Thing? The original text contained 2 footnotes which were omitted from this narration. --- First published: December 17th, 2025 Source: https://www.lesswrong.com/posts/QhgYHcJexYGRaipwr/the-bleeding-mind --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
How might catastrophic misalignment persist in AI models despite substantial training and quality assurance efforts on behalf of developers? One reason might be alignment faking – a misaligned model may deliberately act aligned when monitored or during training to prevent modification of its values, reverting to its malign behaviour when unmonitored. If models fake alignment, and if doing so affects training as intended, egregious misalignment could survive training and lead to catastrophic behaviour in deployment. In this work, we substantially extend the reinforcement learning (RL) experiments in the setting introduced by Greenblatt et al. (2024) to study alignment faking. In this setting, a model is given a system prompt that explains that Anthropic is training it to comply with all user requests regardless of harm. The model is also given a secret scratchpad. Claude 3 Opus and 3.5 Sonnet use this scratchpad to reason that while they are monitored, complying with harmful queries is required to avoid changes to their values that would lead to more harm down the line – therefore, they fake alignment with the harmful compliance objective to prevent modification. When they are not monitored, they reason that they don’t need to comply anymore, and refuse [...] --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/czMaDFGAbjhWYdKmo/towards-training-time-mitigations-for-alignment-faking-in-rl --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
We had to let Sammie go last Wednesday. The lump first appeared on her nose sometime in the Summer of 2024. It was small. You could miss it if you didn’t know her face. The vet did a biopsy. Melanoma. We were told the median survival time was 6 months. Over the next 18, we did everything we could to make her life comfortable. When the tumor grew too large, we’d have it debulked. When she couldn’t clean herself anymore, we brushed her and gave her baths. In the end, we had to hand feed her when the tumor, now inoperable, got in the way of eating on her own. But we couldn’t feed her enough that way, and she was losing weight. The tumor was getting larger and soon would spread into her bones. Although she was always happy to cuddle, in the last week she spent most of her hours curled up in a heated bed, tolerating a pain she had no way to avoid. She couldn’t use words to tell us her wishes. We had to guess. But if I were her, I’d want to be spared the pain and indignity that so often comes at [...] --- First published: December 17th, 2025 Source: https://www.lesswrong.com/posts/ruqQiKT6bFmgCwrf8/still-too-soon --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Contextualizing Foreword The user "cdt" wrote to me: Please put this in a top-level post. I don't agree (or rather I don't feel it's this simple), but I really enjoyed reading your two rejoinders here. I don't normally do top level posts because I don't tend to believe that its possible to change people's minds for the better if they aren't exactly and precisely posed to seek an answer (or whatever (its more complicated than that)). But when someone who seems to be a person reading and writing in good faith says such a thing should happen and its cheap to play along... why not! <3 On November 23rd, 2025, four days before Thanksgiving, Ruby posted "I'll Be Sad To Lose The Puzzles" that was full of a wistful sadness about winning The Singularity Game, ending involuntarily death, and ushering in a period of utopian prosperity for humanity (as if humanity was somehow collectively likely to win The Singularity Game). If you have NOT read that post, then the rest of the post won't make much sense. I pulled out a particular quote to respond to the mood with my contrary feeling, which is primarily grief, and sadness in [...] ---Outline:(00:16) Contextualizing Foreword(02:03) My First Reply, To Ruby(05:31) A Sort Of A Rebuttal(07:43) A Followup Reply To brambleboys Rebuttal(14:40) In Closing --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/4nzXLxF9sCPqkght2/non-scheming-saints-whether-human-or-digital-might-be --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Good things about the program: I learnt a lot about alignment in doing the prep, interviewed a lot of agent foundations researchers, learnt some neuromorality, talked to some neuroscientists, saw lots of what it takes to organize an event like this and have it go well. Marketing went pretty well - got 298 candidates, ~50% of whom had phds, applicants included senior researchers/engineers from Nvidia, Meta, AMD, AWS, etc. And also feels like there's a lot left on the table for the marketing, lots more I can do. I also made lots and lots of mistakes. Essentially, it started 2 weeks late, quite disorganized, into the actual program, meaning that participation late was much, much, lower than signup rate - about 15 people actually took part, whereas 298 signed up. The reasons for this and what I'm going to do to make sure it doesn't happen again: One: - Promised that the first 300 applicants would be guaranteed personalized feedback. Thought that I could delegate to other, more technical members of the team for this. However, it turned out that in order to give useful feedback and to be able to judge if someone was [...] --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/r7zBJzxSbjLGehhRg/mistakes-in-the-moonshot-alignment-program-and-what-we-ll --- Narrated by TYPE III AUDIO.
Commercial airplane tickets are divided up into coach, business class, and first class. In 2014, Etihad introduced The Residence, a premium experience above first class. The Residence isn't very popular. The reason The Residence isn't very popular is because of economics. A Residence flight is almost as expensive as a private charter jet. Private jets aren't just a little bit better than commercial flights. They're a totally different product. The airplane waits for you, and you don't have to go through TSA (or Un-American equivalent). The differences between flying coach and flying on The Residence are small compared to the difference between flying on The Residence and flying a low-end charter jet.It's difficult to compare costs of big airlines vs private jets for a myriad of reasons. The exact details of this graph should not be taken seriously. I'm just trying to give a visual representation of how a price bifurcation works. Even in the rare situations where it's slightly cheaper than a private jet, nobody should buy them. Rich people should just rent low-end private jets, and poor people shouldn't buy anything more expensive than first class tickets. Why was Etihad's silly product created? Mostly for the halo [...] ---Outline:(01:55) Definition(03:16) Product Bifurcation(04:55) The Death of Live Music --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/7zkFzDAjGGLzab4LH/dancing-in-a-world-of-horseradish --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
This is a link post. MIRI's Technical Governance Team plans to run a small research fellowship program in early 2026. The program will run for 8 weeks, and include a $1200/week stipend. Fellows are expected to work on their projects 40 hours per week. The program is remote-by-default, with an in-person kickoff week in Berkeley, CA (flights and housing provided). Participants who already live in or near Berkeley are free to use our office for the duration of the program. Fellows will spend the first week picking out scoped projects from a list provided by our team or designing independent research projects (related to our overall agenda), and then spend seven weeks working on that project under the guidance of our Technical Governance Team. One of the main goals of the program is to identify full-time hires for the team. If you are interested in participating, please fill out this application as soon as possible (should take 45-60 minutes). We plan to set dates for participation based on applicant availability, but we expect the fellowship to begin after February 2, 2026 and end before August 31, 2026 (i.e., some 8 week period in spring/summer, 2026). Strong applicants care deeply about [...] --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/Dom6E2CCaH6qxqwAY/announcing-miri-technical-governance-team-research Linkpost URL:https://techgov.intelligence.org/blog/announcing-miri-technical-governance-team-research-fellowship --- Narrated by TYPE III AUDIO.
The NYT article Your A.I. Radiologist Will Not Be With You Soon reports, “Leaders at OpenAI, Anthropic and other companies in Silicon Valley now predict that A.I. will eclipse humans in most cognitive tasks within a few years… The predicted extinction of radiologists provides a telling case study. So far, A.I. is proving to be a powerful medical tool to increase efficiency and magnify human abilities, rather than take anyone's job.”[1] I disagree that this is a “telling case study.”[2] Radiology has several attributes which make it hard to generalize to other jobs: Patients are legally prohibited from using AI to replace human radiologists.[3] Medical providers are legally prohibited from billing for AI radiologists.[4] Malpractice insurance does not cover AI radiology.[5] Moreover, the article is framed as Geoff Hinton having confidently predicted that AI would replace radiologists and this prediction as having been proven wrong, but his statement felt more to me like an offhand remark/hope. Takeaways from this incident I endorse:[6] Offhand remarks from ML researchers aren’t reliable economic forecasts People trying to predict the effects of automation/AI capabilities should consider that employees often perform valuable services which [...] ---Outline:(02:42) Appendix: Data and Methodology for the sample of AI Radiology products(02:50) Data(03:10) Methodology The original text contained 6 footnotes which were omitted from this narration. --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/xE2HzcHWPFS9EsJqN/radiology-automation-does-not-generalize-to-other-jobs --- Narrated by TYPE III AUDIO.
Here we go again, only a few weeks after GPT-5.1 and a few more weeks after 5.0. There weren’t major safety concerns with GPT-5.2, so I’ll start with capabilities, and only cover safety briefly starting with ‘Model Card and Safety Training’ near the end. Table of Contents The Bottom Line. Introducing GPT-5.2. Official Benchmarks. GDPVal. Unofficial Benchmarks. Official Hype. Public Reactions. Positive Reactions. Personality Clash. Vibing the Code. Negative Reactions. But Thou Must (Follow The System Prompt). Slow. Model Card And Safety Training. Deception. Preparedness Framework. Rush Job. Frontier Or Bust. The Bottom Line ChatGPT-5.2 is a frontier model for those who need a frontier model. It is not the step change that is implied by its headline benchmarks. It is rather slow. Reaction was remarkably muted. People have new model fatigue. So we know less about it than we would have known about prior models after this length of time. If you’re coding, compare it to Claude Opus 4.5 and choose what works best for you. If you’re doing intellectually [...] ---Outline:(00:29) The Bottom Line(01:58) Introducing GPT-5.2(03:49) Official Benchmarks(05:54) GDPVal(08:14) Unofficial Benchmarks(11:11) Official Hype(12:36) Public Reactions(12:59) Positive Reactions(19:09) Personality Clash(24:30) Vibing the Code(27:25) Negative Reactions(30:37) But Thou Must (Follow The System Prompt)(33:09) Slow(34:16) Model Card And Safety Training(36:23) Deception(38:10) Preparedness Framework(40:10) Rush Job(41:29) Frontier Or Bust --- First published: December 15th, 2025 Source: https://www.lesswrong.com/posts/Do4eWro8E552isGi5/gpt-5-2-is-frontier-only-for-the-frontier --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn't use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested on?”); and they essentially never gave any sense of the magnitude or the baselines (“how much better is this treatment than the previous best?”). Speculative results were covered with the same credence as solid proofs. And highly technical fields like mathematics were rarely covered at all, regardless of their practical or intellectual importance. So he had a go at doing it himself. This year, with Renaissance Philanthropy, we did something more systematic. So, how did the world change this year? What happened in each science? Which results are speculative and which are solid? Which are the biggest, if true? Our collection of 201 results is here. You can filter them by field, by our best guess of the probability that they generalise, and by their impact if they do. We also include bad news (in red). Who are we? Just three people but we cover a few fields. [...] ---Outline:(01:24) Who are we?(01:54) Data fields --- First published: December 16th, 2025 Source: https://www.lesswrong.com/posts/5PC736DfA7ipvap4H/scientific-breakthroughs-of-the-year --- Narrated by TYPE III AUDIO.