LessWrong (Curated & Popular)

Author: LessWrong

Subscribed: 58 · Played: 5,886

Description

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.

436 Episodes
This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its critique and propose better strategies going forward. Here's the opening ~20% of the post. I encourage reading it all. In recent decades, a growing coalition has emerged to oppose the development of artificial intelligence technology, for fear that the imminent development of smarter-than-human machines could doom humanity to extinction. The now-i...
Hi all, I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2016, when my Wikipedia writing habit shifted from writing up cybercrime topics to actively debunking the numerous dark web urban legends. After breaking into what I believe to be the most successful fake murder-for-hire website ever created on the dark web, I was able to capture information about people trying to kill people all arou...
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to Build Machine Superintelligence. Consider subscribing to stay up to date with my work. Wow. The Wall Street Journal just reported that "a consortium of investors led by Elon Musk is offering $97.4 billion to buy t...
Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitively follow the right path to the answer quickly, with barely any effort at all. For a few months I've been experimenting with the "How Could I Have Thought That Thought Faster?" concept, originally described in a Twitter thread by Eliezer: Sarah Constantin: I really liked this example of an introspective process, in this case about the "life problem" ...
Once upon a time, in ye olden days of strange names and before Google Maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA). The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5 highway most of the way (the blue 5's in the map above). But it took Alice a little while to figure that out, so in the meantime, the rest ...
Summary: In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. For the purposes of this piece, we define malevolence as a tendency to disvalue (or to fail to value) others’ well-being (more). Such a tendency i...
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space. I will tell you what could go wrong. That is what I intend to do in this story. Now I should clarify what this...
Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy and states, or as a substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to ...
This is a link post. Full version on arXiv | X. Executive summary: AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be central...
This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The content is inspired by conversations with a large number of people, so I cannot take credit for any of these ideas. For a summary of this post, see the thread on X. Many people write opinions about how to handle advanced AI, which can be considered “plans.” There's the “stop AI now plan.” On the other side of the aisle, ther...
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit. Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI development goes very well. When thinking about how AI could go wrong, the kind of story I’ve increasingly converged on is what I call “catastroph...
I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is being trained to do, it will sometimes strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. If AIs consistently and robustly fake alignment, that would make evaluating whether an AI is misaligned much harder. One possible strategy for detecting misalignment in alignment faking models i...
Summary and Table of Contents: The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks about “autonomous learning”, and the related human ability to discern whether ideas hang together and make sense, and how and if that applies ...
(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these approaches, the overall level of risk posed by AI would be minimal. (These approaches are going to be more conservative than will probably be fea...
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these tokens in DeepSeek-V3, so I tried doing exactly that. Being a SOTA base model, open source, and an all-around strange LLM, it seemed like a perf...
This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans (*Equal Contribution). Abstract: We study behavioral self-awareness — an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outp...
The Cake: Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cake. I care about nothing else. If the oven ends up a molten pile of metal ten minutes after the cake is done, if the leftover eggs are shattered and the leftover milk spilled, that's fine. Baking that cake is my terminal goal. In the process of baking the cake, I check my fridge and cupboard for ingredients. I have milk and eggs and flour, but ...
This post offers an accessible model of the psychology of character-trained LLMs like Claude. Epistemic Status: This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as closer to psychology than neuroscience - the goal isn't a map which matches the territory in every detail, but a rough sketch with evocative names wh...
This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to researchers working actively in this space. We'd ask you to treat these results like those of a colleague sharing some thoughts or preliminary experiments at a lab meeting, rather than a mature paper. We report a demonstration of a form of Out-of-Context Reasoning where training on documents which discuss (but don’t demonstrate) Claude's tendency to...
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thorough argument that the level of (existential) risk caused by the company is very low; perhaps they would require that the total chance of existentia...