LessWrong (Curated & Popular)

635 Episodes
Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which is wild. They can predict the structure of proteins, beat the best humans at many games, recall more facts in most domains than human experts; yet they also struggle to perform simple tasks, like using computer cursors, maintaining basic logical consistency, or explaining what they know without wholesale fabrication. Perhaps someday we will discover a deep science of intelligence, and this will ...
About half a year ago, I decided to try to stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I'm pathetic. Why? Because it felt vaguely corrosive. Let me tell you how it went. Spoiler: it went well. The first thing I noticed was how often I caught myself about to insult myself. It happened like multiple times an hour. I would lie in bed at night thinking, "you mor- wait, I can't insult myself, I've still got 11 days to go. Dagnabbit." Th...
About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodexTen and used to read Zvi Mowshowitz before he switched his blog to covering AI. Thus, I’ve long had a peripheral familiarity with LessWrong. I picked up IABIED in response to Scott Alexander's review, and ended up looking here to see what reactions were like. After encountering a number of posts wondering how outsiders were responding to the book, I...
I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad argument" and "I see it all the time amongst smart people". I'm confident it's the worst, common argument I see amongst rationalists and EAs. I don't normally crosspost to the EA forum, but I'm doing it now. I call it Exhaustive Free Association. Exhaustive Free Association is a step in a chain of reasoning where the logic goes "It's not A, it's not B, it's not C, it's not D, and I can't think of any more thin...
Intro LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever language the LLM was originally trained on. But after a long period of training, the CoT sometimes starts to look very weird; to resemble no human language; or even to grow completely unintelligible. Why might this happen? I've seen a lot of speculation about why. But a lot of this speculation narrows too quickly, to just one or two hypotheses. My in...
It's amazing how much smarter everyone else gets when I take antidepressants. It makes sense that the drugs work on other people, because there's nothing in me to fix. I am a perfect and wise arbiter of not only my own behavior but everyone else's, which is a heavy burden because some of y’all are terrible at life. You date the wrong people. You take several seconds longer than necessary to order at the bagel place. And you continue to have terrible opinions even after I explain the ri...
This is a link post for two papers that came out today: Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time (Tan et al.) Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (Wichers et al.) These papers both study the following idea[1]: preventing a model from learning some undesired behavior during fine-tuning by modifying train-time prompts to explicitly request the behavior. We call this technique ...
I woke up Friday morning w/ a very sore left shoulder. I tried stretching it, but my left chest hurt too. Isn't pain on one side a sign of a heart attack? Chest pain, arm/shoulder pain, and my breathing is pretty shallow now that I think about it, but I don't think I'm having a heart attack because that'd be terribly inconvenient. But it'd also be very dumb if I died cause I didn't go to the ER. So I get my phone to call an Uber, when I suddenly feel very dizzy and nauseous. My wife is...
Sahil has been up to things. Unfortunately, I've seen people put effort into trying to understand and still bounce off. I recently talked to someone who tried to understand Sahil's project(s) several times and still failed. They asked me for my take, and they thought my explanation was far easier to understand (even if they still disagreed with it in the end). I find Sahil's thinking to be important (even if I don't agree with all of it either), so I thought I would attempt to write an expla...
Of course, you must understand, I couldn't be bothered to act. I know weepers still pretend to try, but I wasn't a weeper, at least not then. It isn't even dangerous, the teeth only sharp to its target. But it would not have been right, you know? That's the way things are now. You ignore the screams. You put on a podcast: two guys talking, two guys who are slightly cleverer than you but not too clever, who talk in such a way as to make you feel you're not some pathetic voyeur consuming a por...
I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED, by Yudkowsky and Soares) but realized I won't have time to do it. So here are my quick impressions/responses to IABIED. I am writing this rather quickly and it's not meant to cover all arguments in the book, nor to discuss all my views on AI alignment; see six thoughts on AI safety and Machines of Faithful Obedience for some of the latter. First, I like that the book is very honest, both about the authors' fea...
Suppose misaligned AIs take over. What fraction of people will die? I'll discuss my thoughts on this question and my basic framework for thinking about it. These are some pretty low-effort notes, the topic is very speculative, and I don't get into all the specifics, so be warned. I don't think moderate disagreements here are very action-guiding or cruxy on typical worldviews: it probably shouldn't alter your actions much if you end up thinking 25% of people die in expectation from misalign...
I wrote my recent Accelerando post to mostly stand on its own as a takeoff scenario. But, the reason it's on my mind is that, if I imagine being very optimistic about how a smooth AI takeoff goes, but where an early step wasn't "fully solve the unbounded alignment problem, and then end up with extremely robust safeguards[1]"... ...then my current guess is that Reasonably Nice Smooth Takeoff still results in all or at least most biological humans dying (or, "dying out", or at best, ambiguo...
The Standard Reading If you've heard of Le Guin's ‘The Ones Who Walk Away from Omelas’, you probably know the basic idea. It's a go-to story for discussions of utilitarianism and its downsides. A paper calls it “the infamous objection brought up by Ursula Le Guin”. It shows up in university ‘Criticism of Utilitarianism’ syllabi, and is used for classroom material alongside the Trolley Problem. The story is often also more broadly read as a parable about global inequality, the comfortable rich...
Related to: Commonsense Good, Creative Good (and my comment); Ethical Injunctions. Epistemic status: I’m fairly sure “ethics” does useful work in building human structures that work. My current explanations of how are wordy and not maximally coherent; I hope you guys help me with that. Introduction It is intractable to write large, good software applications via spaghetti code – but it's comparatively tractable using design patterns (plus coding style, attention to good/bad codesmell, ...
I The popular conception of Dunning-Kruger is something along the lines of “some people are too dumb to know they’re dumb, and end up thinking they’re smarter than smart people”. This version is popularized in endless articles and videos, as well as in graphs like the one below. Usually I'd credit the creator of this graph but it seems rude to do that when I'm ragging on them. Except that's wrong. II The canonical Dunning-Kruger graph looks like this: Notice that all the dots are in ...
Tl;dr: We believe shareholders in frontier labs who plan to donate some portion of their equity to reduce AI risk should consider liquidating and donating a majority of that equity now. Epistemic status: We’re somewhat confident in the main conclusions of this piece. We’re more confident in many of the supporting claims, and we’re likewise confident that these claims push in the direction of our conclusions. This piece is admittedly pretty one-sided; we expect most relevant members of our...
Hi all! After about five years of hibernation and quietly getting our bearings,[1] CFAR will soon be running two pilot mainline workshops, and may run many more, depending how these go. First, a minor name change request We would like now to be called “A Center for Applied Rationality,” not “the Center for Applied Rationality.” Because we’d like to be visibly not trying to be the one canonical locus. Second, pilot workshops! We have two, and are currently accepting applic...
Cross-posted from my Substack To start off with, I’ve been vegan/vegetarian for the majority of my life. I think that factory farming has caused more suffering than anything humans have ever done. Yet, according to my best estimates, I think most animal-lovers should eat meat. Here's why: It is probably unhealthy to be vegan. This affects your own well-being and your ability to help others. You can eat meat in a way that substantially reduces the suffering you cause to non...
This is a link post. Today, the Global Call for AI Red Lines was released and presented at the UN General Assembly. It was developed by the French Center for AI Safety, The Future Society and the Center for Human-compatible AI. This call has been signed by a historic coalition of 200+ former heads of state, ministers, diplomats, Nobel laureates, AI pioneers, scientists, human rights advocates, political leaders, and other influential thinkers, as well as 70+ organizations. Signatories inclu...