The Nonlinear Library

Author: The Nonlinear Fund
Description
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
4976 Episodes
Counterfactuals strike again! The fora have their own official audio channels now, so The Nonlinear Library will no longer publish new episodes since it won't have any counterfactual impact.
It's been a good run. We published thousands of episodes and generated a ton of passive impact.
But we're not here for the views. We're here for the counterfactual impact.
INSTRUCTIONS TO KEEP LISTENING TO THE FORA
1. Search "EA Forum" or "LessWrong" on your podcast player
2. Subscribe to the official channels
3. Go forth. Seek impact. Seek truth.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Augmenting Statistical Models with Natural Language Parameters, published by jsteinhardt on September 22, 2024 on LessWrong.
This is a guest post by my student Ruiqi Zhong, who has some very exciting work defining new families of statistical models that can take natural language explanations as parameters. The motivation is that existing statistical models are bad at explaining structured data. To address this problem, we augment these models with natural language parameters, which can represent interpretable abstract features and be learned automatically.
Imagine the following scenario: It is the year 3024. We are historians trying to understand what happened between 2016 and 2024, by looking at how Twitter topics changed across that time period. We are given a dataset of user-posted images sorted by time, $x_1$, $x_2$ ... $x_T$, and our goal is to find trends in this dataset to help interpret what happened.
If we successfully achieve our goal, we would discover, for instance, (1) a recurring spike of images depicting athletes every four years for the Olympics, and (2) a large increase in images containing medical concepts during and after the COVID-19 pandemic.
How do we usually discover temporal trends from a dataset? One common approach is to fit a time series model to predict how the features evolve and then interpret the learned model. However, it is unclear what features to use: pixels and neural image embeddings are high-dimensional and uninterpretable, undermining the goal of extracting explainable trends.
We address this problem by augmenting statistical models with interpretable natural language parameters. The figure below depicts a graphical model representation for the case of time series data. We explain the trends in the observed data [ $x_1$ ... $x_T$] by learning two sets of latent parameters: natural language parameters $\phi$ (the learned features) and real-valued parameters $w$ (the time-varying trends).
$\phi$: the natural language descriptions of $K$ different topics, e.g. "depicts athletes competing". Each description is an element of $\Sigma$, the universe of all natural language predicates.
$w_t$: the frequency of each of the $K$ topics at time $t$.
If our model successfully recovers the underlying trends, then we can visualize $w$ and $\phi$ below and see that: 1) more pictures contain medical concepts (red) starting from 2020, and 2) there are recurring (blue) spikes of athletes competing.
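To make the two parameter sets concrete, here is one simplified reading of the setup (my sketch of the intended semantics, not necessarily the exact likelihood the authors use): treating each natural language predicate as a binary feature, $w_{t,k}$ is roughly the fraction of samples at time $t$ for which the $k$-th predicate holds,

$$w_{t,k} \approx \frac{1}{|D_t|} \sum_{x \in D_t} \mathbf{1}\left[\phi_k(x) = 1\right], \qquad k = 1, \dots, K,$$

where $D_t$ is the set of images observed at time $t$ and $\phi_k(x) \in \{0,1\}$ denotes whether the predicate $\phi_k$ (e.g. "depicts athletes competing") is true of image $x$. Learning then means jointly searching over predicate strings $\phi_k \in \Sigma$ and trend curves $w$ so that the implied frequencies explain the data.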
In the rest of this post, we will explain in detail how to specify and learn models with natural language parameters and showcase the model on several real-world applications. We will cover:
A warm-up example of a statistical model with natural language explanations
A modeling language for specifying natural language parameters
Applications of our framework, which can be used to specify models for time series, clustering, and applications. We will go over:
A machine learning application that uses our time series model to monitor trends in LLM usage
A business application that uses our clustering model to taxonomize product reviews
A cognitive science application that uses our classification model to explain what images are more memorable for humans
Thanks to Louise Verkin for helping to typeset the post in Ghost format.
Warm-up Example: Logistic Regression with Natural Language Parameters
Instead of understanding topic shifts across the entire time window of 2016-2024, let's first study a much simpler question: which images are more likely to appear after 2020? The usual way to approach this problem is to:
1. brainstorm some features,
2. extract the real-valued features from each image, and
3. run a logistic regression model on these features to predict the target: $Y=1$ if the image appears after 2020, $Y=0$ otherwise.
More concretely:
Step 1: Propose different...
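A minimal sketch of that three-step recipe in Python (my illustration, not the authors' code; predicate_holds is a hypothetical helper, e.g. an image-text model or an LLM judge, that decides whether a natural-language predicate applies to an image):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Step 1: brainstorm candidate natural-language features (predicates).
predicates = [
    "depicts athletes competing",
    "contains medical concepts",
    "shows an outdoor crowd",
]

def predicate_holds(phi: str, image) -> bool:
    """Hypothetical helper: returns True if the predicate phi applies to the image.
    In practice this could be an image-text model or an LLM-based judge."""
    raise NotImplementedError

def extract_features(images) -> np.ndarray:
    # Step 2: turn each image into a binary feature vector, one entry per predicate.
    return np.array(
        [[float(predicate_holds(phi, img)) for phi in predicates] for img in images]
    )

def fit_post_2020_classifier(images, years):
    # Step 3: logistic regression predicting Y=1 if the image appears after 2020.
    X = extract_features(images)
    y = np.array([1 if year > 2020 else 0 for year in years])
    model = LogisticRegression().fit(X, y)
    # Large positive coefficients mark predicates that became more common after 2020.
    return dict(zip(predicates, model.coef_[0]))
```

The step the rest of the post automates is Step 1: instead of hand-writing the predicate list, the natural language features themselves become learned parameters.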
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Glitch Token Catalog - (Almost) a Full Clear, published by Lao Mein on September 22, 2024 on LessWrong.
This is a collection of every unidentified GPT2 glitch token listed in the third glitch token archaeology post. I was able to find the source of every single one, except for "?????-" and "?????-?????-"[1]. Please tell me if I missed one, or you've discovered one and don't understand where it came from. This isn't meant to be a well-written analysis, just a quick repository of my glitch-hunting observations.
I plan on writing up and categorizing all of these in greater detail in future posts. The first of which is here.
I used OpenWebText, a recreation of GPT2's training data, for all experiments in this post. I tokenized every .gz file in the archive and made a boolean NumPy array marking each token that was present at least once. This allowed me to quickly identify infrequent tokens in the dataset and pull up the textual context with regular expressions. If there was an issue with overlap, I used a tokenizer-based extraction instead. All data/code available upon request.
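A rough sketch of that pipeline (my reconstruction, not the post's actual code; the file layout and the choice of the Hugging Face gpt2 tokenizer are assumptions):

```python
import gzip
import glob
import numpy as np
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
vocab_size = tokenizer.vocab_size

# One boolean presence vector per .gz file in the OpenWebText dump (assumed layout).
files = sorted(glob.glob("openwebtext/*.gz"))
presence = np.zeros((len(files), vocab_size), dtype=bool)

for i, path in enumerate(files):
    with gzip.open(path, "rt", encoding="utf-8", errors="ignore") as f:
        ids = tokenizer(f.read())["input_ids"]
    presence[i, np.unique(ids)] = True

# Number of files in which each token appears at least once;
# sorting ascending surfaces the rare "glitch token" candidates.
file_counts = presence.sum(axis=0)
rarest = np.argsort(file_counts)[:50]
for tok_id in rarest:
    print(tok_id, repr(tokenizer.decode([tok_id])), int(file_counts[tok_id]))
```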
The leftmost column is token id, the middle is the token string, and the right column is # of files the token was present in (out of 20610). GPT2 has 50256 total tokens.
GPT2 tokens with the lowest frequency in OpenWebText
30898 'embedreportprint' 0 33434 ' 士' 0 43453 ' SolidGoldMagikarp' 0 1849 '\xa0' 0 47654 ' \xa0\xa0' 0 50009 ' strutConnector' 0 36173 ' RandomRedditor' 0 214 '\x1a' 0 42424 'DragonMagazine' 0 180 ' ' 0 187 ' ' 0 186 ' ' 0 30213 ' externalToEVAOnly' 0 30212 ' externalToEVA' 0 30211 ' guiIcon' 0 185 ' ' 0 30210 ' guiActiveUnfocused' 0 30209 ' unfocusedRange' 0 184 ' ' 0 30202 ' guiName' 0 183 ' ' 0 30905 'rawdownload' 0 39906 'EStream' 0 33454 '龍喚士' 0 42586 ' srfN' 0 25992 ' 裏覚醒' 0 43065 ' srfAttach' 0
11504 ' \xa0 \xa0' 0 39172 '\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0' 0 40240 'oreAndOnline' 0 40241 'InstoreAndOnline' 0 33477 '\xa0\xa0\xa0' 0 36174 ' RandomRedditorWithNo' 0 37574 'StreamerBot' 0 46600 ' Adinida' 0 182 ' ' 0 29372 ' guiActiveUn' 0 43177 'EStreamFrame' 0 22686 ' \xa0 \xa0 \xa0 \xa0' 0 23282 ' davidjl' 0 47571 ' DevOnline' 0 39752 'quickShip' 0 44320 '\n\xa0' 0 8828 '\xa0\xa0\xa0\xa0' 0 39820 '龍 ' 0 39821 '龍契士' 0 28666 'PsyNetMessage' 0 35207 ' attRot' 0
181 ' ' 0 18472 ' guiActive' 0 179 ' ' 0 17811 '\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0' 0 20174 ' 裏 ' 0 212 '\x18' 0 211 '\x17' 0 210 '\x16' 0 209 '\x15' 0 208 '\x14' 0 31666 '?????-?????-' 0 207 '\x13' 0 206 '\x12' 0 213 '\x19' 0 205 '\x11' 0 203 '\x0f' 0 202 '\x0e' 0 31957 'cffffcc' 0 200 '\x0c' 0 199 '\x0b' 0 197 '\t' 0 196 '\x08' 0 195 '\x07' 0 194 '\x06' 0 193 '\x05' 0 204 '\x10' 0 45545 ' サーティワン' 0 201 '\r' 0 216 '\x1c' 0 37842 ' partName' 0 45706 ' \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0' 0
124 ' ' 0 125 ' ' 0 178 ' ' 0 41380 'natureconservancy' 0 41383 'assetsadobe' 0 177 ' ' 0 215 '\x1b' 0 41551 'Downloadha' 0 4603 '\xa0\xa0' 0 42202 'GoldMagikarp' 0 42089 ' TheNitrome' 0 217 '\x1d' 0 218 '\x1e' 0 42090 ' TheNitromeFan' 0 192 '\x04' 0 191 '\x03' 0 219 '\x1f' 0 189 '\x01' 0 45544 ' サーティ' 0 5624 ' \xa0' 0 190 '\x02' 0 40242 'BuyableInstoreAndOnline' 1 36935 ' dstg' 1 36940 ' istg' 1 45003 ' SetTextColor' 1 30897 'reportprint' 1 39757 'channelAvailability' 1 39756 'inventoryQuantity' 1
39755 'isSpecialOrderable' 1 39811 'soDeliveryDate' 1 39753 'quickShipAvailable' 1 39714 'isSpecial' 1 47198 'ItemTracker' 1 17900 ' Dragonbound' 1 45392 'dayName' 1 37579 'TPPStreamerBot' 1 31573 'ActionCode' 2 25193 'NetMessage' 2 39749 'DeliveryDate' 2 30208 ' externalTo' 2 43569 'ÍÍ' 2 34027 ' actionGroup' 2 34504 ' 裏 ' 2 39446 ' SetFontSize' 2 30899 'cloneembedreportprint' 2 32047 ' "$:/' 3 39803 'soType' 3 39177 'ItemThumbnailImage' 3 49781 'EngineDebug' 3 25658 '?????-' 3
33813 '=~=~' 3 48396 'ÛÛ' 3 34206 ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating an insurance-for-AI startup, published by L Rudolf L on September 21, 2024 on LessWrong.
We (Flo & Rudolf) spent a month fleshing out the idea of an insurance-for-AI company. We talked to 15 people in the insurance industry, and did 20 customer interviews. We decided not to continue, but we think it's still a very promising idea and that maybe someone else should do this. This post describes our findings.
The idea
Theory of change
To reduce AI risks, it would be good if we understood risks well, and if some organisation existed that could incentivise the use of safer AI practices. An insurance company that sells insurance policies for AI use cases has a financial incentive to understand concrete AI risks & harms well, because this feeds into its pricing. This company would also be incentivised to encourage companies to adopt safer AI practices, and could incentivise this by offering lower premiums in return.
Like many cyber-insurance companies, it could also provide more general advice & consulting on AI-related risk reduction.
Concrete path
TL;DR: Currently, professionals (e.g. lawyers) have professional indemnity (PI) insurance. Right now, most AI tools involve the human being in the loop. But eventually, the AI will do the work end-to-end, and then the AI will be the one whose mistakes need to be insured. Currently, this insurance does not exist. We would start with law, but then expand to all other forms of professional indemnity insurance (i.e. insurance against harms caused by a professional's mistakes or malpractice in their work).
Frontier labs are not good customers for insurance, since their size means they mostly do not need external insurance, and have a big information advantage in understanding the risk.
Instead, we would target companies using LLMs (e.g. large companies that use specific potentially-risky AI workflows internally), or companies building LLM products for a specific industry.
We focused on the latter, since startups are easier to sell to. Specifically, we wanted a case where:
LLMs were being used in a high-stakes industry like medicine or law
there were startups building LLM products in this industry
there is some reason why the AI might cause legal liability, for example:
the LLM tools are sufficiently automating the work that the liability is plausibly on them rather than the humans
AI exceptions in existing insurance policies exist (or will soon exist)
The best example we found was legal LLM tools. Law involves important decisions and large amounts of money, and lawyers can be found liable in legal malpractice lawsuits. LLMs are close to being able to do much legal work end-to-end; in particular, if the work is not checked by a human before being shipped, it is uncertain if existing professional indemnity (PI) insurance applies. People who work in law and law tech are also, naturally, very liability-aware.
Therefore, our plan was:
Become a managing general agent (MGA), a type of insurance company that does not pay claims out of its own capital (but instead finds a reinsurer to agree to pay them, and earns a cut of the premiums).
Design PI policies for AI legal work, and sell these policies to legal AI startups (to help them sell to their law firm customers), or directly to law firms buying end-to-end legal AI tools.
As more and more legal work is done end-to-end by AI, more and more of the legal PI insurance market is AI insurance policies.
As AI advances and AI insurance issues become relevant in other industries, expand to those industries (e.g. medicine, finance, etc.).
Eventually, most of the world's professional indemnity insurance market (on the order of $10B-100B/year) has switched from insuring against human mistakes to insuring against AI mistakes.
Along the way, provide consulting services for countless business...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applications of Chaos: Saying No (with Hastings Greer), published by Elizabeth on September 21, 2024 on LessWrong.
Previously Alex Altair and I published a post on the applications of chaos theory, which found a few successes but mostly overhyped dead ends. Luckily the comments came through, providing me with an entirely different type of application: knowing you can't, and explaining to your boss that you can't.
Knowing you can't
Calling a system chaotic rules out many solutions and tools, which can save you time and money in dead ends not traveled. I knew this, but also knew that you could never be 100% certain a physical system was chaotic, as opposed to misunderstood.
However, you can know the equations behind proposed solutions, and trust that reality is unlikely to be simpler[1] than the idealized math. This means that if the equations necessary for your proposed solution could be used to solve the 3-body problem, you don't have a solution.
[1] I'm hedging a little because sometimes reality's complications make the math harder but the ultimate solution easier. E.g. friction makes movement harder to predict but gives you terminal velocity.
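(As a concrete illustration of the underlying point, here is a toy sketch using the logistic map as a stand-in chaotic system, nothing specific to trebuchets or the 3-body problem: two initial conditions differing by one part in a billion disagree completely within a few dozen iterations, which is exactly why "just measure and predict more precisely" stops being an available solution.)

```python
def logistic_map(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n), a standard chaotic system for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.200000000)
b = logistic_map(0.200000001)  # perturbed by 1e-9

for n in (10, 30, 50):
    print(f"step {n}: |difference| = {abs(a[n] - b[n]):.6f}")
# The difference grows from ~1e-9 to order 1, so long-range prediction
# (and any aiming scheme that relies on it) is off the table.
```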
I had a great conversation with trebuchet and math enthusiast Hastings Greer about how this dynamic plays out with trebuchets.
Transcript
Note that this was recorded in Skype with standard headphones, so the recording leaves something to be desired. I think it's worth it for the trebuchet software visuals starting at 07:00
My favorite parts:
If a trebuchet requires you to solve the double pendulum problem (a classic example of a chaotic system) in order to aim, it is not a competition-winning trebuchet.
Trebuchet design was solved 15-20 years ago; it's all implementation details now. This did not require modern levels of tech, just modern nerds with free time.
The winning design was used by the Syrians during Arab Spring, which everyone involved feels ambivalent about.
The national pumpkin throwing competition has been snuffed out by insurance issues, but local competitions remain.
Learning about trebuchet modeling software.
Explaining you can't
One reason to doubt chaos theory's usefulness is that we don't need fancy theories to tell us something is impossible. Impossibility tends to make itself obvious.
But some people refuse to accept an impossibility, and some of those people are managers. Might those people accept "it's impossible because of chaos theory" where they wouldn't accept "it's impossible because look at it"?
As a test of this hypothesis, I made a Twitter poll asking engineers-as-in-builds-things whether they had ever tried to explain a project's impossibility via chaos theory, and whether it had worked. The final results were:
36 respondents who were engineers of the relevant type
This is probably an overestimate. One respondent replied later that he selected this option incorrectly, and I suspect that was a common mistake. I haven't attempted to correct for it as the exact percentage is not a crux for me.
6 engineers who'd used chaos theory to explain to their boss why something was impossible.
5 engineers who'd tried this explanation and succeeded.
1 engineer who tried this explanation and failed.
5/36 is by no means common, but it's not zero either, and it seems like it usually works. My guess is that usage is concentrated in a few subfields, making chaos even more useful than it looks. My sample size isn't high enough to trust the specific percentages, but as an existence proof I'm quite satisfied.
Conclusion
Chaos provides value both by telling certain engineers where not to look for solutions to their problems, and by getting their bosses off their back about it. That's a significant value add, but short of what I was hoping for when I started looking into Chaos.
Thanks for listening. To help us out with The Nonlinear Library ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Work with me on agent foundations: independent fellowship, published by Alex Altair on September 21, 2024 on LessWrong.
Summary: I am an independent researcher in agent foundations, and I've recently received an LTFF grant to fund someone to do research with me. This is a rolling application; I'll close it whenever I'm no longer interested in taking another person.
If you're not familiar with agent foundations, you can read about my views in this post.
What the role might be like
This role is extremely flexible. Depending on who you are, it could end up resembling an internship, a research assistant position, a postdoc or even as a mentor/advisor to me. Below, I've listed out the parameters of the fellowship that I am using as a baseline of what it could be. All of these parameters are negotiable!
$25 per hour. This is not a lot for people who live in the SF Bay area, or who are used to industry salaries, but it looks to me like this is comparable to a typical grad student salary.
20 hours per week. I'd like this fellowship to be one of your main projects, and I think it can take quite a lot of "deep work" focus before one can make progress on the research problems.[1]
3 months, with a decent chance of extension. During my AI safety camp project, it took about 6 weeks to get people up to speed on all the parts of the agent structure problem. Ideally I could find someone for this role who is already closer to caught up (though I don't necessarily anticipate that). I'm thinking of this fellowship as something like an extended work-trial for potentially working together longer-term. That said, I think we should at least aim to get results by the end of it.
Whether I'll decide to invite you to continue working with me afterwards depends on how our collaboration went (both technically and socially), how many other people I'm collaborating with at that time, and whether I think I have enough funds to support it.
Remote, but I'm happy to meet in person. Since I'm independent, I don't have anything like an office for you to make use of. But if you happen to be in the SF Bay area, I'd be more than happy to have our meetings in person. I wake up early, so US eastern and European time zones work well for me (and other time zones too).
Meeting 2-5 times per week. Especially in the beginning, I'd like to do a pretty large amount of syncing up. It can take a long time to convey all the aspects of the research problems. I also find that real-time meetings regularly generate new ideas. That said, some people find meetings worse for their productivity, and so I'll be responsive to your particular work style.
An end-of-term write-up. It seems to take longer than three months to get results in the types of questions I'm interested in, but I think it's good practice to commit to producing a write-up of how the fellowship goes. If it goes especially well, we could produce a paper.
What this role ends up looking like mostly depends on your experience level relative to mine. Though I now do research, I haven't gone through the typical academic path. I'm in my mid-thirties and have a proportional amount of life and career experience, but in terms of mathematics, I consider myself the equivalent of a second year grad student. So I'm comfortable leading this project and am confident in my research taste, but you might know more math than me.
The research problems
Like all researchers in agent foundations, I find it quite difficult to concisely communicate what my research is about. Probably the best way to tell if you will be interested in my research problems is to read other things I've written, and then have a conversation with me about it.
All my research is purely mathematical,[2] rather than experimental or empirical. None of it involves machine learning per se, but the theorems should ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: o1-preview is pretty good at doing ML on an unknown dataset, published by Håvard Tveit Ihle on September 20, 2024 on LessWrong.
Previous post: How good are LLMs at doing ML on an unknown dataset?
A while back I ran some evaluation tests on GPT-4o, Claude Sonnet 3.5, and Gemini Advanced to see how good they were at doing machine learning on a completely novel, somewhat unusual dataset. The data is basically 512 points in the 2D plane; some of the points make up a shape, and the goal is to classify the data according to what shape the points make up.
None of the models did better than chance on the original (hard) dataset, while they did somewhat better on a much easier version I made afterwards.
With the release of o1-preview, I wanted to quickly run the same test on o1, just to see how well it did. In summary, it basically solved the hard version of my previous challenge, achieving 77% accuracy on the test set on its fourth submission (this increases to 91% if I run it for 250 instead of 50 epochs), which is really impressive to me.
Here is the full conversation with ChatGPT o1-preview
In general o1-preview seems like a big step change in its ability to reliably do hard tasks like this without any advanced scaffolding or prompting to make it work.
Detailed discussion of results
The architecture that o1 went for in the first round is essentially the same one that Sonnet 3.5 and Gemini went for: a PointNet-inspired model which extracts features from each point independently. While it managed to do slightly better than chance on the training set, it did not do well on the test set.
For round two, it went for the approach (which also Sonnet 3.5 came up with) of binning the points in 2D into an image, and then using a regular 2D convnet to classify the shapes. This worked somewhat on the first try. It completely overfit the training data, but got to an accuracy of 56% on the test data.
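The binning approach is easy to sketch (my illustration of the idea described above, not o1's actual code; the 64x64 grid size is an assumption):

```python
import numpy as np
import torch

def points_to_image(points: np.ndarray, grid_size: int = 64) -> torch.Tensor:
    """Rasterize 512 (x, y) points into a grid so a 2D convnet can classify the shape.
    `points` has shape (512, 2); the grid size of 64 is an arbitrary choice."""
    x, y = points[:, 0], points[:, 1]
    image, _, _ = np.histogram2d(
        x, y,
        bins=grid_size,
        range=[[x.min(), x.max()], [y.min(), y.max()]],
    )
    image = (image > 0).astype(np.float32)       # binary occupancy grid
    return torch.from_numpy(image).unsqueeze(0)  # (1, H, W): one input channel

# A batch of rasterized samples can then be fed to any standard image classifier.
```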
For round three, it understood that it needed to add data augmentations in order to generalize better, and it implemented scaling, translations and rotations of the data. It also switched to a slightly modified resnet18 architecture (a roughly 10x larger model). However, it introduced a bug when converting to a PIL image (and back to torch.tensor), which resulted in an error.
For round four, o1 fixed the error and had a basically working solution, achieving an accuracy of 77% (which increases to 91% if we increase the number of epochs from 50 to 250, all still well within the allotted hour of runtime). I consider the problem basically solved at this point; by playing around with smaller variations on this, you can probably get a few more percentage points without any more insights needed.
For the last round, it tried the standard approach of using the pretrained weights of resnet18 and freezing almost all the layers, which is an approach that works well on many problems, but did not work well in this case. The accuracy reduced to 41%. I guess these data are just too different from imagenet (which resnet18 is trained on) for this approach to work well. I would not have expected this to work, but I don't hold it that much against o1, as it is a reasonable thing to try.
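For reference, the frozen-backbone transfer-learning pattern it tried looks roughly like this (a generic torchvision sketch, not the model's actual code; the number of shape classes is assumed):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained ResNet-18 and freeze every layer.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with a fresh, trainable layer for the shape classes.
num_classes = 10  # assumed number of shape classes in the challenge
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only model.fc is trainable now, so the frozen ImageNet features have to be
# informative for this data; as noted above, here they apparently were not.
```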
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Best Argument is not a Simple English Yud Essay, published by Jonathan Bostock on September 20, 2024 on The Effective Altruism Forum.
I was encouraged to post this here, but I don't yet have enough EA forum karma to crosspost directly!
Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people less involved with rationality than myself. Communication is highly subjective and I have not rigorously A/B tested messaging. I am even less confident in the quality of my responses than in the correctness of my critique.
If they turn out to be true, these thoughts can probably be applied to all sorts of communication beyond AI risk.
Lots of work has gone into trying to explain AI risk to laypersons. Overall, I think it's been great, but there's a particular trap that I've seen people fall into a few times. I'd summarize it as simplifying and shortening the text of an argument without enough thought for the information content. It comes in three forms.
One is forgetting to adapt concepts for someone with a far inferential distance; another is forgetting to filter for the important information; the third is rewording an argument so much you fail to sound like a human being at all.
I'm going to critique three examples which I think typify these:
Failure to Adapt Concepts
I got this from the summaries of AI risk arguments written by Katja Grace and Nathan Young here. I'm making the assumption that these summaries are supposed to be accessible to laypersons, since most of them seem written that way. This one stands out as not having been optimized on the concept level. This argument was below-average in effectiveness when tested.
I expect most people's reaction to point 2 would be "I understand all those words individually, but not together". It's a huge dump of conceptual information all at once which successfully points to the concept in the mind of someone who already understands it, but is unlikely to introduce that concept to someone's mind.
Here's an attempt to do better:
1. So far, humans have mostly developed technology by understanding the systems which the technology depends on.
2. AI systems developed today are instead created by machine learning. This means that the computer learns to produce certain desired outputs, but humans do not tell the system how it should produce the outputs. We often have no idea how or why an AI behaves in the way that it does.
3. Since we don't understand how or why an AI works a certain way, it could easily behave in unpredictable and unwanted ways.
4. If the AI is powerful, then the consequences of unwanted behaviour could be catastrophic.
And here's Claude's just for fun:
1. Up until now, humans have created new technologies by understanding how they work.
2. The AI systems made in 2024 are different. Instead of being carefully built piece by piece, they're created by repeatedly tweaking random systems until they do what we want. This means the people who make these AIs don't fully understand how they work on the inside.
3. When we use systems that we don't fully understand, we're more likely to run into unexpected problems or side effects.
4. If these not-fully-understood AI systems become very powerful, any unexpected problems could potentially be really big and harmful.
I think it gets points 1 and 3 better than me, but 2 and 4 worse. Either way, I think we can improve upon the summary.
Failure to Filter Information
When you condense an argument down, you make it shorter. This is obvious. What is not always as obvious is that this means you have to throw out information to make the core point clearer. Sometimes the information that gets kept is distracting. Here's an example from a poster a friend of mine made for Pause AI:
When I showed this to ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interested in Cognitive Bootcamp?, published by Raemon on September 20, 2024 on LessWrong.
I'm running more 4-day "Cognitive Bootcamps" over the next couple months (during Lighthaven Eternal September season). DM me if you're potentially interested (either as an individual, or as a team).
The workshop is most valuable to people who:
control their decisionmaking process (i.e. you decide what projects you or a team work on, rather than working at a day-job on someone else's vision)
are either a) confused about planmaking / have a vague sense that they aren't as strategically ambitious as they could be.
and/or, b) are at a place where it's natural to spend a few days thinking big-picture thoughts before deciding on their next project.
There's a secondary[1] focus on "practice solving confusing problems", which IMO is time well spent, but requires more followup practice to pay off.
I wrote about the previous workshop here. Participants said on average they'd have been willing to pay $850 for it, and would have paid $5000 for the ideal, perfectly-tailored-for-them version. My plan is to charge $500/person for the next workshop, and then $1000 for the next one.
I'm most excited to run this for teams, who can develop a shared skillset and accompanying culture. I plan to tailor the workshops for the needs of whichever people show up.
The dates are not scheduled yet (depends somewhat on when a critical mass of participants are available). DM me if you are interested.
The skills being taught will be similar to the sort of thing listed in Skills from a year of Purposeful Rationality Practice and the Feedbackloop-first Rationality sequence. My default curriculum aims to teach several interrelated skills you can practice over four days, which build into a coherent metaskill of "ambitious planning, at multiple timescales."
1. ^
I started this project oriented around "find better feedbackloops for solving confusing problems", and later decided that planmaking was the highest leverage part of the skill tree to focus on.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Laziness death spirals, published by PatrickDFarley on September 19, 2024 on LessWrong.
I've claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastination and laziness compound, too.
You're stressed out for some reason, so you take the evening off for a YouTube binge. You end up staying awake a little later than usual and sleeping poorly. So the next morning you feel especially tired; you snooze a few extra times. In your rushed morning routine you don't have time to prepare for the work meeting as much as you'd planned to. So you have little to contribute during the meeting. You feel bad about your performance. You escape from the bad feelings with a Twitter break.
But Twitter is freaking out. Elon Musk said what? Everyone is weighing in. This is going to occupy you intermittently for the rest of the day. And so on.
Laziness has a kind of independent momentum to it. When you're having a day like the above, even if you consciously commit to getting back on track, the rut tends to find its way back to you within a couple of hours. Keep this up for a few days and your sleep is utterly messed up, and you walk around in a fog. Keep it up for a week or two and you're fully off your workout routine.
In a month or two, you might have noticeably fallen behind on work; you might be absent from your social life; you might've visibly gained fat or lost muscle; you can no longer feel excited about your personal goals because they're behind a pile of mundane tasks you need to catch up on first. And so on.
How do we stop the vicious circle?
I'm spiraling! I'm spiraling!
When you're in a laziness death spiral, it's hard to do anything deliberate. The first and most important step, which does take some willpower but not a lot, is to acknowledge, "I'm in a laziness death spiral today."
If you don't acknowledge it, here's what happens: You vaguely notice you've been wasting time today; you feel a twinge of guilt, so you quickly decide, "I'm going to turn the rest of the day around, starting right now." And does that work?
Often it doesn't! Sure, after a small lapse you can just get back on track, but if enough laziness momentum has built up, a momentary reaction doesn't cut it. Deciding things quickly, in response to negative emotions, is exactly how you got into this situation! You're going to turn it around on a whim? You'll have a different whim in the next hour; what then? You need to take a step back and get your mind outside of the problem.
Do what you can
The next three sections are three different courses of action you can take to get out of a laziness death spiral. One of them is clearly preferable, but I'm writing the alternatives, too. When you're in a low-willpower state, it's often bad to attempt the very best solution - the farther you reach, the harder you can fall. Building a base of "small wins" is the reliable way to repair your willpower.
If you start something lofty and then bail on it, you're doing real damage: logging another willpower failure and associating that "very best solution" with failure.
Here are the moves:
A) Emergency recovery
If you're in a laziness spiral and you need to get out of it right now, there are some measures you can take that, while effective, are not ideal. They are unsustainable, promote bad habits, or are just generally unhealthy.
But sometimes the need is there: maybe you have a deadline fast approaching (and the deadline itself isn't enough to snap you into action); maybe your friends or family need you to take care of something today; maybe you were in the middle of an awfully lazy day and a once-in-a-lifetime opportunity came up, and you just can't focus enough to act on it.
Disclaimer: I believe that in a well planned life, none of these sho...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap, published by johnswentworth on September 19, 2024 on LessWrong.
Background: "Learning" vs "Learning About"
Adaptive systems, reinforcement "learners", etc, "learn" in the sense that their behavior adapts to their environment.
Bayesian reasoners, human scientists, etc, "learn" in the sense that they have some symbolic representation of the environment, and they update those symbols over time to (hopefully) better match the environment (i.e. make the map better match the territory).
These two kinds of "learning" are not synonymous[1]. Adaptive systems "learn" things, but they don't necessarily "learn about" things; they don't necessarily have an internal map of the external territory.
(Yes, the active inference folks will bullshit about how any adaptive system must have a map of the territory, but their math does not substantively support that interpretation.) The internal heuristics or behaviors "learned" by an adaptive system are not necessarily "about" any particular external thing, and don't necessarily represent any particular external thing[2].
We Humans Learn About Our Values
"I thought I wanted X, but then I tried it and it was pretty meh."
"For a long time I pursued Y, but now I think that was more a social script than my own values."
"As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight."
The ubiquity of these sorts of sentiments is the simplest evidence that we do not typically know our own values[3]. Rather, people often (but not always) have some explicit best guess at their own values, and that guess updates over time - i.e. we can learn about our own values.
Note the wording here: we're not just saying that human values are "learned" in the more general sense of reinforcement learning. We're saying that we humans have some internal representation of our own values, a "map" of our values, and we update that map in response to evidence. Look again at the examples at the beginning of this section:
"I thought I wanted X, but then I tried it and it was pretty meh."
"For a long time I pursued Y, but now I think that was more a social script than my own values."
"As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight."
Notice that the wording of each example involves beliefs about values. They're not just saying "I used to feel urge X, but now I feel urge Y". They're saying "I thought I wanted X" - a belief about a value! Or "now I think that was more a social script than my own values" - again, a belief about my own values, and how those values relate to my (previous) behavior. Or "I endorsed the view that Z is the highest objective" - an explicit endorsement of a belief about values.
That's how we normally, instinctively reason about our own values. And sure, we could reword everything to avoid talking about our beliefs about values - "learning" is more general than "learning about" - but the fact that it makes sense to us to talk about our beliefs about values is strong evidence that something in our heads in fact works like beliefs about values, not just reinforcement-style "learning".
Two Puzzles
Puzzle 1: Learning About Our Own Values vs The Is-Ought Gap
Very roughly speaking, an agent could aim to pursue any values regardless of what the world outside it looks like; "how the external world is" does not tell us "how the external world should be". So when we "learn about" values, where does the evidence about values come from? How do we cross the is-ought gap?
Puzzle 2: The Role of Reward/Reinforcement
It does seem like humans have some kind of physiological "reward", in a hand-wavy reinforcement-learning-esque sense, which seems to at l...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #82: The Governor Ponders, published by Zvi on September 19, 2024 on LessWrong.
The big news of the week was of course OpenAI releasing their new model o1. If you read one post this week, read that one. Everything else is a relative sideshow.
Meanwhile, we await Newsom's decision on SB 1047. The smart money was always that Gavin Newsom would make us wait before offering his verdict on SB 1047. It's a big decision. Don't rush him. In the meantime, what hints he has offered suggest he's buying into some of the anti-1047 talking points. I'm offering a letter to him here based on his comments, if you have any way to help convince him now would be the time to use that. But mostly, it's up to him now.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. Apply for unemployment.
4. Language Models Don't Offer Mundane Utility. How to avoid the blame.
5. Deepfaketown and Botpocalypse Soon. A social network of you plus bots.
6. They Took Our Jobs. Not much impact yet, but software jobs still hard to find.
7. Get Involved. Lighthaven Eternal September, individual rooms for rent.
8. Introducing. Automated scientific literature review.
9. In Other AI News. OpenAI creates independent board to oversee safety.
10. Quiet Speculations. Who is preparing for the upside? Or appreciating it now?
11. Intelligent Design. Intelligence. It's a real thing.
12. SB 1047: The Governor Ponders. They got to him, but did they get to him enough?
13. Letter to Newsom. A final summary, based on Newsom's recent comments.
14. The Quest for Sane Regulations. How should we update based on o1?
15. Rhetorical Innovation. The warnings will continue, whether or not anyone listens.
16. Claude Writes Short Stories. It is pondering what you might expect it to ponder.
17. Questions of Sentience. Creating such things should not be taken lightly.
18. People Are Worried About AI Killing Everyone. The endgame is what matters.
19. The Lighter Side. You can never be sure.
Language Models Offer Mundane Utility
Arbitrate your Nevada unemployment benefits appeal, using Gemini. This should solve the backlog of 10k+ cases, and also I expect higher accuracy than the existing method, at least until we see attempts to game the system. Then it gets fun. That's also job retraining.
o1 usage limit raised to 50 messages per day for o1-mini, 50 per week for o1-preview.
o1 can do multiplication reliably up to about 4x6 digits, and about 50% accurately up through about 8x10, a huge leap from gpt-4o, although Colin Fraser reports 4o can be made better at this than one would expect.
o1 is much better than 4o at evaluating medical insurance claims, and determining whether requests for care should be approved, especially in terms of executing existing guidelines, and automating administrative tasks. It seems like a clear step change in usefulness in practice.
The claim is that being sassy and juicy and bitchy improves Claude Instant numerical reasoning. What I actually see here is that it breaks Claude Instant out of trick questions. Where Claude would previously fall into a trap, you have it fall back on what is effectively 'common sense,' and it starts getting actually easy questions right.
Language Models Don't Offer Mundane Utility
A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control. However, humans often demand manual mode be available to them, allowing humans to override the AI, even when it doesn't make any practical sense to offer this. And then, if the human can in theory switch to manual mode and override the AI, blame to the human returns, even when the human exerting that control was clearly impractical in context.
The top example here is self-driving cars, and blame for car crashes.
The results suggest that the human thirst for ill...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which LessWrong/Alignment topics would you like to be tutored in? [Poll], published by Ruby on September 19, 2024 on LessWrong.
Would you like to be tutored in applied game theory, natural latents, CFAR-style rationality techniques, "general AI x-risk", Agent Foundations, anthropics, or some other topics discussed on LessWrong?
I'm thinking about prototyping some topic-specific LLM tutor bots, and would like to prioritize topics that multiple people are interested in.
Topic-specific LLM tutors would be customized with things like pre-loaded relevant context, helpful system prompts, and more focused testing to ensure they work.
Note: I'm interested in topics that are written about on LessWrong, e.g. infra-bayesianism, and not "magnetohydrodynamics".
I'm going to use the same poll infrastructure that Ben Pace pioneered recently. There is a thread below where you add and vote on topics/domains/areas where you might like tutoring.
1. Karma: upvote/downvote to express enthusiasm about there being tutoring for a topic.
2. Reacts: click on the agree react to indicate you personally would like tutoring on a topic.
3. New Poll Option. Add a new topic for people to express interest in being tutored on.
For the sake of this poll, I'm more interested in whether you'd like tutoring on a topic or not, separate from the question of whether you think a tutoring bot would be any good. I'll worry about that part.
Background
I've been playing around with LLMs a lot in the past couple of months and so far my favorite use case is tutoring. LLM-assistance is helpful via multiple routes such as providing background context with less effort than external search/reading, keeping me engaged via interactivity, generating examples, and breaking down complex sections into more digestible pieces.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What Would You Ask The Archbishop of Canterbury?, published by JDBauman on September 19, 2024 on The Effective Altruism Forum.
The head of the Church of England is the second most influential Christian alive today. [1] The current Archbishop, Justin Welby, is speaking at the EA-adjacent Christians for Impact conference with Rory Stewart about faith and poverty.
What should we ask Archbishop Justin in the Q&A?
Feel free to submit anonymous thoughts here.
1. ^
Source: ChatGPT
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Intuitive self-models] 1. Preliminaries, published by Steven Byrnes on September 19, 2024 on LessWrong.
1.1 Summary & Table of Contents
This is the first of a series of eight blog posts, which I'll be serializing over the next month or two. (Or email or DM me if you want to read the whole thing right now.) Here's an overview of the whole series, and then we'll jump right into the first post!
1.1.1 Summary & Table of Contents - for the whole series
This is a rather ambitious series of blog posts, in that I'll attempt to explain what's the deal with consciousness, free will, hypnotism, enlightenment, hallucinations, flow states, dissociation, akrasia, delusions, and more.
The starting point for this whole journey is very simple:
The brain has a predictive (a.k.a. self-supervised) learning algorithm.
This algorithm builds generative models (a.k.a. "intuitive models") that can predict incoming data.
It turns out that, in order to predict incoming data, the algorithm winds up not only building generative models capturing properties of trucks and shoes and birds, but also building generative models capturing properties of the brain algorithm itself.
Those latter models, which I call "intuitive self-models", wind up including ingredients like conscious awareness, deliberate actions, and the sense of applying one's will.
That's a simple idea, but exploring its consequences will take us to all kinds of strange places - plenty to fill up an eight-post series! Here's the outline:
Post 1 (Preliminaries) gives some background on the brain's predictive learning algorithm, how to think about the "intuitive models" built by that algorithm, how intuitive self-models come about, and the relation of this whole series to Philosophy Of Mind.
Post 2 (Awareness) proposes that our intuitive self-models include an ingredient called "conscious awareness", and that this ingredient is built by the predictive learning algorithm to represent a serial aspect of cortex computation. I'll discuss ways in which this model is veridical (faithful to the algorithmic phenomenon that it's modeling), and ways that it isn't. I'll also talk about how intentions and decisions fit into that framework.
Post 3 (The Homunculus) focuses more specifically on the intuitive self-model that almost everyone reading this post is experiencing right now (as opposed to the other possibilities covered later in the series), which I call the Conventional Intuitive Self-Model. In particular, I propose that a key player in that model is a certain entity that's conceptualized as actively causing acts of free will. Following Dennett, I call this entity "the homunculus", and relate that to intuitions around free will and sense-of-self.
Post 4 (Trance) builds a framework to systematize the various types of trance, from everyday "flow states", to intense possession rituals with amnesia. I try to explain why these states have the properties they do, and to reverse-engineer the various tricks that people use to induce trance in practice.
Post 5 (Dissociative Identity Disorder) (a.k.a. "multiple personality disorder") is a brief opinionated tour of this controversial psychiatric diagnosis. Is it real? Is it iatrogenic? Why is it related to borderline personality disorder (BPD) and trauma? What do we make of the wild claim that each "alter" can't remember the lives of the other "alters"?
Post 6 (Awakening / Enlightenment / PNSE) covers a type of intuitive self-model, typically accessed via extensive meditation practice. It's quite different from the conventional intuitive self-model. I offer a hypothesis about what exactly the difference is, and why that difference has the various downstream effects that it has.
Post 7 (Hearing Voices, and Other Hallucinations) talks about factors contributing to hallucinations - although I argue ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Organization Updates: September 2024, published by Toby Tremlett on September 19, 2024 on The Effective Altruism Forum.
If you would like to see EA Organization Updates as soon as they come out, consider subscribing to this tag.
Some of the opportunities and job listings we feature in this update have (very) pressing deadlines (see AI Alignment Teaching Fellow opportunities at BlueDot Impact, September 22, and Institutional Foodservice Fellow at the Good Food Institute, September 18).
You can see previous updates on the "EA Organization Updates (monthly series)" topic page, or in our repository of past newsletters. Notice that there's also an "org update" tag, where you can find more news and updates that are not part of this consolidated series.
These monthly posts originated as the "Updates" section of the monthly EA Newsletter. Organizations submit their own updates, which we edit for clarity.
(If you'd like to share your updates and jobs via this series, please apply here.)
Opportunities and jobs
Opportunities
Consider also checking opportunities listed on the EA Opportunity Board and the Opportunities to Take Action tag.
ALLFED published a new database containing numerous research projects that prospective volunteers can assist with. Explore the database and apply here.
Apply to the upcoming AI Safety Fundamentals: Alignment course by October 6 to learn about the risks from AI and how you can contribute to the field.
The Animal Advocacy Careers Introduction to Animal Advocacy Course has been revamped. The course is for those wishing to kickstart a career in animal advocacy.
Giv Effektivt (DK) needs ~110 EU citizens to become members before the new year in order to offer tax deductions of around 450,000 DKK ($66,000) for 2024-25 donations. Become a member now for 50 DKK ($7). An existing donor will give 100 DKK for each new member until the organization reaches 300 members.
Anima International's Animal Advocacy Training Center released a new online course - Fundraising Essentials. It's a free, self-paced resource with over two hours of video content for people new to the subject.
Job listings
Consider also exploring jobs listed on the Job listing (open) tag. For even more roles, check the 80,000 Hours Job Board.
BlueDot Impact
AI Alignment Teaching Fellow (Remote, £4.9K-£9.6K, apply by September 22nd)
Centre for Effective Altruism
Head of Operations (Remote, £107.4K / $179.9K, apply by October 7th)
Cooperative AI Foundation
Communications Officer (Remote, £35K-£40K, apply by September 29th)
GiveWell
Senior Researcher (Remote, $200K-$220.6K)
Giving What We Can
Global CEO (Remote, $130K+, apply by September 30th)
Open Philanthropy
Operations Coordinator/Associate (San Francisco, Washington, DC, $99.6K-$122.6K)
If you're interested in working at Open Philanthropy but don't see an open role that matches your skillset, express your interest.
Epoch AI
Question Writer, Math Benchmark (Contractor Position) (Remote, $2K monthly + $100-$1K performance-based bonus)
Senior Researcher, ML Distributed Systems (Remote, $150K-$180K)
The Good Food Institute
Managing Director, GFI India (Hybrid (Mumbai, Delhi, Hyderabad, or Bangalore), ₹4.5M, apply by October 2nd)
Institutional Foodservice Fellow (Independent Contractor) (Remote in US, $3.6K biweekly, apply by September 18th)
Organization updates
The organization updates are in alphabetical order (0-A-Z).
80,000 Hours
There is one month left to win $5,000 career grants by referring your friends or colleagues to 80,000 Hours' free career advising.
Also, the organization released a blog post about the recent updates to their AI-related content, as well as a post about pandemic preparedness in relation to mpox and H5N1.
On the 80,000 Hours Podcast, Rob interviewed:
Nick Joseph on whether Anthropic's AI safety policy is up to the task...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Five Years of Animal Advocacy Careers: Our Journey to impact, Lessons Learned, and What's Next, published by lauren mee on September 19, 2024 on The Effective Altruism Forum.
This post is mostly about our key learnings, impact made, and future plans.
Thanks to my team for their help in creating this post and for their unwavering commitment to driving forward AAC's ambitious plans for animals; in particular, thanks to Ana Barreiro, Nayan, and Engin for their contributions and feedback on this post.
TL;DR:
For five years, Animal Advocacy Careers (AAC) has tried to direct passionate professionals towards high-impact opportunities that have the potential to help animals the most.
We've filled 105 roles in leading animal advocacy organisations, supported over 150 organisations with recruitment, and launched 3 core programs: our online course, job board, and career advising service. At the same time, we built a community of 27,500+ supporters across social media, Slack, and email. Our efforts also led to 12 10% Pledges and 11 Trial Pledges at Giving What We Can. We cautiously estimate adding $2.5 million worth of counterfactual impact from these donations and placements, at a spend of $950,000.
We conducted four talent surveys, which, along with our own independent research, continue to form the foundation of our career advising and strategy.
Addressing the talent bottlenecks in the effective animal advocacy movement has proven to be far more complex than we first expected. Beyond the initial challenges, we've encountered a range of issues that directly impact our theory of change and our ability to drive meaningful impact - such as the scarcity of job postings and difficulties in the hiring process.
In response, we've broadened our focus beyond just non-profit roles to better address these challenges and open up more opportunities for talented individuals to contribute to the movement.
Explore more about how AAC is transforming animal advocacy careers and find out more about our exciting plans for the future.
(Note: If you would like the full details of the programmes we have stopped, started, scaled and pivoted, and a full programme evaluation, our latest 2023/4 update is here)
Overview
This piece highlights Animal Advocacy Careers' accomplishments, mistakes, and changes since its establishment in 2019. We discuss AAC's future plans as well as potential constraints to our impact. Our vision is to have an animal advocacy movement of international talent density with mission-aligned advocates in critical positions in society, accelerating freedom for animals.
Background
AAC was founded in July 2019 through Charity Entrepreneurship's incubation program. Its goal is to accelerate the impact of existing organisations by solving their major talent bottlenecks, attracting top talent to the movement, matching them to the most impactful opportunities and empowering professionals to make a real impact.
To effectively match top talent with the most impactful opportunities, AAC first had to conduct research to gain a deeper understanding of the movement's challenges and overall talent landscape. We needed to identify the market size, determine which skills and roles were most in demand and hardest to fill, and uncover the root causes behind these talent bottlenecks. This research forms the foundation of our work, allowing us to address the movement's needs in a more informed and strategic way.
In addition to conducting research, AAC launched several experimental programs aimed at addressing talent bottlenecks. These programs included management and leadership training, an online course, a job board, career advising, fundraising work placements, headhunting and recruitment efforts, organisational recruitment training, a candidate database, and effective giving for animals.
Through trialing these programmes...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Obliqueness Thesis, published by Jessica Taylor on September 19, 2024 on The AI Alignment Forum.
In my Xenosystems review, I discussed the Orthogonality Thesis, concluding that it was a bad metaphor. It's a long post, though, and the comments on orthogonality build on other Xenosystems content. Therefore, I think it may be helpful to present a more concentrated discussion on Orthogonality, contrasting Orthogonality with my own view, without introducing dependencies on Land's views.
(Land gets credit for inspiring many of these thoughts, of course, but I'm presenting my views as my own here.)
First, let's define the Orthogonality Thesis. Quoting Superintelligence for Bostrom's formulation:
Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.
To me, the main ambiguity about what this is saying is the "could in principle" part; maybe, for any level of intelligence and any final goal, there exists (in the mathematical sense) an agent combining those, but some combinations are much more natural and statistically likely than others. Let's consider Yudkowsky's formulations as alternatives. Quoting Arbital:
The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.
As an example of the computational tractability consideration, sufficiently complex goals may only be well-represented by sufficiently intelligent agents. "Complication" may be reflected in, for example, code complexity; to my mind, the strong form implies that the code complexity of an agent with a given level of intelligence and goals is approximately the code complexity of the intelligence plus the code complexity of the goal specification, plus a constant.
Code complexity would influence statistical likelihood for the usual Kolmogorov/Solomonoff reasons, of course.
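As a rough sketch of this reading (a gloss in standard notation, not Yudkowsky's own formalism): writing $K(\cdot)$ for description (Kolmogorov) complexity, the strong form says roughly that $K(\text{agent with intelligence } I \text{ and goal } G) \approx K(I) + K(G) + O(1)$. Under a Solomonoff-style prior, an agent's prior probability scales as roughly $2^{-K(\text{agent})}$, so attaching a goal of complexity $K(G)$ costs only a factor of about $2^{-K(G)}$, independently of the level of intelligence it is attached to.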
I think, overall, it is more productive to examine Yudkowsky's formulation than Bostrom's, as he has already helpfully factored the thesis into weak and strong forms. Therefore, by criticizing Yudkowsky's formulations, I am less likely to be criticizing a strawman. I will use "Weak Orthogonality" to refer to Yudkowsky's "Orthogonality Thesis" and "Strong Orthogonality" to refer to Yudkowsky's "strong form of the Orthogonality Thesis".
Land, alternatively, describes a "diagonal" between intelligence and goals as an alternative to orthogonality, but I don't see a specific formulation of a "Diagonality Thesis" on his part. Here's a possible formulation:
Diagonality Thesis: Final goals tend to converge to a point as intelligence increases.
The main criticism of this thesis is that formulations of ideal agency, in the form of Bayesianism and VNM utility, leave open free parameters, e.g. priors over untestable propositions, and the utility function. Since I expect few readers to accept the Diagonality Thesis, I will not concentrate on criticizing it.
What about my own view? I like Tsvi's naming of it as an "obliqueness thesis".
Obliqueness Thesis: The Diagonality Thesis and the Strong Orthogonality Thesis are false. Agents do not tend to factorize into an Orthogonal value-like component and a Diagonal belief-like component; rather, there are Oblique components that do not factorize neatly.
(Here, by Orthogonal I mean basically independent of intelligence, and by Diagonal I mean converging to a point in the limit of intelligence.)
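(One rough way to make the contrast concrete, as a sketch rather than a precise formalization: write $V_n$ for the values of an agent at intelligence level $n$. Diagonality says $V_n$ converges to some fixed $v^*$ as $n$ increases; Strong Orthogonality says that, beyond computational tractability, any $V$ can be combined with any $n$ at no extra cost; Obliqueness says neither holds, i.e. increasing $n$ genuinely constrains and reshapes $V_n$ without collapsing it to a single point.)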
While I will address Yudkowsky's arguments for the Orthogonality Thesis, I think arguing directly for my view first will be more helpful.
In general, it seems ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for a negative alignment tax, published by Cameron Berg on September 18, 2024 on LessWrong.
TL;DR:
Alignment researchers have historically predicted that building safe advanced AI would necessarily incur a significant alignment tax compared to an equally capable but unaligned counterfactual AI.
We put forward a case here that this prediction looks increasingly unlikely given the current 'state of the board,' as well as some possibilities for updating alignment strategies accordingly.
Introduction
We recently found that over one hundred grant-funded alignment researchers generally disagree with statements like:
alignment research that has some probability of also advancing capabilities should not be done (~70% somewhat or strongly disagreed)
advancing AI capabilities and doing alignment research are mutually exclusive goals (~65% somewhat or strongly disagreed)
Notably, this sample also predicted that the distribution would be significantly more skewed in the 'hostile-to-capabilities' direction.
See ground truth vs. predicted distributions for these statements
These results - as well as recent events and related discussions - caused us to think more about our views on the relationship between capabilities and alignment work given the 'current state of the board,'[1] which ultimately became the content of this post. Though we expect some to disagree with these takes, we have been pleasantly surprised by the positive feedback we've received from discussing these ideas in person and are excited to further stress-test them here.
Is a negative alignment tax plausible (or desirable)?
Often, capabilities and alignment are framed with reference to the alignment tax, defined as 'the extra cost [practical, developmental, research, etc.] of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative.'
The AF/LW wiki entry on alignment taxes notably includes the following claim:
The best case scenario is No Tax: This means we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned, i.e., we might as well align it.
The worst case scenario is Max Tax: This means that we lose all performance by aligning the system, so alignment is functionally impossible.
We speculate in this post about a different best case scenario: a negative alignment tax - namely, a state of affairs where an AI system is actually rendered more competent/performant/capable by virtue of its alignment properties.
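In this framing (a minimal formalization of the quoted definition, nothing beyond it): if $C_{aligned}$ and $C_{unaligned}$ denote the total cost of building an aligned system versus an equally capable unaligned one, the alignment tax is $\tau = C_{aligned} - C_{unaligned}$. 'No Tax' corresponds to $\tau = 0$; a negative alignment tax is $\tau < 0$, i.e. the aligned system is cheaper to build at a given capability level, or equivalently more capable at a given cost.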
Why would this be even better than 'No Tax?' Given the clear existence of a trillion dollar attractor state towards ever-more-powerful AI, we suspect that the most pragmatic and desirable outcome would involve humanity finding a path forward that both (1) eventually satisfies the constraints of this attractor (i.e., is in fact highly capable, gets us AGI, etc.) and (2) does not pose existential risk to humanity.
Ignoring the inevitability of (1) seems practically unrealistic as an action plan at this point - and ignoring (2) could be collectively suicidal.
Therefore, if the safety properties of such a system were also explicitly contributing to what is rendering it capable - and therefore functionally causes us to navigate away from possible futures where we build systems that are capable but unsafe - then these 'negative alignment tax' properties seem more like a feature than a bug.
It is also worth noting as an empirical datapoint here that virtually all frontier models' alignment properties have rendered them more rather than less capable (e.g., gpt-4 is far more useful and far more aligned than gpt-4-base), which is the opposite of what the 'alignment tax' model would have predicted.
This idea is somewhat reminiscent of differential technological development, in which Bostrom suggests "[slowing] the devel...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Match funding opportunity to challenge the legality of Frankenchickens, published by Gavin Chappell-Bates on September 18, 2024 on The Effective Altruism Forum.
We have a once-in-a-generation opportunity to improve the lives of millions of chickens raised for food in the UK.
In October 2024 The Humane League UK (THL UK) will be heading to the High Court to challenge the legality of fast-growing breeds of chicken - Frankenchickens.
We need to raise £55k to fund the hearing.
The Jeremy Coller Foundation has pledged match funding for half of the costs, up to £28k. We need to raise a further £12.5k to maximise the match funding pot and fully fund the hearing.
Please contact me directly should you wish to donate and fight for 1 billion chickens.
Frankenchickens
'Frankenchickens' are selectively bred to grow unnaturally big and fast to maximise profits. They are destined to suffer extremely short and painful lives: they suffer heart attacks, are often unable to walk, and succumb to open sores from lying in their own waste. They grow 400% faster than is natural for their bodies, creating the biggest animal welfare crisis of our time.
In the UK alone, there are over 1 billion chickens raised for meat and over 90% are fast growing.
THL UK's three-year legal battle
In 2020, we saw an opportunity to challenge the legality of Frankenchickens and began building a legal case against the Department for Environment, Food & Rural Affairs (Defra).
This culminated in a judicial review taking place at the High Court in May 2023. Getting to this point was a major success in itself, as only 5% of cases are granted a full hearing. The judge stated that a full hearing of the facts regarding fast-growing chickens was in the public interest.
Represented by Advocates for Animals, we argued that fast-growing chicken breeds, known as Frankenchickens, are illegal under current animal welfare laws, as they suffer as a direct result of their breeding. Our case was bolstered by evidence given by the RSPCA which shows that fast-growing breeds of chicken do suffer, no matter the environment they're raised in. This was despite Defra attempting to block the submission of the RSPCA's evidence.
The fight continues
In May 2023, the High Court ruled that Defra hadn't behaved unlawfully in their interpretation of the Welfare of Farmed Animals Regulation of 2007.
Shortly after the ruling, we decided to appeal the court's decision and continue our three-year legal battle.
There is overwhelming scientific consensus that chickens raised for meat suffer due to their breed. Defra itself has offered no evidence to contradict the RSPCA report and even accepted that there are welfare problems with fast-growing breeds of chicken.
In October 2023, we found out that our appeal had been granted.
In October 2024, we will be back in court, in front of a new judge, to take on Defra to end the cruel use of Frankenchickens in the UK. Our two-day court hearing is due to start on either Tuesday 22nd or Wednesday 23rd October.
This is a once-in-a-generation opportunity to force the Government, with one decision from an appeals court judge, to transform one billion innocent lives per year.
Our chances of success
By virtue of being granted an appeal, our chances for a favourable final outcome have increased significantly. Being granted an appeal means that serious problems with the previous judge's findings have been uncovered, and the judge approving our appeal thinks our case still has merit that needs final and careful deliberation.
A positive ruling would mean that the judge found Defra's interpretation of the Welfare of Farmed Animals Regulation of 2007 illegal, and would compel them to create a new policy on fast growing breeds of chicken, one that would invariably lead to farmers being disincentivized or even banned from keeping f...