LessWrong (30+ Karma)

“Current Language Models Struggle to Reason in Ciphered Language” by Fabien Roger

Description

tl;dr: We fine-tune or few-shot LLMs to use reasoning encoded with simple ciphers (e.g. base64, rot13, putting a dot between each letter) to solve math problems. We find that these models only get an uplift from the reasoning (over directly answering) for very simple ciphers, and get no uplift for intermediate-difficulty ciphers that they can translate to English. This is some update against LLMs easily learning to reason using encodings that are very uncommon in pretraining, though these experiments don’t rule out the existence of more LLM-friendly encodings.
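
To make this concrete, here is a minimal Python sketch, not taken from the paper, of what a few-shot example with cipher-encoded reasoning could look like; the prompt and answer format are assumptions, with rot13 applied to the chain of thought while the question and final answer stay in plain English.

```python
import codecs

# Hypothetical few-shot example: the chain of thought is encoded with rot13
# (one of the simple ciphers studied), while the problem statement and the
# final answer remain in plain English. The exact format used in the paper
# is not reproduced here.
plain_reasoning = "3 + 4 = 7, and 7 * 2 = 14, so the answer is 14."
ciphered_reasoning = codecs.encode(plain_reasoning, "rot13")

few_shot_example = (
    "Problem: What is (3 + 4) * 2?\n"
    f"Reasoning (rot13): {ciphered_reasoning}\n"
    "Answer: 14"
)
print(few_shot_example)
```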

📄 Paper, 🐦 Twitter, 🌐 Website

Research done as part of the Anthropic Fellows Program.

Summary of the results

We teach LLMs to use one particular cipher, such as the following (two of these are sketched in code after the list):

  • “letter to word with dot” maps each char to a word and adds dots between words.
  • “Rot13” is the regular rot13 cipher.
  • “French” is text translated into French.
  • “Swap even & odd chars” swaps [...]
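
For illustration, here is a rough Python sketch of two of the less standard encodings above. The paper's actual letter-to-word table is not given in this excerpt, so the mapping below is hypothetical, and “swap even & odd chars” is assumed to swap each adjacent pair of characters.

```python
# Illustrative implementations of two of the ciphers above. The exact
# mappings used in the paper are not reproduced here; both are assumptions.

# Hypothetical letter-to-word table (the paper's actual table isn't specified).
LETTER_TO_WORD = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    ["apple", "bird", "cat", "dog", "egg", "fish", "goat", "hat", "ice",
     "jam", "kite", "lamp", "moon", "nut", "owl", "pig", "quilt", "rat",
     "sun", "tree", "urn", "van", "wolf", "xray", "yarn", "zebra"],
))

def letter_to_word_with_dot(text: str) -> str:
    """Map each letter to a word and join the words with dots."""
    return ".".join(LETTER_TO_WORD.get(c, c) for c in text.lower())

def swap_even_odd_chars(text: str) -> str:
    """Swap each adjacent pair of characters (assumed meaning of the cipher)."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(letter_to_word_with_dot("cab"))    # cat.apple.bird
print(swap_even_odd_chars("reasoning"))  # ersanonig
```

Both transformations destroy the surface form of English while remaining trivially invertible, so a model that can translate them back to English in principle has access to the underlying reasoning.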

---

Outline:

(00:56) Summary of the results

(06:18) Implications

(06:22) Translation abilities != reasoning abilities

(06:44) The current SoTA for cipher-based jailbreaks and covert malicious fine-tuning comes with a massive capability tax

(07:46) Current LLMs probably don't have very flexible internal reasoning

(08:15) But LLMs can speak in different languages?

(08:51) Current non-reasoning LLMs probably reason using mostly the human-understandable content of their CoTs

(09:25) Current reasoning LLMs probably reason using mostly the human-understandable content of their scratchpads

(11:36) What about future reasoning models?

(12:45) Future work

---


First published: October 14th, 2025



Source: https://www.lesswrong.com/posts/Lz8cvGskgXmLRgmN4/current-language-models-struggle-to-reason-in-ciphered


---


Narrated by TYPE III AUDIO.


---

Images from the article:

  • Line graph comparing performance of different language models across cipher tasks.
  • Graph showing model performance with different fine-tuning token amounts (3B-14B models).
  • Line graph comparing performance of GPT and Qwen models on cipher tasks.
  • Scatter plot comparing MATH500 accuracy versus pretraining prevalence for different ciphers. The two types of ciphers (structure-disrupting and structure-preserving) are plotted with different colored points and trend lines, showing their relationship to accuracy across pretraining prevalence levels.
  • Two scatter plots comparing BLEU scores with identification accuracy for different ciphers. The plots show language model performance (GPT-4.1 and Sonnet 4) on translation tasks, with data points representing various ciphers and encoding methods such as Morse code, base64, and rot13.
  • Performance comparison graph of different language models on various cipher tasks. It compares GPT-4.1 variants (nano, mini, standard) and Qwen2.5 models (3B, 7B, 14B) across multiple cipher and encoding tasks such as Morse code and base64, with a horizontal reference line indicating GPT-4.1's direct-answering performance.
  • Bar graph comparing three Qwen models' performance across different ciphers. It shows Qwen2.5-3B, 7B, and 14B tested on various text transformations and ciphers, including baseline tests, stylistic text modifications, distractors, language variations, and information content; the blue shaded region indicates significant performance drops, particularly for more complex tasks like Morse code and mathematical-content replacement.
  • Line graph comparing various GPT and Claude models (including gpt-4.1-nano, gpt-5-chat, claude-3-opus, etc.) on cipher-related tasks. The y-axis shows accuracy from 0 to 1, the x-axis lists cipher tasks such as base64, Arabic cipher, and rot13, and the models generally show declining performance on more complex cipher tasks.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
