DeepSeek-OCR: Contexts Optical Compression
Description
The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts optical compression** for managing long contexts in Large Language Models (LLMs). The model has two components: **DeepEncoder**, which efficiently converts high-resolution text images into a small number of **vision tokens**, and a DeepSeek3B-MoE decoder that reconstructs the text from those tokens (Optical Character Recognition, or OCR). Experiments on the Fox benchmark show that DeepSeek-OCR achieves approximately **97% decoding precision** at a **10× text compression ratio**, indicating that the visual modality offers a promising avenue for efficiently compressing large amounts of text. Beyond serving as a research tool for exploring vision-text compression and memory-forgetting mechanisms, the model also performs strongly in practice, achieving state-of-the-art results on OmniDocBench while requiring **fewer vision tokens** than comparable models. The paper details the architecture and training methodology, highlighting potential applications such as high-throughput data generation for LLMs and VLMs.
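
To make the compression claim concrete, here is a minimal sketch of the token bookkeeping, assuming (as the paper's framing suggests) that the compression ratio is the count of ground-truth text tokens divided by the count of vision tokens emitted for the page image; the function name and numbers below are illustrative, not from the paper:

```python
# Minimal sketch of the text-to-vision token compression ratio.
# Assumption: "compression ratio" = text tokens on the page / vision tokens
# produced by the encoder for the rendered page image.

def compression_ratio(num_text_tokens: int, num_vision_tokens: int) -> float:
    """Compression ratio for one document page."""
    return num_text_tokens / num_vision_tokens

# Example: a page whose text tokenizes to 1000 tokens, rendered as an image
# and encoded into 100 vision tokens, sits in the ~10x regime where the paper
# reports roughly 97% OCR decoding precision.
print(compression_ratio(1000, 100))  # -> 10.0
```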
Source:
https://arxiv.org/pdf/2510.18234
