DeepSeek-OCR: Contexts Optical Compression

Update: 2025-11-22

Description

The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts optical compression** for managing long contexts in Large Language Models (LLMs). The model has two components: **DeepEncoder**, which converts high-resolution text images into a manageable number of **vision tokens**, and a DeepSeek3B-MoE decoder that reconstructs the text from those tokens (Optical Character Recognition, or OCR). Experiments on the Fox benchmark show that DeepSeek-OCR achieves approximately **97% decoding precision** at a **10× text compression ratio**, indicating that the visual modality is a promising avenue for compressing large amounts of text. Beyond serving as a research tool for exploring vision-text compression and memory-forgetting mechanisms, the model also performs strongly in practice, achieving state-of-the-art results on OmniDocBench while requiring **fewer vision tokens** than comparable models. The paper details the architecture and training methodology and highlights applications such as high-throughput training-data generation for LLMs and VLMs.
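The pipeline in the description reduces to two stages: an encoder that compresses a rendered page into a short sequence of vision tokens, and a decoder that reconstructs the text from them. The sketch below is a minimal illustration of that flow and of how a compression ratio like the paper's 10× figure is measured; the class and method names (`ocr_pipeline`, `encoder.encode`, `decoder.generate`) are hypothetical stand-ins, not the actual DeepSeek-OCR API.

```python
from dataclasses import dataclass


@dataclass
class OCRResult:
    text: str
    n_vision_tokens: int
    n_text_tokens: int

    @property
    def compression_ratio(self) -> float:
        # Ratio of ground-truth text tokens to the vision tokens the
        # encoder produced, e.g. 1000 text tokens / 100 vision tokens = 10x.
        return self.n_text_tokens / self.n_vision_tokens


def ocr_pipeline(page_image, encoder, decoder, tokenizer) -> OCRResult:
    """Two-stage flow described in the paper (names here are hypothetical).

    Stage 1: the encoder (DeepEncoder in the paper) maps a high-resolution
    page image to a short sequence of vision tokens -- the optical
    compression step.
    Stage 2: the decoder (a 3B MoE model in the paper) autoregressively
    reconstructs the original text from those tokens -- the OCR step.
    """
    vision_tokens = encoder.encode(page_image)
    text = decoder.generate(vision_tokens)
    return OCRResult(
        text=text,
        n_vision_tokens=len(vision_tokens),
        n_text_tokens=len(tokenizer.encode(text)),
    )
```

On this accounting, a page whose ground-truth text is 1,000 tokens, compressed to 100 vision tokens, gives a 10× ratio; the paper reports roughly 97% decoding precision at that ratio on the Fox benchmark.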


Source:

https://arxiv.org/pdf/2510.18234
