DiscoverDeep PapersAccurate KV Cache Quantization with Outlier Tokens Tracing
Accurate KV Cache Quantization with Outlier Tokens Tracing

Accurate KV Cache Quantization with Outlier Tokens Tracing

Update: 2025-06-04
Share

Description

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV Cache quantization, a technique for reducing memory and compute costs during inference, by introducing a method to identify and exclude outlier tokens that hurt quantization accuracy, striking a better balance between efficiency and performance.

Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

Comments 
loading
In Channel
loading
00:00
00:00
1.0x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

Accurate KV Cache Quantization with Outlier Tokens Tracing

Accurate KV Cache Quantization with Outlier Tokens Tracing

Arize AI