REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!

Update: 2025-11-03
Description

Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw those vectors away when passing the context to the LLM. REFRAG instead feeds the LLM these precomputed vectors, achieving massive gains in long-context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also handling much longer contexts!


There are so many interesting aspects to this work, and I really loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!

Weaviate