Skeleton of Thought: LLMs Can Do Parallel Decoding

Update: 2023-08-30

Description

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. In this paper reading, we explore the ‘Skeleton-of-Thought’ (SoT) approach, which aims to reduce large language model latency while enhancing answer quality.

This episode is led by Aparna Dhinakaran (Chief Product Officer, Arize AI) and Sally-Ann Delucia (ML Solutions Engineer, Arize AI), with two of the paper's authors: Xuefei Ning, Postdoctoral Researcher at Tsinghua University, and Zinan Lin, Senior Researcher at Microsoft Research.

SoT’s innovative methodology guides LLMs to construct answer skeletons before parallel content elaboration, achieving impressive speed-ups of up to 2.39x across 11 models. Don’t miss the opportunity to delve into this human-inspired optimization strategy and its profound implications for efficient and high-quality language generation.
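
The two-stage idea described above (a short skeleton first, then each point elaborated in parallel) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `skeleton_of_thought` function, and the `fake_llm` stand-in are all hypothetical, and a real setup would replace `fake_llm` with calls to an actual model API or a batched decoder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def skeleton_of_thought(question: str, llm: Callable[[str], str],
                        max_workers: int = 8) -> str:
    """Sketch of skeleton-then-parallel-expansion (prompts are assumptions)."""
    # Stage 1: one sequential call that asks only for a short skeleton,
    # one point per line.
    skeleton_prompt = (
        "Give a concise skeleton (one short point per line) for answering:\n"
        + question
    )
    points: List[str] = [
        line.strip() for line in llm(skeleton_prompt).splitlines() if line.strip()
    ]
    if not points:
        return ""

    # Stage 2: expand every point independently. Because the expansions do
    # not depend on each other, they can be issued concurrently.
    def expand(point: str) -> str:
        return llm(
            f"Question: {question}\nExpand this point in 1-2 sentences: {point}"
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))  # map() preserves order

    # Stitch the expanded points back together in skeleton order.
    return "\n".join(f"{p}: {e}" for p, e in zip(points, expansions))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call, used only for illustration.
    if prompt.startswith("Give a concise skeleton"):
        return "1. Define the problem\n2. List options\n3. Recommend one"
    point = prompt.rsplit(": ", 1)[-1]
    return f"(details about '{point}')"


if __name__ == "__main__":
    print(skeleton_of_thought("How do I choose a database?", fake_llm))
```

With a real model, any latency gain would come from the second stage: the per-point calls overlap in time instead of being generated one token after another in a single long response.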

Full transcript and more here: https://arize.com/blog/skeleton-of-thought-llms-can-do-parallel-decoding-paper-reading/

To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
