#138: Distilling the Knowledge in a Neural Network

Update: 2024-09-11
Description

Mukai reviews a technique for building a small model from a large one. Please send comments and feedback via Reddit or the feedback box. iTunes reviews and stars are appreciated too.
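The paper discussed in this episode (Hinton et al., "Distilling the Knowledge in a Neural Network") trains the small model to match the large model's temperature-softened output distribution. A minimal sketch of that core idea, assuming plain logits as input (function names and example values are illustrative, not from the episode):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T yields a softer distribution.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's soft targets and the student's
    # temperature-scaled predictions, scaled by T^2 as in the paper so
    # gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's soft predictions
    return float(T * T * -(p * np.log(q)).sum())

# Illustrative logits for a 3-class problem.
teacher = [5.0, 2.0, 0.1]
student = [4.0, 2.5, 0.3]
loss = distillation_loss(student, teacher)
```

In practice this soft-target term is usually combined with the ordinary cross-entropy against the true labels, weighted by a mixing coefficient.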

<iframe src="https://docs.google.com/forms/d/e/1FAIpQLSdBvbhI98yeJQV_QWBsl1Q5vY7iohwFN-lJOY2fIh_pfjwRSQ/viewform?embedded=true" frameborder="0" width="100%" height="800" marginheight="0" marginwidth="0" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

Jun Mukai