#124: GAIA: a benchmark for General AI Assistants

Update: 2023-12-22

Description

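Mukai pored over a collection of hard problems posed to LLMs and the scoring results. Send your opinions and feedback to Reddit or our listener mailbox. Reviews and stars on iTunes are appreciated too.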

[2311.12983] GAIA: a benchmark for General AI Assistants

gaia-benchmark/GAIA · Datasets at Hugging Face

Duration: 41:33

Jun Mukai

#124: GAIA: a benchmark for General AI Assistants