Explaining Grokking Through Circuit Efficiency

Update: 2023-10-17

Description

Join Arize Co-Founder & CEO Jason Lopatecki and ML Solutions Engineer Sally-Ann DeLucia as they discuss “Explaining Grokking Through Circuit Efficiency.” This paper explores novel predictions about grokking, and its results provide significant evidence in favor of the authors' explanation. Most strikingly, the research demonstrates two novel and surprising behaviors: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalization to partial rather than perfect test accuracy.

Find the transcript and more here: https://arize.com/blog/explaining-grokking-through-circuit-efficiency-paper-reading/

To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.

In Channel

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

2024-08-16 · 39:05

Breaking Down Meta's Llama 3 Herd of Models

2024-08-06 · 44:40

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

2024-07-23 · 33:57

RAFT: Adapting Language Model to Domain Specific RAG

2024-06-28 · 44:01

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

2024-06-14 · 44:00

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment

2024-05-30 · 48:07

Breaking Down EvalGen: Who Validates the Validators?

2024-05-13 · 44:47

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

2024-04-26 · 45:07

Demystifying Chronos: Learning the Language of Time Series

2024-04-04 · 44:40

Anthropic Claude 3

2024-03-25 · 43:01

Reinforcement Learning in the Era of LLMs

2024-03-15 · 44:49

Sora: OpenAI’s Text-to-Video Generation Model

2024-03-01 · 45:08

RAG vs Fine-Tuning

2024-02-08 · 39:49

Phi-2 Model

2024-02-02 · 44:29

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

2024-02-02 · 36:22

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

2023-12-27 · 47:50

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

2023-12-18 · 44:59

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

2023-11-30 · 41:02

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

2023-11-20 · 44:50

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

2023-10-18 · 43:49
