Episode 7: The Old Parser
Description
Context-free grammars, nondeterministic finite automata, left-to-right leftmost derivations... what even is all that?! Today we're talking about how Python parses your source code. We start gently with how this worked in the past. Come listen to Łukasz's high-level explanations and Pedantic Pablo's "well actuallys".
# Timestamps
(00:00:00) INTRO
(00:01:35) You can still download Python 1.0!
(00:02:19) The original tokenizer
(00:03:10) What even is a tokenizer?
(00:04:08) FUN FACTS ABOUT THE TOKENIZER
(00:04:34) Circumflex
(00:05:16) Python's invisible braces
(00:08:29) Backticks in the syntax
(00:11:00) Where are the comments stored?
(00:12:27) GRAMMAR
(00:13:37) What is a grammar?
(00:16:25) The long-forgotten 'access' keyword
(00:20:25) Making LL1 do things it wasn't meant to do
(00:23:24) SURPRISE QUESTION 1: soft keywords
(00:24:46) What's a context-free grammar?
(00:26:51) A note about backslashes
(00:29:33) The Dragon Book(s)
(00:31:27) PARSING: What is it?
(00:35:23) How to generate a parser?
(00:39:00) LL Cool Parser
(00:41:15) What if we used LR?
(00:44:01) Let's have three tokenizers!
(00:47:50) 2to3 and its legacy
(00:52:38) Black and its blib2to3
(00:54:04) The pesky 'with' statement and the death of LL1
(01:00:05) PR OF THE WEEK: GH-113745
(01:05:41) SURPRISE QUESTION 2: Subclasses of SyntaxError
(01:07:02) WHAT'S GOING ON IN CPYTHON?
(01:09:16) Sam Gross nominated as a core dev
(01:10:13) Free-threading progress
(01:13:11) Faster CPython changes
(01:17:29) ntpath.isreserved()
(01:20:11) Pablo and the DWARF
(01:22:02) OUTRO