muckrAIkers

Author: Jacob Haimes and Igor Krawczuk


Description

Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we'll highlight ongoing research and investigations, bringing some much-needed contextualization, constructive critique, and even the occasional smidge of good-natured teasing to the conversation as we try to find the meaning under all of this muck.
22 Episodes
The Mythical AI Bear


2026-03-17 • 43:11

This week, Jacob and Igor dissect the "mythical AI bear," the strawman version of AI criticism that gets thrown around in tech discourse. Working through a viral blog post that typifies the genre, they examine how legitimate concerns about code quality, labor displacement, intellectual property, and the erosion of craft get flattened into caricature. Plus: Sam Altman writes ten paragraphs about how unbothered he is by an ad.

Chapters
(00:00) - Introduction
(00:30) - Altman's Super Bowl Meltdown
(03:11) - What is "The Bear"?
(06:41) - But You Have No Idea What The Code Is
(15:44) - But The Craft, But The Mediocrity, But It'll Never Be AGI
(24:43) - But They Take Our Jobs & But The Plagiarism
(31:21) - Stochastic Parrots & Mythical Bears
(42:34) - Outro

Critical Links
Below are the most important links for this episode. For more, visit the episode page on Kairos.fm.
Big Think article - The rise of AI denialism
Fly.io blogpost - My AI Skeptic Friends Are All Nuts
antirez blogpost - Don't fall into the anti-AI hype
Emily Bender blogpost - Resistance Isn't Denialism
Cory Doctorow blogpost - Reverse centaurs are the answer to the AI paradox
Washington Post article - The AI boom is so huge it's causing shortages everywhere else
Business Insider article - Veteran investor Jeremy Grantham says AI is 'obviously a bubble'
Ipsos survey - Google / Ipsos Multi-Country AI Survey 2026
Understanding AI Substack post - AI skeptics and AI boosters are both wrong
We're talking about developments in AI while those in power have unapologetically revealed their true fascist intentions; are we spending our time in the right way? Igor and I discuss the importance of shining a light on the techno-authoritarians who have played a very significant role in the current state of the world.

While we discuss the murders of Nicole Good and Alex Pretti during this episode, it's important that we also acknowledge the many marginalized people who have died as a result of ICE's behavior without the same level of outcry. Six additional individuals died in ICE custody under suspicious circumstances between January 1st and 25th of 2026: Victor Manuel Díaz, Geraldo Lunas Campos, Luis Gustavo Núñez Cáceres, Luis Beltrán Yáñez-Cruz, Parady La, and Heber Sánchez Domínguez.

Chapters
(00:00) - Introduction
(03:57) - The Authoritarian Stack
(08:33) - Palantir & Thiel-Government Consolidation
(13:44) - Move Fast & Break Everything
(23:14) - Fascism in the US & Starving the Beast
(39:48) - Finding Local Opportunities for Action

Critical Links
Below are the most important links for this episode. For more, visit the episode page on Kairos.fm.
The Authoritarian Stack website
Project 2025 Observer website
EFF report - ICE Using Palantir Tool Feeds on Medicaid Data
The Guardian article - Eight people have died in dealings with ICE so far in 2026. These are their stories
Indivisible website
Distributed AI Research Institute projects
EAAMO website - Mechanism Design for Social Good
Carlos Maza video - How To Be Hopeless
Igor shares a significant shift in his perspective on AI coding tools after experiencing the latest Claude Code release. While he's been the stronger AI skeptic between the two of us, recent developments have shown him genuine utility in specific coding tasks, but this doesn't validate the hype or change the fundamental critiques.

We discuss what "rote tasks" are and why they're now automatable with enough investment, the difference between genuine utility and AGI claims, and why this update actually impacts our bubble analysis. We explore how massive investment has finally produced something useful for a narrow domain, but it doesn't mean the technology is generalizable or that AGI is real.

Chapters
(00:00) - Introduction
(05:07) - What Changed Igor’s Mind
(18:27) - Rote Tasks Explained
(23:31) - How Does This Impact our Bubble Analysis?
(30:48) - AGI Is Still BS
(34:07) - Externalities Remain Unchanged
(37:49) - Final Thoughts & Outro

Links
Related muckrAIkers episode - Tech Bros Love AI Waifus

Bubble Talk
OfficeChai startup - OpenAI Hasn’t Completed A Successful Full-Scale Pretraining Run Since GPT-4o In May 2024, Says SemiAnalysis
Vechron report - Anthropic Prepares for Potential 2026 IPO in Bid to Rival OpenAI
YCombinator Forum post on AI crash
YCombinator Forum post on OpenAI adopting Anthropic's "skills"
YCombinator Forum post on OpenAI rumors
YCombinator Forum post on OpenAI ad suggestions

Other Sources
LinkedIn post discussing an agentic coding vibe shift
Executive Order - Ensuring a National Policy Framework for Artificial Intelligence
Inside Tech Law blogpost - Germany delivers landmark copyright ruling against OpenAI: What it means for AI and IP
NeurIPS 2025 paper - Ascent Fails to Forget
NBER working paper - Large Language Models, Small Labor Market Effects
Dwarkesh Podcast blogpost - RL is even more information inefficient than you thought
OpenAI is pivoting to porn while public sentiment turns decisively against AI. Pew Research shows Americans are now more concerned than excited by a 2:1 margin. We trace how we got here: broken promises of cancer cures replaced by addiction mechanics and expensive APIs. Meanwhile, data centers are hiding a near-recession, straining power grids, and literally breaking your household appliances. Drawing parallels to the 1970s AI winter, we argue the bubble is shaking and needs to pop now, before it becomes another 2008. The good news? Grassroots resistance works. Protests have already blocked $64 billion in data center projects.

NOTE: The project that we cite for the $64 billion blockage is actually a pro-data-center campaign. The numbers still seem ok, but it's worth being aware of.

Chapters
(00:00) - Introduction
(06:45) - The Addiction Business Model
(10:15) - Public Sentiment Data
(22:45) - Data Centers and Infrastructure Problems
(36:30) - The Bubble Discussion
(44:36) - Closing Thoughts & Outro

Links
Public Sentiment on AI
Pew Research report - How People Around the World View AI
Pew Research report - How the U.S. Public and AI Experts View Artificial Intelligence
Pew Research report - How Americans View AI and Its Impact on People and Society
University of Toronto report - Trust, attitudes and use of artificial intelligence: A global study 2025
Melbourne Business School report - Key findings on public attitudes towards AI
The Washington Post article - Americans have become more pessimistic about AI. Why?
The New York Times article - From Mexico to Ireland, Fury Mounts Over a Global A.I. Frenzy
The Guardian article - ‘It shows such a laziness’: why I refuse to date someone who uses ChatGPT
The Register article - OpenAI's ChatGPT is so popular that almost no one will pay for it

AI and Claims of Curing Cancer
Rachel Thomas, PhD blogpost - “AI will cure cancer” misunderstands both AI and medicine
The Atlantic article - OpenAI Wants to Cure Cancer. So Why Did It Make a Web Browser?
Independent article - ChatGPT boss predicts when AI could cure cancer
The Atlantic article - AI Executives Promise Cancer Cures. Here’s the Reality

AI Porn and the Addiction Economy
Forbes article - ChatGPT Will Allow ‘Erotica’ After Easing Mental Health Restrictions, Sam Altman Says
The Addiction Economy website
PPC article - OpenAI is staffing up to turn ChatGPT into an ad platform
Tom Nicholas video - Vape-o-nomics: Why Everything is Addictive Now

AI Bubble
Fast Company article - AI isn’t replacing jobs. AI spending is
Pivot to AI article - The finance press finally starts talking about the ‘AI bubble’
Fortune article - Without data centers, GDP growth was 0.1% in the first half of 2025, Harvard economist says
The Atlantic article - Just How Bad Would an AI Bubble Be?
The New York Times article - Debt Has Entered the A.I. Boom
Will Lockett's Newsletter article - AI Pullback Has Officially Started
Reuters article - Michael Burry of 'Big Short' fame is closing his hedge fund
Business Insider article - The guy who shorted Enron has a warning about the AI boom

Datacenters
Bloomberg article - AI Needs So Much Power, It’s Making Yours Worse
Data Center Watch report - $64 billion of data center projects have been blocked or delayed amid local opposition
More Perfect Union video - We Found the Hidden Cost of Data Centers. It's in Your Electric Bill
DataCenter Knowledge article - Why Communities Are Protesting Data Centers – And How the Industry Can Respond

Fighting Back
Knight First Amendment Institute essay - AI as Normal Technology
Pranksters vs. Autocrats chapter - Laughtivism: The Secret Ingredient
SPSP article - Playing with Power: Humor as Everyday Resistance
Blood in the Machine article - The Luddite Renaissance is in full swing
AI Safety for Who?


2025-10-13 • 49:43

Jacob and Igor argue that AI safety is hurting users, not helping them. The techniques used to make chatbots "safe" and "aligned," such as instruction tuning and RLHF, anthropomorphize AI systems such that they take advantage of our instincts as social beings. At the same time, Big Tech companies push these systems for "wellness" while dodging healthcare liability, causing real harms today. We discuss what actual safety would look like, drawing on self-driving car regulations.

Chapters
(00:00) - Introduction & AI Investment Insanity
(01:43) - The Problem with AI Safety
(08:16) - Anthropomorphizing AI & Its Dangers
(26:55) - Mental Health, Wellness, and AI
(39:15) - Censorship, Bias, and Dual Use
(44:42) - Solutions, Community Action & Final Thoughts

Links
AI Ethics & Philosophy
Foreign Affairs article - The Cost of the AGI Delusion
Nature article - Principles alone cannot guarantee ethical AI
Xeiaso blog post - Who Do Assistants Serve?
Argmin article - The Banal Evil of AI Safety
AI Panic News article - The Rationality Trap

AI Model Bias, Failures, and Impacts
BBC news article - AI Image Generation Issues
The New York Times article - Google Gemini German Uniforms Controversy
The Verge article - Google Gemini's Embarrassing AI Pictures
NPR article - Grok, Elon Musk, and Antisemitic/Racist Content
AccelerAId blog post - How AI Nudges are Transforming Up- and Cross-Selling
AI Took My Job website

AI Mental Health & Safety Concerns
Euronews article - AI Chatbot Tragedy
Popular Mechanics article - OpenAI and Psychosis
Psychology Today article - The Emerging Problem of AI Psychosis
Rolling Stone article - AI Spiritual Delusions Destroying Human Relationships
The New York Times article - AI Chatbots and Delusions

Guidelines, Governance, and Censorship
Preprint - R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Minds & Machines article - The Ethics of AI Ethics: An Evaluation of Guidelines
SSRN paper - Instrument Choice in AI Governance
Anthropic announcement - Claude Gov Models for U.S. National Security Customers
Anthropic documentation - Claude's Constitution
Reuters investigation - Meta AI Chatbot Guidelines
Swiss Federal Council consultation - Swiss AI Consultation Procedures
Grok Prompts Github Repo
Simon Willison blog post - Grok 4 Heavy
The Co-opting of Safety


2025-08-21 • 01:24:29

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.

Chapters
(00:00) - Intro
(00:21) - Mecha-Hitler Grok
(10:07) - "Safety"
(19:40) - Under-specification
(53:56) - This time isn't different
(01:01:46) - Alignment Tax myth
(01:17:37) - Actually making AI safer

Links
JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

Additional Referenced Papers
NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
ICML paper - AI Control: Improving Safety Despite Intentional Subversion
ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
OSF preprint - Current Real-World Use of Large Language Models for Mental Health
Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Inciting Examples
Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

Other Sources
London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
EA Forum blogpost - An Overview of the AI Safety Funding Situation
Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
Pleias website
Wikipedia page on Jaywalking
AI, Reasoning or Rambling?


2025-07-14 • 01:11:08

In this episode, we redefine AI's "reasoning" as mere rambling, exposing the "illusion of thinking" and "Potemkin understanding" in current models. We contrast the classical definition of reasoning (requiring logic and consistency) with Big Tech's new version, which is a generic statement about information processing. We explain how Large Rambling Models generate extensive, often irrelevant, rambling traces that appear to improve benchmarks, largely due to best-of-N sampling and benchmark gaming.

Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.

Chapters
(00:00) - Intro
(00:40) - OBB update and Meta's talent acquisition
(03:09) - What are rambling models?
(04:25) - Definitions and polarization
(09:50) - Logic and consistency
(17:00) - Why does this matter?
(21:40) - More likely explanations
(35:05) - The "illusion of thinking" and task complexity
(39:07) - "Potemkin understanding" and surface-level recall
(50:00) - Benchmark gaming and best-of-N sampling
(55:40) - Costs and limitations
(58:24) - Claude's anecdote and the Vending Bench
(01:03:05) - Definitional switch and implications
(01:10:18) - Outro

Links
Apple paper - The Illusion of Thinking
ICML 2025 paper - Potemkin Understanding in Large Language Models
Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Theoretical understanding
Max M. Schlereth manuscript - The limits of AGI part II
Preprint - (How) Do Reasoning Models Reason?
Preprint - A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
NeurIPS 2024 paper - How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Empirical explanations
Preprint - How Do Large Language Monkeys Get Their Power (Laws)?
Andon Labs preprint - Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
LeapLab, Tsinghua University and Shanghai Jiao Tong University paper - Does Reinforcement Learning Really Incentivize Reasoning Capacity
Preprint - RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Preprint - Mind The Gap: Deep Learning Doesn't Learn Deeply
Preprint - Measuring AI Ability to Complete Long Tasks
Preprint - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Other sources
Zuck's Haul webpage - Meta's talent acquisition tracker
Hacker News discussion - Opinions from the AI community
Interconnects blogpost - The rise of reasoning machines
Anthropic blog - Project Vend: Can Claude run a small shop?
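The best-of-N effect described in this episode can be sketched numerically. This is a minimal illustration with made-up numbers, not any benchmark's actual harness: if a single sample passes a grader with probability p, the chance that at least one of N samples passes is 1 - (1 - p)^N, so reported scores climb with the sampling budget even when the underlying model is unchanged.

```python
import random

random.seed(0)

def solve_once(p_success: float) -> bool:
    # One hypothetical model sample: passes the grader with
    # probability p_success. (Stand-in for "draw one completion
    # and check it against the answer key.")
    return random.random() < p_success

def best_of_n(p_success: float, n: int) -> bool:
    # Best-of-N: draw N independent samples and count the task as
    # solved if ANY of them passes.
    return any(solve_once(p_success) for _ in range(n))

def expected_best_of_n(p_success: float, n: int) -> float:
    # Closed form: P(at least one success in N tries) = 1 - (1 - p)^N.
    return 1 - (1 - p_success) ** n

# A weak "model" (20% per-sample success) looks far stronger as the
# sampling budget grows, even though nothing about it has changed.
for n in (1, 10, 100):
    print(n, round(expected_best_of_n(0.2, n), 3))
```

With p = 0.2, the expected score rises from 0.2 at N = 1 to roughly 0.89 at N = 10, which is the kind of jump that can be reported as a capability gain while being a pure artifact of the sampling budget.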
One Big Bad Bill


2025-06-23 • 53:00

In this episode, we break down Trump's "One Big Beautiful Bill" and its dystopian AI provisions: automated fraud detection systems, centralized citizen databases, military AI integration, and a 10-year moratorium blocking all state AI regulation. We explore the historical parallels with authoritarian data consolidation and why this represents a fundamental shift away from the limited government principles once held by US conservatives.

Chapters
(00:00) - Intro
(01:13) - Bill, general overview
(05:14) - Bill, AI overview
(07:54) - Medicaid fraud detection systems
(11:20) - Bias in AI Systems and Ethical Concerns
(17:58) - Centralization of data
(30:04) - Military integration of AI
(37:05) - Tax incentives for development
(40:57) - Regulatory moratorium
(47:58) - One big bad authoritarian regime

Links
Congress page on the One Big Beautiful Bill Act
NYMag article - Republicans Admit They Didn’t Even Read Their Big Beautiful Bill
Everything is Horrible blogpost - They Did Vote For This (GOP House Edition)

Authoritarianism
Historical context
Holocaust Encyclopedia article - Gleichschaltung: Coordinating the Nazi State
Wikipedia article - 1943 Amsterdam civil registry office bombing
Wikipedia article - Four Ds

Conservative leaning, pro-privacy, anti-government
Data Governance Hub blogpost - Review and Literature Guide of Trump’s “One Big Beautiful Dataset”
Cato Institute blogpost - If You Value Privacy, Resist Any Form of National ID Cards
American Enterprise Institute blogpost - The Dangerous Road to a “Master File”—Why Linking Government Databases Is a Terrible Idea
EFF blogpost - The Dangers of Consolidating All Government Information
ACLU against national ID cards
ACLU main page on national ID cards
ACLU blogpost - National Identification Cards: Why Does the ACLU Oppose a National I.D. System?
ACLU blogpost - 5 Problems with National ID Cards

Inherent unfairness of ML
Lighthouse Reports investigation - The Limits of Ethical AI
Lighthouse Reports investigation - Suspicion Machines
Amazon Science publication - Bias preservation in machine learning: The legality of fairness metrics under EU non-discrimination law
Michigan Technology Law Review article - The Unfairness of Fair Machine Learning: Levelling down and strict egalitarianism by default
Wired article - Health Care Bias Is Dangerous. But So Are ‘Fairness’ Algorithms

Military
Wall Street Journal article - The Army’s Newest Recruits: Tech Execs From Meta, OpenAI and More
Trump executive order - Unleashing American Drone Dominance
Anthropic press release - Claude Gov Models for U.S. National Security Customers

Moratorium on State AI Regulation
TechPolicy.Press article - The State AI Laws Likeliest To Be Blocked by a Moratorium
Forbes article - Colorado’s AI Law Still Stands After Update Effort Fails

Other Sources
KPMG report - Incentives and credits tax provisions in “One Big Beautiful Bill Act”
The Register article - Trump team leaks AI plans in public GitHub repository
Wall Street Journal article - To Feed Power-Wolfing AI, Lawmakers Are Embracing Nuclear
CBS Austin article - IRS direct file program exceeded its expectations but faces uncertain future
Jacob and Igor tackle the wild claims about AI's economic impact by examining three main clusters of arguments: automating expensive tasks like programming, removing "cost centers" like call centers and corporate art, and claims of explosive growth. They dig into the actual data, debunk the hype, and explain why most productivity claims don't hold up in practice. Plus: MIT denounces a paper with fabricated data, and Grok randomly promotes white genocide myths.

Chapters
(00:00) - Recording date + intro
(00:52) - MIT denounces paper
(04:09) - Grok's white genocide
(06:23) - Butthole convergence
(07:13) - AI and the economy
(14:50) - Automating profit centers
(29:46) - Removing the last cost centers
(47:16) - "This time is different" (explosive growth)
(57:55) - Alpha Evolve, optimization, and slippage

Links
University of Chicago working paper - Large Language Models, Small Labor Market Effects
OECD working paper - Miracle or Myth? Assessing the macroeconomic productivity gains from Artificial Intelligence
Epoch AI blogpost - Explosive Growth from AI: A Review of the Arguments
Business Insider article - Anthropic CEO: AI Will Be Writing 90% of Code in 3 to 6 Months
Preprint - Transformative AGI by 2043 is <1% likely

Automating profit centers
Pivot to AI blogpost - If AI is so good at coding … where are the open source contributions?
Ben Evans' Mastodon post - "Show me the pull requests"
NY Times article - Your A.I. Radiologist Will Not Be With You Soon
FastCompany article - More companies are adopting 'AI-first' strategies. Here's how it could impact the environment
Forbes article - Business Tech News: Shopify CEO Says AI First Before Employees
Newsroom article - IBM Study: CEOs Double Down on AI While Navigating Enterprise Hurdles
PNAS research article - Evidence of a social evaluation penalty for using AI
Ars Technica article - AI use damages professional reputation, study suggests

Removing cost centers
The Register article - Anthropic's law firm blames Claude hallucinations for errors
Fortune article - Klarna plans to hire humans again, as new landmark survey reveals most AI projects fail to deliver
Wikipedia article - The Market for Lemons

AlphaEvolve
DeepMind press release - AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
DeepMind white paper - AlphaEvolve: A coding agent for scientific and algorithmic discovery

Off Topic
VelvetShark blogpost - Why do AI company logos look like buttholes?
MIT Economics press release - Assuring an accurate research record
Pivot to AI blogpost - How to make a splash in AI economics: fake your data
Pivot to AI blogpost - Even Elon Musk can’t make Grok claim a ‘white genocide’ in South Africa
DeepSeek: 2 Months Out


2025-04-09 • 01:31:31

DeepSeek has been out for over 2 months now, and things have begun to settle down. We take this opportunity to contextualize the developments that have occurred in its wake, both within the AI industry and the world economy. As systems get more "agentic" and users are willing to spend increasing amounts of time waiting for their outputs, the value of supposed "reasoning" models continues to be peddled by AI system developers, but does the data really back these claims?

Check out our DeepSeek minisode for a snappier overview!

EPISODE RECORDED 2025.03.30

Chapters
(00:40) - DeepSeek R1 recap
(02:46) - What makes it new?
(08:53) - What is reasoning?
(14:51) - Limitations of reasoning models (why we hate reasoning)
(31:16) - Claims about R1 training on OpenAI
(37:30) - “Deep Research”
(49:13) - Developments and drama in the AI industry
(56:26) - Proposed economic value
(01:14:20) - US government involvement
(01:23:28) - OpenAI uses MCP
(01:28:15) - Outro

Links
DeepSeek website
DeepSeek paper
DeepSeek docs - Models and Pricing
DeepSeek repo - 3FS

Understanding DeepSeek/DeepResearch
Explainers
Language Models & Co. article - The Illustrated DeepSeek-R1
Towards Data Science article - DeepSeek-V3 Explained 1: Multi-head Latent Attention
Jina.ai article - A Practical Guide to Implementing DeepSearch/DeepResearch
Han, Not Solo blogpost - The Differences between Deep Research, Deep Research, and Deep Research

Analysis and Research
Preprint - Understanding R1-Zero-Like Training: A Critical Perspective
Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Preprint - Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Fallout coverage
TechCrunch article - OpenAI calls DeepSeek 'state-controlled,' calls for bans on 'PRC-produced' models
The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
Interesting Engineering article - $6M myth: DeepSeek’s true AI cost is 216x higher at $1.3B, research reveals
Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
Yahoo Finance article - The 'Magnificent 7' stocks are having their worst quarter in more than 2 years
Reuters article - Microsoft pulls back from more data center leases in US and Europe, analysts say

US governance
National Law Review article - Three States Ban DeepSeek Use on State Devices and Networks
CNN article - US lawmakers want to ban DeepSeek from government devices
House bill - No DeepSeek on Government Devices Act
Senate bill - Decoupling America's Artificial Intelligence Capabilities from China Act of 2025

Leaderboards
aider
LiveBench
LM Arena
Konwinski Prize
Preprint - SWE-Bench+: Enhanced Coding Benchmark for LLMs
Cybernews article - OpenAI study proves LLMs still behind human engineers in over 1400 real-world tasks

Other References
Anthropic report - The Anthropic Economic Index
METR report - Measuring AI Ability to Complete Long Tasks
The Information article - OpenAI Discusses Building Its First Data Center for Storage
DeepMind report backing up this idea
TechCrunch article - OpenAI adopts rival Anthropic's standard for connecting AI models to data
Reuters article - OpenAI, Meta in talks with Reliance for AI partnerships, The Information reports
2024 AI Index report
NDTV article - Ghibli-Style Images To Memes: White House Embraces Alt-Right Online Culture
Elk post on DOGE and AI
DeepSeek Minisode


2025-02-10 • 15:10

DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer before telling the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.

Chapters
(00:00) - Recording date
(00:04) - Intro
(00:37) - DeepSeek drop and reactions
(04:27) - Export controls
(08:05) - Skepticism and uncertainty
(14:12) - Outro

Links
DeepSeek website
DeepSeek paper
Reuters article - What is DeepSeek and why is it disrupting the AI sector?

Fallout coverage
The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
CNN article - US lawmakers want to ban DeepSeek from government devices
Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
Dario Amodei's blogpost - On DeepSeek and Export Controls
SemiAnalysis article - DeepSeek Debates
Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
Wiz blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History

Investigations into "reasoning"
Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
Preprint - s1: Simple test-time scaling
Preprint - LIMO: Less is More for Reasoning
Blogpost - Reasoning Reflections
Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.

A seasoned software developer, Chris started EquiStamp in late 2023 as a way to improve our current understanding of model failure modes and capabilities. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.

EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!

Chapters
(00:00) - Recording date
(00:05) - Intro
(00:29) - Hot off the press
(02:17) - Introducing Chris Canal
(19:12) - World/risk models
(35:21) - Competencies + decision making power
(42:09) - Breaking models down
(01:05:06) - Timelines, test time compute
(01:19:17) - Moving goalposts
(01:26:34) - Risk management pre-AGI
(01:46:32) - Happy endings
(01:55:50) - Causal chains
(02:04:49) - Appetite for democracy
(02:20:06) - Tech-frame based fallacies
(02:39:56) - Bringing back real capitalism
(02:45:23) - Orthogonality Thesis
(03:04:31) - Why we do this
(03:15:36) - Equistamp!

Links
EquiStamp
Chris's Twitter
METR paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
Better Systems article - The Omega Protocol: Another Manhattan Project

Superintelligence & Commentary
Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
Into AI Safety interview - Scaling Democracy w/ Dr. Igor Krawczuk

Referenced Sources
Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
Artificial Intelligence paper - Reward is Enough
Wikipedia article - Capital and Ideology by Thomas Piketty
Wikipedia article - Pantheon

LeCun on AGI
"Won't Happen" - Time article - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
"But if it does, it'll be my research agenda of latent state models, which I happen to research" - Meta Platforms blogpost - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI

Other Sources
Stanford CS senior project - Timing Attacks on Prompt Caching in Language Model APIs
TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
White House fact sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
New York Post article - Bay Area lawyer drops Meta as client over CEO Mark Zuckerberg’s ‘toxic masculinity and Neo-Nazi madness’
OpenEdition academic review of Thomas Piketty
Neural Processing Letters paper - A Survey of Encoding Techniques for Signal Processing in Spiking Neural Networks
BFI working paper - Do Financial Concerns Make Workers Less Productive?
No Mercy/No Malice article - How to Survive the Next Four Years by Scott Galloway
NeurIPS 2024 Wrapped 🌯


2024-12-30 • 01:26:57

What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.

Posters available at time of episode preparation can be found on the episode webpage.

EPISODE RECORDED 2024.12.22

Chapters
(00:00) - Recording date
(00:05) - Intro
(00:44) - Obligatory mentions
(01:54) - SoLaR panel
(18:43) - Test of Time
(24:17) - And now: science!
(28:53) - Downsides of benchmarks
(41:39) - Improving the science of ML
(53:07) - Performativity
(57:33) - NopenAI and Nanthropic
(01:09:35) - Fun/interesting papers
(01:13:12) - Initial takes on o3
(01:18:12) - WorkArena
(01:25:00) - Outro

Links
Note: many workshop papers had not yet been published to arXiv as of preparing this episode; the OpenReview submission page is provided in these cases.
NeurIPS statement on inclusivity
CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
(1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Visual Autoregressive Model report (this link now provides a 404 error; don't worry, here it is on archive.is)
Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
Reddit post on Ilya's talk
SoLaR workshop page

Referenced Sources
Harvard Data Science Review article - Data Science at the Singularity
Paper - Reward Reports for Reinforcement Learning
Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
Paper - NeurIPS Reproducibility Program
Paper - A Metric Learning Reality Check

Improving Datasets, Benchmarks, and Measurements
Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Paper - A Systematic Review of NeurIPS Dataset Management Practices
Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
Paper - Benchmark Repositories for Better Benchmarking
Paper - Croissant: A Metadata Format for ML-Ready Datasets
Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
Paper - Report Cards: Qualitative Evaluation of LLMs

Governance Related
Paper - Towards Data Governance of Frontier AI Models
Paper - Ways Forward for Global AI Benefit Sharing
Paper - How do we warn downstream model providers of upstream risks?
Unified Model Records tool
Paper - Policy Dreamer: Diverse Public Policy Creation via Elicitation and Simulation of Human Preferences
Paper - Monitoring Human Dependence on AI Systems with Reliance Drills
Paper - On the Ethical Considerations of Generative Agents
Paper - GPAI Evaluation Standards Taskforce: Towards Effective AI Governance
Paper - Levels of Autonomy: Liability in the age of AI Agents

Certified Bangers + Useful Tools
Paper - Model Collapse Demystified: The Case of Regression
Paper - Preference Learning Algorithms Do Not Learn Preference Rankings
LLM Dataset Inference paper + repo
dattri paper + repo
DeTikZify paper + repo

Fun Benchmarks/Datasets
Paloma paper + dataset
RedPajama paper + dataset
Assemblage webpage
WikiDBs webpage
WhodunitBench repo
ApeBench paper + repo
WorkArena++ paper

Other Sources
Paper - The Mirage of Artificial Intelligence Terms of Use Restrictions
The idea of model cards, introduced as a measure to increase transparency and understanding of LLMs, has been perverted into a marketing gimmick, exemplified by OpenAI's o1 system card. To demonstrate the adversarial stance we believe is necessary to draw meaning from these press-releases-in-disguise, we conduct a close read of the system card. Be warned, there's a lot of muck in this one.Note: All figures/tables discussed in the podcast can be found on the podcast website at https://kairos.fm/muckraikers/e009/(00:00) - Recorded 2024.12.08 (00:54) - Actual intro (03:00) - System cards vs. academic papers (05:36) - Starting off sus (08:28) - o1.continued (12:23) - Rant #1: figure 1 (18:27) - A diamond in the rough (19:41) - Hiding copyright violations (21:29) - Rant #2: Jacob on "hallucinations" (25:55) - More ranting and "hallucination" rate comparison (31:54) - Fairness, bias, and bad science comms (35:41) - System, dev, and user prompt jailbreaking (39:28) - Chain-of-thought and Rao-Blackwellization (44:43) - "Red-teaming" (49:00) - Apollo's bit (51:28) - METR's bit (59:51) - Pass@??? 
(01:04:45) - SWE Verified (01:05:44) - Appendix bias metrics (01:10:17) - The muck and the meaning Linkso1 system cardOpenAI press release collection - 12 Days of OpenAIAdditional o1 CoverageNIST + AISI report - US AISI and UK AISI Joint Pre-Deployment TestApollo Research's paper - Frontier Models are Capable of In-context SchemingVentureBeat article - OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT ProThe Atlantic article - The GPT Era Is Already EndingOn Data Labelers60 Minutes article + video - Labelers training AI say they're overworked, underpaid and exploited by big American tech companiesReflections article - The hidden health dangers of data labeling in AI developmentPrivacy International article - Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasetsChain-of-Thought Papers CitedPaper - Measuring Faithfulness in Chain-of-Thought ReasoningPaper - Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingPaper - On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language ModelsPaper - Faithfulness vs. 
Plausibility: On the (Un)Reliability of Explanations from Large Language ModelsOther Mentioned/Relevant SourcesAndy Jones blogpost - Rao-BlackwellizationPaper - Training on the Test Task Confounds Evaluation and EmergencePaper - Best-of-N JailbreakingResearch landing page - SWE BenchCode Competition - Konwinski PrizeLakera game - GandalfKate Crawford's Atlas of AIBlueDot Impact's course - Intro to Transformative AIUnrelated DevelopmentsCruz's letter to Merrick GarlandAWS News Blog article - Introducing Amazon Nova foundation models: Frontier intelligence and industry leading price performanceBleepingComputer article - Ultralytics AI model hijacked to infect thousands with cryptominerThe Register article - Microsoft teases Copilot Vision, the AI sidekick that judges your tabsFox Business article - OpenAI CEO Sam Altman looking forward to working with Trump admin, says US must build best AI infrastructure
While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually be changed when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.(00:00) - Intro (00:29) - Hot off the press (02:59) - Repealing the AI executive order? (11:16) - "Manhattan" for AI (24:33) - EU (30:47) - UK (39:27) - Bengio (44:39) - Comparing EU/UK to USA (45:23) - China (51:12) - Taxes (55:29) - The muck LinksSFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their workTrump's Executive Order on AI (the AI governance executive order at home)Biden's Executive Order on AICongressional report brief which advises a "Manhattan Project for AI"Non-USACAIRNE resource collection on CERN for AIUK Frontier AI Taskforce report (2023)International interim report (2024)Bengio's paper - AI and Catastrophic RiskDavidad's Safeguarded AI program at ARIAMIT Technology Review article - Four things to know about China’s new AI rules in 2024GovInsider article - Australia’s national policy for ethical use of AI starts to take shapeFuture of Privacy forum article - The African Union’s Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI RegulationTaxesMacroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot TaxCESifo paper - AI, Automation, and TaxationGavTax article - Taxation of Artificial Intelligence and AutomationPerplexity PagesCERN for AI pageChina's AI policy pageSingapore's AI policy pageAI policy in Africa, India, Australia pageOther SourcesArtificial Intelligence Made Simple article - NYT's "AI Outperforms Doctors" Story Is WrongIntel report - Reclaim Your Day: The Impact of AI PCs on ProductivityHeise Online article - Users on AI PCs slower, Intel sees problem in unenlightened usersThe Hacker News article - North Korean Hackers Steal $10M with 
AI-Driven Scams and Malware on LinkedInFuturism article - Character.AI Is Hosting Pedophile Chatbots That Groom Users Who Say They're UnderageVice article - 'AI Jesus' Is Now Taking Confessions at a Church in SwitzerlandPolitico article - Ted Cruz: Congress 'doesn't know what the hell it's doing' with AI regulationUS Senate Committee on Commerce, Science, and Transportation press release - Sen. Cruz Sounds Alarm Over Industry Role in AI Czar Harris’s Censorship Agenda
The End of Scaling?

2024-11-19 01:07:00

Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources], are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers, to understand the validity and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".(00:23) - Hot off the press (01:49) - The end of scaling (10:50) - "Useful tools" and "agentic" "AI" (17:19) - The end of quantization (25:18) - Hedging (29:41) - The end of upwards mobility (33:12) - How to grow an economy (38:14) - Transformative & disruptive tech (49:19) - Finding the meaning (56:14) - Bursting AI bubble and Trump (01:00:58) - The muck LinksThe Information article - OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements SlowsBloomberg article - OpenAI, Google and Anthropic Are Struggling to Build More Advanced AIReuters article - OpenAI and others seek new path to smarter AI as current methods hit limitationsPaper on the end of quantization - Scaling Laws for PrecisionTim Dettmers Tweet on "Scaling Laws for Precision"Empirical AnalysisWU Vienna paper - Unslicing the pie: AI innovation and the labor share in European regionsIMF paper - The Labor Market Impact of Artificial Intelligence: Evidence from US RegionsNBER paper - Automation, Career Values, and Political PreferencesPew Research Center report - Which U.S. 
Workers Are More Exposed to AI on Their Jobs?ForecastingNBER/Acemoglu paper - The Simple Macroeconomics of AINBER/Acemoglu paper - Harms of AIIMF report - Gen-AI: Artificial Intelligence and the Future of WorkSubmission to Open Philanthropy AI Worldviews Contest - Transformative AGI by 2043 is <1% likelyExternalities and the Bursting BubbleNBER paper - Bubbles, Rational Expectations and Financial MarketsClayton Christensen lecture capture - Clayton Christensen: Disruptive innovationThe New Republic article - The “Godfather of AI” Predicted I Wouldn’t Have a Job. He Was Wrong.Latent Space article - $2 H100s: How the GPU Rental Bubble BurstOn ProductizationPalantir press release on introduction of Claude to US security and defenseArs Technica article - Claude AI to process secret government data through new Palantir dealOpenAI press release on partnering with Condé NastCandid Technology article - Shutterstock and Getty partner with OpenAI and BRIAE2BStripe agentsRobopairOther SourcesCBS News article - Google AI chatbot responds with a threatening message: "Human … Please die."Biometric Update article - Travelers to EU may be subjected to AI lie detectorTechcrunch article - OpenAI’s tumultuous early years revealed in emails from Musk, Altman, and othersRichard Ngo Tweet on leaving OpenAI
October 2024 saw a National Security Memorandum and US framework for using AI in national security contexts. We go through the content so you don't have to, pull out the important bits, and summarize our main takeaways.(00:48) - The memorandum (06:28) - What the press is saying (10:39) - What's in the text (13:48) - Potential harms (17:32) - Miscellaneous notable stuff (31:11) - What's the US government's take on AI? (45:45) - The civil side - comments on reporting (49:31) - The commenters (01:07:33) - Our final hero (01:10:46) - The muck LinksUnited States National Security Memorandum on AIFact Sheet on the National Security MemorandumFramework to Advance AI Governance and Risk Management in National SecurityRelated MediaCAIS Newsletter - AI Safety Newsletter #43NIST report - Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence ProfileACLU press release - ACLU Warns that Biden-Harris Administration Rules on AI in National Security Lack Key ProtectionsWikipedia article - Presidential MemorandumReuters article - White House presses gov't AI use with eye on security, guardrailsForbes article - America’s AI Security Strategy Acknowledges There’s No Stopping AIDefenseScoop article - New White House directive prods DOD, intelligence agencies to move faster adopting AI capabilitiesNYTimes article - Biden Administration Outlines Government ‘Guardrails’ for A.I. ToolsForbes article - 5 Things To Know About The New National Security Memorandum On AI – And What ChatGPT ThinksFederal News Network interview - A look inside the latest White House artificial intelligence memoGovtech article - Reactions Mostly Positive to National Security AI MemoThe Information article - Biden Memo Encourages Military Use of AIOther SourcesPhysical Intelligence press release - π0: Our First Generalist PolicyOpenAI press release - Introducing ChatGPT SearchWhoPoo App!!
Frontier developers continue their war on sane versioning schema to bring us Claude 3.5 Sonnet (New), along with "computer use" capabilities. We discuss not only the new model, but also why Anthropic may have released this model and tool combination now.(00:00) - Intro (00:22) - Hot off the press (05:03) - Claude 3.5 Sonnet (New) Two 'o' 3000 (09:23) - Breaking down "computer use" (13:16) - Our understanding (16:03) - Diverging business models (32:07) - Why has Anthropic chosen this strategy? (43:14) - Changing the frame (48:00) - Polishing the lily LinksAnthropic press release - Introducing Claude 3.5 Sonnet (New)Model Card AddendumOther Anthropic Relevant MediaPaper - Sabotage Evaluations for Frontier ModelsAnthropic press release - Anthropic's Updated RSPAlignment Forum blogpost - Anthropic's Updated RSPTweet - Response to scare regarding Anthropic training on user dataAnthropic press release - Developing a computer use modelSimon Willison article - Initial explorations of Anthropic’s new Computer Use capabilityTweet - ARC Prize performanceThe Information article - Anthropic Has Floated $40 Billion Valuation in Funding TalksOther SourcesLWN.net article - OSI readies controversial Open AI definitionNational Security MemorandumFramework to Advance AI Governance and Risk Management in National SecurityReuters article - Mother sues AI chatbot company Character.AI, Google over son's suicideMedium article - A Small Step Towards Reproducing OpenAI o1: Progress Report on the Steiner Open Source ModelsThe Guardian article - Google's solution to accidental algorithmic racism: ban gorillasTIME article - Ethical AI Isn’t to Blame for Google’s Gemini DebacleLatacora article - The SOC2 Starting SevenGrandview Research market trends - Robotic Process Automation Market Trends
Winter is Coming for OpenAI

2024-10-22 01:22:37

Brace yourselves, winter is coming for OpenAI - at least, that's what we think. In this episode we look at OpenAI's recent massive funding round and ask "why would anyone want to fund a company that is set to lose a net 5 billion USD in 2024?" We scrape through a whole lot of muck to find the meaningful signals in all this news, and there is a lot of it, so get ready!(00:00) - Intro (00:28) - Hot off the press (02:43) - Why listen? (06:07) - Why might VCs invest? (15:52) - What are people saying (23:10) - How *is* OpenAI making money? (28:18) - Is AI hype dying? (41:08) - Why might big companies invest? (48:47) - Concrete impacts of AI (52:37) - Outcome 1: OpenAI as a commodity (01:04:02) - Outcome 2: AGI (01:04:42) - Outcome 3: best plausible case (01:07:53) - Outcome 1*: many ways to bust (01:10:51) - Outcome 4+: shock factor (01:12:51) - What's the muck (01:21:17) - Extended outro LinksReuters article - OpenAI closes $6.6 billion funding haul with investment from Microsoft and NvidiaGoldman Sachs report - GenAI: Too Much Spend, Too Little BenefitApricitas Economics article - The AI Investment BoomDiscussion of "The AI Investment Boom" on YCombinatorState of AI in 13 ChartsFortune article - OpenAI sees $5 billion loss in 2024 and soaring sales as big ChatGPT fee hikes planned, report saysMore on AI Hype (Dying)Latent Space article - The Winds of AI WinterArticle by Gary Marcus - The Great AI Retrenchment has BegunTimmermanReport article - AI: If Not Now, When? No, Really - When?MIT News article - Who Will Benefit from AI?Washington Post article - The AI Hype bubble is deflating. 
Now comes the hard part.Andreessen Horowitz article - Why AI Will Save the WorldOther SourcesHuman-Centered Artificial Intelligence Foundation Model Transparency IndexCointelegraph article - Europe gathers global experts to draft ‘Code of Practice’ for AIReuters article - Microsoft's VP of GenAI research to join OpenAITwitter post from Tim Brooks on joining DeepMindEdward Zitron article - The Man Who Killed Google Search
The Open Source AI Definition is out after years of drafting, will it reestablish brand meaning for the “Open Source” term? Also, the 2024 Nobel Prizes in Physics and Chemistry are heavily tied to AI; we scrutinize not only this year's prizes, but also Nobel Prizes as a concept. (00:00) - Intro (00:30) - Hot off the press (03:45) - Open Source AI background (10:30) - Definitions and changes in RC1 (18:36) - “Business source” (22:17) - Parallels with legislation (26:22) - Impacts of the OSAID (33:58) - 2024 Nobel Prize Context (37:21) - Chemistry prize (45:06) - Physics prize (50:29) - Takeaways (52:03) - What’s the real muck? (01:00:27) - Outro LinksOpen Source AI Definition, Release Candidate 1OSAID RC1 announcementAll Nobel Prizes 2024More Reading on Open Source AIKairos.FM article - Open Source AI is a lie, but it doesn't have to beThe Register article - The open source AI civil war approachesMIT Technology Review article - We finally have a definition for open-source AIOn Nobel PrizesPaper - Access to Opportunity in the Sciences: Evidence from the Nobel LaureatesPhysics prize - scientific background, popular infoChemistry prize - scientific background, popular infoReuters article - Google's Nobel prize winners stir debate over AI researchWikipedia article - Nobel diseaseOther SourcesPivot.ai article - People are ‘blatantly stealing my work,’ AI artist complainsPaper - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language ModelsPaper - Reclaiming AI as a Theoretical Tool for Cognitive Science | Computational Brain & Behavior 