177: Vector Databases
Update: 2024-11-04
Description
Intro topic: Buying a Car
News/Links:
- Cognitive Load is what Matters
- Diffusion models are Real-Time Game Engines
- Your Company Needs Junior Devs
- Seamless Streaming / Fish Speech / LLaMA Omni
Book of the Show
- Patrick:
  - The Thought Emporium (YouTube channel)
- Jason:
  - Novel Minds
Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
  - Escape Simulator
- Jason:
  - Cursor IDE
Topic: Vector Databases (~54 min)
- How computers represent data traditionally
  - ASCII values
  - RGB values
- How traditional compression works
  - Huffman encoding (tree structure)
  - Lossy example: take a Fourier transform and store only the most significant coefficients
- How embeddings are computed
  - Pairwise (contrastive) methods
  - Forward models (self-supervised)
- Similarity metrics
- Approximate Nearest Neighbors (ANN)
  - Sub-linear ANN
    - Clustering
    - Space partitioning (e.g. k-d trees)
- What a vector database does
  - Performs nearest-neighbor search with many different similarity metrics
  - Stores the vectors and the data structures that support sub-linear ANN
  - Handles updates, deletes, rebalancing/reclustering, and backups/restores
- Examples
  - pgvector: an extension that adds vector search to PostgreSQL
  - Weaviate, Pinecone
  - Milvus
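To make the outline's "ASCII values" and "RGB values" items concrete: traditionally, text is stored as one number per character, and a pixel as three one-byte channel values. This is a minimal Python sketch; the function names are illustrative, not from any library.

```python
def text_to_ascii(text):
    """Return the ASCII code of each character (raises UnicodeEncodeError for non-ASCII)."""
    return list(text.encode("ascii"))


def rgb_to_hex(rgb):
    """Format three 0-255 channel values as a #rrggbb string, one byte per channel."""
    r, g, b = rgb
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in one byte (0-255)")
    return f"#{r:02x}{g:02x}{b:02x}"


def rgb_to_int(rgb):
    """Pack the same three bytes into a single 24-bit integer: 0xRRGGBB."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


print(text_to_ascii("Hi"))        # [72, 105]
print(rgb_to_hex((255, 128, 0)))  # #ff8000
print(rgb_to_int((255, 128, 0)))  # 16744448
```

Each value here means exactly one thing (a letter, a color channel), which is the contrast with embeddings, where no single coordinate has a fixed meaning.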
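The Huffman item in the outline can be sketched as follows: the two least frequent subtrees are merged repeatedly, so frequent symbols end up near the root and receive shorter bit strings, and because no code is a prefix of another the bit stream decodes without separators. This is a minimal illustration with hypothetical helper names, not a production encoder; for readability it emits bits as a string of "0"/"1" characters instead of packing them into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, tree). A tree is a symbol (str) or a
    # (left, right) tuple; the unique tie_breaker keeps comparisons off the trees.
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))  # ...become siblings
        next_id += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix            # leaf: path from root is the code
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(heap[0][2], "")
    return codes


def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # codes are prefix-free, so the first match is correct
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
bits = huffman_encode(text, codes)
print(codes)
print(len(bits), "bits vs", 8 * len(text), "bits as 8-bit ASCII")  # 23 bits vs 88 bits as 8-bit ASCII
```

This compression is lossless: decoding returns the original string exactly, and the savings come only from exploiting the skewed symbol frequencies.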
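For the lossy Fourier example, one simple scheme (assumed here for illustration; real codecs such as JPEG use a related transform plus quantization) is to transform the signal, keep only the coefficients with the largest magnitude, and treat the rest as zero when inverting. The sketch uses a naive O(n^2) DFT so it needs nothing beyond the standard library; the test signal and the `keep` parameter are hypothetical.

```python
import cmath
import math


def dft(xs):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft(): rebuilds the (complex) samples from the coefficients."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n
            for t in range(n)]


def compress(signal, keep):
    """Lossy step: store only the `keep` largest-magnitude coefficients (index -> value)."""
    coeffs = dft(signal)
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return {k: coeffs[k] for k in ranked[:keep]}


def decompress(sparse, n):
    """Treat every discarded coefficient as zero and invert the transform."""
    coeffs = [sparse.get(k, 0j) for k in range(n)]
    return [z.real for z in idft(coeffs)]


n = 64
signal = [math.sin(2 * math.pi * 3 * t / n)            # strong tone
          + 0.5 * math.cos(2 * math.pi * 10 * t / n)   # medium tone
          + 0.05 * math.sin(2 * math.pi * 21 * t / n)  # weak detail
          for t in range(n)]
restored = decompress(compress(signal, keep=4), n)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"kept 4 of {n} coefficients, max error {max_error:.3f}")
```

Each real tone occupies two conjugate coefficients, so keeping 4 of 64 preserves the two strong tones and discards only the weak one; the maximum error is therefore about 0.05, the amplitude of the detail that was thrown away.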
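The "pairwise (contrastive) methods" item refers to training an encoder on pairs labeled similar or dissimilar. One classic formulation, assumed here, is the margin-based contrastive loss: similar pairs are penalized by their squared distance, and dissimilar pairs are penalized only while they sit closer than a margin. Only the loss is sketched; a real training loop would backpropagate it through an encoder network, and the example embeddings are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Margin-based pairwise contrastive loss for one pair of embeddings.

    Similar pairs: loss = d^2, which pulls them together.
    Dissimilar pairs: loss = max(0, margin - d)^2, which pushes them apart
    until they are at least `margin` away, then stops caring.
    """
    d = euclidean(emb_a, emb_b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings produced by some encoder.
close_similar = ([0.1, 0.9], [0.2, 0.8], True)       # already close: small loss
close_dissimilar = ([0.1, 0.9], [0.15, 0.85], False)  # too close: large loss
for a, b, label in (close_similar, close_dissimilar):
    print(label, round(contrastive_loss(a, b, label), 4))
```

The margin matters: without it, the easiest way to minimize the dissimilar term would be to push every pair infinitely far apart.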
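Which similarity metric a system uses can change which vector counts as "nearest". The sketch below compares dot product, cosine similarity, and Euclidean distance on two hypothetical 2-D vectors: cosine ignores vector length, while dot product rewards it and Euclidean distance penalizes it.

```python
import math


def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction, length ignored."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, metric):
    """Return the key of the best match: max score, or min distance for Euclidean."""
    if metric == "euclidean":
        return min(vectors, key=lambda k: euclidean_distance(query, vectors[k]))
    if metric == "cosine":
        return max(vectors, key=lambda k: cosine_similarity(query, vectors[k]))
    if metric == "dot":
        return max(vectors, key=lambda k: dot(query, vectors[k]))
    raise ValueError(f"unknown metric: {metric}")


query = [1.0, 0.0]
vectors = {"long_aligned": [10.0, 1.0], "short_close": [0.9, 0.5]}  # hypothetical data
for metric in ("euclidean", "cosine", "dot"):
    print(metric, "->", nearest(query, vectors, metric))
# euclidean -> short_close
# cosine -> long_aligned
# dot -> long_aligned
```

When all vectors are normalized to unit length, the three metrics produce the same ranking, which is why many embedding pipelines normalize before indexing.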
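The clustering approach to sub-linear ANN can be sketched as an inverted-file (IVF) style index: run k-means once, assign every vector to its nearest centroid, and at query time scan only the `nprobe` clusters whose centroids are closest to the query. Scanning fewer clusters is faster but can miss true neighbors that landed in a neighboring cluster, which is where the "approximate" comes from. Everything here (class name, parameters, data sizes) is a hypothetical toy in pure Python, not any database's implementation.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            c = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            buckets[c].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class IVFIndex:
    """Toy inverted-file index: cluster once, then scan only the closest clusters."""

    def __init__(self, vectors, n_clusters, seed=0):
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        self.buckets = [[] for _ in self.centroids]
        for idx, v in enumerate(vectors):
            c = min(range(len(self.centroids)), key=lambda i: sq_dist(v, self.centroids[i]))
            self.buckets[c].append((idx, v))

    def search(self, query, k=1, nprobe=1):
        # Rank clusters by centroid distance and only look inside the nprobe nearest.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(query, self.centroids[i]))
        candidates = [item for i in order[:nprobe] for item in self.buckets[i]]
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [idx for idx, _ in candidates[:k]]


def brute_force(vectors, query, k=1):
    """Exact answer for comparison: a full linear scan."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
index = IVFIndex(data, n_clusters=16)
query = [rng.gauss(0.0, 1.0) for _ in range(8)]
exact = brute_force(data, query, k=5)
approx = index.search(query, k=5, nprobe=2)
recall = len(set(exact) & set(approx)) / len(exact)
print("recall@5 with nprobe=2:", recall)
```

With `nprobe` equal to the number of clusters the search degenerates into an exact full scan; smaller values trade recall for speed.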
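Space partitioning with a k-d tree splits the points on one coordinate at each level, cycling through the axes, and at query time skips any subtree on the far side of a splitting plane that lies farther away than the best match found so far. This sketch performs exact search; approximate variants bound how many nodes they visit. k-d trees work well in low dimensions but tend toward a full scan as dimensionality grows, which limits their usefulness for high-dimensional embeddings. Function names and data are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Split at the median point on one coordinate per level, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),       # coordinate <= median
        "right": build_kdtree(ordered[mid + 1:], depth + 1),  # coordinate >= median
    }


def kd_nearest(node, query, best=None):
    """Exact nearest neighbor, pruning subtrees that cannot beat the best so far."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or sq_dist(query, point) < sq_dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    # Descend first into the side of the splitting plane that contains the query.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = kd_nearest(near, query, best)
    # The far side can only help if the plane is closer than the current best match.
    if diff ** 2 < sq_dist(query, best):
        best = kd_nearest(far, query, best)
    return best


rng = random.Random(7)
points = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
tree = build_kdtree(points)
query = (0.5, 0.5, 0.5)
print("nearest to", query, "is", kd_nearest(tree, query))
```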
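The outline's point that a vector database must handle updates and deletes can be illustrated with a toy in-memory store. It assumes one possible design, not any particular product's: upserts append a new row, deletes and superseded versions are marked as tombstones that queries skip, and a periodic `compact()` rewrites the rows (the step where a real system would also rebuild or re-cluster its ANN index). The class, keys, and payloads are hypothetical, and the query is a linear scan for brevity.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyVectorStore:
    """Hypothetical in-memory store: upserts append, deletes tombstone, compact() cleans up."""

    def __init__(self):
        self._rows = []     # append-only list of (key, vector, payload)
        self._live = {}     # key -> position of that key's current row
        self._dead = set()  # positions of superseded or deleted rows

    def upsert(self, key, vector, payload=None):
        if key in self._live:
            self._dead.add(self._live[key])  # the old version becomes a tombstone
        self._rows.append((key, list(vector), payload))
        self._live[key] = len(self._rows) - 1

    def delete(self, key):
        pos = self._live.pop(key, None)
        if pos is not None:
            self._dead.add(pos)

    def query(self, vector, k=1):
        """Brute-force scan of live rows, ranked by squared Euclidean distance."""
        hits = [(sq_dist(vector, vec), key, payload)
                for pos, (key, vec, payload) in enumerate(self._rows)
                if pos not in self._dead]
        hits.sort(key=lambda hit: hit[0])  # sort on distance only; payloads may not compare
        return [(key, payload) for _, key, payload in hits[:k]]

    def compact(self):
        """Drop tombstoned rows and rebuild the key -> position map."""
        self._rows = [row for pos, row in enumerate(self._rows) if pos not in self._dead]
        self._dead.clear()
        self._live = {key: pos for pos, (key, _, _) in enumerate(self._rows)}

    def stats(self):
        return {"rows": len(self._rows), "tombstones": len(self._dead)}


store = ToyVectorStore()
store.upsert("cat", [1.0, 0.0], {"text": "cat"})
store.upsert("dog", [0.9, 0.2], {"text": "dog"})
store.upsert("car", [0.0, 1.0], {"text": "car"})
store.upsert("dog", [0.0, 0.9], {"text": "dog (re-embedded)"})  # update = new row + tombstone
store.delete("car")
print(store.query([0.0, 1.0], k=1))  # [('dog', {'text': 'dog (re-embedded)'})]
print(store.stats())                 # {'rows': 4, 'tombstones': 2}
store.compact()
print(store.stats())                 # {'rows': 2, 'tombstones': 0}
```

Deferring the cleanup keeps writes cheap; the cost is that dead rows occupy space until the next compaction.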
★ Support this podcast on Patreon ★