Intro topic: Buying a Car
News/Links:
Cognitive Load is what Matters
Diffusion models are Real-Time Game Engines
Your Company Needs Junior Devs
https://softwaredoug.com/blog/2024/09/07/your-team-needs-juniors
Seamless Streaming / Fish Speech / LLaMA Omni
Seamless: https://huggingface.co/facebook/seamless-streaming
LLaMA Omni: https://github.com/ictnlp/LLaMA-Omni
Book of the Show
Patrick:
Thought Emporium (YouTube channel)
Jason:
Novel Minds
Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
Patrick:
Escape Simulator
Jason:
Cursor IDE
Topic: Vector Databases (~54 min)
How computers represent data traditionally
ASCII values
RGB values
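A quick illustration of the two traditional representations above, as a minimal sketch. The sample string and the color are made up for the example.

```python
# Characters are stored as small integers (ASCII code points).
text = "Car"
ascii_codes = [ord(ch) for ch in text]
raw_bytes = text.encode("ascii")

# A pixel is three channel intensities in the range 0-255: (red, green, blue).
orange = (255, 165, 0)
hex_color = "#{:02x}{:02x}{:02x}".format(*orange)

print(ascii_codes)   # [67, 97, 114]
print(raw_bytes)     # b'Car'
print(hex_color)     # #ffa500
```

Note that these codes are arbitrary labels: 'C' (67) and 'D' (68) are numerically adjacent, but that says nothing about whether words containing them mean similar things.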
How traditional compression works
Huffman encoding (tree structure)
Lossy example: Fourier Transform & store coefficients
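To make the Huffman tree idea concrete, here is a minimal sketch of lossless Huffman coding using only the standard library. The function names (`huffman_codes`, `encode`, `decode`) and the sample message are illustrative choices, not anything from the episode, and the lossy Fourier example is not covered here.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {symbol: bitstring} table built from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, tree). A tree is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest trees.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    # Works because Huffman codes are prefix-free.
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

message = "abracadabra"
codes = huffman_codes(message)
bits = encode(message, codes)
print(len(bits), "bits vs", 8 * len(message), "bits at one byte per character")
```

Frequent symbols (here 'a') end up near the root of the tree and get the shortest codes, which is where the compression comes from.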
How embeddings are computed
Pairwise (contrastive) methods
Forward models (self-supervised)
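As one concrete instance of a pairwise (contrastive) objective, here is a minimal sketch of a margin-based contrastive loss on two embedding vectors. Whether the episode discussed this exact formulation is an assumption; the function names and the `margin` parameter are illustrative. Real training would compute the embeddings with a neural network and minimize this loss with an autodiff framework.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, is_similar, margin=1.0):
    """Loss for one pair of embeddings.

    Similar pairs are penalized for being far apart; dissimilar pairs
    are penalized only while they are closer than `margin`.
    """
    d = euclidean(u, v)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

u, v = (0.0, 0.0), (3.0, 4.0)       # distance between u and v is 5
loss_if_similar = contrastive_loss(u, v, is_similar=True)
loss_if_different = contrastive_loss(u, v, is_similar=False, margin=6.0)
loss_far_enough = contrastive_loss(u, v, is_similar=False, margin=2.0)
print(loss_if_similar, loss_if_different, loss_far_enough)
```

Minimizing this over many labeled pairs pulls matching items together in the embedding space and pushes non-matching items at least `margin` apart.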
Similarity metrics
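Three commonly used metrics, as a minimal pure-Python sketch. The notes do not list which metrics were discussed, so this selection (dot product, cosine similarity, Euclidean distance) is an assumption, and the vectors are made up.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Angle-based: ignores vector length. Assumes neither vector is all zeros.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean_distance(u, v):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a, b, c = [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]

# a and b point the same way but differ in length.
print(cosine_similarity(a, b), euclidean_distance(a, b), dot(a, b))
# a and c are perpendicular.
print(cosine_similarity(a, c), euclidean_distance(a, c), dot(a, c))
```

The metrics can disagree: `a` and `b` are identical under cosine similarity but not under Euclidean distance, so the right choice depends on how the embeddings were trained.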
Approximate Nearest Neighbors (ANN)
Sub-Linear ANN
Clustering
Space Partitioning (e.g. K-D Trees)
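The space-partitioning idea can be sketched with a small k-d tree. This is a minimal illustration with hypothetical function names, not a production index. Note that this version returns the exact nearest neighbor; approximate variants typically trade accuracy for speed by limiting how many branches they visit.

```python
def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(node, query, best=None):
    """Return the stored point closest to `query` (squared Euclidean)."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    # Search the side of the split that contains the query first.
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the
    # best match so far; otherwise that whole subtree is skipped.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))   # (8, 1)
```

Skipping whole subtrees is what makes the search sub-linear on average in low dimensions. In high-dimensional embedding spaces the pruning becomes much less effective, which is one reason clustering-based approaches are also used.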
What a vector database does
Perform nearest-neighbors with many different similarity metrics
Store the vectors and the data structures to support sub-linear ANN
Handle updates, deletes, rebalancing/reclustering, backups/restores
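To show the shape of the interface described above, here is a toy in-memory store. `TinyVectorStore` and all of its method names are hypothetical and not taken from any real product. It uses brute-force (linear) search, so it has none of the sub-linear indexing, rebalancing, or backup features a real vector database provides.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cosine(u, v):
    # Assumes non-zero vectors.
    return _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))

def _neg_euclidean(u, v):
    # Negated so that, like the other metrics, higher means more similar.
    return -math.dist(u, v)

_METRICS = {"cosine": _cosine, "dot": _dot, "euclidean": _neg_euclidean}

class TinyVectorStore:
    """Hypothetical in-memory store: linear scan, no index."""

    def __init__(self):
        self._rows = {}                       # id -> (vector, payload)

    def upsert(self, item_id, vector, payload=None):
        self._rows[item_id] = (tuple(vector), payload)

    def delete(self, item_id):
        self._rows.pop(item_id, None)

    def query(self, vector, k=3, metric="cosine"):
        score = _METRICS[metric]
        ranked = sorted(
            self._rows.items(),
            key=lambda kv: score(vector, kv[1][0]),
            reverse=True,
        )
        return [(item_id, payload) for item_id, (_, payload) in ranked[:k]]

store = TinyVectorStore()
store.upsert("car", [0.9, 0.1], "Buying a car")
store.upsert("truck", [0.8, 0.2], "Buying a truck")
store.upsert("poem", [0.1, 0.9], "A poem about autumn")

top_two = store.query([1.0, 0.0], k=2)
store.delete("car")
after_delete = store.query([1.0, 0.0], k=1, metric="euclidean")
print(top_two, after_delete)
```

A real system replaces the linear scan in `query` with an ANN index and has to keep that index consistent as `upsert` and `delete` calls arrive, which is where most of the engineering effort goes.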
Examples
pgvector: a vector similarity search extension for PostgreSQL
Weaviate, Pinecone
Milvus