177: Vector Databases

2024/11/4
Programming Throwdown

Intro topic: Buying a Car

News/Links:

 

Book of the Show

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Topic: Vector Databases (~54 min)

  • How computers represent data traditionally
      • ASCII values
      • RGB values
  • How traditional compression works
      • Huffman encoding (tree structure)
      • Lossy example: Fourier Transform & store coefficients
  • How embeddings are computed
      • Pairwise (contrastive) methods
      • Forward models (self-supervised)
  • Similarity metrics
  • Approximate Nearest Neighbors (ANN)
  • Sub-Linear ANN
      • Clustering
      • Space Partitioning (e.g. K-D Trees)
  • What a vector database does
      • Perform nearest-neighbors with many different similarity metrics
      • Store the vectors and the data structures to support sub-linear ANN
      • Handle updates, deletes, rebalancing/reclustering, backups/restores
  • Examples
      • pgvector: a vector-database extension for Postgres
      • Weaviate, Pinecone
      • Milvus
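
The similarity-metric and clustering items in the outline above can be illustrated with a rough sketch. This is not code from the episode: the function names, the plain k-means clustering, and the `n_probe` parameter are all assumptions chosen for illustration, and a real vector database would use far more optimized index structures. The idea is that an exact search compares the query against every stored vector, while a cluster-based approximate search only scans the buckets whose centroids are closest to the query.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest(query, vectors):
    """Linear scan: index of the stored vector closest to the query."""
    return min(range(len(vectors)),
               key=lambda i: euclidean_distance(query, vectors[i]))


def build_clusters(vectors, k, iterations=10, seed=0):
    """Plain k-means (hypothetical index builder).

    Returns (centroids, buckets), where buckets[c] lists the indices of the
    vectors assigned to centroid c.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    dim = len(vectors[0])
    buckets = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            nearest = min(range(k),
                          key=lambda c: euclidean_distance(v, centroids[c]))
            buckets[nearest].append(i)
        # Move each centroid to the mean of its members.
        for c, members in enumerate(buckets):
            if members:
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, buckets


def approx_nearest(query, vectors, centroids, buckets, n_probe=1):
    """Approximate search: scan only the n_probe closest non-empty buckets."""
    non_empty = [c for c in range(len(centroids)) if buckets[c]]
    probed = sorted(non_empty,
                    key=lambda c: euclidean_distance(query, centroids[c]))
    candidates = [i for c in probed[:n_probe] for i in buckets[c]]
    return min(candidates,
               key=lambda i: euclidean_distance(query, vectors[i]))


# Usage: 200 random 8-dimensional vectors, 10 clusters, probe 2 of them.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_clusters(data, k=10)
query = [rng.random() for _ in range(8)]
exact_idx = exact_nearest(query, data)
approx_idx = approx_nearest(query, data, centroids, buckets, n_probe=2)
```

Raising `n_probe` trades speed for accuracy: probing every bucket degenerates to the exact linear scan, while probing one bucket is fastest but may miss the true nearest neighbor when it sits just across a cluster boundary.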

★ Support this podcast on Patreon ★