We're sunsetting PodQuest on 2025-07-28. Thank you for your support!

Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270

2024/10/22

Bernie Wu) is VP of Business Development for MemVerge). He has 25+ years of experience as a senior executive for data center hardware and software infrastructure companies including companies such as Conner/Seagate, Cheyenne Software, Trend Micro, FalconStor, Levyx, and MetalSoft.

Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // MLOps Podcast #270 with Bernie Wu, VP Strategic Partnerships/Business Development of MemVerge.

// Abstract Limited memory capacity hinders the performance and potential of research and production environments utilizing Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques. This discussion explores how leveraging industry-standard CXL memory can be configured as a secondary, composable memory tier to alleviate this constraint.

We will highlight some recent work we’ve done in integrating of this novel class of memory into LLM/RAG/vector database frameworks and workflows.

Disaggregated shared memory is envisioned to offer high performance, low latency caches for model/pipeline checkpoints of LLM models, KV caches during distributed inferencing, LORA adaptors, and in-process data for heterogeneous CPU/GPU workflows. We expect to showcase these types of use cases in the coming months.

// Bio Bernie is VP of Strategic Partnerships/Business Development for MemVerge. His focus has been building partnerships in the AI/ML, Kubernetes, and CXL memory ecosystems. He has 25+ years of experience as a senior executive for data center hardware and software infrastructure companies including companies such as Conner/Seagate, Cheyenne Software, Trend Micro, FalconStor, Levyx, and MetalSoft. He is also on the Board of Directors for Cirrus Data Solutions. Bernie has a BS/MS in Engineering from UC Berkeley and an MBA from UCLA.

// MLOps Swag/Merch https://mlops-community.myshopify.com/)

// Related Links Website: www.memverge.com) Accelerating Data Retrieval in Retrieval Augmentation Generation (RAG) Pipelines using CXL: https://memverge.com/accelerating-data-retrieval-in-rag-pipelines-using-cxl/)

--------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack) Follow us on Twitter: @mlopscommunity) Sign up for the next meetup: https://go.mlops.community/register) Catch all episodes, blogs, newsletters, and more: https://mlops.community/)

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/) Connect with Bernie on LinkedIn: https://www.linkedin.com/in/berniewu/)

Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270

MLOps.community

Shownotes Transcript

Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270 55:18 Share

MLOps.community

Shownotes Transcript

Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270