An AI engineer fills a hybrid role that combines data science, software engineering, and ML engineering. AI engineers select models, optimize them for specific tasks using techniques like fine-tuning and RAG, and deploy them into production. Their responsibilities include choosing the right LLM, building baseline models, and ensuring models meet business requirements.
AI engineers are in demand because they bridge the gap between data science, software engineering, and ML engineering. There are currently around 4,000 job openings for LLM engineers in the U.S., comparable to the number of data science jobs.
AI engineers evaluate models based on data quality, evaluation criteria, and non-functional requirements like budget and time to market. They often start with closed-source models like GPT-4 for prototyping and may switch to open-source models if proprietary data or privacy concerns are involved.
Key techniques include fine-tuning models with domain-specific data, RAG (Retrieval Augmented Generation) for enhancing responses with relevant context, and agentic AI for creating autonomous, proactive systems that can solve complex problems and use tools.
RAG (retrieval-augmented generation) is a technique in which relevant documents or other information are retrieved from a database and supplied to an LLM to improve its responses. The query is encoded into a vector, and the documents whose vectors lie closest to it are passed to the model as context.
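The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular RAG library works: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine_similarity`, `retrieve`, and `build_prompt` are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend the retrieved documents to the question as context for an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
```

In a production system the toy `embed` would be replaced by a learned embedding model and the sorted list by a vector database lookup, but the flow is the same: encode the query, find the nearest documents, and hand them to the model alongside the question.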
Agentic AI refers to systems that can autonomously solve complex problems by breaking them into smaller steps, using tools, and even acting proactively beyond a single interaction. For example, an agentic AI could detect a price drop for a flight and notify the user without being prompted.
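The flight example can be sketched as a small monitoring loop. This is only an illustration of the proactive part, under stated assumptions: `watch_price`, `price_feed`, `notify`, and the 10% `drop_threshold` are hypothetical names and values, and a fixed rule stands in for the LLM that would normally decide when to call a tool.

```python
from typing import Callable, Iterable, Optional


def watch_price(
    price_feed: Iterable[float],
    notify: Callable[[str], None],
    drop_threshold: float = 0.10,
) -> None:
    """Notify the user, unprompted, when the price falls far enough below the baseline.

    price_feed stands in for a tool that polls a flight-price source;
    notify stands in for a tool that messages the user.
    """
    baseline: Optional[float] = None
    for price in price_feed:
        if baseline is None:
            # The first observed price becomes the baseline.
            baseline = price
            continue
        if price <= baseline * (1 - drop_threshold):
            notify(f"Price dropped from {baseline:.2f} to {price:.2f}")
            # Reset the baseline so the same drop is not reported twice.
            baseline = price
```

For example, calling `watch_price([400.0, 395.0, 350.0, 349.0], print)` stays silent for the small dip to 395 and sends a single notification when the price reaches 350.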
Important benchmarks include GPQA (Graduate-Level Google-Proof Q&A) for expert-level knowledge, MMLU-Pro for language understanding, and BBH (BIG-Bench Hard) for testing advanced capabilities like sarcasm detection. These benchmarks help evaluate model performance across various tasks.
AI engineers can deploy models using platforms like modal.com for serverless deployment, Lightning Studios for moving from prototype to production, or Docker and Kubernetes for full production services. For agentic AI, platforms like LangGraph and CrewAI Enterprise can be used to deploy multi-agent systems.
Outsmart is a game where four LLMs compete against each other in a strategic environment. Each model starts with 12 coins and must decide whom to take coins from and whom to give coins to, using private messages to strategize. The game evaluates how well models can form alliances and outsmart each other, and it assigns each model an Elo rating based on its performance.
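For readers unfamiliar with Elo ratings, the standard two-player update can be sketched as follows. How Outsmart adapts the rating to four-player games is not described here, so this shows only the textbook formula; the function names and the K-factor of 32 are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return both players' new ratings after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, and 0.0 if A loses.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

Two equally rated players each have an expected score of 0.5, so a win moves the winner up by half the K-factor and the loser down by the same amount. Beating a much higher-rated opponent earns more points than beating a lower-rated one.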
Useful leaderboards include Hugging Face's Open LLM Leaderboard, Vellum.ai for cost and context window comparisons, and LMArena.ai (formerly LMSYS) for head-to-head human evaluations. These leaderboards help compare models based on performance, cost, and hardware requirements.
Ed Donner co-founded the AI-driven recruitment platform Nebula.io with Jon Krohn, host of The SuperDataScience Podcast. In this episode, Ed and Jon reminisce about how they launched their company and discuss the growing opportunities for data scientists, how to choose an LLM, and today’s top technical terms in AI.
Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.
In this episode you will learn:
(11:15) What an AI engineer does
(19:23) Defining today’s key terms in AI: RAG, fine-tuning, and agentic AI
(27:09) How to select an LLM
(49:41) Pitting LLMs against each other in a game
(53:14) What to do once you’ve selected an AI model
Additional materials: www.superdatascience.com/847