Yifan Mai left Google to work at Stanford because he wanted to build infrastructure that supports scientific researchers, rather than pursue the faculty track or publish research himself. He enjoys being close to research and enabling other researchers through open-source software.
The HELM (Holistic Evaluation of Language Models) project, which Yifan maintains at Stanford, benchmarks the performance of large language models (LLMs) across a wide range of tasks and use cases. It provides a standardized, transparent framework for evaluation, so users can compare how different models perform on the same benchmarks.
Open-weight models are those whose parameters (weights) are available for anyone to download and run locally, such as Meta's LLaMA. Closed-weight models, on the other hand, such as OpenAI's GPT models or Google's Gemini, are only accessible through the company's API or services, and their parameters are not publicly available.
Evaluating LLMs in high-stakes domains like medicine or law is challenging because it requires domain-specific benchmarks and expert evaluation. For example, medical advice given by an LLM needs to be assessed by a real doctor, and legal advice requires verification against existing case law. These evaluations are complex and often require human judgment, which is difficult to automate.
The 'win rate' in the HELM project is a metric that measures the probability that a model outperforms another model chosen at random on a given benchmark, averaged across benchmarks. Aggregating results this way gives a single number that summarizes how models compare to each other across many different tasks.
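For intuition, here is a minimal sketch of how such a mean win rate could be computed from per-benchmark scores. This is not HELM's actual implementation (that lives in the stanford-crfm/helm repository); the model names, benchmark names, and scores below are made up for illustration.

```python
# Minimal sketch of a mean win rate, assuming higher scores are better.
# Models, benchmarks, and scores are hypothetical, for illustration only.

scores = {
    "model_a": {"benchmark_1": 0.71, "benchmark_2": 0.55, "benchmark_3": 0.62},
    "model_b": {"benchmark_1": 0.64, "benchmark_2": 0.60, "benchmark_3": 0.58},
    "model_c": {"benchmark_1": 0.58, "benchmark_2": 0.41, "benchmark_3": 0.49},
}

def mean_win_rate(model: str, scores: dict) -> float:
    """Fraction of (benchmark, opponent) pairs where `model` scores higher."""
    wins, comparisons = 0, 0
    for benchmark in scores[model]:
        for opponent in scores:
            if opponent == model:
                continue
            comparisons += 1
            if scores[model][benchmark] > scores[opponent][benchmark]:
                wins += 1
    return wins / comparisons

for model in scores:
    print(model, round(mean_win_rate(model, scores), 2))
```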
Yifan Mai highlights several potential harms of LLMs, including the generation of harmful outputs like instructions for building bombs or political disinformation. There are also concerns about bias, fairness, and labor displacement, as well as the ethical implications of using AI in high-stakes applications like unemployment benefits processing.
Yifan Mai is optimistic about the future of AI accessibility, particularly with the improvement of smaller, more efficient models that can run on consumer-grade hardware like MacBooks. He believes this will make AI more evenly distributed, though he remains concerned about who gets to decide how the technology is used and the power dynamics involved.
Yifan Mai advises aspiring AI engineers to focus on fundamentals: programming, solid software engineering practices, and foundational knowledge for AI such as probability and statistics. He believes these fundamentals will remain crucial regardless of which specific AI technologies emerge in the future.
On this week's episode of the podcast, freeCodeCamp founder Quincy Larson interviews Yifan Mai, a former Senior Software Engineer on Google's TensorFlow team who left the private sector to do AI research at Stanford. He's the lead maintainer of the open-source HELM project, which benchmarks the performance of large language models.
We talk about:
- Open source vs. open weights in LLMs
- The ragged frontier of LLM use cases
- AI's impact on jobs, and our predictions
- What to learn so you can stay above the waterline
Can you guess what song I'm playing in the intro? I put the entire cover song at the end of the podcast if you want to listen to it, and you can watch me play all the instruments on the YouTube version of this episode.
Also, I want to thank the 10,993 kind people who support our charity each month, and who make this podcast possible. You can join them and support our mission at: https://www.freecodecamp.org/donate
Links we talk about during our conversation:
Yifan's personal webpage: yifanmai.com
HELM Leaderboards: https://crfm.stanford.edu/helm/
HELM GitHub Repository: https://github.com/stanford-crfm/helm
Stanford HAI: https://hai.stanford.edu