Yifan Mai left Google to work at Stanford because he wanted to focus on research and on building open-source software that supports academic researchers. He enjoys being closer to the research process and providing researchers with the infrastructure they need.
The HELM project (Holistic Evaluation of Language Models) is a research initiative that benchmarks the performance of large language models (LLMs) across a wide range of tasks. It provides a standardized, transparent way to evaluate models, so users can compare models head-to-head and run their own evaluations with the same framework.
Open-weight models allow users to download the weights and run the model locally on their own machines, giving them full control over inputs and outputs. Closed-weight models, like GPT-4 and Google Gemini, are accessible only through company APIs or services, so users cannot inspect the model's parameters or run it locally.
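To make that distinction concrete, here is a minimal sketch of running an open-weight model locally. It assumes the Hugging Face transformers library and the small, openly available gpt2 checkpoint; it is an illustration of "local control," not part of HELM or anything discussed in the episode.

```python
# Minimal sketch: running an open-weight model locally with Hugging Face
# transformers (assumes `pip install transformers torch` and the small,
# openly available "gpt2" checkpoint). Closed-weight models such as GPT-4
# are reachable only through a vendor API, so there is no equivalent local call.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any open-weight checkpoint you have the rights to use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Everything below runs on your own machine: you control the inputs,
# the decoding parameters, and what happens to the outputs.
inputs = tokenizer("Open-weight models let you", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```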
Evaluating LLMs in high-stakes domains like law or medicine is challenging because it requires expert judgment to assess the accuracy and usefulness of the model's outputs. For example, medical advice given by an LLM would need to be verified by a doctor, and legal advice would need to be checked against existing case law.
The 'win rate' is a metric that measures the probability that a model outperforms another, randomly selected model on a randomly selected benchmark. Aggregating these pairwise comparisons across many benchmarks gives an overall sense of a model's comparative performance.
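Here is a back-of-the-envelope sketch of that idea with made-up model names and scores. It is not HELM's actual code or results; it just shows how averaging pairwise wins across benchmarks yields a single comparative number.

```python
# Illustrative mean win rate (hypothetical scores, not HELM's real numbers
# or implementation): for each benchmark, count how often a model beats each
# other model, then average those pairwise wins across all benchmarks.
scores = {
    "benchmark_a": {"model_x": 0.81, "model_y": 0.74, "model_z": 0.69},
    "benchmark_b": {"model_x": 0.55, "model_y": 0.61, "model_z": 0.58},
    "benchmark_c": {"model_x": 0.90, "model_y": 0.88, "model_z": 0.71},
}

def mean_win_rate(model, scores):
    """Probability that `model` outperforms a randomly chosen other model
    on a randomly chosen benchmark (ties counted as losses here)."""
    wins, comparisons = 0, 0
    for results in scores.values():
        for other, other_score in results.items():
            if other == model:
                continue
            wins += results[model] > other_score
            comparisons += 1
    return wins / comparisons

for m in ["model_x", "model_y", "model_z"]:
    print(m, round(mean_win_rate(m, scores), 2))
```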
Yifan Mai highlights several ethical concerns, including the potential for LLMs to generate harmful outputs like instructions for building bombs or spreading disinformation. There are also concerns about bias in models, labor displacement, and the uneven distribution of power between big tech companies and workers.
Yifan Mai is optimistic about the increasing accessibility of AI, particularly with the development of smaller, more efficient models that can run on consumer-grade hardware. However, he remains concerned about who gets to decide how these tools are used and the potential for power imbalances in their deployment.
Yifan Mai advises aspiring engineers to focus on fundamentals: programming, solid software engineering practices, and foundational knowledge of AI. He believes these skills will remain valuable regardless of which specific technologies are trending.
On this week's episode of the podcast, freeCodeCamp founder Quincy Larson interviews Yifan Mai, a former Senior Software Engineer on Google's TensorFlow team who left the private sector to do AI research at Stanford. He's the lead maintainer of the open-source HELM project, where he benchmarks the performance of large language models.
We talk about:
- Open Source vs. Open Weights in LLMs
- The ragged frontier of LLM use cases
- AI's impact on jobs, and our predictions
- What to learn so you can stay above the waterline
Can you guess what song I'm playing in the intro? I put the entire cover song at the end of the podcast if you want to listen to it, and you can watch me play all the instruments on the YouTube version of this episode.
Also, I want to thank the 10,993 kind people who support our charity each month, and who make this podcast possible. You can join them and support our mission at: https://www.freecodecamp.org/donate
Links we talk about during our conversation:
Yifan's personal webpage: yifanmai.com
HELM Leaderboards: https://crfm.stanford.edu/helm/
HELM GitHub Repository: https://github.com/stanford-crfm/helm
Stanford HAI Blog: https://hai.stanford.edu/news