“Which evals resources would be good?” by Marius Hobbhahn

2024/11/16

I want to make a serious effort to create a bigger evals field. I’m very interested in which resources you think would be most helpful. I’m also looking for others to contribute and potentially some funding.

Which resources would be most helpful? Suggestions include:

- Open Problems in Evals list: A long list of relevant open problems and projects in evals.
- More Inspect Tutorials/Examples: Specifically, examples with deep explanations, examples of complex agent setups, and more detailed examples of the components around the evals.
- Evals playbook: A detailed guide on how to build evals, with detailed examples for agentic and non-agentic evals.
- Salient demos: I don’t even necessarily want to build scary demos; I just want normal people to be less hit-by-a-truck when they learn about LM agent capabilities in the near future.
- More “day-to-day” evals process resources: A lot of evals expertise is latent knowledge that professional evaluators [...]

First published: November 16th, 2024

Source: https://www.lesswrong.com/posts/uoAAGzn4hPAHLeG2Y/which-evals-resources-would-be-good

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)