
Reasoning, Robustness, and Human Feedback in AI - Max Bartolo (Cohere)

2025/3/18

Machine Learning Street Talk (MLST)

People
Max Bartolo
Topics
Max Bartolo: I work on AI model research at Cohere, focusing mainly on model reasoning, robustness, and practical utility. My research spans model verification, adversarial data collection, model evaluation, and human feedback mechanisms. I've found that model reasoning is not simple pattern matching; it combines pattern matching with rule-based reasoning. Model robustness is critical and must be continuously improved through dynamic benchmarking and adversarial data collection. Human feedback plays an important role in model training and evaluation, but it is not a gold standard: human preferences are influenced by many factors, such as formatting, style, and confidence. We therefore need to analyze human feedback in a more fine-grained way and dynamically adapt model behavior to individual user preferences. Context window size also matters, and requires balancing performance against efficiency. Going forward, we need to develop more general reasoning models and pay more attention to the value models deliver in real-world applications.
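To make the preference-confound point concrete, here is a minimal sketch (not Cohere's pipeline; the records and field names are made up) that checks whether raters' pairwise preferences track a surface feature like response length rather than quality:

```python
# Hypothetical check for a length/style confound in pairwise human feedback.
# Each record holds two candidate responses and the rater's preference.
from statistics import mean

judgments = [
    {"a": "Short answer.", "b": "A much longer, hedged, more detailed answer.", "preferred": "b"},
    {"a": "Concise and correct.", "b": "Verbose but no more correct, padded with repetition.", "preferred": "b"},
    {"a": "A brief reply that happens to be wrong.", "b": "Right.", "preferred": "a"},
]

def longer_wins(judgment: dict) -> bool:
    """Did the rater pick the longer of the two responses?"""
    longer = "a" if len(judgment["a"]) > len(judgment["b"]) else "b"
    return judgment["preferred"] == longer

rate = mean(longer_wins(j) for j in judgments)
print(f"Longer response preferred in {rate:.0%} of comparisons")
# A rate far above 50% suggests length/style is confounding the signal,
# i.e. the feedback is not measuring answer quality alone.
```

The "Human Feedback is not Gold Standard" paper cited in the references below studies confounds of this kind (e.g., assertiveness and complexity) in human preference judgments.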


Chapters
This chapter explores the challenges of ensuring AI model consistency and robustness. It asks whether models truly reason or simply excel at specific benchmarks, and highlights the need for reliable performance.
  • Machines are consistently expected to be right all the time.
  • Model consistency and robustness are crucial; if a model fails inconsistently, that casts doubt on its reasoning capabilities (a toy consistency check is sketched after this list).
  • Humans are adept at finding examples where models fail, suggesting a lack of genuine reasoning.
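As a minimal illustration of the consistency concern raised above, the sketch below asks the same question in several paraphrased forms and checks whether the answers agree. `ask_model` is a hypothetical stub standing in for any chat-model API; nothing here is specific to Cohere's models.

```python
# Toy paraphrase-consistency probe: a model that genuinely reasons should
# give the same answer to semantically equivalent prompts.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stub; replace with a real model API call.
    return "356"

paraphrases = [
    "What is 89 multiplied by 4?",
    "Compute 89 x 4.",
    "One box holds 89 items. How many items are in 4 boxes?",
]

answers = [ask_model(p) for p in paraphrases]
top_answer, freq = Counter(answers).most_common(1)[0]
consistency = freq / len(answers)
print(f"Answers: {answers} -> consistency {consistency:.0%}")
# Low consistency across paraphrases is exactly the kind of inconsistent
# failure that casts doubt on claimed reasoning ability.
```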

Shownotes

Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.
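For readers unfamiliar with influence functions, the standard Koh & Liang formulation (their paper is listed in the references below) estimates how up-weighting a single training example z would change the loss on a test example z_test; this is the textbook form, not a claim about Cohere's exact implementation:

```latex
% Influence of training point z on test point z_test (Koh & Liang, 2017):
\mathcal{I}(z, z_{\text{test}})
  = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}\,
     \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})
```

Grosse et al. (also cited below) develop approximations that make quantities like this tractable for large language models.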

Max Bartolo (Cohere):

https://www.maxbartolo.com/

https://cohere.com/command

TRANSCRIPT:

https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

TOC:

  1. Model Reasoning and Verification

    [00:00:00] 1.1 Model Consistency and Reasoning Verification

    [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis

    [00:10:28] 1.3 AI Application Development and Model Deployment

    [00:14:24] 1.4 AI Alignment and Human Feedback Limitations

  2. Evaluation and Bias Assessment

    [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment

    [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior

    [00:32:43] 2.3 Adversarial Examples and Model Robustness

  3. Benchmarking Systems and Methods

    [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches

    [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics

    [00:50:33] 3.3 Evolution of Model Benchmarking Methods

    [00:51:15] 3.4 Hierarchical Capability Testing Framework

    [00:52:35] 3.5 Benchmark Platforms and Tools

  4. Model Architecture and Performance

    [00:55:15] 4.1 Cohere's Model Development Process

    [01:00:26] 4.2 Model Quantization and Performance Evaluation

    [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards

    [01:08:27] 4.4 Training Progression and Technical Challenges

  5. Future Directions and Challenges

    [01:13:48] 5.1 Context Window Evolution and Trade-offs

    [01:22:47] 5.2 Enterprise Applications and Future Challenges

REFS:

[00:03:10] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models, Laura Ruis et al.

https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20

[00:04:15] Understanding Black-box Predictions via Influence Functions, Pang Wei Koh & Percy Liang

https://arxiv.org/abs/1703.04730

[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.

https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf

[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI

https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

[00:13:30] OpenInterpreter

https://github.com/KillianLucas/open-interpreter

[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom

https://arxiv.org/abs/2309.16349

[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.

https://arxiv.org/abs/2404.16019

[00:32:50] Adversarial Examples Are Not Bugs, They Are Features, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

https://arxiv.org/abs/1905.02175

[00:43:00] Dynabench: Rethinking Benchmarking in NLP, Douwe Kiela et al.

https://aclanthology.org/2021.naacl-main.324.pdf

[00:50:15] On the Limitations of Compute Thresholds as a Governance Strategy, Sara Hooker

https://arxiv.org/html/2407.05694v1

[00:53:25] DataPerf: Benchmarks for Data-Centric AI Development, Mark Mazumder et al.

https://arxiv.org/abs/2207.10062

[01:04:35] DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, Dheeru Dua et al.

https://arxiv.org/abs/1903.00161

[01:07:05] GSM8k, Cobbe et al.

https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

[01:09:30] ARC (Abstraction and Reasoning Corpus), François Chollet

https://github.com/fchollet/ARC-AGI

[01:15:50] Command A, Cohere

https://cohere.com/blog/command-a

[01:22:55] Enterprise search using LLMs, Cohere

https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers