“The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed. … We could summarize this as a ‘country of geniuses in a datacenter’.”
Dario Amodei, CEO of Anthropic, Machines of Loving Grace

“Let's say each copy of GPT-4 is producing 10 words per second. It turns out they would be able to run something like 300,000 copies of GPT-4 in parallel. And by the time they are training GPT-5 it will be a more extreme situation where, just using the computer chips they used to train GPT-5, using them to run copies of GPT-5 in parallel, each producing 10 words per second, they’d be able to run 3 million copies of GPT-5 in parallel. And for [...]
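The back-of-envelope behind the second quote can be reproduced in a few lines of arithmetic. The sketch below is illustrative only: the cluster throughput, parameter count, and tokens-per-word figures are assumptions chosen to land near the quoted ~300,000 figure, not numbers taken from the post.

```python
# Illustrative back-of-envelope: how many copies of a model could the
# training cluster run in parallel? All constants are assumptions.

TRAINING_CLUSTER_FLOPS = 1.5e19  # assumed sustained FLOP/s of the training cluster
MODEL_PARAMS = 1.8e12            # assumed parameter count (roughly GPT-4 scale)
WORDS_PER_SECOND = 10            # per-copy generation speed from the quote
TOKENS_PER_WORD = 1.3            # rough tokens-per-word conversion

# Standard approximation: ~2 FLOP per parameter per generated token.
flops_per_token = 2 * MODEL_PARAMS
flops_per_copy = flops_per_token * WORDS_PER_SECOND * TOKENS_PER_WORD

parallel_copies = TRAINING_CLUSTER_FLOPS / flops_per_copy
print(f"≈ {parallel_copies:,.0f} copies running in parallel")
```

With these assumed inputs the result comes out in the low hundreds of thousands, the same order of magnitude as the quote's 300,000 copies of GPT-4.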
Outline:
(02:28) Section I - The Question
(05:13) Section II - The Scenario
(10:54) Section III - Existing Estimates
(19:53) Section IV - Compute
(27:16) Section V - Inference
(39:15) Section VI - Human Equivalents
(45:38) Section VII - The Estimates
(46:04) Method 1: Total training to inference per token ratio
(50:40) Method 2: Flat inference costs
(53:39) Method 3: Human brain equivalent
(55:02) Method 4: Chip capabilities
(58:00) Method 5: Adjusting for capabilities per token
(59:01) Section VIII - Implications
The original text contained 4 footnotes which were omitted from this narration.
First published: November 26th, 2024
Source: https://www.lesswrong.com/posts/CH9mkk6BqASf3uztv/counting-agis
---
Narrated by TYPE III AUDIO.