Energy of LLMs
#data #LLM #commentary
The rapid increase in model parameters and training data for LLMs has led to a significant rise in compute and power requirements. Let's delve into the compute and energy demands of LLMs by examining details from OpenAI's GPT-3 publication.
Compute needs are often expressed in petaflop/s-days. One petaflop/s equals one quadrillion (\(10^{15}\)) floating-point operations per second, and "days" indicates how long that rate is sustained.
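To put the unit in concrete terms (this conversion is plain arithmetic, not a figure from the paper): \( 1 \text{ petaflop/s-day} = 10^{15}\ \tfrac{\text{FLOP}}{\text{s}} \times 86{,}400\ \text{s} \approx 8.6 \times 10^{19} \) floating-point operations.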
Parameter and Compute Relationship
The GPT-3 paper details the parameters and compute used during training for eight different model sizes:
- GPT-3 Small - 0.125B
- GPT-3 Medium - 0.350B
- GPT-3 Large - 0.760B
- GPT-3 XL - 1.3B
- GPT-3 2.7B - 2.7B
- GPT-3 6.7B - 6.7B
- GPT-3 13.0B - 13.0B
- GPT-3 175.0B ("GPT-3") - 175.0B
The training compute for each model size can be estimated from the compute figures reported in the paper. For these GPT-3 models, compute scales roughly linearly with size: every additional 1 billion parameters requires approximately 20 petaflop/s-days of compute. The commercial 175-billion-parameter GPT-3 works out to roughly 3,470 petaflop/s-days.
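A minimal sketch of this rule of thumb, assuming the ~20 petaflop/s-days per billion parameters slope above (the constant and function name are my own, not from the paper; the linear rule gives ~3,500 petaflop/s-days for 175B, close to the ~3,470 estimate):

```python
# Back-of-envelope sketch: estimate training compute from parameter count
# using the ~20 petaflop/s-days per billion parameters rule of thumb above.

PFLOP_S_DAYS_PER_BILLION_PARAMS = 20.0  # rough slope for GPT-3-family models (assumption)

def training_compute_pf_days(params_billions: float) -> float:
    """Estimated training compute, in petaflop/s-days, for a GPT-3-style model."""
    return params_billions * PFLOP_S_DAYS_PER_BILLION_PARAMS

# A few of the model sizes listed above:
for name, size_b in [("GPT-3 XL", 1.3), ("GPT-3 13B", 13.0), ("GPT-3 175B", 175.0)]:
    print(f"{name}: ~{training_compute_pf_days(size_b):,.0f} petaflop/s-days")
```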
Compute and Energy Resources
The world's fastest supercomputer, Frontier at Oak Ridge National Laboratory, cost about $600 million to build and has a peak performance of 1.6 exaflop/s (1,600 petaflop/s). Training GPT-3 on Frontier would take just over two days of continuous computation: \( \frac{3470 \text{ petaflop/s-days}}{1600 \text{ petaflop/s}} \approx 2.2 \text{ days}\).
Frontier consumes 21 MW of power at peak usage, which over 2.2 days of training works out to roughly \(21 \text{ MW} \times 2.2 \text{ days} \times 24 \text{ h} \approx 1{,}100\) MWh of energy. Given that the average U.S. household uses about 10,800 kWh annually, training GPT-3 consumes roughly as much energy as 100 homes use in one year.
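The same back-of-envelope math as a small sketch; all inputs are the rough figures quoted above (assumptions, not measured values):

```python
# Sketch of the Frontier back-of-envelope math above.

GPT3_COMPUTE_PF_DAYS = 3470        # estimated GPT-3 training compute
FRONTIER_PEAK_PFLOP_S = 1600       # Frontier peak performance, petaflop/s
FRONTIER_POWER_MW = 21             # Frontier power draw at peak, MW
HOUSEHOLD_KWH_PER_YEAR = 10_800    # average annual U.S. household usage

days = GPT3_COMPUTE_PF_DAYS / FRONTIER_PEAK_PFLOP_S       # ~2.2 days
energy_mwh = FRONTIER_POWER_MW * days * 24                # ~1,100 MWh
households = energy_mwh * 1_000 / HOUSEHOLD_KWH_PER_YEAR  # ~100 homes for a year

print(f"Training time: {days:.1f} days on Frontier")
print(f"Energy: {energy_mwh:,.0f} MWh (~{households:.0f} U.S. households for a year)")
```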
Conclusion
Although OpenAI has not disclosed training details beyond GPT-3, unofficial estimates put GPT-4 at roughly 1.76 trillion parameters. Assuming similar training efficiency, GPT-4 would require around 35,200 petaflop/s-days of compute, 22 days of continuous training on Frontier, and energy equivalent to more than 1,000 households' annual usage. The immense scale of current and future LLMs introduces compute and power challenges that earlier, smaller models never faced. Companies are now focused on expanding computational capacity, exploring sustainable power options, and improving training efficiency to meet these demands.
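Extending the same sketch to GPT-4, using the unofficial ~1.76T-parameter estimate and carrying over the same rough assumptions as above:

```python
# GPT-4 back-of-envelope estimate, reusing the rough constants from the text.

GPT4_PARAMS_BILLIONS = 1760           # unofficial parameter-count estimate
PFLOP_S_DAYS_PER_BILLION_PARAMS = 20.0
FRONTIER_PEAK_PFLOP_S = 1600
FRONTIER_POWER_MW = 21
HOUSEHOLD_KWH_PER_YEAR = 10_800

compute_pf_days = GPT4_PARAMS_BILLIONS * PFLOP_S_DAYS_PER_BILLION_PARAMS  # ~35,200
days = compute_pf_days / FRONTIER_PEAK_PFLOP_S                            # ~22 days
energy_mwh = FRONTIER_POWER_MW * days * 24                                # ~11,000 MWh
households = energy_mwh * 1_000 / HOUSEHOLD_KWH_PER_YEAR                  # ~1,000 homes

print(f"GPT-4 estimate: {compute_pf_days:,.0f} petaflop/s-days, "
      f"{days:.0f} days on Frontier, ~{households:,.0f} households of annual energy")
```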