Thursday, May 30, 2024

Power Consumption Trends from Increased AI and Data Center Utilization

U.S. data center distribution, aggregated by general classification: commercial, co-location, and hyperscaler. As of 2022, approximately 2,701 data centers were operational in the U.S., with the largest geographic concentrations in California, Texas, Virginia, and New York, illustrating how population density and grid infrastructure shape data center siting.

Electric Power Research Institute (EPRI)

Power Consumption Trends from Increased AI and Data Center Utilization

Jordan Aljbour | Technology Innovation – Strategic Research | August 2023

Summary

The document discusses the increasing power consumption trends driven by the growth of AI and data center utilization. Key points include:

1. AI models, particularly those used for process automation, predictive analytics, and natural language processing, require intensive resources and have a significant carbon footprint.

2. Data centers, supporting digital services like cloud computing and AI, account for a small but growing percentage of total electricity consumption in the US and worldwide.

3. Factors driving increased power consumption include the explosion of data, increasing complexity of AI models, and the 24/7 operation of data centers.

4. Strategies to manage efficiency, usage, and environmental impact include developing efficient AI algorithms, optimizing hardware, and utilizing renewable energy sources for data centers.

5. Future scenarios range from optimistic (50% reduction in energy use with successful adoption of efficient technologies) to pessimistic (doubling of energy use without significant advances and adoption of green practices).

6. The document emphasizes the need for ongoing research, collaboration between tech companies and policymakers, and the establishment of best practices and regulations to mitigate the environmental impact of AI and data centers while leveraging their benefits.

Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption (2024)

KEY TAKEAWAYS

AI and data center systems, despite their transformative impact, pose substantial environmental challenges due to high energy consumption. This growing energy demand underscores a critical trade-off between digital innovation and environmental sustainability. However, advances in software and hardware efficiency present promising strategies to mitigate this energy footprint, paving the way towards a more sustainable future.

The future leans towards greener AI and data centers, but this direction hinges on relentless research and collaboration between tech companies and policymakers to establish best practices and regulations. In doing so, we can continue to leverage the transformative benefits of AI and data centers while remaining cognizant of our environmental responsibilities.

Summary

The 2024 white paper "Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption" builds upon and expands the insights from the August 2023 report "Power Consumption Trends from Increased AI and Data Center Utilization." Key additions and changes include:

  1. Updated projections of U.S. data center electricity consumption to 2030, with four scenarios (low, moderate, high, and higher growth) based on the latest available data and expert insights. The 2024 report projects data centers could consume 4.6% to 9.1% of total U.S. electricity by 2030, up from an estimated 4% in 2023 (see the scenario sketch after this list).
  2. Detailed analysis of the potential impact of generative AI models like ChatGPT on data center electricity demand. The 2024 report highlights how these AI applications are much more energy-intensive than traditional data center workloads.
  3. State-level projections of data center electricity consumption, illustrating the geographic concentration of the industry. The 2024 report notes that 15 states accounted for 80% of national data center load in 2023.
  4. Discussion of strategies to manage data center efficiency, usage, and environmental impact, including energy-efficient algorithms, hardware, and cooling technologies; scalable clean energy use; and monitoring and analytics.
  5. A roadmap for supporting rapid data center expansion, emphasizing improved data center operational efficiency and flexibility, increased collaboration between data center developers and electric companies, and better anticipation of future point load growth through improved forecasting and modeling.
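
To make the compounding behind the four growth scenarios concrete, here is a small sketch that applies the scenario growth rates to a 2023 baseline. The baseline data center load (160 TWh), total U.S. consumption (4,000 TWh), and the 1.5% annual growth assumed for total consumption are illustrative assumptions consistent with the roughly 4% share cited for 2023; they are not the white paper's internal inputs.

```python
# Illustrative scenario arithmetic (assumed baselines, not EPRI's inputs):
# project data center load and its share of total U.S. electricity to 2030.
BASE_YEAR, TARGET_YEAR = 2023, 2030
DC_LOAD_2023_TWH = 160.0    # assumed: ~4% of the assumed total below
TOTAL_2023_TWH = 4000.0     # assumed total U.S. consumption in 2023
TOTAL_GROWTH = 0.015        # assumed annual growth of total consumption

scenarios = {"low": 0.037, "moderate": 0.05, "high": 0.10, "higher": 0.15}
years = TARGET_YEAR - BASE_YEAR
total_2030 = TOTAL_2023_TWH * (1 + TOTAL_GROWTH) ** years

for name, growth in scenarios.items():
    dc_2030 = DC_LOAD_2023_TWH * (1 + growth) ** years
    share = dc_2030 / total_2030
    print(f"{name:>8}: {dc_2030:5.0f} TWh in 2030 -> {share:.1%} of total")
```

Under these illustrative assumptions the 2030 shares come out in the same neighborhood as the report's 4.6% to 9.1% range; the exact endpoints depend on baseline choices the sketch does not attempt to reproduce.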

Overall, the 2024 white paper provides a more comprehensive and forward-looking analysis of the energy implications of AI and data center growth, reflecting the rapid evolution of these technologies and their increasing importance for the electricity sector.

State-Level Projected Data Center Load

The table below is sorted in ascending order by the projected 2030 data center share of state electricity consumption under the moderate-growth (5%) scenario. Virginia stands out: data centers are projected to account for roughly 31% of the state's electricity consumption under moderate growth, rising to 46% under the higher-growth (15%) scenario. Under that higher-growth scenario, North Dakota, Nebraska, Iowa, and Oregon are also projected to see data centers consume more than 20% of state electricity, with Nevada close behind at about 19% (a quick cross-check of these figures appears in the sketch after the table).

Projected 2030 data center electricity consumption as a share of each state's total consumption, under four annual growth scenarios (low 3.7%, moderate 5%, high 10%, higher 15%). The final column is the projected 2030 total state electricity consumption in TWh.

State          Low (3.7%)  Moderate (5%)  High (10%)  Higher (15%)  2030 Total (TWh)
New York            3.40%          3.69%       5.05%         6.75%             154.2
Pennsylvania        3.78%          4.11%       5.61%         7.49%             123.2
California          4.43%          4.81%       6.54%         8.70%             271.9
Georgia             5.08%          5.51%       7.48%         9.92%             157.4
Texas               5.47%          5.94%       8.04%        10.64%             514.4
New Jersey          6.46%          7.00%       9.44%        12.44%              74.6
Illinois            6.53%          7.08%       9.54%        12.56%             147.2
Washington          6.77%          7.34%       9.88%        13.00%              98.7
Arizona             8.81%          9.53%      12.73%        16.58%              91.6
Nevada             10.28%         11.10%      14.75%        19.07%              42.9
Oregon             13.39%         14.43%      18.93%        24.14%              61.8
Iowa               13.44%         14.48%      18.99%        24.21%              46.6
Nebraska           13.75%         14.81%      19.41%        24.71%              27.2
North Dakota       18.00%         19.31%      24.89%        31.11%              21.3
Virginia           29.28%         31.10%      38.47%        46.00%             149.5
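
As a quick cross-check on the discussion above, the sketch below multiplies each state's projected share by its projected 2030 total consumption to get the implied absolute data center load, and flags the states where data centers would exceed 20% of state electricity under the higher-growth (15%) scenario. The row values are copied from the table; the arithmetic is a derived illustration, not a figure reported by the white paper.

```python
# (state, low %, moderate %, high %, higher %, 2030 total TWh) -- from the table above
rows = [
    ("New York",      3.40,  3.69,  5.05,  6.75, 154.2),
    ("Pennsylvania",  3.78,  4.11,  5.61,  7.49, 123.2),
    ("California",    4.43,  4.81,  6.54,  8.70, 271.9),
    ("Georgia",       5.08,  5.51,  7.48,  9.92, 157.4),
    ("Texas",         5.47,  5.94,  8.04, 10.64, 514.4),
    ("New Jersey",    6.46,  7.00,  9.44, 12.44,  74.6),
    ("Illinois",      6.53,  7.08,  9.54, 12.56, 147.2),
    ("Washington",    6.77,  7.34,  9.88, 13.00,  98.7),
    ("Arizona",       8.81,  9.53, 12.73, 16.58,  91.6),
    ("Nevada",       10.28, 11.10, 14.75, 19.07,  42.9),
    ("Oregon",       13.39, 14.43, 18.93, 24.14,  61.8),
    ("Iowa",         13.44, 14.48, 18.99, 24.21,  46.6),
    ("Nebraska",     13.75, 14.81, 19.41, 24.71,  27.2),
    ("North Dakota", 18.00, 19.31, 24.89, 31.11,  21.3),
    ("Virginia",     29.28, 31.10, 38.47, 46.00, 149.5),
]

THRESHOLD = 20.0  # percent of state consumption, higher-growth scenario

for state, low, mod, high, higher, total_twh in rows:
    dc_moderate = total_twh * mod / 100       # implied TWh, moderate growth
    dc_higher = total_twh * higher / 100      # implied TWh, higher growth
    flag = "  <-- over 20% under higher growth" if higher > THRESHOLD else ""
    print(f"{state:<13} moderate: {dc_moderate:5.1f} TWh, "
          f"higher: {dc_higher:5.1f} TWh{flag}")
```

One detail the percentages alone obscure: Virginia's implied absolute load under higher growth (about 69 TWh) is not dramatically larger than Texas's (about 55 TWh), even though Virginia's share is more than four times higher, because Texas's projected total consumption is so much larger.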


LLM AI System Projections

The 2024 white paper highlights the potential impact of large language models (LLMs) and generative AI systems, such as ChatGPT, on data center electricity consumption. These AI applications are much more energy-intensive than traditional data center workloads, and their rapid growth could significantly influence future electricity demand.

Key points about the projected growth of LLM AI systems:

1. Demand surge: The release of ChatGPT in November 2022 triggered a surge in interest and demand for generative AI capabilities, with major tech companies such as Microsoft (Bing Chat), Alphabet, and Meta launching their own chatbots and LLMs.

2. Energy intensity: AI queries are estimated to require about 10 times more electricity than traditional searches. For example, a ChatGPT request uses about 2.9 watt-hours (Wh), compared to 0.3 Wh for a typical Google search.

3. Potential impact on search engines: If Google were to implement LLMs in every search, it could require an additional 80 gigawatt-hours (GWh) of electricity per day, or 29.2 terawatt-hours (TWh) per year. Another analysis suggests that this could necessitate around 400,000 additional servers, consuming 62.4 GWh daily or 22.8 TWh yearly (the unit conversions are spelled out in the sketch after this list).

4. Emerging capabilities: New AI applications, such as generating original music, photos, and videos based on user prompts, could require even more power than current LLMs.

5. Widespread adoption: With 5.3 billion global internet users, the widespread adoption of LLMs and generative AI tools could potentially lead to a significant increase in data center power requirements.
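
The figures in points 2 and 3 are consistent with straightforward unit conversions, spelled out below using only the numbers quoted above (no additional data).

```python
# Unit check on the per-query and fleet-level figures quoted above.
WH_PER_AI_QUERY = 2.9   # ChatGPT request, Wh (as cited)
WH_PER_SEARCH = 0.3     # typical Google search, Wh (as cited)
print(f"AI query vs. search: {WH_PER_AI_QUERY / WH_PER_SEARCH:.1f}x the energy")

# Daily-to-annual conversion (1 TWh = 1,000 GWh).
for label, gwh_per_day in [("LLMs in every Google search", 80.0),
                           ("400,000 additional servers", 62.4)]:
    twh_per_year = gwh_per_day * 365 / 1000
    print(f"{label}: {gwh_per_day} GWh/day ~ {twh_per_year:.1f} TWh/year")
```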

The white paper emphasizes that the ultimate impact of LLMs on data center energy consumption will depend on factors such as the rate of adoption, the development of new applications, and the efficiency of the underlying hardware and algorithms. Nonetheless, the rapid growth and increasing complexity of these AI systems highlight the need for continued research and collaboration to ensure sustainable growth in the data center industry.


1-bit LLMs Could Solve AI’s Energy Demands - IEEE Spectrum

“Imprecise” language models are smaller, speedier—and nearly as accurate

 Summary

This IEEE Spectrum article discusses the potential of 1-bit large language models (LLMs) to address the increasing energy demands and computational power required by AI systems like ChatGPT. Researchers are exploring ways to compress these models by drastically reducing the precision of the parameters that store their memories, allowing them to run on smaller devices like cellphones.

Two main approaches are mentioned:

1. Post-training quantization (PTQ): Quantizing the parameters of a full-precision network after training.
2. Quantization-aware training (QAT): Training a network from scratch to have low-precision parameters.

Recent research has shown promising results:

1. BiLLM, a PTQ method, approximates most parameters using 1 bit and a few salient weights using 2 bits, significantly reducing memory requirements while maintaining performance.
2. BitNet, a QAT method developed by Microsoft Research Asia, creates LLMs that are roughly 10 times as energy-efficient as full-precision models.
3. BitNet 1.58b, an improvement on BitNet, performs just as well as a full-precision LLaMA model with the same number of parameters and training but is faster, uses less GPU memory, and consumes less energy.
4. OneBit, a method combining PTQ and QAT, achieves a good balance between performance and memory usage.

The article suggests that the development of custom hardware optimized for 1-bit LLMs could further enhance their efficiency and speed. However, it also notes that developing new hardware is a long process, and 1-bit models and processors should grow together.


[Image: a glowing number 1 in front of an illustration of a brain with interconnections. Credit: Getty Images]

Large language models, the AI systems that power chatbots like ChatGPT, are getting better and better, but they're also getting bigger and bigger, demanding more energy and computational power. If LLMs are to become cheap, fast, and environmentally friendly, they'll need to shrink, ideally small enough to run directly on devices like cellphones. Researchers are finding ways to do just that by drastically rounding off the many high-precision numbers that store their memories to equal just 1 or -1.

LLMs, like all neural networks, are trained by altering the strengths of connections between their artificial neurons. These strengths are stored as mathematical parameters. Researchers have long compressed networks by reducing the precision of these parameters—a process called quantization—so that instead of taking up 16 bits each, they might take up 8 or 4. Now researchers are pushing the envelope to a single bit.
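
To see why bit width matters, consider the weight-storage footprint alone. The sketch below applies the simple formula parameters × bits ÷ 8 to a 13-billion-parameter model (the size binarized in the BiLLM experiment described below); it ignores activations, optimizer state, and the extra bits spent on scales or salient weights.

```python
# Rough weight-storage footprint at different precisions.
# bytes = parameters * bits_per_parameter / 8
PARAMS = 13e9  # a 13-billion-parameter model, as in the BiLLM experiment

for bits in (16, 8, 4, 1.58, 1):
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{bits:>5} bits/parameter -> ~{gigabytes:5.1f} GB of weights")
```

The "about a tenth of the memory" figure reported for BiLLM later in the article is broadly consistent with storing most weights at 1 bit, a minority at 2 bits, and some bookkeeping overhead on top.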

How to Make a 1-bit LLM

There are two general approaches. One approach, called post-training quantization (PTQ), is to quantize the parameters of a full-precision network after training. The other approach, quantization-aware training (QAT), is to train a network from scratch to have low-precision parameters. So far, PTQ has been more popular with researchers.

In February, a team including Haotong Qin at ETH Zurich, Xianglong Liu at Beihang University, and Wei Huang at the University of Hong Kong introduced a PTQ method called BiLLM. It approximates most parameters in a network using 1 bit, but represents a few salient weights—those most influential to performance—using 2 bits. In one test, the team binarized a version of Meta’s LLaMa LLM that has 13 billion parameters.

To score performance, the researchers used a metric called perplexity, which is basically a measure of how surprised the trained model was by each ensuing piece of text. For one dataset, the original model had a perplexity of around 5, and the BiLLM version scored around 15, much better than the closest binarization competitor, which scored around 37 (for perplexity, lower numbers are better). Meanwhile, the BiLLM model required only about a tenth of the memory capacity of the original.
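
Perplexity is simple to compute: it is the exponential of the model's average negative log-probability for each actual next token, so a perplexity of 5 roughly means the model is as uncertain as if it were choosing uniformly among 5 options. A minimal sketch, using made-up token probabilities rather than any real model's outputs:

```python
import math

# Probabilities a hypothetical model assigned to each actual next token
# of a short evaluation text (illustrative values only).
token_probs = [0.25, 0.10, 0.60, 0.05, 0.30]

# perplexity = exp(average negative log-likelihood per token)
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"average NLL: {avg_nll:.3f} nats, perplexity: {math.exp(avg_nll):.2f}")
```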

PTQ has several advantages over QAT, says Wanxiang Che, a computer scientist at Harbin Institute of Technology, in China. It doesn’t require collecting training data, it doesn’t require training a model from scratch, and the training process is more stable. QAT, on the other hand, has the potential to make models more accurate, since quantization is built into the model from the beginning.

1-bit LLMs Find Success Against Their Larger Cousins

Last year, a team led by Furu Wei and Shuming Ma, at Microsoft Research Asia, in Beijing, created BitNet, the first 1-bit QAT method for LLMs. After fiddling with the rate at which the network adjusts its parameters, in order to stabilize training, they created LLMs that performed better than those created using PTQ methods. They were still not as good as full-precision networks, but roughly 10 times as energy efficient.

In February, Wei’s team announced BitNet 1.58b, in which parameters can equal -1, 0, or 1, which means they take up roughly 1.58 bits of memory per parameter. A BitNet model with 3 billion parameters performed just as well on various language tasks as a full-precision LLaMA model with the same number of parameters and amount of training—Wei called this an “aha moment”—but it was 2.71 times as fast, used 72 percent less GPU memory, and used 94 percent less GPU energy. Further, the researchers found that as they trained larger models, efficiency advantages improved.
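
The 1.58 figure is just information content: a parameter restricted to the three values -1, 0, and 1 carries log2(3) ≈ 1.58 bits. Below is a minimal sketch of one way to map full-precision weights onto those three values, scaling by the mean absolute weight and then rounding and clipping; it mirrors the "absmean" flavor of quantization described for BitNet 1.58b but is a generic illustration, not the authors' implementation.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Map full-precision weights to {-1, 0, +1} plus a per-tensor scale.

    Generic 'absmean'-style ternary quantization sketch: divide by the
    mean absolute weight, round to the nearest integer, clip to [-1, 1].
    """
    gamma = np.abs(w).mean()                          # per-tensor scale
    w_ternary = np.clip(np.rint(w / (gamma + eps)), -1, 1)
    return w_ternary.astype(np.int8), gamma           # approx. w ~= w_ternary * gamma

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 8))               # toy weight matrix
w_q, gamma = ternary_quantize(w)
print("ternary weights:\n", w_q)
print(f"distinct values: {sorted(set(w_q.flatten().tolist()))}, "
      f"bits of information per value: {np.log2(3):.2f}")
```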

This year, a team led by Che, of Harbin Institute of Technology, released a preprint on another LLM binarization method, called OneBit. OneBit combines elements of both PTQ and QAT. It uses a full-precision pretrained LLM to generate data for training a quantized version. The team’s 13-billion-parameter model achieved a perplexity score of around 9 on one dataset, versus 5 for a LLaMA model with 13 billion parameters. Meanwhile, OneBit occupied only 10 percent as much memory. On customized chips, it could presumably run much faster.

Wei, of Microsoft, says quantized models have multiple advantages. They can fit on smaller chips, they require less data transfer between memory and processors, and they allow for faster processing. Current hardware can’t take full advantage of these models, though. LLMs often run on GPUs like those made by Nvidia, which represent weights using higher precision and spend most of their energy multiplying them. New hardware could natively represent each parameter as a -1 or 1 (or 0), and then simply add and subtract values and avoid multiplication. “One-bit LLMs open new doors for designing custom hardware and systems specifically optimized for 1-bit LLMs,” Wei says.
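
The hardware point can be made concrete with a toy matrix-vector product: when every weight is -1, 0, or 1, each output element is a running sum that adds some activations, subtracts others, and skips the rest, with a single scale multiplication at the end. The sketch below is a plain-Python illustration of that idea, not a description of any real accelerator.

```python
def ternary_matvec(weights, x, scale):
    """Matrix-vector product with weights restricted to {-1, 0, +1}.

    Each output element needs only additions and subtractions of
    activations; the lone multiplication is the final per-tensor scale.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi        # add the activation
            elif w == -1:
                acc -= xi        # subtract the activation
            # w == 0: skip entirely
        out.append(acc * scale)  # one multiply per output element
    return out

W = [[1, -1, 0, 1],
     [0, 1, 1, -1]]
x = [0.5, -0.25, 0.75, 1.0]
print(ternary_matvec(W, x, scale=0.02))  # -> approximately [0.035, -0.01]
```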

“They should grow up together,” Huang, of the University of Hong Kong, says of 1-bit models and processors. “But it’s a long way to develop new hardware.”

 


