Sustainable Intelligence: The AI-Energy Nexus
Energy-Aware LLM Serving
January 2026
Yue Dong, Nanpeng Yu
Energy Systems Research Workshop – Seed Grants
Yue Dong, Assistant Professor of CSE at UCR.
◦ Expertise: NLP, Machine Learning, and Trustworthy AI.
◦ Research Interests: Efficient LLMs, controllable text generation, and LLM safety.
Nanpeng Yu, Professor of ECE at UCR.
◦ Expertise: Smart Grid, Machine Learning, and Power Systems.
◦ Research Interests: AI-Energy Nexus, data-driven modeling and control for power distribution networks.
The Challenge: AI’s Growing Energy Footprint
Energy Intensity
A single ChatGPT query consumes nearly 10x the energy of a Google search.
Future Demand
Projected share of total U.S. electricity consumption by data centers by 2030.
Infrastructure Load
The scale of power demand required for next-generation AI data centers.
Joint LLM + Energy Serving Optimization
• The Bottleneck in LLMs: KV Cache Memory Intensity
Joint work has been submitted to the IEEE PES General Meeting.
In auto-regressive LLM generation, the model must store intermediate attention states to avoid redundant computation.
Real-World Impact: A Mistral-7B model requires ~7.3GB of KV memory for a single 30k token context.
Multiplier Effect: 7.3 GB per context x thousands of concurrent users = a massive memory and energy load.
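The per-context figure above can be approximated from the standard KV-cache size formula. The sketch below is illustrative only: the layer count, KV-head count, head dimension, and element precision are assumed configuration values chosen for illustration, not figures taken from the submitted work.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Total bytes needed to cache attention states for one sequence.

    The factor of 2 accounts for storing both keys and values per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed configuration for illustration: 32 layers, 8 grouped KV heads,
# head dimension 128, a 30k-token context, and 4 bytes per element.
total = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                       seq_len=30_000, bytes_per_elem=4)
print(f"{total / 2**30:.1f} GiB")  # prints "7.3 GiB" for a single sequence
```

Note how the cache grows linearly with context length and concurrent users, which is exactly the multiplier effect described above.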
Moving Beyond Static Compression
From Fixed Pruning to Learning to Compress with Energy Dynamics
Current State: Static
- Fixed compression ratios chosen offline (e.g., SnapKV).
- Ignores external factors such as grid conditions.
- Result: high cost during peak hours OR low quality during off-peak hours.
Proposed Solution: Dynamic Control
- Energy-aware control framework.
- Embeds electricity market awareness into the inference algorithm.
- Adapts compression based on real-time price and demand.
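A minimal sketch of the dynamic policy the bullets describe: map the real-time electricity price to a KV-cache keep ratio. The threshold prices and candidate ratios below are hypothetical placeholders for illustration, not values from the submitted paper.

```python
def select_compression_ratio(price_per_mwh: float) -> float:
    """Map a real-time electricity price ($/MWh) to a KV-cache keep ratio.

    Higher prices -> keep fewer KV entries (more aggressive compression);
    lower prices -> keep more entries (higher output quality).
    Thresholds and ratios are illustrative placeholders, not tuned values.
    """
    if price_per_mwh >= 120.0:   # peak pricing: compress aggressively
        return 0.25
    if price_per_mwh >= 60.0:    # shoulder pricing: moderate compression
        return 0.50
    return 1.00                  # off-peak: serve with the full cache

# Example: during a $150/MWh peak, keep 25% of the KV cache.
print(select_compression_ratio(150.0))  # prints 0.25
```

In contrast to a static scheme, the ratio here is re-evaluated whenever the price signal changes, rather than fixed once offline.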
System Architecture
A closed feedback loop in which grid economics dictate computational precision in real time.
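The feedback loop can be sketched as a polling controller. Everything below is a hypothetical skeleton under stated assumptions: the price feed, update interval, and the linear adjustment rule are illustrative, not the implemented system.

```python
import time

def control_loop(get_grid_price, set_keep_ratio, interval_s=300.0, max_steps=None):
    """Closed loop: poll a grid price signal and adjust KV-cache retention.

    get_grid_price: callable returning the current $/MWh price (assumed feed).
    set_keep_ratio: callable that applies a keep ratio to the serving engine.
    """
    step = 0
    while max_steps is None or step < max_steps:
        price = get_grid_price()
        # Linear rule for illustration: $0/MWh -> keep everything,
        # $200/MWh and above -> keep only 25% of the cache.
        ratio = max(0.25, 1.0 - price / 200.0 * 0.75)
        set_keep_ratio(ratio)
        step += 1
        if max_steps is None or step < max_steps:
            time.sleep(interval_s)

# Example with a fixed price feed and a recording sink:
ratios = []
control_loop(lambda: 100.0, ratios.append, interval_s=0.0, max_steps=2)
print(ratios)  # prints [0.625, 0.625]
```

The key design point is that the serving engine never sees the market directly; it only receives a keep ratio, so the pricing rule can be swapped out without touching inference code.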
Results: Efficiency Without Compromise
Our method matches the success rate of aggressive compression (97%) while preserving higher output quality, and cuts energy use by 20% versus FullKV.
Towards Energy-Aware Intelligence
We have demonstrated a scalable framework that reduces AI energy consumption by 20% by making the inference algorithm an active participant in the electricity market.
Target Agency: NSF – Energy, Power, Control, and Networks (EPCN) & industrial funds
Timeline: Fall 2026
Estimated Request: ~$600,000
Research Focus:
Data center efficiency → adaptive reasoning → sustainable edge intelligence