Sustainable Intelligence: The AI-Energy Nexus
Energy-Aware LLM Serving
January 2026
Yue Dong, Nanpeng Yu
Energy Systems Research Workshop – Seed Grants
Yue Dong, Assistant Professor of CSE at UCR.
◦ Expertise: NLP, Machine Learning, and Trustworthy AI.
◦ Research Interests: Efficient LLMs, controllable text generation, and LLM safety.
Nanpeng Yu, Professor of ECE at UCR.
◦ Expertise: Smart Grid, Machine Learning, and Power Systems.
◦ Research Interests: AI-Energy Nexus, data-driven modeling and control for power distribution networks.
The Challenge: AI’s Growing Energy Footprint
Energy Intensity
A single ChatGPT query consumes nearly 10x the energy of a Google search.
Future Demand
Projected share of total U.S. electricity consumption by data centers by 2030.
Infrastructure Load
The scale of power demand required for next-generation AI data centers.
Joint LLM + Energy Serving Optimization
• The Bottleneck in LLMs: KV Cache Memory Intensity
Joint work has been submitted to the IEEE PES General Meeting.
In auto-regressive LLM generation, the model must store intermediate attention states to avoid redundant computation.
Real-World Impact: A Mistral-7B model requires ~7.3GB of KV memory for a single 30k token context.
Multiplier Effect: 7.3 GB per context x thousands of concurrent users = a massive memory and energy load.
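The per-context figure above can be approximated from the standard KV-cache size formula. The sketch below is illustrative only: the layer count, KV-head count, head dimension, and element precision are assumed configuration values chosen for illustration, not figures taken from the submitted work.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Total bytes needed to cache attention states for one sequence.

    The factor of 2 accounts for storing both keys and values per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed configuration for illustration: 32 layers, 8 grouped KV heads,
# head dimension 128, a 30k-token context, and 4 bytes per element.
total = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                       seq_len=30_000, bytes_per_elem=4)
print(f"{total / 2**30:.1f} GiB")  # prints "7.3 GiB" for a single sequence
```

Note how the cache grows linearly with context length and concurrent users, which is exactly the multiplier effect described above.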
Moving Beyond Static Compression
From Fixed Pruning to Learning to Compress with Energy Dynamics
Current State: Static
- Fixed compression ratios chosen offline (e.g., SnapKV).
- Ignores external factors such as grid conditions.
- Result: high cost during peak hours OR low quality during off-peak hours.
Proposed Solution: Dynamic Control
- Energy-aware control framework.
- Embeds electricity market awareness into the inference algorithm.
- Adapts compression based on real-time price and demand.
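A minimal sketch of the dynamic policy the bullets describe: map the real-time electricity price to a KV-cache keep ratio. The threshold prices and candidate ratios below are hypothetical placeholders for illustration, not values from the submitted paper.

```python
def select_compression_ratio(price_per_mwh: float) -> float:
    """Map a real-time electricity price ($/MWh) to a KV-cache keep ratio.

    Higher prices -> keep fewer KV entries (more aggressive compression);
    lower prices -> keep more entries (higher output quality).
    Thresholds and ratios are illustrative placeholders, not tuned values.
    """
    if price_per_mwh >= 120.0:   # peak pricing: compress aggressively
        return 0.25
    if price_per_mwh >= 60.0:    # shoulder pricing: moderate compression
        return 0.50
    return 1.00                  # off-peak: serve with the full cache

# Example: during a $150/MWh peak, keep 25% of the KV cache.
print(select_compression_ratio(150.0))  # prints 0.25
```

In contrast to a static scheme, the ratio here is re-evaluated whenever the price signal changes, rather than fixed once offline.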
System Architecture
A closed feedback loop in which grid economics dictate computational precision in real time.
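The feedback loop can be sketched as a polling controller. Everything below is a hypothetical skeleton under stated assumptions: the price feed, update interval, and the linear adjustment rule are illustrative, not the implemented system.

```python
import time

def control_loop(get_grid_price, set_keep_ratio, interval_s=300.0, max_steps=None):
    """Closed loop: poll a grid price signal and adjust KV-cache retention.

    get_grid_price: callable returning the current $/MWh price (assumed feed).
    set_keep_ratio: callable that applies a keep ratio to the serving engine.
    """
    step = 0
    while max_steps is None or step < max_steps:
        price = get_grid_price()
        # Linear rule for illustration: $0/MWh -> keep everything,
        # $200/MWh and above -> keep only 25% of the cache.
        ratio = max(0.25, 1.0 - price / 200.0 * 0.75)
        set_keep_ratio(ratio)
        step += 1
        if max_steps is None or step < max_steps:
            time.sleep(interval_s)

# Example with a fixed price feed and a recording sink:
ratios = []
control_loop(lambda: 100.0, ratios.append, interval_s=0.0, max_steps=2)
print(ratios)  # prints [0.625, 0.625]
```

The key design point is that the serving engine never sees the market directly; it only receives a keep ratio, so the pricing rule can be swapped out without touching inference code.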
Results: Efficiency Without Compromise
Our method matches the success rate of aggressive compression (97%) while preserving higher output quality, and cuts energy use by 20% versus FullKV.
Towards Energy-Aware Intelligence
We have demonstrated a scalable framework that reduces AI energy consumption by 20% by making the inference algorithm an active participant in the electricity market.
Target Agency: NSF – Energy, Power, Control, and Networks (EPCN) & industrial funds
Timeline: Fall 2026
Estimated Request: ~$600,000
Research Focus:
Data center efficiency → adaptive reasoning → sustainable edge intelligence