1 of 7

Sustainable Intelligence: The AI-Energy Nexus

Energy-Aware LLM Serving

January 2026

Yue Dong, Nanpeng Yu

Energy Systems Research Workshop – Seed Grants

Yue Dong     Assistant Professor of CSE at UCR.

    ◦ Expertise: NLP, Machine Learning, and Trustworthy AI.

    ◦ Research Interests: Efficient LLMs, controllable text generation, and LLM safety.

Nanpeng Yu   Professor of ECE at UCR.

    ◦ Expertise: Smart Grid, Machine Learning, and Power Systems.

    ◦ Research Interests: AI-Energy Nexus, data-driven modeling and control for power distribution networks.

2 of 7

The Challenge: AI’s Growing Energy Footprint


Energy Intensity

A single ChatGPT query consumes nearly 10× the energy of a Google search.

Future Demand

Projected share of total U.S. electricity consumption by data centers by 2030.

Infrastructure Load

The scale of power demand required for next-generation AI data centers.

  • Modern LLM services rely on continuously operating GPU clusters.
  • Current power management is limited to reactive hardware scaling.
  • We propose a shift to algorithm-level awareness.

3 of 7

Joint LLM + Energy Serving Optimization


• The Bottleneck in LLMs: KV Cache Memory Intensity

Joint work has been submitted to the IEEE PES General Meeting conference.

In auto-regressive LLM generation, the model must store intermediate attention states to avoid redundant computation.

Real-World Impact: A Mistral-7B model requires ~7.3 GB of KV-cache memory for a single 30k-token context.

Multiplier Effect: 7.3 GB × tensor multiplication × thousands of concurrent users = massive energy load.
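As a sanity check on that figure, KV-cache size can be sketched with a back-of-the-envelope calculation. The Mistral-7B shape parameters below (32 layers, 8 grouped-query KV heads, head dimension 128) and fp32 key/value storage are assumptions chosen to reproduce the ~7.3 GB number; they are not taken from the submitted work.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                   head_dim=128, dtype_bytes=4):
    """Bytes needed to cache keys and values for one sequence.

    2 tensors (K and V) per layer, each of shape
    [n_kv_heads, seq_len, head_dim], at dtype_bytes per element.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

gib = kv_cache_bytes(30_000) / 2**30
print(f"KV cache for a 30k-token context: {gib:.1f} GiB")  # → 7.3 GiB
```

Note how the cost scales linearly with context length and per concurrent sequence, which is exactly the multiplier effect described above.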

4 of 7

Moving Beyond Static Compression


From Fixed Pruning to Learning to Compress with Energy Dynamics

Current State: Static

- Fixed compression ratios chosen offline (e.g., SnapKV).

- Ignores external factors.

- Result: high cost during peak hours OR low quality during off-peak hours.

Proposed Solution: Dynamic Control

- Energy-aware control framework.

- Embeds electricity-market awareness into the inference algorithm.

- Adapts compression based on real-time price and demand.
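The adaptation step can be sketched as a simple price-to-compression mapping. Everything here is illustrative: the function name, the $30 and $120/MWh thresholds, and the keep-ratio bounds are hypothetical stand-ins for the actual controller, which the slides do not specify.

```python
def price_to_keep_ratio(price_per_mwh, low=30.0, high=120.0,
                        max_keep=1.0, min_keep=0.25):
    """Map real-time electricity price to the fraction of KV-cache
    entries kept: cheap power -> full cache (full quality),
    expensive power -> aggressive compression."""
    if price_per_mwh <= low:
        return max_keep
    if price_per_mwh >= high:
        return min_keep
    # Linear interpolation between the two operating points.
    frac = (price_per_mwh - low) / (high - low)
    return max_keep - frac * (max_keep - min_keep)

print(price_to_keep_ratio(25.0))   # off-peak → 1.0 (no compression)
print(price_to_keep_ratio(130.0))  # peak     → 0.25 (aggressive)
```

A learned policy could replace the linear rule, but even this thresholded form captures the contrast with static, offline-chosen ratios.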

5 of 7

System Architecture


A closed feedback loop in which grid economics dictate computational precision in real time.
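The feedback loop can be sketched as follows. `get_realtime_price` and `serve_batch` are hypothetical stubs standing in for a market price feed and the LLM serving engine, and the linear price-to-ratio rule is illustrative, not the system's actual policy.

```python
import random

def get_realtime_price():
    # Stub: a real deployment would poll an ISO/RTO real-time price feed.
    return random.uniform(20.0, 150.0)

def serve_batch(requests, keep_ratio):
    # Stub: run inference with the KV cache pruned to `keep_ratio`.
    return [f"response({r}, keep={keep_ratio:.2f})" for r in requests]

def control_loop(batches, low=30.0, high=120.0):
    """Per batch: read the grid price, set compression, serve."""
    for requests in batches:
        price = get_realtime_price()
        # Cheap power -> keep more cache; expensive power -> compress.
        frac = min(max((price - low) / (high - low), 0.0), 1.0)
        keep_ratio = 1.0 - 0.75 * frac
        yield serve_batch(requests, keep_ratio)

for batch_out in control_loop([["q1", "q2"], ["q3"]]):
    print(batch_out)
```

The point of the loop is that the compression decision is re-made on every batch, so the serving engine tracks grid conditions instead of a fixed offline setting.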

6 of 7

Results: Efficiency Without Compromise


Our method matches the success rate of aggressive compression (97%) while preserving higher output quality, and cuts energy use by 20% versus FullKV.

7 of 7

Towards Energy-Aware Intelligence


We have demonstrated a scalable framework that reduces AI energy consumption by 20% by making the inference algorithm an active participant in the electricity market.

Target Agency: NSF – Energy, Power, Control, and Networks (EPCN) & Industrial funds

Timeline: Fall 2026

Estimated Request: ~ $600,000

Research Focus:

Data center efficiency → adaptive reasoning → sustainable edge intelligence

  1. LLM-Aware Data Center Infrastructure
    • Dynamic LLM-aware infrastructure that adapts to workload characteristics and context growth
    • Joint optimization of compute, memory, energy, and cooling for efficient training and serving
    • Intelligent scheduling and inference control using context length, KV cache footprint, reasoning complexity, and adaptive compression
  2. On-Device LLM Reasoning Under Strict Energy Budgets
    • Adaptive computation, memory, and decoding to fit edge power constraints