SGLang Roadmap
Liangsheng Yin
Core Dev at SGLang
Contents
2025 H1 Strategic Focus Areas
01 SGLang Overview
02 Recent Feature Highlight
03 Future Roadmap (2026 Q1)
Breakthrough: Large-Scale Deployment
SGLang: Overview
Industry-First Performance
SGLang is the first open-source system to nearly match the performance reported in DeepSeek's official blog, using prefill-decode (PD) disaggregation and expert parallelism (EP).
Performance Metrics (May 2025)
"This performance breakthrough validates our architectural decisions and positions SGLang as the go-to solution for organizations requiring enterprise-scale AI inference capabilities."
52.3k input tokens/s/node: industry-leading input processing speed
22.3k output tokens/s/node: exceptional generation throughput
5x cost reduction vs. DeepSeek API pricing
10+ teams successfully reproduced results
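Per-node throughput translates into serving cost with simple arithmetic: cost per 1M tokens = node price per hour / (tokens per second × 3600 / 10^6). The sketch below assumes a hypothetical node price (not a figure from this talk), uses decode throughput only, and assumes full utilization, so it illustrates the calculation rather than reproducing the 5x comparison.

```python
# Convert sustained per-node throughput into a serving cost per 1M tokens.
# HYPOTHETICAL: the node price below is a placeholder, not a number from this talk.


def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens one node produces in an hour at a sustained rate."""
    return tokens_per_second * 3600


def cost_per_million_tokens(tokens_per_second: float, node_price_per_hour: float) -> float:
    """Dollars per 1M tokens = hourly node price / millions of tokens per hour."""
    millions_per_hour = tokens_per_hour(tokens_per_second) / 1_000_000
    return node_price_per_hour / millions_per_hour


if __name__ == "__main__":
    output_tps = 22_300         # 22.3k output tokens/s/node, from the metrics above
    hypothetical_price = 16.0   # HYPOTHETICAL $/node-hour; substitute your own rate
    print(f"{tokens_per_hour(output_tps):,.0f} output tokens per node-hour")
    print(f"${cost_per_million_tokens(output_tps, hypothetical_price):.3f} per 1M output tokens")
```

A real comparison would also account for prefill cost, average utilization below 100%, and the input/output token mix of the workload.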
Multiple Hardware Support
H20
GB300
AMD
JAX
Spark
Intel
SemiAnalysis
Developer Community Expansion
1000+ Contributors: active developers contributing code, documentation, and community support
60+ Institutions: universities, research labs, and companies actively using SGLang
20+ Enterprise Users: companies adopting SGLang as their default DeepSeek inference engine in the first month of release
Community Growth & Industry Adoption
Recent Feature Highlight
EPD Disaggregation
Optimized data handling for large-scale multimodal model deployment.
Mini-SGLang
A lightweight yet high-performance inference framework that shares SGLang's high-level system architecture
SpecForge v0.2
Draft Model Training Framework
SGL Diffusion
Accelerates image and video generation with diffusion models for production-level serving
Zero-overhead Speculative Decoding
Tunes the scheduler for speculative decoding, yielding a 10-20% speedup across the board.
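Assuming EPD stands for encode-prefill-decode disaggregation (the multimodal encoder, prefill, and decode each run as separate workers), the data handoff behind the EPD Disaggregation item above can be sketched as three stages that exchange only explicit payloads: encoder embeddings go to prefill, and the KV cache plus the first token go to decode. Everything here is hypothetical and toy-sized: the worker classes, payload types, and arithmetic "models" are illustrations, not SGLang's API, and a real deployment would transfer these payloads between separate GPU instances.

```python
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL sketch of encode -> prefill -> decode disaggregation.
# Each stage is its own worker; only the payload objects cross stage boundaries.


@dataclass
class EncodedRequest:
    rid: int
    text_tokens: List[int]
    image_embeddings: List[float]  # produced by the encode stage


@dataclass
class PrefilledRequest:
    rid: int
    kv_cache: List[float]  # stand-in for the KV cache handed to decode
    first_token: int


class EncodeWorker:
    def run(self, rid: int, text_tokens: List[int], image_pixels: List[int]) -> EncodedRequest:
        # Toy "vision encoder": one scalar embedding per pixel value.
        emb = [p / 255.0 for p in image_pixels]
        return EncodedRequest(rid, text_tokens, emb)


class PrefillWorker:
    def run(self, req: EncodedRequest) -> PrefilledRequest:
        # Toy prefill: one KV entry per input position (image embeddings, then text).
        kv = req.image_embeddings + [float(t) for t in req.text_tokens]
        first = int(sum(kv)) % 100
        return PrefilledRequest(req.rid, kv, first)


class DecodeWorker:
    def run(self, req: PrefilledRequest, max_new_tokens: int) -> List[int]:
        kv = list(req.kv_cache)
        out = [req.first_token]
        while len(out) < max_new_tokens:
            kv.append(float(out[-1]))       # add the newest token's KV entry
            out.append(int(sum(kv)) % 100)  # toy next-token rule
        return out
```

Keeping the encoder on its own worker lets image-heavy traffic scale the encode stage independently of text prefill and decode, which is the usual motivation for splitting it out.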
Zero-overhead Speculative Decoding
Zero-overhead CPU runtime for LLMs
SGLang pioneered the zero-overhead CPU runtime for LLM inference last year.
Scheduler Tuning for Spec Decoding
~20% speedup across the board.
Stream Design
GPU Forward Stream: runs all tensor forward passes without the GPU stalling on CPU work
CPU Schedule Stream: delays output processing by one step so the next continuous batch is scheduled while the GPU is busy
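The stream design above amounts to a one-step-delayed loop: while the GPU runs step i, the CPU prepares step i+1 and post-processes step i-1's outputs. The sketch below emulates this with a worker thread as the "GPU forward stream" and the main thread as the "CPU schedule stream"; sleeps stand in for real work, and all function names are hypothetical rather than SGLang internals.

```python
import queue
import threading
import time
from typing import List, Optional

# HYPOTHETICAL sketch of a one-step-delayed ("overlap") scheduler loop.
# Worker thread = GPU forward stream; main thread = CPU schedule stream.


def cpu_schedule(batch: List[int]) -> List[int]:
    time.sleep(0.005)  # pretend batch building / metadata preparation
    return list(batch)


def gpu_forward(batch: List[int]) -> List[int]:
    time.sleep(0.01)  # pretend kernel time
    return [x + 1 for x in batch]  # toy "next token" per request


def cpu_process(outputs: List[int]) -> List[int]:
    time.sleep(0.005)  # pretend detokenization / stop-condition checks
    return outputs


def gpu_worker(inbox: "queue.Queue[Optional[List[int]]]", outbox: "queue.Queue[List[int]]") -> None:
    while True:
        batch = inbox.get()
        if batch is None:  # shutdown signal
            return
        outbox.put(gpu_forward(batch))


def run_overlapped(batches: List[List[int]]) -> List[List[int]]:
    inbox: "queue.Queue[Optional[List[int]]]" = queue.Queue()
    outbox: "queue.Queue[List[int]]" = queue.Queue()
    worker = threading.Thread(target=gpu_worker, args=(inbox, outbox))
    worker.start()
    results: List[List[int]] = []
    pending = 0  # steps launched but not yet post-processed
    for batch in batches:
        # Schedule step i while the GPU may still be running step i-1.
        inbox.put(cpu_schedule(batch))
        pending += 1
        if pending > 1:
            # Post-process step i-1 while the GPU runs step i.
            results.append(cpu_process(outbox.get()))
            pending -= 1
    while pending:  # drain the final step
        results.append(cpu_process(outbox.get()))
        pending -= 1
    inbox.put(None)
    worker.join()
    return results
```

The part this toy sidesteps is that a real decode step consumes the tokens sampled in the previous step. A one-step-delayed design therefore needs the GPU to read those tokens directly (for example through placeholders resolved on the device) instead of waiting for the CPU to process them; treat that as one plausible approach, not a description of SGLang's code.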
Accelerating Image and Video Generation
SGLang Diffusion
SpecBundle & SpecForge v0.2
SpecForge: Draft Model Training Framework
Native SGLang integration for speculative decoding optimization
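SpecForge trains the draft model; at serving time the draft proposes several tokens and the target model verifies them. The sketch below shows a greedy draft-then-verify loop with toy next-token functions. It is a generic illustration, not SpecForge's training code or SGLang's speculative decoding implementation; the names are hypothetical, and it uses exact-match greedy verification rather than the rejection-sampling rule used for sampled decoding.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # HYPOTHETICAL: greedy next-token function


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens accepted this step."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # 2) The target checks each position. A real engine scores all positions
    #    in one batched forward pass; this loop only emulates that check.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # all k accepted: one bonus token from the target
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], max_new_tokens: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[len(prompt):len(prompt) + max_new_tokens]
```

Because every accepted token equals the target's greedy choice, the output matches plain greedy decoding with the target alone; a better-trained draft only changes how many tokens each step accepts, which is where the speedup comes from.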
Native Session Support with RadixCache
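RadixCache keeps the KV cache of previously seen token prefixes in a radix tree, so in a multi-turn session the next request, which begins with the whole earlier conversation, only needs prefill for its new tokens. The sketch below is a simplified, hypothetical stand-in: a per-token trie without edge compression, KV storage, reference counting, or eviction, showing only the prefix-match logic. It does not describe how SGLang exposes its native session API.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


class PrefixCache:
    """HYPOTHETICAL per-token prefix trie: a simplified stand-in for a radix tree."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV."""
        node, n = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            node, n = child, n + 1
        return n

    def insert(self, tokens: List[int]) -> None:
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())


# Usage: a two-turn session with toy token ids.
cache = PrefixCache()
turn1 = [101, 7, 8, 9]              # system prompt + first user message
reply1 = [42, 43]                   # generated reply
cache.insert(turn1 + reply1)        # cache the whole first exchange
turn2 = turn1 + reply1 + [11, 12]   # the second request re-sends the history
reused = cache.match_prefix(turn2)  # reused == 6
new_tokens = len(turn2) - reused    # new_tokens == 2: only these need prefill
```

A real radix tree merges single-child chains into one edge holding a token span, which keeps the tree small for long conversations; the matching logic is the same.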
Scheduling Refactor: Decoupled Forward Patterns
Scheduling Pipeline Refactor
Mini-SGLang
Roadmap (2026 Q1)
Improving Compatibility
Speculative decoding x all kinds of parallelism x PD disaggregation: Spec V2 (in progress)
All kinds of memory pool x overlap scheduler: Mem V2 (in progress)
PP/EP Refactor
Performance & Architecture Improvements
Parallelism
Multimodality
Check it out on SGLang GitHub!
Question & Answer
24.7K stars on GitHub
https://www.sglang.io/
Follow and ⭐ star us!