Gateway API Inference Extension v0.1 Review
January 7, 2025
Overview
Project Goals
Project Use Cases
Project Scope
Request Flow
Gateway
InferencePool (Pods running a compatible Model Server Framework)
Endpoint Selection Extension
GET /completions
1. The gateway selects the InferencePool to route to based on standard Gateway API configuration
2. The gateway forwards the request and endpoint info to the extension
3. The extension gets metrics from compatible model server frameworks
4. The extension tells the gateway which endpoint to route to
5. The gateway sends the request to the endpoint selected by the endpoint picker
Personas
Inference Platform Admin
Inference Workload Owner
Inference Platform Owner: Scope of API
Future
Now
InferencePool (Owned by Inference Platform Owner)
apiVersion: inference.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
  name: gemma-pool
spec:
  targetPortNumber: 443
  selector:
    app: vllm-gemma-1-5-pro
  extensionRef:
    name: endpoint-picker
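For context, traffic reaches an InferencePool through a standard Gateway API route. A minimal sketch of an HTTPRoute that targets the pool above as a backend; the route and Gateway names here are illustrative assumptions, not part of the spec shown:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route            # illustrative name
spec:
  parentRefs:
  - name: inference-gateway  # assumed pre-existing Gateway
  rules:
  - backendRefs:
    - group: inference.x-k8s.io  # group of the InferencePool CRD above
      kind: InferencePool
      name: gemma-pool
```

The backendRef's group/kind is what tells the gateway to apply inference-pool semantics (endpoint picking) instead of plain Service load balancing.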
InferencePool: Target Port Options
InferencePool: Why Not Service?
Pod Selector
Port Config
ClusterIP
NodePort
Service
InferencePool
Extension Config
Inference Config
Multiple Ports
Protocols
Load Balancer
Session Affinity
DNS
Inference Workload Owner: Scope of API
Future
Now
InferenceModel (Owned by Inference Workload Owner)
apiVersion: inference.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: tweet-summary
spec:
  modelName: tweet-summary
  criticality: Critical
  poolRef:
    name: gemma-pool
  targetModels:
  - name: tweet-summary-0
    weight: 50
  - name: tweet-summary-1
    weight: 50
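Multiple InferenceModels can share one pool at different criticality levels. A sketch of a lower-priority workload on the same pool, assuming Sheddable is an accepted criticality value alongside Critical (the model name is illustrative):

```yaml
apiVersion: inference.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: batch-eval           # illustrative name
spec:
  modelName: batch-eval
  criticality: Sheddable     # assumed enum value; shed first under load
  poolRef:
    name: gemma-pool         # same pool as tweet-summary above
```

Criticality lets the endpoint picker prioritize Critical traffic when the pool's model servers are saturated.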
InferencePool: Extensions and Algorithms
InferencePool: The Straightforward Bits
InferencePool: Extension Ref Options
API Structure
👷🏾‍♀️👷🏻‍♂️ Inference Platform Owners
👨🏾‍💼👩🏻‍💻 Application Developers
👨🏽‍🔧👩🏼‍🔧 Cluster Operators
Gateway
HTTPRoute
Service
InferencePool
InferenceModel
InferenceModel
🧑🏼‍⚕️🧑🏿‍💻 Inference Workload Owners
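The layering above starts from a Gateway owned by the Inference Platform Owner. A minimal sketch of that top-level resource, assuming an implementation-specific GatewayClass (class and listener details are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: example-gateway-class  # illustrative; implementation-specific
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```

Routes, pools, and models then attach underneath this Gateway per the ownership split shown above.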
Timeline
Resources