
Production LLM Serving on Kubernetes: vLLM + KServe Stack
3 posts

Deploy vLLM with KServe on Kubernetes: InferenceService CRD, KEDA autoscaling on queue depth, and distributed KV cache with LMCache for production inference.
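For readers skimming before clicking through, here is a minimal sketch of the stack's two core objects using the Kubernetes Python client: a KServe InferenceService wrapping a vLLM container, and a KEDA ScaledObject that scales it on vLLM's request-queue gauge. The model, image, namespace, Prometheus address, threshold, and target Deployment name are illustrative assumptions, not values from the post.

```python
# Sketch: a vLLM predictor as a KServe InferenceService, plus a KEDA
# ScaledObject that scales it on queue depth (vllm:num_requests_waiting).
# Assumptions: image, model, namespace, Prometheus address, and the
# 8-waiting-request threshold are illustrative, not from the post.
from kubernetes import client, config

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama-vllm", "namespace": "inference"},
    "spec": {
        "predictor": {
            "containers": [{
                "name": "kserve-container",
                "image": "vllm/vllm-openai:latest",
                "args": ["--model", "meta-llama/Llama-3.1-8B-Instruct"],
                "ports": [{"containerPort": 8000, "protocol": "TCP"}],
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }]
        }
    },
}

# Scale on vLLM's queue-depth gauge as scraped by Prometheus: add a
# replica when more than 8 requests are waiting per pod on average.
scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "llama-vllm-scaler", "namespace": "inference"},
    "spec": {
        # Hypothetical name of the Deployment KServe creates for the predictor.
        "scaleTargetRef": {"name": "llama-vllm-predictor"},
        "minReplicaCount": 1,
        "maxReplicaCount": 8,
        "triggers": [{
            "type": "prometheus",
            "metadata": {
                "serverAddress": "http://prometheus.monitoring:9090",
                "query": 'sum(vllm:num_requests_waiting{service="llama-vllm"})',
                "threshold": "8",
            },
        }],
    },
}

if __name__ == "__main__":
    config.load_kube_config()
    api = client.CustomObjectsApi()
    api.create_namespaced_custom_object(
        "serving.kserve.io", "v1beta1", "inference",
        "inferenceservices", inference_service)
    api.create_namespaced_custom_object(
        "keda.sh", "v1alpha1", "inference",
        "scaledobjects", scaled_object)
```

Scaling on queue depth rather than CPU or GPU utilization matters for LLM serving: a saturated vLLM replica can sit near 100% GPU utilization whether it has 1 or 50 requests queued, so queue depth is the signal that actually tracks latency.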

The Gateway API Inference Extension is now GA, standardizing model-aware routing for self-hosted LLMs with KV-cache-aware scheduling and LoRA adapter affinity.
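As a hedged sketch of what that routing looks like: an InferencePool fronts the model-server pods and delegates endpoint selection to an endpoint-picker extension, and an HTTPRoute sends traffic to the pool instead of a plain Service. All names here are invented, and the spec fields follow the pre-GA (v1alpha2) shape; the GA release moves the API group to inference.networking.k8s.io, so check the schema for your version.

```python
# Sketch (pre-GA v1alpha2 field shapes, illustrative names): an
# InferencePool fronting vLLM pods, and an HTTPRoute targeting the pool.
inference_pool = {
    "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
    "kind": "InferencePool",
    "metadata": {"name": "llama-pool", "namespace": "inference"},
    "spec": {
        "selector": {"app": "llama-vllm"},  # pods the pool load-balances
        "targetPortNumber": 8000,           # vLLM's serving port
        # Endpoint-picker service that implements KV-cache-aware
        # scheduling and LoRA adapter affinity when picking a pod.
        "extensionRef": {"name": "llama-endpoint-picker"},
    },
}

http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "llama-route", "namespace": "inference"},
    "spec": {
        "parentRefs": [{"name": "inference-gateway"}],
        "rules": [{
            "matches": [{"path": {"type": "PathPrefix", "value": "/v1"}}],
            "backendRefs": [{
                "group": "inference.networking.x-k8s.io",
                "kind": "InferencePool",
                "name": "llama-pool",
            }],
        }],
    },
}
```

The design point: a plain Service load-balances round-robin and blindly, while the endpoint picker can route a request to the replica that already holds the prompt's KV cache or has the right LoRA adapter loaded.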

Cloudflare's AI Security for Apps reaches GA with prompt injection protection, while RFC 9457 structured error responses cut AI agent token costs by 98%.
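The token-cost claim comes down to shape: an RFC 9457 application/problem+json body gives an agent a few structured fields to branch on instead of a prose or HTML error page to parse. A minimal sketch follows; the problem type URI and the retry extension member are invented for illustration.

```python
# Sketch: building an RFC 9457 "problem details" error response.
# The type URI and retry_after_seconds extension member are illustrative.
import json

def problem_response(status: int, title: str, detail: str,
                     type_uri: str = "about:blank",
                     **extensions) -> tuple[int, dict, str]:
    """Build (status, headers, body) for an application/problem+json reply."""
    body = {"type": type_uri, "title": title,
            "status": status, "detail": detail}
    body.update(extensions)  # RFC 9457 permits extension members
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)

status, headers, body = problem_response(
    429, "Too Many Requests",
    "Queue depth exceeded; retry after the indicated delay.",
    type_uri="https://example.com/probs/queue-full",  # hypothetical type
    retry_after_seconds=30,                           # illustrative extension
)
print(body)
```

A client agent can match on `type` and read `retry_after_seconds` directly, which is a handful of tokens versus feeding an entire rendered error page back into the model.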