Member of Technical Staff - Distributed Systems
IT
San Francisco, CA, USA
Posted on Jul 1, 2026
Build the systems that make AI inference fast, reliable, and cost-efficient at global scale. You’ll design the control plane that schedules a huge queue of tokens across a diverse fleet of machines spread all over the world.

What you’ll do:
Design and implement high-performance schedulers (admission control, queuing, priority, fairness, preemption, bin packing).
Build global routing and traffic management (latency-aware dispatch, predictive autoscaling, failover strategies).
Develop LLM-specific routing optimizations, e.g. KV caching that lets us trade memory for compute across the pyramid of GPU RAM, CPU RAM, and NVMe flash.
Build deep observability: we want to trace every millisecond of our systems and catch failures early enough to make things right before the customer even notices.

What we’re looking for:
Strong distributed systems fundamentals (concurrency, networking, databases, performance engineering).
Eagerness to work with agents, with careful testing and clear plans.
Bonus: experience with ML inference stacks (vLLM/SGLang) and GPUs/accelerators.