Member of Technical Staff - Inference

Sail Research

IT

San Francisco, CA, USA

Posted on Jul 1, 2026

Make token processing fast all the way down to the lowest layers of the stack. You'll optimize kernel performance, develop new scheduling and parallelism strategies, and help us squeeze every FLOP out of our hardware.

What you'll do:
- Modify and extend state-of-the-art inference engines such as vLLM and SGLang.
- Understand every microsecond of GPU time during a forward pass, and be able to explain every kernel launch in an Nsight Systems (nsys) profile.
- Design and implement exotic parallelism schemes for 'interesting' hardware topologies.
- Write custom GPU kernels that excel in specific regimes, such as cascade attention.

What we're looking for:
- Strong understanding of LLM mechanics (KV cache, mixture-of-experts, prefill vs. decode phases).
- Interest in MLSys research (speculative decoding, sparse attention).
- Familiarity with modern tile-based GPU programming (Triton, CUTLASS, ThunderKittens), or interest in learning it.
