FlashInfer

Posts

  • Mar 10, 2025

    Sorting-Free GPU Kernels for LLM Sampling

  • Dec 16, 2024

    FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving

  • Feb 2, 2024

    Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding

  • Feb 2, 2024

    Accelerating Self-Attentions for LLM Serving with FlashInfer



Copyright © 2023-2025, FlashInfer team


Introducing techniques to accelerate Large Language Model deployment.