FlashInfer

Posts

  • Mar 10, 2025

    Sorting-Free GPU Kernels for LLM Sampling

  • Dec 16, 2024

    FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving

  • Feb 2, 2024

    Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding

  • Feb 2, 2024

    Accelerating Self-Attentions for LLM Serving with FlashInfer



Copyright © 2023-2025, FlashInfer team


Introducing techniques to accelerate Large Language Model deployment.