Posts
FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving (Dec 16, 2024)
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding (Feb 2, 2024)
Accelerating Self-Attentions for LLM Serving with FlashInfer (Feb 2, 2024)