Posts
Feb 2, 2024
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding
Feb 2, 2024
Accelerating Self-Attentions for LLM Serving with FlashInfer