Quantization

Real-Time On-Device Diffusion: Practical Acceleration via Fused Low-Bit Kernels

A systems paper on accelerating diffusion inference with fused low-bit kernels and cache-update fusion.

Xia Ruize