Real-Time On-Device Diffusion: Practical Acceleration via Fused Low-Bit Kernels
A systems paper on accelerating diffusion inference with fused low-bit kernels and cache-update fusion.