Improved and Accelerated Text-to-Image Generation with Collect, Reflect, and Refine

Enhancing the generative capability of text-to-image (T2I) models has recently become a promising direction in both academia and industry. Prior studies have typically focused on either improving generation quality or reducing inference latency, but generally fail to improve quality and speed simultaneously. Moreover, existing inference-enhancement methods do not achieve significant improvements across both diffusion models (DMs) and autoregressive models (ARMs). In this paper, we