Stability AI, the company behind the Stable Diffusion text-to-image model, has unveiled a new model named Stable Cascade. According to a recent VentureBeat report, the new model is intended to be more efficient and flexible than its predecessors. Stability AI has continuously refined Stable Diffusion since its initial launch in 2022, with significant updates including SDXL 1.0 in July 2023 and SDXL Turbo in November 2023.
Stable Cascade takes a different approach to image generation from Stable Diffusion, using an architecture based on Würstchen. According to the Würstchen research abstract, the key innovation is a latent diffusion technique that learns a highly compressed yet detailed semantic image representation and uses it to guide the diffusion process. This approach significantly reduces the computational requirements for achieving state-of-the-art results.
Stability AI’s modular three-stage architecture for enhanced efficiency
Unlike Stable Diffusion, which relies on a single large model, Stable Cascade uses a modular three-stage architecture consisting of Stages A, B, and C. This setup improves training efficiency and makes customization easier. The process begins with Stage C, which converts text prompts into compact 24×24 latents. Stages B and A then decode these latents into full high-resolution images. Because text-conditional generation is decoupled from image decoding, the text-conditional model (Stage C) can be trained and fine-tuned on its own at much lower cost. Stability AI reports that training or fine-tuning Stage C alone yields a 16x cost reduction compared with doing the same for a Stable Diffusion model of similar size.
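To give a rough sense of how compact a 24×24 latent is, the short Python sketch below works through the arithmetic. Two numbers in it are assumptions rather than figures from this article: a 1024×1024 output image, and the commonly cited 8x per-side compression of Stable Diffusion's latent space. The helper names are hypothetical, and the calculation does not by itself account for the 16x cost figure reported by Stability AI.

```python
# Back-of-the-envelope comparison of latent grid sizes.
# ASSUMPTIONS (not stated in the article): a 1024x1024 output image and an
# 8x per-side compression factor for Stable Diffusion's latent space.

def linear_compression(image_side: int, latent_side: int) -> float:
    """Per-side compression factor from image resolution to latent grid."""
    return image_side / latent_side


def latent_positions(latent_side: int) -> int:
    """Number of spatial positions in a square latent grid."""
    return latent_side * latent_side


IMAGE_SIDE = 1024            # assumed output resolution
CASCADE_LATENT_SIDE = 24     # Stage C latent size, from the article
SD_COMPRESSION = 8           # assumed Stable Diffusion per-side factor

cascade_factor = linear_compression(IMAGE_SIDE, CASCADE_LATENT_SIDE)
sd_latent_side = IMAGE_SIDE // SD_COMPRESSION
position_ratio = latent_positions(sd_latent_side) / latent_positions(CASCADE_LATENT_SIDE)

print(f"Stable Cascade per-side compression: {cascade_factor:.1f}x")
print(f"Stable Diffusion latent grid: {sd_latent_side}x{sd_latent_side}")
print(f"Spatial positions, SD vs. Stage C: {position_ratio:.1f}x more")
```

Under these assumptions, Stage C works on a grid with roughly 42.7x per-side compression and about 28 times fewer spatial positions than an 8x-compressed latent, which helps explain why training the text-conditional stage on its own is cheaper.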
Direct Preference Optimization (DPO) is another route by which Stable Cascade’s image quality may improve. DPO is an alternative to reinforcement learning from human feedback that tunes a model directly on human preference data. Stability AI’s founder and CEO, Emad Mostaque, has indicated that combining Stable Cascade with DPO will yield superior images. Although it is only a research preview, Stable Cascade already outperformed other leading AI art models, including SDXL, on image quality and prompt alignment in evaluations conducted by Stability AI.
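For readers unfamiliar with DPO, the sketch below shows the general shape of its loss for a single preference pair, in the form originally proposed for language models. It is illustrative only: the article does not say how Stability AI would apply DPO to Stable Cascade, diffusion-model variants replace the log-probabilities with denoising-error terms, and the function name, argument names, and default `beta` are hypothetical.

```python
import math

# Minimal sketch of the standard DPO loss for ONE preference pair.
# ASSUMPTION: inputs are log-probabilities that a trainable "policy" model and
# a frozen "reference" model assign to the preferred and rejected samples.
# This is not Stability AI's implementation.

def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # hypothetical default; scales the preference margin
) -> float:
    # How much more the policy favors each sample than the reference does.
    preferred_gain = policy_logp_preferred - ref_logp_preferred
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    x = beta * (preferred_gain - rejected_gain)
    # loss = -log(sigmoid(x)) = log(1 + exp(-x)), written in a stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# The loss falls as the policy shifts probability toward the preferred sample.
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral, improved)
```

When the policy matches the reference, the loss equals log 2. It decreases as the policy favors the human-preferred sample more strongly than the reference does, so no separate reward model or reinforcement learning loop is needed.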
A notable advancement with Stable Cascade is its capability to accurately generate text within images, enhancing the model’s utility for a wide range of applications. This feature positions Stable Cascade as a significant competitor in the AI art generation space, offering more variety and consistency in the creation of AI-generated images.
Stable Cascade can also generate variations of a given image while maintaining its style and composition, and it supports image-to-image generation. Techniques such as inpainting and super-resolution are supported through ControlNets. Stable Cascade is currently available as a research preview for non-commercial use, and its code can be accessed on GitHub, inviting developers and researchers to explore its potential further.