Stability AI is out today with a new Stable Diffusion base model that dramatically improves image quality and users’ ability to generate highly detailed images with just a text prompt.
Stable Diffusion XL (SDXL) 1.0 is the new, leading-edge flagship text-to-image generation model from Stability AI. The release comes as Stability AI aims to level up its capabilities and open the model in the face of competition from rivals like Midjourney and Adobe, the latter of which recently entered the space with its Firefly service.
Stability AI has been previewing the capabilities of SDXL 1.0 since June with a research-only release that helped to demonstrate the model’s power. Among the enhancements is an improved image-refining process that the company claims will generate more vibrant colors, lighting and contrast than previous Stable Diffusion models. SDXL 1.0 also introduces a fine-tuning feature that enables users to create highly customized images with less effort.
The SDXL 1.0 model was developed using a highly optimized training approach that benefits from a 3.5 billion-parameter base model. Stability AI is positioning it as a solid base model on which it expects an ecosystem of tools and capabilities to be built.
“Base models are really interesting, they’re like a Minecraft release where a whole modding community appears, and you’ve seen that richness within the Stable Diffusion community. But you need to have a really solid foundation from which to build,” Emad Mostaque, CEO of Stability AI, told VentureBeat.
How Stable Diffusion’s fine-tuning has been improved with ControlNet in SDXL 1.0
Getting the best possible image with text-to-image generation is typically an iterative process, and one that SDXL 1.0 is aiming to make a whole lot easier.
“The amount of images that are acquired for fine-tuning dropped dramatically,” Mostaque said. “Now with as few as five to 10 images, you can fine-tune an amazing model really quickly.”
One of the key innovations that helps to enable the easier fine-tuning and improved composition in SDXL 1.0 is an approach known as “ControlNet.” A Stanford University research paper detailed this technique earlier this year. Mostaque explained that a ControlNet can, for example, take an input such as a skeleton figure and map that image onto the base diffusion noise infrastructure, giving a higher degree of accuracy and control over the result.
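For readers who want a feel for the mechanism, here is a minimal toy sketch of the idea as described in the ControlNet paper, not Stability AI's implementation. The paper keeps the base model frozen, trains a copy of it on the conditioning input, and joins the two through layers initialized to zero so that training starts from the base model's unmodified behavior. The sketch below assumes small dense layers in place of the convolutions a real diffusion model uses, and every name in it (`ToyControlledBlock`, `zero_in`, `zero_out`) is hypothetical.

```python
import copy


def linear(weights, x):
    """Apply a dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyControlledBlock:
    """Frozen base layer plus a trainable copy joined by zero-initialized layers."""

    def __init__(self, base_weights):
        n = len(base_weights)
        self.base = base_weights                    # frozen base layer
        self.copy = copy.deepcopy(base_weights)     # trainable copy of the base layer
        self.zero_in = [[0.0] * n for _ in range(n)]   # starts at zero
        self.zero_out = [[0.0] * n for _ in range(n)]  # starts at zero

    def forward(self, x, condition):
        base_out = linear(self.base, x)
        # The condition (e.g. a pose skeleton) enters through the first zero layer.
        injected = [a + b for a, b in zip(x, linear(self.zero_in, condition))]
        # The copy's output is added back through the second zero layer.
        control_out = linear(self.zero_out, linear(self.copy, injected))
        return [a + b for a, b in zip(base_out, control_out)]


base_weights = [[1.0, 0.5], [-0.5, 2.0]]
block = ToyControlledBlock(base_weights)
x = [1.0, 2.0]
condition = [0.5, -1.0]

# Before any training, the zero layers cancel the control branch entirely.
before = block.forward(x, condition)
print(before)  # [2.0, 3.5], identical to the base layer alone

# Stand-in for training: give the zero layers non-zero weights by hand.
for i in range(2):
    block.zero_in[i][i] = 1.0
    block.zero_out[i][i] = 0.5

after = block.forward(x, condition)
print(after)  # [3.0, 4.125], the condition now steers the output
```

The point of the zero initialization is that adding the control branch cannot degrade the base model at the start of training; the conditioning signal only gains influence as those layers learn non-zero weights.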
Why more parameters in SDXL 1.0 are a big deal
Mostaque commented that one of the key things that has helped to kick off the generative AI boom overall is scaling, whereby the parameter count is increased, leading to more features and more knowledge. He said that the 3.5 billion parameters in the base SDXL 1.0 model lead to more accuracy overall.
“You’re teaching the model various things and you’re teaching it more in-depth,” he said. “Parameter count actually matters — the more concepts that it knows, and the deeper it knows them.”
While SDXL 1.0 has more parameters, it does not require users to write long prompts to get better results, as is often the case with text generation models. Mostaque said that with SDXL 1.0, a user can provide complicated, multi-part instructions in fewer words than prior models required and still generate an accurate image. With previous Stable Diffusion models, users needed longer text prompts.
“You don’t need to do that with this model, because we did the reinforcement learning with human feedback (RLHF) stage with the community and our partners for the 0.9 release,” he explained.
The SDXL 1.0 base model is available today in a variety of locations, including the Amazon Bedrock and Amazon SageMaker Jumpstart services.
“The base model is open and it’s available to the entire community with a CreativeML ethical use license,” Mostaque said. “Bedrock, Jumpstart and then our own API services, as well as interfaces like Clipdrop that we have, just make it easy to use, because the base model by itself is … a bit complicated to use.”