# Text to Image C++ Generation Pipeline
Examples in this folder showcase inference of text-to-image models such as Stable Diffusion 1.5, 2.1, and LCM. The applications deliberately expose few configuration options to encourage the reader to explore and modify the source code, for example to change the inference device to GPU. The samples feature `ov::genai::Text2ImagePipeline` and use a text prompt as the input source.
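As a rough orientation before diving into the individual samples, a minimal flow looks like the sketch below. It assumes a current OpenVINO GenAI header layout and the `imwrite` helper shipped with these samples; the model path and prompt are placeholders taken from the command line.

```cpp
// Minimal sketch, not a drop-in replacement for text2image.cpp.
#include <string>

#include "openvino/genai/image_generation/text2image_pipeline.hpp"
#include "imwrite.hpp"  // sample helper for saving BMP images

int main(int argc, char* argv[]) {
    const std::string models_path = argv[1];  // folder with the exported OpenVINO model
    const std::string prompt = argv[2];       // e.g. "cyberpunk cityscape"

    ov::genai::Text2ImagePipeline pipe(models_path, "CPU");  // change "CPU" to "GPU" to switch devices
    ov::Tensor image = pipe.generate(prompt);                // u8 tensor in NHWC layout

    imwrite("image_%d.bmp", image, true);
    return 0;
}
```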
There are several sample files:

- `text2image.cpp` demonstrates basic usage of the text-to-image pipeline
- `text2image_concurrency.cpp` demonstrates concurrent usage of the text-to-image pipeline to create multiple images with different prompts
- `lora_text2image.cpp` shows how to apply LoRA adapters to the pipeline
- `heterogeneous_stable_diffusion.cpp` shows how to assemble a heterogeneous text-to-image pipeline from individual subcomponents (scheduler, text encoder, UNet, VAE decoder)
- `image2image.cpp` demonstrates basic usage of the image-to-image pipeline
- `image2image_concurrency.cpp` demonstrates concurrent usage of the image-to-image pipeline to create multiple images with different prompts
- `inpainting.cpp` demonstrates basic usage of the inpainting pipeline
- `benchmark_image_gen.cpp` demonstrates how to benchmark the text-to-image / image-to-image / inpainting pipelines
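For the image-to-image samples in the list above, the flow differs mainly in that an initial image is passed alongside the prompt. The sketch below assumes the `ov::genai::Image2ImagePipeline` class and the `load_image.hpp` / `imwrite.hpp` helpers shipped with these samples; paths and values are placeholders, so treat it as an illustration rather than the full `image2image.cpp` sample.

```cpp
// Sketch of image-to-image usage; see image2image.cpp for the complete sample.
#include <string>

#include "openvino/genai/image_generation/image2image_pipeline.hpp"
#include "load_image.hpp"  // sample helper that reads an image file into an ov::Tensor
#include "imwrite.hpp"

int main(int argc, char* argv[]) {
    const std::string models_path = argv[1];
    const std::string prompt = argv[2];
    const std::string image_path = argv[3];

    ov::Tensor initial_image = utils::load_image(image_path);

    ov::genai::Image2ImagePipeline pipe(models_path, "CPU");
    // strength controls how much noise is applied to the initial image
    // (higher values keep less of the original content)
    ov::Tensor image = pipe.generate(prompt, initial_image, ov::genai::strength(0.8f));

    imwrite("image_%d.bmp", image, true);
    return 0;
}
```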
Users can change the sample code and play with the following generation parameters (a sketch showing how several of them are passed appears after this list):

- Change the width or height of the generated image
- Generate multiple images per prompt
- Adjust the number of inference steps
- Play with the guidance scale
- (SD 1.x, 2.x; SD3, SDXL) Add a negative prompt when guidance scale > 1
- (SDXL, SD3, FLUX) Specify other positive prompts like `prompt_2`
- Apply multiple different LoRA adapters and mix them with different blending coefficients
- (Image-to-image and inpainting) Play with the `strength` parameter to control how much the initial image is noised and to reduce the number of inference steps
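Most of these parameters are passed to `generate()` as properties. The following sketch combines several of them; the model path, prompt, and values are illustrative, and the property names assume the current OpenVINO GenAI image generation API.

```cpp
// Sketch of passing generation parameters; values are illustrative.
#include "openvino/genai/image_generation/text2image_pipeline.hpp"

int main() {
    ov::genai::Text2ImagePipeline pipe("./stable_diffusion_ov", "CPU");  // placeholder model path

    ov::Tensor image = pipe.generate(
        "a photo of an astronaut riding a horse on mars",
        ov::genai::negative_prompt("blurry, low quality"),  // only applied when guidance scale > 1
        ov::genai::width(512),
        ov::genai::height(512),
        ov::genai::num_inference_steps(20),
        ov::genai::num_images_per_prompt(1),
        ov::genai::guidance_scale(7.5f));
    return 0;
}
```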
> [!NOTE]
> An image generated with HuggingFace / Optimum Intel is not the same as one generated by this C++ sample: C++ random generation with MT19937 differs from `numpy.random.randn()` and `diffusers.utils.randn_tensor` (which uses `torch.Generator` inside). So it is expected that the Diffusers and C++ versions produce different images, because the latent images are initialized differently.