How does Midjourney create images in real time? Understanding diffusion models
Midjourney's model is proprietary and not open source, so its architecture is not publicly documented. It most likely combines a diffusion model with a language model to create images in real time. The language model interprets the text prompt and extracts its key subjects, attributes, and themes. This interpretation then conditions the diffusion process, steering the generation so that the resulting image matches the prompt.
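One common way a prompt "guides" diffusion is classifier-free guidance, the technique behind the `guidance_scale` setting in open-source systems such as Stable Diffusion. Whether Midjourney uses it is not public, so treat the following as a minimal sketch under stated assumptions. `toy_denoiser` is a hypothetical stand-in for a trained noise-prediction network, the "images" are tiny 1-D vectors, and the update is a plain fixed-size step rather than a real DDPM/DDIM noise schedule. For simplicity the conditioning vector doubles as the target the toy denoiser pulls toward. In a real model the text embedding is consumed by the network (in Stable Diffusion, via cross-attention) rather than being an image itself.

```python
def toy_denoiser(x, cond=None):
    """Hypothetical stand-in for a trained noise-prediction network.
    It 'predicts' the noise as the offset between x and a target:
    the conditioning vector if given, otherwise an all-zero image."""
    target = cond if cond is not None else [0.0] * len(x)
    return [xi - ti for xi, ti in zip(x, target)]


def guided_denoise(x, cond, steps=50, step_size=0.1, guidance_scale=1.0):
    """Simplified sampling loop using classifier-free guidance:
    eps = eps_uncond + w * (eps_cond - eps_uncond), with w = guidance_scale."""
    x = list(x)  # work on a copy so the caller's starting noise is untouched
    for _ in range(steps):
        eps_uncond = toy_denoiser(x)        # prediction without the prompt
        eps_cond = toy_denoiser(x, cond)    # prediction with the prompt
        eps = [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
        # Remove a fraction of the guided noise estimate at each step.
        x = [xi - step_size * e for xi, e in zip(x, eps)]
    return x


x0 = [0.5, -1.2, 2.0]     # hypothetical starting noise
cond = [1.0, 0.0, -1.0]   # hypothetical conditioning vector
result = guided_denoise(x0, cond)
print([round(v, 2) for v in result])  # each value ends within about 0.02 of cond
```

In this toy the guided update works out to `x <- (1 - step_size) * x + step_size * guidance_scale * cond`, so the sample converges to `guidance_scale * cond`. A larger guidance scale pushes the result harder in the direction the prompt suggests, which mirrors why higher guidance in real systems tends to follow the prompt more literally.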
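The process most likely begins with an initial noise tensor: an array of random values (in standard diffusion models, sampled from a Gaussian distribution) that doesn't resemble any meaningful image. Think of it as a canvas covered in random splatters of paint, which the model gradually refines into a coherent picture over many denoising steps.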
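As a minimal sketch, the starting canvas can be produced by drawing independent samples from a standard Gaussian distribution. The function name `initial_noise` and the small 3 x 8 x 8 shape are illustrative assumptions; real systems use much larger tensors (often in a compressed latent space), and nothing here is documented about Midjourney's own implementation.

```python
import random


def initial_noise(channels, height, width, seed=None):
    """Return a channels x height x width tensor (as nested lists) of
    independent samples from a standard Gaussian distribution N(0, 1)."""
    rng = random.Random(seed)  # a seeded generator makes the noise reproducible
    return [[[rng.gauss(0.0, 1.0) for _ in range(width)]
             for _ in range(height)]
            for _ in range(channels)]


noise = initial_noise(3, 8, 8, seed=42)
print(len(noise), len(noise[0]), len(noise[0][0]))  # 3 8 8
```

Passing a fixed seed makes the starting canvas reproducible: with everything else in the pipeline held constant, the same seed yields the same initial noise.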
Before the diffusion process starts, the system must interpret the text prompt. A language model or text encoder processes the prompt and converts it into a fixed-size numerical representation called an embedding (a single vector or, in some systems such as Stable Diffusion, a fixed-length sequence of per-token vectors). This embedding captures the meaning of the prompt and guides the diffusion process so that the final image aligns with it.
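The key property of an embedding is that prompts of any length map to the same output size. The sketch below illustrates only that property with a toy hashing-based, bag-of-words encoder. It is an assumption for illustration and captures no real semantics: real systems use learned neural text encoders (Stable Diffusion, for example, uses CLIP's text encoder), and the names `token_vector`, `embed_prompt`, and `EMBED_DIM` are hypothetical.

```python
import hashlib
import math

EMBED_DIM = 16  # hypothetical size; learned encoders use hundreds of dimensions


def token_vector(token, dim=EMBED_DIM):
    """Map a token to a deterministic pseudo-random vector by hashing it."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # 32 bytes
    if dim > len(digest):
        raise ValueError("this toy encoder supports at most 32 dimensions")
    # One byte per dimension, rescaled from [0, 255] to [-1.0, 1.0].
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]


def embed_prompt(prompt, dim=EMBED_DIM):
    """Toy text encoder: sum the token vectors, then L2-normalize.
    The output always has length `dim`, whatever the prompt length."""
    tokens = prompt.lower().split()
    if not tokens:
        return [0.0] * dim
    summed = [0.0] * dim
    for tok in tokens:
        for i, v in enumerate(token_vector(tok, dim)):
            summed[i] += v
    norm = math.sqrt(sum(v * v for v in summed)) or 1.0  # avoid dividing by zero
    return [v / norm for v in summed]


short_emb = embed_prompt("a red fox")
long_emb = embed_prompt("a red fox sitting in a snowy forest at dawn, oil painting")
print(len(short_emb), len(long_emb))  # 16 16
```

Normalizing to unit length keeps every embedding on the same scale, so a longer prompt does not simply produce a "bigger" conditioning signal.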