AI Engineering for Art — with comfyanonymous, of ComfyUI
Description
Applications for the NYC AI Engineer Summit, focused on Agents at Work, are open!
When we first started Latent Space, in the lightning round we’d always ask guests: “What’s your favorite AI product?”. The majority would say Midjourney. The simple UI of prompt → very aesthetic image turned it into a $300M+ ARR bootstrapped business as it rode the first wave of AI image generation.
In open source land, the Stable Diffusion community was congregating around AUTOMATIC1111 as the de facto web UI. Unlike Midjourney, which offered some flags but was mostly prompt-driven, A1111 let users play with a lot more parameters, supported additional modalities like img2img, and allowed users to load in custom models. If you’re interested in some of the SD history, you can look at our episodes with Lexica, Replicate, and Playground.
One of the people involved with that community was comfyanonymous, who, while part of the Stability team in 2023, decided to build an alternative called ComfyUI. It is now one of the fastest-growing open source projects in generative images, and the preferred Day 1 partner for releases like Black Forest Labs’ Flux Tools. The idea behind it was simple: “Everyone is trying to make easy to use interfaces. Let me try to make a powerful interface that's not easy to use.”
Unlike its predecessors, ComfyUI does not have an input text box. Everything is based around the idea of a node: there’s a text input node, a CLIP node, a checkpoint loader node, a KSampler node, a VAE node, etc. While daunting for simple image generation, the tool is amazing for more complex workflows, since you can break down every step of the process and chain those steps together rather than manually switching between tools. You can also restart execution partway through instead of from the beginning, which can save a lot of time when working with larger models.
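To make the node idea concrete, here is a minimal sketch of a text-to-image graph expressed in ComfyUI’s API-style JSON format: each node has a class type and inputs, and inputs that reference another node are written as [node_id, output_index]. The node class names follow ComfyUI’s built-in nodes, but treat the exact IDs, parameter values, and checkpoint filename as illustrative rather than canonical.

```python
# A minimal text-to-image graph in ComfyUI's API-style JSON format (sketch).
# Node IDs, the checkpoint filename, and parameter values are illustrative.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",          # outputs: MODEL, CLIP, VAE
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",                   # positive prompt
          "inputs": {"text": "a watercolor fox in a forest", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                   # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "comfy_example"}},
}
```

Swapping the sampler, the checkpoint, or the prompt encoder means editing one entry rather than rebuilding the whole pipeline, which is what makes chaining steps and partial re-execution feel natural.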
To give you an idea of some of the new use cases that this type of UI enables:
* Sketch something → Generate an image with SD from sketch → feed it into SD Video to animate
* Generate an image of an object → Turn into a 3D asset → Feed into interactive experiences
* Input audio → Generate audio-reactive videos
Their Examples page also covers common use cases like AnimateDiff. They recently launched the Comfy Registry, an online library of nodes that users can pull from rather than building everything from scratch. The project has >60,000 GitHub stars, and as the community grows, some of the projects people build have gotten quite complex:
The most interesting thing about Comfy is that it’s not just a UI, it’s a runtime. You can build full applications on top of image models simply by using Comfy: expose a Comfy workflow as an endpoint and chain workflows together just like you would chain individual nodes. We’re seeing the rise of AI Engineering applied to art.
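As a rough sketch of what “Comfy as a runtime” looks like in practice, the snippet below submits the workflow from the earlier example to a locally running ComfyUI server via its HTTP API, assuming the default address of 127.0.0.1:8188; check your server version if the endpoint behaves differently.

```python
# Sketch: queue a node graph on a local ComfyUI server via its HTTP API.
# Assumes the server is running at the default 127.0.0.1:8188.
import json
import urllib.request


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST a node graph to ComfyUI's /prompt endpoint and return the queue response."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response includes a prompt_id you can use to poll /history for results.
        return json.load(resp)


# response = queue_prompt(workflow)   # `workflow` from the earlier sketch
# print(response["prompt_id"])
```

The same graph you wire up in the UI can be queued programmatically, chained with other workflows, or wrapped inside a larger application.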
Major Tom’s ComfyUI Resources from the Latent Space Discord
Major shoutout to Major Tom on the LS Discord, an image generation expert who offered these pointers:
* “best thing about comfy is the fact it supports almost immediately every new thing that comes out - unlike A1111 or forge, which still don't support flux cnet for instance. It will be perfect tool when conflicting nodes will be resolved”
* AP Workflows from Alessandro Perilli is a nice example of an all-in-one train-evaluate-generate system built atop Comfy
* ComfyUI YouTubers to learn from:
  * Sarav: https://www.youtube.com/@mickmumpitz/videos (applied stuff)
  * Sarav: https://www.youtube.com/@latentvision (technical, but infrequent)
* ComfyUI Nodes to check out:
  * https://github.com/kijai/ComfyUI-IC-Light
  * https://github.com/MrForExample/ComfyUI-3D-Pack
  * https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait
  * https://github.com/pydn/ComfyUI-to-Python-Extension
  * https://github.com/THtianhao/ComfyUI-Portrait-Maker
  * https://github.com/ssitu/ComfyUI_NestedNodeBuilder
  * https://github.com/longgui0318/comfyui-magic-clothing
  * https://github.com/atmaranto/ComfyUI-SaveAsScript
  * https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID
  * https://github.com/AIFSH/ComfyUI-FishSpeech
  * https://github.com/coolzilj/ComfyUI-Photopea
  * https://github.com/lks-ai/anynode
  * look for the ComfyUI node for https://github.com/magic-quill/MagicQuill
* “Comfy for Video” resources:
  * Kijai (https://github.com/kijai) pushing out support for Mochi, CogVideoX, AnimateDiff, LivePortrait, etc.
  * ComfyUI node support for LTX (https://github.com/Lightricks/ComfyUI-LTXVideo) and HunyuanVideo
  * FloraFauna AI and Krea.ai
* Communities: https://www.reddit.com/r/StableDiffusion/, https://www.reddit.com/r/comfyui/
Full YouTube Episode
As usual, you can find the full video episode on our YouTube (and don’t forget to like and subscribe!)
Timestamps
* 00:00:04 Introduction of hosts and anonymous guest
* 00:00:35 Origins of Comfy UI and early Stable Diffusion landscape
* 00:02:58 Comfy's background and development of high-res fix
* 00:05:37 Area conditioning and compositing in image generation
* 00:07:20 Discussion on different AI image models (SD, Flux, etc.)
* 00:11:10 Closed source model APIs and community discussions on SD versions
* 00:14:41 LoRAs and textual inversion in image generation
* 00:18:43 Evaluation methods in the Comfy community
* 00:20:05 CLIP models and text encoders in image generation
* 00:23:05 Prompt weighting and negative prompting
* 00:26:22 Comfy UI's unique features and design choices
* 00:31:00 Memory management in Comfy UI
* 00:33:50 GPU market share and compatibility issues
* 00:35:40 Node design and parameter settings in Comfy UI
* 00:38:44 Custom nodes and community contributions
* 00:41:40 Video generation models and capabilities
* 00:44:47 Comfy UI's development timeline and rise to popularity
* 00:48:13 Current state of Comfy UI team and future plans
* 00:50:11 Discussion on other Comfy startups and potential text generation support
Transcript
Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.
swyx [00:00:12]: Hey everyone, we are in the Chroma Studio again, but with our first ever anonymous guest, Comfy Anonymous, welcome.
Comfy [00:00:19]: Hello.
swyx [00:00:21]: I feel like that's your full name,