Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

Update: 2024-12-21

Description

🤗 Upvotes: 13 | cs.CV

Authors:

Jixuan He, Wanhua Li, Ye Liu, Junsik Kim, Donglai Wei, Hanspeter Pfister

Title:

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

Arxiv:

http://arxiv.org/abs/2412.14462v1

Abstract:

As a common image editing operation, image composition involves integrating foreground objects into background scenes. In this paper, we expand the application of the concept of Affordance from human-centered image composition tasks to a more general object-scene composition framework, addressing the complex interplay between foreground objects and background scenes. Following the principle of Affordance, we define the affordance-aware object insertion task, which aims to seamlessly insert any object into any scene with various position prompts. To address the limited data issue and incorporate this task, we constructed the SAM-FB dataset, which contains over 3 million examples across more than 3,000 object categories. Furthermore, we propose the Mask-Aware Dual Diffusion (MADD) model, which utilizes a dual-stream architecture to simultaneously denoise the RGB image and the insertion mask. By explicitly modeling the insertion mask in the diffusion process, MADD effectively facilitates the notion of affordance. Extensive experimental results show that our method outperforms the state-of-the-art methods and exhibits strong generalization performance on in-the-wild images. Please refer to our code on https://github.com/KaKituken/affordance-aware-any.
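The abstract says only that MADD denoises the RGB image and the insertion mask simultaneously in two streams; it gives no equations. As a rough illustration of that idea, the sketch below applies a standard DDPM-style forward noising step to both signals at a shared timestep and sums a noise-prediction loss over the two streams. The noise schedule, the function names (`make_alpha_bar`, `joint_noising`, `dual_stream_loss`), and the `mask_weight` parameter are all assumptions made for illustration and are not taken from the paper or its code.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumption; the paper's schedule is not given here).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    # Standard DDPM forward step: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def joint_noising(rgb, mask, t, alpha_bar, rng):
    # Noise the RGB image and the insertion mask at the same timestep,
    # so a dual-stream denoiser would see both at a matching noise level.
    rgb_t, eps_rgb = add_noise(rgb, t, alpha_bar, rng)
    mask_t, eps_mask = add_noise(mask, t, alpha_bar, rng)
    return (rgb_t, mask_t), (eps_rgb, eps_mask)

def dual_stream_loss(pred_rgb, pred_mask, eps_rgb, eps_mask, mask_weight=1.0):
    # Sum of per-stream noise-prediction errors; mask_weight is hypothetical.
    rgb_term = np.mean((pred_rgb - eps_rgb) ** 2)
    mask_term = np.mean((pred_mask - eps_mask) ** 2)
    return rgb_term + mask_weight * mask_term

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
rgb = np.zeros((8, 8, 3))   # placeholder image
mask = np.ones((8, 8, 1))   # placeholder insertion mask
(rgb_t, mask_t), (eps_rgb, eps_mask) = joint_noising(rgb, mask, 500, alpha_bar, rng)
```

In a real model, a network would take `rgb_t`, `mask_t`, the background, the foreground object, and the position prompt, and predict both noise terms. That network is omitted here because the source does not describe it.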




Jingwen Liang, Gengyu Wang
