DiscoverDaily Paper Cast
TransPixar: Advancing Text-to-Video Generation with Transparency

Update: 2025-01-08

Description

🤗 Upvotes: 9 | cs.CV



Authors:

Luozhou Wang, Yijun Li, Zhifei Chen, Jui-Hsien Wang, Zhifei Zhang, He Zhang, Zhe Lin, Yingcong Chen



Title:

TransPixar: Advancing Text-to-Video Generation with Transparency



Arxiv:

http://arxiv.org/abs/2501.03006v1



Abstract:

Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alpha channels for transparency, remains a challenge due to limited datasets and the difficulty of adapting existing models. Alpha channels are crucial for visual effects (VFX), allowing transparent elements like smoke and reflections to blend seamlessly into scenes. We introduce TransPixar, a method to extend pretrained video models for RGBA generation while retaining the original RGB capabilities. TransPixar leverages a diffusion transformer (DiT) architecture, incorporating alpha-specific tokens and using LoRA-based fine-tuning to jointly generate RGB and alpha channels with high consistency. By optimizing attention mechanisms, TransPixar preserves the strengths of the original RGB model and achieves strong alignment between RGB and alpha channels despite limited training data. Our approach effectively generates diverse and consistent RGBA videos, advancing the possibilities for VFX and interactive content creation.
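The abstract's core idea — appending alpha-specific tokens to the pretrained model's token sequence so that RGB and alpha are generated in one joint attention pass, with a LoRA-style low-rank update adapting the frozen weights — can be illustrated with a minimal NumPy sketch. All dimensions, the single-head attention, and the choice of adapting only the query projection are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not taken from the paper.
n_rgb, n_alpha, d = 8, 8, 16      # RGB tokens, alpha-specific tokens, width

rgb_tokens = rng.standard_normal((n_rgb, d))
alpha_tokens = rng.standard_normal((n_alpha, d))  # newly introduced tokens

# Joint sequence: attention sees RGB and alpha tokens together, so the
# two channels are denoised with shared context (the consistency claim).
tokens = np.concatenate([rgb_tokens, alpha_tokens], axis=0)

# Frozen pretrained projection plus a LoRA-style low-rank update of rank r.
W = rng.standard_normal((d, d))          # pretrained weight, kept frozen
r = 2
A = rng.standard_normal((d, r)) * 0.01   # trainable LoRA factor
B = rng.standard_normal((r, d)) * 0.01   # trainable LoRA factor

q = tokens @ (W + A @ B)                 # adapted query projection
k = tokens @ W                           # frozen key projection
scores = q @ k.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ (tokens @ W)

# Both streams fall out of the same joint attention pass.
rgb_out, alpha_out = out[:n_rgb], out[n_rgb:]
```

Because only the small `A @ B` term is trained, the original RGB pathway is left essentially intact, which is the property the abstract highlights for working with limited RGBA data.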

Jingwen Liang, Gengyu Wang