Abstract
3D facial video editing has promising applications but is hindered by a lack of spatio-temporal consistency and limited adaptability to natural language inputs. To address these challenges, we propose the disentangle to edit (D-to-E) framework. First, facial code disentanglement inversion separates identity codes from motion codes, so that identity attributes can be edited while facial motion is preserved, which improves the temporal consistency of the results. Second, a diffusion-based facial code editor extends 2D editing to 3D, enabling flexible editing of identity codes through natural language guidance. Third, an identity-structure preservation mechanism enhances the spatial consistency of the results. Extensive experiments demonstrate that D-to-E can effectively perform spatio-temporally consistent multi-view facial video editing from natural language instructions.
| Original language | English |
|---|---|
| Article number | 221 |
| Journal | Multimedia Systems |
| Volume | 32 |
| Issue | 3 |
| DOI | |
| Publication status | Published - Jun 2026 |
Title: Disentangle to edit: instruction-guided latent manipulation for 3D facial video consistency