Disentangle to edit: instruction-guided latent manipulation for 3D facial video consistency

  • Peixu Zhang
  • Zhaoxi Mu
  • Shulei Ji
  • Wang Xi
  • Xinyu Yang
  • Xi'an Jiaotong University
  • Zhejiang University

Research output: Contribution to journal › Article › peer-review

Abstract

3D facial video editing has promising applications but is hindered by challenges such as a lack of spatio-temporal consistency and limited adaptability to natural language inputs. To address these challenges, we propose the disentangle to edit (D-to-E) framework. First, facial codes disentanglement inversion is exploited to disentangle identity and motion codes, allowing editing of identity attributes while preserving facial motion, thereby improving the temporal consistency of the results. Second, a diffusion-based facial code editor extends 2D editing to 3D, enabling flexible editing of identity codes through natural language guidance. Furthermore, we introduce an identity-structure preservation mechanism to enhance the spatial consistency of the results. Extensive experiments demonstrate that D-to-E can effectively perform spatio-temporally consistent multi-view facial video editing through natural language instructions.

Original language: English
Article number: 221
Journal: Multimedia Systems
Volume: 32
Issue number: 3
DOI
Publication status: Published - Jun 2026
