HHAvatar's Project Page

Abstract

Creating high-fidelity 3D head avatars has long been a research hotspot, but it remains a great challenge under lightweight sparse-view setups. In this paper, we propose HHAvatar, a high-fidelity head avatar with dynamic hair modeling, represented by controllable 3D Gaussians. We first use 3D Gaussians to represent the appearance of the head, and then jointly optimize neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, so our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra to improve the stability and convergence of the training procedure. To address dynamic hair modeling, we introduce a hybrid head model into our avatar representation, which is based on Gaussian Head Avatar, together with a training method that considers timing information and an occlusion perception module to model the non-rigid motion of hair. Experiments show that our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions, and driving hair plausibly with the motion of the head.
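To make the idea of "neutral 3D Gaussians plus an MLP-based deformation field" more concrete, here is a minimal sketch in plain Python. It only moves Gaussian centers: each neutral center is offset by the output of a small MLP that takes the center and an expression vector as input. The layer sizes, the two-dimensional expression vector, the random weights, and all function names are assumptions made for illustration; this page does not specify the actual network, and a full model would also handle attributes such as color, rotation and scale.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create a tiny two-layer MLP with small random weights (illustrative only)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": init(hidden_dim, in_dim),
        "b1": [0.0] * hidden_dim,
        "w2": init(out_dim, hidden_dim),
        "b2": [0.0] * out_dim,
    }


def mlp_forward(mlp, x):
    """Apply linear -> tanh -> linear to the input vector x."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(mlp["w1"], mlp["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(mlp["w2"], mlp["b2"])
    ]


def deform_gaussians(neutral_positions, expression, mlp):
    """Return deformed centers: neutral center plus an MLP-predicted offset."""
    deformed = []
    for position in neutral_positions:
        # The MLP sees the neutral center concatenated with the expression code.
        offset = mlp_forward(mlp, list(position) + list(expression))
        deformed.append(tuple(c + o for c, o in zip(position, offset)))
    return deformed


# Hypothetical data: two neutral Gaussian centers and a 2-D expression code.
neutral = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0)]
expression = [0.5, -0.3]
mlp = make_mlp(in_dim=3 + 2, hidden_dim=8, out_dim=3)
deformed = deform_gaussians(neutral, expression, mlp)
```

In joint optimization, both the neutral centers and the MLP weights would be updated from rendering losses; the sketch shows only the forward pass.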

Fig 1. HHAvatar achieves ultra high-fidelity image synthesis with controllable expressions at 2K resolution. Top: different identities animated by the same expression. Bottom: hair positions can differ for identical poses, depending on the hair state (i.e., position and speed) at the previous moment.
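The dependence on the previous hair state can be illustrated with a toy example. The sketch below is not the learned model described on this page: it is a hand-written one-dimensional damped spring, with made-up constants, chosen only to show that the same head pose (the same `anchor`) leads to different hair positions when the previous position or speed differs.

```python
def step_hair(anchor, prev_pos, prev_vel, dt=0.1, stiffness=4.0, damping=0.5):
    """Advance one hair point by one semi-implicit Euler step of a damped spring.

    anchor   -- rest position implied by the current head pose (assumed 1-D)
    prev_pos -- hair position at the previous moment
    prev_vel -- hair speed at the previous moment
    """
    accel = stiffness * (anchor - prev_pos) - damping * prev_vel
    vel = prev_vel + dt * accel
    pos = prev_pos + dt * vel
    return pos, vel


# Identical head pose in both cases; only the previous speed differs.
anchor = 0.0
pos_a, vel_a = step_hair(anchor, prev_pos=0.2, prev_vel=0.0)
pos_b, vel_b = step_hair(anchor, prev_pos=0.2, prev_vel=-1.0)

print(pos_a != pos_b)  # the two cases end up at different positions
```

A model that predicts hair position from the current pose alone cannot tell these two cases apart, which is why the previous state matters.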

Method

Fig 2. The HHAvatar rendering and reconstruction pipeline. In the initialization stage, we first optimize the guidance model, which includes a neutral mesh, a deformation MLP and a color MLP. We then use them to initialize the neutral Gaussians and the dynamic generator. Finally, 2K RGB images are synthesized through differentiable rendering and the super-resolution network, and segmentation maps of the hair and the head are also synthesized through differentiable rendering. HHAvatar is trained under the supervision of multi-view RGB videos and multi-view masks from face parsing.
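As a rough illustration of the step that initializes the neutral Gaussians from the guidance model, the sketch below places one Gaussian at each vertex of a neutral mesh and copies a per-vertex color. The one-Gaussian-per-vertex mapping, the default `scale` and `opacity` values, and the names `Gaussian` and `init_gaussians_from_mesh` are assumptions for illustration; the caption states only that the guidance model is used for initialization, not how.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    center: Vec3    # 3D position of the Gaussian
    color: Vec3     # RGB color in [0, 1]
    scale: float    # isotropic size (assumed; real Gaussians are anisotropic)
    opacity: float  # blending weight in [0, 1]


def init_gaussians_from_mesh(
    vertices: List[Vec3],
    vertex_colors: List[Vec3],
    scale: float = 0.01,
    opacity: float = 0.5,
) -> List[Gaussian]:
    """Create one Gaussian per mesh vertex, copying its position and color."""
    if len(vertices) != len(vertex_colors):
        raise ValueError("vertices and vertex_colors must have the same length")
    return [
        Gaussian(center=v, color=c, scale=scale, opacity=opacity)
        for v, c in zip(vertices, vertex_colors)
    ]


# Hypothetical neutral mesh with three vertices and their colors.
mesh_vertices = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
mesh_colors = [(0.8, 0.6, 0.5), (0.8, 0.6, 0.5), (0.2, 0.1, 0.1)]
gaussians = init_gaussians_from_mesh(mesh_vertices, mesh_colors)
```

Starting from geometry that already approximates the head gives the later optimization a reasonable starting point, which is in line with the stated goal of stable training and convergence.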

Results

Fig 3. Self-reenactment results.

Fig 4. Emotion.

Fig 5. Cross-identity reenactment results.

Fig 6. Qualitative comparisons of different methods on the self-reenactment task with dynamic hair on the self-captured dataset. From left to right: HAvatar, GaussianAvatars, MeGA and ours. Our method reconstructs details with high quality.

Fig 7. Qualitative comparisons of different methods on the cross-identity reenactment task with dynamic hair on the self-captured dataset. From left to right: HAvatar, GaussianAvatars, MeGA and ours. Our method synthesizes high-fidelity images while ensuring the accuracy of hair motion.

Fig 8. Novel view synthesis results of our method. Top: we use 8-view synchronized videos to train the avatar. Bottom: we use 4-view synchronized videos to train the avatar with dynamic hair.

HHAvatar:

Gaussian Head Avatar with Dynamic Hairs

Zhanfeng Liao, Yuelang Xu, Zhe Li, Qijing Li, Boyao Zhou, Ruifeng Bai, Di Xu, Hongwen Zhang, Yebin Liu

