Creating high-fidelity 3D head avatars has long been a research hotspot, but it remains highly challenging under lightweight sparse-view setups. In this paper, we propose HHAvatar, a high-fidelity head avatar represented by controllable 3D Gaussians with dynamic hair modeling. We first use 3D Gaussians to represent the appearance of the head, and then jointly optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, so our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a geometry-guided initialization strategy based on an implicit SDF and Deep Marching Tetrahedra to stabilize training and improve its convergence. To model dynamic hair, we introduce a hybrid head model into our avatar representation, which is based on Gaussian Head Avatar, together with a training method that exploits temporal information and an occlusion perception module to capture the non-rigid motion of hair. Experiments show that our approach outperforms other state-of-the-art sparse-view methods, achieving ultra-high-fidelity rendering at 2K resolution even under exaggerated expressions, while driving the hair plausibly with the motion of the head.
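The abstract describes deforming neutral 3D Gaussians with an MLP-based deformation field to capture expressions. A minimal sketch of that idea, assuming a plain two-layer ReLU MLP that takes a Gaussian center concatenated with an expression code and predicts a 3D offset, might look like the following. The function names (`init_mlp`, `deform_gaussians`), the layer sizes, and the choice to deform only the centers (not rotations, scales, or colors) are hypothetical simplifications; in HHAvatar the MLP is optimized jointly with the Gaussians, whereas here the weights are only randomly initialized.

```python
import random

def init_mlp(in_dim, hidden, out_dim, seed=0):
    """Hypothetical two-layer MLP with small random weights and zero biases."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(in_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(hidden)]
    return {"w1": w1, "b1": [0.0] * hidden, "w2": w2, "b2": [0.0] * out_dim}

def matvec(x, w, b):
    # x: length-in vector, w: in x out matrix (list of rows), b: length-out bias
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def mlp(params, x):
    hidden = [max(v, 0.0) for v in matvec(x, params["w1"], params["b1"])]  # ReLU
    return matvec(hidden, params["w2"], params["b2"])

def deform_gaussians(neutral_xyz, expr_code, params):
    """Offset each neutral Gaussian center by an MLP conditioned on the expression code."""
    deformed = []
    for p in neutral_xyz:
        dx = mlp(params, list(p) + list(expr_code))  # input: [x, y, z, expr...]
        deformed.append([p[k] + dx[k] for k in range(3)])
    return deformed

# Usage: 20 neutral centers, an 8-dimensional expression code (both made up).
rng = random.Random(1)
xyz = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
expr = [rng.uniform(-1.0, 1.0) for _ in range(8)]
params = init_mlp(3 + 8, 16, 3)
deformed = deform_gaussians(xyz, expr, params)
```

Because the offsets are predicted per Gaussian from a shared expression code, zeroing the output layer recovers the neutral Gaussians exactly, which makes a convenient sanity check for such a field.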
Fig 1. HHAvatar achieves ultra-high-fidelity image synthesis with controllable expressions at 2K resolution. The top row shows different identities animated by the same expression. The bottom row shows that identical poses can yield different hair positions, depending on the hair state (i.e., position and speed) at the previous time step.
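To make concrete the point in Fig 1 that hair position depends on the previous hair state and not only on the current pose, here is a toy one-dimensional damped-spring model. It is an illustrative physics analogy with hypothetical parameters (`stiffness`, `damping`, `dt`) and a hypothetical `HairState` type, not HHAvatar's learned hair model: the same head pose produces different hair positions when the incoming hair velocity differs.

```python
from dataclasses import dataclass

@dataclass
class HairState:
    pos: float  # hair tip position along one axis (toy 1-D model)
    vel: float  # hair tip velocity

def step(state, head_pos, dt=0.1, stiffness=20.0, damping=2.0):
    """Advance a damped spring pulling the hair tip toward the head (semi-implicit Euler)."""
    accel = -stiffness * (state.pos - head_pos) - damping * state.vel
    vel = state.vel + dt * accel  # update velocity first
    pos = state.pos + dt * vel    # then position, using the new velocity
    return HairState(pos, vel)

# Same head pose (head_pos=0.0), same starting position, different previous speed.
at_rest = step(HairState(pos=0.0, vel=0.0), head_pos=0.0)
moving = step(HairState(pos=0.0, vel=1.0), head_pos=0.0)
```

Here `at_rest.pos` stays at 0.0 while `moving.pos` is about 0.08, so the pose alone does not determine the hair position; a model must carry state across time.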
Fig 2. The pipeline of HHAvatar rendering and reconstruction. In the initialization stage, we first optimize the guidance model, which consists of a neutral mesh, a deformation MLP and a color MLP. We then use it to initialize the neutral Gaussians and the dynamic generator. Finally, 2K RGB images are synthesized through differentiable rendering followed by a super-resolution network, and segmentation maps of the hair and the head are also synthesized through differentiable rendering. HHAvatar is trained under the supervision of multi-view RGB videos and multi-view masks obtained from face parsing.
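The Fig 2 caption states that training is supervised by multi-view RGB images and face-parsing masks for the hair and the head. As a hedged illustration only, the sketch below combines an L1 photometric term with binary cross-entropy terms on the rendered hair and head masks, averaged over views. The loss forms, the weights `w_rgb` and `w_mask`, and the per-view dictionary layout (flat lists of pixel values) are all assumptions, not the paper's reported objective.

```python
import math

def l1(pred, target):
    """Mean absolute error over flattened pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def bce(pred, target, eps=1e-6):
    """Mean binary cross-entropy; predictions are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def total_loss(views, w_rgb=1.0, w_mask=0.1):
    """Hypothetical multi-view loss: RGB L1 plus hair/head mask BCE per view."""
    loss = 0.0
    for v in views:
        loss += w_rgb * l1(v["rgb"], v["rgb_gt"])
        loss += w_mask * bce(v["hair"], v["hair_gt"])
        loss += w_mask * bce(v["head"], v["head_gt"])
    return loss / len(views)

# Usage: one view with a perfect prediction, and one with the hair mask inverted.
views = [{
    "rgb": [0.2, 0.5, 0.9], "rgb_gt": [0.2, 0.5, 0.9],
    "hair": [1.0, 0.0], "hair_gt": [1.0, 0.0],
    "head": [0.0, 1.0], "head_gt": [0.0, 1.0],
}]
bad_views = [dict(views[0], hair=[0.0, 1.0])]
```

Supervising the hair and head masks separately is one simple way to push the two parts of a hybrid model to explain different image regions; the actual weighting in HHAvatar is not given here.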
Fig 3. Self-reenactment results.
Fig 4. Emotion.
Fig 5. Cross-identity reenactment results.
Fig 6. Qualitative comparisons of different methods on the self-reenactment task with dynamic hair on the self-captured dataset. From left to right: HAvatar, GaussianAvatars, MeGA and ours. Our method reconstructs fine details with high quality.
Fig 7. Qualitative comparisons of different methods on the cross-identity reenactment task with dynamic hair on the self-captured dataset. From left to right: HAvatar, GaussianAvatars, MeGA and ours. Our method synthesizes high-fidelity images while ensuring the accuracy of hair motion.
Fig 8. Novel-view synthesis results of our method. Top: the avatar is trained with 8-view synchronized videos. Bottom: the avatar with dynamic hair is trained with 4-view synchronized videos.
@inproceedings{xu2023gaussianheadavatar,
title={Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians},
author={Xu, Yuelang and Chen, Benwang and Li, Zhe and Zhang, Hongwen and Wang, Lizhen and Zheng, Zerong and Liu, Yebin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
NeRFBlendShape: Xuan Gao, Chenglai Zhong, Jun Xiang, Yang Hong, Yudong Guo, and Juyong Zhang. Reconstructing personalized semantic facial NeRF models from monocular video. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 41(6), 2022.
NeRFace: Guy Gafni, Justus Thies, Michael Zollhöfer, and Matthias Nießner. Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8645-8654, 2021.
HAvatar: Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, and Yebin Liu. HAvatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Transactions on Graphics, 2023.
GaussianAvatars: S. Qian, T. Kirschstein, L. Schoneveld, D. Davoli, S. Giebenhain, and M. Nießner. GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20299-20309, 2024.
MeGA: Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, and Song-Hai Zhang. MeGA: Hybrid mesh-Gaussian head avatar for high-fidelity rendering and head editing. arXiv preprint arXiv:2404.19026, 2024.