We propose a neural rendering-based system that creates head avatars from a single photograph. Our approach models a person's appearance by decomposing it into two layers. The first layer is a pose-dependent coarse image synthesized by a small neural network. The second layer is a pose-independent texture image that contains high-frequency details. The texture image is generated offline, then warped and added to the coarse image to ensure a high effective resolution of the synthesized head views. We compare our system to analogous state-of-the-art systems in terms of visual quality and speed. The experiments show a significant inference speedup over previous neural head avatar models at comparable visual quality. We also report on a real-time implementation of our system on a smartphone.
The main idea of our approach is to split the single heavy image generator, which would otherwise run for every frame at test time, into two networks: one that runs only during initialization (i.e., once per identity), and a much lighter network, which we call the inference generator, that runs once per frame. In our implementation, all of these networks are trained end-to-end.
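The split can be sketched as follows, under loudly stated assumptions. The functions `init_identity` and `inference_generator` are hypothetical toy stand-ins (a per-channel mean and a brightness shift), not the actual networks. The sketch only illustrates the control flow. The expensive step runs once when the avatar is created. `render` runs once per frame and reuses the cached per-identity results. The helper `amortized_ms_per_frame` is likewise hypothetical and shows how a one-off cost spreads over many frames.

```python
import numpy as np


def init_identity(source_frame: np.ndarray):
    """Hypothetical stand-in for the heavy per-identity networks (run once)."""
    # Toy "embedding": per-channel mean color of the source frame, shape (C,).
    embedding = source_frame.mean(axis=(0, 1))
    # Toy "texture": what remains after removing the mean, shape (H, W, C).
    texture = source_frame - embedding
    return embedding, texture


def inference_generator(embedding: np.ndarray, keypoints: np.ndarray, shape):
    """Hypothetical stand-in for the light per-frame network."""
    height, width = shape
    coarse = np.broadcast_to(embedding, (height, width, embedding.size)).copy()
    # Toy pose dependence: brightness shifts with the mean keypoint coordinate.
    return coarse + 0.01 * keypoints.mean()


class Avatar:
    def __init__(self, source_frame: np.ndarray):
        self.shape = source_frame.shape[:2]
        # Heavy step: executed once per identity, results are cached.
        self.embedding, self.texture = init_identity(source_frame)
        self.init_runs = 1
        self.frames_rendered = 0

    def render(self, keypoints: np.ndarray) -> np.ndarray:
        # Light step: executed once per frame, reusing the cached results.
        self.frames_rendered += 1
        coarse = inference_generator(self.embedding, keypoints, self.shape)
        # The texture is added without warping here to keep the sketch short.
        return coarse + self.texture


def amortized_ms_per_frame(init_ms: float, frame_ms: float, n_frames: int) -> float:
    """Per-frame cost when a one-off init cost is spread over n_frames."""
    return init_ms / n_frames + frame_ms
```

The one-off initialization adds only `init_ms / n_frames` to each frame. That term shrinks as more frames are rendered, which is why moving work out of the per-frame path pays off.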
During training, we first encode a source frame into embeddings, then initialize the adaptive parameters of both the inference and texture generators, and predict a high-frequency texture. These operations are performed only once per avatar. The target keypoints are then used to predict a low-frequency component of the output image and a warping field which, applied to the texture, provides the high-frequency component. The two components are added together to produce the output. We thus decompose the output image into two layers: a low-frequency layer produced directly by the small inference generator, and a high-frequency layer produced by warping a static texture.
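The composition step (output = coarse layer + warped texture) can be illustrated with a minimal NumPy sketch. The following assumptions are not taken from the source:

- The warping field stores, for every output pixel, the absolute (x, y) pixel location in the texture to sample from.
- Sampling is bilinear, with clamping at the border.
- All function names are hypothetical.

In PyTorch, this kind of backward warping is commonly done with `torch.nn.functional.grid_sample`, which expects coordinates normalized to [-1, 1] rather than pixels.

```python
import numpy as np


def bilinear_warp(texture: np.ndarray, warp_field: np.ndarray) -> np.ndarray:
    """Backward-warp `texture` (H_t, W_t, C) with `warp_field` (H, W, 2).

    warp_field[i, j] = (x, y) is the texture location, in pixels, that output
    pixel (i, j) samples from. Out-of-range coordinates are clamped to the
    border; sampling is bilinear.
    """
    h_t, w_t = texture.shape[:2]
    x = np.clip(warp_field[..., 0], 0, w_t - 1)
    y = np.clip(warp_field[..., 1], 0, h_t - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_t - 1)
    y1 = np.minimum(y0 + 1, h_t - 1)
    wx = (x - x0)[..., None]  # fractional offsets, broadcast over channels
    wy = (y - y0)[..., None]
    top = (1 - wx) * texture[y0, x0] + wx * texture[y0, x1]
    bottom = (1 - wx) * texture[y1, x0] + wx * texture[y1, x1]
    return (1 - wy) * top + wy * bottom


def compose(coarse: np.ndarray, texture: np.ndarray, warp_field: np.ndarray) -> np.ndarray:
    """Output image = low-frequency layer + warped high-frequency texture."""
    return coarse + bilinear_warp(texture, warp_field)


def identity_field(height: int, width: int) -> np.ndarray:
    """Warping field that samples every pixel from its own location."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return np.stack([xs, ys], axis=-1).astype(float)
```

The texture is fixed per avatar, so the per-frame work in this sketch reduces to one bilinear lookup and one addition per output pixel, on top of predicting the coarse image and the warping field.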
All results below are achieved with a model that runs in 42 ms per frame on a Snapdragon 855.
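For reference, this latency converts to throughput as follows, assuming frames are processed sequentially without pipelining:

$$\frac{1000\ \text{ms/s}}{42\ \text{ms/frame}} \approx 23.8\ \text{frames/s}$$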
@InProceedings{Zakharov20,
  author    = {Zakharov, Egor and Ivakhnenko, Aleksei and Shysheya, Aliaksandra and Lempitsky, Victor},
  title     = {Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars},
  booktitle = {European Conference on Computer Vision (ECCV)},
  month     = {August},
  year      = {2020}
}