Neural Head Reenactment with Latent Pose Descriptors

CVPR 2020

Pose-Identity Disentanglement
Intuitively, nothing prevents our system from encoding person-specific information into the pose embedding.
In practice, this leakage does not happen when three simple techniques are enabled:

  1. The pose encoder's capacity is lower than that of the identity encoder (in our case, MobileNetV2 vs. ResNeXt-50).
  2. Pose augmentations (transformations that preserve a person's identity in an image) are applied to the pose source.
  3. A foreground mask is predicted, and reconstruction losses are computed with the background blacked out.
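Technique 2 can be illustrated with a minimal sketch. The function below applies a random crop-and-resize (a combined shift and zoom), which changes framing without altering who is in the image; the function name and the specific transform parameters are illustrative assumptions, not the paper's exact augmentation set.

```python
import numpy as np

def augment_pose_source(img, rng, max_scale=0.1):
    """Identity-preserving augmentation for the pose source: random zoom-in
    plus translation, implemented as crop-and-resize. Hypothetical sketch;
    the paper only describes such transforms at a high level."""
    h, w = img.shape[:2]
    scale = 1.0 + rng.uniform(0.0, max_scale)   # random zoom factor >= 1
    ch, cw = int(h / scale), int(w / scale)      # cropped window size
    top = rng.integers(0, h - ch + 1)            # random translation
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    # nearest-neighbour resize back to the original resolution
    ys = (np.arange(h) * ch / h).astype(int)
    xs = (np.arange(w) * cw / w).astype(int)
    return crop[ys][:, xs]

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
aug = augment_pose_source(img, rng)
```

Because the augmented pose source no longer matches the target frame pixel-for-pixel, the pose encoder is discouraged from memorising appearance details.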
Disabling the above techniques harms driver invariance but improves pose encoding capability (which can be useful in self-reenactment scenarios):
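Technique 3 amounts to masking both the prediction and the target before the reconstruction loss, so the background cannot serve as an identity cue. A minimal sketch with an L1 loss (the function name is an assumption; the paper's full objective combines several losses):

```python
import numpy as np

def masked_l1(pred, target, mask):
    """L1 reconstruction loss with the background blacked out by a
    predicted foreground mask in [0, 1]. Illustrative sketch only,
    not the paper's exact loss formulation."""
    return np.abs(pred * mask - target * mask).mean()

# Toy example: pixels where the mask is zero contribute nothing.
pred = np.ones((4, 4))
target = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[:2] = 1.0  # top half is foreground
loss = masked_l1(pred, target, mask)
```

With half the pixels masked out, only the foreground disagreement is penalised, so the generator gets no gradient signal from background appearance.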

[Figure: ablation comparison. Columns: Identity source, Pose source, Ours, No pose augm., No segmentation, Heavier pose enc., Heavier pose enc. + no pose augm.]
More Results

[Figures: additional reenactment examples; each shows an identity source and the corresponding reenactment results.]