AI systems can now create images of humans that are so lifelike they look like photographs, except the people in them don’t really exist.
See for yourself. Each picture below was produced by a generative adversarial network (GAN), a system made up of two networks: a generator and a discriminator. Developers have used GANs to create everything from artwork to dental crowns.
Some of the images created from Nvidia’s style transfer GAN. Image credit: Karras et al. and Nvidia
The performance of a GAN is often tied to how realistic its results are. What started out four years ago as tiny, blurry, greyscale images of human faces has since morphed into full colour portraits.
Early results from when the idea of GANs was first introduced. Image credit: Goodfellow et al.
The new GAN built by Nvidia researchers rests on the idea of “style transfer”. First, the generator network learns a constant input taken from a photograph of a real person. This face is used as a reference and encoded as a vector that is mapped to a latent space that describes all the features in the image.
These features correspond to the essential characteristics that make up a face: eyes, nose, mouth, hair, pose, face shape, etc. After the generator learns these features, it can begin adjusting them to create a new face.
How the appearance of these features changes is determined by a second photo. In other words, the original photo copies the style of another photo, so the end result is a sort of mishmash of both images. Finally, an element of noise is added to generate random details, such as the exact placement of hairs, stubble, freckles, or skin pores, to make the images more realistic.
“Our generator thinks of an image as a collection of ‘styles,’ where each style controls the effects at a particular scale,” the researchers explained. The different features can be broken down into various styles: coarse styles include the pose, hair and face shape; middle styles are made up of facial features; and fine styles determine the overall colour.
How the different style types are learned and transferred by crossing a photo with a source photo. Image credit: Karras et al. and Nvidia.
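To make the idea of crossing styles at different scales more concrete, here is a minimal sketch in Python. It is not Nvidia's code: the layer count, the vector size, the split into coarse, middle and fine bands, and the helper names `mix_styles` and `add_noise` are all assumptions made for illustration. Each face is represented as one style vector per generator layer, and mixing simply swaps the vectors in one band for those of a second face.

```python
import numpy as np

# Illustrative assumptions, not values confirmed by the article.
NUM_LAYERS = 18   # assumed number of style inputs to the generator
STYLE_DIM = 512   # assumed length of each style vector
BANDS = {
    "coarse": range(0, 4),    # pose, hair, face shape
    "middle": range(4, 8),    # facial features
    "fine": range(8, 18),     # overall colour
}

def mix_styles(styles_a, styles_b, band):
    """Return a copy of styles_a whose rows in `band` come from styles_b."""
    mixed = styles_a.copy()
    idx = list(BANDS[band])
    mixed[idx] = styles_b[idx]
    return mixed

def add_noise(feature_map, strength, rng):
    """Add per-pixel Gaussian noise to mimic random details such as freckles."""
    return feature_map + strength * rng.standard_normal(feature_map.shape)

rng = np.random.default_rng(0)
styles_a = rng.standard_normal((NUM_LAYERS, STYLE_DIM))  # stand-in for face A
styles_b = rng.standard_normal((NUM_LAYERS, STYLE_DIM))  # stand-in for face B

# Take pose, hair and face shape from B, keep everything else from A.
mixed = mix_styles(styles_a, styles_b, "coarse")

# Perturb a dummy 8x8 feature map with a small amount of noise.
noisy = add_noise(np.zeros((8, 8)), strength=0.1, rng=rng)
```

In a real generator the mixed style vectors would modulate the layers that draw the image; here they are only random stand-ins, so the sketch shows the bookkeeping of the swap rather than any actual image synthesis.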
The different style types can, therefore, be crossed continuously with other photos to generate a range of completely new images of people of different ethnicities, genders and ages. You can watch a video demonstration of this happening below.
The discriminator network inspects the images coming from the generator and tries to work out if they’re real or fake. The generator improves over time so that its outputs consistently trick the discriminator.
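The tug of war between the two networks can be written down as a pair of loss functions. The sketch below uses the standard GAN objective from Goodfellow et al. with the common non-saturating generator loss; it is an assumption that this is a fair illustration, and it is not necessarily the exact loss Nvidia trained with. The function names and the example scores are hypothetical. The discriminator outputs a probability that an image is real, and each network is penalised when the other succeeds.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Low when real images score near 1 and generated images score near 0."""
    return float(-np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Low when the discriminator scores generated images near 1."""
    return float(-np.mean(np.log(d_fake + eps)))

# Hypothetical discriminator scores (probability that an image is real).
d_real = np.array([0.9, 0.8])          # scores on real photos
d_fake_early = np.array([0.1, 0.2])    # early fakes are easy to spot
d_fake_later = np.array([0.6, 0.7])    # later fakes fool the discriminator

early_g = generator_loss(d_fake_early)
later_g = generator_loss(d_fake_later)
early_d = discriminator_loss(d_real, d_fake_early)
later_d = discriminator_loss(d_real, d_fake_later)
```

With these made-up scores, the generator's loss falls as its fakes become more convincing, while the discriminator's loss rises, which is the dynamic the paragraph above describes.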