Generative AI models have a propensity for learning complex data distributions, which is why they’re great at producing human-like speech and convincing images of burgers and faces. But training these models requires lots of labeled data, and depending on the task at hand, the necessary corpora are sometimes in short supply.
The solution might lie in an approach proposed by researchers at Google and ETH Zurich. In a paper published on the preprint server Arxiv.org (“High-Fidelity Image Generation With Fewer Labels“), they describe a “semantic extractor” that can pull out features from training data, along with methods of inferring labels for an entire training set from a small subset of labeled images. These self- and semi-supervised techniques together, they say, can outperform state-of-the-art methods on popular benchmarks like ImageNet.
“In a nutshell, instead of providing hand-annotated ground truth labels for real images to the discriminator, we … provide inferred ones,” the paper’s authors explained.
In one of several unsupervised methods the researchers posit, they first extract a feature representation — a set of techniques for automatically discovering the representations needed for raw data classification — on a target training dataset using the aforementioned feature extractor. They then perform cluster analysis — i.e., grouping the representations in such a way that those in the same group share more in common than those in other groups. And lastly, they train a GAN — a two-part neural network consisting of generators that produce samples and discriminators that attempt to distinguish between the generated samples and real-world samples — by inferring labels.