minDALL-E creates images based on text input
minDALL-E on Conceptual Captions

minDALL-E, named after minGPT, is a 1.3B-parameter text-to-image generation model trained on 14 million image-text pairs for non-commercial purposes.

Environment Setup
Basic setup
Other packages

Model Checkpoint
Model structure (two-stage autoregressive model)
Stage1: Unlike the original DALL-E [1], we replace Discrete VAE with VQGAN [2] to generate high-quality samples effectively. We […]
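
To make the stage-1 idea more concrete: a VQGAN-style tokenizer turns continuous image features into discrete tokens by looking up the nearest entry in a learned codebook, and the second-stage autoregressive model then works on those token indices. The sketch below illustrates only that nearest-neighbour lookup in plain Python. It is not minDALL-E's code; the function name `quantize`, the 4-entry codebook, and the 2-dimensional features are hypothetical toy values chosen for readability.

```python
from typing import List, Sequence, Tuple


def quantize(
    vectors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each feature vector to its nearest codebook entry (squared L2 distance).

    Returns the token indices and the corresponding quantized vectors.
    """
    tokens: List[int] = []
    quantized: List[List[float]] = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        # Index of the closest entry; this index is the discrete "image token".
        idx = min(range(len(codebook)), key=dists.__getitem__)
        tokens.append(idx)
        quantized.append(list(codebook[idx]))
    return tokens, quantized


# Toy example: a hypothetical 4-entry codebook over 2-dimensional features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]

tokens, quantized = quantize(features, codebook)
print(tokens)  # -> [0, 3, 2]
```

In a real model the codebook and the encoder producing the features are learned jointly, and the features come from a convolutional encoder over image patches rather than hand-written lists.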