[…] This week, OpenAI open sourced Point-E, a machine learning system that creates a 3D object given a text prompt. According to a paper published alongside the code base, Point-E can produce 3D models in one to two minutes on a single Nvidia V100 GPU.
Outside of the mesh-generating model, which stands alone, Point-E consists of two models: a text-to-image model and an image-to-3D model. The text-to-image model, similar to generative art systems like OpenAI’s own DALL-E 2 and Stable Diffusion, was trained on labeled images to understand the associations between words and visual concepts. The image-to-3D model, on the other hand, was fed a set of images paired with 3D objects so that it learned to effectively translate between the two.
When given a text prompt — for example, “a 3D printable gear, a single gear 3 inches in diameter and half inch thick” — Point-E’s text-to-image model generates a synthetic rendered object that’s fed to the image-to-3D model, which then generates a point cloud.
After training the models on a dataset of “several million” 3D objects and associated metadata, Point-E could produce colored point clouds that frequently matched text prompts, the OpenAI researchers say. It’s not perfect — Point-E’s image-to-3D model sometimes fails to understand the image from the text-to-image model, resulting in a shape that doesn’t match the text prompt.
Earlier this year, Google released DreamFusion, an expanded version of Dream Fields, a generative 3D system that the company unveiled back in 2021. Unlike Dream Fields, DreamFusion requires no prior training, meaning that it can generate 3D representations of objects without 3D data.