Have you ever had a beautiful vision but lacked the drawing skills to get it down on paper? A new artificial intelligence (AI) system in pre-release from OpenAI has unlocked the artist in the machine. DALL-E, as the technology is called, can convert simple text prompts into digital illustrations in an array of styles, from the painterly to the photo-realistic — such as a sea otter inspired by Johannes Vermeer’s “Girl with a Pearl Earring” (1665), or teddy bears shopping for groceries in the style of Japanese Ukiyo-e prints.
OpenAI first introduced DALL-E, named with nods to the endearing robot protagonist of the 2008 Pixar movie WALL-E and Surrealist painter Salvador Dalí, in January of 2021 and has been working to refine the system ever since. DALL-E 2, the most current version, renders images in higher resolution based on greater comprehension of the prompts. It also has the added feature of “in-painting,” which enables a user to swap one aspect of a photograph for another: for example, seamlessly replacing a dog sitting on a chair with a cat, as demonstrated in an introductory video released by the company this month. Further, DALL-E can analyze an existing image and present an array of variations with different angles, styles, and colorways.
DALL-E 2 uses a two-stage model. First, a “prior” generates a CLIP image embedding, an internal numerical representation of an image, that matches the text prompt; CLIP is a neural network that OpenAI trained to identify and correlate text with images. Then a “decoder” generates an image, conditioned on that embedding, to meet the described conditions.
“We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity,” said an OpenAI research paper, published on the DALL-E 2 website. “Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation.”
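For the technically curious, the two-stage flow can be sketched in a few lines of Python. This is a toy illustration only, not OpenAI's code: the function names (`encode_text`, `prior`, `decoder`, `generate`, `variations`), the eight-number embedding, and the 4×4 grayscale "image" are all hypothetical stand-ins, and simple random noise takes the place of the trained neural networks. What it does show is the shape of the idea: text becomes an image representation, the representation becomes pixels, and re-running the second stage on the same representation yields variations.

```python
import hashlib
import random

EMBED_DIM = 8  # toy size (assumption); real embeddings are far larger


def encode_text(caption):
    """Hypothetical text encoder: hash the caption into a fixed-length vector."""
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBED_DIM]]


def prior(text_embedding, seed):
    """Stage 1 (toy): turn a text embedding into an image embedding.

    Small random noise stands in for a learned model.
    """
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, 0.05) for value in text_embedding]


def decoder(image_embedding, seed, size=4):
    """Stage 2 (toy): render a size x size grayscale grid from an image embedding.

    The embedding sets the overall brightness; the seed sets the details.
    """
    rng = random.Random(seed)
    base = sum(image_embedding) / len(image_embedding)
    return [
        [min(1.0, max(0.0, base + rng.gauss(0.0, 0.1))) for _ in range(size)]
        for _ in range(size)
    ]


def generate(caption, seed=0):
    """Full pipeline: caption -> text embedding -> image embedding -> image."""
    image_embedding = prior(encode_text(caption), seed)
    return decoder(image_embedding, seed)


def variations(image_embedding, count):
    """Decode one image embedding several times with different seeds."""
    return [decoder(image_embedding, seed=s) for s in range(count)]


if __name__ == "__main__":
    image = generate("a sea otter with a pearl earring", seed=1)
    print(len(image), "x", len(image[0]), "toy image generated")
```

In this sketch, `variations` keeps the image embedding fixed and changes only the decoder's seed, which loosely mirrors the paper's description of keeping an image's semantics and style while varying non-essential details.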
In non-clinical terms, if you want to see “a bowl of soup that looks like a monster, knitted out of wool,” well, now you can. “A palm with a tree growing on top of it” — why not? These and more are available on DALL-E’s Instagram, where you can decide for yourself if this is the next great art trend (though unfortunately you cannot purchase that Vermeer-esque sea otter as a poster) and DM them with ideas for image generation.
Like all of us, DALL-E is still learning, and has certain limitations. Some of these are flaws in the data pool, such as mislabeled images that amount to teaching the AI the wrong word for something, which might then affect its output. Others are restrictions imposed on the software’s capabilities, including a content policy that bans hateful symbolism, harassment, violence, self-harm, X-rated content, shocking or illegal activity, deception, political propaganda or images of voting mechanisms, spam, and content related to public health.
The software, for instance, did not completely understand the art historical implications of Hyperallergic’s request for “‘The Scream’ on a roller coaster,” or “a photo of a Jeff Koons balloon dog getting popped with a pin in outer space,” but the images are pretty satisfying nonetheless.
Currently, OpenAI is guarding its technology closely, generating images upon request but not opening the system to use outside the company. It also will not generate images of real people, which means the photos of my tasteful beach wedding to Channing Tatum are on hold AGAIN.
This points to a pitfall of AI-generated imagery, and one that the company is seemingly preparing to address: The creation of realistic-looking false images presents a potential new buttress for fake news, a phenomenon that has already contributed to geo-political destabilization and a global public health crisis in recent decades. It’s all fun and games when you’re generating “robot playing chess” in the style of Matisse, but dropping machine-generated imagery on a public that seems less capable than ever of distinguishing fact from fiction feels like a dangerous trend.
Additionally, DALL-E’s neural network can yield sexist and racist images, a recurring issue with AI technology. For instance, a reporter at Vice found that prompts including search terms like “CEO” exclusively generated images of White men in business attire. The company acknowledges that DALL-E “inherits various biases from its training data, and its outputs sometimes reinforce societal stereotypes.”
For its part, OpenAI is still controlling the technology and requiring that any use of its images include disclosure of their status as AI-generated, as well as a little color-bar logo in the lower right-hand corner of all images. Such measures seem difficult to enforce, however, if the product is eventually opened for use at the scale of the entire internet.
For now, we are in that hopeful, playful part of tech development, where we wonder at the marvelous nature of our own invention. As the saying goes, the road to singularity is paved with “Otter with a Pearl Earring.”