The AI image generator DALL-E’s response to the prompt “Mondrian painting of a tomato by the sea” (courtesy DALL-E)

People get up in arms whenever the hand of the artist is detached from the final artwork. “Are photographs real art?” they muttered in the 19th Century. “God I hate this Pollock guy,” cried haters witnessing a splattered canvas that the artist seemingly never touched. So it’s no wonder that AI image-generators have got art historians in a twist, as more artists make use of these tools to inform their practice. I love diving into what gets people’s blood boiling in the art world, and this summer AI crept its way onto the leaderboard of irritants. But why? And what might it teach us about art history and visual consumption?

DALL-E 2, a text-to-image AI generation program, went live to audiences this autumn. The initial version of the model — which takes its tongue-in-cheek name from Pixar’s lovable 2008 robot WALL-E and the Surrealist artist Salvador Dalí — was released in January 2021 by the OpenAI research lab. An open-source recreation of the tech by an independent developer, DALL-E Mini, appeared early on the Hugging Face platform, taking Twitter by storm as a meme sensation and springboarding the program to international interest beyond AI experts. Over 1.5 million users are creating more than 2 million images per day with DALL-E.

DALL-E 2 pairs CLIP (Contrastive Language-Image Pre-Training, a computer vision system announced by OpenAI last year) with a diffusion-based image decoder to generate 1024×1024 pixel images from typed text prompts. The system was trained using around 650 million pairs of images and captions taken from the internet. After collecting these image-text pairs, researchers trained the CLIP model to score how well a caption describes an image, producing a shared mathematical representation of pictures and text. DALL-E 2 then reverses this process, generating images that are well-described by the text input according to CLIP’s representation. Users can also use DALL-E 2 to “outpaint” images — extending pre-existing images beyond their previous borders — and to edit a pre-existing image using text commands.
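For the technically curious, here is a minimal sketch of CLIP’s matching step, using OpenAI’s open-source clip package. The image file and the candidate captions are hypothetical stand-ins, and this illustrates only the caption-scoring idea, not DALL-E 2’s actual generation pipeline.

```python
# Sketch: scoring how well captions describe an image with CLIP.
# Assumes: pip install torch pillow, plus OpenAI's CLIP repo
# (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical image and candidate captions.
image = preprocess(Image.open("tomato_by_the_sea.png")).unsqueeze(0).to(device)
captions = clip.tokenize([
    "an Impressionist painting of a tomato by the sea",
    "a photograph of a dog",
]).to(device)

with torch.no_grad():
    # Both encoders map into one shared embedding space; after normalizing,
    # the dot product is a cosine similarity between caption and image.
    image_features = model.encode_image(image)
    text_features = model.encode_text(captions)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(0)

print(scores)  # the Impressionist caption should score higher
```

DALL-E 2 effectively inverts this matching: rather than scoring an existing image, it generates pixels that CLIP would pair with your prompt.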

DALL-E image based on the prompt: “An Impressionist painting of a tomato by the sea.”

When inputting your DALL-E 2 request, you’re given the instruction to “start with a detailed description” and the example of “an Impressionist oil painting of sunflowers in a purple vase.” But what does DALL-E 2 actually “understand” by the style of the Impressionists? Or any artistic style or movement, for that matter? Using the same prompt of “a tomato by the sea,” I put DALL-E 2’s art historical prowess to the test.

For the Impressionists (“An Impressionist painting of a tomato by the sea”), DALL-E 2 seems to identify that the style is built around loose brushstrokes and color contrasts indicating the play of light.

It did a surprisingly good job of pinpointing what was meant by “18th Century” art, too, adding textural elements at the sides and producing a broodingly regal image. What is also interesting is that DALL-E 2 represented what 18th Century artworks look like today, their color palettes dulled by time.

DALL-E’s interpretation of “18th-Century painting of a tomato by the sea.”

My personal favorite was DALL-E’s interpretation of Robert Mapplethorpe’s style. The monochrome image gave the tomato a distinctly callipygian look, a sexy nod to Mapplethorpe’s figures. The “Henry Moore sculpture” prompt also made me smile: it would seem second nature to DALL-E 2 that a sculpture requires a plinth.

There were some styles that DALL-E was less adept at recreating, like De Stijl or the Surrealists. It made a good go of interpreting “Mondrian” in the prompt, adding straight lines which cut through the image. Close enough. Warhol’s tomato, too, captured some of the flatness associated with his work, and the Cubist attempt was — in places — angular.

Playing with DALL-E 2 brings up two questions: to what extent can this technology truly “understand” art, and is it useful to see art history quantified in the “mind” of a machine? 

Rinat Akhmetov, product lead at the artificial intelligence consultancy Provectus, told Hyperallergic that while DALL-E 2 does not have the emotional capacity of a human scholar, “the model has seen more images, paintings, styles, etc., than most experts [in] humanity, and its opinion will be a subjective but comprehensive view.”

It is true that DALL-E 2 optimized billions of parameters in its quest for knowledge, but it is important to remember that the images it consumed weren’t necessarily “neutral.” Much like artists project their gaze onto their canvases, DALL-E 2 has absorbed a data set not without its own biases. José Lizarraga, senior innovation and creative advisor for the Algorithmic Justice League, spoke to me about this issue of gaze: “What is striking about AI-generated art is that it is dependent on systems and a corpus of visual artifacts that still center the white heterosexual male gaze — by virtue of who is represented in the tech design and development world, and who does content moderation … Similarly, it has been shown to generate offensive and racist images because of the unfiltered data that the AI uses.”

DALL-E’s Cubist “tomato by the sea.”

Professor Sunil Manghani of the University of Southampton, who has recently begun a special interest group for Arts & AI, agrees. The biases in the data set affect not only which gaze is platformed by DALL-E 2’s creations but also its accuracy in recreating certain genres, he said.

“There will be historical biases,” Manghani told Hyperallergic in an interview. “Very early works will likely be less in abundance, while very contemporary works may be skewed too (partly for copyright reasons). That leaves us in the ‘middle’ with perhaps high preponderance of styles such as Impressionism, Surrealism, [and] Expressionism.”

So where does this technology fall in the ongoing writing of the history of art? Manghani gives an apt metaphor for DALL-E 2’s image generation: “If I toss a coin three times and get all heads I might be led to think there is a higher probability to get heads. But if I toss the coin 1 million times it will be quite clear it is 50/50. As such, image diffusion models, while extremely clever in terms of the method (they convert imagery into noise then re-tune back up to a credible or probable picture) are only really working from previously existing imagery. In this sense, why would we add them to the history of art? DALL-E 2 is re-making all that has been before.”
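Manghani’s “noise” description matches the standard diffusion recipe. As a rough illustration, here is a toy PyTorch sketch of the forward (noising) half of a generic DDPM-style diffusion model; the schedule values are conventional defaults from the research literature, and none of this is DALL-E 2’s actual code.

```python
# Toy sketch of a diffusion model's forward ("noising") process.
import torch

# Hypothetical setup: 1,000 diffusion steps with a standard linear schedule.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1 - betas, dim=0)  # cumulative signal retention

def noise_image(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t from q(x_t | x_0): blend the image with Gaussian noise."""
    eps = torch.randn_like(x0)  # fresh noise
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

x0 = torch.rand(3, 64, 64)      # stand-in "image"
x_mid = noise_image(x0, 500)    # recognizably degraded
x_end = noise_image(x0, 999)    # almost pure noise
```

Generation runs this in reverse: a trained network repeatedly strips the noise away, “re-tuning” static back up into a probable picture guided by the text prompt.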

DALL-E’s Andy Warhol version of a “tomato by the sea.”

However, these images — while based on a fixed data set from the past — are new. Manghani sees the crux of the issue as how we define “art history”: “If you treat [art history] as a discipline to be policed then you are inclined to reject DALL-E 2 imagery. If you treat art history as simply the expression of a worldly history of the creation of imagery (going back to the beginnings of civilization) then you should logically now accept DALL-E 2.”

Very shortly after DALL-E Mini’s release on Hugging Face, DALL-E memes swept the internet. Within two months of the announcement of DALL-E 2, the technology was used to produce the very first AI-generated cover of Cosmopolitan magazine. Big brands like Heinz and Nestlé have also harnessed the technology for advertising campaigns — Nestlé opting for a particularly art-historical angle by outpainting Johannes Vermeer’s 1657 painting The Milkmaid. The technology is beginning to deeply ingrain itself in visual culture.

AI artist Mario Klingemann points out that DALL-E’s aptitude for art history isn’t necessarily its ultimate purpose. “I think that replicating existing art styles is not the true calling for these models anyway,” he said in an interview. “Their real potential lies in learning what is relevant and important in all forms of image-making and perception and then hopefully allow[ing] us to discover new modes of expression.”

DALL-E could become instrumental in arts education, and will without doubt be incorporated into artists’ practices. Rishabh Misra, a senior machine learning engineer at Twitter, mentions the potential of AI as an independent art form. “AI has drastically changed the nature of creative processes and is disrupting the art industry,” he told Hyperallergic. “AI can be treated as a creative entity in its own right, capable of replicating aspects of creative artistic behavior and augmenting human creativity.” 

Aditya Ramesh, a lead researcher on the DALL-E project, commented on the significance of DALL-E for art lovers, telling Hyperallergic, “AI-generated art creates the opportunity for personalized art generation. DALL·E has shown us that a lot of people find value in the process of creating art and engaging with a community of creators, even if they haven’t had formal artistic training.”

Whether the images generated through DALL-E have intrinsic value as stand-alone pieces of art opens up an entirely different debate. But OpenAI has already anticipated DALL-E’s force as an image generator, implementing content regulations and preemptive anti-deepfake features, meaning that no recognizable faces can be generated. It would seem that finding a definitive answer to DALL-E 2’s place in art history requires a limited view of what art history is, and a reductive understanding of the technology’s potential.

Verity Babbs is an art writer and presenter based in Southampton, UK. She graduated with a degree in History of Art from the University of Oxford and now hosts the art-themed comedy night Art Laughs.