The image you see at the opening of this report was created by an artificial intelligence. There was no need to write code or to know anything about machine learning (a method for training algorithms) to create it. A simple text description was enough: “a robot holding a camera”.

This “self-portrait” was generated by DALL-E 2, a machine learning model created by OpenAI, an artificial intelligence research firm funded by Elon Musk. The company is also behind some of the most advanced AI models in the world, such as GPT-3, which generates natural-sounding text and is used in chatbots.

Introduced in April this year, DALL-E 2 (pronounced “Dalí”, after the Spanish painter Salvador Dalí) is the second version of this artificial intelligence, which can create images from natural-language descriptions. The first was released in 2021, but it did not go viral the way the new one has.

The reason is the level of realism of the new version. Just weeks after the reveal of DALL-E 2, social media was taken over by stunning images created with simple descriptions. It didn’t take long for more companies to want to jump on the bandwagon. In May, Google announced its competitor: Imagen.

But AI models capable of creating images are nothing new. Independent developers have been exploring the idea for years. Apps like Wombo’s Dream do it for free.

In recent days, DALL-E mini has become a craze on social media, generating strange, funny and random images. It is an open version, hosted on the collaborative platform Hugging Face, and is far less powerful than DALL-E 2, but also far less constrained.

Images created by DALL-E 2

1 / 7 Request: ‘painting of a caramel mutt chasing a motorcyclist with the Corcovado of Rio de Janeiro in the background, in the style of Vincent Van Gogh’ Image: Reproduction/DALL-E 2
2 / 7 Request: ‘Cat-headed man selling fish-headed popsicle to children on a beach on Mars’ Image: Reproduction/DALL-E 2
3 / 7 Request: ‘a robot holding a camera’ Image: Reproduction/DALL-E 2
4 / 7 Request: ‘thousands of people dressed in blue and white in front of the National Congress in Brasilia’ Image: Reproduction/DALL-E 2
5 / 7 More versions of ‘Cat-headed man selling popsicle to fish-headed kids on a beach on Mars’ Image: Reproduction/DALL-E 2
6 / 7 Variations created by DALL-E 2 from the photo of reporter Lucas Carvalho, from Tilt Image: Reproduction/DALL-E 2
7 / 7 One more variation, with the change request: ‘with clown clothes’ Image: Reproduction/DALL-E 2

How does it work?

“Usually, we are used to using AI to identify and understand things. Here we have what we call generative AI: it creates new things, and does not only understand what already exists”, explains Yuri Malheiros, professor at the Federal University of Paraíba (UFPB) and coordinator of ARIA, the university's Laboratory of Artificial Intelligence Applications. “That’s very impressive.”

DALL-E 2 and Imagen are based on the same principle as any machine learning model: the algorithm processes a huge volume of data and is trained to identify patterns in it. In this case, the data are images paired with text descriptions.

The second step is content generation. Through a process called “diffusion”, the model draws on what it has learned: asked for a horse, for example, it is able to piece together all the horse images it has ever seen, mix them up, and then highlight the common parts to create a new high-resolution image.
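To make the idea of “diffusion” a little more concrete, here is a minimal sketch in Python. In rough terms, diffusion models learn to remove noise: generation starts from random static and cleans it up step by step. Everything in the sketch is an illustrative assumption and not how DALL-E 2 or Imagen is actually implemented: the “image” is just a short list of numbers, the function names are hypothetical, the update is a simple deterministic step in the style of DDIM samplers, and `toy_noise_predictor` cheats by already knowing the target picture, where a real system would use a neural network guided by the text description.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each noising step
    (linear noise schedule; index 0 means no noise has been added)."""
    alpha_bars = [1.0]
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars


def add_noise(image, alpha_bar, rng):
    """Forward process: blend the image with Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * pixel + noise_scale * rng.gauss(0.0, 1.0) for pixel in image]


def toy_noise_predictor(noisy, alpha_bar, target):
    """ASSUMPTION: stand-in for the trained network. It 'predicts' the noise
    by cheating: it already knows the target image. A real model has to
    estimate this from training data and the text prompt."""
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - keep * t) / noise_scale for x, t in zip(noisy, target)]


def generate(target, num_steps=200, seed=0):
    """Reverse process: start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # random static
    for step in range(num_steps, 0, -1):
        ab_now, ab_prev = alpha_bars[step], alpha_bars[step - 1]
        noise = toy_noise_predictor(image, ab_now, target)
        # Current best guess of the clean image, given the predicted noise.
        clean_guess = [
            (x - math.sqrt(1.0 - ab_now) * n) / math.sqrt(ab_now)
            for x, n in zip(image, noise)
        ]
        # Step to the slightly less noisy level.
        image = [
            math.sqrt(ab_prev) * c + math.sqrt(1.0 - ab_prev) * n
            for c, n in zip(clean_guess, noise)
        ]
    return image


if __name__ == "__main__":
    tiny_picture = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical 5-pixel "image"
    rng = random.Random(1)
    bars = make_alpha_bars(200)
    print("heavily noised:", add_noise(tiny_picture, bars[-1], rng))
    print("generated:     ", generate(tiny_picture))
```

Because the stand-in predictor knows the answer, the loop recovers the tiny picture from static. The point of the sketch is only the shape of the procedure: noise in, many small denoising steps, image out.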

In addition to creating images, the AI can also generate variations of an existing image. Tilt had brief access to the full DALL-E 2 (which is still restricted to users invited by OpenAI) and asked it to generate variations of a photo of the reporter who wrote this story. The result:

Variations of a photo of reporter Lucas Carvalho created by DALL-E 2; the original is the first on the left, in the top row. Image: Reproduction/DALL-E 2

Open versions

While OpenAI and Google keep access to their tools restricted to researchers, DALL-E mini is a hit with the lay public.

This kind of open source “copy” of DALL-E was created by Boris Dayma, a French developer married to a Brazilian woman and a former student at the Pontifical Catholic University of Rio de Janeiro (PUC-Rio).

“The first DALL-E is more or less similar to ours [DALL-E mini]”, says Boris, who speaks Portuguese, in an interview with Tilt.

Anyone can generate images with English descriptions on the DALL-E mini website. For more artistic images, the results are comparable to the OpenAI models, except for the lower resolution. In more photorealistic images, however, the difference between a company backed by the richest man in the world and an app made by volunteers becomes clearer.

“DALL-E 2 has something very different, which is diffusion. With this architecture it is slower, but it achieves a much more impressive result”, acknowledges Dayma.

Even knowing that there would be limitations in the capacity of the “mini”, the developer felt it was important to bring the tool to people.

“The technological challenge [of recreating it] was very interesting, but I also wanted to give the public access to a version that anyone could use,” he says. “When you have a demo, an app, you can play around and really see what it’s like.”

For Clem Delangue, CEO of Hugging Face, which hosts this version, open source alternatives like this allow technology to evolve in a fairer way, with free access for students and researchers, and prevent innovation from being monopolized by Big Tech.

“If you look at any technology and any science, there have always been these two approaches, open and closed”, says Delangue, in an interview with Tilt. “These are complementary approaches. But the beauty of open source is the same as the beauty of science: doing things in an open, transparent, collaborative way. It’s being able to distribute power so any organization can stay current and ensure ethical safeguards so technology can evolve.”