In the world of artificial intelligence, there are so-called text-to-image generators. The name is self-explanatory: based on a phrase typed by the user, the system returns an image corresponding to what was written.
Until now, the leader in this type of program was DALL-E, software created by the OpenAI laboratory. Now Google has decided to enter the game with Imagen, announced last Tuesday (the 24th).
Imagen works in the same way as the other generators: given a text, it generates an image. On the page dedicated to the project, it is described as having an “unprecedented degree of photorealism and a deep understanding of language”. The images released by the company give a sense of the new tool’s potential.


According to Google, Imagen produces better images than DALL-E. To reach this conclusion, the company created a benchmark called DrawBench. The idea is simple: the same texts were used to create images in several generators. The results were shown to human judges, who chose their favorites, and Imagen’s images were chosen more often than those of its competitors.
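The tally behind a comparison like this can be sketched in a few lines of Python. This is only an illustration of counting judges' preferences, not Google's actual evaluation code: the function name `preference_rates` and the list of votes are hypothetical, and the numbers are made up.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of votes each generator received."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical data: each entry names the generator a judge preferred
# for one text prompt. These are not real DrawBench results.
votes = ["Imagen", "DALL-E", "Imagen", "Imagen", "DALL-E"]

rates = preference_rates(votes)
for name, rate in sorted(rates.items()):
    print(f"{name}: preferred in {rate:.0%} of comparisons")
```

A real evaluation would also have to decide how many judges see each prompt and how ties are handled, which this sketch leaves out.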
The problems with the images
Despite Imagen’s impressive results, caution is needed. After all, the images released were handpicked to show the best of the software’s capabilities, and may not represent the average result.
Another problem with Imagen: for all its enormous artistic and creative potential, the program could be used to generate fake news and disinformation, as has happened with deepfakes.
The Google team also draws attention to problems caused by the project’s training data. Let’s take it step by step: systems like this work through machine learning. The software is exposed to an immense amount of data (in the case of text-to-image generators, texts and the images related to them). The program then studies this data to find patterns (associating the word “ball” with images of different types of balls, for example).
The goal is that, with this learning, the program can reproduce these patterns on demand. If I type “football”, it needs to understand not only that I want an image of a ball, but also that it should be a brown oval ball with visible seams.
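As a very rough illustration of learning patterns from paired texts and images, the toy Python sketch below counts how often each word in a caption appears alongside an image label. It is not how Imagen actually works: the captions, the labels, and the function names `learn_associations` and `most_likely_label` are all hypothetical, and simple counting stands in for the far more sophisticated model a real generator uses.

```python
from collections import Counter, defaultdict


def learn_associations(pairs):
    """Count how often each caption word appears alongside each image label."""
    table = defaultdict(Counter)
    for caption, label in pairs:
        for word in caption.lower().split():
            table[word][label] += 1
    return table


def most_likely_label(table, word):
    """Return the label seen most often with the word, or None if unseen."""
    counts = table.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical training pairs: (caption, label describing the image).
pairs = [
    ("a football on the grass", "brown_oval_ball"),
    ("player throws a football", "brown_oval_ball"),
    ("a tennis ball on the court", "small_yellow_ball"),
]

table = learn_associations(pairs)
print(most_likely_label(table, "football"))
```

The sketch also hints at why the choice of data matters: the program can only reproduce the associations present in the examples it was given.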
To create images as complex as the ones Google released, Imagen needs a huge amount of data. And the greater this volume, the more difficult it is to filter. Therein lies the problem: by absorbing this information from internet databases, the machines learn to carry with them the same prejudices and stereotypes that are spread on the net.
“There is a risk that Imagen has encoded harmful stereotypes and representations, which justifies our decision not to release Imagen for public use,” the project team said on its official page. After a preliminary assessment, the company identified “various social prejudices and stereotypes” embodied by Imagen, “including a tendency to generate images of people with lighter skin tones and an inclination to portray different professions in line with Western gender stereotypes”.

It is for these and other reasons that Imagen still does not have a release date for the public. Google has committed to fixing “these challenges and limitations in future work.” It is hoped that, with new updates, the program will become a safe tool for generating amazing images from simple texts.