Google’s Imagen AI is creating art based on text, but there are limitations
Imagine “a robot couple fine dining with the Eiffel Tower in the background.” For us humans, it is pretty simple to picture this in our heads. Of course, the more creative ones among us can easily bring these words to life in their artwork. And now Google’s AI model, called Imagen, is capable of doing something similar. In a new announcement, Google has showcased how Imagen, a text-to-image diffusion model, is able to create images based on written text.
The most remarkable part, though, is the accuracy and photorealism of the pictures, all of which were created by the model. Google has showcased a number of artworks created by Imagen, each accurately depicting the sentence in question. For instance, there’s an Android mascot made out of bamboo, a really angry bird, and a chrome-plated duck with a golden beak arguing with an angry turtle in a forest.
Check out some of the artwork below:
A robot couple fine dining with Eiffel Tower in the background.
A really angry bird.
An Android mascot made out of bamboo.
A chrome-plated duck with a golden beak arguing with an angry turtle in a forest.
Google says Imagen is based on its “large transformer language models,” which help the AI understand the text. Imagen has also led Google researchers to another key discovery: generic large language models “are surprisingly effective at encoding text for image synthesis.”
However, the company notes that there are limitations to this, including “several ethical challenges facing text-to-image research broadly.” It admits this could impact “society in complex ways,” and there’s a risk of misuse of such models. This is why it is not releasing the code or a public demo right now.
Google’s blog notes “the data requirements of text-to-image models have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets”. The problem with such datasets is that they often “reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalised identity groups,” according to the blog.
The post adds that “a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language”. But the dataset Google used, LAION-400M, is known to “contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes,” notes the company.
Google admits that “there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
Finally, Imagen is still very limited when it comes to generating art that depicts people, and it tends to produce stereotypical results. The model exhibits “social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones,” states Google. It also leans towards Western gender stereotypes when asked to portray different professions.