Google AI on verge of writing fiction, generating videos… but humans are still a factor
Could AI soon be writing fiction, generating long-form videos or composing music? Well, that’s what Google is trying to understand. With its new Wordcraft project, Google’s chatbot LaMDA is now writing fiction based on inputs from writers. LaMDA is Google’s conversational AI, which ran into controversy earlier this year after an engineer claimed it was sentient. Google revealed at its AI event in New York that it had “teamed up with professional writers who used the Wordcraft editor to create a volume of short stories.” These stories are now available online for the public to read.
“I believe we’re going to transform how people express themselves creatively. We engaged with professional authors and invited them to write experimental fiction using LaMDA as a tool. We also learned it’s not easy. LaMDA is also not doing all the work. It’s the writers who are doing the work,” Douglas Eck, Senior Research Director at Google Research, said in a press briefing prior to the event.
So does Google see a future where LaMDA could perhaps replace human writers? Not yet, according to Eck, who admitted that if one asks LaMDA to write a whole story, the results are not as good or as interesting. “What’s interesting is to use the technology as a spice, an addition to what you’re trying to do. We’ll keep moving the bar on what these tools can do. But these tools will effectively remain a spice of sorts, they’ll remain as a way to enable us to tell stories differently,” he explained.
He also acknowledged that these models pose serious risks, and that the aim is not to blur the distinction between what is real and what is AI-generated. “We also have to consider the conversation about generative models intersecting with intellectual property,” he noted.
Writing fiction is not the only creative avenue Google is exploring with its AI models; it is also looking at how AI could be used to generate video and music.
On AI-based video generation, Google revealed two new models, Imagen Video and Phenaki. While Imagen Video uses diffusion to generate high-quality individual frames, which Google says makes it more suitable for shorter videos, Phenaki uses a “sequence learning technique that generates a series of tokens over time” to create long-form videos. Google said that combining the two models ensures super-resolution at the frame level and coherence in time.
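To make that division of labour concrete, here is a minimal toy sketch in Python (NumPy only) of the cascaded idea: one model produces a temporally coherent low-resolution clip frame by frame, and a separate per-frame upsampler sharpens each frame afterwards. Everything here, including the names `toy_temporal_model` and `toy_super_resolution`, is an illustrative stand-in, not Google’s actual Imagen Video or Phenaki pipeline.

```python
import numpy as np

def toy_temporal_model(num_frames: int, size: int = 16, seed: int = 0) -> np.ndarray:
    """Stand-in for a low-resolution video generator (e.g. a token-based
    sequence model): each frame is produced conditioned on the previous
    one, which is what gives the clip its coherence in time."""
    rng = np.random.default_rng(seed)
    frames = [rng.random((size, size))]
    for _ in range(num_frames - 1):
        # Next frame = previous frame plus a small change.
        frames.append(np.clip(frames[-1] + 0.05 * rng.standard_normal((size, size)), 0, 1))
    return np.stack(frames)

def toy_super_resolution(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Stand-in for a per-frame diffusion upsampler: here just
    nearest-neighbour upscaling, where a real model would add detail."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

# Stage 1: a coherent low-res clip. Stage 2: sharpen each frame independently.
low_res_clip = toy_temporal_model(num_frames=8)
high_res_clip = np.stack([toy_super_resolution(f) for f in low_res_clip])
print(low_res_clip.shape, "->", high_res_clip.shape)  # (8, 16, 16) -> (8, 64, 64)
```

The design point the sketch captures is that coherence over time and per-frame visual quality can be handled by separate models, which is why combining the two approaches is attractive.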
It also showed videos created by the two models. When asked about the challenges of using AI to create video, Eck said that while they are seeing progress, it is still a tough task. “The difficulty is ensuring coherence between each frame. If you predict one frame from the previous frame, the model begins to lose coherence,” he explained. This remains a fundamental challenge in video generation, one that Google says it has yet to fully solve.
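Eck’s point can be illustrated with a small, hypothetical simulation: if each frame is predicted only from the previous frame with even a tiny, unbiased error, those errors compound like a random walk, and later frames drift further and further from the scene the clip started with. The numbers below come from this toy model only, not from any Google system.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.zeros((16, 16))  # the "true" scene the clip should keep depicting
frame = target.copy()

drift = []
for t in range(60):
    # Imperfect one-step predictor: each frame is the previous frame plus a
    # small, zero-mean error. Because no step looks back at the true scene,
    # the errors accumulate instead of cancelling out.
    frame = frame + 0.05 * rng.standard_normal(frame.shape)
    drift.append(np.abs(frame - target).mean())

print(f"mean drift after  1 frame : {drift[0]:.3f}")
print(f"mean drift after 60 frames: {drift[-1]:.3f}")
```

Even with a per-step error of just 0.05, the accumulated drift after 60 frames is several times larger than after one, which is the loss of coherence Eck describes.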
Finally, AudioLM is a new framework for generating realistic speech and music from only a short audio sample. The music is limited to piano for now. Google says this “is a pure audio model trained without any text or symbolic representation of music.”