Back in February, Google paused its AI-powered chatbot Gemini’s ability to generate images of people after users complained of historical inaccuracies. Told to depict “a Roman legion,” for example, Gemini would show an anachronistic group of racially diverse soldiers while rendering “Zulu warriors” as stereotypically Black.
Google CEO Sundar Pichai apologized, and Demis Hassabis, the co-founder of Google’s AI research division DeepMind, said that a fix should arrive “in very short order” — within the next couple of weeks. It ended up taking much, much longer than that (despite some Googlers pulling 120-hour workweeks!). But in the coming days, Gemini will once again be able to create pics showing people.
Well… sort of.
Only certain users — specifically those signed up for one of Google’s paid Gemini plans: Gemini Advanced, Business or Enterprise — will regain access to Gemini’s people-generation feature as part of an early access, English-language-only test.
Google wouldn’t say when the test will expand to the free Gemini tier and other languages.
“Gemini Advanced gives our users priority access to our latest features,” a Google spokesperson told TechCrunch. “This helps us gather valuable feedback while delivering a highly anticipated feature first to our premium subscribers.”
So what fixes did Google implement for people generation? According to the company, Imagen 3, the latest image-generating model built into Gemini, contains mitigations to make the people images Gemini produces more “fair.” For example, Imagen 3 was trained on AI-generated captions designed to “improve the variety and diversity of concepts associated with images in [its] training data,” according to a technical paper shared with TechCrunch. And the model’s training data was filtered for “safety,” plus “review[ed] … with consideration to fairness issues,” claims Google.
We asked for more details about Imagen 3’s training data, but the spokesperson would only say that the model was trained on “a large dataset comprising images, text and associated annotations.”
“We’ve significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, collaborating with independent experts to ensure ongoing improvement,” the spokesperson continued. “Our focus has been on rigorously testing people generation before turning it back on.”
Imagen 3 and Gems
In a spot of better news, all Gemini users will get Imagen 3 within the week — minus people generation for those not subscribed to the premium Gemini tiers.
Google says that Imagen 3 can more accurately understand the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and errors, Google claims, and is the best Imagen model yet for rendering text.
To allay concerns about the potential for deepfakes, Imagen 3 will use SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to various forms of AI-originated media. Google previously announced that Imagen 3 would use SynthID, so this doesn’t come as much of a surprise. But I’ll note that the contrast between how Google is treating image generation in Gemini versus in other products, like its Pixel Studio app, is a bit curious.
Alongside Imagen 3, Google’s rolling out Gems for Gemini — albeit only for Gemini Advanced, Business and Enterprise users. Like OpenAI’s GPTs, Gems are custom-tailored versions of Gemini that can act as “experts” on particular topics (e.g. vegetarian cooking).
Here’s how Google describes them in a blog post: “With Gems, you can create a team of experts to help you think through a challenging project, brainstorm ideas for an upcoming event, or write the perfect caption for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive, or difficult tasks.”
To create a Gem, users write a set of instructions, give it a name, and they’re off to the races.
Gems are available on desktop and mobile in 150 countries and “most languages,” Google says (but not supported in Gemini Live just yet). There are several examples at launch, including a “learning coach,” a “career guide,” a “brainstormer” and a “coding partner.”
We asked Google whether it has any plans to let users publish and use other users’ Gems, similar to GPTs on OpenAI’s GPT Store. The answer was “no,” basically.
“Right now, we’re focused on learning how people will use Gems for creativity and productivity,” the spokesperson said. “Nothing further to share at this time.”