Happy Thursday!
You may have heard that Google launched its “ChatGPT Killer” Gemini – and that part of the launch was faked.
What actually happened? And should you use Google's new tools?
That and more in this week’s Future Work, the 65th edition, and the last one of 2023 – I’ll be back in January.
Rather listen? The spoken version is available on YouTube, Spotify, and Apple Podcasts.
The need to know
- Google Launched Its ChatGPT Killer “Gemini”: Google unveiled Gemini, a new AI model designed to compete with OpenAI's ChatGPT.
- Key Features: Gemini's multimodal capabilities (it processes text, video, voice, and code natively) and enhanced reasoning abilities position it as more intuitively intelligent than existing models.
- Controversy and Comparison with ChatGPT: Critics note that Google's comparisons with GPT-4 may be unfair. Gemini has not yet been released, and the benchmarks do not provide a direct comparison. Furthermore, some demos were revealed to be less interactive than initially presented.
Google’s ChatGPT Killer Gemini is Here, But Fake? Should You Use It?
After being pressured to deliver a ChatGPT alternative, Google presented Gemini, its OpenAI competitor, last week.
In the launch video, Google CEO Sundar Pichai explains that Google’s AI product is a logical extension of its mission “to make the world’s information accessible,” especially as that information has gotten larger and more complex than ever:
“We always viewed our mission as a timeless mission. It's to organize the world's information and make it universally accessible and useful. But as information has grown in scale and complexity, the problem has gotten harder. So we always knew we needed to have a deeper breakthrough to make progress.” – Sundar Pichai, CEO, Google
What Makes Google Gemini Different from OpenAI’s ChatGPT?
The two key differences Google highlighted in its launch of Gemini are multimodality and reasoning:
Multimodality refers to how Google built Gemini from the ground up to process text, video, voice, and code. In ChatGPT, multimodal capabilities were added to the model later.
“Traditionally, multimodal models are created by stitching together text-only, vision-only, and audio-only models in a suboptimal way at a secondary stage. Gemini is multimodal from the ground up, so it can seamlessly have a conversation across modalities and give you the best possible response.” – Oriol Vinyals, VP Research, Google DeepMind
Reasoning means that Google believes Gemini is better at ‘thinking’ than ChatGPT, logically processing inputs and outputs.
(Update: OpenAI is getting ready to release GPT-5. Check out our guide to its new features.)
As Demis Hassabis, the CEO and Co-Founder of Google DeepMind, explains, combining multimodality with reasoning means that Gemini is more intuitively smart:
“We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models.” – Demis Hassabis, CEO and Co-Founder, Google DeepMind
The combination of these strengths means Gemini, like a detective hunting for clues at massive scale, is good at finding the important bits in large data sets, no matter the ‘modality’: text, visuals, voice, or code.
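To make ‘multimodal’ a bit more concrete: it means one model call can mix media types. Here is a minimal sketch using Google’s google-generativeai Python SDK – note that the image file and prompt are illustrative assumptions, not taken from Google’s demos:

```python
# A minimal sketch of a single multimodal call mixing an image and text.
# Assumes the google-generativeai SDK (pip install google-generativeai pillow)
# and an API key from Google AI Studio. At launch, image input was handled by
# the separate "gemini-pro-vision" model variant.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")

# One request, two modalities: an image plus a text question about it.
chart = PIL.Image.open("quarterly_sales.png")  # hypothetical local file
response = model.generate_content([chart, "What trend does this chart show?"])
print(response.text)
```

The point is that the image and the question travel in a single request, instead of one model describing the image and a second model reasoning over that description.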
Three Flavors of Gemini: Ultra, Pro, and Nano
Fun fact: the name Gemini, Latin for “twins,” came about because Google had separate teams working on language models.
According to Chief Scientist Jeff Dean, Google felt it was time for them to start working together, resulting in Gemini.
“The twins are the folks in the legacy Brain team (many from the PaLM/PaLM-2 effort) and the legacy DeepMind team (many from the Chinchilla effort) that started to work together on the ambitious multimodal model project we called Gemini, eventually joined by many people from all across Google.” – Jeff Dean, Chief Scientist, Google
Jeff added that Gemini was also the name of the NASA program that bridged the Mercury and Apollo missions on the way to the moon.
Gemini, Google’s “largest and most capable AI model,” comes in three flavors:
- Gemini Ultra: the largest and most capable model Google has built yet, designed for highly complex tasks. Most reporting comparing Google’s Gemini to OpenAI’s ChatGPT focuses on Ultra. Importantly, this version of Gemini, used in all the demos below, is not yet live: Google is still finalizing testing, including safety tests, ahead of a roll-out sometime next year. That also means GPT-5 may be out by the time Ultra launches, and this week’s comparisons may no longer hold.
- Gemini Pro: a less powerful but more economical version of Gemini. Gemini Pro is available through Bard, Google’s ChatGPT competitor, and is best compared to GPT-3.5, the model powering the free version of ChatGPT.
- Gemini Nano: an efficient model for on-device tasks. This model will run on Pixel phones, letting you summarize recordings and use Smart Reply in Gboard, starting with WhatsApp.
Google shared that in the coming months, Gemini will be available in more products and services like Search, Ads, Chrome, and Duet AI.
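For those who want to go beyond Bard, Google has also announced developer access to Gemini Pro through its API. As a rough sketch of how a basic call will likely look (the prompt below is an assumption; the SDK and model name follow Google’s announced Python tooling):

```python
# Sketch of a basic text call to Gemini Pro, the tier comparable to GPT-3.5.
# Assumes the google-generativeai SDK and a (placeholder) AI Studio API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")  # Pro tier; Ultra is not yet available

response = model.generate_content(
    "Draft a three-bullet summary of this week's AI news for a busy manager."
)
print(response.text)
```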
What Does Google Gemini Do?
Google shared a few use cases where it thinks the model will be particularly useful.
Analyzing large data sets
One is finding information in large data sets. Two Google researchers showcased this with a common scientific chore: combing through tens of thousands of papers to find the most relevant information.
Given a single prompt, Gemini could distinguish the studies relevant to the query from those that weren’t – something that would otherwise be a long and painstaking manual process.
Another prompt then had Gemini review the data in each relevant paper and extract it into a standard format.
As Software Engineer Taylor Applebaum concludes, over a lunch break, Gemini read 200,000 papers, filtered them down to 250, extracted the data, and even updated a graph in real time.
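To give a feel for how such a filtering step might be scripted, here is a rough sketch under heavy assumptions: Google hasn’t published its demo pipeline, so the topic, prompts, and YES/NO relevance criterion below are purely illustrative:

```python
# Illustrative sketch of an LLM-based relevance filter over paper abstracts.
# Google has not published its demo code; the topic, prompts, and criteria
# here are assumptions. Uses the same google-generativeai SDK as above.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

TOPIC = "functional foods and their effect on gut microbiota"  # hypothetical query

def is_relevant(abstract: str) -> bool:
    """Ask the model for a strict YES/NO relevance judgment on one abstract."""
    prompt = (
        f"Is the following paper abstract relevant to the topic '{TOPIC}'? "
        f"Answer with exactly YES or NO.\n\nAbstract:\n{abstract}"
    )
    response = model.generate_content(prompt)
    return response.text.strip().upper().startswith("YES")

abstracts = [
    "We studied fermented foods and their impact on gut bacterial diversity...",
    "This paper benchmarks GPU kernels for sparse matrix multiplication...",
]
relevant = [a for a in abstracts if is_relevant(a)]
print(f"Kept {len(relevant)} of {len(abstracts)} abstracts")
```

A real pipeline would batch these calls and follow up with a second extraction prompt over the surviving papers, mirroring the demo’s ‘universal format’ step.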
This kind of AI-powered research shows why AI can be such a powerful driver of scientific breakthroughs: with data analyzed at these speeds, scientists can reach results far faster.
As one biotechnology student commented:
“Last year, I wrote a paper about functional food and its possible effect on gut microbiota. Literature scanning is always an extensive endeavor in genetics. We hit dead ends, found irrelevant information, and tiredness caught up with us. Gemini would not only be a time saver, but it can also enable scientists to pull incredible feats in a heartbeat.”
Combine this data-and-insights capability with AI’s ability to generate limitless scenarios of synthetic test data, and it’s easy to see how these developments could usher in an era where medicine is created, not invented, with a simple prompt.
Explaining reasoning in math and physics
If you’re a parent who has ever needed to help your kids with homework, Google has good news for you.
In a demo, Google’s Sam Cheung shows how Gemini can read a hand-written piece of homework, analyze the answers, and explain where and how the student went wrong.
It can then explain the concepts that need more clarification, in whatever phrasing is most helpful and relevant to you.
As futurist Antony Slumbers said in our interview, AI could, for example, clarify economics concepts through analogies an art major would understand or any variation on this.
Creating bespoke web experiences
You’re used to typing something into Google, clicking a link, and landing on a webpage, where information is presented in the way the owner intended.
Gemini spins this into something far more bespoke. In one of Google’s Experiments, a search query becomes a highly personalized web experience tailored to the user.
In the example, starting from the problem statement of needing to organize a birthday party, Gemini creates a custom interface that guides the user to the final ‘information,’ all in a highly engaging and interactive way.
With this, Google seems to cater to a more ‘day-to-day’ AI user, offering a friendly and colorful UI rather than ChatGPT’s more technical-looking interface.
Benchmark Scores and Fake Demo Controversy
One of the things that Google seemingly couldn’t emphasize enough is that Gemini Ultra beats GPT-4 in several benchmarks.
These benchmarks are standardized tests that allow us to compare AI models head to head.
Some criticism quickly emerged.
Researchers pointed out that Google is comparing a model that has not been released with GPT-4, which finished training over a year ago and has been publicly available since March 2023.
As The AI Advantage hilariously summarizes:
“We're comparing a 2024 model with the 2023 GPT-4 model, so it's not exactly apples to apples. Just wait for OpenAI to make their move with GPT-5, and then we will return to this comparison. As of now, Google is winning the announcement game, whereas GPT-4 is winning the usable product game.”
Additionally, the benchmarks compare Gemini to GPT-4 in ways that are not apples to apples. For one, Google evaluated GPT-4 through its public API rather than with direct access to the model.
Second, Gemini was measured with a different prompting methodology than GPT-4, one optimized for better-quality outputs: on the MMLU benchmark, for example, Gemini Ultra’s headline 90.0% score was achieved with a chain-of-thought technique (CoT@32), while GPT-4’s 86.4% came from a standard 5-shot prompt.
As Philipp Schmid, Tech Lead at AI community Hugging Face, says: “🚨Never trust marketing content🚨”
But a much bigger controversy broke out mere days after the Gemini announcement when it came out that the most impressive demo wasn’t real.
A video in which squiggly lines progressively turn into a drawing of a duck, with Gemini seemingly guessing along in real time, together with other interactive games, did not happen as shown.
While the demo makes it look like a fluid two-way conversation between the user and Gemini, in reality, Gemini responded to still images taken from the video, alongside text prompts – a very different experience from what was shown.
The Bottom Line
Fake demos, manipulated benchmarks, and announced-yet-unreleased models aside, Google is in the arena with its Gemini model, at least on par with GPT-3.5.
Once Gemini Ultra does launch, it will be another great tool to have in your toolkit as a future-forward leader.
The applications Google demonstrated in data analysis, problem-solving, and interactive interfaces will all help you work smarter, not harder.
To stay ahead, try the new Bard with image inputs and be first in line to pilot Ultra once it arrives.
Until then, have a great remainder of your week.
I will be out the last two weeks of the year and will see you back in January!
Happy holidays, and happy new year.
– Daan
PS: Thanks to The AI Breakdown for its great analysis of the topic.
More on AI in the Workplace from me:
- AI Creates $100k+ WFH Jobs – But What About Women?
- The AI-Driven 10-Hour Workweek
- The Top 6 AI Trends for 2024
- The 3-Day Workweek is Here, Says Bill Gates. I Agree.
- What Actually Happened with Sam Altman and OpenAI?
- [FlexOS Exclusive] Generative AI at Work Research Report
- How To Use AI at Work (With 39 reviewed tools)
- Smart Managers Create Their Own GPT Today (With 50+ Examples)
You Might Also Like …
Future Work
A weekly column and podcast on the remote, hybrid, and AI-driven future of work. By FlexOS founder Daan van Rossum.
AI Colleagues, Personalization, and a CEO Rejecting the Return to Office