Google is pushing hard on Gemini, its sprawling family of AI models, apps, and services. Gemini shows promise in some areas, but it falls short in others, at least based on our informal look at it.
So, what exactly is Gemini? How do you even use it? And is it any better or worse than what other companies are offering?
To help you keep up with all things Gemini, we've put together this handy guide, which we'll keep updated as Google releases new Gemini models and features and shares more of its plans for the platform.
What is Gemini?
Gemini is Google's long-promised, next-gen GenAI model family, developed by Google's AI research labs DeepMind and Google Research. It comes in three flavors:
- Gemini Ultra, the flagship Gemini model.
- Gemini Pro, a "lite" Gemini model.
- Gemini Nano, a smaller "distilled" model that runs on mobile devices like the Pixel 8 Pro.
All Gemini models were trained to be "natively multimodal": in other words, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images, and videos, a large set of codebases, and text in different languages.
This sets Gemini apart from models such as Google's own LaMDA, which was trained exclusively on text data. LaMDA can't understand or generate anything other than text (e.g., essays, email drafts), but that isn't the case with Gemini models.
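As a rough illustration of what "natively multimodal" means in practice, here's a minimal sketch using Google's google-generativeai Python SDK. The API key, model name, and image file below are placeholder assumptions for illustration, not details from this guide:

```python
import google.generativeai as genai
from PIL import Image

# Authenticate with an API key from Google AI Studio (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# A multimodal Gemini model accepts an image and text in the same prompt,
# something a text-only model like LaMDA can't do.
model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("chart.png")  # hypothetical local image
response = model.generate_content([image, "What trend does this chart show?"])
print(response.text)
```

The key point is that the image and the question go in as one prompt; the model reasons over both together rather than handing the image off to a separate system.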
What's the difference between the Gemini apps and Gemini models?
Google, proving once again that it lacks a knack for branding, didn't make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed; think of it as a client for Google's GenAI.
Incidentally, the Gemini apps and models are also totally independent from Imagen 2, Google's text-to-image model that's available in some of the company's dev tools and environments. Don't worry, you're not the only one confused by this.
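To make the apps-versus-models distinction concrete, here's another hedged sketch (again assuming the google-generativeai SDK and an AI Studio key) of calling a Gemini model directly from code, with no Gemini app in the loop:

```python
import google.generativeai as genai

# Authenticate with an API key from Google AI Studio (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# The Gemini web and mobile apps are just one client sitting on top of
# models like this one; any program can act as another client.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize what Gemini is in one sentence.")
print(response.text)
```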
What can Gemini do?
Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google's promising all of them, and more, at some point in the not-too-distant future.
Of course, it's a bit hard to take the company at its word.
Google seriously underdelivered with the original Bard launch. And more recently, it ruffled feathers with a video purporting to show Gemini's capabilities that turned out to have been heavily doctored and was more or less aspirational.