Thursday, 21 November, 2024

Everything you need to know about Google Gemini


Google is really pushing ahead with Gemini, its big bundle of AI models, apps, and services. It seems like it could be pretty cool in some ways, but there are definitely areas where it’s not quite hitting the mark, at least according to our informal look into it.

So, what exactly is Gemini? How do you even use it? And is it any better or worse than what other companies are offering?

To help you stay in the loop with all things Gemini, we’ve whipped up this handy guide. We’ll keep it updated with any new Gemini stuff that Google puts out, whether it’s new models, features, or just its plans for the whole Gemini shebang.

What is Gemini?

Gemini is Google’s long-promised, next-gen GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the flagship Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal”: in other words, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with Gemini models.

What’s the difference between the Gemini apps and Gemini models?

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed; think of them as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also totally independent from Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments. Don’t worry, you’re not the only one confused by this.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google’s promising all of them, and more, at some point in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word.

Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to be heavily doctored and more or less aspirational.


Discover more from News Round The Clock