Google introduced Gemini 2.0, the successor to the Gemini 1.5 family of AI models, on Wednesday. The new AI models come with improved capabilities, including native support for image and audio generation, the company highlighted. Gemini 2.0 is currently available in beta to select developers and testers, while the Gemini 2.0 Flash AI model has been added to the web and mobile apps of the Gemini chatbot for all users. Google said the larger model will also be rolled out to its products soon.

Google Gemini 2.0 AI Models

Nine months after the release of the Gemini 1.5 series of AI models, Google has now introduced the upgraded version of the large language model (LLM). In a blog post, the company announced that it was releasing the first model in the Gemini 2.0 family: an experimental version of Gemini 2.0 Flash. Flash models generally contain fewer parameters and are less suited to complex tasks than larger models, but they compensate with lower latency and higher efficiency.

The Mountain View-based tech giant highlighted that Gemini 2.0 Flash now supports multimodal output, such as images generated alongside text and steerable, multilingual text-to-speech (TTS) audio. The AI model is also equipped with agentic functions: it can natively call tools such as Google Search and code execution, as well as third-party functions that a user defines via the API.
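To illustrate what defining a third-party function for a model generally involves, here is a minimal Python sketch of the common function-calling pattern. It does not use Google's SDK or reflect the actual Gemini API: the `get_weather` function, the declaration format, and the simulated model request are all hypothetical and stand in for whatever the real API exchanges.

```python
import json

# Hypothetical user-defined function the model may ask the app to run.
def get_weather(city: str) -> dict:
    """Return canned data; a real app would query a weather service."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}

# Hypothetical declaration describing the function to the model.
# The real Gemini API defines its own schema; this shape is an assumption.
TOOL_DECLARATIONS = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# Registry mapping declared names to the Python callables behind them.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(call: dict) -> dict:
    """Run the function the model requested and package the result."""
    name = call["name"]
    if name not in TOOLS:
        return {"name": name, "error": f"unknown tool: {name}"}
    result = TOOLS[name](**call.get("args", {}))
    return {"name": name, "response": result}

# Simulated model output: instead of answering in text, the model
# requests a function call with structured arguments.
simulated_call = {"name": "get_weather", "args": {"city": "Mountain View"}}

reply = dispatch_tool_call(simulated_call)
print(json.dumps(reply))
```

In this pattern the application, not the model, executes the function; the packaged result would then be sent back so the model can compose its final answer.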

In terms of performance, Google shared Gemini 2.0 Flash’s benchmark scores based on internal testing. According to the company, it outperforms even the Gemini 1.5 Pro model on the Massive Multitask Language Understanding (MMLU), Natural2Code, MATH, and Graduate-Level Google-Proof Q&A (GPQA) benchmarks.

Gemini users can select the experimental model from the model selector, located at the top left of the web interface and at the top of the mobile app. The AI model is also available via the Gemini application programming interface (API) in Google AI Studio and Vertex AI, where developers get multimodal input and text output. Image and text-to-speech capabilities are currently only available to Google’s early-access partners.


