Google’s Gemini AI platform prepares to remember with ‘Memory’ feature rollout: Report
Google’s conversational AI platform, Gemini, has been improving rapidly in recent months, and another major addition is about to arrive: a long-awaited “Memory” feature is reportedly nearing release. According to early 9to5Google reporting from last year, Google has been working on a “Memory” feature to improve Gemini’s user interactions. The feature reportedly draws on information from previous conversations to inform future ones, potentially streamlining chats by remembering specifics like food preferences, preferred response formats, and family details.
Dylan Roussel recently shared new details on X (formerly Twitter) showing a redesigned user interface for Gemini’s upcoming “Memory” feature, and said it could launch within days. Because the feature stores and recalls user-provided information, it should make conversations more efficient by eliminating the need to repeat the same details. Google pitches “Memories” as a tool to gradually improve Gemini’s answers and tailor them to user preferences. Users will be able to manage saved data, or disable the capability entirely, from a dedicated Memory page.
These efforts demonstrate Google’s commitment to refining the Gemini platform to meet evolving user needs. Google’s annual I/O 2024 developer conference, meanwhile, takes place on May 14 at the Shoreline Amphitheatre in Mountain View, California. The event will open with a keynote from Alphabet CEO Sundar Pichai, followed by a series of sessions expanding on Google’s recent announcements. It will be intriguing to see what Google has in store.
Tech giants including OpenAI, Microsoft, Meta, and Google Research have been racing to build multimodal AI systems over the past year. Google DeepMind CEO Demis Hassabis, together with Alphabet and Google CEO Sundar Pichai, unveiled the company’s highly anticipated generative AI system, Gemini. It is the company’s most powerful and versatile artificial intelligence (AI) model to date, natively multimodal and able to understand and produce text, audio, code, video, and images. On general tasks, math, programming, and reasoning benchmarks, it performs better than OpenAI’s GPT-4. The launch follows the April release of Google’s previous flagship LLM, PaLM 2, which underpins several models in Google’s search product family.
Google Gemini: What is it?
The first release, Gemini 1.0, showcases exceptional adaptability. Designed to be flexible and scalable enough to run smoothly across diverse platforms, from large data centres to portable mobile devices, this generative AI model is well suited to tasks that integrate multiple data sources. The models perform remarkably well, surpassing previous state-of-the-art results on several benchmarks. Gemini can work through problems requiring complex logic and, in certain situations, even surpasses human specialists. Let’s now explore the technological innovations that give Gemini its remarkable capabilities.

Ability to handle text, video, code, images, and audio proficiently
Because it is natively multimodal, Gemini 1.0 is trained on text, images, audio, and video simultaneously. Thanks to this combined training across varied data sources, the model can interpret and produce content across many data types with ease. It demonstrates strong skill in handling:
Text
Gemini’s abilities include sophisticated language comprehension, synthesis, reasoning, and problem-solving over textual information. Its strong performance on text-based tasks ranks it among the best large language models (LLMs), surpassing models such as PaLM 2, Claude 2, and GPT-4, as well as inference-optimized models like GPT-3.5.
Code
Coding is a common use for contemporary Large Language Models (LLMs), and Gemini Ultra excels at it, as a thorough assessment on both internal and established benchmarks shows. On HumanEval, the standard code-completion benchmark in which the model maps function descriptions to Python implementations, instruction-tuned Gemini Ultra correctly solves an impressive 74.4% of problems. Furthermore, Gemini Ultra scores 74.9% on Natural2Code, a newly created held-out benchmark for Python code generation designed to guard against web leakage. These results highlight Gemini’s remarkable proficiency at coding and place it at the top of AI models in this field; an illustrative HumanEval-style task is sketched below.
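For readers unfamiliar with the benchmark, a HumanEval-style task gives the model a Python function signature and docstring and scores the completion by running tests against it. The problem and solution below are a minimal sketch in that style, not a claim about the benchmark’s exact contents.

def rolling_max(numbers: list[int]) -> list[int]:
    """From a list of integers, return the running maximum seen
    up to each position in the sequence.
    >>> rolling_max([1, 2, 3, 2, 3, 4, 2])
    [1, 2, 3, 3, 3, 4, 4]
    """
    # A completion the benchmark would score by executing the tests:
    result, current_max = [], None
    for n in numbers:
        current_max = n if current_max is None else max(current_max, n)
        result.append(current_max)
    return result

if __name__ == "__main__":
    # The hidden test suite would assert behaviour like this:
    assert rolling_max([1, 2, 3, 2, 3, 4, 2]) == [1, 2, 3, 3, 3, 4, 4]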
Image
In image generation and understanding, Gemini performs on par with or better than earlier state-of-the-art models such as OpenAI’s GPT-4V. Even zero-shot, Gemini Ultra routinely outperforms existing methods, particularly on OCR-related image-understanding tasks, without requiring an external OCR engine. The model performs impressively across a variety of tasks, including reading scientific diagrams, charts, and infographics, and answering questions about scanned documents and natural images.
Gemini can also generate images natively, without relying on an intermediate natural-language description that might constrain its ability to express visuals. In a few-shot setting, this lets the model produce images from prompts that interleave image and text sequences. For instance, a user could ask the model to generate matching text and image suggestions for a webpage or blog post.
Video Comprehension
Gemini’s video comprehension is thoroughly assessed on a range of held-out benchmarks. With 16 frames sampled per video, the Gemini models show remarkable temporal understanding. As of November 2023, Gemini Ultra had achieved state-of-the-art results on few-shot video captioning and zero-shot video question-answering tasks.
One qualitative example in the report illustrates Gemini Ultra’s game-related reasoning: given video of a soccer player, it can analyze the mechanics of how the ball is struck. Results like these validate Gemini’s high-level video comprehension, a crucial step toward a capable generalist agent.
Audio
Compared to the USM and Whisper models, Gemini Pro performs markedly better on all automatic speech recognition (ASR) and speech translation (AST) tasks, on both English and multilingual test sets. On the FLEURS benchmark in particular, it shows a significant improvement over its competitors, helped by Gemini Pro’s training on the FLEURS dataset; even without FLEURS training, Gemini Pro beats Whisper with a word error rate (WER) of 15.8. Additionally, Gemini Nano-1 beats Whisper and USM on every dataset except FLEURS. Gemini Ultra’s audio performance has not yet been assessed, but its larger model size raises hopes for improved results.
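WER, the metric cited above, counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words; lower is better. Here is a minimal sketch using a plain edit-distance computation, not any specific evaluation toolkit:

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution over six reference words -> WER of about 0.167
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))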
The Gemini Model Family
The model comes in three sizes, each designed to serve distinct application needs and computational constraints:
Gemini Ultra
The Gemini architecture enables efficient scaling on TPU accelerators, allowing the most powerful model, Gemini Ultra, to attain state-of-the-art performance on a variety of intricate tasks, including multimodal functions and reasoning.
Gemini Pro
A model tuned to balance cost, latency, and performance, it performs well across a wide variety of tasks and displays strong reasoning skills and broad multimodal capabilities.
Gemini Nano
The most efficient model, Gemini Nano, is designed for on-device use. Two versions are available: Nano-1, with 1.8B parameters for low-memory devices, and Nano-2, with 3.25B parameters for high-memory devices. Each is distilled from the larger Gemini models and then 4-bit quantized to deliver best-in-class performance for on-device deployment. Let’s now examine the Gemini models’ technical capabilities.
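To make the quantization step concrete, here is a minimal sketch of symmetric 4-bit weight quantization. This illustrates the general idea only, not Google’s published scheme; production systems typically use per-channel scales and calibrated quantization.

import numpy as np

def quantize_4bit(weights: np.ndarray):
    # Map float weights onto 16 integer levels (int4 range -8..7,
    # used symmetrically as -7..7) with a single shared scale.
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_4bit(w)
print(np.abs(w - dequantize(q, scale)).max())  # worst-case quantization error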
Technical Abilities
Developing the Gemini models required advances in training algorithms, datasets, and infrastructure. Scalable infrastructure benefits the Pro model, which completes pretraining in a matter of weeks using a fraction of Ultra’s resources. The Nano series advances training and distillation techniques, producing best-in-class small language models for a range of applications and better on-device experiences. Now let’s explore the technological advances:
Training Infrastructure
Gemini Ultra was trained on a sizable fleet of TPUv4 accelerators spread across several data centres, while the other Gemini models used TPUv5e and TPUv4 Tensor Processing Units (TPUs). Scaling beyond the previous flagship model, PaLM 2, posed infrastructure problems that required solutions for hardware failures and network connectivity at unprecedented scale. The “single controller” programming model of Pathways and JAX simplified the development effort, while in-memory model-state redundancy greatly accelerated recovery from unforeseen hardware failures. Innovative methods, including deterministic replay and proactive SDC scanners, were used to address Silent Data Corruption (SDC) issues at this scale.
Training Dataset
The Gemini models are trained on a varied dataset that is both multimodal and multilingual, spanning web documents, books, code, and media data. Training the SentencePiece tokenizer on a large sample of the complete corpus improves vocabulary and model performance, making it possible to tokenize non-Latin scripts efficiently. Quality and safety filters, including heuristic rules and model-based classifiers, are applied, and the size of the training dataset scales with the size of the model. Ablations on smaller models are used to choose data mixtures and weights, and staged training adjusts the composition for the best pretraining outcomes.
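As an illustration of that tokenizer step, here is a minimal sketch using the open-source sentencepiece library. The file name and hyperparameters are placeholders, not Gemini’s actual settings; it assumes a text file corpus.txt with one sentence per line.

import sentencepiece as spm

# Train a subword tokenizer on a corpus sample; larger, multilingual
# samples yield vocabularies that cover non-Latin scripts well.
spm.SentencePieceTrainer.train(
    input="corpus.txt",         # placeholder corpus, one sentence per line
    model_prefix="demo_tok",
    vocab_size=32000,           # illustrative size only
    character_coverage=0.9995,  # high coverage helps non-Latin scripts
)

# Load the trained model and tokenize mixed-script text.
sp = spm.SentencePieceProcessor(model_file="demo_tok.model")
print(sp.encode("Tokenizing non-Latin scripts too: こんにちは", out_type=str))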
The Architecture of Gemini
The Gemini models are built on Transformer decoders, with architectural and optimization enhancements for reliable training at scale, the researchers note, though they did not disclose all the specifics. Training is implemented in JAX and runs on TPUs. With separate text and vision encoders, the architecture is comparable to models such as Flamingo, CoCa, and PaLI.
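To ground the phrase “Transformer decoder”, here is a toy causal self-attention block in JAX, the framework the training is implemented in. It is a bare-bones sketch for illustration only; none of Gemini’s actual dimensions, optimizations, or multimodal encoders are reflected.

import jax
import jax.numpy as jnp

def causal_self_attention(x, wq, wk, wv, wo):
    seq_len, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(d)
    # Causal mask: each position may only attend to itself and the past,
    # which is what makes this a *decoder* block.
    mask = jnp.tril(jnp.ones((seq_len, seq_len)))
    scores = jnp.where(mask == 1, scores, -1e9)
    return jax.nn.softmax(scores, axis=-1) @ v @ wo

key = jax.random.PRNGKey(0)
d = 16
ks = jax.random.split(key, 5)
x = jax.random.normal(ks[0], (8, d))  # 8 tokens, model width 16
wq, wk, wv, wo = (jax.random.normal(k, (d, d)) * 0.1 for k in ks[1:])
print(causal_self_attention(x, wq, wk, wv, wo).shape)  # (8, 16)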

Ethical Considerations
Through a structured approach to responsible deployment, Google measures, tracks, and mitigates the anticipated societal effects of the Gemini models.
Quality Control and Safety Testing
Gemini prioritizes quality assurance and safety testing as part of responsible development. Google DeepMind’s Responsibility and Safety Council (RSC) has established strict evaluation objectives across important policy domains, including but not limited to child safety. By weaving safety considerations throughout the development process, this commitment helps ensure Gemini meets high standards of ethical responsibility and quality.
Comprehensive trust and safety assessments are currently being carried out on Gemini Ultra, including red-teaming by trusted external parties. Before general availability, the model is further refined via fine-tuning and reinforcement learning from human feedback (RLHF) to strengthen its robustness.
Potential Risks and Challenges
Developing a multimodal AI model carries inherent risks. Gemini places a high priority on mitigating them, across areas including factual accuracy, child safety, harmful content, cybersecurity, biorisk, and representation and diversity. The Gemini impact assessments cover the model’s range of capabilities and systematically analyze potential outcomes in line with Google’s AI Principles.
Does Gemini Hallucinate?
Although Gemini’s report does not describe dedicated hallucination testing, it does detail steps taken to reduce how often hallucinations occur. To address the issue, it emphasizes instruction tuning, focusing on three critical behaviours aligned with real-world scenarios: attribution, closed-book response generation, and hedging.
Performance & Application Improvements
Google Bard Chatbot x Gemini Pro
Gemini Pro now powers Bard, Google’s answer to ChatGPT. Bard, which originally ran on LaMDA (Language Model for Dialogue Applications), is an experimental conversational AI service created by Google. It aims to demystify complex subjects and engage users in meaningful discussions, combining broad knowledge with large language models to provide creative and informative answers.
Gemini Nano x Pixel 8 Pro
For on-device apps, Gemini Nano is arriving as a Pixel 8 Pro feature update. The integration brings two enhanced features: Smart Reply in Gboard and Summarize in Recorder. Gemini Nano runs offline and keeps private information on the device. Summarize in Recorder distills insights from recorded material without a network connection, while Gboard’s Smart Reply, powered by Gemini Nano, suggests intelligent, conversation-aware replies.
Search Generative Experience
Gemini AI is now being employed for the Search Generative Experience (SGE), delivering a 40% decrease in latency for English searches conducted in the United States. The improvement both raises the quality of search results and speeds up the search process. With the potential to change how people interact with Google Search, Gemini’s use in Search is a major step toward a more effective and sophisticated generative search experience.
Integrations with Google Platforms
Gemini is set to expand across a number of Google products and services in the coming months, promising improved functionality and experiences. Users can expect Gemini to be integrated into major platforms like Duet AI, Chrome, Ads, and Search.
What Comes Next?
According to the paper, Gemini 1.0’s potential lies mainly in the wide range of new applications and use cases its capabilities make possible. Let’s examine those potential implications in more detail.
Understanding complicated images: Gemini’s aptitude for deciphering intricate visuals, such as infographics or charts, opens new avenues for interpreting and analyzing visual data.
Multimodal reasoning: The model can reason over, and respond with, several modalities at once, such as text, audio, and image sequences. This holds great promise for applications that need to integrate different kinds of data.
Applications in education: Gemini’s sophisticated comprehension and reasoning abilities could be used in classrooms to power intelligent tutoring systems and personalized learning.
Multilingual communication: Given its ability to handle many languages, Gemini could significantly improve translation and multilingual communication.
Information extraction and summarization: Like previous state-of-the-art models (e.g., GPT-4), Gemini’s capacity to digest and synthesize massive volumes of information makes it well suited to data extraction and summarization jobs.
Applications in the creative domain: The model’s ability to produce original material and support creative processes is another important aspect of its potential.
Google Gemini to Reportedly Get a Memory Feature That Lets It Remember Specific Information
According to reports, Google plans to launch a new Memory feature that would let Gemini, its AI chatbot, recall specific user information for use in later conversations. The software giant appears to be taking a page from OpenAI, which debuted a comparable function in its ChatGPT app in February 2024, and Gemini’s Memory feature may operate in a similar manner. The feature is reportedly set to launch in the coming days and may also make an appearance at Tuesday’s Google I/O event.
Tipster Dylan Roussel shared details of the feature on X (formerly Twitter), stating that Google intends to call it Memory, and posted a screenshot of the app showing the feature’s introduction page. The initial rumors date back to a 9to5Google story published last year, which reported that the company was working on adding long-term contextual memory to its chatbot. Gemini is also thought to be gaining a number of additional AI functions in the coming weeks.
Based on the screenshot, users will be able to instruct Gemini to remember details like where they live, work, and study, and any allergies they have. The chatbot will keep this information in mind and adjust its replies accordingly in later conversations. For example, if a user with a peanut allergy asks the AI for a simple sandwich recipe, it won’t recommend any recipes that might contain peanuts, sparing the user from having to remind Gemini every time.
According to the feature description seen in the screenshot, Gemini remembers information you provide in conversations, such as your likes and preferences, so you don’t have to repeat yourself. As Gemini learns more about you, more of its replies will be tailored to your needs. Memory can be turned off, and stored information managed, from the Memory page at any time.
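Purely as a hypothetical sketch of how such a memory layer could work (Google has not published Gemini’s implementation), saved facts might be stored per user and prepended to the model’s context so each reply can take them into account:

class MemoryStore:
    def __init__(self):
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        # Save a user-provided detail for future conversations.
        self.facts.append(fact)

    def forget_all(self) -> None:
        # The "turn off / manage" control described above.
        self.facts.clear()

    def build_context(self, user_message: str) -> str:
        # Prepend saved facts so the model's reply can account for them.
        preamble = "\n".join(f"- {f}" for f in self.facts)
        return f"Known about the user:\n{preamble}\n\nUser: {user_message}"

memory = MemoryStore()
memory.remember("Has a peanut allergy")
print(memory.build_context("Suggest a simple sandwich recipe."))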
Further details are unknown at this time, such as where the feature will surface, whether it will work only in the app or also on the web, whether individual memories can be erased, and how private the stored information will be. Google may formally unveil it at its I/O event, which gets underway at 10:00 a.m. PT (10:30 p.m. IST).
Google equips its Gemini AI chatbot with greater abilities and a longer memory
At Tuesday’s I/O developer event, Google demoed a new capability in its Gemini AI chatbot: conversing with you like a human assistant. The advances are powered by Google DeepMind’s Gemini 1.5 Pro model, which balances speed and performance. Gemini Advanced subscribers will soon get a chatbot that can process and recall more information than any other consumer chatbot: according to Google, Gemini has a one-million-token context window, meaning it can summarize 100 emails or hold up to 1,500 pages of information.
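A quick back-of-envelope check of that page figure, assuming rough rules of thumb of about 0.75 words per token and 500 words per page (assumptions for illustration, not figures from Google):

# Estimate how many pages fit in a one-million-token context window.
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # rough rule of thumb for English text
WORDS_PER_PAGE = 500     # rough rule of thumb for a full page

pages = CONTEXT_TOKENS * WORDS_PER_TOKEN / WORDS_PER_PAGE
print(f"~{pages:,.0f} pages fit in the context window")  # ~1,500 pages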
Google says a user could upload a long rental agreement, for example, and then ask Gemini questions about pet policies or rent terms. Users can now also upload files to the chatbot directly from Google Drive. In a live demonstration, Google showed how the chatbot can help organize an itinerary for an upcoming trip: first it extracts travel details (such as flight times and hotel addresses) from the confirmation emails in your Gmail, then, after learning more about your interests and those of your family, it can recommend nearby attractions (based on Google Maps data) that are likely to suit you.
Gemini Live
The most intriguing announcement from the search giant was Gemini Live, a more sophisticated version of the Gemini assistant that converses with you in a fairly natural manner. Gemini Live users on the Gemini Advanced tier will have multiple voices to choose from, according to Google. Gemini Pro is quick enough that if you interrupt it, it will pause and ask for more information before proceeding, much as a human assistant would. You could ask Live to help you prepare for a job interview: besides giving feedback on your resume and the skills relevant to the position, the assistant can even act out an interview and assess your answers.
Later this year, Google says, a visual component will be added, enabling Live to converse with users about images it “sees” through a phone’s camera. Although Gemini Live isn’t accessible yet (Google says it will arrive “in the coming months”), it represents the most advanced AI assistant demonstrated so far. In a Monday interview with Fast Company, Sissie Hsiao, vice president at Google and general manager of Gemini, stated, “We’ve been working on reasoning as a new capability set.” “Therefore, the next epic is applying it to solve problems using other tools, and creating those tools, rather than just text or image generation.”