Gemini 2.5: AI Browser Interaction Model
Description
Tune in to explore Google’s latest advancement in artificial intelligence: the Gemini 2.5 Computer Use model. This new AI model is designed with the unique capability to navigate and interact with the web just like a human user.
The Gemini 2.5 Computer Use model can perform actions such as clicking, scrolling, and typing within a browser window. It utilizes “visual understanding and reasoning capabilities” to analyze a user’s request and then carry out complex tasks, such as filling out and submitting forms. This functionality is crucial because it allows the AI agent to access data and operate within interfaces that lack an API or other direct connection.
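For a concrete picture of how such a model drives a browser, the sketch below shows the typical perceive-and-act loop: capture a screenshot, ask the model for the next action, then execute that action in the browser. It assumes Playwright for the browser side; the plan_next_action helper standing in for the model call and the simple action dictionary format are hypothetical placeholders, not Google’s documented schema.

```python
# Minimal sketch of a computer-use agent loop, assuming Playwright for the
# browser and a hypothetical plan_next_action() wrapper around the model call.
from playwright.sync_api import sync_playwright


def plan_next_action(screenshot_png: bytes, goal: str) -> dict:
    """Hypothetical stand-in for the Gemini 2.5 Computer Use model call.

    A real agent would send the screenshot and goal to the model and return
    its predicted action, e.g. {"type": "click", "x": 200, "y": 340}.
    """
    raise NotImplementedError


def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            # Perceive: screenshot the current state; Plan: ask the model.
            action = plan_next_action(page.screenshot(), goal)
            # Act: translate the model's action into a browser operation.
            if action["type"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type":
                page.keyboard.type(action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action["delta_y"])
            elif action["type"] == "done":
                break
        browser.close()
```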
Google’s new model currently supports 13 distinct actions, including opening a web browser, typing text, and dragging and dropping elements. It can be used for tasks like UI testing or for navigating interfaces built for people. Earlier versions of this capability have powered research prototypes such as Project Mariner, which executes tasks in a browser, for example adding items to a shopping cart based on a list of ingredients. Developers can access the Gemini 2.5 Computer Use model through Google AI Studio and Vertex AI.
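As a rough illustration of developer access through the Gemini API, the snippet below requests a computer-use response via the google-genai Python SDK. The model name, tool configuration, and enum names are assumptions based on the preview documentation and may differ in your SDK version; treat this as a hedged sketch rather than a definitive integration.

```python
# Hedged sketch: enabling the computer-use tool with the google-genai SDK.
# The model id and ComputerUse/Environment names are assumptions from the
# preview documentation and may not match every SDK release.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[
        types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER
            )
        )
    ]
)

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview model id
    contents=["Open example.com and add the first item to the cart."],
    config=config,
)

# The response is expected to carry a function call describing the next UI
# action (click, type, scroll, ...), which the calling agent then executes.
print(response.candidates[0].content.parts)
```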
This announcement follows other industry moves, such as OpenAI’s work on its ChatGPT Agent feature and Anthropic’s release of a version of its Claude AI with similar capabilities, but Google notes a key distinction: unlike those leading alternatives, its new model is currently restricted to a browser environment rather than an entire desktop operating system. Despite this, Google asserts that the Gemini 2.5 Computer Use model “outperforms leading alternatives on multiple web and mobile benchmarks.”