The Daily AI Chat
Gemini 2.5: AI Browser Interaction Model

Update: 2025-10-09
Description

Tune in to explore Google’s latest advancement in artificial intelligence: the Gemini 2.5 Computer Use model. This new AI model is designed to navigate and interact with the web the way a human user does.

The Gemini 2.5 Computer Use model can click, scroll, and type within a browser window. It uses “visual understanding and reasoning capabilities” to analyze a user’s request and then carry out multi-step tasks, such as filling out and submitting forms. This matters because it lets an AI agent reach data and operate interfaces that have no API or other direct connection.
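To make the click, scroll, and type idea concrete, here is a minimal Python sketch of how an agent might apply a sequence of browser actions to fill out and submit a form. It does not call the Gemini API. `FakeBrowser`, `run_actions`, and the action names (`click`, `type_text`, `scroll`) are all hypothetical stand-ins invented for illustration, and the action list is hard-coded, whereas a real agent would receive each action from the model after it inspects the page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeBrowser:
    """In-memory stand-in for a browser window (hypothetical, not a real API)."""
    fields: Dict[str, str] = field(default_factory=dict)
    focused: Optional[str] = None
    scroll_y: int = 0
    submitted: bool = False

    def click(self, target: str) -> None:
        # Clicking "submit" submits the form; clicking anything else focuses it.
        if target == "submit":
            self.submitted = True
        else:
            self.focused = target

    def type_text(self, text: str) -> None:
        # Typing appends text to whichever field currently has focus.
        if self.focused is None:
            raise RuntimeError("no field is focused")
        self.fields[self.focused] = self.fields.get(self.focused, "") + text

    def scroll(self, dy: int) -> None:
        # Scroll position never goes above the top of the page.
        self.scroll_y = max(0, self.scroll_y + dy)


def run_actions(browser: FakeBrowser, actions: List[dict]) -> FakeBrowser:
    """Apply each action in order, rejecting any action name we do not know."""
    handlers = {
        "click": lambda a: browser.click(a["target"]),
        "type_text": lambda a: browser.type_text(a["text"]),
        "scroll": lambda a: browser.scroll(a["dy"]),
    }
    for action in actions:
        name = action["name"]
        if name not in handlers:
            raise ValueError(f"unsupported action: {name}")
        handlers[name](action)
    return browser


if __name__ == "__main__":
    # Assumed action list; a real agent would get these from the model.
    plan = [
        {"name": "click", "target": "email"},
        {"name": "type_text", "text": "user@example.com"},
        {"name": "scroll", "dy": 300},
        {"name": "click", "target": "submit"},
    ]
    result = run_actions(FakeBrowser(), plan)
    print(result.fields, result.submitted)
```

Keeping the set of allowed actions in one lookup table mirrors the idea of a model that supports a fixed list of actions: anything outside the list is rejected rather than guessed at.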

Google’s new model currently supports 13 distinct actions, including opening a web browser, typing text, and dragging and dropping elements. It can be used for tasks such as UI testing or navigating interfaces built for people. For example, earlier versions have been used in research prototypes like Project Mariner to carry out tasks in a browser, such as adding items to a cart based on a list of ingredients. Developers can access the Gemini 2.5 Computer Use model through Google AI Studio and Vertex AI.

This announcement follows other industry moves, such as OpenAI’s focus on its ChatGPT Agent feature and Anthropic’s release of a version of its Claude AI with similar capabilities, but Google notes a key distinction. Unlike leading alternatives, Google’s new model is currently limited to a browser environment and cannot access an entire desktop operating system. Despite this, Google asserts that the Gemini 2.5 Computer Use model “outperforms leading alternatives on multiple web and mobile benchmarks”.

Koloza LLC