Fara-7B: The 7B Agentic SLM Redefining On-Device CUA Performance
Description
Join us for a deep dive into Fara-7B, Microsoft Research's first agentic Small Language Model (SLM) designed specifically for computer use. This open-weight, ultra-compact model is pushing the frontiers of computer-use agents and is optimized for real-world web tasks.
If you follow ML closely, you will see how Fara-7B achieves state-of-the-art performance within its size class (only 7 billion parameters) and stays competitive with significantly larger, more resource-intensive agentic systems. This efficiency allows Fara-7B to run directly on devices, paving the way for personal and private agentic computing: latency is reduced and privacy improves because user data remains local.
We explore the technical innovations behind this Computer Use Agent (CUA):
1. Perception and Action: Unlike systems that rely on separate screen-parsing models or accessibility trees, Fara-7B perceives a webpage visually and takes actions (such as scrolling, typing, and clicking) at directly predicted coordinates, using the same modalities as humans.
2. Data Generation: Learn about the novel, scalable synthetic data generation pipeline built on the Magentic-One framework. The pipeline generates high-quality demonstrations for supervised fine-tuning using a multi-agent system composed of an Orchestrator, a WebSurfer, and a UserSimulator agent. The final training dataset consists of 145,000 trajectories.
3. Architecture: Fara-7B uses Qwen2.5-VL-7B as its base model, chosen for its strong performance on grounding tasks and ability to support long contexts.
4. Evaluation: We break down the model's strong benchmark results against models like GPT-4o (SoM Agent) and UI-TARS-1.5-7B. Crucially, the release also introduces WebTailBench, a new benchmark covering 11 real-world task types underrepresented in existing evaluations, such as finding job postings and comparing prices, and Fara-7B excels on it. Fara-7B also "breaks ground on a new pareto frontier" when accuracy and cost efficiency on WebVoyager are considered together.
We also cover the essential focus on safety and responsible deployment. Fara-7B is trained to stop at "Critical Points" (situations requiring user data or consent) before proceeding with irreversible actions.
Fara-7B is available open-weight on Microsoft Foundry and Hugging Face under an MIT license. We discuss how developers can use the quantized, silicon-optimized version for turnkey experimentation on Copilot+ PCs powered by Windows 11. This experimental release invites the community to build and test agentic experiences beyond pure research, automating everyday tasks like form filling, searching, shopping, and booking travel.
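For listeners who want something concrete, the perceive-then-act loop and the Critical Point stop described above can be sketched roughly as follows. This is only an illustration, not Fara-7B's actual interface: the `Action` fields, the `run_agent` function, and the `predict`, `screenshot`, `execute`, and `ask_user` callbacks are all hypothetical names, and the explicit `critical` flag is an assumption standing in for behavior that the description says the model learns through training.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One predicted step. Every field name here is hypothetical."""
    kind: str                             # "click", "type", "scroll", or "stop"
    xy: Optional[Tuple[int, int]] = None  # pixel coordinates predicted from the screenshot
    text: str = ""                        # text to type, if any
    critical: bool = False                # assumed flag: step needs user data or consent


def run_agent(
    predict: Callable[[bytes, List[Action]], Action],
    screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    ask_user: Callable[[Action], bool],
    max_steps: int = 20,
) -> List[Action]:
    """Loop: look at the screen, predict one action, gate it, then execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = predict(screenshot(), history)  # pixels in, one action out
        if action.kind == "stop":
            break
        # Critical Point: pause before an irreversible step unless the user agrees.
        if action.critical and not ask_user(action):
            break
        execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    # A scripted stand-in for the model, so the loop can run without any weights.
    script = iter([
        Action("click", xy=(120, 340)),
        Action("type", xy=(120, 340), text="running shoes"),
        Action("click", xy=(400, 610), critical=True),  # e.g. a purchase button
    ])
    done = run_agent(
        predict=lambda shot, hist: next(script),
        screenshot=lambda: b"",
        execute=lambda a: print("executing", a.kind, a.xy),
        ask_user=lambda a: False,  # the user declines, so the agent halts here
    )
    print(len(done), "actions executed before the critical point")
```

In a real setup, `predict` would wrap a call to the model and `execute` would drive a browser; the scripted stand-ins above exist only to show where the consent gate sits relative to execution.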





