MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Update: 2025-01-04

Description

🤗 Upvotes: 16 | cs.CL

Authors:

Mahir Labib Dihan, Md Tanvir Hassan, Md Tanvir Parvez, Md Hasebul Hasan, Md Almash Alam, Muhammad Aamir Cheema, Mohammed Eunus Ali, Md Rizwan Parvez

Title:

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Arxiv:

http://arxiv.org/abs/2501.00316v1

Abstract:

Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location or map-based reasoning - which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics - has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, which all state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly on MapEval's API-based tasks, where agents with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps became even more amplified when compared to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.




Jingwen Liang, Gengyu Wang
