Training Software Engineering Agents and Verifiers with SWE-Gym

Update: 2025-01-01

Description

🤗 Upvotes: 6 | cs.SE, cs.CL

Authors:

Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang

Title:

Training Software Engineering Agents and Verifiers with SWE-Gym

Arxiv:

http://arxiv.org/abs/2412.21139v1

Abstract:

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language model based SWE agents , achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, models, and agent trajectories.

Comments

In Channel

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

2025-12-2324:01

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

2025-12-2325:33

When Reasoning Meets Its Laws

2025-12-2321:45

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

2025-12-2325:34

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

2025-12-2326:30

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

2025-12-2323:40

Are We on the Right Way to Assessing LLM-as-a-Judge?

2025-12-2323:16

Kling-Omni Technical Report

2025-12-2024:17

Adaptation of Agentic AI

2025-12-2026:20

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

2025-12-2026:32

Next-Embedding Prediction Makes Strong Vision Learners

2025-12-2022:00

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

2025-12-2024:04

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

2025-12-2022:14

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

2025-12-2021:29

Generative Refocusing: Flexible Defocus Control from a Single Image

2025-12-2025:27

DeContext as Defense: Safe Image Editing in Diffusion Transformers

2025-12-2023:34

Step-GUI Technical Report

2025-12-1926:21

DEER: Draft with Diffusion, Verify with Autoregressive Models

2025-12-1925:44

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

2025-12-1921:47

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

2025-12-1922:07

00:00

Training Software Engineering Agents and Verifiers with SWE-Gym

Jingwen Liang, Gengyu Wang

#box-pro-ellipsis-176653610897425{-webkit-line-clamp:2;}Training Software Engineering Agents and Verifiers with SWE-Gym

Training Software Engineering Agents and Verifiers with SWE-Gym

Jingwen Liang, Gengyu Wang

Training Software Engineering Agents and Verifiers with SWE-Gym