PVUW: Pixel-level Video Understanding in the Wild Challenge

Introduction

The 4th PVUW challenge will be held in conjunction with CVPR 2025 at the Music City Center, Nashville TN. Pixel-level scene understanding is one of the fundamental problems in computer vision; it aims to recognize the object class, mask, and semantics of each pixel in a given image. Because the real world is dynamic rather than static, learning to perform video segmentation is more reasonable and practical for realistic applications. To advance the segmentation task from images to videos, we will present new datasets and competitions in this workshop, targeting the challenging yet practical task of Pixel-level Video Understanding in the Wild (PVUW). This workshop will cover, but is not limited to, the following topics:

Semantic/panoptic segmentation for images/videos
Referring image/video comprehension/segmentation
Video object/instance segmentation
Video understanding in complex environments
Language-guided video understanding
Audio-guided video segmentation
Efficient computation for video scene parsing
Semi-supervised recognition in videos
New metrics to evaluate the quality of video scene parsing results
Real-world video applications, e.g., autonomous driving, indoor robotics, visual navigation, etc.

Workshop Schedule

The workshop will be held on June 11, 2025, at conference room 105A of the Music City Center.

Event	Time
Chairs’ opening remarks	13:30
Invited talk 1, Dr. Daniel Bolya, Meta, FAIR	13:45
Invited talk 2, Prof. Hengshuang Zhao, The University of Hong Kong	14:30
Invited talk 3, Dr. Xiangtai Li, ByteDance Inc.	15:15
Break	16:00
Challenge Track 1 1st-Place Winners’ Oral Presentation	16:15
Challenge Track 2 1st-Place Winners’ Oral Presentation	16:25
Award ceremony and concluding remarks	16:35

*All times are in local conference time.

Challenge Tracks & Submission

Track 1: Complex Video Object Segmentation (MOSE) Track

MOSE aims to track and segment objects in videos of complex environments. MOSE submission server [click here].

Track 2: Motion Expression guided Video Segmentation (MeViS) Track

MeViS focuses on segmenting objects in video based on a sentence describing the motion of the objects. MeViS submission server [click here].

Challenge Timeline

Event	Date
Challenge release	Mar 01, 2025
Validation server online [click here]	Mar 01, 2025
Test server online	Mar 15, 2025
Submission deadline	Mar 25, 2025
Notification	Mar 27, 2025

*All deadlines are at 23:59 UTC on the specified day.

Call for Papers

[Update] PVUW 2025 will be using [OpenReview] to manage submissions. We are looking forward to your work and engaging discussions at the workshop!

We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) through the paper submission portal.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Paper Submission Timeline

[Update] We just received an update from IEEE that the metadata submission deadline has been extended, so the original submission timeline remains unchanged. Please follow the timeline in the table and submit your paper by Mar 25. All accepted papers will be published in the official CVPR Workshops Proceedings. Sorry for the confusion!

Event	Date
Regular paper submission deadline	Mar 25, 2025
Supplemental material deadline	Mar 25, 2025
Notification of paper acceptance	Mar 31, 2025
Challenge paper submission deadline	Apr 01, 2025
Camera ready deadline	Apr 07, 2025