Pixel-level Video Understanding in the Wild

Workshop in conjunction with CVPR 2025

June 2025

Music City Center, Nashville TN

Introduction

The 4th PVUW challenge will be held in conjunction with CVPR 2025 at the Music City Center, Nashville TN. Pixel-level scene understanding is one of the fundamental problems in computer vision; it aims to recognize the object class, mask, and semantics of each pixel in a given image. Because the real world is dynamic and video-based rather than static, learning to perform video segmentation is more reasonable and practical for realistic applications. To advance segmentation from images to videos, we will present new datasets and competitions in this workshop, targeting the challenging yet practical task of Pixel-level Video Understanding in the Wild (PVUW). The workshop will cover, but is not limited to, the following topics:

  • Semantic/panoptic segmentation for images/videos
  • Referring image/video comprehension/segmentation
  • Video object/instance segmentation
  • Video understanding in complex environments
  • Language-guided video understanding
  • Audio-guided video segmentation
  • Efficient computation for video scene parsing
  • Semi-supervised recognition in videos
  • New metrics to evaluate the quality of video scene parsing results
  • Real-world video applications, e.g., autonomous driving, indoor robotics, visual navigation, etc.

Challenge Tracks & Submission



Track 1: Complex Video Object Segmentation (MOSE) Track

MOSE aims to track and segment objects in videos of complex environments. MOSE submission server [click here].

Track 2: Motion Expression guided Video Segmentation (MeViS) Track

MeViS focuses on segmenting objects in video based on a sentence describing the motion of the objects. MeViS submission server [click here].

Challenge Timeline

Event                                    Date
Challenge release                        Mar 01, 2025
Validation server online [click here]    Mar 01, 2025
Test server online                       Mar 15, 2025
Submission deadline                      Mar 25, 2025
Notification                             Mar 27, 2025
*All dates are in UTC, 23:59 of the specified day.
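
For participants outside UTC, the note above can be turned into a local time with a few lines of Python. This is only a minimal sketch: the helper name `deadline_utc` is hypothetical, and the fixed UTC-5 offset is an assumption standing in for US Central Daylight Time (the Nashville time zone in late March 2025); substitute your own offset.

```python
from datetime import datetime, timedelta, timezone


def deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return 23:59 UTC on the given day as a timezone-aware datetime."""
    return datetime(year, month, day, 23, 59, tzinfo=timezone.utc)


# Challenge submission deadline from the timeline: Mar 25, 2025.
submission_deadline = deadline_utc(2025, 3, 25)

# Assumption: a fixed UTC-5 offset (Central Daylight Time). Replace with
# the offset of your own location.
local_tz = timezone(timedelta(hours=-5), "CDT")
local_deadline = submission_deadline.astimezone(local_tz)

print(local_deadline.isoformat())  # 2025-03-25T18:59:00-05:00
```

The conversion only changes how the instant is displayed; the deadline itself remains 23:59 UTC on the listed day.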

Call for Papers

[Update] PVUW 2025 will be using [OpenReview] to manage submissions. We are looking forward to your work and engaging discussions at the workshop!


We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) through the paper submission portal.


Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Paper Submission Timeline

[Update] We just received an update from IEEE that the metadata submission deadline has been extended, so we can keep the original submission timeline unchanged. Please follow the timeline in the table and submit your paper before Mar 25. All accepted papers will be published in the official CVPR Workshops Proceedings. Sorry for the confusion!



Event                                    Date
Regular paper submission deadline        Mar 25, 2025
Supplemental material deadline           Mar 25, 2025
Notification of paper acceptance         Mar 31, 2025
Challenge paper submission deadline      Apr 1, 2025
Camera ready deadline                    Apr 7, 2025
*All dates are in UTC, 23:59 of the specified day.

Speakers

Xiangtai Li

ByteDance

Hengshuang Zhao

The University of Hong Kong

Organizers

Henghui Ding

Fudan University

Nikhila Ravi

Meta AI

Chang Liu

Nanyang Technological University

Yunchao Wei

Beijing Jiaotong University

Jiaxu Miao

Sun Yat-sen University

Shuting He

Shanghai University of Finance and Economics

Zuxuan Wu

Fudan University

Zongxin Yang

Harvard University

Yi Yang

Zhejiang University

Si Liu

Beihang University

Yi Zhu

Amazon

Elisa Ricci

University of Trento

Cees Snoek

University of Amsterdam

Song Bai

ByteDance

Philip Torr

University of Oxford

Contact