Pixel-level Video Understanding in the Wild

Workshop in conjunction with CVPR 2025

June 2025

Music City Center, Nashville, TN

Introduction

The 4th PVUW challenge will be held in conjunction with CVPR 2025 at the Music City Center, Nashville, TN. Pixel-level scene understanding is one of the fundamental problems in computer vision; it aims to recognize the object class, mask, and semantics of each pixel in a given image. Because the real world is dynamic and video-based rather than static, learning to perform video segmentation is more reasonable and practical for realistic applications. To advance the segmentation task from images to videos, we will present new datasets and competitions in this workshop, targeting the challenging yet practical task of Pixel-level Video Understanding in the Wild (PVUW). The workshop also includes papers on the following topics:

  • Semantic/panoptic segmentation for images/videos
  • Referring image/video comprehension/segmentation
  • Video object/instance segmentation
  • Video understanding in complex environments
  • Language-guided video understanding
  • Audio-guided video segmentation
  • Efficient computation for video scene parsing
  • Semi-supervised recognition in videos
  • New metrics to evaluate the quality of video scene parsing results
  • Real-world video applications, e.g., autonomous driving, indoor robotics, visual navigation, etc.

Dates

    Event                       Date
    Challenge release           TBD
    Validation server online    TBD
    Test server online          TBD
    Submission deadline         TBD
    Notification                TBD

    Speakers

    Xiangtai Li

    ByteDance

    Hengshuang Zhao

    The University of Hong Kong

    Tracks & Submission

    Track 1: Complex Video Object Segmentation (MOSE) Track

    The complex video object segmentation task aims to track and segment objects in complex environments.

    Track 2: Motion Expression guided Video Segmentation (MeViS) Track

    The motion expression guided video segmentation track focuses on segmenting objects in video content based on a sentence describing the motion of the objects.

    Organizers

    Henghui Ding

    Fudan University

    Nikhila Ravi

    Meta AI

    Chang Liu

    Nanyang Technological University

    Yunchao Wei

    Beijing Jiaotong University

    Jiaxu Miao

    Sun Yat-Sen University

    Shuting He

    Shanghai University of Finance and Economics

    Zuxuan Wu

    Fudan University

    Zongxin Yang

    Harvard University

    Yi Yang

    Zhejiang University

    Si Liu

    Beihang University

    Yi Zhu

    Amazon

    Elisa Ricci

    University of Trento

    Cees Snoek

    University of Amsterdam

    Song Bai

    ByteDance

    Philip Torr

    University of Oxford

    Contact