Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

ECCV 2024 Workshop @ Milano, Italy, Monday, Sep 30th


Visual perception and comprehension data samples from CODA-LM.


Track 1: Corner Case Scene Understanding

This competition is dedicated to enhancing the multimodal perception and comprehension capabilities of MLLMs for autonomous driving, focusing on global scene understanding, local region reasoning, and actionable navigation. With our CODA-LM dataset, constructed from the CODA dataset and comprising ~10K images with corresponding textual descriptions that cover global driving scenarios, detailed analyses of corner cases, and future driving recommendations, this competition seeks to promote the development of more reliable and interpretable autonomous driving agents.


Challenge Progress


Leaderboard

Last update: June 30, 2024


Task Description

General Perception.  The general perception task rests on a comprehensive understanding of the key entities in a driving scene, including their appearance, location, and the reasons they influence driving. Given a first-person-view road driving scene, MLLMs are required not only to describe all potential road obstacles in the traffic scene but also to explain why they would affect driving decisions. We primarily focus on seven categories of obstacles: vehicles, vulnerable road users (VRUs), traffic cones, traffic lights, traffic signs, barriers, and miscellaneous.
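If it helps to structure model outputs, the seven categories can be kept as a simple constant; this merely mirrors the list above and is not an official schema:

    # The seven obstacle categories of general perception (mirrors the task
    # description above; not an official schema).
    OBSTACLE_CATEGORIES = (
        "vehicles",
        "vulnerable road users (VRUs)",
        "traffic cones",
        "traffic lights",
        "traffic signs",
        "barriers",
        "miscellaneous",
    )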

Region Perception.  This task measures the MLLMs' capability to understand corner-case objects specified by bounding boxes, describing these objects and explaining why they might affect self-driving behavior. Note that for region perception, there are no constraints on how bounding boxes are fed to MLLMs; any encoding that leads to better performance is acceptable. As a reference, we provide an example that visualizes the bounding boxes on images as red rectangles here.
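As a minimal illustration of the red-rectangle visualization mentioned above, the following Python sketch overlays one bounding box with Pillow; the file paths and box coordinates are hypothetical placeholders, and the official reference visualization may differ in details such as line width.

    from PIL import Image, ImageDraw

    def draw_bbox(image_path, bbox, out_path):
        """Draw one (x1, y1, x2, y2) bounding box as a red rectangle."""
        image = Image.open(image_path).convert("RGB")
        ImageDraw.Draw(image).rectangle(bbox, outline="red", width=4)
        image.save(out_path)

    # Hypothetical example: highlight a corner-case object before querying the MLLM.
    draw_bbox("test/images/0001.jpg", (320, 410, 380, 495), "vis/0001_bbox.jpg")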

Driving Suggestions.  This task evaluates the MLLMs' capability to formulate driving advice in the autonomous driving domain. It is closely related to the planning stage of autonomous driving, requiring MLLMs to provide optimal driving suggestions for the ego car after correctly perceiving both the general and regional aspects of the current driving environment.


Data Description

Training.  CODA-LM is a real-world multimodal autonomous driving dataset. Constructed on top of the CODA dataset, CODA-LM contains high-quality annotations for general perception, region perception, and driving suggestions. The training set contains 4,884 samples, and a validation set of 4,384 samples may optionally be used for training as well. For more details, please refer to the arXiv report and the dataset website.
Besides CODA-LM, any external training resources are allowed, but their use must be discussed in the technical report.

Evaluation.  The CODA-LM test set, containing 500 samples, is used for evaluation. For each of the three tasks, we follow the official evaluation protocol of CODA-LM and compute a GPT-Score separately using a GPT-4 judge, with the Final-Score defined as their average. $$\text{Final-Score} = \frac{1}{3}\left(\text{GPT-Score}_{\text{General Perception}} + \text{GPT-Score}_{\text{Region Perception}} + \text{GPT-Score}_{\text{Driving Suggestions}}\right)$$
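As a minimal sketch of the aggregation above, assuming each per-task GPT-Score is itself an average of per-sample judge scores (an assumption; the judge prompts and scoring scale are defined by the official CODA-LM protocol):

    from statistics import mean

    def final_score(scores_by_task):
        """Average the three per-task GPT-Scores into the Final-Score."""
        return mean(mean(scores) for scores in scores_by_task.values())

    # Hypothetical per-sample judge scores for the three tasks.
    print(final_score({
        "general_perception": [7.0, 8.0],
        "region_perception": [6.5, 7.5],
        "driving_suggestion": [7.0, 7.0],
    }))  # -> 7.1666...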

Download.  Check the official GitHub repo of CODA-LM for more details.

Terms of Use.  The CODA-LM dataset is released for academic research only and is free to researchers from educational or research institutions for non-commercial purposes. By downloading the dataset, you agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the images or any derived data. For the full terms of use, please refer to the CODA terms-of-use page.


Submission

To submit your results, you must strictly complete the following steps; otherwise, your submissions will not be considered valid.

  • Before your first submission, register your team information here. Each team may register only once, and each registration is valid for a single track. Note that your Team Name will be used as the reference for evaluation later.
  • Download CODA-LM and convert the annotations into the basic VQA format via this script. After conversion, you will have three jsonl files (general_perception.jsonl, region_perception.jsonl, driving_suggestion.jsonl) under $CODA_ROOT/Test/vqa_anno, all in the following format:
  • {"question_id": 0, "image": test/images/0001.jpg, "question": <str>}
    {"question_id": 1, "image": test/images/0012.jpg, "question": <str>}
    {"question_id": 2, "image": test/images/0013.jpg, "question": <str>}
    ...
    
  • Run your inference on the above annotation files and save your responses with an extra answer key (a hedged end-to-end sketch is provided after this list). You must not modify the contents of question_id, image, and question, which are essential for evaluation. Note that for region perception, even though you are encouraged to explore different bbox encodings, you must still preserve the contents of these three keys. Your results should look like:
  • {"question_id": 0, "image": test/images/0001.jpg, "question": <str>, "answer": <str>}
    {"question_id": 1, "image": test/images/0012.jpg, "question": <str>, "answer": <str>}
    {"question_id": 2, "image": test/images/0013.jpg, "question": <str>, "answer": <str>}
    ...
    
  • Name your results general_perception_answer.jsonl, region_perception_answer.jsonl, and driving_suggestion_answer.jsonl, zip them into results.zip, and submit via OneDrive. To better understand the submission format, we also provide an example submission here. Enter your Track and Team Name and click Upload. (Our server gathers results every 5 minutes; please refrain from uploading too frequently.)
  • Please strictly follow the naming format below when submitting, otherwise your submission will be considered invalid!
    • File name: must be results.zip, instead of answer.zip, sample_answer.zip, etc.
    • First name: must be Track1, instead of Track 1, Track_1, etc.
    • Last name: must be your full Team Name as given in the registration form, which is used as the key to match your registration information. Do not split your Team Name into separate parts!
    • File structure: your submission should contain only the three jsonl files. Do not create any new folders; Mac users should take particular care to exclude the __MACOSX directory.
    • Any violation will cause an error when processing the submission file, and the submission will thus be considered invalid.
  • Once a week, we will download all the latest submissions, run evaluation, and update Challenge Progress and Leaderboard.
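To make the expected I/O concrete, below is a hedged end-to-end sketch of the inference and packaging steps: it reads each converted question file, attaches an answer from a placeholder model call, writes the corresponding *_answer.jsonl, and zips the three files flat into results.zip. Here run_model is a hypothetical stand-in for your own MLLM inference and not part of any official tooling; only the jsonl key handling and the flat zip layout follow the rules above.

    import json
    import zipfile
    from pathlib import Path

    ANNO_DIR = Path("CODA-LM") / "Test" / "vqa_anno"  # i.e., $CODA_ROOT/Test/vqa_anno
    TASKS = ["general_perception", "region_perception", "driving_suggestion"]

    def run_model(image, question):
        """Hypothetical placeholder for your own MLLM inference."""
        raise NotImplementedError

    def answer_task(task):
        out_path = Path(f"{task}_answer.jsonl")
        with open(ANNO_DIR / f"{task}.jsonl") as fin, open(out_path, "w") as fout:
            for line in fin:
                record = json.loads(line)
                # Keep question_id, image, and question untouched; only add "answer".
                record["answer"] = run_model(record["image"], record["question"])
                fout.write(json.dumps(record) + "\n")
        return out_path

    # Bundle the three answer files flat (no folders, no __MACOSX) in results.zip.
    with zipfile.ZipFile("results.zip", "w") as zf:
        for task in TASKS:
            path = answer_task(task)
            zf.write(path, arcname=path.name)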


General Rules

  • To ensure fairness, the Top-3 winners are required to submit a technical report for reproducibility verification.
  • Each entry must be associated with a team and its affiliation (all members of a team must register together as one).
  • Using multiple accounts to increase the number of submissions is strictly prohibited.
  • Results must follow the correct format and be submitted according to the instructions; otherwise, they will not be considered valid submissions. Detailed information is provided in the Submission section.
  • The best entry of each team is public on the leaderboard at all times.
  • The organizer reserves the absolute right to disqualify entries that are incomplete, illegible, late, or that violate the rules.


Awards

Participants with the most successful and innovative entries will be invited to present at the workshop and receive awards. A 1,000 USD cash prize will be awarded to the top team, while the 2nd and 3rd place teams will receive 800 USD and 600 USD, respectively.


Contact

To contact the organizers, please use w-coda2024@googlegroups.com.