CoPerception-UAV

A Virtual Collaborative Perception Dataset for UAVs.

About CoPerception-UAV

CoPerception-UAV is the first comprehensive dataset for UAV-based collaborative perception.
A UAV swarm can distribute tasks and achieve better, faster, and more robust performance than a single UAV. Planning and control of UAV swarms have recently been studied intensively; however, collaborative perception remains under-explored due to the lack of a comprehensive dataset. This work aims to fill that gap with a collaborative perception dataset for UAV swarms.
Built on the co-simulation platform of AirSim and CARLA, the dataset consists of 131.9K synchronous images collected from 5 coordinated UAVs flying at 3 altitudes over 3 simulated towns with 2 swarm formations. Each image is fully annotated with pixel-wise semantic segmentation labels and 2D/3D bounding boxes of vehicles. We further build a benchmark on the dataset by evaluating a variety of multi-agent collaborative methods on multiple perception tasks: object detection, semantic segmentation, and bird's-eye-view (BEV) semantic segmentation.




Swarm Arrangement

We arrange two formation modes for the UAV swarm: a discipline mode, in which the swarm keeps a consistent, relatively static array, and a dynamic mode, in which each UAV navigates the scene independently.

[Figure: the two swarm formation modes]




Sensor Setup

Each UAV in the swarm is equipped with 5 RGB cameras facing 5 directions, plus 5 semantic cameras that collect semantic ground truth for the RGB cameras:
- 90° horizontal FoV
- 1 bird's-eye-view camera and 4 cameras facing forward, backward, left, and right at a pitch of −45°
- image size: 800×450 pixels

[Figure: per-UAV sensor setup]
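
For reference, the pinhole intrinsics implied by this setup can be derived from the horizontal FoV and image size. The sketch below is an illustration only, assuming a standard pinhole model with square pixels and a centered principal point; it is not an official calibration file:

```python
import numpy as np

def intrinsics_from_hfov(width: int, height: int, hfov_deg: float) -> np.ndarray:
    """Pinhole intrinsic matrix implied by a horizontal FoV, assuming
    square pixels and the principal point at the image center."""
    fx = width / (2.0 * np.tan(np.radians(hfov_deg) / 2.0))
    fy = fx  # square pixels
    cx, cy = width / 2.0, height / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# For the 800x450, 90-degree-FoV cameras described above:
K = intrinsics_from_hfov(800, 450, 90.0)
print(K)  # fx = fy = 400, cx = 400, cy = 225
```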

Data Annotations


The dataset provides fully annotated data, including synchronous images with pixel-wise semantic labels, 2D and 3D bounding boxes of vehicles, and BEV semantic maps.


Camera data

We collect synchronous images from all cameras on the 5 UAVs, i.e., 25 images per sample. In total, 123.8K images are collected in the discipline swarm mode and 8.1K in the dynamic swarm mode.
[Example synchronized front-camera images from agent_0, agent_1, and agent_2]
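
A minimal sketch of iterating over one synchronized sample is shown below; the directory layout and file naming here are hypothetical placeholders and should be adapted to the actual structure of your download:

```python
from pathlib import Path

# Hypothetical layout -- the real folder structure may differ; adjust the
# paths to match your copy of CoPerception-UAV.
DATASET_ROOT = Path("coperception_uav/discipline")
AGENTS = [f"agent_{i}" for i in range(5)]            # 5 coordinated UAVs
CAMERAS = ["bev", "front", "back", "left", "right"]  # 5 cameras per UAV

def sample_paths(sample_id: int):
    """Yield the 25 image paths (5 agents x 5 cameras) of one sample."""
    for agent in AGENTS:
        for cam in CAMERAS:
            yield DATASET_ROOT / agent / cam / f"{sample_id:06d}.png"

paths = list(sample_paths(0))
assert len(paths) == 25  # one synchronized sample = 25 images
```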



Semantic label

We provide a pixel-wise semantic label for each image.
[Example front-camera semantic labels from agent_0, agent_1, and agent_2]



Bounding boxes

3D bounding boxes of vehicles are recorded at the same moments as the images. Each box includes a location (x, y, z) and a rotation (quaternion w, x, y, z) in the global coordinate frame, along with its length, width, and height.
To specifically address the occlusion issue, we also provide a binary occlusion-status label for each bounding box.
[Example front-camera bounding-box visualizations from agent_0, agent_1, and agent_2]
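
From these fields, the eight corners of each box can be recovered. The sketch below is a generic reconstruction, assuming a (w, x, y, z) quaternion convention with length along the box x-axis and width along its y-axis:

```python
import numpy as np

def quat_to_rot(w, x, y, z):
    """Rotation matrix from a unit (w, x, y, z) quaternion."""
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def box_corners(center, quat_wxyz, lwh):
    """8 corners of a 3D box in the global frame, given the recorded
    center, quaternion, and (length, width, height)."""
    l, w, h = lwh
    # Corners in the box frame, centered at the origin.
    xs = np.array([1,  1,  1,  1, -1, -1, -1, -1]) * l / 2
    ys = np.array([1,  1, -1, -1,  1,  1, -1, -1]) * w / 2
    zs = np.array([1, -1,  1, -1,  1, -1,  1, -1]) * h / 2
    corners = np.stack([xs, ys, zs])               # shape (3, 8)
    R = quat_to_rot(*quat_wxyz)
    return (R @ corners).T + np.asarray(center)    # shape (8, 3)
```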



BEV semantic label

We provide BEV segmentation labels for four categories (roadway, building, vehicle, and others), which are the key elements for constructing the layout of a city and its foreground objects. The resolution of the BEV map is 0.25 m × 0.25 m per cell.
[Example BEV semantic labels from agent_0, agent_1, and agent_2]
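
A minimal sketch of mapping a global (x, y) position into BEV grid indices at this resolution is shown below; the class ids and map bounds are hypothetical placeholders, not the dataset's official encoding:

```python
RESOLUTION = 0.25  # meters per BEV cell, as stated above

# Hypothetical class ids -- the actual label encoding may differ.
CLASSES = {"others": 0, "roadway": 1, "building": 2, "vehicle": 3}

def world_to_bev(x: float, y: float, x_min: float, y_min: float):
    """Map a global (x, y) position to (row, col) in the BEV map,
    given the map's lower bounds in world coordinates."""
    col = int((x - x_min) / RESOLUTION)
    row = int((y - y_min) / RESOLUTION)
    return row, col
```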

Citation

If you use the CoPerception-UAV dataset, please cite:
   @inproceedings{Where2comm:22,
      author    = {Yue Hu and Shaoheng Fang and Zixing Lei and Yiqi Zhong and Siheng Chen},
      title     = {Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps},
      booktitle = {Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS)},
      month     = {November},
      year      = {2022}
   }