SurfaceAug: Closing the Gap in Multimodal Ground Truth Sampling

by Ryan Rubel, Nathan Clark, Andrew Dudash

Despite recent advances in both model architectures and data augmentation, multimodal object detectors still barely outperform their LiDAR-only counterparts. This shortcoming has been attributed to a lack of sufficiently powerful multimodal data augmentation. To address this, we present SurfaceAug, a novel ground truth sampling algorithm. SurfaceAug pastes objects by resampling both images and point clouds, enabling object-level transformations in both modalities. We evaluate our algorithm by training a multimodal detector on KITTI and comparing its performance with that of previous works. We show experimentally that SurfaceAug outperforms existing methods on car detection tasks and establishes a new state of the art for multimodal ground truth sampling.

Submitted to the 2024 Conference on Computer Vision and Pattern Recognition (CVPR); accepted at the 2025 IEEE International Conference on Robotics and Automation (ICRA)

Read the full paper (pre-print)