Abstract

LiDAR registration is a fundamental task in robotic mapping and localization. A critical component of aligning two point clouds is identifying robust point correspondences using point descriptors. This step becomes particularly challenging in scenarios involving domain shifts, seasonal changes, and variations in point cloud structure. These factors substantially impact both handcrafted and learning-based approaches. In this paper, we address these problems by proposing to use DINOv2 features, obtained from surround-view images, as point descriptors. We demonstrate that coupling these descriptors with traditional registration algorithms, such as RANSAC or ICP, facilitates robust 6DoF alignment of LiDAR scans with 3D maps, even when the map was recorded more than a year earlier. Although conceptually straightforward, our method substantially outperforms more complex baseline techniques. In contrast to previous learning-based point descriptors, our method does not require domain-specific retraining and is agnostic to the point cloud structure, effectively handling both sparse LiDAR scans and dense 3D maps. By leveraging the additional camera data, our method outperforms the best baseline by +24.8 and +17.3 points in registration recall on the NCLT and Oxford RobotCar datasets, respectively.

Technical Approach

Overview of our approach

In this work, we address the task of long-term scan-to-map registration by leveraging recent advances in visual foundation models. Our main contribution is to demonstrate that using DINOv2 features, obtained from surround-view images, as point descriptors allows for finding highly robust point correspondences. Our approach consists of three steps, as sketched in the code below: 1) We extract DINOv2 features from the surround-view images and attach them to the point cloud as point descriptors via point-to-pixel projection. 2) We perform a point-wise similarity search using cosine similarity between the descriptors of the LiDAR scan and those of the voxelized 3D map. 3) We apply a traditional coarse-to-fine registration scheme, RANSAC followed by point-to-point ICP, to obtain a highly accurate pose estimate in the provided map frame.
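
A minimal sketch of these three steps is shown below, assuming a single calibrated camera with intrinsics K and LiDAR-to-camera extrinsics T_cam_lidar, and using Open3D's registration pipeline for RANSAC and ICP. The helper names, thresholds, and camera model here are illustrative assumptions rather than values from our released implementation (see the Code section below for that).

```python
# Minimal sketch of the three-step pipeline described above. The helper names
# (attach_dino_descriptors, register_scan_to_map), the camera model (intrinsics
# K, LiDAR-to-camera extrinsics T_cam_lidar), and the distance thresholds are
# illustrative assumptions, not values from the released code.
import numpy as np
import torch
import open3d as o3d


def attach_dino_descriptors(points, feat_map, K, T_cam_lidar):
    """Step 1: attach DINOv2 features to points via point-to-pixel projection.

    points: (N, 3) LiDAR points; feat_map: (C, H, W) DINOv2 feature map;
    K: (3, 3) camera intrinsics; T_cam_lidar: (4, 4) extrinsics.
    Returns the points visible in the image and their (M, C) descriptors.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]           # points in camera frame
    in_front = pts_cam[:, 2] > 0.1                       # keep points ahead of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = np.rint(uvw[:, :2] / uvw[:, 2:3]).astype(int)   # pixel coordinates
    C, H, W = feat_map.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    desc = feat_map[:, uv[valid, 1], uv[valid, 0]].T     # (M, C) descriptors
    return points[in_front][valid], desc


def register_scan_to_map(scan_pts, scan_desc, map_pts, map_desc):
    """Steps 2 and 3: cosine-similarity matching, then RANSAC + point-to-point ICP."""
    # Step 2: for each scan point, find the most similar map descriptor.
    a = torch.nn.functional.normalize(torch.as_tensor(scan_desc).float(), dim=1)
    b = torch.nn.functional.normalize(torch.as_tensor(map_desc).float(), dim=1)
    nn_idx = (a @ b.T).argmax(dim=1).numpy()             # cosine similarity = dot product of unit vectors
    corres = o3d.utility.Vector2iVector(
        np.stack([np.arange(len(scan_pts)), nn_idx], axis=1).astype(np.int32))

    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scan_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(map_pts))

    # Step 3a: coarse alignment with RANSAC on the putative correspondences.
    coarse = o3d.pipelines.registration.registration_ransac_based_on_correspondence(
        src, tgt, corres, max_correspondence_distance=1.0,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        ransac_n=3)
    # Step 3b: fine refinement with point-to-point ICP.
    fine = o3d.pipelines.registration.registration_icp(
        src, tgt, 0.5, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return fine.transformation                           # 4x4 scan-to-map pose
```

Because the descriptors come from images rather than from local point geometry, the same matching and registration routine applies unchanged to sparse LiDAR scans and dense voxelized maps; only the point-to-pixel projection depends on the sensor setup.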

Code

A PyTorch-based implementation of this project is available in our GitHub repository for academic use and is released under the GPLv3 license. For commercial purposes, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Niclas Vödisch, Giovanni Cioffi, Marco Cannici, Wolfram Burgard, and Davide Scaramuzza
LiDAR Registration with Visual Foundation Models
arXiv preprint arXiv:2502.19374, 2025.

Authors

Niclas Vödisch
University of Freiburg

Giovanni Cioffi
University of Zurich

Marco Cannici
University of Zurich

Wolfram Burgard
University of Technology Nuremberg

Davide Scaramuzza
University of Zurich

Acknowledgment

This work was partially supported by a fellowship of the German Academic Exchange Service (DAAD). Niclas Vödisch acknowledges travel support from the European Union’s Horizon 2020 research and innovation program under ELISE grant agreement No. 951847 and from the ELSA Mobility Program within the project European Lighthouse On Safe And Secure AI (ELSA) under the grant agreement No. 101070617.