ScaRF-SLAM: Scale-Consistent Reconstruction with Feed-Forward Models and Classical Visual SLAM
Yuhao Zhang, Yifu Tao, Frank Dellaert, Maurice Fallon
ScaRF-SLAM is a dense visual mapping framework that combines the robustness of classical visual SLAM with the reconstruction capability of modern geometric foundation models.
Description
ScaRF-SLAM decouples localization and dense mapping: a classical SLAM frontend provides reliable, low-latency camera poses, while geometric foundation models are used only for feed-forward depth prediction and reconstruction. By anchoring dense mapping to robust SLAM poses and enforcing lightweight scale-consistency optimization across frames and submaps, the system produces globally consistent, high-quality 3D reconstructions. It remains robust to limited batch sizes and stays consistent when loop closures update the trajectory. The framework is compatible with monocular, stereo, mono-inertial, multi-camera, and fisheye-camera SLAM systems, making it practical for real-world robotics and large-scale mapping.
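Feed-forward depth predictions are often defined only up to an unknown scale, while the SLAM frontend supplies metric sparse landmarks. As a rough illustration of what a lightweight per-frame scale alignment could look like, the sketch below fits a single scale factor between predicted depths and SLAM-derived depths at matched pixels. It assumes such correspondences are already available and is not the optimization used in ScaRF-SLAM; `ls_scale`, `median_ratio_scale` and `rescale` are hypothetical helper names.

```python
from statistics import median


def ls_scale(pred, ref):
    """Closed-form least-squares scale s minimizing sum_i (s * pred_i - ref_i)^2.

    pred: depths predicted by the feed-forward model (arbitrary scale).
    ref:  metric depths of the same pixels, e.g. from SLAM landmarks (assumed input).
    """
    num = sum(p * r for p, r in zip(pred, ref))
    den = sum(p * p for p in pred)
    if den == 0:
        raise ValueError("predicted depths must not all be zero")
    return num / den


def median_ratio_scale(pred, ref):
    """Robust alternative: median of the per-point ratios ref_i / pred_i."""
    ratios = [r / p for p, r in zip(pred, ref) if p > 0]
    if not ratios:
        raise ValueError("need at least one positive predicted depth")
    return median(ratios)


def rescale(depths, s):
    """Apply an estimated scale to a (flattened) predicted depth map."""
    return [s * d for d in depths]


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]
    ref = [2.0, 4.0, 6.0, 100.0]  # the last correspondence is an outlier
    print(ls_scale(pred, ref))            # pulled upward by the outlier
    print(median_ratio_scale(pred, ref))  # 2.0
```

The least-squares fit is optimal under Gaussian noise but is dragged toward the outlier here (428/30, about 14.27), which is why a robust estimator such as the median ratio is a common fallback. Pooling correspondences from several frames, or from keyframes shared by overlapping submaps, before fitting is one simple way to extend the same idea beyond a single frame.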
Real-World Dataset
We evaluate ScaRF-SLAM on a real-world dataset collected at the Oxford Robotics Institute, which provides accurate ground-truth camera trajectories and LiDAR point clouds for quantitative assessment.
The dataset uses the front fisheye camera and IMU of an Insta360 ONE RS 1-Inch rigidly mounted to a LiDAR-inertial mapping system. Ground-truth poses are obtained by registering undistorted LiDAR scans to a high-precision terrestrial laser scanner map. The public release includes five sequences together with trajectories and point clouds for reconstruction evaluation.
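The release provides reference point clouds for reconstruction evaluation, but the exact metrics are not specified here. A common choice in dense-mapping work is accuracy (mean distance from each reconstructed point to its nearest reference point), completeness (the same distance in the reverse direction), and an F-score at a distance threshold. The brute-force sketch below assumes both clouds are already expressed in the same metric frame; the function names and the threshold `tau` are illustrative, and clouds of realistic size would call for a KD-tree instead of the quadratic search.

```python
import math


def nn_distances(src, dst):
    """For every point in src, the Euclidean distance to its nearest point in dst.

    Brute force O(len(src) * len(dst)); suitable for toy examples only.
    """
    return [min(math.dist(p, q) for q in dst) for p in src]


def accuracy_completeness(recon, reference):
    """Accuracy: mean recon->reference distance. Completeness: mean reference->recon distance."""
    acc = sum(nn_distances(recon, reference)) / len(recon)
    comp = sum(nn_distances(reference, recon)) / len(reference)
    return acc, comp


def f_score(recon, reference, tau):
    """Harmonic mean of precision and recall at distance threshold tau (metres assumed)."""
    precision = sum(d < tau for d in nn_distances(recon, reference)) / len(recon)
    recall = sum(d < tau for d in nn_distances(reference, recon)) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(accuracy_completeness(recon, reference))
    print(f_score(recon, reference, tau=0.05))
```

In this toy case every reconstructed point lies on the reference, so accuracy is 0, while one reference point is missed, giving completeness 1/3 and an F-score of 0.8 (up to floating-point rounding). Reporting both directions matters because a sparse but correct reconstruction scores well on accuracy alone.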
Key Idea
Unlike methods that depend on learned geometry for both tracking and mapping, ScaRF-SLAM wraps around an existing classical visual SLAM system. This makes it straightforward to combine strong pose estimation with modern feed-forward reconstruction models, while preserving metric scale and enabling loop-closure-aware dense mapping.
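One way to picture the wrapper design is a map that stores each keyframe's dense points in that keyframe's camera frame and transforms them into the world frame only with the latest SLAM pose. When a loop closure corrects the poses, the dense map then follows without re-running the feed-forward model. The sketch below assumes a pinhole back-projection and plain (R, t) poses for brevity, even though ScaRF-SLAM also targets fisheye cameras; `KeyframeMap` and its methods are hypothetical and not the ScaRF-SLAM API.

```python
def backproject_pinhole(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into the camera frame (pinhole model assumed)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def apply_pose(R, t, p):
    """World point = R @ p + t, with R a 3x3 nested list and t a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


class KeyframeMap:
    """Dense map anchored to SLAM keyframe poses (hypothetical structure)."""

    def __init__(self):
        self.poses = {}         # keyframe id -> (R, t) from the SLAM frontend
        self.local_points = {}  # keyframe id -> scaled points in that keyframe's camera frame

    def add_keyframe(self, kf_id, R, t, points_cam):
        self.poses[kf_id] = (R, t)
        self.local_points[kf_id] = list(points_cam)

    def update_poses(self, corrected):
        """Called after a loop closure with {kf_id: (R, t)}; unknown ids are ignored."""
        for kf_id, pose in corrected.items():
            if kf_id in self.poses:
                self.poses[kf_id] = pose

    def world_points(self):
        """Transform every stored point with its keyframe's current pose."""
        out = []
        for kf_id, pts in self.local_points.items():
            R, t = self.poses[kf_id]
            out.extend(apply_pose(R, t, p) for p in pts)
        return out


if __name__ == "__main__":
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m = KeyframeMap()
    p = backproject_pinhole(320.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    m.add_keyframe(0, I, (0.0, 0.0, 0.0), [p])
    print(m.world_points())                    # [(0.0, 0.0, 2.0)]
    m.update_poses({0: (I, (1.0, 0.0, 0.0))})  # loop closure shifts keyframe 0
    print(m.world_points())                    # [(1.0, 0.0, 2.0)]
```

Keeping points in keyframe-local coordinates is what makes such a map cheap to correct: a pose update touches one (R, t) per keyframe rather than every dense point.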
Citation
@article{zhang2026scarfslam,
title={{ScaRF-SLAM}: Scale-Consistent Reconstruction with Feed-Forward Models and Classical Visual SLAM},
author={Zhang, Yuhao and Tao, Yifu and Dellaert, Frank and Fallon, Maurice},
journal={arXiv preprint arXiv:2606.00307},
year={2026}
}