Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
AuthorsJustin Lazarow, Kai Kang, Afshin Dehghan
Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
AuthorsJustin Lazarow, Kai Kang, Afshin Dehghan
We revisit scene-level 3D object detection as the output of an object-centric framework capable of both localization and mapping using 3D oriented boxes as the underlying geometric primitive. While existing 3D object detection approaches operate globally and implicitly rely on the a priori existence of metric camera poses, our method, Rooms from Motion (RfM) operates on a collection of un-posed images. By replacing the standard 2D keypoint-based matcher of structure-from-motion with an object-centric matcher based on image-derived 3D boxes, we estimate metric camera poses, object tracks, and finally produce a global, semantic 3D object map. When a priori pose is available, we can significantly improve map quality through optimization of global 3D boxes against individual observations. RfM shows strong localization performance and subsequently produces maps of higher quality than leading point-based and multi-view 3D object detection methods on CA-1M and ScanNet++, despite these global methods relying on overparameterization through point clouds or dense volumes. Rooms from Motion achieves a general, object-centric representation which not only extends the work of Cubify Anything to full scenes but also allows for inherently sparse localization and parametric mapping proportional to the number of objects in a scene.
Cubify Anything: Scaling Indoor 3D Object Detection
May 21, 2025research area Computer Visionconference CVPR
We consider indoor 3D object detection with respect to a single RGB(-D) frame acquired from a commodity handheld device. We seek to significantly advance the status quo with respect to both data and modeling. First, we establish that existing datasets have significant limitations to scale, accuracy, and diversity of objects. As a result, we introduce the Cubify-Anything 1M (CA-1M) dataset, which exhaustively labels over 400K 3D objects on over 1K…
3D Scene understanding has been an active area of machine learning (ML) research for more than a decade. More recently the release of LiDAR sensor functionality in Apple iPhone and iPad has begun a new era in scene understanding for the computer vision and developer communities. Fundamental research in scene understanding combined with the advances in ML can now impact everyday experiences. A variety of methods are addressing different parts of the challenge, like depth estimation, 3D reconstruction, instance segmentation, object detection, and more. Among these problems, creating a 3D floor plan is becoming key for many applications in augmented reality, robotics, e-commerce, games, and real estate.