
We present a novel method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGMs), which embed future information about dynamic scenes. Our automated generation process creates ground-truth SOGMs from previous navigation data. We build on prior work to annotate lidar points based on their dynamic properties, which are then projected onto time-stamped 2D grids: SOGMs. We design a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGMs within the planning horizon. We also design a navigation pipeline that uses these predicted SOGMs. We provide both quantitative and qualitative insights into the predictions, and validate our network design choices with a comparison to the state of the art and ablation studies.
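As a rough illustration of the SOGM representation described above, the sketch below (not the paper's implementation) bins lidar points annotated as dynamic into a stack of time-stamped 2D grids. The grid extent, cell size, time step, and label convention are all assumptions made for the example.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): build a Spatiotemporal
# Occupancy Grid Map (SOGM) by projecting annotated lidar points onto
# time-stamped 2D grids. points is (N, 3), labels is (N,) with 1 marking
# dynamic points (assumed convention), timestamps is (N,) in seconds.

def build_sogm(points, labels, timestamps, x_range=(-8.0, 8.0),
               y_range=(-8.0, 8.0), cell_size=0.1, dt=0.1, n_steps=20):
    """Return an (n_steps, H, W) grid where a cell is 1 if a dynamic
    point falls into it at the corresponding time step."""
    h = int((y_range[1] - y_range[0]) / cell_size)
    w = int((x_range[1] - x_range[0]) / cell_size)
    sogm = np.zeros((n_steps, h, w), dtype=np.float32)

    # Keep only points annotated as dynamic.
    dynamic = labels == 1
    pts, ts = points[dynamic], timestamps[dynamic]

    # Discretize time and space; drop points outside the grid or horizon.
    t_idx = (ts / dt).astype(int)
    col = ((pts[:, 0] - x_range[0]) / cell_size).astype(int)
    row = ((pts[:, 1] - y_range[0]) / cell_size).astype(int)
    valid = (
        (t_idx >= 0) & (t_idx < n_steps)
        & (row >= 0) & (row < h)
        & (col >= 0) & (col < w)
    )
    sogm[t_idx[valid], row[valid], col[valid]] = 1.0
    return sogm
```

Grids produced this way could serve as the regression targets that a 3D-2D network is trained to predict from raw lidar frames, in the spirit of the pipeline the abstract describes.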

Related readings and updates.

3D Parametric Room Representation with RoomPlan

3D scene understanding has been an active area of machine learning (ML) research for more than a decade. More recently, the release of LiDAR sensor functionality in Apple iPhone and iPad has ushered in a new era in scene understanding for the computer vision and developer communities. Fundamental research in scene understanding combined with advances in ML can now impact everyday experiences. A variety of methods address different parts of the challenge, such as depth estimation, 3D reconstruction, instance segmentation, object detection, and more. Among these problems, creating a 3D floor plan is becoming key for many applications in augmented reality, robotics, e-commerce, games, and real estate.


Self-Supervised Learning of Lidar Segmentation for Autonomous Indoor Navigation

We present a self-supervised learning approach for the semantic segmentation of lidar frames. Our method is used to train a deep point cloud segmentation architecture without any human annotation. The annotation process is automated with the combination of simultaneous localization and mapping (SLAM) and ray-tracing algorithms. By performing multiple navigation sessions in the same environment, we are able to identify permanent structures, such…
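The description above mentions identifying permanent structures from multiple navigation sessions. The sketch below is a deliberately simplified stand-in for that idea, assuming point clouds already aligned in a common map frame and a cross-session voxel-count heuristic rather than the SLAM and ray-tracing pipeline the paper actually uses; all names and thresholds are hypothetical.

```python
import numpy as np

# Simplified illustration (not the paper's pipeline): voxels occupied in
# most navigation sessions are treated as permanent structure, the rest
# as movable or dynamic. Assumes all sessions share one map frame.

def label_permanent_voxels(sessions, voxel_size=0.1, min_ratio=0.8):
    """sessions: list of (N_i, 3) point arrays, one per session.
    Returns a dict mapping voxel index -> True if permanent."""
    counts = {}
    for points in sessions:
        # Each voxel counts at most once per session.
        voxels = set(map(tuple, np.floor(points / voxel_size).astype(int)))
        for v in voxels:
            counts[v] = counts.get(v, 0) + 1

    n = len(sessions)
    return {v: (c / n) >= min_ratio for v, c in counts.items()}
```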