Learning Spatiotemporal Occupancy Grid Maps for Lifelong Navigation in Dynamic Scenes
In collaboration with University of Toronto
AuthorsHugues Thomas, Matthieu Gallet de Saint Aurin, Jian Zhang, Timothy D. Barfoot
We present a novel method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGM), which embed future information of dynamic scenes. Our automated generation process creates groundtruth SOGMs from previous navigation data. We build on prior work to annotate lidar points based on their dynamic properties, which are then projected on time-stamped 2D grids: SOGMs. We design a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs, given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGMs within planning. We also design a navigation pipeline that uses these predicted SOGMs. We provide both quantitative and qualitative insights into the predictions and validate our choices of network design with a comparison to the state of the art and ablation studies.
3D Scene understanding has been an active area of machine learning (ML) research for more than a decade. More recently the release of LiDAR sensor functionality in Apple iPhone and iPad has begun a new era in scene understanding for the computer vision and developer communities. Fundamental research in scene understanding combined with the advances in ML can now impact everyday experiences. A variety of methods are addressing different parts of the challenge, like depth estimation, 3D reconstruction, instance segmentation, object detection, and more. Among these problems, creating a 3D floor plan is becoming key for many applications in augmented reality, robotics, e-commerce, games, and real estate.