TensorFlow 3D你了解多少?
TF 3D背景和意义
AR的兴起
谷歌推出增强现实平台ARKit和ARCore
三星为其Galaxy Note 10和Galaxy S10 5G恢复了飞行时间(ToF)传感器
谷歌也在其Pixel 4中的Project Soli对雷达进行了简短介绍
苹果在其TrueDepth前置摄像头取得突破后,为最新的旗舰系列机型iPhone 12 Pro和iPad Pro系列产品添加了LiDAR传感器
机器学习对于创建高级AR体验是必不可少的
基于对AI研究的关注,谷歌在AR的未来中扮演着与苹果、Facebook、Snap以及微软一样重要的角色
TF 3D概览
提供3D操作和工具
TF 3D provides a set of popular operations, loss functions, data processing tools, models and metrics that enables the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models
支持3大类3D算法
TF 3D contains training and evaluation pipelines for state-of-the-art 3D semantic segmentation, 3D object detection and 3D instance segmentation, with support for distributed training.
提供数据集管理功能,提供3大经典数据集使用案例
It offers a unified dataset specification and configuration for training and evaluation of the standard 3D scene understanding datasets. It currently supports the Waymo Open, ScanNet, and Rio datasets.
3D 稀疏卷积网络
稀疏卷积的背景
scene that contains a set of objects of interest (e.g. cars, pedestrians, etc.) surrounded mostly by open space, which is of limited (or no) interest. As such, 3D data is inherently sparse
3D数据集支持
Frame
frame level data like color and depth camera images, point cloud, camera intrinsics, groundtruth semantic and instance segmentations annotations
Scene
point-cloud/mesh data of a whole scene and a lightweight information to all frames in the scene
数据集样例:Waymo Open dataset
Scene
peach 20 second snippet of car's journey
Frame pdata collected within a small time range (approximately single timestamp)
vehicle frame poses are provided w.r.t a global world frame for a particular scene
数据集样例:RIO: 3D Object Instance Re-Localization
multiple 3D snapshots of naturally changing indoor environments, designed for benchmarking emerging tasks such as long-term SLAM, scene change detection and object instance re-localization
TF3D Models(1)
l模型定义
输入:a call function that receives a dictionary of input tensors
运行: runs a deep network with potentially some postprocessing;applying the loss functions to the outputs
输出:a dictionary of the computed output tensors
深度学习网络
输入:a sparse set of voxel indices
输出:the length, height, width, rotation matrix, center, and logits for each voxel
后处理:predictions are aggregated and a set of objects are proposed
TF3D Models(2)
Unet 网络
An encoder that downsamples the input sparse voxels.
A bottleneck,
A decoder (with skip connections) that upsamples the sparse voxel features back to original resolution
l调优
pThe computation scale can be adjusted by changing SparseConvUNet's conv_filter_size and encoder/bottleneck/decoder_dimensions parameter
HourGlass Network (tf3d/layers/sparse_voxel_hourglass.py) is one or multiple stacked UNet Networks.
输出head
output head takes in (sparse) voxel features as input, and produces per-voxel prediction with a specific dimension
- 点赞
- 收藏
- 关注作者
评论(0)