import os
import tensorrt as trt
os.environ["CUDA_VISIBLE_DEVICES"]='0'
TRT_LOGGER = trt.Logger()
onnx_file_path = 'Unet375-simple.onnx'
engine_file_path = 'Unet337.trt'
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
builder.max_workspace_size = 1 << 28 # 256MiB
builder.max_batch_size = 1
# Parse model file
if not os.path.exists(onnx_file_path):
print('ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))
exit(0)
print('Loading ONNX file from path {}...'.format(onnx_file_path))
with open(onnx_file_path, 'rb') as model:
print('Beginning ONNX file parsing')
if not parser.parse(model.read()):
print ('ERROR: Failed to parse the ONNX file.')
for error in range(parser.num_errors):
print (parser.get_error(error))
network.get_input(0).shape = [1, 3, 300, 400]
print('Completed parsing of ONNX file')
print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
#network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
engine = builder.build_cuda_engine(network)
print("Completed creating Engine")
with open(engine_file_path, "wb") as f:
f.write(engine.serialize())
trtexec --loadEngine=mnist16.trt --batch=1
打印输出:
trtexec会打印出很多时间,这里需要对每个时间的含义进行解释,然后大家各取所需,进行评测。总的打印如下:
[09/06/2021-13:50:34] [I] Average on 10 runs - GPU latency: 2.74553 ms - Host latency: 3.74192 ms (end to end 4.93066 ms, enqueue 0.624805 ms) # 跑了10次,GPU latency: GPU计算耗时, Host latency:GPU输入+计算+输出耗时,end to end:GPU端到端的耗时,eventout - eventin,enqueue:CPU异步耗时
[09/06/2021-13:50:34] [I] Host Latency
[09/06/2021-13:50:34] [I] min: 3.65332 ms (end to end 3.67603 ms)
[09/06/2021-13:50:34] [I] max: 5.95093 ms (end to end 6.88892 ms)
[09/06/2021-13:50:34] [I] mean: 3.71375 ms (end to end 5.30082 ms)
[09/06/2021-13:50:34] [I] median: 3.70032 ms (end to end 5.32935 ms)
[09/06/2021-13:50:34] [I] percentile: 4.10571 ms at 99% (end to end 6.11792 ms at 99%)
[09/06/2021-13:50:34] [I] throughput: 356.786 qps
[09/06/2021-13:50:34] [I] walltime: 3.00741 s
[09/06/2021-13:50:34] [I] Enqueue Time
[09/06/2021-13:50:34] [I] min: 0.248474 ms
[09/06/2021-13:50:34] [I] max: 2.12134 ms
[09/06/2021-13:50:34] [I] median: 0.273987 ms
[09/06/2021-13:50:34] [I] GPU Compute
[09/06/2021-13:50:34] [I] min: 2.69702 ms
[09/06/2021-13:50:34] [I] max: 4.99219 ms
[09/06/2021-13:50:34] [I] mean: 2.73299 ms
[09/06/2021-13:50:34] [I] median: 2.71875 ms
[09/06/2021-13:50:34] [I] percentile: 3.10791 ms at 99%
[09/06/2021-13:50:34] [I] total compute time: 2.93249 s
Host Latency gpu: 输入+计算+输出 三部分的耗时
Enqueue Time:CPU异步的时间(该时间不具有参考意义,因为GPU的计算可能还没有完成)
GPU Compute:GPU计算的耗时
综上,去了Enqueue Time时间都是有意义的
同时定位与地图构建(英语:Simultaneous localization and mapping,一般直接称SLAM)是一种概念:希望机器人从未知环境的未知地点出发,在运动过程中通过重复观测到的地图特征(比如,墙角,柱子等)定位自身位置和姿态,再根据自身位置增量式的构建地图,从而达到同时定位和地图构建的目的。
Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (n.d.). A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition – Volume 1 (CVPR’06) (Vol. 1, pp. 519–528). IEEE. https://doi.org/10.1109/CVPR.2006.19
[2] N.Snavely, S.M. Seitz, R.Szeliski, “Modeling the World from Internet Photo Collections”, International Journal of Computer Vision, vol.80, no.2, 2008.
[3] Y. Furukawa, J.Ponce, “Accurate, Dense and Robust Multi-view Stereopsis” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
提出了一种新的神经渲染方法neural rendering approach MVSNeRF,它可以有效地重建用于视图合成的神经辐射场。与先前关于神经辐射场的工作不同,这些工作考虑对密集捕获的图像进行逐场景优化,我们提出了一种通用的深度神经网络,该网络可以通过快速网络推理仅从三个附近的输入视图重建辐射场。我们的方法利用平面扫描成本体plane-swept cost volumes(广泛用于多视图立体multi-view stereo)进行几何感知场景推理,并将其与基于物理的体渲染相结合进行神经辐射场重建。我们在DTU数据集中的真实对象上训练我们的网络,并在三个不同的数据集上测试它以评估它的有效性和可推广性generalizability我们的方法可以跨场景(甚至室内场景,完全不同于我们的对象训练场景)进行推广generalize across scenes,并仅使用三幅输入图像生成逼真的视图合成结果,明显优于目前的广义辐射场重建generalizable radiance field reconstruction工作。此外,如果捕捉到密集图像dense images are captured,我们估计的辐射场表示可以容易地微调easily fine-tuned;这导致快速的逐场景重建fast per-scene reconstruction,比NeRF具有更高的渲染质量和更少的优化时间。
我们利用最近在深度多视图立体(MVS)deep multi- view stereo (MVS)上的成功[50,18,10]。这一系列工作可以通过对成本体积应用3D卷积applying 3D convolutions on cost volumes来训练用于3D重建任务的可概括的神经网络。与[50]类似,我们通过将来自附近输入视图的2D图像特征(由2D CNN推断)扭曲warping到参考视图的平截头体中的扫描平面上sweeping planes in the reference view’s frustrum,在输入参考视图处构建成本体。不像MVS方法[50,10]仅在这样的成本体积上进行深度推断depth inference,我们的网络推理关于场景几何形状和外观reasons about both scene geometry and appearance,并输出神经辐射场(见图2),实现视图合成。具体来说,利用3D CNN,我们重建(从成本体)神经场景编码体neural scene encoding volume,其由 编码关于局部场景几何形状和外观的信息的每个体素神经特征per-voxel neural features 组成。然后,我们利用多层感知器(MLP)在编码体积encoding volume内使用三线性插值神经特征tri-inearly interpolated neural features来解码任意连续位置处的体积密度volume density和辐射度radiance。本质上,编码体是辐射场的局部神经表示;一旦估计,该体积可以直接用于(丢弃3D CNN)通过可微分射线行进differentiable ray marching(如在[34]中)的最终渲染。