如何在Jetson上将YOLOv5实时检测速度提升至120+FPS
[Abstract] This article describes how to push YOLOv5 real-time object detection beyond 120 FPS on a Jetson Orin Nano. The Python bindings are implemented with Pybind11, and a companion Python plugin for the ByteTrack tracking algorithm is also provided, enabling efficient object detection and tracking directly from Python.

This project provides Pybind11-based Python bindings for a TensorRT YOLOv5 plugin, delivering impressive real-time object detection performance!
- ⚡ 100+ FPS performance: easily exceeds 120 frames per second on a Jetson Orin Nano
- 🎯 Accurate detection: built on the proven YOLOv5 architecture, recognizing the 80 COCO classes
- 🔌 Plug and play: a simple Python interface with no complex configuration
- 🛠️ Industrial-grade optimization: model optimization and acceleration with TensorRT
1. Building the plugin
First install the required packages, clone the repository, and build the project. Note that JetPack 5.x is required for it to run correctly:
sudo apt update
sudo apt install ffmpeg
sudo apt install pybind11-dev
git clone https://github.com/HouYanSong/yolov5_trt_pybind11.git
cd yolov5_trt_pybind11
pip install pybind11
rm -fr build
cmake -S . -B build
cmake --build build
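After the build completes, a quick import check from the repository root confirms that both shared libraries and the Pybind11 module load correctly. A minimal sketch, assuming the build/ layout produced by the commands above:
import ctypes
# The plugin and utility libraries must be loaded globally before importing the module
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt
print(yolov5_trt.__doc__)  # prints "YOLOv5 TensorRT Python bindings"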
2. Model quantization
Generate the calibration images, quantize the YOLOv5s model to Int8, and save the quantized engine:
./media/gen_calib.sh
./build/build weights/yolov5s.onnx 1 ./media/ ./media/filelist.txt weights/yolov5s.engine
[11/06/2025-11:57:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 249, GPU 4229 (MiB)
[11/06/2025-11:57:39] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +277, now: CPU 574, GPU 4529 (MiB)
[11/06/2025-11:57:39] [I] [TRT] ----------------------------------------------------------------
[11/06/2025-11:57:39] [I] [TRT] Input filename: weights/yolov5s.onnx
[11/06/2025-11:57:39] [I] [TRT] ONNX IR version: 0.0.7
[11/06/2025-11:57:39] [I] [TRT] Opset version: 12
[11/06/2025-11:57:39] [I] [TRT] Producer name:
[11/06/2025-11:57:39] [I] [TRT] Producer version:
[11/06/2025-11:57:39] [I] [TRT] Domain:
[11/06/2025-11:57:39] [I] [TRT] Model version: 0
[11/06/2025-11:57:39] [I] [TRT] Doc string:
[11/06/2025-11:57:39] [I] [TRT] ----------------------------------------------------------------
[11/06/2025-11:57:39] [I] [TRT] No importer registered for op: YoloLayer_TRT. Attempting to import as plugin.
[11/06/2025-11:57:39] [I] [TRT] Searching for plugin: YoloLayer_TRT, plugin_version: 1, plugin_namespace:
[11/06/2025-11:57:39] [I] [TRT] Successfully created plugin: YoloLayer_TRT
[11/06/2025-11:57:39] [I] sample0001.png
[11/06/2025-11:57:39] [I] sample0002.png
[11/06/2025-11:57:39] [I] sample0003.png
... (sample0004.png through sample0144.png omitted for brevity) ...
[11/06/2025-11:57:39] [I] sample0145.png
CalibrationDataReader: 145 images, 145 batches.
[11/06/2025-11:57:39] [I] [TRT] Reading Calibration Cache for calibrator: MinMaxCalibration
[11/06/2025-11:57:39] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[11/06/2025-11:57:39] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeNumDetection, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeDetectionClasses, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on DLA ----------
[11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on GPU ----------
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.0/conv/Conv + PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.1/conv/Conv + PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv1/conv/Conv + PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv2/conv/Conv + PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv1/conv/Conv + PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv3/conv/Conv + PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.3/conv/Conv + PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv1/conv/Conv + PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv2/conv/Conv + PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv1/conv/Conv + PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv1/conv/Conv + PWN(PWN(/model.4/m/m.1/cv1/act/Sigmoid), /model.4/m/m.1/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.1/cv2/act/Sigmoid), /model.4/m/m.1/cv2/act/Mul), /model.4/m/m.1/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv3/conv/Conv + PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.5/conv/Conv + PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv1/conv/Conv + PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv2/conv/Conv + PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv1/conv/Conv + PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv1/conv/Conv + PWN(PWN(/model.6/m/m.1/cv1/act/Sigmoid), /model.6/m/m.1/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.1/cv2/act/Sigmoid), /model.6/m/m.1/cv2/act/Mul), /model.6/m/m.1/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv1/conv/Conv + PWN(PWN(/model.6/m/m.2/cv1/act/Sigmoid), /model.6/m/m.2/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.2/cv2/act/Sigmoid), /model.6/m/m.2/cv2/act/Mul), /model.6/m/m.2/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv3/conv/Conv + PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.7/conv/Conv + PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv1/conv/Conv + PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv2/conv/Conv + PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv1/conv/Conv + PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv3/conv/Conv + PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv1/conv/Conv + PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_1/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_2/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/cv1/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m/MaxPool_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m_1/MaxPool_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv2/conv/Conv + PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.10/conv/Conv + PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.11/Resize
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.11/Resize_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv1/conv/Conv + PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv2/conv/Conv + PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv1/conv/Conv + PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv2/conv/Conv + PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv3/conv/Conv + PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.14/conv/Conv + PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.15/Resize
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.15/Resize_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.4/cv3/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv1/conv/Conv + PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv2/conv/Conv + PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv1/conv/Conv + PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv2/conv/Conv + PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv3/conv/Conv + PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.18/conv/Conv + PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.0/Conv + PWN(/model.24/Sigmoid)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.14/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv1/conv/Conv + PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv2/conv/Conv + PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv1/conv/Conv + PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv2/conv/Conv + PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv3/conv/Conv + PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.21/conv/Conv + PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.1/Conv + PWN(/model.24/Sigmoid_1)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.10/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv1/conv/Conv + PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv2/conv/Conv + PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv1/conv/Conv + PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv2/conv/Conv + PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv3/conv/Conv + PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.2/Conv + PWN(/model.24/Sigmoid_2)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] PLUGIN_V2: YoloLayer
[11/06/2025-11:57:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +689, now: CPU 1137, GPU 5200 (MiB)
[11/06/2025-11:57:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +132, now: CPU 1220, GPU 5332 (MiB)
[11/06/2025-11:57:41] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/06/2025-12:00:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 1115794944
[11/06/2025-12:01:03] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/06/2025-12:01:03] [I] [TRT] Total Host Persistent Memory: 175984
[11/06/2025-12:01:03] [I] [TRT] Total Device Persistent Memory: 614912
[11/06/2025-12:01:03] [I] [TRT] Total Scratch Memory: 0
[11/06/2025-12:01:03] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 553 MiB
[11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 67 steps to complete.
[11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 2.77161ms to assign 6 blocks to 67 nodes requiring 10925056 bytes.
[11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 10925056
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB)
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB)
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +8, now: CPU 7, GPU 8 (MiB)
Engine build success!
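Once the engine is built, it can be sanity-checked from Python before wiring up the full pipeline. A minimal sketch, assuming the TensorRT 8.x Python API that ships with JetPack 5.x; the custom plugin library must be loaded first, or deserialization will fail on the YoloLayer op:
import ctypes
import tensorrt as trt
# Register the custom YoloLayer plugin before deserializing
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")
with open("./weights/yolov5s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
# The build log above reports 1 input and 4 output tensors
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))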
3. Python call example
The following is a simple Python example that calls the C++ shared library. You only need to specify the model file path and the input video size to get detection results for every frame, and thresholds such as confidence and IoU can be adjusted dynamically during video inference.
import cv2
import time
import ctypes

# Load the C++ shared libraries globally before importing the Pybind11 module
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt


def draw_detections(image, detections, fps):
    for detection in detections:
        class_id = detection['class_id']
        x1, y1, x2, y2 = detection['bbox']
        confidence = detection['confidence']
        cv2.rectangle(image, (x1, y1), (x2, y2), (0x27, 0xC1, 0x36), 2)
        cv2.putText(image, f"{class_id}:{confidence:.2f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_PLAIN, 1.2, (0x27, 0xC1, 0x36), 2)
    cv2.putText(image, f"FPS: {fps:.2f}", (10, 30),
                cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2)
    return image


def main(input_path, output_path):
    cap = cv2.VideoCapture(input_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", width, height)
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height))
    fps_list = []
    frame_count = 0
    total_time = 0.0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        start_time = time.time()
        detections = detector.detect(input_image=frame,
                                     input_w=640, input_h=640,
                                     conf_thresh=0.45, nms_thresh=0.55)
        process_time = time.time() - start_time
        current_fps = 1.0 / process_time if process_time > 0 else 0
        frame_count += 1
        total_time += process_time
        fps_list.append(current_fps)
        image = draw_detections(frame, detections, current_fps)
        writer.write(image)
    cap.release()
    writer.release()
    if frame_count > 0:
        avg_fps = frame_count / total_time if total_time > 0 else 0
        print(f"Processed {frame_count} frames")
        print(f"Average FPS: {avg_fps:.2f}")
        print(f"Min FPS: {min(fps_list):.2f}")
        print(f"Max FPS: {max(fps_list):.2f}")


if __name__ == "__main__":
    input_video = "./media/sample_720p.mp4"
    output_video = "./result.avi"
    main(input_video, output_video)
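Because conf_thresh and nms_thresh are per-call arguments, they can be changed between frames without rebuilding the detector; this is what enables the on-the-fly threshold adjustment mentioned above. A small sketch continuing the example (the cutoff and threshold values are arbitrary examples, not recommendations):
# Inside the while-loop above: tighten thresholds when the scene gets crowded
conf_thresh, nms_thresh = 0.45, 0.55
detections = detector.detect(input_image=frame,
                             conf_thresh=conf_thresh, nms_thresh=nms_thresh)
if len(detections) > 20:   # arbitrary example cutoff
    conf_thresh, nms_thresh = 0.60, 0.45   # demand higher confidence, stricter NMS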
The corresponding C++ inference code is as follows:
#include "NvInfer.h"
#include "logger.h"
#include "common.h"
#include "buffers.h"
#include "utils/preprocess.h"
#include "utils/postprocess.h"
#include "utils/types.h"
#include "utils/utils.h"
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <memory>
#include <mutex>
namespace py = pybind11;
// 将numpy数组转换为cv::Mat
cv::Mat numpy_to_mat(py::array_t<unsigned char>& input) {
py::buffer_info buf_info = input.request();
if (buf_info.ndim == 3) {
// 彩色图像
int height = buf_info.shape[0];
int width = buf_info.shape[1];
int channels = buf_info.shape[2];
cv::Mat mat(height, width, CV_8UC3, (unsigned char*)buf_info.ptr);
return mat.clone();
} else if (buf_info.ndim == 2) {
// 灰度图像
int height = buf_info.shape[0];
int width = buf_info.shape[1];
cv::Mat mat(height, width, CV_8UC1, (unsigned char*)buf_info.ptr);
return mat.clone();
}
throw std::runtime_error("Unsupported array dimensions");
}
// 将cv::Mat转换为numpy数组
py::array_t<unsigned char> mat_to_numpy(cv::Mat& mat) {
if (mat.empty()) {
return py::array_t<unsigned char>();
}
if (mat.channels() == 1) {
// 灰度图像
auto result = py::array_t<unsigned char>({mat.rows, mat.cols});
auto buf = result.request();
memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total());
return result;
} else {
// 彩色图像
auto result = py::array_t<unsigned char>({mat.rows, mat.cols, mat.channels()});
auto buf = result.request();
memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total() * mat.channels());
return result;
}
}
// 加载模型文件
std::vector<unsigned char> load_engine_file(const std::string &file_name)
{
std::vector<unsigned char> engine_data;
std::ifstream engine_file(file_name, std::ios::binary);
assert(engine_file.is_open() && "Unable to load engine file.");
engine_file.seekg(0, engine_file.end);
int length = engine_file.tellg();
engine_data.resize(length);
engine_file.seekg(0, engine_file.beg);
engine_file.read(reinterpret_cast<char *>(engine_data.data()), length);
return engine_data;
}
// YOLOv5推理器类
class YOLOv5Detector {
private:
std::unique_ptr<nvinfer1::IRuntime> runtime;
std::shared_ptr<nvinfer1::ICudaEngine> engine;
std::unique_ptr<nvinfer1::IExecutionContext> context;
std::unique_ptr<samplesCommon::BufferManager> buffers;
bool initialized = false;
public:
YOLOv5Detector(const std::string& engine_file, int frame_width, int frame_height) {
initialize(engine_file);
int img_size = frame_width * frame_height;
cuda_preprocess_init(img_size); // 申请cuda内存
}
void initialize(const std::string& engine_file) {
// ========== 1. 创建推理运行时runtime ==========
runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(sample::gLogger.getTRTLogger()));
if (!runtime) {
throw std::runtime_error("Failed to create TensorRT runtime");
}
// ========== 2. 反序列化生成engine ==========
auto plan = load_engine_file(engine_file);
engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan.data(), plan.size()));
if (!engine) {
throw std::runtime_error("Failed to deserialize engine");
}
// ========== 3. 创建执行上下文context ==========
context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());
if (!context) {
throw std::runtime_error("Failed to create execution context");
}
// ========== 4. 创建输入输出缓冲区 ==========
buffers = std::make_unique<samplesCommon::BufferManager>(engine);
initialized = true;
}
py::list detect(py::array_t<unsigned char>& input_image, int input_w=kInputW, int input_h=kInputH, float conf_thresh=kConfThresh, float nms_thresh=kNmsThresh) {
if (!initialized) {
throw std::runtime_error("Detector not initialized");
}
// 将numpy数组转换为cv::Mat
cv::Mat frame = numpy_to_mat(input_image);
if (frame.empty()) {
throw std::runtime_error("Invalid input image");
}
// CUDA预处理
process_input_gpu(frame, (float *)buffers->getDeviceBuffer(kInputTensorName), input_w, input_h);
// ========== 5. 执行推理 ==========
context->executeV2(buffers->getDeviceBindings().data());
// 拷贝回host
buffers->copyOutputToHost();
// 从buffer manager中获取模型输出
int32_t *num_det = (int32_t *)buffers->getHostBuffer(kOutNumDet);
int32_t *cls = (int32_t *)buffers->getHostBuffer(kOutDetCls);
float *conf = (float *)buffers->getHostBuffer(kOutDetScores);
float *bbox = (float *)buffers->getHostBuffer(kOutDetBBoxes);
// 执行nms(非极大值抑制)
std::vector<Detection> bboxs;
yolo_nms(bboxs, num_det, cls, conf, bbox, conf_thresh, nms_thresh);
// 返回检测结果
py::list result_list;
for (size_t j = 0; j < bboxs.size(); j++) {
cv::Rect r = get_rect(frame, bboxs[j].bbox, input_w, input_h);
py::dict detection;
detection["class_id"] = (int)bboxs[j].class_id;
detection["confidence"] = (float)bboxs[j].conf;
detection["bbox"] = py::cast(std::vector<int>{r.x, r.y, r.x + r.width, r.y + r.height});
result_list.append(detection);
}
return result_list;
}
};
// Python绑定代码
PYBIND11_MODULE(yolov5_trt, m) {
m.doc() = "YOLOv5 TensorRT Python bindings";
py::class_<YOLOv5Detector>(m, "YOLOv5Detector")
.def(py::init<const std::string&, int, int>(), "Initialize detector with engine file",
py::arg("engine_file"),
py::arg("frame_width"),
py::arg("frame_height"))
.def("detect", &YOLOv5Detector::detect, "Perform detection on input image",
py::arg("input_image"),
py::arg("input_w") = kInputW,
py::arg("input_h") = kInputH,
py::arg("conf_thresh") = kConfThresh,
py::arg("nms_thresh") = kNmsThresh);
}
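Since detect() declares keyword arguments with compiled-in defaults (kInputW, kInputH, kConfThresh, kNmsThresh), the binding can also be exercised on a single image without the full video pipeline. A minimal sketch; the image path is just an example, any BGR uint8 frame works:
import ctypes
import cv2
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt

img = cv2.imread("./media/sample0001.png")   # any BGR uint8 image
h, w = img.shape[:2]
detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", w, h)
# input_w/input_h/conf_thresh/nms_thresh fall back to the compiled-in defaults
for det in detector.detect(input_image=img):
    print(det["class_id"], det["confidence"], det["bbox"])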
In practice, running object detection on 720p video on a Jetson Orin Nano (8GB), the average frame rate holds steady at 120+ FPS, meeting the real-time requirements of industrial scenarios.
python yolov5_infer.py
[11/06/2025-15:23:26] [I] [TRT] Loaded engine size: 7 MiB
Deserialize yoloLayer plugin: YoloLayer
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +955, now: CPU 830, GPU 4470 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +149, now: CPU 913, GPU 4619 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7, now: CPU 0, GPU 7 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 913, GPU 4620 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 913, GPU 4623 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 18 (MiB)
Processed 1442 frames
Average FPS: 127.51
Min FPS: 75.75
Max FPS: 134.67
Concluding Remarks
Finally, we also provide Python bindings for the ByteTrack tracking algorithm, implemented with Pybind11 and extended beyond the original algorithm to carry class information for each tracked target. Building on this, the Jetson Orin Nano achieves real-time object detection and tracking at up to 83 FPS: ByteTrack-Pybind11: a high-performance real-time object tracking solution 🚀
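The exact tracking interface is documented in the linked project; purely as an illustration, the detection dictionaries returned above map naturally onto a tracker update loop. The bytetrack module name, BYTETracker class, and update() signature below are hypothetical placeholders, not the project's confirmed API:
# Hypothetical sketch only: module, class, and update() signature are placeholders;
# consult the ByteTrack-Pybind11 project for the real interface.
from build import bytetrack            # hypothetical import
tracker = bytetrack.BYTETracker()      # hypothetical constructor
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    detections = detector.detect(input_image=frame)  # [{'class_id', 'confidence', 'bbox'}, ...]
    tracks = tracker.update(detections)              # hypothetical: tracked targets with IDs and classes
    for t in tracks:
        print(t)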