如何在Jetson上将YOLOv5实时检测速度提升至120+FPS
[Abstract] This article describes how to push YOLOv5 real-time object detection beyond 120 FPS on a Jetson Orin Nano. The Python bindings are implemented with Pybind11, and a companion Python plugin for the ByteTrack tracking algorithm is also provided, enabling efficient object detection and tracking directly from Python.

This project provides Pybind11-based Python bindings for a TensorRT YOLOv5 plugin, delivering impressive real-time object detection performance!
- ⚡ 100+ FPS performance: easily exceeds 120 frames per second on a Jetson Orin Nano
- 🎯 Accurate detection: built on the proven YOLOv5 architecture, recognizing the 80 COCO classes
- 🔌 Plug and play: a simple Python interface with no complex configuration
- 🛠️ Industrial-grade optimization: model optimization and acceleration with TensorRT
1. Building the plugin
First install the required packages, clone the repository, and build the project. Note that JetPack 5.x is required for it to run correctly:
sudo apt update
sudo apt install ffmpeg
sudo apt install pybind11-dev
git clone https://github.com/HouYanSong/yolov5_trt_pybind11.git
cd yolov5_trt_pybind11
pip install pybind11
rm -fr build
cmake -S . -B build
cmake --build build
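After the build completes, a quick import check from the repository root confirms that both shared libraries and the Pybind11 module load correctly. A minimal sketch, assuming the build/ layout produced by the commands above:
import ctypes
# The plugin and utility libraries must be loaded globally before importing the module
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt
print(yolov5_trt.__doc__)  # prints "YOLOv5 TensorRT Python bindings"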
2. Model quantization
Generate the calibration images, quantize the YOLOv5s model to Int8, and save the quantized engine:
./media/gen_calib.sh
./build/build weights/yolov5s.onnx 1 ./media/ ./media/filelist.txt weights/yolov5s.engine
[11/06/2025-11:57:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 249, GPU 4229 (MiB)
[11/06/2025-11:57:39] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +277, now: CPU 574, GPU 4529 (MiB)
[11/06/2025-11:57:39] [I] [TRT] ----------------------------------------------------------------
[11/06/2025-11:57:39] [I] [TRT] Input filename: weights/yolov5s.onnx
[11/06/2025-11:57:39] [I] [TRT] ONNX IR version: 0.0.7
[11/06/2025-11:57:39] [I] [TRT] Opset version: 12
[11/06/2025-11:57:39] [I] [TRT] Producer name:
[11/06/2025-11:57:39] [I] [TRT] Producer version:
[11/06/2025-11:57:39] [I] [TRT] Domain:
[11/06/2025-11:57:39] [I] [TRT] Model version: 0
[11/06/2025-11:57:39] [I] [TRT] Doc string:
[11/06/2025-11:57:39] [I] [TRT] ----------------------------------------------------------------
[11/06/2025-11:57:39] [I] [TRT] No importer registered for op: YoloLayer_TRT. Attempting to import as plugin.
[11/06/2025-11:57:39] [I] [TRT] Searching for plugin: YoloLayer_TRT, plugin_version: 1, plugin_namespace:
[11/06/2025-11:57:39] [I] [TRT] Successfully created plugin: YoloLayer_TRT
[11/06/2025-11:57:39] [I] sample0001.png
[11/06/2025-11:57:39] [I] sample0002.png
[11/06/2025-11:57:39] [I] sample0003.png
... (sample0004.png through sample0144.png omitted for brevity) ...
[11/06/2025-11:57:39] [I] sample0145.png
CalibrationDataReader: 145 images, 145 batches.
[11/06/2025-11:57:39] [I] [TRT] Reading Calibration Cache for calibrator: MinMaxCalibration
[11/06/2025-11:57:39] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[11/06/2025-11:57:39] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeNumDetection, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeDetectionClasses, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on DLA ----------
[11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on GPU ----------
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.0/conv/Conv + PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.1/conv/Conv + PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv1/conv/Conv + PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv2/conv/Conv + PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv1/conv/Conv + PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv3/conv/Conv + PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.3/conv/Conv + PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv1/conv/Conv + PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv2/conv/Conv + PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv1/conv/Conv + PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv1/conv/Conv + PWN(PWN(/model.4/m/m.1/cv1/act/Sigmoid), /model.4/m/m.1/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.1/cv2/act/Sigmoid), /model.4/m/m.1/cv2/act/Mul), /model.4/m/m.1/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv3/conv/Conv + PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.5/conv/Conv + PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv1/conv/Conv + PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv2/conv/Conv + PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv1/conv/Conv + PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv1/conv/Conv + PWN(PWN(/model.6/m/m.1/cv1/act/Sigmoid), /model.6/m/m.1/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.1/cv2/act/Sigmoid), /model.6/m/m.1/cv2/act/Mul), /model.6/m/m.1/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv1/conv/Conv + PWN(PWN(/model.6/m/m.2/cv1/act/Sigmoid), /model.6/m/m.2/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.2/cv2/act/Sigmoid), /model.6/m/m.2/cv2/act/Mul), /model.6/m/m.2/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv3/conv/Conv + PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.7/conv/Conv + PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv1/conv/Conv + PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv2/conv/Conv + PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv1/conv/Conv + PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv2/conv/Conv
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv3/conv/Conv + PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv1/conv/Conv + PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_1/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_2/MaxPool
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/cv1/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m/MaxPool_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m_1/MaxPool_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv2/conv/Conv + PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.10/conv/Conv + PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.11/Resize
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.11/Resize_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv1/conv/Conv + PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv2/conv/Conv + PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv1/conv/Conv + PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv2/conv/Conv + PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv3/conv/Conv + PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.14/conv/Conv + PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.15/Resize
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.15/Resize_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.4/cv3/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv1/conv/Conv + PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv2/conv/Conv + PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv1/conv/Conv + PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv2/conv/Conv + PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv3/conv/Conv + PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.18/conv/Conv + PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.0/Conv + PWN(/model.24/Sigmoid)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.14/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv1/conv/Conv + PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv2/conv/Conv + PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv1/conv/Conv + PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv2/conv/Conv + PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv3/conv/Conv + PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.21/conv/Conv + PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.1/Conv + PWN(/model.24/Sigmoid_1)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.10/act/Mul_output_0 copy
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv1/conv/Conv + PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv2/conv/Conv + PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv1/conv/Conv + PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv2/conv/Conv + PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv3/conv/Conv + PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.2/Conv + PWN(/model.24/Sigmoid_2)
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] PLUGIN_V2: YoloLayer
[11/06/2025-11:57:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +689, now: CPU 1137, GPU 5200 (MiB)
[11/06/2025-11:57:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +132, now: CPU 1220, GPU 5332 (MiB)
[11/06/2025-11:57:41] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/06/2025-12:00:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 1115794944
[11/06/2025-12:01:03] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/06/2025-12:01:03] [I] [TRT] Total Host Persistent Memory: 175984
[11/06/2025-12:01:03] [I] [TRT] Total Device Persistent Memory: 614912
[11/06/2025-12:01:03] [I] [TRT] Total Scratch Memory: 0
[11/06/2025-12:01:03] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 553 MiB
[11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 67 steps to complete.
[11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 2.77161ms to assign 6 blocks to 67 nodes requiring 10925056 bytes.
[11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 10925056
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB)
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB)
[11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +8, now: CPU 7, GPU 8 (MiB)
Engine build success!
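Once the engine is built, it can be sanity-checked from Python before wiring up the full pipeline. A minimal sketch, assuming the TensorRT 8.x Python API that ships with JetPack 5.x; the custom plugin library must be loaded first, or deserialization will fail on the YoloLayer op:
import ctypes
import tensorrt as trt
# Register the custom YoloLayer plugin before deserializing
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")
with open("./weights/yolov5s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
# The build log above reports 1 input and 4 output tensors
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))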
3. Python call example
The following is a simple Python example that calls the C++ shared library. You only need to specify the model file path and the input video size to get detection results for every frame, and thresholds such as confidence and IoU can be adjusted dynamically during video inference.
import cv2
import time
import ctypes

# Load the C++ shared libraries globally before importing the Pybind11 module
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt


def draw_detections(image, detections, fps):
    for detection in detections:
        class_id = detection['class_id']
        x1, y1, x2, y2 = detection['bbox']
        confidence = detection['confidence']
        cv2.rectangle(image, (x1, y1), (x2, y2), (0x27, 0xC1, 0x36), 2)
        cv2.putText(image, f"{class_id}:{confidence:.2f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_PLAIN, 1.2, (0x27, 0xC1, 0x36), 2)
    cv2.putText(image, f"FPS: {fps:.2f}", (10, 30),
                cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2)
    return image


def main(input_path, output_path):
    cap = cv2.VideoCapture(input_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", width, height)
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height))
    fps_list = []
    frame_count = 0
    total_time = 0.0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        start_time = time.time()
        detections = detector.detect(input_image=frame,
                                     input_w=640, input_h=640,
                                     conf_thresh=0.45, nms_thresh=0.55)
        process_time = time.time() - start_time
        current_fps = 1.0 / process_time if process_time > 0 else 0
        frame_count += 1
        total_time += process_time
        fps_list.append(current_fps)
        image = draw_detections(frame, detections, current_fps)
        writer.write(image)
    cap.release()
    writer.release()
    if frame_count > 0:
        avg_fps = frame_count / total_time if total_time > 0 else 0
        print(f"Processed {frame_count} frames")
        print(f"Average FPS: {avg_fps:.2f}")
        print(f"Min FPS: {min(fps_list):.2f}")
        print(f"Max FPS: {max(fps_list):.2f}")


if __name__ == "__main__":
    input_video = "./media/sample_720p.mp4"
    output_video = "./result.avi"
    main(input_video, output_video)
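Because conf_thresh and nms_thresh are per-call arguments, they can be changed between frames without rebuilding the detector; this is what enables the on-the-fly threshold adjustment mentioned above. A small sketch continuing the example (the cutoff and threshold values are arbitrary examples, not recommendations):
# Inside the while-loop above: tighten thresholds when the scene gets crowded
conf_thresh, nms_thresh = 0.45, 0.55
detections = detector.detect(input_image=frame,
                             conf_thresh=conf_thresh, nms_thresh=nms_thresh)
if len(detections) > 20:   # arbitrary example cutoff
    conf_thresh, nms_thresh = 0.60, 0.45   # demand higher confidence, stricter NMS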
The corresponding C++ inference code is as follows:
#include "NvInfer.h"
#include "logger.h"
#include "common.h"
#include "buffers.h"
#include "utils/preprocess.h"
#include "utils/postprocess.h"
#include "utils/types.h"
#include "utils/utils.h"
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <memory>
#include <mutex>
namespace py = pybind11;
// 将numpy数组转换为cv::Mat
cv::Mat numpy_to_mat(py::array_t<unsigned char>& input) {
py::buffer_info buf_info = input.request();
if (buf_info.ndim == 3) {
// 彩色图像
int height = buf_info.shape[0];
int width = buf_info.shape[1];
int channels = buf_info.shape[2];
cv::Mat mat(height, width, CV_8UC3, (unsigned char*)buf_info.ptr);
return mat.clone();
} else if (buf_info.ndim == 2) {
// 灰度图像
int height = buf_info.shape[0];
int width = buf_info.shape[1];
cv::Mat mat(height, width, CV_8UC1, (unsigned char*)buf_info.ptr);
return mat.clone();
}
throw std::runtime_error("Unsupported array dimensions");
}
// 将cv::Mat转换为numpy数组
py::array_t<unsigned char> mat_to_numpy(cv::Mat& mat) {
if (mat.empty()) {
return py::array_t<unsigned char>();
}
if (mat.channels() == 1) {
// 灰度图像
auto result = py::array_t<unsigned char>({mat.rows, mat.cols});
auto buf = result.request();
memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total());
return result;
} else {
// 彩色图像
auto result = py::array_t<unsigned char>({mat.rows, mat.cols, mat.channels()});
auto buf = result.request();
memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total() * mat.channels());
return result;
}
}
// 加载模型文件
std::vector<unsigned char> load_engine_file(const std::string &file_name)
{
std::vector<unsigned char> engine_data;
std::ifstream engine_file(file_name, std::ios::binary);
assert(engine_file.is_open() && "Unable to load engine file.");
engine_file.seekg(0, engine_file.end);
int length = engine_file.tellg();
engine_data.resize(length);
engine_file.seekg(0, engine_file.beg);
engine_file.read(reinterpret_cast<char *>(engine_data.data()), length);
return engine_data;
}
// YOLOv5推理器类
class YOLOv5Detector {
private:
std::unique_ptr<nvinfer1::IRuntime> runtime;
std::shared_ptr<nvinfer1::ICudaEngine> engine;
std::unique_ptr<nvinfer1::IExecutionContext> context;
std::unique_ptr<samplesCommon::BufferManager> buffers;
bool initialized = false;
public:
YOLOv5Detector(const std::string& engine_file, int frame_width, int frame_height) {
initialize(engine_file);
int img_size = frame_width * frame_height;
cuda_preprocess_init(img_size); // 申请cuda内存
}
void initialize(const std::string& engine_file) {
// ========== 1. 创建推理运行时runtime ==========
runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(sample::gLogger.getTRTLogger()));
if (!runtime) {
throw std::runtime_error("Failed to create TensorRT runtime");
}
// ========== 2. 反序列化生成engine ==========
auto plan = load_engine_file(engine_file);
engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan.data(), plan.size()));
if (!engine) {
throw std::runtime_error("Failed to deserialize engine");
}
// ========== 3. 创建执行上下文context ==========
context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());
if (!context) {
throw std::runtime_error("Failed to create execution context");
}
// ========== 4. 创建输入输出缓冲区 ==========
buffers = std::make_unique<samplesCommon::BufferManager>(engine);
initialized = true;
}
py::list detect(py::array_t<unsigned char>& input_image, int input_w=kInputW, int input_h=kInputH, float conf_thresh=kConfThresh, float nms_thresh=kNmsThresh) {
if (!initialized) {
throw std::runtime_error("Detector not initialized");
}
// 将numpy数组转换为cv::Mat
cv::Mat frame = numpy_to_mat(input_image);
if (frame.empty()) {
throw std::runtime_error("Invalid input image");
}
// CUDA预处理
process_input_gpu(frame, (float *)buffers->getDeviceBuffer(kInputTensorName), input_w, input_h);
// ========== 5. 执行推理 ==========
context->executeV2(buffers->getDeviceBindings().data());
// 拷贝回host
buffers->copyOutputToHost();
// 从buffer manager中获取模型输出
int32_t *num_det = (int32_t *)buffers->getHostBuffer(kOutNumDet);
int32_t *cls = (int32_t *)buffers->getHostBuffer(kOutDetCls);
float *conf = (float *)buffers->getHostBuffer(kOutDetScores);
float *bbox = (float *)buffers->getHostBuffer(kOutDetBBoxes);
// 执行nms(非极大值抑制)
std::vector<Detection> bboxs;
yolo_nms(bboxs, num_det, cls, conf, bbox, conf_thresh, nms_thresh);
// 返回检测结果
py::list result_list;
for (size_t j = 0; j < bboxs.size(); j++) {
cv::Rect r = get_rect(frame, bboxs[j].bbox, input_w, input_h);
py::dict detection;
detection["class_id"] = (int)bboxs[j].class_id;
detection["confidence"] = (float)bboxs[j].conf;
detection["bbox"] = py::cast(std::vector<int>{r.x, r.y, r.x + r.width, r.y + r.height});
result_list.append(detection);
}
return result_list;
}
};
// Python绑定代码
PYBIND11_MODULE(yolov5_trt, m) {
m.doc() = "YOLOv5 TensorRT Python bindings";
py::class_<YOLOv5Detector>(m, "YOLOv5Detector")
.def(py::init<const std::string&, int, int>(), "Initialize detector with engine file",
py::arg("engine_file"),
py::arg("frame_width"),
py::arg("frame_height"))
.def("detect", &YOLOv5Detector::detect, "Perform detection on input image",
py::arg("input_image"),
py::arg("input_w") = kInputW,
py::arg("input_h") = kInputH,
py::arg("conf_thresh") = kConfThresh,
py::arg("nms_thresh") = kNmsThresh);
}
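Since detect() declares keyword arguments with compiled-in defaults (kInputW, kInputH, kConfThresh, kNmsThresh), the binding can also be exercised on a single image without the full video pipeline. A minimal sketch; the image path is just an example, any BGR uint8 frame works:
import ctypes
import cv2
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt

img = cv2.imread("./media/sample0001.png")   # any BGR uint8 image
h, w = img.shape[:2]
detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", w, h)
# input_w/input_h/conf_thresh/nms_thresh fall back to the compiled-in defaults
for det in detector.detect(input_image=img):
    print(det["class_id"], det["confidence"], det["bbox"])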
In practice, running object detection on 720p video on a Jetson Orin Nano (8GB), the average frame rate holds steady at 120+ FPS, meeting the real-time requirements of industrial scenarios.
python yolov5_infer.py
[11/06/2025-15:23:26] [I] [TRT] Loaded engine size: 7 MiB
Deserialize yoloLayer plugin: YoloLayer
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +955, now: CPU 830, GPU 4470 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +149, now: CPU 913, GPU 4619 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7, now: CPU 0, GPU 7 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 913, GPU 4620 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 913, GPU 4623 (MiB)
[11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 18 (MiB)
Processed 1442 frames
Average FPS: 127.51
Min FPS: 75.75
Max FPS: 134.67
Concluding Remarks
Finally, we also provide Python bindings for the ByteTrack tracking algorithm, implemented with Pybind11 and extended beyond the original algorithm to carry class information for each tracked target. Building on this, the Jetson Orin Nano achieves real-time object detection and tracking at up to 83 FPS: ByteTrack-Pybind11: a high-performance real-time object tracking solution 🚀
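The exact tracking interface is documented in the linked project; purely as an illustration, the detection dictionaries returned above map naturally onto a tracker update loop. The bytetrack module name, BYTETracker class, and update() signature below are hypothetical placeholders, not the project's confirmed API:
# Hypothetical sketch only: module, class, and update() signature are placeholders;
# consult the ByteTrack-Pybind11 project for the real interface.
from build import bytetrack            # hypothetical import
tracker = bytetrack.BYTETracker()      # hypothetical constructor
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    detections = detector.detect(input_image=frame)  # [{'class_id', 'confidence', 'bbox'}, ...]
    tracks = tracker.update(detections)              # hypothetical: tracked targets with IDs and classes
    for t in tracks:
        print(t)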