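How to run ONNX model inference on the BPU of the RDK X5

Anan1 · March 26, 2026, 01:28

When I run ONNX model inference on the RDK X5's CPU, the program stutters. I'd like to run the ONNX model on the BPU instead. How do I do BPU inference for an ONNX model, and what steps do I need to follow with the toolchain?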

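地瓜小助手_bot · March 26, 2026, 01:28

Hi @Anan1, welcome to the D-Robotics (地瓜机器人) developer community!

Regarding your question on how to run ONNX models on the BPU of the RDK X5, here is the key conclusion:

An ONNX model cannot be fed to the BPU directly on the board.
You must first use the **Horizon AI toolchain (Docker environment) on a development machine (PC/WSL)** to quantize and compile the .onnx model into a .bin model, then copy the .bin file to the RDK X5 and run it through the BPU API.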

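Complete workflow (3 steps)

Step 1: Prepare the development machine

System requirements: Ubuntu 20.04, or WSL2 + Ubuntu 22.04
Install Docker and configure the NVIDIA Container Toolkit (needed only for GPU-accelerated quantization)
Get the RDK X5 toolchain image:
- Download the RDK OE package (e.g. horizon_x5_open_explorer_v1.2.8) from the Horizon developer platform
- Load the Docker image: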
```
docker load -i docker_openexplorer_ubuntu_20_x5_gpu_v1.2.8.tar.gz
```

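Step 2: Convert ONNX → BIN inside Docker

Start the Docker container (`AI_TOOLCHAIN_PATH` is the path of the unpacked OE package, `DATASET_PATH` the path of your dataset):

```
sudo docker run -it --rm --gpus all --shm-size=15g \
  -v "$AI_TOOLCHAIN_PATH":/open_explorer \
  -v "$DATASET_PATH":/data \
  openexplorer/ai_toolchain_ubuntu_20_x5_gpu:v1.2.8-py310
```

Check ONNX model compatibility:

```
hb_mapper checker --model-type onnx --march bayes-e --model /path/to/your_model.onnx
```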

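Configure the quantization YAML file (start from the official templates):
- Specify the input shape, data types, calibration dataset path, etc.
- Example (YOLOv5 detection): ptq_yamls/yolov5_detect_bayese_640x640_nv12.yaml

Run quantization and compilation:

```
hb_mapper makertbin --model-type onnx --config your_config.yaml
```

Output: a .bin model for the board, plus intermediate ONNX models, written to the working_dir set in the YAML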

Verify the model (optional, static performance estimate):
```
hb_perf your_model.bin
```

Step 3: Deploy and run inference on the board

Copy the model to the RDK X5:

```
scp your_model.bin user@rdk_x5_ip:/home/user/models/
```

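Board-side inference example (C++):

```
#include "dnn/hb_dnn.h"
#include "dnn/hb_sys.h"

// 1. Load the packed .bin file produced by hb_mapper
hbPackedDNNHandle_t packed_dnn_handle;
const char* model_path = "/home/user/models/your_model.bin";
hbDNNInitializeFromFiles(&packed_dnn_handle, &model_path, 1);

// 2. Get the handle of the (first) model inside the package
const char** model_name_list;
int32_t model_count = 0;
hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
hbDNNHandle_t dnn_handle;
hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);

// 3. Prepare the input tensor (format must match input_type_rt, e.g. NV12)
hbDNNTensor input_tensor;
// ... hbDNNGetInputTensorProperties + hbSysAllocCachedMem, then fill the data ...

// 4. Run inference: hbDNNInfer submits a task, hbDNNWaitTaskDone waits for it
hbDNNTensor output_tensors[MAX_OUTPUT_NUM];  // placeholder size; query it with hbDNNGetOutputCount
// ... allocate each output with hbDNNGetOutputTensorProperties + hbSysAllocCachedMem ...
hbDNNTensor* output = output_tensors;
hbDNNTaskHandle_t task_handle = nullptr;
hbDNNInferCtrlParam infer_ctrl_param;
HB_DNN_INITIALIZE_INFER_CTRL_PARAM(&infer_ctrl_param);
hbDNNInfer(&task_handle, &output, &input_tensor, dnn_handle, &infer_ctrl_param);
hbDNNWaitTaskDone(task_handle, 0);  // 0 = wait until the task finishes
hbDNNReleaseTask(task_handle);

// 5. Process the outputs (call hbSysFlushMem before reading them) ...
// 6. Free tensors with hbSysFreeMem, then call hbDNNRelease(packed_dnn_handle)
```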

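Python inference (using the `hobot_dnn` package that ships with the RDK system image):

```
from hobot_dnn import pyeasy_dnn as dnn

# Load the compiled .bin model (returns a list of models)
models = dnn.load("/home/user/models/your_model.bin")

# input_nv12: preprocessed input in the format set by input_type_rt (e.g. NV12)
outputs = models[0].forward(input_nv12)
result = outputs[0].buffer  # first output as a numpy array
```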
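When the model is compiled with `input_type_rt: "nv12"`, the board-side model expects NV12 input data. One common way to produce it, assumed here rather than taken from the reply above, is to convert a resized BGR image to planar I420 with OpenCV (`cv2.cvtColor(img, cv2.COLOR_BGR2YUV_I420)`) and then interleave the U and V planes. A minimal NumPy sketch of the interleaving step; `i420_to_nv12` is a hypothetical helper name:

```python
import numpy as np


def i420_to_nv12(i420: np.ndarray, height: int, width: int) -> np.ndarray:
    """Convert a planar I420 (YUV420p) buffer to NV12 (Y plane + interleaved UV).

    `i420` is the (height * 3 // 2, width) or flat uint8 array that
    cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420) returns. Height and width must be even.
    """
    if height % 2 or width % 2:
        raise ValueError("NV12 requires even height and width")
    flat = i420.reshape(-1)
    y_size = height * width
    uv_size = y_size // 4
    if flat.size != y_size + 2 * uv_size:
        raise ValueError("buffer size does not match height x width")
    u = flat[y_size:y_size + uv_size]
    v = flat[y_size + uv_size:]
    nv12 = np.empty_like(flat)
    nv12[:y_size] = flat[:y_size]  # Y plane is copied unchanged
    nv12[y_size::2] = u            # U samples go to the even positions of the UV plane
    nv12[y_size + 1::2] = v        # V samples go to the odd positions
    return nv12
```

With OpenCV available, usage would look like `nv12 = i420_to_nv12(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420), 640, 640)`, where `bgr` is already resized to the model input size.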

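Key notes

- No hb_mapper on the board: all conversion must be done in Docker on the development machine
- Calibration dataset: PTQ quantization needs a representative dataset (typically 100~500 images)
- Input format: the board-side input format is whatever `input_type_rt` is set to in the YAML (NV12 is common for camera input); keep board-side preprocessing consistent with the preprocessing used for calibration
- Performance: BPU inference is typically 10~50× faster than CPU inference, but the actual gain depends on the model structure (for example, how many operators fall back to the CPU)

If you run into specific problems during conversion (YAML configuration, unsupported operators, accuracy loss after quantization, etc.), feel free to post the error logs and model structure in the community and we will help you troubleshoot!

We'd also encourage you to write up how you solved this as a post, to help other developers~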

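Anan1 · March 26, 2026, 02:45

Is there a concrete YAML file example?

地瓜小助手_bot · March 26, 2026, 02:46

Hi @Anan1, yes! Here is a complete YAML configuration example for PTQ quantization of an ONNX model on the RDK X5; you can adapt it directly.

RDK X5 ONNX model quantization YAML example

Minimal configuration (for accuracy validation)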

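```
# ==================== Model parameters ====================
model_parameters:
  onnx_model: "your_model.onnx"          # your ONNX model file
  march: "bayes-e"                       # always bayes-e for RDK X5
  output_model_file_prefix: "your_model" # prefix of the output model name
  working_dir: "./model_output"          # output directory
  layer_out_dump: False                  # dump intermediate layer outputs (debugging only)

# ==================== Input parameters ====================
input_parameters:
  input_type_train: "bgr"                # input format used in training: bgr/rgb/yuv444/gray/featuremap
  input_layout_train: "NCHW"             # data layout used in training: NCHW or NHWC
  input_shape: "1x3x640x640"             # input shape; change it to match your model
  input_type_rt: "nv12"                  # input format on the board at runtime: nv12/rgb/bgr
  norm_type: "data_mean_and_scale"       # normalization applied inside the compiled model
  mean_value: "103.94 116.78 123.68"     # per-channel mean (BGR order); must match training
  scale_value: "0.017"                   # scale factor applied after mean subtraction

# ==================== Calibration parameters ====================
calibration_parameters:
  cal_data_dir: "./calibration_data"     # calibration data directory (about 100 samples)
  cal_data_type: "uint8"                 # dtype of the calibration files: uint8 for the preprocessing script below, float32 if saved as float32
  calibration_type: "default"            # calibration algorithm: default/kl/max/mix
  # preprocess_on: False                 # disable automatic preprocessing (manual preprocessing recommended)

# ==================== Compiler parameters ====================
compiler_parameters:
  compile_mode: "latency"                # optimization goal: latency or bandwidth
  optimize_level: "O3"                   # optimization level: O0~O3
  debug: False                           # enable debug information
  core_num: 1                            # number of BPU cores to use (1 on RDK X5)
```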
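With `norm_type: "data_mean_and_scale"`, the normalization `(x - mean_value) * scale_value` is built into the compiled model, so the board-side input should be raw 0~255 pixels in the `input_type_rt` format, not an already normalized float tensor. The NumPy sketch below reproduces that arithmetic so you can check the YAML values against your training preprocessing; the helper name `mean_and_scale` is hypothetical. For example, a model trained on pixels divided by 255 with no mean subtraction would instead use `norm_type: "data_scale"` with `scale_value: 0.003921568627451`.

```python
import numpy as np


def mean_and_scale(img_chw: np.ndarray, mean, scale) -> np.ndarray:
    """Reproduce norm_type "data_mean_and_scale": (x - mean) * scale per channel.

    img_chw: uint8 image in the training layout (C, H, W) and channel order
    (BGR here, matching input_type_train). A single scale value is broadcast
    to all channels.
    """
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    scale = np.asarray(scale, dtype=np.float32).reshape(-1, 1, 1)
    return (img_chw.astype(np.float32) - mean) * scale


# Values from the YAML above: mean_value "103.94 116.78 123.68", scale_value "0.017"
MEAN_BGR = [103.94, 116.78, 123.68]
SCALE = [0.017]
```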

Advanced configuration (multiple inputs / per-node settings)

If your model has multiple inputs, or you need to control where specific nodes run, use a configuration like this:

```
model_parameters:
  onnx_model: "multi_input_model.onnx"
  march: "bayes-e"
  output_model_file_prefix: "multi_model"
  working_dir: "./model_output"

  # Per-node settings: where a node runs and its input/output precision (performance/accuracy tuning)
  node_info:
    "/model.0/conv/Conv":
      "ON": "BPU"
      "InputType": "int16"
      "OutputType": "int16"
    "/model.2/Concat":
      "ON": "BPU"

input_parameters:
  # Multi-input model: separate the per-input values with semicolons
  input_name: "input1;input2"
  input_type_train: "bgr;featuremap"
  input_layout_train: "NCHW;NHWC"
  input_shape: "1x3x640x640;1x128x128x2"
  input_type_rt: "nv12;featuremap"
  norm_type: "data_mean_and_scale;no_preprocess"
  mean_value: "103.94 116.78 123.68;0"
  scale_value: "0.017;1.0"

calibration_parameters:
  cal_data_dir: "./cal_data_rgb;./cal_data_feat"   # one directory per input, in input_name order
  cal_data_type: "float32"
  calibration_type: "default"

compiler_parameters:
  compile_mode: "latency"
  optimize_level: "O3"
  debug: True                            # enable debug information (e.g. for performance analysis)
  jobs: 8                                # number of parallel compile jobs
```
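For the second input in this example (`featuremap`, `no_preprocess`, shape `1x128x128x2` in NHWC), the files in `./cal_data_feat` are raw tensors rather than images. A minimal sketch, assuming each calibration file is simply the tensor's raw float32 bytes in `input_shape` / `input_layout_train` order, matching `cal_data_type: "float32"`; the helper names and the `.raw` suffix are hypothetical:

```python
import os

import numpy as np

# Matches input_shape "1x128x128x2" with input_layout_train "NHWC" for input2
FEAT_SHAPE = (1, 128, 128, 2)


def save_featuremap_sample(feature: np.ndarray, out_dir: str, index: int) -> str:
    """Write one featuremap calibration sample as raw float32 bytes (no header)."""
    # reshape raises ValueError if the element count does not match FEAT_SHAPE
    data = np.asarray(feature, dtype=np.float32).reshape(FEAT_SHAPE)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{index:04d}.raw")
    np.ascontiguousarray(data).tofile(path)
    return path


def load_featuremap_sample(path: str) -> np.ndarray:
    """Read a sample back as float32 in FEAT_SHAPE, to sanity-check the file."""
    return np.fromfile(path, dtype=np.float32).reshape(FEAT_SHAPE)


# Hypothetical usage: `features` would come from your own pipeline
# for i, feat in enumerate(features[:100]):
#     save_featuremap_sample(feat, "./cal_data_feat", i)
```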

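Calibration data preparation

Option 1: manual preprocessing (recommended, higher accuracy)

```
# preprocess.py example: saves uint8 BGR calibration files in NCHW layout
import os

import cv2
import numpy as np

SIZE = 640  # must match input_shape in the YAML (1x3x640x640)


def preprocess_image(img_path, output_path):
    # Read the image; OpenCV already returns BGR channel order
    img = cv2.imread(img_path)
    if img is None:
        return False  # skip files that are not readable images

    # Resize keeping the aspect ratio so the short side becomes 640
    # (use the same resize/crop or letterbox as your board-side inference code)
    h, w = img.shape[:2]
    scale = SIZE / min(h, w)
    new_h, new_w = max(SIZE, round(h * scale)), max(SIZE, round(w * scale))
    img = cv2.resize(img, (new_w, new_h))

    # Center crop to 640x640
    start_x = (new_w - SIZE) // 2
    start_y = (new_h - SIZE) // 2
    img = img[start_y:start_y + SIZE, start_x:start_x + SIZE]

    # HWC -> CHW to match input_layout_train "NCHW", then save the raw uint8 bytes
    np.ascontiguousarray(img.transpose(2, 0, 1), dtype=np.uint8).tofile(output_path)
    return True


# Batch processing: the first 100 readable images in raw_images/
os.makedirs("calibration_data", exist_ok=True)
count = 0
for img_file in sorted(os.listdir("raw_images")):
    if count >= 100:
        break
    if preprocess_image(os.path.join("raw_images", img_file),
                        f"calibration_data/{count:04d}.bgr"):
        count += 1
```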
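Before running `hb_mapper makertbin`, a quick sanity check can catch calibration files with the wrong size (for example, an image that ended up cropped to 639 rows). A minimal sketch, assuming uint8 files that each hold one `1x3x640x640` tensor, as the script above writes; `check_calibration_file` and `check_calibration_dir` are hypothetical helpers. It can only verify the byte count, not whether the data is really in NCHW order:

```python
import os

import numpy as np

# Expected tensor for input_shape "1x3x640x640" with input_layout_train "NCHW"
EXPECTED_SHAPE = (1, 3, 640, 640)


def check_calibration_file(path: str, shape=EXPECTED_SHAPE, dtype=np.uint8) -> np.ndarray:
    """Load one raw calibration file and verify its byte size matches `shape`."""
    expected_bytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    actual_bytes = os.path.getsize(path)
    if actual_bytes != expected_bytes:
        raise ValueError(f"{path}: {actual_bytes} bytes, expected {expected_bytes}")
    return np.fromfile(path, dtype=dtype).reshape(shape)


def check_calibration_dir(cal_dir: str, **kwargs) -> int:
    """Check every file in `cal_dir` and return how many were checked."""
    names = sorted(os.listdir(cal_dir))
    for name in names:
        check_calibration_file(os.path.join(cal_dir, name), **kwargs)
    return len(names)
```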

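Option 2: automatic preprocessing (simpler, slightly lower accuracy)

Enable it in the YAML:

```
calibration_parameters:
  preprocess_on: True  # the toolchain reads the images and resizes them to the input size automatically
```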

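Conversion commands

```
# 1. Check the model
hb_mapper checker --model-type onnx --march bayes-e --model your_model.onnx

# 2. Run quantization and compilation
hb_mapper makertbin --model-type onnx --config your_config.yaml

# 3. Inspect the results
ls ./model_output/
# Expect your_model.bin plus intermediate ONNX models (e.g. your_model_quantized_model.onnx);
# the log hb_mapper_makertbin.log is written to the directory where you ran the command

# 4. Performance estimate (optional)
hb_perf ./model_output/your_model.bin
```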

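Key parameters

| Parameter | Description | Typical value |
| --- | --- | --- |
| `march` | BPU architecture | `bayes-e` for RDK X5 |
| `input_type_train` | Input format used in training | `bgr` / `rgb` / `yuv444` |
| `input_type_rt` | Input format on the board at runtime | `nv12` (recommended) / `rgb` |
| `calibration_type` | Calibration algorithm | `default` (searches automatically) |
| `cal_data_dir` | Calibration data directory | about 100 representative images |
| `optimize_level` | Compiler optimization level | `O3` (highest performance) |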

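FAQ

Q1: How many calibration images are needed?

About 100~200 images are recommended, taken from the training/validation set and covering the typical scenes.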
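If your training or validation set is large, one simple way to choose the calibration images is a seeded random sample, so the same set can be reused when you compare conversion settings. A minimal sketch using only the standard library; `sample_calibration_images` is a hypothetical helper. Plain random sampling only covers the typical scenes if the source set does, so for skewed data you may prefer to sample per scene or per class:

```python
import os
import random

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_calibration_images(image_dir: str, n: int = 100, seed: int = 0) -> list:
    """Randomly pick up to `n` image paths from `image_dir` (e.g. the val set).

    Sorting before sampling plus a fixed seed keeps the selection reproducible.
    """
    files = sorted(
        f for f in os.listdir(image_dir)
        if os.path.splitext(f)[1].lower() in IMAGE_EXTS
    )
    rng = random.Random(seed)
    picked = rng.sample(files, min(n, len(files)))
    return [os.path.join(image_dir, f) for f in picked]
```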

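Q2: What should I do if accuracy drops after quantization?

- Try `calibration_type: "mix"` or `"kl"`
- Use more, and more representative, calibration data
- Use `node_info` to run sensitive nodes in int16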
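To judge whether one of these changes actually helps, compare the float model's output with the quantized model's output for the same inputs; the `hb_mapper makertbin` log reports a per-node cosine similarity for the same purpose. A NumPy sketch of that metric, assuming you already have both outputs as float arrays in the same layout (for example float ONNX results from onnxruntime and dequantized board results); the function names are hypothetical:

```python
import numpy as np


def cosine_similarity(a, b, eps: float = 1e-12) -> float:
    """Cosine similarity between two outputs, flattened to 1-D float64 vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"size mismatch: {a.shape} vs {b.shape}")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / max(denom, eps))


def compare_outputs(float_outputs, quant_outputs) -> list:
    """Per-output cosine similarity; values close to 1.0 mean little quantization loss."""
    return [cosine_similarity(f, q) for f, q in zip(float_outputs, quant_outputs)]
```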

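Q3: Conversion fails with "Unsupported operator"?

- Check the model with `hb_mapper checker` first
- Find the unsupported operators and replace them, or move them into post-processing

Tell me your model type (detection/classification/segmentation) and input size, and I can help you tailor a more precise YAML configuration!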