Quantizing a Tinker Model on the RDK X5

I. Preparation

1. Install Docker

Install Docker for your operating system; refer to the official Docker installation guide.

2. Set up the D-Robotics algorithm toolchain

- Download the RDK OpenExplorer (OE) delivery package and the matching Docker image

- Unpack the OE package

tar -xvf horizon_x5_open_explorer_v1.2.8-py310_20240926.tar.gz

- Set the Docker mount paths and load the image

export version=v1.2.8
export ai_toolchain_package_path=/path/to/horizon_x5_open_explorer_v1.2.8-py310_20240926
export dataset_path=/path/to/dataset

docker load < docker_openexplorer_ubuntu_20_x5_gpu_v1.2.8.tar.gz

- Start and verify the image

sudo docker run -it --rm -v "$ai_toolchain_package_path":/open_explorer -v "$dataset_path":/data openexplorer/ai_toolchain_ubuntu_20_x5_gpu:v1.2.8-py310

hb_mapper  # if usage text is printed, the environment is set up correctly

3. Convert the model to ONNX and prepare the calibration dataset

files.zip (1.2 MB)

  • ONNX model: best.onnx
  • cal_data1 (shape: 1 x 39, for the "input" input)
  • cal_data2 (shape: 1 x 10 x 39, for the "obs_hist" input)
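The calibration data must be stored as raw float32 binaries, one file per sample, in one directory per model input. A minimal sketch of how such files can be produced (the directory names match the cal_data_dir used later; the per-file naming scheme and the random placeholder arrays are assumptions — in practice, dump real observations recorded from the policy):

```python
import os
import numpy as np

def dump_calibration_samples(samples, out_dir):
    """Write each float32 sample as a raw binary file, one file per sample."""
    os.makedirs(out_dir, exist_ok=True)
    for i, sample in enumerate(samples):
        sample.astype(np.float32).tofile(os.path.join(out_dir, f"sample_{i}.bin"))

# Placeholder data; replace with real observations covering typical scenarios.
obs = [np.random.randn(1, 39) for _ in range(173)]
obs_hist = [np.random.randn(1, 10, 39) for _ in range(173)]
dump_calibration_samples(obs, "./cal_data1_bin")
dump_calibration_samples(obs_hist, "./cal_data2_bin")
```

Note that `tofile` stores no shape or dtype metadata, which is why the yaml must declare `cal_data_type: 'float32'` and the input shapes explicitly.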

II. PTQ Model Quantization

1. Write the model-conversion config file

Reference link (source of the quantization YAML template)
YAML parameter reference

# Copyright (c) 2020 D-Robotics. All Rights Reserved.

# Model conversion parameters
model_parameters:
  # Required
  # Path to the floating-point ONNX model, e.g. onnx_model: './horizon_ultra_onnx.onnx'
  onnx_model: ''
  march: "bayes-e"
  layer_out_dump: False
  working_dir: 'model_output'
  output_model_file_prefix: 'horizon_x5'

# Model input parameters
input_parameters:
  input_name: ""
  input_shape: ''
  input_type_rt: 'nv12'
  input_layout_rt: ''

  # Required
  # Data type used when training the original float model; one of rgb/bgr/gray/featuremap/yuv444, e.g. input_type_train: 'bgr'
  input_type_train: ''

  # Required
  # Data layout used when training the original float model; NHWC or NCHW, e.g. input_layout_train: 'NHWC'
  input_layout_train: ''
  #input_batch: 1

  # Required
  # Preprocessing applied during training; one of no_preprocess/data_mean/data_scale/data_mean_and_scale
  # no_preprocess: no operation; leave both mean_value and scale_value unset
  # data_mean: subtract the per-channel mean; set mean_value and comment out scale_value
  # data_scale: multiply pixels by the scale factor; set scale_value and comment out mean_value
  # data_mean_and_scale: subtract the mean, then multiply by the scale; set both mean_value and scale_value
  norm_type: ''

  # Required
  # Mean to subtract from the image; per-channel values must be space-separated
  # e.g. mean_value: 128.0  or  mean_value: 111.0 109.0 118.0
  mean_value: 

  # Required
  # Preprocessing scale factor; per-channel values must be space-separated; scale = 1/std
  # e.g. scale_value: 0.0078125  or  scale_value: 0.0078125 0.001215 0.003680
  scale_value: 

# Quantization parameters
calibration_parameters:
  # Required
  # Directory of calibration reference images (Jpeg, Bmp, etc.). Typically pick ~100 images from the
  # test set that cover typical scenes; avoid atypical images such as overexposed, saturated, blurred,
  # all-black, or all-white ones.
  # Configure to match the folder path in the 02_preprocess.sh script, e.g. cal_data_dir: './calibration_data_yuv_f32'
  cal_data_dir: ''

  cal_data_type: 'float32'
  calibration_type: 'default'
  # max_percentile: 0.99996

# Compiler parameters
compiler_parameters:
  compile_mode: 'latency'
  debug: False
  # core_num: 2
  optimize_level: 'O3'
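The norm_type options above reduce to simple per-pixel arithmetic. A minimal sketch of their semantics, mirroring the examples in the template comments (values are illustrative):

```python
import numpy as np

def normalize(x, norm_type, mean=None, scale=None):
    """Mirror the norm_type options: subtract mean and/or multiply by scale (= 1/std)."""
    if norm_type == "no_preprocess":
        return x
    if norm_type == "data_mean":
        return x - mean
    if norm_type == "data_scale":
        return x * scale
    if norm_type == "data_mean_and_scale":
        return (x - mean) * scale
    raise ValueError(f"unknown norm_type: {norm_type}")

# An NHWC image with per-channel means, as in the template's examples.
img = np.full((1, 4, 4, 3), 128.0)
out = normalize(img, "data_mean_and_scale",
                mean=np.array([111.0, 109.0, 118.0]),
                scale=0.0078125)
```

For this model norm_type stays at no_preprocess, since the inputs are feature maps rather than images.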

Enter the target directory and create the config file:

cd /path/to/xxx
touch xxx.yaml
nano xxx.yaml

Paste the template above into the yaml file and fill in the required parameters as needed; the input-related parameters can be read off from a visualization of the model structure. (A concrete example configuration follows.)

calibration_parameters:
  cal_data_dir: './cal_data1_bin;./cal_data2_bin'
  cal_data_type: 'float32;float32'
  calibration_type: 'default'
  # max_percentile: 0.99999
  optimization: set_all_nodes_int16
  per_channel: True
compiler_parameters:
  compile_mode: latency
  optimize_level: O3
input_parameters:
  input_layout_rt: NCHW;NCHW
  input_layout_train: NCHW;NCHW
  input_name: input;obs_hist.1
  input_shape: 1x39;1x10x39
  input_type_rt: featuremap;featuremap
  input_type_train: featuremap;featuremap
model_parameters:
  march: bayes-e
  onnx_model: best.onnx
  output_model_file_prefix: sup
  working_dir: ./model_output_sup

Note: for multi-input models, the parameters must be written in the following form, with each input's values separated by semicolons and strictly aligned column-by-column across all parameters:

 input_name: 'input;obs_hist.1'
 input_shape: '1x39;1x10x39'
 input_type_rt: 'featuremap;featuremap'
 input_layout_rt: 'NCHW;NCHW'
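The column-alignment requirement can be checked mechanically. `per_input_params` below is a hypothetical helper (not part of the toolchain) that splits the semicolon-separated fields and regroups them into one dict per input:

```python
def per_input_params(cfg):
    """Split semicolon-separated multi-input fields and regroup them per input."""
    split = {key: value.split(";") for key, value in cfg.items()}
    counts = {len(v) for v in split.values()}
    # Every field must list the same number of inputs, in the same order.
    assert len(counts) == 1, f"inconsistent input counts: {counts}"
    return [dict(zip(split, column)) for column in zip(*split.values())]

inputs = per_input_params({
    "input_name": "input;obs_hist.1",
    "input_shape": "1x39;1x10x39",
    "input_type_rt": "featuremap;featuremap",
    "input_layout_rt": "NCHW;NCHW",
})
```

Here `inputs[0]` collects every parameter for "input" and `inputs[1]` those for "obs_hist.1", making misaligned columns easy to spot before running hb_mapper.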

Next, run the conversion command:

cd /path/to/          # the directory containing xxx.yaml
hb_mapper makertbin --model-type onnx --config xxx.yaml

If the terminal shows no errors, the conversion succeeded. During conversion the tool prints the FPS, the per-operator cosine similarities, the overall model cosine similarity, and related information, and also saves the same output to hb_mapper_makertbin.log in the current directory. A complete log is included below for reference.

2025-10-11 17:45:17,691 file: model_builder.py func: model_builder line No: 35 Start to Horizon NN Model Convert.
2025-10-11 17:45:17,696 file: model_debugger.py func: model_debugger line No: 67 Loading horizon_nn debug methods:set()
2025-10-11 17:45:17,696 file: quantization_config.py func: quantization_config line No: 305 The activation calibration parameters:
    calibration_type:     ['max', 'kl']
    per_channel:          [True, False]
    max_percentile:       [0.99995, 1.0]
    asymmetric:           [True, False]
The modelwise search parameters:
    similarity:           0.995
    metric:               cosine-similarity
All nodes in the model are set to datatype: int16
2025-10-11 17:45:17,696 file: model_builder.py func: model_builder line No: 197 The specified model compilation architecture: bayes-e.
2025-10-11 17:45:17,696 file: model_builder.py func: model_builder line No: 207 The specified model compilation optimization parameters: [].
2025-10-11 17:45:17,696 file: model_builder.py func: model_builder line No: 35 Start to prepare the onnx model.
2025-10-11 17:45:17,702 file: prepare.py func: prepare line No: 106 Input ONNX Model Information:
ONNX IR version:          6
Opset version:            ['ai.onnx v11', 'horizon v1']
Producer:                 pytorch v2.4.1
Domain:                   None
Version:                  None
Graph input:
    input:                shape=[1, 39], dtype=FLOAT32
    obs_hist.1:           shape=[1, 10, 39], dtype=FLOAT32
Graph output:
    output:               shape=[1, 10], dtype=FLOAT32
2025-10-11 17:45:17,733 file: model_builder.py func: model_builder line No: 38 End to prepare the onnx model.
2025-10-11 17:45:17,739 file: model_builder.py func: model_builder line No: 265 Saving model to: sup_original_float_model.onnx.
2025-10-11 17:45:17,739 file: model_builder.py func: model_builder line No: 35 Start to optimize the onnx model.
2025-10-11 17:45:17,786 file: constant_folding.py func: constant_folding line No: 66 Summary info for constant_folding:
2025-10-11 17:45:17,786 file: constant_folding.py func: constant_folding line No: 67   After constant_folding, the number of nodes has changed from 96 to 71.
2025-10-11 17:45:17,786 file: constant_folding.py func: constant_folding line No: 71   After constant_folding, the number of parameters has changed from 409592 to 409592.
2025-10-11 17:45:17,786 file: constant_folding.py func: constant_folding line No: 76 Detailed info for constant_folding:
2025-10-11 17:45:17,786 file: constant_folding.py func: constant_folding line No: 88   After folding node (op_name: /actor/Flatten, op_type: Flatten), the number of increased parameters is 0.
  After folding node (op_name: /actor/Flatten_1, op_type: Flatten), the number of increased parameters is 0.
  After folding node (op_name: /actor/Flatten_2, op_type: Flatten), the number of increased parameters is 0.
2025-10-11 17:45:17,825 file: model_builder.py func: model_builder line No: 38 End to optimize the onnx model.
2025-10-11 17:45:17,832 file: model_builder.py func: model_builder line No: 265 Saving model to: sup_optimized_float_model.onnx.
2025-10-11 17:45:17,832 file: model_builder.py func: model_builder line No: 35 Start to calibrate the model.
2025-10-11 17:45:18,038 file: tool_utils.py func: tool_utils line No: 321 The input0 of Node(name:/actor/MatMul_8, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,043 file: calibration_data_set.py func: calibration_data_set line No: 111 input name: input,  number_of_samples: 173
2025-10-11 17:45:18,043 file: tool_utils.py func: tool_utils line No: 321 The input0 of Node(name:/actor/MatMul_5, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,044 file: calibration_data_set.py func: calibration_data_set line No: 111 input name: obs_hist.1,  number_of_samples: 173
2025-10-11 17:45:18,044 file: tool_utils.py func: tool_utils line No: 321 The input0 of Node(name:/actor/MatMul_2, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,045 file: calibration_data_set.py func: calibration_data_set line No: 123 There are 173 samples in the data set.
2025-10-11 17:45:18,045 file: infer_thresholds.py func: infer_thresholds line No: 84 Run calibration model with modelwise search method.
2025-10-11 17:45:18,062 file: tool_utils.py func: tool_utils line No: 321 The input1 of Node(name:/actor/MatMul_2, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,088 file: tool_utils.py func: tool_utils line No: 321 The input1 of Node(name:/actor/MatMul_5, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,136 file: tool_utils.py func: tool_utils line No: 321 The input1 of Node(name:/actor/MatMul_8, type:MatMul) does not support data type: int16
2025-10-11 17:45:18,147 file: base.py func: base line No: 138 Calibration using batch 8
2025-10-11 17:45:18,152 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/Unsqueeze) is int16, then requantized to int8
2025-10-11 17:45:18,212 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/MatMul_transpose_0_reshape) is int16, then requantized to int8
2025-10-11 17:45:18,214 file: ort.py func: ort line No: 207 Reset batch_size=1 and execute forward again...
2025-10-11 17:45:18,214 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/Unsqueeze_2) is int16, then requantized to int8
2025-10-11 17:45:18,283 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/MatMul_3_transpose_0_reshape) is int16, then requantized to int8
2025-10-11 17:45:18,284 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/Unsqueeze_4) is int16, then requantized to int8
2025-10-11 17:45:18,284 file: tool_utils.py func: tool_utils line No: 321 The output of Node(name:/actor/MatMul_6_transpose_0_reshape) is int16, then requantized to int8
2025-10-11 17:59:00,088 file: modelwise_search.py func: modelwise_search line No: 75 Select max-percentile:percentile=0.99995 method.
2025-10-11 17:59:01,501 file: model_builder.py func: model_builder line No: 38 End to calibrate the model.
2025-10-11 17:59:01,725 file: model_builder.py func: model_builder line No: 265 Saving model to: sup_calibrated_model.onnx.
2025-10-11 17:59:01,725 file: model_builder.py func: model_builder line No: 35 Start to quantize the model.
2025-10-11 17:59:03,636 file: constant_folding.py func: constant_folding line No: 66 Summary info for constant_folding:
2025-10-11 17:59:03,636 file: constant_folding.py func: constant_folding line No: 67   After constant_folding, the number of nodes has changed from 88 to 88.
2025-10-11 17:59:03,636 file: constant_folding.py func: constant_folding line No: 71   After constant_folding, the number of parameters has changed from 468136 to 468136.
2025-10-11 17:59:03,636 file: constant_folding.py func: constant_folding line No: 76 Detailed info for constant_folding:
2025-10-11 17:59:03,636 file: constant_folding.py func: constant_folding line No: 88 
2025-10-11 17:59:03,948 file: model_builder.py func: model_builder line No: 38 End to quantize the model.
2025-10-11 17:59:04,262 file: model_builder.py func: model_builder line No: 265 Saving model to: sup_quantized_model.onnx.
2025-10-11 17:59:04,262 file: model_builder.py func: model_builder line No: 35 Start to compile the model with march bayes-e.
2025-10-11 17:59:05,868 file: hybrid_build.py func: hybrid_build line No: 111 Compile submodel: main_graph_subgraph_0
2025-10-11 17:59:05,888 file: hbdk_cc.py func: hbdk_cc line No: 126 hbdk-cc parameters:['--O3', '--debug', '--core-num', '1', '--fast', '--input-layout', 'NHWC', '--output-layout', 'NHWC', '--input-source', 'ddr,ddr']
2025-10-11 17:59:05,888 file: hbdk_cc.py func: hbdk_cc line No: 127 hbdk-cc command used:hbdk-cc -f hbir -m /tmp/tmpwc6dher3/main_graph_subgraph_0.hbir -o /tmp/tmpwc6dher3/main_graph_subgraph_0.hbm --march bayes-e --progressbar --O3 --debug --core-num 1 --fast --input-layout NHWC --output-layout NHWC --input-source ddr,ddr
2025-10-11 17:59:27,179 file: tool_utils.py func: tool_utils line No: 326 consumed time 21.2712
2025-10-11 17:59:27,238 file: tool_utils.py func: tool_utils line No: 326 FPS=738.86, latency = 1353.4 us, DDR = 8020704 bytes   (see main_graph_subgraph_0.html)
2025-10-11 17:59:27,305 file: model_builder.py func: model_builder line No: 38 End to compile the model with march bayes-e.
2025-10-11 17:59:29,190 file: print_info_dict.py func: print_info_dict line No: 72 The main quantized node information:
===================================================================================================================================
Node                                                ON   Subgraph  Type                    Cosine Similarity  Threshold  DataType  
-----------------------------------------------------------------------------------------------------------------------------------
/Slice                                              BPU  id(0)     Slice                   1.000000           1.41351    int16     
/Unsqueeze                                          BPU  id(0)     Reshape                 1.000000           3.30783    int16     
/Unsqueeze_output_0_calibrated_Requantize           BPU  id(0)     HzRequantize            --                 --         int16     
/Concat                                             BPU  id(0)     Concat                  1.000000           1.41351    int16     
/Slice_1                                            BPU  id(0)     Slice                   1.000000           1.41351    int16     
/Reshape                                            BPU  id(0)     Reshape                 1.000000           1.41351    int16     
/mlp_encoder/0/Gemm                                 BPU  id(0)     HzSQuantizedConv        0.999983           1.41351    int16     
/mlp_encoder//Elu                                   BPU  id(0)     HzLut2Layer             0.999984           2.22292    int16     
/mlp_encoder/2/Gemm                                 BPU  id(0)     HzSQuantizedConv        0.999997           2.09289    int16     
/mlp_encoder/Elu                                    BPU  id(0)     HzLut2Layer             0.999996           10.168     int16     
/mlp_encoder/4/ReduceMean                           BPU  id(0)     HzSQuantizedReduceMean  1.000000           8.73395    int16     
/mlp_encoder/4/Sub                                  BPU  id(0)     HzSElementwiseSub       0.999996           8.73395    int16     
/mlp_encoder/4/Pow                                  BPU  id(0)     HzLut2Layer             0.999994           8.19725    int16     
/mlp_encoder/4/ReduceMean_1                         BPU  id(0)     HzSQuantizedReduceMean  1.000000           67.195     int16     
/mlp_encoder/4/Div_reciprocal                       BPU  id(0)     HzLut2Layer             1.000000           4.9403     int16     
/mlp_encoder/4/Div_mul                              BPU  id(0)     HzSElementwiseMul       0.999997           8.19725    int16     
/mlp_encoder/5/Gemm                                 BPU  id(0)     HzSQuantizedConv        0.999995           5.55314    int16     
/mlp_encoder/Elu_1                                  BPU  id(0)     HzLut2Layer             0.999992           13.3564    int16     
/mlp_encoder/7/ReduceMean                           BPU  id(0)     HzSQuantizedReduceMean  1.000000           9.16719    int16     
/mlp_encoder/7/Sub                                  BPU  id(0)     HzSElementwiseSub       0.999992           9.16719    int16     
/mlp_encoder/7/Pow                                  BPU  id(0)     HzLut2Layer             0.999992           8.24024    int16     
/mlp_encoder/7/ReduceMean_1                         BPU  id(0)     HzSQuantizedReduceMean  1.000000           67.9016    int16     
/mlp_encoder/7/Div_reciprocal                       BPU  id(0)     HzLut2Layer             1.000000           4.58363    int16     
/mlp_encoder/7/Div_mul                              BPU  id(0)     HzSElementwiseMul       0.999992           8.24024    int16     
/mlp_encoder/8/Gemm                                 BPU  id(0)     HzSQuantizedConv        0.999987           4.93075    int16     
/actor/Concat                                       BPU  id(0)     Concat                  0.999987           3.30783    int16     
/actor/gate/0/Gemm_pre_reshape                      BPU  id(0)     Reshape                 0.999987           3.30783    int16     
/actor/gate/0/Gemm                                  BPU  id(0)     HzSQuantizedConv        0.999962           3.30783    int16     
/actor/gate/1/Elu                                   BPU  id(0)     HzLut2Layer             0.999962           1.55307    int16     
/actor/gate/2/Gemm                                  BPU  id(0)     HzSQuantizedConv        0.999979           1.52875    int16     
/actor/gate/3/Elu                                   BPU  id(0)     HzLut2Layer             0.999977           2.0178     int16     
/actor/gate/4/Gemm                                  BPU  id(0)     HzSQuantizedConv        0.999989           1.95905    int16     
/actor/Softmax_reducemax_FROM_QUANTIZED_SOFTMAX     BPU  id(0)     HzQuantizedReduceMax    1.000000           2.79715    int16     
/actor/Softmax_sub_FROM_QUANTIZED_SOFTMAX           BPU  id(0)     HzSElementwiseSub       0.999997           2.79715    int16     
/actor/Softmax_exp_FROM_QUANTIZED_SOFTMAX           BPU  id(0)     HzLut2Layer             0.999998           11.0903    int16     
/actor/Softmax_reducesum_FROM_QUANTIZED_SOFTMAX     BPU  id(0)     HzSQuantizedReduceSum   1.000000           1.0        int16     
/actor/Softmax_reciprocal_FROM_QUANTIZED_SOFTMAX    BPU  id(0)     HzLut2Layer             1.000000           3.53229    int16     
/actor/Softmax_mul_FROM_QUANTIZED_SOFTMAX           BPU  id(0)     HzSElementwiseMul       0.999998           1.0        int16     
/actor/MatMul_reshape_input                         BPU  id(0)     Reshape                 0.999998           0.920062   int16     
/actor/MatMul                                       BPU  id(0)     HzSQuantizedConv        0.999996           0.920062   int16     
variable_227_Requantize                             BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Unsqueeze                                    BPU  id(0)     Reshape                 0.999987           3.30783    int16     
/actor/MatMul_1                                     BPU  id(0)     HzSQuantizedConv        0.999991           0.920062   int16     
variable_228_Requantize                             BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Unsqueeze_output_0_calibrated_Requantize     BPU  id(0)     HzRequantize            --                 --         int16     
/actor/MatMul_2                                     BPU  id(0)     HzSQuantizedMatmul      0.999709           3.30783    int8      
/actor/Mul                                          BPU  id(0)     HzSElementwiseMul       0.999709           0.791948   int16     
/actor/Add                                          BPU  id(0)     HzSElementwiseAdd       0.999719           0.791948   int16     
/actor/Elu                                          BPU  id(0)     HzLut2Layer             0.999726           0.791326   int16     
/actor/Elu/actor/Elu_output_0_Reshape_0             BPU  id(0)     Reshape                 0.999726           3.28868    int16     
/actor/MatMul_3                                     BPU  id(0)     HzSQuantizedConv        0.999994           0.920062   int16     
variable_229_Requantize                             BPU  id(0)     HzRequantize            --                 --         int16     
/mlp_encoder/8/Gemm_output_0_calibrated_Requantize  BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Elu_output_0_calibrated_Requantize           BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Concat_1                                     BPU  id(0)     Concat                  0.999852           3.30783    int8      
/actor/Unsqueeze_2                                  BPU  id(0)     Reshape                 0.999852           3.28868    int8      
/actor/MatMul_4                                     BPU  id(0)     HzSQuantizedConv        0.999994           0.920062   int16     
variable_225_Requantize                             BPU  id(0)     HzRequantize            --                 --         int16     
/actor/MatMul_5                                     BPU  id(0)     HzSQuantizedMatmul      0.999979           3.28868    int8      
/actor/Mul_2                                        BPU  id(0)     HzSElementwiseMul       0.999979           2.44794    int16     
/actor/Add_1                                        BPU  id(0)     HzSElementwiseAdd       0.999980           2.44794    int16     
/actor/Elu_1                                        BPU  id(0)     HzLut2Layer             0.999982           2.46283    int16     
/actor/Elu_1/actor/Elu_1_output_0_Reshape_0         BPU  id(0)     Reshape                 0.999982           3.28868    int16     
/actor/MatMul_6                                     BPU  id(0)     HzSQuantizedConv        0.999994           0.920062   int16     
variable_226_Requantize                             BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Elu_1_output_0_calibrated_Requantize         BPU  id(0)     HzRequantize            --                 --         int16     
/actor/Concat_2                                     BPU  id(0)     Concat                  0.999909           3.30783    int8      
/actor/Unsqueeze_4                                  BPU  id(0)     Reshape                 0.999909           3.28868    int8      
/actor/MatMul_7                                     BPU  id(0)     HzSQuantizedConv        0.999996           0.920062   int16     
/actor/MatMul_8                                     BPU  id(0)     HzSQuantizedMatmul      0.999995           3.28868    int8      
/actor/Mul_4                                        BPU  id(0)     HzSElementwiseMul       0.999995           4.41123    int16     
/actor/Add_2                                        BPU  id(0)     HzSElementwiseAdd       0.999995           4.41123    int16     
/actor/Squeeze_2                                    BPU  id(0)     Reshape                 0.999995           4.43782    int16
2025-10-11 17:59:29,190 file: print_info_dict.py func: print_info_dict line No: 72 The quantized model output:
=============================================================================
Output      Cosine Similarity  L1 Distance  L2 Distance  Chebyshev Distance  
-----------------------------------------------------------------------------
output      0.999995           0.003761     0.001519     0.009816
2025-10-11 17:59:29,195 file: model_builder.py func: model_builder line No: 38 End to Horizon NN Model Convert.
2025-10-11 17:59:29,203 file: hb_mapper_makertbin.py func: hb_mapper_makertbin line No: 601 start convert to *.bin file....
2025-10-11 17:59:29,224 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4326 ONNX model output num : 1
2025-10-11 17:59:29,228 file: layout_util.py func: layout_util line No: 15 set_featuremap_layout start
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4060 model_deps_info: {'hb_mapper_version': '1.24.3', 'hbdk_version': '3.49.15', 'hbdk_runtime_version': ' 3.15.55.0', 'horizon_nn_version': '1.1.0', 'onnx_model': '/open_explorer/models/duck/best.onnx', 'march': 'bayes-e', 'layer_out_dump': False, 'log_level': 'DEBUG', 'working_dir': '/open_explorer/models/duck/model_output_sup', 'model_prefix': 'sup', 'input_names': ['obs_hist.1', 'input'], 'input_type_rt': ['featuremap', 'featuremap'], 'input_space_and_range': ['regular', 'regular'], 'input_type_train': ['featuremap', 'featuremap'], 'input_layout_rt': ['NCHW', 'NCHW'], 'input_layout_train': ['NCHW', 'NCHW'], 'norm_type': ['no_preprocess', 'no_preprocess'], 'scale_value': ['', ''], 'mean_value': ['', ''], 'input_shape': ['1x10x39', '1x39'], 'input_batch': [], 'cal_dir': ['/open_explorer/models/duck/cal_data2_bin', '/open_explorer/models/duck/cal_data1_bin'], 'cal_data_type': ['float32', 'float32'], 'preprocess_on': False, 'calibration_type': 'default', 'per_channel': 'True', 'optimization': ['set_all_nodes_int16'], 'hbdk_params': {'hbdk_pass_through_params': '--O3 --debug --core-num 1 --fast ', 'input-source': {'input': 'ddr', 'obs_hist.1': 'ddr', '_default_value': 'ddr'}}, 'debug': True, 'compile_mode': 'latency'}
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4183 ############# model deps info #############
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4184 hb_mapper version   : 1.24.3
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4187 hbdk version        : 3.49.15
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4189 hbdk runtime version: 3.15.55.0
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4192 horizon_nn version  : 1.1.0
2025-10-11 17:59:29,228 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4196 ############# model_parameters info #############
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4202 onnx_model          : /open_explorer/models/duck/best.onnx
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4203 BPU march           : bayes-e
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4204 layer_out_dump      : False
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4205 log_level           : DEBUG
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4206 working dir         : /open_explorer/models/duck/model_output_sup
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4207 output_model_file_prefix: sup
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4228 ############# input_parameters info #############
2025-10-11 17:59:29,229 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4246 ------------------------------------------
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4248 ---------input info : obs_hist.1 ---------
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4249 input_name          : obs_hist.1
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4250 input_type_rt       : featuremap
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4252 input_space&range   : regular
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4254 input_layout_rt     : NCHW
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4255 input_type_train    : featuremap
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4256 input_layout_train  : NCHW
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4257 norm_type           : no_preprocess
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4258 input_shape         : 1x10x39
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4268 cal_data_dir        : /open_explorer/models/duck/cal_data2_bin
2025-10-11 17:59:29,230 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4270 cal_data_type       : float32
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4271 ---------input info : obs_hist.1 end -------
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4248 ---------input info : input ---------
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4249 input_name          : input
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4250 input_type_rt       : featuremap
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4252 input_space&range   : regular
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4254 input_layout_rt     : NCHW
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4255 input_type_train    : featuremap
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4256 input_layout_train  : NCHW
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4257 norm_type           : no_preprocess
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4258 input_shape         : 1x39
2025-10-11 17:59:29,231 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4268 cal_data_dir        : /open_explorer/models/duck/cal_data1_bin
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4270 cal_data_type       : float32
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4271 ---------input info : input end -------
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4272 ------------------------------------------
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4274 ############# calibration_parameters info #############
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4275 preprocess_on       : False
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4276 calibration_type:   : default
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4278 optimization        : set_all_nodes_int16;
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4284 per_channel         : True
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4303 ############# compiler_parameters info #############
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4305 debug               : True
2025-10-11 17:59:29,232 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4307 compile_mode        : latency
2025-10-11 17:59:29,233 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4310 hbdk_pass_through_params: --O3 --debug --core-num 1 --fast
2025-10-11 17:59:29,233 file: onnx2horizonrt.py func: onnx2horizonrt line No: 4310 input-source        : {'input': 'ddr', 'obs_hist.1': 'ddr', '_default_value': 'ddr'}
2025-10-11 17:59:29,236 file: hb_mapper_makertbin.py func: hb_mapper_makertbin line No: 783 Convert to runtime bin file successfully!
2025-10-11 17:59:29,236 file: hb_mapper_makertbin.py func: hb_mapper_makertbin line No: 784 End Model Convert

After the conversion command succeeds, the output folder (working_dir) is created in the current directory. See the reference link for an explanation of the conversion artifacts.
This completes the model quantization.
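The output-comparison table in the log reports cosine similarity and L1/L2/Chebyshev distances between the float and quantized model outputs. These can be reproduced by running both models on the same inputs; a sketch of the metrics under common definitions (hb_mapper's exact normalization of the L1/L2 values is not documented here, and the arrays below are placeholders):

```python
import numpy as np

def compare_outputs(ref, quant):
    """Similarity/distance metrics in the style of the hb_mapper report."""
    ref, quant = ref.ravel(), quant.ravel()
    cos = ref @ quant / (np.linalg.norm(ref) * np.linalg.norm(quant))
    diff = np.abs(ref - quant)
    return {
        "cosine_similarity": cos,
        "l1": diff.mean(),                    # mean absolute error
        "l2": np.sqrt((diff ** 2).mean()),    # root mean squared error
        "chebyshev": diff.max(),              # worst-case elementwise error
    }

float_out = np.array([0.1, -0.5, 0.8, 0.3])
quant_out = float_out + 0.001  # stand-in for the quantized model's output
metrics = compare_outputs(float_out, quant_out)
```

A cosine similarity near 1.0 with small distances, as in the log's output row, indicates the quantized model tracks the float model closely.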

III. PTQ Weight-Splitting Tuning

Reference link (PTQ weight-splitting tuning)

1. Find the yaml combination with the best model accuracy

Tune the quantization strategy (calibration_type values such as max, plus max_percentile, per_channel, and so on); set_all_nodes_int16 usually needs to be enabled.

calibration_type  max_percentile  per_channel  advanced_parameters.set_all_nodes_int16  Cosine Similarity
default           -               -            -                                        0.997538
max               0.99996         True         True                                     0.999967
mix               -               True         True                                     0.999988
default           -               True         -                                        0.999995

2. Run an accuracy debug on the weights

Following the accuracy-debug chapter of the manual, run an accuracy debug on the weights and record the full node names of the conv operators whose similarity falls below 0.9999 in the debug results.

Paste the script below into a file and adapt the main section as needed. Note that it takes the xxx_optimized_float_model.onnx file produced by the PTQ quantization step, together with the node names obtained in the accuracy-debug step:

import numpy as np  
from copy import deepcopy

# horizon_nn 1.1.0
from horizon_nn.common import constant_folding
from horizon_nn.ir import load_model, save_model

# newer horizon_nn versions
# from hmct.common import constant_folding
# from hmct.ir import load_model, save_model

# recent patches may also have renamed it to this
# from hmct.common import ConstantFolding
# from hmct.ir import load_model, save_model

def split_conv_nodes(model, conv_names):
    for conv_name in conv_names:
        conv_node = model.graph.node_mappings[conv_name]
        before_node = conv_node.inputs[0].src_op
        conv_weight_value = deepcopy(conv_node.inputs[1].value)
        conv_weight_max = abs(conv_weight_value).max(axis=(1,2,3))
        moded = (conv_weight_max / 127)[:, np.newaxis, np.newaxis, np.newaxis] + 1e-10
        conv_weight_high = np.floor(np.clip(conv_weight_value / moded + 1e-5, -127, 127)) * moded
        conv_weight_low = conv_weight_value - conv_weight_high
        conv_bias_value = conv_node.inputs[2].value if len(conv_node.inputs) == 3 else np.zeros(conv_weight_value.shape[0], np.float32)
        conv1_weight_var = model.graph.create_variable(
            is_param=True,
            value=conv_weight_high,
        )
        conv1_bias_var = conv_node.inputs[2] if len(conv_node.inputs) == 3 else model.graph.create_variable(
            is_param=True,
            value=np.zeros_like(conv_bias_value, np.float32),
        )
        conv1_node = model.graph.create_node(
            op_type="Conv",
            name = conv_node.name + "_split0",
            attributes=conv_node.attributes,
            inputs=[conv_node.inputs[0], conv1_weight_var, conv1_bias_var],
            num_outputs=1)
        if before_node is not None:
            conv1_node.insert_after(before_node)
        else:
            conv1_node.prepend_on()
        conv2_weight_var = model.graph.create_variable(
            is_param=True,
            value=conv_weight_low,
        )
        conv2_bias_var = model.graph.create_variable(
            is_param=True,
            value=np.zeros_like(conv_bias_value, np.float32),
        )
        conv2_node = model.graph.create_node(
            op_type="Conv",
            name = conv_node.name + "_split1",
            attributes=conv_node.attributes,
            inputs=[conv_node.inputs[0], conv2_weight_var, conv2_bias_var],
            num_outputs=1)
        if before_node is not None:
            conv2_node.insert_after(before_node)
        else:
            conv2_node.prepend_on()
        add1_node = model.graph.create_node(
            op_type="Add",
            inputs=[conv1_node.outputs[0], conv2_node.outputs[0]],
            name=conv_node.name + "_split_add0",
            num_outputs=1).insert_after(conv1_node)
        conv_node.replace_all_uses_with(add1_node)
        if not conv_node.is_used:
            conv_node.destroy()
    model.infer_shapes()
    model.check_validity()
    return model

if __name__ == "__main__":
    model = constant_folding(load_model("./xxx_optimized_float_model.onnx"))
    model = split_conv_nodes(model, conv_names=[
        "/mlp_encoder/0/Gemm",
        "/mlp_encoder/2/Gemm",
        "/mlp_encoder/5/Gemm",
        "/mlp_encoder/8/Gemm",
        "/actor/MatMul_1",
        "/actor/MatMul_3",
        "/actor/MatMul_4",
        "/actor/MatMul_6",
        "/actor/MatMul_7",
    ])
    save_model(model, "xxx_split.onnx")
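The decomposition inside split_conv_nodes writes each conv weight as a coarse part that lands on a per-output-channel 127-step grid plus a small residual, so the resulting Conv + Conv + Add subgraph reproduces the original conv exactly (w_high + w_low == w) while each branch quantizes with less error. A standalone numeric check of that identity, using the same formulas as the script:

```python
import numpy as np

def split_weight(w):
    """Per-output-channel split into a 127-step grid part plus residual,
    matching the arithmetic in split_conv_nodes."""
    w_max = np.abs(w).max(axis=(1, 2, 3))
    step = (w_max / 127)[:, None, None, None] + 1e-10
    w_high = np.floor(np.clip(w / step + 1e-5, -127, 127)) * step
    w_low = w - w_high  # residual: defined so the sum is exact by construction
    return w_high, w_low

w = np.random.randn(8, 4, 3, 3).astype(np.float32)
w_high, w_low = split_weight(w)
```

Because w_low is defined as the difference, the reconstruction is exact, and w_high / step is always an integer in [-127, 127], i.e. representable on an int8 grid.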

Then repeat the workflow of Part II (PTQ Model Quantization) on the resulting xxx_split.onnx file. Be sure to update the model_parameters section of the quantization yaml, for example:

model_parameters:
 march: bayes-e
 onnx_model: xxx_split.onnx
 output_model_file_prefix: xxx_split
 working_dir: ./model_output_xxx_split

After this quantization pass you obtain the .bin file with weight-splitting tuning applied; the final output directory is attached below.
sup_split.zip (1.1 MB)