Precision loss of GlobalAveragePool operator

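1. Chip model: X3 Pi (X3派)

2. OpenExplorer (天工开物) development kit version: horizon_xj3_open_explorer_v2.4.2_20221227

3. Problem area: model conversion <--> on-board deployment

4. Detailed problem description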

While using "hb_mapper makertbin" to convert my *.onnx to *.bin, I found that one operator, "GlobalAveragePool_380", has low cosine similarity (about 0.84). This operator is what causes my single-object tracking model to produce invalid outputs. Running "GlobalAveragePool_380" on the CPU reduces the precision loss, but it also increases the latency of my *.bin. How can I run this operator on the BPU with both low latency and high cosine similarity (0.90 may be enough)?

Some additional details:

  • My DCMT_sim.onnx contains other GlobalAveragePool operators, but those (such as GlobalAveragePool_306) have high cosine similarity.
  • GlobalAveragePool_380 meets the restrictions listed in the Horizon supported_op_list_and_restrictions document (https://developer.horizon.ai/api/v1/fileData/documents_pi/ai_toolchain_develop/horizon_ai_toolchain_user_guide/supported_op_list_and_restrictions.html).
  • Even with GlobalAveragePool_380 on the CPU, its cosine similarity is still not very high (about 0.90).
  • Whatever the inputs are, the second output (named output2) of the *.bin with GlobalAveragePool_380 on the BPU has constant values. You can reproduce this strange behavior by changing the inputs in the debug.py provided below.
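For readers who want to reproduce the per-operator comparison outside of hb_mapper, here is a minimal sketch of the standard cosine similarity between two output tensors. It assumes the tool compares flattened tensors in the usual way, which this thread does not confirm; the helper name `cosine_similarity` and the random stand-in data are illustrative only.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two tensors, each flattened to a 1-D vector."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("tensors must have the same number of elements")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # eps guards against division by zero when either tensor is all zeros.
    return float(np.dot(a, b) / max(denom, eps))


# Stand-in data (NOT taken from DCMT_sim.onnx): a fake float output and a
# coarsely rounded copy that mimics symmetric 8-bit quantization.
rng = np.random.default_rng(0)
float_out = rng.normal(size=(1, 256, 1, 1))
scale = np.abs(float_out).max() / 127.0
quant_out = np.round(float_out / scale) * scale

print("cosine similarity:", cosine_similarity(float_out, quant_out))
```

In practice the two arguments would be the same node's output dumped from the float model and from the quantized model for an identical input.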

My config.yaml is as follows:

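# Model parameters
model_parameters:
  # Original ONNX floating-point model file
  onnx_model: 'DCMT_sim.onnx'
  # Target AI chip architecture for the conversion
  march: 'bernoulli2'
  # Name prefix of the model file produced by the conversion for on-board execution
  output_model_file_prefix: 'DCMT'
  # Directory where the conversion results are stored
  working_dir: './model/'
  # Whether the converted heterogeneous model keeps the ability to dump each layer's intermediate results
  layer_out_dump: False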

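# Input parameters
input_parameters:
  # Input node names of the original floating-point model
  input_name: "input1; input2; input3"
  # Input data format of the original floating-point model (count/order must match input_name)
  input_type_train: 'rgb; rgb; featuremap'
  # Input data layout of the original floating-point model (count/order must match input_name)
  input_layout_train: 'NCHW; NCHW; NCHW'
  # Input data shapes of the original floating-point model
  input_shape: '1x3x127x127; 1x3x255x255; 1x4x1x1'
  # Input data format the converted heterogeneous model must accept (count/order must match input_name)
  input_type_rt: 'bgr; bgr; featuremap'
  # Input data layout the converted heterogeneous model must accept (count/order must match input_name); not needed if input_type_rt is set to nv12
  input_layout_rt: 'NHWC; NHWC; NCHW'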

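# Calibration parameters
calibration_parameters:
  # Directories containing the calibration samples
  cal_data_dir: './calibration/template/; ./calibration/search/; ./calibration/template_bbox/'
  # Data storage type of the binary calibration files
  cal_data_type: 'uint8; uint8; float32'
  # Enable automatic preprocessing of image calibration samples (skimage read; resize to the input node size)
  preprocess_on: False
  # Calibration algorithm type
  calibration_type: 'default'
  # Force the listed ops to run on the CPU; usually not needed, but can be enabled during accuracy tuning to try to improve precision
  #run_on_cpu: 'GlobalAveragePool_380; Exp_474'
  # Parameter for the max calibration method
  max_percentile: 1.0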

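# Compiler parameters
compiler_parameters:
  # Compilation strategy
  compile_mode: 'latency'
  # Whether to enable compiler debug information
  debug: False
  # Number of cores the model runs on
  core_num: 2
  # Compiler optimization level
  optimize_level: 'O3'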
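Several fields in this config are semicolon-separated lists whose count and order must match input_name. As a rough sanity check before running hb_mapper, the sketch below splits those strings and compares their lengths. The helper names (`split_field`, `parse_shape`, `check_per_input_fields`) are hypothetical and not part of the Horizon toolchain; the values are copied from the config above.

```python
def split_field(value):
    """Split a semicolon-separated config value into stripped items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_shape(shape):
    """Turn a shape string such as '1x3x127x127' into a tuple of ints."""
    return tuple(int(dim) for dim in shape.strip().split("x"))


def check_per_input_fields(input_name, fields):
    """Return messages for fields whose item count differs from input_name."""
    expected = len(split_field(input_name))
    problems = []
    for key, value in fields.items():
        count = len(split_field(value))
        if count != expected:
            problems.append(f"{key}: expected {expected} items, found {count}")
    return problems


# Values copied from the config.yaml in this post.
fields = {
    "input_type_train": "rgb; rgb; featuremap",
    "input_layout_train": "NCHW; NCHW; NCHW",
    "input_shape": "1x3x127x127; 1x3x255x255; 1x4x1x1",
    "input_type_rt": "bgr; bgr; featuremap",
    "input_layout_rt": "NHWC; NHWC; NCHW",
    "cal_data_dir": "./calibration/template/; ./calibration/search/; ./calibration/template_bbox/",
    "cal_data_type": "uint8; uint8; float32",
}

problems = check_per_input_fields("input1; input2; input3", fields)
shapes = [parse_shape(s) for s in split_field(fields["input_shape"])]

print(problems or "all per-input fields have the same number of entries")
print(shapes)
```

This only checks list lengths and shape syntax; it says nothing about whether the values themselves are valid for the toolchain.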

DCMT_sim.onnx and the calibration files can be found at the link below:

Link: https://pan.baidu.com/s/1xh4KNGrXgJkPHrrdYX6oZg Extraction code: zshn

PS: DCMT_sim.onnx and the calibration files have been validated and they are good~

To verify the *.bin, you can use the following debug.py:

from horizon_nn import horizon_onnx
import horizon_nn.horizon_onnxruntime as rt
import numpy as np


# reference: https://developer.horizon.ai/forumDetail/71036815603174578
if __name__ == '__main__':

    model_type = 'original'
    # model_type = 'optimized'
    # model_type = 'quantized'

    x_bgr = np.fromfile('./data/x.bin', dtype=np.int8).reshape(1, 255, 255, 3)
    z_bgr = np.fromfile('./data/z.bin', dtype=np.int8).reshape(1, 127, 127, 3)
    z_box = np.fromfile('./data/b.bin', dtype=np.float32).reshape(1, 4, 1, 1)

    if model_type == 'original':
        #1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_original_float_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [input.name for input in sess.get_inputs()]
        output_names = [output.name for output in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        #2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 127, 127)).astype(np.float32)  # DCMT_original_float_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 255, 255)).astype(np.float32)  # DCMT_original_float_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        z = z_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        x = x_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        b = z_box
        feed_dict = {input_names[0]: z, input_names[1]: x, input_names[2]: b}  # DCMT_original_float_model.onnx

        #3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])

    elif model_type == 'optimized':
        #1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_optimized_float_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [input.name for input in sess.get_inputs()]
        output_names = [output.name for output in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        #2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 127, 127)).astype(np.float32)  # DCMT_optimized_float_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 255, 255)).astype(np.float32)  # DCMT_optimized_float_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        z = z_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        x = x_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        b = z_box
        feed_dict = {input_names[0]: z, input_names[1]: x, input_names[2]: b}  # DCMT_optimized_float_model.onnx

        #3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])

    elif model_type == 'quantized':
        #1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_quantized_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [input.name for input in sess.get_inputs()]
        output_names = [output.name for output in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        #2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 127, 127, 3)).astype(np.int8)  # DCMT_quantized_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 255, 255, 3)).astype(np.int8)  # DCMT_quantized_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        z = z_bgr
        x = x_bgr
        b = z_box
        feed_dict = {input_names[0]: x, input_names[1]: z, input_names[2]: b}  # DCMT_quantized_model.onnx

        #3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])

x.bin, z.bin and b.bin can be found at https://developer.horizon.ai/forumDetail/146176815327779277
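To make the "constant output2" symptom easy to check, here is a small sketch that runs a model on several different inputs and reports whether the outputs are identical. `output_is_constant` is a hypothetical helper, and the two stand-in models below are toy functions, not the DCMT model; with debug.py, `run_model` could wrap something like `sess.run(output_names, feed_dict)[1]`.

```python
import numpy as np


def output_is_constant(run_model, feeds, atol=1e-6):
    """Return True if run_model gives (near-)identical outputs for all feeds."""
    if len(feeds) < 2:
        raise ValueError("need at least two different inputs to compare")
    outputs = [np.asarray(run_model(feed), dtype=np.float64) for feed in feeds]
    first = outputs[0]
    return all(np.allclose(first, out, rtol=0.0, atol=atol) for out in outputs[1:])


def healthy_model(x):
    """Toy stand-in whose output depends on the input (per-channel mean)."""
    return x.mean(axis=(2, 3))


def stuck_model(x):
    """Toy stand-in that ignores its input, like the symptom described here."""
    return np.full((1, 3), 0.5)


rng = np.random.default_rng(0)
feeds = [rng.uniform(0.0, 255.0, size=(1, 3, 8, 8)) for _ in range(3)]

print("healthy model constant?", output_is_constant(healthy_model, feeds))
print("stuck model constant?  ", output_is_constant(stuck_model, feeds))
```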

THANKS VERY MUCH~

Hi, we recommend trying run_on_cpu with only Exp_474. We will discuss the other issues after your feedback. Looking forward to your reply~

Emm… If I run only Exp_474 on the CPU, the model's outputs seem to be right (with some precision loss, but the values of "output2" are no longer constant)!

So the Exp_474 operator causes the constant outputs?! Why?

And how can I run this operator on the BPU correctly?

I also notice that with Exp_474 on the CPU and GlobalAveragePool_380 on the BPU, the cosine similarity of GlobalAveragePool_380 is still not high (0.839681).

The hb_mapper_makertbin.log and other model files can be found in the following link:

Link: https://pan.baidu.com/s/1fkAB7REE_lpHLvBof1fwQw Extraction code: n5kf

Emm… We are still analyzing the problem with the Exp_474 operator.

We recommend verifying the accuracy of the model. Cosine similarity is not necessarily proportional to accuracy; it is only a reference indicator.
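As a toy illustration of why cosine similarity is only a reference indicator, the sketch below compares made-up score vectors (not outputs of the DCMT model) and uses argmax as a stand-in for the task-level decision. One candidate is very close in cosine terms yet picks a different index; the other is far away yet picks the same one.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up scores; argmax plays the role of the final decision.
reference = np.array([1.0, 0.9, 0.0])
close_but_different = np.array([0.9, 1.0, 0.0])  # high similarity, other argmax
far_but_same = np.array([1.0, 0.0, 0.9])         # low similarity, same argmax

for name, candidate in [("close_but_different", close_but_different),
                        ("far_but_same", far_but_same)]:
    sim = cosine_similarity(reference, candidate)
    same_decision = int(np.argmax(reference)) == int(np.argmax(candidate))
    print(f"{name}: cosine={sim:.4f}, same argmax={same_decision}")
```

This is why an end-to-end accuracy check on real data is the more reliable test of a converted model.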

Hi, please install the three patches in the compressed package. After that, Exp_474 can run on the BPU.

Link: https://pan.horizon.ai/index.php/s/X9QAAGziJKP98AJ