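1. Chip model: X3 Pi (X3派)
2. OpenExplorer (天工开物) development kit version: horizon_xj3_open_explorer_v2.4.2_20221227
3. Problem area: model conversion <--> on-board deployment
4. Detailed problem description: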
While using "hb_mapper makertbin" to convert my *.onnx to *.bin, I found that one operator, "GlobalAveragePool_380", has a low cosine similarity (about 0.84). This operator is what causes my single object tracking model to produce invalid outputs. Running "GlobalAveragePool_380" on the CPU reduces the precision loss, but it also increases the latency of my *.bin. My question is: how can I run this operator on the BPU with both low latency and high cosine similarity (0.90 may be enough)?
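For reference, the cosine similarity quoted above is usually understood as comparing an operator's float output with its quantized output, both treated as flattened vectors. The following is a minimal sketch of that metric in plain Python. It is not the toolchain's own implementation; the function name `cosine_similarity` and the sample values are made up for illustration.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two equally sized, flattened tensors."""
    a = [float(v) for v in a]
    b = [float(v) for v in b]
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # An all-zero tensor has no direction; report 0.0 instead of dividing by zero.
    if norm < eps:
        return 0.0
    return dot / norm


if __name__ == "__main__":
    float_out = [0.2, 1.5, -0.7, 3.1]  # hypothetical float-model values
    quant_out = [0.0, 1.4, -0.2, 2.5]  # hypothetical quantized-model values
    print(cosine_similarity(float_out, float_out))
    print(cosine_similarity(float_out, quant_out))
```

A NumPy array can be passed in after flattening it, for example `cosine_similarity(arr_a.ravel(), arr_b.ravel())`.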
Some additional details:
- My DCMT_sim.onnx contains other GlobalAveragePool operators, but those (such as GlobalAveragePool_306) have high cosine similarity.
- GlobalAveragePool_380 meets the restrictions listed in the Horizon supported_op_list_and_restrictions document (https://developer.horizon.ai/api/v1/fileData/documents_pi/ai_toolchain_develop/horizon_ai_toolchain_user_guide/supported_op_list_and_restrictions.html).
- Even when I put GlobalAveragePool_380 on the CPU, its cosine similarity is still not very high (about 0.90).
- No matter what the inputs are, the second output (named output2) of the *.bin with GlobalAveragePool_380 on the BPU has constant values. You can reproduce this strange behavior by changing the inputs in debug.py, provided below.
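The constant-output symptom in the last point can be checked mechanically by running the model on several different inputs and comparing the results. Below is a minimal sketch under stated assumptions: `run_fn` is a hypothetical wrapper that takes one input and returns output2 as a flat list of floats, and the tolerance `tol` is an arbitrary choice.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def is_input_independent(run_fn, inputs, tol=1e-6):
    """Return True if run_fn produces (nearly) identical outputs for all inputs."""
    if len(inputs) < 2:
        raise ValueError("need at least two different inputs to compare")
    outputs = [list(run_fn(x)) for x in inputs]
    reference = outputs[0]
    return all(
        len(out) == len(reference) and max_abs_diff(reference, out) <= tol
        for out in outputs[1:]
    )


if __name__ == "__main__":
    # Hypothetical stand-ins for a model wrapper that returns output2 as a flat list.
    stuck_model = lambda x: [0.25, 0.25, 0.25]
    healthy_model = lambda x: [sum(x), max(x), min(x)]
    samples = [[1.0, 2.0], [5.0, -3.0], [0.0, 9.0]]
    print(is_input_independent(stuck_model, samples))
    print(is_input_independent(healthy_model, samples))
```

If the check returns True for clearly different inputs, the output does not depend on the input at all, which points to something more severe than ordinary quantization noise.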
My config.yaml is as follows:
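# Model parameters
model_parameters:
  # Original ONNX floating-point model file
  onnx_model: 'DCMT_sim.onnx'
  # Target AI chip architecture for the conversion
  march: 'bernoulli2'
  # Name prefix of the converted model file used for on-board execution
  output_model_file_prefix: 'DCMT'
  # Directory where the conversion outputs are stored
  working_dir: './model/'
  # Whether the converted hybrid heterogeneous model keeps the ability to dump the intermediate results of each layer
  layer_out_dump: False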
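# Input parameters
input_parameters:
  # Input node names of the original floating-point model
  input_name: "input1; input2; input3"
  # Input data format of the original floating-point model (count/order consistent with input_name)
  input_type_train: 'rgb; rgb; featuremap'
  # Input data layout of the original floating-point model (count/order consistent with input_name)
  input_layout_train: 'NCHW; NCHW; NCHW'
  # Input data shapes of the original floating-point model
  input_shape: '1x3x127x127; 1x3x255x255; 1x4x1x1'
  # Input data format that the converted hybrid heterogeneous model must accept (count/order consistent with input_name)
  input_type_rt: 'bgr; bgr; featuremap'
  # Input data layout that the converted hybrid heterogeneous model must accept (count/order consistent with input_name); not needed if input_type_rt is set to nv12
  input_layout_rt: 'NHWC; NHWC; NCHW'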
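# Calibration parameters
calibration_parameters:
  # Directories containing the calibration samples used for model calibration
  cal_data_dir: './calibration/template/; ./calibration/search/; ./calibration/template_bbox/'
  # Data storage type of the calibration binary files
  cal_data_type: 'uint8; uint8; float32'
  # Enable automatic preprocessing of image calibration samples (skimage read; resize to the input node size)
  preprocess_on: False
  # Calibration algorithm type
  calibration_type: 'default'
  # Force the specified OPs to run on the CPU; usually not needed, but it can be enabled during model accuracy tuning to try to improve accuracy
  #run_on_cpu: 'GlobalAveragePool_380; Exp_474'
  # Parameter for the max calibration method
  max_percentile: 1.0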
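# Compiler parameters
compiler_parameters:
  # Compilation strategy
  compile_mode: 'latency'
  # Whether to enable debug information for compilation
  debug: False
  # Number of cores the model runs on
  core_num: 2
  # Optimization level for model compilation
  optimize_level: 'O3'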
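Several fields in the config above are semicolon-separated lists that have to line up with input_name. The following is a minimal sketch that checks the entry counts agree. It assumes the config has already been loaded into a nested dict (here written out by hand as `CONFIG`), and the choice of which fields are per-input (`PER_INPUT_FIELDS`) is my own assumption based on the config comments, not an official list.

```python
# Fields assumed to hold one ';'-separated entry per model input (an assumption).
PER_INPUT_FIELDS = {
    "input_parameters": [
        "input_name", "input_type_train", "input_layout_train",
        "input_shape", "input_type_rt", "input_layout_rt",
    ],
    "calibration_parameters": ["cal_data_dir", "cal_data_type"],
}

# The relevant values from the config above, written as a plain dict.
CONFIG = {
    "input_parameters": {
        "input_name": "input1; input2; input3",
        "input_type_train": "rgb; rgb; featuremap",
        "input_layout_train": "NCHW; NCHW; NCHW",
        "input_shape": "1x3x127x127; 1x3x255x255; 1x4x1x1",
        "input_type_rt": "bgr; bgr; featuremap",
        "input_layout_rt": "NHWC; NHWC; NCHW",
    },
    "calibration_parameters": {
        "cal_data_dir": "./calibration/template/; ./calibration/search/; ./calibration/template_bbox/",
        "cal_data_type": "uint8; uint8; float32",
    },
}


def split_field(value):
    """Split a ';'-separated config value into stripped entries."""
    return [item.strip() for item in value.split(";")]


def check_per_input_counts(config):
    """Return (group, field, count) for every field whose count differs from input_name."""
    expected = len(split_field(config["input_parameters"]["input_name"]))
    mismatches = []
    for group, fields in PER_INPUT_FIELDS.items():
        for field in fields:
            value = config.get(group, {}).get(field)
            if value is None:
                continue  # field not configured; nothing to compare
            count = len(split_field(value))
            if count != expected:
                mismatches.append((group, field, count))
    return mismatches


if __name__ == "__main__":
    print(check_per_input_counts(CONFIG))
```

An empty result only means the counts are consistent; it says nothing about whether each value is accepted by hb_mapper.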
The DCMT_sim.onnx and the calibration files can be found at the link below:
Link: https://pan.baidu.com/s/1xh4KNGrXgJkPHrrdYX6oZg Extraction code: zshn
PS: DCMT_sim.onnx and the calibration files have been validated and they are good~
To verify the *.bin, you can use the following debug.py code:
from horizon_nn import horizon_onnx
import horizon_nn.horizon_onnxruntime as rt
import numpy as np

# reference: https://developer.horizon.ai/forumDetail/71036815603174578
if __name__ == '__main__':
    model_type = 'original'
    # model_type = 'optimized'
    # model_type = 'quantized'

    # Load the search image (x), template image (z) and template box (b) from raw binary files
    x_bgr = np.fromfile('./data/x.bin', dtype=np.int8).reshape(1, 255, 255, 3)
    z_bgr = np.fromfile('./data/z.bin', dtype=np.int8).reshape(1, 127, 127, 3)
    z_box = np.fromfile('./data/b.bin', dtype=np.float32).reshape(1, 4, 1, 1)

    if model_type == 'original':
        # 1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_original_float_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [inp.name for inp in sess.get_inputs()]
        output_names = [out.name for out in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        # 2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 127, 127)).astype(np.float32)  # DCMT_original_float_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 255, 255)).astype(np.float32)  # DCMT_original_float_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        # NHWC -> NCHW, float32
        z = z_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        x = x_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        b = z_box
        feed_dict = {input_names[0]: z, input_names[1]: x, input_names[2]: b}  # DCMT_original_float_model.onnx

        # 3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])

    elif model_type == 'optimized':
        # 1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_optimized_float_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [inp.name for inp in sess.get_inputs()]
        output_names = [out.name for out in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        # 2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 127, 127)).astype(np.float32)  # DCMT_optimized_float_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 3, 255, 255)).astype(np.float32)  # DCMT_optimized_float_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        # NHWC -> NCHW, float32
        z = z_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        x = x_bgr.transpose(0, 3, 1, 2).astype(np.float32)
        b = z_box
        feed_dict = {input_names[0]: z, input_names[1]: x, input_names[2]: b}  # DCMT_optimized_float_model.onnx

        # 3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])

    elif model_type == 'quantized':
        # 1 *.onnx load
        onnx_model = horizon_onnx.load("./model/DCMT_quantized_model.onnx")
        sess = rt.InferenceSession(onnx_model.SerializeToString())
        input_names = [inp.name for inp in sess.get_inputs()]
        output_names = [out.name for out in sess.get_outputs()]
        print('input_names: ', input_names)
        print('output_names: ', output_names)

        # 2 Input data
        # z = np.random.uniform(low=0.0, high=255.0, size=(1, 127, 127, 3)).astype(np.int8)  # DCMT_quantized_model.onnx
        # x = np.random.uniform(low=0.0, high=255.0, size=(1, 255, 255, 3)).astype(np.int8)  # DCMT_quantized_model.onnx
        # bbox_t = np.asarray([30, 40, 100, 120]).astype(np.float32)
        # b = np.expand_dims(np.expand_dims(np.expand_dims(bbox_t, axis=-1), axis=-1), axis=0)  # (1x4x1x1)
        # The quantized model takes the int8 NHWC data directly
        z = z_bgr
        x = x_bgr
        b = z_box
        # NOTE: x and z are fed in the opposite order from the two branches above;
        # check this against the printed input_names.
        feed_dict = {input_names[0]: x, input_names[1]: z, input_names[2]: b}  # DCMT_quantized_model.onnx

        # 3 Run model
        result = sess.run(output_names, feed_dict)
        print(result[0].shape)
        print(result[1].shape)
        print(result[1][0, :, 0, 0])
x.bin, z.bin and b.bin can be found at https://developer.horizon.ai/forumDetail/146176815327779277
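Because debug.py reshapes the raw files to fixed shapes, a quick size check before loading can rule out a wrong or truncated download. This is a minimal sketch using only the shapes and dtypes that appear in debug.py; the helper names `expected_bytes` and `size_matches` are hypothetical.

```python
import os
import tempfile
from math import prod

ITEM_SIZE = {"int8": 1, "uint8": 1, "float32": 4}  # bytes per element


def expected_bytes(shape, dtype):
    """Number of bytes a raw tensor file with this shape and dtype should have."""
    return prod(shape) * ITEM_SIZE[dtype]


def size_matches(path, shape, dtype):
    """Return True if the file size equals the size implied by shape and dtype."""
    return os.path.getsize(path) == expected_bytes(shape, dtype)


if __name__ == "__main__":
    # Shapes and dtypes as used in debug.py
    specs = [
        ("x.bin", (1, 255, 255, 3), "int8"),
        ("z.bin", (1, 127, 127, 3), "int8"),
        ("b.bin", (1, 4, 1, 1), "float32"),
    ]
    for name, shape, dtype in specs:
        print(name, "should be", expected_bytes(shape, dtype), "bytes")

    # Demonstrate the file check on a temporary 16-byte file (same size as b.bin).
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        f.write(bytes(16))
        tmp_path = f.name
    try:
        print(size_matches(tmp_path, (1, 4, 1, 1), "float32"))
    finally:
        os.remove(tmp_path)
```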
THANKS VERY MUCH~