Trying to convert a self-trained semantic segmentation ONNX model: running 03_build.sh fails

The Docker image I am using is ai_toolchain_centos_7_xj3:v2.4.2.
When running 03_build.sh, it eventually fails with **03_build.sh: line 17: 396 Killed hb_mapper makertbin --config ${config_file} --model-type ${model_type}**, and along the way it prints:

```
...
WARNING: Sub_1238 not supported by BPU
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
WARNING: ReduceMax_1456 not supported by BPU
```
It is mainly these three types of nodes (Sub, ReduceSum, ReduceMax) that are reported as unsupported, yet the <operator support and constraints list> says all three are supported.

My changes are based on the contents of ddk/samples/ai_toolchain/horizon_model_convert_sample/07_segmentation/01_unet_mobilenet/mapper, as follows:

01_check.sh

```bash
set -e -v
cd $(dirname $0) || exit

model_type="onnx"
onnx_model="poolformer_car.onnx"
march="bernoulli2"

hb_mapper checker --model-type ${model_type} \
                  --model ${onnx_model} \
                  --input-shape input 1x3x512x512 \
                  --march ${march}
```

02_preprocess.sh

```bash
set -e
cd $(dirname $0) || exit

python3 ../../../data_preprocess.py \
  --src_dir ../../../01_common/calibration_data/slot \
  --dst_dir ./calibration_data_yuv_f32 \
  --pic_ext .yuv \
  --read_mode opencv \
  --saved_data_type float32
```

03_build.sh

```bash
set -e
cd $(dirname $0) || exit

config_file="./unet_mobilenet_config.yaml"
model_type="onnx"
hb_mapper makertbin --config ${config_file} \
                    --model-type ${model_type}
```
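The unet_mobilenet_config.yaml that 03_build.sh consumes was not posted in the thread. For orientation, here is a minimal sketch of what a matching config could look like, reconstructed from values visible in the log further down (mean 128, scale 0.0078125, nv12 runtime input, float32 calibration data, default calibration, --O3 --core-num 1); every field not visible in the log is an assumption carried over from the stock sample configs.

```yaml
# Hypothetical reconstruction -- values marked "from log" appear in the
# hb_mapper output below; the rest are assumptions.
model_parameters:
  onnx_model: './poolformer_car.onnx'            # from log
  march: 'bernoulli2'                            # matches 01_check.sh
  working_dir: 'model_output'                    # assumption
  output_model_file_prefix: 'poolformer_car'     # from log (saved model names)
input_parameters:
  input_name: 'input'                            # from log
  input_type_rt: 'nv12'                          # from log
  input_type_train: 'yuv444'                     # from log (original_input_type)
  input_layout_train: 'NCHW'                     # from log
  norm_type: 'data_mean_and_scale'
  mean_value: 128.0                              # from log (means=[128.])
  scale_value: 0.0078125                         # from log (scales=[0.0078125])
calibration_parameters:
  cal_data_dir: './calibration_data_yuv_f32'     # matches 02_preprocess.sh
  cal_data_type: 'float32'                       # from log
  calibration_type: 'default'                    # from log
compiler_parameters:
  compile_mode: 'latency'                        # assumption
  optimize_level: 'O3'                           # from log (--O3)
  core_num: 1                                    # from log (--core-num 1)
```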

Log output:

```
[root@0912055c8049 mapper]# bash 03_build.sh > record.txt
/usr/local/lib/python3.6/site-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
2023-04-27 16:56:53,451 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/07_segmentation/02_slot_segformer/mapper/hb_mapper_makertbin.log
2023-04-27 16:56:53,452 INFO Start hb_mapper...
2023-04-27 16:56:53,452 INFO hbdk version 3.41.4
2023-04-27 16:56:53,452 INFO horizon_nn version 0.15.3
2023-04-27 16:56:53,452 INFO hb_mapper version 1.13.3
2023-04-27 16:56:53,452 INFO Start Model Convert...
2023-04-27 16:56:53,460 INFO Using onnx model file: /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/07_segmentation/02_slot_segformer/mapper/poolformer_car.onnx
2023-04-27 16:56:53,575 INFO Model has 1 inputs according to model file
2023-04-27 16:56:53,576 INFO Model name not given in yaml_file, using model name from model file: ['input']
2023-04-27 16:56:53,576 INFO nv12 input type rt received.
2023-04-27 16:56:53,577 INFO The calibration dir name suffix is the same as the value float32 of the cal_data_type parameter and will be read with the value of cal_data_type.
2023-04-27 16:56:53,577 INFO custom_op does not exist, skipped
2023-04-27 16:56:53,577 WARNING Input node input's input_source not set, it will be set to pyramid by default
2023-04-27 16:56:53,580 INFO *******************************************
2023-04-27 16:56:53,580 INFO First calibration picture name: 202302231837221_leftImg8bit.yuv
2023-04-27 16:56:53,580 INFO First calibration picture md5:
2023-04-27 16:56:53,593 INFO *******************************************
2023-04-27 16:56:53,681 INFO [Thu Apr 27 16:56:53 2023] Start to Horizon NN Model Convert.
2023-04-27 16:56:53,682 INFO Parsing the input parameter:{'input': {'input_shape': [1, 3, 512, 512], 'expected_input_type': 'YUV444_128', 'original_input_type': 'YUV444', 'original_input_layout': 'NCHW', 'means': array([128.], dtype=float32), 'scales': array([0.0078125], dtype=float32)}}
2023-04-27 16:56:53,682 INFO Parsing the calibration parameter
2023-04-27 16:56:53,683 INFO Parsing the hbdk parameter:{'hbdk_pass_through_params': '--O3 --core-num 1 --fast ', 'input-source': {'input': 'pyramid', '_default_value': 'ddr'}}
2023-04-27 16:56:53,683 INFO HorizonNN version: 0.15.3
2023-04-27 16:56:53,684 INFO HBDK version: 3.41.4
2023-04-27 16:56:53,684 INFO [Thu Apr 27 16:56:53 2023] Start to parse the onnx model.
2023-04-27 16:56:53,817 INFO Input ONNX model infomation:
ONNX IR version:  6
Opset version:    [11]
Producer:         pytorch1.12.1
Domain:           none
Input name:       input, [1, 3, 512, 512]
Output name:      output, [1, 0, 0, 0]
2023-04-27 16:56:54,487 INFO [Thu Apr 27 16:56:54 2023] End to parse the onnx model.
2023-04-27 16:56:54,488 INFO Model input names parsed from model: ['input']
2023-04-27 16:56:54,488 INFO Create a preprocessing operator for input_name input with means=[128.], std=[128.], original_input_layout=NCHW, color convert from 'YUV_BT601_FULL_RANGE' to 'YUV_BT601_FULL_RANGE'.
2023-04-27 16:56:55,144 INFO Saving the original float model: poolformer_car_original_float_model.onnx.
2023-04-27 16:56:55,146 INFO [Thu Apr 27 16:56:55 2023] Start to optimize the model.
2023-04-27 16:57:03,434 INFO [Thu Apr 27 16:57:03 2023] End to optimize the model.
2023-04-27 16:57:03,690 INFO Saving the optimized model: poolformer_car_optimized_float_model.onnx.
2023-04-27 16:57:03,690 INFO [Thu Apr 27 16:57:03 2023] Start to calibrate the model.
2023-04-27 16:57:03,693 INFO There are 24 samples in the calibration data set.

WARNING: ReduceMax_1456 not supported by BPU
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
2023-04-27 16:57:04,454 INFO Run calibration model with default calibration method.
Default calibration in progress: 0%| | 0/3 [00:00<?, ?it/s]2023-04-27 16:57:09.181786079 [E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running Reshape node. Name:'Reshape_9' Status Message: /home/jenkins/agent/workspace/model_convert/onnxruntime/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:43 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<int64_t>&) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{8,64,128,128}, requested shape:{1,1,1048576}

Default calibration in progress: 0%| | 0/3 [00:00<?, ?it/s]
2023-04-27 16:57:09,182 INFO Above info is caused by batch mode infer and can be ignored
2023-04-27 16:57:09,183 INFO Reset batch_size=1 and execute calibration again...
Default calibration in progress: 100%|████████████████████████████████████████████████████████████████████████████| 24/24 [02:49<00:00, 7.07s/it]
...
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
WARNING: ReduceMax_1456 not supported by BPU
/usr/local/lib/python3.6/site-packages/horizon_nn/quantization/loss_function.py:59: RuntimeWarning: invalid value encountered in true_divide
  data1, data2) / (np.linalg.norm(data1) * np.linalg.norm(data2))
...
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
WARNING: ReduceMax_1456 not supported by BPU
Default calibration in progress: 0%| | 0/3 [00:00<?, ?it/s]2023-04-27 17:01:01.017468869 [E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running Reshape node. Name:'Reshape_9' Status Message: /home/jenkins/agent/workspace/model_convert/onnxruntime/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:43 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<int64_t>&) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{8,64,128,128}, requested shape:{1,1,1048576}

Default calibration in progress: 0%| | 0/3 [00:00<?, ?it/s]
2023-04-27 17:01:01,018 INFO Above info is caused by batch mode infer and can be ignored
2023-04-27 17:01:01,018 INFO Reset batch_size=1 and execute calibration again...
Default calibration in progress: 100%|████████████████████████████████████████████████████████████████████████████| 24/24 [01:02<00:00, 2.59s/it]
...
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
WARNING: ReduceMax_1456 not supported by BPU
2023-04-27 17:02:43,493 INFO Select max method.
2023-04-27 17:02:43,661 INFO [Thu Apr 27 17:02:43 2023] End to calibrate the model.
2023-04-27 17:02:43,662 INFO [Thu Apr 27 17:02:43 2023] Start to quantize the model.
...
WARNING: Sub_1457 not supported by BPU
WARNING: ReduceSum_1459 not supported by BPU
WARNING: ReduceMax_1456 not supported by BPU
03_build.sh: line 17:   396 Killed                  hb_mapper makertbin --config ${config_file} --model-type ${model_type}
```


Hello, answering your two questions separately:

1. The WARNING messages mean that those operators do not satisfy the BPU constraints, so they fall back to the CPU.

2. For the "Killed": did the process exit on its own, or did you terminate it manually? We suggest upgrading to OE 2.6.2 and trying again to see whether the problem persists. Download link: https://developer.horizon.ai/forumDetail/136488103547258769

1. In the <operator support and constraints list>, I see that ReduceMax can only run on the CPU, while ReduceSum and Sub are listed as supported on the BPU.

2. The "Killed" exit happened on its own.
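For what it's worth, a bash job that dies with a bare `Killed` and no Python traceback is very often the Linux OOM killer reclaiming memory: a large model plus 24 float32 calibration samples can push hb_mapper past the container's memory limit. That is an inference from the symptom, not something confirmed in this thread; a quick way to check it is sketched below.

```bash
# Look for OOM-killer records in the kernel log (run on the Docker host):
dmesg | grep -iE "out of memory|killed process"

# If the OOM killer fired, restart the container with a larger memory
# limit; the 16g figure is an arbitrary example, the image tag is the one
# mentioned earlier in this thread:
docker run -it --rm -m 16g \
    openexplorer/ai_toolchain_centos_7_xj3:v2.4.2 /bin/bash
```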

Hello, in the XJ3 operator support list, the ReduceMax, ReduceSum, and Sub operators can only run on the CPU.

Hi, I could not find OE 2.6.2; I downloaded openexplorer/ai_toolchain_ubuntu_20_xj3_gpu:v2.5.2 instead, and 03_build.sh still fails with a similar error:

03_build.sh: line 17: 341 Killed hb_mapper makertbin --config ${config_file} --model-type ${model_type}

Please share the ONNX model and the corresponding YAML file so that we can try to reproduce the issue.

Uploaded:

https://pan.baidu.com/s/1vTuim1vq_Ac8PFU1EBGI9A?pwd=b3nu

Hi, using the Docker image for OE 2.5.2 I can convert the model successfully, as shown in the screenshot below:

We suggest updating your environment.

It turned out my Docker installation was the problem; a colleague tried the conversion on another machine and it succeeded.

One more thing: could you open up the code behind ONNX operator support? That would make it easier for developers to implement custom operators, instead of having to restructure a model just because of a few unsupported operators.

That is not open at the moment; I will record your request and pass it along as feedback.

Hello, thanks for your support and replies.
One more question: when quantizing a model, how do I make a subset of operators run on the CPU? As shown in the figure below, the exp, div, and argmax operators ended up on the BPU, and the resulting frequent copies back and forth between the BPU and CPU are very time-consuming.

You can try the run_on_bpu parameter; the prerequisite is that the operators satisfy the BPU constraints.
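For reference, run_on_bpu goes in the model_parameters section of the conversion YAML; a minimal sketch follows. The node names are hypothetical placeholders, and the semicolon-separated list format is my reading of the toolchain docs, so verify the names against your own model (e.g. in Netron) before using it. As an assumption, the docs also describe a run_on_cpu counterpart for pinning nodes in the opposite direction.

```yaml
model_parameters:
  onnx_model: './poolformer_car.onnx'
  march: 'bernoulli2'
  # Pin specific nodes to the BPU; they must still satisfy the BPU
  # constraints. Node names are hypothetical placeholders -- use the
  # actual node names from your model:
  run_on_bpu: 'Exp_100;Div_101;ArgMax_102'
```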

Hi, when we ran 04_inference.sh to do inference with this model, we hit two problems:

1. In seg_inference.py, the output of

```python
output = sess.run(output_name, {input_name: image_data},
                  input_offset=input_offset)
```

is all zeros.

2. The official model is UNet-based and its output shape is [1, 1024, 2048, 19]; the 19 should come from the final classification convolution and correspond to the total number of Cityscapes classes, whereas our model's output is [1, 1, 512, 512]. How should we align the outputs so that our model, or other models with different output shapes later on, can run inference correctly? (See the sketch after this message.)

New model:
Link: Baidu Netdisk, please enter the extraction code

Extraction code: 3mzj
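On point 2, one way to make the postprocessing tolerant of both output layouts is sketched below; this is my own adaptation, not the official fix. The stock unet_mobilenet postprocess argmaxes a [1, H, W, 19] score tensor over the class axis, whereas a [1, 1, 512, 512] output already looks like a single-channel class map (e.g. when argmax happens inside the exported ONNX graph). Incidentally, if argmax is inside the graph, an all-zero output may simply mean every pixel was predicted as class 0 rather than indicate a conversion failure, which is worth ruling out for point 1.

```python
import numpy as np

def to_class_map(output):
    """Normalize a segmentation output to a 2-D HxW class-index map.

    Two layouts are handled (an assumption based on this thread):
      * [1, H, W, C]: per-class scores on the last axis, as in the stock
        unet_mobilenet sample -- argmax over the class axis;
      * [1, 1, H, W]: a single-channel map produced when argmax is
        already part of the exported ONNX graph.
    """
    out = np.asarray(output)
    if out.ndim != 4:
        raise ValueError("unexpected output rank: %s" % (out.shape,))
    if out.shape[1] == 1:
        # [1, 1, H, W]: already class indices; drop batch and channel axes.
        return np.squeeze(out, axis=(0, 1)).astype(np.int64)
    # [1, H, W, C]: per-class scores; reduce over the class axis.
    return np.argmax(out[0], axis=-1)
```

Calling this right after sess.run would let the rest of the sample's visualization code stay unchanged regardless of which layout the model emits.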

For new questions, please open a new thread, and remember to follow the posting template.