Request for help: porting a semantic segmentation model

Hello, please describe the problem you encountered in detail:

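  1. System software version (obtained via cat /etc/version): x3j3_lnx_db_20220609 debug
  2. Technical area involved (hardware, operating system, driver, other): model porting
  3. Problem description (describe in as much detail as possible what feature you were developing or testing, what problem you found, the symptoms, and the expected result):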

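I ran into a problem while porting a semantic segmentation model to the J3. The symptoms are as follows:

On the host machine, the check and build steps both pass, and a bin model file that can run on the J3 is generated;

Running the model on the development board following the usual flow ends in a Segmentation fault;

A yolov5 model already runs successfully on the same development board;

Both models are run by programs adapted from the solution_example template.

From my own LOG prints, preprocess has completed but postprocess is never entered, so I suspect the fault occurs during model inference. That suspicion conflicts with the test results on the host machine, where the model passes.

Software versions: horizon_xj3_open_explorer_v1.7.5_20211122; J3-PlatformSDK-PL2.1-V1.0.0-20210608.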

4. Reproduction rate: always

5. Relevant logs:

LOG from a normal run:

[HorizonRT] The model builder version = 1.3.64

(inference_engine_dnn.cc:200): found model: LD-GA-IR-640-2-20230719

[HorizonRT] The model builder version = 1.3.64

(inference_engine_dnn.cc:200): found model: LD-GA-IR-640-2-20230719

(video_source.cc:379): =========================================================

(video_source.cc:380): VideoSource VERSION: 1.0.15 Mon Nov 15 19:47:50 2021

(video_source.cc:381): =========================================================

[ERROR][“LOG”][m6cPoc_utility.c:282] sensor_info->extra_mode :0

[ERROR][“LOG”][m6cPoc_utility.c:127] sensor_m6cPoc_9296_init,5,0x48

(vps_module_vapi.cc:313): group_id: 0 this_src_chn: 0 this_ipu_chn: 0 ipu_data_type: 0

[ERROR][“LOG”][m6cPoc_utility.c:365] sensor_m6cPoc_9296_start,5,0x48

[DIAG_ERROR][diag_lib_app.cpp:132] Please start diag service first

(main.cc:81): video_source_plg Start

(main.cc:83): smart_plg Start

(main.cc:85): websocket_plg Start

(image_process.cc:297): Resize output 256 160

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 0.000209167 LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 0.000154667 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0365547 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0380149 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(pidnet_postprocess.cc:57): PIDNETPostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(image_process.cc:297): Resize output 256 160

(web_display_plugin.cc:168): open config file /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq failed

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 3.1167e-05 LD-GA-IR-640-2-20230719

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 2.7e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0112875 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0111994 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(pidnet_postprocess.cc:57): PIDNETPostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(web_display_plugin.cc:168): open config file /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq failed

(image_process.cc:297): Resize output 256 160

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 3.2084e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 1.65e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0112856 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(pidnet_postprocess.cc:57): PIDNETPostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0112655 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(web_display_plugin.cc:168): open config file /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq failed

(image_process.cc:297): Resize output 256 160

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 2.7958e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 9.4e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0109415 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0118326 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(pidnet_postprocess.cc:57): PIDNETPostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

LOG from the failing run with the ported segmentation model:

[HorizonRT] The model builder version = 1.3.64

(inference_engine_dnn.cc:200): found model: LD-GA-IR-640-2-20230719

[HorizonRT] The model builder version = 1.3.64

(inference_engine_dnn.cc:200): found model: Seg-PIDNet-IR-640-2-20230720

(video_source.cc:379): =========================================================

(video_source.cc:380): VideoSource VERSION: 1.0.15 Mon Nov 15 19:47:50 2021

(video_source.cc:381): =========================================================

[ERROR][“LOG”][m6cPoc_utility.c:282] sensor_info->extra_mode :0

[ERROR][“LOG”][m6cPoc_utility.c:127] sensor_m6cPoc_9296_init,5,0x48

(vps_module_vapi.cc:313): group_id: 0 this_src_chn: 0 this_ipu_chn: 0 ipu_data_type: 0

[ERROR][“LOG”][m6cPoc_utility.c:365] sensor_m6cPoc_9296_start,5,0x48

[DIAG_ERROR][diag_lib_app.cpp:132] Please start diag service first

(executor.cc:99): task_queue_ is empty

(main.cc:81): video_source_plg Start

(main.cc:83): smart_plg Start

(main.cc:85): websocket_plg Start

(image_process.cc:297): Resize output 368 288

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:179): task_time : 0.0001835 LD-GA-IR-640-2-20230719

(inferencer.cc:166): preprocessing finished Seg-PIDNet-IR-640-2-20230720

(inferencer.cc:179): task_time : 5.7833e-05 Seg-PIDNet-IR-640-2-20230720

(inferencer.cc:194): infer_time : 0.0281339 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 3.6292e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0112798 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

(image_process.cc:297): Resize output 256 160

(pyramid_preprocess.cc:137): Pyramid Preprocessing finished

(inferencer.cc:166): preprocessing finished LD-GA-IR-640-2-20230719

(inferencer.cc:179): task_time : 7.0375e-05 LD-GA-IR-640-2-20230719

(inferencer.cc:194): infer_time : 0.0109589 LD-GA-IR-640-2-20230719

(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719

(test_postprocess.cc:57): YoloV5PostProcess Execute 1

(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719

Segmentation fault (core dumped)
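
To see at a glance how far each model got before the crash, the inferencer.cc lines can be grouped by model name. The following is a minimal Python sketch, not part of the Horizon tooling: the function name last_stage_per_model is hypothetical, and it assumes every inferencer.cc line ends with the model name, as in the logs above.

```python
import re

# Stage markers as they appear in the inferencer.cc lines of the logs above,
# listed in the order the pipeline prints them.
STAGES = [
    "preprocessing finished",
    "task_time",
    "infer_time",
    "postprocessing start",
    "postprocessing finished",
]

# Matches lines such as
# "(inferencer.cc:194): infer_time : 0.0281339 LD-GA-IR-640-2-20230719".
# Assumption: the last whitespace-separated token is the model name.
LINE_RE = re.compile(r"^\(inferencer\.cc:\d+\):\s+(.*\S)\s+(\S+)\s*$")


def last_stage_per_model(log_text):
    """Return {model_name: last stage marker seen for that model}."""
    last = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an inferencer.cc line
        message, model = match.groups()
        for stage in STAGES:
            if message.startswith(stage):
                last[model] = stage
                break
    return last


# Excerpt copied from the failing log above.
sample = """
(inferencer.cc:166): preprocessing finished Seg-PIDNet-IR-640-2-20230720
(inferencer.cc:179): task_time : 5.7833e-05 Seg-PIDNet-IR-640-2-20230720
(inferencer.cc:194): infer_time : 0.0109589 LD-GA-IR-640-2-20230719
(inferencer.cc:195): postprocessing start LD-GA-IR-640-2-20230719
(inferencer.cc:197): postprocessing finished LD-GA-IR-640-2-20230719
Segmentation fault (core dumped)
"""

for model, stage in last_stage_per_model(sample).items():
    print(model, "->", stage)
```

On this excerpt the last marker for Seg-PIDNet-IR-640-2-20230720 is task_time, with no infer_time line, while LD-GA-IR-640-2-20230719 completes post-processing. This matches the full failing log, where the segmentation model never prints an infer_time line.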

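6. Any custom software modifications:

I modified the post-processing in model_inference to fit the model I ported, and changed the camera and model configuration in solution_example to match my camera and model.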


You can run a performance test on your model on the J3 with hrt_model_exec perf and see whether it prints results normally.

Reference command: ./hrt_model_exec perf --model_file xxxx.bin --frame_count 20

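The model passes this check. Does that mean the model itself is fine? I have attached the log below.
I have traced the crash to the inside of the for loop in the GetResult function in inferencer.cc. The official comment there says it waits for inference to finish, and my understanding is that only model inference happens at this point. What other operations does this step perform?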

[HorizonRT] The model builder version = 1.3.64

Load model to DDR cost 227.882ms.

I0101 08:05:29.668665 2045 main.cpp:927] get model handle success

I0101 08:05:29.668880 2045 main.cpp:1540] get model input count success

I0101 08:05:29.669019 2045 main.cpp:1547] prepare input tensor success!

I0101 08:05:29.669060 2045 main.cpp:1553] get model output count success

Frame count: 20, Thread Average: 118.903099 ms, thread max latency: 146.033005 ms, thread min latency: 117.287003 ms, FPS: 8.407060

Running condition:

Thread number is: 1

Frame count is: 20

Program run time: 2379.136000 ms

Perf result:

Frame totally latency is: 2378.062012 ms

Average latency is: 118.903099 ms

Frame rate is: 8.406413 FPS
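
As a quick sanity check on these figures, the averages can be recomputed from the totals. This is a small Python sketch using only the numbers printed above. That the reported frame rate is derived from the program run time is an inference from the arithmetic, not something the tool's output states.

```python
# Values copied from the hrt_model_exec perf output above.
frame_count = 20
total_latency_ms = 2378.062012    # "Frame totally latency"
program_run_time_ms = 2379.136    # "Program run time"

# Average latency is the total latency divided by the number of frames.
average_latency_ms = total_latency_ms / frame_count

# Assumption: frame rate = frames / program run time (in seconds).
fps_from_run_time = frame_count / (program_run_time_ms / 1000.0)

print(f"{average_latency_ms:.4f} ms")   # 118.9031 ms
print(f"{fps_from_run_time:.4f} FPS")   # 8.4064 FPS
```

Both values agree with the reported "Average latency" and "Frame rate" lines, so the segmentation model takes roughly 119 ms per frame in this standalone test.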

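Getting this output shows that the model itself is fine and that it can complete repeated inference runs normally. You can compare your code against the 00 quick start sample in horizon_runtime_sample and check whether there is a problem in how your C++ program is written.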