All operators are on the BPU, but on the X3 CPU usage is still high and BPU usage low

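As the title says: I converted my model into a bin model that the Sunrise Pi can use and ran it on the Sunrise Pi X3. BPU usage is only 15% while CPU usage is as high as 88%. By comparison, the official demo shows 80% BPU and 39% CPU. Have I overlooked something?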

Conversion log

2023-12-05 10:51:15,951 INFO The converted model node information:

==============================================================================================================================================

Node ON Subgraph Type Cosine Similarity Threshold In/Out DataType

----------------------------------------------------------------------------------------------------------------------------------------------

HZ_PREPROCESS_FOR_input BPU id(0) HzSQuantizedPreprocess 1.000096 127.000000 int8/int8

/model.0/conv/Conv BPU id(0) HzSQuantizedConv 0.999624 1.006476 int8/int8

/model.0/act/Mul BPU id(0) HzLut 0.989087 31.359735 int8/int8

/model.1/conv/Conv BPU id(0) HzSQuantizedConv 0.975517 8.344982 int8/int8

/model.1/act/Mul BPU id(0) HzLut 0.982939 20.146502 int8/int8

/model.2/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.969041 18.242432 int8/int8

/model.2/cv1/act/Mul BPU id(0) HzLut 0.986417 16.059618 int8/int8

/model.2/Slice BPU id(0) Slice 0.982046 16.059616 int8/int8

/model.2/Slice_1 BPU id(0) Slice 0.992236 16.059616 int8/int8

/model.2/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.975491 16.059616 int8/int8

/model.2/m.0/cv1/act/Mul BPU id(0) HzLut 0.988838 7.824300 int8/int8

/model.2/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.978795 4.335206 int8/int8

/model.2/m.0/cv2/act/Mul BPU id(0) HzLut 0.985265 11.735908 int8/int8

UNIT_CONV_FOR_/model.2/m.0/Add BPU id(0) HzSQuantizedConv 0.990493 16.059616 int8/int8

…Slice_output_0_calibrated_0.08912_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.2/Slice_1_output_0_0.08912_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.2/Concat BPU id(0) Concat 0.987388 16.059616 int8/int8

/model.2/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.989501 11.318681 int8/int8

/model.2/cv2/act/Mul BPU id(0) HzLut 0.995370 6.897004 int8/int8

/model.3/conv/Conv BPU id(0) HzSQuantizedConv 0.988529 4.332334 int8/int8

/model.3/act/Mul BPU id(0) HzLut 0.987565 5.411256 int8/int8

/model.4/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.984164 2.939682 int8/int8

/model.4/cv1/act/Mul BPU id(0) HzLut 0.985169 5.470881 int8/int8

/model.4/Slice BPU id(0) Slice 0.971627 5.447958 int8/int8

/model.4/Slice_1 BPU id(0) Slice 0.993531 5.447958 int8/int8

/model.4/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.987754 5.447958 int8/int8

/model.4/m.0/cv1/act/Mul BPU id(0) HzLut 0.979171 4.910393 int8/int8

/model.4/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.987746 2.504302 int8/int8

/model.4/m.0/cv2/act/Mul BPU id(0) HzLut 0.989044 4.240636 int8/int8

UNIT_CONV_FOR_/model.4/m.0/Add BPU id(0) HzSQuantizedConv 0.994704 5.447958 int8/int8

/model.4/m.1/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.993597 4.559353 int8/int8

/model.4/m.1/cv1/act/Mul BPU id(0) HzLut 0.987519 5.032671 int8/int8

/model.4/m.1/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.988236 1.211932 int8/int8

/model.4/m.1/cv2/act/Mul BPU id(0) HzLut 0.989025 9.180493 int8/int8

UNIT_CONV_FOR_/model.4/m.1/Add BPU id(0) HzSQuantizedConv 0.994187 4.559353 int8/int8

…Slice_output_0_calibrated_0.03856_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.4/Slice_1_output_0_0.03856_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.4/m.0/Add_output_0_0.03856_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.4/Concat BPU id(0) Concat 0.992830 5.447958 int8/int8

/model.4/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.986011 4.896566 int8/int8

/model.4/cv2/act/Mul BPU id(0) HzLut 0.979562 6.373096 int8/int8

/model.5/conv/Conv BPU id(0) HzSQuantizedConv 0.989439 2.997622 int8/int8

/model.5/act/Mul BPU id(0) HzLut 0.981459 6.085490 int8/int8

/model.6/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.977101 2.548100 int8/int8

/model.6/cv1/act/Mul BPU id(0) HzLut 0.976240 7.886718 int8/int8

/model.6/Slice BPU id(0) Slice 0.969597 7.883756 int8/int8

/model.6/Slice_1 BPU id(0) Slice 0.987118 7.883756 int8/int8

/model.6/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.992820 7.883756 int8/int8

/model.6/m.0/cv1/act/Mul BPU id(0) HzLut 0.979208 8.399978 int8/int8

/model.6/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.985994 4.175181 int8/int8

/model.6/m.0/cv2/act/Mul BPU id(0) HzLut 0.985562 8.896528 int8/int8

UNIT_CONV_FOR_/model.6/m.0/Add BPU id(0) HzSQuantizedConv 0.987012 7.883756 int8/int8

/model.6/m.1/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.995380 5.489950 int8/int8

/model.6/m.1/cv1/act/Mul BPU id(0) HzLut 0.986267 8.740604 int8/int8

/model.6/m.1/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.983756 2.248126 int8/int8

/model.6/m.1/cv2/act/Mul BPU id(0) HzLut 0.985131 12.755758 int8/int8

UNIT_CONV_FOR_/model.6/m.1/Add BPU id(0) HzSQuantizedConv 0.989923 5.489950 int8/int8

…Slice_output_0_calibrated_0.05323_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.6/Slice_1_output_0_0.05323_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.6/m.0/Add_output_0_0.05323_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.6/Concat BPU id(0) Concat 0.986134 7.883756 int8/int8

/model.6/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.989668 6.760187 int8/int8

/model.6/cv2/act/Mul BPU id(0) HzLut 0.977054 7.424064 int8/int8

/model.7/conv/Conv BPU id(0) HzSQuantizedConv 0.987566 2.017377 int8/int8

/model.7/act/Mul BPU id(0) HzLut 0.952370 8.626787 int8/int8

/model.8/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.975240 2.050196 int8/int8

/model.8/cv1/act/Mul BPU id(0) HzLut 0.950288 8.381120 int8/int8

/model.8/Slice BPU id(0) Slice 0.943360 8.379200 int8/int8

/model.8/Slice_1 BPU id(0) Slice 0.961379 8.379200 int8/int8

/model.8/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.977783 8.379200 int8/int8

/model.8/m.0/cv1/act/Mul BPU id(0) HzLut 0.960151 10.350728 int8/int8

/model.8/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.955445 2.185022 int8/int8

/model.8/m.0/cv2/act/Mul BPU id(0) HzLut 0.950957 16.610804 int8/int8

UNIT_CONV_FOR_/model.8/m.0/Add BPU id(0) HzSQuantizedConv 0.945748 8.379200 int8/int8

…Slice_output_0_calibrated_0.04812_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…/model.8/Slice_1_output_0_0.04812_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.8/Concat BPU id(0) Concat 0.946949 8.379200 int8/int8

/model.8/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.942499 6.110874 int8/int8

/model.8/cv2/act/Mul BPU id(0) HzLut 0.940454 11.257348 int8/int8

/model.9/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.964588 3.772692 int8/int8

/model.9/cv1/act/Mul BPU id(0) HzLut 0.969747 7.868598 int8/int8

/model.9/m/MaxPool BPU id(0) HzQuantizedMaxPool 0.989405 6.990780 int8/int8

/model.9/m_1/MaxPool BPU id(0) HzQuantizedMaxPool 0.992766 6.990780 int8/int8

/model.9/m_2/MaxPool BPU id(0) HzQuantizedMaxPool 0.993922 6.990780 int8/int8

/model.9/Concat BPU id(0) Concat 0.990622 6.990780 int8/int8

/model.9/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.977730 6.990780 int8/int8

/model.9/cv2/act/Mul BPU id(0) HzLut 0.929337 9.760682 int8/int8

/model.10/Resize BPU id(0) HzQuantizedResizeUpsample 0.929319 2.969530 int8/int8

…esize_output_0_calibrated_0.01588_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.11/Concat BPU id(0) Concat 0.934048 2.969530 int8/int8

/model.12/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.944902 2.017377 int8/int8

/model.12/cv1/act/Mul BPU id(0) HzLut 0.928889 8.449221 int8/int8

/model.12/Slice BPU id(0) Slice 0.914075 8.447412 int8/int8

/model.12/Slice_1 BPU id(0) Slice 0.939400 8.447412 int8/int8

/model.12/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.933003 8.447412 int8/int8

/model.12/m.0/cv1/act/Mul BPU id(0) HzLut 0.844034 9.292811 int8/int8

/model.12/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.830614 3.682564 int8/int8

/model.12/m.0/cv2/act/Mul BPU id(0) HzLut 0.874821 12.595719 int8/int8

…Slice_output_0_calibrated_0.03873_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…model.12/Slice_1_output_0_0.03873_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.12/Concat BPU id(0) Concat 0.897341 8.447412 int8/int8

/model.12/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.892441 4.918080 int8/int8

/model.12/cv2/act/Mul BPU id(0) HzLut 0.863770 8.295031 int8/int8

/model.13/Resize BPU id(0) HzQuantizedResizeUpsample 0.863755 3.700210 int8/int8

…esize_output_0_calibrated_0.02863_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…el.4/cv2/act/Mul_output_0_0.02863_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.14/Concat BPU id(0) Concat 0.903632 3.700210 int8/int8

/model.15/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.938993 3.635771 int8/int8

/model.15/cv1/act/Mul BPU id(0) HzLut 0.953686 8.245295 int8/int8

/model.15/Slice BPU id(0) Slice 0.926305 8.243131 int8/int8

/model.15/Slice_1 BPU id(0) Slice 0.972177 8.243131 int8/int8

/model.15/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.963200 8.243131 int8/int8

/model.15/m.0/cv1/act/Mul BPU id(0) HzLut 0.949354 6.327644 int8/int8

/model.15/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.945956 1.188308 int8/int8

/model.15/m.0/cv2/act/Mul BPU id(0) HzLut 0.966197 8.213437 int8/int8

…Slice_output_0_calibrated_0.04240_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…model.15/Slice_1_output_0_0.04240_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.15/Concat BPU id(0) Concat 0.960495 8.243131 int8/int8

/model.15/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.942040 5.384799 int8/int8

/model.15/cv2/act/Mul BPU id(0) HzLut 0.956004 8.580413 int8/int8

/model.16/conv/Conv BPU id(0) HzSQuantizedConv 0.911747 3.368347 int8/int8

/model.16/act/Mul BPU id(0) HzLut 0.901471 8.049969 int8/int8

/model.17/Concat BPU id(0) Concat 0.883104 3.700210 int8/int8

/model.18/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.884937 3.700210 int8/int8

/model.18/cv1/act/Mul BPU id(0) HzLut 0.879953 8.232371 int8/int8

/model.18/Slice BPU id(0) Slice 0.799417 8.230183 int8/int8

/model.18/Slice_1 BPU id(0) Slice 0.930030 8.230183 int8/int8

/model.18/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.915408 8.230183 int8/int8

/model.18/m.0/cv1/act/Mul BPU id(0) HzLut 0.904517 7.373038 int8/int8

/model.18/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.888237 1.645537 int8/int8

/model.18/m.0/cv2/act/Mul BPU id(0) HzLut 0.894237 9.227028 int8/int8

…Slice_output_0_calibrated_0.02041_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…model.18/Slice_1_output_0_0.02041_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.18/Concat BPU id(0) Concat 0.887850 8.230183 int8/int8

/model.18/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.890692 2.592202 int8/int8

/model.18/cv2/act/Mul BPU id(0) HzLut 0.900308 8.699157 int8/int8

/model.19/conv/Conv BPU id(0) HzSQuantizedConv 0.873450 2.656330 int8/int8

/model.19/act/Mul BPU id(0) HzLut 0.819536 8.125890 int8/int8

/model.20/Concat BPU id(0) Concat 0.867348 2.969530 int8/int8

/model.21/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.892068 2.969530 int8/int8

/model.21/cv1/act/Mul BPU id(0) HzLut 0.871668 8.229555 int8/int8

/model.21/Slice BPU id(0) Slice 0.834567 8.227361 int8/int8

/model.21/Slice_1 BPU id(0) Slice 0.904359 8.227361 int8/int8

/model.21/m.0/cv1/conv/Conv BPU id(0) HzSQuantizedConv 0.907093 8.227361 int8/int8

/model.21/m.0/cv1/act/Mul BPU id(0) HzLut 0.882285 8.573235 int8/int8

/model.21/m.0/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.887049 2.812801 int8/int8

/model.21/m.0/cv2/act/Mul BPU id(0) HzLut 0.863030 11.853851 int8/int8

…Slice_output_0_calibrated_0.02407_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

…model.21/Slice_1_output_0_0.02407_TO_FUSE_SCALE BPU id(0) HzSQuantizedConv int8/int8

/model.21/Concat BPU id(0) Concat 0.866348 8.227361 int8/int8

/model.21/cv2/conv/Conv BPU id(0) HzSQuantizedConv 0.862311 3.057136 int8/int8

/model.21/cv2/act/Mul BPU id(0) HzLut 0.856975 14.351915 int8/int8
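
A table this long is easier to check with a script than by eye. The following is a minimal sketch, not part of the Horizon toolchain: it assumes each row has the shape shown above (node name, `BPU` or `CPU`, subgraph, operator type, optional cosine similarity and threshold, data types) and that a CPU row would show `--` in the subgraph column. The helper names `parse_node_table` and `summarize` are made up for illustration.

```python
import re

# One row of the node table. Cosine similarity and threshold are optional
# because the *_TO_FUSE_SCALE rows in the log above omit them.
# Assumption: CPU rows use "--" in the subgraph column.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<device>BPU|CPU)\s+(?P<subgraph>id\(\d+\)|--)\s+"
    r"(?P<type>\S+)"
    r"(?:\s+(?P<cos>\d+\.\d+)\s+(?P<thr>\d+\.\d+))?"
    r"\s+(?P<dtype>\S+/\S+)$"
)


def parse_node_table(text):
    """Return one dict per node row; header, separator and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        cos = m.group("cos")
        rows.append({
            "name": m.group("name"),
            "device": m.group("device"),
            "type": m.group("type"),
            "cosine": float(cos) if cos is not None else None,
        })
    return rows


def summarize(rows, cos_floor=0.9):
    """Count nodes per device and list nodes whose cosine similarity is below cos_floor."""
    devices = {}
    for row in rows:
        devices[row["device"]] = devices.get(row["device"], 0) + 1
    low = [r["name"] for r in rows
           if r["cosine"] is not None and r["cosine"] < cos_floor]
    return devices, low
```

Calling `summarize(parse_node_table(log_text))` on the log above gives the device counts directly. The cosine-similarity list relates to quantization accuracy rather than CPU load, so it is a side check only.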

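Hello, all operators in your model are indeed on the BPU. My guess is that the way you monitor the BPU is unreliable and misses the moments of peak BPU utilization. You could write a script that samples BPU utilization at a short interval (for example, 5 ms) and then plot the data to see how utilization changes over the whole run.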

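There is also a post on using multithreading to raise BPU utilization: "XJ3多核BPU的合理使用技巧与建议" (Tips and recommendations for making good use of the XJ3 multi-core BPU) (horizon.cc)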
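
To illustrate the multithreading idea in general terms, here is a minimal sketch, not the method from the linked post. `run_inference` is a hypothetical stand-in for whatever blocking forward call your runtime provides, and the sketch assumes that call releases the Python GIL while it waits on the BPU. The default of two workers is only a guess based on the two BPU cores.

```python
from concurrent.futures import ThreadPoolExecutor


def run_inference(frame):
    """Hypothetical stand-in for a blocking model forward call."""
    return sum(frame)


def infer_all(frames, infer=run_inference, workers=2):
    """Submit frames from several threads so that more than one request is in flight.

    pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(infer, frames))
```

With a single thread in a `while(true)` loop, any preprocessing or postprocessing between forward calls leaves the BPU idle. Keeping several requests in flight is one way to overlap those phases.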

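As for monitoring the BPU, I have been running the official hrut_somstatus by hand and reading the output. I compared the two setups many times in exactly the same environment, and the result is always the same: the official demo shows low CPU usage and high BPU usage, while mine shows low BPU usage and high CPU usage.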

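It should not behave this strangely. Please try again; I really have no other suggestions. Also, hrut_somstatus is a fairly slow command. You can read these two Linux files directly to get the BPU utilization. Running the command by hand may not catch the peak either, because a BPU inference finishes within a few tens of milliseconds:
bpu0 = open('/sys/devices/system/bpu/bpu0/ratio', 'r', encoding='utf-8')

bpu1 = open('/sys/devices/system/bpu/bpu1/ratio', 'r', encoding='utf-8')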
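
The two suggestions above (a short sampling interval and reading the ratio files directly) can be combined into a small sampler. This is a minimal sketch under two assumptions: each ratio file contains a single number, and re-opening the file on every read is cheap enough for a 5 ms interval. The function names are made up for illustration.

```python
import time

BPU_RATIO_FILES = (
    "/sys/devices/system/bpu/bpu0/ratio",
    "/sys/devices/system/bpu/bpu1/ratio",
)


def read_ratio(path):
    """Read one utilization value. Assumes the file holds a single number."""
    with open(path, "r", encoding="utf-8") as f:
        return float(f.read().strip())


def sample_ratios(paths=BPU_RATIO_FILES, interval_s=0.005, duration_s=1.0):
    """Collect (timestamp, [ratio per file]) pairs for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), [read_ratio(p) for p in paths]))
        time.sleep(interval_s)
    return samples


def peak_and_mean(samples):
    """Return one (peak, mean) tuple per sampled file."""
    per_file = list(zip(*(values for _, values in samples)))
    return [(max(col), sum(col) / len(col)) for col in per_file]
```

Run the sampler in a separate process while the inference loop is active, once for your model and once for the official demo. Then compare the peak and mean values, or plot the raw samples against their timestamps.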

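By that logic the two should look the same, or at least my model should show higher usage than the official demo once in a while. In practice mine is lower and the official one is higher every single time. I type the command quickly and have repeated it many times, and not once was mine high and the official one low.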

To rule out any interference, my demo runs inference on the same array inside a while(true) loop.

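Hello, the models themselves are certainly different; otherwise this would not happen. I suggest using the official approach to speed up inference.
With different models, even if both appear to run on the BPU, the actual runtime behavior can still differ considerably.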

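Well, what I need now is to work out where my model breaks the rules. For example, when I use TensorFlow on Windows and TensorRT or the CUDA environment is not installed, it prints a notice before running. With the BPU there is no notice before, during, or after the run; the only symptom is extremely low BPU usage and extremely high CPU usage. I need to find out what causes this. If the model violates a constraint, I will change the model; if the input parameters do, I will change the input parameters.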

You can refer to section 4.1.1.7, "Model Performance Analysis and Tuning", in the Horizon Open Explorer documentation.
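
Separately from the official profiling tools, a rough first check is to time each stage of the loop yourself and compare wall-clock time with CPU time. This is a minimal sketch; `StageTimer` and the stage names are hypothetical. It assumes that a stage which mostly waits on the BPU uses far less CPU time than wall-clock time, while a stage computed on the CPU uses roughly as much of both.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulate wall-clock and process CPU time per named stage."""

    def __init__(self):
        self.wall = defaultdict(float)
        self.cpu = defaultdict(float)

    def run(self, name, fn, *args):
        """Call fn(*args), charge the elapsed time to `name`, and return the result."""
        w0, c0 = time.perf_counter(), time.process_time()
        result = fn(*args)
        self.wall[name] += time.perf_counter() - w0
        self.cpu[name] += time.process_time() - c0
        return result

    def report(self):
        """Return {stage: (wall_seconds, cpu_seconds)}."""
        return {name: (self.wall[name], self.cpu[name]) for name in self.wall}
```

Wrapping the preprocessing, forward, and postprocessing calls of both your demo and the official one in `timer.run(...)` shows which stage accounts for the CPU time. Note that `time.process_time()` covers every thread in the process, so it is most informative in a single-threaded loop.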

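OK. One more thing: has the WeChat ID of the discussion-group contact behind the question-mark button on this forum changed? I cannot add it.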

You can scan the QR code at this link to join the WeChat discussion group: https://developer.horizon.cc/forumDetail/112555549341653649

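I checked my model with every available tool, and all of them report that its nodes run on the BPU. Yet judging by performance monitoring, the end result is that it still appears to run on the CPU.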

Link: Baidu Netdisk (link does not exist)

Extraction code: 7rhn

That is the address of my model.