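Thank you for using the Horizon chip algorithm toolchain. We are currently collecting satisfaction feedback, and you are welcome to fill in the questionnaire. Details are available at: https://developer.horizon.ai/forumDetail/146177053698464782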
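Hello, there are two possible reasons for this:
First: the hb_perf estimate does not include the computation done on the CPU. If the CPU work is limited to routine processing of the model's inputs or outputs and contains no compute-intensive nodes, the impact is small. Otherwise, you must measure the performance on the development board with the board-side tools.
Second: when measuring the model's FPS on the board, please refer to the following command, and check, for example, whether you used multiple threads and whether you used both cores.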
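Hi, this is what I got from inference on the board: inference on the BPU takes 247 ms. Do I need to take convolution alignment or channel alignment into account?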
Hi, could you send a screenshot of the command you ran on the board?
Also, sorry to bother you, but what does thread_num mean? Does it mean creating 8 threads for inference?
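thread_num is the number of threads. Could you please provide the following:
Run the latency test with a single core and a single thread, and provide screenshots of the run and the log;
Run the FPS test with both cores and 5 threads, and provide screenshots of the run and the log;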
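Why latency is measured with one thread and FPS with several can be sketched with Little's law: if N frames are kept in flight, throughput is roughly N divided by the per-frame latency, as long as the hardware is not yet saturated. The function below is a minimal illustration under that assumption; `estimate_fps` is a hypothetical helper, not part of the toolchain.

```python
def estimate_fps(thread_num: int, avg_latency_ms: float) -> float:
    """Little's law: throughput = frames in flight / time per frame.

    Assumes each thread always has exactly one frame in flight and that
    avg_latency_ms is the per-frame latency observed under that load.
    """
    if thread_num <= 0 or avg_latency_ms <= 0:
        raise ValueError("thread_num and avg_latency_ms must be positive")
    return thread_num * 1000.0 / avg_latency_ms


# With one thread, FPS is simply the reciprocal of latency.
print(estimate_fps(1, 100.0))  # 10.0
# Four threads at the same per-frame latency would give 4x the throughput;
# in practice latency grows as threads contend for the BPU.
print(estimate_fps(4, 100.0))  # 40.0
```

Plugging in the measurements posted later in this thread (1 thread at about 89 ms, 5 threads at about 129 ms) gives roughly 11.2 and 38.7 FPS, close to the reported 11.23 and 38.35 FPS.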
Command for testing FPS, for reference:
Command for testing latency, for reference:
Below is the single-thread result:
{
  "perf_result": {
    "FPS": 11.233394094761993,
    "average_latency": 88.96385955810547
  },
  "running_condition": {
    "core_id": 0,
    "frame_count": 200,
    "model_name": "yolo_shuffletnet",
    "run_time": 17804.058,
    "thread_num": 1
  }
}
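The `perf_result` figures can be cross-checked against `running_condition`. Assuming `run_time` is the total wall-clock time in milliseconds (inferred from the numbers, not from the tool's documentation), FPS is just `frame_count` divided by `run_time` in seconds. A minimal sketch; `fps_from_run` is a hypothetical helper:

```python
import json

# perf_result / running_condition from the single-thread run above,
# with plain ASCII quotes so it parses as JSON.
PERF_JSON = """
{
  "perf_result": {"FPS": 11.233394094761993, "average_latency": 88.96385955810547},
  "running_condition": {"core_id": 0, "frame_count": 200,
                        "model_name": "yolo_shuffletnet",
                        "run_time": 17804.058, "thread_num": 1}
}
"""


def fps_from_run(perf: dict) -> float:
    """Recompute FPS as frames per wall-clock second.

    Assumption: run_time is reported in milliseconds.
    """
    cond = perf["running_condition"]
    return cond["frame_count"] / (cond["run_time"] / 1000.0)


perf = json.loads(PERF_JSON)
recomputed = fps_from_run(perf)
reported = perf["perf_result"]["FPS"]
print(f"recomputed {recomputed:.4f} FPS, reported {reported:.4f} FPS")
```

Applied to the 5-thread run below (200 frames in 5214.841 ms), the same formula gives about 38.35 FPS, matching its report, which suggests the FPS figure is the combined throughput of all threads rather than a per-thread value.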
***
{
  "chip_latency": {
    "BPU_inference_time_cost": {
      "avg_time": 87.901325,
      "max_time": 105.169,
      "min_time": 78.489
    },
    "CPU_inference_time_cost": {
      "avg_time": 0.8687999999999999,
      "max_time": 1.615,
      "min_time": 0.647
    }
  },
  "model_latency": {
    "BPU_torch-jit-export_subgraph_0": {
      "avg_time": 87.901325,
      "max_time": 105.169,
      "min_time": 78.489
    },
    "Dequantize_901_HzDequantize": {
      "avg_time": 0.590795,
      "max_time": 1.079,
      "min_time": 0.443
    },
    "Dequantize_921_HzDequantize": {
      "avg_time": 0.14845,
      "max_time": 0.262,
      "min_time": 0.111
    },
    "Dequantize_941_HzDequantize": {
      "avg_time": 0.040835,
      "max_time": 0.087,
      "min_time": 0.028
    },
    "torch-jit-export_subgraph_0_output_layout_convert": {
      "avg_time": 0.08872,
      "max_time": 0.187,
      "min_time": 0.065
    }
  },
  "task_latency": {
    "TaskPendingTime": {
      "avg_time": 0.014060000000000001,
      "max_time": 0.11,
      "min_time": 0.0
    },
    "TaskRunningTime": {
      "avg_time": 88.49078,
      "max_time": 106.825,
      "min_time": 0.0
    }
  }
}
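This profile bears on the first point of the earlier reply: the non-BPU nodes (the three `Dequantize` nodes and the output layout conversion) add up to the reported `CPU_inference_time_cost` average of about 0.87 ms, roughly 1% of the total, so the BPU subgraph (about 88 ms) dominates and the CPU part is not what makes this model slow. The sketch below sums the per-node averages; treating every node not prefixed with `BPU_` as CPU work is an assumption based on the node names in this log.

```python
import json
from typing import Tuple

# Per-node average latency (ms) from the single-thread profile above.
MODEL_LATENCY_JSON = """
{
  "BPU_torch-jit-export_subgraph_0": {"avg_time": 87.901325},
  "Dequantize_901_HzDequantize": {"avg_time": 0.590795},
  "Dequantize_921_HzDequantize": {"avg_time": 0.14845},
  "Dequantize_941_HzDequantize": {"avg_time": 0.040835},
  "torch-jit-export_subgraph_0_output_layout_convert": {"avg_time": 0.08872}
}
"""


def split_bpu_cpu(model_latency: dict) -> Tuple[float, float]:
    """Sum average node times into BPU and CPU buckets.

    Assumption: nodes whose name starts with "BPU_" run on the BPU and
    every other node runs on the CPU.
    """
    bpu = sum(v["avg_time"] for k, v in model_latency.items() if k.startswith("BPU_"))
    cpu = sum(v["avg_time"] for k, v in model_latency.items() if not k.startswith("BPU_"))
    return bpu, cpu


bpu_ms, cpu_ms = split_bpu_cpu(json.loads(MODEL_LATENCY_JSON))
share = cpu_ms / (bpu_ms + cpu_ms)
print(f"BPU {bpu_ms:.3f} ms, CPU {cpu_ms:.4f} ms, CPU share {share:.1%}")
```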
Below is the 5-thread result:
{
  "perf_result": {
    "FPS": 38.35208014971118,
    "average_latency": 129.03817749023438
  },
  "running_condition": {
    "core_id": 0,
    "frame_count": 200,
    "model_name": "yolo_shuffletnet",
    "run_time": 5214.841,
    "thread_num": 5
  }
}
***
{
  "chip_latency": {
    "BPU_inference_time_cost": {
      "avg_time": 127.62822,
      "max_time": 187.154,
      "min_time": 102.937
    },
    "CPU_inference_time_cost": {
      "avg_time": 0.989485,
      "max_time": 4.19,
      "min_time": 0.65
    }
  },
  "model_latency": {
    "BPU_torch-jit-export_subgraph_0": {
      "avg_time": 127.62822,
      "max_time": 187.154,
      "min_time": 102.937
    },
    "Dequantize_901_HzDequantize": {
      "avg_time": 0.697225,
      "max_time": 2.721,
      "min_time": 0.444
    },
    "Dequantize_921_HzDequantize": {
      "avg_time": 0.16266999999999998,
      "max_time": 0.396,
      "min_time": 0.112
    },
    "Dequantize_941_HzDequantize": {
      "avg_time": 0.045145000000000005,
      "max_time": 0.916,
      "min_time": 0.028
    },
    "torch-jit-export_subgraph_0_output_layout_convert": {
      "avg_time": 0.08444499999999999,
      "max_time": 0.157,
      "min_time": 0.066
    }
  },
  "task_latency": {
    "TaskPendingTime": {
      "avg_time": 92233720182656.36,
      "max_time": 1.8446744055121744e+16,
      "min_time": 0.0
    },
    "TaskRunningTime": {
      "avg_time": 92233718044899.89,
      "max_time": 1.8446744055122e+16,
      "min_time": 0.0
    }
  }
}
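In this 5-thread profile, the `task_latency` averages and maxima (around 1e13 to 1e16) are far larger than the whole 5214.841 ms run, so they cannot be real timings. The max values happen to be close to 2^64/1000, which may point to an unsigned-counter underflow in the profiler, but that is only a guess. A simple plausibility check, assuming all times are in milliseconds; `implausible_fields` is a hypothetical helper:

```python
import json

# task_latency block from the 5-thread profile above.
TASK_LATENCY_JSON = """
{
  "TaskPendingTime": {"avg_time": 92233720182656.36,
                      "max_time": 1.8446744055121744e+16, "min_time": 0.0},
  "TaskRunningTime": {"avg_time": 92233718044899.89,
                      "max_time": 1.8446744055122e+16, "min_time": 0.0}
}
"""


def implausible_fields(stats: dict, run_time_ms: float) -> list:
    """Return "metric.field" names whose value exceeds the whole run time.

    No task can pend or run longer than the entire benchmark, so such
    values indicate a measurement artifact rather than real latency.
    """
    bad = []
    for metric, fields in stats.items():
        for field, value in fields.items():
            if value > run_time_ms:
                bad.append(f"{metric}.{field}")
    return bad


# run_time of the 5-thread run (assumed to be in ms).
flagged = implausible_fields(json.loads(TASK_LATENCY_JSON), run_time_ms=5214.841)
print(flagged)
```

The BPU and CPU figures in `chip_latency` and `model_latency` of the same run stay well below the run time, so only the `task_latency` section looks suspect.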
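Also, your OE package version must be quite old. We recommend downloading the latest OE package, available here: https://developer.horizon.ai/forumDetail/136488103547258769. Many issues get fixed over successive releases, and performance improves as well.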
Oh, sorry.
OK, I'll go download it now.
Hi, is there anything wrong here? I need this for my graduation thesis.
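Overall, the run looks fine. The error at the end occurs because dual-core was configured in the conversion yaml, so the core on the board must be set to 0. We recommend reconverting the model with the latest version of the toolchain.
Testing FPS with dual cores and multiple threads is also fine; after all, the simulated estimate does not account for the time spent on the CPU.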
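Thanks. Once I've converted the model with the latest version, I'll get back to you.
If you have new questions later, feel free to open a new thread. I'll close this one for now~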