[Image Algorithms][RDK X3] system only support 1024 funccalls at most

92123740 · May 28, 2025, 01:11

The model reports the following errors during inference:

[E][DNN][hbm_exec_plan.cpp:1105][Plan](2025-05-28,09:02:45.996.763) Subgraph [main_graph_subgraph_25] generate funccalls exceeds limit!

[E][DNN][multi_model_task.cpp:157][Task](2025-05-28,09:02:45.996.932) The model [rt_igev] genarate 2597 funccalls, but system only support 1024 funccalls at most. Funccalls can be reduced by: 1. Increase `max-time-per-fc` time at compile stage; 2. Decrease the number of batchsize at compile stage; 3. Reduce the number of roi when calling hbDNNRoiInfer

E0528 09:02:45.999325 46811 function_util.cpp:205] hbDNNWaitTaskDone failed, error code:-6000012

E0528 09:02:45.999452 46811 main.cpp:352] model inference failed, error code:-6000012

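The model is shared on Baidu Netdisk; the extraction code is 1234.

The ONNX graph of my model alone has 1666 nodes. Is the number of funccalls related to the number of nodes in the model?

I believe max-time-per-fc defaults to 0, and the batch size is 1. How should I change the model so that it can run on X3?

What is the maximum number of funccalls that X5 supports?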
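To see how the 1666 ONNX nodes break down by operator type, a short script along the following lines may help. It is only a sketch: it assumes the `onnx` Python package is installed, and it counts graph nodes only. It does not compute funccalls, which are generated inside the toolchain.

```python
from collections import Counter


def count_op_types(op_types):
    """Count how many times each operator type appears."""
    return Counter(op_types)


def summarize_onnx(path):
    """Load an ONNX file and return (total node count, per-op-type counts)."""
    import onnx  # imported lazily so count_op_types works without onnx installed

    model = onnx.load(path)
    counts = count_op_types(node.op_type for node in model.graph.node)
    return sum(counts.values()), counts


def format_report(total, counts):
    """Render the counts as text, most frequent operator type first."""
    lines = [f"total nodes: {total}"]
    for op_type, n in counts.most_common():
        lines.append(f"{op_type:>24s}  {n}")
    return "\n".join(lines)
```

Calling `print(format_report(*summarize_onnx("rt_igev.onnx")))` would print the breakdown. The file name here is an assumption based on the model name in the log. Operator types that appear many times and are not supported on the BPU are the first candidates to replace or fuse.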

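MAACCC · June 9, 2025, 10:15

The number of funccalls is related to the number of segments the model is split into. In your case it is most likely caused by too many CPU nodes.

Since you did not provide the output of the rdkos_info command or the toolchain version information, I am assuming that all of your tools are on the latest versions, in which case the limit is 1024.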
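The link between CPU nodes and segment count can be illustrated with a simplified sketch. It assumes the nodes are listed in execution order with the device each one was assigned to (for example, copied by hand from the conversion log), and it treats every maximal run of consecutive nodes on the same device as one segment. The real partitioning and the exact funccall count are internal to the toolchain, so this only shows the trend: each CPU node that sits between BPU nodes splits the graph further.

```python
from itertools import groupby


def count_segments(devices):
    """Return (bpu_segments, cpu_segments) for an ordered list of devices.

    A segment is a maximal run of consecutive nodes on the same device.
    This is a simplification of what the toolchain actually does.
    """
    runs = [device for device, _ in groupby(devices)]
    return runs.count("BPU"), runs.count("CPU")


# Hypothetical placement: two CPU ops interleaved with BPU ops.
placement = ["BPU", "BPU", "CPU", "BPU", "CPU", "BPU", "BPU"]
bpu_segments, cpu_segments = count_segments(placement)
print(bpu_segments, cpu_segments)  # 3 BPU segments and 2 CPU segments
```

Under this simplification, moving or removing the two CPU ops in the hypothetical placement would collapse the graph into a single BPU segment.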

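92123740 · May 28, 2025, 02:24

Opening the JSON file for main_graph_subgraph_25 shows the content below. Is the error caused by the largest single dimension of Reshape_48_output_0 being 15360, which exceeds the maximum single-dimension size that Horizon X3 supports? Does X3 have a per-dimension size limit?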

{
  "summary": {
    "BPU OPs per frame (effective)": 768,
    "BPU OPs per run (effective)": 11796480,
    "BPU PE number": 1,
    "BPU core number": 1,
    "BPU march": "BERNOULLI2",
    "DDR bytes per frame": 3840,
    "DDR bytes per run": 58985216,
    "DDR bytes per second": 1655262259,
    "DDR megabytes per frame": 0.004,
    "DDR megabytes per run": 56.253,
    "DDR megabytes per second": 1578.6,
    "FPS": 431037.3,
    "HBDK version": "3.49.13",
    "compiling options": "-f hbir -m /tmp/tmpydjf5ywf/main_graph_subgraph_25.hbir -o /tmp/tmpydjf5ywf/main_graph_subgraph_25.hbm --march bernoulli2 --progressbar --O3 --core-num 1 --fast --input-layout NHWC --output-layout NCHW --input-source ddr",
    "frame per run": 15360,
    "frame per second": 431037.3,
    "input features": [
      ["input name", "input size"],
      ["/Reshape_48_output_0_calibrated_quantized", "15360x1x48x8"]
    ],
    "interval computing unit utilization": [
      0.66, 0.623, 0.624, 0.623, 0.647, 0.664, 0.638, 0.625, 0.623, 0.627, 0.664, 0.658,
      0.625, 0.624, 0.623, 0.647, 0.658, 0.645, 0.625, 0.624, 0.643, 0.64, 0.711, 0.776,
      0.766, 0.792, 0.766, 0.782, 0.748, 0.789, 0.769, 0.764, 0.796, 0.761, 0.795, 0.481
    ],
    "interval computing units utilization": [
      0.66, 0.623, 0.624, 0.623, 0.647, 0.664, 0.638, 0.625, 0.623, 0.627, 0.664, 0.658,
      0.625, 0.624, 0.623, 0.647, 0.658, 0.645, 0.625, 0.624, 0.643, 0.64, 0.711, 0.776,
      0.766, 0.792, 0.766, 0.782, 0.748, 0.789, 0.769, 0.764, 0.796, 0.761, 0.795, 0.481
    ],
    "interval loading bandwidth (megabytes/s)": [
      517, 518, 517, 515, 488, 465, 493, 515, 485, 466, 495, 515, 479, 466, 501,
      515, 508, 462, 426, 431, 433, 426, 431, 426, 422, 434, 433, 436, 338
    ],
    "interval number": 36,
    "interval storing bandwidth (megabytes/s)": [
      654, 722, 774, 758, 757, 755, 722, 731, 758, 749, 764, 723, 722, 766, 758,
      755, 764, 722, 758, 748, 764, 961, 1384, 1677, 1735, 1734, 1677, 1678, 1735, 1471
    ],
    "interval time (ms)": 1.0,
    "latency (ms)": 35.63,
    "latency (ms) by segments": [35.635],
    "latency (us)": 35635.0,
    "loaded bytes per frame": 1152,
    "loaded bytes per run": 17697536,
    "model json CRC": "7a868267",
    "model json file": "/tmp/tmpydjf5ywf/main_graph_subgraph_25.hbir",
    "model name": "main_graph_subgraph_25",
    "model param CRC": "00000000",
    "multicore sync time (ms)": 0.0,
    "run per second": 28.06,
    "runtime version": "3.15.53.0",
    "stored bytes per frame": 2688,
    "stored bytes per run": 41287680,
    "worst FPS": 431037.3
  }
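As a sanity check on how to read this summary, the per-run figures can be recomputed from the per-frame figures. The sketch below uses only values copied from the JSON above. It suggests that the compiler treats the leading dimension of the 15360x1x48x8 input as 15360 frames per run. This is a consistency check only and does not by itself show how many funccalls the subgraph generates.

```python
# Values copied from the main_graph_subgraph_25 summary.
frames_per_run = 15360          # "frame per run"
ops_per_frame = 768             # "BPU OPs per frame (effective)"
stored_bytes_per_frame = 2688   # "stored bytes per frame"
latency_ms = 35.635             # "latency (ms) by segments"

# Per-run totals are the per-frame values times the frame count.
ops_per_run = ops_per_frame * frames_per_run
stored_bytes_per_run = stored_bytes_per_frame * frames_per_run

# Throughput follows from the single-segment latency.
runs_per_second = 1000.0 / latency_ms
frames_per_second = frames_per_run * runs_per_second

print(ops_per_run)                # 11796480, matches "BPU OPs per run (effective)"
print(stored_bytes_per_run)       # 41287680, matches "stored bytes per run"
print(round(runs_per_second, 2))  # 28.06, matches "run per second"
print(frames_per_second)          # close to the reported "FPS" of 431037.3
```

In other words, the reported FPS of about 431037 refers to 48x8 frames, not to full inferences of the model. The subgraph as a whole runs about 28 times per second.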