On J5, quantizing the grid_sample input is very time-consuming. How can it be removed or optimized?
"latency (ms) by segments": [
  4.053
],
"latency (us)": 4052.9,
"layer details": [
  [
    "layer",
    "original ops",
    "original cost",
    "computing cost (no DDR)",
    "layer util (no DDR)",
    "load/store cost",
    "layer util (with DDR)",
    "active period of time"
  ],
  [
    "arg1_torch_native",
    "0",
    "34 us (57.1% of model)",
    "2204 us (54.3% of model)",
    "1.5%",
    "122 us (3.0% of model)",
    "1.4%",
    "373 ~ 2827 us (2454)"
  ],
  [
    "_sample_hz_grid_sample_grid_rescale_torch_native",
    "0",
    "9 us (14.2% of model)",
    "9 us (0.2% of model)",
    "93.4%",
    "1 us (0% of model)",
    "84.8%",
    "2813 ~ 3798 us (985)"
  ],
  [
    "_sample_hz_grid_sample",
    "0",
    "17 us (28.5% of model)",
    "1213 us (29.9% of model)",
    "1.4%",
    "534 us (13.1% of model)",
    "0.9%",
    "0 ~ 4053 us (4053)"
  ]
],
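
To make the report easier to read, here is a minimal Python sketch that ranks the three layers by their "computing cost (no DDR)" value. The rows are copied by hand from the "layer details" table above. The helper names `leading_us` and `rank_by_compute` are hypothetical and not part of any Horizon tool. I am assuming that `arg1_torch_native` is the quantize node for the grid input, which the report itself does not state.

```python
import re

# Rows copied from the "layer details" table in this post:
# (layer, computing cost (no DDR), load/store cost)
rows = [
    ("arg1_torch_native",
     "2204 us (54.3% of model)", "122 us (3.0% of model)"),
    ("_sample_hz_grid_sample_grid_rescale_torch_native",
     "9 us (0.2% of model)", "1 us (0% of model)"),
    ("_sample_hz_grid_sample",
     "1213 us (29.9% of model)", "534 us (13.1% of model)"),
]


def leading_us(cell: str) -> int:
    """Return the integer microsecond value at the start of a cost cell."""
    match = re.match(r"\s*(\d+)\s*us", cell)
    if match is None:
        raise ValueError(f"unrecognized cost cell: {cell!r}")
    return int(match.group(1))


def rank_by_compute(table):
    """Sort layers by computing cost (no DDR), most expensive first."""
    parsed = [
        (name, leading_us(compute), leading_us(ddr))
        for name, compute, ddr in table
    ]
    return sorted(parsed, key=lambda row: row[1], reverse=True)


ranked = rank_by_compute(rows)
for name, compute_us, ddr_us in ranked:
    print(f"{name}: compute {compute_us} us, load/store {ddr_us} us")
```

Ranked this way, `arg1_torch_native` comes first at 2204 us of computing cost, ahead of `_sample_hz_grid_sample` at 1213 us, and that is the cost I would like to remove or reduce.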