Error when checking the model with model_checker.py

beyondaa · October 26, 2023, 03:23

Hello, please describe the problem you encountered in detail; this helps us locate it quickly.

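1. Chip model: J5

2. Tiangong Kaiwu (天工开物) OpenExplorer package version: J5_OE_1.1.60

3. Problem area: model conversion

4. Problem description: an error is raised when checking the model with model_checker.py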

`fx_force_duplicate_shared_convbn` will be set False by default after plugin 1.9.0. If you are not loading old checkpoint, please set `fx_force_duplicate_shared_convbn` False to train your new model.

`aidisdk` dependency is not available.

INFO - 2023-10-26 10:51:35,486 - driver - Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt

INFO - 2023-10-26 10:51:35,510 - driver - Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt

WARNING - 2023-10-26 10:51:41,671 - registry - “RandomFlip:<class ‘hat_plugin.data.transforms.lidar.RandomFlip’> was already registered in HAT_OBJECT_REGISTRY registry, but get a new object <class ‘hat.data.transforms.detection.RandomFlip’>!”. Some objects in hat.data.transforms.detection are not registered!

WARNING - 2023-10-26 10:51:41,741 - registry - “RandomFlip:<class ‘hat_plugin.data.transforms.lidar.RandomFlip’> was already registered in HAT_OBJECT_REGISTRY registry, but get a new object <class ‘hat.data.transforms.detection.RandomFlip’>!”. Some objects in hat.data.transforms.seq_transform are not registered!

WARNING - 2023-10-26 10:51:42,579 - logger - wrap usage has been changed, please pass necessary args

WARNING - 2023-10-26 10:51:43,399 - registry - No module named ‘torchdynamo’. Some objects in hat.utils.compile_backends are not registered!

WARNING - 2023-10-26 10:51:43,467 - logger - GridSample module is deprecated,please use torch.nn.functional.grid_sample

INFO - 2023-10-26 10:51:43,738 - logger - building bifpn cell 0

INFO - 2023-10-26 10:51:43,739 - logger - fnode 0 : {‘inputs_offsets’: [3, 4], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

WARNING - 2023-10-26 10:51:43,739 - logger - default upsampling behavior when mode=bilinear is changed to align_corners=False since torch 0.4.0. Please specify align_corners=True if the old behavior is desired.

INFO - 2023-10-26 10:51:43,740 - logger - fnode 1 : {‘inputs_offsets’: [2, 5], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,740 - logger - fnode 2 : {‘inputs_offsets’: [1, 6], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,741 - logger - fnode 3 : {‘inputs_offsets’: [0, 7], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,742 - logger - fnode 4 : {‘inputs_offsets’: [1, 7, 8], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,742 - logger - fnode 5 : {‘inputs_offsets’: [2, 6, 9], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,743 - logger - fnode 6 : {‘inputs_offsets’: [3, 5, 10], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,744 - logger - fnode 7 : {‘inputs_offsets’: [4, 11], ‘sampling’: [‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,745 - logger - building bifpn cell 1

INFO - 2023-10-26 10:51:43,745 - logger - fnode 0 : {‘inputs_offsets’: [3, 4], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,745 - logger - fnode 1 : {‘inputs_offsets’: [2, 5], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,746 - logger - fnode 2 : {‘inputs_offsets’: [1, 6], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,746 - logger - fnode 3 : {‘inputs_offsets’: [0, 7], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,747 - logger - fnode 4 : {‘inputs_offsets’: [1, 7, 8], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,752 - logger - fnode 5 : {‘inputs_offsets’: [2, 6, 9], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,752 - logger - fnode 6 : {‘inputs_offsets’: [3, 5, 10], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,753 - logger - fnode 7 : {‘inputs_offsets’: [4, 11], ‘sampling’: [‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,753 - logger - building bifpn cell 2

INFO - 2023-10-26 10:51:43,753 - logger - fnode 0 : {‘inputs_offsets’: [3, 4], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,754 - logger - fnode 1 : {‘inputs_offsets’: [2, 5], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,754 - logger - fnode 2 : {‘inputs_offsets’: [1, 6], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,755 - logger - fnode 3 : {‘inputs_offsets’: [0, 7], ‘sampling’: [‘keep’, ‘up’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,755 - logger - fnode 4 : {‘inputs_offsets’: [1, 7, 8], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,755 - logger - fnode 5 : {‘inputs_offsets’: [2, 6, 9], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,756 - logger - fnode 6 : {‘inputs_offsets’: [3, 5, 10], ‘sampling’: [‘keep’, ‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’, ‘module’]}

INFO - 2023-10-26 10:51:43,756 - logger - fnode 7 : {‘inputs_offsets’: [4, 11], ‘sampling’: [‘keep’, ‘down’], ‘upsample_type’: [‘module’, ‘module’]}

INFO - 2023-10-26 10:51:44,026 - logger - neck total_fuse 33

INFO - 2023-10-26 10:51:44,398 - converters - Successfully convert float model to qat model.

2023-10-26 10:51:44,413 WARNING: ConvReLU2d has not collected any statistics of activations and its scale is 1, please check whether this is intended!

INFO - 2023-10-26 10:51:45,330 - converters - Successfully convert qat model to quantize model.

/usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/qtensor.py:1000: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

if scale is not None and scale.numel() > 1:

/usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/conv2d.py:290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

per_channel_axis=-1 if self.out_scale.numel() == 1 else 1,

/clever/volumes/anjiaju-owner/dev/history/zone_hat/hat/models/necks/fast_scnn.py:75: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.

input_size = torch.Tensor((x.shape[2], x.shape[3])).cpu().numpy()

/clever/volumes/anjiaju-owner/dev/history/zone_hat/hat/models/necks/fast_scnn.py:75: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

input_size = torch.Tensor((x.shape[2], x.shape[3])).cpu().numpy()

/clever/volumes/anjiaju-owner/dev/history/zone_hat/hat/models/necks/fast_scnn.py:75: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

input_size = torch.Tensor((x.shape[2], x.shape[3])).cpu().numpy()

/usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/functional_modules.py:163: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

per_channel_axis=-1 if self.scale.numel() == 1 else 1,

/usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/utils/script_quantized_fn.py:239: UserWarning: operator() profile_node %323 : int = prim::profile_ivalue(%_storage_type)

does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)

return compiled_fn(*args, **kwargs)

/usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/qtensor.py:296: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

assert input.q_scale().numel() == 1, (

/root/.local/lib/python3.8/site-packages/torch/jit/_trace.py:976: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module’s inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.

module._c._create_method_from_trace(

WARNING: _neck_extract_hz_cat_rescale_0 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _neck_extract_hz_cat_rescale_1 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _neck_extract_hz_cat_rescale_2 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _neck_extract_hz_cat_rescale_3 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _neck_extract_hz_cat_rescale_4 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _view_transformer_grid_samples_0_hz_grid_sample_grid_rescale is useless. iscale vs. oscale: tensor([0.0156]) , tensor([0.0156])

WARNING: _lidar_backbone_resBlock0_hz_cat_rescale_0 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock0_hz_cat_rescale_1 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock0_hz_cat_rescale_2 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock1_hz_cat_rescale_0 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock1_hz_cat_rescale_1 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock1_hz_cat_rescale_2 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock2_hz_cat_rescale_0 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock2_hz_cat_rescale_1 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock2_hz_cat_rescale_2 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock3_hz_cat_rescale_0 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock3_hz_cat_rescale_1 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

WARNING: _lidar_backbone_resBlock3_hz_cat_rescale_2 is useless. iscale vs. oscale: tensor([1.]) , tensor([1.])

Traceback (most recent call last):

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/prim_registry.py", line 38, in prim_CallFunction

ret = getattr(horizon, func)(builder,

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/horizon_registry.py", line 406, in interpolate

assert align_corners in [False, None]

AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 89, in _visit_node

getattr(prim, func_name)(self, node, *raw_args)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/prim_registry.py", line 42, in prim_CallFunction

raise ValueError(

ValueError: ('parsing horizon function', 'interpolate', 'in named childern', '.lidar_backbone.upBlock0', 'args are', (TensorManager: record[0]: TensorRecord(hbir=_lidar_backbone_resBlock3_hz_add, size=1x32x80x64, dtype=int8, scale=[], shift=[], torch_native=False), None, [2.0, 2.0], 'bilinear', True, None, object<resBlock3.floatFs.scale>, tensor([0]), 'qint8', 'bayes'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 89, in _visit_node

getattr(prim, func_name)(self, node, *raw_args)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/prim_registry.py", line 84, in prim_CallMethod

builder._visit_node(node)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 92, in _visit_node

raise ValueError(

ValueError: ('parsing prim node', %data.1 : Tensor = prim::CallFunction(%99, %88, %89, %91, %92, %93, %89, %scale.1, %argument_4.1, %97, %98) # :0:0

, 'in named childern', '.lidar_backbone.upBlock0', 'args are', ['interpolate', TensorManager: record[0]: TensorRecord(hbir=_lidar_backbone_resBlock3_hz_add, size=1x32x80x64, dtype=int8, scale=[], shift=[], torch_native=False), None, [2.0, 2.0], 'bilinear', True, None, object<resBlock3.floatFs.scale>, tensor([0]), 'qint8', 'bayes'])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 89, in _visit_node

getattr(prim, func_name)(self, node, *raw_args)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/prim_registry.py", line 84, in prim_CallMethod

builder._visit_node(node)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 92, in _visit_node

raise ValueError(

ValueError: ('parsing prim node', %145 : (Tensor, Tensor) = prim::CallMethod[name="forward"](%upBlock0, %scale.11, %scale.9, %136, %137, %128, %129) # :0:0

, 'in named childern', '.lidar_backbone.upBlock0', 'args are', [object, object<resBlock3.floatFs.scale>, object<resBlock2.floatFs.scale>, TensorManager: record[0]: TensorRecord(hbir=_lidar_backbone_resBlock3_hz_add, size=1x32x80x64, dtype=int8, scale=[], shift=[], torch_native=False), tensor([0]), TensorManager: record[0]: TensorRecord(hbir=_lidar_backbone_resBlock2_hz_add, size=1x64x160x64, dtype=int8, scale=[], shift=[], torch_native=False), tensor([0])])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "tools/model_checker.py", line 84, in <module>

model_checker(args.config, args_env=args_env)

File "tools/model_checker.py", line 64, in model_checker

flag = check_model(deploy_model, deploy_inputs, advice=10)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/tools.py", line 162, in check_model

export_hbir(module, example_inputs, hbir_file.name, march)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/tools.py", line 112, in export_hbir

builder.build_from_jit(script_module, example_inputs)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 294, in build_from_jit

self._build_from_jit_script(jit_obj, example_inputs)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 263, in _build_from_jit_script

self._visit_node(node)

File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 92, in _visit_node

raise ValueError(

ValueError: ('parsing prim node', %143 : (Tensor, Tensor) = prim::CallMethod[name="forward"](%lidar_backbone, %109) # :0:0

, 'in named childern', '.lidar_backbone.upBlock0', 'args are', [object<lidar_backbone>, TensorManager: record[0]: TensorRecord(hbir=arg0[lidar_rimg_points][0], size=1x256x640x4, dtype=int8, scale=[1.0], shift=[], torch_native=False)])

What is causing this problem, and how should I fix it?

颜值即正义 · October 26, 2023, 04:47

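Hello. Judging from the log, the error is related to the align_corners setting of the interpolation function: hbdk's converter for `interpolate` asserts `align_corners in [False, None]`, but the call inside `.lidar_backbone.upBlock0` is traced with mode `'bilinear'` and `align_corners=True`. Please check your use of the interpolate operator against the constraints listed in the supported-operator list.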
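To locate the offending call in the model source, here is a minimal sketch (not part of OpenExplorer, HAT, or hbdk) that statically scans Python source for upsampling calls using `align_corners=True`. The function name `find_align_corners_true` is hypothetical. It assumes the standard PyTorch signatures of `F.interpolate` and `nn.Upsample`, and it also flags `nn.UpsamplingBilinear2d` and `F.upsample_bilinear`, which PyTorch documents as equivalent to bilinear interpolation with `align_corners=True`. It only detects a literal `True`, so values passed through variables or configs still need a manual check.

```python
import ast
from typing import List, Tuple

# Positional index of `align_corners` in the PyTorch signatures:
#   F.interpolate(input, size, scale_factor, mode, align_corners, ...)
#   F.upsample(input, size, scale_factor, mode, align_corners)
#   nn.Upsample(size, scale_factor, mode, align_corners, ...)
ALIGN_CORNERS_POS = {"interpolate": 4, "upsample": 4, "Upsample": 3}

# These PyTorch APIs always behave as bilinear with align_corners=True.
IMPLICIT_TRUE = {"UpsamplingBilinear2d", "upsample_bilinear"}


def _call_name(node: ast.Call) -> str:
    """Return the bare callee name, e.g. F.interpolate -> 'interpolate'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def _is_true(expr: ast.expr) -> bool:
    """True only for the literal constant `True`."""
    return isinstance(expr, ast.Constant) and expr.value is True


def find_align_corners_true(source: str) -> List[Tuple[str, int]]:
    """List (callee, line) for upsampling calls that use align_corners=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name in IMPLICIT_TRUE:
            hits.append((name, node.lineno))
            continue
        if name not in ALIGN_CORNERS_POS:
            continue
        # Keyword form: align_corners=True
        flagged = any(
            kw.arg == "align_corners" and _is_true(kw.value) for kw in node.keywords
        )
        # Positional form, as seen in the traced args: (x, None, [2.0, 2.0], 'bilinear', True, ...)
        pos = ALIGN_CORNERS_POS[name]
        if len(node.args) > pos and _is_true(node.args[pos]):
            flagged = True
        if flagged:
            hits.append((name, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for name, line in find_align_corners_true(f.read()):
                print(f"{path}:{line}: {name} uses align_corners=True")
```

For example, running it on the file that defines the lidar backbone's `upBlock0` (the path is not given in the log) should point at the call to change. Switching that call to `align_corners=False` (or leaving it unset) satisfies the hbdk assertion, but it also changes the sampling grid, so the float/QAT model will likely need retraining or at least an accuracy re-check afterwards.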