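Hello, please describe the problem you encountered in detail.
1. Hardware source: purchased J5 chip
2. Current system image version: docker_openexplorer_ubuntu_20_j5_gpu_v1.1.40_py38
3. Current OpenExplorer (天工开物) version: horizon_j5_open_explorer_v1.1.40_py38_20230210
4. Problem location: an error occurs when running the BEV training command
5. Demo/case under development: bev_release_package-1.6.16
6. Solution needed: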
In the ddk/package/host/ directory I ran bash install.sh, and then ran the following training command: python3 tools/train.py --config configs/bev/bev_mt_lss.py --stage float
It fails with the following error:
2023-03-03 17:25:12,057 INFO [logger.py:147] Node[0] ==================================================BEGIN FLOAT STAGE==================================================
2023-03-03 17:25:12,090 INFO [thread_init.py:38] Node[1] init torch_num_thread is `12`,opencv_num_thread is `12`,openblas_num_thread is `12`,mkl_num_thread is `12`,omp_num_thread is `12`,
2023-03-03 17:25:12,108 INFO [thread_init.py:38] Node[3] init torch_num_thread is `12`,opencv_num_thread is `12`,openblas_num_thread is `12`,mkl_num_thread is `12`,omp_num_thread is `12`,
2023-03-03 17:25:12,108 INFO [thread_init.py:38] Node[2] init torch_num_thread is `12`,opencv_num_thread is `12`,openblas_num_thread is `12`,mkl_num_thread is `12`,omp_num_thread is `12`,
2023-03-03 17:25:12,111 INFO [thread_init.py:38] Node[0] init torch_num_thread is `12`,opencv_num_thread is `12`,openblas_num_thread is `12`,mkl_num_thread is `12`,omp_num_thread is `12`,
2023-03-03 17:25:12,143 ERROR [ddp_trainer.py:363] Node[1] Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/open_explorer/bev_release_package/tools/train.py", line 185, in train_entrance
    trainer = build_from_registry(trainer)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 236, in build_from_registry
    return _impl(x)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 213, in _impl
    _raise_invalid_type_error(object_type)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 75, in _raise_invalid_type_error
    raise TypeError(
TypeError: LSSTransformer has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed
2023-03-03 17:25:12,157 ERROR [ddp_trainer.py:363] Node[0] Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/open_explorer/bev_release_package/tools/train.py", line 185, in train_entrance
    trainer = build_from_registry(trainer)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 236, in build_from_registry
    return _impl(x)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 213, in _impl
    _raise_invalid_type_error(object_type)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 75, in _raise_invalid_type_error
    raise TypeError(
TypeError: LSSTransformer has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed
2023-03-03 17:25:12,157 ERROR [ddp_trainer.py:363] Node[3] Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/open_explorer/bev_release_package/tools/train.py", line 185, in train_entrance
    trainer = build_from_registry(trainer)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 236, in build_from_registry
    return _impl(x)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 213, in _impl
    _raise_invalid_type_error(object_type)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 75, in _raise_invalid_type_error
    raise TypeError(
TypeError: LSSTransformer has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed
2023-03-03 17:25:12,158 ERROR [ddp_trainer.py:363] Node[2] Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/open_explorer/bev_release_package/tools/train.py", line 185, in train_entrance
    trainer = build_from_registry(trainer)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 236, in build_from_registry
    return _impl(x)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 196, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 213, in _impl
    _raise_invalid_type_error(object_type)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 75, in _raise_invalid_type_error
    raise TypeError(
TypeError: LSSTransformer has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed
ERROR:__main__:launch trainer failed! process 0 terminated with exit code 1
Traceback (most recent call last):
  File "tools/train.py", line 277, in <module>
    train(
  File "tools/train.py", line 272, in train
    raise e
  File "tools/train.py", line 255, in train
    launch(
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 328, in launch
    mp.spawn(
  File "/root/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 139, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 1
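
For context on what the TypeError means: all four nodes fail at the same point, where the config names a type as a string ("LSSTransformer") and that name cannot be found in the registry. The final ProcessExitedException only reports that a worker process died. Below is a minimal sketch of that failure mode using a toy registry. It is not the code in hat/registry.py; ToyRegistry, TOY_REGISTRY, build, Backbone and MissingTransformer are all hypothetical names made up for illustration. One possible cause in the real setup, which the log alone cannot confirm, is that the installed hat package does not define LSSTransformer (for example, if its version does not match bev_release_package-1.6.16).

```python
# Toy illustration only: NOT the real hat.registry implementation.
from typing import Any, Dict, Optional


class ToyRegistry:
    """Maps string names to classes, as config-driven frameworks often do."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._classes: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Used as a class decorator; runs when the defining module is imported.
        self._classes[cls.__name__] = cls
        return cls

    def get(self, key: str) -> Optional[type]:
        return self._classes.get(key)


TOY_REGISTRY = ToyRegistry("TOY_OBJECT_REGISTRY")


def build(cfg: Any) -> Any:
    """Recursively turn dicts that carry a "type" key into objects."""
    if isinstance(cfg, dict):
        built = {key: build(value) for key, value in cfg.items()}
        if "type" not in built:
            return built
        obj_type = built.pop("type")
        if isinstance(obj_type, str):
            cls = TOY_REGISTRY.get(obj_type)
            if cls is None:
                # Same situation as in the log: a string name with no class behind it.
                raise TypeError(
                    f"{obj_type} has not registered in {TOY_REGISTRY.name}"
                )
            obj_type = cls
        return obj_type(**built)
    if isinstance(cfg, (list, tuple)):
        return type(cfg)(build(value) for value in cfg)
    return cfg


@TOY_REGISTRY.register
class Backbone:
    def __init__(self, depth: int) -> None:
        self.depth = depth


# A registered name resolves to a class and is instantiated.
ok = build({"type": "Backbone", "depth": 18})

# A name that was never registered fails, however deeply it is nested.
bad_config = {"model": {"view_transformer": {"type": "MissingTransformer"}}}
try:
    build(bad_config)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
```

In this toy version a class becomes visible to the registry only once the module that defines it has been imported, so a string name in a config fails when the class is absent from the installed package or its module is never imported.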