OE-v1.1.68 dataset processing: ModuleNotFoundError: No module named 'hat'

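1. Chip model: J5, etc.

2. OpenExplorer (天工开物) development kit version: J5_OE_1.1.68, etc.

3. Problem area: dataset packing

4. Detailed description of the problem:

Steps to reproduce: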

nvidia-docker run -it --shm-size="15g" -v `pwd`:/open_explorer openexplorer/ai_toolchain_ubuntu_20_j5_gpu:v1.1.68

cd ddk/samples/ai_toolchain/horizon_model_train_sample/scripts/

python3 tools/datasets/nuscenes_packer.py --src-data-dir /data/horizon_j5/data/nuscenes/ --pack-type lmdb --target-data-dir /data/horizon_j5/data/tmp_data/nuscenes/v1.0-mini --version v1.0-mini --split-name train

Traceback (most recent call last):
  File "tools/datasets/nuscenes_packer.py", line 5, in <module>
    from hat.data.datasets.nuscenes_dataset import NuscenesPacker
ModuleNotFoundError: No module named 'hat'

Hello. Judging from the error, the hat package is missing. I suggest testing the docker environment as follows:
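A minimal sketch of one such check, assuming the goal is simply to see whether the interpreter inside the container can locate `hat` and `torch`. It uses only the standard library; the helper name `describe_module` is made up for illustration, and this is not necessarily the check the reply originally showed.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Return where `name` would be imported from, or a note that it is missing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    # Namespace packages have no single origin file, so fall back to their search locations.
    origin = spec.origin or list(spec.submodule_search_locations or [])
    return f"{name}: found at {origin}"


if __name__ == "__main__":
    # The interpreter path shows which Python environment is actually in use.
    print("interpreter:", sys.executable)
    for module_name in ("hat", "torch"):
        print(describe_module(module_name))
```

If `hat` is reported as not found while `torch` is found, the package is probably not installed in this interpreter's environment, rather than hidden by a missing environment variable.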

The environment check results are shown above. Do I still need to set some environment variable?

Try starting the container directly with the run_docker.sh script.

Once it is running, check whether torch and CUDA work properly as follows:

Script contents:

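import torch
import time

# Report the installed torch build and whether it can see a CUDA device.
print(torch.__version__)
print(torch.cuda.is_available())

# Baseline: matrix multiply on the CPU.
a = torch.randn(10000, 1000)
b = torch.randn(1000, 2000)
t0 = time.time()
c = torch.matmul(a, b)
t1 = time.time()
print(a.device, t1 - t0, c.norm(2))

# Move the operands to the GPU; the first matmul also pays the CUDA start-up cost.
device = torch.device('cuda')
a = a.to(device)
b = b.to(device)
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))

# Second GPU run. CUDA kernels launch asynchronously, so these timings are approximate.
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))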

I tried that as well. Can the hat whl package be installed offline? I could not find the path to the installation package.
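One way to look for a candidate wheel is to search the mounted package directory for `.whl` files whose name contains "hat". The sketch below assumes the OE package is mounted at `/open_explorer` (as in the docker command above); whether a hat wheel actually ships there is an assumption, and `find_wheels` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_wheels(root: str, keyword: str) -> List[Path]:
    """Recursively list .whl files under `root` whose file name contains `keyword`."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*.whl") if keyword.lower() in p.name.lower())


if __name__ == "__main__":
    # "/open_explorer" is the mount point from the docker command;
    # finding a hat wheel under it is an assumption, not a guarantee.
    for wheel in find_wheels("/open_explorer", "hat"):
        print(f"candidate: {wheel}")
        print(f"  install with: python3 -m pip install --no-index {wheel}")
```

With `--no-index`, pip does not contact a package index, so any dependencies of the wheel must already be installed or be supplied locally (for example through `--find-links`). The match is a plain substring test, so unrelated wheels whose names happen to contain "hat" would also be listed.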

root@61969cbf279b:/open_explorer# python3 1.py

1.13.0+cu116

Segmentation fault (core dumped)
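To narrow down which call crashes, each step can be run in a separate child interpreter with the standard-library `faulthandler` enabled, so that a crash in one step does not take down the whole check and a Python-level traceback is dumped on a fatal signal. This is a sketch under the assumption that the crash happens during the import or the CUDA query; `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str) -> Tuple[int, str, str]:
    """Run `code` in a fresh interpreter with faulthandler enabled."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    steps = [
        ("import torch", "import torch; print(torch.__version__)"),
        ("cuda available", "import torch; print(torch.cuda.is_available())"),
    ]
    for label, code in steps:
        rc, out, err = run_isolated(code)
        # On POSIX a negative return code means the child was killed by a signal
        # (for example -11 for SIGSEGV on Linux).
        print(f"{label}: returncode={rc} stdout={out.strip()!r}")
        if rc != 0:
            print(err.strip())
            break
```

The same traceback dump can be obtained for the original script by running `python3 -X faulthandler 1.py`.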

It looks like the torch CUDA environment also has a problem.

In that case, you need to check whether the driver version meets the requirements:
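A rough sketch of how the installed driver version could be read and compared against a required minimum. It assumes `nvidia-smi` is on the PATH inside the container. The `REQUIRED` value is a placeholder, not the actual requirement, and should be replaced with the minimum version given in the OE documentation; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '525.60.13' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    # One line is printed per GPU; the driver version is the same for all of them.
    return proc.stdout.strip().splitlines()[0].strip()


if __name__ == "__main__":
    REQUIRED = "0.0.0"  # placeholder: fill in the minimum from the OE documentation
    installed = query_driver_version()
    if installed is None:
        print("nvidia-smi not available or returned no driver version")
    else:
        try:
            ok = parse_version(installed) >= parse_version(REQUIRED)
            print(f"driver {installed}, required >= {REQUIRED}: {'OK' if ok else 'too old'}")
        except ValueError:
            print(f"could not parse driver version: {installed!r}")
```

If `nvidia-smi` itself is missing or fails inside the container, that already suggests the GPU was not passed through to the container correctly.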

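One more question: does the Horizon OE package include a pretrained BEV LSS model, or do we have to train it from scratch ourselves?

The model is provided; try it out and you will see. For reference: https://developer.horizon.cc/forumDetail/143772473308124163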