1. Chip model: J5, etc.
2. OpenExplorer (天工开物) development kit version: J5_OE_1.1.68, etc.
3. Problem area: dataset packing
4. Detailed description:
Steps to reproduce:
nvidia-docker run -it --shm-size="15g" -v `pwd`:/open_explorer openexplorer/ai_toolchain_ubuntu_20_j5_gpu:v1.1.68
cd ddk/samples/ai_toolchain/horizon_model_train_sample/scripts/
python3 tools/datasets/nuscenes_packer.py --src-data-dir /data/horizon_j5/data/nuscenes/ --pack-type lmdb --target-data-dir /data/horizon_j5/data/tmp_data/nuscenes/v1.0-mini --version v1.0-mini --split-name train
Traceback (most recent call last):
  File "tools/datasets/nuscenes_packer.py", line 5, in <module>
    from hat.data.datasets.nuscenes_dataset import NuscenesPacker
ModuleNotFoundError: No module named 'hat'
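The traceback says only that Python cannot locate a top-level package named hat. One quick way to confirm this from inside the container, without running the packer, is to ask the import system where (or whether) it would find the module. This is a minimal sketch using only the standard library. The helper name locate_module is made up for illustration, and the sketch assumes nothing about where hat is supposed to be installed.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where Python would import `name` from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file; fall back to their search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    # Show which interpreter is in use, since packages are installed per interpreter.
    print("interpreter:", sys.executable)
    for name in ("hat", "torch"):
        print(name, "->", locate_module(name) or "NOT FOUND")
```

If hat is reported as NOT FOUND while the interpreter path is the one expected inside the container, the package is most likely not installed in that Python environment.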
颜值即正义
2
Hello. Judging from the error, the hat package is missing. I suggest checking the Docker environment as follows:

颜值即正义
5
Try starting the container directly with the run_docker.sh script.
颜值即正义
6
Once the container is running, check whether torch and CUDA work correctly as follows:

Script contents:
import torch
import time

print(torch.__version__)
print(torch.cuda.is_available())

# Matrix multiply on the CPU as a baseline
a = torch.randn(10000, 1000)
b = torch.randn(1000, 2000)
t0 = time.time()
c = torch.matmul(a, b)
t1 = time.time()
print(a.device, t1 - t0, c.norm(2))

# Move the tensors to the GPU; the first matmul includes one-time CUDA start-up cost
device = torch.device('cuda')
a = a.to(device)
b = b.to(device)
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))

# Repeat the GPU matmul after warm-up
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))
I tried that as well. Can the hat whl package be installed offline? I could not find the path to the installation package.
root@61969cbf279b:/open_explorer# python3 1.py
1.13.0+cu116
Segmentation fault (core dumped)
It looks like the torch CUDA environment is broken as well.
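Because the crash is a segmentation fault rather than a Python exception, a try/except around the CUDA call will not catch it. One way to narrow down which step crashes is to run each step in a child interpreter and inspect its exit status. The following is a minimal sketch under that assumption. The names run_probe, describe, and PROBES are illustrative and are not part of the OE package.

```python
import subprocess
import sys


def run_probe(code, timeout=120):
    """Run `code` in a fresh Python process and return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


def describe(returncode):
    """Turn a child exit status into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


# Each step is probed separately so the first failing one is visible.
PROBES = [
    ("import torch", "import torch; print(torch.__version__)"),
    ("cuda available", "import torch; print(torch.cuda.is_available())"),
]

if __name__ == "__main__":
    for label, code in PROBES:
        rc, out, err = run_probe(code)
        print(f"{label}: {describe(rc)} {out}")
```

On Linux, a segmentation fault in the child typically shows up here as "killed by signal 11" (SIGSEGV), while the parent script keeps running and can report which probe failed.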
One more question: does the Horizon OE package include a pre-trained BEV LSS model, or do we have to train it from scratch ourselves?