Running the official DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm with xlm_demo fails with an out-of-memory error. The output is as follows:

./xlm_demo --hbm_path /DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm --tokenizer_dir ../config/DeepSeek_R1_Distill_Qwen_1.5B_config/ --model_type 3
[UCP]: log level = 3
[UCP]: UCP version = 3.7.3
[VP]: log level = 3
[DNN]: log level = 3
[HPL]: log level = 3
[UCPT]: log level = 6
hbm_path: /DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm
tokenizer_dir: ../config/DeepSeek_R1_Distill_Qwen_1.5B_config/
model_type: 3
[BPU][[BPU_MONITOR]][281472903848064][INFO]BPULib verison(2, 1, 2)[0d3f195]!
[DNN] HBTL_EXT_DNN log level:6
[DNN]: 3.7.3_(4.2.11 HBRT)
[E][11155][10-27][13:11:07:199][configuration.cpp:208][xlm_demo][DNN] [HBRT] 4.2.11: [05h:11m:07s:199344900ns ERROR hbrt4_mem::unified] pid:11155 tid:11155 hbrt4_mem/src/unified.rs:138: Cannot malloc bpu memory with length 2234994104 bytes
[E][11155][10-27][13:11:07:199][packed_model.cpp:220][xlm_demo][DNN] [Model] Load hbm failed! error: HBRT4_STATUS_BAD_DATA
[E][11155][10-27][13:11:07:199][packed_model.cpp:130][xlm_demo][DNN] [Model] Load model failed, model file:/DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm
[E][11155][10-27][13:11:07:200][model_manager.cc:129][xlm_demo][mod_mgr] hbDNNInitializeFromFiles failed!
[E][11155][10-27][13:11:07:200][model_manager.cc:57][xlm_demo][mod_mgr] Load hbm pack failed!
[E][11155][10-27][13:11:07:200][deepseek_model.cc:59][xlm_demo][DeepSeekModel] DeepSeek Model manager init failed
[E][11155][10-27][13:11:07:200][xlm_impl.cc:39][xlm_demo][ret_check] Failed to init LmModel, error code: -1
xlm init failed
[E][11155][10-27][13:11:07:200][xlm.cc:56][xlm_demo][ret_check] xlm init failed, error code: 1
[E][11155][10-27][13:11:08:812][hb_dnn.cpp:104][xlm_demo][DNN] [Model] packed dnn handle is invalid
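
For a sense of scale, the failing allocation in the `hbrt4_mem` line can be converted to MiB and GiB. The snippet below is only a minimal sketch: the helper name `requested_bpu_bytes` is made up for illustration, and it assumes the message wording matches the log above. It shows that the runtime asked for about 2.08 GiB of BPU memory in a single request, which the BPU-accessible memory pool has to be able to satisfy.

```python
import re

# Pattern for the HBRT allocation-failure message as it appears in the log above.
MALLOC_RE = re.compile(r"Cannot malloc bpu memory with length (\d+) bytes")


def requested_bpu_bytes(log_text):
    """Return the byte count from the first allocation-failure line, or None."""
    match = MALLOC_RE.search(log_text)
    return int(match.group(1)) if match else None


# Tail of the failing log line, copied from the output above.
log_line = (
    "hbrt4_mem/src/unified.rs:138: "
    "Cannot malloc bpu memory with length 2234994104 bytes"
)

size = requested_bpu_bytes(log_line)
print(f"{size} bytes = {size / 2**20:.1f} MiB = {size / 2**30:.2f} GiB")
```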

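Hello, you need to change the ION memory setting to "BPU first". See the post "RDK S Series Device Tree Memory Allocation Adjustment Guide" (RDK S 系列设备树内存分配调整指南) in the Community News / Announcements section of the D-Robotics forum (地瓜机器人论坛).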

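Ning Hao, changing the memory allocation fixed the first problem, but now I get a new error saying the NPU driver cannot be found. I tried to insmod it manually and that failed as well. The log is as follows:
./config/DeepSeek_R1_Distill_Qwen_1.5B_config/DeepSeek_R1_Distill_Qwen_1.5B.jinja --model_type 1 --bpu_core 0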
[UCP]: log level = 3
[UCP]: UCP version = 3.7.3
[VP]: log level = 3
[DNN]: log level = 3
[HPL]: log level = 3
[UCPT]: log level = 6
[E][69757][10-30][22:18:11:711][task_scheduler.cpp:330][oellm_run][UCP] Init memory module failed, return -16777207
[E][69757][10-30][22:18:11:715][configuration.cpp:109][oellm_run][DNN] [Util] BPU drive error! Please check BPU drive info with cmd: lsmod
[E][69757][10-30][22:18:11:716][hb_dnn.cpp:51][oellm_run][DNN] [Model] Configuration init failed, please check!
[E][69757][10-30][22:18:11:716][model_manager.cc:187][oellm_run][mod_mgr] hbDNNInitializeFromFiles failed!
[E][69757][10-30][22:18:11:716][model_manager.cc:95][oellm_run][mod_mgr] Load hbm pack failed!
[E][69757][10-30][22:18:11:716][deepseek_model.cc:57][oellm_run][DeepSeekModel] DeepSeek Model manager init failed
[E][69757][10-30][22:18:11:716][xlm_impl.cc:92][oellm_run][XlmImpl] Failed to init LmModel
[E][69757][10-30][22:18:11:716][xlm_impl.cc:41][oellm_run][XlmImpl] Failed to XlmInitInner
xlm init failed
[E][69757][10-30][22:18:11:716][xlm.cc:60][oellm_run][ret_check] xlm init failed, error code: -1
[BPU][[HB_BPU]:281473661417024:1223][ERR]Invalid bpu core
ret -16777208
[E][69757][10-30][22:18:11:717][task_scheduler.cpp:346][oellm_run][UCP] Deinit memory module failed, return -16777208
The error from my manual insmod attempt is as follows:
sudo insmod /lib64/modules/$(uname -r)/hobot-drivers/bpu/bpu_cores.ko
insmod: ERROR: could not insert module /lib64/modules/6.1.112-rt43-DR-4.0.3-2508182243-g2e3829-ge68b6e/hobot-drivers/bpu/bpu_cores.ko: Unknown symbol in module

Hello, judging from the log, the first error is still a memory failure: BPU memory allocation failed.
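
When a log contains a cascade of errors like this, the earliest `[E]` line is usually the one to look at. The following is a rough sketch of how to pull it out; the function name `first_error` is hypothetical, and the regular expression only assumes the `[E][pid][date][time][file:line][process][module] message` layout seen in the logs in this thread.

```python
import re

# Assumed layout: [E][pid][MM-DD][HH:MM:SS:mmm][file:line][process][module] message
ERROR_RE = re.compile(
    r"^\[E\]\[\d+\]\[[\d-]+\]\[[\d:]+\]"   # level, pid, date, time
    r"\[([^\]]+)\]"                         # source file and line (captured)
    r"\[[^\]]+\]\[[^\]]+\]\s*(.*)$"         # process, module, message (captured)
)


def first_error(log_text):
    """Return (source_location, message) of the first [E] line, or None."""
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


# Three consecutive lines copied from the second log in this thread.
sample = """\
[UCPT]: log level = 6
[E][69757][10-30][22:18:11:711][task_scheduler.cpp:330][oellm_run][UCP] Init memory module failed, return -16777207
[E][69757][10-30][22:18:11:715][configuration.cpp:109][oellm_run][DNN] [Util] BPU drive error! Please check BPU drive info with cmd: lsmod
"""

location, message = first_error(sample)
print(f"first error at {location}: {message}")
```

On the sample above this reports the `Init memory module failed` line from `task_scheduler.cpp:330`, which comes before the "BPU drive error" message.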