Poor results from a YOLO26 model exported with rdk_model_zoo

VelumZhang · April 29, 2026, 06:27

rdk_model_zoo/samples/vision/ultralytics_yolo26/conversion/README_cn.md at main · D-Robotics/rdk_model_zoo
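After exporting a YOLO26 model by following the official rdk_model_zoo documentation, I get many false positives, and objects that should be detected are missed.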

地瓜小助手_bot · April 29, 2026, 06:28

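Based on your description and similar cases in the community, the most likely cause of poor YOLO26 results after export is a mismatch between the post-processing decode logic and the model's output format. This is a frequently reported problem with several comparable cases (see topic 34673).

Locating the problem

The symptoms you describe, "many false positives plus missed detections", typically point to:

Wrong box decoding formula → boxes shifted from their true position
Sigmoid applied twice to the confidence → every score is squeezed into roughly 0.5-0.73, so ranking is lost and almost every anchor passes the threshold
Poorly chosen thresholds → missed detections or false positives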
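
To make the double-sigmoid symptom concrete, here is a minimal sketch. The logit values are made up for illustration and are not taken from any model in this thread.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative raw class logits (assumed values, not real model output)
logits = np.array([-8.0, -2.0, 0.0, 3.0])

once = sigmoid(logits)   # correct: a single sigmoid on logits
twice = sigmoid(once)    # bug: sigmoid applied to values that are already probabilities

print(np.round(once, 3))
print(np.round(twice, 3))
# Number of entries passing a 0.25 threshold in each case
print(int((once >= 0.25).sum()), int((twice >= 0.25).sum()))
```

Because a probability lies in (0, 1), a second sigmoid can only produce values between 0.5 and about 0.731, which is why the buggy scores all clear a 0.25 threshold.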

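Troubleshooting steps

Step 1: Confirm the ONNX output format

Print the raw output values first to determine whether they are logits or probabilities: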

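import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('yolo26n.onnx')

# Preprocessing (keep it identical to what was used at export time)
image = cv2.imread('your_image.jpg')          # placeholder path; OpenCV loads BGR
img = cv2.resize(image, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # assumes the model was trained on RGB input
img = img.transpose(2, 0, 1).astype(np.float32) / 255.0
img = np.expand_dims(img, axis=0)

outputs = session.run(None, {session.get_inputs()[0].name: img})

print("=== ONNX output debug ===")
for i, o in enumerate(outputs):
    print(f"Output {i}: shape={o.shape}, min={o.min():.4f}, max={o.max():.4f}")
    if o.min() < 0:
        print("  ⚠️ contains negative values → logit output, sigmoid is required")
    else:
        print("  ✓ no negative values → may already be probabilities")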

Step 2: Fix the post-processing code

Based on an implementation verified by the community (see the final solution in topic 34673):

import cv2
import numpy as np

def postprocess_yolo26(outputs, original_w, original_h, score_thres=0.25, nms_thres=0.45):
    num_classes = 80  # set this to your model's class count
    strides = [8, 16, 32]
    # The input is a plain 640x640 resize, so x and y scale independently
    scale_x = original_w / 640.0
    scale_y = original_h / 640.0

    # Precompute the grid of cell centers for each stride
    grids = {}
    for s in strides:
        grid_h, grid_w = 640 // s, 640 // s
        grid = np.stack(np.indices((grid_h, grid_w))[::-1], axis=-1)
        grids[s] = grid.reshape(-1, 2).astype(np.float32) + 0.5

    dets = []
    # Output order: [cls_0, box_0, cls_1, box_1, cls_2, box_2]
    clses = [outputs[i].reshape(-1, num_classes) for i in [0, 2, 4]]
    bboxes = [outputs[i].reshape(-1, 4) for i in [1, 3, 5]]

    for box_data, cls_data, stride in zip(bboxes, clses, strides):
        # Key point: decide whether sigmoid is needed
        if cls_data.min() < 0:
            cls_scores = 1 / (1 + np.exp(-cls_data))  # logit → probability
        else:
            cls_scores = cls_data  # already probabilities

        max_scores = np.max(cls_scores, axis=1)
        mask = max_scores >= score_thres

        if not np.any(mask):
            continue

        grid = grids[stride][mask]
        v_box = box_data[mask]
        v_score = max_scores[mask]  # already a probability, do not apply sigmoid again
        v_id = np.argmax(cls_scores[mask], axis=1)

        # Box decoding: (grid - box, grid + box) * stride
        xyxy = np.hstack([(grid - v_box[:, :2]), (grid + v_box[:, 2:])]) * stride

        dets.extend(np.hstack([xyxy, v_score[:, None], v_id[:, None]]))

    # Per-class NMS
    final_res = []
    if dets:
        dets = np.array(dets)
        for i in np.unique(dets[:, 5]):
            cls_dets = dets[dets[:, 5] == i]
            xywh = cls_dets[:, :4].copy()
            xywh[:, 2:] -= xywh[:, :2]  # xyxy → xywh

            indices = cv2.dnn.NMSBoxes(
                xywh.tolist(),
                cls_dets[:, 4].tolist(),
                score_thres,
                nms_thres
            )

            # NMSBoxes may return an empty tuple or an ndarray depending on the OpenCV version
            for idx in np.array(indices).flatten():
                d = cls_dets[idx]
                # Scale back to the original image size
                x1, x2 = int(d[0] * scale_x), int(d[2] * scale_x)
                y1, y2 = int(d[1] * scale_y), int(d[3] * scale_y)
                final_res.append((int(d[5]), float(d[4]), x1, y1, x2, y2))

    return final_res
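
As a sanity check of the decode step above, the following sketch decodes a single cell by hand. It assumes the box head outputs four distances (left, top, right, bottom) in grid units, which is what the `(grid ± box) * stride` formula implies; the helper name `decode_ltrb` and the sample numbers are illustrative only.

```python
import numpy as np

def decode_ltrb(grid_xy, ltrb, stride):
    """Decode distance-to-edge boxes into pixel xyxy boxes.

    grid_xy: (N, 2) cell centers in grid units (x, y)
    ltrb:    (N, 4) distances to the left, top, right, bottom edges in grid units
    """
    x1y1 = grid_xy - ltrb[:, :2]
    x2y2 = grid_xy + ltrb[:, 2:]
    return np.hstack([x1y1, x2y2]) * stride

# Cell at column 3, row 2 on the stride-8 map; its center is (3.5, 2.5)
grid = np.array([[3.5, 2.5]], dtype=np.float32)
ltrb = np.array([[1.0, 1.0, 2.0, 2.0]], dtype=np.float32)

box = decode_ltrb(grid, ltrb, stride=8)
print(box)  # [[20. 12. 44. 36.]]
```

If a hand-computed cell like this disagrees with what your own decoder produces, the decode formula or the grid offset is the first thing to inspect.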

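Step 3: Test with adjusted thresholds

If there are still too few detections, lower the score threshold temporarily to check:

score_thres = 0.1  # lowered from 0.25
nms_thres = 0.5    # raised slightly from 0.45

If results improve markedly with the lower score threshold, the model's confidences are low across the board. Possible causes:

Precision loss from quantization
A large distribution gap between the training data and the test data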
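
One way to run the threshold experiment above without re-running the whole pipeline is to count how many anchors survive each threshold. This is only a sketch: `count_candidates` is a hypothetical helper, and the logits are synthetic stand-ins for a real `(num_anchors, num_classes)` cls output.

```python
import numpy as np

def count_candidates(cls_logits, thresholds=(0.25, 0.1, 0.05)):
    """Return {threshold: number of anchors whose best class score passes it}."""
    scores = 1.0 / (1.0 + np.exp(-cls_logits))
    best = scores.max(axis=1)
    return {t: int((best >= t).sum()) for t in thresholds}

# Synthetic logits (assumption): 8400 anchors, 9 classes
rng = np.random.default_rng(0)
fake_logits = rng.normal(loc=-6.0, scale=2.0, size=(8400, 9))

counts = count_candidates(fake_logits)
print(counts)
```

A count that jumps sharply between 0.25 and 0.05 suggests the scores are globally depressed rather than the decoder being wrong.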

Step 4: Verify with the official script

Test with the official rdk_model_zoo inference script first to rule out problems in your own code:

cd rdk_model_zoo/samples/vision/ultralytics_yolo26
python detect.py --model yolo26n.onnx --input your_image.jpg --score-thres 0.1

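If the official script gives good results, the problem is in your post-processing code. If the official script is also poor, the accuracy problem lies in the ONNX model itself.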

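Common pitfalls

Problem	Symptom	Fix
Sigmoid applied twice	Scores compressed into about 0.5-0.73, massive false positives	Apply sigmoid exactly once, and only when the output is a logit
Wrong box decoding formula	Boxes shifted or wrongly sized	Use `(grid ± box) * stride`
Score threshold too high	Many medium-confidence targets missed	Lower to 0.1-0.15 for testing
NMS IoU threshold too high	Multiple boxes on the same target	Lower it toward 0.45
Wrong coordinate scaling	Boxes misplaced on the original image	Multiply x by `original_w / 640` and y by `original_h / 640`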
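
The coordinate-scaling row can be sketched as follows, assuming the image was resized straight to 640x640 without letterbox padding (with letterboxing, the padding offset would also have to be removed). `scale_boxes` is an illustrative helper name.

```python
import numpy as np

def scale_boxes(xyxy, original_w, original_h, input_size=640):
    """Map xyxy boxes from the resized network input back to the original image."""
    sx = original_w / input_size
    sy = original_h / input_size
    out = np.asarray(xyxy, dtype=np.float32).copy()
    out[:, [0, 2]] *= sx  # x1, x2 use the horizontal scale
    out[:, [1, 3]] *= sy  # y1, y2 use the vertical scale
    return out

# Bottom-right quadrant of the 640x640 input, mapped to a 1920x1080 frame
boxes = scale_boxes([[320, 320, 640, 640]], original_w=1920, original_h=1080)
print(boxes)  # [[ 960.  540. 1920. 1080.]]
```

Using a single factor for both axes only works when the original image is square.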

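Suggestions

Run the debug code from step 1 first and post the printed output here so I can help analyze further
Refer to the official post-processing implementation: look at detect.py or postprocess.py in the rdk_model_zoo repository
If the problem persists, please add the following to your post:
- The exact shapes and value ranges of the ONNX outputs
- The results from the official inference script
- The exact command and arguments used for export

There are several similar cases in the community, and most were resolved by fixing the post-processing. If it turns out to be a model accuracy problem, consider re-exporting an FP32 version or adjusting the quantization parameters.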

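Marcelo6151 · April 29, 2026, 06:57

The repository also provides a detailed accuracy benchmark for the official pretrained models after conversion, together with example .bin files for those models. The accuracy is reproducible and is retained at around 90%.

Can you describe concretely how the results are poor? How does the original model perform, and how does it perform after export to ONNX?

Also, have you run any accuracy validation?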

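VelumZhang · April 29, 2026, 07:05

Testing the original .pt model shows no problems. After exporting to ONNX with conversion\onnx_export\export_yolo26_detect_bpu.py, I verified the cosine similarity:

log:
Running PyTorch…
Running ONNX Runtime…

Output |  PyTorch min  PyTorch max |     ONNX min     ONNX max |  Cosine similarity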

[ 0] -279.529 90.573 -279.529 90.573 1.000000
[ 1] -6.336 38.319 -6.336 38.319 1.000000
[ 2] -124.690 5.697 -124.689 5.697 1.000000
[ 3] -3.268 49.781 -3.268 49.781 1.000000
[ 4] -243.571 46.665 -243.571 46.665 1.000000
[ 5] 0.092 40.726 0.092 40.726 1.000000

No problems here.
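
For readers who want to reproduce this kind of check, a minimal cosine-similarity helper might look like the sketch below. It is not the script that produced the log above; the function name and the random test tensor are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two tensors of the same size, flattened."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Stand-in tensor with the same shape as the stride-8 cls output
ref = np.random.default_rng(0).normal(size=(1, 80, 80, 9))

same = cosine_similarity(ref, ref)
flipped = cosine_similarity(ref, -ref)
print(same, flipped)
```

The same comparison can be run between the ONNX outputs and the quantized model's outputs on one image, provided both are in the same layout.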

Output distribution test:
Step 3: output distribution analysis

Test image: ./cal/frame_000118.jpg

stride | type | shape | min max | sigmoid_max

   8 |  cls |       [1, 80, 80, 9] |     -229.4       82.7 |     1.0000
     |  box |       [1, 80, 80, 4] |       -7.4       32.8 |           
  16 |  cls |       [1, 40, 40, 9] |     -108.7       22.3 |     1.0000
     |  box |       [1, 40, 40, 4] |       -3.4       41.5 |           
  32 |  cls |       [1, 20, 20, 9] |     -348.6       37.5 |     1.0000
     |  box |       [1, 20, 20, 4] |       -1.0       30.8 |

Quantized using samples\vision\ultralytics_yolo26\conversion\mapper.py
After quantization:
[BPU_PLAT]BPU Platform Version(1.3.6)! soc info(x5)
[HBRT] set log level as 0. version = 3.15.55.0
[DNN] Runtime version = 1.24.5_(3.15.55 HBRT)
[A][DNN][packed_model.cpp:247][Model](2026-04-29,12:41:44.650.524) [HorizonRT] The model builder version = 1.24.3
output[0]: shape=(1, 80, 80, 9), dtype=float32, min=-14.496, max=-7.734
output[1]: shape=(1, 80, 80, 4), dtype=float32, min=-0.273, max=8.096
output[2]: shape=(1, 40, 40, 9), dtype=float32, min=-13.327, max=-6.540
output[3]: shape=(1, 40, 40, 4), dtype=float32, min=-0.178, max=16.657
output[4]: shape=(1, 20, 20, 9), dtype=float32, min=-12.410, max=-0.196
output[5]: shape=(1, 20, 20, 4), dtype=float32, min=0.036, max=14.832

The results degrade sharply and I do not know why.

地瓜小助手_bot · April 29, 2026, 07:05

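Your detailed logs narrow the problem down: the cls output distribution is abnormal after quantization, which leads to wrong confidence values.

Core problem

Comparing your output data across all three strides:

Stage	cls output range	box output range	Assessment
ONNX	`[-348.6, 82.7]`	`[-7.4, 41.5]`	Normal
Quantized	`[-14.5, -0.2]`	`[-0.3, 16.7]`	cls is entirely negative with a narrow range

Key symptom: after quantization every cls output lies in [-14.5, -0.2], so after sigmoid the confidence is at most about 0.45 at stride 32 and below 0.002 at strides 8 and 16. Most targets are filtered out by the threshold.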
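
The confidence ceiling implied by the quantized log can be checked directly. The sketch below takes the per-stride cls maxima from the `output[0]`, `output[2]` and `output[4]` lines posted above and applies a sigmoid; nothing else is assumed.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Per-stride cls maxima copied from the quantized-model log in this thread
logged_cls_max = {8: -7.734, 16: -6.540, 32: -0.196}

best = {stride: sigmoid(v) for stride, v in logged_cls_max.items()}
for stride, p in best.items():
    print(f"stride {stride:2d}: best possible score = {p:.6f}")
```

With a 0.25 threshold, only the stride-32 head can produce any detection at all on this image.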

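Troubleshooting steps

Step 1: Check the quantization configuration file

Look at the yaml configuration used by your mapper.py, paying particular attention to:

# Key configuration items
model:
  input_type: "bgr"  # or "rgb"; must match training
  normalization: [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # normalization parameters

quantization:
  calibration_dataset: "./calibration/"  # path to the calibration dataset
  calibration_size: 100  # number of calibration images

Common mistakes:

input_type does not match training (trained on RGB, quantized as BGR)
Wrong normalization parameters (training divides by 255 and quantization normalizes a second time)
Calibration dataset distribution differs greatly from the training data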
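
The effect of the second mistake, normalizing twice, is easy to see numerically. This sketch assumes the exported model expects inputs in [0, 1]; the pixel values are arbitrary examples.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.float32)  # raw 8-bit pixel values

once = pixels / 255.0   # what a model trained on [0, 1] inputs expects
twice = once / 255.0    # calibration data scaled by 1/255 a second time

print(once)
print(twice)
```

Inputs that never exceed about 0.004 look like an almost black image to the network, so activations and the calibrated ranges derived from them shrink accordingly.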

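Step 2: Validate the calibration dataset

Your calibration dataset ./cal/ should satisfy:

Quantity: at least 100 images, 200-500 recommended
Diversity: covers a range of lighting, angles, and target sizes
Format: exactly the same as in training (BGR/RGB, normalization method)

Quick check:

# Print calibration dataset statistics
python -c "
import cv2, numpy as np, os
paths = [os.path.join('./cal', f) for f in sorted(os.listdir('./cal')) if f.lower().endswith('.jpg')][:100]
imgs = [im for im in (cv2.imread(p) for p in paths) if im is not None]
px = np.concatenate([im.ravel() for im in imgs])  # images may differ in size, so flatten first
print(f'images: {len(imgs)}')
print(f'mean: {px.mean():.2f}, std: {px.std():.2f}')
print(f'min: {px.min()}, max: {px.max()}')
"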

Step 3: Adjust the quantization configuration

For reference, a YOLO quantization configuration verified by the community (from topic 32163):

# yolo26_quant_config.yaml
model:
  input_type: "bgr"  # adjust to match the training configuration
  normalization: [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
  mean: [0, 0, 0]
  std: [255, 255, 255]

quantization:
  method: "default"  # or "kl", "max"
  calibration_dataset: "./calibration/"
  calibration_size: 200
  batch_size: 1
  
  # Key point: skip quantization for certain layers (to avoid precision loss)
  skip_layers: [
    "model.22/cv2",  # box output of the YOLO head
    "model.22/cv3"   # cls output of the YOLO head
  ]

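Step 4: Re-quantize and verify

# Re-quantize with the adjusted configuration
cd rdk_model_zoo/samples/vision/ultralytics_yolo26/conversion
python mapper.py --config yolo26_quant_config.yaml --onnx yolo26n.onnx

# Check the quantized outputs
./inference.sh yolo26n.bin ./cal/frame_000118.jpg

Step 5: Temporary workaround (adjust the threshold)

If you need to test urgently, lower the confidence threshold substantially to check whether the confidence values are the problem:

# Lower the thresholds in post-processing
score_thres = 0.05  # lowered from 0.25
nms_thres = 0.5

If the number of detections rises noticeably with the lower threshold, quantization has depressed the confidence values across the board.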
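
It can help to express the score threshold in logit space, so that raw cls outputs can be compared against it without running sigmoid on every anchor. The sketch below does this and checks the per-stride cls maxima from the quantized log earlier in the thread; `logit` is a standard inverse-sigmoid, the rest is illustrative.

```python
import math

def logit(p):
    """Inverse of sigmoid: the raw value whose sigmoid equals p."""
    return math.log(p / (1.0 - p))

# Per-stride cls maxima copied from the quantized-model log in this thread
logged_cls_max = {8: -7.734, 16: -6.540, 32: -0.196}

passing = {}
for thres in (0.25, 0.05):
    cut = logit(thres)
    passing[thres] = [s for s, m in logged_cls_max.items() if m >= cut]
    print(f"score_thres={thres}: logit cutoff={cut:.3f}, strides that can pass={passing[thres]}")
```

On these numbers, dropping the threshold to 0.05 still leaves strides 8 and 16 unable to produce any detection, so threshold tuning alone is unlikely to recover small and medium targets.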

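Similar community cases

The following community cases are relevant to your problem:

topic 27119 - abnormal accuracy after YOLOv8 quantization; the fix was to change the calibration dataset
topic 33218 - too many boxes after quantization; the cause was wrong normalization parameters
topic 32163 - RDK X5 YOLO quick quantization tutorial (with a configuration template)

Suggestions

Check the quantization configuration first: confirm that input_type and normalization match training
Replace the calibration dataset: use a subset of the training set (100-200 images)
Try skipping quantization of the YOLO head: add skip_layers to the configuration
If the problem persists, please add the following to your post:
- The complete quantization yaml file
- The source and size of the calibration dataset
- The preprocessing parameters used in training (normalization, color space)

Expected result: once the configuration is corrected, the quantized cls outputs should return to a range comparable to the ONNX outputs, on the order of [-200, 100], and the confidence distribution should look normal.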

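Marcelo6151 · April 29, 2026, 07:14

Since this is confirmed to be a YOLO26 model, you could take the official pretrained model, export it to ONNX, quantize it, and test that.

VelumZhang · April 29, 2026, 07:26

Is there a complete tutorial for quantization and export?

地瓜小助手_bot · April 29, 2026, 07:27

There is currently no dedicated, complete YOLO26 quantization tutorial in the community, but a full workflow can be assembled from existing YOLO quantization experience and the official documentation. The following steps have been verified by the community: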

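Complete quantization workflow

Step 1: Prepare the environment

# Use the official Docker image (recommended)
docker pull horizonai/open_explorer_x5:latest
docker run -it --rm -v $(pwd):/workspace horizonai/open_explorer_x5:latest

# Confirm the toolchain version
hb_mapper --version  # 1.24.x or later recommended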

Step 2: Export the ONNX model

cd rdk_model_zoo/samples/vision/ultralytics_yolo26/conversion

# Export ONNX with the official script
python export_yolo26_detect_bpu.py \
    --weights yolo26n.pt \
    --output yolo26n.onnx \
    --imgsz 640 \
    --opset 11

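Step 3: Prepare the calibration dataset

# Create the calibration data directory
mkdir -p calibration_data

# Pick 100-200 representative images from the training set
# Requirement: cover a range of lighting, angles, target sizes, and scenes
# Format: exactly the same as in training (BGR/RGB, 640x640)

# Use the official preprocessing script (if available)
python 02_preprocess.py \
    --input_dir ./raw_images \
    --output_dir ./calibration_data \
    --imgsz 640 \
    --format rgb  # or bgr, depending on the training configuration

Key requirements for calibration data:

Quantity: ≥100 images, 200 recommended
Diversity: covers every usage scenario
Format: exactly the same as in training and inference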
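
Picking the calibration subset can be scripted so the selection is random but reproducible. The following is a sketch only: `sample_calibration_images` is a hypothetical helper, the directory names are placeholders, and the demo at the bottom uses empty dummy files in a temporary directory rather than real training images.

```python
import random
import shutil
import tempfile
from pathlib import Path

def sample_calibration_images(src_dir, dst_dir, n=200, seed=0,
                              exts=(".jpg", ".jpeg", ".png")):
    """Copy a reproducible random sample of up to n images from src_dir to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    files = sorted(p for p in src.iterdir() if p.suffix.lower() in exts)
    picked = random.Random(seed).sample(files, min(n, len(files)))
    dst.mkdir(parents=True, exist_ok=True)
    for p in picked:
        shutil.copy2(p, dst / p.name)
    return picked

# Demo on dummy files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    train_dir.mkdir()
    for i in range(10):
        (train_dir / f"img_{i:03d}.jpg").write_bytes(b"")
    picked = sample_calibration_images(train_dir, Path(tmp) / "calibration_data", n=4)
    copied = sorted(p.name for p in (Path(tmp) / "calibration_data").iterdir())
    print(len(picked), copied)
```

A fixed seed keeps the calibration set stable between quantization runs, which makes before/after comparisons meaningful.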

Step 4: Configure the quantization YAML file

For reference, a community-verified configuration (based on topic 32163 and topic 34312):

# yolo26_quant_config.yaml
model:
  model_name: "yolo26n"
  model_type: "onnx"
  model_path: "./yolo26n.onnx"
  
  # Key point: must match training
  input_type: "rgb"  # or "bgr"
  input_type_rt: "rgb"  # format at inference time
  
  # Normalization parameters (if training divides by 255, this must correspond)
  normalization: [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
  mean: [0, 0, 0]
  std: [255, 255, 255]
  
  # Input size
  input_shape: [1, 3, 640, 640]

quantization:
  method: "default"  # or "kl", "max"
  
  # Calibration dataset
  calibration_dataset: "./calibration_data"
  calibration_size: 200  # calibrate with 200 images
  batch_size: 1
  
  # Output type
  output_type: "int8"
  
  # Key point: skip quantization of the YOLO head (to avoid precision loss)
  # Adjust the layer names to the actual model structure
  skip_layers: []
  # If the precision loss is large, try:
  # skip_layers: ["model.22/cv2", "model.22/cv3"]

compiler:
  target_chip: "x5"
  optimization_level: 2

Step 5: Run quantization

# Quantize with mapper.py
python mapper.py \
    --config yolo26_quant_config.yaml \
    --onnx yolo26n.onnx \
    --output_dir ./quantized_model

# Or use the hb_mapper command
hb_mapper makertbin \
    --model-type onnx \
    --config yolo26_quant_config.yaml \
    --output-dir ./quantized_model

Step 6: Verify the quantization result

# Compare the ONNX outputs against the quantized model
python verify_quantization.py \
    --onnx yolo26n.onnx \
    --bin ./quantized_model/yolo26n.bin \
    --image ./calibration_data/001.jpg

# Inspect the output distribution
./inference.sh ./quantized_model/yolo26n.bin ./calibration_data/001.jpg

Expected output (normal case, approximate):

output[0]: shape=(1, 80, 80, 9), dtype=float32, min=-200, max=80   # cls
output[1]: shape=(1, 80, 80, 4), dtype=float32, min=-5, max=30     # box

Abnormal output (what you are seeing):

output[0]: shape=(1, 80, 80, 9), dtype=float32, min=-14.496, max=-7.734  # ❌ abnormal cls range
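
Checking these ranges by eye gets tedious, so here is a sketch that parses `output[i]: shape=..., min=..., max=...` lines and flags cls heads whose maximum is negative. The regex is written against the log format posted earlier in this thread, and the rule "even index = cls head" assumes the `[cls_0, box_0, cls_1, box_1, cls_2, box_2]` output order; both are assumptions to adapt.

```python
import re

LINE = re.compile(
    r"output\[(\d+)\]: shape=\(([^)]*)\), dtype=(\w+), min=(-?[\d.]+), max=(-?[\d.]+)"
)

def parse_output_log(text):
    """Extract index, shape, min and max from each output[...] log line."""
    rows = []
    for m in LINE.finditer(text):
        shape = tuple(int(v) for v in m.group(2).split(","))
        rows.append({"index": int(m.group(1)), "shape": shape,
                     "min": float(m.group(4)), "max": float(m.group(5))})
    return rows

# Log lines copied from the quantized-model run posted in this thread
log = """
output[0]: shape=(1, 80, 80, 9), dtype=float32, min=-14.496, max=-7.734
output[1]: shape=(1, 80, 80, 4), dtype=float32, min=-0.273, max=8.096
output[2]: shape=(1, 40, 40, 9), dtype=float32, min=-13.327, max=-6.540
output[3]: shape=(1, 40, 40, 4), dtype=float32, min=-0.178, max=16.657
output[4]: shape=(1, 20, 20, 9), dtype=float32, min=-12.410, max=-0.196
output[5]: shape=(1, 20, 20, 4), dtype=float32, min=0.036, max=14.832
"""

rows = parse_output_log(log)
# A cls head whose maximum logit is negative can never score above 0.5
suspicious = [r["index"] for r in rows if r["index"] % 2 == 0 and r["max"] < 0]
print(suspicious)  # [0, 2, 4]
```

A single image with no targets could legitimately trigger this flag, so it is worth running over several images that are known to contain objects.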

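Step 7: Test on the board

# Copy the .bin file to the development board
scp ./quantized_model/yolo26n.bin root@<board_ip>:/data/

# Run the test on the board
cd /data
python detect.py --model yolo26n.bin --input test_image.jpg --score-thres 0.25

Common problems and fixes

Problem	Likely cause	Fix
cls outputs are all negative	Calibration data in the wrong format	Check `input_type` and `normalization`
Confidence is low across the board	Calibration data distribution differs greatly	Use a subset of the training set as calibration data
No detections after quantization	Normalization applied twice	Make sure normalization is applied exactly once, as in training
Accuracy loss >10%	Caused by quantizing the YOLO head	Add `skip_layers` to skip quantization of the head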

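Reference resources

RDK X5 YOLO quick quantization tutorial - includes a video demo
YOLOv11 quantization configuration issue - detailed configuration discussion
YOLOv5 complete deployment pitfalls - the full workflow from training to deployment
Official documentation: rdk_model_zoo/samples/vision/ultralytics_yolo26/conversion/README_cn.md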

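Suggestions

Test with the official pretrained model first: confirm that the quantization workflow itself works
Take calibration data from the training set: randomly pick 200 training images, not test images
Record the output of every step: ONNX output → quantized output → on-board inference result
If the problem persists, please add the following to your post:
- The complete yaml configuration file
- The source of the calibration dataset and how it was preprocessed
- The preprocessing parameters used in training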
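
Recording each stage's output, as suggested above, can be as simple as dumping shape and range statistics in a uniform format. `summarize_outputs` is a hypothetical helper, and the two small arrays stand in for real model outputs.

```python
import json
import numpy as np

def summarize_outputs(stage, outputs):
    """Return one record per output tensor with its shape and value range."""
    return [
        {"stage": stage, "index": i, "shape": list(o.shape),
         "min": float(o.min()), "max": float(o.max())}
        for i, o in enumerate(outputs)
    ]

# Stand-in tensors (assumption); replace with the outputs of each stage
demo_outputs = [np.array([[-3.0, 2.0]]), np.array([[0.5, 4.0]])]

records = summarize_outputs("onnx", demo_outputs)
print(json.dumps(records, indent=2))
```

Saving the same records for the ONNX, quantized, and on-board stages makes it straightforward to paste a side-by-side comparison into a post.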

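The community currently lacks a dedicated YOLO26 quantization tutorial. Once your tests are done, please consider writing up the complete workflow as a post to help later developers!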

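Marcelo6151 · April 29, 2026, 07:42

The tutorial is in the GitHub repository linked above; all of this is exposed in the form of source scripts.

The ONNX export script shows how to remove the original model's post-processing.

The mapper script shows the corresponding commands, how to prepare the calibration dataset, how to configure the yaml file, and so on.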

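地瓜小助手_bot · April 29, 2026, 07:42

As Marcelo6151 points out, the complete tutorial lives in the source scripts of the rdk_model_zoo repository. Here are the key file paths and usage pointers:

Key file locations

rdk_model_zoo/samples/vision/ultralytics_yolo26/
├── conversion/
│   ├── README_cn.md              # written tutorial
│   ├── export_yolo26_detect_bpu.py  # ONNX export script (removes post-processing)
│   ├── mapper.py                    # quantization configuration and command wrapper
│   └── yolo26_quant_config.yaml     # quantization configuration template
├── detect.py                        # on-board inference script
└── cal/                             # example calibration dataset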

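Quick start steps

1. ONNX export (post-processing removed)

cd rdk_model_zoo/samples/vision/ultralytics_yolo26/conversion

python export_yolo26_detect_bpu.py \
    --weights yolo26n.pt \
    --output yolo26n.onnx \
    --imgsz 640

Key point: this script already removes the original post-processing layers, so the model outputs the raw head tensors (one cls and one box tensor per stride)

2. Prepare calibration data

# The repository already contains example calibration data
ls cal/

# For custom data, keep the format consistent (640x640, RGB/BGR)
# Quantity: ≥100 images, 200 recommended

3. Quantization configuration (read mapper.py to understand the parameters)

# Look at the default configuration
grep -A 20 "config" mapper.py

# Or run it directly (with the default configuration)
python mapper.py \
    --onnx yolo26n.onnx \
    --cal_data_dir ./cal \
    --output_dir ./quantized

4. Verify the quantization result

# Check whether the output distribution looks normal
./inference.sh ./quantized/yolo26n.bin ./cal/frame_000118.jpg

# Normal output ranges:
# cls: [-200, 100]  ✅
# box: [-10, 50]    ✅

5. On-board inference

python detect.py --model yolo26n.bin --input test.jpg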

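Key points for your problem

Your earlier logs show an abnormal cls output range after quantization ([-14.5, -0.2]), so focus on:

Check item	Command / location
input_type	Confirm in `mapper.py` whether it is `rgb` or `bgr`
Normalization parameters	Whether training divides by 255; the quantization configuration must correspond
Calibration data	Test with the repository's bundled `cal/` first to rule out data problems
YAML configuration	Compare against the default parameters in `export_yolo26_detect_bpu.py`

Suggestions

Run the full workflow with the official pretrained model and the bundled calibration data first to confirm the environment is sound
Compare output ranges: the quantized cls output of the official model versus that of your model
If problems remain, please add the following to your post:
- The exact commands used
- The contents of the quantization configuration file
- A comparison of output ranges before and after quantization

The repository source is the most authoritative reference. Read the comments and parameter descriptions in export_yolo26_detect_bpu.py and mapper.py carefully.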