问题
1.地瓜工具链是否能够提供精度对齐的工具?来验证校准数据集的预处理和原始训练数据集的预处理是否一致。
2.或者说请问我以下的量化转换过程有问题吗?
3.想了解下特定地对于transformer架构(不一定是分割模型)的地瓜BPU量化实践,有没有一些经验以及最佳实践可以分享?
算法工具链文档参考
环境安装 | RDK DOC
算法工具链docker
openexplorer/ai_toolchain_ubuntu_20_x5_cpu
模型
采用的是预训练好的pytorch.bin分割模型: nvidia/segformer-b0-finetuned-ade-512-512 · Hugging Face
校准数据集图片
挑选了COCO2017数据集的100张含有草地的图片,例子如下
校准数据集准备
1.segformer的预处理配置文件preprocessor.json
{
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "SegformerFeatureExtractor",
"image_mean": [
0.485,
0.456,
0.406
],
"image_std": [
0.229,
0.224,
0.225
],
"reduce_labels": true,
"resample": 2,
"size": 512
}
2.根据预处理配置文件生成预处理好的校准数据集代码如下
from pathlib import Path
import numpy as np
from PIL import Image
from transformers import SegformerImageProcessor
# 必填参数
MODEL_DIR = r"nvidia/segformer-b0-finetuned-ade-512-512" # 含 preprocessor_config.json 的目录
SRC = r"./grass_img" # 图片文件或图片文件夹
DST = r"./calibrate_float512" # 单图->.bin文件;批量->输出目录
# DST = r"inference_512int8" # 单图->.bin文件;批量->输出目录
LAYOUT = "NCHW" # 或 "NHWC"
EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp", ".tif", ".tiff"}
def is_img(p: Path):
return p.suffix.lower() in EXTS
def process_one(processor, img_path: Path, layout: str):
img = Image.open(img_path).convert("RGB")#Image读图出来,转成numpy后,通道是HWC
arr = processor(images=img, return_tensors="np")["pixel_values"] # (1,3,H,W) float32
print("ori_arr:",arr.shape)#1 3 512 512
if layout.upper() == "NHWC":
arr = np.transpose(arr, (0, 2, 3, 1)) # -> (1,H,W,C)
arr = arr.astype(np.float32, copy=False)
# print("arr:",arr)
# arr = arr.astype(np.int8, copy=False)
print("arr:", arr)
return np.ascontiguousarray(arr)
def save_bin(arr: np.ndarray, out_path: Path):
out_path.parent.mkdir(parents=True, exist_ok=True)
arr.tofile(out_path) # 原始小端 float32 二进制
def main():
processor = SegformerImageProcessor.from_pretrained(MODEL_DIR)
src, dst = Path(SRC), Path(DST)
if src.is_file(): # 单张
out = dst if dst.suffix.lower() == ".bin" else dst.with_suffix(".bin")
arr = process_one(processor, src, LAYOUT)
save_bin(arr, out)
print(f"Saved {out} shape={arr.shape} dtype={arr.dtype}")
else: # 批量
files = [p for p in src.rglob("*") if p.is_file() and is_img(p)]
dst.mkdir(parents=True, exist_ok=True)
for p in files:
arr = process_one(processor, p, LAYOUT)
save_bin(arr, dst / f"{p.stem}.bin")
print(f"Saved {len(files)} files to {dst.resolve()}")
if __name__ == "__main__":
main()
量化配置文件
# Copyright (c) 2020 Horizon Robotics.All Rights Reserved.
#
# The material in this file is confidential and contains trade secrets
# of Horizon Robotics Inc. This is proprietary information owned by
# Horizon Robotics Inc. No part of this work may be disclosed,
# reproduced, copied, transmitted, or used in any way for any purpose,
# without the express written permission of Horizon Robotics Inc.
# 模型转化相关的参数
# ------------------------------------
# model conversion related parameters
model_parameters:
# onnx 浮点模型文件
# -----------------------------------------------------------
# floating-point ONNX model file
onnx_model: "./segformer_512.onnx"
# 适用BPU架构
# --------------------------------
# the applicable BPU architecture
march: "bayes-e"
# 指定模型转换过程中是否输出各层的中间结果,如果为True,则输出所有层的中间输出结果,
# --------------------------------------------------------------------------------------
# specifies whether or not to dump the intermediate results of all layers in conversion
# if set to True, then the intermediate results of all layers shall be dumped
layer_out_dump: False
# 模型转换输出的结果的存放目录
# -----------------------------------------------------------
# the directory in which model conversion results are stored
working_dir: './model_convert/board_model/'
# 模型转换输出的用于上板执行的模型文件的名称前缀
# -----------------------------------------------------------------------------------------
# model conversion generated name prefix of those model files used for dev board execution
output_model_file_prefix: 'segformer_b0_512x512'
#remove_node_type: "Reshape; Cast"
# 模型输入相关参数, 若输入多个节点, 则应使用';'进行分隔, 使用默认缺省设置则写None
# --------------------------------------------------------------------------
# model input related parameters,
# please use ";" to seperate when inputting multiple nodes,
# please use None for default setting
input_parameters:
# (选填) 模型输入的节点名称, 此名称应与模型文件中的名称一致, 否则会报错, 不填则会使用模型文件中的节点名称
# --------------------------------------------------------------------------------------------------------
# (Optional) node name of model input,
# it shall be the same as the name of model file, otherwise an error will be reported,
# the node name of model file will be used when left blank
input_name: ""
# 网络实际执行时,输入给网络的数据格式,包括 nv12/rgb/bgr/yuv444/gray/featuremap,
# -----------------------------------------------------------------------------------------
# the data formats to be passed into neural network when actually performing neural network
# available options: nv12/rgb/bgr/yuv444/gray/featuremap,
input_type_rt: 'rgb'
# 网络实际执行时输入的数据排布, 可选值为 NHWC/NCHW
# 若input_type_rt配置为nv12,则此处参数不需要配置
# ------------------------------------------------------------------
# the data layout formats to be passed into neural network when actually performing neural network, available options: NHWC/NCHW
# If input_type_rt is configured as nv12, then this parameter does not need to be configured
input_layout_rt: 'NCHW'
# 网络训练时输入的数据格式,可选的值为rgb/bgr/gray/featuremap/yuv444
# --------------------------------------------------------------------
# the data formats in network training
# available options: rgb/bgr/gray/featuremap/yuv444
input_type_train: 'rgb'
# 网络训练时输入的数据排布, 可选值为 NHWC/NCHW
# ------------------------------------------------------------------
# the data layout in network training, available options: NHWC/NCHW
input_layout_train: 'NCHW'
# (选填) 模型网络的输入大小, 以'x'分隔, 不填则会使用模型文件中的网络输入大小,否则会覆盖模型文件中输入大小
# -------------------------------------------------------------------------------------------
# (Optional)the input size of model network, seperated by 'x'
# note that the network input size of model file will be used if left blank
# otherwise it will overwrite the input size of model file
input_shape: '1x3x512x512'
# 网络输入的预处理方法,主要有以下几种:
# no_preprocess 不做任何操作
# data_mean 减去通道均值mean_value
# data_scale 对图像像素乘以data_scale系数
# data_mean_and_scale 减去通道均值后再乘以scale系数
# -------------------------------------------------------------------------------------------
# preprocessing methods of network input, available options:
# 'no_preprocess' indicates that no preprocess will be made
# 'data_mean' indicates that to minus the channel mean, i.e. mean_value
# 'data_scale' indicates that image pixels to multiply data_scale ratio
# 'data_mean_and_scale' indicates that to multiply scale ratio after channel mean is minused
# norm_type: 'data_mean_and_scale'
norm_type: 'no_preprocess'
# 图像减去的均值, 如果是通道均值,value之间必须用空格分隔
# --------------------------------------------------------------------------
# the mean value minused by image
# note that values must be seperated by space if channel mean value is used
# mean_value: 123.675 116.28 103.53
mean_value: 0 0 0
# 图像预处理缩放比例,如果是通道缩放比例,value之间必须用空格分隔
# ---------------------------------------------------------------------------
# scale value of image preprocess
# note that values must be seperated by space if channel scale value is used
# scale_value: 4.3668 4.4643 4.4444
scale_value: 1 1 1
# 模型量化相关参数
# -----------------------------
# model calibration parameters
calibration_parameters:
# 模型量化的参考图像的存放目录,图片格式支持Jpeg、Bmp等格式,输入的图片
# 应该是使用的典型场景,一般是从测试集中选择20~100张图片,另外输入
# 的图片要覆盖典型场景,不要是偏僻场景,如过曝光、饱和、模糊、纯黑、纯白等图片
# 若有多个输入节点, 则应使用';'进行分隔
# -------------------------------------------------------------------------------------------------
# the directory where reference images of model quantization are stored
# image formats include JPEG, BMP etc.
# should be classic application scenarios, usually 20~100 images are picked out from test datasets
# in addition, note that input images should cover typical scenarios
# and try to avoid those overexposed, oversaturated, vague,
# pure blank or pure white images
# use ';' to seperate when there are multiple input nodes
cal_data_dir: '../calibrate_precess/calibrate_float512'
# 校准数据二进制文件的数据存储类型,可选值为:float32, uint8
# calibration data binary file save type, available options: float32, uint8
cal_data_type: 'float32'
# 如果输入的图片文件尺寸和模型训练的尺寸不一致时,并且preprocess_on为true,
# 则将采用默认预处理方法(skimage resize),
# 将输入图片缩放或者裁减到指定尺寸,否则,需要用户提前把图片处理为训练时的尺寸
# ---------------------------------------------------------------------------------
# In case the size of input image file is different from that of in model training
# and that preprocess_on is set to True,
# shall the default preprocess method(skimage resize) be used
# i.e., to resize or crop input image into specified size
# otherwise user must keep image size as that of in training in advance
preprocess_on: False
# 模型量化的算法类型,支持default、mix、kl、max、load,通常采用default即可满足要求
# 如不符合预期可先尝试修改为mix 仍不符合预期再尝试kl或max
# 当使用QAT导出模型时,此参数则应设置为load
# 相关参数的技术原理及说明请您参考用户手册中的PTQ原理及步骤中参数组详细介绍部分
# ----------------------------------------------------------------------------------
# The algorithm type of model quantization, support default, mix, kl, max, load, usually use default can meet the requirements.
# If it does not meet the expectation, you can try to change it to mix first. If there is still no expectation, try kl or max again.
# When using QAT to export the model, this parameter should be set to load.
# For more details of the parameters, please refer to the parameter details in PTQ Principle And Steps section of the user manual.
calibration_type: 'default'
# 该参数为'max'校准方法的参数,用以调整'max'校准的截取点。此参数仅在calibration_type为'max'时有效。
# 该参数取值范围:0.0 ~ 1.0。常用配置选项有:0.99999/0.99995/0.99990/0.99950/0.99900。
# ------------------------------------------------------------------------------------------------
# this is the parameter of the 'max' calibration method and it is used for adjusting the intercept point of the 'max' calibration.
# this parameter will only become valid when the calibration_type is specified as 'max'.
# RANGE: 0.0 - 1.0. Typical options includes: 0.99999/0.99995/0.99990/0.99950/0.99900.
max_percentile: 0.99999
# 编译器相关参数
# ----------------------------
# compiler related parameters
compiler_parameters:
# 编译策略,支持bandwidth和latency两种优化模式;
# bandwidth以优化ddr的访问带宽为目标;
# latency以优化推理时间为目标
# -------------------------------------------------------------------------------------------
# compilation strategy, there are 2 available optimization modes: 'bandwidth' and 'lantency'
# the 'bandwidth' mode aims to optimize ddr access bandwidth
# while the 'lantency' mode aims to optimize inference duration
compile_mode: 'latency'
# 设置debug为True将打开编译器的debug模式,能够输出性能仿真的相关信息,如帧率、DDR带宽占用等
# -----------------------------------------------------------------------------------
# the compiler's debug mode will be enabled by setting to True
# this will dump performance simulation related information
# such as: frame rate, DDR bandwidth usage etc.
debug: True
# 编译模型指定核数,不指定默认编译单核模型, 若编译双核模型,将下边注释打开即可
# -------------------------------------------------------------------------------------
# specifies number of cores to be used in model compilation
# as default, single core is used as this value left blank
# please delete the "# " below to enable dual-core mode when compiling dual-core model
# core_num: 2
# 优化等级可选范围为O0~O3
# O0不做任何优化, 编译速度最快,优化程度最低,
# O1-O3随着优化等级提高,预期编译后的模型的执行速度会更快,但是所需编译时间也会变长。
# 推荐用O2做最快验证
# ----------------------------------------------------------------------------------------------------------
# optimization level ranges between O0~O3
# O0 indicates that no optimization will be made
# the faster the compilation, the lower optimization level will be
# O1-O3: as optimization levels increase gradually, model execution, after compilation, shall become faster
# while compilation will be prolonged
# it is recommended to use O2 for fastest verification
optimize_level: 'O3'
量化结果读取代码
sunrise@ubuntu:~$ python3
import bpu_infer_lib
inf = bpu_infer_lib.Infer(True)
inf.load_model("/userdata/segformer_b0_512x512.bin")
import numpy as np
input_img=np.fromfile("/userdata/000000000872.bin",dtype=np.int8)
inf.read_input(input_img,0)
inf.forward()
inf.get_output()
out_data=inf.outputs[0].data
输出out_data的分割分类值均为负值,无法实现分割
已做的排查
1.量化中间的产物 original_float.onnx的分割结果,没有问题,输出值也正常
2.配置文件参数确认
3.量化精度cos值小于0.8
*新人发帖只能发一张图,需要我补充的信息随时联系我
