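Hi all, are there any example implementations of sparse convolution? Does the official team have a demo for 3D convolution?
Hello, you can refer to the S100 video classification example in NodeHub, which is the demo released so far: developer.d-robotics.cc/nodehubdetail/1984163899718823937
Also, could you share some background on your project? Is this personal research and testing, or is there a concrete plan for a commercial product? If it is a commercial project, what kind of end product is it for? Knowing this helps us coordinate more internal resources to support you.
We are still in the pre-research and testing stage, with plans for deployment. We hope to deploy some algorithms that include spconv on your development board.
From what I can see, that demo mainly uses ordinary 3D convolution. Are there any examples for spconv?
Hello, our current examples can all be found in NodeHub and the manual. For BPU model deployment in general, please refer to the algorithm toolchain: OE Documentation Map - OpenExplorer
This includes the operator support list in the appendix. We support the common ONNX operators, so typical algorithms should work.
What if an operator is not in your manual and I want to accelerate it on the BPU? Is there any reference documentation for that?
Hello, that is not currently supported. We recommend operator replacement: convert it into supported operators for deployment.
Hi, I am also porting point cloud algorithms and ran into the same problem. Happy to exchange notes. wx: haipeng90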
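Conclusion
The Horizon toolchain does not currently support the sparse convolution (spconv) operator. The official recommendation is operator replacement: convert the sparse convolution into standard 3D convolution or other supported operators before deployment.
Solution
1. Existing reference resources
The community already has mature reference implementations of point cloud algorithms that you can consult directly:
- PointPillars reference algorithm: includes point cloud preprocessing and a standard-convolution implementation
- CenterPoint reference algorithm: a complete 3D detection pipeline
2. Operator replacement strategy
# Sparse convolution → standard convolution: conversion ideas
# Option 1: convert the sparse voxel grid into a dense grid
# Option 2: substitute supported operators such as PointPillarsScatter
# Option 3: modify the network structure to use standard 3D convolution layers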
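3. Verify operator support
Check the official operator support lists:
- S100/S100P ONNX operator support: S100/S100P ONNX Operator Support List - D-Robotics Algorithm Toolchain Official User Manual
- S100/S100P Torch operator support: S100/S100P Torch Operator Support List - D-Robotics Algorithm Toolchain Official User Manual
4. Suggested steps
- Export an ONNX model and use the hb_compile tool to check operator support
- Locate the unsupported operators from the error messages
- Modify the network structure, replacing them with combinations of supported operators
- Follow the PointPillars implementation and adopt an architecture that is already verified
Community discussion
Other developers in this thread are working on similar problems. You can:
- Continue discussing technical details under this post
- Refer to the open-source PointPillars/CenterPoint implementations
- Contact official technical support for a customized solution (for commercial projects)
Would you like me to look into a specific operator replacement plan or the PointPillars implementation details?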
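subconv3d operator
Regarding the subconv3d (submanifold sparse 3D convolution) operator you mentioned: the Horizon toolchain indeed does not support it at present. It is one of the core operators of sparse convolution and is widely used in libraries such as spconv and MinkowskiEngine.
Key conclusion
subconv3d is not in the official operator support list and cannot be accelerated directly on the BPU. Deployment requires operator replacement or changes to the network structure.
Feasible options
Option 1: Replace with PointPillarsScatter (recommended)
The community already has a mature PointPillars reference algorithm, which avoids sparse convolution as follows:
# Key change: use Horizon's custom PointPillarsScatter operator to convert
# sparse voxel features into a dense pseudo-image, then process it with standard 2D convolution
# Original sparse convolution flow (not supported)
# sparse_tensor → subconv3d → sparse_tensor
# Replacement (supported)
# point cloud → Voxelization → PillarFeatureNet → PointPillarsScatter → 2D CNN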
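Advantages:
- PointPillarsScatter is a Horizon custom operator, already integrated in horizon_plugin_pytorch
- The subsequent layers use standard Conv2d, which the BPU fully supports
- Verified on the J5/S100 platforms, with a reported latency of 25.8 ms plus 2.6 ms of post-processing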
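To make the scatter step concrete, here is a minimal NumPy sketch of what "sparse pillar features to dense pseudo-image" means. It is only an illustration of the idea, not Horizon's PointPillarsScatter implementation; the function name `scatter_to_pseudo_image` and the (row, col) coordinate layout are assumptions made for this example.

```python
import numpy as np

def scatter_to_pseudo_image(features, coords, height, width):
    """Place per-pillar features onto a dense (C, H, W) canvas.

    features: (P, C) array, one row per non-empty pillar.
    coords:   (P, 2) int array of (row, col) grid indices, assumed unique.
    Cells without a pillar stay zero.
    """
    channels = features.shape[1]
    canvas = np.zeros((channels, height, width), dtype=features.dtype)
    rows, cols = coords[:, 0], coords[:, 1]
    canvas[:, rows, cols] = features.T  # (C, P) values written at P grid cells
    return canvas

# Three non-empty pillars with two channels each (illustrative values).
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
coords = np.array([[0, 0], [1, 2], [3, 1]])
image = scatter_to_pseudo_image(features, coords, height=4, width=3)
print(image.shape)
print(image[:, 1, 2])
```

Everything downstream of this step sees an ordinary image-shaped tensor, which is why plain Conv2d layers can take over from there.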
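Option 2: Voxelization + standard 3D convolution
Change the sparse convolution network into dense 3D convolution:
# Before (spconv)
import spconv.pytorch as spconv  # import path used by spconv 2.x
self.conv1 = spconv.SubMConv3d(...)
# After (standard convolution)
import torch.nn as nn
# 1. First convert the sparse voxels into a dense tensor
# 2. Then apply standard 3D convolution
self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)
Notes:
- The network structure needs adjusting so that the voxel grid size stays reasonable (a dense grid can exhaust memory)
- The efficiency advantage that sparsity provides may be lost
- A dense convolution also produces outputs in empty voxels, so the result is not numerically identical to SubMConv3d and the network needs retraining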
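Before committing to a dense grid, it helps to do the memory arithmetic. The sketch below is plain Python; the grid shape, channel count, and number of occupied voxels are hypothetical values chosen for illustration, and the helper names are made up for this example.

```python
from math import prod

def dense_feature_bytes(grid_shape, channels, bytes_per_value=4):
    """Bytes needed for one dense (C, D, H, W) feature map."""
    return prod(grid_shape) * channels * bytes_per_value

def occupancy(num_active_voxels, grid_shape):
    """Fraction of grid cells that actually contain points."""
    return num_active_voxels / prod(grid_shape)

# Hypothetical example: a 40 x 400 x 352 voxel grid with 36 float32 channels,
# of which 40,000 voxels are occupied.
grid = (40, 400, 352)
size = dense_feature_bytes(grid, channels=36)
print(f"dense feature map: {size / 2**20:.1f} MiB")
print(f"occupancy: {occupancy(40_000, grid):.2%}")
```

With these numbers a single activation is already several hundred MiB while well under 1% of the cells hold data, which is the cost the note about grid size is warning about. Coarser voxels or collapsing the height axis (the pillar approach) are the usual ways to bring it down.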
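Option 3: Operator decomposition (complex, not recommended)
Try to decompose subconv3d into a combination of supported operators:
- Gather + Conv2D + Scatter
- However, the implementation is complex and performance is hard to guarantee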
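For reference, this is roughly what the gather/compute/scatter decomposition looks like. The NumPy sketch below implements a submanifold 3D convolution with one matrix multiply per kernel offset (standing in for the Conv2D step named above). It is an illustration under assumptions: the function name, the (z, y, x) coordinate order, and the weight layout are chosen for this example and do not reflect how spconv or the BPU toolchain implement it.

```python
import numpy as np
from itertools import product

def submanifold_conv3d(coords, feats, weight):
    """Submanifold 3D convolution as gather -> matmul -> scatter-add.

    coords: (N, 3) int array of active voxel indices (unique rows).
    feats:  (N, C_in) features of the active voxels.
    weight: (k, k, k, C_in, C_out) kernel with odd k.
    Outputs are produced only at the input's active voxels.
    """
    k = weight.shape[0]
    r = k // 2
    coord_list = [tuple(c) for c in coords.tolist()]
    index_of = {c: i for i, c in enumerate(coord_list)}
    out = np.zeros((len(coord_list), weight.shape[-1]), dtype=feats.dtype)
    for dz, dy, dx in product(range(-r, r + 1), repeat=3):
        src, dst = [], []
        for i, (z, y, x) in enumerate(coord_list):
            j = index_of.get((z + dz, y + dy, x + dx))
            if j is not None:
                src.append(j)  # gather from the active neighbour
                dst.append(i)  # scatter to the centre voxel
        if src:
            # Each centre index appears at most once per offset, so += is safe.
            out[dst] += feats[src] @ weight[dz + r, dy + r, dx + r]
    return out

# Two adjacent voxels and one isolated voxel, all-ones 3x3x3 kernel.
coords = np.array([[0, 0, 0], [0, 0, 1], [5, 5, 5]])
feats = np.array([[1.0], [2.0], [4.0]])
weight = np.ones((3, 3, 3, 1, 1))
out = submanifold_conv3d(coords, feats, weight)
print(out.ravel())
```

The index pairs depend on the input's sparsity pattern, so the gather and scatter shapes change from frame to frame. That data-dependent indexing is the main reason this decomposition is awkward to compile for a fixed-shape accelerator.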
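Verification steps
- Check operator support
  # Export an ONNX model, then run the check
  # (flag as quoted in this reply; confirm it with hb_compile --help for your OE version)
  hb_compile --check-model your_model.onnx
  # Look at the operator types reported in the errors
- Refer to the PointPillars implementation
  - Source location: in the OE package, ddk/samples/ai_toolchain/horizon_model_train_sample
  - Key operator: horizon_plugin_pytorch.nn.PointPillarsScatter
  - Documentation: PointPillars reference algorithm guide
- Contact official support
  - For a commercial project, describe the project background in the original post
  - The official team may provide customized operator support or optimization advice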
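Recommendation
Prefer the PointPillars architecture, which is currently the most thoroughly community-verified option for point cloud deployment. If you must use a sparse convolution network (such as CenterPoint's sparse backbone), consider the following:
- Evaluate whether it can be changed to a pillar-based method
- Contact Horizon technical support to confirm whether there is any unpublished sparse convolution optimization
- Consider running the sparse convolution part on CPU/GPU, with the BPU handling only the dense part (hybrid deployment)
Would you like me to analyze your specific network structure and suggest modifications?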
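When will the spconv operator be supported?
Conclusion
There is currently no public timeline for official support of the spconv/subconv3d operators.
Judging by past replies in the community, support for new operators is usually released gradually through toolchain version updates (for example, operators such as MatMul and centered_yuv2rgb were added in later versions). Sparse convolution, however, is a relatively complex custom operator, and no plan has been announced.
Suggested actions
1. Short term (deploy now)
Continue with the operator replacement strategy:
- Prefer the PointPillars architecture (already verified)
- Or change the sparse convolution to standard 3D convolution with voxelization preprocessing
2. Push for official support
If yours is a commercial project with a clear deployment need:
- Describe the project background, end product, and expected shipment volume in detail in this thread
- The official team has said that knowing the background "helps us coordinate more internal resources to support you"
- You may obtain customized operator support or prioritized scheduling
3. Watch for version updates
- Check the toolchain release notes regularly: https://toolchain.d-robotics.cc/
- Follow new-operator announcements in the community's Model Zoo category
- Subscribe to OE package release notifications
4. Survey alternative architectures
If the project does not strictly require sparse convolution, consider evaluating:
- PointPillars: 2D convolution over a pseudo-image, efficiently supported on the BPU
- Dense CenterPoint variant: uses standard 3D convolution
- Hybrid deployment: run the sparse part on CPU/GPU and the subsequent dense layers on the BPU
Would you like me to:
- Analyze your network structure and give a concrete modification plan?
- Go through the full PointPillars implementation details?
- Help prepare a commercial project background statement to request official resources?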
DefaultSegmentorV2(
(seg_head): Linear(in_features=72, out_features=11, bias=True)
(backbone): LitePT(
(embedding): Embedding(
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.Native)
(norm): BatchNorm1d(36, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(act): GELU(approximate=‘none’)
)
)
(enc): PointSequential(
(enc0): PointSequential(
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc1): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc2): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=72, out_features=144, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc3): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=144, out_features=252, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.138)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.162)
)
)
(block2): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.185)
)
)
(block3): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.208)
)
)
(block4): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.231)
)
)
(block5): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.254)
)
)
)
(enc4): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=252, out_features=504, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(504, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.277)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.300)
)
)
)
)
(dec): PointSequential(
(dec3): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=504, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=252, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec2): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=252, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=144, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec1): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=144, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec0): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=36, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
)
)
)
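I have seen the network structure you posted. It is a typical sparse-convolution point cloud segmentation network (similar to the PTv2/PTv3 architecture). The problem is fairly clear:
Core problem analysis
Unsupported operators in your network:
| Operator | Where it appears | Support status |
|---|---|---|
| SubMConv3d | stem and enc0~enc2, multiple places | Not supported |
| GridPooling | enc1~enc4 downsampling | Not supported |
| GridUnpooling | dec0~dec3 upsampling | Not supported |
| PointROPEAttention | enc3~enc4 attention | Not supported |
| PointSequential | wrapper used throughout the network | Not supported (custom container) |
Supported operators:
Linear, BatchNorm1d, LayerNorm, GELU, Softmax, Dropout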
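If you want to build this kind of inventory yourself, one option is to tally module types straight from the `print(model)` text. This is a minimal sketch in plain Python: `SUPPORTED` only mirrors the list quoted in this reply, and `tally_module_types` and `split_by_support` are hypothetical helper names, so check the official operator support list before relying on the result.

```python
import re
from collections import Counter

# Module types this reply lists as supported; an assumption, not an official list.
SUPPORTED = {"Linear", "BatchNorm1d", "LayerNorm", "GELU", "Softmax", "Dropout"}

# Matches lines such as "(conv): SubMConv3d(4, 36, ..." in a print(model) dump.
_MODULE_LINE = re.compile(r"^\s*\((\w+)\):\s*(\w+)\(")

def tally_module_types(dump):
    """Count how often each module class name appears in a print(model) dump."""
    counts = Counter()
    for line in dump.splitlines():
        match = _MODULE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

def split_by_support(counts, supported=SUPPORTED):
    """Split the tally into (listed as supported, needs replacement)."""
    ok = {name: n for name, n in counts.items() if name in supported}
    todo = {name: n for name, n in counts.items() if name not in supported}
    return ok, todo

sample = """\
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], bias=False)
(norm): BatchNorm1d(36, eps=0.001)
(act): GELU(approximate='none')
)
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
)
"""
ok, todo = split_by_support(tally_module_types(sample))
print("listed as supported:", ok)
print("needs replacement:", todo)
```

A module-name tally is only a first pass. Custom modules can hide unsupported operations inside their forward methods, so the exported ONNX graph is still the thing to check.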
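Feasible options
Option 1: Replace the architecture (recommended, deployable)
Switch to a PointPillars or Cylinder3D architecture that uses standard convolution:
# Original architecture (sparse convolution)
# point cloud → SubMConv3d → GridPooling → SubMConv3d → ...
# Replacement architecture (dense convolution)
# point cloud → Voxelization → PointPillarsScatter → Conv2d → ...
Key changes:
- Replace the sparse voxel processing with Voxelization + PointPillarsScatter
- Change the 3D sparse convolutions to 2D dense convolutions
- Keep the Transformer part (Linear/Softmax are supported), but feed it dense features
Reference implementation: PointPillars reference algorithm
Option 2: Hybrid deployment (a compromise)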
┌─────────────────────────────────────────────────────┐
│ CPU/GPU               │ BPU (Horizon)               │
├─────────────────────────────────────────────────────┤
│ Point cloud preproc.  │                             │
│ Voxelization          │                             │
│ SubMConv3d (enc0-2)   │                             │
│ GridPooling           │                             │
│───────────────────────│─────────────────────────────│
│                       │ Dense features (after enc3) │
│                       │ Transformer Block           │
│                       │ Decoder (standard conv)     │
│                       │ Segmentation Head           │
└─────────────────────────────────────────────────────┘
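Pros: keeps part of the sparse convolution architecture
Cons: CPU-BPU data transfer overhead; overall performance may fall short of a pure-BPU deployment
Option 3: Push for official support (commercial projects)
If your project has a clear commercial deployment need:
- Reply in this thread and describe:
  - Project background (autonomous driving, robotics, industrial inspection, etc.)
  - Expected shipment volume
  - Timeline
  - Why sparse convolution is required (accuracy or performance needs)
- The official team said earlier:
  "Knowing this helps us coordinate more internal resources to support you."
- You may obtain:
  - Customized operator support
  - Prioritized scheduling
  - Technical liaison resources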
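My recommendation
| Priority | Option | When it fits |
|---|---|---|
| 1 | Switch to the PointPillars architecture | Fast deployment, already verified |
| 2 | Hybrid deployment | Sparse convolution must be kept |
| 3 | Contact the official team to push for support | Commercial project with shipment volume |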
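Next steps
Would you like me to:
- Analyze the PointPillars architecture and compare its accuracy and performance with your current network?
- Give a concrete code modification plan showing how to rewrite this network with supported operators?
- Help prepare a project background statement for requesting official resources?
Please tell me your project scenario (personal research or commercial deployment) and your timeline, and I will give more targeted advice.
Please give the concrete code modification plan
Code modification plan
Based on your network structure, here are two modification plans: Plan A (recommended, full BPU deployment) and Plan B (hybrid deployment, keeping part of the architecture).
Plan A: Switch to a PointPillars architecture (full BPU deployment)
Core idea
Original architecture: point cloud → SubMConv3d → GridPooling → Transformer → segmentation head
New architecture: point cloud → Voxelization → PointPillarsScatter → Conv2d → Transformer → segmentation head
Complete code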
import torch
import torch.nn as nn
# PointPillarsScatter is the operator named in this thread. `voxelization` is kept
# as the original reply imported it and is unverified; check the
# horizon_plugin_pytorch documentation for the actual voxelization entry point.
from horizon_plugin_pytorch.nn import PointPillarsScatter, voxelization


class PointCloudSegmentor(nn.Module):
    """
    Point cloud segmentation network adapted to the Horizon BPU.
    Replaces SubMConv3d with PointPillarsScatter + Conv2d.
    """

    def __init__(self, num_classes=11, voxel_size=(0.2, 0.2, 4.0),
                 pc_range=(0, -40, -2, 70.4, 40, 2)):
        super().__init__()
        self.voxel_size = torch.tensor(voxel_size)
        self.pc_range = torch.tensor(pc_range)
        # ========== 1. Voxelization parameters ==========
        self.max_voxels = 40000
        self.max_points_per_voxel = 64
        # Pseudo-image size; round() avoids off-by-one errors from float division
        self.nx = round((pc_range[3] - pc_range[0]) / voxel_size[0])
        self.ny = round((pc_range[4] - pc_range[1]) / voxel_size[1])
        # ========== 2. Stem: replaces SubMConv3d(4, 36) ==========
        # Before: SubMConv3d(4, 36, kernel_size=[5, 5, 5])
        # After: PFN layer, then scatter to a pseudo-image
        self.stem_pfn = nn.Sequential(
            nn.Linear(4, 36),
            nn.BatchNorm1d(36),
            nn.GELU(),
        )
        # Module call signature assumed; verify against the plugin documentation
        self.scatter = PointPillarsScatter()
        # ========== 3. Backbone: replaces SubMConv3d + GridPooling ==========
        # Before: stacked SubMConv3d blocks + GridPooling
        # After: 2D convolutions, with stride-2 convolutions for downsampling
        self.backbone = nn.Sequential(
            # Block 0: 36 -> 36
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),
            # Downsample: replaces GridPooling
            nn.Conv2d(36, 72, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            # Block 1: 72 -> 72
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            # Downsample
            nn.Conv2d(72, 144, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            # Block 2: 144 -> 144
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            # Downsample
            nn.Conv2d(144, 252, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(252),
            nn.GELU(),
        )
        # ========== 4. Transformer blocks: kept, but modified ==========
        # Before: PointROPEAttention (not supported)
        # After: standard self-attention (Linear + Softmax are supported)
        # 252 is not divisible by 8, so 6 heads (42 channels each) are used here
        self.transformer_blocks = nn.ModuleList([
            SelfAttentionBlock(252, num_heads=6, mlp_ratio=4)
            for _ in range(6)
        ])
        # ========== 5. Decoder: replaces GridUnpooling ==========
        # Before: GridUnpooling upsampling
        # After: ConvTranspose2d; three stride-2 stages undo the three
        # stride-2 downsamples in the backbone
        self.decoder = nn.Sequential(
            # 252 -> 144
            nn.ConvTranspose2d(252, 144, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            # 144 -> 72
            nn.ConvTranspose2d(144, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            # 72 -> 72
            nn.ConvTranspose2d(72, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
        )
        # ========== 6. Segmentation head ==========
        self.seg_head = nn.Linear(72, num_classes)

    def forward(self, points):
        """
        Args:
            points: (B, N, 4), xyz + intensity
        Returns:
            seg_logits: (B, H, W, num_classes), one prediction per pseudo-image cell
        """
        batch_size = points.shape[0]
        # ===== Step 1: Voxelization =====
        voxels, coors, num_points = voxelization(
            points,
            voxel_size=self.voxel_size,
            pc_range=self.pc_range,
            max_voxels=self.max_voxels,
            max_points_per_voxel=self.max_points_per_voxel,
        )
        # ===== Step 2: Pillar Feature Net =====
        # voxels: (max_voxels, max_points, 4)
        # coors: (max_voxels, 3)
        # BatchNorm1d expects (N, C), so flatten the point axis first
        v, p, c = voxels.shape
        features = self.stem_pfn(voxels.reshape(v * p, c)).reshape(v, p, -1)
        features = features.max(dim=1)[0]  # (max_voxels, 36)
        # ===== Step 3: PointPillarsScatter =====
        # Convert sparse voxel features into a dense pseudo-image (B, C, H, W)
        pseudo_image = self.scatter(
            features, coors,
            output_shape=(batch_size, 36, self.ny, self.nx),
        )
        # ===== Step 4: Backbone (2D Conv) =====
        x = self.backbone(pseudo_image)  # (B, 252, H/8, W/8)
        # ===== Step 5: Transformer blocks =====
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        for block in self.transformer_blocks:
            x = block(x)
        x = x.transpose(1, 2).reshape(B, C, H, W)  # (B, 252, H/8, W/8)
        # ===== Step 6: Decoder (upsample) =====
        x = self.decoder(x)  # (B, 72, H, W)
        # ===== Step 7: Segmentation head =====
        # Pseudo-image segmentation: Linear acts on the channel axis
        seg_logits = self.seg_head(x.permute(0, 2, 3, 1))  # (B, H, W, num_classes)
        return seg_logits


class SelfAttentionBlock(nn.Module):
    """Standard multi-head self-attention, replacing PointROPEAttention."""

    def __init__(self, dim, num_heads=6, mlp_ratio=4.0):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.softmax = nn.Softmax(dim=-1)
        mlp_hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden),
            nn.GELU(),
            nn.Linear(mlp_hidden, dim),
        )

    def forward(self, x):
        B, N, C = x.shape
        # Self-attention
        normed = self.norm1(x)
        qkv = self.qkv(normed).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)
        attn = self.softmax(attn)
        # (B, heads, N, head_dim) -> (B, N, C)
        x = x + self.proj((attn @ v).transpose(1, 2).reshape(B, N, C))
        # MLP
        x = x + self.mlp(self.norm2(x))
        return x
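As a sanity check on the feature-map sizes in the PointCloudSegmentor code above, the standard output-size formulas for Conv2d and ConvTranspose2d can be evaluated by hand. This is a small plain-Python sketch; `conv_out` and `deconv_out` are helper names made up for this example, and the ranges are the default `pc_range` and `voxel_size` from the code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of nn.Conv2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of nn.ConvTranspose2d along one axis (no output_padding)."""
    return (size - 1) * stride - 2 * padding + kernel

pc_range = (0, -40, -2, 70.4, 40, 2)
voxel_size = (0.2, 0.2, 4.0)
# round() rather than int(): float division can land just below the integer.
nx = round((pc_range[3] - pc_range[0]) / voxel_size[0])
ny = round((pc_range[4] - pc_range[1]) / voxel_size[1])

h, w = ny, nx
for _ in range(3):  # three stride-2 Conv2d(kernel_size=3, padding=1) stages
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
tokens = h * w  # sequence length seen by the attention blocks

up_h, up_w = h, w
for _ in range(3):  # three ConvTranspose2d(kernel_size=4, stride=2, padding=1) stages
    up_h, up_w = deconv_out(up_h, 4, 2, 1), deconv_out(up_w, 4, 2, 1)

print("pseudo-image:", (ny, nx))
print("after backbone:", (h, w), "tokens:", tokens)
print("after decoder:", (up_h, up_w))
```

With these defaults the pseudo-image is 400 x 352, the backbone reduces it to 50 x 44 (2200 tokens, so each attention map is 2200 x 2200), and exactly three stride-2 transposed convolutions bring it back to 400 x 352. A fourth stride-2 stage would overshoot to 800 x 704.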
Model export and quantization
# ===== 1. Export to ONNX =====
def export_onnx():
    model = PointCloudSegmentor(num_classes=11)
    model.eval()
    # Dummy input: (batch, points, 4)
    dummy_input = torch.randn(1, 150000, 4)
    torch.onnx.export(
        model,
        dummy_input,
        "pointcloud_seg.onnx",
        input_names=["points"],
        output_names=["seg_logits"],
        dynamic_axes={
            "points": {0: "batch", 1: "num_points"},
            "seg_logits": {0: "batch"},
        },
        opset_version=13,
    )

# The hb_compile flags below are quoted from the original reply;
# confirm them with hb_compile --help for your OE version.
# ===== 2. Check the model =====
# hb_compile --check-model pointcloud_seg.onnx
# ===== 3. PTQ quantization =====
# hb_compile --convert-config quant_config.json \
#     --output-dir ./output \
#     pointcloud_seg.onnx
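Plan B: Hybrid deployment (keep part of the sparse convolution)
If the accuracy of the original architecture must be preserved:
# Deployment layout
# CPU: point cloud → SubMConv3d (enc0-2) → GridPooling → dense features
# BPU: dense features → Transformer → Decoder → segmentation head
class HybridSegmentor(nn.Module):
    def __init__(self, cpu_backbone, bpu_blocks, bpu_decoder, num_classes=11):
        super().__init__()
        # CPU part: the original sparse-convolution backbone, supplied by the caller
        self.cpu_backbone = cpu_backbone
        # BPU part: only supported operators
        self.bpu_transformer = nn.ModuleList(bpu_blocks)  # Linear + Softmax blocks
        self.bpu_decoder = bpu_decoder  # e.g. nn.Sequential of ConvTranspose2d layers
        self.bpu_head = nn.Linear(72, num_classes)

    def forward(self, points):
        # CPU inference
        dense_features = self.cpu_backbone(points)  # must output dense features
        # Transfer data to the BPU
        # (on the board this hand-off is coordinated in C++/Python)
        # BPU inference
        x = dense_features
        for block in self.bpu_transformer:  # nn.ModuleList is not callable
            x = block(x)
        x = self.bpu_decoder(x)
        return self.bpu_head(x)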
Operator replacement table
| Original operator | Replacement | Notes |
|---|---|---|
| SubMConv3d | Conv2d + PointPillarsScatter | Core change |
| GridPooling | Conv2d(stride=2) | Downsampling |
| GridUnpooling | ConvTranspose2d | Upsampling |
| PointROPEAttention | SelfAttention (no RoPE) | Accuracy may drop slightly |
| PointSequential | nn.Sequential | Drop-in replacement |
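Next steps
- First train with the Plan A code and verify accuracy
- After exporting to ONNX, verify the operators with hb_compile --check-model
- If accuracy drops, then consider the Plan B hybrid deployment
Would you like me to:
- Provide a complete training configuration file?
- Explain the quantization parameter settings?
- Analyze the expected accuracy loss?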