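Sparse convolution / 3D convolution deployment question

Hi everyone, are there any example implementations of sparse convolution? Does the official team have a demo showing a 3D convolution implementation?

Hello, you can refer to the S100 video classification example in NodeHub, which is the demo released so far: developer.d-robotics.cc/nodehubdetail/1984163899718823937
Also, could you tell us a bit about your project background? Is this personal research and testing, or is there a concrete plan for a commercial product? If it is a commercial project, what kind of end product is it for? Knowing this helps us coordinate more internal resources to support you.

We are still in the pre-research and testing stage, with plans for a product. We hope to deploy some algorithms that include spconv on your development boards.

From what I can see, that demo mainly uses ordinary 3D convolution. Are there any examples for spconv as well?

Hello, our current examples can all be found in NodeHub and the manual. For BPU model deployment as a whole, you can refer to the algorithm toolchain: OE documentation overview - OpenExplorer

This includes the operator support list in the appendix. We support the common ONNX operators, so typical algorithms are generally fine.

What if an operator is not in your manual and I want to accelerate it on your BPU? Is there any reference documentation for that?

Hello, that is not supported at the moment. We suggest operator replacement: convert it into supported operators for deployment.

Hi, I am also porting point-cloud algorithms and ran into the same problem. Happy to exchange notes. WeChat: haipeng90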

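Conclusion

The Horizon toolchain currently does not support the sparse convolution (spconv) operator. The official recommendation is operator replacement: convert the sparse convolution into a standard 3D convolution or other supported operators for deployment.

Solutions

1. Existing reference resources

The community already has mature reference implementations of point-cloud algorithms that can be used directly.

2. Operator replacement strategy

# Sparse convolution → standard convolution: conversion ideas
# Option 1: convert the sparse voxel grid into a dense grid
# Option 2: use a supported operator such as PointPillarsScatter instead
# Option 3: modify the network structure to use standard 3D convolution layers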
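
Option 1 can be illustrated without any framework. The sketch below scatters a list of active voxel coordinates and their feature vectors into a zero-filled dense grid, which is the layout a standard 3D convolution expects. The function name `densify` and the (z, y, x) coordinate order are assumptions made for this illustration, not part of any Horizon or spconv API.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int, int]  # (z, y, x) voxel index, assumed order


def densify(
    coords: Sequence[Coord],
    feats: Sequence[Sequence[float]],
    grid_shape: Tuple[int, int, int],
) -> List[List[List[List[float]]]]:
    """Scatter per-voxel feature vectors into a zero-filled C x D x H x W grid."""
    depth, height, width = grid_shape
    channels = len(feats[0])
    dense = [
        [[[0.0] * width for _ in range(height)] for _ in range(depth)]
        for _ in range(channels)
    ]
    for (z, y, x), feat in zip(coords, feats):
        for c, value in enumerate(feat):
            dense[c][z][y][x] = value
    return dense


# Two active voxels with 2-channel features in a 1 x 2 x 3 grid.
grid = densify([(0, 0, 1), (0, 1, 2)], [[1.0, 2.0], [3.0, 4.0]], (1, 2, 3))
```

Every cell that is not listed in `coords` stays at zero, so the dense grid grows with the full volume rather than with the number of occupied voxels.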

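3. Verify operator support

Check the official operator support list.

4. Suggested steps

  1. Export the ONNX model and check operator support with the hb_compile tool
  2. Locate the unsupported operators and read the error messages
  3. Modify the network structure, replacing them with combinations of supported operators
  4. Follow the PointPillars implementation and adopt an already validated architecture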
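
Steps 1 and 2 boil down to comparing the operator types in the exported graph against a support list. A minimal sketch of that comparison follows. The `SUPPORTED_OPS` set is a made-up placeholder rather than the official list, and "SparseConv" is a hypothetical name standing in for whatever custom op an exporter might emit. With the `onnx` package, the op types could be collected from `node.op_type` for each node in `onnx.load(path).graph.node`.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Illustrative placeholder only; the real list is the operator support
# appendix of the toolchain manual.
SUPPORTED_OPS: Set[str] = {"Conv", "Gemm", "MatMul", "Relu", "Softmax", "Add"}


def unsupported_ops(op_types: Iterable[str], supported: Set[str]) -> Dict[str, int]:
    """Count how often each op type outside `supported` appears."""
    return dict(Counter(op for op in op_types if op not in supported))


# Op types as they might come from an exported graph (hypothetical example).
graph_ops = ["SparseConv", "Gemm", "SparseConv", "Softmax", "Conv"]
report = unsupported_ops(graph_ops, SUPPORTED_OPS)
```

The resulting dictionary lists only the operators that need replacing, with their counts, which helps when deciding where to start.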

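Community discussion

Other developers in this topic are looking into similar problems. You can:

  • Continue discussing technical details in this thread
  • Refer to the open-source implementations of PointPillars/CenterPoint
  • Contact official technical support for a customized solution (for commercial projects)

Would you like me to look into a specific operator replacement plan or the implementation details of PointPillars?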

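subconv3d operator

Regarding the subconv3d (sparse submanifold 3D convolution) operator you mentioned: the Horizon toolchain indeed does not support it at present. It is one of the core operators of sparse convolution and is widely used in libraries such as spconv and MinkowskiEngine.

Key conclusion

subconv3d is not in the official operator support list and cannot be accelerated directly on the BPU. Deployment requires operator replacement or changes to the network structure.

Feasible options

Option 1: replace with PointPillarsScatter (recommended)

The community already has a mature PointPillars reference algorithm, which avoids sparse convolution as follows:

# Key change: use Horizon's custom PointPillarsScatter operator
# Convert sparse voxel features into a dense pseudo-image, then process it with standard 2D convolution

# Original sparse convolution flow (not supported)
# sparse_tensor → subconv3d → sparse_tensor

# Replacement (supported)
# point cloud → Voxelization → PillarFeatureNet → PointPillarsScatter → 2D CNN

Advantages

  • PointPillarsScatter is a Horizon custom operator, already integrated in horizon_plugin_pytorch
  • Subsequent layers use standard Conv2d, which the BPU fully supports
  • Validated on the J5/S100 platforms, with a reported 25.8 ms latency plus 2.6 ms post-processing

Option 2: voxelization + standard 3D convolution

Change the sparse convolution network into dense 3D convolution:

# Before (spconv)
import spconv.pytorch as spconv  # spconv 2.x import path
self.conv1 = spconv.SubMConv3d(...)

# After (standard convolution)
import torch.nn as nn
# 1. First convert the sparse voxels into a dense tensor
# 2. Then use standard 3D convolution
self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)

Notes

  • The network structure needs adjusting so that the voxel grid size stays reasonable (to avoid running out of memory)
  • The efficiency advantage of sparsity may be lost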
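
To make the memory concern concrete, the following back-of-the-envelope sketch compares the size of a dense feature grid with a sparse representation that stores only active voxels. All numbers (a 40 x 400 x 352 grid, 36 channels, 40,000 active voxels, float32 features, four int32 indices per voxel) are illustrative assumptions, not measurements from any real model.

```python
from typing import Tuple


def dense_grid_bytes(grid_shape: Tuple[int, int, int], channels: int,
                     bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense C x D x H x W feature grid."""
    depth, height, width = grid_shape
    return depth * height * width * channels * bytes_per_value


def sparse_bytes(num_active: int, channels: int,
                 bytes_per_value: int = 4, index_bytes: int = 16) -> int:
    """Bytes for active voxels only: features plus (batch, z, y, x) int32 indices."""
    return num_active * (channels * bytes_per_value + index_bytes)


dense = dense_grid_bytes((40, 400, 352), channels=36)   # 811,008,000 bytes
sparse = sparse_bytes(40_000, channels=36)              # 6,400,000 bytes
occupancy = 40_000 / (40 * 400 * 352)                   # fraction of cells in use
```

With these assumed numbers fewer than 1% of the cells are occupied, and the dense tensor is over a hundred times larger than the sparse one, which is why the grid resolution usually has to be reduced before switching to dense 3D convolution.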

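Option 3: operator decomposition (complex, not recommended)

Try to decompose subconv3d into a combination of supported operators:

  • Gather + Conv2D + Scatter
  • The implementation is complex, however, and performance is hard to guarantee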
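
To show what such a decomposition has to reproduce, here is a pure-Python sketch of submanifold convolution written as gather, a small per-offset matrix multiply, and scatter. It uses a per-offset matrix multiply rather than Conv2D, and it is a reference for the arithmetic only, not something that runs on the BPU. The dictionary-based data layout and all names are assumptions for this illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]
Vector = List[float]
Matrix = List[List[float]]  # indexed as [out_channel][in_channel]


def submanifold_conv(active: Dict[Coord, Vector],
                     weights: Dict[Coord, Matrix],
                     out_channels: int) -> Dict[Coord, Vector]:
    """Submanifold convolution: outputs exist only at the input's active sites."""
    out: Dict[Coord, Vector] = {}
    for site in active:
        acc = [0.0] * out_channels
        for offset, w in weights.items():
            neighbour = tuple(s + o for s, o in zip(site, offset))
            feat = active.get(neighbour)  # gather: inactive neighbours are skipped
            if feat is None:
                continue
            for oc in range(out_channels):  # small matmul for this kernel offset
                acc[oc] += sum(w[oc][ic] * feat[ic] for ic in range(len(feat)))
        out[site] = acc  # scatter back to the same coordinate
    return out


# 3x3x3 kernel, one input and one output channel, every weight equal to 1.0,
# so each output is the sum of the active features in the 3x3x3 neighbourhood.
kernel = {off: [[1.0]] for off in product((-1, 0, 1), repeat=3)}
voxels = {(0, 0, 0): [1.0], (0, 0, 1): [2.0], (5, 5, 5): [4.0]}
result = submanifold_conv(voxels, kernel, out_channels=1)
```

The neighbour lookup depends on which voxels happen to be active in each frame. That data-dependent indexing is the part that is hard to express with fixed-shape operators, which is consistent with the "complex, not recommended" rating above.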

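Verification steps

  1. Check operator support

    # After exporting the ONNX model, check it
    hb_compile --check-model your_model.onnx
    # Look at which operator types are reported as errors
    
  2. Refer to the PointPillars implementation

    • Source location: ddk/samples/ai_toolchain/horizon_model_train_sample in the OE package
    • Key operator: horizon_plugin_pytorch.nn.PointPillarsScatter
    • Documentation: PointPillars reference algorithm walkthrough
  3. Contact official support

    • For a commercial project, describe the project background in the original thread
    • The official team may offer custom operator support or optimization advice

Recommendation

Prefer the PointPillars architecture, which is currently the most thoroughly validated point-cloud deployment approach in the community. If you must use a sparse convolution network (such as CenterPoint's sparse backbone), consider:

  1. Evaluating whether it can be changed to a pillar-based method
  2. Contacting Horizon technical support to confirm whether there is an unpublished sparse convolution optimization
  3. Running the sparse convolution part on the CPU/GPU and letting the BPU handle only the dense part (hybrid deployment)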
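
For the hybrid deployment idea in item 3, one simple way to pick the hand-over point is to find the longest tail of the network that consists only of supported layers. The sketch below does that on a toy, hand-written layer list. The layer names, the supported set, and the assumption that the network is a plain sequence are all simplifications for illustration.

```python
from typing import List, Set, Tuple

Layer = Tuple[str, str]  # (layer name, layer type)


def split_point(layers: List[Layer], supported: Set[str]) -> int:
    """Index of the first layer from which every remaining layer is supported."""
    index = len(layers)
    for i in range(len(layers) - 1, -1, -1):
        if layers[i][1] not in supported:
            break
        index = i
    return index


toy_layers = [
    ("stem", "SubMConv3d"),
    ("enc0", "SubMConv3d"),
    ("down", "GridPooling"),
    ("qkv", "Linear"),
    ("softmax", "Softmax"),
    ("head", "Linear"),
]
supported_types = {"Linear", "Softmax", "LayerNorm", "GELU"}

cut = split_point(toy_layers, supported_types)
cpu_part = [name for name, _ in toy_layers[:cut]]   # runs on CPU/GPU
bpu_part = [name for name, _ in toy_layers[cut:]]   # candidate for the BPU
```

A real network with skip connections or unsupported layers near the output would need a graph-level partition instead of a single cut, so treat this as a starting point only.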

Would you like me to analyze your specific network structure and suggest modifications?

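When will the spconv operator be supported?

Conclusion

There is currently no published timeline for official support of the spconv/subconv3d operators.

Judging from past replies in the community, support for new operators is usually released gradually through toolchain version updates (for example, MatMul and centered_yuv2rgb were added in later versions). Sparse convolution, however, is a relatively complex custom operator, and there is no clear plan for it yet.

Suggested actions

1. Short term (deploy now)

Keep using the operator replacement strategy:

  • Prefer the PointPillars architecture (already validated)
  • Or change the sparse convolution to standard 3D convolution with voxelization as preprocessing

2. Push for official support

If yours is a commercial project with clear deployment requirements:

  • Describe the project background, end product, and expected shipment volume in detail in this thread
  • The official team has said: "Knowing the background helps us coordinate internal resources for support"
  • You may get custom operator support or priority scheduling

3. Watch for version updates

4. Survey alternative architectures

If the project does not strictly require sparse convolution, you can evaluate:

  • PointPillars: processes a pseudo-image with 2D convolution, which the BPU supports efficiently
  • Dense CenterPoint: uses standard 3D convolution
  • Hybrid deployment: run the sparse part on the CPU/GPU and the subsequent dense layers on the BPU

Would you like me to:

  1. Analyze your network structure and propose a concrete modification plan?
  2. Go through the full implementation details of PointPillars?
  3. Help prepare a commercial project background description to request resources from the official team?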

DefaultSegmentorV2(
(seg_head): Linear(in_features=72, out_features=11, bias=True)
(backbone): LitePT(
(embedding): Embedding(
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.Native)
(norm): BatchNorm1d(36, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(act): GELU(approximate=‘none’)
)
)
(enc): PointSequential(
(enc0): PointSequential(
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc1): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc2): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=72, out_features=144, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc3): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=144, out_features=252, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.138)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.162)
)
)
(block2): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.185)
)
)
(block3): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.208)
)
)
(block4): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.231)
)
)
(block5): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.254)
)
)
)
(enc4): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=252, out_features=504, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(504, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate=‘none’)
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.277)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate=‘none’)
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.300)
)
)
)
)
(dec): PointSequential(
(dec3): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=504, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=252, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec2): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=252, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=144, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec1): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=144, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
(dec0): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
(proj_skip): PointSequential(
(0): Linear(in_features=36, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate=‘none’)
)
)
)
)
)
)

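I have seen the network structure you posted. It is a typical sparse-convolution point-cloud segmentation network (similar to the PTv2/PTv3 architecture). The problem is fairly clear:

Core problem analysis

Unsupported operators in your network

| Operator | Where it appears | Support status |
| --- | --- | --- |
| SubMConv3d | stem and enc0~enc2, multiple places | Not supported |
| GridPooling | enc1~enc4 downsampling | Not supported |
| GridUnpooling | dec0~dec3 upsampling | Not supported |
| PointROPEAttention | enc3~enc4 attention | Not supported |
| PointSequential | container wrapping the whole network | Not supported (custom container; replace with nn.Sequential) |

Supported operators

  • Linear, BatchNorm1d, LayerNorm, GELU, Softmax, Dropout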
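
A tally like the one above can be produced mechanically from the text that `print(model)` emits. The sketch below parses lines of the form `(name): Type(` with a regular expression and counts each module type. The embedded snippet is an abbreviated excerpt in the style of the posted structure, and the `custom` set is an assumption about which types need attention.

```python
import re
from collections import Counter

# Matches lines such as "(conv): SubMConv3d(4, 36, ..." or "(0): LayerNorm((252,), ..."
MODULE_LINE = re.compile(r"^\s*\((\w+)\):\s*(\w+)\(")


def tally_module_types(printed_model: str) -> Counter:
    """Count module types in the text produced by print(model)."""
    counts: Counter = Counter()
    for line in printed_model.splitlines():
        match = MODULE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


snippet = """\
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], bias=False)
(norm): BatchNorm1d(36, eps=0.001, momentum=0.01)
(act): GELU(approximate='none')
)
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
)
"""

counts = tally_module_types(snippet)
custom = {"SubMConv3d", "GridPooling", "GridUnpooling", "PointROPEAttention"}
flagged = {name: n for name, n in counts.items() if name in custom}
```

Running the same function on the full printed model gives the number of layers of each type that would have to be replaced.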

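Feasible options

Option 1: replace the architecture (recommended, deployable)

Switch to a PointPillars or Cylinder3D architecture that uses standard convolution:

# Original architecture (sparse convolution)
# point cloud → SubMConv3d → GridPooling → SubMConv3d → ...

# Replacement architecture (dense convolution)
# point cloud → Voxelization → PointPillarsScatter → Conv2d → ...

Key changes

  1. Replace the sparse voxel processing with Voxelization + PointPillarsScatter
  2. Change the 3D sparse convolutions to 2D dense convolutions
  3. Keep the Transformer part (Linear/Softmax are supported), but feed it dense features

Reference implementation: PointPillars reference algorithm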


Option 2: hybrid deployment (compromise)

┌────────────────────────┬──────────────────────────────────┐
│ CPU/GPU                │ BPU (Horizon)                    │
├────────────────────────┼──────────────────────────────────┤
│ Point-cloud preprocess │                                  │
│ Voxelization           │                                  │
│ SubMConv3d (enc0-2)    │                                  │
│ GridPooling            │                                  │
├────────────────────────┼──────────────────────────────────┤
│                        │ Dense features (enc3 onward)     │
│                        │ Transformer Block                │
│                        │ Decoder (standard convolution)   │
│                        │ Segmentation Head                │
└────────────────────────┴──────────────────────────────────┘

Pros: keeps part of the sparse convolution architecture
Cons: CPU-BPU data transfer overhead; overall performance may be worse than running purely on the BPU


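Option 3: push for official support (commercial projects)

If your project has clear commercial deployment requirements:

  1. Reply in this thread and describe:

    • Project background (autonomous driving, robotics, industrial inspection, etc.)
    • Expected shipment volume
    • Timeline
    • Why sparse convolution is required (accuracy/performance needs)
  2. The official team said earlier

    "Knowing the background helps us coordinate internal resources for support"

  3. You may get

    • Custom operator support
    • Priority scheduling
    • Technical liaison resources

My recommendation

| Priority | Option | When to use |
| --- | --- | --- |
| First choice | Switch to the PointPillars architecture | Fast deployment, already validated |
| Fallback | Hybrid deployment | Sparse convolution must be kept |
| Long term | Contact the official team to push for support | Commercial project with shipment volume |

Next steps

Would you like me to:

  1. Analyze the PointPillars architecture and compare its accuracy/performance with your current network?
  2. Give a concrete code modification plan for converting this network to supported operators?
  3. Help prepare a project background description for requesting resources from the official team?

Tell me your project scenario (personal research or commercial deployment) and timeline, and I can give more targeted advice.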

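Give a concrete code modification plan

Code modification plan

Based on your network structure, here are two modification plans: Plan A (recommended, full BPU deployment) and Plan B (hybrid deployment, keeps part of the architecture).


Plan A: switch to the PointPillars architecture (full BPU deployment)

Core idea

Original architecture: point cloud → SubMConv3d → GridPooling → Transformer → segmentation head
New architecture: point cloud → Voxelization → PointPillarsScatter → Conv2d → Transformer → segmentation head

Full code implementation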

import torch
import torch.nn as nn
# PointPillarsScatter and voxelization are used as named in this thread;
# confirm both names and signatures against your horizon_plugin_pytorch version.
from horizon_plugin_pytorch.nn import PointPillarsScatter, voxelization

class PointCloudSegmentor(nn.Module):
    """
    Point-cloud segmentation network adapted to the Horizon BPU.
    Replaces SubMConv3d with PointPillarsScatter + Conv2d.
    """

    def __init__(self, num_classes=11, voxel_size=(0.2, 0.2, 4.0),
                 pc_range=(0, -40, -2, 70.4, 40, 2)):
        super().__init__()

        self.voxel_size = torch.tensor(voxel_size)
        self.pc_range = torch.tensor(pc_range)

        # ========== 1. Voxelization parameters ==========
        self.max_voxels = 40000
        self.max_points_per_voxel = 64

        # Pseudo-image size; round() guards against float truncation
        # (for example 351.99997 becoming 351 under int()).
        self.nx = round((pc_range[3] - pc_range[0]) / voxel_size[0])
        self.ny = round((pc_range[4] - pc_range[1]) / voxel_size[1])

        # ========== 2. Stem - replaces SubMConv3d(4, 36) ==========
        # Before: SubMConv3d(4, 36, kernel_size=[5,5,5])
        # After:  PFN layer (Linear + BatchNorm1d + GELU), then Conv2d in the backbone
        self.stem_linear = nn.Linear(4, 36)
        self.stem_norm = nn.BatchNorm1d(36)
        self.stem_act = nn.GELU()

        # Scatter module that builds the dense pseudo-image
        self.scatter = PointPillarsScatter()

        # ========== 3. Backbone - replaces SubMConv3d + GridPooling ==========
        # Before: stacked SubMConv3d(36, 36) + GridPooling
        # After:  2D convolutions, with stride-2 convolutions for downsampling
        self.backbone = nn.Sequential(
            # Block 0: 36 -> 36
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),

            # Downsample: replaces GridPooling
            nn.Conv2d(36, 72, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),

            # Block 1: 72 -> 72
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),

            # Downsample
            nn.Conv2d(72, 144, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),

            # Block 2: 144 -> 144
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),

            # Downsample
            nn.Conv2d(144, 252, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(252),
            nn.GELU(),
        )

        # ========== 4. Transformer blocks - kept but modified ==========
        # Before: PointROPEAttention (not supported)
        # After:  standard self-attention (Linear + Softmax are supported)
        # num_heads must divide 252, so 6 is used here (8 does not divide 252).
        self.transformer_blocks = nn.ModuleList([
            SelfAttentionBlock(252, num_heads=6, mlp_ratio=4)
            for _ in range(6)
        ])

        # ========== 5. Decoder - replaces GridUnpooling ==========
        # Before: GridUnpooling upsampling
        # After:  ConvTranspose2d; three x2 stages undo the backbone's x8 downsampling
        self.decoder = nn.Sequential(
            # 252 -> 252 (stride 1, keeps the resolution)
            nn.Conv2d(252, 252, kernel_size=3, padding=1),
            nn.BatchNorm2d(252),
            nn.GELU(),

            # 252 -> 144, x2
            nn.ConvTranspose2d(252, 144, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),

            # 144 -> 72, x2
            nn.ConvTranspose2d(144, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),

            # 72 -> 72, x2
            nn.ConvTranspose2d(72, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
        )

        # ========== 6. Segmentation head ==========
        self.seg_head = nn.Linear(72, num_classes)

    def forward(self, points):
        """
        Args:
            points: (B, N, 4) - xyz + intensity
        Returns:
            seg_logits: (B, H, W, num_classes), one prediction per pseudo-image cell
        """
        batch_size = points.shape[0]

        # ===== Step 1: Voxelization =====
        voxels, coors, num_points = voxelization(
            points,
            voxel_size=self.voxel_size,
            pc_range=self.pc_range,
            max_voxels=self.max_voxels,
            max_points_per_voxel=self.max_points_per_voxel,
        )

        # ===== Step 2: Pillar Feature Net =====
        # voxels: (max_voxels, max_points, 4)
        # coors: voxel coordinates; the layout must match what PointPillarsScatter expects
        features = self.stem_linear(voxels)                  # (max_voxels, max_points, 36)
        features = self.stem_norm(features.transpose(1, 2))  # BatchNorm1d expects (V, C, P)
        features = self.stem_act(features).max(dim=2)[0]     # (max_voxels, 36)

        # ===== Step 3: PointPillarsScatter =====
        # Convert sparse voxel features into a dense pseudo-image (B, C, H, W)
        pseudo_image = self.scatter(
            features, coors,
            output_shape=(batch_size, 36, self.ny, self.nx)
        )

        # ===== Step 4: Backbone (2D Conv) =====
        x = self.backbone(pseudo_image)  # (B, 252, H/8, W/8)

        # ===== Step 5: Transformer blocks =====
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        for block in self.transformer_blocks:
            x = block(x)
        x = x.transpose(1, 2).reshape(B, C, H, W)  # (B, 252, H/8, W/8)

        # ===== Step 6: Decoder (upsample) =====
        x = self.decoder(x)  # (B, 72, H, W) when H and W are divisible by 8

        # ===== Step 7: Segmentation head =====
        # Per-cell segmentation on the pseudo-image
        seg_logits = self.seg_head(x.permute(0, 2, 3, 1))  # (B, H, W, num_classes)

        return seg_logits


class SelfAttentionBlock(nn.Module):
    """Standard multi-head self-attention block, replacing PointROPEAttention."""

    def __init__(self, dim, num_heads=6, mlp_ratio=4.0):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads

        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.softmax = nn.Softmax(dim=-1)

        mlp_hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden),
            nn.GELU(),
            nn.Linear(mlp_hidden, dim),
        )

    def forward(self, x):
        B, N, C = x.shape

        # Self-attention: (B, N, 3*C) -> (3, B, heads, N, head_dim)
        normed = self.norm1(x)
        qkv = self.qkv(normed).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)

        attn = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)  # (B, heads, N, N)
        attn = self.softmax(attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)  # merge heads back to C
        x = x + self.proj(out)

        # MLP
        x = x + self.mlp(self.norm2(x))
        return x
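
The pseudo-image size used by the model above follows from dividing the point-cloud range by the voxel size. The standalone sketch below reproduces that arithmetic for the default values in the code. The helper name `grid_size` is introduced here for illustration only.

```python
from typing import Sequence, Tuple


def grid_size(pc_range: Sequence[float],
              voxel_size: Sequence[float]) -> Tuple[int, int, int]:
    """Number of voxels along x, y, z for a range [x0, y0, z0, x1, y1, z1]."""
    mins, maxs = pc_range[:3], pc_range[3:]
    # round() instead of int(): a quotient that lands just below a whole
    # number because of floating-point error would otherwise be truncated.
    return tuple(round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel_size))


nx, ny, nz = grid_size((0, -40, -2, 70.4, 40, 2), (0.2, 0.2, 4.0))
num_cells = nx * ny * nz  # upper bound on the number of pillars
```

With a voxel height equal to the full z range, `nz` is 1, which is what turns the voxels into pillars and lets the rest of the network use 2D convolution.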

Model export and quantization

# ===== 1. Export ONNX =====
def export_onnx():
    model = PointCloudSegmentor(num_classes=11)
    model.eval()
    
    # Dummy input (batch, points, 4)
    dummy_input = torch.randn(1, 150000, 4)
    
    torch.onnx.export(
        model,
        dummy_input,
        "pointcloud_seg.onnx",
        input_names=["points"],
        output_names=["seg_logits"],
        dynamic_axes={
            "points": {0: "batch", 1: "num_points"},
            "seg_logits": {0: "batch"}
        },
        opset_version=13
    )

# The hb_compile options below are reproduced as given in this thread;
# confirm the exact flags and config format in the toolchain manual.

# ===== 2. Model check =====
# hb_compile --check-model pointcloud_seg.onnx

# ===== 3. PTQ quantization =====
# hb_compile --convert-config quant_config.json \
#            --output-dir ./output \
#            pointcloud_seg.onnx

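Plan B: hybrid deployment (keeps part of the sparse convolution)

If the accuracy of the original architecture must be preserved:

# Deployment architecture
# CPU: point cloud → SubMConv3d (enc0-2) → GridPooling → dense features
# BPU: dense features → Transformer → Decoder → segmentation head

import torch.nn as nn

class HybridSegmentor(nn.Module):
    def __init__(self, cpu_backbone, bpu_transformer_blocks, bpu_decoder):
        super().__init__()
        # CPU part (keeps the original sparse convolutions),
        # for example the original LitePT stem + enc0-2
        self.cpu_backbone = cpu_backbone
        
        # BPU part (supported operators only)
        self.bpu_transformer = nn.ModuleList(bpu_transformer_blocks)  # Linear + Softmax
        self.bpu_decoder = bpu_decoder  # e.g. nn.Sequential of ConvTranspose2d stages
        self.bpu_head = nn.Linear(72, 11)
    
    def forward(self, points):
        # CPU inference
        dense_features = self.cpu_backbone(points)  # dense features (B, C, H, W)
        
        # Transfer the data to the BPU
        # (on the board this has to be coordinated in C++/Python)
        
        # BPU inference
        B, C, H, W = dense_features.shape
        x = dense_features.flatten(2).transpose(1, 2)  # (B, H*W, C)
        for block in self.bpu_transformer:
            x = block(x)
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.bpu_decoder(x)  # (B, 72, H', W')
        return self.bpu_head(x.permute(0, 2, 3, 1))  # (B, H', W', 11)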

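Operator replacement table

| Original operator | Replacement | Notes |
| --- | --- | --- |
| SubMConv3d | Conv2d + PointPillarsScatter | Core change |
| GridPooling | Conv2d(stride=2) | Downsampling |
| GridUnpooling | ConvTranspose2d | Upsampling |
| PointROPEAttention | SelfAttention (no RoPE) | Accuracy may drop slightly |
| PointSequential | nn.Sequential | Direct replacement |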
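
When GridPooling and GridUnpooling are swapped for strided Conv2d and ConvTranspose2d, the downsampling and upsampling stages have to cancel out exactly. The sketch below applies the standard PyTorch output-size formulas (dilation 1, no output padding) to check that. The kernel, stride, and padding values match the ones used in the Plan A code, and the helper names are introduced here for illustration.

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of Conv2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size: int, kernel: int = 4, stride: int = 2,
                         padding: int = 1) -> int:
    """Output length of ConvTranspose2d along one axis (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def round_trip(size: int, stages: int) -> int:
    """Downsample `stages` times, then upsample the same number of times."""
    for _ in range(stages):
        size = conv2d_out(size)
    for _ in range(stages):
        size = conv_transpose2d_out(size)
    return size


same = round_trip(352, stages=3)    # 352 -> 176 -> 88 -> 44 -> 88 -> 176 -> 352
padded = round_trip(351, stages=3)  # 351 -> 176 -> 88 -> 44 -> ... -> 352
```

A side length that is not divisible by 8 comes back one cell larger, so the pseudo-image size should be chosen as a multiple of the total downsampling factor, or the output cropped.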

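Next steps

  1. First train with the Plan A code and validate accuracy
  2. After exporting ONNX, verify the operators with hb_compile --check-model
  3. If there is an accuracy loss, then consider the Plan B hybrid deployment

Would you like me to:

  • Provide a complete training configuration file?
  • Explain the quantization parameter settings?
  • Analyze the expected accuracy loss?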