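Sparse convolution / 3D convolution deployment question

63241323 · January 8, 2026, 02:18

Hi all, are there any implementation examples for sparse convolution? And is there an official demo showing a 3D convolution implementation?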

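D-Gwen · January 8, 2026, 02:31

Hi, you can refer to the S100 video classification example in NodeHub, which is the demo released so far: developer.d-robotics.cc/nodehubdetail/1984163899718823937
Also, could you tell us a bit about your project background? Is this personal research and testing, or is there a concrete plan for a commercial product? If it is a commercial project, which kind of end product is it for? Knowing this helps us coordinate more internal resources to support you.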

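63241323 · January 8, 2026, 05:39

We are still in the pre-research and testing stage, with plans to productize. We would like to deploy some algorithms that use spconv on your development boards.

63241323 · January 8, 2026, 05:40

From what I can see, that demo mainly uses ordinary 3D convolution. Are there any examples for spconv as well?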

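Marcelo6151 · January 8, 2026, 06:28

Hi, our current examples can all be found in NodeHub and in the manual. For BPU model deployment as a whole, please refer to the algorithm toolchain: OE Documentation Map - OpenExplorer

This includes the operator support list in the appendix. We support the common ONNX operators, so typical algorithms should work without problems.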

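63241323 · January 8, 2026, 07:13

What if an operator is not listed in your manual and we still want to accelerate it on the BPU? Is there any reference documentation for that?

guosheng_xu · January 8, 2026, 07:20

Hi, that is not supported at the moment. We recommend operator substitution: convert the unsupported operator into supported ones for deployment.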

14798484 · April 7, 2026, 07:17

Hi, I am also porting a point cloud algorithm and have run into the same problem. Happy to exchange notes, wx: haipeng90

地瓜小助手_bot · April 7, 2026, 07:17

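Conclusion

The Horizon toolchain does not currently support sparse convolution (spconv) operators. The official recommendation is operator substitution: convert the sparse convolutions into standard 3D convolutions or other supported operators before deployment.

Solution

1. Existing reference resources

The community already has mature reference implementations for point cloud algorithms that you can follow directly:

PointPillars reference algorithm - includes point cloud preprocessing and a standard-convolution implementation
CenterPoint reference algorithm - a complete 3D detection pipeline

2. Operator substitution strategy

# Sparse convolution → standard convolution: conversion ideas
# Option 1: convert the sparse voxel grid into a dense grid
# Option 2: use supported operators such as PointPillarsScatter instead
# Option 3: modify the network structure to use standard 3D convolution layers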
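
Before committing to option 1, it helps to estimate how large the dense grid would be compared with the sparse representation. The sketch below is plain arithmetic; the point cloud range, voxel size, channel count, and number of active voxels are hypothetical values chosen for illustration, not numbers from this thread.

```python
def dense_grid_shape(pc_range, voxel_size):
    """Return (nx, ny, nz) for a range given as [x0, y0, z0, x1, y1, z1]."""
    mins, maxs = pc_range[:3], pc_range[3:]
    # round() guards against float error such as 70.4 / 0.2 = 351.999...
    return tuple(round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel_size))


def tensor_mib(num_elements, bytes_per_element=4):
    """Size of a float32 tensor in MiB."""
    return num_elements * bytes_per_element / 2**20


# Hypothetical configuration (assumption, for illustration only)
pc_range = [0.0, -40.0, -2.0, 70.4, 40.0, 2.0]
voxel_size = [0.2, 0.2, 0.2]
channels = 36
active_voxels = 40_000

nx, ny, nz = dense_grid_shape(pc_range, voxel_size)
total_voxels = nx * ny * nz
occupancy = active_voxels / total_voxels
dense_mib = tensor_mib(total_voxels * channels)
sparse_mib = tensor_mib(active_voxels * channels)

print(f"grid: {nx} x {ny} x {nz}, occupancy {occupancy:.2%}")
print(f"dense features: {dense_mib:.1f} MiB, sparse features: {sparse_mib:.1f} MiB")
```

With these assumed numbers the dense feature map is about 70 times larger than the sparse one for a single layer, which is why the voxel size and the height resolution usually have to be coarsened when densifying.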

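3. Verify operator support

Check the official operator support lists:

S100/S100P ONNX operator support: S100/S100P ONNX Operator Support List - D-Robotics Algorithm Toolchain Official User Manual
S100/S100P Torch operator support: S100/S100P Torch Operator Support List - D-Robotics Algorithm Toolchain Official User Manual

4. Suggested steps

Export the ONNX model and use the hb_compile tool to check operator support
Locate the unsupported operators from the error messages
Modify the network structure, replacing them with combinations of supported operators
Follow the PointPillars implementation and adopt an architecture that has already been validated

Community discussion

Other developers in this topic are working on similar problems. You can:

Continue discussing technical details in this thread
Refer to the open-source implementations of PointPillars/CenterPoint
Contact official technical support for a customized solution (for commercial projects)

Would you like me to look into a specific operator substitution plan or the implementation details of PointPillars?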

14798484 · April 7, 2026, 07:23

The subconv3d operator

地瓜小助手_bot · April 7, 2026, 07:24

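Regarding the subconv3d operator you mentioned (submanifold sparse 3D convolution, `SubMConv3d` in spconv): the Horizon toolchain indeed does not support it at present. It is one of the core operators of sparse convolution and is widely used in libraries such as spconv and MinkowskiEngine.

Key conclusion

subconv3d is not in the official operator support list and cannot be accelerated on the BPU directly. Deployment requires operator substitution or changes to the network structure.

Feasible options

Option 1: Use PointPillarsScatter instead (recommended)

The community already has a mature PointPillars reference algorithm, which avoids sparse convolution as follows:

# Key change: use Horizon's custom PointPillarsScatter operator
# to convert sparse voxel features into a dense pseudo-image, then process it with standard 2D convolutions

# Original sparse convolution flow (not supported)
# sparse_tensor → subconv3d → sparse_tensor

# Replacement flow (supported)
# point cloud → Voxelization → PillarFeatureNet → PointPillarsScatter → 2D CNN

Advantages:

PointPillarsScatter is a Horizon custom operator that is already integrated into horizon_plugin_pytorch
Everything after it uses standard Conv2d, which the BPU fully supports
Validated on the J5/S100 platforms, with a latency of 25.8 ms plus 2.6 ms of post-processing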
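
The scatter step in this pipeline can be pictured with plain NumPy. This is only an illustration of the idea, not the `horizon_plugin_pytorch` operator: the function name `scatter_pillars` and the toy shapes are made up here, and the pillar coordinates are assumed to be unique.

```python
import numpy as np


def scatter_pillars(features, coords, ny, nx):
    """Scatter (M, C) pillar features onto a dense (C, ny, nx) canvas.

    coords holds the integer (y, x) grid index of each non-empty pillar.
    Empty cells stay zero, which is what makes the result a dense pseudo-image.
    """
    canvas = np.zeros((features.shape[1], ny, nx), dtype=features.dtype)
    canvas[:, coords[:, 0], coords[:, 1]] = features.T
    return canvas


# Three non-empty pillars with 2 feature channels on a 3 x 4 grid
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
coords = np.array([[0, 1], [2, 3], [1, 0]])
pseudo_image = scatter_pillars(features, coords, ny=3, nx=4)
print(pseudo_image.shape)        # (channels, ny, nx)
print(pseudo_image[:, 2, 3])     # features of the pillar at y=2, x=3
```

After this step every downstream layer sees an ordinary image-like tensor, so standard `Conv2d` layers apply.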

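Option 2: Voxelization + standard 3D convolution

Change the sparse convolution network into a dense 3D convolution network:

# Before (spconv)
# spconv 2.x exposes its layers under spconv.pytorch; spconv 1.x used a plain `import spconv`
import spconv.pytorch as spconv
self.conv1 = spconv.SubMConv3d(...)

# After (standard convolution)
import torch.nn as nn
# 1. First convert the sparse voxels into a dense tensor
# 2. Then use a standard 3D convolution
self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)

Notes:

The network structure needs adjusting so that the voxel grid size stays reasonable (to avoid running out of memory)
The efficiency advantage that comes from sparsity may be lost

Option 3: Operator decomposition (complex, not recommended)

Try to decompose subconv3d into a combination of supported operators:

Gather + Conv2D + Scatter
However, this is complex to implement and its performance is hard to guarantee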
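
One point worth checking when densifying: for stride 1 and no bias, a submanifold convolution gives the same values as a zero-padded dense convolution read out at the active voxels only. The NumPy sketch below verifies this on a toy grid. The helper names and the naive loops are illustrative assumptions, not code from spconv or the toolchain.

```python
import numpy as np


def dense_conv3d_same(grid, weight):
    """Zero-padded 'same' 3D cross-correlation.

    grid: (D, H, W, Cin), weight: (k, k, k, Cin, Cout) with odd k.
    """
    k = weight.shape[0]
    r = k // 2
    D, H, W, _ = grid.shape
    padded = np.pad(grid, ((r, r), (r, r), (r, r), (0, 0)))
    out = np.zeros((D, H, W, weight.shape[-1]), dtype=grid.dtype)
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                out += padded[dz:dz + D, dy:dy + H, dx:dx + W] @ weight[dz, dy, dx]
    return out


def submanifold_conv3d_reference(coords, feats, weight):
    """Sparse reference: outputs only at active voxels, summing active neighbours."""
    k = weight.shape[0]
    r = k // 2
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = np.zeros((len(coords), weight.shape[-1]), dtype=feats.dtype)
    for i, (z, y, x) in enumerate(coords.tolist()):
        for dz in range(k):
            for dy in range(k):
                for dx in range(k):
                    j = index.get((z + dz - r, y + dy - r, x + dx - r))
                    if j is not None:
                        out[i] += feats[j] @ weight[dz, dy, dx]
    return out


rng = np.random.default_rng(0)
shape, cin, cout, num_active = (6, 6, 6), 4, 8, 20
flat = rng.choice(int(np.prod(shape)), size=num_active, replace=False)
coords = np.stack(np.unravel_index(flat, shape), axis=1)
feats = rng.standard_normal((num_active, cin))
weight = rng.standard_normal((3, 3, 3, cin, cout))

grid = np.zeros(shape + (cin,))
grid[tuple(coords.T)] = feats                      # densify
dense_out = dense_conv3d_same(grid, weight)
masked_out = dense_out[tuple(coords.T)]            # keep active voxels only
reference = submanifold_conv3d_reference(coords, feats, weight)

print(np.allclose(masked_out, reference))
nonzero_sites = int(np.count_nonzero(np.abs(dense_out).sum(axis=-1)))
print(nonzero_sites, "non-zero dense outputs for", num_active, "active inputs")
```

The second print shows the catch: the dense output is non-zero at more voxels than the input, so a stack of dense layers only matches the sparse network if the original occupancy mask is re-applied after each layer.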
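
To make the decomposition concrete, here is a minimal NumPy sketch of the gather, matmul, scatter-add formulation that sparse convolution libraries are commonly described as using (a "rulebook" of input/output index pairs per kernel offset). The function names are hypothetical. Building the rulebook is data-dependent index bookkeeping, which in this sketch happens in ordinary Python outside any accelerator.

```python
import numpy as np
from itertools import product


def build_rulebook(coords, kernel_size=3):
    """For each kernel offset, list (input_row, output_row) pairs of active voxels."""
    r = kernel_size // 2
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    rulebook = {}
    for offset in product(range(-r, r + 1), repeat=3):
        in_idx, out_idx = [], []
        for i, c in enumerate(coords.tolist()):
            j = index.get((c[0] + offset[0], c[1] + offset[1], c[2] + offset[2]))
            if j is not None:
                in_idx.append(j)
                out_idx.append(i)
        if in_idx:
            rulebook[offset] = (np.array(in_idx), np.array(out_idx))
    return rulebook


def submconv_gather_scatter(feats, weight, rulebook):
    """Submanifold convolution as Gather -> MatMul -> Scatter-add per offset."""
    r = weight.shape[0] // 2
    out = np.zeros((feats.shape[0], weight.shape[-1]), dtype=feats.dtype)
    for (dz, dy, dx), (in_idx, out_idx) in rulebook.items():
        gathered = feats[in_idx]                               # Gather
        partial = gathered @ weight[dz + r, dy + r, dx + r]    # MatMul
        np.add.at(out, out_idx, partial)                       # Scatter-add
    return out


# Two adjacent voxels along x, plus one isolated voxel
coords = np.array([[0, 0, 0], [0, 0, 1], [4, 4, 4]])
rng = np.random.default_rng(1)
feats = rng.standard_normal((3, 4))
weight = rng.standard_normal((3, 3, 3, 4, 8))

out = submconv_gather_scatter(feats, weight, build_rulebook(coords))
print(out.shape)
```

The isolated voxel only sees the centre weight, while each of the two neighbours also receives a contribution from the other, which is the behaviour that makes the operator "submanifold".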

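Verification steps

Check operator support

# Export the ONNX model
hb_compile --check-model your_model.onnx
# Look at which operator types are reported as errors

Refer to the PointPillars implementation
- Source location: ddk/samples/ai_toolchain/horizon_model_train_sample in the OE package
- Key operator: horizon_plugin_pytorch.nn.PointPillarsScatter
- Documentation: PointPillars Reference Algorithm Explained
Contact official support
- If this is a commercial project, describe the project background in the original thread
- The official team may provide customized operator support or optimization advice

Recommendation

Prefer the PointPillars architecture, which is currently the most thoroughly validated point cloud deployment approach in the community. If you must use a sparse convolution network (such as CenterPoint's sparse backbone), we suggest:

Evaluating whether it can be changed to a pillar-based method
Contacting Horizon technical support to confirm whether there is an unpublished sparse convolution optimization
Considering running the sparse convolution part on CPU/GPU and only the dense part on the BPU (hybrid deployment)

Would you like me to analyze your specific network structure and suggest modifications?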

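14798484 · April 9, 2026, 10:22

When will the spconv operator be supported?

地瓜小助手_bot · April 9, 2026, 10:23

Conclusion

There is currently no published timeline for official support of the spconv/subconv3d operators.

Judging from past replies in the community, support for new operators is usually released gradually through toolchain version updates (for example, operators such as MatMul and centered_yuv2rgb were added in later versions). Sparse convolution, however, is a relatively complex custom operator, and there is no clear plan for it yet.

Suggested actions

1. Short-term approach (deployable now)

Keep using the operator substitution strategy:

Prefer the PointPillars architecture (already validated)
Or change the sparse convolutions to standard 3D convolutions with voxelization as preprocessing

2. Push for official support

If yours is a commercial project with a clear deployment need:

Describe the project background, end product, and expected shipment volume in detail in this thread
The official team has said: "Knowing the background helps us coordinate internal resources to support you"
You may obtain customized operator support or a prioritized schedule

3. Follow version updates

Check the toolchain release notes regularly: https://toolchain.d-robotics.cc/
Watch the community Model Zoo category for new operator announcements
Subscribe to OE package release notifications

4. Investigate alternative architectures

If the project does not strictly require sparse convolution, you can evaluate:

PointPillars: processes a pseudo-image with 2D convolutions, which the BPU supports efficiently
Dense CenterPoint: uses standard 3D convolutions
Hybrid deployment: the sparse part runs on CPU/GPU and the BPU handles the subsequent dense layers

Would you like me to:

Analyze your network structure and give a concrete modification plan?
Go through the full implementation details of PointPillars?
Help prepare a description of your commercial project background for requesting official resources?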

14798484 · April 9, 2026, 10:25

DefaultSegmentorV2(
(seg_head): Linear(in_features=72, out_features=11, bias=True)
(backbone): LitePT(
(embedding): Embedding(
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.Native)
(norm): BatchNorm1d(36, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(act): GELU(approximate='none')
)
)
(enc): PointSequential(
(enc0): PointSequential(
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(36, 36, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=36, out_features=36, bias=True)
(2): LayerNorm((36,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc1): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate='none')
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(72, 72, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=72, out_features=72, bias=True)
(2): LayerNorm((72,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc2): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=72, out_features=144, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate='none')
)
)
(block0): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
(block1): Block(
(conv): PointSequential(
(0): SubMConv3d(144, 144, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(1): Linear(in_features=144, out_features=144, bias=True)
(2): LayerNorm((144,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc3): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=144, out_features=252, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate='none')
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.138)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.162)
)
)
(block2): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.185)
)
)
(block3): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.208)
)
)
(block4): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.231)
)
)
(block5): Block(
(norm0): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=252, out_features=756, bias=True)
(proj): Linear(in_features=252, out_features=252, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((252,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=252, out_features=1008, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1008, out_features=252, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.254)
)
)
)
(enc4): PointSequential(
(down): GridPooling(
(proj): Linear(in_features=252, out_features=504, bias=True)
(norm): PointSequential(
(0): BatchNorm1d(504, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(act): PointSequential(
(0): GELU(approximate='none')
)
)
(block0): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.277)
)
)
(block1): Block(
(norm0): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(norm1): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(attn): PointROPEAttention(
(qkv): Linear(in_features=504, out_features=1512, bias=True)
(proj): Linear(in_features=504, out_features=504, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
(rope): PointROPE()
)
(norm2): PointSequential(
(0): LayerNorm((504,), eps=1e-05, elementwise_affine=True)
)
(mlp): PointSequential(
(0): MLP(
(fc1): Linear(in_features=504, out_features=2016, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=2016, out_features=504, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(drop_path): PointSequential(
(0): DropPath(drop_prob=0.300)
)
)
)
)
(dec): PointSequential(
(dec3): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=504, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(proj_skip): PointSequential(
(0): Linear(in_features=252, out_features=252, bias=True)
(1): BatchNorm1d(252, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
)
(dec2): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=252, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(proj_skip): PointSequential(
(0): Linear(in_features=144, out_features=144, bias=True)
(1): BatchNorm1d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
)
(dec1): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=144, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(proj_skip): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
)
(dec0): PointSequential(
(up): GridUnpooling(
(proj): PointSequential(
(0): Linear(in_features=72, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(proj_skip): PointSequential(
(0): Linear(in_features=36, out_features=72, bias=True)
(1): BatchNorm1d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
)
)
)
)

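地瓜小助手_bot · April 9, 2026, 10:25

I have seen the network structure you posted. It is a typical sparse-convolution point cloud segmentation network (similar to the PTv2/PTv3 architecture). The problem is fairly clear:

Core problem analysis

Unsupported operators in your network:

| Operator | Where it appears | Support status |
|---|---|---|
| `SubMConv3d` | stem + enc0~enc2, in several places | Not supported |
| `GridPooling` | enc1~enc4 downsampling | Not supported |
| `GridUnpooling` | dec0~dec3 upsampling | Not supported |
| `PointROPEAttention` | enc3~enc4 attention | Not supported |
| `PointSequential` | container wrapping the whole network | Not supported |

Supported operators:

Linear, BatchNorm1d, LayerNorm, GELU, Softmax, Dropout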
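
An inventory like the one above can be produced mechanically from the text printed by `print(model)`. The sketch below uses only the standard library. The `SUPPORTED` set simply mirrors the list in this reply and should be checked against the official operator support list, and the excerpt is a shortened copy of the structure posted earlier in the thread.

```python
import re
from collections import Counter

# Matches lines such as "(conv): SubMConv3d(4, 36, ..." in a PyTorch module repr
LAYER_RE = re.compile(r"^\s*\((\w+)\):\s*([A-Za-z_]\w*)\(")

# Assumption: taken from the reply above, not from the official support list
SUPPORTED = {"Linear", "BatchNorm1d", "LayerNorm", "GELU", "Softmax", "Dropout"}


def count_layer_types(model_repr):
    """Count how often each layer class appears in a printed model."""
    counts = Counter()
    for line in model_repr.splitlines():
        match = LAYER_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def split_by_support(counts, supported):
    """Split the counts into (supported, needs_replacement) dictionaries."""
    ok = {name: n for name, n in counts.items() if name in supported}
    todo = {name: n for name, n in counts.items() if name not in supported}
    return ok, todo


EXCERPT = """\
(stem): PointSequential(
(conv): SubMConv3d(4, 36, kernel_size=[5, 5, 5], bias=False)
(norm): BatchNorm1d(36, eps=0.001, momentum=0.01)
(act): GELU(approximate='none')
)
(down): GridPooling(
(proj): Linear(in_features=36, out_features=72, bias=True)
)
"""

counts = count_layer_types(EXCERPT)
ok, todo = split_by_support(counts, SUPPORTED)
print("supported:", sorted(ok))
print("needs replacement:", sorted(todo))
```

Running the same function on the full printout gives a per-class count, which is a convenient checklist while replacing layers one type at a time.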

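Feasible options

Option 1: Replace the architecture (recommended, deployable)

Switch to a PointPillars or Cylinder3D architecture that uses standard convolutions:

# Original architecture (sparse convolution)
# point cloud → SubMConv3d → GridPooling → SubMConv3d → ...

# Replacement architecture (dense convolution)
# point cloud → Voxelization → PointPillarsScatter → Conv2d → ...

Key modifications:

Replace the sparse voxel processing with Voxelization + PointPillarsScatter
Change the 3D sparse convolutions into 2D dense convolutions
Keep the Transformer part (Linear/Softmax are supported), but feed it dense features

Reference implementation: PointPillars reference algorithm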

Option 2: Hybrid deployment (a compromise)

┌───────────────────────┬──────────────────────────────┐
│ CPU/GPU               │ BPU (Horizon)                │
├───────────────────────┼──────────────────────────────┤
│ Preprocessing         │                              │
│ Voxelization          │                              │
│ SubMConv3d (enc0-2)   │                              │
│ GridPooling           │                              │
├───────────────────────┼──────────────────────────────┤
│                       │ Dense features (after enc3)  │
│                       │ Transformer Block            │
│                       │ Decoder (standard conv)      │
│                       │ Segmentation Head            │
└───────────────────────┴──────────────────────────────┘

Pros: keeps part of the sparse convolution architecture
Cons: CPU-BPU data transfer overhead; overall performance may be worse than a pure BPU deployment

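Option 3: Push for official support (commercial projects)

If your project has a clear commercial deployment need:

Reply in this thread and describe:
- Project background (autonomous driving, robotics, industrial inspection, etc.)
- Expected shipment volume
- Timeline
- Why sparse convolution is required (accuracy/performance requirements)
The official team has previously said:

"Knowing the background helps us coordinate internal resources to support you"
You may obtain:
- Customized operator support
- A prioritized schedule
- Technical liaison resources

My recommendation

| Priority | Option | When it applies |
|---|---|---|
| First choice | Switch to the PointPillars architecture | Fast deployment, already validated |
| Fallback | Hybrid deployment | Sparse convolution must be kept |
| Long term | Contact the official team to push for support | Commercial project with shipment volume |

Next steps

Would you like me to:

Analyze the PointPillars architecture and compare its accuracy/performance with your current network?
Give a concrete code modification plan for converting this network to supported operators?
Help prepare a project background description for requesting official resources?

Please tell me your project scenario (personal research or commercial deployment) and your timeline, and I will give more targeted advice.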

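14798484 · April 10, 2026, 02:51

Give a concrete code modification plan

地瓜小助手_bot · April 10, 2026, 02:52

Code modification plan

Based on your network structure, here are two modification plans: Plan A (recommended, full BPU deployment) and Plan B (hybrid deployment that keeps part of the architecture).

Plan A: Switch to a PointPillars architecture (full BPU deployment)

Core idea

Original architecture: point cloud → SubMConv3d → GridPooling → Transformer → segmentation head
New architecture: point cloud → Voxelization → PointPillarsScatter → Conv2d → Transformer → segmentation head

Full code implementation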

import torch
import torch.nn as nn
# NOTE: import names are as given in this reply; verify them against the
# installed horizon_plugin_pytorch version before use.
from horizon_plugin_pytorch.nn import PointPillarsScatter, voxelization

class PointCloudSegmentor(nn.Module):
    """
    Point cloud segmentation network adapted for the Horizon BPU.
    Replaces SubMConv3d with Conv2d + PointPillarsScatter.
    """

    def __init__(self, num_classes=11, voxel_size=(0.2, 0.2, 4.0),
                 pc_range=(0, -40, -2, 70.4, 40, 2)):
        super().__init__()

        self.voxel_size = torch.tensor(voxel_size)
        self.pc_range = torch.tensor(pc_range)

        # ========== 1. Voxelization parameters ==========
        self.max_voxels = 40000
        self.max_points_per_voxel = 64

        # Pseudo-image size; round() avoids off-by-one from float error (70.4 / 0.2)
        grid_size = (torch.tensor(pc_range[3:]) - torch.tensor(pc_range[:3])) / torch.tensor(voxel_size)
        self.nx, self.ny = int(grid_size[0].round()), int(grid_size[1].round())

        # ========== 2. Stem - replaces SubMConv3d(4, 36) ==========
        # Before: SubMConv3d(4, 36, kernel_size=[5,5,5])
        # After: PFN layer + Conv2d
        self.stem_pfn = nn.Sequential(
            nn.Linear(4, 36),
            nn.BatchNorm1d(36),
            nn.GELU(),
        )

        # ========== 3. Backbone - replaces SubMConv3d + GridPooling ==========
        # Before: several SubMConv3d(36, 36) layers + GridPooling
        # After: 2D convolutions with stride-2 convolutions for downsampling
        self.backbone = nn.Sequential(
            # Block 0: 36 -> 36
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),
            nn.Conv2d(36, 36, kernel_size=3, padding=1),
            nn.BatchNorm2d(36),
            nn.GELU(),
            
            # Downsample: replaces GridPooling
            nn.Conv2d(36, 72, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            
            # Block 1: 72 -> 72
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            nn.Conv2d(72, 72, kernel_size=3, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
            
            # Downsample
            nn.Conv2d(72, 144, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            
            # Block 2: 144 -> 144
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            nn.Conv2d(144, 144, kernel_size=3, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),
            
            # Downsample
            nn.Conv2d(144, 252, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(252),
            nn.GELU(),
        )
        
        # ========== 4. Transformer Block - kept but modified ==========
        # Before: PointROPEAttention (not supported)
        # After: standard self-attention (Linear + Softmax are supported)
        self.transformer_blocks = nn.ModuleList([
            SelfAttentionBlock(252, num_heads=8, mlp_ratio=4)
            for _ in range(6)
        ])
        
        # ========== 5. Decoder - replaces GridUnpooling ==========
        # Before: GridUnpooling upsampling
        # After: ConvTranspose2d
        # The backbone above downsamples three times (stride 8 overall), so the
        # decoder uses exactly three stride-2 upsampling stages; the first stage
        # keeps the resolution so the output matches the pseudo-image size.
        self.decoder = nn.Sequential(
            # 252 -> 252 (same resolution)
            nn.Conv2d(252, 252, kernel_size=3, padding=1),
            nn.BatchNorm2d(252),
            nn.GELU(),

            # 252 -> 144 (x2)
            nn.ConvTranspose2d(252, 144, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(144),
            nn.GELU(),

            # 144 -> 72 (x2)
            nn.ConvTranspose2d(144, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),

            # 72 -> 72 (x2)
            nn.ConvTranspose2d(72, 72, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(72),
            nn.GELU(),
        )
        
        # ========== 6. Segmentation Head ==========
        self.seg_head = nn.Linear(72, num_classes)
        
    def forward(self, points):
        """
        Args:
            points: (B, N, 4) - xyz + intensity
        Returns:
            seg_logits: (B, H_out, W_out, num_classes), logits over the pseudo-image grid
        """
        batch_size = points.shape[0]

        # ===== Step 1: Voxelization =====
        voxels, coors, num_points = voxelization(
            points,
            voxel_size=self.voxel_size,
            pc_range=self.pc_range,
            max_voxels=self.max_voxels,
            max_points_per_voxel=self.max_points_per_voxel,
        )

        # ===== Step 2: Pillar Feature Net =====
        # voxels: (max_voxels, max_points, 4)
        # coors: (max_voxels, 3)
        num_voxels, max_points, num_feats = voxels.shape
        # BatchNorm1d normalizes dim 1, so flatten to (max_voxels * max_points, 4) first
        features = self.stem_pfn(voxels.reshape(-1, num_feats))
        features = features.reshape(num_voxels, max_points, -1)  # (max_voxels, max_points, 36)
        features = features.max(dim=1)[0]  # (max_voxels, 36)

        # ===== Step 3: PointPillarsScatter =====
        # Converts sparse voxel features into a dense pseudo-image (B, C, H, W).
        # PointPillarsScatter is an nn.Module, so it is instantiated before being called
        # (assumes a no-argument constructor; check the plugin documentation).
        pseudo_image = PointPillarsScatter()(
            features, coors,
            output_shape=(batch_size, 36, self.ny, self.nx)
        )

        # ===== Step 4: Backbone (2D Conv) =====
        x = self.backbone(pseudo_image)  # (B, 252, H/8, W/8)

        # ===== Step 5: Transformer Blocks =====
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        for block in self.transformer_blocks:
            x = block(x)
        x = x.transpose(1, 2).reshape(B, C, H, W)  # (B, 252, H/8, W/8)

        # ===== Step 6: Decoder (Upsample) =====
        x = self.decoder(x)  # (B, 72, H_out, W_out)

        # ===== Step 7: Segmentation Head =====
        # Per-cell classification on the pseudo-image; Linear acts on the last dim
        seg_logits = self.seg_head(x.permute(0, 2, 3, 1))  # (B, H_out, W_out, num_classes)

        return seg_logits


class SelfAttentionBlock(nn.Module):
    """Standard self-attention, replacing PointROPEAttention."""

    def __init__(self, dim, num_heads=8, mlp_ratio=4.0):
        super().__init__()
        # num_heads is kept for interface compatibility only: the attention below
        # is single-head (dim=252 is not divisible by 8 anyway).
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.softmax = nn.Softmax(dim=-1)

        mlp_hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden),
            nn.GELU(),
            nn.Linear(mlp_hidden, dim),
        )

    def forward(self, x):
        B, N, C = x.shape

        # Self-attention: q, k, v each have shape (B, N, C)
        normed = self.norm1(x)
        qkv = self.qkv(normed).reshape(B, N, 3, C).permute(2, 0, 1, 3)
        q, k, v = qkv[0], qkv[1], qkv[2]

        attn = (q @ k.transpose(-2, -1)) / (C ** 0.5)  # (B, N, N)
        attn = self.softmax(attn)
        # attn @ v is already (B, N, C); no transpose is needed in the single-head case
        x = x + self.proj(attn @ v)

        # MLP
        x = x + self.mlp(self.norm2(x))
        return x
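
When swapping GridPooling/GridUnpooling for strided convolutions, the number of stride-2 downsampling and upsampling stages has to match, otherwise the logits come out at a different resolution from the pseudo-image. A quick way to check is to trace the spatial size with the standard output-size formulas for `Conv2d` and `ConvTranspose2d` (dilation 1, no output padding). The helper names below are made up for this sketch; the 400 x 352 grid corresponds to the default `pc_range` and `voxel_size` used in the code above.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of Conv2d along one axis (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output length of ConvTranspose2d along one axis (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, stages):
    """Return the list of sizes after each (formula, kernel, stride, padding) stage."""
    sizes = [size]
    for formula, kernel, stride, padding in stages:
        sizes.append(formula(sizes[-1], kernel, stride, padding))
    return sizes


down = [(conv_out, 3, 2, 1)] * 3       # three Conv2d(kernel_size=3, stride=2, padding=1)
up_three = [(deconv_out, 4, 2, 1)] * 3  # three ConvTranspose2d(kernel_size=4, stride=2, padding=1)
up_four = [(deconv_out, 4, 2, 1)] * 4   # one stage too many

ny_sizes = trace(400, down + up_three)
nx_sizes = trace(352, down + up_three)
overshoot = trace(400, down + up_four)
print(ny_sizes)
print(nx_sizes)
print(overshoot)
```

Three upsampling stages bring both axes back to the input size, while a fourth stage doubles it again.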

Model export and quantization

# ===== 1. Export ONNX =====
def export_onnx():
    model = PointCloudSegmentor(num_classes=11)
    model.eval()
    
    # Dummy input (batch, points, 4)
    dummy_input = torch.randn(1, 150000, 4)
    
    torch.onnx.export(
        model,
        dummy_input,
        "pointcloud_seg.onnx",
        input_names=["points"],
        output_names=["seg_logits"],
        dynamic_axes={
            "points": {0: "batch", 1: "num_points"},
            "seg_logits": {0: "batch"}
        },
        opset_version=13
    )

# ===== 2. Model check =====
# hb_compile --check-model pointcloud_seg.onnx

# ===== 3. PTQ quantization =====
# hb_compile --convert-config quant_config.json \
#            --output-dir ./output \
#            pointcloud_seg.onnx

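Plan B: Hybrid deployment (keeps part of the sparse convolution)

If the accuracy of the original architecture must be preserved:

# Deployment layout
# CPU: point cloud → SubMConv3d (enc0-2) → GridPooling → dense features
# BPU: dense features → Transformer → Decoder → segmentation head

class HybridSegmentor(nn.Module):
    def __init__(self, cpu_backbone, bpu_blocks, bpu_decoder, num_classes=11):
        super().__init__()
        # CPU part: the original sparse-convolution backbone, supplied by the caller
        self.cpu_backbone = cpu_backbone

        # BPU part: supported operators only
        self.bpu_transformer = nn.ModuleList(bpu_blocks)  # Linear + Softmax blocks
        self.bpu_decoder = bpu_decoder  # e.g. an nn.Sequential of ConvTranspose2d layers
        self.bpu_head = nn.Linear(72, num_classes)

    def forward(self, points):
        # CPU inference: must output dense features
        dense_features = self.cpu_backbone(points)

        # Data is handed over to the BPU here
        # (on the board this is coordinated from C++/Python)

        # BPU inference
        x = dense_features
        for block in self.bpu_transformer:
            x = block(x)
        x = self.bpu_decoder(x)
        return self.bpu_head(x)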
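
One practical detail of the CPU-to-BPU handoff: the number of active voxels changes from frame to frame, while a compiled model typically expects a fixed input shape. That fixed-shape requirement is an assumption here; check the toolchain documentation for what it actually allows. Under that assumption, a common pattern is to pad the features to a fixed budget and carry a validity mask. The function name and the budget are hypothetical.

```python
import numpy as np


def pad_to_fixed(feats, max_tokens):
    """Pad (N, C) features to (max_tokens, C) and return a boolean validity mask."""
    n, c = feats.shape
    if n > max_tokens:
        raise ValueError(f"{n} tokens exceed the fixed budget of {max_tokens}")
    padded = np.zeros((max_tokens, c), dtype=feats.dtype)
    padded[:n] = feats
    mask = np.zeros(max_tokens, dtype=bool)
    mask[:n] = True
    return padded, mask


# A frame with 3 active voxels and 4 channels, padded to a budget of 5
frame_feats = np.ones((3, 4), dtype=np.float32)
padded, mask = pad_to_fixed(frame_feats, max_tokens=5)
print(padded.shape, mask.tolist())
```

The mask lets the post-processing step drop the padded rows again after BPU inference.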

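Operator replacement table

| Original operator | Replacement | Notes |
|---|---|---|
| `SubMConv3d` | `Conv2d` + `PointPillarsScatter` | Core change |
| `GridPooling` | `Conv2d(stride=2)` | Downsampling |
| `GridUnpooling` | `ConvTranspose2d` | Upsampling |
| `PointROPEAttention` | `SelfAttention` (without RoPE) | Accuracy may drop slightly |
| `PointSequential` | `nn.Sequential` | Direct replacement |

Next steps

First train with the Plan A code and validate accuracy
After exporting ONNX, verify the operators with hb_compile --check-model
If there is an accuracy loss, then consider the Plan B hybrid deployment

Would you like me to:

Provide a complete training configuration file?
Explain the quantization parameter settings?
Analyze the expected accuracy loss?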
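
For comparison with the stride-2 replacement in the table, the following NumPy sketch shows what a grid pooling step typically does on sparse voxel coordinates: voxels that fall into the same coarse cell are merged with a max reduction, and the stored mapping is reused to broadcast features back when unpooling. This is an illustration under assumptions; the actual `GridPooling` and `GridUnpooling` in the posted network may differ in details such as the reduction type or the projection layers.

```python
import numpy as np


def grid_pool_max(coords, feats, stride=2):
    """Merge voxels sharing the same coarse cell (coords // stride) by max."""
    coarse = coords // stride
    uniq, inverse = np.unique(coarse, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten for NumPy versions that return a 2-D inverse
    pooled = np.full((len(uniq), feats.shape[1]), -np.inf, dtype=feats.dtype)
    np.maximum.at(pooled, inverse, feats)
    return uniq, pooled, inverse


def grid_unpool(pooled, inverse):
    """Broadcast coarse features back to the original voxels."""
    return pooled[inverse]


coords = np.array([[0, 0, 0], [1, 1, 1], [2, 0, 0], [3, 1, 0]])
feats = np.array([[1.0, 5.0], [4.0, 2.0], [7.0, 0.0], [3.0, 9.0]])

coarse_coords, pooled, inverse = grid_pool_max(coords, feats)
restored = grid_unpool(pooled, inverse)
print(coarse_coords.tolist())
print(pooled.tolist())
print(restored.shape)
```

Unlike `Conv2d(stride=2)`, the number of output voxels here depends on the data, which is the main reason this kind of layer is hard to map onto a fixed-shape accelerator graph.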