Name That Part: 3D Part Segmentation and Naming

Original link: https://name-that-part.github.io/

## ALIGN-Parts: Named 3D Part Segmentation

Many robotics and graphics applications need to recognize and label specific *parts* of a 3D object, not just the object as a whole. ALIGN-Parts addresses this by reframing named 3D part segmentation as a set-to-set alignment problem. Rather than labeling points individually, the model predicts a set of "partlets", each representing one part with a segmentation mask and a text embedding describing that part. The partlets are then aligned to candidate part descriptions via bipartite matching, allowing a flexible number of parts per shape.

The key innovation is fusing geometric information, multi-view appearance features, and semantic knowledge from LLM-generated, affordance-aware descriptions (e.g., "the seat of a chair"). This disambiguates part names and improves localization accuracy. ALIGN-Parts is trained with a text-alignment loss, a mask-accuracy loss, and a partness prediction loss, plus regularization terms that improve segmentation quality.

Experiments on 3DCoMPaT++, PartNet, and Find3D show that ALIGN-Parts substantially outperforms existing methods, reaching up to 60% mIoU with clear gains in label accuracy, while running roughly 100x faster by avoiding post-hoc clustering. The method also supports aligning different part taxonomies for unified training and scalable annotation. Future work will focus on robustness to noisy data and extending to articulated objects.

Hacker News discussion (33 points, 4 comments, submitted by unisub_guy):

darubedarob: Could this be used for atlas textures? Like, the AI creates a colored segmentation map and then moves the UVs onto the corresponding texture in the atlas?

unisub_guy: That would be a great application. Yes, I think this work could do that.

mabedan: Great stuff. The layout-shift animations on the page make it really hard to read.

3d_stuff: Would be great in video games, e.g. a character could interact with any part of a 3D asset without the interaction having to be explicitly authored in advance.

Original article

Motivation

Many vision and graphics applications require 3D parts, not just whole-object labels: robots must grasp handles, and creators need editable, semantically meaningful components. This requires solving two problems at once: segmenting parts and naming them.

While part-annotated datasets exist, their label definitions are often inconsistent across sources, limiting robust training and evaluation. Existing approaches typically cover only one side of the problem: segmentation-only models produce unnamed regions, while language-grounded systems often retrieve one part at a time and fail to produce a complete named decomposition.

Introduction

ALIGN-Parts reframes named 3D part segmentation as a set-to-set alignment problem. Instead of labeling each point independently, we predict a small set of partlets - each partlet represents one part with (i) a soft segmentation mask over points and (ii) a text embedding that can be matched to part descriptions. We then align predicted partlets to candidate descriptions via bipartite matching, enforcing permutation consistency and allowing a null option so the number of parts can adapt per shape.

To make partlets both geometrically separable and semantically meaningful, we fuse (1) geometry from a 3D part-field backbone, (2) multi-view appearance features lifted onto 3D, and (3) semantic knowledge from LLM-generated, affordance-aware descriptions (e.g., “the horizontal surface of a chair where a person sits”).
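
As a rough illustration, the sketch below fuses per-point features from the three sources with a simple concatenation followed by an MLP; the actual fusion module of ALIGN-Parts, and how each modality is brought to a per-point representation, are not detailed here, so the architecture and dimensions are assumptions.

```python
# Hedged sketch: one plausible way to fuse per-point geometry, appearance, and
# text-derived features before predicting partlets. Concatenation + MLP and the
# feature dimensions are assumptions, not the paper's actual fusion module.
import torch
import torch.nn as nn

class PointFeatureFusion(nn.Module):
    def __init__(self, d_geom: int, d_app: int, d_text: int, d_out: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_geom + d_app + d_text, d_out),
            nn.ReLU(),
            nn.Linear(d_out, d_out),
        )

    def forward(self, f_geom, f_app, f_text):
        # f_geom: (N, d_geom) per-point features from the 3D part-field backbone
        # f_app:  (N, d_app)  multi-view image features lifted onto the points
        # f_text: (N, d_text) text-derived semantic features broadcast to the points
        return self.mlp(torch.cat([f_geom, f_app, f_text], dim=-1))

fusion = PointFeatureFusion(d_geom=128, d_app=768, d_text=768)
fused = fusion(torch.randn(2048, 128), torch.randn(2048, 768), torch.randn(2048, 768))
print(fused.shape)  # torch.Size([2048, 256])
```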

Bare part names can be ambiguous across categories (e.g., “legs”). ALIGN-Parts trains with LLM-generated affordance-aware descriptions (embedded with a sentence transformer) to disambiguate part naming during set alignment.
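
A minimal sketch of embedding such descriptions with a sentence transformer is shown below; the specific checkpoint (`all-mpnet-base-v2`) is an assumption, chosen because the evaluation metrics use MPNet embeddings.

```python
# Minimal sketch of embedding affordance-aware part descriptions with a sentence
# transformer. The checkpoint name is an assumption (the evaluation metrics use
# MPNet embeddings); swap in whatever encoder the trained model expects.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")

descriptions = [
    "the horizontal surface of a chair where a person sits",
    "the vertical surface of a chair that supports a person's back",
    "the slender supports of a chair that rest on the floor",
]
text_embeddings = encoder.encode(descriptions, normalize_embeddings=True)
print(text_embeddings.shape)  # (3, 768) for MPNet-base
```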

Training losses

Setup & notation. We represent a 3D shape as a point set $\mathcal{P}=\{\mathbf{x}_i\}_{i=1}^N$ (sampled from a mesh/point cloud). The model predicts $K$ Partlets, each with mask logits $\mathbf{m}_k\in\mathbb{R}^{N}$ and a text embedding $\hat{\mathbf{z}}_k\in\mathbb{R}^{d_t}$. Ground-truth provides $A$ part masks $\mathbf{m}^{\mathrm{gt}}_a\in\{0,1\}^{N}$ with text embeddings $\hat{\mathbf{t}}_a\in\mathbb{R}^{d_t}$. A differentiable set matching (Sinkhorn) yields an assignment $\pi(k)\in\{1,\ldots,A\}\cup\{\emptyset\}$; let $\mathcal{M}=\{k:\pi(k)\neq\emptyset\}$ denote matched Partlets.

Text alignment (InfoNCE). Makes Partlet embeddings nameable by pulling matched (Partlet, text) pairs together and pushing others apart.

$$ \mathcal{L}_{\text{text}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} -\log\frac{\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_{\pi(k)}/\tau)} {\sum_{a=1}^{A}\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_a/\tau)} $$
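
A PyTorch sketch of this loss follows; the temperature value and the L2-normalized embeddings are assumptions.

```python
# PyTorch sketch of the text-alignment loss. Assumes embeddings are L2-normalized
# and uses an illustrative temperature.
import torch.nn.functional as F

def text_alignment_loss(z, t, assignment, matched, tau: float = 0.07):
    """z: (K, d) Partlet embeddings, t: (A, d) description embeddings,
    assignment: (K,) long tensor with pi(k) (arbitrary where unmatched),
    matched: (K,) bool mask, True where pi(k) != null."""
    logits = z @ t.T / tau                              # (K, A) similarities
    return F.cross_entropy(logits[matched], assignment[matched])
```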

Mask supervision (BCE + Dice). Encourages accurate part boundaries and robust overlap with ground-truth parts.

$$ \mathcal{L}_{\text{mask}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \Big[\mathrm{BCE}(\mathbf{m}_k,\mathbf{m}^{\mathrm{gt}}_{\pi(k)}) +\big(1-\mathrm{Dice}(\sigma(\mathbf{m}_k),\mathbf{m}^{\mathrm{gt}}_{\pi(k)})\big)\Big] $$
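
A minimal PyTorch sketch, assuming masks are predicted as logits and ground-truth masks are binary:

```python
# PyTorch sketch of the mask loss: BCE on logits plus a soft Dice term on
# sigmoid probabilities, averaged over matched Partlets.
import torch.nn.functional as F

def mask_loss(mask_logits, gt_masks, eps: float = 1e-6):
    """mask_logits: (M, N) logits of matched Partlets, gt_masks: (M, N) in {0,1}."""
    gt = gt_masks.float()
    bce = F.binary_cross_entropy_with_logits(mask_logits, gt, reduction="none").mean(dim=1)
    probs = mask_logits.sigmoid()
    inter = (probs * gt).sum(dim=1)
    dice = (2 * inter + eps) / (probs.sum(dim=1) + gt.sum(dim=1) + eps)
    return (bce + (1 - dice)).mean()
```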

Partness loss. Learns when a Partlet should be “active” vs. “no-part”, enabling variable part counts.

$$ \mathcal{L}_{\text{part}}=\frac{1}{K}\sum_{k=1}^{K}\mathrm{BCE}(\text{part}_k,\mathbf{1}[\pi(k)\neq\emptyset]) $$
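
A minimal sketch, assuming the partness score is predicted as a raw logit:

```python
# PyTorch sketch of the partness loss, assuming the per-Partlet "is this an
# actual part?" score is predicted as a raw logit.
import torch.nn.functional as F

def partness_loss(part_logits, matched):
    """part_logits: (K,) partness scores, matched: (K,) bool, True if pi(k) != null."""
    return F.binary_cross_entropy_with_logits(part_logits, matched.float())
```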

Regularizers. Reduce over/under-segmentation and prevent multiple Partlets from claiming the same points.

$$ \mathcal{L}_{\text{cov}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \left|\frac{\sum_i \sigma(m_{ki})-\sum_i m^{\mathrm{gt}}_{\pi(k)i}}{N}\right| \qquad \mathcal{L}_{\text{overlap}}=\frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{k=1}^{K}\sigma(m_{ki})-1\Big)^2 $$
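
A minimal PyTorch sketch of both regularizers, following the formulas above:

```python
# PyTorch sketch of the two regularizers: coverage matches predicted and
# ground-truth mask areas; overlap pushes the per-point sum of Partlet
# probabilities toward 1.

def coverage_loss(mask_logits, gt_masks):
    """mask_logits, gt_masks: (M, N) for matched Partlets."""
    n = mask_logits.shape[1]
    pred_area = mask_logits.sigmoid().sum(dim=1)
    gt_area = gt_masks.float().sum(dim=1)
    return ((pred_area - gt_area).abs() / n).mean()

def overlap_loss(all_mask_logits):
    """all_mask_logits: (K, N) for every Partlet, matched or not."""
    point_sum = all_mask_logits.sigmoid().sum(dim=0)   # (N,) probability mass per point
    return ((point_sum - 1) ** 2).mean()
```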

Total objective. A weighted sum of the above terms (plus an auxiliary global alignment loss):

$$ \mathcal{L}_{\text{total}}= \lambda_{\text{mask}}\mathcal{L}_{\text{mask}}+ \lambda_{\text{part}}\mathcal{L}_{\text{part}}+ \lambda_{\text{text}}\mathcal{L}_{\text{text}}+ \lambda_{\text{cov}}\mathcal{L}_{\text{cov}}+ \lambda_{\text{ov}}\mathcal{L}_{\text{overlap}} $$
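
A minimal sketch of the weighted combination; the lambda values below are placeholders, not the paper's hyperparameters.

```python
# Weighted combination of the terms above. The lambda values are placeholders,
# not the paper's actual hyperparameters.
def total_loss(l_mask, l_part, l_text, l_cov, l_overlap,
               lambdas=(1.0, 1.0, 1.0, 0.1, 0.1)):
    lm, lp, lt, lc, lo = lambdas
    return lm * l_mask + lp * l_part + lt * l_text + lc * l_cov + lo * l_overlap
```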

Experiments

We evaluate ALIGN-Parts on named 3D part segmentation across 3DCoMPaT++, PartNet, and Find3D, using class-agnostic segmentation (mIoU) and two label-aware metrics - LA-mIoU (strict) and rLA-mIoU (relaxed) - that measure whether predicted parts are named correctly. ALIGN-Parts outperforms strong baselines while avoiding slow, post-hoc clustering, yielding ~100× faster inference.

We also align heterogeneous taxonomies via a two-stage pipeline (embedding similarity + LLM validation), enabling unified training on consistent part semantics and supporting scalable annotation with human verification.
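
As an illustration of the first (embedding-similarity) stage, the sketch below proposes cross-taxonomy label matches and flags borderline pairs for LLM or human validation; the encoder, labels, and acceptance threshold are all assumptions.

```python
# Hedged sketch of the embedding-similarity stage of taxonomy alignment:
# propose cross-taxonomy label matches and flag borderline pairs for LLM or
# human validation. Encoder, labels, and the 0.7 threshold are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-mpnet-base-v2")

taxonomy_a = ["seat", "backrest", "armrest"]
taxonomy_b = ["chair seat", "chair back", "arm", "leg"]

emb_a = encoder.encode(taxonomy_a, convert_to_tensor=True, normalize_embeddings=True)
emb_b = encoder.encode(taxonomy_b, convert_to_tensor=True, normalize_embeddings=True)
sims = util.cos_sim(emb_a, emb_b)                       # (3, 4) cosine similarities

for i, label in enumerate(taxonomy_a):
    j = int(sims[i].argmax())
    score = float(sims[i, j])
    verdict = "auto-accept" if score > 0.7 else "send to LLM / human validation"
    print(f"{label} -> {taxonomy_b[j]}  (cos = {score:.2f}, {verdict})")
```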

Metrics. mIoU evaluates geometric segmentation quality while ignoring part names. LA-mIoU assigns IoU credit only when the predicted part name exactly matches the ground-truth label. rLA-mIoU softens strict matching by weighting IoU using cosine similarity between MPNet text embeddings of predicted and ground-truth names (e.g., “screen” vs. “monitor”), making evaluation robust to near-synonyms. By construction, mIoU $\ge$ rLA-mIoU $\ge$ LA-mIoU.
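
A minimal sketch of the relaxed label-aware IoU for a single predicted/ground-truth part pair; the MPNet checkpoint name and the clamping of negative similarities to zero are assumptions, and aggregation into rLA-mIoU over parts and shapes is omitted.

```python
# Hedged sketch of the relaxed label-aware IoU for one (predicted, ground-truth)
# part pair: geometric IoU weighted by cosine similarity of the name embeddings.
# The MPNet checkpoint and clamping negative similarity to zero are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")

def relaxed_label_aware_iou(pred_mask, gt_mask, pred_name, gt_name):
    """pred_mask, gt_mask: boolean arrays over the same point set."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    iou = inter / union if union > 0 else 0.0
    e = encoder.encode([pred_name, gt_name], normalize_embeddings=True)
    name_sim = float(e[0] @ e[1])                        # cosine similarity
    return iou * max(name_sim, 0.0)

print(relaxed_label_aware_iou(np.array([1, 1, 0], dtype=bool),
                              np.array([1, 0, 0], dtype=bool),
                              "screen", "monitor"))
```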