1Beihang University
*Equal contribution. †Corresponding author: Lu Sheng.
Project Overview
Most benchmarks only judge realism, not where to edit.
Omni3DEdit adds region annotations to score faithfulness, preservation, and locality together.
A practical region-aware benchmark with standardized assets, metrics, and strong baselines.
Instruction-guided 3D editing has moved quickly from optimization-heavy pipelines to native 3D foundation models. Yet prior benchmarks rarely include explicit edited-region supervision or a standardized protocol for assessing whether models edit the intended parts and preserve the rest. Omni3DEdit addresses this gap with 128,906 paired source-target assets across five edit families, each pair with an instruction text and an edited-region annotation.
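To make the pairing concrete, one benchmark record could be modeled as below. This is a minimal sketch, not the released data format: the class name `EditPair` and all field names are hypothetical, and treating the mask as optional is an assumption. The file names in the usage lines are taken from the Structure example in the Showcase section.

```python
from dataclasses import dataclass
from typing import Optional

# The five edit families named on this page.
FAMILIES = ("Pose", "Structure", "Articulation", "Part-Edit", "Material")


@dataclass(frozen=True)
class EditPair:
    """Hypothetical record for one source-target pair (field names are assumptions)."""

    family: str
    source_path: str
    target_path: str
    instruction: str
    mask_path: Optional[str] = None  # edited-region annotation; optional by assumption

    def __post_init__(self) -> None:
        # Reject families that the benchmark does not define.
        if self.family not in FAMILIES:
            raise ValueError(f"unknown edit family: {self.family!r}")


pair = EditPair(
    family="Structure",
    source_path="18796_without_original-1_source.glb",
    target_path="18796_complete_target.glb",
    instruction="Attach the tabletop",
    mask_path="mask_structure.glb",
)
print(pair.family, "|", pair.instruction)
```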
Dataset
Pose
23,648 pairs
Pose changes from animated or rigged assets with frame pairs selected by deformation magnitude.
Structure
40,000 pairs
Rigid part add/remove operations built from hierarchical part annotations.
Articulation
6,875 pairs
Joint-state edits with motion-aware edited regions derived from kinematic sweeps.
Part-Edit
26,407 pairs
Local free-form geometry edits with explicit part targeting and realistic instructions.
Material
31,976 pairs
Appearance-only changes on texture or PBR materials while preserving geometry.
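The per-family counts above can be checked against the stated total of 128,906 pairs. The short sketch below only redoes that arithmetic; the dictionary name `PAIRS_PER_FAMILY` is an illustrative choice, and the numbers are copied from this section.

```python
# Pair counts per edit family, as listed in the Dataset section.
PAIRS_PER_FAMILY = {
    "Pose": 23_648,
    "Structure": 40_000,
    "Articulation": 6_875,
    "Part-Edit": 26_407,
    "Material": 31_976,
}

total = sum(PAIRS_PER_FAMILY.values())

# Fraction of the dataset contributed by each family.
shares = {name: count / total for name, count in PAIRS_PER_FAMILY.items()}

print(f"total pairs: {total}")
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<12} {share:.1%}")
```

The total matches the 128,906 figure in the overview. Structure is the largest family and Articulation the smallest, which is worth keeping in mind when reading scores averaged over all families.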
Method
Benchmark
Faithfulness: Evaluate whether the generated edit follows the user intent expressed in the instruction and keeps the semantic direction of the transformation.
Preservation: Check structural and visual stability in untouched regions, so that edits stay localized rather than globally destructive.
Locality: Analyze where changes occur and whether the modified region aligns with the annotated target edit region.
Unified protocol: Apply one protocol across all edit families, so that methods can be compared consistently under different task settings.
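The locality criterion compares where a method changed the asset with where the annotation says it should change. A minimal sketch of an F1-style overlap score follows, assuming both regions are given as sets of element indices (for example voxel or face ids). The benchmark's exact Region-F1 definition is not given on this page, so the function name `region_f1` and the handling of empty regions are assumptions.

```python
def region_f1(predicted: set, annotated: set) -> float:
    """F1 overlap between the changed region and the annotated edit region.

    Both arguments are sets of element indices. This is an illustrative
    definition, not necessarily the benchmark's official one.
    """
    if not predicted and not annotated:
        return 1.0  # nothing should change and nothing did (assumed convention)
    true_positives = len(predicted & annotated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(annotated)
    return 2 * precision * recall / (precision + recall)


annotated = {0, 1, 2, 3}

print(region_f1({2, 3, 4, 5}, annotated))      # half of the change lands in the region
print(region_f1(set(), annotated))             # no change at all, as with a source copy
print(region_f1(set(range(100)), annotated))   # global change: full recall, low precision
```

Under this definition a method that changes nothing scores zero, and a method that changes everything is penalized through precision, so a high score requires edits that are both present and localized.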
Official test split comparison. We evaluate instruction faithfulness, preservation, and edit locality under both text-only and image-conditioned settings. Source Copy, which returns the source asset unedited, is a reference baseline rather than an editing method: it gives an upper bound for preservation and a lower bound for faithfulness.
| Method | CLIP-T ↑ | DINO-I ↑ | Preserve CD ↓ | mPSNR ↑ | mSSIM ↑ | mLPIPS ↓ | Region-F1 ↑ | Runtime ↓ |
|---|---|---|---|---|---|---|---|---|
| Source Copy | 0.240 | 0.989 | 0.215 | 94.908 | 0.997 | 0.003 | 0.001 | -- |
| Native3DEditing [Cai et al., 2025] | 0.253 | 0.622 | 0.970 | 25.770 | 0.783 | 0.283 | 0.165 | 15s |
| VoxHammer [Li et al., 2025] | 0.249 | 0.700 | 0.224 | 24.048 | 0.800 | 0.293 | 0.130 | 133s |
| Instant3D | 0.249 | 0.797 | 0.387 | 54.204 | 0.997 | 0.002 | 0.973 | -- |
| editp23 | 0.236 | 0.438 | 0.496 | 17.478 | 0.836 | 0.240 | 0.512 | -- |
| nano3d | 0.246 | 0.802 | 0.237 | 21.352 | 0.924 | 0.135 | 0.580 | -- |
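The masked preservation metrics in the table score only the content outside the edited region. The sketch below illustrates that idea for PSNR on flat lists of values in [0, 1]. It is an assumption-laden illustration: the page does not define mPSNR, and the function name `masked_psnr`, the `cap` used when the unedited content is identical, and the toy inputs are all hypothetical.

```python
import math


def masked_psnr(source, output, edited_mask, max_value=1.0, cap=100.0):
    """PSNR over elements outside the edited region (illustrative definition).

    source, output: equal-length sequences of floats in [0, max_value].
    edited_mask: equal-length sequence of booleans, True where editing is allowed.
    cap: value returned when the unedited content is identical (assumed convention).
    """
    kept = [(s, o) for s, o, m in zip(source, output, edited_mask) if not m]
    if not kept:
        raise ValueError("mask covers everything; no preserved region to score")
    mse = sum((s - o) ** 2 for s, o in kept) / len(kept)
    if mse == 0.0:
        return cap
    return min(cap, 10.0 * math.log10(max_value ** 2 / mse))


source = [0.2, 0.4, 0.6, 0.8]
edited_mask = [False, False, True, True]

copy_score = masked_psnr(source, source, edited_mask)                  # unedited copy
local_score = masked_psnr(source, [0.2, 0.4, 0.1, 0.9], edited_mask)   # change inside mask only
global_score = masked_psnr(source, [0.3, 0.5, 0.1, 0.9], edited_mask)  # change leaks outside mask

print(copy_score, local_score, round(global_score, 3))
```

In this toy setup an unedited copy and a perfectly localized edit both reach the cap, while an edit that leaks outside the mask drops sharply. This is why preservation scores need to be read together with the faithfulness and locality columns.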
Per-family breakdown on text-only evaluation. We report instruction faithfulness (CLIP-T ↑), preservation (mLPIPS ↓), and edit locality (Region-F1 ↑) for each of the five edit families. "--" indicates the method does not produce region masks for that subset, so locality metrics cannot be computed.
| Method | Pose CLIP-T ↑ | Pose mLPIPS ↓ | Pose R-F1 ↑ | Structure CLIP-T ↑ | Structure mLPIPS ↓ | Structure R-F1 ↑ | Articulation CLIP-T ↑ | Articulation mLPIPS ↓ | Articulation R-F1 ↑ | Part-Edit CLIP-T ↑ | Part-Edit mLPIPS ↓ | Part-Edit R-F1 ↑ | Material CLIP-T ↑ | Material mLPIPS ↓ | Material R-F1 ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Source Copy | 0.235 | 0.000 | -- | 0.237 | 0.000 | 0.000 | 0.244 | 0.009 | 0.001 | 0.258 | 0.000 | 0.000 | 0.227 | 0.000 | 0.002 |
| Native3DEditing [Cai et al., 2025] | 0.255 | 0.107 | -- | 0.250 | 0.364 | 0.001 | 0.254 | 0.324 | 0.019 | 0.259 | 0.323 | 0.006 | 0.238 | 0.301 | 0.228 |
| VoxHammer [Li et al., 2025] | -- | -- | -- | 0.247 | 0.372 | 0.001 | 0.242 | 0.227 | 0.004 | 0.258 | 0.280 | 0.001 | 0.354 | 0.290 | 0.475 |
| Instant3D | -- | -- | -- | 0.250 | 0.003 | 0.965 | 0.248 | 0.002 | 0.980 | 0.252 | 0.002 | 0.975 | 0.250 | 0.002 | 0.972 |
| editp23 | 0.230 | 0.235 | -- | 0.240 | 0.250 | 0.500 | 0.235 | 0.242 | 0.520 | 0.238 | 0.238 | 0.490 | 0.237 | 0.235 | 0.538 |
| nano3d | 0.240 | 0.130 | -- | 0.250 | 0.140 | 0.570 | 0.245 | 0.135 | 0.590 | 0.248 | 0.138 | 0.560 | 0.247 | 0.132 | 0.600 |
Showcase
Pose
Source
0b5b51b578ea43f7b0429db362e46267_frame_0000.glb
Target
0b5b51b578ea43f7b0429db362e46267_frame_0088.glb
Instruction: Extend both arms forward and slightly outward
Material
Source
textured_mesh_04_14c__0.glb
Target
textured_mesh_04_14c__0_1.glb
Mask
mask_material.glb
Instruction: Apply Blue-grey striped matte woven fabric to the pillow
Articulation
Source
103761_mobility.glb
Target
103761_mobility_mod_1.glb
Mask
mask_articulation.glb
Instruction: Rotate the lid by approximately 90 degrees clockwise.
Structure
Source
18796_without_original-1_source.glb
Target
18796_complete_target.glb
Mask
mask_structure.glb
Instruction: Attach the tabletop
Part-Edit
Source
115.glb
Target
115_modified_original-1.glb
Mask
mask_partedit.glb
Instruction: Turn the handle to a textured, ergonomically shaped handle.
Citation
@inproceedings{fan2026omni3dedit,
title = {Omni3DEdit: A Unified 3D Editing Benchmark with Region Annotations},
author = {Fan, Hongxing and Lu, Haotian and Chen, Rui and Yun, Weibin and Huang, Zehuan and Sheng, Lu},
year = {2026}
}