Omni3DEdit: A Unified 3D Editing Benchmark with Region Annotations

Hongxing Fan1,*, Haotian Lu1,*, Rui Chen1,*, Weibin Yun1, Zehuan Huang1, Lu Sheng1,†

1Beihang University

*Equal contribution. †Corresponding author: Lu Sheng.

128,906

paired source-target assets

5

edit families

Omni3DEdit subset overview

Project Overview

Why Omni3DEdit

Problem

Most 3D editing benchmarks judge only how realistic the output looks, not whether the edit landed in the intended region.

Core Idea

Omni3DEdit adds region annotations to score faithfulness, preservation, and locality together.

Outcome

A practical region-aware benchmark with standardized assets, metrics, and strong baselines.

Instruction-guided 3D editing has moved quickly from optimization-heavy pipelines to native 3D foundation models. Yet prior benchmarks rarely include explicit edited-region supervision or a standardized protocol for assessing whether models edit the intended parts and preserve the rest. Omni3DEdit addresses this gap with 128,906 paired source-target assets across five edit families, each pair with an instruction and an edited-region annotation.

Dataset

Five Edit Families

Pose

23,648 pairs

Pose changes from animated or rigged assets with frame pairs selected by deformation magnitude.
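
One way to read "selected by deformation magnitude" is sketched below: score every candidate frame by its mean per-vertex displacement from a reference frame and keep the frame that moves the most. The function names, the choice of frame 0 as reference, and the mean-displacement score are assumptions made for illustration; this page does not specify the actual selection rule.

```python
import math

def mean_displacement(frame_a, frame_b):
    """Mean Euclidean distance between corresponding vertices of two frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must share the same vertex count")
    total = sum(math.dist(p, q) for p, q in zip(frame_a, frame_b))
    return total / len(frame_a)

def pick_target_frame(frames, source_index=0):
    """Return (index, score) of the frame that deforms most from the source.

    Hypothetical selection rule: the real pipeline may use a threshold
    or a different deformation measure.
    """
    source = frames[source_index]
    scores = [
        (mean_displacement(source, frame), i)
        for i, frame in enumerate(frames)
        if i != source_index
    ]
    best_score, best_index = max(scores)
    return best_index, best_score

# Toy animation: two vertices tracked over three frames.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 2.0, 0.0)],
]
index, score = pick_target_frame(frames)
print(index, score)
```

In the toy data the last frame is chosen because both of its vertices move further from the source than in the middle frame.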

Structure

40,000 pairs

Rigid part add/remove operations built from hierarchical part annotations.

Articulation

6,875 pairs

Joint-state edits with motion-aware edited regions derived from kinematic sweeps.

Part-Edit

26,407 pairs

Local free-form geometry edits with explicit part targeting and realistic instructions.

Material

31,976 pairs

Appearance-only changes on texture or PBR materials while preserving geometry.
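
As a quick consistency check, the five per-family counts can be summed and compared with the 128,906 total quoted in the overview. The dictionary below only transcribes the numbers from this page; the variable names are illustrative.

```python
# Pair counts per edit family, transcribed from the cards above.
family_pairs = {
    "Pose": 23_648,
    "Structure": 40_000,
    "Articulation": 6_875,
    "Part-Edit": 26_407,
    "Material": 31_976,
}

total_pairs = sum(family_pairs.values())

# Fraction of the benchmark contributed by each family.
shares = {name: count / total_pairs for name, count in family_pairs.items()}

print(total_pairs)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The sum matches the stated total, so the five families account for every pair in the benchmark.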

Figure: Project teaser showing representative source-target editing pairs and region-aware supervision.

Method

Unified Region-Aware Benchmark Design

Figure: Method overview of the unified region-aware benchmark pipeline and evaluation design.

Benchmark

Evaluation Framework

Edit Intent Alignment

Evaluate whether the generated edit follows the user's natural-language instruction and changes the asset in the semantic direction that instruction describes.

Instruction-level consistency

Non-edited Region Preservation

Check structural and visual stability in untouched regions to ensure edits remain localized rather than globally destructive.

Geometry and appearance stability
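
A minimal sketch of a preservation score, assuming it is computed on rendered views and restricted to pixels outside the annotated edit region. Reading the "m" in mPSNR as "masked" is an assumption, as are the function name and the grayscale list-of-lists image format; the benchmark's exact rendering and masking protocol is not described on this page.

```python
import math

def masked_psnr(source, edited, edit_mask, max_value=1.0):
    """PSNR over pixels where edit_mask is False (the region that should not change).

    source, edited: 2D lists of grayscale intensities in [0, max_value].
    edit_mask: 2D list of booleans, True inside the annotated edit region.
    """
    squared_error = 0.0
    kept_pixels = 0
    for src_row, out_row, mask_row in zip(source, edited, edit_mask):
        for s, o, inside_edit in zip(src_row, out_row, mask_row):
            if not inside_edit:
                squared_error += (s - o) ** 2
                kept_pixels += 1
    if kept_pixels == 0:
        raise ValueError("mask covers the whole image")
    mse = squared_error / kept_pixels
    if mse == 0.0:
        return math.inf  # identical outside the edit region
    return 10.0 * math.log10(max_value ** 2 / mse)

source = [[0.2, 0.4], [0.6, 0.8]]
edited = [[0.2, 0.9], [0.5, 0.8]]      # top-right pixel is the intended edit
mask = [[False, True], [False, False]]

psnr_outside = masked_psnr(source, edited, mask)
print(psnr_outside)
```

The large change at the masked pixel does not affect the score; only the small drift at the bottom-left pixel, which lies outside the edit region, is penalized.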

Localized Edit Behavior

Analyze where changes happen and whether the modified support aligns with target edit regions.

Region-aware change analysis
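
The idea of comparing "where changes happen" with the target region can be sketched as an F1 overlap between two sets of element ids. This is an illustrative reading of a Region-F1 style score, not the benchmark's definition: the function name, the use of sets of voxel or vertex indices, and the handling of empty sets are all assumptions.

```python
def region_f1(changed, annotated):
    """F1 overlap between the elements a method changed and the annotated edit region.

    Both arguments are iterables of element ids (e.g. voxel or vertex indices).
    """
    changed, annotated = set(changed), set(annotated)
    if not changed and not annotated:
        return 1.0  # nothing to edit and nothing edited
    true_positive = len(changed & annotated)
    if true_positive == 0:
        return 0.0
    precision = true_positive / len(changed)
    recall = true_positive / len(annotated)
    return 2 * precision * recall / (precision + recall)

annotated_region = {3, 4, 5, 6}
localized_edit = {4, 5, 6, 7}      # mostly inside the target region
global_edit = set(range(20))       # touches every element

print(region_f1(localized_edit, annotated_region))
print(region_f1(global_edit, annotated_region))
print(region_f1(set(), annotated_region))
```

Under this reading, an edit that rewrites the whole asset has full recall but low precision, and an output that changes nothing scores zero.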

Cross-Scenario Generality

Keep one unified protocol across multiple edit families, enabling consistent comparison under different task settings.

Unified multi-family evaluation

Main Results

Official test split comparison. We evaluate instruction faithfulness, preservation, and edit locality under both text-only and image-conditioned settings. Source Copy, which returns the source asset unedited, is a reference baseline: it gives an upper bound for preservation and a lower bound for faithfulness.

Text-only Evaluation

Method | CLIP-T ↑ | DINO-I ↑ | Preserve CD ↓ | mPSNR ↑ | mSSIM ↑ | mLPIPS ↓ | Region-F1 ↑ | Runtime ↓
Source Copy | 0.240 | 0.989 | 0.215 | 94.908 | 0.997 | 0.003 | 0.001 | --
Native3DEditing [Cai et al., 2025] | 0.253 | 0.622 | 0.970 | 25.770 | 0.783 | 0.283 | 0.165 | 15s
VoxHammer [Li et al., 2025] | 0.249 | 0.700 | 0.224 | 24.048 | 0.800 | 0.293 | 0.130 | 133s
Instant3D | 0.249 | 0.797 | 0.387 | 54.204 | 0.997 | 0.002 | 0.973 | --
editp23 | 0.236 | 0.438 | 0.496 | 17.478 | 0.836 | 0.240 | 0.512 | --
nano3d | 0.246 | 0.802 | 0.237 | 21.352 | 0.924 | 0.135 | 0.580 | --
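
The Preserve CD column can be illustrated with a small sketch, assuming it denotes a Chamfer distance measured only on geometry outside the annotated edit region. That reading, the unsquared symmetric variant used here, and the helper names are assumptions; Chamfer definitions differ between papers (squared vs. unsquared, summed vs. averaged), and this page does not state which one the benchmark uses.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)

def preserved_points(points, in_edit_region):
    """Keep only the points flagged as outside the annotated edit region."""
    return [p for p, inside in zip(points, in_edit_region) if not inside]

source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
edited = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 3.0, 0.0)]
# Toy assumption: points correspond one-to-one, and the third one is the edit.
in_edit = [False, False, True]

cd = chamfer_distance(preserved_points(source, in_edit),
                      preserved_points(edited, in_edit))
print(cd)
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a toy example but would normally be replaced by a spatial index on real assets.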

Subset-wise Results

Per-family breakdown on text-only evaluation. We report instruction faithfulness (CLIP-T ↑), preservation (mLPIPS ↓), and edit locality (Region-F1 ↑) for each of the five edit families. "--" indicates the method does not produce region masks for that subset, so locality metrics cannot be computed.

Each cell lists CLIP-T ↑ / mLPIPS ↓ / R-F1 ↑.

Method | Pose | Structure | Articulation | Part-Edit | Material
Source Copy | 0.235 / 0.000 / -- | 0.237 / 0.000 / 0.000 | 0.244 / 0.009 / 0.001 | 0.258 / 0.000 / 0.000 | 0.227 / 0.000 / 0.002
Native3DEditing [Cai et al., 2025] | 0.255 / 0.107 / -- | 0.250 / 0.364 / 0.001 | 0.254 / 0.324 / 0.019 | 0.259 / 0.323 / 0.006 | 0.238 / 0.301 / 0.228
VoxHammer [Li et al., 2025] | -- / -- / -- | 0.247 / 0.372 / 0.001 | 0.242 / 0.227 / 0.004 | 0.258 / 0.280 / 0.001 | 0.354 / 0.290 / 0.475
Instant3D | -- / -- / -- | 0.250 / 0.003 / 0.965 | 0.248 / 0.002 / 0.980 | 0.252 / 0.002 / 0.975 | 0.250 / 0.002 / 0.972
editp23 | 0.230 / 0.235 / -- | 0.240 / 0.250 / 0.500 | 0.235 / 0.242 / 0.520 | 0.238 / 0.238 / 0.490 | 0.237 / 0.235 / 0.538
nano3d | 0.240 / 0.130 / -- | 0.250 / 0.140 / 0.570 | 0.245 / 0.135 / 0.590 | 0.248 / 0.138 / 0.560 | 0.247 / 0.132 / 0.600

Showcase

GLB Examples By Category

Pose

Source

0b5b51b578ea43f7b0429db362e46267_frame_0000.glb

Target

0b5b51b578ea43f7b0429db362e46267_frame_0088.glb

Instruction: Extend both arms forward and slightly outward

Material

Source

textured_mesh_04_14c__0.glb

Target

textured_mesh_04_14c__0_1.glb

Mask

mask_material.glb

Instruction: Apply Blue-grey striped matte woven fabric to the pillow

Articulation

Source

103761_mobility.glb

Target

103761_mobility_mod_1.glb

Mask

mask_articulation.glb

Instruction: Rotate the lid by approximately 90 degrees clockwise.

Structure

Source

18796_without_original-1_source.glb

Target

18796_complete_target.glb

Mask

mask_structure.glb

Instruction: Attach the tabletop

Part-Edit

Source

115.glb

Target

115_modified_original-1.glb

Mask

mask_partedit.glb

Instruction: Turn the handle to a textured, ergonomically shaped handle.

Citation

BibTeX

@inproceedings{fan2026omni3dedit,
  title  = {Omni3DEdit: A Unified 3D Editing Benchmark with Region Annotations},
  author = {Fan, Hongxing and Lu, Haotian and Chen, Rui and Yun, Weibin and Huang, Zehuan and Sheng, Lu},
  year   = {2026}
}