Omni3DEdit: A Unified 3D Editing Benchmark with Region Annotations

Project Overview

Why Omni3DEdit

Problem

Most benchmarks judge only the realism of the result, not whether the edit lands in the intended region.

Core Idea

Omni3DEdit annotates the edited region of every pair so that instruction faithfulness, preservation of unedited content, and edit locality can be scored together.

Outcome

A practical region-aware benchmark with standardized assets, metrics, and strong baselines.

Instruction-guided 3D editing has moved quickly from optimization-heavy pipelines to native 3D foundation models. Yet prior benchmarks rarely include explicit edited-region supervision or a standardized protocol for assessing whether models edit the intended parts and preserve the rest. Omni3DEdit addresses this gap with 128,906 paired source-target assets across five edit families, each pair accompanied by an instruction text and an edited-region annotation.

Dataset

Five Edit Families

Pose

23,648 pairs

Pose changes from animated or rigged assets with frame pairs selected by deformation magnitude.

Structure

40,000 pairs

Rigid part add/remove operations built from hierarchical part annotations.

Articulation

6,875 pairs

Joint-state edits with motion-aware edited regions derived from kinematic sweeps.

Part-Edit

26,407 pairs

Local free-form geometry edits with explicit part targeting and realistic instructions.

Material

31,976 pairs

Appearance-only changes on texture or PBR materials while preserving geometry.
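
The per-family counts above can be checked against the stated total of 128,906 pairs. The following is a minimal sketch that only restates the numbers on this page; the names `FAMILY_PAIRS` and `family_shares` are illustrative and not part of any released tooling.

```python
# Pair counts per edit family, copied from the dataset summary above.
FAMILY_PAIRS = {
    "Pose": 23_648,
    "Structure": 40_000,
    "Articulation": 6_875,
    "Part-Edit": 26_407,
    "Material": 31_976,
}


def family_shares(counts):
    """Return each family's fraction of the total number of pairs."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_pairs = sum(FAMILY_PAIRS.values())
shares = family_shares(FAMILY_PAIRS)

# Print one line per family: name, pair count, share of the dataset.
for name, share in shares.items():
    print(f"{name:<13}{FAMILY_PAIRS[name]:>8,}  {share:6.1%}")
print(f"{'Total':<13}{total_pairs:>8,}")
```

Structure is the largest family (about 31% of all pairs) and Articulation the smallest (about 5%), which is worth keeping in mind when reading averages taken over the whole test split.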

Figure: Omni3DEdit teaser showing representative source-target editing pairs and region-aware supervision.

Method

Unified Region-Aware Benchmark Design

Benchmark

Evaluation Framework

Edit Intent Alignment

Evaluate whether the generated edit follows the user intent expressed in natural language and preserves the semantic direction of the requested transformation.

Instruction-level consistency
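
One common way to score instruction-level consistency is a text-to-image embedding similarity, which is how we read the CLIP-T column in the results. The sketch below assumes, without confirmation from this page, that the score is the mean cosine similarity between an instruction embedding and embeddings of rendered views of the edited asset. The encoder itself is out of scope here: the toy vectors stand in for its outputs, and `text_alignment_score` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def text_alignment_score(text_embedding, view_embeddings):
    """Average text-to-view cosine similarity over all rendered views."""
    if not view_embeddings:
        raise ValueError("need at least one view embedding")
    sims = [cosine_similarity(text_embedding, v) for v in view_embeddings]
    return sum(sims) / len(sims)


# Toy 3-dimensional vectors standing in for real encoder outputs (assumption).
text = [1.0, 0.0, 0.0]
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score = text_alignment_score(text, views)
```

Averaging over several views keeps a single favorable camera angle from dominating the score.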

Non-edited Region Preservation

Check structural and visual stability in untouched regions to ensure edits remain localized rather than globally destructive.

Geometry and appearance stability
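
Appearance stability outside the edit can be measured by restricting an image metric to the pixels that should not change. The sketch below assumes that the "m" in mPSNR denotes this kind of masking; the exact protocol is not given on this page. Images are flat lists of intensities in [0, 1], and `masked_psnr` is an illustrative name.

```python
import math


def masked_psnr(source, edited, edit_mask, max_value=1.0):
    """PSNR over pixels where edit_mask is False (the region meant to stay untouched)."""
    if not (len(source) == len(edited) == len(edit_mask)):
        raise ValueError("source, edited, and edit_mask must have equal length")
    # Keep squared errors only for pixels outside the annotated edit region.
    sq_errors = [(s - e) ** 2 for s, e, m in zip(source, edited, edit_mask) if not m]
    if not sq_errors:
        raise ValueError("mask covers every pixel; nothing to preserve")
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0.0:
        return math.inf  # identical outside the edit region
    return 10.0 * math.log10(max_value ** 2 / mse)


source = [0.2, 0.4, 0.6, 0.8]
edited = [0.2, 0.5, 0.6, 0.1]
edit_mask = [False, False, False, True]  # last pixel is inside the edit region
psnr = masked_psnr(source, edited, edit_mask)
```

The large change at the last pixel does not lower the score because it lies inside the edit region; only the small drift at the second pixel is penalized. A practical protocol also needs a convention for the zero-error case, which this sketch simply reports as infinity.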

Localized Edit Behavior

Analyze where changes occur and whether the modified region aligns with the annotated target edit region.

Region-aware change analysis
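
Region-aware change analysis can be illustrated as an overlap score between what a method actually changed and what the annotation says should change. The sketch below assumes that Region-F1 is a standard F1 over discrete elements such as voxels or points; the element type and change-detection threshold used by the benchmark are not specified here, and `region_f1` is a hypothetical helper.

```python
def region_f1(changed, target):
    """F1 overlap between changed element ids and annotated edit-region ids."""
    changed, target = set(changed), set(target)
    if not changed and not target:
        return 1.0  # nothing should change and nothing did
    true_pos = len(changed & target)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(changed)  # how much of the change was intended
    recall = true_pos / len(target)      # how much of the intended region changed
    return 2 * precision * recall / (precision + recall)


target_region = {4, 5, 6, 7}
localized_edit = {5, 6, 7, 8}   # mostly inside the target region
global_edit = set(range(100))   # rewrites the whole asset
no_edit = set()                 # returns the source unchanged

f1_localized = region_f1(localized_edit, target_region)
f1_global = region_f1(global_edit, target_region)
f1_none = region_f1(no_edit, target_region)
```

Under this definition a globally destructive edit is penalized through precision and an unchanged output scores zero through recall, which is consistent with the near-zero Region-F1 reported for Source Copy.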

Cross-Scenario Generality

Keep one unified protocol across multiple edit families, enabling consistent comparison under different task settings.

Unified multi-family evaluation

Main Results

Official test split comparison. We evaluate instruction faithfulness, preservation, and edit locality under both text-only and image-conditioned settings. Source Copy, which returns the source asset unchanged, is a reference baseline: it gives the upper bound for preservation and the lower bound for faithfulness.

Text-only Evaluation

Method	CLIP-T ↑	DINO-I ↑	Preserve CD ↓	mPSNR ↑	mSSIM ↑	mLPIPS ↓	Region-F1 ↑	Runtime ↓
Source Copy	0.240	0.989	0.215	94.908	0.997	0.003	0.001	--
Native3DEditing [Cai et al., 2025]	0.253	0.622	0.970	25.770	0.783	0.283	0.165	15s
VoxHammer [Li et al., 2025]	0.249	0.700	0.224	24.048	0.800	0.293	0.130	133s
Instant3D	0.249	0.797	0.387	54.204	0.997	0.002	0.973	--
editp23	0.236	0.438	0.496	17.478	0.836	0.240	0.512	--
nano3d	0.246	0.802	0.237	21.352	0.924	0.135	0.580	--
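
For readers unfamiliar with the geometry column, the sketch below shows one plausible reading of Preserve CD: a symmetric Chamfer distance computed only on points outside the edited region. This is an assumption, since the page does not define the metric, and Chamfer variants differ (squared versus unsquared distances, sum versus mean). The function names are illustrative.

```python
import math


def _mean_nearest(points_a, points_b):
    """Mean distance from each point in points_a to its nearest point in points_b."""
    return sum(min(math.dist(p, q) for q in points_b) for p in points_a) / len(points_a)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (unsquared, mean-reduced) between two point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest(points_a, points_b) + _mean_nearest(points_b, points_a)


def preserve_cd(source_pts, output_pts, source_in_region, output_in_region):
    """Chamfer distance restricted to points flagged as outside the edit region."""
    keep_src = [p for p, m in zip(source_pts, source_in_region) if not m]
    keep_out = [p for p, m in zip(output_pts, output_in_region) if not m]
    return chamfer_distance(keep_src, keep_out)


source_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
output_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (9.0, 9.0, 9.0)]
in_region = [False, False, True]  # the third point belongs to the edited part
cd = preserve_cd(source_pts, output_pts, in_region, in_region)
```

The third point moves far but is excluded by the mask, so only the 0.1 drift of the second point contributes. The brute-force nearest-neighbor search is quadratic and meant for illustration only.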

Subset-wise Results

Per-family breakdown on text-only evaluation. We report instruction faithfulness (CLIP-T ↑), preservation (mLPIPS ↓), and edit locality (Region-F1 ↑) for each of the five edit families. "--" indicates the method does not produce region masks for that subset, so locality metrics cannot be computed.

Method	Pose			Structure			Articulation			Part-Edit			Material
	CLIP-T ↑	mLPIPS ↓	R-F1 ↑	CLIP-T ↑	mLPIPS ↓	R-F1 ↑	CLIP-T ↑	mLPIPS ↓	R-F1 ↑	CLIP-T ↑	mLPIPS ↓	R-F1 ↑	CLIP-T ↑	mLPIPS ↓	R-F1 ↑
Source Copy	0.235	0.000	--	0.237	0.000	0.000	0.244	0.009	0.001	0.258	0.000	0.000	0.227	0.000	0.002
Native3DEditing [Cai et al., 2025]	0.255	0.107	--	0.250	0.364	0.001	0.254	0.324	0.019	0.259	0.323	0.006	0.238	0.301	0.228
VoxHammer [Li et al., 2025]	--	--	--	0.247	0.372	0.001	0.242	0.227	0.004	0.258	0.280	0.001	0.354	0.290	0.475
Instant3D	--	--	--	0.250	0.003	0.965	0.248	0.002	0.980	0.252	0.002	0.975	0.250	0.002	0.972
editp23	0.230	0.235	--	0.240	0.250	0.500	0.235	0.242	0.520	0.238	0.238	0.490	0.237	0.235	0.538
nano3d	0.240	0.130	--	0.250	0.140	0.570	0.245	0.135	0.590	0.248	0.138	0.560	0.247	0.132	0.600
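
Because some cells are "--", any aggregation over this table has to skip missing entries rather than treat them as zero. The sketch below parses one tab-separated row, copied from the table above, into per-family metrics. `parse_row` and `mean_available` are illustrative helpers, not part of a released evaluation toolkit.

```python
FAMILIES = ["Pose", "Structure", "Articulation", "Part-Edit", "Material"]
METRICS = ["CLIP-T", "mLPIPS", "R-F1"]


def parse_row(line):
    """Split one tab-separated table row into (method, {family: {metric: value}})."""
    method, *cells = line.rstrip("\n").split("\t")
    if len(cells) != len(FAMILIES) * len(METRICS):
        raise ValueError("unexpected number of columns")
    # "--" marks an unavailable entry and is mapped to None.
    values = [None if c.strip() == "--" else float(c) for c in cells]
    table = {}
    for i, family in enumerate(FAMILIES):
        chunk = values[i * len(METRICS):(i + 1) * len(METRICS)]
        table[family] = dict(zip(METRICS, chunk))
    return method, table


def mean_available(table, metric):
    """Unweighted mean of a metric over the families that report it."""
    vals = [m[metric] for m in table.values() if m[metric] is not None]
    return sum(vals) / len(vals) if vals else None


row = ("VoxHammer [Li et al., 2025]\t--\t--\t--\t0.247\t0.372\t0.001"
       "\t0.242\t0.227\t0.004\t0.258\t0.280\t0.001\t0.354\t0.290\t0.475")
method, table = parse_row(row)
mean_rf1 = mean_available(table, "R-F1")
```

This is an unweighted mean over families, so it is not guaranteed to match the overall Region-F1 column in the main results table.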

Showcase

GLB Examples By Category

Pose

Source

0b5b51b578ea43f7b0429db362e46267_frame_0000.glb

Target

0b5b51b578ea43f7b0429db362e46267_frame_0088.glb

Instruction: Extend both arms forward and slightly outward

Material

Source

textured_mesh_04_14c__0.glb

Target

textured_mesh_04_14c__0_1.glb

Mask

mask_material.glb

Instruction: Apply Blue-grey striped matte woven fabric to the pillow

Articulation

Source

103761_mobility.glb

Target

103761_mobility_mod_1.glb

Mask

mask_articulation.glb

Instruction: Rotate the lid by approximately 90 degrees clockwise.

Structure

Source

18796_without_original-1_source.glb

Target

18796_complete_target.glb

Mask

mask_structure.glb

Instruction: Attach the tabletop

Part-Edit

Source

115.glb

Target

115_modified_original-1.glb

Mask

mask_partedit.glb

Instruction: Turn the handle to a textured, ergonomically shaped handle.

Resources

BibTeX

@inproceedings{fan2026omni3dedit,
  title  = {Omni3DEdit: A Unified 3D Editing Benchmark with Region Annotations},
  author = {Fan, Hongxing and Lu, Haotian and Chen, Rui and Yun, Weibin and Huang, Zehuan and Sheng, Lu},
  year   = {2026}
}
