Cross-modal deep learning framework for 3D reconstruction and information integration of Zhejiang wood carving heritage.
Juya Wang, Haifeng Xu
Abstract
Current 3D reconstruction methods employ fixed-weight fusion strategies that ignore the local surface characteristics of heritage artifacts. This study introduces an adaptive cross-modal deep learning framework for Zhejiang wood carving heritage, built around surface-complexity-aware gating networks that dynamically weight the geometric and visual modalities according to their local informativeness. The approach was validated on a newly constructed dataset of 300 annotated artifacts captured with a hybrid setup combining laser scanning and eight-camera RGB-D arrays. The framework achieves a Chamfer Distance of 0.52 mm and an F-Score of 86.7%, improvements of 20% and 6.8%, respectively, over 3D Gaussian Splatting. Semantic segmentation reaches 76.3% mIoU, surpassing point-cloud-only methods by 11.7%, with particularly strong performance in preserving intricate openwork structures and relief patterns. By weighting each modality according to local surface complexity, the adaptive fusion strategy addresses the heterogeneous nature of wood carving surfaces and establishes new benchmarks for heritage reconstruction accuracy. The framework enables museums and conservation institutions to document collections at sub-millimeter accuracy while maintaining semantic context. It also supports virtual apprenticeship systems for traditional craft education and intelligent heritage resource management through automated cataloging and quantitative deterioration assessment.
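To make the idea of surface-complexity-aware gating concrete, the following is a minimal sketch and not the authors' implementation. It assumes a scalar local complexity score in [0, 1], a sigmoid gate with hypothetical `scale` and `bias` parameters (which would be learned in a real network), and the assumption that the geometric branch receives more weight as complexity rises. The function names `gate_weight` and `fuse_features` are illustrative only.

```python
import math
from typing import List, Sequence


def gate_weight(complexity: float, scale: float = 4.0, bias: float = -2.0) -> float:
    """Return the weight given to the geometric branch, in (0, 1).

    `complexity` is an assumed scalar score of local surface complexity.
    `scale` and `bias` are hypothetical stand-ins for learned gate parameters.
    """
    return 1.0 / (1.0 + math.exp(-(scale * complexity + bias)))


def fuse_features(
    geo_feat: Sequence[float],
    vis_feat: Sequence[float],
    complexity: float,
) -> List[float]:
    """Blend geometric and visual feature vectors with a complexity-dependent gate.

    A fixed-weight scheme would use a constant w for every surface patch;
    here w varies with the local complexity score of the patch.
    """
    if len(geo_feat) != len(vis_feat):
        raise ValueError("feature vectors must have the same length")
    w = gate_weight(complexity)
    # Convex combination: w for geometry, (1 - w) for appearance.
    return [w * g + (1.0 - w) * v for g, v in zip(geo_feat, vis_feat)]


if __name__ == "__main__":
    geo = [1.0, 0.0]   # toy geometric feature
    vis = [0.0, 1.0]   # toy visual feature
    for c in (0.0, 0.5, 1.0):
        print(c, fuse_features(geo, vis, c))
```

Under these assumptions, a flat region (low complexity) leans on the visual features, while a deeply carved or openwork region (high complexity) leans on the geometric features. A fixed-weight baseline corresponds to replacing `gate_weight` with a constant.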