Weakly-Aligned Cross-Modal Learning Framework for Subsurface Defect Segmentation on Building Facades Using Unmanned Aerial Vehicles

This study introduces a Weakly-aligned Cross-modal Learning (WCL) framework for subsurface defect segmentation using unmanned aerial vehicles (UAVs). The proposed WCL framework comprises two main components: the Multimodal Feature Description Network (MFDN) and the Prompt-aided Cross-modal Graph Learning (PCGL) algorithm. First, MFDN processes the undistorted RGB and infrared images to extract local feature descriptors, which are used to align the two modalities and compensate for the misalignment caused by UAV motion. Next, the PCGL algorithm identifies visually critical areas by applying graph partitioning to a prompt-aided Wasserstein graph. The critical visual areas are then transferred to the well-aligned infrared image, and a Wasserstein adjacency graph (WAG) is constructed based on masked superpixel segmentation. Finally, an edge-based method pinpoints the location and contour of defects by detecting abnormal vertices on the WAG. The practicality and efficiency of the proposed methodology are validated through controlled laboratory experiments on concrete samples and field applications on tiled facades.
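The idea of a Wasserstein adjacency graph over superpixels can be illustrated with a minimal sketch. This is not the authors' PCGL algorithm. It assumes a superpixel label map is already available, weights each pair of 4-adjacent regions by an approximate 1-D Wasserstein distance between their pixel values, and flags a vertex as abnormal when its mean incident edge weight exceeds a threshold. The function names, the quantile-based approximation, and the thresholding rule are all hypothetical choices made for illustration.

```python
import numpy as np

def wasserstein_1d(a, b, n_quantiles=100):
    """Approximate 1-D Wasserstein-1 distance by comparing quantile functions."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))

def build_graph(image, labels):
    """Return {(p, q): weight} for every pair of 4-adjacent regions p < q."""
    pairs = set()
    # Compare each pixel with its right neighbour, then with its lower neighbour.
    for u, v in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        mask = u != v
        for p, q in zip(u[mask].tolist(), v[mask].tolist()):
            pairs.add((min(p, q), max(p, q)))
    values = {k: image[labels == k] for k in np.unique(labels).tolist()}
    return {(p, q): wasserstein_1d(values[p], values[q]) for (p, q) in pairs}

def abnormal_vertices(edges, threshold):
    """Flag regions whose mean incident edge weight exceeds the threshold (assumed rule)."""
    incident = {}
    for (p, q), w in edges.items():
        incident.setdefault(p, []).append(w)
        incident.setdefault(q, []).append(w)
    return sorted(v for v, ws in incident.items() if np.mean(ws) > threshold)

# Toy thermal image: four quadrant "superpixels", region 3 is warmer than the rest.
labels = np.zeros((4, 4), dtype=int)
labels[:2, 2:] = 1
labels[2:, :2] = 2
labels[2:, 2:] = 3
image = np.full((4, 4), 20.0)
image[labels == 3] = 30.0

edges = build_graph(image, labels)
print(abnormal_vertices(edges, threshold=7.0))
```

In this toy case only the warm quadrant is flagged, because both of its edges carry a large weight while its neighbours each also share a zero-weight edge with the cool region. A real thermal image would need a proper superpixel segmentation and a data-driven threshold.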

Keywords: Infrared thermography, unmanned aerial vehicle, building facade inspection, subsurface defect segmentation, multimodal image alignment, prompt-aided multimodal graph learning.

Suggested Citation:

He, Sudao and Zhao, Gang and Chen, Jun and Zhang, Shenghan and Mishra, Dhanda and Yuen, Matthew MF, Weakly-Aligned Cross-Modal Learning Framework for Subsurface Defect Segmentation on Building Facades Using Unmanned Aerial Vehicles. Available at SSRN: https://ssrn.com/abstract=4845688 or http://dx.doi.org/10.2139/ssrn.4845688