affiliation not provided to SSRN
Keywords: Indoor scene understanding, 3D scene graph, 3D point cloud, Visual-text model, Multimodal features