H-HRGAN: Knowledge Graph-Driven Representation for Missing Value Imputation
17 Pages Posted: 21 Apr 2025
Abstract
Missing data is a pervasive challenge in real-world applications, often arising from non-responses, sensor failures, or system inconsistencies. While numerous imputation techniques have been proposed, most are designed for continuous variables and tend to perform poorly on categorical data. In structured formats such as tabular datasets, categorical variables exhibit intricate semantic relationships that conventional methods relying on one-hot encodings or statistical heuristics capture inadequately. To address this issue, we represent each discrete feature value as a node and treat attributes as relations, thereby constructing a knowledge graph. We then propose the Heterogeneous-Homogeneous Relational Graph Attention Network (H-HRGAN), a novel framework for imputing missing categorical values. A hierarchical graph structure is constructed to separately capture heterogeneous attribute-value relations and homogeneous co-occurrence patterns, and a relational graph attention mechanism performs multi-level reasoning over this structure. This mechanism allows a Graph Neural Network (GNN) to aggregate features in a more principled way, leading to improved predictive performance on categorical data. Extensive experiments on multiple real-world datasets demonstrate that H-HRGAN outperforms state-of-the-art imputation methods, particularly under high missingness rates and complex dependency scenarios.
Keywords: Data imputation, Knowledge graph, Knowledge representation, Graph neural network
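The graph construction described in the abstract (discrete feature values as nodes, attributes as relations, plus separate co-occurrence patterns) can be illustrated with a minimal sketch. The abstract does not specify the exact construction, so everything here is an assumption made for illustration: the hypothetical `build_graph` helper, the use of one record node per row, the `attribute=value` node naming, and counting co-occurrence per row. It is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

MISSING = None  # assumed marker for a missing categorical cell


def build_graph(rows):
    """Turn tabular rows into two edge sets (illustrative assumption only).

    rows: list of dicts mapping attribute name -> categorical value or MISSING.
    Returns:
      triples      - heterogeneous edges (record_node, attribute, value_node),
                     where the attribute plays the role of the relation type
      cooccurrence - homogeneous edges between value nodes, weighted by how
                     many rows contain both values
    """
    triples = []
    cooccurrence = Counter()
    for i, row in enumerate(rows):
        # Keep only observed cells; missing cells contribute no edges.
        observed = sorted(
            (attr, val) for attr, val in row.items() if val is not MISSING
        )
        record = f"row:{i}"
        for attr, val in observed:
            triples.append((record, attr, f"{attr}={val}"))
        # Every pair of observed values in the same row co-occurs once.
        for (a1, v1), (a2, v2) in combinations(observed, 2):
            cooccurrence[(f"{a1}={v1}", f"{a2}={v2}")] += 1
    return triples, cooccurrence


# Toy table with two missing cells.
rows = [
    {"color": "red", "size": "S", "shape": "round"},
    {"color": "red", "size": MISSING, "shape": "round"},
    {"color": "blue", "size": "L", "shape": MISSING},
]
triples, cooccurrence = build_graph(rows)
```

In this sketch the missing `size` in the second row produces no edge; an imputation model would then score candidate value nodes such as `size=S` or `size=L` for that record, for example by attending over the record's observed neighbours and their co-occurrence weights.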