TY - JOUR
T1 - Propagate and Inject
T2 - 42nd International Conference on Machine Learning, ICML 2025
AU - Um, Daeho
AU - Kim, Sunoh
AU - Park, Jiwoong
AU - Lim, Jongin
AU - Ahn, Seong Jin
AU - Park, Seulki
N1 - Publisher Copyright:
© 2025, ML Research Press. All rights reserved.
PY - 2025
Y1 - 2025
N2 - In this paper, we address learning tasks on graphs with missing features, enhancing the applicability of graph neural networks to real-world graph-structured data. We identify a critical limitation of existing imputation methods based on feature propagation: they produce channels with nearly identical values within each channel, and these low-variance channels contribute very little to performance in graph learning tasks. To overcome this issue, we introduce synthetic features that target the root cause of low-variance channel production, thereby increasing variance in these channels. By preventing propagation-based imputation methods from generating meaningless feature values shared across all nodes, our synthetic feature propagation scheme mitigates significant performance degradation, even under extreme missing rates. Extensive experiments demonstrate the effectiveness of our approach across various graph learning tasks with missing features, ranging from low to extremely high missing rates. Additionally, we provide both empirical evidence and theoretical proof to validate the low-variance problem. The source code is available at https://github.com/daehoum1/fisf.
AB - In this paper, we address learning tasks on graphs with missing features, enhancing the applicability of graph neural networks to real-world graph-structured data. We identify a critical limitation of existing imputation methods based on feature propagation: they produce channels with nearly identical values within each channel, and these low-variance channels contribute very little to performance in graph learning tasks. To overcome this issue, we introduce synthetic features that target the root cause of low-variance channel production, thereby increasing variance in these channels. By preventing propagation-based imputation methods from generating meaningless feature values shared across all nodes, our synthetic feature propagation scheme mitigates significant performance degradation, even under extreme missing rates. Extensive experiments demonstrate the effectiveness of our approach across various graph learning tasks with missing features, ranging from low to extremely high missing rates. Additionally, we provide both empirical evidence and theoretical proof to validate the low-variance problem. The source code is available at https://github.com/daehoum1/fisf.
UR - https://www.scopus.com/pages/publications/105023829078
M3 - Conference article
AN - SCOPUS:105023829078
SN - 2640-3498
VL - 267
SP - 60530
EP - 60560
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 13 July 2025 through 19 July 2025
ER -