TY - JOUR
T1 - Unveiling the unseen potential of graph learning through MLPs
T2 - Effective graph learners using propagation-embracing MLPs
AU - Shin, Yong Min
AU - Shin, Won Yong
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10/9
Y1 - 2024/10/9
N2 - Recent studies have attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs by training a student MLP via knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has not been systematically studied how to inject structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation T and propagation Π, we reframe the KD process as enabling the student MLP to explicitly learn both T and Π. Although this can be achieved by applying the inverse propagation Π⁻¹ before distillation from the teacher GNN, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation Π⁻¹. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing a further performance boost for the student MLP.
AB - Recent studies have attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs by training a student MLP via knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has not been systematically studied how to inject structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation T and propagation Π, we reframe the KD process as enabling the student MLP to explicitly learn both T and Π. Although this can be achieved by applying the inverse propagation Π⁻¹ before distillation from the teacher GNN, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation Π⁻¹. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing a further performance boost for the student MLP.
KW - Graph neural network
KW - Knowledge distillation
KW - Multilayer perceptron
KW - Propagation
KW - Semi-supervised node classification
UR - http://www.scopus.com/inward/record.url?scp=85200517524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200517524&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.112297
DO - 10.1016/j.knosys.2024.112297
M3 - Article
AN - SCOPUS:85200517524
SN - 0950-7051
VL - 301
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 112297
ER -