New Perspectives and Methods in Link Prediction

☆

https://www.researchgate.net/profile/Nitesh_Chawla/publication/221654448_New_perspectives_and_methods_in_link_prediction/links/0deec51ff3a2aa4a88000000/New-perspectives-and-methods-in-link-prediction.pdf

Authors: Ryan N. Lichtenwalter, Jake T. Lussier, Nitesh V. Chawla
Liked by:
Domains: Dynamic graph
Tags: Link prediction

Uploaded by: Open Reading Group
Upload date: 2018-02-08 11:01:08

Comments:

Nice reference on supervised link prediction and comparisons to unsupervised techniques, which gives a good overview of possible methods and their limits. The general idea is to put forward the relevance of a supervised approach to predict links. My main comment is that unsupervised and supervised approaches are so different that they are hardly comparable, and I guess that in most cases people would use supervised learning if their data allow them to do so... __Summary__ A few questions discussed in this paper: _Comparison unsupervised vs supervised_ Unsupervised: we use a score (for pairs of nodes) correlated to the existence of a link, e.g. the Adar-Adamic index, pairs are ranked according to this score, then the top score pairs are selected for prediction. Supervised: it turns to a standard binary classification task, with a ground truth to evaluate the quality of the results, so we can define usual evaluation metrics: precision, recall, ROC curve, etc. _The problem of class imbalance_ When comparing the number of links that actually exists ($M$ which is more or less of the order of magnitude of $N$ in real networks), to the number of possible links (in $N^2$), the ratio is in $\frac{1}{N}$. It implies that the class of existing links is much smaller than the class of possible links, making the classification task imbalanced. This problem tends to worsen with the size of the network. _Variance reduction_ With a supervised approach, it is possible to use ensemble frameworks to reduce variance and avoid overfitting (e.g. bagging, random forests). The broad idea is to transform the learning into an ensemble of learnings and average the results (typically via voting between classes). _Class of pairs_ To improve the prediction quality, they divide pairs into classes depending on the distance in the graph between nodes, then learn on each class independently. It reduces drastically the ratio (existing link) / (possible link) in each of these classes.

Open Reading Group at 2018-05-10 16:51:29

Papers^γ

Comments:

Please consider to register or login to comment on the paper.

Papersγ

Comments:

Please consider to register or login to comment on the paper.

Papers^γ