Nice reference on supervised link prediction and comparisons to unsupervised techniques, which gives a good overview of possible methods and their limits. The general idea is to put forward the relevance of a supervised approach to predict links. My main comment is that unsupervised and supervised approaches are so different that they are hardly comparable, and I guess that in most cases people would use supervised learning if their data allow them to do so... __Summary__ A few questions discussed in this paper: _Comparison unsupervised vs supervised_ Unsupervised: we use a score (for pairs of nodes) correlated to the existence of a link, e.g. the Adar-Adamic index, pairs are ranked according to this score, then the top score pairs are selected for prediction. Supervised: it turns to a standard binary classification task, with a ground truth to evaluate the quality of the results, so we can define usual evaluation metrics: precision, recall, ROC curve, etc. _The problem of class imbalance_ When comparing the number of links that actually exists ($M$ which is more or less of the order of magnitude of $N$ in real networks), to the number of possible links (in $N^2$), the ratio is in $\frac{1}{N}$. It implies that the class of existing links is much smaller than the class of possible links, making the classification task imbalanced. This problem tends to worsen with the size of the network. _Variance reduction_ With a supervised approach, it is possible to use ensemble frameworks to reduce variance and avoid overfitting (e.g. bagging, random forests). The broad idea is to transform the learning into an ensemble of learnings and average the results (typically via voting between classes). _Class of pairs_ To improve the prediction quality, they divide pairs into classes depending on the distance in the graph between nodes, then learn on each class independently. It reduces drastically the ratio (existing link) / (possible link) in each of these classes.

You comment anonymously! You will not be able to edit/delete the comment.

Please consider to register or login.

Use $\LaTeX$ to type formulæ and markdown to format text.
When you post something to which you hold the copyright you authorise us to do distribute this data across the scientific community. You can post public domain content. All user-generated content will be freely available online. Please see this page to learn more about Papersγ's terms of use and privacy policy.