### Can be extended to k-cliques and k-motifs:
- To list k-cliques: https://github.com/maxdan94/kClist
- To list k-motifs: https://github.com/maxdan94/kmotif
### Typos:
- ", cf.[23],"
- "conductance problem.Notice that"
- "(CNM) [12] , Cfinder"
- "Louvaine" instead of "Louvain" in Table 2
### Can be extended to k-cliques and k-motifs:
- To list k-cliques: https://github.com/maxdan94/kClist
- To list k-motifs: https://github.com/maxdan94/kmotif
### Typos:
- ", cf.[23],"
- "conductance problem.Notice that"
- "(CNM) [12] , Cfinder"
- "Louvaine" instead of "Louvain" in Table 2
Great paper!
### Comparison to existing methods:
Doing some experimental comparisons against [Eppstein and Wang 2004](https://papers-gamma.link/paper/35) for the closeness centrality might be interesting.
### Typos:
- $|\mathscr{B}_{G}(v,t)|-|\mathscr{B}_{G}(v,t-1)|$ instead of $|\mathscr{B}_{G}(v,t+1)|-|\mathscr{B}_{G}(v,t)|$. This is corrected in the other formula on centralities.
- "the the reciprocal of a"
- "can be easily computed in a cumulative fashion nothing that"
- "on the approximation the diameter"
- "its importance it by 1/2"
### Minors:
- "Nodes with empty coreachable set have centrality 1 by definition"
. By definition the coreachable set of a node is never empty, it contains at least the concerned node.
Great paper!
### Comparison to existing methods:
Doing some experimental comparisons against [Eppstein and Wang 2004](https://papers-gamma.link/paper/35) for the closeness centrality might be interesting.
### Typos:
- $|\mathscr{B}_{G}(v,t)|-|\mathscr{B}_{G}(v,t-1)|$ instead of $|\mathscr{B}_{G}(v,t+1)|-|\mathscr{B}_{G}(v,t)|$. This is corrected in the other formula on centralities.
- "the the reciprocal of a"
- "can be easily computed in a cumulative fashion nothing that"
- "on the approximation the diameter"
- "its importance it by 1/2"
### Minors:
- "Nodes with empty coreachable set have centrality 1 by definition"
. By definition the coreachable set of a node is never empty, it contains at least the concerned node.
Nice reference on supervised link prediction and comparisons to unsupervised techniques, which gives a good overview of possible methods and their limits. The general idea is to put forward the relevance of a supervised approach to predict links.
My main comment is that unsupervised and supervised approaches are so different that they are hardly comparable, and I guess that in most cases people would use supervised learning if their data allow them to do so...
__Summary__
A few questions discussed in this paper:
_Comparison unsupervised vs supervised_
Unsupervised: we use a score (for pairs of nodes) correlated to the existence of a link, e.g. the Adar-Adamic index, pairs are ranked according to this score, then the top score pairs are selected for prediction.
Supervised: it turns to a standard binary classification task, with a ground truth to evaluate the quality of the results, so we can define usual evaluation metrics: precision, recall, ROC curve, etc.
_The problem of class imbalance_
When comparing the number of links that actually exists ($M$ which is more or less of the order of magnitude of $N$ in real networks), to the number of possible links (in $N^2$), the ratio is in $\frac{1}{N}$.
It implies that the class of existing links is much smaller than the class of possible links, making the classification task imbalanced. This problem tends to worsen with the size of the network.
_Variance reduction_
With a supervised approach, it is possible to use ensemble frameworks to reduce variance and avoid overfitting (e.g. bagging, random forests). The broad idea is to transform the learning into an ensemble of learnings and average the results (typically via voting between classes).
_Class of pairs_
To improve the prediction quality, they divide pairs into classes depending on the distance in the graph between nodes, then learn on each class independently. It reduces drastically the ratio (existing link) / (possible link) in each of these classes.
Nice reference on supervised link prediction and comparisons to unsupervised techniques, which gives a good overview of possible methods and their limits. The general idea is to put forward the relevance of a supervised approach to predict links.
My main comment is that unsupervised and supervised approaches are so different that they are hardly comparable, and I guess that in most cases people would use supervised learning if their data allow them to do so...
__Summary__
A few questions discussed in this paper:
_Comparison unsupervised vs supervised_
Unsupervised: we use a score (for pairs of nodes) correlated to the existence of a link, e.g. the Adar-Adamic index, pairs are ranked according to this score, then the top score pairs are selected for prediction.
Supervised: it turns to a standard binary classification task, with a ground truth to evaluate the quality of the results, so we can define usual evaluation metrics: precision, recall, ROC curve, etc.
_The problem of class imbalance_
When comparing the number of links that actually exists ($M$ which is more or less of the order of magnitude of $N$ in real networks), to the number of possible links (in $N^2$), the ratio is in $\frac{1}{N}$.
It implies that the class of existing links is much smaller than the class of possible links, making the classification task imbalanced. This problem tends to worsen with the size of the network.
_Variance reduction_
With a supervised approach, it is possible to use ensemble frameworks to reduce variance and avoid overfitting (e.g. bagging, random forests). The broad idea is to transform the learning into an ensemble of learnings and average the results (typically via voting between classes).
_Class of pairs_
To improve the prediction quality, they divide pairs into classes depending on the distance in the graph between nodes, then learn on each class independently. It reduces drastically the ratio (existing link) / (possible link) in each of these classes.
Comments: