Wellcome to Open Reading Group's library,

  You can find here all papers liked or uploaded by Open Reading Group
  together with brief user bio and description of her/his academic activity.


### Upcoming readings: No upcoming readings for now... ### Past Readings: - 21/05/2018 [Graph Embedding Techniques, Applications, and Performance: A Survey](https://papers-gamma.link/paper/52) - 14/05/2018 [A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem](https://papers-gamma.link/paper/24) - 07/05/2018 [VERSE: Versatile Graph Embeddings from Similarity Measures](https://papers-gamma.link/paper/48) - 30/04/2018 [Hierarchical Clustering Beyond the Worst-Case](https://papers-gamma.link/paper/45) - 16/04/2018 [Scalable Motif-aware Graph Clustering](https://papers-gamma.link/paper/18) - 02/04/2018 [Practical Algorithms for Linear Boolean-width](https://papers-gamma.link/paper/40) - 26/03/2018 [New Perspectives and Methods in Link Prediction](https://papers-gamma.link/paper/28/New%20Perspectives%20and%20Methods%20in%20Link%20Prediction) - 19/03/2018 [In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond](https://papers-gamma.link/paper/37) - 12/03/2018 [Diversity is All You Need: Learning Skills without a Reward Function](https://papers-gamma.link/paper/36) - 05/03/2018 [When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors](https://papers-gamma.link/paper/23) - 26/02/2018 [Fast Approximation of Centrality](https://papers-gamma.link/paper/35/Fast%20Approximation%20of%20Centrality) - 19/02/2018 [Indexing Public-Private Graphs](https://papers-gamma.link/paper/19/Indexing%20Public-Private%20Graphs) - 12/02/2018 [On the uniform generation of random graphs with prescribed degree sequences](https://papers-gamma.link/paper/26/On%20the%20uniform%20generation%20of%20random%20graphs%20with%20prescribed%20d%20egree%20sequences) - 05/02/2018 [Linear Additive Markov Processes](https://papers-gamma.link/paper/21/Linear%20Additive%20Markov%20Processes) - 29/01/2018 [ESCAPE: Efficiently Counting All 5-Vertex Subgraphs](https://papers-gamma.link/paper/17/ESCAPE:%20Efficiently%20Counting%20All%205-Vertex%20Subgraphs) - 22/01/2018 [The k-peak Decomposition: Mapping the Global Structure of Graphs](https://papers-gamma.link/paper/16/The%20k-peak%20Decomposition:%20Mapping%20the%20Global%20Structure%20of%20Graphs)

Comments:

## About scalability: "We apply a standard cleaning procedure (for similarity) and remove high out-degrees. In other words, if some vertex v has more than 10K followers (outdegree > 10K), we remove all these edges. (We do not remove the vertex, but rather only its out-edges.) Intuitively, the fact that two vertices are followed by v is not a useful signal for similarity. In flock and friendster, such vertices are typically spammers and should be ignored. For webgraphs, a page linking to more than 10K other pages is probably not useful for similarity measurement." - Typo: should be "more than 10K followeEs (outdegree > 10K)" not "followers". - Such a cleaning procedure makes sense, in addition, it has a valuable side effect: in practice, after applying such a cleaning procedure, many pairs of nodes with very small non-zero similarity (in the original directed graph) have a similarity of zero (in the new graph with the out-edges of the out-hubs removed). - Given that such a cleaning procedure is applied, a brute force approach (that computes the similarity of each pair of nodes sharing at least one in-neighbor) is scalable to a large extent. - The following C code and experiments support the previous claim: https://github.com/maxdan94/cosineSparseMatrix - If scalability is still an issue, in-edges of nodes with very high in-degree could also be removed. "In our example, this turns out to be more than 100 trillion triples. This is infeasible even for a large industrial-strength cluster.". Listing 100 trillion triples seems doable with a medium-strength cluster. "If a matrix A has a 100 billion non-zeroes, it takes upwards of 1TB just to store the entries. This is more than an order of magnitude of the storage of a commodity machine in a cluster. Any approach of partitioning A into submatrices cannot scale." Indeed, memory seems to be a problem. However, it is unclear why an approach partitioning A into submatrices cannot scale. ## Dataset: "The dataset friendster is a social network, and is available from the Stanford Large Network Dataset Collection [28]." It is unclear which friendster dataset is used. In SNAP, http://snap.stanford.edu/data/com-Friendster.html seems to be "undirected" and has 1.8B edges, not "1.6B edges" as written in Table 1 (even after removing edges of nodes with degree larger than 10,000). ## Typos and minors: - Inside the paper $O(100B)$ looks a bit strange, because according to [the definition](https://en.wikipedia.org/wiki/Big_O_notation#Formal_definition) the set $O(100B)$ is equal to $O(0.00001)$ - should be "more than 10K followeEs (outdegree > 10K)" not "followers".
Read the paper, add your comments…

Comments:

### Can be extended to k-cliques and k-motifs: - To list k-cliques: https://github.com/maxdan94/kClist - To list k-motifs: https://github.com/maxdan94/kmotif ### Typos: - ", cf.[23]," - "conductance problem.Notice that" - "(CNM) [12] , Cfinder" - "Louvaine" instead of "Louvain" in Table 2
Read the paper, add your comments…

Comments:

It's a very nice paper. There are some parts I don't fully understand yet, as I have to read more on hyperLoglog counter and webgraph framework. ### Comparison to existing methods: Doing some experimental comparisons to [Eppstein and Wang 2004](https://papers-gamma.link/paper/35) for the closeness centrality might be interesting. ### Typos: - $\mathscr{B}_{G}(x,r)=\{y | d(x,y) \leq r\}$ instead of $\mathscr{B}_{G}(x,r)=\{y | d(x,y) \leq t\}$ - $|\mathscr{B}_{G}(v,t)|-|\mathscr{B}_{G}(v,t-1)|$ instead of $|\mathscr{B}_{G}(v,t+1)|-|\mathscr{B}_{G}(v,t)|$. This is corrected in the other formula on centralities. - "the the reciprocal of a" - "can be easily computed in a cumulative fashion nothing that" - "on the approximation the diameter" - "its importance it by 1/2" ### Minors: - "Nodes with empty coreachable set have centrality 1 by definition" . By definition the coreachable set of a node is never empty, it contains at least the concerned node.
Read the paper, add your comments…
Pages: 1 2 3 4 5 6 7 8 9 10 11 12