  ##To take away:## - This paper is about a slight improvement of the $k$-clique Algorithm of Chiba and Nishizeki - The performance in practice on sparse graphs is impressive - The parallelization is non-trivial and the speedup is nearly optimal up to 40 threads - Authors generate a stream of k-cliques to compute "compact" subgraphs - A parallel C code is available here: https://github.com/maxdan94/kClist ##Suggestions to extend this work:## - Can we find a node ordering better than the core ordering? - Generate a stream of $k$-cliques to compute other quantities? - Generalize the algorithm to $k$-motifs? - Parallelization on higher order $k$-cliques if more threads are available?
> Another extension: can we guarantee a given order on the output stream? that it is uniformly random, for instance? I think that this is a very interesting and open question! I have tried to generate a stream of k-cliques such that the order is random by modifying the kClist algorithm. But I was not able to do so. I wanted to do that in order to optimize a function depending on the k-cliques using stochastic gradient descent. I have found that using a random ordering lead to a faster convergence than using the order the k-cliques are outputed by the kClist algorithm. Here is what I've tried: - If you have enough RAM, then you can of course store all k-cliques and do a [random permutation](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle). But, since you mention "steam", I do not think that this is the case for you. - You can use another node-ordering (different from the core-ordering) to form the DAG. You can use, for instance, a random node ordering. You may lose the theoretical upperbound on the running time, but you will see that, in practice, the algorithm is still very fast (say twice slower than with the core ordering (but this depends on the input graph and k, you may also find some settings where it is actually faster than with the core ordering)). The order the k-cliques are stream will then change, but it will not be uniform at random. - Once you have formed the DAG using the node ordering (core ordering or any other ordering), you do not need to process the nodes in that same order. You can use another random ordering for that. It will add some randomness in the stream, but the order will still not be uniform at random. Please let me know if you have any better ideas.
> Another extension: can we guarantee a given order on the output stream? that it is uniformly random, for instance? One possible way to do is using a buffer, however, it remains non uniform. A buffer of size n/100 can be filled at first using the first n/100 output. Afterwards, one K-clique is randomly selected to be « outputted » and replaced with a new k-clique. The larger the buffer, the closer the ouput will be to a uniformly random output.

### Very nice application to graph classification / clustering: Frequencies of motifs are graph's features that can be plugged in a Machine Learning algorithm in order to perform classification or clustering of a set of graphs. ### Very nice application to link prediction / recommendation: page 8 "Trends in pattern counts". Some motifs are rarely observed as induced motifs. Thus, if such a motif is seen at time $t$ in a dynamic graph, then an edge between a pair of its vertices might be created (or be deleted) at time $t+1$. Note that the framework only counts motifs without listing/enumerating them. Some adaptations are thus needed to deal with link prediction and recommendation. ### Application to graph generation: Generate a random graph with a given number of each of the 21 possible 5-motifs. ### Some statements might be a little too strong: - "As we see in our experiments, the counts of 5-vertex patterns are in the orders of billions to trillions, even for graphs with a few million edges. An enumeration algorithm is forced to touch each occurrence, and cannot terminate in a reasonable time." - The framework is absolutely critical for exact counting, since enumeration is clearly infeasible even for graphs with a million edges. For instance, an autonomous systems graph with 11M edges has $10^{11}$ instances of pattern 17 (in Fig. 1)." - "For example, the tech-as-skitter graph has 11M edges, but 2 trillion 5-cycles. Most 5-vertex pattern counts are upwards of many billions for most of the graphs (which have only 10M edges). Any algorithm that explicitly touches each such pattern is bound to fail." The following C code generates a program that counts from 1 to one trillion. This program terminates within 30 minutes on a commodity machine (with -O3 option for the compiler optimization). c main() { unsigned long long i=0; while (i ＜ 1e12) { i++; } }  This shows that "touching" billions or trillions of motifs might actually be possible in a reasonable amount of time. ### Minors: - "Degeneracy style ordering" might be confusing as a degree ordering is used and not a degeneracy ordering (i.e. k-core decomposition). - Figure 7: "The line is defined by...", should be "The prediction is defined by..." as the line is just "$y=x$". ### Typos: - "they do not scale graphs of such sizes." - "For example, the 6th 4-pattern in the four-clique and the 8th 5-pattern is the five-cycle." - "and $O\big(n+m+T(G)$ storage." - "An alternate (not equivalent) measure is the see the fraction of 4-cycle" - "an automonous systems graph with" ### Others: - https://oeis.org/A001349 - Is it possible to generalize the framework to any k?
### Nice validation of the obtained embedding: The obtained embedding seems useful for: (i) node clustering, (ii) link prediction and (iii) node classification. ### Concerns about the scalability of Louvain: The Louvain algorithm can process Orkut in a few seconds on a commodity machine, contrary to what Table 2 suggests. The obtained modularity is of 0.68 such as written in Table 3 (the table presenting the dataset). The value of 0.68 is much more than the modularity obtained thanks to embeddings and k-means. ### Table 2 reporting the average and worst case running time is confusing: In Table 2, $\Theta$ is used to denote the average running time and $O$ is used to denote the worst case running time. This is very confusing as worst-case running time can be given in terms of big-oh or theta. Definitions are given here: https://en.wikipedia.org/wiki/Big_O_notation It is also not clear how these asymptotic complexities were obtained, in particular, the asymptotic time and memory complexities of the suggested algorithm are not proven.
Hi! > If you see the matrix $W$ used in LINE as the similarity matrix you use in VERSE, then yes I think that what I have suggested is similar to LINE. So VERSE can be seen as using LINE on the input similarity matrix? Is that correct? VERSE can be seen as going beyond this ;) If we just plug similarities into LINE training objective, we observe that (Fig. 3, NS line) the objective does not recover similarities at all! This is very surprising, and suggests that current embedding methods such as node2vec or LINE do not optimize the actual objective they claim to optimize. When I discovered that, I thought this is just a bug in the code, so I have made a version of VERSE that optimizes the random walk based similarity from DeepWalk. It turned out that DeepWalk does not preserve the similarity ranking of these walks. > I've missed that. You mean that in LINE, "negative sampling" is used for the optimization; while in VERSE, "Noise Contrastive Estimation" is used? Yes! To overcome the aforementioned problem, we have proposed an NCE-based sampling solution. It seems to work much better than plain negative sampling, and show that it is close to the full neural factorization of the matrix (which is only possible for VERSE model)! > >> Perhaps, an interesting research direction would be something like embedding-guided modularity optimization. > > Maybe using the modularity-matrix (https://en.wikipedia.org/wiki/Modularity_(networks)#Matrix_formulation) as the similarity matrix used in VERSE? > This matrix can have negative entries though... Yes, modularity matrix would probably not suit the algorithm well, as it is not probabilistic per node. Maybe some reformulation would help, I am not aware of any to this day.