### Nice validation of the obtained embedding: The obtained embedding seems useful for: (i) node clustering, (ii) link prediction and (iii) node classification. ### Concerns about the scalability of Louvain: The Louvain algorithm can process Orkut in a few seconds on a commodity machine, contrary to what Table 2 suggests. The obtained modularity is of 0.68 such as written in Table 3 (the table presenting the dataset). The value of 0.68 is much more than the modularity obtained thanks to embeddings and k-means. ### Table 2 reporting the average and worst case running time is confusing: In Table 2, $\Theta$ is used to denote the average running time and $O$ is used to denote the worst case running time. This is very confusing as worst-case running time can be given in terms of big-oh or theta. Definitions are given here: https://en.wikipedia.org/wiki/Big_O_notation It is also not clear how these asymptotic complexities were obtained, in particular, the asymptotic time and memory complexities of the suggested algorithm are not proven.
### VERSE Embedding Model "We define the unnormalized distance between two nodes u,v in the embedding space as the dot product of their embeddings Wu·W⊤v. The similarity distribution in the embedded space is then normalized with softmax." What is the intuition behind the softmax?
> Perhaps, an interesting research direction would be something like embedding-guided modularity optimization. Maybe using the modularity-matrix (https://en.wikipedia.org/wiki/Modularity_(networks)#Matrix_formulation) as the similarity matrix used in VERSE? This matrix can have negative entries though...
Hi! > If you see the matrix $W$ used in LINE as the similarity matrix you use in VERSE, then yes I think that what I have suggested is similar to LINE. So VERSE can be seen as using LINE on the input similarity matrix? Is that correct? VERSE can be seen as going beyond this ;) If we just plug similarities into LINE training objective, we observe that (Fig. 3, NS line) the objective does not recover similarities at all! This is very surprising, and suggests that current embedding methods such as node2vec or LINE do not optimize the actual objective they claim to optimize. When I discovered that, I thought this is just a bug in the code, so I have made a version of VERSE that optimizes the random walk based similarity from DeepWalk. It turned out that DeepWalk does not preserve the similarity ranking of these walks. > I've missed that. You mean that in LINE, "negative sampling" is used for the optimization; while in VERSE, "Noise Contrastive Estimation" is used? Yes! To overcome the aforementioned problem, we have proposed an NCE-based sampling solution. It seems to work much better than plain negative sampling, and show that it is close to the full neural factorization of the matrix (which is only possible for VERSE model)! > >> Perhaps, an interesting research direction would be something like embedding-guided modularity optimization. > > Maybe using the modularity-matrix (https://en.wikipedia.org/wiki/Modularity_(networks)#Matrix_formulation) as the similarity matrix used in VERSE? > This matrix can have negative entries though... Yes, modularity matrix would probably not suit the algorithm well, as it is not probabilistic per node. Maybe some reformulation would help, I am not aware of any to this day.

Overall a very nice paper. Given a small set of query nodes, how can we find its community (a "compact" connected subgraph strongly related to the query nodes)? An optimization is formalized and an efficient optimal algorithm (inspired by the k-core decomposition) to solve the optimization is devised. Efficient heuristics are also suggested. The following points could be improved. ### Not clear examples of monotone functions: - "Example 2 (Minimum degree) Let $f_m(G)$ be the minimum degree of any node in $G$. The function $f_m$ is monotone.". That is not true: when removing nodes from a given graph, the minimum degree can increase and then decrease. - "Example 3 (Distance) The functions $D_Q(G, v)$ and $D_Q(G)$, defined by Equations (1) and (2) are node-monotone and monotone, respectively." That is not true: "$D_Q(G)$" is not monotone: it can decrease and increase when removing nodes. In addition, with this definitions the query nodes should not be removed, this is not specified. - "A lower bound on the number of nodes in a graph is monotone". This is not clear. I think that what is meant is not that any function that is a lower bound on the number of nodes (e.g., $f(V)=|V|/2$) is monotone, but that requiring a lower bound on the number of nodes is monotone. - "$M_Q(G)=\max_{v\in V(G)}\{M_Q(G,v)\}$ is monotone". That is not true. - "The number of disjoint paths between two nodes (which is a popular measure for friendship strength) is node-monotone non-increasing." This is not clear, the function should of the form $f_m(G,v)$, is $v$ one of the two nodes? - "with a monotone function (such as maximum distance and minimum degree).". It is not clear what "maximum distance" means. If it means "the diameter of the graph", then this function is not monotone as it can increase or decrease when removing nodes. - Note that only node-monotone functions are of interest for the suggested method and monotone functions are not used. "Problem 3 (Cocktail party) We are given an undirected graph $G=(V, E)$, a node-monotone non-increasing function $f$, as well as a set of monotone non-increasing properties $f_1,...,f_k$. We seek to find an induced subgraph $H$ of $G$ that maximizes $f$ among all induced subgraphs of $G$ and satisfies $f_1,...,f_k$. A similar problem can be defined by considering to minimize monotone non-decreasing functions.". In that problem "monotone" should be replaced by "node-monotone". ### Experiments: - "We implemented our algorithms in Perl and all experiments run on a dual-core Opteron processor at 3GHz." The implementation does not seem to be publicly available. - I think that more informations on how to reconstruct the graphs on which the experiments are run could be given. Some of the datasets do not seem to be publically available. - In Table 1, I think that the first line is about the distance constraint. The number of query nodes and how they are picked does not seem to be specified. - As the experiments in Table 1, Figure 1 and Figure 2 are the results of an average, error bars could be given to see how significant the differences are. ### Implementation details and asymptotic complexity: - An asymptotic time complexity is not given. - Implementation details are lacking. In particular, it is not clear how to check that the query nodes are connected at each step. This does not seem to be so trivial. If a BFS has to be run everytime a node is deleted, then the algorithm would be in $\Omega(n^2)$ and might be too slow for large graphs. ### Comparison against connectivity subgraph (reference [10], [28] and [21]): "Faloutsos et al. [10], Tong et al. [28], as well as other researchers have studied the problem of finding a subgraph that connects a set of query nodes in a graph. The main difference of our approach is that we are not just interested in connecting the query nodes, but also in finding a meaningful community of query nodes." To me "community search" is similar to "connectivity subgraph": if the upper bound on the number of nodes is small enough then indeed "connectivity subgraph" can be seen as "just connecting the query nodes". However, if this upper bound $k$ is higher, then the goal of "connectivity subgraph" is to find $k$ nodes connecting the query nodes and relevant to it, which is similar to the goal of "community search". I think that a comparison to the "connectivity subgraph" line of work can be carried. At least for the "case study" on the communities of Papadimitriou in DBLP. ### Node-monotone functions: Maybe defining a node-monotone function as a function $f(G,U,v)$ of the input graph $G=(V,E)$, a set of vertices $U\subset V$ and a node $v\in U$ is interesting. This would allow taking the outside of the induced subgraph on $U$ into account. As an example: the number of neighbors $v$ has in $U$ is node-monotone non-increasing, but the number of neighbors $v$ has in $V\setminus U$ is node-monotone non-decreasing (this second function is hard to express with the definition of node-monotone function given in the paper). ### Typos: - "Graphs is one of most ubiquitous data representations" ->? "Graphs are one of the most ubiquitous data representations" - "there is need for" ->? "there is a need for" - "It has been one of them most well-studied problems" -> "one of the most" - "Brunch and bound" :) -> "Branch and bound" - "in order to extracting informations" -> "in order to extract informations" - "divided by all possible possible edges" - "that are far way from" ->? "far away from" - "We start by present an optimal algorithm" - "$G_{T−1}$": "T" -> "t". - "we still assume that $d=|V|$" -> "$d=|V|^3$" - "Now are are ready" - "The dist ance" - ", which we denote be $q=|Q|$" - "for the the heuristic" - "This reason is that" -> "The reason is that" - "with our heuristics This is shown in": "." lacking. - " we think that it is quite remarkable that GreedyDist finds subgraph average distance more than 5" -> "finds a subgraph of average degree more than 5", and not "distance". - "we can see from indeed" - "it belongs in" ->? "it belongs to" - "The aim is to find compact a community" ->? "The aim is to find a compact community" - "the situation is not so agreeable" ->? "the situation is not so pleasant" - "and it is densely connected" ->? "and is densely connected"
### Wrong reference: I read "It is also known that, for $m >\sqrt{2 n}$, the problem (49) has no local optima except the global ones [BM03]." In [BM03], which is [this paper](https://papers-gamma.link/paper/55), there is no such proof! We can only read: "Theorem 2.2. There exists an optimal solution $X^∗$ of (1) with rank r satisfying the inequality $r(r+ 1)/2 ≤ m$." and "we present computational results which show that the method finds optimal solutions to (2) quite reliably, and although we are able to derive some amount of theoretical justification for this, our belief that the method is not strongly affected by the inherent nonconvexity of (2) is largely experimental" but not that any local minimum is a global minimum.