Overall a very nice paper. Given a small set of query nodes, how can we find its community (a "compact" connected subgraph strongly related to the query nodes)?
An optimization is formalized and an efficient optimal algorithm (inspired by the k-core decomposition) to solve the optimization is devised. Efficient heuristics are also suggested.
The following points could be improved.
### Not clear examples of monotone functions:
- "Example 2 (Minimum degree) Let $f_m(G)$ be the minimum degree of any node in $G$. The function $f_m$ is monotone.". That is not true: when removing nodes from a given graph, the minimum degree can increase and then decrease.
- "Example 3 (Distance) The functions $D_Q(G, v)$ and $D_Q(G)$, defined by Equations (1) and (2) are node-monotone and monotone, respectively." That is not true: "$D_Q(G)$" is not monotone: it can decrease and increase when removing nodes. In addition, with this definitions the query nodes should not be removed, this is not specified.
- "A lower bound on the number of nodes in a graph is monotone". This is not clear. I think that what is meant is not that any function that is a lower bound on the number of nodes (e.g., $f(V)=|V|/2$) is monotone, but that requiring a lower bound on the number of nodes is monotone.
- "$M_Q(G)=\max_{v\in V(G)}\{M_Q(G,v)\}$ is monotone". That is not true.
- "The number of disjoint paths between two nodes (which is a popular measure for friendship strength) is node-monotone non-increasing." This is not clear, the function should of the form $f_m(G,v)$, is $v$ one of the two nodes?
- "with a monotone function (such as maximum distance and minimum degree).". It is not clear what "maximum distance" means. If it means "the diameter of the graph", then this function is not monotone as it can increase or decrease when removing nodes.
- Note that only node-monotone functions are of interest for the suggested method and monotone functions are not used.
"Problem 3 (Cocktail party) We are given an undirected graph $G=(V, E)$, a node-monotone non-increasing function $f$, as well as a set of monotone non-increasing properties $f_1,...,f_k$. We seek to find an induced subgraph $H$ of $G$ that maximizes $f$ among all induced subgraphs of $G$ and satisfies $f_1,...,f_k$. A similar problem can be defined by considering to minimize monotone non-decreasing functions.". In that problem "monotone" should be replaced by "node-monotone".
### Experiments:
- "We implemented our algorithms in Perl and all experiments run on a dual-core Opteron processor at 3GHz." The implementation does not seem to be publicly available.
- I think that more informations on how to reconstruct the graphs on which the experiments are run could be given. Some of the datasets do not seem to be publically available.
- In Table 1, I think that the first line is about the distance constraint. The number of query nodes and how they are picked does not seem to be specified.
- As the experiments in Table 1, Figure 1 and Figure 2 are the results of an average, error bars could be given to see how significant the differences are.
### Implementation details and asymptotic complexity:
- An asymptotic time complexity is not given.
- Implementation details are lacking. In particular, it is not clear how to check that the query nodes are connected at each step. This does not seem to be so trivial. If a BFS has to be run everytime a node is deleted, then the algorithm would be in $\Omega(n^2)$ and might be too slow for large graphs.
### Comparison against connectivity subgraph (reference [10], [28] and [21]):
"Faloutsos et al. [10], Tong et al. [28], as well as other researchers have studied the problem of finding a subgraph that connects a set of query nodes in a graph. The main difference of our approach is that we are not just interested in connecting the query nodes, but also in finding a meaningful community of query nodes."
To me "community search" is similar to "connectivity subgraph": if the upper bound on the number of nodes is small enough then indeed "connectivity subgraph" can be seen as "just connecting the query nodes". However, if this upper bound $k$ is higher, then the goal of "connectivity subgraph" is to find $k$ nodes connecting the query nodes and relevant to it, which is similar to the goal of "community search".
I think that a comparison to the "connectivity subgraph" line of work can be carried. At least for the "case study" on the communities of Papadimitriou in DBLP.
### Node-monotone functions:
Maybe defining a node-monotone function as a function $f(G,U,v)$ of the input graph $G=(V,E)$, a set of vertices $U\subset V$ and a node $v\in U$ is interesting. This would allow taking the outside of the induced subgraph on $U$ into account. As an example: the number of neighbors $v$ has in $U$ is node-monotone non-increasing, but the number of neighbors $v$ has in $V\setminus U$ is node-monotone non-decreasing (this second function is hard to express with the definition of node-monotone function given in the paper).
### Typos:
- "Graphs is one of most ubiquitous data representations" ->? "Graphs are one of the most ubiquitous data representations"
- "there is need for" ->? "there is a need for"
- "It has been one of them most well-studied problems" -> "one of the most"
- "Brunch and bound" :) -> "Branch and bound"
- "in order to extracting informations" -> "in order to extract informations"
- "divided by all possible possible edges"
- "that are far way from" ->? "far away from"
- "We start by present an optimal algorithm"
- "$G_{T−1}$": "T" -> "t".
- "we still assume that $d=|V|$" -> "$d=|V|^3$"
- "Now are are ready"
- "The dist ance"
- ", which we denote be $q=|Q|$"
- "for the the heuristic"
- "This reason is that" -> "The reason is that"
- "with our heuristics This is shown in": "." lacking.
- " we think that it is quite remarkable that GreedyDist finds subgraph average distance more than 5" -> "finds a subgraph of average degree more than 5", and not "distance".
- "we can see from indeed"
- "it belongs in" ->? "it belongs to"
- "The aim is to find compact a community" ->? "The aim is to find a compact community"
- "the situation is not so agreeable" ->? "the situation is not so pleasant"
- "and it is densely connected" ->? "and is densely connected"
Overall a very nice paper. Given a small set of query nodes, how can we find its community (a "compact" connected subgraph strongly related to the query nodes)?
An optimization is formalized and an efficient optimal algorithm (inspired by the k-core decomposition) to solve the optimization is devised. Efficient heuristics are also suggested.
The following points could be improved.
### Not clear examples of monotone functions:
- "Example 2 (Minimum degree) Let $f_m(G)$ be the minimum degree of any node in $G$. The function $f_m$ is monotone.". That is not true: when removing nodes from a given graph, the minimum degree can increase and then decrease.
- "Example 3 (Distance) The functions $D_Q(G, v)$ and $D_Q(G)$, defined by Equations (1) and (2) are node-monotone and monotone, respectively." That is not true: "$D_Q(G)$" is not monotone: it can decrease and increase when removing nodes. In addition, with this definitions the query nodes should not be removed, this is not specified.
- "A lower bound on the number of nodes in a graph is monotone". This is not clear. I think that what is meant is not that any function that is a lower bound on the number of nodes (e.g., $f(V)=|V|/2$) is monotone, but that requiring a lower bound on the number of nodes is monotone.
- "$M_Q(G)=\max_{v\in V(G)}\{M_Q(G,v)\}$ is monotone". That is not true.
- "The number of disjoint paths between two nodes (which is a popular measure for friendship strength) is node-monotone non-increasing." This is not clear, the function should of the form $f_m(G,v)$, is $v$ one of the two nodes?
- "with a monotone function (such as maximum distance and minimum degree).". It is not clear what "maximum distance" means. If it means "the diameter of the graph", then this function is not monotone as it can increase or decrease when removing nodes.
- Note that only node-monotone functions are of interest for the suggested method and monotone functions are not used.
"Problem 3 (Cocktail party) We are given an undirected graph $G=(V, E)$, a node-monotone non-increasing function $f$, as well as a set of monotone non-increasing properties $f_1,...,f_k$. We seek to find an induced subgraph $H$ of $G$ that maximizes $f$ among all induced subgraphs of $G$ and satisfies $f_1,...,f_k$. A similar problem can be defined by considering to minimize monotone non-decreasing functions.". In that problem "monotone" should be replaced by "node-monotone".
### Experiments:
- "We implemented our algorithms in Perl and all experiments run on a dual-core Opteron processor at 3GHz." The implementation does not seem to be publicly available.
- I think that more informations on how to reconstruct the graphs on which the experiments are run could be given. Some of the datasets do not seem to be publically available.
- In Table 1, I think that the first line is about the distance constraint. The number of query nodes and how they are picked does not seem to be specified.
- As the experiments in Table 1, Figure 1 and Figure 2 are the results of an average, error bars could be given to see how significant the differences are.
### Implementation details and asymptotic complexity:
- An asymptotic time complexity is not given.
- Implementation details are lacking. In particular, it is not clear how to check that the query nodes are connected at each step. This does not seem to be so trivial. If a BFS has to be run everytime a node is deleted, then the algorithm would be in $\Omega(n^2)$ and might be too slow for large graphs.
### Comparison against connectivity subgraph (reference [10], [28] and [21]):
"Faloutsos et al. [10], Tong et al. [28], as well as other researchers have studied the problem of finding a subgraph that connects a set of query nodes in a graph. The main difference of our approach is that we are not just interested in connecting the query nodes, but also in finding a meaningful community of query nodes."
To me "community search" is similar to "connectivity subgraph": if the upper bound on the number of nodes is small enough then indeed "connectivity subgraph" can be seen as "just connecting the query nodes". However, if this upper bound $k$ is higher, then the goal of "connectivity subgraph" is to find $k$ nodes connecting the query nodes and relevant to it, which is similar to the goal of "community search".
I think that a comparison to the "connectivity subgraph" line of work can be carried. At least for the "case study" on the communities of Papadimitriou in DBLP.
### Node-monotone functions:
Maybe defining a node-monotone function as a function $f(G,U,v)$ of the input graph $G=(V,E)$, a set of vertices $U\subset V$ and a node $v\in U$ is interesting. This would allow taking the outside of the induced subgraph on $U$ into account. As an example: the number of neighbors $v$ has in $U$ is node-monotone non-increasing, but the number of neighbors $v$ has in $V\setminus U$ is node-monotone non-decreasing (this second function is hard to express with the definition of node-monotone function given in the paper).
### Typos:
- "Graphs is one of most ubiquitous data representations" ->? "Graphs are one of the most ubiquitous data representations"
- "there is need for" ->? "there is a need for"
- "It has been one of them most well-studied problems" -> "one of the most"
- "Brunch and bound" :) -> "Branch and bound"
- "in order to extracting informations" -> "in order to extract informations"
- "divided by all possible possible edges"
- "that are far way from" ->? "far away from"
- "We start by present an optimal algorithm"
- "$G_{T−1}$": "T" -> "t".
- "we still assume that $d=|V|$" -> "$d=|V|^3$"
- "Now are are ready"
- "The dist ance"
- ", which we denote be $q=|Q|$"
- "for the the heuristic"
- "This reason is that" -> "The reason is that"
- "with our heuristics This is shown in": "." lacking.
- " we think that it is quite remarkable that GreedyDist finds subgraph average distance more than 5" -> "finds a subgraph of average degree more than 5", and not "distance".
- "we can see from indeed"
- "it belongs in" ->? "it belongs to"
- "The aim is to find compact a community" ->? "The aim is to find a compact community"
- "the situation is not so agreeable" ->? "the situation is not so pleasant"
- "and it is densely connected" ->? "and is densely connected"
### Very nice associated website:
http://web.stanford.edu/~montanar/SDPgraph/
Slides: http://www.lps.ens.fr/~krzakala/LESHOUCHES2017/talks/LesHouches2017_RicciTersenghi.pdf
### Generalising to the Goemans-Williamson algorithm:
Can we implement the Goemans-Williamson algorithm using this "spin technique"?
This is a try: https://github.com/maxdan94/spingraphSDP
I think that it may be interesting as it could lead to a scalable implementation of the Goemans-Williamson algorithm.
### Minors:
- In the main text, some equation numbers correspond to the equations in the supplementary material, e.g. "(42)", "(49)" or "(288)". This makes the paper hard to read. "(288)" should be replaced by "(10)".
- "solving the combinatorial optimization problem in Eq. (5) is a prototypical NP-complete problem." This is not a decision problem and thus it cannot be NP-complete by definition.
- "(total CPU time was about 10 years)" This is not clear: what does it mean exactly? Which type of machines were used?
### Very nice associated website:
http://web.stanford.edu/~montanar/SDPgraph/
Slides: http://www.lps.ens.fr/~krzakala/LESHOUCHES2017/talks/LesHouches2017_RicciTersenghi.pdf
### Generalising to the Goemans-Williamson algorithm:
Can we implement the Goemans-Williamson algorithm using this "spin technique"?
This is a try: https://github.com/maxdan94/spingraphSDP
I think that it may be interesting as it could lead to a scalable implementation of the Goemans-Williamson algorithm.
### Minors:
- In the main text, some equation numbers correspond to the equations in the supplementary material, e.g. "(42)", "(49)" or "(288)". This makes the paper hard to read. "(288)" should be replaced by "(10)".
- "solving the combinatorial optimization problem in Eq. (5) is a prototypical NP-complete problem." This is not a decision problem and thus it cannot be NP-complete by definition.
- "(total CPU time was about 10 years)" This is not clear: what does it mean exactly? Which type of machines were used?
### Wrong reference:
I read "It is also known that, for $m >\sqrt{2 n}$, the problem (49) has no local optima except the global ones [BM03]."
In [BM03], which is [this paper](https://papers-gamma.link/paper/55), there is no such proof!
We can only read:
"Theorem 2.2. There exists an optimal solution $X^∗$ of (1) with rank r satisfying the inequality $r(r+ 1)/2 ≤ m$." and
"we present computational results which show that the method finds optimal solutions to (2) quite reliably, and although we are able to derive some amount of theoretical justification for this, our belief that the method is not strongly affected by the inherent nonconvexity of (2) is largely experimental"
but not that any local minimum is a global minimum.
### Wrong reference:
I read "It is also known that, for $m >\sqrt{2 n}$, the problem (49) has no local optima except the global ones [BM03]."
In [BM03], which is [this paper](https://papers-gamma.link/paper/55), there is no such proof!
We can only read:
"Theorem 2.2. There exists an optimal solution $X^∗$ of (1) with rank r satisfying the inequality $r(r+ 1)/2 ≤ m$." and
"we present computational results which show that the method finds optimal solutions to (2) quite reliably, and although we are able to derive some amount of theoretical justification for this, our belief that the method is not strongly affected by the inherent nonconvexity of (2) is largely experimental"
but not that any local minimum is a global minimum.
> ### Minors:
>
> - "(total CPU time was about 10 years)" This is not clear: what does it mean exactly? Which type of machines were used?
Although the algorithm based on the minimization of a non-convex cost function in terms of m-component variables is extremely fast to solve a single SBM problem (see e.g. figure 5 in https://arxiv.org/pdf/1603.09045, where many numerical experiments are reported) the precise determination of the critical threshold in the sparse case (i.e. SBM) required a large numerical effort of more than 80,000 core hours (we run all out simulations on Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz).
> ### Minors:
>
> - "(total CPU time was about 10 years)" This is not clear: what does it mean exactly? Which type of machines were used?
Although the algorithm based on the minimization of a non-convex cost function in terms of m-component variables is extremely fast to solve a single SBM problem (see e.g. figure 5 in https://arxiv.org/pdf/1603.09045, where many numerical experiments are reported) the precise determination of the critical threshold in the sparse case (i.e. SBM) required a large numerical effort of more than 80,000 core hours (we run all out simulations on Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz).
Comments: