The Community-search Problem and How to Plan a Successful Cocktail Party
Authors: Mauro Sozio, Aristides Gionis
Liked by: Maximimi
Domains: Graph mining
Tags: KDD2010
Uploaded by: Maximimi
Upload date: 2018-06-08 09:46:56


Overall a very nice paper. Given a small set of query nodes, how can we find its community (a "compact" connected subgraph strongly related to the query nodes)? An optimization is formalized and an efficient optimal algorithm (inspired by the k-core decomposition) to solve the optimization is devised. Efficient heuristics are also suggested. The following points could be improved. ### Not clear examples of monotone functions: - "Example 2 (Minimum degree) Let $f_m(G)$ be the minimum degree of any node in $G$. The function $f_m$ is monotone.". That is not true: when removing nodes from a given graph, the minimum degree can increase and then decrease. - "Example 3 (Distance) The functions $D_Q(G, v)$ and $D_Q(G)$, defined by Equations (1) and (2) are node-monotone and monotone, respectively." That is not true: "$D_Q(G)$" is not monotone: it can decrease and increase when removing nodes. In addition, with this definitions the query nodes should not be removed, this is not specified. - "A lower bound on the number of nodes in a graph is monotone". This is not clear. I think that what is meant is not that any function that is a lower bound on the number of nodes (e.g., $f(V)=|V|/2$) is monotone, but that requiring a lower bound on the number of nodes is monotone. - "$M_Q(G)=\max_{v\in V(G)}\{M_Q(G,v)\}$ is monotone". That is not true. - "The number of disjoint paths between two nodes (which is a popular measure for friendship strength) is node-monotone non-increasing." This is not clear, the function should of the form $f_m(G,v)$, is $v$ one of the two nodes? - "with a monotone function (such as maximum distance and minimum degree).". It is not clear what "maximum distance" means. If it means "the diameter of the graph", then this function is not monotone as it can increase or decrease when removing nodes. - Note that only node-monotone functions are of interest for the suggested method and monotone functions are not used. "Problem 3 (Cocktail party) We are given an undirected graph $G=(V, E)$, a node-monotone non-increasing function $f$, as well as a set of monotone non-increasing properties $f_1,...,f_k$. We seek to find an induced subgraph $H$ of $G$ that maximizes $f$ among all induced subgraphs of $G$ and satisfies $f_1,...,f_k$. A similar problem can be defined by considering to minimize monotone non-decreasing functions.". In that problem "monotone" should be replaced by "node-monotone". ### Experiments: - "We implemented our algorithms in Perl and all experiments run on a dual-core Opteron processor at 3GHz." The implementation does not seem to be publicly available. - I think that more informations on how to reconstruct the graphs on which the experiments are run could be given. Some of the datasets do not seem to be publically available. - In Table 1, I think that the first line is about the distance constraint. The number of query nodes and how they are picked does not seem to be specified. - As the experiments in Table 1, Figure 1 and Figure 2 are the results of an average, error bars could be given to see how significant the differences are. ### Implementation details and asymptotic complexity: - An asymptotic time complexity is not given. - Implementation details are lacking. In particular, it is not clear how to check that the query nodes are connected at each step. This does not seem to be so trivial. If a BFS has to be run everytime a node is deleted, then the algorithm would be in $\Omega(n^2)$ and might be too slow for large graphs. ### Comparison against connectivity subgraph (reference [10], [28] and [21]): "Faloutsos et al. [10], Tong et al. [28], as well as other researchers have studied the problem of finding a subgraph that connects a set of query nodes in a graph. The main difference of our approach is that we are not just interested in connecting the query nodes, but also in finding a meaningful community of query nodes." To me "community search" is similar to "connectivity subgraph": if the upper bound on the number of nodes is small enough then indeed "connectivity subgraph" can be seen as "just connecting the query nodes". However, if this upper bound $k$ is higher, then the goal of "connectivity subgraph" is to find $k$ nodes connecting the query nodes and relevant to it, which is similar to the goal of "community search". I think that a comparison to the "connectivity subgraph" line of work can be carried. At least for the "case study" on the communities of Papadimitriou in DBLP. ### Node-monotone functions: Maybe defining a node-monotone function as a function $f(G,U,v)$ of the input graph $G=(V,E)$, a set of vertices $U\subset V$ and a node $v\in U$ is interesting. This would allow taking the outside of the induced subgraph on $U$ into account. As an example: the number of neighbors $v$ has in $U$ is node-monotone non-increasing, but the number of neighbors $v$ has in $V\setminus U$ is node-monotone non-decreasing (this second function is hard to express with the definition of node-monotone function given in the paper). ### Typos: - "Graphs is one of most ubiquitous data representations" ->? "Graphs are one of the most ubiquitous data representations" - "there is need for" ->? "there is a need for" - "It has been one of them most well-studied problems" -> "one of the most" - "Brunch and bound" :) -> "Branch and bound" - "in order to extracting informations" -> "in order to extract informations" - "divided by all possible possible edges" - "that are far way from" ->? "far away from" - "We start by present an optimal algorithm" - "$G_{T−1}$": "T" -> "t". - "we still assume that $d=|V|$" -> "$d=|V|^3$" - "Now are are ready" - "The dist ance" - ", which we denote be $q=|Q|$" - "for the the heuristic" - "This reason is that" -> "The reason is that" - "with our heuristics This is shown in": "." lacking. - " we think that it is quite remarkable that GreedyDist finds subgraph average distance more than 5" -> "finds a subgraph of average degree more than 5", and not "distance". - "we can see from indeed" - "it belongs in" ->? "it belongs to" - "The aim is to find compact a community" ->? "The aim is to find a compact community" - "the situation is not so agreeable" ->? "the situation is not so pleasant" - "and it is densely connected" ->? "and is densely connected"
Maximimi at 2018-06-28 13:09:36
Edited by Maximimi at 2018-07-05 09:50:24

Please consider to register or login to comment on the paper.