Interesting paper around the question: "Can machine learning be used to cheaply detect and exploit structure in practically relevant instances of NP-hard problems that come from the same distribution?". The problem of listing all maximum cliques in a graph is considered. A binary classifier is trained to discriminate between a node belonging to a maximum clique and a node that do not belong to a maximum clique: logistic regression is used with node features such as the total number of edges, node degree, clustering coefficient and trained on a set of graphs. The classifier is then used to prune the search space of a test graph: nodes that are labeled "not belonging to a maximum clique" by the classifier are removed. Some speedup is observed when comparing the running time of state-of-the-art methods on the original graph (or on a graph pruned using a baseline pruning method (degree pruning)) and on the pruned graph. # A more challenging baseline pruning For the baseline pruning: a heuristic is used to find a large clique and then if the found clique has size $k$, all nodes of degree $k-2$ or less are removed. It would have been better to iteratively remove all nodes of degree $k-2$ or less till there is no more such nodes and thus only keeping the $(k-1)$-core of the graph (it seems that only nodes of degree $k-2$ or less in the original graphs are removed). That would lead to much smaller graphs and a more challenging baseline pruning. How good is this k-core pruning? In table 1, the speedup obtained with degree pruning is not reported for igraph and EmMCE. Is the speedup obtained by the proposed pruning much better than the one obtained by the degree pruning for these two programs? # Setting the confidence threshold The confidence threshold used for the logistic regression classifier is arbitrarily fixed to $q=0.55$. If $q$ is higher, then the list of maximum cliques computed on the pruned graph may not be correct (i.e. it may not be the same as in the original graph as nodes belonging to a maximum clique would be pruned). It is not clear to me why all results are exact in table 1: just luck? How about using $q=0.6$ or $0.7$? Parameter sensitivity is not discussed. Note that if there is a single maximum clique of size $k$ and many cliques of size $k-1$ then removing a node from the maximum clique will completely change the output (which may then be many cliques of size $k-1$ instead of a single clique of size $k$).

Very well written paper proposing an interesting algorithm to solve (lossless and lossy) graph summarization problems (cf problem definition). The algorithm is parallelizable in a shared memory environment and can be distributed using mapreduce. The theoretical analysis is sound and shows that the algorithm has a linear time complexity in practical scenarios. The experimental evaluation shows that the algorithm achieves a better trade-of between time and quality than existing approaches. In particular, the algorithm is faster than the greedy heuristic while leading to similar results. The proposed algorithm can be seen as a scalable implementation of the greedy heuristic: using shingle hashing of nodes allows to prune the search space and consider only some relevant pairs of nodes instead of considering all pairs of nodes (or all pairs of nodes sharing at least one neighbor). ### No theoretical approximation guarantees: The graph summarization problem is well defined and has an optimal solution. The proposed algorithm does not seem to have any theoretical approximation guarantees. According to the experimental evaluation, the quality of the output is similar to the one of the (straightforward) greedy heuristic, but we do not know how far from optimality it is. Is there any algorithm for the problem with some theoretical approximation guarantees? ### Absolute performance: While the performance relatively to other algorithms is good, the absolute performance of the algorithm is somehow disappointing. Figure 3: we see that the size of the output graphs is larger than 40% the size of the input graphs (in terms of (super) edges) for all graphs (and in many cases larger than 70%) except for the two web graphs (it is known that web graphs can be compressed efficiently (cf the webgraph framework)) and surprisingly a protein interaction graph (only 10% of edges are kept).