
Very well written paper proposing an interesting algorithm to solve (lossless and lossy) graph summarization problems (cf. the problem definition). The algorithm is parallelizable in a shared-memory environment and can be distributed using MapReduce. The theoretical analysis is sound and shows that the algorithm has linear time complexity in practical scenarios. The experimental evaluation shows that the algorithm achieves a better trade-off between time and quality than existing approaches; in particular, it is faster than the greedy heuristic while producing similar results. The proposed algorithm can be seen as a scalable implementation of the greedy heuristic: shingle hashing of nodes makes it possible to prune the search space and consider only some relevant pairs of nodes, instead of all pairs of nodes (or all pairs of nodes sharing at least one neighbor).
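To make the pruning idea concrete, here is a rough sketch of how shingle (min-hash) hashing of neighborhoods can restrict merging to candidate pairs. This is not the paper's actual algorithm; the single random permutation and the grouping scheme are assumptions for illustration only:

```python
import random

def min_shingle(neighbors, perm):
    """Minimum hash (shingle) of a node's neighborhood under a random permutation."""
    return min(perm[v] for v in neighbors)

def candidate_groups(adj, seed=0):
    """Group nodes by the min-hash of their neighborhoods; only pairs within
    the same group are considered for merging, instead of all n^2 pairs."""
    rng = random.Random(seed)
    nodes = list(adj)
    # a random permutation of the node set, as a node -> rank mapping
    perm = {v: i for i, v in enumerate(rng.sample(nodes, len(nodes)))}
    groups = {}
    for v, nbrs in adj.items():
        groups.setdefault(min_shingle(nbrs, perm), []).append(v)
    return [g for g in groups.values() if len(g) > 1]

# toy graph: two "twin" pairs with identical neighborhoods
adj = {
    1: {3, 4}, 2: {3, 4},   # 1 and 2 are twins
    3: {1, 2}, 4: {1, 2},   # 3 and 4 are twins
}
for group in candidate_groups(adj):
    print(sorted(group))    # prints [1, 2] then [3, 4]
```

Nodes with identical neighborhoods always receive the same shingle, so good merge candidates end up in the same group whatever the permutation; in practice several permutations would be combined to also catch merely similar neighborhoods.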
### No theoretical approximation guarantees:
The graph summarization problem is well defined and has an optimal solution, but the proposed algorithm does not seem to come with any theoretical approximation guarantee. According to the experimental evaluation, the quality of the output is similar to that of the (straightforward) greedy heuristic, but we do not know how far from optimal it is.
Is there any algorithm for the problem with some theoretical approximation guarantees?
### Absolute performance:
While the performance relative to other algorithms is good, the absolute performance of the algorithm is somewhat disappointing. In Figure 3, we see that the output graphs retain more than 40% of the input edges (in terms of (super-)edges) for all graphs, and in many cases more than 70%. The only exceptions are the two web graphs (web graphs are known to compress efficiently, cf. the WebGraph framework) and, surprisingly, a protein interaction graph (only 10% of edges are kept).


Hello, the key point of your algorithm is not really clear to me:
> • We start with the root such that all edges are possibly
> here or not. The upper bound is the sum of the $n \choose 2$
> heaviest edges, while the associated solution is the empty
> subgraph, the lower bound is thus 0.
>
> • Then at each iteration we create two children for the node
> with maximum lower bound (i.e. density of the associated
> solution). Suppose the node is at depth $i$ in the tree, we
> keep the decisions made on the first $i-1$ edges and create
> two children, one where the $i^\text{th}$ edge is included and one
> where it is not.
Could you explain this part more clearly, if possible?

> Could you explain this part more clearly, if possible?
I see. I agree it is not perfectly clear, sorry about that.
Could you first have a look at the branch and bound [Wikipedia page](https://en.wikipedia.org/wiki/Branch_and_bound), [the slides](https://drive.google.com/file/d/0B6cGK503Ibt0Qlg3bUVKRnFBTG8/view) and [the code](https://github.com/maxdan94/HkS)?
If it is still unclear after that, please come back and I'll try to phrase a better explanation ASAP.
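In the meantime, here is a self-contained toy version of the scheme sketched above, not a substitute for the linked code: branch on edges in a fixed (heaviest-first) order, so a tree node at depth $i$ has decided the first $i$ edges; the lower bound of a node is the weight of its kept edges, and the upper bound adds the heaviest still-undecided edges. The heaviest-$k$-subgraph objective and the $\le k$-node feasibility check are assumptions for illustration:

```python
import heapq

def heaviest_k_subgraph(edges, k):
    """Toy branch and bound: find at most k nodes maximizing the total
    weight of kept edges.  Branches on edges in heaviest-first order."""
    edges = sorted(edges, key=lambda e: -e[2])   # (u, v, w), heaviest first
    m, max_edges = len(edges), k * (k - 1) // 2

    def span(kept):
        # set of nodes touched by the kept edges
        return {x for i in kept for x in edges[i][:2]}

    def upper_bound(depth, kept, weight):
        # optimistic completion: the heaviest still-undecided edges,
        # up to the k-choose-2 cap on edges a k-node solution can hold
        budget = max_edges - len(kept)
        return weight + sum(w for _, _, w in edges[depth:depth + budget] if w > 0)

    best_w, best_nodes = 0, set()     # root's associated solution: empty subgraph
    heap = [(0.0, 0, ())]             # (-lower bound, depth, kept edge indices)
    while heap:
        neg_w, depth, kept = heapq.heappop(heap)   # expand max lower bound first
        weight = -neg_w
        if depth == m or upper_bound(depth, kept, weight) <= best_w:
            continue                  # exhausted, or cannot beat the incumbent
        # child 1: the depth-th edge is excluded
        heapq.heappush(heap, (neg_w, depth + 1, kept))
        # child 2: the depth-th edge is included, if it still fits in k nodes
        new_kept = kept + (depth,)
        if len(span(new_kept)) <= k:
            new_w = weight + edges[depth][2]
            if new_w > best_w:
                best_w, best_nodes = new_w, span(new_kept)
            heapq.heappush(heap, (-new_w, depth + 1, new_kept))
    return best_w, best_nodes

edges = [("a", "b", 3), ("b", "c", 2), ("a", "c", 2), ("c", "d", 4)]
print(heaviest_k_subgraph(edges, 3))   # best weight and its node set
```

The upper bound never underestimates any completion of a node (it keeps the heaviest remaining edges with no feasibility check), so pruning a node whose bound does not exceed the incumbent is safe and the search returns an optimal solution.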
