The idea of coarse-graining the graph by merging "similar" nodes (using edge collapsing or star collapsing) before applying a graph embedding method is interesting. Bringing that idea to the next level by building a hierarchy of coarse-grained graphs is also relevant. The hierarchy of coarse-grained graphs allows finding a better initialization for embedding methods relying on deep learning. The two ways of collapsing (edge collapsing or star collapsing) are very simple and more sophisticated ways are not suggested: for instance based on modular decomposition or community detection. Maybe such more advanced ways lead to even better results. The experimental section shows that coarse-graining the input graph before applying a graph embedding methods leads to a "better" embedding. Indeed, using such an embedding improves the performance of the classification task compared to using the embedding obtained without coarse-graining. However, neither a theoretical justification of this fact is provided, nor an intuition. How about other machine learning tasks? For instance, does the suggested method also help for clustering and link prediction? The method does not make the embedding faster even though this might have been expected since the graphs are smaller. Figure 4: I do not understand in what the various level as relevant. To me, no level looks like the wished embedding. The graphs used in the experiments are extremely small (less than 30k nodes). It is not clear how the method would perform on large and sparse graphs, say 1G edges 100M nodes (for instance these: http://snap.stanford.edu/data/#communities).

Great paper! ### Input graph in main memory It seems that to compute the suggested ordering (later used to compress the input graph), the graph needs to be stored in the main memory (as an adjacency list). However, if the graph already fits in the main memory of the machine, then compressing it is less interesting. Some experiments are carried on huge graphs that do not fit in the main memory of the machine if not compressed. The trick is that the web graphs are already compressed (maybe with the lexicographic url ordering) to compute the suggested ordering, while the considered social networks are actually not that large and fit in the main memory of a commodity machine. Footnote 21: "It is possible in principle to avoid keeping the graph in the main memory, but the cost becomes $O(n \log n)$.". How can I do that? ### Heuristic to minimize the average gap cost For social networks, it is shown that the compression is highly correlated to the average gap cost (average log gaps) if the "intervalisation" of the BV framework is turned off. The authors note that the suggested ordering is excellent at minimizing this average gap cost. And that even though it does not seem to minimize it directly. Can a heuristic that is explicitly designed to minimize this average gap cost lead to a better compression? ### Typos: - ref lacking: "label propagation [RAK07, ?]" - "until it is possible to do so" -> "until it is not possible to do so" - ref lacking: "Absolute Pott Model (APM) [?]" - "tecniques" - "Some simple experiments not reported here shows that the same happen" -> "Some simple experiments not reported here show that the same happens"