### Concerns: - It seems that the method needs to store $L$ integers per node. That is a total of $N*L$ integers while storing the graph requires $2*M$ integers (where $M$ is the number of edges), thus the method is useful only if $L<2*M/N$ (that is if $L$ smaller than the average degree). In the experiments, the average degree of the three graphs is approximately 7, 55 and 42, while $L$ is 50 or 80. There is thus no gain in space compared to a good implementation of a straightforward algorithm storing the full graph. - Once the sketch for each node is obtained, it is not clear how to obtain each relevant pair of nodes and its similarity. It seems that each possible pair of nodes has to be considered which is very under-optimal and not scalable. - Each graph used in the experiments easily fits in the RAM of any commodity machine. For such small graphs, an exact method storing the full graph should be faster in practice. ### Alternative approach: https://github.com/maxdan94/NeighSim
This a nice paper and I think you manage to keep your article easily readable. Specially the figures, such as figure 3, helped me follow your explanations. I haven't finished reading the paper but I have minor remarks/typo: - Page 9 in the definition of multigraph, should e(v, V) be transformed to e({v},V) in order to be completely rigorous? - I don't know what “maximum-entropy sampling” is but why quoting it ? - I don't get the use of Reversible Decompression, apart from being an extreme case of external information. I don't think this case deserve to be explained in detailed, as it does not have any impact on the understanding of the paper. - I find the notation for definition 10 a bit confusing. A Cartesian MultiEdge Partition is made of Cartesian products of two vertex subsets. The notation for each cartesian product uses subscript on vertex subset: $(V_i \times V'_i)$ for the $i$th product. This lead me to think that $(V_i \times V'_i)$ was a possible Cartesian product. This also lead me to think that Cartesian MultiEdge Partition are made of Cartesian product of vertex partition: $B(V \times V) = \{(V_i \times V_j)\}\ \forall V_i \in P_1, V_j \in P_2 \wedge P_1 \in B(V), P_2 \in B(V)$. I think all Cartesian product of vertex partition are Cartesian MultiEdge partition. Is there any Cartesian MultiEdge Partition that is not a Cartesian product of vertex partition? - In the conplete case after definition 11, I don't know the notation $\omega$ for the number of feasible partitions. - I think you missed a $\lambda$ in the lagrange function page 21. Should it be $q_{\lambda}(VVT)= |VVT| + \lambda Loss(VVT)$