together with brief user bio and description of her/his academic activity.

[Link to my homepage](https://sites.google.com/view/danisch/home)
## I will read the following papers.
- [Quasi-Succinct Indices](https://papers-gamma.link/paper/130)
- [PageRank as a Function of the Damping Factor](https://papers-gamma.link/paper/106)
- [Graph Stream Algorithms: A Survey](https://papers-gamma.link/paper/102)
- [Network Sampling: From Static to Streaming Graphs](https://papers-gamma.link/paper/122)
- [The Protein-Folding Problem, 50 Years On](https://papers-gamma.link/paper/78)
- [Computational inference of gene regulatory networks: Approaches, limitations and opportunitie](https://papers-gamma.link/paper/77)
- [Graph complexity analysis identifies an ETV5 tumor-specific network in human and murine low-grade glioma](https://papers-gamma.link/paper/79)
- [Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis](https://papers-gamma.link/paper/76)
- [The non-convex Burer–Monteiro approach works on smooth semidefinite programs](https://papers-gamma.link/paper/80)
- [Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality](https://papers-gamma.link/paper/81)
- [Influence maximization in complex networks through optimal percolation](https://papers-gamma.link/paper/70)
- [Motifs in Temporal Networks](https://papers-gamma.link/paper/61)
- [Deep Sparse Rectifier Neural Networks](https://papers-gamma.link/paper/69)
- [Sparse Convolutional Neural Networks](https://papers-gamma.link/paper/67)
- [A fast and simple algorithm for training neural probabilistic language models](https://papers-gamma.link/paper/58)
- [Adding One Neuron Can Eliminate All Bad Local Minima](https://papers-gamma.link/paper/71)
## I read the following papers.
### 2018-2019:
- [SWeG: Lossless and Lossy Summarization of Web-Scale Graphs](https://papers-gamma.link/paper/139)
- [Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice](https://papers-gamma.link/paper/129)
- [Are stable instances easy?](https://papers-gamma.link/paper/128)
- [Hierarchical Taxonomy Aware Network Embedding](https://papers-gamma.link/paper/116)
- [Billion-scale Network Embedding with Iterative Random Projection](https://papers-gamma.link/paper/110)
- [HARP: Hierarchical Representation Learning for Networks](https://papers-gamma.link/paper/109/)
- [Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks](https://papers-gamma.link/paper/105)
### 2017-2018:
- [Link Prediction in Graph Streams](https://papers-gamma.link/paper/101)
- [The Community-search Problem and How to Plan a Successful Cocktail Party](https://papers-gamma.link/paper/74)
- [A Nonlinear Programming Algorithm for Solving Semidefinite Programs via Low-rank Factorization](https://papers-gamma.link/paper/55)
- [Deep Learning](https://papers-gamma.link/paper/68)
- [Reducing the Dimensionality of Data with Neural Networks](https://papers-gamma.link/paper/65)
- [Representation Learning on Graphs: Methods and Applications](https://papers-gamma.link/paper/60)
- [Improved Approximation Algorithms for MAX k-CUT and MAX BISECTION](https://papers-gamma.link/paper/56)
- [Cauchy Graph Embedding](https://papers-gamma.link/paper/53)
- [Phase Transitions in Semidefinite Relaxations](https://papers-gamma.link/paper/57)
- [Graph Embedding Techniques, Applications, and Performance: A Survey](https://papers-gamma.link/paper/52)
- [VERSE: Versatile Graph Embeddings from Similarity Measures](https://papers-gamma.link/paper/48)
- [Hierarchical Clustering Beyond the Worst-Case](https://papers-gamma.link/paper/45)
- [Scalable Motif-aware Graph Clustering](https://papers-gamma.link/paper/18)
- [Practical Algorithms for Linear Boolean-width](https://papers-gamma.link/paper/40)
- [New Perspectives and Methods in Link Prediction](https://papers-gamma.link/paper/28/New%20Perspectives%20and%20Methods%20in%20Link%20Prediction)
- [In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond](https://papers-gamma.link/paper/37)
- [Diversity is All You Need: Learning Skills without a Reward Function](https://papers-gamma.link/paper/36)
- [When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors](https://papers-gamma.link/paper/23)
- [Fast Approximation of Centrality](https://papers-gamma.link/paper/35/Fast%20Approximation%20of%20Centrality)
- [Indexing Public-Private Graphs](https://papers-gamma.link/paper/19/Indexing%20Public-Private%20Graphs)
- [On the uniform generation of random graphs with prescribed degree sequences](https://papers-gamma.link/paper/26/On%20the%20uniform%20generation%20of%20random%20graphs%20with%20prescribed%20d%20egree%20sequences)
- [Linear Additive Markov Processes](https://papers-gamma.link/paper/21/Linear%20Additive%20Markov%20Processes)
- [ESCAPE: Efficiently Counting All 5-Vertex Subgraphs](https://papers-gamma.link/paper/17/ESCAPE:%20Efficiently%20Counting%20All%205-Vertex%20Subgraphs)
- [The k-peak Decomposition: Mapping the Global Structure of Graphs](https://papers-gamma.link/paper/16/The%20k-peak%20Decomposition:%20Mapping%20the%20Global%20Structure%20of%20Graphs)
- [A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem](https://papers-gamma.link/paper/24)

☆

1

## Great overview of graph embedding methods
### The encoder, decoder and loss function formalism is very well designed!
### Similarity matrix:
In Table 1, maybe there are methods dedicated to the case where the similarity matrix is required to be a positive semidefinite (SDP) matrix.
Indeed, (i) there are plenty of such relevant SDP matrices (such as "$D+A$", "$A*A$" or "the cosine similarity" which are all SDP) and (ii) the optimal embedding minimizing the L2 loss function is easy to find using the eigenvectors associated with the largest eigenvalues. It thus makes this kind of method very handy.
This kind of [kernel PCA](https://en.wikipedia.org/wiki/Kernel_principal_component_analysis)-like method does not work if the similarity matrix (somehow the kernel here) is not SDP.
### Shallow embedding VS autoencoder-based embedding:
Limitations of "shallow embedding methods" are highlighted on page 9 and it seems that autoencoder-based methods answer some of them. But this part is not totally clear to me.
From what I understand, the major difference between the two kinds of methods is that:
- for the methods referred to as "shallow embedding methods", if you want to reconstruct the entry $s(i,j)$ of the input similarity matrix, then you need the learned feature vectors of both nodes $i$ and $j$ and then the reconstruction of $s(i,j)$ is often simple, e.g. the scalar product between the two feature vectors (cf. equation (5)) or sometimes a softmax (cf. equation (10)) is added on top of the scalar product;
- while in the case of autoencoder-based embedding methods, you only need the learned feature vector of a single node $i$ to reconstruct the full column $s_i$ of the input similarity matrix and the reconstruction is more elaborated.
We can notice that no parameters are shared between nodes in the decoder for the shallow methods, while some parameters can be shared in the case of autoencoder-based methods. There thus can be fewer parameters in autoencoder-based methods and thus it could be in principle faster and have some regularisation.
In addition, if a previously unseen node arrives, an autoencoder-based method could in principle still give a feature vector for that node, while it does not seem to be the case for shallow methods (they are referred to as "inherently transductive" methods).
"Shallow embedding also fails to leverage node attributes during encoding". I think that it is possible to adapt both shallow methods and autoencoder-based methods to take into account this additional information on node attributes. I didn't get this point.
As noted in the survey, it seems that current autoencoder-based methods (SDNE and DNGR) are not scalable and are inherently transductive: "the input dimension to the autoencoder is fixed at $|V|$, which can be extremely costly and even intractable for graphs with millions of nodes. In addition, the structure and size of the autoencoder is fixed, so SDNE and DNGR are strictly transductive and cannot cope with evolving graphs, nor can they generalize across graphs."
The scalability issue has also been noted in [VERSE](https://papers-gamma.link/paper/48/):
"Works such as [12, 48] investigate deep learning approaches for graph embeddings. Their results amount to complex models that require elaborate parameter tuning and computationally expensive optimization, leading to time and space complexities unsuitable for large graph" where [12] and [48] are SDNE and DNGR.
### Neighborhood aggregation and convolutional encoders:
This part seems important. In particular, Algorithm 1 seems important as it is referred to in subsequent parts. However, I found this section very hard to understand.
### Incorporating task-specific supervision:
Even though this section is short, I find it very interesting.
### Typos:
- "they found the the more complex aggregators"
- In equation (24) $h_j^{k-1}$
- End of page 20. What is $\Epsilon$ in $O(|\Epsilon|)$?
- "(e.g. , using..."
- "[8] J. Bruna, W. Zaremba, and Y. Szlam, A.and LeCun. Spectral networks and locally connected networks on graphs..."

☆

1

Great SDP relaxation of max-k-cut and max-bisection: very inspiring!
### Complicated analysis:
The analysis to prove the approximation guarantee is quite complicated though. Much more complicated than the one of [the Goemans-Williamson algorithm](https://en.wikipedia.org/wiki/Semidefinite_programming#Example_3) for max-cut.
Is a simpler analysis possible? Or another relaxation leading to a simpler analysis having similar or better approximation guarantees?
### Implementing the algorithms in a scalable way:
How can we implement such an algorithm in a scalable way? Say for a sparse graph with 1M nodes and 100M edges?
For Goemans-Williamson, this is a try: https://github.com/maxdan94/spingraphSDP

## Comments: