Wellcome to Maximimi's library,

  You can find here all papers liked or uploaded by Maximimi
  together with brief user bio and description of her/his academic activity.


[Link to my homepage](https://sites.google.com/view/danisch/home) ## I will read the following papers. - [Quasi-Succinct Indices](https://papers-gamma.link/paper/130) - [PageRank as a Function of the Damping Factor](https://papers-gamma.link/paper/106) - [Graph Stream Algorithms: A Survey](https://papers-gamma.link/paper/102) - [Network Sampling: From Static to Streaming Graphs](https://papers-gamma.link/paper/122) - [The Protein-Folding Problem, 50 Years On](https://papers-gamma.link/paper/78) - [Computational inference of gene regulatory networks: Approaches, limitations and opportunitie](https://papers-gamma.link/paper/77) - [Graph complexity analysis identifies an ETV5 tumor-specific network in human and murine low-grade glioma](https://papers-gamma.link/paper/79) - [Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis](https://papers-gamma.link/paper/76) - [The non-convex Burer–Monteiro approach works on smooth semidefinite programs](https://papers-gamma.link/paper/80) - [Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality](https://papers-gamma.link/paper/81) - [Influence maximization in complex networks through optimal percolation](https://papers-gamma.link/paper/70) - [Motifs in Temporal Networks](https://papers-gamma.link/paper/61) - [Deep Sparse Rectifier Neural Networks](https://papers-gamma.link/paper/69) - [Sparse Convolutional Neural Networks](https://papers-gamma.link/paper/67) - [A fast and simple algorithm for training neural probabilistic language models](https://papers-gamma.link/paper/58) - [Adding One Neuron Can Eliminate All Bad Local Minima](https://papers-gamma.link/paper/71) ## I read the following papers. ### 2019-2020 - [Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs](https://papers-gamma.link/paper/162) - [Karp-Sipser based kernels for bipartite graph matching](https://papers-gamma.link/paper/160) - [Speedup Graph Processing by Graph Ordering](https://papers-gamma.link/paper/159) - [Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering](https://papers-gamma.link/paper/154) ### 2018-2019: - [SWeG: Lossless and Lossy Summarization of Web-Scale Graphs](https://papers-gamma.link/paper/139) - [Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice](https://papers-gamma.link/paper/129) - [Are stable instances easy?](https://papers-gamma.link/paper/128) - [Hierarchical Taxonomy Aware Network Embedding](https://papers-gamma.link/paper/116) - [Billion-scale Network Embedding with Iterative Random Projection](https://papers-gamma.link/paper/110) - [HARP: Hierarchical Representation Learning for Networks](https://papers-gamma.link/paper/109/) - [Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks](https://papers-gamma.link/paper/105) ### 2017-2018: - [Link Prediction in Graph Streams](https://papers-gamma.link/paper/101) - [The Community-search Problem and How to Plan a Successful Cocktail Party](https://papers-gamma.link/paper/74) - [A Nonlinear Programming Algorithm for Solving Semidefinite Programs via Low-rank Factorization](https://papers-gamma.link/paper/55) - [Deep Learning](https://papers-gamma.link/paper/68) - [Reducing the Dimensionality of Data with Neural Networks](https://papers-gamma.link/paper/65) - [Representation Learning on Graphs: Methods and Applications](https://papers-gamma.link/paper/60) - [Improved Approximation Algorithms for MAX k-CUT and MAX BISECTION](https://papers-gamma.link/paper/56) - [Cauchy Graph Embedding](https://papers-gamma.link/paper/53) - [Phase Transitions in Semidefinite Relaxations](https://papers-gamma.link/paper/57) - [Graph Embedding Techniques, Applications, and Performance: A Survey](https://papers-gamma.link/paper/52) - [VERSE: Versatile Graph Embeddings from Similarity Measures](https://papers-gamma.link/paper/48) - [Hierarchical Clustering Beyond the Worst-Case](https://papers-gamma.link/paper/45) - [Scalable Motif-aware Graph Clustering](https://papers-gamma.link/paper/18) - [Practical Algorithms for Linear Boolean-width](https://papers-gamma.link/paper/40) - [New Perspectives and Methods in Link Prediction](https://papers-gamma.link/paper/28/New%20Perspectives%20and%20Methods%20in%20Link%20Prediction) - [In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond](https://papers-gamma.link/paper/37) - [Diversity is All You Need: Learning Skills without a Reward Function](https://papers-gamma.link/paper/36) - [When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors](https://papers-gamma.link/paper/23) - [Fast Approximation of Centrality](https://papers-gamma.link/paper/35/Fast%20Approximation%20of%20Centrality) - [Indexing Public-Private Graphs](https://papers-gamma.link/paper/19/Indexing%20Public-Private%20Graphs) - [On the uniform generation of random graphs with prescribed degree sequences](https://papers-gamma.link/paper/26/On%20the%20uniform%20generation%20of%20random%20graphs%20with%20prescribed%20d%20egree%20sequences) - [Linear Additive Markov Processes](https://papers-gamma.link/paper/21/Linear%20Additive%20Markov%20Processes) - [ESCAPE: Efficiently Counting All 5-Vertex Subgraphs](https://papers-gamma.link/paper/17/ESCAPE:%20Efficiently%20Counting%20All%205-Vertex%20Subgraphs) - [The k-peak Decomposition: Mapping the Global Structure of Graphs](https://papers-gamma.link/paper/16/The%20k-peak%20Decomposition:%20Mapping%20the%20Global%20Structure%20of%20Graphs) - [A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem](https://papers-gamma.link/paper/24)

Comments:

Very nice paper! The main idea: if the distance between two embedded nodes has to be large then one should not waste energy in order to make this distance a bit smaller. Trying to make large distances a bit smaller is actually what Laplacian embedding is doing with its quadratic objective. One should rather focus on small distances and give up on the pairs of embedded nodes that are far away. ### Embedding complex networks: This method could be applied to social networks (or complex networks in general): social networks are known to have "weak ties" (roughly speaking, weak ties are links between people that do not have many friends in common) that may perturb laplacian embedding. ### Asymptotic complexity and scalability in practice: The suggested algorithm to solve the optimization does not seem very scalable. The asymptotic running time complexity is not given. ### Concerns about the optimization in dimension higher than 1. The example in 2-D. In equations (9-11), what if we change the constraints into: $||x||^2+||y||^2=1$, $e^Tx=0$ and $e^Ty=0$? In n-D, denoting $r_i$ the vector associated to node $i$, that would give $\sum_ir_i^2=1$ and $\sum_ir_i=0$. This seems to be a more relevant optimization to me: minimize the distance between nodes in 2-D without setting constraints for each dimension, but rather global constraints. If the objective is the one of the laplacian embedding, is it true that, in the optimal solution, $y$ will be proportional to $x$? ($x$ and $y$ will both be proportional to the eigenvector associated with the smallest eigenvalue of the Laplacian?) If so, is it still true if the objective is another one? Like the Cauchy embedding objective for instance? ### Others: - How to tune the parameter $\sigma$?
> - How to tune the parameter $σ$? I don't see where $σ$ is defined, did I miss something?
Note that $\sum_i r_i = 0$ means that vector $r$ is orthogonal to vector $v_0 = (1,1,\ldots\,1,1,1)$. The latter corresponds to the smallest eigen-value $\lambda_0 = 0$ of [Laplacian matrix](https://en.wikipedia.org/wiki/Laplacian_matrix). Adding constraint $r \perp v_0$ to the optimization problem drives us to seek the second smallest eigenpair. You may also read very related and well-written [lecture](https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe3.pdf) about Rayleigh quotients, min-max theorem (Courant–Fischer–Wey). Gives a good background in the field of eigen-based solutions for optimization problems. In 2016, during the preparation of [my talk](http://www.complexnetworks.fr/temporal-density-of-complex-networks-and-ego-community-dynamics/) titled "Temporal density of complex networks and ego-community dynamics" ([slides](https://kirgizov.link/talks/temporal-density-4-july-2016.pdf), french). I have read stuff about graph optimization problems. Below I present my personal (perhaps a little bit random selection) of related papers and books * [Four Cheeger-type Inequalities for Graph Partitioning Algorithms](https://www.math.ucsd.edu/~fan/wp/heaticcm.pdf), Fan Chung, 2007 * [Normalized cuts and image segmentation](https://people.eecs.berkeley.edu/~malik/papers/SM-ncut.pdf), Jianbo Shi and Jitendra Malik, 2000 * [Eigenvalues of graphs](https://web.cs.elte.hu/~lovasz/eigenvals-x.pdf), László Lovász, 2007 * [Spectres de graphes](https://www-fourier.ujf-grenoble.fr/~ycolver/All-Articles/98a.pdf) de Yves Colin de Verdière, 1998
Read the paper, add your comments…

Comments:

Read the paper, add your comments…

Comments:

## Great overview of graph embedding methods ### The encoder, decoder and loss function formalism is very well designed! ### Similarity matrix: In Table 1, maybe there are methods dedicated to the case where the similarity matrix is required to be a positive semidefinite (SDP) matrix. Indeed, (i) there are plenty of such relevant SDP matrices (such as "$D+A$", "$A*A$" or "the cosine similarity" which are all SDP) and (ii) the optimal embedding minimizing the L2 loss function is easy to find using the eigenvectors associated with the largest eigenvalues. It thus makes this kind of method very handy. This kind of [kernel PCA](https://en.wikipedia.org/wiki/Kernel_principal_component_analysis)-like method does not work if the similarity matrix (somehow the kernel here) is not SDP. ### Shallow embedding VS autoencoder-based embedding: Limitations of "shallow embedding methods" are highlighted on page 9 and it seems that autoencoder-based methods answer some of them. But this part is not totally clear to me. From what I understand, the major difference between the two kinds of methods is that: - for the methods referred to as "shallow embedding methods", if you want to reconstruct the entry $s(i,j)$ of the input similarity matrix, then you need the learned feature vectors of both nodes $i$ and $j$ and then the reconstruction of $s(i,j)$ is often simple, e.g. the scalar product between the two feature vectors (cf. equation (5)) or sometimes a softmax (cf. equation (10)) is added on top of the scalar product; - while in the case of autoencoder-based embedding methods, you only need the learned feature vector of a single node $i$ to reconstruct the full column $s_i$ of the input similarity matrix and the reconstruction is more elaborated. We can notice that no parameters are shared between nodes in the decoder for the shallow methods, while some parameters can be shared in the case of autoencoder-based methods. There thus can be fewer parameters in autoencoder-based methods and thus it could be in principle faster and have some regularisation. In addition, if a previously unseen node arrives, an autoencoder-based method could in principle still give a feature vector for that node, while it does not seem to be the case for shallow methods (they are referred to as "inherently transductive" methods). "Shallow embedding also fails to leverage node attributes during encoding". I think that it is possible to adapt both shallow methods and autoencoder-based methods to take into account this additional information on node attributes. I didn't get this point. As noted in the survey, it seems that current autoencoder-based methods (SDNE and DNGR) are not scalable and are inherently transductive: "the input dimension to the autoencoder is fixed at $|V|$, which can be extremely costly and even intractable for graphs with millions of nodes. In addition, the structure and size of the autoencoder is fixed, so SDNE and DNGR are strictly transductive and cannot cope with evolving graphs, nor can they generalize across graphs." The scalability issue has also been noted in [VERSE](https://papers-gamma.link/paper/48/): "Works such as [12, 48] investigate deep learning approaches for graph embeddings. Their results amount to complex models that require elaborate parameter tuning and computationally expensive optimization, leading to time and space complexities unsuitable for large graph" where [12] and [48] are SDNE and DNGR. ### Neighborhood aggregation and convolutional encoders: This part seems important. In particular, Algorithm 1 seems important as it is referred to in subsequent parts. However, I found this section very hard to understand. ### Incorporating task-specific supervision: Even though this section is short, I find it very interesting. ### Typos: - "they found the the more complex aggregators" - In equation (24) $h_j^{k-1}$ - End of page 20. What is $\Epsilon$ in $O(|\Epsilon|)$? - "(e.g. , using..." - "[8] J. Bruna, W. Zaremba, and Y. Szlam, A.and LeCun. Spectral networks and locally connected networks on graphs..."
Read the paper, add your comments…
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26