### Concerns about the time complexity given in Table 1:
Section 3.3.5, Graph Factorization (GF): "To the best of our knowledge, Graph Factorization [21] was the first method to obtain a graph embedding in $O(|E|)$ time". This contradicts the time complexities given in Table 1: $O(|E|d)$ for GF, and $O(|E|d^2)$ for the earlier methods.
### Dimensionality reduction is different from graph embedding:
Some of the given references are actually about dimensionality reduction: the input is a set of vectors in dimension $D$ and the output is a set of vectors in dimension $d$ (with $d \ll D$). The problem where the input is a graph and the output is a set of low-dimensional vectors is actually different.
LLE and Laplacian Eigenmaps start from a set of vectors, build a graph from those vectors, and then produce low-dimensional vectors from that graph.
Maybe this could be emphasized. In particular, using these techniques on social networks (for instance) may not be appropriate: the input graph is in most cases unweighted, and its structure might be very different from that of a graph built from vectors.
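To make the distinction concrete, here is a minimal sketch (mine, not from the survey) using scikit-learn; the data and parameters are purely illustrative. LLE and Laplacian Eigenmaps take a set of vectors as input and build the neighborhood graph internally, whereas a graph such as a social network would have to be fed in directly as an adjacency/affinity matrix.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding, SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # n = 100 points in dimension D = 50 (vectors, not a graph)

# LLE: internally builds a k-NN graph from the vectors, then embeds in dimension d = 2.
Y_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

# Laplacian Eigenmaps (SpectralEmbedding): same pipeline, vectors -> graph -> embedding.
Y_le = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                         n_neighbors=10).fit_transform(X)

# A social network, by contrast, is given directly as an (often unweighted) adjacency matrix,
# whose structure need not resemble that of a k-NN graph built from vectors.
A = (rng.random((100, 100)) < 0.1).astype(float)
A = np.maximum(A, A.T)            # symmetrize
np.fill_diagonal(A, 0)            # no self-loops
Y_graph = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)
```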
### Cauchy graph embedding:
- What is the complexity of Cauchy graph embedding? It is not given in Table 1.
- How is the solution obtained? From eigenvectors, as in the two previous methods (LLE and Laplacian Eigenmaps)?
- How should the parameter $\sigma$ be tuned?

After reading the paper [33], the Cauchy graph embedding method is more involved and seems less scalable than the two previous methods.
### GF and positive semidefinite (PSD) matrices:
Page 4, section 3.3.5, Graph Factorization (GF): "Note that as the adjacency matrix is often not positive semidefinite, the minimum of the loss function is greater than 0 even if the dimensionality of embedding is |V|."
- The loss function is never 0 if $\lambda>0$ (with the usual GF objective, the regularization term vanishes only if all embedding vectors are zero, in which case the reconstruction term is positive as soon as $W \neq 0$).
- If $W$ is PSD (and different from the null matrix), then the associated graph must contain self-loops; if self-loops are forbidden, the matrix is never PSD (a nonzero PSD matrix has at least one positive diagonal entry).
- If $W$ is PSD, then there exists a set of vectors $\{Y_i\}$ such that $W_{ij}=\langle Y_i,Y_j\rangle$ for all $i,j$. Is it true that such vectors can always be chosen of dimension $n$ (the size of $W$) or less? A reference or a proof would be welcome (a possible argument is sketched below).
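For what it is worth, the standard argument (a sketch, not necessarily the formulation the authors would use) follows from the spectral theorem:

```latex
% W symmetric PSD of size n x n  =>  W is the Gram matrix of n vectors of dimension n.
\[
  W = Q \Lambda Q^{\top}, \qquad
  \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n), \quad \lambda_k \ge 0
  \qquad \text{(spectral theorem).}
\]
\[
  \text{Set } Y := Q \Lambda^{1/2} \text{ and let } Y_i \text{ be the $i$-th row of } Y:
  \qquad
  \langle Y_i, Y_j \rangle = \bigl(Y Y^{\top}\bigr)_{ij}
  = \bigl(Q \Lambda Q^{\top}\bigr)_{ij} = W_{ij}.
\]
% The Y_i therefore live in dimension n, and even in dimension rank(W) <= n
% after dropping the columns corresponding to zero eigenvalues.
```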
### Minors:
- Page 3, section 3.1.1, Locally Linear Embedding (LLE): it should be specified that the rows of the weight matrix must sum to one: $\sum_j W_{ij}=1$.
- Page 3, section 3.1.1, Locally Linear Embedding (LLE): "the sparse matrix $(I-W)^T(I-W)$": this matrix is often not sparse, in the sense that it has many more non-zero entries than $W$ itself.
- Page 3, section 3.1.3, Cauchy Graph Embedding: is there a solution to the optimization problem based on eigenvectors, as in the two previous methods?
- What is "positive pointwise mutual information (PPMI)"?
- Figure 1 refers to CPE (Community Preserving Embedding), but CPE is not discussed in the survey.
- The section on GraRep refers to the section on HOPE, which comes after it; perhaps the two sections should be swapped?
### Typos:
- Many text overflows (lines running into the margins)
### The Louvain algorithm:
"In some cases, it may be possible to have a very fast algorithms based on heuristics to compute partitions,
however, we are unaware of any such methods that would have provable guarantees for the kinds of graphs that
appear in hierarchical clustering"
The Louvain algorithm (https://perso.uclouvain.be/vincent.blondel/research/louvain.html) is such an efficient algorithm.
It outputs a hierarchical clustering: it greedily optimizes the "modularity", then merges clusters into "hypernodes" once a local maximum is reached, and iterates on the newly formed graph.
In most cases, only the partition corresponding to the last level (the largest clusters) is used.
It is known for being fast and accurate in practice; however, to the best of my knowledge, it does not have provable approximation guarantees.
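For illustration, a minimal usage sketch (mine, assuming a recent NetworkX, which ships a Louvain implementation; the graph is just an example) showing that the method exposes the whole hierarchy of partitions, not only the last level:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, louvain_partitions

G = nx.karate_club_graph()  # any undirected (possibly weighted) graph works

# One partition per level of the hierarchy: the first is the finest,
# the last corresponds to the final "hypernode" clusters.
for level, partition in enumerate(louvain_partitions(G, seed=0)):
    print(f"level {level}: {len(partition)} clusters")

# In most applications only the last level is kept:
final_clusters = louvain_communities(G, seed=0)
print("final partition:", [sorted(c) for c in final_clusters])
```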
### Wikipedia is a graph with a ground truth hierarchical clustering:
The set of Wikipedia pages and the hypertext links among them form a graph. The Wikipedia categories can be seen as a ground-truth hierarchical clustering of this graph. It would be interesting to see whether the suggested algorithm can find them.
Dataset: http://cfinder.org/wiki/?n=Main.Data#toc1
### Points in an ultrametric space have a natural hierarchical clustering:
What is the intuition behind this fact?
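One way to see it (a sketch of the standard argument; possibly not the intuition the authors have in mind): the strong triangle inequality makes "being at distance at most $r$" an equivalence relation for every radius $r$, and the resulting partitions are nested as $r$ grows.

```latex
% Ultrametric space (X, d): d(x, z) <= max(d(x, y), d(y, z)) for all x, y, z.
% Fix a radius r and define x ~_r y  iff  d(x, y) <= r.
% Reflexivity and symmetry are immediate; transitivity follows from the strong triangle inequality:
\[
  d(x,y) \le r \ \text{ and } \ d(y,z) \le r
  \;\Longrightarrow\;
  d(x,z) \le \max\bigl(d(x,y),\, d(y,z)\bigr) \le r .
\]
% Each radius r therefore induces a partition of X (into the closed balls of radius r),
% and for r <= r' the partition at radius r refines the one at radius r'.
% These nested partitions, indexed by r, are exactly a hierarchical clustering of the points.
```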
### Minors:
- In Definition 1.1, "Let G be a graph generated by a minimal ultrametric": "ultrametric" is defined earlier in the paper, but "minimal ultrametric" is not.
### Typos:
- "a very fast algorithms based on"
- "The expected graph as the is the weighted complete graph"
> It outputs a hierarchical clustering: it greedily optimizes the "modularity", then merges clusters into "hypernodes" once a local maximum is reached, and iterates on the newly formed graph.

It seems that the trees considered in the article are binary trees, while the Louvain algorithm yields a hierarchical clustering tree which is not necessarily binary.
Video of the talk: https://www.lincs.fr/events/hierarchical-clustering-objective-functions-and-algorithms/
> ### Wikipedia is a graph with a ground truth hierarchical clustering:
>
> The set of Wikipedia pages and the hypertext links among them form a graph. The Wikipedia categories can be seen as a ground-truth hierarchical clustering of this graph. It would be interesting to see whether the suggested algorithm can find them.
>
> Dataset: http://cfinder.org/wiki/?n=Main.Data#
Actually, this hierarchy is a DAG (directed acyclic graph), not a tree.
The paper is hard to understand for a newbie in deep RL.
A formal definition of a skill (beyond "A skill is simply a policy.") would help.
### Typos:
- "guaranteeing that is has maximum entropy"
- "We discuss the the log p(z) term in Appendix B."
- "so it much first gather momentum"
- "While are skills are learned"
I agree with the previous comment.
The article seems to be aimed at people who are already familiar with reinforcement learning (not necessarily deep or based on neural networks, I guess) and its usual benchmarks.
The implementations are not detailed; the authors put the stress on the general idea (which is relatively simple to grasp) and on its visual results, which look quite spectacular.
The problem, from my point of view, is that a scientist has close to no incentive to give additional context information. For example, an author would not add a link to a Wikipedia page, as it is considered (rightly or wrongly) to be below scientific quality standards.
Even worse: I have observed that in most scientific communities, you are expected not to give too many details on definitions, assumptions or computations that are considered "common knowledge". In other words, adding too many details is understood as a sign that you are an outsider to the community.
In any case, hyperlinks would not solve everything, as the writer still has to target a given "level of knowledge" of the reader: he would make the information understandable to a specific audience.
Typically, when I read about a field which is new to me, I first go to Wikipedia. The level of detail of a Wikipedia page varies widely: some pages are too basic for what I already know, but others are too elaborate, and as you said, I am not willing to spend the "focusing time" necessary to understand such a page if I cannot even evaluate whether I will have a much better understanding of the paper afterwards.
The author is certainly better placed to evaluate which context information is useful to understand what he writes, but you are right that it costs him time as well. So, I think that in a way, the problem can be seen as an economic one, in which focusing time is the main resource.
> For example, an author would not add a link to a Wikipedia page, as it is considered (rightly or wrongly) to be below scientific quality standards

I would not agree: some scientists already write blog posts about their beloved subjects using a lot of hyperlinks to Wikipedia and other sites; consider for example [Igor Pak's](https://igorpak.wordpress.com/2015/05/26/the-power-of-negative-thinking-part-i-pattern-avoidance/) or [Terence Tao's](https://terrytao.wordpress.com/2009/04/26/szemeredis-regularity-lemma-via-random-partitions/) blogs. Currently, such blog posts are considered complementary to "real" scientific publications. But one day everything will change.