Great paper by the founders of Smoothed Analysis!
Some nice quotes:
- "if one can prove that an algorithm performs well in the worst case, then one can be confident that it will work well in every domain. However, there are many algorithms that work well in practice that do not work well in the worst case. Smoothed analysis provides a theoretical framework for explaining why some of these algorithms do
work well in practice."
- "The performance profiles of algorithms across the landscape of input instances can differ greatly and can be quite irregular."
- "If a single input instance triggers an exponential run time, the algorithm is called an exponential-time algorithm."
- "While polynomial time algorithms are usually viewed as being efficient, we clearly prefer those whose run time is a polynomial of low degree, especially those that run in nearly linear time."
- "Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms."
- "We hope that theoretical explanations will be found for the success in practice of many of these algorithms, and that these theories will catalyze better algorithm design."
- "there are many problems that need to be solved
in practice for which we do not know algorithms with good
worst-case performance. Instead, scientists and engineers
typically use heuristic algorithms to solve these problems. Many of these algorithms work well in practice, in spite of having a poor, sometimes exponential, worst-case running time. Practitioners justify the use of these heuristics by observing that worst-case instances are usually not “typical” and rarely occur in practice. The worst-case analysis can be too pessimistic."
- "heuristics are often used to speed up the practical performance of implementations that are based on algorithms with polynomial worst-case complexity. These heuristics might in fact worsen the worst-case performance, or make the worst-case complexity
difficult to analyze."
- "While one would ideally choose the distribution of inputs that occur in practice, this is difficult as it is rare that one can determine or cleanly express these distributions, and the distributions can vary greatly between one application and another. Instead, average-case analyses have employed distributions with concise mathematical descriptions, such as Gaussian random vectors, uniform {0, 1} vectors, and Erdos-Renyi random graphs. The drawback of using such distributions is that the inputs actually encountered in practice may bear very little resemblance to the inputs that are likely to be generated by such distributions."
- "Because of the intrinsic difficulty in defining practical distributions, we consider an alternative approach to modeling real data. The basic idea is to identify typical properties of practical data, define an input model that captures these properties, and then rigorously analyze the performance of algorithms assuming their inputs have these properties. Smoothed analysis is a step in this direction. It is motivated by the observation that practical data are often subject to some small degree of random noise."
- "At a high level, each input is generated from a two-stage model: In the first stage, an instance is generated and in the second stage, the instance from the first stage is slightly perturbed. The perturbed instance is the input to the algorithm."
- "we hope insights gained from smoothed analysis will lead to new ideas in algorithm design [...] we suggest that it might be possible to solve some problems more efficiently by perturbing their inputs."
I'm in love with this paper!
### Non-worst-case analysis:
In practice, "we are not interested in all problem instances, but only in those which can actually occur in reality."
"The notion of stability [...] is a concrete way to formalize the notion that the only instances of interest are those for which small perturbation in the data (which may reflect e.g. some measurement errors) do not change the optimal partition of the graph."
This stability-based analysis differs from "Smoothed Analysis", where "one shows that the hard instances form a discrete and isolated subset of the input space".
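For reference, my reading of the stability definition used in the paper (a hedged paraphrase from memory; check the formal definition in Section 2): a weighted Max-Cut instance $W$ is $\gamma$-stable, for $\gamma\geq 1$, if its maximal cut remains the unique maximal cut under any adversarial up-scaling of the edge weights by factors in $[1,\gamma]$, i.e.
$$W_{uv} \le W'_{uv} \le \gamma\, W_{uv} \ \ \forall (u,v)\in E \quad \Longrightarrow \quad \text{the maximal cut of } W' \text{ is the maximal cut of } W.$$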
### Open problems:
**Conjecture:** There exists some constant $\gamma^{*}$ such that $\gamma^{*}$-stable instances can be solved in polynomial time.
**Question:** It is shown that $\gamma$-stable instances with $\gamma>\sqrt{\Delta n}$ can be solved in polynomial time. Can this be improved (without further assumptions, such as a lower bound on the minimum degree)? As $\sqrt{\Delta n}$ is usually large, the current bound may not be useful in practice.
**Question:** How does the algorithm "FindMaxCut" (page 6) perform in practice on real-world instances?
**Question:** What about the following greedy heuristic (sketched below): start from a random cut and make passes over the nodes, moving a node to the other side of the cut whenever this increases the cut size, until convergence. Does it have any guarantee on $\gamma$-stable instances?
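A minimal sketch of that greedy local search (my own code, just to pin down the heuristic in the question; the graph is a symmetric numpy weight matrix with no self-loops):

```python
import numpy as np

def greedy_local_max_cut(W, rng=None):
    """Single-flip local search for Max-Cut (the heuristic asked about above).

    Start from a random cut, then repeatedly move any node whose move strictly
    increases the cut weight, until no such move exists.
    W: symmetric (n, n) nonnegative weight matrix, zero diagonal.
    Returns a +/-1 vector giving each node's side of the cut.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Gain of flipping v = (weight to v's own side) - (weight to the other side).
            same = W[v] @ (x == x[v])
            other = W[v] @ (x != x[v])
            if same > other:
                x[v] = -x[v]
                improved = True
    return x

def cut_weight(W, x):
    """Total weight of edges crossing the cut encoded by the +/-1 vector x."""
    return 0.25 * float(np.sum(W * (1 - np.outer(x, x))))
```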
### Extended spectral clustering:
"Let D be a diagonal matrix. Think of W + D as the weighted adjacency matrix of a graph, with loops added. Such loops do not change the weight of any cut, so that regardless of what D we choose, a cut is maximal in W iff it is maximal in W + D. Furthermore, it is not hard to see that W is $\gamma$-stable, iff W + D is. Our approach is to first find a “good” D, and then take the spectral partitioning of W + D as the maximal cut. These observations suggest the following question: Is it true that for every $\gamma$-stable instance W with γ large enough there exists a diagonal D for which extended spectral partitioning solves Max-Cut? If so, can such a D be found efficiently? Below we present certain sufficient conditions for these statements."
I did not fully understand what is presented below that paragraph. Given a $\gamma$-stable graph $G$, how do I actually get $D$?
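My tentative reading of the spectral step, once some $D$ is fixed: split the vertices by the sign of an extreme eigenvector of $W + D$. A sketch, assuming the relevant eigenvector for Max-Cut is the one attached to the smallest (most negative) eigenvalue; that assumption is mine, so check the paper for the exact statement:

```python
import numpy as np

def extended_spectral_cut(W, d):
    """Spectral partitioning of W + diag(d) (sketch, not the paper's exact procedure).

    W: symmetric (n, n) weight matrix; d: length-n vector of diagonal entries.
    Adding a diagonal (self-loops) changes no cut weight, so a cut is maximal
    in W iff it is maximal in W + diag(d).
    Here vertices are split by the sign of the eigenvector of the smallest
    eigenvalue of W + diag(d), which is my assumption of what "spectral
    partitioning" means in this context.
    """
    M = W + np.diag(d)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    v = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue
    side = np.sign(v)
    side[side == 0] = 1.0                  # break ties arbitrarily
    return side
```

The open question quoted above is then exactly how to choose $d$ so that this sign pattern gives the maximal cut.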
### Goemans-Williamson algorithm:
The approximation guarantee of the Goemans-Williamson algorithm is better on $\gamma$-stable instances than in general.
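For context, here is a compact sketch of the Goemans-Williamson algorithm itself (the standard SDP relaxation plus random-hyperplane rounding; this is generic textbook GW, not anything specific to stable instances), written with cvxpy:

```python
import numpy as np
import cvxpy as cp

def goemans_williamson(W, rng=None):
    """Goemans-Williamson Max-Cut approximation (sketch).

    Solve the SDP relaxation  max (1/4) * sum_ij W_ij (1 - X_ij)
    subject to X being PSD with unit diagonal, then round the solution
    with a random hyperplane.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]

    X = cp.Variable((n, n), PSD=True)
    objective = cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X)))
    cp.Problem(objective, [cp.diag(X) == 1]).solve()

    # Factor X ~ V V^T; row i of V is the unit(-ish) vector assigned to vertex i.
    eigvals, eigvecs = np.linalg.eigh(X.value)
    V = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    # Random hyperplane rounding: side of vertex i = sign(<V_i, r>).
    r = rng.standard_normal(n)
    x = np.sign(V @ r)
    x[x == 0] = 1.0
    return x
```

On $\gamma$-stable instances the paper shows this does better than the generic $\approx 0.878$ guarantee; the sketch above is only to fix notation.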
### Random model:
With high probability, extended spectral clustering recovers the optimal cut on $\gamma$-stable instances generated from a certain random model, for $\gamma\geq 1+\Omega(\sqrt{\frac{\log(n)}{n}})$.
### Typos:
- page 3, Proposition 2.1: " A graph G graph"
- page 4: "this follows from Definition 2.1", should be "Proposition 2.1".
- page 5, Definition 2.2: should be "E" instead of "e" in the equation.
- page 5: "which must to be on the" and "of the optional cut" -> "optimal".
- page 8: "we multiply it be a PSD matrix"
Nice paper building on top of [the WebGraph framework](https://papers-gamma.link/paper/31) and [Chierichetti et al.](https://papers-gamma.link/paper/126) to compress graphs.
### Approximation guarantee
I read: "our algorithm is inspired by a theoretical approach
with provable guarantees on the final quality, and it is designed to directly optimize the resulting compression ratio.".
I initially misunderstood this: the proposed algorithm actually has no provable approximation guarantee other than the $\log(n)$ one (which a random ordering of the nodes also achieves).
Designing an algorithm with (a better) approximation guarantee for minimizing "MLogA", "MLogGapA" or "BiMLogA" seems to be a nice open problem.
### Objectives
Is there a better objective than "MLogA", "MLogGapA" or "BiMLogA" to serve as a proxy for the compression achieved by the BV framework? (A sketch of how I read the first two objectives is below.)
Is it possible to directly search for an ordering that minimizes the size of the output of the BV compression algorithm?
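To make the objectives concrete, here is how I read "MLogA" and "MLogGapA" (my paraphrase; $\pi$ is the node ordering, and the gap version is meant to mimic the cost of the BV gap encoding of sorted adjacency lists; the exact treatment of the first gap, and the bipartite "BiMLogA" variant, should be checked against the paper):

```python
import math

def mloga(adj, pi):
    """Sum over arcs (u, v) of log2 |pi(u) - pi(v)| (my reading of MLogA).
    For an undirected graph stored in both directions each edge is counted
    twice, a constant factor that does not affect comparisons of orderings."""
    return sum(math.log2(abs(pi[u] - pi[v]))
               for u, nbrs in adj.items() for v in nbrs if pi[u] != pi[v])

def mloggapa(adj, pi):
    """Sum over nodes of log2 of the gaps between consecutive re-ordered
    neighbours (my reading of MLogGapA)."""
    total = 0.0
    for u, nbrs in adj.items():
        positions = sorted(pi[v] for v in nbrs)
        for a, b in zip(positions, positions[1:]):
            if b > a:
                total += math.log2(b - a)
    return total

# Toy usage: compare the identity ordering with a shuffled one.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
identity = {v: v for v in adj}
shuffled = {0: 2, 1: 0, 2: 3, 3: 1}
print(mloga(adj, identity), mloga(adj, shuffled))
print(mloggapa(adj, identity), mloggapa(adj, shuffled))
```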