Alt-Tab — Papersᵞ

☆

0

Techniques for Inverted Index Compression

Comments:

Read the paper, add your comments…

☆

1

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations

Comments:

The survey mentions the "parentheses encoding" on page 26. The result of the parentheses encoding can be interpreted as a [Dyck path](http://www.findstat.org/DyckPaths). Consider non-isomorphic rooted plane trees with $n+1$ nodes. Root will be denoted by the symbol √ in the examples below. • | • • \ / • • \ / • • • | | | Non-isomorphic • • • • • • • plane | | \ / \|/ rooted trees √ √ √ √ ----------------------------------------------------------------------------→ /\ /\/ \ /\/ \ /\ / \ Dyck paths /\ / \/\/ \ /\ / \ /\/\ / \ ----------------------------------------------------------------------------→ Parentheses encoding () (()) ()() ((())()((()(()(()))))) The length of a Dyck path is always pair, so we note it $2n$. We can encode all $(n+1)$-node trees using binary words of length $2n$, encoding any node, except the root, by $2$ bits. Can we do better? The famous fact says that the number of such paths of length $2n$ is [the Catalan number](http://oeis.org/A108) $$C_n ={1 \over n+1}{2n \choose n}.$$ It respects the following inequalities: $$ 3^n < {1 \over n+1}{2n \choose n} < 4^n,$$ and symptomatically behaves like $$C_n \sim \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}}.$$ With $1$ bit per non-root node, we can encode only $2^n$ such trees, while $2$ bits per node gives us $2^{2n} = 4^n$ which is greater than actually needed. So we cannot encode all $(n+1)$-node trees using only $1$ bit per node, but we should be able to do slightly better than $2n$. For example, solving the following equation for $k$ $$ 2^k = \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}},$$ we obtain $$ k = 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2}. $$ It suggest that $\left\lceil 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2} \right\rceil$ should be near the optimal length of a binary word needed to encode all $(n+1)$-node trees. Brute-force pythonization shows that the following sequence gives the optimal length: $1, 2, 3, 4, 6, 8, 9, 11, 13, 15, 16, 18, 20, 22, 24, 26, 27, 29, 31, 33, 35, 37, ...$ Compare this to the the number of trees with $n+1$ nodes, the Catalan sequence: $1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900,... $ See also [unranking Catalan functions](http://oeis.org/wiki/Ranking_and_unranking_functions#Ranking_and_unranking_functions_for_Catalan_objects) and [Combinatorial Generation](https://papers-gamma.link/paper/161/Combinatorial%20Generation) book for algorithms that convert a natural number to a Catalan tree and vice versa.

Sergey Kirgizov at 2021-05-20 13:32:34

Read the paper, add your comments…

☆

0