The survey mentions the "parentheses encoding" on page 26. The result
of the parentheses encoding can be interpreted as a [Dyck
path](http://www.findstat.org/DyckPaths).
Consider non-isomorphic rooted plane trees with $n+1$ nodes.
Root will be denoted by the symbol √ in the examples below.
•
|
• •
\ /
• •
\ /
• • •
| | |
Non-isomorphic • • • • • • •
plane | | \ / \|/
rooted trees √ √ √ √
----------------------------------------------------------------------------→
/\
/\/ \
/\/ \
/\ / \
Dyck paths /\ / \/\/ \
/\ / \ /\/\ / \
----------------------------------------------------------------------------→
Parentheses encoding () (()) ()() ((())()((()(()(())))))
The length of a Dyck path is always pair, so we note it $2n$. We can
encode all $(n+1)$-node trees using binary words of length $2n$,
encoding any node, except the root, by $2$ bits. Can we do better?
The famous fact says that the number of such paths of length $2n$
is [the Catalan number](http://oeis.org/A108)
$$C_n ={1 \over n+1}{2n \choose n}.$$
It respects the following inequalities:
$$ 3^n < {1 \over n+1}{2n \choose n} < 4^n,$$
and symptomatically behaves like
$$C_n \sim \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}}.$$
With $1$ bit per non-root node, we can encode only $2^n$ such trees,
while $2$ bits per node gives us $2^{2n} = 4^n$ which is greater than actually needed.
So we cannot encode all $(n+1)$-node trees using only $1$ bit per node,
but we should be able to do slightly better than $2n$.
For example, solving the following equation for $k$
$$ 2^k = \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}},$$
we obtain
$$
k = 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2}.
$$
It suggest that $\left\lceil 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2} \right\rceil$
should be near the optimal length of a binary word needed to encode all $(n+1)$-node trees.
Brute-force pythonization shows that the following sequence gives the optimal
length:
$1, 2, 3, 4, 6, 8, 9, 11, 13, 15, 16, 18, 20, 22, 24, 26, 27, 29, 31, 33, 35, 37, ...$
Compare this to the the number of trees with $n+1$ nodes, the Catalan sequence:
$1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900,... $
See also [unranking Catalan
functions](http://oeis.org/wiki/Ranking_and_unranking_functions#Ranking_and_unranking_functions_for_Catalan_objects)
and [Combinatorial
Generation](https://papers-gamma.link/paper/161/Combinatorial%20Generation)
book for algorithms that convert a natural number to a Catalan tree and vice versa.
The survey mentions the "parentheses encoding" on page 26. The result
of the parentheses encoding can be interpreted as a [Dyck
path](http://www.findstat.org/DyckPaths).
Consider non-isomorphic rooted plane trees with $n+1$ nodes.
Root will be denoted by the symbol √ in the examples below.
•
|
• •
\ /
• •
\ /
• • •
| | |
Non-isomorphic • • • • • • •
plane | | \ / \|/
rooted trees √ √ √ √
----------------------------------------------------------------------------→
/\
/\/ \
/\/ \
/\ / \
Dyck paths /\ / \/\/ \
/\ / \ /\/\ / \
----------------------------------------------------------------------------→
Parentheses encoding () (()) ()() ((())()((()(()(())))))
The length of a Dyck path is always pair, so we note it $2n$. We can
encode all $(n+1)$-node trees using binary words of length $2n$,
encoding any node, except the root, by $2$ bits. Can we do better?
The famous fact says that the number of such paths of length $2n$
is [the Catalan number](http://oeis.org/A108)
$$C_n ={1 \over n+1}{2n \choose n}.$$
It respects the following inequalities:
$$ 3^n < {1 \over n+1}{2n \choose n} < 4^n,$$
and symptomatically behaves like
$$C_n \sim \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}}.$$
With $1$ bit per non-root node, we can encode only $2^n$ such trees,
while $2$ bits per node gives us $2^{2n} = 4^n$ which is greater than actually needed.
So we cannot encode all $(n+1)$-node trees using only $1$ bit per node,
but we should be able to do slightly better than $2n$.
For example, solving the following equation for $k$
$$ 2^k = \frac {4^{n}}{n^{3/2}{\sqrt {\pi }}},$$
we obtain
$$
k = 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2}.
$$
It suggest that $\left\lceil 2n - \frac{3 \ln n }{2 \ln 2} - \frac{ \ln \pi}{2 \ln 2} \right\rceil$
should be near the optimal length of a binary word needed to encode all $(n+1)$-node trees.
Brute-force pythonization shows that the following sequence gives the optimal
length:
$1, 2, 3, 4, 6, 8, 9, 11, 13, 15, 16, 18, 20, 22, 24, 26, 27, 29, 31, 33, 35, 37, ...$
Compare this to the the number of trees with $n+1$ nodes, the Catalan sequence:
$1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900,... $
See also [unranking Catalan
functions](http://oeis.org/wiki/Ranking_and_unranking_functions#Ranking_and_unranking_functions_for_Catalan_objects)
and [Combinatorial
Generation](https://papers-gamma.link/paper/161/Combinatorial%20Generation)
book for algorithms that convert a natural number to a Catalan tree and vice versa.
Comments: