```
Entropy Makes the Collatz Series Go Down

Edited October 21, 2021

ABSTRACT

An equivalent redefinition of the Collatz series is used that has
three kinds of segments.  Inputs to each segment have a string of
zeros or ones in the low order bits represented as powers of two.
In the corresponding output of each segment these terms are
replaced by powers of three or are deleted.  These input terms
have no entropy and entropy is increased in outputs.  As the
series progresses, entropy increases, randomizing the sequence.

The low order two bits in inputs to each segment determine which
of the three kinds of transitions to apply.  As these bits become
more random, the rate that each kind of segment occurs averages
out.  An average distribution causes the sequence to decrease.

A linear counterexample would consist of an infinitely long sequence
whose transitions produce ever increasing values.  Values
would need to be highly skewed in favor of odd numbers to sustain
the increase.  Even when the starting value is contrived to
produce mostly odd (increasing) transitions, information encoded
in the starting value is lost.  Eventually even and odd values
balance out and the sequence decreases.  This makes it impossible
to construct an infinitely long series.

I. Overview

A. Rewritten transitions for the Collatz sequence merge two or three steps:

N is Odd:     N' = (3 * N + 1) / 2

N % 4 = 0:    N' = N / 4

N % 4 = 2:    N' = (3 * N + 2) / 4

All values produced by this sequence have:  N % 3 = 2
The low order two bits of N determine the type of the next transition.

B. The average occurrence and gain (output to input ratio) of each
transition is:

              Occurrence    Gain Limit      Gain ^ Occurrence

N is Odd:        50%        3/2 = 1.5       1.5^.50 = 1.22474

N % 4 = 0:       25%        1/4 = .25       .25^.25 = 0.70711

N % 4 = 2:       25%        3/4 = .75       .75^.25 = 0.93060

The gain of a sequence is the product of the gains for each transition.
In the limit the average gain of a Collatz sequence is:

Ga = 1.22474 * 0.70711 * 0.9306 = 0.80592

The average gain is under unity so on average Collatz sequences will
steadily be reduced until terminated.  Gains vary in actual sequences.

C. A Collatz sequence will have segments with consecutive runs of
each transition.  Each of these chains has length k with these
input and output forms:

Chain Input        Chain Output        Gain

N is Odd:     j * 2^k - 1        j * 3^k - 1         1.5^k

N % 4 = 0:    j * 4^k            j                   .25^k

N % 4 = 2:    j * 4^k + 2        j * 3^k + 2         .75^k

Each chain takes a value where the low order bits are a string of zeros
or ones and replaces or removes them.  Inputs have a 2^k term that
transition to 3^k or a 4^k term that is removed.  Strings of zeros or
ones have no entropy and the 3^k term has positive entropy.  Entropy is
added each time a chain of transitions is applied.

In particular, each chain hashes the low order two bits that determine
the type of the next chain.  In a very long sequence each type of
transition occurs closer to the average rate.

In order for a Collatz sequence to avoid eventual termination it needs
to either be circular or produce values in ever greater ranges.  In
the latter case any infinite sequence requires a gain over one to keep
growing.  This cannot be sustained indefinitely.  Entropy ensures that
it will drift towards the average gain of 0.8059 and eventually terminate.

Over twice as many odd transitions as even are needed in order to
achieve the minimum eternal growth rate, which is far from an even split.
In a long sequence the entropy of the low order bit would need to remain
below 0.9183.

Odd     N % 4 = 0    N % 4 = 2

1.5^.674 * .25^.163 * .75^.163 = 1.00043   gain slightly over 1

The series self regulates: as more odd transitions are applied the
entropy goes up, which pushes the number of odd transitions down
towards the 50% average.  In turn this lowers the average gain until
it descends below one, leading to eventual termination.

After the first few chains in even the most contrived runs, entropy
measures above .99.  This is well above the 0.9183 minimum value required
for termination.

Example runs with metrics for each chain are given in Section V.

II. Collatz Conjecture

even N:  N' = N / 2
odd N:  N' = N * 3 + 1

The sequence eventually reduces to 1.

notation:   %   modulus
->   transition to the next value in a sequence

A. Rewritten Conjecture

The original series is rewritten to produce only values where:  N % 3 = 2
Two or three iterations are merged into each revised transition.
Intermediate values that are skipped have:  N % 3 = 0 or 1.
They account for 2/3 of the values in the original series and skipping
them simplifies analysis.

The transition out of an odd number is always even so we can take 2 steps:

N' = {N * 3 + 1} / 2

On an even number apply one of these two transitions:

When N % 4 = 0 then:  N' = N / 4

When N % 4 = 2 then:  N' = ([N / 2] * 3 + 1) / 2 = {N * 3 + 2} / 4

The first even case merges two steps and the second takes three steps.
You can determine the type of the next transition to apply by looking
at the low two bits of N.

Before starting the revised series the initial value may not meet the
condition N % 3 = 2.  In this case apply the original transitions until
it does.  This only takes a few steps at most.

After that the revised series can be run.  Eventually it reduces to two,
which is trivially reduced to one using the original series.

original:  13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1  terminate

revised:  13  advance 20    ->    5    ->    8    ->   2  terminate -> 1
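
The revised run for 13 can be reproduced with a minimal Python sketch
(the function names advance and revised_step are ours, not taken from
collatz.exe):

```python
def advance(n):
    """Apply original Collatz steps until n % 3 == 2."""
    while n % 3 != 2:
        n = 3 * n + 1 if n % 2 else n // 2
    return n

def revised_step(n):
    """One merged transition of the revised series (requires n % 3 == 2)."""
    if n % 2:                    # Odd: two original steps
        return (3 * n + 1) // 2
    if n % 4 == 0:               # By-Four: two original steps
        return n // 4
    return (3 * n + 2) // 4      # Deuce (n % 4 == 2): three original steps

n = advance(13)                  # 13 -> 40 -> 20
seq = [n]
while n != 2:
    n = revised_step(n)
    seq.append(n)
print(seq)                       # [20, 5, 8, 2]
```

Every value printed satisfies N % 3 = 2, matching the claim in Section II.B.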

This Windows program prints the revised Collatz series:  collatz.exe

Run from a command line:    collatz   63_728_127

B. Transition Values Have N % 3 = 2

We briefly show that the revised transitions consume and produce values
where N % 3 = 2.

Odd:  N' = (3 * N + 1) / 2

N = 3 * j + 2                   Follows from:  N % 3 = 2

= 3 * [2 * k + 1] + 2         For N to be odd, j must be odd.

= 6 * k + 5

N' = (3 * [6 * k + 5] + 1) / 2   Substitute

= ([18 * k + 15 ] + 1) / 2

= 9 * k + 8

N' % 3 = (9 * k + 8) % 3 = 2

N%4=0:  N' = N / 4

N = 3 * j + 2                   Follows from:  N % 3 = 2

= 3 * [4 * k + 2] + 2         For N % 4 = 0, j must be 4 * k + 2.

= 12 * k + 8

N' = [12 * k + 8] / 4            Substitute

= 3 * k + 2

N' % 3 = (3 * k + 2) % 3 = 2

N%4=2:  N' = (3 * N + 2) / 4

N = 3 * j + 2                   Follows from:  N % 3 = 2

= 3 * [4 * k] + 2             For N % 4 = 2, j must be 4 * k.

= 12 * k + 2

N' = (3 * [12 * k + 2] + 2 ) / 4  Substitute

= (36 * k + 8) / 4

= 9 * k + 2

N' % 3 = (9 * k + 2) % 3 = 2

Values where N % 3 = 0 or 1 occur as intermediate values within the
new transitions.  The number of possible values between transitions
is reduced by 2/3.

III. Advancing the Starting Value

An arbitrary starting number may not have N % 3 = 2.  To begin the
revised sequence we need to first advance from the starting number
until the condition is met.

notation:  /\      bitwise and

A. When N % 3 = 0

DO until N /\ 1:           DO until the number is odd,
  N = N / 2;               Advance one even transition.

N = (3 * N + 1) / 2;       Advance an odd then even transition.

Examples:  24 -> 12 -> 6 -> 3 -> 5     36 -> 18 -> 9 -> 14

B. When N % 3 = 1

IF N /\ 1:                 IF N is odd,
  N = (3 * N + 1) / 2;     Advance an odd then even transition.

ELSE:                      ELSE N is even,
  N = N / 2;               Advance one even transition.

Examples:  7 -> 11         10 -> 5
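
Both advance cases fold into one loop; a minimal sketch (the helper
name advance is ours):

```python
def advance(n):
    """Run original Collatz transitions until n % 3 == 2.

    Odd steps are merged with the even step that always follows,
    matching cases A and B above.
    """
    while n % 3 != 2:
        if n % 2:
            n = (3 * n + 1) // 2   # odd then even transition
        else:
            n = n // 2             # one even transition
    return n

print(advance(36), advance(7), advance(10))   # 14 11 5
```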

IV. Transition Chains

Consecutive iterations of the same kind of transition often occur
in a sequence, forming a chain.  If you were to graph a series you'd
see a tree where the branches consist of these chains.

notation:  ^      exponentiation

A. By-Four Chain Where N % 4 = 0

A By-Four chain has inputs that are multiples of four.  The length of the
chain is determined by looking at the low order bits of the input value.
The number of pairs of zero bits gives you the length of the chain.

The first value of a By-Four chain can be written as:  j * 4^k
Then the last value is just j after k divisions by 4.
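
A short sketch of reading the chain length from the trailing bits
(the helper name is ours):

```python
def by_four_chain_len(n):
    """Number of By-Four transitions: count of trailing 00 bit pairs."""
    k = 0
    while n % 4 == 0:
        n //= 4
        k += 1
    return k

j, k = 7, 3
n = j * 4 ** k                    # 448 = 1_11_00_00_00 base 2
assert by_four_chain_len(n) == k
for _ in range(k):                # k divisions by 4 recover j
    n //= 4
print(n)                          # 7
```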

B. Odd Chain Where N % 2 = 1

Odd chains consume an odd input and have odd intermediate values.
The output is an even number.

The number of consecutive low order 1 bits determines the chain length.
For example an input of 19 is 10011 base 2 so the subsequent chain has
two Odd transitions:  19 -> 29 -> 44

A chain of length k takes an input of the form:  j * 2^k - 1
The output value is then:  j * 3^k - 1

N(1) = (3 * N + 1) / 2                         First transition

= (3 * [j * 2^k - 1] + 1) / 2             Substitute N = j * 2^k - 1

= ([3 * j * 2^k - 3] + 1) / 2

= [3 * j * 2^k - 2] / 2

= 3^1 * j * 2^[k-1] - 1

N(i+1) = (3 * {3^i * j * 2^[k-i] - 1} + 1) / 2   Subsequent transitions

= ({3^[i+1] * j * 2^[k-i] - 3} + 1) / 2

= {3^[i+1] * j * 2^[k-i] - 2} / 2

= 3^[i+1] * j * 2^[k-i-1] - 1

N(k) = 3^k * j - 1                             Odd chain output
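
The closed form can be checked against stepwise iteration, here with
the example input 19 (j = 5, k = 2):

```python
def odd_step(n):
    """Merged Odd transition of the revised series."""
    return (3 * n + 1) // 2

j, k = 5, 2
n = j * 2 ** k - 1          # 19 = 10011 base 2: two trailing 1 bits
for _ in range(k):
    n = odd_step(n)         # 19 -> 29 -> 44
print(n, j * 3 ** k - 1)    # both are 44
```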

C. Deuce Chain Where N % 4 = 2

The two low order bits of the input end with bits 10.  The length of the
chain can be determined by counting the pairs of zeros above them.  The
number of transitions is that count plus one.  For example:

258 = 1_00_00_00_10 base 2

There are 3 pairs of zeros above the trailing 10 so there are 4 transitions
in the chain and the output is 83.  When there is an odd number of zeros
the output next connects to a By-Four chain.

386 = 110_00_00_10 base 2

There are 2 pairs of zeros so there are 3 Deuce nodes before transitioning
to a By-Four chain starting with 164.

When the first value of a Deuce chain is written as:  j * 4^k + 2
The last value is then:  j * 3^k + 2

N(1) = (3 * N + 2) / 4                         First transition

= {3 * [j * 4^k + 2] + 2} / 4             Substitute N = j * 4^k + 2

= {[3 * j * 4^k + 6] + 2} / 4

= [3 * j * 4^k + 8] / 4

= 3^1 * j * 4^[k-1] + 2

N(i+1) = {3 * (3^i * j * 4^[k-i] + 2) + 2} / 4   Subsequent transitions

= {(3^[i+1] * j * 4^[k-i] + 6) + 2} / 4

= {(3^[i+1] * j * 4^[k-i] + 8)} / 4

= 3^[i+1] * j * 4^[k-i-1] + 2

N(k) = 3^k * j + 2                             Deuce chain output
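
As with Odd chains, the closed form can be checked stepwise, here with
the example input 258 (j = 1, k = 4):

```python
def deuce_step(n):
    """Merged Deuce transition of the revised series."""
    return (3 * n + 2) // 4

j, k = 1, 4
n = j * 4 ** k + 2          # 258
for _ in range(k):
    n = deuce_step(n)       # 258 -> 194 -> 146 -> 110 -> 83
print(n, j * 3 ** k + 2)    # both are 83
```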

V. Transition Metrics

A statistical argument for termination considers that on average values
along a path decrease and eventually reduce to 1.  The average change
in each transition is represented using a gain factor.  When the gain
is under one this means that on average each transition is a reduction.

This is similar to American Roulette where betting on red each round has
odds that ensure you will lose an average of 5.26%; making the gain
0.9474 = (1 - 5.26%).  Even if you have an initial lucky run with a gain
over unity, eventually you will lose all your money.

A. Gain For Transitions

The gain of a transition is the ratio, output / input.  The gain for
individual transitions is its formula divided by the input N.  For large
N the "+1" and "+2" terms become insignificant.  They are dropped when
discussing the limits of sequences.

              Output / Input             Limit

N is Odd:     [(3 * N + 1) / 2] / N      1.5

N % 4 = 0:    (N / 4) / N                .25

N % 4 = 2:    [(3 * N + 2) / 4] / N      .75

The gain of a series of transitions is the product of the gains for
each transition.  A single kind of transition repeated k times has a
final gain of:  gain ^ k

The gain of a sequence with different types of transitions depends
on the number of times each type of transition was applied.  For the
revised sequences this would be:

Gt = 1.5 ^ (# of Odd) * .25 ^ (# of By4) * .75 ^ (# of Deuce)

More useful for analysis is the average gain.  Instead of using the
number of occurrences, the subtotals are divided by the total number
of iterations.  In other words we use the probability that each
kind of transition occurred.

Ga = 1.5 ^ p(Odd) * .25 ^ p(By4) * .75 ^ p(Deuce)

For a run length of 100 iterations if 50 were Odd, 20 were By-Four, and
30 were Deuce then the average gain is:

Ga = [1.5 ^ .5] * [.25 ^ .2] * [.75 ^ .3] = 0.85144

In an idealized Collatz sequence 50% would be Odd with Deuce and By-Four
transitions each occurring 25% of the time.

Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593
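
A quick arithmetic check of this idealized average gain:

```python
# Average gain with the idealized 50/25/25 distribution of transitions.
Ga = 1.5 ** 0.5 * 0.25 ** 0.25 * 0.75 ** 0.25
print(round(Ga, 5))                  # 0.80593
print(round((27 / 64) ** 0.25, 5))   # same value, factored form
```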

Performing many runs the actual average gains were:

Range       (3n+1)/2      n/4     (3n+2)/4        Ga

to 10^6      0.5068     0.2501     0.2431       0.80969
to 10^9      0.5043     0.2502     0.2456       0.80815
1B to 2B     0.5039     0.2501     0.2458       0.80800
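
Frequencies of this kind can be reproduced by tallying transitions over
many starting values; a sketch (the range up to 100_000 is our own
choice, not the runs used for the table above):

```python
def tally(limit):
    """Count each kind of revised transition for starts up to limit."""
    counts = {"odd": 0, "by4": 0, "deuce": 0}
    for start in range(2, limit):
        n = start
        while n % 3 != 2:                  # advance to the revised series
            n = 3 * n + 1 if n % 2 else n // 2
        while n != 2:
            if n % 2:
                counts["odd"] += 1
                n = (3 * n + 1) // 2
            elif n % 4 == 0:
                counts["by4"] += 1
                n //= 4
            else:
                counts["deuce"] += 1
                n = (3 * n + 2) // 4
    return counts

counts = tally(100_000)
total = sum(counts.values())
print({k: round(v / total, 4) for k, v in counts.items()})
```

The fractions land close to the 50/25/25 split described above.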

The average gain for an individual sequence can vary quite a bit.
When there are no Odd chains, or only a few, it is low.

n          (3n+1)/2     n/4   (3n+2)/4   1.5^pa * .25^pb * .75^pc

1_267_611_776       0      0.8236   0.1764          0.303485

As the portion of odd chains increases the average gain increases.

n          (3n+1)/2     n/4   (3n+2)/4   1.5^pa * .25^pb * .75^pc

1_005_925_919    0.5687    0.1400   0.2913        0.953995

B. Entropy

In this context we use Shannon entropy to measure the uniformity of
the occurrence of each kind of transition.  Since the kind of transition
is determined by the two low order bits we track the entropy of each.

The entropy metric ranges from zero to one.  For a sequence of bits the
entropy is one if they are uniform and zero if they are all the same.

p0 = (zero bits) / (total bits)

p1 =  (one bits) / (total bits) = 1 - p0

H = -p0 * log2( p0 ) - p1 * log2( p1 )
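
The metric is easy to compute directly; a 1/3 : 2/3 split of the bits
gives the 0.9183 value cited above:

```python
from math import log2

def entropy(p0):
    """Shannon entropy of a binary source with P(zero bit) = p0."""
    if p0 in (0, 1):
        return 0.0          # all bits the same: no entropy
    p1 = 1 - p0
    return -p0 * log2(p0) - p1 * log2(p1)

print(round(entropy(0.5), 4))     # 1.0
print(round(entropy(1 / 3), 4))   # 0.9183
```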

To get a sense of how this applies, as we run a single Collatz sequence
the entropy of each of the low two bits is computed between each chain.
Entropy goes up as the bits change and goes down when they stay the same.
This tells us how well the transitions are blended.

In By-Four chains the type of chain that follows is determined by j % 4.
Entropy is unchanged as the upper bits remain the same and only low order
zeros with no entropy are shifted out.

After Deuce and Odd chains the next transition depends on j and k.
For these two cases the two low order bits of the output are hashed
based on their inputs.

Odd             Deuce

Chain In     j * 2^k - 1      j * 4^k + 2

Chain Out    j * 3^k - 1      j * 3^k + 2

The 2^k and 4^k terms are all zeros with zero entropy and are
transformed into 3^k with positive entropy.  This adds entropy in
the form of randomness to any individual Collatz sequence.

Over time a very long sequence will move closer to the average and the
sequence eventually terminates.  A sequence has to run far askew from
the averages to avoid convergence.  For example to skew the frequency
of By-Four and Deuce chains you would need:

50% Odd         5% By-Four        45% Deuce       Gain=1.00396

A 5% occurrence is a very significant change; not a slight deviation.
In this example entropy would be added in 95% of the chains making it
exceedingly rare and short lived.

It is well known that we only need to advance until the series reaches
a number that is less than the starting point.  After that you can use
recursion to show it terminates.  This implies that the effects of
entropy need to take effect early on.  Analyzing entropy beyond that
would be pointless.  It would remain high and would be misleading.

Using truncated series with any even starting point the next step goes
down and we are done.  Also if the low bits of N are 01 (N % 4 = 1)
we get:

N = 4 * j + 1  ->  3 * (4 * j + 1) + 1  =  12 * j + 4

-> (12 * j + 4) / 4  =  3 * j + 1  <  N = 4 * j + 1

Now we only need to consider the remaining odd starting values for k > 2:

j * 2^k - 1  ->  j * 3^k - 1     where j is even

C. Observations

Here we give metrics for the first few orbits with various numbers of
chains.  There is a warm up period while enough entropy accumulates to
dominate over the initial state.  Assuming one bit of entropy is added
per Odd or Deuce chain, the length of this period is about
4/3 * log2( n ) chains.  During this period entropy is biased downward.

Also, since the goal is to run the sequence until it is lower than the
starting value, any subsequent values are not critical.  They will also
have higher entropy giving an upward bias.  Metrics are only taken for
the region after the warm up period to the point where the sequence goes
below the starting value.

The values listed are the initial sequence value, the number of chains,
the average entropy, and the minimum value.  In the first section metrics
are calculated starting with the fiftieth chain.  Only orbits with over
100 chains and the worst case (lowest) entropy are listed.  Orbits with
average entropy over .98 are discarded.

N          Chains    Average     Minimum

1_004_910_504_406    101     0.97855     0.96440
1_006_847_123_698    102     0.97925     0.96764
1_009_507_654_930    103     0.97812     0.96226
1_016_869_791_785    104     0.97900     0.96511
1_010_222_525_375    105     0.97798     0.96309
1_018_973_403_257    106     0.97471     0.96171
1_011_458_120_662    107     0.97171     0.95689
1_019_225_482_855    108     0.97425     0.95522
1_016_511_374_619    109     0.97520     0.96012
1_009_097_630_686    110     0.97304     0.96162
1_012_606_892_238    111     0.97971     0.96898
1_016_251_181_895    112     0.97794     0.96485
1_006_442_994_665    113     0.97962     0.96026
1_009_074_973_979    114     0.97511     0.96435
1_009_300_305_682    115     0.97969     0.96373
1_004_039_432_955    121     0.97661     0.96880
1_017_723_198_430    122     0.97518     0.96323
1_004_753_675_742    123     0.97927     0.96378
1_019_270_409_215    124     0.97997     0.96124
1_012_163_792_146    125     0.97944     0.96564
1_019_057_497_270    127     0.97831     0.95842
1_009_934_311_788    142     0.97933     0.96564

This next chart tracks orbits in the same region with over 200 chains.
It illustrates how these longer orbits have substantially higher entropy.

N          Chains    Average     Minimum

1_019_524_626_430    201     0.99810     0.99517
1_011_050_942_290    202     0.99705     0.99342
1_012_645_261_039    203     0.99184     0.98394
1_019_445_805_308    204     0.99306     0.97947
1_008_195_131_391    205     0.99358     0.98272
1_015_571_878_206    206     0.99509     0.98566
1_011_482_948_255    208     0.99471     0.99005
1_014_934_024_239    209     0.98914     0.96991
1_019_229_503_550    211     0.99248     0.97494
1_012_087_192_295    213     0.99760     0.99186
1_016_311_780_350    217     0.99659     0.99162
1_005_443_653_055    219     0.99367     0.98477
1_019_868_848_999    220     0.99796     0.99290
1_003_500_978_358    231     0.99349     0.98741
1_015_160_288_158    236     0.99463     0.98407
1_009_542_611_052    253     0.99174     0.98403

D. Disjoint Branch

If a graph tree of all orbits was not fully connected there would be
a path that never reached the root.  The branch could either be cyclic
or infinitely long.  Cyclic branches would have repeating values whereas
within acyclic branches all values are unique.  To remain unique an
acyclic disjoint branch would contain values from a pool of increasingly
larger values.

A characteristic of pseudo random number generators is that they can
produce a sequence that repeats.  If the Collatz sequence, acting as a
generator, were to repeat then the sequence would cycle and the branch
would not be infinitely long.  This raises the prospect of a cyclical
counterexample with a very long period.

As with the gambler who does not quit while ahead, an infinite branch
would succumb to the pressure of the house odds.  To overcome them, Odd
transitions would need to be applied over 2/3 of the time.  An infinite
disjoint branch would contain more than twice as many odd values as even.

Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794   Just under breaking even

Ga = 1.5^.674 * .25^.163 * .75^.163 = 1.00043   Just over breaking even

To sustain these proportions in a long sequence the entropy of the low
order bit would need to average below 0.9183.

H = -1/3 * log2( 1/3 ) + -2/3 * log2( 2/3 ) = 0.9183

Another way to beat the odds is if By-Four chains were to never occur.
Even and odd values could potentially appear half the time and the path
would on average keep increasing.

Ga = 1.5 ^ .5 * .25 ^ .0 * .75 ^ .5 = 1.06066

Paths of any length can be contrived, but eventually a By-Four chain
will be produced.  Actual paths can start off with very few By-Four
chains (such as the path starting at 71), but eventually terminate.

A tiny number of By-Four chains could occur and still have a gain over
unity.  A disjoint branch would have to meet this constraint.  Allowing
4% By-Four chains (highly skewed from the 25% average) we get:

Ga = 1.5 ^ .48 * .25 ^ .04 * .75 ^ .48 = 1.00108

A statistical argument alone is insufficient for a proof.  Physical
imperfections aside, the model for roulette assumes that each spin is an
independent purely random event.  This differs from the Collatz sequence
where a given starting number produces a deterministic pseudo random
sequence.

The remaining issue concerns the distribution of even and odd values over
an infinite disjoint branch.  We need to show there is enough entropy
so that there are fewer than twice as many odd values as evens in order
for the series to converge.

A starting value can be contrived to create a sequence of any
desired fixed length.  The initial value contains a finite amount of
information.  As each chain is run strings of ones or zeros with no
entropy are replaced by a power of three with high entropy.  After
running each chain the information in the starting value becomes
randomized until all of the information in the initial state is lost.
Consequently an infinitely long chain cannot be created.

The overall increase in entropy for odd chains forms a self-regulating
mechanism.  Any skew in the ratio of odd to even values moves to 50:50
due to the increase in entropy.  Collatz chains become sufficiently
randomized after the first few chains for even the most contrived runs.

We've observed that orbits with over 100 chains have entropy sustained
above 0.975 with minimums over 0.95.  This goes up in longer orbits and
is over 0.995 with 200 chains.  This is well above the 0.9183 level
required for eternal growth.  However, to be conclusive we'd need to
prove that entropy rises in longer chains and has a lower bound.

VI. Conclusion

A revised sequence combines two or three steps in the original.

N is Odd:     N' = (3 * N + 1) / 2

N % 4 = 0:    N' = N / 4

N % 4 = 2:    N' = (3 * N + 2) / 4

Values between each step have N % 3 = 2.  The other 2/3 of values are
subsumed within each transition.  The low two bits of each value
produced determine the next step.

In the limit the gain of each of these transitions is 1.5, .25, and .75
respectively.  The gain of a sequence of transitions is the product of
the individual gains.  The average gain with a uniform distribution of
each type of transition is then:

Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593

An orbit that never reduces and instead continuously increases would
require over twice as many odd transitions as even.

Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794   Just under breaking even

Repetitions of the revised steps form chains and combine to give:

N is Odd:     N  = 2^k * j - 1

N' = 3^k * j - 1

N % 4 = 0:    N  = 4^k * j

N' = j

N % 4 = 2:    N  = 4^k * j + 2

N' = 3^k * j + 2

Starting values of interest are those that do not immediately reduce.

2 * j * 2^k - 1      where k > 2

The entropy of a bit stream is:

H = -p0 * log2( p0 ) - p1 * log2( p1 )

The entropy of an orbit that never reduces would need to stay below:

H = -1/3 * log2( 1/3 ) + -2/3 * log2( 2/3 ) = 0.9183

A disjoint branch needs to on average always increase.  To do so it needs
to consist primarily of Odd chains.  However Odd chains increase entropy,
reducing the number of times they occur.  A branch that started with a high
proportion of Odd chains would eventually converge due to increasing
entropy, making it impossible to form an infinite disjoint branch.

We measure the entropy of each of the low two bits between each chain.
Higher entropy implies a more uniform distribution of each kind of
transition.  Observed entropy is over .995 in orbits with over 200
chains, well over the 0.9183 minimum.  Such orbits will eventually reduce
below their starting value, supporting the Collatz conjecture.

Entropy is more than a statistical metric.  Here it acts as a force that
in part determines the next transition to apply.  It randomizes the lower
two bits of the revised sequence resulting in a more uniform blend of
transitions.  The source of the entropy is the change in sequence values
from 2^k to 3^k.

Part of what makes the Collatz sequence intractable is the randomization
of the values it produces.  Ironically randomization is the feature that
moves it towards termination.
```