Entropy Makes the Collatz Series Go Down

             Bradley Berg                          November 21, 2018
                                              Edited August  6, 2022


       We look at the Collatz Series from an information theory 
       perspective to lay out its underlying computational mechanics.  
       The mechanisms are similar to those used by pseudo random number 
       generators and one way hashes.

       An equivalent restatement of the Collatz series steps through 
       alternating even and odd chains.  The result of each step is an 
       even number with one or more trailing zeros.  The next higher 
       bits will be one or more ones.  The remaining high order bits are 
       an even number with no special ordering.

       Numbers with all ones or zeros have no entropy.  Transitions 
       discard low order zeros; increasing the entropy of the output 
       value.  The next higher group of ones is then replaced by 
       combining them with the remaining high order bits.  This also 
       increases entropy.

       Generally, as the entropy of a series increases the values become 
       more randomized.  The more random the series the more it trends 
       towards statistical averages for the series.  The average gain of 
       the Collatz Series is well below one so that values drift lower 
       until reaching one; provided it is not circular.

                ->     "Transition to" the next value in a sequence.
                ^     exponentiation operator
               //     logical shift right operator
               \\     logical shift left operator

1.  Combining Transitions

   This is an informal explanation of the computational mechanics of the 
   Collatz Sequence.  The same class of computations is used in 
   cryptographic random number generators.

   Often the sequence is defined so that the Odd transition is just "3*n+1".  
   Instead we also divide the result by two.  This way each transition is 
   taken with equal probability on average.  This is easy to see by running 
   a range of inputs:  the outputs are evenly split between odd and even 
   values.

      N is Even:    N' = N / 2

      N is Odd:     N' = (3 * N + 1) / 2
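   As a quick sketch, the two transitions can be written in Python (the 
   function name is ours, not from the text):

```python
def collatz_step(n: int) -> int:
    """One transition: halve evens, apply (3n + 1) / 2 to odds."""
    if n % 2 == 0:
        return n // 2
    return (3 * n + 1) // 2

# A few steps from 27:  27 -> 41 -> 62 -> 31 -> ...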

   The gain of a transition is its Output divided by the Input (N' / N).
   As N gets larger the "+ 1" term quickly becomes insignificant.  It is 
   dropped so we can compute the gain for the transition in the limit.

                      Output / Input             Gain

       N is Even:    {N / 2} / N                  0.5

       N is Odd:     {(3 * N + 1) / 2} / N        1.5

   The gain of a series of transitions is the product of the gains for
   each transition.  We get the average gain of a sequence by using the
   probability of taking either an odd or even path.

       Gs = 1.5 ^ p(odd) * .5 ^ p(even)

    If a sequence produces uniformly randomized values then the chances of 
    taking either transition are 50:50.  In this case the choice of path is 
   determined by the low order bit of the input value.  This means the low 
   bit would need to be randomized over the sequence.  The average gain of a 
   randomized sequence is then:

        Average transition gain = <transition gain> ^ p(<transition>) 

            Average odd gain  = 1.5 ^ .5 = 1.22474

            Average even gain = 0.5 ^ .5 = 0.70711

       Average sequence gain:   Ga = 1.22474 * 0.70711 = 0.86603+
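   This average gain is the geometric mean of the two per-step gains, which 
   a one-line Python check confirms (variable names are ours):

```python
# Geometric-mean gain of one step when odd and even transitions are 50:50.
avg_odd = 1.5 ** 0.5           # 1.22474...
avg_even = 0.5 ** 0.5          # 0.70711...
avg_gain = avg_odd * avg_even  # sqrt(0.75) = 0.86602...
```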

   The statistical average gain in each step is less than one so on average 
   the sequence declines.  However, the gain in an actual series can vary 
   significantly.  There could potentially still be a series where the 
   total gain indefinitely exceeds one and never terminates.

         Gain = 1.5 ^ p * 0.5 ^ (1 - p) = 1

                ln( 1.5 ^ p * 0.5 ^ [1 - p]) = ln( 1 ) = 0

               = ln( 1.5 ^ p ) + ln( 0.5 ^ [1 - p] )

              = p * ln( 1.5 ) + (1 - p) * ln( 0.5 )

               p * ln( 1.5 )  =  -(1 - p) * ln( 0.5 )

               ln( 1.5 ) / ln( 0.5 )  =  -(1 - p) / p

               ln( 1.5 ) / ln( 0.5 )  =  -1 / p + 1

               ln( 1.5 ) / ln( 0.5 ) - 1  =  -1 / p

         p  =  -1 / {ln( 1.5 ) / ln( 0.5 ) - 1} = 0.63093+

       Gain = 1.5^0.63 * 0.5^0.37 = 0.99898      Just under breaking even

       Gain = 1.5^0.64 * 0.5^0.36 = 1.01001      Just over breaking even
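   The break-even probability can be checked numerically; the closed form 
   above simplifies algebraically to ln(2) / ln(3):

```python
import math

# Break-even probability of an odd transition, from the derivation above.
p = -1 / (math.log(1.5) / math.log(0.5) - 1)

# Algebraically the same quantity is ln(2) / ln(3).
assert abs(p - math.log(2) / math.log(3)) < 1e-12

# At exactly p the average gain is one: the series neither grows nor shrinks.
gain = 1.5 ** p * 0.5 ** (1 - p)
```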

    For a series to run on forever it would have to sustain an average gain 
    over one.  To do this odd transitions would need to be applied over 1.7 
    times as often as evens.  We've determined that an equal number of 
    transitions gives an average gain of about 0.86603.

1.1 Even and Odd Chains

   The sequence often has consecutive iterations of the same kind of 
   transition, forming a chain.  Even chains start with an even seed value 
   that in binary have one or more trailing zeroes.  The Even transition
   simply removes the zeros at the end of an even chain.

   Odd chains consume an odd input and can have multiple odd intermediate 
   values.  The output of an Odd chain is an even number.  The number of 
   consecutive low order one bits determines the chain length.  For example, 
   an input of 19 is a binary 10011 so the subsequent chain has two Odd 
   transitions:  19 -> 29 -> 44

   A chain of length k takes an input of the form:  j * 2^k - 1
   The output value simplifies to:  j * 3^k - 1

     N(1) = (3 * N + 1) / 2                         First transition

          = (3 * [j * 2^k - 1] + 1) / 2             Substitute N = j * 2^k - 1

          = ([3 * j * 2^k - 3] + 1) / 2

          = [3 * j * 2^k - 2] / 2

          = 3^1 * j * 2^[k-1] - 1

   N(i+1) = (3 * {3^i * j * 2^[k-i] - 1} + 1) / 2   Subsequent transitions

           = ({3^[i+1] * j * 2^[k-i] - 3} + 1) / 2

           = {3^[i+1] * j * 2^[k-i] - 2} / 2

          = 3^[i+1] * j * 2^[k-i-1] - 1

     N(k) = 3^k * j - 1                             Odd chain output
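   The closed form can be verified in Python by applying the odd transition 
   k times (helper names are ours):

```python
def odd_step(n: int) -> int:
    """One odd transition: (3n + 1) / 2."""
    return (3 * n + 1) // 2

def odd_chain(j: int, k: int) -> int:
    """Run k odd transitions on the input j * 2^k - 1."""
    n = j * 2**k - 1
    for _ in range(k):
        n = odd_step(n)
    return n

# The derivation predicts an output of j * 3^k - 1 for every j and k.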

   A Collatz sequence will have segments with consecutive runs of each 
   transition.  These chains have length k and have the form:

                    Chain Input        Chain Output        Gain

      N is Odd:     j * 2^k - 1        j * 3^k - 1         1.5^k

      N is Even:    j * 2^k            j                   0.5^k

   Note that for even inputs j is the input shifted right by the number of 
   its trailing zeros.  For odd inputs j is 1 plus the input shifted right 
   by the number of trailing ones.  For reference, here are the first few 
   chains for the series beginning with a seed of 27.

               Original                        Even        Odd         j   k

      27 -> 41 -> 124 -> 62                             27 -> 62       7   2

               -> 31                           -> 31                  31   1

         -> 47 -> 71 -> 107 -> 161 -> 242                  -> 242      1   5

               -> 121                          -> 121                121   1

               -> 364 -> 182                               -> 182     61   1

               -> 91                           -> 91                  91   1

               -> 137 -> 206                               -> 206     23   2

               -> 103                          -> 103                103   1

               -> 155 -> 233 -> 350                        -> 350     13   3

               -> 175                          -> 175                175   1

        -> 263 -> 395 -> 593 -> 890                        -> 890     11   4

1.2  Combining Even And Odd Chains

   The series alternates between even and odd chains.  We represent this 
   aspect by algebraically merging both into a single step.  This gives us a 
   series defined using a single transition.  Since all intermediate values 
   using the combined definition are even, an initial odd value needs to 
   transition one step using the "3 * n + 1" rule to get an even number.

      Every input has the binary form:   j  ones  zeros
         N = ([j + 1] * 2^ko - 1) * 2^kz  -->  [j + 1] * 3^ko - 1

             where:  kz - number of trailing zeros in N
                     ko - number of the next higher set of ones in N
                      j - N shifted right by kz + ko bits:  N / 2^[kz + ko]

   Using the previous example, initially we transition 27 to 82.  From there 
   the next few steps are:

                                       j     ko     kz       binary

        82 -> 41 ->  124 -> 62         20      1      1       101_0010

          -> 31 ->  484 -> 242         0      5      1        11_1110

         -> 121 ->  364 -> 182        60      1      1      1111_0010

         ->  91 ->  274 -> 206        22      2      1      1011_0110

          -> 103 ->  310 -> 350        12      3      1      1100_1110

          -> 593 -> 1780 -> 890        10      4      1     1_0101_1110
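   The combined transition can be sketched in Python (function name is 
   ours).  Starting from 82, it reproduces the chain outputs 62, 242, 182, 
   206, 350, 890 given by the formula:

```python
def combined_step(n: int) -> int:
    """One combined chain on even n:
       ([j+1] * 2^ko - 1) * 2^kz  ->  [j+1] * 3^ko - 1"""
    kz = (n & -n).bit_length() - 1   # count and strip kz trailing zeros
    m = n >> kz
    ko = 0
    while (m >> ko) & 1:             # count ko trailing ones
        ko += 1
    j = m >> ko                      # remaining high bits (j is even)
    return (j + 1) * 3**ko - 1

# Walk the example: a seed of 27 first maps to 3 * 27 + 1 = 82.
seq = [82]
while seq[-1] != 890:
    seq.append(combined_step(seq[-1]))
```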

2.  Sequence Entropy

   Entropy is a measure of information indicating the level of uncertainty
   about the possible outcomes of a random variable.  

       H = -p0 * log2( p0 ) + -p1 * log2( p1 )

         where:  p0 is the probability a bit is zero.
                 p1 is the probability a bit is one (1 - p0).

    A set of coin tosses has p0 = p1 = .5 so its entropy is 1, totally random.
   When looking at the entropy of bits in a number then p0 is the percentage 
   of zero bits.  For the binary number, 1010_1111, p0 is .25 (H = 0.811).  
   Strings of all ones or zeros have no entropy (H = 0).  In these examples
   we are measuring the bits in a number horizontally.
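   A short Python helper (ours) computes this horizontal bit entropy and 
   reproduces the examples above:

```python
import math

def bit_entropy(n: int) -> float:
    """Shannon entropy of the bits in n's binary representation."""
    bits = bin(n)[2:]
    p0 = bits.count('0') / len(bits)
    if p0 in (0.0, 1.0):
        return 0.0                   # all ones or all zeros: no entropy
    p1 = 1.0 - p0
    return -p0 * math.log2(p0) - p1 * math.log2(p1)
```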

   Bits in a series of numbers have two dimensions - horizontal bits in each 
   individual value and vertical bits over the duration of the series.  We 
   can also measure entropy vertically over a number series.  That means we 
   can look at a select bit in each value as the series progresses.

   For Collatz the low order bit is interesting because it determines if a 
   number is odd or even.  In turn that determines which transition to take. 
   When the entropy of the low order bit is high then on average there will 
   be nearly as many even transitions as odds.

   Each kind of chain takes a value where the low order bits are a string of 
   zeros or ones and either removes or replaces them.  Even inputs remove low
   order zeros.  The expression for odd chain inputs has a 2^k term that 
   transitions to 3^k.

   Strings of zeros or ones have no entropy and the 3^k term has positive 
   entropy.  This adds entropy each time a chain of transitions is applied.  
   Removing the low order zeros that have no entropy only increases the 
   entropy of the result.  Entropy is also increased by removing the 
   repeated ones and replacing them with scrambled bits.  The upper bits, j, 
   are scrambled by multiplying j by a power of 3.  This uniformly 
   randomizes the choice of transition to apply next.

2.1 Losing Information

   The seed can be contrived to produce a Collatz series of any desired 
   length.  The longer the series the larger the seed has to be.  It has to 
   contain enough information to influence the desired outcome.

   Initially, as the series progresses information in the seed is lost.
   When there are two ways to reach the same number in a sequence we lose
   the information about which path was taken.

   Odd numbers always transition (3 * n + 1) to even numbers so an odd value 
   cannot be reached from another odd.  You can reach certain even numbers 
   from either an odd or even transition.  For example, an output of 16 can 
   be reached from either 32 or 5.
          32 -> 32 / 2 = 16     5 -> 3 * 5 + 1 = 16

   Whenever transitioning to an even value such that (even - 1) % 3 = 0
   then the previous value could have been either:

         2 * even   or   (even - 1) / 3
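   This sketch (function name ours) enumerates the one-step predecessors of 
   an even value under the plain n/2 and 3n+1 rules:

```python
def predecessors(n: int):
    """All values that reach even n in one step of the plain Collatz map."""
    preds = [2 * n]                  # halving 2n yields n
    if (n - 1) % 3 == 0:             # an odd predecessor also exists
        preds.append((n - 1) // 3)
    return preds
```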

   For Collatz, a bit of information contained in the seed is lost each time 
   one of these select even numbers is reached.  After all bits in the seed 
   are scrubbed the initial phase is complete.  Any attempt to contrive a 
   seed to skew results can only directly affect values during this phase.

    One way hash functions use this concept of lost information.  Secure 
    hashes have many ways to reach each hash value.  This is how 
   passwords are encoded so they can be authenticated.  In this sense 
   Collatz is a trivial example of a one way hash.

2.2  Randomization Phase

   This next phase is key as the sequence acts as a pseudo random number 
   generator.  Randomized values eventually trend towards their average.
   In turn this drives transitions towards their average gain.  In the 
   introduction we showed that Collatz has an average gain of 0.86603;
   driving the series below the seed value.

    Computations for a single Odd step can be performed using shifts and adds:

      N'  =  (3 * N + 1) / 2  =  [(N \\ 1) + N + 1] // 1

   The input is odd so the low order bit is one.  The left shift and the two 
   additions multiply N by 3 and increment the result.  This scrambles the 
   bits in N and the increment ensures an even intermediate value.  The final 
   right shift eliminates the known low order zero; leaving only the 
   scrambled bits.
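   The shift-and-add form translates directly to Python (function name is 
   ours); it matches (3 * N + 1) / 2 for every odd input:

```python
def odd_step_shifts(n: int) -> int:
    """(3 * n + 1) / 2 for odd n, using only shifts and adds."""
    return ((n << 1) + n + 1) >> 1   # (N \\ 1) + N + 1, then // 1
```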

    An input value has a low order one bit and the upper bits are unknown.  
   The output removes the known low order one bit and produces an unknown 
   output.  Entropy is increased in this step by scrambling the input.

    Similarly, in the Even step the known low order zero is shifted out.
    Removing the known low order zero leaves all unknown bits, which also
    increases entropy.

    Adding a constant to uniformly random numbers does not change the
    uniformity of the values.  They are simply offset by the constant value.
    Adding 1 to random numbers from 0 to 9 produces equally random numbers
    from 1 to 10.

    Multiplying a random variable by a constant also yields random values, 
    but with a larger gap between them.  Multiplying random numbers from 1 
    to 10 by 3 yields random numbers from 3 to 30.  They simply have a gap 
    of 3 between them.

   For the combined series from section 1.2 we have:

         N  =  ([j + 1] * 2^ko - 1) * 2^kz  ->  [j + 1] * 3^ko - 1

    Overall we are taking a value with unknown bits, R, and scrambling them
    by multiplying by a power of 3.  Note that the kz term is discarded.

         R'  =  R * 3^ko - 1

    Outputs in any individual series depend on the values kz and ko.  The 
    more random they are, the more random the series.  The k values measure 
    the width of a horizontal subset of bits in each value.  Pseudo random 
    number generators that combine operations on horizontal and vertical bit 
    sets rely on the independence of these orthogonal values.

   Repeated low order bits with low entropy are continuously removed and 
   replaced by scrambled bits.  This creates a self regulating system that 
   randomizes the low order bits.  Those bits control the choice of which 
   even or odd transition to apply.

   If the series is uniformly random then the average number of even and odd 
   transitions are the same.  In turn this causes the series to decrease. It 
   remains to formally prove uniformity.  The computations must be shown
   to have no aspect that is biased towards more one bits than zero bits.

    Let's look at the output of the combined transition:  [j + 1] * 3^ko - 1

         [j + 1]   Changes the low zero bit to one.
                   Adding a constant is unbiased.

          * 3^ko   repeated shift and add

                   x \\ 1  adds a zero to the end

                       + x  mixes the bits; the low bit remains one

           - 1      Borrows 1 from the result to reset the low bit to zero.
                   Subtracting one is unbiased.

    If the sequence were not uniformly random then that would be reflected in 
    the arithmetic of the transitions.  The result of each of these steps is 
    unbiased so that the series acts as a random number generator.  Even if 
   the seed initially results in highly skewed values, that cannot be 
   sustained.  Values will become more randomized as the series progresses 
   and trend toward average results.  Even if you get lucky and call the 
   results of several coin tosses, your luck will run out in the long run.

2.3  Reduce To One

   The previous phase leaves us with a value of N that is below the seed.  
   Once this happens we know the series will eventually terminate at one.  
   Firstly, we know all values below some arbitrary small number M (say 10) 
   transition to one.

    Next, starting with the next higher seed, M + 1, we transition until it 
    reaches M or less.  Since we already know seeds of M or less will reach 
    one, by induction, a series that goes below its seed will reach one.
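   The induction step can be exercised empirically with a short Python 
   sketch (function name is ours); every seed tested falls below itself:

```python
def first_below_seed(n: int) -> int:
    """Iterate the sequence from seed n > 1 until a value drops below n."""
    m = n
    while m >= n:
        m = m // 2 if m % 2 == 0 else (3 * m + 1) // 2
    return m

# Empirically every seed from 2 upward falls below itself, so by the
# induction above each of these seeds reaches one.
```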

   In this final phase the law of small numbers takes over and the series 
   will likely behave less randomly.  Once one is reached all information 
   about previous values leading up to it is gone.  The final value is a 1 
   which has no entropy.

3. Conclusion

   The Collatz sequence integrates several mechanisms used to create
   pseudo random number generators.

      * A one way hash loses regularities due to a contrived seed.

      * Repetition in low order bits is erased in each step.

      * The product of independent values is used to randomize values.

   Any individual series is partitioned into three phases.  In the initial 
   phase the seed value can influence the outcome to produce arbitrarily 
   long runs.  After that the series generates randomized values until it 
   goes below the seed value.  From there it is guaranteed to reduce to one 
   unless the series is circular.

   A uniformly randomized series eventually moves towards a statistically 
   average gain.  For a Collatz series to sustain an average gain above one
    would require over 1.7 times as many odd transitions as even.  This is
   well above parity.  When the low order bit is random the series will
   average out and decrease until it inevitably goes below the seed.  Once
   it does that we know it will terminate.

   The random behavior of the Collatz series makes it impossible to prove
   algebraically.  The irony is that this randomness is the force that
   causes it to converge.