Entropy Makes the Collatz Series Go Down

             Bradley Berg                          November 21, 2018
                                             Edited October 21, 2021


                                ABSTRACT

       An equivalent redefinition of the Collatz series is used that has
       three kinds of segments.  Inputs to each segment have a string of
       zeros or ones in the low order bits, represented as powers of two.
       In the corresponding output of each segment these terms are
       replaced by powers of three or are deleted.  The input terms have
       no entropy, while entropy is added to the outputs.  As the series
       progresses entropy accumulates, randomizing the sequence.

       The low order two bits in inputs to each segment determine which 
       of the three kinds of transitions to apply.  As these bits become 
       more random, the rate that each kind of segment occurs averages 
       out.  An average distribution causes the sequence to decrease.

       A linear counterexample consists of an infinitely long sequence
       with transitions that produce ever increasing values.  Values
       would need to be highly skewed in favor of odd numbers to sustain
       the increase.  Even when the starting value is contrived to
       produce mostly odd (increasing) transitions, information encoded
       in the starting value is lost.  Eventually even and odd values
       balance out and the sequence decreases.  This makes it impossible
       to construct an infinitely long series.



I. Overview

A. Rewritten transitions for the Collatz sequence merge two or three steps:

      N is Odd:     N' = (3 * N + 1) / 2

      N % 4 = 0:    N' = N / 4

      N % 4 = 2:    N' = (3 * N + 2) / 4


   All intermediate values in this sequence have:  N % 3 = 2
   The low order two bits of N determine the type of the next transition.
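   The three transitions above can be folded into a single step; a minimal
   sketch in Python (the function name revised_step is ours):

```python
def revised_step(n):
    # One transition of the revised series; n is assumed to have n % 3 == 2.
    if n % 2 == 1:               # N is Odd
        return (3 * n + 1) // 2
    if n % 4 == 0:               # N % 4 = 0
        return n // 4
    return (3 * n + 2) // 4      # N % 4 = 2
```

   For example revised_step(20) = 5, revised_step(5) = 8, and
   revised_step(8) = 2.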


B. The average occurrence and gain (output to input ratio) of each transition is:

                   Occurrence     Gain Limit        Gain ^ Occurrence

      N is Odd:      50%          3/2 = 1.5         1.5^.5 = 1.22474

      N % 4 = 0:     25%          1/4 = .25        .25^.25 = 0.70711

      N % 4 = 2:     25%          3/4 = .75        .75^.25 = 0.9306


   The gain of a sequence is the product of the gains for each transition.
   In the limit the average gain of a Collatz sequence is:

           Ga = 1.22474 * 0.70711 * 0.9306 = 0.80592


   The average gain is under unity so on average Collatz sequences will
   steadily be reduced until terminated.  Gains vary in actual sequences.


C. A Collatz sequence will have segments with consecutive runs of
   each transition.  Each of these chains have length k with these
   input and output forms:

                    Chain Input        Chain Output        Gain

      N is Odd:     j * 2^k - 1        j * 3^k - 1         1.5^k

      N % 4 = 0:    j * 4^k            j                   .25^k

      N % 4 = 2:    j * 4^k + 2        j * 3^k + 2         .75^k


   Each chain takes a value where the low order bits are a string of zeros 
   or ones and replaces or removes them.  Inputs have a 2^k term that 
   transitions to 3^k or a 4^k term that is removed.  Strings of zeros or 
   ones have no entropy and the 3^k term has positive entropy.  Entropy is 
   added each time a chain of transitions is applied.

   In particular, each chain hashes the low order two bits that determine
   the type of the next chain.  In a very long sequence each type of
   transition occurs closer to the average rate.

   In order for a Collatz sequence to avoid eventual termination it needs
   to either be circular or produce values in ever greater ranges.  In
   the latter case any infinite sequence requires a gain over one to keep
   growing.  This cannot be sustained indefinitely.  Entropy ensures that
   it will drift towards the average gain of 0.8059 and eventually terminate.

   Over twice as many odd transitions as even are needed in order to
   achieve the minimum eternal growth rate; which is far askew from parity.
   In a long sequence the entropy of the low order bit would need to remain
   below 0.9183.

           Odd     N % 4 = 0    N % 4 = 2
   
         1.5^.674 * .25^.163 * .75^.163 = 1.00043   gain slightly over 1


   The series self regulates in that as more odd transitions are applied
   the entropy goes up, which pushes the number of odd transitions down
   towards the 50% average.  In turn this lowers the average gain until
   it descends below one, leading to eventual termination.

   After the first few chains in even the most contrived runs, entropy
   measures above .99.  This is well above the 0.9183 minimum value required
   for termination.

   Example runs with metrics are given in section V.



II. Collatz Conjecture

     even N:  N' = N / 2
      odd N:  N' = N * 3 + 1

   The sequence eventually reduces to 1.


      notation:   %   modulus
                  ->   transition to the next value in a sequence


A. Rewritten Conjecture

   The original series is rewritten to produce only values where:  N % 3 = 2
   Two or three iterations are merged into each revised transition.
   Intermediate values that are skipped have:  N % 3 = 0 or 1.
   They account for 2/3 of the values in the original series and skipping
   them simplifies analysis.

   The transition out of an odd number is always even so we can take 2 steps:

      N' = {N * 3 + 1} / 2


   On an even number apply one of these two transitions:

      When N % 4 = 0 then:  N' = N / 4

      When N % 4 = 2 then:  N' = ([N / 2] * 3 + 1) / 2 = {N * 3 + 2} / 4


   The first even case merges two steps and the second takes three steps.
   You can determine the type of the next transition to apply by looking
   at the low two bits of N.

   Before starting the revised series the initial value may not meet the
   condition N % 3 = 2.  In this case apply the original transitions until
   it does.  This only takes a few steps at most.

   After that the revised series can be run.  Eventually it reduces to two;
   which is trivially reduced to one using the original series.

     original:  13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1  terminate

      revised:  13  advance 20    ->    5    ->    8    ->   2  terminate -> 1
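   The revised portion of this run can be reproduced with a short sketch
   (the function names are ours):

```python
def revised_step(n):
    # One revised transition; n is assumed to have n % 3 == 2.
    if n % 2 == 1:
        return (3 * n + 1) // 2
    if n % 4 == 0:
        return n // 4
    return (3 * n + 2) // 4

def revised_series(n):
    # Run the revised series from n down to 2, collecting each value.
    path = [n]
    while n != 2:
        n = revised_step(n)
        path.append(n)
    return path
```

   revised_series(20) yields [20, 5, 8, 2], matching the revised run above.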


   This Windows program prints the revised Collatz series:  collatz.exe

        Run from a command line:    collatz   63_728_127


B. Transition Values Have N % 3 = 2

   We briefly show that the revised transitions consume and produce values
   where N % 3 = 2.

         Odd:  N' = (3 * N + 1) / 2

                N = 3 * j + 2                   Follows from:  N % 3 = 2

                  = 3 * [2 * k + 1] + 2         For N to be odd, j must be odd.

                  = 6 * k + 5

               N' = (3 * [6 * k + 5] + 1) / 2   Substitute    

                  = ([18 * k + 15 ] + 1) / 2

                  = 9 * k + 8

           N' % 3 = (9 * k + 8) % 3 = 2


       N%4=0:  N' = N / 4

                N = 3 * j + 2                   Follows from:  N % 3 = 2

                   = 3 * [4 * k + 2] + 2         For N % 4 = 0, j must be 4k+2.

                  = 12 * k + 8

               N' = [12 * k + 8] / 4            Substitute    

                  = 3 * k + 2

           N' % 3 = (3 * k + 2) % 3 = 2


       N%4=2:  N' = (3 * N + 2) / 4

                N = 3 * j + 2                   Follows from:  N % 3 = 2

                   = 3 * [4 * k] + 2             For N % 4 = 2, j must be 4k.

                  = 12 * k + 2

               N' = (3 * [12 * k + 2] + 2 ) / 4  Substitute    

                  = (36 * k + 8) / 4

                  = 9 * k + 2

           N' % 3 = (9 * k + 2) % 3 = 2


   Values where N % 3 = 0 or 1 occur as intermediate values within the
   new transitions.  The number of possible values between transitions
   is reduced by 2/3.



III. Advance the Initial Value

   An arbitrary starting number may not have N % 3 = 2.  To begin the
   revised sequence we need to first advance from the starting number
   until the condition is met.


       notation:  /\      bitwise and


A. When N % 3 = 0

      DO until N /\ 1:           DO until the number is odd,
         N = N / 2;                 Advance one even transition.
      -

      N = (3 * N + 1) / 2;       Advance an odd then even transition.


          Examples:  12 -> 6 -> 3 -> 5     36 -> 18 -> 9 -> 14


B. When N % 3 = 1

      IF N /\ 1:                 IF N is odd,
         N = (3 * N + 1) / 2;       Advance an odd then even transition.

      ELSE:                      ELSE N is even,
         N = N / 2;                 Advance one even transition.
      .


         Examples:  7 -> 11         10 -> 5
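   Both cases can be folded into one helper; a sketch in Python (the
   function name advance is ours):

```python
def advance(n):
    # Apply original Collatz steps until n % 3 == 2, merging the even
    # step that always follows an odd step.
    while n % 3 != 2:
        if n % 2 == 1:
            n = (3 * n + 1) // 2    # advance an odd then even transition
        else:
            n = n // 2              # advance one even transition
    return n
```

   advance(36) returns 14 and advance(7) returns 11, matching the examples;
   a value already meeting the condition, such as 20, is returned unchanged.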



IV. Transition Chains

   There are often consecutive iterations of the same kind of transition
   in a sequence to form a chain.  If you were to graph a series you'd
   see a tree where the branches consist of these chains.


       notation:  ^      exponentiation


A. By-Four Chain Where N % 4 = 0

   A By-Four chain has inputs that are multiples of four.  The length of the 
   chain is determined by looking at the low order bits of the input value.  
   The number of pairs of zero bits gives you the length of the chain.
   
   The first value of a By-Four chain can be written as:  j * 4^k
   Then the last value is just j after k divisions by 4.
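   Counting the pairs of zero bits in code; a sketch (the function name
   by_four_length is ours):

```python
def by_four_length(n):
    # Chain length k: the number of trailing zero-bit pairs of n.
    k = 0
    while n % 4 == 0:
        n //= 4
        k += 1
    return k
```

   48 is 110000 base 2 with two trailing zero-bit pairs, so
   by_four_length(48) returns 2.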


B. Odd Chain Where N % 2 = 1

   Odd chains consume an odd input and have odd intermediate values.
   The output is an even number.

   The number of consecutive low order 1 bits determines the chain length.
   For example an input of 19 is 10011 base 2 so the subsequent chain has
   two Odd transitions:  19 -> 29 -> 44

   A chain of length k takes an input of the form:  j * 2^k - 1
   The output value is then:  j * 3^k - 1

     N(1) = (3 * N + 1) / 2                         First transition

          = (3 * [j * 2^k - 1] + 1) / 2             Substitute N = j * 2^k - 1

          = ([3 * j * 2^k - 3] + 1) / 2

          = [3 * j * 2^k - 2] / 2

          = 3^1 * j * 2^[k-1] - 1


   N(i+1) = (3 * {3^i * j * 2^[k-i] - 1} + 1) / 2   Subsequent transitions

           = ({3^[i+1] * j * 2^[k-i] - 3} + 1) / 2

           = {3^[i+1] * j * 2^[k-i] - 2} / 2

          = 3^[i+1] * j * 2^[k-i-1] - 1


     N(k) = 3^k * j - 1                             Odd chain output
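   The chain formula can be checked numerically; a sketch (the function
   name odd_chain is ours):

```python
def odd_chain(j, k):
    # Apply k Odd transitions starting from j * 2**k - 1.
    n = j * 2**k - 1
    for _ in range(k):
        n = (3 * n + 1) // 2
    return n
```

   With j = 5 and k = 2 the input is 19 and odd_chain(5, 2) returns
   44 = 5 * 3^2 - 1, matching the 19 -> 29 -> 44 example.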


                           
C. Deuce Chain Where N % 4 = 2

   The two low order bits of the input end with bits 10.  The length of the 
   chain can be determined by counting the pairs of zeros above them.  The 
   number of transitions is that count plus one.  For example:

       258 = 1_00_00_00_10 base 2


   There are 3 pairs of zeros above the trailing 10 so there are 4 transitions
   in the chain and the output is 83.  When there is an odd number of zeros
   the output next connects to a By-Four chain.

       386 = 110_00_00_10 base 2


   There are 2 pairs of zeros so there are 3 Deuce transitions before
   connecting to a By-Four chain starting with 164.

   When the first value of a Deuce chain is written as:  j * 4^k + 2
   The last value is then:  j * 3^k + 2

     N(1) = (3 * N + 2) / 4                         First transition

          = {3 * [j * 4^k + 2] + 2} / 4             Substitute N = j * 4^k + 2

          = {[3 * j * 4^k + 6] + 2} / 4

          = [3 * j * 4^k + 8] / 4

          = 3^1 * j * 4^[k-1] + 2


   N(i+1) = {3 * (3^i * j * 4^[k-i] + 2) + 2} / 4   Subsequent transitions

          = {(3^[i+1] * j * 4^[k-i] + 6) + 2} / 4

          = {(3^[i+1] * j * 4^[k-i] + 8)} / 4

           = 3^[i+1] * j * 4^[k-i-1] + 2


     N(k) = 3^k * j + 2                             Deuce chain output
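   As with Odd chains, this formula can be checked numerically; a sketch
   (the function name deuce_chain is ours):

```python
def deuce_chain(j, k):
    # Apply k Deuce transitions starting from j * 4**k + 2.
    n = j * 4**k + 2
    for _ in range(k):
        n = (3 * n + 2) // 4
    return n
```

   258 = 1 * 4^4 + 2, and deuce_chain(1, 4) returns 83 = 1 * 3^4 + 2,
   matching the example above.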



V. Transition Metrics

   A statistical argument for termination considers that on average values
   along a path decrease and eventually reduce to 1.  The average change
   in each transition is represented using a gain factor.  When the gain
   is under one this means that on average each transition is a reduction.

   This is similar to American Roulette where betting on red each round has
   odds that ensure you will lose an average of 5.26%; making the gain
   0.9474 = (1 - 5.26%).  Even if you have an initial lucky run, a gain
   under unity means eventually you will lose all your money.


A. Gain For Transitions

   The gain of a transition is the ratio, output / input.  The gain for
   individual transitions is its formula divided by the input N.  For large
   N the "+1" and "+2" terms become insignificant.  They are dropped when
   discussing the limits of sequences.

                       Output / Input           Limit

      N is Odd:     [(3 * N + 1) / 2] / N        1.5

      N % 4 = 0:    (N / 4) / N                  .25

      N % 4 = 2:    [(3 * N + 2) / 4] / N        .75


   The gain of a series of transitions is the product of the gains for
   each transition.  A single kind of transition repeated k times has a
   final gain of:  gain ^ k

   The gain of a sequence with different types of transitions depends
   on the number of times each type of transition was applied.  For the
   revised sequences this would be:

        Gt = 1.5 ^ (# of Odd) * .25 ^ (# of By4) * .75 ^ (# of Deuce)


   More useful for analysis is the average gain.  Instead of using the
   number of occurrences, the subtotals are divided by the total number
   of iterations.  In other words we use the probability that each
   kind of transition occurred.

       Ga = 1.5 ^ p(Odd) * .25 ^ p(By4) * .75 ^ p(Deuce)


   For a run length of 100 iterations if 50 were Odd, 20 were By-Four, and
   30 were Deuce then the average gain is:

       Ga = [1.5 ^ .5] * [.25 ^ .2] * [.75 ^ .3] = 0.85144


   In an idealized Collatz sequence 50% would be Odd with Deuce and By-Four 
   transitions each occurring 25% of the time.

       Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593
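   The average gain formula is easy to compute directly; a sketch (the
   function name average_gain is ours):

```python
def average_gain(p_odd, p_by4, p_deuce):
    # Ga = 1.5^p(Odd) * .25^p(By4) * .75^p(Deuce)
    return 1.5 ** p_odd * 0.25 ** p_by4 * 0.75 ** p_deuce
```

   average_gain(.5, .2, .3) gives the 0.85144 above, and
   average_gain(.5, .25, .25) gives the idealized 0.80593.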


   Performing many runs the actual average gains were:

        Range       (3n+1)/2      n/4     (3n+2)/4        Ga

        to 10^6      0.5068     0.2501     0.2431       0.80969
        to 10^9      0.5043     0.2502     0.2456       0.80815
        1B to 2B     0.5039     0.2501     0.2458       0.80800


   The average gain for an individual sequence can vary quite a bit.
   When there are few or no odd chains it is low.

           n          (3n+1)/2     n/4   (3n+2)/4   1.5^pa * .25^pb * .75^pc

      1_267_611_776       0      0.8236   0.1764          0.303485


   As the portion of odd chains increases the average gain increases.

           n          (3n+1)/2     n/4   (3n+2)/4   1.5^pa * .25^pb * .75^pc

      1_005_925_919    0.5687    0.1400   0.2913        0.953995


B. Entropy

   In this context we use Shannon entropy to measure the uniformity of
   the occurrence of each kind of transition.  Since the kind of transition
   is determined by the two low order bits we track the entropy of each.

   The entropy metric ranges from zero to one.  For a sequence of bits the
   entropy is one if zeros and ones occur equally often and zero if the
   bits are all the same.

      p0 = (zero bits) / (total bits)

      p1 =  (one bits) / (total bits) = 1 - p0

        H = -p0 * log2( p0 ) - p1 * log2( p1 )
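   A sketch of this metric in Python (the function name bit_entropy is
   ours):

```python
import math

def bit_entropy(p0):
    # Shannon entropy of a bit stream with zero-bit probability p0.
    h = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0.0:              # treat 0 * log2(0) as 0
            h -= p * math.log2(p)
    return h
```

   bit_entropy(0.5) is 1.0, and bit_entropy(1/3) gives the 0.9183 value
   that recurs throughout this note.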


   To get a sense of how this applies, as we run a single Collatz sequence
   the entropy of each of the low two bits is computed between each chain.
   Entropy goes up as the bits change and goes down when they stay the same.
   This tells us how well the transitions are blended.

   In By-Four chains the type of chain that follows is determined by j % 4.
   Entropy is unchanged as the upper bits remain the same and only low order
   zeros with no entropy are shifted out.

   After Deuce and Odd chains the next transition depends on j and k.
   For these two cases the two low order bits of the output are hashed
   based on their inputs.

                               Odd             Deuce

               Chain In     j * 2^k - 1      j * 4^k + 2

               Chain Out    j * 3^k - 1      j * 3^k + 2


      The 2^k and 4^k terms are all zeros with zero entropy and are
      transformed into 3^k with positive entropy.  This adds entropy in
      the form of randomness to any individual Collatz sequence.


   Over time a very long sequence will move closer to the average and the 
   sequence eventually terminates.  A sequence has to run far askew from
   the averages to avoid convergence.  For example to skew the frequency
   of By-Four and Deuce chains you would need:

      50% Odd         5% By-Four        45% Deuce       Gain=1.00397


   A 5% occurrence is a very significant change; not a slight deviation.
   In this example entropy would be added in 95% of the chains making it
   exceedingly rare and short lived.

   It is well known that we only need to advance until the series reaches
   a number that is less than the starting point.  After that you can use 
   recursion to show it terminates.  This implies that the effects of 
   entropy need to take hold early on.  Analyzing entropy beyond that 
   point would be pointless; it would remain high and would be misleading.

   Using truncated series, with any even starting point the next step goes
   down and we are done.  Also if the low bits of N are 01 (N % 4 = 1)
   we get:

         N = 4 * j + 1  ->  3 * (4 * j + 1) + 1

                         = 12 * j + 4  ->  6 * j + 2  ->  3 * j + 1  <  N
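   This reduction can be spot checked; a sketch (the helper name is ours):

```python
def three_original_steps(n):
    # For odd n with n % 4 == 1: one odd step then two halvings,
    # since 3n + 1 is divisible by 4.
    return (3 * n + 1) // 4

# Every starting value of this form drops below itself in three steps.
assert all(three_original_steps(n) < n for n in range(5, 10001, 4))
```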


   Now we only need to consider the remaining odd starting values for k > 2:

         j * 2^k - 1  ->  j * 3^k - 1     where j is even


C. Observations

   Here we give metrics for the first few orbits with various numbers of
   chains.  There is a warm up period while enough entropy accumulates to
   dominate over the initial state.  Assuming one bit of entropy is added
   per Odd or Deuce chain, the length of the period is 4/3 * log2( n ).
   During this period entropy is biased downward.

   Also, since the goal is to run the sequence until it is lower than the
   starting value, any subsequent values are not critical.  They will also
   have higher entropy giving an upward bias.  Metrics are only taken for
   the region after the warm up period to the point where the sequence goes
   below the starting value.

   The values listed are the initial sequence value, the number of chains, 
   the average entropy, and the minimum value.  In the first section metrics 
   are calculated starting with the fiftieth chain.  Only orbits with over 
   100 chains and the worst case (lowest) entropy are listed.  Orbits with 
   average entropy over .98 are discarded.

               N          Chains    Average     Minimum     

       1_004_910_504_406    101     0.97855     0.96440
       1_006_847_123_698    102     0.97925     0.96764
       1_009_507_654_930    103     0.97812     0.96226
       1_016_869_791_785    104     0.97900     0.96511
       1_010_222_525_375    105     0.97798     0.96309
       1_018_973_403_257    106     0.97471     0.96171
       1_011_458_120_662    107     0.97171     0.95689
       1_019_225_482_855    108     0.97425     0.95522
       1_016_511_374_619    109     0.97520     0.96012
       1_009_097_630_686    110     0.97304     0.96162
       1_012_606_892_238    111     0.97971     0.96898
       1_016_251_181_895    112     0.97794     0.96485
       1_006_442_994_665    113     0.97962     0.96026
       1_009_074_973_979    114     0.97511     0.96435
       1_009_300_305_682    115     0.97969     0.96373
       1_004_039_432_955    121     0.97661     0.96880
       1_017_723_198_430    122     0.97518     0.96323
       1_004_753_675_742    123     0.97927     0.96378
       1_019_270_409_215    124     0.97997     0.96124
       1_012_163_792_146    125     0.97944     0.96564
       1_019_057_497_270    127     0.97831     0.95842
       1_009_934_311_788    142     0.97933     0.96564


   This next chart tracks orbits in the same region with over 200 chains.
   It illustrates how longer orbits have substantially higher entropy.

               N          Chains    Average     Minimum

       1_019_524_626_430    201     0.99810     0.99517
       1_011_050_942_290    202     0.99705     0.99342
       1_012_645_261_039    203     0.99184     0.98394
       1_019_445_805_308    204     0.99306     0.97947
       1_008_195_131_391    205     0.99358     0.98272
       1_015_571_878_206    206     0.99509     0.98566
       1_011_482_948_255    208     0.99471     0.99005
       1_014_934_024_239    209     0.98914     0.96991
       1_019_229_503_550    211     0.99248     0.97494
       1_012_087_192_295    213     0.99760     0.99186
       1_016_311_780_350    217     0.99659     0.99162
       1_005_443_653_055    219     0.99367     0.98477
       1_019_868_848_999    220     0.99796     0.99290
       1_003_500_978_358    231     0.99349     0.98741
       1_015_160_288_158    236     0.99463     0.98407
       1_009_542_611_052    253     0.99174     0.98403


D. Disjoint Branch

   If the graph tree of all orbits were not fully connected there would be 
   a path that never reached the root.  The branch could either be cyclic
   or infinitely long.  Cyclic branches would have repeating values whereas 
   within acyclic branches all values are unique.  To remain unique an 
   acyclic disjoint branch would contain values from a pool of increasingly
   larger values.

   A characteristic of pseudo random number generators is that they can 
   produce a sequence that repeats.  If the Collatz sequence, acting as a 
   generator, were to repeat then the sequence would cycle and the branch 
   would not be infinitely long.  This raises the prospect of a cyclical 
   counterexample with a very long period.

   As with the gambler who does not quit while ahead, an infinite branch 
   would succumb to the pressure of the house odds.  To thwart them Odd 
   transitions would need to be applied over 2/3 of the time.  An infinite 
   disjoint branch would contain more than twice as many odd values as even.

      Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794   Just under breaking even

      Ga = 1.5^.674 * .25^.163 * .75^.163 = 1.00043   Just over breaking even


   To sustain these proportions in a long sequence the entropy of the low
   order bit would need to average below 0.9183.

         H = -1/3 * log2( 1/3 ) + -2/3 * log2( 2/3 ) = 0.9183


   Another way to beat the odds is if By-Four chains were to never occur.
   Even and odd values could potentially appear half the time and the path
   would on average keep increasing.

         Ga = 1.5 ^ .5 * .25 ^ .0 * .75 ^ .5 = 1.06066


   Long paths of any length can be contrived, but eventually will produce a 
   By-Four chain.  Actual paths, such as the one starting at 71, can start 
   off with very few By-Four chains, but eventually terminate.

   A tiny number of By-Four chains could occur and still have a gain over
   unity.  A disjoint branch would have to meet this constraint.  Allowing
   4% By-Four chains (highly skewed from the 25% average) we get:

         Ga = 1.5 ^ .48 * .25 ^ .04 * .75 ^ .48 = 1.00108


   A statistical argument alone is insufficient for a proof.  Physical 
   imperfections aside, the model for roulette assumes that each spin is an 
   independent purely random event.  This differs from the Collatz sequence 
   where a given starting number produces a deterministic pseudo random
   sequence.  

   The remaining issue concerns the distribution of even and odd values over
   an infinite disjoint branch.  We need to show there is enough entropy
   so that there are fewer than twice as many odd values as evens in order
   for the series to converge.

      A starting value can be contrived to create a sequence of any 
      desired fixed length.  The initial value contains a finite amount of 
      information.  As each chain is run strings of ones or zeros with no
      entropy are replaced by a power of three with high entropy.  After
      running each chain the information in the starting value becomes
      randomized until all of the information in the initial state is lost.
      Consequently an infinitely long chain cannot be created.

   The overall increase in entropy for odd chains forms a self-regulating 
   mechanism.  Any skew in the ratio of odd to even values moves to 50:50 
   due to the increase in entropy.  Collatz chains become sufficiently 
   randomized after the first few chains for even the most contrived runs.

   We've observed that in orbits with over 100 chains entropy is sustained
   above 0.975 with minimums over 0.95.  This goes up in longer orbits and
   is over 0.995 with 200 chains.  This is well above the 0.9183 level
   required for eternal growth.  However, to be conclusive we'd need to
   prove that entropy rises in longer chains and has a lower bound.



VI. Conclusion

   A revised sequence combines two or three steps in the original.

      N is Odd:     N' = (3 * N + 1) / 2

      N % 4 = 0:    N' = N / 4

      N % 4 = 2:    N' = (3 * N + 2) / 4


   Values between each step have N % 3 = 2.  The other 2/3 of values are
   subsumed within each transition.  The low two bits of each value
   produced determine the next step.

   In the limit the gain in each of these transitions is 1.5, .25, and .75
   respectively.  The gain of a sequence of transitions is the product of
   the individual gains.  The average gain with a uniform distribution of
   each type of transition is then:

      Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593


   An orbit that never reduced and instead continuously increased would 
   require over twice as many odd transitions as even.

      Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794   Just under breaking even


   Repetitions of the revised steps form chains and combine to give:

      N is Odd:     N  = 2^k * j - 1

                    N' = 3^k * j - 1


      N % 4 = 0:    N  = 4^k * j

                    N' = j


      N % 4 = 2:    N  = 4^k * j + 2

                    N' = 3^k * j + 2


   Starting values of interest are those that do not immediately reduce.

        2 * j * 2^k - 1      where k > 2


   The entropy of a bit stream is:

        H = -p0 * log2( p0 ) - p1 * log2( p1 )


   The entropy required for an orbit that never reduced would be:

       H = -1/3 * log2( 1/3 ) + -2/3 * log2( 2/3 ) = 0.9183


   A disjoint branch needs to on average always increase.  To do so it needs
   to consist primarily of Odd chains.  However Odd chains increase entropy,
   reducing the number of times they occur.  A branch that started with a high
   proportion of Odd chains would eventually converge due to increasing 
   entropy, making it impossible to form an infinite disjoint branch.

   We measure the entropy of each of the low two bits between each chain.
   Higher entropy implies a more uniform distribution of the kinds of each
   transition.  Observed entropy is over .995 in orbits with over 200
   chains; well over the 0.9183 minimum.  Such orbits will eventually reduce
   below their starting value; supporting the Collatz conjecture.

   Entropy is more than a statistical metric.  Here it acts as a force that
   in part determines the next transition to apply.  It randomizes the lower
   two bits of the revised sequence resulting in a more uniform blend of
   transitions.  The source of the entropy is the change in sequence values
   from 2^k to 3^k.

   Part of what makes the Collatz sequence intractable is the randomization
   of the values it produces.  Ironically randomization is the feature that
   moves it towards termination.