Nano-Memory Simulation

                             by Bradley Berg

                            December 14, 2005



                                ABSTRACT

        The wires interconnecting a grid of sub-lithographic memory
        cells are too small to be directly manipulated.  Randomized
        address decoders can be used to access individual memory
        cells.  Either differentiated wires are randomly positioned
        or undifferentiated wires are randomly connected.

        Randomized access requires a mapping memory to translate
        ordered addresses into random addresses.  A paged serial access
        chip architecture is presented to minimize the size of the
        memory map and to compensate for the resultant timing latencies.

        Simulations were performed to determine plausible memory
        configurations for differentiated Core-Shell wires and Random-
        Particle decoders.  With both approaches about 25% of the cells
        were usable assuming a projected 10% failure rate for wires and
        cells.  About 25% of the cells were lost due to failures and
        the remaining half were lost as a result of randomized access.



1.  Introduction

Initial nano-memory offerings will compete against dominant incumbent
technologies, DRAM, SRAM, and flash, as well as several emerging technologies.
Random access nano-memories will have to outperform either DRAM (< 50ns) or
SRAM (< 8ns) at lower cost.  If near term random access memory technologies
(e.g. Ovonic phase change memory [1]) are to displace DRAM they will need to
be even faster, raising the performance bar.  Non-volatile memory has less
stringent performance constraints, but has higher density requirements.
Flash memory can be read quickly (< 60ns), but writes can take milliseconds.

Some proposed nano-memories may be fast enough to compete for use as main
memory, but many will not.  A claim has been made that the nano-memory being
developed by Nantero has an access time of 1/2 nanosecond [2].  Even if the
storage medium itself is fast the total chip level access time may be
substantially slower.  Proposed nano-memory schemes need to cope with high
failure rates and randomized configuration.  These issues are addressed with
mesa-scale (usually CMOS) address translation maps and error correction.

An additional factor that may degrade nano-memory performance is output
coupling delay.  This occurs at the junction of a nano-scale read sensor and a
mesa-scale output line.  It takes time for the small read sensor to transfer
enough charge to register on the large output line, due to the line's higher
capacitance.

Considering these factors it is likely that nano-memories will be better
positioned to compete on their density advantage rather than on speed.  File
systems are stored in paged memory and comprise the bulk of the storage bits in
a computing system.  Currently, file systems are stored on low-cost disk with
access times in the tens of milliseconds.  New high capacity memory chips can
store frequently accessed pages in fast solid-state devices to dramatically
improve file system access speed [3].

This paper considers chip level architectures for paged nano-memory devices.
It is applicable to nano-memory technologies whose performance is not
sufficient to compete with incumbent and emerging random access devices.
Plausible circuitry to access pages of nano-memory is devised using simulation.
The simulations take into account varying degrees of fabrication faults and
are used to compare configuration options.


2. Paged Memory Configuration

Solid-state paged memory can be used to improve file system performance and to
build portable storage devices.  With the advent of low cost non-volatile
solid-state memory, paged memory can be combined with rotating disks to create
high performance file systems.  There are four different uses for solid-state
paged memory in general purpose computer systems.

* Portable drives are already used widely in pen drives and music players.
  Over time it may be desirable for personal data to migrate to pen drives [4].

* Non-volatile memory can be used to cache data for rotating disk drives.
  Microsoft Windows Vista (a.k.a. Longhorn) buffers disk reads in main memory
  while disk writes are buffered in non-volatile memory on a hybrid drive [5].

* Solid state disk can potentially sustain peak transfer rates for a given
  transfer protocol with access times significantly faster than hard drives.
  Transfer rates for SATA 1 are 150MB/second.  The recently introduced SATA
  2.5 specification has a peak rate of 300MB/second.  The next generation
  SATA 3 will double the rate again to 600MB/second [6].

* Even higher transfer rates can be achieved by storing local file systems in
  non-volatile memory placed on the mainboard.  Local storage can use full
  speed DMA channels to transfer data in and out of main memory.  This can
  greatly improve the performance of local desktop and server systems and
  vector data in and out of supercomputer systems.


DeHon [7, 8] proposed that nano-memory chips be hierarchically organized using
a set of crossbar grids of nano-wires.  Within a grid nano-wires are
partitioned into bundles.  Each bundle is individually addressable by
mesa-scale (typically CMOS) wiring.  Throughout this project each grid contains
64 usable bundles per axis.  The actual number will be greater due to flaws,
unaddressable wires, and parity bits.

Bundles are kept small enough that the bulk of address decoding can be done
with reliable mesa-scale circuits.  A practical bundle size is near the ratio
of mesa-scale to nano-scale pitches.  This is expected to be about 10 (e.g.
90nm:9nm).  On this basis the page address size will be set at 3 bits and is
used to select a wire within a bundle using a decoder.



                                  GRID ADDRESSING

               +-------------------------------------------------------+
               |  X Bundle(6)  |  Y Bundle(6)  | X Page(3) | Y Page(3) |
               +-------------------------------------------------------+



                                       -++++++++++++++-++++++++++++++-
                                       -++++++++++++++-++++++++++++++-
                          +-----+      -++++++++++++++-++++++++++++++-
                        ->|     |      -++++++++++++++-++++++++++++++-
                Page(3) ->| Map |      -++++++++++++++-++++++++++++++-
                        ->|     |       |||||||||||||| ||||||||||||||
                          +-----+       |||||||||||||| ||||||||||||||
                           |||||        |||||||||||||| ||||||||||||||
           Decoder Inputs  |||||        |||||||||||||| ||||||||||||||
                           |||||        |||||||||||||| ||||||||||||||
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
              Bundle      -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                           |||||        |||||||||||||| ||||||||||||||
                          -+++++--------+++++++++++++++++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
              Bundle      -+++++--------+++++++++++++++++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                          -+++++--------++++++++++++++-++++++++++++++----
                                        |||||||||||||| ||||||||||||||




The X and Y Bundle addresses are scanned and the X and Y Page addresses select
a particular 4096 bit page out of 64 pages within the grid.  When a page is
accessed, bits are selected over the entire grid.  Correspondingly, the heat at
cross points will be dissipated over a wide area, avoiding hot spots on the
chip.  Memory cells this small are likely to be particularly susceptible to
thermal perturbations.  Heat can more easily change their analog characteristics
or damage the device.  Spreading out access over pages also increases endurance.
Over a long period of time no small group of bits will be repeatedly changed.
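Packing and unpacking the 18 bit grid address can be sketched as follows.  The
field order follows the GRID ADDRESSING layout above (X Bundle in the
high-order bits); the exact bit positions are an assumption.

```python
# Grid address fields: X Bundle(6) | Y Bundle(6) | X Page(3) | Y Page(3)
# packed high-to-low into an 18 bit word (bit order is an assumption).

def pack(x_bundle, y_bundle, x_page, y_page):
    return (x_bundle << 12) | (y_bundle << 6) | (x_page << 3) | y_page

def unpack(addr):
    return ((addr >> 12) & 0x3F,  # X Bundle
            (addr >> 6) & 0x3F,   # Y Bundle
            (addr >> 3) & 0x7,    # X Page
            addr & 0x7)           # Y Page
```

Six bits per axis address the 64 usable bundles, and the two 3 bit page fields
select one of the 64 pages within the grid.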


3. Fault Model

This section details sources of faults within a grid.  However, with respect to
page addressing the decoder logic simply needs to know whether or not a
particular page address can access a fully functional nano-wire.  Faults are
categorized into always on, always off, and intermittent [9, 10].  Hard faults
are found using a discovery process.  Each address in the grid is tested to see
if data can be successfully written.  As invalid addresses are found the
corresponding map entries are marked.
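The discovery pass can be sketched as follows.  write_test and read_back stand
in for hypothetical device operations; the two test patterns are illustrative.

```python
# A sketch of hard-fault discovery: every address in the grid is tested to
# see whether data can be written and read back, and the per-address map
# entry is marked.  write_test / read_back are hypothetical device hooks.

def discover_faults(addresses, write_test, read_back):
    valid = {}
    for addr in addresses:
        ok = True
        for pattern in (0x0, 0xF):          # try both polarities
            write_test(addr, pattern)
            if read_back(addr) != pattern:  # stuck or unreachable cells fail
                ok = False
                break
        valid[addr] = ok                    # invalid addresses marked in map
    return valid
```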

The address map is scanned sequentially and invalid or unused addresses are
skipped.  The validity of neighboring addresses has no effect on a mapped
address, so faults can be treated as independent events.  Consequently a
single composite fault metric is used.

Simplifying the fault model down to a single parameter implies the fundamental
chip architecture does not need to be altered as the capacity or fault rate
changes.  At most a small number of bundles can be added or removed from grids
to accommodate design changes.

Intermittent faults occur while the chip is in use and cannot be detected at
the time of fabrication.  These are managed using error correction and are
discussed in section 4.  Any hard fault has a corresponding intermittent
counterpart so the same categorization of hard faults can also be used to
structure a detailed analysis of intermittent faults.


3.1 Crossbar faults

Crossbar faults occur in an individual memory cell at the point where two
nano-wires cross.

*  Short:  The wires can short causing failures along both intersecting wires.
           Mark both wires as faulty.

*  Open:   Only the single cell will appear to be always 0 or 1.
           Additional logic to remap single cells would increase the circuit size.
           It is better to just mark both wires as faulty.

*  Cell:   The memory cell itself could fail and appear as always 0 or 1.
           As before both cross wires are marked faulty.


3.2 Wire Faults

Due to their delicate nature nano-wires can easily break.  Their close
proximity means that a small fabrication error can allow them to come in
contact with each other.  Note that within a bundle duplicate wires may be
selected by the same address, operating as a single line.  In some cases a
break in one of the wires may be masked by the other.

* Broken:  Cells past the point of breakage will not be accessible.
           Disable the address for the broken wire.

* Touch:   Two or more touching wires probably will have different addresses.
           Once a faulty wire is found the discovery process needs to account
           for other interacting addresses.  All interacting addresses must
           be marked faulty.


3.3 Contact Faults

All wires in a bundle are activated and then their activation is selectively
blocked by enabled decoder input (control) lines.  An input line blocks a wire
when it is activated and passes if deactivated.  A faulty uncontrolled contact
never blocks and conversely a faulty controlled contact always blocks.

There are no hard contact faults possible for the Random-Particle decoder since
the contacts are randomly present or not.  However, intermittent contact faults
might still occur.

* Uncontrolled:  A faulty uncontrolled contact fails to block the wire when it
                 should.  This causes a wire to be activated when it should not
                 be.  If an address activates multiple wires then they will
                 interfere.

                 With a linear decoder the wire will always be selected so the
                 address is discarded.  For a logarithmic decoder, if each
                 wire is accessible through unique addresses despite the
                 uncontrolled contact, it is still usable.  In fact this
                 condition may be undetectable.

* Controlled:    When a contact is always controlled the wire is not selected
                 when it should be.  In this case the address will not have a
                 detectable wire and is indistinguishable from a wire missing
                 from the bundle.
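Under this model, wire selection can be sketched as follows.  The contact
patterns are illustrative, not from any particular decoder.

```python
# A sketch of decoder selection: every wire in the bundle starts activated,
# and each enabled input line blocks the wires where its contact is
# controlled.  contacts[w][i] is True when input line i has a controlled
# contact on wire w.

def selected_wires(contacts, enabled_inputs):
    return [w for w, row in enumerate(contacts)
            if not any(row[i] for i in enabled_inputs)]
```

Flipping a controlled contact to False models the uncontrolled fault above: the
affected wire may then stay selected under addresses that should block it.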


4.  Error Correction

Error correction is required to correct intermittent faults that occur after a
chip is fabricated.  Reed-Solomon codes are effective at correcting errors in
paged access memories.  The page is divided into a sequence of k-bit symbols
and additional parity symbols are appended to the page.  The parity symbols
can be used to correct errors in up to a fixed number of symbols determined
by design parameters.  Any number of bits within a symbol can be corrected.
Consequently, lengthy sequences of errors can be corrected.

Unlike mesa-scale memory grids, failures in a nano-scale grid are likely to
involve complete wires.  Serially reading cells along a faulty nano-wire yields
a contiguous sequence of failed bits.  Reading cells perpendicular to a faulty
nano-wire distributes the failures throughout the page.  In this case many
symbols need correcting as each failed cell occurs in a different symbol.  This
requires many correctable symbols and consequently many parity bits.

A more balanced error pattern can be achieved by scanning half of the grid
along one axis and the other half along the other axis.  The grid can be
divided into quadrants and accessed along a different axis in each quadrant as
the following diagram illustrates.  This cuts the number of distributed faults
in half.


         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ----------------||||||||||||||||
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
         ||||||||||||||||----------------
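The quadrant scheme can be sketched as a scan-order generator.  Which axis each
quadrant scans along follows the diagram; the exact assignment is illustrative,
since only the alternation matters.

```python
# A sketch of the quadrant scan order for an n-by-n grid: each quadrant is
# read along a different axis, so a failed wire on either axis dumps only
# half of its bad cells into one contiguous run.

def quadrant_scan(n):
    h = n // 2
    order = []
    for r in range(h):                      # top-left: row-major
        order += [(r, c) for c in range(h)]
    for c in range(h, n):                   # top-right: column-major
        order += [(r, c) for r in range(h)]
    for c in range(h):                      # bottom-left: column-major
        order += [(r, c) for r in range(h, n)]
    for r in range(h, n):                   # bottom-right: row-major
        order += [(r, c) for c in range(h, n)]
    return order
```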



Dividing each 4096 bit page into three RS(255, 223) code words reduces the
number of parity bits needed even further.  RS(255, 223) is a Reed-Solomon code
with 8 bit symbols of which up to 16 can be corrected per code word.  Each
code word contains up to 223 data symbols and 32 parity symbols.  Together the
three code words use (32 * 8 * 3) 768 parity bits per page.  Alternatively,
using a single code word per page requires a 10 bit symbol and three times as
many correctable symbols, resulting in (32 * 10 * 3) 960 parity bits per page.

As the page is scanned cells are transferred to alternating code words.  Invalid
data due to a nano-wire failure is then evenly distributed over all three
code words.  Consequently the number of failures per code word will not exceed
the upper bound of 16 corrections per code word.

The data capacity of the three RS(255, 223) code words is (223 * 8 * 3) 5352
bits, which is more than is needed.  Rounding up the 4096 bit page size to a
multiple of 3 symbols gives (171 * 8 * 3) 4104 data bits.  The capacity of the
grid is then (4104 + 768) 4872 total bits per page, which can be stored in
(70 * 70) 4900 grid bits.  A 70 by 70 bit grid size leaves enough room to store
an additional byte per code word that can be used as a checksum.  The checksum
can detect when errors exceed the correctable limit.
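The page bit budget above can be checked with a few lines of arithmetic:

```python
# RS(255, 223): 8 bit symbols, 32 parity symbols, up to 16 correctable
# symbols per code word; three code words per 4096 bit page.

parity_bits = 32 * 8 * 3     # 768 parity bits over the three code words
data_bits   = 171 * 8 * 3    # page rounded up to 513 symbols: 4104 data bits
total_bits  = data_bits + parity_bits

assert data_bits >= 4096     # covers the whole page
assert total_bits == 4872
assert total_bits <= 70 * 70 # fits a 70 by 70 grid (4900 cells)

# cells are dealt round-robin to the three code words as the page is
# scanned, so a failed wire's bits spread evenly across them:
codeword_of_cell = lambda i: i % 3
```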

Once a fault is corrected, the corresponding nano-wires are re-mapped to spare
bundles.  This is done very simply by updating the bundle map and rewriting the
page.  At some point there might be no more spare bundles available.  Spare
grids can also be added to the chip so that an exhausted grid can be rewritten
in its entirety to a spare.


5. Nano-Memory Simulation

Simulation is used to determine plausible chip configurations for nano-memories.
In particular Core-Shell and Random-Particle decoders are simulated with
varying defect rates, bundle sizes, and the number of decoder inputs.


5.1  Core-Shell Nano-Wires

Core-Shell decoders can double up the contacts to increase reliability.  For
independent contact failures the probability of a fault is squared.  For a
given fabrication process contact failures may not be totally independent, so
the actual failure rate will be somewhat higher than the square.  Random-
Particle decoders cannot use double contacts to increase their reliability as
the contacts are random and cannot be duplicated.

Different Core-Shell nano-wires are developed through a chemical design
process.  The number of distinct shell types that can practically be made is
bounded by chemistry to a small number.  Consequently the simulated models
seek to use a small, but reasonable number of wire types.

The input address map for a Core-Shell decoder can use one bit for each wire
type.  When a wire type is addressable in a bundle the bit is set.  To
determine if a particular wire type (address) is present the map bits are
shifted and the address is decremented each time a one is encountered.  Another
counter counts the number of shifts.  When the address reaches zero, the next
one bit encountered is a hit and the shift counter holds the input line values.

The number of cycles needed for decoding each bundle can be up to the number of
wire types.  The decoding process has to be faster than the memory access time
to achieve maximum access speed.  For paged access it's assumed that the memory
is slower than DRAM, so there is plenty of time for the computation.  A more
complex equivalent circuit could perform the same computation in a single cycle.


   Example:  7 of 8 addresses usable in a bundle with map settings of:

             A3:  0     1  2     3        4     5  6  7
            Map:  1  0  1  1  0  1  0  0  1  0  1  1
             A4:  0  1  2  3  4  5  6  7  8  9 10 11 Miss


                     A3 = 3 (in)   A4 = 0 (out)
                          2             1
                          2             2
                          1             3
                          0             4
                          0             5  Hit


                     A3 = 7 (in)   A4 = 0 (out)
                          6             1
                          6             2
                          5             3
                          4             4
                          4             5
                          3             6
                          3             7
                          3             8
                          2             9
                          2            10
                          1            11
                          0           Miss


To prevent the upper addresses from always being mapped to a miss, the
addresses need to be cycled.  This can be done using the exclusive or of the
low bits of the bundle index and the input address.  In the example above the
low 3 bits of the bundle index would be exclusive or'ed with A(in).
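The shift/count lookup traced in the example, including the exclusive-or
cycling, can be sketched as follows (function names are illustrative):

```python
# map_bits holds one bit per wire type; a one means a wire of that type was
# found in the bundle.  The shift index a_out counts shifts; the input
# address is decremented at each one bit, and the first one bit reached
# with the address at zero is the hit.

def decode(a_in, map_bits):
    a = a_in
    for a_out, bit in enumerate(map_bits):
        if bit:
            if a == 0:
                return a_out        # hit: a_out drives the input lines
            a -= 1
    return None                     # miss: address not usable

def cycled_decode(a_in, map_bits, bundle_index):
    # XOR the low 3 bits of the bundle index so upper addresses are not
    # always the ones mapped to a miss
    return decode(a_in ^ (bundle_index & 0b111), map_bits)

MAP = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # the example map settings
```

With the example map, decode(3, MAP) returns 5 and decode(7, MAP) misses,
matching the two traces above.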


5.1.2  Single Core-Shell [11,12]

The decoder for the Single Core-Shell modeled here can use either a linear
encoding (8 control lines) or a dual-rail log encoder (6 control lines).  A
preliminary simulation was run to determine reasonable bundle sizes and to
observe the effect of disabling individual input addresses in groups.  Each
simulation was run to determine the number of bundles required along each axis
to produce a working 64 by 64 bundle grid.  Each run used 8 wire types with no
faults, and 1000 grids were generated.  Simulations were not run for the blank
fields, as a few runs made it apparent that 2 bit maps were impractical.


       map bits per bundle -> (bundles per axis, wires, map bits per axis)

bundle    2                       4                 8

  6                        205, 1230,820      124,  744,992
  7                        175, 1225,700      103,  721,824
  8     198, 1584,396      118,  944,472      *88,  704,704
  9                        126, 1134,504      *91,  819,728
 10     160, 1600,320      116, 1160,464      *85,  850,680
 12     145, 1740,290      100, 1200,400       80,  960,640
 14     125, 1750,250       88, 1232,352       77, 1078,616


Note that the more plausible configurations are marked with an asterisk in this
and subsequent tables.

Observations:

*   Using fewer map bits than wire types requires many more wires.
    This is because addressable wires are being discarded.

*   Using fewer wires per bundle than wire types requires more wires
    and more map bits.

*   Using more wires per bundle than wire types uses more wires but
    fewer map bits, but with diminishing returns.  8 to 10 wires
    per bundle seem reasonable.


Simulations incorporating different fault rates were run next.  Ten 70 by 70
grids were generated in each of two runs.  The average number of bundles
required was used to determine the number of wires and map bits needed for each
axis.  The first table contains the raw results and the second multiplies the
number of bundles to yield the number of nano-wires and map bits required.
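The simulator models the full fault taxonomy of section 3.  As a rough
illustration, a much-simplified sketch of one such run is given below; the
per-address usability rule here (a type is usable only when no faulty
duplicate shares it) is a simplifying assumption, not the paper's exact model.

```python
import random

def usable_addresses(wires_per_bundle, wire_types, p, rng):
    # each wire draws a random shell type and fails with probability p
    good, bad = set(), set()
    for _ in range(wires_per_bundle):
        t = rng.randrange(wire_types)
        (bad if rng.random() < p else good).add(t)
    # conservative rule: a faulty duplicate poisons the shared address
    return len(good - bad)

def bundles_needed(target, wires_per_bundle, wire_types, p, rng):
    # accumulate bundles until the axis has enough usable wire addresses,
    # e.g. (70 * 8) 560 for a 70 bundle grid axis
    total = bundles = 0
    while total < target:
        total += usable_addresses(wires_per_bundle, wire_types, p, rng)
        bundles += 1
    return bundles
```

Even at a 0% fault rate duplicate types keep a bundle well short of 8 usable
addresses, which is why the simulated bundle counts exceed 70 per axis.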


            (run1  run2 | truncated average bundles per grid)

  Fault    8 wire bundles           9 wire bundles           10 wire bundles

   0%      103  99 | 101             98  98 | 98               92  92 |  92
   1%      117 117 | 117            110 110 | 110             103 104 | 103
   5%      125 125 | 125            121 120 | 120             114 112 | 113
  10%      137 137 | 137            129 127 | 128             126 123 | 124
  20%      162 166 | 164            158 157 | 157             151 150 | 150
  30%      201 200 | 200            196 196 | 196             191 191 | 191



                      (average bundles, wires, map bits)

  Fault    8 wire bundles           9 wire bundles           10 wire bundles

   0%      101, 808, 808           * 98, 882, 784              92, 920, 736
   1%      117, 936, 936           *110, 990, 880             103,1030, 824
   5%      125,1000,1000           *120,1080, 960             113,1130, 904
  10%      137,1096,1096           *128,1152,1024             124,1240, 992
  20%      164,1312,1312           *157,1413,1256             150,1500,1200
  30%      200,1600,1600           *196,1764,1568             191,1910,1528


Observations:

* Using nine wires per bundle provides a good trade-off between map size
  and wire utilization.

* For fault rates of 10% and under wire utilization is dominated by duplicate
  wire addresses and not faulty wires.  Note that the ideal utilization is
  (70 * 8) 560 wires.  The 0% fault case shows the utilization due solely to
  duplicate wires.



5.1.3  Double Core-Shell

Coating nano-wires with two shells reduces the number of different material
types and etching steps.  Four material types can be used to fabricate up to 12
different wire types.  The table below shows each etching step (columns) for
each of the 12 possible combinations of wire coatings.


          In 5 etchings there are 9 wire types.

               k1  k2
               k1      k3
               k1          k4
                   k2  k3
                   k2      k4
                       k3  k4
                   k2          k1
                       k3      k1
                           k4  k1


          For 11 wire types add a 6th etch:

                        k3            k2
                            k4        k2


          For 12 wire types add a 7th etch:

                            k4             k3



As before simulations for ten 70 by 70 grids were run for 9, 11, and 12 wire
types.  This shows the effect of using 5, 6, or 7 etches respectively.



               9 wire types with 9 bit shift/count map

                 (run1  run2 | truncated average bundles per grid)

  Fault    9 wire bundles          10 wire bundles           11 wire bundles

   0%      104 102 | 103             98  96 |  97              92  93 |  92
   1%      105 104 | 104            101 101 | 101              95  95 |  95
   5%      112 113 | 112            106 106 | 106             102 103 | 102
  10%      123 126 | 124            121 121 | 121             112 113 | 112
  20%      146 147 | 146            143 141 | 142             139 139 | 139
  30%      181 180 | 180            181 181 | 181             176 176 | 176


                  (average bundles, wires, map bits)

  Fault    9 wire bundles          10 wire bundles           11 wire bundles

   0%      103, 927, 927           * 97, 970, 873              92,1012, 828
   1%      104, 936, 936            101,1010, 909            * 95,1045, 855
   5%      112,1008,1008            106,1060, 954            *102,1122, 918
  10%      124,1116,1116            121,1210,1089            *112,1232,1008
  20%      146,1314,1314           *142,1420,1278             139,1529,1251
  30%     *180,1620,1620            181,1810,1629             176,1936,1584



                    11 wire types with 11 bit shift/count map

                 (run1  run2 | truncated average bundles per grid)

  Fault   11 wire bundles          12 wire bundles           13 wire bundles

   0%       86  86 |  86             83  81 |  82              78  79 | 78
   1%       88  86 |  87             85  84 |  84              82  82 | 82
   5%       93  93 |  93             90  90 |  90              86  88 | 87
  10%      100 103 | 101            100  98 |  99              95  95 | 95
  20%      124 123 | 123            119 119 | 119             117 117 | 117
  30%      149 149 | 149            147 145 | 146             142 143 | 142


                  (average bundles, wires, map bits)

  Fault   11 wire bundles          12 wire bundles           13 wire bundles

   0%       86, 946, 946             82, 984, 902              78,1014, 858
   1%       87, 957, 957             84,1008, 924              82,1066, 902
   5%       93,1023,1023             90,1080, 990              87,1131, 957
  10%      101,1111,1111             99,1188,1089              95,1235,1045
  20%      123,1353,1353            119,1428,1309             117,1521,1287
  30%      149,1639,1639            146,1752,1606             142,1846,1562



                 12 wire types with 12 bit shift/count map

                 (run1  run2 | truncated average bundles per grid)

  Fault   12 wire bundles          13 wire bundles           14 wire bundles

   0%       86  80 |  83             82  81 |  81              77  77 |  77
   1%       81  81 |  81             79  79 |  79              77  77 |  77
   5%       86  88 |  87             84  84 |  84              83  83 |  83
  10%       95  95 |  95             93  90 |  91              88  89 |  88
  20%      109 115 | 112            108 109 | 108             106 108 | 107
  30%      138 135 | 136            133 133 | 133             132 132 | 132


                  (average bundles, wires, map bits)

  Fault   12 wire bundles          13 wire bundles           14 wire bundles

   0%       83, 996, 996             81,1053, 972              77,1078, 924
   1%       81, 972, 972             79,1027, 948              77,1078, 924
   5%       87,1044,1044             84,1092,1008              83,1162, 996
  10%       95,1140,1140             91,1183,1092              88,1232,1056
  20%      112,1344,1344            108,1404,1296             107,1498,1284
  30%      136,1632,1632            133,1729,1596             132,1848,1584


Observations:

* Typical wire utilization and map size were close for 8 to 12 wire types.
  It is probably not worth the additional cost to perform the sixth or
  seventh etch.



5.2 Random Particle

Williams and Kuekes describe a scheme for building a decoder based on the
random deposition of gold particles [13].  Control lines with random controlled
and uncontrolled contacts are produced.  The deposition process is tuned such
that there is an even distribution of controlled and uncontrolled contacts.

The number of input lines needs to be larger than in the decoder for Core-Shell
in order to uniquely address nano-wires.  Consequently the dense mapping scheme
used for Core-Shell decoders cannot be used.  Instead each 3 bit page address
needs to be mapped to the setting for each input line.  For each bundle 8 input
lines use an (8 * 8) 64 bit map, 10 use (8 * 10) 80 bits, and a 12 line decoder
uses (8 * 12) 96 bits.
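The map sizes above follow directly from storing one input-line setting per
page address, as a trivial check shows:

```python
# each of the 8 page addresses in a bundle stores one bit per decoder
# input line, so the map grows linearly with the number of input lines
def map_bits_per_bundle(input_lines, page_addresses=8):
    return page_addresses * input_lines
```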

A preliminary simulation was run to determine a reasonable number of input
lines and bundle size.  The six most promising configurations were selected
for further analysis.  The simulation results for the six chosen input and
bundle size combinations are shown in the second set of tables.



                  (average bundles, wires, map bits)

 Inputs   bundle 8     bundle  10     bundle  12     bundle 14      bundle 16

   8   120,960,7680  106,1060,6784  *98,1176,6272   93,1302,5952   93,1488,5952
  10    95,760,7600  *81, 810,6480  *78, 936,6240  *77,1078,6160   74,1184,5920
  12    83,664,7968   75, 750,7200   70, 840,6720  *68, 952,6528  *67,1072,6432




        (input lines, wires per bundle) -> (average bundles, wires, map bits)


       Fault      8, 12                                          64 bit map

        0%     106,1272, 6784
        1%     112,1344, 7168
        5%     117,1404, 7488
       10%     124,1488, 7936
       20%     138,1656, 8832
       30%     158,1896,10112


       Fault     10, 10           10, 12          10, 14         80 bit map

        0%      89, 890, 7120   *83, 996,6640    82,1148,6560
        1%      94, 940, 7520   *86,1032,6880    82,1148,6560
        5%      98, 980, 7840   *88,1056,7040    86,1204,6880
       10%     103,1030, 8240    93,1116,7440    87,1218,6960
       20%     114,1140, 9120   103,1236,8240    96,1344,7680
       30%     132,1320,10560   119,1428,9520   109,1526,8720


       Fault     12, 14           12, 16                         96 bit map

        0%      74,1036,7104     74,1184,7104
        1%      75,1050,7200     75,1200,7200
        5%      75,1050,7200     75,1200,7200
       10%     *77,1078,7392     76,1216,7296
       20%      83,1162,7968    *80,1280,7680
       30%      91,1274,8736    *85,1360,8160


Observations:

    * The number of nano-wires used is close to that for Core-Shell wires.

    * The number of map bits required shows an increase of about 6 or 7 times
      compared to Core-Shell wires.


6. Conclusions

Using a paged address scheme for nano-memories relaxes several design
parameters, lowering technical risk.  This is particularly relevant for first
generation devices.  Access is distributed over many bits, ensuring there are
no hot spots at the chip or memory cell level.  Dispersed access also means
longer endurance (number of rewrites) over the life of the chip.

Support logic for paged access is simpler than the logic for random access.
This is particularly true of nano-memory, which has to cope with randomized
wire placement and high fault rates.  Fault management can also result in
irregular timing, which is undesirable in random access memory.  The buffers
used in paged access memory eliminate these irregularities.  Overall access
speed requirements for paged access are also less stringent than for random
access.

The following recommendations are based on the simulation runs:

   Single Core-Shell

      * Use 8 wire types in 9 wire bundles.

      * Use either an 8 bit linear decoder or a 3 bit dual rail log decoder.

      * Double up the decoder input lines to reduce contact faults.


   Double Core-Shell

      * With a fault rate of 10%, use 9 wire types in 11 wire bundles.

      * Use a 4 bit dual rail log decoder.

      * Double up the decoder input lines to reduce contact faults.

      * Compress the input map using the shift-and-add mapping method.


   Random-Particle

      * Use a 10 to 12 bit log decoder.

      * Use 12 to 16 wires per bundle.

      * As the map size is the same for paged and random access,
        Random-Particle wires are suitable for either access mode.
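The decoder choices above can be sanity-checked by counting mesa-scale address
lines.  This sketch assumes the usual conventions: a linear decoder uses one
line per wire type, while a dual rail log decoder uses two lines (true and
complement) per address bit, with ceil(log2(types)) bits needed:

```python
import math

# Mesa-scale address lines needed to select among `types` wire types.
def linear_lines(types):
    return types                          # one line per wire type

def dual_rail_log_lines(types):
    return 2 * math.ceil(math.log2(types))  # two lines per address bit

# Single Core-Shell: 8 types -> 8 linear lines or 6 dual rail lines.
assert linear_lines(8) == 8
assert dual_rail_log_lines(8) == 6
# Double Core-Shell: 9 types -> a 4 bit dual rail decoder (8 lines).
assert dual_rail_log_lines(9) == 8
```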



All three fabrication methods are limited by the small number of wire types,
which leads to many duplicate nano-wires.  At a 10% fault rate, duplicate
wires waste about twice as many wires as faulty ones.  The major limitation of
the Core-Shell process is the cost of making additional wire types; the
limiting factor for Random-Particle is the large input map size.  With a
typical 10:1 mesa-scale to nano-scale pitch ratio, the potential density
increase for nano-memories is 22 to 28 times that of conventional CMOS memory.
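The density estimate follows from squaring the pitch ratio and discounting
unusable cells.  A quick arithmetic check, reading the 22 to 28 times range as
corresponding to a usable-cell fraction between 22% and 28%:

```python
# The 10:1 pitch advantage applies in both dimensions (100x in area),
# but only a fraction of cells remain usable after faults and
# randomized access discard the rest.
def density_gain(pitch_ratio, usable_fraction):
    return pitch_ratio ** 2 * usable_fraction

# ~25% usable cells (per the simulations) gives ~25x density;
# 22% to 28% usable spans the quoted 22x to 28x range.
assert round(density_gain(10, 0.25)) == 25
assert round(density_gain(10, 0.22)) == 22
assert round(density_gain(10, 0.28)) == 28
```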


                         ACKNOWLEDGEMENT

Thanks to Eric Rachlin for his assistance with the discovery algorithm for the
Random-Particle decoder.






                           BIBLIOGRAPHY

[1]   Stefan Lai (Intel) and Tyler Lowrey (Ovonyx).  OUM - A 180 nm Nonvolatile
      Memory Cell Element Technology For Stand Alone and Embedded Applications.
      ftp://download.intel.com/technology/silicon/OUM_doc.pdf

[2]   On the Tube.  The Economist.  May 8th 2003.
      http://www.economist.com/science/displaystory.cfm?story_id=1763552

[3]   Bradley A. Berg.  New Computers Based on Non-Volatile Random Access
      Memory.  July 18, 2003.
      http://www.techneon.com/paper/nvram.html

[4]   Bradley A. Berg.  Securing Personal Portable Storage.  May 12, 2005.
      http://www.techneon.com/paper/pen.html

[5]   Jack Creasey (Microsoft).  Hybrid Hard Drives with Non-Volatile Flash
      and Longhorn.
      http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWST05002_WinHEC05.ppt#309,10,Technical Assumptions for Hybrid Disk

[6]   Michael Alexenko (Maxtor).  ATA for the Enterprise: The Present and
      Future State of ATA.  February 21, 2001.
      http://www.sata-io.org/docs/srvrio0201b.pdf

[7]   Andre DeHon.  Array-Based Architecture for FET-Based, Nanoscale
      Electronics.  IEEE Transactions on Nanotechnology, Vol. 2, No. 1,
      pp. 23-32, March 2003.

[8]   Andre DeHon, Patrick Lincoln, and John Savage.  Stochastic Assembly of
      Sublithographic Nanoscale Interfaces.  IEEE Transactions on
      Nanotechnology, Vol. 2, No. 3, pp. 165-174, 2003.

[9]   Myung-Hyun Lee, Young Kwan Kim, and Yoon-Hwa Choi.  A Defect-Tolerant
      Memory Architecture for Molecular Electronics.  IEEE Transactions on
      Nanotechnology, Vol. 3, No. 1, March 2004.

[10]  Philip J. Kuekes, Warren Robinett, Gadiel Seroussi and R. Stanley
      Williams.  Defect-tolerant Interconnect to Nanoelectronic Circuits:
      Internally Redundant Demultiplexers Based on Error-correcting Codes.
      Quantum Science Research, Hewlett-Packard Labs, 1501 Page Mill Road,
      Palo Alto, CA.

[11]  Lincoln J. Lauhon, Mark S. Gudiksen, Deli Wang, and Charles M. Lieber.
      Epitaxial Core-shell and Core-multishell Nanowire Heterostructures.
      Nature, Vol. 420, pp. 57-61 (2002).

[12]  Dongmok Whang, Song Jin, and Charles M. Lieber.  Nanolithography Using
      Hierarchically Assembled Nanowire Masks.  Nano Letters, Vol. 3, No. 7,
      pp. 951-954 (2003).

[13]  Stanley Williams and Philip Kuekes.  Demultiplexer for a Molecular Wire
      Crossbar Network.  United States Patent Number: 6,256,767, July 3, 2001.
      http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6,256,767.WKU.&OS=PN/6,256,767&RS=PN/6,256,767