2D-RS Storage Systems

www.tinyurl.com/2drsstoragesystems


Contents

1. Introduction

2. Brief History of ECC in NAND Flash

3. Issues Involved in Continuing to Increase t in Binary BCH Codes

4. N-Page Blocks Divided into Segments

5. 2D Encoding Concept

6. Hardware Encoding and Decoding of Segments

7. Correcting Erasures Plus Errors

8. Repeated Decodings

9. Interface to the NAND Flash Memory

10. High-Level Block Diagram of Data Collection and Downloading System

11. NAND Flash Memory Configurations With Data Path Width Converters

12. 2D-RS Encoding for First Sample Configuration

13. 2D-RS Decoding for First Sample Configuration

14. Percentage Redundancy for First Sample Configuration

15. NAND Flash Memory Configurations Without Data Path Width Converters

17. 2D-RS Decoding for Second Sample Configuration

18. Percentage Redundancy for Second Sample Configuration

Links


1. Introduction

This document describes how to apply ECC Tek’s two-dimensional (2D) RS error-correction (ECC) system to create fault-tolerant NAND Flash memories for spacecraft. The same ideas can be used in commercial SSDs.

Many commercial companies are developing NAND Flash storage systems.

NAND Flash solid state drives (SSDs) are attractive replacements for HDDs because they are faster, take less power, and are more shock resistant than HDDs. Unlike HDDs, SSDs have no moving parts and are silent.

However, there are serious problems associated with using NAND Flash devices. NAND Flash devices were originally designed as electrically erasable programmable read only memories (EEPROMs) and were not originally designed for random access memories. NAND Flash devices cannot be reprogrammed/rewritten an unlimited number of times as HDDs can because they degrade with use. Also, blocks of storage cells in NAND Flash devices must be erased before sub blocks (Pages) can be programmed/written.

The wear-out and failure characteristics of NAND Flash devices require the use of powerful error-correction systems to allow NAND Flash memories to be reliable over long periods of time.

The use of NAND Flash memories in spacecraft will also require the ECC system to tolerate entire chip failures.

This document presents a general methodology for implementing highly reliable and fault-tolerant NAND Flash memories. In order to quickly and effectively convey the concepts involved, two sample configurations are described. The first configuration requires data path width converters and the second one does not. The two configurations show designers how to create other configurations with the same characteristics as the two samples. The first configuration is optimized to reduce gate count. The second configuration is optimized to make it as easy as possible to match the data path width of the NAND Flash memory to the widths of other system data paths. Readers should keep in mind that other cases can be easily created with different Reed-Solomon (RS) symbol sizes and different levels of correction. The methodology is not limited to the two described sample cases.

ECC Tek strongly believes there is very little risk in implementing the methodology described in this document because ECC Tek has more than 30 years experience in designing ECC circuits, the idea of correcting erasures proposed in this document has been implemented in a slightly different form in multi-track tape drives to tolerate the failure of multiple tracks, and the idea of using error-correction software as a recovery procedure in the event that hardware correction fails was used in the first Reed-Solomon HDD implementations at what is now Seagate.

With a 2D-RS scheme as described in this document the designer of a NAND Flash memory has the freedom of arbitrarily increasing the level of protection and also the number of failed chips the system can tolerate. No other existing ECC system gives designers that freedom.

2. Brief History of ECC in NAND Flash

Shortly after NAND Flash devices were first manufactured, an ECC system was recommended by Samsung that corrected t=1 bit in each Page. Pages were then ~512 bytes. The t=1 ECC recommended by Samsung is very similar to a Hamming code which is a binary BCH code that can correct t=1 bit.

ECC Tek has designed and licensed numerous ECC designs for NAND Flash – a number of them are currently in mass production. The original ECC design ECC Tek licensed for NAND Flash was a Reed Solomon (RS) system that operated on 10-bit symbols and corrected t=5 10-bit symbol errors. Two of the bits in each symbol were forced to 0 in the data field so that the data field actually consisted of 8-bit bytes. ECC Tek’s first RS system for NAND Flash was implemented in 2005 and went into mass production shortly thereafter.

In 2007, ECC Tek developed and licensed its first programmable binary BCH encoder and decoder designs for use with NAND Flash devices which could correct up to t=18 bits. Shortly thereafter, ECC Tek developed and licensed a programmable binary BCH ECC system that could correct up to t=30 bits. The latest design ECC Tek has developed for MLC NAND Flash corrects up to t=72 bits in 512 and 1024-byte Pages.

3. Issues Involved in Continuing to Increase t in Binary BCH Codes

When the number of bits correctable, t, and/or the data field length, K, increases, the complexity of binary BCH encoders and decoders increase exponentially which limits the degree to which t and K can be increased. Correcting t=72 bits “on-the-fly” in 512 and 1024-byte data field lengths requires around 275K gates in an ASIC.

When implementing binary BCH encoders and decoders, the codeword contains binary symbols “bits” and a Galois or Finite Field must be used to uniquely identify, tag, address or locate each codeword symbol/bit. For NAND Flash Page sizes larger than 1024 bytes but less than 2048 bytes, a locator field with m=14-bit elements must be used. Fourteen-bit finite field elements are relatively large.

The number of redundant bits required by a t-bit error-correcting binary BCH code is r < mt where m is the width of the locator field elements. Usually the number of redundant bits required is r = mt.

When implementing RS codes, the codeword contains nonbinary symbols and smaller locator fields can be used than what are required for binary BCH codes. For example a RS locator field with 10-bit elements was used in ECC Tek’s first RS design for Flash, and a locator field with 13-bit elements was used in ECC Tek’s first binary BCH design. Both designs were for 512-byte data field lengths.

Apparently as time passes and feature sizes decrease the reliability of NAND Flash devices is decreasing because ECC Tek continues to receive requests to correct more and more bits – up to t=72 bits per Page. Seven years earlier only 1 bit per Page was corrected. This trend will most likely continue especially with multi-level cell (MLC) NAND Flash devices.

Designing binary BCH encoders and decoders with very large t is difficult because the amount of time it takes to compute the error locator polynomial, L(x), from the Syndrome, S(x), and the complexity (gate count) of the decoder increases rapidly as t is increased.

The above-mentioned ECC schemes do not include any provision to correct errors caused by entire chip failures. In order to achieve fault tolerance, some type of 2D ECC scheme must be used.

4. N-Page Blocks Divided into Segments

Data will be written to N NAND Flash chips in N-Page Blocks which are divided into logical Segments as illustrated in Figure 1. Think of each column in Figure 1 as containing a single column of “bits”. Assume one Page contains approximately 4K bytes.

Figure 1 N Pages Divided into Segments

Each Segment will contain N vertical codewords.

For binary BCH codes currently being implemented, each vertical codeword is approximately 512 or 1024 bytes with 4 or 8 Segments per Page.

ECC Tek is proposing the use of RS vertical codewords for the 2D ECC system instead of binary BCH codewords and much smaller Segments than what are currently being used. The maximum height (in number of symbols) of one vertical codeword in a Segment is determined by the size of the RS column symbol being used.

5. 2D Encoding Concept

Think of the data in one Segment as a matrix or two-dimensional array of data items. Think of the columns as vertical RS codewords and the rows that contain data as horizontal RS codewords.

Figure 2 illustrates how redundant data items are appended onto freely chosen data items in a 2D ECC encoding scheme. Most likely 2D encoding schemes will be widely implemented in the near future. In Figure 2, the rows are encoded first and then the columns.

Figure 2 Encoded Data for 2D-RS ECC Schemes

6. Hardware Encoding and Decoding of Segments

A high-level block diagram of a NAND Flash storage device that encodes and decodes Segments in hardware using a parallel Reed-Solomon (PRS) row encoder and decoder and multiple RS column encoders and decoders is illustrated in Figure 3.

Figure 3 Encoding and Decoding of 2D Arrays in Hardware

When writing, rows are encoded by the PRS row encoder, columns are encoded by multiple RS column encoders, and a 2D encoded Segment, as illustrated in Figure 2, is written to N NAND Flash chips.

When reading, columns read from the N NAND Flash chips are decoded by multiple RS column decoders and rows are decoded by a PRS row decoder. The PRS row decoder will pass through the vertical redundancy unaltered if any of the row decodings for one Segment fails. If none of the row decodings for one Segment fails, the 2D decoder has determined that either no errors occurred or that all of the errors have been properly corrected.

Since multiple column encoders and decoders are required, the complexity of one column encoder and decoder pair must be reasonable in order for the 2D scheme to be practical – especially when the 2D ECC scheme is implemented in FPGAs as is the standard practice for spacecraft. Since we are considering relatively small symbol sizes, there will necessarily be multiple Segments per Page. For example, if we consider 4096-byte Pages and 6-bit column symbols, there will be more than 92 Segments per Page.

7. Correcting Erasures Plus Errors

Both the RS column and row encoders and decoders implemented in hardware can be designed to correct erasures plus errors. An erasure means a codeword location where the codeword symbol in that location has a high probability of being in error, but is not necessarily in error.

The PRS row decoder should always be designed to correct errors plus erasures, but the RS column decoders would probably be designed to correct errors only. When a column decoding fails, the decode fail signal from the column decoder indicates that all of the row symbols in that vertical codeword are erased. The PRS row decoder can correct t errors and s erasures as long as 2t + s < R where R is the number of redundant symbols per row codeword.

8. Repeated Decodings

Both the row and column decoders can output entire corrected codewords (both corrected data and corrected redundancy) so both column and row decoding on each Segment can be repeated an unlimited number of times if desired or necessary.

If none of the row decodings on a Segment fail, then it is assumed that the data in that segment either has no errors or that all of the errors have been correctly corrected. If any of the row decodings fail, the PRS row decoder will pass the corrected column redundancy through the PRS row decoder without altering it so that column and row decodings can be repeated if desired.

It is possible to duplicate the column and row hardware decoders multiple times or to feed the data back to the input by using more FIFOs and muxes, but that would probably be overkill since one column decoding followed by one row decoding will most likely correct all of the errors almost all of the time.

Most likely the best way to implement multiple decodings would be to do repeated decodings in software only when needed. Software correction time is probably not critical because repeated decodings would rarely be done.  That way, row and column decodings can be repeated any number of times and no additional hardware is required so it keeps the complexity of the ECC hardware to a minimum.

When an error pattern occurs that is too severe for one column and row decoding to correct it (which may never happen or may happen only once a year), then some means would need to be provided for the software to access and redecoded the codewords resulting from one correction and repeat the correction any number of times in an attempt to recover the data. Since ECC Tek licenses both C and Verilog code, the C code can be used for that purpose.

9. Interface to the NAND Flash Memory

N is the number of NAND Flash chips that are written and read simultaneously. In the first sample configuration N=15, the row symbol width is 4 and the number of redundant symbols is 4 so the input and output data path widths for the NAND Flash memory can be up to 11 symbols = 44 bits.

What if we want to use a 44 bit wide NAND Flash memory in a 64 bit system?

To do that, we need a 64 bit to 44 bit converter at the input of the NAND Flash memory and a 44 bit to 64 bit converter at the output.

For the input converter in this case we need to find the smallest integers x and y such that 64x = 44y. Note that x/y=44/64=22/32=11/16.  Since 11 and 16 have no common factors, we need 11, 64-bit registers to convert a 64-bit data path to a 44-bit data path as illustrated in Figure 4.

The circuit shown in Figure 4 can be thought of as a FIFO with different input and output widths. Control logic similar to a standard FIFO’s control logic can be developed.

There are several ways to handle the clock with a 64 bit to 44 bit input converter. One way is to pause five clock cycles for every 11 clock cycles while writing to the NAND Flash memory. Another way is to implement a state machine which will monitor the circuit and issue a pause signal to the input as needed. Another way is to use a higher frequency clock for the NAND Flash memory than what is used for the input logic.

Figure 4 64-bit to 44-bit Data Path Width Converter

Other converters are simpler such as the 64 bit to 40 bit input converter shown in Figure 5.

Figure 5 64-bit to 40-bit Data Path Width Converter

Converters required at the output follow the same pattern as those shown and are easy to design.

10. High-Level Block Diagram of Data Collection and Downloading System

We will describe how to implement highly reliable, fault-tolerant NAND Flash memories to be used in spacecraft data collection and downloading systems as illustrated in Figure 6.

Figure 6 Spacecraft Data Collection and Downloading System

Assume that the Data Collector can be paused between words written to the Main Data FIFO (MDFIFO).

The MDFIFO can be implemented using a dual-port RAM. Assume the read and write address registers are initially reset to 0. When a word is written to the MDFIFO, the write address is incremented by 1. When a word is read from the MDFIFO, the read address is incremented by 1. Whenever the write and read addresses equal the end of the RAM, they are reset to 0.

Assume that we have a “full_level” variable that is incremented by 1 every time a word is written to the MDFIFO and decremented by 1 every time a word is read from the MDFIFO. If a word is written and read in the same clock cycle, the full_level is not changed. The full_level variable indicates the number of words that are currently in the MDFIFO. In order for the Transmitter to read the MDFIFO, the full_level must be > 1.

If the full_level is close to the size of the RAM, the pause signal to the Data Collector will be asserted indicating that inputs to the MDFIFO must be paused to avoid an overflow condition.

The above-described method of implementing a synchronous FIFO is the method that ECC Tek uses in all of its ECC designs for the Data FIFO (DFIFO) in its decoders.

The remainder of this document describes how the MDFIFO can be designed using NAND Flash memory chips so that it will be highly reliable over time and contain an arbitrary level of fault-tolerance.

11. NAND Flash Memory Configurations With Data Path Width Converters

Data path width converters may sometimes be required if designers wish to minimize the complexity (gate count) of the ECC circuits. Since most spacecraft electronics are implemented in FPGAs, it is often important and necessary to reduce gate counts so the circuits will fit into a small number of FPGAs.

We will use 6-bit column symbols and 4-bit row symbols for the first sample configuration with 4 redundant row symbols per horizontal codeword and 6 redundant column symbols per vertical codeword. For that case, up to 44-bit data words can be input in parallel.

With 6-bit column symbols, the maximum height of a vertical RS codeword is 63 6-bit symbols = 378 bits. Therefore, the height of each Segment must be < 63, 6-bit symbols. For the first sample configuration, the height of one Segment was chosen to be 60, 6-bit symbols because 60 symbols * 6 bits/symbol = 360 bits which is a multiple of 4, 6 and 8 so that one column of a Segment contains an equal number of 4-bit, 6-bit and 8-bit quantities/symbols.

With NAND Flash Page sizes of ~4096 bytes, there will ~92 Segments per Page.

A 2D encoded array with 4-bit row symbols outlined in red and 6-bit column symbols outlined in blue is illustrated in Figure 7. All of the bits from one column are written to one NAND Flash chip. It doesn’t matter whether the NAND Flash chip inputs and outputs 8-bits at a time or 16-bits at a time or more. The only thing that matters is that all of the bits from one column are written to one NAND Flash chip so that if one NAND Flash chip fails, only one column of bits will be affected.

Figure 7 Encoded 2D-RS Array for First Sample Configuration

The above scheme allows up to 15 4-bit symbols per horizontal codeword and up to 63 6-bit symbols per vertical codeword.

Most of the time, the column decoders will be able to correctly recover the data without the aid of the row decoder. However, if a NAND Flash chip fails, then many column decodings in one column will fail indicating that all of the symbols in those codewords should be considered erasures by the row decoder.

With 4 redundant row symbols, up to t errors and s NAND Flash chip failures can be corrected in each row as long as 2t + s < 4 which is probably more than sufficient. In other words, with no soft errors, up to 4 NAND Flash chips can fail with no loss of data.

With 6 redundant 6-bit column symbols, up to 3 soft errors in 60 can be corrected in each vertical codeword. Some work should be done to determine how many vertical symbols need to be corrected for a specific type of NAND Flash device, but it’s OK for the column decoding to fail every once in a while because the row decoder can correct for multiple column decoding failures.

12. 2D-RS Encoding for First Sample Configuration

A block diagram of the 2D-RS encoder hardware for the first sample configuration is shown in Figure 8.

Figure 8 2D-RS Encoding for First Sample Configuration

Data can be continuously input to the 2D-RS encoder as long as the pause input signal is not asserted. Whenever the encoder reaches the end of a data field, it will request the input to pause while it outputs the redundancy for the current N vertical codewords. Once the pause signal is deasserted, the input can continue to input data words.

All of the pieces of logic shown in Figure 8 have been previously designed and licensed by ECC Tek.

13. 2D-RS Decoding for First Sample Configuration

A block diagram of the 2D-RS decoder for the first sample configuration is shown in Figure 9.

Figure 9 2D-RS Decoding for First Sample Configuration

The FIFOs are needed to store corrected vertical codewords from one Segment because a column decoder cannot know if decoding has failed until the entire codeword has been outputted, but the PRS Row Decoder needs to receive an erasure indication at the same time it receives an input word. The Fail signal from the column decoder is the erasure indication. A column decoding failure means the entire vertical corrected codeword stored in the decoder’s output FIFO has been “erased”.

For the first sample configuration, the Column Decoder FIFO only needs to store 60 6-bit symbols.

14. Percentage Redundancy for First Sample Configuration

Figure 10 shows the first sample configuration has 34% redundancy. The percentage redundancy is a function of the level of protection provided. The more protection provided, the higher the percentage redundancy.

Figure 10 Percentage Redundancy for First Sample Configuration

15. NAND Flash Memory Configurations Without Data Path Width Converters

This section describes how to design 2D-RS NAND Flash memories that do not require data path width converters. Generally speaking, these configurations will be more complex and take more gates to implement than configurations with data path width converters.

We will use 6-bit column symbols and 5-bit row symbols for the second sample configuration with 4 redundant row symbols per horizontal codeword and 6 redundant column symbols per vertical codeword. For this case, up to 135-bit data words can be input in parallel, but we will input 65-bit data words = 13 5-bit symbols. The 65th bit can be forced to 0 or it can be used for some type of auxiliary data.

With 6-bit column symbols, the maximum height of a vertical RS codeword is 63 6-bit symbols = 378 bits.

In this second configuration, the row and column symbol boundaries are aligned every 30 bits since 5 and 6 have no common factors.

To determine how often the row and column symbol boundaries align, we use the same method as we used for the data path width converters. That is, we find the lowest x and y so that 6x=5y. Note that x/y=5/6 which cannot be reduced further so boundaries in this case will be aligned only every 30 bits.

In the first configuration the row and column boundaries were aligned every 12 bits since 4/6=2/3.

Although Verilog code has not been developed for these designs yet, it appears that the symbol boundaries must line up at the end of the data. In other words, it appears that the height (in bits) of a data field shown in Figure 14 in a Segment for this configuration must be 30i where i is an integer.

It also appears that the height of the column redundancy field shown in Figure 14 can be any number of 6-bit symbols since the PRS row decoder ignores the column redundancy fields.

With NAND Flash Page sizes of ~4096 bytes, there will ~92 Segments per Page.

A 2D encoded array with 5-bit row symbols outlined in red and 6-bit column symbols outlined in blue is illustrated in Figure 11. All of the bits from one column are written to one NAND Flash chip. It doesn’t matter whether the NAND Flash chip inputs and outputs 8-bits at a time or 16-bits at a time or more. The only thing that matters is that all of the bits from one column are written to one NAND Flash chip so that if one NAND Flash chip fails, only one column of bits will be affected.

Figure 11 Encoded 2D-RS Array for Second Sample Configuration

The above scheme allows up to 31 5-bit symbols per horizontal codeword and up to 63 6-bit symbols per vertical codeword.

Most of the time, the column decoders will be able to correctly recover the data without the aid of the row decoder. However, if a NAND Flash chip fails, then many column decodings in one column will fail indicating that all of the symbols in that codeword should be considered erasures by the row decoder.

With 4 redundant row symbols, up to t errors and s NAND Flash chip failures can be corrected in each row as long as 2t + s < 4 which is probably more than sufficient. In other words, with no soft errors, up to 4 NAND Flash chips can fail with no loss of data.

With 6 redundant 6-bit column symbols, up to 3 soft errors in 60 can be corrected in each vertical codeword. Some work should be done to determine how many vertical symbols need to be corrected for a specific type of NAND Flash device, but it’s OK for the column decoding to fail every once in a while because the row decoder can correct for multiple column decoding failures.

16. 2D-RS Encoding for Second Sample Configuration

A block diagram of the 2D-RS encoder hardware for the second sample configuration is shown in Figure 12.

Figure 12 2D-RS Encoding for Second Sample Configuration

Data can be continuously input to the 2D-RS encoder as long as the pause input signal is not asserted. Whenever the encoder reaches the end of a data field, it will request the input to pause while it outputs the redundancy for the current codeword. Once the pause signal is deasserted, the input can continue to input data words.

All of the pieces of logic shown in Figure 12 have been previously designed and licensed by ECC Tek.

17. 2D-RS Decoding for Second Sample Configuration

A block diagram of the 2D-RS decoder for the second configuration is shown in Figure 13.

Figure 13 2D-RS Decoding for Second Sample Configuration

The FIFOs are needed to store corrected vertical codewords from one Segment because a column decoder cannot know if decoding has failed until the entire codeword has been outputted, but the PRS Row Decoder needs to receive an erasure indication at the same time it receives an input word. The Fail signal from the column decoder is the erasure indication. A column decoding failure means the entire vertical corrected codeword stored in the decoder’s output FIFO has been “erased”.

For the sample configuration, the Column Decoder FIFO only needs to store 60 6-bit symbols.

18. Percentage Redundancy for Second Sample Configuration

Figure 14 shows the second sample configuration has 29.5% redundancy. The percentage redundancy is a function of the level of protection provided. The more protection provided, the higher the percentage redundancy.

Figure 14 Percentage Redundancy for Second Sample Configuration

Links

2D ECC Concepts

2D-RS SSDs

2D-RS HDDs

2D-RS HDD Products

2D-RS Storage Systems

Basic ECC Concepts

Binary BCH Design Customization

Correcting Cell Failures

Cylinder Drive Concept

ECC Tek’s Web Site

Finite Fields, RS Codes and RS RAID

Finite Fields with 4-bit Elements

Important High-Level ECC Design Considerations

Iterative Decoding of 2D Codes

LDPC and RS Codes

Most Important Disk Drive Facts

PRS Patent

The Danger of Overemphasizing Theory

Understanding RS RAID