1 of 29

Chapter 11 Storage

In computer architecture, storage refers to the components and systems used to store data and instructions either temporarily or permanently. It plays a crucial role in enabling a computer to retain information, access it efficiently, and execute programs.

🔹 Types of Storage

Primary Storage (Main Memory)

RAM (Random Access Memory)

Volatile: loses data when power is off
Fast access
Used to store data and instructions currently being used

ROM (Read-Only Memory)

Non-volatile: retains data without power
Stores firmware or system boot code

Secondary Storage (Permanent Storage)

Examples: Hard Disk Drives (HDDs), Solid State Drives (SSDs), Optical Discs (CD/DVD), USB drives
Non-volatile
Slower than RAM but much larger in capacity
Used to store operating systems, applications, files, and backups

Tertiary Storage (Archival Storage)

Example: Magnetic tape
Used for long-term data storage and backups
Low cost per bit, but slow access

Cache Memory

Very fast and small-sized memory located close to the CPU
Stores frequently used data and instructions to speed up processing
Levels: L1 (fastest, smallest), L2, and L3

Registers

Smallest and fastest form of memory inside the CPU
Temporarily hold data and instructions during execution

2 of 29

Input/Output

I/O Interface

Device drivers
Device controller
Service queues
Interrupt handling

Design Issues

Performance
Expandability
Standardization
Resilience to failure

Impact on Tasks

Blocking conditions
Priority inversion
Access ordering

Processor

Computer

Control

Datapath

Memory

Devices

Input

Output

Processor

Computer

Control

Datapath

Memory

Devices

Input

Output

Network

The Role of I/O Interface in Computer Operation

I/O Interface or input/output interface is an important part that connects the computer to external devices such as keyboard, mouse, hard disk or network devices. It has the following main components:

Device Driver: It is a software that acts as a medium for communication between the operating system and hardware devices, allowing the operating system to control various devices without having to understand the technical details of each type of hardware.

Device Controller: It is hardware embedded in each device that directly controls the operation of that device, such as reading data from the hard disk or sending data to the monitor.

Service Queues: It is a queue used to prioritize the operation of requests to access various devices when multiple programs need to use the same device.

Interrupt Handling: It is a mechanism that allows the computer to respond to events that occur with external devices immediately, such as when a button is pressed on the keyboard or data arrives at the network interface.

Summary: The I/O Interface acts as a link between the computer and the outside world, allowing the computer to receive data for processing and sending results to various devices efficiently. Each component works together to make communication smooth and highly efficient.

Key Issues in I/O Design Issues

I/O Design Issues are an important topic in computer engineering because I/O is the interface between a computer system and the outside world. Good I/O design affects the performance, scalability, and reliability of the overall system.

1. Performance

Transfer rate: The speed at which data is sent and received between devices and the system.

Latency: The time it takes to access and respond to requests from devices.

Throughput: The number of requests the system can support simultaneously.

Related factors: Bandwidth, Latency, Throughput

2. Expandability

Adding devices: New devices can be added to the system easily and quickly.

Replacing devices: Old devices can be replaced with newer, more modern devices.

Scaling: The system can be adjusted to accommodate increased workloads.

Related factors: System architecture, ports, protocols.

3. Standardization

Compatibility: Devices can work together smoothly.

Reducing costs: Using the same standards reduces development and maintenance costs. Ease of use: It is easier for users to use the devices.

Related factors: Industry standards, protocols, drivers

4. Resilience to failure

System continuity: The system continues to function even if some devices fail.

Data recovery: Lost data can be recovered.

Error detection and correction: The system can detect and correct errors that occur automatically.

Related factors: Redundancy, Error detection and correction, Fault tolerance

More on Impact on Task in various aspects

1. Blocking Conditions

Blocking Conditions refers to a condition that causes a process or thread to temporarily stop waiting for some event to happen, such as waiting to receive data from an I/O device, waiting for a desired resource to be freed, or waiting for another thread to send a signal.

Impact on Task:

Decreased performance: When a task is blocked, the system cannot fully utilize the resources of the machine, resulting in a decrease in overall system performance.

Increased Latency: Waiting causes a delay in the execution of the task, affecting the system's response latency.

May cause Deadlock: If multiple tasks are blocked waiting for each other, a deadlock can occur, causing the system to stop working.

Example:

I/O Blocking: When a program executes a command to read data from the disk, the program must stop until the data is read.

Mutex Blocking: When a thread wants to access a resource that is protected by a mutex and that resource is occupied by another thread. The first thread must wait until another thread releases the resource.

2. Priority Inversion

Priority Inversion occurs when a thread with lower priority acquires a resource that a thread with higher priority needs, causing the thread with higher priority to wait for the thread with lower priority first.

Impact on Task:

Reduced Priority: The task prioritization mechanism does not work efficiently.

Increased Latency: The thread with higher priority has to wait longer, causing delays in work.

May cause system instability: In some cases, Priority Inversion can cause the system to have a deadlock.

3. Access Ordering

Access Ordering refers to the order in which shared resources are accessed, especially when multiple threads access the same resource simultaneously. If the access order is incorrect, various problems can occur.

Impact on Task:

Data Corruption: If threads access and modify data simultaneously without control, the data may be corrupted.

Invalid Results: Computations involving data that are accessed simultaneously may produce incorrect results.

Race Condition: Occurs when the result of a program depends on the order in which multiple threads execute.

3 of 29

Impact of I/O on System Performance

Suppose we have a benchmark that executes in 100 seconds of elapsed time, where 90 seconds is CPU time and the rest is I/O time. If the CPU time improves by 50% per year for the next five years but I/O time does not improve, how much faster will our program run at the end of the five years?

Answer: Elapsed Time = CPU time + I/O time

Over five years:

CPU improvement = 90/12 = 7. BUT System improvement = 100/22 = 4.5

Suppose we have a benchmark that looks at 100 seconds, with 90 seconds spent on CPU processing and 10 seconds spent on I/O.

Assuming that CPU performance improves by 50% per year over the next 5 years, but I/O performance remains the same, how much faster will our program be in the next 5 years?

Elapse time refers to the amount of time that passes between the start and end of an event.

In the context of this question:

Elapse time of 100 seconds means that the program we are testing takes a total of 100 seconds to complete its task.

CPU time of 90 seconds means that out of those 100 seconds, the program spends 90 seconds processing data on the CPU.

I/O time of 10 seconds means that the program spends 80 seconds on input and output (e.g., reading data from the hard disk). Write data to the monitor) 10 more seconds

In short:

Elapse time is the total time spent performing a task

CPU time is the time spent performing internal calculations in the CPU

I/O time is the time spent retrieving data

In this question, we can see that the majority of the elapsed time is CPU processing time (90 seconds) compared to retrieving data (10 seconds), which means that CPU performance has a greater impact on program execution time.

A 50% increase in CPU speed per year means that each year the CPU will be 50% faster than the previous year. So in the next 5 years, the CPU will be much faster, and this will result in significantly reduced execution time.

However, since I/O time remains the same, improving CPU performance alone will not halve the program execution time, as there are other factors involved, such as the amount of data being transferred and the efficiency of the I/O devices.

4 of 29

Typical I/O System

The connection between the I/O devices, processor, and memory are usually called (local or internal) bus
Communication among the devices and the processor use both protocols on the bus and interrupts

Processor

Cache

Memory - I/O Bus

Main

Memory

I/O

Controller

Disk

I/O

Controller

I/O

Controller

Graphics

Network

interrupts

1. The Connection between I/O devices, processors, and memory

Bus: The term "bus" here refers to the communication path within the computer used to connect various devices together, such as the processor, memory, and various I/O (Input/Output) devices (e.g. hard disk, mouse, keyboard).

Internal Bus: Also known as the local bus, it is a bus inside the computer itself. It is used for communication between internal components, such as the CPU and RAM.

Comparison: A bus is like a road in a city that connects buildings together (buildings are computer devices). Data is transmitted back and forth along this bus.

2. Communication

Protocol on the bus: It is a set of rules that define the format and method of sending data across the bus so that all devices understand and can communicate correctly.

Interrupt: It is a method by which a device sends a signal to the processor to make the processor temporarily stop its current task and then follow the request of that device. For example, when we click the mouse, the mouse device sends an interrupt signal to the processor. To let the processor know that a mouse click has occurred.

Summary: The internal operation of a computer requires the connection and communication between devices via a bus, and this communication must follow certain rules (protocols), including the use of interrupts to ensure efficient operation.

Analyze: A computer is like a large factory, with various components working together. The bus is like a conveyor belt that transports raw materials and finished products back and forth. Interrupts are like signals to tell workers to go to work on the desired part.

5 of 29

I/O Device Examples

Device Behavior Partner Data Rate (KB/sec)

Keyboard Input Human 0.01

Mouse Input Human 0.02

Line Printer Output Human 1.00

Floppy disk Storage Machine 50.00

Laser Printer Output Human 100.00

Optical Disk Storage Machine 500.00

Magnetic Disk Storage Machine 5,000.00

Network-LAN Input or Output Machine 20 – 1,000.00

Graphics Display Output Human 30,000.00

6 of 29

Disk History

Data density in

Mbit/square inch

Capacity of Unit Shown in Megabytes

source: New York Times, 2/23/98, page C3

7 of 29

Organization of a Hard Magnetic Disk

Typical numbers (depending on the disk size):

500 to 2,000 tracks per surface
32 to 128 sectors per track

A sector is the smallest unit that can be read or written to

Traditionally all tracks have the same number of sectors:

Constant bit density: record more sectors on the outer tracks
Recently relaxed: constant bit size, speed varies with track location

Platters

Track

Sector

Here is a primitive picture showing you how a disk drive can have multiple platters.

Each surface on the platter are divided into tracks and each track is further divided into sectors. A sector is the smallest unit that can be read or written.

By simple geometry, you know the outer track has more area, and you would think the outer track will have more sectors.

This, however, is not the case in traditional disk design, where all tracks have the same number of sectors. Well, you will say, this is dumb, but dumb is the reason they do it .

By keeping the number of sectors, the same, the disk controller hardware and software can be dumb and does not have to know which track has how many sectors.

With more intelligent disk controller hardware and software, it is getting more popular to record more sectors on the outer tracks. This is referred to as constant bit density.

8 of 29

Magnetic Disk Operation

Cylinder: all the tracks under the �head at a given point on all surface
Read/write is a three-stage process:

Seek time

position the arm over proper track

Rotational latency

wait for the sector to rotate under the read/write head

Transfer time

transfer a block of bits (sector) under the read-write head

Average seek time

(∑ time for all possible seeks) / (# seeks)
Typically in the range of 8 ms to 12 ms
Due to locality of disk reference, actual average seek time may only be 25% to 33% of the advertised number

Cylinder

Sector

Track

Head

Platter

To read write information into a sector, a movable arm containing a read/write head is located over each surface.

The term cylinder is used to refer to all the tracks under the read/write head at a given point on all surfaces.

To access data, the operating system must direct the disk through a 3-stage process.

(a) The first step is to position the arm over the proper track. This is the seek operation and� the time to complete this operation is called the seek time.

(b) Once the head has reached the correct track, we must wait for the desired sector to� rotate under the read/write head. This is referred to as the rotational latency.

(c) Finally, once the desired sector is under the read/write head, the data transfer can begin.

The average seek time as reported by the manufacturer is in the range of 12 ms to 20ms and is calculated as the sum of the time for all possible seeks divided by the number of possible seeks.

This number is usually on the pessimistic side because due to locality of disk reference, the actual average seek time may only be 25 to 33% of the number published.

9 of 29

Magnetic Disk Characteristic

Rotational Latency:

Most disks rotate at 5,400 to 10,000 RPM
Approximately 11 ms to 6 ms per revolution, respectively
An average latency to the desired information is halfway around the disk:

5.5 ms at 5400 RPM, 3 ms at 10000 RPM

Transfer Time is a function of :

Transfer size (usually a sector): 1 KB / sector
Rotation speed: 5400 RPM to 10000 RPM
Recording density: bits per inch on a track
Diameter: typical diameter ranges from 2.5 to 5.25”
Typical values ~500MB per second

10 of 29

Example

Calculate the access time for a disk with 512 byte/sector and 12 ms advertised seek time. The disk rotates at 5400 RPM and transfers data at a rate of 4MB/sec. The controller overhead is 1 ms. Assume that the queue is idle (so no service time)

Answer:

Disk Access Time = Seek time + Rotational Latency + Transfer time

+ Controller Time + Queuing Delay

= 12 ms + 0.5 / 5400 RPM + 0.5 KB / 4 MB/s + 1 ms + 0

= 12 ms + 0.5 / 90 RPS + 0.125 / 1024 s + 1 ms + 0

= 12 ms + 5.5 ms + 0.1 ms + 1 ms + 0 ms

= 18.6 ms

If real seeks are 1/3 the advertised seeks, disk access time would be

10.6 ms, with rotation delay contributing 50% of the access time!

11 of 29

Historical Trend

Characteristics IBM 3090 IBM UltraStar Integral 1820

Disk diameter (inches) 10.88 3.50 1.80

Formatted data capacity (MB) 22,700 4,300 21

MTTF (hours) 50,000 1,000,000 100,000

Number of arms/box 12 1 1

Rotation speed (RPM) 3,600 7,200 3,800

Transfer rate (MB/sec) 4.2 9-12 1.9

Power/box (watts) 2,900 13 2

MB/watt 8 102 10.5

Volume (cubic feet) 97 0.13 0.02

MB/cubic feet 234 33000 1050

12 of 29

Reliability and Availability

Two terms that are often confused:

Reliability: Is anything broken?
Availability: Is the system still available to the user?

Availability can be improved by adding hardware:

Example: adding ECC on memory

Reliability can only be improved by:

Enhancing environmental conditions
Building more reliable components
Building with fewer components

Improve availability may come at the cost of lower reliability

This bring us to two terms that are often confused: reliability and availability.

Here is the proper distinction: reliability asks the question is anything broken?

Availability, on the other hand, ask the question is the system still availability to the user?

Adding hardware can therefore improve availability. For example, an airplane with two engines is more “available” than an airplane with one engine.

Reliability, on the other hand, can only be improved by bettering environmental conditions, building more reliable components, or reduce the number of components in the system.

Notice that by adding hardware to improve availability, you may actually reduce reliability.

For example, an airplane with two engines is twice as likely to have an engine failure than an airplane with only one engine so its reliability is lower although its availability is higher.

13 of 29

Disk Arrays

Increase potential throughput by �having many disk drives:

Data is spread over multiple disk
Multiple accesses are made to several disks

Reliability is lower than a single disk:

Reliability of N disks = Reliability of 1 Disk ÷ N

(50,000 Hours ÷ 70 disks = 700 hours)
Disk system MTTF: Drops from 6 years to 1 month

Arrays (without redundancy) too unreliable to be useful!
But availability can be improved by adding redundant disks (RAID):

Lost information can be reconstructed from redundant information

The discussion of reliability and availability brings us to a new organization of disk storage where arrays of small and inexpensive disks are used to increase the potential throughput.

This is how it works: Data is spread over multiple disk so multiple accesses can be made to several disks either via interleaving or done in parallel.

While disk arrays improve throughput, latency is not necessary improved.

Also with N disks in the disk array, its reliability is only 1 over N the reliability of a single disk.

But availability can be improved by adding redundant disks so lost information can be reconstructed from redundant information.

Since mean time to repair is measured in hours and MTTF (mean time to failure) is measured in years, redundancy can make the availability of disk arrays much higher than that of a single disk.

14 of 29

Manufacturing Advantages of Disk Arrays

14”

10”

5.25”

3.5”

Disk Array: 1 disk design

Conventional: 4 disk designs

Low End

High End

Disk Product Families

Replace Small # of Large Disks with Large # of Small Disks!

15 of 29

Redundant Arrays of Disks

Redundant Array of Inexpensive Disks (RAID)

Widely available and used in today’s market
Files are "striped" across multiple spindles
Redundancy yields high data availability despite low reliability
Contents of a failed disk is reconstructed from data redundantly stored in the disk array
Drawbacks include capacity penalty to store redundant data and bandwidth penalty to update a disk block
Different levels based on replication level and recovery techniques

RAID stands for Redundant Array of Independent Disks. It is a technology used to combine multiple hard drives together to increase the efficiency, capacity, and reliability of the storage system.

1. Prevalence

RAID is a widespread technology and is widely used in the market today, both in servers, workstations, and large storage systems.

2. Striping

RAID works by dividing data into small sections (Stripes) and distributing them to store on multiple hard drives, making reading and writing data faster.

3. High reliability

RAID can increase the reliability of the storage system by creating redundancy, which allows the system to continue working even if one hard drive fails.

4. Data recovery

If one hard drive fails, the RAID system can recover the lost data by reconstructing the data from the backup data stored on the other hard drives.

5. Limitations

Disk space usage: RAID requires additional disk space to store the backup data.

Write performance: Writing data to RAID can be slower because it has to write data to multiple hard drives. Hard disk simultaneously

6. RAID level

RAID has many levels, each level has different levels of data redundancy and recovery techniques, such as RAID 0, RAID 1, RAID 5, RAID 6, etc.

Summary: RAID is a very useful technology to increase the efficiency, capacity, and reliability of the storage system. However, the choice of RAID level depends on the needs and limitations of each system.

16 of 29

RAID 1: Disk Mirroring/Shadowing

Each disk is fully duplicated onto its "shadow“
Very high availability can be achieved
Bandwidth sacrifice on write: Logical write = two physical writes
Reads may be optimized
Most expensive solution: 100% capacity overhead

Targeted for high I/O rate , high availability environments

recovery

group

RAID 1, also known as Mirroring, is a form of RAID that focuses on data reliability. The principle is to make a one-to-one copy of data from one hard disk to another, creating a "shadow" of the data.

Advantages of RAID 1

High reliability: Since there are two copies of the data, if one hard disk fails, the system can continue to operate using the data from the other hard disk, reducing the risk of data loss.

Faster reading: In some cases, RAID 1 can read data faster because the system can read data from both hard disks simultaneously.

Disadvantages of RAID 1

Large storage space: Since all data must be backed up, a hard disk with twice the capacity of the data to be stored is required.

Write speed: Data is slower because data must be written to both hard disks simultaneously.

Summary of the given text

Each disk is fully duplicated onto its "shadow": Each hard disk has a complete copy of the data onto the other hard disk.

Very high availability can be achieved: Very high data availability can be achieved.

Bandwidth sacrifice on write: Logical write = two physical writes: Data writes are slower. Since data must be written to two hard drives

Reads may be optimized: Reads may be faster in some cases

Most expensive solution: 100% capacity overhead: This is a costly solution, since it requires twice the storage capacity

Overview: RAID 1 is suitable for systems that require high data reliability, but at the cost of increased storage and reduced write performance

When to use RAID 1:

Systems that require high data availability, such as database servers, mission critical storage

Systems that can accept increased storage costs

When not to use RAID 1:

Systems that require maximum write performance

Systems with limited storage capacity

17 of 29

RAID 3: Parity Disk

P

10010011

11001101

10010011

. . .

logical record

1

0

1

0

1

0

1

0

1

0

1

0

1

0

1

0

Striped physical

records

Parity computed across recovery group to protect against hard disk failures
33% capacity cost for parity in this configuration: wider arrays reduce

capacity costs, decrease expected availability, increase reconstruction time

Arms logically synchronized, spindles rotationally synchronized

(logically a single high capacity, high transfer rate disk)

Targeted for high bandwidth applications: Scientific, Image Processing

RAID 3 is one of the RAID levels that focuses on improving read performance, especially for sequential access, such as reading large video files.

It works like this:

Parity computed across recovery group to protect against hard disk failures: Parity data is computed across all data in a group of hard disks and stored on another hard disk. This parity checks the integrity of the data. If one hard disk fails, the system can use the parity data to recover the lost data.

33% capacity cost for parity in this configuration: wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time: To store parity data, approximately 33% of the total capacity of a RAID 3 system is required. As more hard disks are added to the system, the proportion of space used for parity storage decreases, but this also decreases the reliability of the system and increases the time it takes to recover data if a failure occurs.

Arms logically synchronized, spindles rotate synchronized (logically a single high capacity, high transfer rate disk): The read heads of all hard disks in a RAID 3 system are synchronized to operate simultaneously. It allows for continuous and fast reading of data, as if it were a single large hard disk.

Summary

RAID 3 is suitable for tasks that require high data reading speed, such as servers used to provide large file services. However, the disadvantage is that it must lose storage space to store parity data, and the reliability of the system may not be as high as other RAID levels, such as RAID 5 or RAID 6.

Advantages of RAID 3:

High data reading speed: Suitable for tasks that require continuous reading of large amounts of data.

Recovery from the failure of a single hard disk: Parity data allows for the recovery of lost data.

Disadvantages of RAID 3:

Loss of storage space: Additional storage space is required to store parity data.

Lower reliability than high-level RAID: If the hard disk that stores parity data is damaged, the data cannot be recovered.

Slow data writing speed: Because parity data must be calculated every time data is written.

Summary: RAID 3 is an interesting option for tasks that require fast data reading, but the pros and cons and suitability for the intended use should be considered before deciding to use it.

18 of 29

RAID 5 : Striping with Parity

Key Features of RAID 5:

Data striping: Data is divided into smaller blocks and distributed across multiple disks in the array, resulting in faster read and write speeds.
Parity: Parity information is calculated from all data in the array and distributed across all disks. This allows for data recovery in case of a single disk failure.
High reliability: Can tolerate the failure of one disk without causing total data loss.
High performance: Both read and write operations are fast due to data striping, which allows for parallel access to data from multiple disks.

RAID 5: A Balance of Performance and Reliability

RAID 5 is one of the most popular RAID levels, as it combines performance and data reliability.

Key Features of RAID 5

Striping: Data is divided into small blocks and distributed across each hard disk in the array, allowing faster read and write operations.

Parity: Parity data is calculated from all data in the array and distributed across each hard disk, allowing data recovery if one hard disk fails.

High Reliability: It can withstand the failure of a single hard disk without losing all data.

High Efficiency: Fast read and write operations, due to striping, allow access to data across multiple hard disks simultaneously.

Key Features Compared to RAID 3

Parity Data Distribution: RAID 5 distributes parity data across all hard disks in the array, while RAID 3 stores parity data on a single hard disk, making RAID 5 more efficient in using storage space.

Flexibility: RAID 5 can accommodate the addition or subtraction of hard drives in an array more easily than RAID 3.

Ideal for:

Servers: Suitable for servers that require fast data access and moderate data reliability.

Workstations: Suitable for workstations that require large amounts of storage and data security.

General-purpose storage: Suitable for systems that require a balance between performance and reliability.

Considerations:

Cost: Requires at least three hard drives, making it more expensive than using a single hard drive.

Complexity: Setting up and managing RAID 5 can be more complicated than other RAID levels.

Risk: If more than one hard drive fails simultaneously, data can be lost.

Conclusion:

RAID 5 is a good choice for those who want a highly reliable, efficient, and cost-effective storage system. However, it is important to consider its pros and cons and its suitability for actual use before deciding to use it.

19 of 29

Block-Based Parity

Block-based parity leads to more efficient read access compared to RAID 3
Designating a parity disk allows recovery but will keep it idle in the absence

of a disk failure

RAID 5 distribute the parity block to allow the use of all disk and enhance

parallelism of disk access

RAID 4

RAID 5

Comparison between RAID 3 and RAID 5, especially in terms of parity storage and read performance

RAID 3 is a system that uses parity data to check and recover data. The parity data is stored on a single hard disk, which means that the hard disk is not always fully utilized for storage, resulting in some storage space loss.

RAID 5 is an improvement on RAID 3 by solving this storage space loss problem by distributing parity data to all hard disks in the array, allowing for more efficient use of storage space.

Summary of the key differences between RAID 3 and RAID 5 according to the text provided

Parity storage:

RAID 3: Stores parity data on a single hard disk

RAID 5: Distributes parity data to all hard disks in the array

Read performance:

RAID 3: Has good read performance Because it is easy to access parity data.

RAID 5: It has slightly better read performance than RAID 3 because the distribution of parity data allows for more parallel access.

Advantages of RAID 5 over RAID 3

Storage efficiency: RAID 5 can use storage more efficiently than RAID 3 because it does not have to reserve space for parity data on a single hard disk.

Flexibility: RAID 5 can support adding or reducing the number of hard disks in the array more easily than RAID 3.

Read performance: RAID 5 has slightly better read performance.

Overall conclusion

From the given text, we can conclude that RAID 5 is an improvement over RAID 3 to solve the problem of wasted storage space and increase performance. Therefore, RAID 5 is more popular than RAID 3 today.

20 of 29

RAID 5+: High I/O Rate Parity

A logical write

becomes four

physical I/Os

Independent writes

possible because of

interleaved parity

Reed-Solomon

Codes ("Q") for

protection during

reconstruction

D0

D1

D2

D3

P

D4

D5

D6

P

D7

D8

D9

P

D10

D11

D12

P

D13

D14

D15

P

D16

D17

D18

D19

D20

D21

D22

D23

P

.

Disk Columns

Increasing

Logical

Disk

Addresses

Stripe

Unit

Targeted for mixed

applications

What is RAID 5+ and how is it different from RAID 5?

RAID 5+ is a form of RAID that is developed from RAID 5 with improved parity data storage mechanisms. RAID 5+ generally uses Reed-Solomon code to calculate parity data, allowing for more error detection and correction than RAID 5, and can withstand more than one hard disk failure.

Advantages of RAID 5+ compared to RAID 5:

High reliability: Can withstand more than one hard disk failure.

Efficiency in data recovery: Data can be recovered even if multiple hard disks are damaged simultaneously.

Efficiency in data writing: Each data write can be done independently. Improves performance

Disadvantages of RAID 5+:

Complexity: The RAID 5+ mechanism is more complex than RAID 5.

Cost: RAID 5+ may require devices that support the Reed-Solomon code, which may be more expensive.

A logical write becomes four physical I/Os: This means that when data is written to a RAID 5+ system once (logical write), the system must actually write the data to the hard disk four times (physical I/Os) because it must calculate and distribute the parity data to the hard disks according to the RAID 5+ mechanism.

Independent writes possible because of interleaved parity: This indicates that each write can be done independently because the parity data is distributed to each block of data, so each block of data writing does not have to wait for the parity calculation of other blocks, resulting in increased performance in writing data.

Reed-Solomon Codes (“Q”) for protection during reconstruction: RAID 5+ uses Reed-Solomon codes (also known as Q) to help recover data in the event of a data failure. This code helps to detect and correct data errors efficiently. It enables data recovery even if multiple hard drives fail simultaneously.

Summary

RAID 5+ is an interesting technology for those who need a highly reliable and efficient storage system, especially for systems with critical data and high security requirements. However, choosing RAID 5+ should consider factors such as cost, complexity, and suitability for practical use.

21 of 29

Problems of Small Writes

D0

D1

D2

D3

P

D0'

+

D0'

D1

D2

D3

P'

new

data

old

data

old

parity

XOR

(1. Read)

(2. Read)

(3. Write)

(4. Write)

RAID-5: Small Write Algorithm

1 Logical Write = 2 Physical Reads + 2 Physical Writes

Small Write is a common problem in RAID 5 systems, especially in systems with high I/O usage, such as database systems or server systems that need to support a large number of data accesses.

Cause of the problem

RAID 5 mechanism: When data is written to a RAID 5 system, the data is divided into small blocks and distributed to various hard disks, along with recalculating the parity data. This process takes a lot of time and resources.

Data block size: The size of the data block used to store data in RAID 5 affects the efficiency of writing data, especially when writing small amounts of data. If the size of the data block is too large, writing a single small amount of data will require updating the parity data in many blocks, which causes a loss of efficiency.

I/O volume: If a large amount of data is written continuously, the RAID 5 system will have to work hard to calculate the parity data, which may reduce the overall efficiency of the system.

Impact of the Small Write problem

Reduced data writing efficiency: Writing a large number of small pieces of data will significantly reduce the overall efficiency of the system.

Data access latency: When a large number of small pieces of data are written, the system may experience a delay in responding to data access commands.

Shortened hard disk life: Frequent writing will make the hard disk work harder, which may shorten the hard disk's life.

How to solve the Small Write problem

Adjust the size of the data block: Adjusting the size of the data block to suit the usage pattern will help reduce the Small Write problem.

Use RAID technology that can better handle Small Write: such as RAID 6 or RAID 10.

Use SSD: Using SSD instead of HDD can help solve the Small Write problem because SSD has a higher data access speed than HDD.

Adjust system parameters: Adjusting various system parameters, such as adjusting the operating system cache or adjusting the RAID controller, can help reduce the Small Write problem.

Summary

The Small Write problem is a common problem in RAID 5 systems and has a significant impact on system performance. Choosing the right RAID for the usage pattern, adjusting the system parameters, and using modern technology can help solve the Small Write problem and improve system performance.

22 of 29

Subsystem Organization

host

array

controller

single board

disk

controller

single board

disk

controller

single board

disk

controller

single board

disk

controller

host

adapter

manages interface

to host, DMA

control, buffering,

parity logic

physical device

control

often piggy-backed

in small format devices

striping software off-loaded from

host to array controller

no applications modifications

no reduction of host performance

This RAID system is designed to be able to off-load the workload related to striping data management directly to the array controller, which will benefit the overall system as follows:

Definition of key terms

Stripping: is a technique of dividing data into small parts (blocks) and distributing them to multiple hard disks to increase data access speed.

Off-loaded: means transferring some of the workload to other devices to reduce the workload of the main device.

Array controller: is a chip or board that controls the operation of multiple hard disks that are combined into an array.

Advantages of this RAID design

Reduce CPU load: Since the array controller manages the distribution of striping data, the CPU of the main system does not have to bear this workload, allowing the CPU to process more of other tasks.

Improved efficiency: Since the array controller is specialized in striping data management, the system operation is faster and more efficient.

No need to modify the application: Since the operation of the RAID system is hidden within the array controller, there is no need to modify the application in use.

No impact on system performance: Offloading the workload to the array controller This ensures that the overall system performance does not decrease.

Overview of Subsystem Organization

Illustration:

Open in new windowwww.geeksforgeeks.org

RAID subsystem organization with offloaded striping to array controller

Image description:

Host system: The primary system that accesses data

Array controller: The array controller that manages the distribution of striped data and controls the operation of all hard disks

Hard disks: The hard disks used to create the array

Work process:

When the primary system needs to save data, the system sends the data to the array controller.

The array controller divides the data into small blocks and distributes them to the various hard disks according to the RAID mechanism.

When the primary system needs to read data, the array controller collects the data from the various hard disks and sends it back to the primary system.

Summary

The design of a RAID system that offloads striping data management to the array controller is a design that emphasizes efficiency and ease of use. It reduces the workload of the primary system, increases work efficiency, and does not require any application changes, making the system more flexible and easier to use.

Additional advantages:

High reliability: Data is divided into multiple hard disks, making the data more secure if any hard disk is damaged. Data is still recoverable

Easy to expand: Hard drives can be easily added to the array to increase storage capacity

In summary, RAID systems that offload striping data management to the array controller are widely popular due to their high efficiency, high reliability, and ease of use.

23 of 29

System Availability: Orthogonal RAIDs

Data Recovery Group: unit of data redundancy
Redundant Support Components: fans, power supplies, controller, cables
End to End Data Integrity: internal parity protected data paths

Array

Controller

String

Controller

String

Controller

String

Controller

String

Controller

String

Controller

String

Controller

. . .

Orthogonal RAIDs are systems designed to increase the availability of storage systems by focusing on creating redundancy at multiple levels to reduce the risk of data loss in the event of a failure.

Key components that make the system highly available:

Data Recovery Group: is a subset of data that is created to allow data recovery in the event of a failure by having a backup or calculating parity data for data recovery.

Redundant Support Components: The system will have redundant components such as fans, power supplies, controllers and cables to reduce the risk of system failure. If any component is damaged, other components can work instead.

End to End Data Integrity: Data is checked for correctness throughout the process, from writing data into the system to reading it out, using validation techniques such as calculating parity data to ensure that the stored data is correct and complete.

Summary:

System Availability using the Orthogonal RAIDs concept focuses on creating redundancy at all levels of the system, whether it is the data level, component level or process level. Make the system fault-tolerant and able to continue working even if some components are damaged.

Advantages of this system:

High availability: Reduces the risk of data loss.

Reliability: The system is stable and works continuously.

Data security: Data is always checked for accuracy.

Application:

This type of high availability system is often used in systems that require continuous operation, such as corporate databases, 24-hour server systems, or security-related systems.

24 of 29

Polling: Programmed I/O

Advantage:

Simple: the processor is totally in control and does all the work

Disadvantage:

Polling overhead can consume a lot of CPU time

CPU

IOC

device

Memory

Is the

data

ready?

read

data

store

data

yes

no

done?

no

yes

busy wait loop

not an efficient

way to use the CPU

unless the device

is very fast!

but checks for I/O

completion can be

dispersed among

computation

intensive code

25 of 29

Interrupt Driven Data Transfer

Advantage:

User program progress is only halted during actual transfer

Disadvantage: special hardware is needed to:

Cause an interrupt (I/O device)
Detect an interrupt (processor)
Save the proper states to resume after the interrupt (processor)

add

sub

and

or

nop

read

store

...

rti

memory

user

program

(1) I/O

interrupt

(2) save PC

(3) interrupt

service addr

interrupt

service

routine

(4)

CPU

IOC

device

Memory

:

That is, whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing.

This is how an I/O interrupt looks in the overall scheme of things. The processor is minding its business when one of the I/O device wants its attention and causes an I/O interrupt.

The processor then save the current PC, branch to the address where the interrupt service routine resides, and start executing the interrupt service routine.

When it finishes executing the interrupt service routine, it branches back to the point of the original program where we stop and continue.

The advantage of this approach is efficiency. The user program’s progress is halted only during actual transfer.

The disadvantage is that it require special hardware in the I/O device to generate the interrupt. And on the processor side, we need special hardware to detect the interrupt and then to save the proper states so we can resume after the interrupt.

26 of 29

I/O Interrupt vs. Exception

An I/O interrupt is just like the exceptions except:

An I/O interrupt is asynchronous
Further information needs to be conveyed
Typically exceptions are more urgent than interrupts

An I/O interrupt is asynchronous with respect to instruction execution:

I/O interrupt is not associated with any instruction
I/O interrupt does not prevent any instruction from completion

You can pick your own convenient point to take an interrupt

I/O interrupt is more complicated than exception:

Needs to convey the identity of the device generating the interrupt
Interrupt requests can have different urgencies:

Interrupt request needs to be prioritized
Priority indicates urgency of dealing with the interrupt
high speed devices usually receive highest priority

How does an I/O interrupt different from the exception you already learned?

Well, an I/O interrupt is asynchronous with respect to the instruction execution while exception such as overflow or page fault are always associated with a certain instruction.

Also for exception, the only information needs to be conveyed is the fact that an exceptional condition has occurred but for interrupt, there is more information to be conveyed.

Let me elaborate on each of these two points.

Unlike exception, which is always associated with an instruction, interrupt is not associated with any instruction. The user program is just doing its things when an I/O interrupt occurs.

So I/O interrupt does not prevent any instruction from completing so you can pick your own convenient point to take the interrupt.

As far as conveying more information is concerned, the interrupt detection hardware must somehow let the OS know who is causing the interrupt.

Furthermore, interrupt requests needs to be prioritized. The hardware that can do all these looks like this.

27 of 29

Direct Memory Access

Direct Memory Access (DMA):

External to the CPU
Use idle bus cycles (cycle stealing)
Act as a master on the bus
Transfer blocks of data to or from memory without CPU intervention
Efficient for large data transfer, e.g. from disk
Cache usage allows the processor to leave enough memory bandwidth for DMA

CPU

IOC

device

Memory

DMAC

CPU sends a starting address,

direction, and length count

to DMAC. Then issues "start".

DMAC provides handshake

signals for Peripheral

Controller, and Memory

Addresses and handshake

signals for Memory.

How does DMA work?:

CPU sets up and supply device id, memory address, number of bytes
DMA controller (DMAC) starts the access and becomes bus master
For multiple byte transfer, the DMAC increment the address
DMAC interrupts the CPU upon completion

For multiple bus system, each bus controller often contains DMA control logic

Finally, lets see how we can delegate some of the I/O responsibilities from the CPU.

The first option is Direct Memory Access which take advantage of the fact that I/O events often involve block transfer: you are not going to access the disk 1 byte at a time.

The DMA controller is external to the processor and can acts as a bus master to transfer blocks of data to or from memory and the I/O device without CPU intervention.

This is how it works. The CPU sends the starting address, the direction and length of the transfer to the DMA controller and issues a start command.

The DMA controller then take over from there and provides handshake signals required (point to the last text block) to complete the entire block transfer.

So the DMA controller are pretty intelligent. If you add more intelligent to the DMA controller, you will end up with a IO processor or IOP for short.

+2 = 72 min. (Y:52)

28 of 29

With virtual memory systems: (pages would have physical and virtual addresses)

Physical pages re-mapping to different virtual pages during DMA operations
Multi-page DMA cannot assume consecutive addresses

Solutions:

Allow virtual addressing based DMA

Add translation logic to DMA controller
OS allocated virtual pages to DMA prevent re-mapping until DMA completes

Partitioned DMA

Break DMA transfer into multi-DMA operations, each is single page
OS chains the pages for the requester

In cache-based systems: (there can be two copies of data items)

Processor might not know that the cache and memory pages are different
Write-back caches can overwrite I/O data or makes DMA to read wrong data

Solutions:

Route I/O activities through the cache

Not efficient since I/O data usually is not demonstrating temporal locality

OS selectively invalidates cache blocks before I/O read or force write-back prior

to I/O write

Usually called cache flushing and requires hardware support

DMA Problems

DMA allows another path to main memory with no cache and address translation

DMA (Direct Memory Access) is a technique for transferring data directly between a peripheral device and main memory, without going through the central processing unit (CPU), which improves data transfer efficiency. However, in modern, more complex computer systems, DMA has some problems to consider.

DMA Problems in Virtual Memory Systems

Mapping Physical Memory to Virtual Memory: In a virtual memory system, a physical page of memory may be mapped to different virtual memory at different times, which can confuse DMA and cause data errors.

Multi-page DMA: When performing multi-page DMA, the system cannot assume that contiguous memory in virtual memory is mapped to contiguous physical memory.

Solution:

Allow DMA to access virtual memory directly:

Add address translation logic: Add electronic circuitry in the DMA controller to translate addresses from virtual memory to physical memory.

Operating system allocates virtual memory to DMA: The operating system allocates virtual memory exclusively to DMA and prevents changes to the memory mapping during DMA.

Dividing DMA into subsets:

Dividing the data transfer into multiple DMA operations: Each operation transfers only one page of data.

Operating system concatenates pages: The operating system concatenates the divided pages to ensure complete results.

DMA problems in systems with caches

Differences between cache and main memory: The processor may not know that the data in cache and main memory are different.

Write-back caches may overwrite data being transferred: If data is written to cache during DMA, the transferred data may be incorrect.

Solution:

Send I/O data through cache: This is not very efficient because I/O data often has no locality.

Operating system flushes cache or writes cached data back to main memory before performing I/O: This ensures that the data is transferred correctly.

Summary

DMA problems arise from the complexity of modern computer systems, especially those with virtual memory and caches. Resolving these problems requires a combination of hardware and software. To ensure that DMA data transfer is accurate and efficient.

Additional Questions:

Do you want to know more details about DMA troubleshooting techniques?

Do you want to know the impact of DMA problems on the performance of your computer system?

Do you have other questions about DMA?

Keywords: DMA, virtual memory, cache, computer system, data transfer

Additional Description:

Virtual Memory: A technique that allows the operating system to manage more memory than the hardware has by dividing the memory into pages and storing unused pages on disk.

Cache: A small amount of memory close to the processor that stores frequently used data, allowing faster access.

Problems in Systems with Virtual Memory

Problems:

Physical pages are mapped to different virtual pages during DMA operations: In systems with virtual memory, physical pages and virtual pages may be mapped in different ways. During DMA operations, if the physical page being accessed by DMA is remapped to a different virtual page, it can cause problems with incorrect data access.

DMA multi-page transfers cannot assume contiguous addresses: Since physical and virtual pages may not be contiguous, DMA cannot guarantee continuous access to the desired data.

Solution:

Allow DMA to access memory using virtual addresses:

Add address conversion logic to the DMA controller: Allow DMA to convert virtual addresses to physical addresses on its own.

The operating system allocates virtual pages to the DMA: and prevents remapping of pages until the DMA operation is complete.

Dividing the DMA operation into multiple parts:

Dividing the transfer into multiple DMA operations, each transferring only a single page: Makes dealing with non-contiguous addressing easier.

The operating system concatenates pages to enable DMA to work:

Problems in systems with caches

Problems:

The processor may not know that the data in the cache and main memory are different: This causes confusion in accessing the data.

Write-back caches may overwrite I/O data or cause DMA to misread the data: This is because the data in the cache may not match the data in main memory.

Solution:

Send I/O data through the cache:

Inappropriate: Since I/O data is rarely accessed repeatedly, storing data in the cache is not worthwhile.

The operating system flushes the cache before reading or writing I/O data:

Synchronize the data in the cache and main memory: Before reading data from an I/O device, the operating system flushes the data in the relevant cache, and before writing data to an I/O device, the operating system forces the data in the cache to be written back to main memory.

This is called flushing the cache: and requires hardware support.

Summary:

The operation of DMA in a system with virtual memory and cache is complicated because it has to deal with problems caused by memory mapping and the presence of cache, which may affect the correctness of the transferred data. Therefore, a mechanism to handle these problems is required to ensure that DMA works correctly and efficiently.

29 of 29

I/O Processor

CPU

IOP

Mem

D1

D2

Dn

. . .

main memory

bus

I/O

bus

CPU

IOP

(1) Issues

instruction

to IOP

memory

(2)

(3)

Device to/from memory

transfers are controlled

by the IOP directly.

IOP steals memory cycles.

OP Device Address

target device

where cmnds are

IOP looks in memory for commands

OP Addr Cnt Other

what

to do

where

to put

data

how

much

special

requests

(4) IOP interrupts

CPU when done

An I/O processor (IOP) offload the CPU
Some processors, e.g.

Motorola 860, include special purpose

IOP for serial communication