1 of 90

Unit 4 – Memory Management

  • Main Memory – Background
  • Swapping
  • Contiguous Memory Allocation
  • Paging
  • Structure of Page Table
  • Segmentation
  • Virtual Memory – Background
  • Demand Paging
  • Copy-on-Write
  • Page Replacement
  • Allocation of Frames
  • Thrashing
  • Memory-Mapped Files
  • Allocating Kernel Memory


Main Memory :

Program must be brought (from disk) into memory and placed within a process for it to be run.

In the instruction cycle, the instruction is first fetched from memory, then decoded; any operands are fetched from memory, the instruction is executed, and the result is stored back to memory.

The CPU can directly access only main memory and registers. Data in a register can be accessed quickly, in one CPU clock cycle (or less).

If the data is in main memory, access takes many cycles. To overcome this, we can add a fast memory between the CPU and main memory, called a cache.

Protection of memory is required to ensure correct operation and to protect user processes from one another. This protection must be provided by the hardware.

It can be implemented in several ways; one way is by using a pair of base and limit registers. The base register holds the smallest legal physical memory address.

The limit register specifies the size of the range. For example, if the base register holds 300040 and the limit register holds 120900, then the program can legally access all addresses from 300040 through 420939.

A base and a limit register define a logical address space.

Protection of memory space is accomplished by having the CPU hardware compare every address generated in user mode with these registers. Any attempt by a program executing in user mode to access another user's memory results in a trap to the operating system. The base and limit registers can be loaded only by the operating system and cannot be modified by user programs.
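The base/limit check can be sketched in a few lines. The register values are the ones from the example above (base 300040, limit 120900); the function is a software stand-in for the comparison the hardware performs on every user-mode access.

```python
def is_legal(address, base, limit):
    """Return True if a user-mode address falls inside [base, base + limit).

    Mirrors the hardware check: any address outside this range would
    cause a trap to the operating system.
    """
    return base <= address < base + limit

# Register values from the example in the text.
BASE, LIMIT = 300040, 120900
assert is_legal(300040, BASE, LIMIT)       # first legal address
assert is_legal(420939, BASE, LIMIT)       # last legal address
assert not is_legal(420940, BASE, LIMIT)   # one past the end -> trap
assert not is_legal(299999, BASE, LIMIT)   # below the base -> trap
```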

HW address protection with base and limit registers

Binding of Instructions and Data to Memory

In general, the program resides on a disk as a binary executable file. To be executed, the program must be brought into memory and placed within a process. In most cases, a user program will go through several steps before being executed.

Multistep Processing of a User Program

Addresses may be represented in different ways during these steps. Addresses in the source program are generally symbolic (such as count). A compiler typically binds these symbolic addresses to relocatable addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader will in turn bind the relocatable addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another.

Address binding of instructions and data to memory addresses can happen at three different stages

Compile time: if it is known at compile time where the process will reside in memory, absolute code can be generated; the code must be recompiled if the starting location changes.

Load time: if it is not known at compile time where the process will reside in memory, the compiler must generate relocatable code. If the starting address changes, we need only reload the user code to incorporate the changed value.

Execution time: Binding delayed until run time if the process can be moved during its execution from one memory segment to another. Need hardware support for address maps (e.g., base and limit registers)

Logical vs. Physical Address Space

The concept of a logical address space that is bound to a separate physical address space is central to proper memory management

Logical address – generated by the CPU; also referred to as virtual address

Physical address – address seen by the memory unit

Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding scheme

Memory-Management Unit (MMU)

Hardware device that maps virtual to physical address

In MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory

The user program deals with logical addresses; it never sees the real physical addresses
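A minimal sketch of the MMU mapping described above, using hypothetical register values (relocation = 14000, limit = 10000); the limit check and the raised exception stand in for the hardware trap:

```python
def mmu_translate(logical, relocation, limit):
    """Map a logical address to a physical one the way a simple MMU does:
    check the logical address against the limit register, then add the
    relocation (base) register. Raising here stands in for the trap."""
    if logical >= limit:
        raise MemoryError("trap: logical address beyond limit")
    return logical + relocation

# Hypothetical register contents, for illustration only.
assert mmu_translate(346, relocation=14000, limit=10000) == 14346
assert mmu_translate(0, relocation=14000, limit=10000) == 14000
```

The user program works only with the logical addresses (0, 346, ...); it never sees the physical addresses (14000, 14346, ...) that the MMU produces.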

Dynamic relocation using a relocation register

Dynamic Loading

  • Routine is not loaded until it is called
  • Better memory-space utilization; unused routine is never loaded
  • Useful when large amounts of code are needed to handle infrequently occurring cases
  • No special support from the operating system is required; dynamic loading is implemented through program design

Dynamic Linking

  • Linking postponed until execution time
  • Small piece of code, stub, used to locate the appropriate memory-resident library routine. Stub replaces itself with the address of the routine, and executes the routine.
  • Operating-system support is needed to check whether the routine is in the process's memory address space
  • Dynamic linking is particularly useful for system libraries, also known as shared libraries

Swapping

A process needs to be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution. For example, assume a multiprogramming environment with a round-robin CPU-scheduling algorithm. When a quantum expires, the memory manager will start to swap out the process that just finished and to swap in another process into the memory space that has been freed. In the meantime, the CPU scheduler will allocate a time slice to some other process in memory. When each process finishes its quantum, it will be swapped with another process.

 

A variant of this swapping policy is used for priority-based scheduling algorithms. If a higher-priority process arrives and wants service, the memory manager can swap out the lower-priority process so that it can load and execute the higher-priority process. When the higher-priority process finishes, the lower-priority process can be swapped back in and continued. This variant of swapping is sometimes called roll out, roll in.

Normally a process that is swapped out will be swapped back into the same memory space that it occupied previously. This restriction is dictated by the method of address binding: if binding is done at assembly or load time, the process cannot be moved to a different location. If execution-time binding is being used, however, it is possible to swap a process into a different memory space, because the physical addresses are computed during execution time.

Schematic View of Swapping

Swapping requires a backing store. The backing store is commonly a fast disk. It must be large enough to accommodate copies of all memory images for all users, and must provide direct access to these memory images.

 To get an idea of the context-switch time, let us assume that the user process is of size 100K and the backing store is a standard hard disk with a transfer rate of 1 megabyte per second. The actual transfer of the 100K process to or from memory takes 100K / 1000K per second = 1/10 second = 100 milliseconds. Assuming that no head seeks are necessary and an average latency of 8 milliseconds, the swap time takes 108 milliseconds. Since we must both swap out and swap in, the total swap time is then about 216 milliseconds.
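The swap-time arithmetic above can be checked directly (100K process, 1 megabyte-per-second transfer rate, 8 ms average latency):

```python
def swap_time_ms(size_kb, transfer_kb_per_s=1000, latency_ms=8):
    """One-way swap time in milliseconds: transfer time plus average latency.

    Default rates match the example in the text (1 MB/s disk, 8 ms latency).
    """
    transfer_ms = size_kb / transfer_kb_per_s * 1000
    return transfer_ms + latency_ms

one_way = swap_time_ms(100)   # 100 ms transfer + 8 ms latency
assert one_way == 108
assert 2 * one_way == 216     # we must both swap out and swap in
```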

Contiguous Memory Allocation

Single-Partition Allocation

The main memory must accommodate both the operating system and the various user processes. The memory is usually divided into two partitions: one for the resident operating system, and one for the user processes. It is possible to place the operating system in either low memory or high memory.

If the operating system is residing in low memory and the user processes are executing in high memory, we need to protect the operating-system code and data from changes (accidental or malicious) by the user processes. We also need to protect the user processes from one another. We can provide this protection by using a relocation register together with a limit register. The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses. With relocation and limit registers, each logical address must be less than the limit register; the MMU maps the logical address dynamically by adding the value in the relocation register. This mapped address is sent to memory.

[Figure: single-partition layout — the operating system occupies low memory starting at address 0, with the user process above it, up to 512K.]

Hardware Address Protection

Multiple-Partition Allocation

In general, several user processes reside in memory at the same time, so we need to consider how to allocate available memory to the various processes that are in the input queue waiting to be brought into memory.

 

One of the simplest schemes for memory allocation is to divide memory into a number of fixed-sized partitions. Each partition may contain exactly one process. When a partition is free, a process is selected from the input queue and is loaded into the free partition. When the process terminates, the partition becomes available for another process. This scheme was originally used by the IBM OS/360 operating system (called MFT); it is no longer in use.

The scheme described next is a generalization of the fixed partition scheme (called MVT). The operating system keeps a table indicating which parts of memory are available and which are occupied. Initially, all memory is available for user processes, and is considered as one large block of available memory, a hole. When a process arrives and needs memory, we search for a hole large enough for this process. If we find one, we allocate only as much memory as is needed, keeping the rest available to satisfy future requests. For example, assume that we have 2560K of memory available and a resident operating system of 400K. This situation leaves 2160K for user processes, as shown in Figure 8.7

[Figure: 2560K of memory; the operating system occupies addresses 0 through 400K, leaving 2160K for user processes.]

Job queue:

Process   Memory   Time
P1         600K     10
P2        1000K      5
P3         300K     20
P4         700K      8
P5         500K     15

Figure 8.7 Scheduling Example

Given the input queue in the figure, and FCFS job scheduling, we can immediately allocate memory to processes P1, P2, and P3, creating the memory map of Figure 8.8(a). We have a hole of size 260K that cannot be used by any of the remaining processes in the input queue. Using round-robin CPU scheduling with a quantum of 1 time unit, process P2 will terminate after some time, releasing its memory. This situation is illustrated in Figure 8.8(b). We then return to our job queue and schedule the next process, process P4, to produce the memory map of Figure 8.8(c). Process P1 will terminate at time 28 to produce Figure 8.8(d); process P5 is then scheduled, producing Figure 8.8(e).

 


This example illustrates the general structure of the allocation scheme. At any given time, we have a list of available block sizes and the input queue. The operating system can order the input queue according to a scheduling algorithm. Memory is allocated to processes until, finally, the memory requirements of the next process cannot be satisfied; no available block of memory (hole) is large enough to hold that process. The operating system can then wait until a large enough block is available, or it can skip down the input queue to see whether the smaller memory requirements of some other process can be met.

Dynamic Storage-Allocation Problem

The dynamic storage-allocation problem concerns how to satisfy a request of size n from a list of free holes. There are many solutions to this problem. The set of holes is searched to determine which hole is best to allocate. First-fit, best-fit, and worst-fit are the most common strategies used to select a free hole from the set of available holes.

 

First-fit: Allocate the first hole that is big enough. Searching can start either at the beginning of the set of holes or where the previous first-fit search ended. We can stop searching as soon as we find a free hole that is large enough.

 

Best-fit: Allocate the smallest hole that is big enough. We must search the entire list, unless the list is kept ordered by size. This strategy produces the smallest leftover hole.

 

Worst-fit: Allocate the largest hole. Again, we must search the entire list, unless it is sorted by size. This strategy produces the largest leftover hole, which may be more useful than the smaller leftover hole from a best-fit approach.

 

Simulations have shown that both first-fit and best-fit are better than worst-fit in terms of decreasing both time and storage utilization; first-fit is generally faster.
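The three strategies can be sketched as follows; the hole list and the request size (212K) are illustrative values, with sizes treated as plain integers:

```python
def first_fit(holes, n):
    """Index of the first hole of size >= n, or None."""
    for i, size in enumerate(holes):
        if size >= n:
            return i
    return None

def best_fit(holes, n):
    """Index of the smallest hole of size >= n, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, n):
    """Index of the largest hole, if it can hold n, else None."""
    size, i = max((size, i) for i, size in enumerate(holes))
    return i if size >= n else None

# Illustrative free-hole list (sizes in K), request of 212K.
holes = [100, 500, 200, 300, 600]
assert first_fit(holes, 212) == 1   # 500K is the first big-enough hole
assert best_fit(holes, 212) == 3    # 300K leaves the smallest leftover hole
assert worst_fit(holes, 212) == 4   # 600K leaves the largest leftover hole
```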

External and Internal Fragmentation

 

The MVT algorithm suffers from external fragmentation. As processes are loaded and removed from memory, the free memory space is broken into little pieces. External fragmentation exists when enough total memory space exists to satisfy a request, but it is not contiguous; storage is fragmented into a large number of small holes. Looking back at Figure 8.8, we can see two such situations. In Figure 8.8(a), there is a total external fragmentation of 260K, a space that is too small to satisfy the requests of either of the two remaining processes, P4 and P5. In Figure 8.8(c), however, we have a total external fragmentation of 560K (= 300K + 260K). This space would be large enough to run process P5 (which needs 500K), except that this free memory is not contiguous. The free memory space is fragmented into two pieces, neither one of which is large enough, by itself, to satisfy the memory request of process P5.

 

Another problem that arises with the multiple partition allocation scheme is illustrated by Figure 8.9. Consider the hole of 18,464 bytes. Suppose that the next process requests 18,462 bytes. If we allocate exactly the requested block we are left with a hole of 2 bytes. The overhead to keep track of this hole will be substantially larger than the hole itself. The general approach is to allocate very small holes as part of the larger request. Thus, the allocated memory may be slightly larger than the requested memory. The difference between these two numbers is internal fragmentation—memory that is internal to a partition, but is not being used.


COMPACTION

 One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents to place all free memory together in one large block. For example, the memory map of Figure 8.8(e) can be compacted, as shown in Figure 8.10. The three holes of sizes 100K, 300K, and 260K can be compacted into one hole of size 660K.

 

Compaction is not always possible. If relocation is static and is done at assembly or load time, compaction cannot be done; compaction is possible only if relocation is dynamic, and is done at execution time. If addresses are relocated dynamically, relocation requires only moving the program and data, and then changing the base register to reflect the new base address. When compaction is possible, we must determine its cost. The simplest compaction algorithm is simply to move all processes toward one end of memory; all holes move in the other direction, producing one large hole of available memory.

Consider the memory allocation shown in Figure 8.11. If we use this simple algorithm, we must move processes P3 and P4, for a total of 600K moved. In this situation, we could simply move process P4 above process P3, moving only 400K, or move process P3 below process P4, moving only 200K. Note that, in this last instance, our one large hole of available memory is not at the end of memory, but rather is in the middle.


Paging

A solution to the external fragmentation problem is to permit the logical address space of a process to be noncontiguous, thus allowing a process to be allocated physical memory wherever such memory is available. One way of implementing this solution is through the use of a paging scheme. Paging avoids the considerable problem of fitting varying-sized memory chunks onto the backing store.

 Basic Method 

Physical memory is broken into fixed-sized blocks called frames. Logical memory is also broken into blocks of the same size called pages. When a process is to be executed, its pages are loaded into any available memory frames from the backing store. The backing store is divided into fixed-sized blocks that are of the same size as the memory frames.

Address Translation Scheme

Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2^m, and the page size is 2^n addressing units (bytes or words), then the high-order m - n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address is as follows:

    page number | page offset
         p      |      d
     m - n bits |   n bits


EXAMPLE: Using a page size of 4 bytes and a physical memory of 32 bytes (8 frames), we show how the user's view of memory can be mapped into physical memory. Logical address 0 is page 0, offset 0. Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20 (= (5 x 4) + 0). Logical address 3 (page 0, offset 3) maps to physical address 23 (= (5 x 4) + 3). Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 (= (6 x 4) + 0). Logical address 13 maps to physical address 9.

When we use a paging scheme, we have no external fragmentation: Any free frame can be allocated to a process that needs it. However we may have some internal fragmentation. Notice that frames are allocated as units. If the memory requirements of a process do not happen to fall on page boundaries, the last frame allocated may not be completely full. For example, if pages are 2048 bytes, a process of 72,766 bytes would need 35 pages plus 1086 bytes. It would be allocated 36 frames, resulting in an internal fragmentation of 2048 - 1086 = 962 bytes. In the worst case, a process would need n pages plus one byte. It would be allocated n + 1 frames, resulting in an internal fragmentation of almost an entire frame.
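Both the translation example and the internal-fragmentation arithmetic above can be reproduced in a short sketch. The frames for pages 0 and 1 (5 and 6) come from the example; the frames assumed for pages 2 and 3 (1 and 2) are not stated in the text and are chosen so that logical address 13 maps to physical address 9, as claimed:

```python
import math

PAGE_SIZE = 4                            # from the 32-byte-memory example
page_table = {0: 5, 1: 6, 2: 1, 3: 2}    # page -> frame (2 and 3 assumed)

def translate(logical):
    """Split a logical address into (page, offset) and map via the page table."""
    page, offset = divmod(logical, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

assert translate(0) == 20    # page 0 -> frame 5
assert translate(3) == 23
assert translate(4) == 24    # page 1 -> frame 6
assert translate(13) == 9    # page 3, offset 1 -> frame 2

# Internal fragmentation for a 72,766-byte process with 2048-byte pages:
frames = math.ceil(72766 / 2048)         # 36 frames allocated
assert frames == 36
assert frames * 2048 - 72766 == 962      # bytes wasted in the last frame
```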

Free Frames

When a process arrives in the system to be executed, its size, expressed in pages, is examined. Each page of the process needs one frame. Thus, if the process requires n pages, there must be at least n frames available in memory. If there are n frames available, they are allocated to this arriving process. The first page of the process is loaded into one of the allocated frames, and the frame number is put in the page table for this process. The next page is loaded into another frame, and its frame number is put into the page table, and so on.

An important aspect of paging is the clear separation between the user's view of memory and the actual physical memory. The user program views memory as one single contiguous space, containing only this one program. In fact, the user program is scattered throughout physical memory, which also holds other programs. The difference between the user's view of memory and the actual physical memory is reconciled by the address-translation hardware. The logical addresses are translated into physical addresses. This mapping is hidden from the user and is controlled by the operating system. Since the operating system manages physical memory, it must be aware of the allocation details of physical memory: which frames are allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure called a frame table.

[Figure: free-frame list before and after allocation — the new process's pages are loaded into free frames, and the frame numbers are entered in its page table.]

Memory Protection

Memory protection in a paged environment is accomplished by protection bits associated with each frame. Normally, these bits are kept in the page table. One bit can define a page to be read-write or read-only. Every reference to memory goes through the page table to find the correct frame number. For additional protection, one more bit is generally attached to each entry in the page table: a valid-invalid bit. When this bit is set to "valid," the associated page is in the process's logical address space, and is thus a legal (valid) page. If the bit is set to "invalid," the page is not in the process's logical address space. Illegal addresses are trapped by using the valid-invalid bit. The operating system sets this bit for each page to allow or disallow accesses to that page.

Valid (v) or Invalid (i) Bit in a Page Table

Shared Pages

Another advantage of paging is the possibility of sharing common code. This consideration is particularly important in a time-sharing environment. Consider a system that supports 40 users, each of whom executes a text editor. If the text editor consists of 150K of code and 50K of data space, we would need 8000K to support the 40 users. If the code is reentrant, however, it can be shared. Here we see a three-page editor (each page of size 50K; the large page size is used to simplify the figure) being shared among three processes. Each process has its own data page. Reentrant code (also called pure code) is non-self-modifying code. If the code is reentrant, then it never changes during execution. Thus, two or more processes can execute the same code at the same time. Each process has its own copy of registers and data storage to hold the data for the process's execution. The data for two different processes will, of course, vary for each process. Only one copy of the editor needs to be kept in physical memory. Each user's page table maps onto the same physical copy. Thus, to support 40 users, we need only one copy of the editor (150K), plus 40 copies of the 50K of data space per user. The total space required is now 2150K instead of 8000K, a significant savings. Other heavily used programs can also be shared: compilers, window systems, database systems, and so on. To be sharable, the code must be reentrant.
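The space arithmetic for the shared-editor example works out as follows:

```python
users, code_kb, data_kb = 40, 150, 50

# Without sharing, every user carries a private copy of code and data.
no_sharing = users * (code_kb + data_kb)
assert no_sharing == 8000

# With reentrant code, one shared copy of the code plus per-user data.
with_sharing = code_kb + users * data_kb
assert with_sharing == 2150
```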

Shared Pages Example

Structure of the Page Table

Hierarchical Page Tables

Most modern computer systems support a very large logical address space (2^32 to 2^64 bytes). In such an environment the page table itself becomes excessively large. For example, consider a system with a 32-bit logical address space. If the page size in such a system is 4K bytes (2^12), then a page table may consist of up to 1 million entries (2^32/2^12). Because each entry consists of 4 bytes, each process may need up to 4 megabytes of physical address space for the page table alone. Clearly, we would not want to allocate the page table contiguously in main memory. One simple solution is to divide the page table into smaller pieces. There are several ways to accomplish this. One way is to use a two-level paging scheme, in which the page table itself is also paged.

 

 


To illustrate this, let us return to our 32-bit machine example, with a page size of 4K bytes. A logical address is divided into a page number consisting of 20 bits, and a page offset consisting of 12 bits. Because we page the page table, the page number is further divided into a 10-bit page number, and a 10-bit page offset. Thus, a logical address is as follows:

where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.
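The bit-level split of a 32-bit logical address into (p1, p2, d) described above can be sketched as:

```python
def split_two_level(addr):
    """Split a 32-bit logical address for a 4K page size into
    a 10-bit outer index p1, a 10-bit inner index p2, and a
    12-bit page offset d."""
    d = addr & 0xFFF             # low 12 bits: page offset
    p2 = (addr >> 12) & 0x3FF    # next 10 bits: index into inner page table
    p1 = addr >> 22              # high 10 bits: index into outer page table
    return p1, p2, d

assert split_two_level(0) == (0, 0, 0)
# An address built from p1=3, p2=5, d=7 splits back into the same fields.
assert split_two_level((3 << 22) | (5 << 12) | 7) == (3, 5, 7)
```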

Address-Translation Scheme


The VAX architecture supports two-level paging. The VAX is a 32-bit machine with a page size of 512 bytes. The logical address space of a process is divided into four equal sections, each of which consists of 2^30 bytes. Each section represents a different part of the logical address space of a process. The first 2 high-order bits of the logical address designate the appropriate section. The next 21 bits represent the logical page number of that section, and the last 9 bits represent an offset in the desired page. An address on the VAX architecture is as follows:

where s designates the section number, p is an index into the page table, and d is the displacement within the page.

 

For a system with a 64-bit logical address space, a two-level paging scheme is no longer appropriate. To illustrate this point, let us suppose that the page size in such a system is 4K bytes (2^12). In this case, the page table will consist of up to 2^52 entries. If we use a two-level paging scheme, then the inner page tables could conveniently be 1 page long, or contain 2^10 four-byte entries. The addresses would look like:


The outer page table will consist of 2^42 entries, or 2^44 bytes. The obvious way to avoid such a large table is to divide the outer page table into smaller pieces. This approach is also used on some 32-bit processors for added flexibility and efficiency. It can be accomplished in various ways. We can page the outer page table, giving us a three-level paging scheme. Suppose that the outer page table is made up of standard-size pages (2^10 entries, or 2^12 bytes); a 64-bit address space is still daunting:

The outer page table is still 2^34 bytes in size. The next step would be a four-level paging scheme, where the second-level outer page table itself is also paged, and possibly four memory accesses are needed to reach one physical memory location.

Hashed Page Tables

A common approach for handling address spaces larger than 32 bits is to use a hashed page table, with the hash value being the virtual page number. Each entry in the hash table contains a linked list of elements that hash to the same location (to handle collisions).

Each element consists of three fields (1) the virtual page number , (2) the value of the mapped page frame, and (3) a pointer to the next element in the linked list.

The algorithm works as follows: the virtual page number in the virtual address is hashed into the hash table. The virtual page number is compared with field 1 of the first element in the linked list. If there is a match, the corresponding page frame (field 2) is used to form the desired physical address. If there is no match, subsequent entries in the linked list are searched for a matching virtual page number.

A variation of this scheme that is favorable for 64-bit address spaces has been proposed. This variation uses clustered page tables, which are similar to hashed page tables except that each entry in the hash table refers to several pages (such as 16) rather than a single page. Therefore, a single page-table entry can store the mappings for multiple physical page frames. Clustered page tables are particularly useful for sparse address spaces, where memory references are noncontiguous and scattered throughout the address space.
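The hashed-page-table algorithm can be modeled in a few lines. Python lists stand in for the linked lists of colliding entries, and the VPN/frame values are made up for illustration:

```python
class HashedPageTable:
    """Toy hashed page table: each bucket is a list standing in for the
    linked list of (virtual page number, frame) pairs that hash together."""

    def __init__(self, buckets=8):
        self.table = [[] for _ in range(buckets)]

    def insert(self, vpn, frame):
        # vpn % buckets plays the role of the hash function.
        self.table[vpn % len(self.table)].append((vpn, frame))

    def lookup(self, vpn):
        """Walk the chain for this hash value, matching on field 1 (the VPN)
        and returning field 2 (the frame), as in the algorithm above."""
        for entry_vpn, frame in self.table[vpn % len(self.table)]:
            if entry_vpn == vpn:
                return frame
        return None   # no match: the page is not mapped

hpt = HashedPageTable()
hpt.insert(3, 42)
hpt.insert(11, 7)    # 11 % 8 == 3: collides with VPN 3, so it is chained
assert hpt.lookup(3) == 42
assert hpt.lookup(11) == 7
assert hpt.lookup(19) is None
```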

Hashed Page Table

Inverted Page Table

 

Usually, each process has a page table associated with it. The page table has one entry for each page that the process is using, and the operating system translates each page reference into a physical memory address. One of the drawbacks of this scheme is that each page table may consist of millions of entries. These tables may consume large amounts of physical memory, which is required just to keep track of how the other physical memory is being used. To solve this problem, we can use an inverted page table.

 

An inverted page table has one entry for each real page (frame) of memory. Each entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page. Thus, there is only one page table in the system, and it has only one entry for each page of physical memory.

Inverted Page Table Architecture

Examples

 

To illustrate this scheme, we shall describe a simplified version of the implementation of the inverted page table used in the IBM RT. Each virtual address in the system consists of a triple <process-id, page-number, offset>. Each inverted page-table entry is a pair <process-id, page-number>. When a memory reference occurs, part of the virtual address, consisting of <process-id, page-number>, is presented to the memory subsystem. The inverted page table is then searched for a match. If a match is found—say, at entry i—then the physical address <i, offset> is generated. If no match is found, then an illegal address access has been attempted.

 

Although this scheme decreases the amount of memory needed to store each page table, it increases the amount of time needed to search the table when a page reference occurs. Because the table is sorted by physical address, but lookups occur on virtual addresses, the whole table might need to be searched for a match. This search would take far too long. To alleviate this problem, we use a hash table to limit the search to one, or at most a few, page-table entries. Of course, each access to the hash table adds a memory reference to the procedure, so one virtual-memory reference requires at least two real-memory reads: one for the hash-table entry and one for the page table. To improve performance, we use associative memory registers to hold recently located entries. These registers are searched first, before the hash table is consulted.
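A simplified sketch of the inverted page-table lookup described for the IBM RT, using a linear search over (process-id, page-number) pairs; the table contents are hypothetical:

```python
def inverted_lookup(ipt, pid, page):
    """Linear search of an inverted page table: entry i holds the
    (process-id, page-number) pair stored in frame i. A match at
    entry i yields frame i; no match means an illegal access."""
    for frame, entry in enumerate(ipt):
        if entry == (pid, page):
            return frame
    raise MemoryError("illegal address access")

# One entry per physical frame (hypothetical contents).
ipt = [(1, 0), (2, 3), (1, 1), (3, 0)]
assert inverted_lookup(ipt, 2, 3) == 1   # match at entry 1 -> frame 1
assert inverted_lookup(ipt, 1, 1) == 2
```

This is the part a hash table accelerates in practice: rather than scanning every entry, the <process-id, page-number> pair is hashed to jump directly to a short chain of candidates.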

Segmentation

An important aspect of memory management that became unavoidable with paging is the separation of the user's view of memory and the actual physical memory.

Basic Method

What is the user's view of memory? The user prefers to view memory as a collection of variable-sized segments, with no necessary ordering among segments.

User’s View of a Program

 

 

Logical View of Segmentation

Consider how you think of a program when you are writing it. You think of it as a main program with a set of subroutines, procedures, functions, or modules. There may also be various data structures: tables, arrays, stacks, variables, and so on. Each of these modules or data elements is referred to by name. You talk about "the symbol table," "function Sqrt" "the main program," without caring what addresses in memory these elements occupy. Each of these segments is of variable length; the length is intrinsically defined by the purpose of the segment in the program. Elements within a segment are identified by their offset from the beginning of the segment: the first statement of the program, the seventeenth entry in the symbol table, the fifth instruction of the Sqrt function, and so on.

 

Segmentation is a memory-management scheme that supports this user view of memory. A logical address space is a collection of segments. Each segment has a name and a length. The addresses specify both the segment name and the offset within the segment. The user therefore specifies each address by two quantities: a segment name and an offset. For simplicity of implementation, segments are numbered and are referred to by a segment number, rather than by a segment name. Thus, a logical address consists of a two tuple: <segment-number, offset>.

Hardware

Although the user can now refer to objects in the program by a two-dimensional address, the actual physical memory is still, of course, a one-dimensional sequence of bytes. Thus, we must define an implementation to map two dimensional user-defined addresses into one-dimensional physical addresses. This mapping is effected by a segment table. Each entry of the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, whereas the segment limit specifies the length of the segment.

Figure: logical view of segmentation. Four segments (1 through 4) that are contiguous in the user's logical view are placed at scattered locations in physical memory.

46 of 90

Segmentation Architecture

A logical address consists of two parts: a segment number, s, and an offset into that segment, d. The segment number is used as an index into the segment table. The offset d of the logical address must be between 0 and the segment limit. If it is not, we trap to the operating system (logical addressing attempt beyond end of segment). If the offset is legal, it is added to the segment base to produce the address of the desired byte in physical memory.

Segmentation Hardware

47 of 90

Example 

As an example, consider the situation shown in Figure. We have five segments numbered from 0 through 4. The segments are stored in physical memory as shown. The segment table has a separate entry for each segment, giving the beginning address of the segment in physical memory (the base) and the length of that segment (the limit). For example, segment 2 is 400 bytes long, and begins at location 4300. Thus, a reference to byte 53 of segment 2 is mapped onto location 4300 + 53 = 4353. A reference to segment 3, byte 852, is mapped to 3200 (the base of segment 3) + 852 = 4052. A reference to byte 1222 of segment 0 would result in a trap to the operating system, as this segment is only 1000 bytes long.
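The translation in this example can be sketched as a small lookup routine. The entries for segments 0, 2, and 3 use the base and limit values quoted in the text; the remaining values (and segment 3's limit) are assumed here purely for illustration.

```python
# Segment-table lookup: logical address <s, d> maps to base + d,
# after checking 0 <= d < limit. Values for segments 1 and 4, and the
# limit of segment 3, are assumed; the rest come from the example.
SEGMENT_TABLE = {
    0: (1400, 1000),   # (base, limit)
    1: (6300, 400),
    2: (4300, 400),
    3: (3200, 1100),
    4: (4700, 1000),
}

def translate(s, d):
    base, limit = SEGMENT_TABLE[s]
    if not 0 <= d < limit:
        # Offset beyond the segment limit: trap to the operating system.
        raise MemoryError(f"trap: offset {d} out of range for segment {s}")
    return base + d
```

With this table, byte 53 of segment 2 maps to 4300 + 53 = 4353, byte 852 of segment 3 maps to 3200 + 852 = 4052, and byte 1222 of segment 0 traps, exactly as the example describes.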

48 of 90

Example of Segmentation

49 of 90

Protection and Sharing

A particular advantage of segmentation is the association of protection with the segments. Some segments contain instructions, whereas other segments contain data. In a modern architecture, instructions are non-self-modifying, so instruction segments can be defined as read-only or execute-only. The memory-mapping hardware will check the protection bits associated with each segment-table entry to prevent illegal accesses to memory, such as attempts to write into a read-only segment or to use an execute-only segment as data.

Another advantage of segmentation involves the sharing of code or data. Each process has a segment table associated with it, which the dispatcher uses to define the hardware segment table when this process is given the CPU. Segments are shared when entries in the segment tables of two different processes point to the same physical location. For example, consider the use of a text editor in a time-sharing system. A complete editor might be quite large, composed of many segments. These segments can be shared among all users, limiting the physical memory needed to support editing tasks. Rather than n copies of the editor, we need only one copy. Common subroutine packages can be shared among many users if they are defined as sharable, read-only segments. Two FORTRAN programs, for instance, may use the same Sqrt subroutine, but only one physical copy of the Sqrt routine would be needed.

50 of 90

51 of 90

Fragmentation

The long-term scheduler must find and allocate memory for all the segments of a user program. This situation is similar to paging except that the segments are of variable length; pages are all the same size. Thus, as with the variable-sized partition scheme, memory allocation is a dynamic storage-allocation problem, usually solved with a best-fit or first-fit algorithm. Segmentation may then cause external fragmentation, when all blocks of free memory are too small to accommodate a segment. In this case, the process may simply have to wait until more memory (or at least a larger hole) becomes available, or compaction may be used to create a larger hole.

if the average segment size is small, external fragmentation will also be small.

52 of 90

Virtual memory

 

Virtual memory is a technique that allows the execution of processes that may not be completely in memory. The main visible advantage of this scheme is that programs can be larger than physical memory.

Overlays and dynamic loading can do this, but an examination of real programs shows that in many cases the entire program is not needed. Programs often have code to handle unusual error conditions; since these errors seldom, if ever, occur in practice, this code is almost never executed. Arrays, lists, and tables are often allocated more memory than they actually need: an array may be declared 100 by 100 elements even though it is seldom larger than 10 by 10, and an assembler symbol table may have room for 3000 symbols although the average program has fewer than 200.

Virtual memory is commonly implemented by demand paging. It can also be implemented in a segmentation system. Several systems provide a paged segmentation scheme, where segments are broken into pages.

53 of 90

54 of 90

Demand Paging:

 

A demand-paging system is similar to a paging system with swapping. Processes reside on secondary memory (which is usually a disk). When we want to execute a process, we swap it into memory. Rather than swapping the entire process into memory, however, we use a lazy swapper. A lazy swapper never swaps a page into memory unless that page will be needed.

Transfer of a paged memory to contiguous disk spaces

55 of 90

Since we are now viewing a process as a sequence of pages, rather than one large contiguous address space, the use of the term swap is technically incorrect. A swapper manipulates entire processes, whereas a pager is concerned with the individual pages of a process. We shall thus use the term pager rather than swapper.

When a process is to be swapped in, the pager guesses which pages will be used before the process is swapped out again. Instead of swapping in the whole process, the pager brings only those necessary pages into memory. Thus, it decreases the swap time and the amount of physical memory needed.

With this scheme we need some form of hardware support to distinguish between those pages that are in memory and those pages that are on the disk. The valid-invalid bit scheme can be used for this purpose. This time, however, when this bit is set to "valid," the value indicates that the associated page is both legal and in memory. If the bit is set to "invalid," the value indicates that the page either is not valid (that is, not in the logical address space of the process) or is valid but is currently on the disk. While the process executes and accesses pages that are memory resident, execution proceeds normally. But what happens if the process tries to use a page that was not brought into memory? Access to a page marked invalid causes a page-fault trap. The paging hardware, in translating the address through the page table, will notice that the invalid bit is set, causing a trap to the operating system. This trap is the result of the operating system's failure to bring the desired page into memory. The procedure for handling this page fault is simple:

56 of 90

Figure 9.3 Page table when some pages are not in main memory.

57 of 90

Figure 9.4 Steps in handling a page fault

58 of 90

1. We check an internal table (usually kept with the process control block) for this process to determine whether the reference was a valid or invalid memory access.

2. If the reference is invalid, we terminate the process. If it is valid but the page has not yet been brought into memory, we page it in.

3. We find a free frame (by taking one from the free-frame list, for example). If no free frame is found, we use a page-replacement algorithm to select a victim frame.

4. We schedule a disk operation to read the desired page into the newly allocated frame.

5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is in memory.

6. We restart the instruction that was interrupted by the illegal address trap.

The process can now access the page as though it had always been in memory.

 

It is important to realize that because we save the state (registers, condition code, instruction counter) of the interrupted process when the page fault occurs, we can restart the process in exactly the same place.

59 of 90

In the extreme case, we could start executing a process with no pages in memory. When the operating system sets the instruction pointer to the first instruction of the process, which is on a non-memory-resident page, the process would immediately fault for the page. After this page was brought into memory, the process would continue to execute, faulting as necessary until every page that it needed was actually in memory. At that point, it could execute with no more faults. This scheme is pure demand paging: Never bring a page into memory until it is required.
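Pure demand paging can be sketched as a short simulation. This is a deliberately simplified model: the valid-invalid bit is represented by presence in the page table, and it assumes an unbounded supply of free frames (no replacement yet).

```python
# Pure demand paging: every page starts on disk ("invalid"); the first
# reference to each page traps, the pager loads it, and the instruction
# is restarted. Frame supply is assumed unbounded here, so the fault
# count is simply the number of distinct pages touched.
def run_demand_paging(reference_string):
    page_table = {}          # page -> frame, for resident pages only
    next_frame = 0
    faults = 0
    for page in reference_string:
        if page not in page_table:        # invalid bit set: page-fault trap
            faults += 1
            page_table[page] = next_frame # read the page into a free frame
            next_frame += 1
        # the interrupted instruction restarts; the access now succeeds
    return faults
```

Starting a process with no pages resident, as the paragraph above describes, simply means the first reference to every distinct page faults once.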

Need for page replacement

 

When a page fault occurs and there are no free frames on the free-frame list, the operating system has several options. We could swap out a process, freeing all its frames and reducing the level of multiprogramming, or we can use page replacement.

Page replacement takes the following approach. If no frame is free, we find one that is not currently being used and free it. We can free a frame by writing its contents to swap space and changing the page table (and all other tables) to indicate that the page is no longer in memory. The freed frame can now be used to hold the page for which the process faulted.

60 of 90

The page-fault service routine is now modified to include page replacement:

1. Find the location of the desired page on the disk.

2. Find a free frame:

a. If there is a free frame, use it.

b. Otherwise, use a page-replacement algorithm to select a victim frame.

c. Write the victim page to the disk; change the page and frame tables.

3. Read the desired page into the (newly) free frame; change the page and frame tables.

4. Restart the user process.

Notice that, if no frames are free, two page transfers (one out and one in) are required. This situation effectively doubles the page-fault service time and will increase the effective access time accordingly.

This overhead can be reduced by the use of a modify (dirty) bit. Each page or frame may have a modify bit associated with it in the hardware. The modify bit for a page is set by the hardware whenever any word or byte in the page is written, indicating that the page has been modified.

61 of 90

Figure 9.6 Page replacement

62 of 90

When we select a page for replacement, we examine its modify bit. If the bit is set, we know that the page has been modified since it was read in from the disk. In this case, we must write that page to the disk. If the modify bit is not set, the page has not been modified since it was read into memory. Therefore, we can avoid writing the memory page to the disk; it is already there. This technique also applies to read-only pages (for example, pages of binary code). Such pages cannot be modified; they may be discarded when desired. This scheme can significantly reduce the time to service a page fault, since it reduces I/O time by one-half if the page is not modified.
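The write-back decision can be sketched with a small helper (hypothetical, for illustration only): the victim's modify bit decides whether a disk write is needed before the frame is reused.

```python
# Victim write-back with a modify (dirty) bit: a victim page is written
# out only if it was modified while resident; a clean page is simply
# discarded, since an identical copy already exists on disk.
def replace_victim(frames, dirty, victim, new_page):
    writes = 0
    if dirty.pop(victim):        # modify bit set: must write victim to disk
        writes = 1
    frames.remove(victim)
    frames.append(new_page)      # read the desired page into the freed frame
    dirty[new_page] = False      # a freshly loaded page starts clean
    return writes
```

Replacing a dirty victim costs two transfers (one out, one in); replacing a clean one costs only the read, which is the halving of I/O time the text describes.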

 

We must solve two major problems to implement demand paging: we must develop a frame-allocation algorithm and a page-replacement algorithm. If we have multiple processes in memory, we must decide how many frames to allocate to each process. Further, when page replacement is required, we must select the frames that are to be replaced. Designing appropriate algorithms to solve these problems is an important task, because disk I/O is so expensive. Even slight improvements in demand-paging methods yield large gains in system performance.

63 of 90

Page-Replacement Algorithms

There are many different page-replacement algorithms; we want the one with the lowest page-fault rate.

We evaluate an algorithm by running it on a particular string of memory references and computing the number of page faults. The string of memory references is called a reference string.

First, for a given page size (and the page size is generally fixed by the hardware or system), we need to consider only the page number, not the entire address. Second, if we have a reference to a page p, then any immediately following references to page p will never cause a page fault. Page p will be in memory after the first reference; the immediately following references will not fault.

For example, if we trace a particular process, we might record the particular address sequences:

0100,0432,0101,0612,0102,0103,0104,0101,0611,0102,0130,

0104,0101,0610,0102,0103,0104,0101,0609,0102,0105

At 100 bytes per page, this sequence is reduced to the following reference string:

1,4,1,6,1,6,1,6,1,6,1
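The reduction above can be sketched directly: divide each address by the page size, then drop immediately repeated pages, since a repeat can never fault.

```python
# Reduce a raw address trace to a page-reference string: map each
# address to its page number, then collapse consecutive duplicates.
def to_reference_string(addresses, page_size=100):
    pages = [addr // page_size for addr in addresses]
    reduced = [pages[0]]
    for p in pages[1:]:
        if p != reduced[-1]:
            reduced.append(p)
    return reduced
```

Applied to the address trace above with 100-byte pages, this yields exactly the reference string 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1.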

64 of 90

Figure 9.7 Graph of page faults versus the number of frames.

To determine the number of page faults for a particular reference string and page-replacement algorithm, we also need to know the number of page frames available. As the number of frames available increases, the number of page faults decreases. In general, we expect a curve such as that in Figure 9.7: as the number of frames increases, the number of page faults drops to some minimal level. Of course, adding physical memory increases the number of frames.

To illustrate the page-replacement algorithms, we use the following reference string for a memory with three frames:

7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1

 

65 of 90

FIFO Algorithm

 

The simplest page-replacement algorithm is a FIFO algorithm. A FIFO replacement algorithm associates with each page the time when that page was brought into memory. When a page must be replaced, the oldest page is chosen. Notice that it is not strictly necessary to record the time when a page is brought in. We can create a FIFO queue to hold all pages in memory. We replace the page at the head of the queue. When a page is brought into memory, we insert it at the tail of the queue.

For our example reference string, our three frames are initially empty. The first three references (7, 0, 1) cause page faults and are brought into these empty frames.

66 of 90

The next reference (2) replaces page 7, because page 7 was brought in first. Since 0 is the next reference and 0 is already in memory, we have no fault for this reference. The first reference to 3 results in page 0 being replaced, since it was the first of the three pages in memory (0, 1, and 2) to be brought in. This replacement means that the next reference, to 0, will fault. Page 1 is then replaced by page 0. This process continues as shown in Figure. There are 15 faults altogether.

 

The FIFO page-replacement algorithm is easy to understand and program. However, its performance is not always good: a bad replacement choice increases the page-fault rate and slows process execution, though it does not cause incorrect execution.

 

To illustrate the problems that are possible with a FIFO page-replacement algorithm, we consider the reference string: 1,2,3,4,1,2,5,1,2,3,4,5

67 of 90

  • Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 3 frames (3 pages can be in memory at a time per process)

  • 4 frames

  • Belady’s Anomaly: more frames ⇒ more page faults

Figure: FIFO frame contents for the reference string above. With 3 frames the string incurs 9 page faults; with 4 frames it incurs 10 page faults.
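A FIFO simulator makes Belady's anomaly easy to check; this sketch counts faults by evicting the page resident longest.

```python
# FIFO replacement: a queue holds resident pages in arrival order, and
# the page at the head (resident longest) is evicted on a fault.
from collections import deque

def fifo_faults(refs, num_frames):
    queue = deque()      # arrival order of resident pages
    resident = set()     # fast membership test
    faults = 0
    for page in refs:
        if page in resident:
            continue                       # page already in memory: no fault
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())   # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults
```

Running it on the anomaly string 1,2,3,4,1,2,5,1,2,3,4,5 gives 9 faults with 3 frames but 10 faults with 4 frames, and 15 faults on the main example string with 3 frames.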

68 of 90

Optimal Replacement Algorithm

An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms and never suffers from Belady's anomaly. Such an algorithm exists and has been called OPT or MIN. It simply replaces the page that will not be used for the longest period of time.

 

Use of this page-replacement algorithm guarantees the lowest possible page- fault rate for a fixed number of frames.

For example, on our sample reference string, the optimal page-replacement algorithm would yield nine page faults, as shown in Figure 9.10. The first three references cause faults that fill the three empty frames. The reference to page 2 replaces page 7, because 7 will not be used until reference 18, whereas page 0 will be used at 5, and page 1 at 14. The reference to page 3 replaces page 1, as page 1 will be the last of the three pages in memory to be referenced again. With only nine page faults, optimal replacement is much better than a FIFO algorithm, which resulted in fifteen faults. In fact, no replacement algorithm can process this reference string in three frames with fewer than nine faults.

Unfortunately, the optimal page-replacement algorithm is difficult to implement, because it requires future knowledge of the reference string.
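In a simulation, of course, the full reference string is available, so OPT can be sketched directly: on a fault, evict the resident page whose next use lies farthest in the future (a page never used again counts as infinitely far).

```python
# Optimal (OPT/MIN) replacement: requires the whole reference string in
# advance, so it serves only as a yardstick for other algorithms.
def opt_faults(refs, num_frames):
    resident = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == num_frames:
            def next_use(p):
                # Distance to p's next reference; never used again = inf.
                future = refs[i + 1:]
                return future.index(p) if p in future else float("inf")
            resident.discard(max(resident, key=next_use))
        resident.add(page)
    return faults
```

On the sample string with three frames it produces the nine faults claimed above, the minimum any algorithm can achieve.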

69 of 90

The figure shows the curve of page faults versus the number of available frames. We notice that the number of faults for four frames (10) is greater than the number of faults for three frames (9). This result is most unexpected and is known as Belady's anomaly. The anomaly reflects the fact that, for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases. We would expect that giving more memory to a process would improve its performance. In some early research, investigators noticed that this assumption was not always true; Belady's anomaly was discovered as a result.

70 of 90

Optimal page-replacement algorithm

LRU page-replacement algorithm

71 of 90

LRU page replacement

 LRU will replace the page that has not been used for the longest period of time.

LRU replacement associates with each page the time of that page's last use. When a page must be replaced, LRU chooses that page that has not been used for the longest period of time.

The result of applying LRU replacement to our example reference string produces 12 faults. Notice that the first five faults are the same as with optimal replacement. When the reference to page 4 occurs, however, LRU replacement sees that, of the three frames in memory, page 2 was used least recently. The most recently used page is page 0, and just before that page 3 was used. Thus, the LRU algorithm replaces page 2, not knowing that page 2 is about to be used. When it then faults for page 2, the LRU algorithm replaces page 3 since, of the three pages in memory, page 3 is the least recently used. Despite these problems, LRU replacement with 12 faults is still much better than FIFO replacement with 15.

The LRU policy is often used as a page-replacement algorithm and is considered to be quite good. The major problem is how to implement LRU replacement. An LRU algorithm may require substantial hardware assistance. The problem is to determine an order for the frames defined by the time of last use.
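One simple software sketch of LRU keeps resident pages in recency order; real hardware-assisted implementations (counters, stacks) are discussed in the literature, and this is only a simulation.

```python
# LRU replacement: evict the page whose last use is oldest. An
# OrderedDict keeps resident pages ordered from least to most
# recently used, so eviction pops from the front.
from collections import OrderedDict

def lru_faults(refs, num_frames):
    resident = OrderedDict()   # page -> None, least recently used first
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)      # refresh this page's recency
            continue
        faults += 1
        if len(resident) == num_frames:
            resident.popitem(last=False)    # evict least recently used
        resident[page] = None
    return faults
```

On the sample string with three frames it reproduces the 12 faults described above, between OPT's 9 and FIFO's 15.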

72 of 90

Copy on Write

Copy-on-write, or simply COW, is a resource-management technique. One of its main uses is in the implementation of the fork() system call, where it lets the parent and child processes share pages of virtual memory.

 

In UNIX-like operating systems, the fork() system call creates a duplicate of the parent process, called the child process.

 

The idea behind copy-on-write is that when a parent process creates a child process, both processes initially share the same pages in memory, and these shared pages are marked as copy-on-write. If either process tries to modify a shared page, a copy of that page is created, the modification is made on the copy, and the other process is unaffected.

 

Suppose a process P creates a new process Q, and then P modifies page 3. The figures below show what happens before and after P modifies page 3.

73 of 90

74 of 90

Assume that the child process attempts to modify a page containing portions of the stack, with the pages set to be copy-on-write. The operating system will then create a copy of this page, mapping it to the address space of the child process.

The child process will then modify its copied page and not the page belonging to the parent process. Obviously, when the copy-on write technique is used, only the pages that are modified by either process are copied; all unmodified pages can be shared by the parent and child processes.
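The sharing-then-copying behavior can be sketched as a toy model (this is not an OS implementation; the page contents and helper names are purely illustrative):

```python
# Toy copy-on-write model: after fork(), parent and child address
# spaces reference the same page objects; a write copies only the
# written page, leaving all other pages shared.
def fork_cow(parent_pages):
    # Both address spaces reference the same page objects after fork().
    return list(parent_pages)

def cow_write(address_space, page_index, new_data, other_space):
    if address_space[page_index] is other_space[page_index]:
        # Shared page: make a private copy before modifying it.
        address_space[page_index] = dict(address_space[page_index])
    address_space[page_index]["data"] = new_data
```

After the parent writes to page 3, the child still sees the original contents, while unmodified pages remain physically shared, which is exactly the behavior the text describes.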

 

Note, too, that only pages that can be modified need be marked as copy-on write. Pages that cannot be modified can be shared by the parent and child. Copy-on-write is a common technique used by several operating systems, including Windows XP, Linux, and Solaris.

 

75 of 90

Operating systems typically allocate these pages using a technique known as zero-fill-on-demand. Zero-fill-on-demand pages have been zeroed-out before being allocated, thus erasing the previous contents. Several versions of UNIX (including Solaris and Linux) also provide a variation of the fork() system call—vfork() (for virtual memory fork). vfork() operates differently from fork() with copy-on-write.

 

With vfork(), the parent process is suspended, and the child process uses the address space of the parent. Because vfork() does not use copy-on-write, if the child process changes any pages of the parent's address space, the altered pages will be visible to the parent once it resumes. Therefore, vfork() must be used with caution to ensure that the child process does not modify the address space of the parent. vfork() is intended to be used when the child process calls exec() immediately after creation, because then no copying of pages takes place.

76 of 90

Allocation of Frames

 How do we allocate the fixed amount of free memory among the various processes? If we have 93 free frames and two processes, how many frames does each process get?

The simplest case is the single-user system. Consider a single-user system with 128K of memory composed of pages 1K in size; thus, there are 128 frames. The operating system may take 35K, leaving 93 frames for the user process. Under pure demand paging, all 93 frames would initially be put on the free-frame list. When a user process started execution, it would generate a sequence of page faults. The first 93 page faults would all get free frames from the free-frame list. When the free-frame list was exhausted, a replacement algorithm would be used to select one of the 93 in-memory pages to be replaced with the ninety-fourth, and so on. When the process terminated, the 93 frames would once again be placed on the free-frame list.

77 of 90

Minimum Number of Frames

There are various constraints on our strategies for the allocation of frames. We cannot allocate more than the total number of available frames (unless there is page sharing). As the number of frames allocated to each process decreases, the page-fault rate increases, slowing process execution. Besides these undesirable performance properties of allocating only a few frames, there is a minimum number of frames that must be allocated. This minimum number is defined by the instruction-set architecture. Remember that when a page fault occurs before an executing instruction is complete, the instruction must be restarted. Consequently, we must have enough frames to hold all the different pages that any single instruction can reference.

For example, consider a machine in which all memory-reference instructions have only one memory address. Here we need at least one frame for the instruction and one frame for the memory reference. In addition, if one level of indirect addressing is allowed (for example, an instruction on page 16 can refer to an address on page 0, which is an indirect reference to page 23), then paging requires at least three frames per process. Think about what would happen if a process had only two frames. The minimum number of frames is defined by the computer architecture.

78 of 90

The worst-case scenario occurs in architectures that allow multiple levels of indirection (for example, each 16-bit word could contain a 15-bit address plus a 1-bit indirect indicator). Here a simple load instruction could reference an indirect address that could reference an indirect address (on another page) that could also reference an indirect address (on yet another page), and so on, until every page in virtual memory had been touched. Thus, in the worst case, the entire virtual memory must be in physical memory. To overcome this difficulty, we must place a limit on the levels of indirection (for example, limit an instruction to at most 16 levels of indirection). When the first indirection occurs, a counter is set to 16; the counter is then decremented for each successive indirection for this instruction. If the counter is decremented to 0, a trap occurs (excessive indirection). This limitation reduces the maximum number of memory references per instruction to 17, requiring the same number of frames.

The minimum number of frames per process is defined by the architecture, whereas the maximum number is defined by the amount of physical memory.

79 of 90

Allocation Algorithms

The easiest way to split m frames among n processes is to give everyone an equal share, m/n frames. For instance, if there are 93 frames and five processes, each process will get 18 frames; the leftover three frames can be used as a free-frame buffer pool. This scheme is called equal allocation.

An alternative is to recognize that various processes will need differing amounts of memory. If a small student process of 10K and an interactive database of 127K are the only two processes running in a system with 62 free frames, it does not make much sense to give each process 31 frames. The student process does not need more than 10 frames, so the other 21 are strictly wasted.

 

To solve this problem we can use proportional allocation, in which we allocate available memory to each process according to its size. Let the size of the virtual memory for process pi be si, and define

 

S = ∑ si.

Then, if the total number of available frames is m, we allocate ai frames to process pi, where ai is approximately

  ai = (si / S) × m

Of course, we must adjust each ai to be an integer greater than the minimum number of frames required by the instruction set, with the sum not exceeding m.

80 of 90

For proportional allocation, we would split 62 frames between two processes, one of 10 pages and one of 127 pages, by allocating 4 frames and 57 frames, respectively, since

  

10 / 137 x 62 ≈ 4 and 127/ 137 x 62 ≈ 57

 

In this way, both processes share the available frames according to their “ needs," rather than equally.
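The calculation above reduces to one line per process; this sketch truncates each share to an integer (any leftover frames could feed a free-frame pool, and a real allocator would also enforce the architectural minimum):

```python
# Proportional allocation: process i receives a_i = (s_i / S) * m frames,
# truncated to an integer. sizes are the virtual-memory sizes s_i.
def proportional_allocation(sizes, total_frames):
    S = sum(sizes)
    return [size * total_frames // S for size in sizes]
```

For the example above, sizes of 10 and 127 pages with 62 free frames give allocations of 4 and 57 frames, leaving one frame over.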

 

If the multiprogramming level is increased, each process will lose some frames to provide the memory needed for the new process. On the other hand, if the multiprogramming level decreases, the frames that had been allocated to the departed process can be spread over the remaining processes.

 

Notice that, with either equal or proportional allocation, a high-priority process is treated the same as a low-priority process. By its definition, however, we may want to give the high-priority process more memory to speed its execution. One solution is to use a proportional allocation scheme in which the ratio of frames depends not on the relative sizes of processes, but rather on their priorities, or on a combination of size and priority.

81 of 90

Global Versus Local Allocation

Another important factor in the way frames are allocated to the various processes is page replacement. We can classify page-replacement algorithms into two broad categories: global replacement and local replacement. Global replacement allows a process to select a replacement frame from the set of all frames, even if that frame is currently allocated to some other process; one process can take a frame from another. Local replacement requires that each process select from only its own set of allocated frames.

 

For example, consider an allocation scheme where we allow high-priority processes to select frames from low-priority processes for replacement. A high-priority process can select a replacement from among its own frames or the frames of any lower-priority process.

 

With a local replacement strategy, the number of frames allocated to a process does not change. With global replacement, a process may happen to select frames allocated to other processes, thus increasing the number of frames allocated to it (assuming that other processes do not choose its frames for replacement).

82 of 90

One problem with a global replacement algorithm is that a process cannot control its own page-fault rate. The set of pages in memory for a process depends not only on the paging behavior of that process, but also on the paging behavior of other processes. Therefore, the same process may perform quite differently (taking 0.5 seconds for one execution and 10.3 seconds for the next) due to totally external circumstances. Such is not the case with a local replacement algorithm. Under local replacement, the set of pages in memory for a process is affected by the paging behavior of only that process. However, global replacement generally results in greater system throughput and is therefore the more common method.

83 of 90

Thrashing

If the number of frames allocated to a low-priority process falls below the minimum number required by the computer architecture, we must suspend that process's execution. We should then page out its remaining pages, freeing all its allocated frames. This provision introduces a swap-in, swap-out level of intermediate CPU scheduling.

 

If the process does not have the minimum number of frames it needs, it will very quickly page-fault. At this point, it must replace some page. However, since all its pages are in active use, it must replace a page that will be needed again right away. Consequently, it very quickly faults again, and again, and again, continually replacing pages that it must then bring back in right away.

 

This high paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.

84 of 90

Cause of Thrashing

 

The operating system monitors CPU utilization. If CPU utilization is too low, we increase the degree of multiprogramming by introducing a new process to the system. Suppose a global page-replacement algorithm is used, replacing pages with no regard to the process to which they belong. Now suppose a process enters a new phase in its execution and needs more frames. It starts faulting and taking frames away from other processes. These processes need those pages, so they also start faulting. These faulting processes must use the paging device to swap pages in and out. As they queue up for the paging device, the ready queue empties. As processes wait for the paging device, CPU utilization decreases.

The CPU scheduler sees the decreasing CPU utilization and increases the degree of multiprogramming as a result. The new process tries to get started by taking frames from running processes, causing more page faults and a longer queue for the paging device. As a result, CPU utilization drops even further, and the CPU scheduler tries to increase the degree of multiprogramming even more. Thrashing has occurred, and system throughput plunges. The page-fault rate increases tremendously. As a result, the effective memory access time increases. No work is getting done, because the processes are spending all their time paging.

85 of 90

This phenomenon is illustrated in Figure. CPU utilization is plotted against the degree of multiprogramming. As the degree of multiprogramming increases, CPU utilization also increases, although more slowly, until a maximum is reached. If the degree of multiprogramming is increased even further, thrashing sets in and CPU utilization drops sharply. At this point, to increase CPU utilization and stop thrashing, we must decrease the degree of multiprogramming.

The effects of thrashing can be limited by using a local (or priority) replacement algorithm. With local replacement, if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash also. However, processes that are thrashing will be in the queue for the paging device most of the time. The average service time for a page fault will increase, due to the longer average queue for the paging device. Thus, the effective access time will increase even for a process that is not thrashing.

To prevent thrashing, we must provide a process with as many frames as it needs. The working-set strategy starts by looking at how many frames a process is actually using. This approach defines the locality model of process execution.

The locality model states that, as a process executes, it moves from locality to locality. A locality is a set of pages that are actively used together. A program is generally composed of several different localities, which may overlap.

86 of 90

For example, when a subroutine is called, it defines a new locality. In this locality, memory references are made to the instructions of the subroutine, its local variables, and a subset of the global variables. When the subroutine is exited, the process leaves this locality, since the local variables and instructions of the subroutine are no longer in active use.

 

Suppose that we allocate enough frames to a process to accommodate its current locality. It will not fault again until it changes localities. If we allocate fewer frames than the size of the current locality, the process will thrash, since it cannot keep in memory all the pages that it is actively using.
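This effect can be illustrated with a small simulation (a sketch of my own; the `lru_faults` helper and the reference string are not from the text): a process cycling through a locality of four pages behaves very differently with four frames than with three.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults for a reference string under LRU replacement."""
    frames = OrderedDict()          # insertion/refresh order tracks recency
    faults = 0
    for p in refs:
        if p in frames:
            frames.move_to_end(p)   # hit: refresh recency
        else:
            faults += 1             # miss: page fault
            if len(frames) == nframes:
                frames.popitem(last=False)  # evict least recently used page
            frames[p] = True
    return faults

# A locality of 4 pages referenced cyclically 10 times (40 references).
refs = [0, 1, 2, 3] * 10
print(lru_faults(refs, 4))   # 4 faults: only the first touch of each page
print(lru_faults(refs, 3))   # 40 faults: every reference faults (thrashing)
```

With frames equal to the locality size, only the initial touches fault; with one frame fewer, LRU evicts exactly the page needed next, and every reference faults.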

87 of 90

 Working-Set Model

 

The working-set model is based on the assumption of locality. This model uses a parameter Δ to define the working-set window. The idea is to examine the most recent Δ page references. The set of pages in the most recent Δ page references is the working set. If a page is in active use, it will be in the working set. If it is no longer being used, it will drop from the working set Δ time units after its last reference. Thus, the working set is an approximation of the program's locality.
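As a sketch (the function name and reference string are my own choices, not from the text), the working set at time t with window Δ can be read directly off a reference string:

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the last `delta` references ending at time t."""
    start = max(0, t - delta + 1)   # window start, clipped at the first reference
    return set(refs[start:t + 1])

refs = [1, 2, 1, 5, 7, 7, 7, 5, 1]
print(working_set(refs, 8, 4))      # {1, 5, 7}: pages in the last 4 references
print(working_set(refs, 3, 10))     # {1, 2, 5}: window larger than the history so far
```

Page 2 has dropped out of the working set at t = 8 because more than Δ references have passed since it was last touched.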

88 of 90

 The accuracy of the working set depends on the selection of Δ. If Δ is too small, it will not encompass the entire locality; if Δ is too large, it may overlap several localities. In the extreme, if Δ is infinite, the working set is the set of pages touched during the entire process execution.

 

The most important property of the working set is its size. If we compute the working-set size, WSSi, for each process in the system, we can then compute

 

D = ∑WSSi

where D is the total demand for frames. Each process is actively using the pages in its working set; thus, process i needs WSSi frames. If the total demand is greater than the total number of available frames (D > m), thrashing will occur, because some processes will not have enough frames.

 

The operating system monitors the working set of each process and allocates to that working set enough frames to provide it with its working-set size. If there are enough extra frames, another process can be initiated. If the sum of the working-set sizes increases, exceeding the total number of available frames, the operating system selects a process to suspend. The suspended process's pages are written out (swapped), and its frames are reallocated to other processes. The suspended process can be restarted later.
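The bookkeeping just described can be sketched as follows (a hypothetical helper; the greedy largest-first suspension policy is my own illustration, as the text does not prescribe how the victim process is chosen):

```python
def processes_to_suspend(wss, m):
    """Suspend the largest working sets until total demand D fits in m frames.

    wss[i] is WSSi, the working-set size of process i; returns indices to suspend.
    """
    demand = sum(wss)                       # D = sum of WSSi over all processes
    suspended = []
    for i in sorted(range(len(wss)), key=lambda i: -wss[i]):
        if demand <= m:
            break                           # remaining processes fit in memory
        suspended.append(i)                 # suspend process i ...
        demand -= wss[i]                    # ... and reallocate its frames
    return suspended

print(processes_to_suspend([4, 6, 5], 16))  # []: D = 15 <= 16, no thrashing
print(processes_to_suspend([4, 6, 5], 12))  # [1]: suspend the largest working set
```

When D = 15 exceeds m = 12, suspending process 1 (WSS = 6) brings the demand down to 9 frames, which fits.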

89 of 90

 This working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus, it optimizes CPU utilization. The difficulty with the working-set model is keeping track of the working set. The working-set window is a moving window: at each memory reference, a new reference appears at one end, and the oldest reference drops off the other end. A page is in the working set if it is referenced anywhere in the working-set window.

 

Page-Fault Frequency

The working-set model is successful, and knowledge of the working set can be useful for prepaging, but it seems a clumsy way to control thrashing. The page-fault frequency (PFF) strategy takes a more direct approach.

Thrashing has a high page-fault rate. Thus, we want to control the page-fault rate. When it is too high, we know that the process needs more frames. Conversely, if the page-fault rate is too low, the process may have too many frames. We can establish upper and lower bounds on the desired page-fault rate. If the actual page-fault rate exceeds the upper limit, we allocate the process another frame; if the page-fault rate falls below the lower limit, we remove a frame from the process. Thus, we can directly measure and control the page-fault rate to prevent thrashing.
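A minimal sketch of this control rule (the bound values and function name are assumptions of mine; the text does not give specific numbers):

```python
def adjust_frames(frames, fault_rate, lower=0.02, upper=0.10):
    """PFF control: add a frame above the upper bound, reclaim one below the lower.

    lower/upper are illustrative bounds on the page-fault rate, not from the text.
    """
    if fault_rate > upper:
        return frames + 1               # too many faults: process needs more frames
    if fault_rate < lower and frames > 1:
        return frames - 1               # very few faults: process can spare a frame
    return frames                       # within bounds: leave the allocation alone

print(adjust_frames(8, 0.15))  # 9: rate above upper bound, allocate another frame
print(adjust_frames(8, 0.01))  # 7: rate below lower bound, reclaim a frame
print(adjust_frames(8, 0.05))  # 8: rate within bounds, no change
```

Running this rule per process at regular intervals keeps each allocation near the point where its page-fault rate sits between the two bounds.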

90 of 90

 As with the working-set strategy, we may have to suspend a process. If the page-fault rate increases and no free frames are available, we must select some process and suspend it. The freed frames are then distributed to the processes with high page-fault rates.