Hashing: General Idea
Additional concepts:
UNIT- V
Hashing is defined as follows...
“Hashing is the process of indexing and retrieving element (data) in a data structure to provide a faster way of finding the element using a hash key”
Hash Table is defined as follows...
“Hash table is just an array which maps a key (data) into the data structure with the help of hash function such that insertion, deletion and search operations are performed with constant time complexity (i.e. O(1)).”
Basic concept of hashing and hash table is shown in the following figure...
Example:
Example: 01
Data Item | Value % No. of Slots | Hash Value |
26 | 26 % 10 = 6 | 6 |
70 | 70 % 10 = 0 | 0 |
18 | 18 % 10 = 8 | 8 |
31 | 31 % 10 = 1 | 1 |
54 | 54 % 10 = 4 | 4 |
93 | 93 % 10 = 3 | 3 |
41 | 41 % 10 = 1 | 1 (collision occurs) |
74 | 74 % 10 = 4 | 4 (collision occurs) |
Collision Resolution Techniques
When the hash value of a key maps to an already occupied bucket/slot/index of the hash table, it is called as a Collision.
To handle the collision,
Advantages of separate chaining
Disadvantages of separate chaining”
Load Factor (λ)
Load factor (λ) is defined as-
Load factor (λ) = Number of Elements present in the hash table (n) /
Total size of the Hash Table (N)
Example: 01
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101. Use separate chaining technique for collision resolution
Data Item | Value % No. of Slots | Hash Value |
50 | 50 % 7 = 1 | 1 |
700 | 700 %7 = 0 | 0 |
76 | 76 % 7 = 6 | 6 |
85 | 85 % 7 = 1 | 1 ( collision occurs) |
92 | 92 % 7 = 1 | 1 (collision occurs) |
73 | 73 % 7 = 3 | 3 |
101 | 101 % 7 =3 | 3 (collision occurs) |
Open addressing is collision-resolution method that is used to control the collision in the hashing table. There is no key stored outside of the hash table. Therefore, the size of the hash table is always greater than or equal to the number of keys. It is also called closed hashing.
key = key % size;
key = (key+i) % size; here i is 0 to size-1
Example: 01:
Suppose we have a list of size 20 (m = 20). We want to put some elements in linear probing fashion. The elements / keys are {96, 48, 63, 29, 87, 77, 48, 65, 69, 94, 61}
Data Item | Value % No. of Slots | Hash Value | probes |
96 | 96 % 20 =16 | 16 | 1 |
48 | 48 % 20 =8 | 8 | 1 |
63 | 63 % 20 =3 | 3 | 1 |
29 | 29 % 20 =9 | 9 | 1 |
87 | 87 % 20 =7 | 7 | 1 |
77 | 77 % 20 =17 | 17 | 1 |
48 | 48 % 20 =8 (48+1)%20=9 (48+2)%20=10 | Collision occurs 10 | 3 |
65 | 65 % 20 =5 | 5 | 1 |
69 | 69 % 20 =9 (69+1) % 20=10 (69+2) % 20=11 | Collision occurs 11 | 3 |
94 | 94 % 20 =14 | 14 | 1 |
61 | 61 % 20 =1 | 1 | 1 |
Hash Table
--,61,--,63,--,65,--87,48,29,48,69,--,--,94,--,96,77,--,--
2. Quadratic Probing:
key = key % size;
key = (key+i2) % size; here i is 0 to size-1
Suppose we have a list of size 20 (size = 20). We want to put some elements / keys in Quadratic Probing fashion. The elements /Keys are { 96, 48, 63, 29, 87, 77, 48, 65, 69, 94, 61}.
Example: 01
Data Item | Value % No. of Slots | Hash Value | probes |
96 | 96 % 20 =16 | 16 | 1 |
48 | 48 % 20 =8 | 8 | 1 |
63 | 63 % 20 =3 | 3 | 1 |
29 | 29 % 20 =9 | 9 | 1 |
87 | 87 % 20 =7 | 7 | 1 |
77 | 77 % 20 =17 | 17 | 1 |
48 | 48 % 20 =8 (48+1)%20=9 (48+4)%20=12 | Collision occurs 12 | 3 |
65 | 65 % 20 =5 | 5 | 1 |
69 | 69 % 20 =9 (69+1) % 20=10 | Collision occurs 10 | 2 |
94 | 94 % 20 =14 | 14 | 1 |
61 | 61 % 20 =1 | 1 | 1 |
Hash Table
3. Double Hashing:
(h1 (key) + i*h2 (key)) % Table size.
Here h2(key)= PRIME-(key% PRIME) where PRIME is a prime smaller than Table Size.
NOTE: When collision occurs then finds the Second Hash function
Suppose we have a list of size 10 (size = 10). We want to put some elements / keys in Double hashing fashion. The elements /Keys are { 3,22,27,40,42,11,19}.
Example: 01
Data Item | h1=h1(key)% 10 | h2=7-(key%7) | Final Hash value =((h1+i*h2))%10 | probes |
3 | 3 % 10 =3 | - |
| 1 |
22 | 22 % 10 =2 | |
| 1 |
27 | 27 % 10 =7 | |
| 1 |
40 | 40 % 10 =0 | |
| 1 |
42 | 42 % 10 =2 Collision Occurred | 7-(42%7)=7 | (2+1*7)%10=9 | 2 |
11 | 11 % 10 =1 |
|
| 1 |
19 | 19%10=9 Collision Occurred | 7-(19%7)=2 | (9+1*2)%10=1 Collision Occurred (9+2*2)%10=3 Collision Occurred (9+3*2)%10=5
| 4 |
0 | 40 |
1 | 44 |
2 | 22 |
3 | 3 |
4 |
|
5 | 19 |
6 |
|
7 | 27 |
8 |
|
9 | 42 |
Separate Chaining Vs Open Addressing-
Separate Chaining | Open Addressing |
Keys are stored inside the hash table as well as outside the hash table. | All the keys are stored only inside the hash table. No key is present outside the hash table. |
The number of keys to be stored in the hash table can even exceed the size of the hash table. | The number of keys to be stored in the hash table can never exceed the size of the hash table. |
Deletion is easier. | Deletion is difficult. |
Extra space is required for the pointers to store the keys outside the hash table. | No extra space is required. |
Cache performance is poor. This is because of linked lists which store the keys outside the hash table. | Cache performance is better. This is because here no linked lists are used. |
Some buckets of the hash table are never used which leads to wastage of space. | Buckets may be used even if no key maps to those particular buckets. |
Conclusions-
Additional concepts:
Rehashing is a technique in which the table is resized, i.e., the size of table is doubled by creating a new table. It is preferable is the total size of table is a next prime number. There are situations in which the rehashing is required.
Load factor (λ) = Number of Elements present in the hash table (n) /
Total size of the Hash Table (N)
i.e λ<1 ,When λ=1 or λ>1 we need to increase the Hash Table Size this is Known as Rehashing.
Consider we have to insert the elements 37, 90, 55, 22, 17, 49, and 87. The table size is 10 and will use hash function
Example: 1
Data Item | Value % No. of Slots | Hash Value | probes |
37 | 37%10=7 | 7 | 1 |
90 | 90%10=0 | 0 | 1 |
55 | 55%10=5 | 5 | 1 |
22 | 22%10=2 | 2 | 1 |
17 | 17%10=7 (17+1)%10=8 | Collision occurs 8 | 2 |
49 | 49%10=9 | 9 | 1 |
87 | 87 % 10 =7 (87+1)%10=8 (87+2)%10=9 (87+3)%10=0 (87+4)%10=1 | Collision occurs 1 | 5 |
0 | 90 |
1 | 87 |
2 | 22 |
3 |
|
4 |
|
5 | 55 |
6 |
|
7 | 37 |
8 | 17 |
9 | 49 |
And new hash function will be
Data Item | Value % No. of Slots | Hash Value | Probs |
37 | 37%23=14 | 14 | 1 |
90 | 90%23=21 | 21 | 1 |
55 | 55%23=9 | 9 | 1 |
22 | 22%23=22 | 22 | 1 |
17 | 17%23=17 | 17 | 1 |
49 | 49%23=3 | 3 | 1 |
87 | 87 % 23 =18 | 18 | 1 |
0 |
|
1 |
|
2 |
|
3 | 49 |
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 | 55 |
10 |
|
11 |
|
12 |
|
13 |
|
14 | 37 |
15 |
|
16 |
|
17 | 17 |
18 | 87 |
19 |
|
20 |
|
21 | 90 |
22 | 22 |
Extendible hashing:
Basic Structure of Extendible Hashing
Frequently used terms in Extendible Hashing:
Example-1: Now, let us consider a prominent example of hashing the following elements: 16,4,6,22,24,10,31,7,9,20,26. �Bucket Size: 3 (Assume) �Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs.
Solution:
First, calculate the binary forms of each of the given numbers. �16- 10000 �4- 00100 �6- 00110 �22- 10110 �24- 11000 �10- 01010 �31- 11111 �7- 00111 �9- 01001 �20- 10100 �26- 11010
Inserting 16: �The binary format of 16 is 10000 and global-depth is 1. The hash function returns 1 LSB of 10000 which is 0. Hence
Inserting 4 and 6: �Both 4(100) and 6(110) have 0 in their LSB. Hence, they are hashed as follows:
Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed by directory 0 is already full. Hence, Over Flow occurs.
�
As directed by Step 7-Case 1, Since Local Depth = Global Depth, the bucket splits and directory expansion takes place. Also, rehashing of numbers present in the overflowing bucket takes place after the split. And, since the global depth is incremented by 1, now, the global depth is 2. Hence, 16,4,6,22 are now rehashed w.r.t 2 LSBs.
[ 16(10000),4(100),6(110),22(10110) ] .
*Notice that the bucket which was underflow has remained untouched. But, since the number of directories has doubled, we now have 2 directories 01 and 11 pointing to the same bucket. This is because the local-depth of the bucket has remained 1. And, any bucket having a local depth less than the global depth is pointed-to by more than one directory.
Inserting 24 and 10: 24(11000) and 10 (1010) can be hashed based on directories with id 00 and 10. Here, we encounter no overflow condition. �
Inserting 31,7,9: All of these elements[31(11111), 7(111), 9(1001) ] have either 01 or 11 in their LSBs. Hence, they are mapped on the bucket pointed out by 01 and 11. We do not encounter any overflow condition here
Inserting 20: Insertion of data element 20 (10100) will again cause the overflow problem.
20 is inserted in bucket pointed out by 00. As directed by Step 7-Case 1, since the local depth of the bucket = global-depth, directory expansion (doubling) takes place along with bucket splitting. Elements present in overflowing bucket are rehashed with the new global depth. Now, the new Hash table looks like this:
Inserting 26: Global depth is 3. Hence, 3 LSBs of 26(11010) are considered. Therefore 26 best fits in the bucket pointed out by directory 010.
The bucket overflows, and, as directed by Step 7-Case 2, since the local depth of bucket < Global depth (2<3), directories are not doubled but, only the bucket is split and elements are rehashed. �Finally, the output of hashing the given list of numbers is obtained.
Implementation of Dictionaries:
In computer science, a dictionary is an abstract data type that represents an ordered or unordered list of key-value pair elements where keys are used to search/locate the elements in the list. In a dictionary ADT, the data to be stored is divided into two parts:
Each item stored in a dictionary is represented by a key-value pair. Key is used to access the elements in the dictionary. With the key we can access value which has more information about the element.
Characteristics of Dictionary
Types of Dictionary
There are two major variations of dictionaries:
Ordered dictionary.
Unordered dictionary.
Basic Dictionary Operations
The dictionary ADT provides operations for inserting the records, deleting the records and searching the records in the collection of databases. Dictionaries typically support so many operations such as:
Insert(key, value)
Delete(x, D) -> deletion of element x(key & value) from the dictionary D.
delete(key)
Search(x, D) -> searching the prescribed value of x in the dictionary D with a key of an element x.
search(key) – value
Example: search(age) – 40
Example: Consider an empty unordered dictionary and the following set of operations: