We will use the term hash table for the array, and the function that carries out the transformation will be called a hash function.
A hash table is a data structure in which keys are mapped to array positions by a hash function. A value stored in a hash table can be searched in O(1) time, since the hash function generates the address directly from the key.
In a hash table, an element with key k is stored at index h(k). The process of mapping keys to appropriate locations (or indices) in a hash table is called hashing, and the function h(k) which performs this transformation is called the hash function. When two or more keys map to the same memory location, a collision occurs.
Properties of Good Hash Function
Low Cost
Distinctive
Uniformity
Division Method
h(x) = x mod M

const int M = 97; // a prime number
int h(int x)
{
    return x % M;
}
Multiplication Method
Step 1: Choose a constant A such that 0 < A < 1.
Step 2: Multiply the key k by A.
Step 3: Extract the fractional part of kA.
Step 4: Multiply the result of Step 3 by the size of the hash table (m).
That is, h(k) = ⌊m (kA mod 1)⌋
A good choice for A is (√5 - 1)/2 = 0.6180339887... Let A = 0.618033, m = 1000, and k = 12345:
h(12345) = ⌊1000 (12345 × 0.618033 mod 1)⌋
h(12345) = ⌊1000 (7629.617385 mod 1)⌋
h(12345) = ⌊1000 (0.617385)⌋
h(12345) = ⌊617.385⌋
h(12345) = 617
Mid-Square Method
Step 1: Square the value of the key, i.e., find k².
Step 2: Extract the middle r digits of the result obtained in Step 1. The hash value is h(k) = s, where s is obtained by selecting r digits from k².
When k = 1234, k² = 1522756, and extracting the middle two digits gives h(1234) = 27.
When k = 5642, k² = 31832164, and h(5642) = 21.
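A minimal sketch matching both worked examples, which take the "middle" two digits to be digits 3-4 counting from the right of k². This digit choice is an assumption inferred from the examples; for other key sizes the positions would differ.

```c
/* Mid-square hash: square the key and take the middle digits.
   Here the middle two digits are taken as digits 3-4 counting from
   the right of k*k, which matches both worked examples above. */
int hash_midsquare(long k) {
    long sq = k * k;
    return (int)((sq / 100) % 100);  /* drop 2 digits, keep next 2 */
}
```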
Folding Method
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn, where each part has the same number of digits except the last part, which may have fewer digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum k1 + k2 + ... + kn. The hash value is produced by ignoring the last carry, if any.
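The two steps above can be sketched as follows; the part size (2 digits) and the function name are illustrative assumptions.

```c
/* Folding-method hash, assuming 2-digit parts: split the key into
   2-digit groups from the right, add them, and drop any carry
   beyond 2 digits. E.g. 5678 -> 56 + 78 = 134 -> 34. */
int hash_folding(long k) {
    int sum = 0;
    while (k > 0) {
        sum += (int)(k % 100);  /* take the next 2-digit part */
        k /= 100;
    }
    return sum % 100;           /* ignore the carry past 2 digits */
}
```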
Collision Resolution by Linear Probing (open addressing)
Suppose a new record R with key k is to be added to the memory table T, but the location with address H(k) = h is already filled. One way of resolving the collision is to assign R to the first available location following T[h]. We assume that the table T with m locations is circular, so that T[1] comes after T[m]. According to this procedure, we search for record R in T by linearly searching the locations T[h], T[h+1], ... until we find an empty location or find R.
H(92) = 92 mod 10 = 2
A collision occurs since position 2 is already filled. The next position, 3, is also filled, as is position 4. Position 5 is not filled, so key 92 is inserted at position 5.
Index: 0   | 1   | 2   | 3  | 4   | 5  | 6
Key:   120 | 141 | 362 | 93 | 154 | -1 | -1    (-1 marks an empty slot)
Average search length = (1+1+1+1+4+1+1+8)/8 = 2.25
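The probing procedure above can be sketched as follows, with table size 10 and h(k) = k % 10 as in the example; EMPTY and the function name are illustrative.

```c
#define M 10
#define EMPTY (-1)

/* Linear-probing insert into table t[0..M-1] (EMPTY marks a free
   slot). Probes h, h+1, h+2, ... wrapping around; returns the index
   used, or -1 if the table is full. */
int insert_linear(int t[], int key) {
    int h = key % M;
    for (int i = 0; i < M; i++) {
        int pos = (h + i) % M;
        if (t[pos] == EMPTY) {
            t[pos] = key;
            return pos;
        }
    }
    return -1; /* table full */
}
```

With positions 2, 3, and 4 already occupied, inserting 92 lands at position 5, as in the example.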
2. Quadratic Probing
A variation of the linear probing idea is called quadratic probing. Instead of using a constant "skip" value, if the first hash value that has resulted in a collision is h, the successive values probed are h+1, h+4, h+9, h+16, and so on. In other words, quadratic probing uses a skip consisting of successive perfect squares.
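The probe sequence can be sketched in one line; the function name is illustrative, and positions are reduced mod the table size m.

```c
/* Quadratic probing: after a collision at h, probe h+1, h+4, h+9, ...
   (successive perfect squares), reduced mod the table size m.
   i is the probe number, i = 0, 1, 2, ... */
int probe_quadratic(int h, int i, int m) {
    return (h + i * i) % m;
}
```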
Double Hashing
In double hashing, we use two hash functions rather than a single function. Double hashing applies a second hash function to the key when a collision occurs. The result of the second hash function gives the number of positions from the point of collision at which to probe for an insertion. There are two requirements for the second function:
it must never evaluate to 0
it must make sure that all cells can be probed
A popular second hash function is: Hash2(key) = M - ( key % M ) where M is a prime number
Example: insert the keys 37, 90, 45, 22, 17, 49, 55
H1(key) = key % 10
H2(key) = 7 - (key % 7)
After a collision, the i-th probe is (H1(key) + i*H2(key)) % 10, for i = 1, 2, ...
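A sketch using the example's two hash functions; probe_double is an illustrative name, and i = 0 gives the home position.

```c
/* Double hashing with the example's functions:
   H1(k) = k % 10, H2(k) = 7 - (k % 7).
   The i-th probe position is (H1(k) + i*H2(k)) % 10. */
int probe_double(int key, int i) {
    int h1 = key % 10;
    int h2 = 7 - (key % 7);
    return (h1 + i * h2) % 10;
}
```

For key 37, H1 = 7 and H2 = 5, so after a collision at slot 7 the next probe is slot 2.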
Rehashing
When the hash table becomes nearly full, the number of collisions increases, thereby degrading the performance of insertion and search operations. In such cases, a better option is to create a new hash table whose size is double that of the original hash table.
All the entries in the original hash table must then be moved to the new hash table. This is done by taking each entry, computing its new hash value, and inserting it into the new hash table.
Though rehashing seems to be a simple process, it is quite expensive and must therefore not be done frequently.
Consider the hash table of size 5 given below. The hash function used is h(x) = x % 5. Rehash the entries into a new hash table.
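Rehashing can be sketched as follows. The original table's contents are not shown here, so the entries used in testing are hypothetical; the sketch doubles the table, recomputes h(x) = x % new_size, and resolves collisions by linear probing.

```c
#include <stdlib.h>
#define EMPTY (-1)

/* Rehashing sketch: move every entry of old[0..old_size-1] into a
   table twice the size, recomputing h(x) = x % new_size and resolving
   collisions by linear probing. Returns the new table (caller frees). */
int *rehash(const int *old, int old_size) {
    int new_size = 2 * old_size;
    int *nt = malloc(new_size * sizeof *nt);
    for (int i = 0; i < new_size; i++) nt[i] = EMPTY;
    for (int i = 0; i < old_size; i++) {
        if (old[i] == EMPTY) continue;
        int h = old[i] % new_size;
        while (nt[h] != EMPTY) h = (h + 1) % new_size; /* linear probe */
        nt[h] = old[i];
    }
    return nt;
}
```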
Chaining
• The chaining technique avoids collisions using an array of linked lists (built at run time). If more than one key has the same hash value, all of those keys are inserted at the end of the list (insert at the rear) one by one, and thus the collision is resolved.
• Example: Construct a hash table of size 5 and store the following words:
like, a, tree, you, first, a, place, to
Let H(str) = P0 + P1 + P2 + ... + Pn-1, where Pi is the position of the letter in the English alphabet.
• Then calculate the hash address = Sum % 5
H(like) = 12 + 9 + 11 + 5 = 37 % 5 = 2
H(a) = 1 % 5 = 1
H(tree) = 20 + 18 + 5 + 5 = 48 % 5 = 3
H(you) = 25 + 15 + 21 = 61 % 5 = 1
H(first) = 6 + 9 + 18 + 19 + 20 = 72 % 5 = 2
H(a) = 1 % 5 = 1
H(place) = 16 + 12 + 1 + 3 + 5 = 37 % 5 = 2
H(to) = 20 + 15 = 35 % 5 = 0
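A minimal sketch of this chaining scheme, using the example's hash function (sum of alphabet positions, mod 5, assuming lowercase words). The node layout and helper names are assumptions.

```c
#include <stdlib.h>
#include <string.h>
#define TABLE_SIZE 5

/* Separate chaining: each slot holds a linked list; colliding keys
   are appended at the rear, as described above. */
struct node {
    char word[16];
    struct node *next;
};

/* Hash = sum of alphabet positions of the letters, mod 5. */
int hash_word(const char *s) {
    int sum = 0;
    for (; *s; s++) sum += *s - 'a' + 1;
    return sum % TABLE_SIZE;
}

void chain_insert(struct node *table[], const char *s) {
    struct node *n = malloc(sizeof *n);
    strncpy(n->word, s, sizeof n->word - 1);
    n->word[sizeof n->word - 1] = '\0';
    n->next = NULL;
    struct node **p = &table[hash_word(s)];
    while (*p) p = &(*p)->next;   /* walk to the rear of the chain */
    *p = n;
}
```

Inserting "like" and then "first" (both hashing to 2) places "first" after "like" in slot 2's chain, matching the insert-at-rear rule above.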
Static Hashing
• A hashing technique in which the table (bucket) size remains the same (fixed at compile time) is called static hashing.
• Linear probing and chaining are techniques of static hashing.
• As the size is fixed, this type of hashing must handle overflow of elements (collisions) efficiently.
Drawbacks of static hashing
1. Table size is fixed and hence cannot accommodate data growth.
2. Collisions increase as the data size grows.
Dynamic Hashing
Dynamic hashing increases the size of the hash table as collisions occur. There are two types:
1) Dynamic hashing using a directory (extendible hashing): uses a directory that grows or shrinks depending on the data distribution. No overflow buckets.
2) Directoryless dynamic hashing (linear hashing): no directory. Splits buckets in linear order and uses overflow buckets.
Dynamic hashing using a directory
• Uses a directory of pointers to buckets/bins, which are collections of records.
• The number of buckets is doubled by doubling the directory and splitting just the bin that overflowed.
• The directory is much smaller than the file, so doubling it is much cheaper.
Directoryless Dynamic Hashing (Linear Hashing)
Basic Idea:
• Pages are split when overflows occur, but not necessarily the page with the overflow.
• A directory is avoided in linear hashing by using overflow pages (a chaining approach).
• Splitting occurs in turn, in a round-robin fashion, one by one from the first bucket to the last bucket.
• Use a family of hash functions h0, h1, h2, ...
• Each function's range is twice that of its predecessor.
• When all the pages at one level (the current hash function) have been split, a new level is applied.
Insert in Order using linear hashing: 1,7,3,8,12,4,11,2,10
[Figure: contents of buckets 0-3, with overflow pages, after the insertions above]
A max tree is a tree in which the key value in each node is no smaller than the key values in its children (if any). A max heap is a complete binary tree that is also a max tree.
A min tree is a tree in which the key value in each node is no larger than the key values in its children (if any). A min heap is a complete binary tree that is also a min tree.
Heaps are frequently used to implement priority queues.
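As an illustration of the max-heap property above, an array-based insert can be sketched as follows; the layout (heap in h[1..n], children of node i at 2i and 2i+1) and the function name are conventional assumptions, not from the text.

```c
/* Max-heap insert sketch: the heap lives in h[1..*n] (array-based
   complete binary tree; children of node i sit at 2i and 2i+1).
   The new key starts at the end and is sifted up while it exceeds
   its parent, preserving the max-heap property. */
void heap_insert(int h[], int *n, int key) {
    int i = ++(*n);
    while (i > 1 && h[i / 2] < key) {
        h[i] = h[i / 2];   /* move the smaller parent down */
        i /= 2;
    }
    h[i] = key;
}
```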
Leftist tree
An extended binary tree is a binary tree in which all empty binary subtrees have been replaced by square nodes; the square nodes are the external nodes.
Let x be a node in an extended binary tree. Let left_child(x) and right_child(x), respectively, denote the left and right children of the internal node x.
Define shortest(x) to be the length of a shortest path from x to an external node.
A leftist tree is a binary tree such that, if it is not empty, then shortest(left_child(x)) ≥ shortest(right_child(x)) for every internal node x.
typedef struct {
    int key;
} element;

typedef struct leftist *leftist_tree;
struct leftist {
    leftist_tree left_child;
    element data;
    leftist_tree right_child;
    int shortest;
};
A min-leftist tree (max leftist tree) is a leftist tree in which the key value in each node is no larger (smaller) than the key values in its children (if any).
a min (max) leftist tree is a leftist tree that is also a min (max) tree
Both the insert and delete-min operations can be implemented using the combine operation. To insert an element x into a min-leftist tree a, we first create a min-leftist tree b that contains the single element x. Then we combine the min-leftist trees a and b. To delete the min element from a nonempty min-leftist tree a, we combine the min-leftist trees a->left_child and a->right_child and delete node a.
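A sketch of the combine operation for min-leftist trees. The node struct is restated here in simplified form (the element data reduced to a single int key) so the example is self-contained; make_node and shortest_of are illustrative helpers.

```c
#include <stdlib.h>

typedef struct leftist *leftist_tree;
struct leftist {
    leftist_tree left_child;
    int key;          /* element data reduced to a single int key */
    leftist_tree right_child;
    int shortest;
};

static int shortest_of(leftist_tree t) { return t ? t->shortest : 0; }

/* Merge two min-leftist trees: recurse down the right spines keeping
   the smaller root on top, then swap children wherever the leftist
   property shortest(left) >= shortest(right) would be violated. */
leftist_tree combine(leftist_tree a, leftist_tree b) {
    if (!a) return b;
    if (!b) return a;
    if (b->key < a->key) { leftist_tree tmp = a; a = b; b = tmp; }
    a->right_child = combine(a->right_child, b);
    if (shortest_of(a->left_child) < shortest_of(a->right_child)) {
        leftist_tree tmp = a->left_child;   /* restore leftist property */
        a->left_child = a->right_child;
        a->right_child = tmp;
    }
    a->shortest = shortest_of(a->right_child) + 1;
    return a;
}

leftist_tree make_node(int key) {
    leftist_tree t = malloc(sizeof *t);
    t->left_child = t->right_child = NULL;
    t->key = key;
    t->shortest = 1;
    return t;
}
```

Insert is then combine(a, make_node(x)), and delete-min is combine(a->left_child, a->right_child), exactly as described above.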
There are two types of leftist trees:
1. Height-biased
2. Weight-biased
OPTIMAL BINARY SEARCH TREES
A Binary Search Tree (BST) is a tree in which the key values are stored in the internal nodes; the external nodes are null nodes. The keys are ordered lexicographically, i.e., for each internal node, all the keys in the left sub-tree are less than the key in the node, and all the keys in the right sub-tree are greater.
Consider four keys A, B, C, and D to be searched for with probabilities 0.1, 0.2, 0.4, and 0.3, respectively.
The average number of comparisons in a successful search in the first of these trees is
0.1×1 + 0.2×2 + 0.4×3 + 0.3×4 = 2.9, and for the second one it is
0.1×2 + 0.2×1 + 0.4×2 + 0.3×3 = 2.1.
The total number of binary search trees with n keys is equal to the nth Catalan number, C(n) = (2n)! / ((n+1)! n!).
Key:         A    B    C    D
Probability: 0.1  0.2  0.4  0.3
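The Catalan-number formula above can be evaluated without computing large factorials, using the recurrence C(i+1) = C(i)·2(2i+1)/(i+2); the function name is illustrative.

```c
/* Number of distinct binary search trees on n keys: the nth Catalan
   number C(n) = (2n)! / ((n+1)! n!), computed iteratively via the
   recurrence C(i+1) = C(i) * 2*(2i+1) / (i+2). The division is exact
   at every step, so integer arithmetic suffices. */
long catalan(int n) {
    long c = 1;  /* C(0) = 1 */
    for (int i = 0; i < n; i++)
        c = c * 2 * (2 * i + 1) / (i + 2);
    return c;
}
```

For the four keys above, catalan(4) = 14 distinct BST shapes would have to be compared to find the optimal one by brute force.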