Comparison-Based Sorting Algorithm Analysis
4th Semester CMSA
By
Sumana Bandyopadhyay
How fast a comparison-based sort works
Quick Sort
Quick Sort Algorithm
1. If (p < r) then // A[p:r] is the sub-array to be sorted; p is its lowest index and r its highest
1.1 q <- partition(A, p, r) // partition returns the index q of the pivot element
1.2 Quick sort(A, p, q-1)
1.3 Quick sort(A, q+1, r)
2. End of If
3. End
Quick sort algorithm (contd.)
Partition
Input: data type A[ ], p, r
Output: pivot position as index
1. x <- A[r] // the rightmost element is chosen as the pivot
2. i <- p - 1
3. Repeat for j = p to r-1 do
3.1 if (A[j] ≤ x) then // keep every element smaller than or equal to
3.1.1 i <- i+1 // the pivot in the left portion of the array
3.1.2 swap(A, i, j)
3.2 end of if
4. End of for
5. swap(A, i+1, r) // place the pivot between the two portions
6. return (i+1)
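The two pieces of pseudocode above can be combined into a short runnable sketch; this is a minimal Python rendering (function and variable names mirror the slides):

```python
def partition(A, p, r):
    """Lomuto partition: A[r] is the pivot; returns the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:                    # keep elements <= pivot on the left
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]      # place the pivot between the portions
    return i + 1

def quick_sort(A, p, r):
    """Sort A[p:r+1] in place."""
    if p < r:
        q = partition(A, p, r)
        quick_sort(A, p, q - 1)
        quick_sort(A, q + 1, r)

data = [5, 2, 9, 1, 7]
quick_sort(data, 0, len(data) - 1)
# data is now [1, 2, 5, 7, 9]
```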
Performance Analysis
So, C(n) = (n - 1) + C(t) + C(n - t - 1), where t is the number of elements that end up to the left of the pivot
Performance Analysis: Worst Case (Contd.)
Suppose that, given the ordering of the elements of A[], the rightmost element chosen as pivot is either smaller than all the remaining elements or greater than all of them (for example, when A[] is already sorted). Then partitioning leaves one sub-list with 0 elements and the other with (n-1) elements.
So, C(0)=0, C(n) = (n-1) + C(n-1)
This is a recurrence relation.
C(1)=0, C(2)=1+C(1), C(3)=2+C(2), C(4)= 3+C(3),…….,
C(n) = (n-1) + C(n-1)
C(n) = (n-1) + (n-2) + (n-3) + … + 3 + 2 + 1
= ½ x n x (n-1)
= 0.5 x n² - 0.5 x n, i.e., C(n) = O(n²)
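This worst case is easy to check empirically. The sketch below (an illustrative instrumented copy of the Lomuto quicksort, not from the slides) counts comparisons; on an already-sorted array the rightmost pivot is always the maximum, so the count is exactly n(n-1)/2:

```python
def quick_sort_comparisons(A):
    """Sort a copy of A with Lomuto quicksort; return the number of
    element comparisons performed."""
    A = list(A)
    count = 0
    def sort(p, r):
        nonlocal count
        if p < r:
            x = A[r]
            i = p - 1
            for j in range(p, r):
                count += 1                    # one comparison A[j] <= x
                if A[j] <= x:
                    i += 1
                    A[i], A[j] = A[j], A[i]
            A[i + 1], A[r] = A[r], A[i + 1]
            sort(p, i)
            sort(i + 2, r)
    sort(0, len(A) - 1)
    return count

n = 100
assert quick_sort_comparisons(list(range(n))) == n * (n - 1) // 2  # 4950
```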
Average case analysis
C(n) = n-1 + (2/n) x (C(0)+C(1)+C(2)+…+C(n-1)) …(1)
C(n-1) = n-2 + (2/(n-1)) x (C(0)+C(1)+…+C(n-2)) …(2)
Multiplying (1) by n and (2) by (n-1) and subtracting:
n x C(n) - (n-1) x C(n-1) = 2 x (n-1) + 2 x C(n-1), i.e., n x C(n) = 2 x (n-1) + (n+1) x C(n-1) …(3)
Dividing both sides by n(n+1):
C(n)/(n+1) = 2(n-1)/(n(n+1)) + C(n-1)/n ≤ 2/n + C(n-1)/n …(4)
Average case analysis(Contd.)
Unrolling (4) down to C(2) gives
C(n)/(n+1) ≤ 2 x {1/n + 1/(n-1) + 1/(n-2) + … + 1/3} + C(2)/3 …(5)
Now, for the Harmonic series,
Hn = {1 + 1/2 + 1/3 + … + 1/n} can be approximated by the area under the curve {1/x} from ½ to n+½, i.e.,
Hn ≈ ∫ from ½ to n+½ of (1/x) dx = ln(2n+1) ≈ ln n + 0.7
Hence, C(n)/(n+1) ≤ 2 x {ln n + 0.7} …(6)
Finally, C(n) ≈ 2n ln n ≈ 1.39 x n lg n, i.e., C(n) = O(n log n)
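The harmonic-series approximation used in (6) can be checked numerically; a small sketch (the 0.7 constant is ln 2 ≈ 0.693 from the integral's limits):

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

# The area under 1/x from 1/2 to n + 1/2 is ln(2n + 1) ~ ln n + 0.7.
for n in (10, 100, 1000):
    assert abs(harmonic(n) - (math.log(n) + 0.7)) < 0.2
```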
Logic behind Quick Sort
Logic of Merge Sort
Merge Sort: Contiguous Storage Version
Merge Sort
Input: A[ ], l, h
Output: A[l: h] in sorted sequence
1. If (l < h) then
1.1 mid <- ⌊(l+h)/2⌋
1.2 Merge Sort(A[ ], l, mid)
1.3 Merge Sort(A[ ], mid+1, h)
1.4 Merge (A[ ], l, mid, h)
2. End of If
3. End Merge Sort
Merge Sort: Contiguous Version(contd.)
Merge
Input: A[ ], l, mid, h, B[ ]
Output: A[ ], l, h // A[l:h] merged in sorted order via auxiliary array B[ ]
1. i <- l, k <- l, j <- mid+1 // k scans A[l:mid], j scans A[mid+1:h]
2. Repeat while (k ≤ mid and j ≤ h) do
2.1 if (A[k] ≤ A[j]) then
2.1.1 B[i]<- A[k]
2.1.2 k<- k+1
2.2 else
2.2.1 B[i]<- A[j]
2.2.2 j <- j+1
2.3 End of if
2.4 i<- i +1
3. End of while
Merge Sort: Contiguous Version(contd.)
4. if( k>mid) then
4.1 Repeat for p = j to h do
4.1.1 B[i]<- A[p]
4.1.2 i <- i + 1
4.2 End of for
5. else
5.1 Repeat for p = k to mid do
5.1.1 B[i]<- A[p]
5.1.2 i <- i + 1
5.2 End of for
6. End of if
7. Repeat for i = l to h do
7.1 A[i] <- B[i]
8. End of Repeat
9. End of Algorithm
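A runnable Python sketch of the contiguous version, following the pseudocode above (for simplicity, the auxiliary array B is allocated inside Merge):

```python
def merge(A, l, mid, h):
    """Merge the sorted runs A[l:mid+1] and A[mid+1:h+1] back into A."""
    B = []
    k, j = l, mid + 1
    while k <= mid and j <= h:
        if A[k] <= A[j]:
            B.append(A[k]); k += 1
        else:
            B.append(A[j]); j += 1
    B.extend(A[k:mid + 1])   # copy whichever run still has elements left
    B.extend(A[j:h + 1])
    A[l:h + 1] = B           # copy the merged run back into A

def merge_sort(A, l, h):
    """Sort A[l:h+1] in place."""
    if l < h:
        mid = (l + h) // 2
        merge_sort(A, l, mid)
        merge_sort(A, mid + 1, h)
        merge(A, l, mid, h)

data = [5, 2, 9, 1, 7, 3]
merge_sort(data, 0, len(data) - 1)
# data is now [1, 2, 3, 5, 7, 9]
```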
Analysis of Contiguous Version
Merge Sort for Linked List
Merge Sort
Input: List_type * p
Output: List_type * head
1. If (p = Null or (p->next) = Null) then return (p) // 0 or 1 node: already sorted
2. else
2.1 q <- Divide(p)
2.2 p<- Merge Sort (p)
2.3 q<- Merge Sort(q)
2.4 head <- Merge (p, q)
3. End of if
4. End of Merge Sort
Merge Sort for Linked List (contd.)
Divide
Input: List_type * p
Output: List_type * r // r heads the second half; the first half stays headed at p
1. q <- p, r <- (p->next) // q: slow pointer, r: fast pointer
2. Repeat while (r != Null and (r->next) != Null) do
2.1 r <- (r->next)
2.2 q <- (q->next)
2.3 if (r != Null) then
2.3.1 r <- (r->next)
2.4 end of if
3. End of while
4. r <- (q->next) // split the list at the middle
5. (q->next) <- Null
6. Return (r)
Merge Sort for Linked List(contd)
Merge
Input: List_type * head, *r
Output: List_type *head
1. If (p = Null or q = Null) then
1.1 write(“error: empty list can’t be merged”)
1.2 exit
2. End of if
3. If ((p->data) ≤ (q->data)) then // the smaller first node becomes the head
3.1 head<- p
3.2 p <- (p->next)
4. else
4.1 head<- q
4.2 q<-(q->next)
5. end of if
Merge Sort for Linked List(contd)
6. r<- head
7. Repeat while (p & q) do
7.1 if(p->data ≤ q->data) then
7.1.1 (r->next)<- p // link the smaller node after r
7.1.2 r<-(r->next)
7.1.3 p<-(p->next)
7.2 else
7.2.1 (r->next)<- q
7.2.2 r<-(r->next)
7.2.3 q<-(q->next)
7.3 end of if
8. end of while
9. If (p != Null) then
9.1 (r->next)<- p
10. else
10.1 (r->next)<-q
11. End of if
12. Return head
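A compact Python sketch of the linked-list version. It follows the same Divide/Merge structure; as a deliberate simplification, the merge uses a dummy head node instead of the error-and-exit branch above:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def divide(p):
    """Split the list headed at p in half (slow/fast pointers);
    return the head of the second half."""
    slow, fast = p, p.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    second = slow.next
    slow.next = None                     # terminate the first half
    return second

def merge(p, q):
    """Merge two sorted lists into one sorted list."""
    dummy = Node(None)
    r = dummy
    while p is not None and q is not None:
        if p.data <= q.data:
            r.next, r, p = p, p, p.next  # link the smaller node, advance
        else:
            r.next, r, q = q, q, q.next
    r.next = p if p is not None else q   # append the leftover tail
    return dummy.next

def merge_sort(p):
    if p is None or p.next is None:      # 0 or 1 node: already sorted
        return p
    q = divide(p)
    return merge(merge_sort(p), merge_sort(q))

head = Node(5, Node(2, Node(9, Node(1))))
head = merge_sort(head)                  # list is now 1 -> 2 -> 5 -> 9
```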
Complexity Analyses of Merge Sort
Complexity Analyses of Merge Sort(contd.)
[Figure: merge sort recursion tree for n = 8; the array splits into halves of sizes 4, 4, then 2, 2, 2, 2, then single elements, giving about lg n + 1 levels.]
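The recursion-tree analysis can be summarized by the standard divide-and-conquer recurrence; a sketch, assuming the Merge step costs c·n for n elements:

```latex
T(n) =
\begin{cases}
c & \text{if } n = 1,\\
2\,T(n/2) + c\,n & \text{if } n > 1.
\end{cases}
```

Unrolling gives lg n levels, each of total cost c·n, plus the leaves:

```latex
T(n) = c\,n\,\lg n + c\,n = O(n \log n).
```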
A[]={4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
To make array A a max heap, consider its length 10, i.e., n = 10. The last internal node is at index ⌊n/2⌋ = 5, whose value is 16. 16 is greater than its only child 7 (at index 10), so no swap is needed.
A[]={4, 1, 3, 14, 16, 9, 10, 2, 8, 7}
Next internal node is at index 4 and its value is 2. 2 is less than its children. Among its children 14 is greater. So, swap 14 and 2
A[]={4, 1, 10, 14, 16, 9, 3, 2, 8, 7}
Next internal node is at index 3 and its value is 3. 3 is less than its children. Among its children 10 is greater. So, swap 10 and 3.
A[]={4, 16, 10, 14, 1, 9, 3, 2, 8, 7}
Next internal node is at index 2 and its value is 1. 1 is less than its children. Among its children 16 is greater. So, swap 16 and 1.
A[]={4, 16, 10, 14, 7, 9, 3, 2, 8, 1}
After the previous swap, the value 1 is at index 5. 1 is less than its child 7, which is at index 10, so swap 1 and 7; 7 is now at index 5.
A[]={16, 4, 10, 14, 7, 9, 3, 2, 8, 1}
Lastly, root node is at index 1 and its value is 4. 4 is less than its children. Among its children 16 is greater. So, swap 16 and 4.
A[]={16, 14, 10, 4, 7, 9, 3, 2, 8, 1}
Next, 4 sits at index 2; its children are 14 and 7. 4 is less than both, and 14 is the greater, so swap 14 and 4; 4 moves to index 4.
A[]={16, 14, 10, 8, 7, 9, 3, 2, 4, 1}
However, 4 is still not in its proper place: its right child, at index 9, is 8, which is greater than 4. So swap 4 and 8; 4 is now at index 9.
What is a Heap?
Max Heap
A[]={16, 14, 10, 8, 7, 9, 3, 2, 4, 1}
Heap Sort
1. Heap sort first calls Build_Heap to turn the arbitrary input array A[1:n] into a max heap.
2. Swap A[1] and A[n]; A[n] now holds the maximum element.
3. The heap size is reduced to (n-1).
4. Restore the max heap on A[1:n-1] by sifting the new root A[1] down (a full Build_Heap is not needed here).
5. Swap A[1] and A[n-1].
6. The heap size is reduced to (n-2).
7. Repeat steps 2 to 6 until the heap size is reduced to 1.
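The steps above can be sketched in Python. Index 0 is left unused so that positions match the slides' 1-based indexing (children of i at 2i and 2i+1); sift_down restores the heap property at one node, and the first loop in heap_sort (indices ⌊n/2⌋ down to 1) is Build_Heap:

```python
def sift_down(A, i, n):
    """Restore the max-heap property at 1-based index i of the heap A[1:n+1]."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and A[child + 1] > A[child]:
            child += 1                        # pick the greater child
        if A[i] >= A[child]:
            break                             # heap property already holds
        A[i], A[child] = A[child], A[i]
        i = child

def heap_sort(A):
    """Sort A[1:] in place (A[0] is unused padding)."""
    n = len(A) - 1
    for i in range(n // 2, 0, -1):            # Build_Heap
        sift_down(A, i, n)
    for size in range(n, 1, -1):              # move the max to the end, shrink
        A[1], A[size] = A[size], A[1]
        sift_down(A, 1, size - 1)

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heap_sort(A)
# A[1:] is now [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```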
How much work Heap Sort does in the worst case
The Build_Heap step makes at most
2 x ∑(i=1 to ⌊n/2⌋) lg(n/i) = 2 x (⌊n/2⌋ lg n - lg(⌊n/2⌋)!)
comparisons, since the node stored at index i sits at height at most lg(n/i) and each level of a sift-down costs 2 comparisons.
By Stirling's approximation, lg(m!) ≈ m lg m - 1.44 x m; with m = ⌊n/2⌋ and lg ⌊n/2⌋ = lg n - 1,
we get lg(⌊n/2⌋)! ≈ ⌊n/2⌋ lg n - 2.44 x ⌊n/2⌋
Substituting back, the total is ≈ 2 x 2.44 x ⌊n/2⌋ ≈ 5 x ⌊n/2⌋ = 2.5 x n, so Build_Heap runs in O(n) time.
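The ≈ 2.5n bound can be sanity-checked by instrumenting Build_Heap; this illustrative counter (not from the slides) charges one comparison for choosing the greater child and one for testing the parent:

```python
import random

def build_heap_comparisons(A):
    """Build a max heap in 1-based array A (A[0] unused); return the
    number of element comparisons made."""
    n = len(A) - 1
    count = 0
    for i in range(n // 2, 0, -1):
        j = i
        while 2 * j <= n:
            child = 2 * j
            if child + 1 <= n:
                count += 1                    # compare the two children
                if A[child + 1] > A[child]:
                    child += 1
            count += 1                        # compare parent with greater child
            if A[j] >= A[child]:
                break
            A[j], A[child] = A[child], A[j]
            j = child
    return count

random.seed(0)
for n in (10, 100, 1000):
    A = [None] + random.sample(range(n), n)
    assert build_heap_comparisons(A) <= 2.5 * n
    assert A[1] == n - 1                      # the maximum is now at the root
```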
Representing Every Comparison Sort
Algorithm must “find” the right answer among n! possible answers
Starts “knowing nothing” and gains information with each comparison
Can represent this process as a decision tree
CSE 332 Data Abstractions, Summer 2012; July 16, 2012
Decision Tree for n = 3
[Figure: decision tree for n = 3. The root performs the comparison a ? b; each internal node performs one comparison (a < b or a > b, a < c or a > c, b < c or b > c, c < a or c > a) and narrows the set of orderings still consistent with the answers, starting from all six: a < b < c, a < c < b, b < a < c, b < c < a, c < a < b, c < b < a.]
The leaves contain all the possible orderings of a, b, c
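Since the tree needs n! leaves, its height, i.e., the worst-case number of comparisons, is at least ⌈lg(n!)⌉; a quick check in Python:

```python
import math

# A decision tree that sorts n keys needs n! leaves, so its height
# (worst-case comparisons) is at least ceil(log2(n!)).
for n, lower_bound in [(2, 1), (3, 3), (4, 5)]:
    assert math.ceil(math.log2(math.factorial(n))) == lower_bound
```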
What the Decision Tree Tells Us
It is a binary tree because each comparison has exactly two outcomes
Because any input ordering is possible, any algorithm needs to ask enough questions to decide among all n! answers
Decision Tree for n = 3
[Figure repeated with annotations: each internal node lists the orderings still possible given the comparisons made so far; each leaf holds the one actual order.]
Where are We
Proven: No comparison sort can have worst-case better than the height of a binary tree with n! leaves
Now: Show a binary tree with n! leaves has height Ω(n log n)
Conclude that: (Comparison) Sorting is Ω(n log n)
Lower Bound on Height
The height h of a binary tree with L leaves is at least log2 L; our decision tree has n! leaves, so:
h ≥ log2 (n!) (property of binary trees)
= log2 (n*(n-1)*(n-2)…(2)(1)) (definition of factorial)
= log2 n + log2 (n-1) + … + log2 1 (property of logarithms)
≥ log2 n + log2 (n-1) + … + log2 (n/2) (keep only the first n/2 terms)
≥ (n/2) log2 (n/2) (each of the n/2 remaining terms is ≥ log2 (n/2))
= (n/2)(log2 n - log2 2) (property of logarithms)
= (1/2) n log2 n - (1/2) n (arithmetic)
so h = Ω(n log n)
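The chain of inequalities can be verified numerically (Python's math.log2 accepts arbitrarily large integers, so n! need not fit in a float):

```python
import math

# log2(n!) is at least (1/2) n log2 n - (1/2) n, as derived above.
for n in (4, 16, 256, 1024):
    lower = 0.5 * n * math.log2(n) - 0.5 * n
    assert math.log2(math.factorial(n)) >= lower
```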
Insertion Sort Algorithm
Chapter 10: Sorting
Let's analyze Insertion Sort
Input Size
Running Time
Run Time of Insertion Sort
In the RAM model, the total time required is the sum, over all statements, of the cost of each statement times the number of times it executes:
Steps Cost Times
Best Case
The run time is thus a linear function of n
Worst Case: no input can make the algorithm run slower!
The run time is thus a quadratic function of n
Average Case
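Counting key comparisons directly confirms the best- and worst-case shapes; an illustrative instrumented insertion sort (the names are mine, not the slides'):

```python
def insertion_sort(A):
    """Sort A in place and return the number of key comparisons."""
    comparisons = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0:
            comparisons += 1          # compare key against A[i]
            if A[i] <= key:
                break
            A[i + 1] = A[i]           # shift the larger element right
            i -= 1
        A[i + 1] = key
    return comparisons

n = 50
best = insertion_sort(list(range(n)))           # sorted input: n - 1 = 49
worst = insertion_sort(list(range(n, 0, -1)))   # reversed input: n(n-1)/2 = 1225
```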
Bubble Sort Algorithm, Refined
Analysis of Bubble Sort
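The usual refinement behind the "Refined" in the slide title is an early-exit flag: if a full pass performs no swaps, the array is already sorted and the remaining passes can be skipped. A minimal sketch under that assumption:

```python
def bubble_sort(A):
    """Refined bubble sort: stop as soon as a pass performs no swaps."""
    n = len(A)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):               # bubble the max of A[0:end+1] to A[end]
            if A[i] > A[i + 1]:
                A[i], A[i + 1] = A[i + 1], A[i]
                swapped = True
        if not swapped:                    # no swaps: already sorted, O(n) best case
            break

data = [5, 1, 4, 2, 8]
bubble_sort(data)
# data is now [1, 2, 4, 5, 8]
```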