Lab 7
61C Summer 2023
SIMD
SIMD Intrinsics
Intel Intrinsic Functions
Intel Intrinsics
Intel Intrinsics Examples
We have to load arrays from memory into vector registers
We store the result back to memory, which updates arr1
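The load/compute/store pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the original slide: the helper name add_arrays_simd and the second array arr2 are assumptions made for the example, and it assumes an x86 machine with SSE2 and an array length that is a multiple of 4.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_loadu_si128, ... */

/* Hypothetical helper (not from the slides): adds arr2 into arr1,
   4 ints at a time. Assumes n is a multiple of 4. */
void add_arrays_simd(int *arr1, const int *arr2, int n) {
    for (int i = 0; i < n; i += 4) {
        /* Load 4 ints from each array into 128-bit vector registers. */
        __m128i v1 = _mm_loadu_si128((const __m128i *)(arr1 + i));
        __m128i v2 = _mm_loadu_si128((const __m128i *)(arr2 + i));

        /* Add the four 32-bit lanes in one instruction. */
        __m128i sum = _mm_add_epi32(v1, v2);

        /* Store the result back to memory, updating arr1. */
        _mm_storeu_si128((__m128i *)(arr1 + i), sum);
    }
}
```

The unaligned variants (loadu/storeu) are used here so the sketch does not depend on how the arrays were allocated.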
Another Intrinsics Example
We first create sum_vec (4 ints wide, set to all 0s) to store our sum
sum_vec:
0
0
0
0
Another Intrinsics Example
Next, we load 4 ints (elements 0-3 of arr) into a temporary vector, tmp
sum_vec:
0
0
0
0
tmp:
1
3
4
1
Another Intrinsics Example
We add sum_vec and tmp together, lane by lane, and store the result in sum_vec
sum_vec:
1
3
4
1
tmp:
1
3
4
1
Another Intrinsics Example
We load the next 4 elements of arr (elements 4-7) into tmp
sum_vec:
1
3
4
1
tmp:
9
5
2
6
Another Intrinsics Example
Once again we add sum_vec and tmp
sum_vec:
10
8
6
7
tmp:
9
5
2
6
Another Intrinsics Example
Finally, we store sum_vec into a temporary array and then add up all 4 elements of that array
sum_vec:
10
8
6
7
tmp:
9
5
2
6
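The walkthrough above might look roughly like this in C. It is a sketch under stated assumptions rather than the slide's original code: the function name sum_simd and the temporary array tmp_arr are made up for illustration, the target is assumed to be x86 with SSE2, and n is assumed to be a multiple of 4.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: sums n ints, where n is assumed to be a multiple of 4. */
int sum_simd(const int *arr, int n) {
    /* sum_vec: 4 ints wide, all lanes start at 0. */
    __m128i sum_vec = _mm_setzero_si128();

    for (int i = 0; i < n; i += 4) {
        /* Load the next 4 elements of arr into tmp. */
        __m128i tmp = _mm_loadu_si128((const __m128i *)(arr + i));
        /* Add tmp into sum_vec, lane by lane. */
        sum_vec = _mm_add_epi32(sum_vec, tmp);
    }

    /* Store sum_vec into a temporary array, then add up its 4 elements. */
    int tmp_arr[4];
    _mm_storeu_si128((__m128i *)tmp_arr, sum_vec);
    return tmp_arr[0] + tmp_arr[1] + tmp_arr[2] + tmp_arr[3];
}
```

With the array from the walkthrough, {1, 3, 4, 1, 9, 5, 2, 6}, sum_vec ends as {10, 8, 6, 7} and the function returns 31.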
Loop Unrolling
Loop Unrolling Example
// Original loop: one element per iteration
int N = 100;
int arr[N];
for (int i = 0; i < N; i += 1) {
    arr[i] = i;
}
// Unrolled by 4: four elements per iteration (safe because N is a multiple of 4)
int N = 100;
int arr[N];
for (int i = 0; i < N; i += 4) {
    arr[i] = i;
    arr[i + 1] = i + 1;
    arr[i + 2] = i + 2;
    arr[i + 3] = i + 3;
}
Loop Unrolling Example with tail case
// Original loop: N is not a multiple of 4
int N = 103;
int arr[N];
for (int i = 0; i < N; i += 1) {
    arr[i] = i;
}
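// Unrolled by 4, with a tail case for the leftover elements
int N = 103;
int arr[N];
// N / 4 * 4 rounds N down to a multiple of 4 (100 when N = 103)
for (int i = 0; i < N / 4 * 4; i += 4) {
    arr[i] = i;
    arr[i + 1] = i + 1;
    arr[i + 2] = i + 2;
    arr[i + 3] = i + 3;
}
// Tail case: handle the remaining elements (100, 101, 102) one at a time
for (int i = N / 4 * 4; i < N; i += 1) {
    arr[i] = i;
}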
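The same tail-case idea applies when the loop body uses SIMD intrinsics, since each vector operation also consumes 4 ints at a time. The sketch below combines the two; the function name sum_simd_tail is hypothetical, and it assumes x86 with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: sums n ints for any n >= 0. */
int sum_simd_tail(const int *arr, int n) {
    __m128i sum_vec = _mm_setzero_si128();
    int limit = n / 4 * 4;  /* n rounded down to a multiple of 4 */

    /* Main loop: 4 ints per iteration. */
    for (int i = 0; i < limit; i += 4) {
        __m128i tmp = _mm_loadu_si128((const __m128i *)(arr + i));
        sum_vec = _mm_add_epi32(sum_vec, tmp);
    }

    /* Combine the 4 lanes into a single scalar total. */
    int tmp_arr[4];
    _mm_storeu_si128((__m128i *)tmp_arr, sum_vec);
    int total = tmp_arr[0] + tmp_arr[1] + tmp_arr[2] + tmp_arr[3];

    /* Tail case: add the leftover elements (at most 3) one at a time. */
    for (int i = limit; i < n; i += 1) {
        total += arr[i];
    }
    return total;
}
```

Without the tail loop, the last n % 4 elements would be skipped; with a loop bound of n instead of limit, the final vector load would read past the end of the array.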