32-byte and 64-byte elements (LL = linked-list, V = vector). Test done on a several-year-old PC with an L1, L2 cache architecture.

elements   LL-32 (x86)   V-32 (x86)   LL-64 (x86)   V-64 (x86)
100              0.011        0.019         0.012        0.024
200              0.029        0.048         0.039        0.067
400               0.09        0.149         0.092        0.224
800              0.337        0.523          0.69        1.223
1,000            0.647         0.82         1.286        2.107
2,000              4.9        4.559         6.661       10.307
3,000           13.029       11.629        16.123       24.046
4,000            26.43       21.113        29.824       42.732
5,000           42.962       33.539        46.851       67.039
10,000         187.001      136.094       195.812      263.162
15,000         418.807      307.597       443.482      615.781
20,000         764.397      545.686       811.734     1103.635
25,000        1238.982      852.856      1852.475     2266.393
30,000        1939.015      1227.36      4426.341      5635.21
35,000         3100.05     1680.613      7074.239     5269.106
40,000        4147.863     2339.612      8373.513     6123.777

32-byte and 64-byte elements (LL = linked-list, V = vector). Test done on an x64 laptop bought in 2012, with an L1, L2 and L3 cache architecture.

elements   LL-32 (x64)   V-32 (x64)   LL-64 (x64)   V-64 (x64)
100              0.009        0.011         0.023        0.013
200              0.024        0.022         0.023        0.033
400              0.081        0.069         0.079        0.099
800              0.384        0.237         0.384        0.479
1,000            0.682        0.406         0.738        0.842
2,000             3.56        1.834         3.985        3.763
3,000           10.058        4.509        10.316        8.436
4,000           21.548        8.141        20.552       15.177
5,000           40.224       12.824        38.698       24.842
10,000         288.004       51.352       299.656      111.604
15,000         803.816      122.502       815.587      271.979
20,000        1549.002      244.867       1573.75      494.343
25,000        2549.716      379.418      2651.837      785.363
30,000        3861.683      565.433      4044.229     1153.088
35,000        5462.584      768.017      5845.402     1635.971
40,000        7801.302     1012.527      8163.187     2221.651

The results below show linear (sorted) insertion of random elements, for 200, 4,000 and 10,000 elements.

The y-axis is time in milliseconds. The x-axis is the POD size (4, 8, ..., 256 bytes), so each series shows how the time for a given number of elements varies as the POD grows.

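For a given number of elements, the linked-list cost is almost constant no matter how the POD size changes. The vector, on the other hand, performs worse as the cost of copying increases for larger PODs.

The vector "catches up" as the number of elements increases, since more elements mean more cache utilization. On the x86 machine, for example, the vector is slower than the linked-list at every POD size for 200 elements, but faster for PODs up to 32 bytes at 4,000 and 10,000 elements.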
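The POD-size sweep described above could be sketched roughly as follows. Again this is an illustration under stated assumptions, not the benchmark that produced the tables: `Pod`, `InsertResult` and `time_sorted_inserts` are hypothetical names, the key is simply the first `int` of the POD, and the random number generation is included in the timed region for simplicity.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

// Hypothetical POD of exactly `Bytes` bytes; data[0] is the sort key.
template <std::size_t Bytes>
struct Pod {
    static_assert(Bytes >= sizeof(int) && Bytes % sizeof(int) == 0,
                  "Bytes must be a positive multiple of sizeof(int)");
    int data[Bytes / sizeof(int)];
};

struct InsertResult {
    double ms;         // wall-clock time for all insertions
    std::size_t size;  // number of elements stored
    bool sorted;       // sanity check: container ended up sorted by key
};

// Time `count` sorted insertions of random PODs into `Container`.
template <typename Container>
InsertResult time_sorted_inserts(std::size_t count, unsigned seed) {
    using Element = typename Container::value_type;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);  // assumed range
    Container c;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        Element e{};  // zero-initialized payload
        e.data[0] = dist(rng);
        auto it = c.begin();
        while (it != c.end() && it->data[0] < e.data[0]) {
            ++it;  // linear search for the insertion point
        }
        c.insert(it, e);  // vector shifts whole PODs; list relinks one node
    }
    const auto stop = std::chrono::steady_clock::now();

    const bool sorted = std::is_sorted(
        c.begin(), c.end(),
        [](const Element& a, const Element& b) { return a.data[0] < b.data[0]; });
    return {std::chrono::duration<double, std::milli>(stop - start).count(),
            c.size(), sorted};
}
```

Comparing, say, `time_sorted_inserts<std::vector<Pod<64>>>(4000, 1).ms` with `time_sorted_inserts<std::list<Pod<64>>>(4000, 1).ms`, and repeating for `Pod<4>` up to `Pod<256>`, gives one row per POD size. Absolute numbers will depend heavily on the compiler, optimization flags and hardware.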

It is interesting to see the performance difference between the x86 and the x64. The x64, with three levels of cache (L1, L2, L3), handles the vector much more effectively than the x86 with two levels of cache. The linked-list performance is also improved on the x64, but not at all to the same extent.

x86, 200 elements (time in milliseconds)

Bytes   Vector x86   Linked-List x86
4            0.047             0.037
8            0.043             0.039
16           0.036             0.031
32           0.048             0.029
64           0.067             0.039
128          0.118             0.035
256          0.298              0.04

x86, 4,000 elements (time in milliseconds)

Bytes   Vector x86   Linked-List x86
4            9.031            17.107
8            8.175            25.805
16          10.542            21.681
32          21.113             26.43
64          42.732            29.824
128         74.638            30.472
256        143.063            29.771

x86, 10,000 elements (time in milliseconds)

Bytes   Vector x86   Linked-List x86
4           55.962           193.583
8           60.629           217.488
16          75.653           176.983
32         136.094           187.001
64         263.162           195.812
128        482.345           196.886
256       1513.701           225.866

x64, 200 elements (time in milliseconds)

Bytes   Vector x64   Linked-List x64
4             0.05              0.07
8            0.018             0.024
16            0.02             0.026
32           0.022             0.024
64           0.033             0.023
128          0.051             0.025
256          0.098             0.027

x64, 4,000 elements (time in milliseconds)

Bytes   Vector x64   Linked-List x64
4            3.156            14.028
8              3.5            13.658
16           4.694            16.264
32           8.141            21.548
64          15.177            20.552
128         26.906            23.911
256         50.611             24.21

x64, 10,000 elements (time in milliseconds)

Bytes   Vector x64   Linked-List x64
4           19.096           143.862
8           23.642           130.475
16          31.254           214.536
32          51.352           288.004
64         111.604           299.656
128        192.058           307.171
256        387.333           325.666
