BUSTER has been OpenMP-enabled for some time now. But what is the optimal number of processors to use?
The number of "threads" that a BUSTER job will use can be controlled with the -nthreads command-line option. Up to the last main release, the default behaviour of BUSTER, if run without -nthreads specified, was to use ALL available threads.
This default may well be sensible on a standard "desktop" machine, but is not necessarily optimal or desirable on more powerful, multi-CPU machines.
The default behaviour of BUSTER was changed in the snapshot release from 6th December 2012 as follows:
Change to the number of threads that refine will use by default (nthreads). This is now set by finding ncpu (the number of CPUs that the operating system reports). The value of nthreads is set to min(ncpu,4) if ncpu < 24, 6 if 23 < ncpu < 63, 8 otherwise. The default can be overridden by the environment variable OMP_NUM_THREADS or by the refine command-line argument -nthreads. See the BUSTER documentation at
$BDG_home/docs/autobuster/manual/autoBUSTER4.html#refine_default_nthreads for full details.
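The default rule quoted above can be written out as a short function. This is only a sketch of the rule as stated in the release note, not BUSTER's own code; the name `default_nthreads` is hypothetical, and the sketch ignores any override from OMP_NUM_THREADS or -nthreads.

```python
def default_nthreads(ncpu: int) -> int:
    """Default thread count following the rule quoted in the release note.

    ncpu is the number of CPUs reported by the operating system.
    """
    if ncpu < 24:
        return min(ncpu, 4)
    if ncpu < 63:  # i.e. 23 < ncpu < 63
        return 6
    return 8  # "otherwise"; as the rule is written this includes ncpu == 63


# A few illustrative machine sizes, including the 24-thread test machine.
for ncpu in (2, 8, 24, 64):
    print(ncpu, default_nthreads(ncpu))
```

For the 24-thread machine used in the tests below, this gives 6.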
The following tests were carried out to investigate the impact of the number of threads used on a wide variety of structures.
15 different structures were selected, covering a range of sizes (small, moderate, medium and large) and resolutions (high, moderate and medium), with and without ligands, and including both protein and protein/nucleic acid structures. All the tests were run on a high-spec desktop machine with two 6-core Xeon CPUs (24 threads in total).
The selected structures were:
structure | size | resolution |
2uv4 | small (152 res) | high (1.3A) |
4ec5 | medium (492 res) | moderate (2.2A) |
3trj | large (804 res) | medium (2.8A) |
4baf | small (129 res) | high (1.5A) |
4g4b | small (129 res) | moderate (2.1A) |
4ah1 | small (123 res) | medium (2.8A) |
4e6u | moderate (265 res) | high (1.4A) |
4afa | moderate (262 res) | moderate (2.1A) |
4fne | moderate (254 res) | medium (2.8A) |
4h2g | medium (546 res) | high (1.6A) |
4dlg | medium (540 res) | moderate (1.9A) |
3vue | medium (536 res) | medium (2.7A) |
2fhf | large (1083 res) | high (1.7A) |
2po4 | large (1104 res) | moderate (2.0A) |
1gku | large (1054 res) | medium (2.7A) |
In each case, the following BUSTER refinement jobs were run:
refine -p .. -m .. -nthreads x
where x = 1, 4, 6, 8, 12 and 24
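A timing run of this kind could be scripted roughly as follows. This is a sketch under assumptions, not the script used for these tests: the helper names are hypothetical, the two arguments elided as `..` above are passed through unchanged as `p_arg` and `m_arg`, and wall-clock time is assumed to be the quantity of interest.

```python
import subprocess
import time

# Thread counts used in the tests described above.
THREAD_COUNTS = (1, 4, 6, 8, 12, 24)


def refine_command(p_arg, m_arg, nthreads):
    """Build the refine command line for one thread count."""
    return ["refine", "-p", p_arg, "-m", m_arg, "-nthreads", str(nthreads)]


def time_refine(p_arg, m_arg, thread_counts=THREAD_COUNTS):
    """Run refine once per thread count; return wall-clock seconds per count."""
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        # check=True raises CalledProcessError if refine exits with an error.
        subprocess.run(refine_command(p_arg, m_arg, n), check=True)
        timings[n] = time.perf_counter() - start
    return timings
```

Calling `time_refine` with the same two arguments for each structure would give one column of a timing table; running the jobs one after another, rather than concurrently, keeps them from competing for the same cores.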
Note that the "default" number of threads on this machine is 6.
1] Execution time (in seconds)
threads | 2uv4 | 4ec5 | 3trj | 4baf | 4g4b | 4ahi | 4e6u | 4afa | 4h2g | 4dlg | 3vue | 2fhf | 2po4 | 1gku |
1 | 592 | 1820 | 2362 | 465 | 269 | 445 | 1434 | 639 | 2017 | 1803 | 2219 | 2640 | 2021 | 2785 |
4 | 357 | 717 | 1070 | 271 | 168 | 216 | 1004 | 351 | 1145 | 925 | 1343 | 1617 | 1008 | 1163 |
6 | 323 | 578 | 900 | 253 | 157 | 193 | 942 | 320 | 1051 | 826 | 1240 | 1419 | 946 | 984 |
8 | 311 | 595 | 856 | 244 | 158 | 176 | 918 | 307 | 1011 | 731 | 1151 | 1212 | 871 | 899 |
12 | 303 | 515 | 842 | 244 | 154 | 174 | 921 | 311 | 988 | 718 | 1141 | 1145 | 822 | 864 |
24 | 312 | 493 | 822 | 249 | 152 | 173 | 1000 | 311 | 1052 | 730 | 1146 | 1170 | 869 | 786 |
2] Percentage speed increase with threads used
threads | 2uv4 | 4ec5 | 3trj | 4baf | 4g4b | 4ahi | 4e6u | 4afa | 4h2g | 4dlg | 3vue | 2fhf | 2po4 | 1gku |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 81 | 83 | 84 | 88 | 86 | 84 | 83 | 87 | 85 | 81 | 83 | 68 | 84 | 81 |
6 | 93 | 94 | 95 | 96 | 96 | 93 | 95 | 96 | 94 | 95 | 91 | 82 | 90 | 90 |
8 | 97 | 92 | 98 | 100 | 95 | 99 | 100 | 100 | 98 | 99 | 99 | 96 | 96 | 94 |
12 | 100 | 98 | 99 | 100 | 98 | 99 | 99 | 99 | 100 | 100 | 100 | 100 | 100 | 96 |
24 | 97 | 100 | 100 | 98 | 100 | 100 | 84 | 99 | 94 | 99 | 99 | 98 | 96 | 100 |
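The percentages in table 2 appear to be consistent with expressing each time saving as a fraction of the largest saving observed for that structure, i.e. 100 x (t1 - tn) / (t1 - tmin), where t1 is the single-thread time, tn the time with n threads and tmin the shortest time in that column. This reading is inferred from the numbers and is not stated in the text; the function name below is hypothetical. A minimal sketch, using the 2uv4 column of table 1:

```python
def percent_of_max_speedup(times):
    """Map thread count -> percentage of the largest observed time saving.

    times maps thread count -> run time in seconds and must include 1 thread.
    """
    t1 = times[1]
    gain = t1 - min(times.values())
    if gain <= 0:
        raise ValueError("no run was faster than the single-thread run")
    return {n: round(100 * (t1 - t) / gain) for n, t in times.items()}


# Execution times (seconds) for 2uv4, taken from table 1.
times_2uv4 = {1: 592, 4: 357, 6: 323, 8: 311, 12: 303, 24: 312}
print(percent_of_max_speedup(times_2uv4))
# {1: 0, 4: 81, 6: 93, 8: 97, 12: 100, 24: 97}
```

On this reading, 100 marks the fastest run for a structure rather than perfect parallel scaling.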
The results of the tests indicate that in most cases 6 threads (the new default for BUSTER on a machine with more than 23 and fewer than 63 CPUs) is close to optimal: in table 2 it reaches between 82% and 96% for every structure tested, and there is little if any speed increase afforded by increasing the number of threads further.
For large structures there may be some benefit in increasing the number of threads used, but only to 8.