What is the optimal number of processors to use in BUSTER?

BUSTER has been OpenMP-enabled for some time now, but what is the optimal number of processors to use?

The number of threads that a BUSTER job uses can be controlled with the -nthreads command-line option. Up to the last main release, BUSTER run without -nthreads specified used ALL available threads by default.

This default may well be sensible on a standard desktop machine, but it is not necessarily optimal or desirable on more powerful multi-CPU machines.

The default behaviour of BUSTER was changed in the snapshot release from 6th December 2012 as follows:

Change to the number of threads that refine will use by default (nthreads). This is now set by finding ncpu (the number of CPUs that the operating system reports). The value of nthreads is set to min(ncpu,4) if ncpu < 24, 6 if 23 < ncpu < 63, and 8 otherwise. The default can be overridden by the environment variable OMP_NUM_THREADS or by the refine command-line argument -nthreads. See the BUSTER documentation

$BDG_home/docs/autobuster/manual/autoBUSTER4.html#refine_default_nthreads
for full details.
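
The default rule above can be written out as a short function. This is a minimal Python sketch, not BUSTER's own code: the function names are hypothetical, os.cpu_count() stands in for however refine asks the operating system for ncpu, and the precedence of -nthreads over OMP_NUM_THREADS is an assumption, since the release note only says that either one overrides the default.

```python
import os
from typing import Mapping, Optional


def default_nthreads(ncpu: int) -> int:
    """Default thread count from ncpu, following the rule in the release note."""
    if ncpu < 24:
        return min(ncpu, 4)
    if ncpu < 63:  # 23 < ncpu < 63
        return 6
    return 8  # ncpu >= 63


def effective_nthreads(cli_nthreads: Optional[int] = None,
                       environ: Optional[Mapping[str, str]] = None,
                       ncpu: Optional[int] = None) -> int:
    """Thread count a run would use (hypothetical helper).

    ASSUMPTION: a value given with -nthreads wins over OMP_NUM_THREADS.
    """
    if cli_nthreads is not None:
        return cli_nthreads
    env = os.environ if environ is None else environ
    omp = env.get("OMP_NUM_THREADS", "").strip()
    if omp:
        # OpenMP allows a comma-separated list (one value per nesting level);
        # the first entry applies to the outermost parallel region.
        return int(omp.split(",")[0])
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # logical CPUs as reported by the OS
    return default_nthreads(ncpu)


if __name__ == "__main__":
    for n in (2, 8, 24, 48, 64):
        print(n, default_nthreads(n))
```

Run as a script, this prints 2, 4, 6, 6 and 8 for the five ncpu values; a machine reporting 24 CPUs therefore gets 6 threads by default.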

The following tests were carried out to investigate how the number of threads affects BUSTER run time across a wide variety of structures.

Fifteen structures were selected to cover a range of sizes (small, moderate, medium and large) and resolutions (high, moderate and medium), with and without ligands, including both protein-only and protein/nucleic acid structures. All the tests were run on a high-spec desktop machine with two 6-core Xeon CPUs, i.e. 12 physical cores presenting 24 hardware threads in total.

The selected structures were:

structure   size                resolution
2uv4        small (152 res)     high (1.3 Å)
4ec5        medium (492 res)    moderate (2.2 Å)
3trj        large (804 res)     medium (2.8 Å)
4baf        small (129 res)     high (1.5 Å)
4g4b        small (129 res)     moderate (2.1 Å)
4ah1        small (123 res)     medium (2.8 Å)
4e6u        moderate (265 res)  high (1.4 Å)
4afa        moderate (262 res)  moderate (2.1 Å)
4fne        moderate (254 res)  medium (2.8 Å)
4h2g        medium (546 res)    high (1.6 Å)
4dlg        medium (540 res)    moderate (1.9 Å)
3vue        medium (536 res)    medium (2.7 Å)
2fhf        large (1083 res)    high (1.7 Å)
2po4        large (1104 res)    moderate (2.0 Å)
1gku        large (1054 res)    medium (2.7 Å)

In each case, the following BUSTER refinement jobs were run:

refine -p .. -m .. -nthreads x

where x (the number of threads) = 1, 4, 6, 8, 12 and 24.

Note that the default number of threads on this machine is 6 (ncpu = 24, which falls in the 23 < ncpu < 63 range).
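
A sweep like this is easy to script. The Python sketch below is one way to do it and is not taken from the original tests: it uses only the -p, -m and -nthreads options shown above, the file names are placeholders supplied by the caller, and running each job in its own working directory (so that outputs from different thread counts do not overwrite each other) is an assumption about how to organise the runs.

```python
import subprocess
import time
from pathlib import Path
from typing import Dict, List

THREAD_COUNTS = (1, 4, 6, 8, 12, 24)  # the values of x used in these tests


def refine_command(pdb_file: str, mtz_file: str, nthreads: int) -> List[str]:
    """Argument list for 'refine -p .. -m .. -nthreads x'."""
    return ["refine", "-p", pdb_file, "-m", mtz_file, "-nthreads", str(nthreads)]


def time_refine(pdb_file: str, mtz_file: str, nthreads: int, workdir: Path) -> float:
    """Run one refine job inside workdir and return its wall-clock time in seconds."""
    workdir.mkdir(parents=True, exist_ok=True)
    # Absolute paths, because the job runs with workdir as its current directory.
    cmd = refine_command(str(Path(pdb_file).resolve()),
                         str(Path(mtz_file).resolve()), nthreads)
    start = time.perf_counter()
    subprocess.run(cmd, cwd=workdir, check=True)
    return time.perf_counter() - start


def sweep(pdb_file: str, mtz_file: str, root: str = "nthreads_sweep") -> Dict[int, float]:
    """Time one structure at every thread count in THREAD_COUNTS."""
    return {n: time_refine(pdb_file, mtz_file, n, Path(root) / f"nthreads_{n}")
            for n in THREAD_COUNTS}
```

For example, sweep("model.pdb", "data.mtz") (hypothetical file names) would return a dictionary mapping each thread count to elapsed seconds, i.e. one column of an execution-time table like the one below.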


Results

1] Execution time (in seconds)

threads  2uv4  4ec5  3trj  4baf  4g4b  4ahi  4e6u  4afa  4h2g  4dlg  3vue  2fhf  2po4  1gku
1         592  1820  2362   465   269   445  1434   639  2017  1803  2219  2640  2021  2785
4         357   717  1070   271   168   216  1004   351  1145   925  1343  1617  1008  1163
6         323   578   900   253   157   193   942   320  1051   826  1240  1419   946   984
8         311   595   856   244   158   176   918   307  1011   731  1151  1212   871   899
12        303   515   842   244   154   174   921   311   988   718  1141  1145   822   864
24        312   493   822   249   152   173  1000   311  1052   730  1146  1170   869   786

[Figure: speed-times.png]

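2] Speed-up achieved with each number of threads, as a percentage of the largest speed-up observed for that structure (0 = single-thread time, 100 = fastest time recorded)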

threads  2uv4  4ec5  3trj  4baf  4g4b  4ahi  4e6u  4afa  4h2g  4dlg  3vue  2fhf  2po4  1gku
1           0     0     0     0     0     0     0     0     0     0     0     0     0     0
4          81    83    84    88    86    84    83    87    85    81    83    68    84    81
6          93    94    95    96    96    93    95    96    94    95    91    82    90    90
8          97    92    98   100    95    99   100   100    98    99    99    96    96    94
12        100    98    99   100    98    99    99    99   100   100   100   100   100    96
24         97   100   100    98   100   100    84    99    94    99    99    98    96   100

[Figure: speed-percent.png]
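
The percentages in table 2] appear to be the time saved relative to a single thread, expressed as a percentage of the largest time saving seen for that structure at any thread count, i.e. 100 × (t1 − tn) / (t1 − tmin), where t1 is the single-thread time, tn the time with n threads and tmin the fastest time recorded. That formula is inferred from the numbers rather than documented: it reproduces almost every cell, but a few do not match (for example 4dlg at 6 threads comes out as 90 rather than 95). The Python sketch below recomputes the table from the execution times in table 1] under that assumption.

```python
from typing import Dict, List

STRUCTURES = ["2uv4", "4ec5", "3trj", "4baf", "4g4b", "4ahi", "4e6u",
              "4afa", "4h2g", "4dlg", "3vue", "2fhf", "2po4", "1gku"]

# Execution times in seconds from table 1]: threads -> one value per structure,
# in the same order as STRUCTURES.
TIMES: Dict[int, List[int]] = {
    1:  [592, 1820, 2362, 465, 269, 445, 1434, 639, 2017, 1803, 2219, 2640, 2021, 2785],
    4:  [357, 717, 1070, 271, 168, 216, 1004, 351, 1145, 925, 1343, 1617, 1008, 1163],
    6:  [323, 578, 900, 253, 157, 193, 942, 320, 1051, 826, 1240, 1419, 946, 984],
    8:  [311, 595, 856, 244, 158, 176, 918, 307, 1011, 731, 1151, 1212, 871, 899],
    12: [303, 515, 842, 244, 154, 174, 921, 311, 988, 718, 1141, 1145, 822, 864],
    24: [312, 493, 822, 249, 152, 173, 1000, 311, 1052, 730, 1146, 1170, 869, 786],
}


def percent_of_max_speedup(times: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """100 * (t1 - tn) / (t1 - tmin), rounded, for each thread count n and structure.

    Assumes every structure runs faster with some thread count than with one
    (otherwise t1 - tmin would be zero).
    """
    n_structures = len(times[1])
    result: Dict[int, List[int]] = {n: [] for n in times}
    for i in range(n_structures):
        t1 = times[1][i]
        tmin = min(row[i] for row in times.values())  # fastest run for structure i
        for n, row in times.items():
            result[n].append(round(100 * (t1 - row[i]) / (t1 - tmin)))
    return result


if __name__ == "__main__":
    percent = percent_of_max_speedup(TIMES)
    print("threads " + " ".join(f"{s:>5}" for s in STRUCTURES))
    for n, row in percent.items():
        print(f"{n:<7} " + " ".join(f"{p:>5}" for p in row))
```

Normalising to the best observed time, rather than reporting raw speed-up factors, puts small and large structures on the same 0 to 100 scale, which is what makes the columns of table 2] comparable.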

The results indicate that, in most cases, 6 threads (the new BUSTER default on a machine reporting more than 23 and fewer than 63 CPUs) is optimal: at 6 threads every structure except 2fhf scored at least 90 in table 2], and increasing the number of threads beyond 6 gives little, if any, further speed increase.

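For large structures there may be some benefit in increasing the number of threads, but only up to 8: at 8 threads the four large structures (3trj, 2fhf, 2po4 and 1gku) scored 94 to 98 in table 2], compared with 82 to 95 at 6 threads.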
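
One way to make "little if any speed increase" concrete is to find, for each structure, the smallest thread count that already delivers a chosen fraction of the largest observed time saving. The Python sketch below does this for the four large structures, using the times from table 1]; the 90% cut-off and the function name are arbitrary, hypothetical choices for illustration, not part of BUSTER or of the original analysis.

```python
from typing import Dict, Sequence

THREADS = (1, 4, 6, 8, 12, 24)

# Execution times in seconds from table 1] for the structures classed as large,
# in the same order as THREADS.
LARGE: Dict[str, Sequence[int]] = {
    "3trj": (2362, 1070, 900, 856, 842, 822),
    "2fhf": (2640, 1617, 1419, 1212, 1145, 1170),
    "2po4": (2021, 1008, 946, 871, 822, 869),
    "1gku": (2785, 1163, 984, 899, 864, 786),
}


def smallest_sufficient_threads(times: Sequence[int], threshold_pct: int = 90) -> int:
    """Smallest thread count whose time saving over 1 thread is at least
    threshold_pct percent of the largest saving seen at any thread count."""
    t1 = times[0]
    best_saving = t1 - min(times)
    for n, tn in zip(THREADS, times):
        # Integer arithmetic, so values close to the cut-off are compared exactly.
        if 100 * (t1 - tn) >= threshold_pct * best_saving:
            return n
    return THREADS[-1]  # only reached if threshold_pct > 100


if __name__ == "__main__":
    for name, times in LARGE.items():
        print(name, smallest_sufficient_threads(times))
```

With the 90% cut-off this picks 6 threads for 3trj and 1gku and 8 threads for 2fhf and 2po4, so none of the large structures needs more than 8 threads to get within 10% of its best observed time saving.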