VMD 1.9.3 AVX-512 Acceleration Notes
Figure: ... clusters and supercomputers at NSF/TACC Stampede-2 (left) and DOE/ALCF Theta (right).
VMD 1.9.3 has been adapted to improve performance on the latest Xeon Phi processors. It takes advantage of vector instructions in key computationally demanding analysis algorithms, and it uses Intel-hardware-optimized frameworks and libraries, such as the OpenSWR rasterizer and the OSPRay ray tracing engine, to achieve much better performance than would otherwise be possible with conventional portable software.
The immediate target for this development effort has been to support the DOE Exascale Early Science Program on the ALCF Theta system and the new NSF/TACC Stampede-2 system, both of which are based on Xeon Phi Knights Landing (KNL) processors.
Some very early performance results on KNL-based systems are presented here.
Xeon Phi KNL AVX-512 System Configuration Notes
AVX-512 instruction set variants: At present, the VMD builds for Xeon Phi will not run on any other Intel CPUs, as VMD doesn't (yet) contain sufficient CPU feature detection logic to permit the same binary to target many different CPU types. The current AVX-512 implementation in VMD specifically targets the Xeon Phi processors that also implement the scientific computing instruction set extensions known as AVX-512ER, AVX-512CD, and AVX-512PF. We expect to add support for future Intel Xeon CPUs that support only the AVX-512F "foundation" instructions in a future version of VMD when those processors become available.
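To make the distinction between the instruction set variants concrete, the following is a minimal sketch of how a launcher script might check which AVX-512 subsets a Linux host advertises before starting a Xeon Phi build. It is not VMD's own detection logic. The function names `classify_avx512` and `read_cpu_flags` and the tier labels are hypothetical; the flag names (`avx512f`, `avx512cd`, `avx512er`, `avx512pf`) are the ones Linux reports in `/proc/cpuinfo`.

```python
# Hypothetical helper, not part of VMD: classify a CPU by the AVX-512
# subsets it advertises. Flag names follow the Linux /proc/cpuinfo convention.

KNL_REQUIRED = {"avx512f", "avx512cd", "avx512er", "avx512pf"}


def classify_avx512(flags):
    """Return 'knl', 'foundation', or 'none' for a collection of CPU flags."""
    flags = set(flags)
    if KNL_REQUIRED <= flags:
        return "knl"          # all four subsets the Xeon Phi build targets
    if "avx512f" in flags:
        return "foundation"   # has AVX-512F but lacks at least one of CD/ER/PF
    return "none"             # no AVX-512 support advertised


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag list of the first CPU entry; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags") and ":" in line:
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()
```

Calling `classify_avx512(read_cpu_flags())` on a compute node would be expected to return `"knl"` only when all four subsets are present, which is the case the current Xeon Phi builds assume.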
KNL MCDRAM Configuration: Several of the upcoming clusters and supercomputers based on self-hosted Xeon Phi compute nodes support on-the-fly customization of the on-chip MCDRAM hardware configuration, specified at the time of job launch. On deskside workstations these configuration changes are made through the remote system management interfaces, or through the BIOS during system power-on self-test.
We expect that VMD will obtain the best overall performance when the system is configured with MCDRAM set to "Cache" mode, where it behaves as a 16GB direct-mapped cache. We currently suggest setting the MCDRAM cluster mode to "All-to-All" for typical general-purpose usage. See the Intel Hot Chips presentation for further information on the behaviour of MCDRAM cache and clustering modes.
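As an illustration of what "direct-mapped" implies, the sketch below models a cache in which every memory line can live in exactly one slot, so two addresses that are a multiple of the cache size apart evict each other. This is a simplified model, not a description of the actual KNL hardware: the 64-byte line size, the treatment of 16GB as 16 GiB, and the function names are all assumptions made for the example.

```python
# Simplified model of a direct-mapped cache. The sizes are assumptions for
# illustration: 16 GiB capacity (from the text) and 64-byte lines (assumed).

CACHE_BYTES = 16 * 2**30                 # MCDRAM capacity used as cache
LINE_BYTES = 64                          # assumed cache line size
NUM_SLOTS = CACHE_BYTES // LINE_BYTES    # one line per slot, no associativity


def cache_slot(addr):
    """Map a byte address to the single slot it can occupy."""
    return (addr // LINE_BYTES) % NUM_SLOTS


def conflicts(addr_a, addr_b):
    """True if two addresses on different lines compete for the same slot."""
    different_lines = addr_a // LINE_BYTES != addr_b // LINE_BYTES
    return different_lines and cache_slot(addr_a) == cache_slot(addr_b)
```

Under these assumptions, `conflicts(0, 16 * 2**30)` is true while `conflicts(0, 64)` is false, which is why a working set that fits within the cache capacity tends to behave best in this mode.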
Xeon Phi with GPUs: In principle the VMD builds for Xeon Phi can support GPU accelerators, e.g., in a KNL-self-hosted machine; however, we have not yet tried such a configuration, and at present it seems to be an unlikely scenario in the field.
Example VMD Xeon Phi KNL AVX-512 Startup Messages:
Info) VMD for LINUXAMD64, version 1.9.3 (November 29, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to email@example.com
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 256 CPUs detected.
Info) CPU features: SSE2 AVX AVX2 FMA KNL:AVX-512F+CD+ER+PF
Info) Free system memory: 97GB (95%)
Info) No CUDA accelerator devices available.
Info) Dynamically loaded 2 plugins in directory:
Info) /usr/local/lib/vmd/plugins/LINUXAMD64/molfile
vmd >
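When running batch jobs, it can be useful to confirm from captured startup output that VMD reported the expected AVX-512 features. The following is a rough sketch that extracts the CPU count and feature list from text like the messages above. The function names are hypothetical, and the message format is inferred only from this single sample, so other VMD versions may print something different.

```python
import re


def parse_startup(text):
    """Extract the CPU count and CPU feature tokens from VMD startup output."""
    # Messages are prefixed with "Info)". Splitting on the prefix works whether
    # the output has one message per line or is run together on a single line.
    messages = [m.strip() for m in text.split("Info)") if m.strip()]
    cpus = None
    features = []
    for msg in messages:
        match = re.search(r"(\d+) CPUs detected", msg)
        if match:
            cpus = int(match.group(1))
        if msg.startswith("CPU features:"):
            features = msg[len("CPU features:"):].split()
    return cpus, features


def has_knl_avx512(features):
    """True if a 'KNL:AVX-512F+CD+ER+PF'-style token lists all four subsets."""
    prefix = "KNL:AVX-512"
    for token in features:
        if token.startswith(prefix):
            subsets = set(token[len(prefix):].split("+"))
            return {"F", "CD", "ER", "PF"} <= subsets
    return False
```

A job script could pass the captured log to `parse_startup` and abort early if `has_knl_avx512` returns false for the reported features.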