+--------------------------------------------------------------------+
|                                                                    |
|                      NAMD 2.5b2 Release Notes                      |
|                                                                    |
+--------------------------------------------------------------------+

This file includes directions for running NAMD on various platforms,
tips on limiting memory usage and improving parallel scaling, an
explanation of the automated usage reporting mechanism, advice for
dealing with endian issues when switching platforms, a procedure for
reporting bugs, and detailed compilation instructions.

----------------------------------------------------------------------

New Features

--- Improved Parallel Scaling and Serial Performance ---

Load balancer and communication library improvements that allow
NAMD to scale to 1000 or more processors on PSC's Lemieux are
included in this release.  For more modest Linux clusters we now
provide TCP versions that outperform the traditional UDP versions on
gigabit ethernet.  All released Linux binaries are built with the
Intel compiler for better performance on Pentium 4 and Xeon
processors with no penalty for Pentium III or Athlon processors.
Finally,
the inner loop has been optimized to incorporate pairlists that
are saved between steps and automatically adjusted.  Pairlists
can be disabled to save memory via the pairlistMinProcs option.

--- Trajectory Reading and Interaction Energy Analysis ---

A new "coorfile" command allows Tcl scripts to read coordinates
from DCD files, allowing energies and forces to be evaluated for
a saved trajectory.  This is most usefully combined with the new
pair interaction feature, allowing the isolation of forces between
two specified groups of atoms, or within a single group.
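
For example, a configuration file fragment along the following lines
could step through a saved trajectory and compute the interaction
energy between two flagged groups of atoms.  This is only a sketch:
marked.pdb and mysim.dcd are hypothetical input files, and the exact
option defaults and the return convention of "coorfile read" should
be checked against the User's Guide.

  pairInteraction       on
  # hypothetical PDB with group flags (1 and 2) in the B column:
  pairInteractionFile   marked.pdb
  pairInteractionCol    B
  pairInteractionGroup1 1
  pairInteractionGroup2 2

  # Re-evaluate energies for each frame of the saved trajectory,
  # assuming "coorfile read" returns 0 until the end of the file:
  set ts 0
  coorfile open dcd mysim.dcd
  while { ![coorfile read] } {
    firstTimestep $ts
    run 0
    incr ts
  }
  coorfile close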

--- Improved Constant Pressure Simulation and Coordinate Wrapping ---

Average pressure is calculated for steps between energy outputs.
Berendsen method uses average rather than instantaneous pressure.
Pressure contributions due to steering forces are handled consistently.
Ratio of first two basis vectors can be fixed for flexible cells.
Any connected fragment can be wrapped to the periodic cell on
output, rather than only wrapping water molecules.  Coordinates
can be wrapped to the true nearest image for hexagonal or similar
highly faceted periodic cells.
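
The related configuration options are sketched below (names as given
in the User's Guide; verify them against your NAMD version):

  useFlexibleCell  on
  # keep the ratio of the first two cell basis vectors fixed:
  useConstantRatio on
  # wrap all connected fragments on output, not only water molecules:
  wrapAll          on
  # wrap to the true nearest image for hexagonal or similar cells:
  wrapNearest      on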

----------------------------------------------------------------------

Running NAMD

NAMD runs on a variety of serial and parallel platforms.  While it is
trivial to launch a serial program, a parallel program depends on a
platform-specific library such as MPI to launch copies of itself on
other nodes and to provide access to a high performance network such
as Myrinet if one is available.

For typical workstations (Windows, Linux, Mac OS X, or other Unix)
with only ethernet networking (100 Megabit or Gigabit), NAMD uses the
Charm++ native communications layer and the program charmrun to launch
namd2 processes for parallel runs (either exclusively on the local
machine with the ++local option or on other hosts as specified by a
nodelist file).  The namd2 binaries for these platforms can also be
run directly (known as standalone mode) for single process runs.

For workstation clusters and other massively parallel machines with
special high-performance networking, NAMD uses the system-provided
MPI library (with a few exceptions) and standard system tools such as
mpirun are used to launch jobs.  Since MPI libraries are very often
incompatible between versions, you will likely need to recompile NAMD
and its underlying Charm++ libraries to use these machines in parallel
(the provided non-MPI binaries should still work for serial runs).
The provided charmrun program for these platforms is only a script
that attempts to translate charmrun options into mpirun options, but
due to the diversity of MPI libraries it often fails to work.

-- Individual Windows, Linux, Mac OS X, or Other Unix Workstations --

Individual workstations use the same version of NAMD as workstation
networks, but running NAMD is much easier.  If your machine has only
one processor you can run the namd2 binary directly:

  namd2 <configfile>

For multiprocessor workstations, Windows and Solaris released binaries
are based on SMP versions of Charm++ that can run multiple threads.
For best performance use one thread per processor with the +p option:

  namd2 +p<procs> <configfile>

Since the SMP versions of NAMD are relatively new, there may be bugs
that are only present when running multiple threads.  You may want to
try running with charmrun (see below) if you experience crashes.

For other multiprocessor workstations the included charmrun program is
needed to run multiple namd2 processes.  The ++local option is also
required to specify that only the local machine is being used:

  charmrun namd2 ++local +p<procs> <configfile>

You may need to specify the full path to the namd2 binary.

-- Linux, Mac OS X, or Other Unix Workstation Networks --

The same binaries used for individual workstations as described above
can be used with charmrun to run in parallel on a workstation network.
The only difference is that you must provide a "nodelist" file listing
the machines where namd2 processes should run, for example:

  group main
  host brutus
  host romeo

The "group main" line defines the default machine list.  Hosts brutus
and romeo are the two machines on which to run the simulation.  Note
that charmrun may run on one of those machines, or charmrun may run
on a third machine.  All machines used for a simulation must be of the
same type and have access to the same namd2 binary.

By default, the "rsh" command ("remsh" on HPUX) is used to start namd2
on each node specified in the nodelist file.  You can change this via
the CONV_RSH environment variable, i.e., to use ssh instead of rsh run
"setenv CONV_RSH ssh" or add it to your login or batch script.  You
must be able to connect to each node via rsh/ssh without typing your
password; this can be accomplished via a .rhosts files in your home
directory, by an /etc/hosts.equiv file installed by your sysadmin, or
by a .ssh/authorized_keys file in your home directory.  You should
confirm that you can run "ssh hostname pwd" (or "rsh hostname pwd")
without typing a password before running NAMD.  Contact your local
sysadmin if you have difficulty setting this up.  If you are unable to
use rsh or ssh, then add "setenv CONV_DAEMON" to your script and run 
charmd (or charmd_faceless, which produces a log file) on every node.
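
For example, with csh or tcsh you might verify the setup as follows
(brutus is one of the hosts from the example nodelist above):

  setenv CONV_RSH ssh
  ssh brutus pwd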

You should now be able to try running NAMD as:

  charmrun namd2 +p<procs> <configfile>

If this fails or just hangs, try adding the ++verbose option to see
more details of the startup process.  You may need to specify the full
path to the namd2 binary.  Charmrun will start the number of processes
specified by the +p option, cycling through the hosts in the nodelist
file as many times as necessary.  You may list multiprocessor machines
multiple times in the nodelist file, once for each processor.

You may specify the nodelist file with the "++nodelist" option and the
group (which defaults to "main") with the "++nodegroup" option.  If
you do not use "++nodelist" charmrun will first look for "nodelist"
in your current directory and then ".nodelist" in your home directory.
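
For example (mycluster.nodelist is a hypothetical file name):

  charmrun namd2 +p8 ++nodelist ~/mycluster.nodelist \
      ++nodegroup main <configfile>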

Some automounters use a temporary mount directory which is prepended
to the path returned by the pwd command.  To run on multiple machines
you must add a "++pathfix" option to your nodelist file.  For example:

  group main ++pathfix /tmp_mnt /
  host alpha1
  host alpha2

There are many other options to charmrun and for the nodelist file.
These are documented in the Charm++ Installation and Usage Manual
available at http://charm.cs.uiuc.edu/manuals/ and a list of available
charmrun options is available by running charmrun without arguments.

If your workstation cluster is controlled by a queueing system you
will need to build a nodelist file in your job script.  For example, if
your queueing system provides a $HOST_FILE environment variable:

  set NODES = `cat $HOST_FILE`
  set NODELIST = $TMPDIR/namd2.nodelist
  echo group main >! $NODELIST
  foreach node ( $NODES )
    echo host $node >> $NODELIST
  end
  @ NUMPROCS = 2 * $#NODES
  charmrun namd2 +p$NUMPROCS ++nodelist $NODELIST <configfile>

Note that $NUMPROCS is twice the number of nodes in this example.
This is the case for dual-processor machines.  For single-processor
machines you would not multiply $#NODES by two.

Note that these example scripts and the setenv command are for the csh
or tcsh shells.  They must be translated to work with sh or bash.
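
For example, a bash version of the queueing script above might look
like this (a sketch only; the array syntax is bash-specific):

  NODES=( $(cat $HOST_FILE) )
  NODELIST=$TMPDIR/namd2.nodelist
  echo group main > $NODELIST
  for node in "${NODES[@]}"; do
    echo host $node >> $NODELIST
  done
  NUMPROCS=$((2 * ${#NODES[@]}))
  export CONV_RSH=ssh     # bash equivalent of "setenv CONV_RSH ssh"
  charmrun namd2 +p$NUMPROCS ++nodelist $NODELIST <configfile>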

-- Windows Workstation Networks --

Windows is the same as other workstation networks described above,
except that rsh is not available on this platform.  Instead, you must
run the provided daemon (charmd.exe) on every node listed in the
nodelist file.  Using charmd_faceless rather than charmd will eliminate
consoles for the daemon and node processes.

-- BProc-Based Clusters (Scyld and Clustermatic) --

Scyld and Clustermatic replace rsh and other methods of launching jobs
via a distributed process space.  There is no need for a nodelist file
or any special daemons, although special Scyld or Clustermatic versions
of charmrun and namd2 are required.  In order to allow access to files,
the first NAMD process must be on the master node of the cluster.
Launch jobs from the master node of the cluster via the command:

  charmrun namd2 +p<procs> <configfile>

For best performance, run a single NAMD job on all available nodes and
never run multiple NAMD jobs at the same time.  You should probably
determine the number of processors via a script, for example on Scyld:

  @ NUMPROCS = `bpstat -u` + 1
  charmrun namd2 +p$NUMPROCS <configfile>

You may safely suspend and resume a running NAMD job on these clusters
using kill -STOP and kill -CONT on the process group.  Queueing systems
typically provide this functionality, allowing you to suspend a running
job to allow a higher priority job to run immediately.
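
For example, if the job's process group ID is 12345 (a hypothetical
value, e.g. that of the charmrun process):

  kill -s STOP -- -12345
  kill -s CONT -- -12345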

-- Compaq AlphaServer SC --

If your machine has a Quadrics interconnect you should use the Elan
version of NAMD; otherwise select the normal MPI version.  In either
case, parallel jobs are run using the "prun" command as follows:

  prun -n <procs> namd2 <configfile>

There are additional options.  Consult your local documentation.

-- IBM RS/6000 SP --

Run the MPI version of NAMD as you would any POE program.  The options
and environment variables for poe are various and arcane, so you should
consult your local documentation for recommended settings.  As an
example, to run on Blue Horizon one would specify:

  poe namd2 <configfile> -nodes <procs/8> -tasks_per_node 8

-- Cray T3E --

The T3E version has been tested on the Pittsburgh Supercomputer Center
T3E.  To run on <procs> processors, use the mpprun command:

  mpprun -n <procs> namd2 <configfile>

-- Origin 2000 --

For small numbers of processors (1-8) use the non-MPI version of namd2.
If your stack size limit is set to unlimited, as DQS may do, you will
need to reduce it with "limit stacksize 64M" to run on multiple
processors.
To run on <procs> processors call the binary directly with the +p option:

  namd2 +p<procs> <configfile>

For better performance on larger numbers of processors we recommend
that you use the MPI version of NAMD.  To run this version, you must
have MPI installed.  Furthermore, you must set two environment
variables to tell MPI how to allocate certain internal buffers.  Put
the following commands in your .cshrc or .profile file, or in your
job file if you are running under a queuing system:

  setenv MPI_REQUEST_MAX 10240
  setenv MPI_TYPE_MAX 10240

Then run NAMD with the following command:

  mpirun -np <procs> namd2 <configfile>

----------------------------------------------------------------------

Memory Usage

NAMD has traditionally used less than 100MB of memory even for systems
of 100,000 atoms.  With the reintroduction of pairlists in NAMD 2.5,
however, memory usage for a 100,000 atom system with a 12A cutoff can
approach 300MB, and will grow with the cube of the cutoff.  This extra
memory is distributed across processors during a parallel run, but a
single workstation may run out of physical memory with a large system.

To avoid this, NAMD now provides a pairlistMinProcs config file option
that specifies the minimum number of processors that a run must use
before pairlists will be enabled (on fewer processors small local
pairlists are generated and recycled rather than being saved; the
default is "pairlistMinProcs 1").  This is a per-simulation rather than
a compile time option because memory usage is molecule-dependent.
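
For example, to avoid the extra memory cost on a single workstation
while still using pairlists on a cluster, one might specify:

  pairlistMinProcs 8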

----------------------------------------------------------------------

Improving Parallel Scaling

While NAMD is designed to be a scalable program, particularly for
simulations of 100,000 atoms or more, at some point adding additional
processors to a simulation will provide little or no extra performance.
If you are lucky enough to have access to a parallel machine you should
measure NAMD's parallel speedup for a variety of processor counts when
running your particular simulation.  The easiest and most accurate way
to do this is to look at the "Benchmark time:" lines that are printed
after 20 and 25 cycles (usually less than 500 steps).  You can monitor
performance during the entire simulation by adding "outputTiming <N>"
to your configuration file, but be careful to look at the "wall time"
rather than "CPU time" fields on the "TIMING:" output lines produced.
For an external measure of performance, you should run simulations of
both 25 and 50 cycles (see the stepspercycle parameter) and base your
estimate on the additional time needed for the longer simulation in
order to exclude startup costs and allow for initial load balancing.
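
For example, with "stepspercycle 20" the two runs cover 500 and 1000
steps.  If they take 400 and 700 seconds of wall time (hypothetical
numbers), the marginal cost is (700 - 400) / 500 = 0.6 seconds per
step, with startup and initial load balancing costs excluded.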

We provide both standard (UDP) and new TCP-based precompiled binaries
for Linux clusters.  We have observed that the TCP version is better
on our dual processor clusters with gigabit ethernet while the basic
UDP version is superior on our single processor fast ethernet cluster.
When using the UDP version with gigabit you can add the +giga option
to adjust several tuning parameters.  Additional performance may be
gained by building NAMD against an SMP version of Charm++ such as
net-linux-smp or net-linux-smp-icc.  This will use a communication
thread for each process to respond to network activity more rapidly.
For dual processor clusters we have found that running two separate
processes per node, each with its own communication thread, is faster
than using the charmrun ++ppn option to run multiple worker threads.
However, we have observed that when running on a single hyperthreaded
processor (e.g., a newer Pentium 4) there is an additional 15% boost
from running standalone with two threads (namd2 +p2) compared to
running two processes (charmrun namd2 ++local +p2).  For a cluster
of single-processor hyperthreaded machines an SMP version should
provide very
good scaling running one process per node since the communication
thread can run very efficiently on the second virtual processor.  We
are unable to ship an SMP build for Linux due to portability problems
with the Linux pthreads implementation needed by Charm++.  The new
NPTL pthreads library in RedHat 9 fixes these problems, so an SMP
port may become the standard shipping binary version in the future.
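
For example, to run the UDP version on gigabit ethernet with two
processes per dual-processor node, list each host twice in the
nodelist file (host names here are hypothetical):

  group main
  host node1
  host node1
  host node2
  host node2

and add the +giga option to the command line:

  charmrun namd2 +p4 +giga ++nodelist nodelist <configfile>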

On some large machines with very high bandwidth interconnects you may
be able to increase performance for PME simulations by adding either
"+strategy USE_MESH" or "+strategy USE_GRID" to the command line.
These flags instruct the Charm++ communication optimization library to
reduce the number of messages sent during PME 3D FFT by combining data
into larger messages to be transmitted along each dimension of either
a 2D mesh or a 3D grid, respectively.  While reducing the number of
messages sent per processor from N to 2*sqrt(N) or 3*cbrt(N), the
total amount of data transmitted for the FFT is doubled or tripled.
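
For example, with the charmrun launcher (the flag is appended in the
same way for other launchers):

  charmrun namd2 +p256 +strategy USE_MESH <configfile>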

Extremely short cycle lengths (less than 10 steps) will also limit
parallel scaling, since the atom migration at the end of each cycle
sends many more messages than a normal force evaluation.  Increasing
pairlistdist from, e.g., cutoff + 1.5 to cutoff + 2.5, while also
doubling stepspercycle from 10 to 20, may increase parallel scaling,
but it is important to measure.  When increasing stepspercycle, also
try increasing pairlistspercycle by the same proportion.
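
For example, for a 12A cutoff one might compare the following
settings against your current ones (values here are only illustrative):

  cutoff            12.0
  # pairlistdist of cutoff + 2.5 rather than cutoff + 1.5:
  pairlistdist      14.5
  # stepspercycle doubled from 10, pairlistsPerCycle raised to match:
  stepspercycle     20
  pairlistsPerCycle 4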

----------------------------------------------------------------------

Automated Usage Reporting

NAMD may print information similar to the following at startup:

 Info: Sending usage information to NAMD developers via UDP.
 Info: Sent data is: 1 NAMD  2.2  Solaris-Sparc  1    verdun  jim

The information you see listed is sent via a single UDP packet and
logged for future analysis.  The UDP protocol does not retransmit
lost packets and does not attempt to establish a connection.  We will
use this information only for statistical analysis to determine how
frequently NAMD is actually being used.  Your user and machine name
are used only for counting unique users; they will not be correlated
with your download registration data.

We collect this information in order to better justify continued
development of NAMD to the NIH, our primary source of funding.  We
also use collected information to direct NAMD development efforts.
If you are opposed to this reporting, you are welcome to recompile
NAMD and disable it.  Usage reporting is disabled in the release
Windows binaries, but will be added in a later version.

----------------------------------------------------------------------

Endian Issues

Some architectures write binary data (integer or floating point) with
the most significant byte first; others put the most significant byte
last.  This doesn't affect text files but it does matter when a binary
data file that was written on a "big-endian" machine (Sun, HP, SGI) is
read on a "little-endian" machine (Intel, Alpha) or vice versa.

NAMD generates DCD trajectory files and binary coordinate and velocity
files which are "endian-sensitive".  While VMD can now read DCD files
from any machine and NAMD reads most other-endian binary restart files,
many analysis programs (like CHARMM or X-PLOR) require same-endian DCD
files.  We provide the programs flipdcd and flipbinpdb for switching the
endianness of DCD and binary restart files, respectively.  These programs
use mmap to alter the file in-place and may therefore appear to consume
an amount of memory equal to the size of the file.
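
For example, assuming each program simply takes the files to convert
as arguments (file names here are hypothetical):

  flipdcd mysim.dcd
  flipbinpdb mysim.restart.coor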

----------------------------------------------------------------------

Problems?  Found a bug?

For problems or questions, send email to namd@ks.uiuc.edu.  If you
think you have found a bug, please follow the steps outlined below.
Your feedback will help us improve NAMD.

 1. Download and test the latest version of NAMD. 
 2. Please check the FAQ, known bugs and problem reports. 
 3. Gather, in a single directory, all input and config files needed
    to reproduce your problem. 
 4. Run once, redirecting output to a file. 
 5. Tar everything up (but not the namd2 or charmrun binaries) and
    compress it. 
 6. Email namd@ks.uiuc.edu with: 
    - A synopsis of the problem as the subject. 
    - The NAMD version number the problem occurs with. 
    - The platform and number of CPUs the problem occurs with.
    - A description of the problematic behavior and any error messages. 
    - If the problem is consistent or random. 
    - The compressed tar file as an attachment (or a URL if it is too big). 
 7. We'll get back to you with further questions or suggestions.

----------------------------------------------------------------------

Compiling NAMD

Building a complete NAMD binary from source code requires working
C and C++ compilers, Charm++/Converse, TCL, FFTW, and the VMD molfile
plugins.  NAMD will compile without TCL, FFTW or plugins but certain
features will be disabled.  Fortunately, precompiled libraries are
available from http://www.ks.uiuc.edu/Research/namd/libraries/.  You
must enable these options by specifying "tcl fftw plugins" when you
run the config script, and some files in arch may need editing.

As an example, here is the build sequence for Linux workstations:

Download TCL, FFTW, and plugins libraries:
  mkdir fftw;  cd fftw
  wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux.tar.gz
  tar xzf fftw-linux.tar.gz
  cd ..;  mkdir plugins;  cd plugins
  wget http://www.ks.uiuc.edu/Research/namd/libraries/plugins-LINUX.tar.gz
  tar xzf plugins-LINUX.tar.gz
  cd ..;  mkdir tcl;  cd tcl
  wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl-linux.tar.gz
  tar xzf tcl-linux.tar.gz

Unpack NAMD and matching Charm++ source code and enter directory:
  tar xzf NAMD_2.5_Source.tar.gz
  cd NAMD_2.5_Source
  tar xf charm.tar
  cd charm

Build the Charm++/Converse library:
  ./build charm++ net-linux -O -DCMK_OPTIMIZE=1
  cd ..

Edit various configuration files:
  vi Make.charm  (set CHARMBASE to .rootdir/charm or full path to charm)
  vi arch/Linux-i686.fftw     (fix library name and path to files)
  vi arch/Linux-i686.plugins  (fix library name and path to files)
  vi arch/Linux-i686.tcl      (fix library version and path to TCL files)

Set up build directory and compile:
  ./config tcl fftw plugins Linux-i686-g++
  cd Linux-i686-g++ 
  make

Quick tests using one and two processes:
  ./namd2
  ./namd2 src/alanin
  ./charmrun ++local +p2 ./namd2
  ./charmrun ++local +p2 ./namd2 src/alanin

That's it.  A more complete explanation of the build process follows.
Note that you will need Cygwin to compile NAMD on Windows.

Download and unpack fftw, plugins, and tcl libraries for your platform
from http://www.ks.uiuc.edu/Research/namd/libraries/.  Each tar file
contains a directory with the name of the platform, and the plugins
platform names are different from fftw and tcl.  These libraries don't
change very often, so you should find a permanent home for them.

Unpack the NAMD source code and the enclosed charm.tar archive.  This
version of Charm++ is the same one used to build the released binaries
and is more likely to work and be bug free than any other we know of.
Edit Make.charm to point at .rootdir/charm or the full path to the
charm directory if you unpacked outside of the NAMD source directory.

Run the config script without arguments to list the available builds,
which have names like Linux-i686-Clustermatic-TCP-icc.  Each build or
"ARCH" of the form BASEARCH-options-compiler, where BASEARCH is the
most generic name for a platform, like Linux-i686.  The options are
things like Scyld, TCP, SMP, or Elan.

Edit arch/BASEARCH.fftw, arch/BASEARCH.plugins, and arch/BASEARCH.tcl
to point to the libraries you downloaded.  View arch/ARCH.arch and
look for a line like "CHARMARCH = net-linux-clustermatic-tcp-icc" to
find out what Charm++ platform you need to build.  The CHARMARCH name
is of the format comm-OS-cpu-options-compiler.  It is very important
that Charm++ and NAMD be built with the same C++ compiler.
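
For example:

  grep CHARMARCH arch/Linux-i686-Clustermatic-TCP-icc.arch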

Enter the charm directory and run the build script without options
to see a list of available platforms.  Only the comm-OS-cpu part will
be listed.  Any options or compiler tags are listed separately and
must be separated by spaces on the build command line.  Run the build
command for your platform as:

  ./build charm++ comm-OS-cpu options compiler -O -DCMK_OPTIMIZE=1

For this specific example:

  ./build charm++ net-linux clustermatic tcp icc -O -DCMK_OPTIMIZE=1

The README distributed with Charm++ contains a complete explanation.
You only actually need the bin, include, and lib subdirectories, so
you can copy those elsewhere and delete the whole charm directory,
but don't forget to edit Make.charm if you do this.

If you're building an MPI version you will probably need to edit
compiler flags or commands in the Charm++ src/arch directory.  The
file charm/src/arch/mpi-linux/conv-mach.sh contains the definitions
that select the mpiCC compiler for mpi-linux, while other compiler
choices are defined by files in charm/src/arch/common/.

Now you can run the NAMD config script to set up a build directory:

  ./config tcl fftw plugins ARCH

For this specific example:

  ./config tcl fftw plugins Linux-i686-Clustermatic-TCP-icc

This will create a build directory Linux-i686-Clustermatic-TCP-icc.
If you wish to create this directory elsewhere use config DIR/ARCH,
replacing DIR with the location where the build directory should be created.
A symbolic link to the remote directory will be created as well.  You
can create multiple build directories for the same ARCH by adding a
suffix.  These can be combined, of course, as in:

  ./config tcl fftw plugins /tmp/Linux-i686-Clustermatic-TCP-icc.test1

Now cd to your build directory and type make.  The namd2 binary and
a number of utilities will be created.

If you have trouble building NAMD your compiler may be different from
ours.  The architecture-specific makefiles in the arch directory use
several options to elicit similar behavior on all platforms.  Your
compiler may conform to an earlier C++ specification than NAMD uses.
Your compiler may also enforce a later C++ rule than NAMD follows.
You may ignore repeated warnings about new and delete matching.

----------------------------------------------------------------------