From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Mon Sep 09 2013 - 13:02:37 CDT
On Mon, Sep 9, 2013 at 7:24 PM, Kenno Vanommeslaeghe
<kvanomme_at_rx.umaryland.edu> wrote:
> Ah, so we're having a little nitpicking contest here. My turn. :-)
>
> In principle, even floating point math is deterministic (though not
> reversible) when executing the same binary instructions on the same model of
> processor. In CHARMM (which also uses floating-point, as does every MD code
> I know of), we can get exactly the same trajectory multiple times if *all*
> of the following conditions are satisfied:
> - we're running on 1 core (this also means: no GPU allowed)
> - the PRNG seed is the same
> - the binary is the same (no recompiling allowed, especially not with a
> different compiler version, different flags or different bitness)
> - the operating system is the same, including
> * same libraries
> * same kernel version
> * same bitness (32 vs. 64)
> - the CPU model is the same. Not only the manufacturer but also the model.
this is very important, since glibc and the intel compiler runtime
libraries have runtime relocations that will replace generic functions
with CPU-model-specific versions, e.g. based on the availability of
SSE/AVX/FMA and so on.
> - (when using non-ECC memory) the memory is not defective
- and you need to enforce a consistent rounding mode that does not
allow denormalized numbers. otherwise you can still have some
randomness creep in when you have numbers that are very, very close to,
but not quite, zero (like 1e-300). this one, however, is extremely
unlikely. but then again, if you run with a very large number of
atoms...
- and if you run on a multi-core CPU, you have to set processor
affinity so that the process doesn't get bounced around to a different
CPU core (see the sketch right after this list).
- and you have to make certain that there are no actions that are
triggered by the amount of time passed or other unexpected
things like signal handlers etc.
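for illustration, here is a minimal and entirely hypothetical C sketch
of how one could pin down the rounding mode, the denormal handling and
the core affinity on a glibc/x86-64 linux box before starting the force
loop. this is not NAMD code; the core number, compile line and error
handling are just placeholders:

/* hypothetical sketch; linux on x86-64 with glibc assumed.
   compile e.g. with: cc -O2 repro.c -lm */
#define _GNU_SOURCE
#include <fenv.h>       /* fesetround(), FE_TONEAREST */
#include <sched.h>      /* cpu_set_t, CPU_ZERO(), CPU_SET(), sched_setaffinity() */
#include <stdio.h>      /* perror() */
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE() */

int main(void)
{
    /* 1) fix the rounding mode to round-to-nearest-even */
    fesetround(FE_TONEAREST);

    /* 2) flush denormalized results (below ~1e-308 in double precision)
       to zero, so near-zero intermediates cannot take a different code
       path; the matching denormals-are-zero flag can be set the same
       way via pmmintrin.h on SSE3-capable CPUs */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

    /* 3) pin the process to a single core (core 0 here) so the kernel
       scheduler never migrates it to a different CPU */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    /* ... the single-core MD loop would go here ... */
    return 0;
}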
at this point, writing an MD kernel in fixed point math almost seems
like the easier undertaking... ;-)
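(a toy sketch of that idea, purely illustrative and not taken from any
real MD package: 64-bit integer accumulation at a fixed scale is exact
and associative, so the order of the sums stops mattering.)

#include <stdint.h>
#include <stdio.h>

/* toy fixed-point accumulation: values scaled by 2^32 and stored in
   64-bit integers; integer addition is exact and associative, so any
   summation order gives bit-identical results */
#define FIXED_SCALE 4294967296.0   /* 2^32 */

static int64_t to_fixed(double x)    { return (int64_t)(x * FIXED_SCALE); }
static double  from_fixed(int64_t f) { return (double)f / FIXED_SCALE; }

int main(void)
{
    double f[3] = { 0.1, 0.2, 0.3 };

    /* floating point: the two association orders differ in the last bit */
    double a = (f[0] + f[1]) + f[2];
    double b = f[0] + (f[1] + f[2]);

    /* fixed point: both orders are bit-identical */
    int64_t fa = (to_fixed(f[0]) + to_fixed(f[1])) + to_fixed(f[2]);
    int64_t fb = to_fixed(f[0]) + (to_fixed(f[1]) + to_fixed(f[2]));

    printf("float: %.17g vs %.17g (equal: %d)\n", a, b, a == b);
    printf("fixed: %.17g vs %.17g (equal: %d)\n",
           from_fixed(fa), from_fixed(fb), fa == fb);
    return 0;
}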
ciao,
axel.
> Now, unless NAMD has some really tricky optimizations that give rise to
> differently associated (and thus differently rounded) floating-point math
> even when running in serial, I'd expect the same
> to be true for NAMD. You (Hailey) said you ran on a different CPU. If it's a
> different CPU *model*, then there's your explanation. Otherwise, *if* *all*
> the above conditions are satisfied and you're not using ECC memory, then
> your observations arouse some suspicion about the memory. Memory errors
> happen more than one would think.
>
> All that said, Axel's main point still stands. You're simulating a chaotic
> system in which an infinitesimal perturbation will quickly produce a
> completely different outcome in terms of positions and velocities. Not one
> of these outcomes is more "correct" than another; what counts is the
> statistics.
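>
> (As a toy illustration of that last point, with made-up numbers: what
> one compares between two batches of trajectories is the mean of an
> observable together with its standard error, not the frame-by-frame
> coordinates. A minimal C sketch, purely hypothetical:)
>
> #include <math.h>
> #include <stdio.h>
>
> /* mean and standard error of the mean for a small sample */
> static void stats(const double *x, int n, double *mean, double *sem)
> {
>     double s = 0.0, ss = 0.0;
>     for (int i = 0; i < n; ++i) s += x[i];
>     *mean = s / n;
>     for (int i = 0; i < n; ++i) ss += (x[i] - *mean) * (x[i] - *mean);
>     *sem = sqrt(ss / (n - 1) / n);
> }
>
> int main(void)
> {
>     /* hypothetical end-to-end distances (Angstrom) from two batches of
>        five trajectories started from different seeds */
>     double batch1[5] = { 11.8, 12.4, 12.1, 11.6, 12.3 };
>     double batch2[5] = { 12.2, 11.7, 12.0, 12.5, 11.9 };
>     double m1, e1, m2, e2;
>     stats(batch1, 5, &m1, &e1);
>     stats(batch2, 5, &m2, &e2);
>     printf("batch 1: %.2f +/- %.2f\n", m1, e1);
>     printf("batch 2: %.2f +/- %.2f\n", m2, e2);
>     return 0;
> }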
>
>
>
>
> On 09/09/2013 10:06 AM, Axel Kohlmeyer wrote:
>>
>> On Mon, Sep 9, 2013 at 3:31 PM, Jérôme Hénin <jerome.henin_at_ibpc.fr> wrote:
>>>
>>> Dear Axel,
>>>
>>> If I really want to be an extreme nitpicker - and who doesn't!!! - I must
>>> say that together with fixed-point math, you could use a deterministic
>>> thermostat or barostat (which I think they all are, as long as the PRNG is
>>> deterministic and the math is associative).
>>
>>
>> yes, they are forward deterministic, but you cannot reverse the
>> trajectory and get back to the beginning.
>>
>> ciao,
>> axel.
>>
>> p.s.: it is fun to outnitpick an experienced nitpicker. ;-)
>>
>>>
>>> You're most welcome.
>>> Jérôme
>>>
>>>
>>> ----- Original Message -----
>>>>
>>>> On Mon, Sep 9, 2013 at 3:07 PM, Hailey Bureau
>>>> <hailey.bureau_at_gmail.com> wrote:
>>>>>
>>>>> Hi Norman,
>>>>>
>>>>> Thanks for your response!
>>>>>
>>>>> I actually am using only one cpu, as I mentioned in my earlier
>>>>> email (granted, it isn't the same cpu every time, in which case I
>>>>> don't know if that would affect the results; perhaps it would).
>>>>> The thing that is keeping me stuck is that I can generate the
>>>>> exact same data using vacuum and implicit solvent conditions, and
>>>>> it seems that only in the case of explicit solvent I am having
>>>>> trouble reproducing data. I am just wondering if there is
>>>>> something specifically going on in only the case of explicit
>>>>> solvent where I cannot reproduce data. However, what's even more
>>>>> puzzling to me is that I can reproduce *some* trajectories. For
>>>>> example, between two batches of 5 trajectories each coming from
>>>>> the same 5 seeds, one or two of them will end up identical. The
>>>>> others, however, do not. Any further insight you might have would
>>>>> be greatly appreciated.
>>>>
>>>>
>>>> if you want perfectly reproducible (and reversible!) trajectories, you
>>>> will have to write an MD code that uses fixed point math instead of
>>>> floating point, and you must not use any thermostat or barostat.
>>>> floating point math is not associative and thus your trajectories will
>>>> sooner or later diverge: certain conditions will trigger the tiniest
>>>> bit change, and from then on you will get exponentially diverging
>>>> trajectories. MD is solving a system of coupled, nonlinear ordinary
>>>> differential equations, which exhibits chaotic behavior. of course,
>>>> the more atoms are involved, the larger the probability of an event
>>>> initiating the divergence.
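>>>>
>>>> (purely for illustration, a toy C example using a one-dimensional
>>>> chaotic map instead of MD: a single-bit difference in the initial
>>>> state grows exponentially and reaches order one within a few dozen
>>>> iterations. compile with something like cc demo.c -lm.)
>>>>
>>>> #include <math.h>
>>>> #include <stdio.h>
>>>>
>>>> /* the logistic map x -> 4x(1-x) is chaotic, so two states that differ
>>>>    only in the last bit separate exponentially, just like two MD
>>>>    trajectories after a single rounding difference */
>>>> int main(void)
>>>> {
>>>>     double x = 0.3;
>>>>     double y = nextafter(0.3, 1.0);   /* same value, last bit flipped */
>>>>     for (int n = 0; n <= 60; ++n) {
>>>>         if (n % 10 == 0)
>>>>             printf("step %2d  |x - y| = %.3e\n", n, fabs(x - y));
>>>>         x = 4.0 * x * (1.0 - x);
>>>>         y = 4.0 * y * (1.0 - y);
>>>>     }
>>>>     return 0;
>>>> }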
>>>>
>>>> that being said, whether your trajectories are perfectly reproducible
>>>> has no bearing on whether they are correct or not. in fact, often this
>>>> divergence is desirable, as you can quickly produce decorrelated
>>>> trajectories, which allows you to increase phase space sampling
>>>> through concurrent simulations (cf. parallel replica MD). the final
>>>> coordinates of a simulation (in equilibrium) are of little to no
>>>> relevance; averages, however, are, and those should converge to
>>>> consistent results regardless of whether you get diverging
>>>> trajectories or not.
>>>>
>>>> axel.
>>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> -Hailey
>>>>>
>>>>>
>>>>> On Sep 9, 2013, at 2:26 AM, "Norman Geist"
>>>>> <norman.geist_at_uni-greifswald.de> wrote:
>>>>>
>>>>>> Hi Hailey,
>>>>>>
>>>>>> The first thing to mention is that the parallelization itself can
>>>>>> change results slightly. This happens due to varying orders in
>>>>>> which partial results arrive and get combined. If you changed the
>>>>>> number of processors between the 5 trajectories you mentioned, this
>>>>>> would be a first likely reason. Furthermore, though this is only an
>>>>>> assumption, the load balancer can have an additional impact, as
>>>>>> individual core performance can vary randomly and not every core is
>>>>>> as fast as the others, even if they are the same model. Moreover,
>>>>>> I'm not sure whether the Langevin thermostat comes with random
>>>>>> forces.
>>>>>>
>>>>>> To clarify all these things, try using only a single CPU core for
>>>>>> some tests. This will eliminate the load balancer and the
>>>>>> parallelization and will show whether other things in your
>>>>>> simulation introduce any randomness.
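>>>>>>
>>>>>> (With a multicore NAMD build, a single-core run is something like
>>>>>> "namd2 +p1 <configfile>", or "charmrun namd2 +p1 <configfile>" for
>>>>>> a network build; <configfile> here is just a placeholder for your
>>>>>> input file.)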
>>>>>>
>>>>>> Norman Geist.
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>>>>>>> On Behalf Of Hailey Bureau
>>>>>>> Sent: Monday, September 9, 2013 00:54
>>>>>>> To: NAMD list
>>>>>>> Subject: namd-l: explicit NVT simulation
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am running an explicit NVT simulation and I am having trouble
>>>>>>> reproducing data, using the same starting coordinates and random
>>>>>>> seed
>>>>>>> value. In a batch of 5 trajectories, that I run two times with
>>>>>>> identical seed values, I see different results. However,
>>>>>>> sometimes the
>>>>>>> trajectories do turn out the same. I have been trying to find any
>>>>>>> previous interest in this sort of problem and how to solve it,
>>>>>>> but I
>>>>>>> haven't had much luck. Has anyone encountered a problem like this
>>>>>>> before? I am running on one CPU; below is my starting
>>>>>>> configuration
>>>>>>> file:
>>>>>>>
>>>>>>>
>>>>>>> #############################################################
>>>>>>> ## ADJUSTABLE PARAMETERS ##
>>>>>>> #############################################################
>>>>>>> structure ../../../../00.struc/03.exp/00.psf
>>>>>>> coordinates ../../../../00.struc/03.exp/00.pdb
>>>>>>> outputName daOut
>>>>>>> #############################################################
>>>>>>> ## SIMULATION PARAMETERS ##
>>>>>>> #############################################################
>>>>>>> # Input
>>>>>>> seed xxxxx
>>>>>>> paraTypeCharmm on
>>>>>>> parameters ../../../../toppar/par_all27_prot_lipid.prm
>>>>>>> temperature 300
>>>>>>>
>>>>>>> # Force-Field Parameters
>>>>>>> exclude scaled1-4
>>>>>>> 1-4scaling 1.0
>>>>>>> cutoff 12.0
>>>>>>> switching on
>>>>>>> switchdist 10.0
>>>>>>> pairlistdist 13.5
>>>>>>>
>>>>>>> # Integrator Parameters
>>>>>>> timestep 2.0 ;# 2fs/step
>>>>>>> rigidBonds all ;# needed for 2fs steps
>>>>>>> nonbondedFreq 1
>>>>>>> fullElectFrequency 2
>>>>>>> stepspercycle 10
>>>>>>>
>>>>>>> # Constant Temperature Control
>>>>>>> langevin on ;# do langevin dynamics
>>>>>>> langevinDamping 5 ;# damping coefficient (gamma) of 5/ps
>>>>>>> langevinTemp 300
>>>>>>> langevinHydrogen no ;# don't couple langevin bath to
>>>>>>> hydrogens
>>>>>>>
>>>>>>> # Periodic Boundary conditions
>>>>>>> # NOTE: Do not set the periodic cell basis if you have also
>>>>>>> # specified an .xsc restart file!
>>>>>>> if {1} {
>>>>>>> cellBasisVector1 27.0 0.0 0.0
>>>>>>> cellBasisVector2 0.0 24.0 0.0
>>>>>>> cellBasisVector3 0.0 0.0 54.0
>>>>>>> cellOrigin 2.28 -0.35 16.49
>>>>>>> }
>>>>>>> wrapWater on
>>>>>>> wrapAll on
>>>>>>>
>>>>>>> # PME (for full-system periodic electrostatics)
>>>>>>> if {1} {
>>>>>>> PME yes
>>>>>>> #PMEGridSpacing 1.0
>>>>>>> #manual grid definition
>>>>>>> PMEGridSizeX 27
>>>>>>> PMEGridSizeY 24
>>>>>>> PMEGridSizeZ 54
>>>>>>> }
>>>>>>>
>>>>>>> # Constant Pressure Control (variable volume)
>>>>>>> useGroupPressure yes ;# needed for rigidBonds
>>>>>>> useFlexibleCell no
>>>>>>> useConstantArea no
>>>>>>>
>>>>>>> langevinPiston off
>>>>>>> #langevinPistonTarget 1.01325 ;# in bar -> 1 atm
>>>>>>> #langevinPistonPeriod 100.0
>>>>>>> #langevinPistonDecay 50.0
>>>>>>> #langevinPistonTemp 300
>>>>>>>
>>>>>>> # Output
>>>>>>> binaryoutput no
>>>>>>> dcdfreq 100 ;# 100 steps = every 0.2 ps
>>>>>>> outputEnergies 500
>>>>>>>
>>>>>>> #############################################################
>>>>>>> ## EXTRA PARAMETERS ##
>>>>>>> #############################################################
>>>>>>> # Tcl interface
>>>>>>> tclForces on
>>>>>>> tclForcesScript smdforce.tcl
>>>>>>>
>>>>>>> run 10000 ;# 20 ps
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>> International Centre for Theoretical Physics, Trieste, Italy.
>>>>
>>>>
>>
>>
>>
>
--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste, Italy.