Miscellaneous notes relating to MITgcm UV
=========================================

This file's form is close to that of an FAQ. If you are having
a problem getting the model to behave as you might expect, there
may be some helpful clues in this file.


o Something really weird is happening - variables keep
  changing value!

  Apart from the usual problems of out-of-bounds array references
  and various bugs, it is important to be sure that "stack"
  variables really are stack variables in multi-threaded execution.
  Some compilers put subroutine local variables in static storage.
  This can result in an apparently private variable in a local
  routine being mysteriously changed by a concurrently executing
  thread.

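  A minimal sketch of the kind of failure this produces (the routine
  and variable names here are hypothetical, not from the MITgcm
  source): each thread expects tmpVal to be private, but if the
  compiler places it in static storage (e.g. via a static-locals
  option) another thread may overwrite it between the two statements.

C     Hypothetical illustration only.
      SUBROUTINE SHOW_STATIC_LOCAL( myThid )
      INTEGER myThid
      REAL*8 tmpVal
      tmpVal = DBLE( myThid )
C     ... other work, during which another thread may also be in
C     this routine ...
C     With a true stack variable this prints myThid; with a static
C     local it may print whichever thread wrote tmpVal last.
      WRITE(*,*) 'thread', myThid, ' got', tmpVal
      RETURN
      END
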
=====================================

o Something really weird is happening - the code gets stuck in
  a loop somewhere!

  The routines in barrier.F should be compiled without any
  optimisation. These routines check variables that are updated by
  other threads. Compiler optimisations generally assume that the
  code being optimised will obey the sequential semantics of regular
  Fortran. That means they will assume that a variable is not going
  to change value unless the code being optimised changes it.
  Obviously this can cause problems.

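  The following sketch (illustrative only, not the actual barrier.F
  code, and the names are hypothetical) shows the kind of loop that
  breaks: a thread spins until another thread sets a flag held in a
  COMMON block, but an optimising compiler may load the flag into a
  register once and never re-read memory, so the loop never exits.

      SUBROUTINE WAIT_FOR_FLAG
      INTEGER syncFlag
      COMMON /SYNC_COMM/ syncFlag
C     Spin until another thread sets syncFlag non-zero.  Compiling
C     unoptimised (or declaring the variable VOLATILE where the
C     compiler supports it) keeps the load inside the loop.
   10 CONTINUE
      IF ( syncFlag .EQ. 0 ) GOTO 10
      RETURN
      END
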
=====================================

o Is the Fortran SAVE statement a problem?

  Yes. On the whole the Fortran SAVE statement should not be used
  for data in a multi-threaded code. SAVE causes data to be held in
  static storage, meaning that all threads will see the same location.
  Therefore, if one thread updates the location all other threads
  will generally see the update. Note - there is often no specification
  for what should happen in this situation in a multi-threaded
  environment, so this is not a robust mechanism for sharing data.
  For most cases where SAVE might be appropriate, either of the
  following recipes should be used instead. Both these schemes are
  potential performance bottlenecks if they are over-used.
  Method 1
  ********
  1. Put the SAVE variable in a common block.
  2. Update the SAVE variable in a _BEGIN_MASTER, _END_MASTER block.
  3. Include a _BARRIER after the _BEGIN_MASTER, _END_MASTER block.
  e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter

      _BEGIN_MASTER(myThid)
       nIter = nIter+1
      _END_MASTER(myThid)
      _BARRIER

  Note. The _BARRIER operation is potentially expensive. Be conservative
  in your use of this scheme.

  Method 2
  ********
  1. Put the SAVE variable in a common block but with an extra dimension
     for the thread number.
  2. Change the updates and references to the SAVE variable to a
     per-thread basis.
  e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter(MAX_NO_THREADS)

      nIter(myThid) = nIter(myThid)+1

  Note. nIter(myThid) and nIter(myThid+1) will share the same
  cache line. The update will cause extra low-level memory
  traffic to maintain cache coherence. If the update is in
  a tight loop this will be a problem and nIter will need
  padding.
  In a NUMA system nIter(1:MAX_NO_THREADS) is likely to reside
  in a single page of physical memory on a single box. Again, in
  a tight loop this would cause lots of remote/far memory references
  and would be a problem. Some compilers provide a mechanism
  for helping overcome this problem.

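  A sketch of the padding idea (the figure of 16 INTEGERs, i.e.
  64 bytes, per cache line is only an assumption and is machine
  dependent; the names are hypothetical): each thread's counter is
  given its own cache line so concurrent updates do not cause
  false sharing.

C     Illustrative sketch only - padding size is an assumption.
C     Only element (1,myThid) is used; the remaining elements exist
C     purely to push each thread's counter onto its own cache line.
      INTEGER padInts
      PARAMETER ( padInts = 16 )
      INTEGER nIterPad(padInts,MAX_NO_THREADS)
      COMMON /PARAMS_PAD/ nIterPad

      nIterPad(1,myThid) = nIterPad(1,myThid)+1
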
=====================================

o Can I debug using write statements?

  Many systems do not have "thread-safe" Fortran I/O libraries.
  On these systems I/O generally works but it gets a bit intermingled!
  Occasionally doing multi-threaded I/O with an unsafe Fortran I/O
  library will actually cause the program to fail. Note: SGI has a
  "thread-safe" Fortran I/O library.

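  A common workaround, sketched below using the _BEGIN_MASTER,
  _END_MASTER macros shown earlier in this file, is to let only one
  thread issue the WRITE so that output from different threads is
  not interleaved.

C     Sketch: only the master thread writes the message.
      _BEGIN_MASTER(myThid)
       WRITE(*,*) 'nIter = ', nIter
      _END_MASTER(myThid)
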
=====================================

o Mapping virtual memory to physical memory.

  The current code declares arrays as
      real aW2d (1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)
  This raises an issue on shared virtual-memory machines that have
  an underlying non-uniform memory subsystem e.g. HP Exemplar, SGI
  Origin, DG, Sequent etc. What most machines implement is a scheme
  in which the physical memory that backs the virtual memory is
  allocated on a page basis at run-time. The OS manages this
  allocation and without exception pages are assigned to physical
  memory on the box where the thread which caused the page-fault is
  running. Pages are typically 4-8KB in size. This means that in
  some environments it would make sense to declare arrays as
      real aW2d (1-OLx:sNx+OLx+PX,1-OLy:sNy+OLy+PY,nSx,nSy)
  where PX and PY are chosen so that the divides between near and
  far memory coincide with the boundaries of the virtual memory
  regions a thread works on. In principle this is easy, but it is
  also inelegant and really one would like the OS/hardware to take
  care of this issue. Doing it oneself requires PX and PY to be
  recalculated whenever the mapping of the nSx, nSy blocks to nTx
  and nTy threads is changed. Also, different PX and PY are required
  depending on
      page size
      array element size ( real*4, real*8 )
      array dimensions ( 2d, 3d Nz, 3d Nz+1 ) - in 3d a PZ would
                                                also be needed!
  Note: 1. A C implementation would be a lot easier. An F90
           implementation using dynamic allocation would also be
           fairly straightforward.
        2. The padding really ought to be between the "collections"
           of blocks that all the threads using the same near memory
           work on; to save on wasted memory the padding should be
           between these collections rather than between individual
           blocks. The PX, PY, PZ mechanism instead pads three levels
           further down the hierarchy, which wastes more memory.
        3. For large problems this is less of an issue. For a large
           problem, even for a 2d array, there might be say 16 pages
           per array per processor and at least 4 processors in a
           uniform memory access box. Assuming a sensible mapping of
           processors to blocks, only one page (1.5% of the memory)
           would be referenced by processors in another box.
           On the other hand, for a very small per-processor problem
           size e.g. 32x32 per processor and again four processors
           per box, as many as 50% of the memory references could be
           to far memory for 2d fields. This could be very bad!

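  As a rough illustration of how PX might be chosen (the page size,
  element size and tile width below are example assumptions only,
  and in practice PX would have to be a compile-time constant set in
  a header rather than computed at run time as it is here): with
  4KB pages and real*8 elements a page holds 512 elements, so
  padding each row of the array up to a multiple of 512 elements
  keeps the rows (and hence the tiles) worked on by different
  threads on separate pages.

C     Illustrative sketch only - numbers are example assumptions.
      INTEGER pageBytes, elemBytes, elemsPerPage, rowElems, PX
      PARAMETER ( pageBytes = 4096, elemBytes = 8 )
      elemsPerPage = pageBytes/elemBytes
C     Unpadded row length of aW2d; e.g. sNx=32, OLx=2 gives 36.
      rowElems = sNx + 2*OLx
C     Round the row up to a whole number of pages: 512 elements per
C     page here, so PX = 476 - which also shows how wasteful the
C     scheme is for small tiles.
      PX = MOD( elemsPerPage - MOD(rowElems,elemsPerPage),
     &          elemsPerPage )
      WRITE(*,*) 'PX = ', PX
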
=====================================