Miscellaneous notes relating to MITgcm UV
=========================================

This file's form is close to that of an FAQ. If you are having
a problem getting the model to behave as you might expect, there
may be some helpful clues in this file.


o Something really weird is happening - variables keep
  changing value!

  Apart from the usual problems of out-of-bounds array references
  and various bugs, it is important to be sure that "stack"
  variables really are stack variables in multi-threaded execution.
  Some compilers put a subroutine's local variables in static storage.
  This can result in an apparently private variable in a local
  routine being mysteriously changed by a concurrently executing
  thread.
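
  A purely hypothetical sketch of the hazard (none of the names below
  come from the model): the routine keeps a partial sum in the local
  variable tmp. If the compiler places tmp in static storage rather
  than on the stack, two threads calling the routine concurrently
  will overwrite each other's partial sums. Many compilers provide an
  option that forces local variables onto the stack; check your
  compiler documentation.

   C sumCol - hypothetical example, not model code.  tmp must be a
   C          true stack (automatic) variable for concurrent calls
   C          to be safe.
   SUBROUTINE SUMCOL( n, col, total )
   INTEGER n
   REAL*8 col(n), total
   REAL*8 tmp
   INTEGER i
   tmp = 0.0D0
   DO i = 1, n
    tmp = tmp + col(i)
   ENDDO
   total = tmp
   RETURN
   END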

                    =====================================

o Something really weird is happening - the code gets stuck in
  a loop somewhere!

  The routines in barrier.F should be compiled without any
  optimisation. These routines check variables that are updated by
  other threads. Compiler optimisations generally assume that the
  code being optimised will obey the sequential semantics of regular
  Fortran. That means they will assume that a variable is not going
  to change value unless the code being optimised changes it.
  Obviously this can cause problems.
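
  A hypothetical sketch of the kind of polling loop a barrier routine
  contains (this is not the actual barrier.F code): the shared counter
  barCount lives in a COMMON block and is incremented by the other
  threads. An optimising compiler that assumes ordinary sequential
  semantics may load barCount into a register once and never re-read
  it from memory, so the loop spins forever; compiling without
  optimisation avoids this.

   C waitAll - hypothetical polling loop, not the real barrier code.
   C           barCount is shared and incremented by other threads.
   SUBROUTINE WAITALL( nThreads )
   INTEGER nThreads
   INTEGER barCount
   COMMON /BARDATA/ barCount
 10 CONTINUE
   IF ( barCount .LT. nThreads ) GOTO 10
   RETURN
   END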

                    =====================================

o Is the Fortran SAVE statement a problem?

  Yes. On the whole the Fortran SAVE statement should not be used
  for data in a multi-threaded code. SAVE causes data to be held in
  static storage, meaning that all threads will see the same location.
  Therefore, generally, if one thread updates the location all other
  threads will see it. Note - there is often no specification for what
  should happen in this situation in a multi-threaded environment, so
  this is not a robust mechanism for sharing data.
  For most cases where SAVE might be appropriate, either of the
  following recipes should be used instead. Both schemes are potential
  performance bottlenecks if they are over-used.
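
  For illustration, a hypothetical example of the pattern being
  replaced (the names are invented, not taken from the model): the
  SAVEd counter below lives in static storage, so every thread reads
  and updates the same word whether or not that was intended.

   C nCalls - hypothetical SAVEd counter; all threads share this one
   C          static location, so concurrent updates can interfere.
   SUBROUTINE DO_STEP( myThid )
   INTEGER myThid
   INTEGER nCalls
   SAVE nCalls
   DATA nCalls / 0 /
   nCalls = nCalls + 1
   RETURN
   END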
  Method 1
  ********
   1. Put the SAVE variable in a common block.
   2. Update the SAVE variable in a _BEGIN_MASTER, _END_MASTER block.
   3. Include a _BARRIER after the _BEGIN_MASTER, _END_MASTER block.
   e.g.
   C nIter - Current iteration counter
   COMMON /PARAMS/ nIter
   INTEGER nIter

   _BEGIN_MASTER(myThid)
    nIter = nIter+1
   _END_MASTER(myThid)
   _BARRIER

   Note. The _BARRIER operation is potentially expensive. Be conservative
         in your use of this scheme.

  Method 2
  ********
   1. Put the SAVE variable in a common block but with an extra dimension
      for the thread number.
   2. Change the updates and references to the SAVE variable to a
      per-thread basis.
   e.g.
   C nIter - Current iteration counter
   COMMON /PARAMS/ nIter
   INTEGER nIter(MAX_NO_THREADS)

    nIter(myThid) = nIter(myThid)+1

   Note. nIter(myThid) and nIter(myThid+1) will share the same
         cache line. The update will cause extra low-level memory
         traffic to maintain cache coherence. If the update is in
         a tight loop this will be a problem and nIter will need
         padding.
         In a NUMA system nIter(1:MAX_NO_THREADS) is likely to reside
         in a single page of physical memory on a single box. Again, in
         a tight loop this would cause lots of remote/far memory
         references and would be a problem. Some compilers provide a
         mechanism for helping overcome this problem.
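
   A hypothetical sketch of the padding mentioned above (the
   cache-line constant is an assumption, not a model parameter):
   each thread's counter gets its own group of CL_WORDS words, so no
   two threads' counters share a cache line; only element (1,myThid)
   is ever used, the rest is padding.

   C nIter    - per-thread iteration counter, padded (hypothetical).
   C CL_WORDS - assumed number of INTEGER words per cache line.
   INTEGER CL_WORDS
   PARAMETER ( CL_WORDS = 16 )
   COMMON /PARAMS/ nIter
   INTEGER nIter(CL_WORDS,MAX_NO_THREADS)

    nIter(1,myThid) = nIter(1,myThid)+1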

                    =====================================

o Can I debug using write statements?

  Many systems do not have "thread-safe" Fortran I/O libraries.
  On these systems I/O generally works, but the output from different
  threads gets a bit intermingled! Occasionally doing multi-threaded
  I/O with an unsafe Fortran I/O library will actually cause the
  program to fail. Note: SGI has a "thread-safe" Fortran I/O library.
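
  One possible workaround on such systems is to funnel debug output
  through a single thread using the _BEGIN_MASTER/_END_MASTER macros
  shown in the SAVE recipes above. This is only a sketch: the macros
  restrict execution to the master thread, so the write shows values
  only as that thread sees them.

   _BEGIN_MASTER(myThid)
    WRITE(*,*) ' debug: nIter = ', nIter
   _END_MASTER(myThid)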

                    =====================================

o Mapping virtual memory to physical memory.

  The current code declares arrays as
       real aW2d (1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)
  This raises an issue on shared virtual-memory machines that have
  an underlying non-uniform memory subsystem, e.g. HP Exemplar, SGI
  Origin, DG, Sequent etc. What most machines implement is a scheme
  in which the physical memory that backs the virtual memory is
  allocated on a page basis at run-time. The OS manages this
  allocation and, without exception, pages are assigned to physical
  memory on the box where the thread which caused the page-fault is
  running. Pages are typically 4-8KB in size. This means that in some
  environments it would make sense to declare arrays
       real aW2d (1-OLx:sNx+OLx+PX,1-OLy:sNy+OLy+PY,nSx,nSy)
  where PX and PY are chosen so that the divides between near and
  far memory will coincide with the boundaries of the virtual memory
  regions a thread works on. In principle this is easy, but it is
  also inelegant and really one would like the OS/hardware to take
  care of this issue. Doing it oneself requires PX and PY to be
  recalculated whenever the mapping of the nSx, nSy blocks to nTx and
  nTy threads is changed. Also, different PX and PY are required
  depending on
   page size
   array element size ( real*4, real*8 )
   array dimensions ( 2d, 3d Nz, 3d Nz+1 ) - in 3d a PZ would also be needed!
  Note: 1. A C implementation would be a lot easier. An F90
           implementation using allocatable arrays would also be
           fairly straightforward.
        2. The padding really ought to be between the "collection" of
           blocks that all the threads using the same near memory
           work on; to save on wasted memory the padding should
           separate these collections rather than individual blocks.
           The PX, PY, PZ mechanism works three levels further down
           the hierarchy and so wastes more memory.
        3. For large problems this is less of an issue. For a large
           problem, even for a 2d array, there might be say 16 pages
           per array per processor and at least 4 processors in a
           uniform memory access box. Assuming a sensible mapping of
           processors to blocks, only one page (1.5% of the memory)
           would be referenced by processors in another box.
           On the other hand, for a very small per-processor problem
           size, e.g. 32x32 per processor and again four processors
           per box, as many as 50% of the memory references could be
           to far memory for 2d fields. This could be very bad!
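
  A hypothetical consistency check on the padding (not model code;
  the page and element sizes below are assumptions): for block
  boundaries to coincide with page boundaries, the padded per-block
  size must be a whole number of pages, and the array itself must
  start on a page boundary, which Fortran alone does not guarantee.

   C Illustrative check only.  sNx, OLx, sNy, OLy are assumed to be
   C in scope from the model's size header; PX, PY come from wherever
   C the padding is configured; 4KB pages and real*8 elements assumed.
   INTEGER PAGE_BYTES, ELEM_BYTES, blockBytes
   PARAMETER ( PAGE_BYTES = 4096, ELEM_BYTES = 8 )
   blockBytes = (sNx+2*OLx+PX)*(sNy+2*OLy+PY)*ELEM_BYTES
   IF ( MOD(blockBytes,PAGE_BYTES) .NE. 0 ) THEN
    WRITE(*,*) ' padding does not give a whole number of pages'
   ENDIF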

                    =====================================
