Miscellaneous notes relating to MITgcm UV
=========================================

This file's form is close to that of an FAQ. If you are having
a problem getting the model to behave as you might expect, there
may be some helpful clues in this file.


o Something really weird is happening - variables keep
  changing value!

  Apart from the usual problems of out-of-bounds array references
  and various bugs, it is important to be sure that "stack"
  variables really are stack variables in multi-threaded execution.
  Some compilers put subroutine local variables in static storage.
  This can result in an apparently private variable in a local
  routine being mysteriously changed by a concurrently executing
  thread.

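  A minimal sketch of the kind of failure this produces (the routine
  and variable names here are hypothetical, not from the MITgcm
  source): each thread expects tmpVal to be private, but if the
  compiler places it in static storage (e.g. via a static-locals
  option) another thread may overwrite it between the two statements.

C     Hypothetical illustration only.
      SUBROUTINE SHOW_STATIC_LOCAL( myThid )
      INTEGER myThid
      REAL*8 tmpVal
      tmpVal = DBLE( myThid )
C     ... other work, during which another thread may also be in
C     this routine ...
C     With a true stack variable this prints myThid; with a static
C     local it may print whichever thread wrote tmpVal last.
      WRITE(*,*) 'thread', myThid, ' got', tmpVal
      RETURN
      END
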
=====================================

o Something really weird is happening - the code gets stuck in
  a loop somewhere!

  The routines in barrier.F should be compiled without any
  optimisation. These routines check variables that are updated by
  other threads. Compiler optimisations generally assume that the
  code being optimised will obey the sequential semantics of regular
  Fortran. That means they will assume that a variable is not going
  to change value unless the code being optimised changes it.
  Obviously this can cause problems.

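  The following sketch (illustrative only, not the actual barrier.F
  code, and the names are hypothetical) shows the kind of loop that
  breaks: a thread spins until another thread sets a flag held in a
  COMMON block, but an optimising compiler may load the flag into a
  register once and never re-read memory, so the loop never exits.

      SUBROUTINE WAIT_FOR_FLAG
      INTEGER syncFlag
      COMMON /SYNC_COMM/ syncFlag
C     Spin until another thread sets syncFlag non-zero.  Compiling
C     unoptimised (or declaring the variable VOLATILE where the
C     compiler supports it) keeps the load inside the loop.
   10 CONTINUE
      IF ( syncFlag .EQ. 0 ) GOTO 10
      RETURN
      END
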
=====================================

o Is the Fortran SAVE statement a problem?

  Yes. On the whole the Fortran SAVE statement should not be used
  for data in a multi-threaded code. SAVE causes data to be held in
  static storage, meaning that all threads will see the same location.
  Therefore, if one thread updates the location all other threads
  will generally see the update. Note - there is often no specification
  for what should happen in this situation in a multi-threaded
  environment, so this is not a robust mechanism for sharing data.
  For most cases where SAVE might be appropriate, either of the
  following recipes should be used instead. Both these schemes are
  potential performance bottlenecks if they are over-used.
  Method 1
  ********
  1. Put the SAVE variable in a common block.
  2. Update the SAVE variable in a _BEGIN_MASTER, _END_MASTER block.
  3. Include a _BARRIER after the _BEGIN_MASTER, _END_MASTER block.
  e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter

      _BEGIN_MASTER(myThid)
       nIter = nIter+1
      _END_MASTER(myThid)
      _BARRIER

  Note. The _BARRIER operation is potentially expensive. Be conservative
  in your use of this scheme.

  Method 2
  ********
  1. Put the SAVE variable in a common block but with an extra dimension
     for the thread number.
  2. Change the updates and references to the SAVE variable to a
     per-thread basis.
  e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter(MAX_NO_THREADS)

      nIter(myThid) = nIter(myThid)+1

  Note. nIter(myThid) and nIter(myThid+1) will share the same
  cache line. The update will cause extra low-level memory
  traffic to maintain cache coherence. If the update is in
  a tight loop this will be a problem and nIter will need
  padding.
  In a NUMA system nIter(1:MAX_NO_THREADS) is likely to reside
  in a single page of physical memory on a single box. Again, in
  a tight loop this would cause lots of remote/far memory references
  and would be a problem. Some compilers provide a mechanism
  for helping overcome this problem.

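  A sketch of the padding idea (the figure of 16 INTEGERs, i.e.
  64 bytes, per cache line is only an assumption and is machine
  dependent; the names are hypothetical): each thread's counter is
  given its own cache line so concurrent updates do not cause
  false sharing.

C     Illustrative sketch only - padding size is an assumption.
C     Only element (1,myThid) is used; the remaining elements exist
C     purely to push each thread's counter onto its own cache line.
      INTEGER padInts
      PARAMETER ( padInts = 16 )
      INTEGER nIterPad(padInts,MAX_NO_THREADS)
      COMMON /PARAMS_PAD/ nIterPad

      nIterPad(1,myThid) = nIterPad(1,myThid)+1
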
=====================================

o Can I debug using write statements?

  Many systems do not have "thread-safe" Fortran I/O libraries.
  On these systems I/O generally works but it gets a bit intermingled!
  Occasionally doing multi-threaded I/O with an unsafe Fortran I/O
  library will actually cause the program to fail. Note: SGI has a
  "thread-safe" Fortran I/O library.

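  A common workaround, sketched below using the _BEGIN_MASTER,
  _END_MASTER macros shown earlier in this file, is to let only one
  thread issue the WRITE so that output from different threads is
  not interleaved.

C     Sketch: only the master thread writes the message.
      _BEGIN_MASTER(myThid)
       WRITE(*,*) 'nIter = ', nIter
      _END_MASTER(myThid)
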
=====================================

o Mapping virtual memory to physical memory.

  The current code declares arrays as
      real aW2d (1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)
  This raises an issue on shared virtual-memory machines that have
  an underlying non-uniform memory subsystem e.g. HP Exemplar, SGI
  Origin, DG, Sequent etc. What most machines implement is a scheme
  in which the physical memory that backs the virtual memory is
  allocated on a page basis at run-time. The OS manages this
  allocation and without exception pages are assigned to physical
  memory on the box where the thread which caused the page-fault is
  running. Pages are typically 4-8KB in size. This means that in
  some environments it would make sense to declare arrays as
      real aW2d (1-OLx:sNx+OLx+PX,1-OLy:sNy+OLy+PY,nSx,nSy)
  where PX and PY are chosen so that the divides between near and
  far memory coincide with the boundaries of the virtual memory
  regions a thread works on. In principle this is easy, but it is
  also inelegant and really one would like the OS/hardware to take
  care of this issue. Doing it oneself requires PX and PY to be
  recalculated whenever the mapping of the nSx, nSy blocks to nTx
  and nTy threads is changed. Also, different PX and PY are required
  depending on
      page size
      array element size ( real*4, real*8 )
      array dimensions ( 2d, 3d Nz, 3d Nz+1 ) - in 3d a PZ would
                                                also be needed!
  Note: 1. A C implementation would be a lot easier. An F90
           implementation using dynamic allocation would also be
           fairly straightforward.
        2. The padding really ought to be between the "collections"
           of blocks that all the threads using the same near memory
           work on; to save on wasted memory the padding should be
           between these collections rather than between individual
           blocks. The PX, PY, PZ mechanism instead pads three levels
           further down the hierarchy, which wastes more memory.
        3. For large problems this is less of an issue. For a large
           problem, even for a 2d array, there might be say 16 pages
           per array per processor and at least 4 processors in a
           uniform memory access box. Assuming a sensible mapping of
           processors to blocks, only one page (1.5% of the memory)
           would be referenced by processors in another box.
           On the other hand, for a very small per-processor problem
           size e.g. 32x32 per processor and again four processors
           per box, as many as 50% of the memory references could be
           to far memory for 2d fields. This could be very bad!

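  As a rough illustration of how PX might be chosen (the page size,
  element size and tile width below are example assumptions only,
  and in practice PX would have to be a compile-time constant set in
  a header rather than computed at run time as it is here): with
  4KB pages and real*8 elements a page holds 512 elements, so
  padding each row of the array up to a multiple of 512 elements
  keeps the rows (and hence the tiles) worked on by different
  threads on separate pages.

C     Illustrative sketch only - numbers are example assumptions.
      INTEGER pageBytes, elemBytes, elemsPerPage, rowElems, PX
      PARAMETER ( pageBytes = 4096, elemBytes = 8 )
      elemsPerPage = pageBytes/elemBytes
C     Unpadded row length of aW2d; e.g. sNx=32, OLx=2 gives 36.
      rowElems = sNx + 2*OLx
C     Round the row up to a whole number of pages: 512 elements per
C     page here, so PX = 476 - which also shows how wasteful the
C     scheme is for small tiles.
      PX = MOD( elemsPerPage - MOD(rowElems,elemsPerPage),
     &          elemsPerPage )
      WRITE(*,*) 'PX = ', PX
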
=====================================