/*
##########################################################
# This file is part of the AdjoinableMPI library         #
# released under the MIT License.                        #
# The full COPYRIGHT notice can be found in the top      #
# level directory of the AdjoinableMPI distribution.     #
##########################################################
*/

#ifndef _AMPI_AMPI_H_
#define _AMPI_AMPI_H_

/**
 * \file
 * \ingroup UserInterfaceHeaders
 * One-stop header file for all AD-tool-independent AMPI routines; this is the file to replace mpi.h in the user code.
 */

/**
 * \defgroup UserInterfaceHeaders User-Interface header files
 * This set contains all the header files with declarations relevant to the user; header files not listed in this group
 * are internal to AdjoinableMPI or relate to support to be provided by a given AD tool.
 */

/**
 * \defgroup UserInterfaceDeclarations User-Interface declarations
 * This set contains all declarations relevant to the user; anything in the source files not listed in this group
 * is internal to AdjoinableMPI or relates to support to be provided by a given AD tool.
 */

/** \mainpage
 * The Adjoinable MPI (AMPI) library provides a modified set of MPI subroutines
 * that are constructed such that an adjoint in the context of algorithmic
 * differentiation (AD) can be computed. The library is designed to be supported
 * by a variety of AD tools and to also enable the computation of (higher-order)
 * forward derivatives.
 * \authors <b>Laurent Hascoët</b>
 * (currently at INRIA Sophia-Antipolis; <a href="http://fr.linkedin.com/pub/laurent-hascoët/86/821/a04">LinkedIn</a> - <a href="mailto:Laurent.Hascoet@sophia.inria.fr?subject=AMPI">e-mail</a>)
 * \authors <b>Michel Schanen</b>
 * (currently at RWTH Aachen; <a href="http://www.stce.rwth-aachen.de/people/Michel.Schanen.html">home page</a> - <a href="mailto:schanen@stce.rwth-aachen.de?subject=AMPI">e-mail</a>)
 * \authors <b>Jean Utke</b>
 * (until March 2014 at Argonne National Laboratory; <a href="http://www.linkedin.com/pub/jean-utke/5/645/7a">LinkedIn</a> - <a href="mailto:utkej1@gmail.com?subject=AMPI">e-mail</a>)
 *
 * Contributions informing the approach implemented in AMPI were made by the co-authors of \cite Utke2009TAM <b>P. Heimbach, C. Hill, U. Naumann</b>.
 *
 * Significant contributions were made by <b>Anton Bovin</b> (summer student at Argonne National Laboratory in 2013; <a href="http://www.linkedin.com/pub/anton-bovin/86/b1b/847">LinkedIn</a>).
 *
 * <b>Please refer to the \ref UserGuide for information regarding the use of the library in a given application.</b>
 *
 * Information regarding the library design, library-internal functionality, and the interfaces of methods to
 * be supported by a given AD tool is given in \ref LibraryDevelopmentGuide.
 *
 * \section links Links to Resources
 *
 * - <a href="https://trac.mcs.anl.gov/projects/AdjoinableMPI/wiki">TRAC page</a> for bug and feature tracking, links to presentations
 * - <a href="http://mercurial.mcs.anl.gov/ad/AdjoinableMPI/">mercurial repository</a> for source code and change history
 * - <a href="http://www.mcs.anl.gov/~utke/AdjoinableMPI/regression/tests.shtml">regression tests</a>
 *
 */

/**
 * \page UserGuide User Guide
 * \tableofcontents
 * \section Introduction
 *
 * The Adjoinable MPI (AMPI) library provides a modified set of MPI subroutines
 * that are constructed such that:
 * - an adjoint in the context of algorithmic differentiation (AD) can be computed,
 * - it can be supported by a variety of AD tools,
 * - it also enables the computation of (higher-order) forward derivatives,
 * - it provides an implementation for a straight pass-through to MPI such that the switch to AMPI can be made permanent
 *   without forcing compile dependencies on any AD tool.
 *
 * There are principal recipes for the construction of the adjoint of
 * a given communication, see \cite Utke2009TAM .
 * The practical implementation of these recipes, however, faces the following
 * challenges:
 * - the target language may prevent some implementation options
 *   - exposing an MPI_Request augmented with extra information as a structured type (not supported by Fortran 77)
 *   - passing an array of buffers (of different length), e.g. to \ref AMPI_Waitall, as an additional argument (not supported in any Fortran version)
 * - the AD tool implementation could be based on
 *   - operator overloading
 *     - original data and (forward) derivatives co-located (e.g. Rapsodia, dco)
 *     - original data and (forward) derivatives referenced (e.g. Adol-C)
 *   - source transformation
 *     - association by address (e.g. OpenAD)
 *     - association by name (e.g. Tapenade)
 *
 * The above choices imply certain consequences on the complexity of implementing
 * the adjoint (and forward derivative) action, and this could imply differences in the AMPI design.
 * However, from a user's perspective it is a clear advantage to present a <b>single, AD-tool-implementation-independent
 * AMPI library</b> such that switching AD tools is not hindered by AMPI while also promoting a common understanding of the
 * differentiation through MPI calls.
 * We assume the reader is familiar with MPI and AD concepts.
 *
 * \section sources Getting the library sources
 *
 * The sources can be accessed through the <a href="http://mercurial.mcs.anl.gov/ad/AdjoinableMPI/">AdjoinableMPI mercurial repository</a>. Bug tracking, feature requests,
 * etc. are done via <a href="http://trac.mcs.anl.gov/projects/AdjoinableMPI">trac</a>.
 * In the following we assume the sources are cloned (cf. the <a href="http://mercurial.selenic.com/">mercurial web site</a> for details about mercurial)
 * into a directory `AdjoinableMPI` by invoking
 * \code
 * hg clone http://mercurial.mcs.anl.gov/ad/AdjoinableMPI
 * \endcode
 *
 * \section configure Library - Configure, Build, and Install
 *
 * Configuration, build, and install follow the typical GNU autotools chain. Go to the source directory
 * \code
 * cd AdjoinableMPI
 * \endcode
 * If the sources were obtained from the mercurial repository, then one first needs to run the autotools by invoking
 * \code
 * ./autogen.sh
 * \endcode
 * In the typical `autoconf` fashion invoke
 * \code
 * configure --prefix=<installation directory> ...
 * \endcode
 * in or outside the source tree.
 * The AD tool supporting AMPI should provide information on which detailed AMPI
 * configure settings are required, if any.
 * Build the libraries with
 * \code
 * make
 * \endcode
 * Optionally, before installing, one can do a sanity check by running `make check`.
 *
 * To install the header files and compiled libraries follow with
 * \code
 * make install
 * \endcode
 * after which one should find under <tt>\<installation directory\></tt> the following:
 * - header files: see also \ref dirStruct
 * - libraries:
 *   - libampiPlainC - for pass-through to MPI, no AD functionality
 *   - libampiCommon - implementation of AD functionality shared between all AD tools supporting AMPI
 *   - libampiBookkeeping - implementation of AD functionality needed by some AD tools (see the AD tool documentation)
 *   - libampiTape - implementation of AD functionality needed by some AD tools (see the AD tool documentation)
 *
 * Note that the following libraries are AMPI internal:
 * - libampiADtoolStubsOO - stubs for operator-overloading AD tools, not needed by the user
 * - libampiADtoolStubsST - stubs for source-transformation AD tools, not needed by the user
 *
 * \section mpiToAmpi Switching from MPI to Adjoinable MPI
 *
 * For a given MPI-parallelized source code the user will replace all calls to MPI_... routines with the respective AMPI_...
 * equivalents provided in \ref UserInterfaceDeclarations.
 * To include the declarations replace
 * - in C/C++: includes of <tt>mpi.h</tt> with
 * \code
 * #include <ampi/ampi.h>
 * \endcode
 * - in Fortran: includes of <tt>mpif.h</tt> with
 * \code
 * #include <ampi/ampif.h>
 * \endcode
 *
 * respectively.
 *
 * In many cases certain MPI calls (e.g. for initialization and finalization) take place outside the scope of
 * the original computation and its AD derivatives and therefore do not themselves become part of the AD process;
 * see the explanations in \ref differentiableSection.
 * Each routine in this documentation lists the changes to the parameters
 * relative to the MPI standard. These changes affect
 * - MPI_Datatype parameters, see \ref datatypes
 * - MPI_Request parameters, see \ref requests
 *
 * Some routines require new parameters specifying the pairing of two-sided communications, see \ref pairings.
 * Similarly to the various approaches (preprocessing, templating, using <tt>typedef</tt>)
 * employed to effect a change to an active type for overloading-based AD tools, this switch
 * from MPI to AMPI routines should be done as a one-time effort.
 * Because AMPI provides an implementation for a straight pass-through to MPI it is possible to make this switch
 * permanent and retain builds that are completely independent of any AD tool, using AMPI as a thin wrapper library around MPI.
 *
 * \section appCompile Application - compile and link
 *
 * After the switch described in \ref mpiToAmpi is done, the application should be recompiled with the include path addition
 * \code
 * -I<installation directory>/include
 * \endcode
 * and linked with the link path extension
 * \code
 * -L<installation directory>/lib[64]
 * \endcode
 * Note that the name of the subdirectory (lib or lib64) depends on the system.
 * Link the appropriate set of libraries, see \ref configure; the optional ones in square brackets depend on the AD tool:
 * \code
 * -lampiCommon [ -lampiBookkeeping -lampiTape ]
 * \endcode
 * <b>OR</b>, if instead of differentiation by AD a straight pass-through to MPI is desired, link
 * \code
 * -lampiPlainC
 * \endcode
 * instead.
 *
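 * For instance, assuming a (hypothetical) installation prefix <tt>/opt/AdjoinableMPI</tt> and a single-file C application
 * <tt>myapp.c</tt>, a compile and link sequence might look as follows; the compiler wrapper and the choice of lib vs. lib64
 * depend on the MPI installation and the system:
 * \code
 * mpicc -I/opt/AdjoinableMPI/include -c myapp.c
 * mpicc -o myapp myapp.o -L/opt/AdjoinableMPI/lib -lampiCommon -lampiBookkeeping -lampiTape
 * \endcode
 * For a pass-through build without any AD tool, <tt>-lampiPlainC</tt> would replace the three AD-related libraries above.
 *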
 * \section dirStruct Directory and File Structure
 * All locations discussed below are relative to the top level source directory.
 * The top level header file to be included in place of the usual "mpi.h" is located in
 * ampi/ampi.h
 *
 * It references the header files in <tt>ampi/userIF</tt>, see also \ref UserInterfaceHeaders, which are organized to contain
 * - unmodified pass-through to MPI in <tt>ampi/userIF/passThrough.h</tt>, which exists to give the extent of the original MPI we cover
 * - variants of routines that in principle need adjoint logic but happen to be called outside of the code section that is adjoined and therefore
 *   are not transformed / not traced (NT), in <tt>ampi/userIF/nt.h</tt>
 * - routines that are modified from the original MPI counterparts because their behavior in the reverse sweep differs from their behavior in the
 *   forward sweep and they also may have a modified signature; in <tt>ampi/userIF/modified.h</tt>
 * - routines that are specific for some variants of source transformation (ST) approaches in <tt>ampi/userIF/st.h</tt>;
 *   while these impose a larger burden for moving from MPI to AMPI on the user, they also enable a wider variety of transformations
 *   currently supported by the tools; we anticipate that the ST-specific versions may become obsolete as the source transformation tools evolve to
 *   support all transformations via the routines in <tt>ampi/userIF/modified.h</tt>
 *
 * Additional header files contain enumerations used as arguments to AMPI routines. All declarations that are part of the user
 * interface are grouped in \ref UserInterfaceDeclarations. All other declarations in header files in the library are not to be used directly in the user code.
 *
 * A library that simply passes through all AMPI calls to their MPI counterparts, for a test compilation and execution without any involvement of
 * an AD tool, is implemented in the source files in the <tt>PlainC</tt> directory.
 *
 * \section differentiableSection Using subroutine variants NT vs non-NT relative to the differentiable section
 *
 * The typical assumption of a program to be differentiated is that there is some top level routine <tt>head</tt> which does the numerical computation
 * and communication and which is called from some main <tt>driver</tt> routine. The <tt>driver</tt> routine would have to be manually adjusted to initiate
 * the derivative computation and to retrieve and use the derivative values.
 * Therefore only <tt>head</tt> and everything it references would be <em>adjoined</em> while <tt>driver</tt> would not. Typically, the <tt>driver</tt>
 * routine also includes the basic setup and teardown with MPI_Init and MPI_Finalize and consequently these calls (for consistency) should be replaced
 * with their AMPI "no trace/transformation" (NT) counterparts \ref AMPI_Init_NT and \ref AMPI_Finalize_NT.
 * The same approach should be taken for all resource allocations/deallocations (e.g. \ref AMPI_Buffer_attach_NT and \ref AMPI_Buffer_detach_NT)
 * that can exist in the scope enclosing the adjointed section, alleviating
 * the need for the AD tool implementation to tackle them.
 * For cases where these routines have to be called within the adjointed code section the variants without the <tt>_NT</tt> suffix will ensure the
 * correct adjoint behavior.
 *
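 * A minimal sketch of this structure in C, assuming \ref AMPI_Init_NT and \ref AMPI_Finalize_NT mirror the argument lists of
 * MPI_Init and MPI_Finalize; the routine names and variables are purely illustrative:
 * \code{.cpp}
 * void head(double* x, double* y) {
 *   // numerical computation and AMPI_... (non-NT) communication; this is the adjoined section
 * }
 *
 * int main(int argc, char** argv) {   // the "driver": stays outside the differentiated section
 *   double x[2], y[2];
 *   AMPI_Init_NT(&argc, &argv);       // NT variant: not traced/transformed
 *   // ... set up inputs, seed derivatives via the AD tool ...
 *   head(x, y);                       // only head and everything it references is adjoined
 *   // ... retrieve and use the derivative values ...
 *   AMPI_Finalize_NT();               // NT variant: not traced/transformed
 *   return 0;
 * }
 * \endcode
 *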
 * \section general General Assumptions on Types and Communication Patterns
 *
 * \subsection datatypes Datatype consistency
 *
 * Because the MPI standard passes buffers as <tt>void*</tt> (aka choice) the information about the type of
 * the buffer, and in particular the distinction between active and passive data (in the AD sense), must be
 * conveyed via the <tt>datatype</tt> parameters and be consistent with the type of the buffer. To indicate buffers of
 * active type the library predefines the following:
 * - for C/C++
 *   - \ref AMPI_ADOUBLE as the active variant of the passive MPI_DOUBLE
 *   - \ref AMPI_AFLOAT as the active variant of the passive MPI_FLOAT
 * - for Fortran
 *   - \ref AMPI_ADOUBLE_PRECISION as the active variant of the passive MPI_DOUBLE_PRECISION
 *   - \ref AMPI_AREAL as the active variant of the passive MPI_REAL
 *
 * Passive buffers can be used as parameters to the AMPI interfaces with the respective passive data type values.
 *
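 * As an illustrative sketch (the concrete type of an active buffer depends on the AD tool, e.g. an overloading tool's
 * active type such as <tt>adouble</tt>, or a plain <tt>double</tt> for source transformation; the trailing arguments of
 * \ref AMPI_Send are abbreviated here, see \ref pairings and the declarations in <tt>ampi/userIF/modified.h</tt> for the
 * exact signature):
 * \code{.cpp}
 * double  passiveBuf[10];   // passive data: communicated with the passive MPI_DOUBLE
 * adouble activeBuf[10];    // active data; the type name depends on the AD tool
 *
 * AMPI_Send(passiveBuf, 10, MPI_DOUBLE,   ...);  // passive datatype matches the passive buffer
 * AMPI_Send(activeBuf,  10, AMPI_ADOUBLE, ...);  // active datatype matches the active buffer
 * \endcode
 *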
 * \subsection requests Request Type
 *
 * Because additional information has to be attached to the MPI_Request instances used in nonblocking communications, there
 * is an expanded data structure to hold this information. Even though in some contexts (F77) this structure cannot be exposed
 * to the user code, the general approach is to declare variables that are to hold requests as \ref AMPI_Request (instead of
 * MPI_Request).
 *
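 * A sketch of the intended use, assuming nonblocking variants \ref AMPI_Irecv / \ref AMPI_Wait analogous to their MPI
 * counterparts (argument lists are abbreviated; see \ref nonblocking for the underlying design):
 * \code{.cpp}
 * AMPI_Request req;                    // AMPI_Request instead of MPI_Request
 * AMPI_Irecv(buf, ..., &req);          // the nonblocking call registers the augmented request
 * // ... other work ...
 * AMPI_Wait(&req, MPI_STATUS_IGNORE);  // the completion call uses the augmented information
 * \endcode
 *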
 * \subsection pairings Pairings
 *
 * Following the explanations in \cite Utke2009TAM it is clear that context information about the
 * communication pattern, that is the pairing of MPI calls, is needed to achieve
 * -# correct adjoints, i.e. correct send and receive end points and deadlock freedom,
 * -# if possible, retention of the efficiency advantages present in the original MPI communication for the adjoint.
 *
 * In AMPI pairings are conveyed via additional <tt>pairedWith</tt> parameters which may be set to \ref AMPI_PairedWith enumeration values, see e.g. \ref AMPI_Send or \ref AMPI_Recv.
 * The need to convey the pairing imposes restrictions because in a given code the pairing may not be static.
 * For example, a given <tt>MPI_Recv</tt> may be paired with
 * \code{.cpp}
 * if (doBufferedSends)
 *   MPI_Bsend(...);
 * else
 *   MPI_Ssend(...);
 * \endcode
 *
 * but the AD tool has to decide on the send mode once the reverse sweep needs to adjoin the original <tt>MPI_Recv</tt>.
 * Tracing such information in a global data structure is not scalable, and piggybacking the send type onto the message
 * so it can be traced on the receiving side is conceivable but not trivial and currently not implemented.
 *
 * \restriction Pairing of send and receive modes must be static.
 *
 * Note that this does not prevent the use of wild cards for source or tag.
 *
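 * As a sketch of how a static pairing is declared on both ends; the enumeration value names and argument positions shown
 * here are illustrative only, the actual \ref AMPI_PairedWith values and signatures are given with the declarations of
 * \ref AMPI_Send and \ref AMPI_Recv:
 * \code{.cpp}
 * // on the sending rank: declare that this send is paired with a plain receive
 * AMPI_Send(buf, count, AMPI_ADOUBLE, dest, tag, AMPI_TO_RECV, comm);
 * // on the receiving rank: declare that this receive is paired with a plain send
 * AMPI_Recv(buf, count, AMPI_ADOUBLE, src, tag, AMPI_FROM_SEND, comm, &status);
 * \endcode
 *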
 * \section examples Examples
 * A set of examples, organized to illustrate the use of AMPI together with setups for AD tools and also serving as
 * regression tests, is collected in `AdjoinableMPIexamples`, which can be obtained similarly to the AMPI sources themselves
 * by cloning
 * \code
 * hg clone http://mercurial.mcs.anl.gov/ad/AdjoinableMPIexamples
 * \endcode
 * The daily regression tests based on these examples report their results on the page linked from the main page of this documentation.
 *
 */

/**
 * \page LibraryDevelopmentGuide Library Development Guide
 * \tableofcontents
 * \section naming Naming Conventions - Code Organization
 * Directories and libraries are organized as follows:
 * - user interface header files, see \ref dirStruct; these should not contain anything else (e.g. no internal helper functions)
 * - `PlainC` : pass-through-to-MPI implementations of the user interface; no reference to ADTOOL interfaces; to be renamed
 * - `Tape` : sequential access storage mechanism default implementation (implemented as a doubly linked list) to enable forward/reverse
 *   reading; may not reference ADTOOL or AMPI symbols/types; may reference MPI
 * - `Bookkeeping` : random access storage for AMPI_Requests (but possibly also other objects that could be opaque)
 * - `Common` : the AD-enabled workhorse; here we have all the common functionality for MPI differentiation
 *
 * Symbol prefixes:
 * - `AMPI_` : to be used for anything replacing the `MPI_` prefix; not to be used for symbols outside of the user interface
 * - `TAPE_AMPI_` : to be used for the `Tape` sequential access storage mechanism declared in ampi/tape/support.h
 * - `BK_AMPI_` : to be used for the `Bookkeeping` random access storage mechanism declared in ampi/bookkeeping/support.h
 * - `ADTOOL_AMPI_` : to be used for the support functionality to be provided by a given AD tool
 *
 * \section nonblocking Nonblocking Communication and Fortran Compatibility
 *
 * A central concern is the handling of nonblocking sends and receives in combination with their respective completion,
 * e.g. wait, waitall, test.
 * Take as an example
 * \code{.cpp}
 * MPI_Irecv(&b,...,&r);
 * // some other code in between
 * MPI_Wait(&r,MPI_STATUS_IGNORE);
 * \endcode
 * The adjoint action for <tt>MPI_Wait</tt> will have to be the <tt>MPI_Isend</tt> of the adjoint data associated with
 * the data in buffer <tt>b</tt>.
 * The original <tt>MPI_Wait</tt> does not have any of the parameters required for the send and in particular it does not
 * have the buffer. The latter, however, is crucial, in particular in a source transformation context, because, absent a correct syntactic
 * representation for the buffer at the <tt>MPI_Wait</tt> call site, one has to map the address <tt>&b</tt> valid during the forward
 * sweep to the address of the associated adjoint buffer during the reverse sweep.
 * In some circumstances, e.g. when the buffer refers to a stack variable and the reversal mode follows a strict <em>joint</em> scheme
 * where one does not leave the stack frame of a given subroutine until the reverse sweep has completed, it is possible to predetermine
 * the address of the respective adjoint buffer even in the source transformation context.
 * In the general case, e.g. allowing for <em>split</em> mode reversal
 * or dynamic memory deallocation before the adjoint sweep commences, such predetermination
 * requires a more elaborate mapping algorithm.
 * This mapping is the subject of ongoing research and currently not supported.
 *
 * On the other hand, for operator-overloading based tools, the mapping to a reverse sweep address space is an integral part of the
 * tool because there the reverse sweep is executed as an interpretation of a trace of the execution that is entirely separate from the original program
 * address space. Therefore all addresses have to be mapped to the new adjoint address space to begin with and no association to some
 * adjoint program variable is needed. Instead, the buffer address can be conveyed via the request parameter (and AMPI-userIF bookkeeping)
 * to the <tt>MPI_Wait</tt> call site, traced there, and is then recoverable during the reverse sweep.
 * Nevertheless, to allow a common interface, this version of the AMPI library has the buffer as an additional argument in the source-transformation-specific \ref AMPI_Wait_ST
 * variant of \ref AMPI_Wait.
 * In later editions of the AMPI library, when source transformation tools can fully support the address mapping, the \ref AMPI_Wait_ST variant may be dropped.
 *
 * Similarly to conveying the buffer address via userIF bookkeeping associated with the request being passed, all other information such as source or destination, tag,
 * data type, or the distinction whether a request originated with a send or receive will be part of the augmented information attached to the request and be subject to the same trace and recovery as the buffer address itself.
 * In the source transformation context, for cases in which parameter values such as source, destination, or tag are constants or loop indices, the question could be asked whether these values could not easily be recovered in
 * the generated adjoint code without having to store them.
 * Such recovery following a TBR-like approach would, however, require exposing the augmented request instance as a structured data type to the TBR analysis in languages other than Fortran77.
 * This necessitates the introduction of the \ref AMPI_Request, which in Fortran77 still maps to just an integer address.
 * The switching between these variants is done via configure flags, see \ref configure.
 *
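 * Conceptually, the reverse sweep for the example above contains the mirrored operations, sketched here as pseudocode;
 * <tt>b_bar</tt> denotes the adjoint buffer associated with <tt>b</tt>, and how it is obtained is determined by the AD tool:
 * \code{.cpp}
 * // reverse-sweep counterparts of the forward-sweep example above (pseudocode)
 * MPI_Isend(&b_bar,...,&r);        // adjoint of MPI_Wait: send the adjoint data associated with b
 * // ... adjoints of the code in between ...
 * MPI_Wait(&r,MPI_STATUS_IGNORE);  // adjoint of MPI_Irecv: complete the adjoint send; b_bar is then
 *                                  // typically reset to zero because the forward receive overwrote b
 * \endcode
 *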
 * \section bookkeeping Bookkeeping of Requests
 *
 * As mentioned in \ref nonblocking the target language may prevent the augmented request from being used directly.
 * In such cases the augmented information has to be kept internal to the library, that is, we do some bookkeeping to convey the necessary information between the nonblocking sends or receives and
 * the respective completion calls. Currently the bookkeeping has a very simple implementation as a doubly-linked list, implying linear search costs, which is acceptable only as long as the
 * number of incomplete nonblocking operations per process remains moderate.
 *
 * Whenever internal handles are used to keep track of a given internal object
 * between two distant locations in the source code (e.g. a file identifier to keep track of an opened/read/closed file,
 * or an address to keep track of malloc'ed/used/freed dynamic memory, or a request ID to keep track of an Isend/Wait...)
 * we may have to arrange the same correspondence during the backward sweep.
 * Keeping the internal identifier in the AD stack is not sufficient because there is no guarantee that
 * the mechanism in the backward sweep will use the same values for the internal handle.
 * The bookkeeping we use to solve this problem goes as follows:
 * - the standard TBR mechanism makes sure that variables that are needed in the BW sweep and are overwritten
 *   are pushed onto the AD stack before they are overwritten
 * - at the end of its life in the forward sweep, the FW handle is pushed onto the AD stack
 * - at the beginning of its backward life, we obtain a BW handle, we pop the FW handle,
 *   and we keep the pair of those in a table (if an adjoint handle is created too, we keep the triplet)
 * - when a variable is popped from the AD stack and it is an internal handle,
 *   the popped handle is re-based using said table.
 *
 * Simple workaround for the "request" case (this method doesn't rely on TBR; see the sketch after this list):
 * - Push the FW request upon acquisition (e.g. just after the Isend)
 * - Push the FW request upon release (e.g. just before the Wait)
 * - Pop the FW request upon adjoint of release, and get the BW request from the adjoint of release
 * - Add the BW request into the bookkeeping, with the FW request as a key.
 * - Upon adjoint of acquisition, pop the FW request and look it up in the bookkeeping to get the BW request.
 *
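 * A minimal pseudocode sketch of this workaround for a forward Isend/Wait pair; the <tt>push</tt>/<tt>pop</tt> and
 * <tt>bookkeeping_*</tt> helpers shown here are purely illustrative placeholders and not the actual TAPE_AMPI_/BK_AMPI_
 * interfaces:
 * \code{.cpp}
 * // forward sweep
 * MPI_Isend(&b,...,&fwReq);
 * push(fwReq);                         // acquisition: push the FW request
 * // ...
 * push(fwReq);                         // release: push the FW request again
 * MPI_Wait(&fwReq,MPI_STATUS_IGNORE);
 *
 * // reverse sweep
 * fwReq = pop();                       // adjoint of the release ...
 * MPI_Irecv(&b_bar,...,&bwReq);        // ... starts the adjoint communication and yields the BW request
 * bookkeeping_add(fwReq, bwReq);       // store the pair, keyed by the FW request
 * // ...
 * fwReq = pop();                       // adjoint of the acquisition ...
 * bwReq = bookkeeping_lookup(fwReq);   // ... recovers the BW request
 * MPI_Wait(&bwReq,MPI_STATUS_IGNORE);
 * \endcode
 *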
 * \section bundling Tangent-linear mode: bundling the derivatives or shadowing the communication
 * A central question for the implementation of tangent-linear mode is
 * whether to bundle the original buffer <tt>b</tt> with the derivative <tt>b_d</tt> as a pair and communicate the pair,
 * or to send separate messages for the derivatives.
 * - shadowing messages avoids the bundling/unbundling if <tt>b</tt> and <tt>b_d</tt>
 *   are already given as separate entities, as is the case in association by name, see \ref Introduction.
 * - for one-sided passive communications there is no hook to do the bundling/unbundling on the target side; therefore
 *   it would be inherently impossible to achieve semantically correct behavior with any bundling/unbundling scheme.
 *   The example here is a case where a put on the origin side and subsequent computations on the target side are synchronized
 *   via a barrier which by itself does not have any obvious link to the target window by which one could trigger an unbundling.
 * - the bundling operation itself may incur nontrivial overhead for large buffers
 *
 * An earlier argument against message shadowing was the difficulty of correctly associating message pairs while using wildcards.
 * This association can, however, be ensured when the shadowing message for <tt>b_d</tt> is received on a communicator
 * <tt>comm_d</tt> that duplicates the original communicator <tt>comm</tt> and uses the
 * actual src and tag values obtained from the receive of the shadowed message, as in the following example:
 *
 * \code{.cpp}
 * if ( myRank==1) {
 *   send(x,...,0,tag1,comm);      // send of the original data
 *   send(x_d,...,0,tag1,comm_d);  // shadowing send of the derivatives
 * }
 * else if ( myRank==2) {
 *   send(y,...,0,tag2,comm);
 *   send(y_d,...,0,tag2,comm_d);
 * }
 * else if ( myRank==0) {
 *   do {
 *     recv(t,...,ANY_SOURCE, ANY_TAG,comm,&status);                 // recv of the original data
 *     recv(t_d,...,status.SOURCE,status.TAG,comm_d,STATUS_IGNORE);  // shadowing recv with wildcards disambiguated
 *     z+=t;      // original operation
 *     z_d+=t_d;  // corresponding derivative operation
 *   } while (...);
 * }
 * \endcode
 *
 * This same approach can be applied to (user-defined) reduction operations, see \ref reduction, in that the binomial
 * tree traversal for the reduction is shadowed in the same way and a user-defined operation with derivatives can be invoked
 * by passing the derivatives as separate arguments.
 *
 * The above approach is to be taken by any tool in which <tt>b</tt> and <tt>b_d</tt> are not already paired in consecutive
 * memory, such as association by name as in Tapenade, or by implementation choice such as forward interpreters in Adol-C where
 * the 0th-order Taylor coefficients live in a separate array from the first- and higher-order Taylor coefficients.
 * Tools with association by address (OpenAD, Rapsodia) would have the data already given in paired form and therefore do not
 * need message shadowing but communicate the paired data.
 *
 * \section badOptions Rejected design options
 * About MPI_Types and the "active" boolean:
 * one cannot get away with just an "active" boolean to indicate the structure of
 * the MPI_Type of the bundle. Since the MPI_Type definition of the bundle type
 * has to be done anyway in the differentiated application code, and is passed
 * to the communication call, the AMPI communication implementation will
 * check this bundle MPI_Type to discover activity and trace / not trace accordingly.
 *
 * For operator overloading, the tool needs to supply the active MPI types
 * for the built-in MPI datatypes; using the active types, one can achieve
 * type conformance between the buffer and the type parameter passed.
 *
 * \section onesided One-Sided Active Targets
 * Idea: use an <tt>AMPI_Win</tt> instance (similar to the \ref AMPI_Request) to attach more
 * information about the operations that are applied to the window and completed on the fence;
 * we execute/trace/collect-for-later-execution operations on the window in the following fashion.
 *
 * forward:
 * - MPI_Get: record op/args on the window (buffer called 'x')
 * - MPI_Put/MPI_Accumulate(z,...): record op/args on the window; during forward: replace with MPI_Get of the remote target value into temporary 't'; postpone to the fence
 *
 * upon hitting a fence in the forward sweep:
 * 1. put all ops on the stack
 * 2. run the fence
 * 3. for each accum/put:
 *    3.1 push 't'
 *    3.2 do the postponed accumulate/put
 * 4. run a fence for 3.2
 * 5. for each accum*:
 *    5.1 get accumulation result 'r'
 * 6. run a fence for 5.1
 * 7. for each accum*:
 *    7.1 push 'r'
 *
 * for the adjoint of a fence:
 * 0. for each operation on the window coming from the previous fence:
 *    0.1 op isa GET: then x_bar=0.0
 *    0.2 op isa PUT/accum=: then x_bar+=t21
 *    0.3 op isa accum+: then x_bar+=t22
 * 1. run a fence
 * 2. pop op from the stack and put onto adjoint window
 *    2.1 op isa PUT/accum=: then GET('t21')
 *    2.2 op isa accum+: then get('t22') from adjoint target
 *    2.3 op isa accum*: then pop('r'), GET('t23') from adjoint target
 * 3. run a fence
 * 4. for each op on the adjoint window
 *    4.1 op isa GET: then accum+ into remote
 *    4.2 op isa PUT/accum: pop(t); accu(t,'=') to the value in the target
 *    4.3 op isa PUT/accum=: then acc(0.0,'=') to adjoint target
 *    4.4 op isa accum*: then accumulate( r*t23/t,'=', to the target) AND do z_bar+=r*t23/z (this is the old local z)
 *
 * \section derived Handling of Derived Types
 * (Written mostly in the context of ADOL-C.) MPI allows the user to create typemaps for arbitrary structures in terms of a block
 * count and arrays of block lengths, block types, and displacements. For sending an array of active variables, we could get by with
 * a pointer to their value array; in the case of a struct, we may want to send an arbitrary collection of data as well as some active
 * variables which we'll need to "dereference". If a struct contains active data, we must manually pack it into a new array because
 * -# the original datamap alignment is destroyed when we convert active data to real values
 * -# we would like to send completely contiguous messages
 *
 * When received, the struct is unpacked again.
 *
 * When the user calls the \ref AMPI_Type_create_struct_NT wrapper with a datamap, the map is stored in a structure of type
 * \ref derivedTypeData; the wrapper also generates an internal typemap that describes the packed data. The packed typemap is used
 * whenever a derived type is sent and received; it is also used in conjunction with the user-provided map to pack and unpack data.
 * This typemap is invisible to the user, so the creation of derived datatypes is accomplished entirely with calls to the
 * \ref AMPI_Type_create_struct and \ref AMPI_Type_commit_NT wrappers.
 *
 * \image html dtype_illustration.png
 * \image latex dtype_illustration.png
 *
 * AMPI currently supports sending structs with active elements and structs with embedded structs. Packing is called recursively.
 * The functions implemented are \ref AMPI_Type_create_struct_NT and \ref AMPI_Type_contiguous_NT. A wrapper for _Type_vector can't be
 * implemented now because the point of that function is to send noncontiguous data and, for simplicity and efficiency, we're assuming
 * that the active variables we're sending are contiguous.
 *
 * Worth noting: if we have multiple active variables in a struct and we want to send an array of these structs, we have to send every
 * active element to ensure that our contiguity checks don't assert false.
 *
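 * A sketch of the intended usage, assuming the _NT wrappers mirror the argument lists of their MPI counterparts; the struct
 * layout, the active type <tt>adouble</tt>, and the variable names are illustrative only, and the exact signatures should be
 * taken from the userIF headers:
 * \code{.cpp}
 * struct Particle { int id; adouble pos[3]; };  // active members; the concrete active type depends on the AD tool
 *
 * int          blockLengths[2]  = { 1, 3 };
 * MPI_Aint     displacements[2] = { offsetof(struct Particle, id), offsetof(struct Particle, pos) };
 * MPI_Datatype blockTypes[2]    = { MPI_INT, AMPI_ADOUBLE };  // the active datatype marks the active block
 * MPI_Datatype particleType;
 *
 * AMPI_Type_create_struct_NT(2, blockLengths, displacements, blockTypes, &particleType);
 * AMPI_Type_commit_NT(&particleType);
 * \endcode
 *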
 * \section reduction Reduction operations
 *
 * Since operator overloading can't enter MPI routines, the other AMPI functions extract the double values from active variables,
 * transfer those, and have explicit adjoint code that replaces the automated transformation. This is possible because we know the
 * partial derivatives of the result. For reductions, we can also do this with built-in reduction ops (e.g., sum, product). But
 * we can't do this for user-defined ops because we don't know the partial derivatives of the result.
 *
 * (Again explained in the context of ADOL-C.) So we have to make the tracing machinery enter the Reduce and perform taping every
 * time the reduction op is applied. As it turns out, MPICH implements Reduce for derived types as a binary tree of Send/Recv pairs,
 * so we can make our own Reduce by replicating that code with AMPI_Send/Recv functions. (Note that derived types are necessarily
 * reduced with user-defined ops because MPI doesn't know how to accumulate them with its built-in ops.) So AMPI_Reduce is implemented
 * for derived types as the aforementioned binary tree with active temporaries used between steps for applying the reduction op.
 * See \ref AMPI_Op_create_NT.
 *
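 * As an illustrative sketch, assuming \ref AMPI_Op_create_NT mirrors the MPI_Op_create argument list and AMPI_Reduce mirrors
 * MPI_Reduce apart from the AMPI-specific datatype handling described above; the user function <tt>myAdd</tt>, the buffers,
 * and <tt>particleType</tt> are hypothetical:
 * \code{.cpp}
 * void myAdd(void* invec, void* inoutvec, int* len, MPI_Datatype* dtype) {
 *   // user-defined reduction operation on the elements, written by the user
 * }
 *
 * MPI_Op myOp;
 * AMPI_Op_create_NT(myAdd, 1, &myOp);  // commutative user-defined op
 * AMPI_Reduce(sendBuf, recvBuf, count, particleType, myOp, 0, comm);
 * \endcode
 *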
 */

#include <mpi.h>
#if defined(__cplusplus)
extern "C" {
#endif

#include "ampi/userIF/passThrough.h"
#include "ampi/userIF/nt.h"
#include "ampi/userIF/modified.h"
#include "ampi/userIF/st.h"

#include "ampi/libCommon/modified.h"

#if defined(__cplusplus)
}
#endif

#endif