|  | <?xml version="1.0" encoding="UTF-8" standalone="no"?> | 
|  | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Design</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="C++, library, parallel" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="parallel_mode.html" title="Chapter 18. Parallel Mode" /><link rel="prev" href="parallel_mode_using.html" title="Using" /><link rel="next" href="parallel_mode_test.html" title="Testing" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Design</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="parallel_mode_using.html">Prev</a> </td><th width="60%" align="center">Chapter 18. Parallel Mode</th><td width="20%" align="right"> <a accesskey="n" href="parallel_mode_test.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.parallel_mode.design"></a>Design</h2></div></div></div><p> | 
|  | </p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.intro"></a>Interface Basics</h3></div></div></div><p> | 
|  | All parallel algorithms are intended to have signatures that are | 
|  | equivalent to the ISO C++ algorithms replaced. For instance, the | 
|  | <code class="function">std::adjacent_find</code> function is declared as: | 
|  | </p><pre class="programlisting"> | 
|  | namespace std | 
|  | { | 
|  | template<typename _FIter> | 
|  | _FIter | 
|  | adjacent_find(_FIter, _FIter); | 
|  | } | 
|  | </pre><p> | 
|  | Which means that there should be something equivalent for the parallel | 
|  | version. Indeed, this is the case: | 
|  | </p><pre class="programlisting"> | 
|  | namespace std | 
|  | { | 
|  | namespace __parallel | 
|  | { | 
|  | template<typename _FIter> | 
|  | _FIter | 
|  | adjacent_find(_FIter, _FIter); | 
|  |  | 
|  | ... | 
|  | } | 
|  | } | 
|  | </pre><p>But.... why the ellipses? | 
|  | </p><p> The ellipses in the example above represent additional overloads | 
|  | required for the parallel version of the function. These additional | 
|  | overloads are used to dispatch calls from the ISO C++ function | 
|  | signature to the appropriate parallel function (or sequential | 
|  | function, if no parallel functions are deemed worthy), based on either | 
|  | compile-time or run-time conditions. | 
|  | </p><p> The available signature options are specific for the different | 
|  | algorithms/algorithm classes.</p><p> The general view of overloads for the parallel algorithms look like this: | 
|  | </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>ISO C++ signature</p></li><li class="listitem"><p>ISO C++ signature + sequential_tag argument</p></li><li class="listitem"><p>ISO C++ signature + algorithm-specific tag type | 
|  | (several signatures)</p></li></ul></div><p> Please note that the implementation may use additional functions | 
|  | (designated with the <code class="code">_switch</code> suffix) to dispatch from the | 
|  | ISO C++ signature to the correct parallel version. Also, some of the | 
|  | algorithms do not have support for run-time conditions, so the last | 
|  | overload is therefore missing. | 
|  | </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.tuning"></a>Configuration and Tuning</h3></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.omp"></a>Setting up the OpenMP Environment</h4></div></div></div><p> | 
|  | Several aspects of the overall runtime environment can be manipulated | 
|  | by standard OpenMP function calls. | 
|  | </p><p> | 
|  | To specify the number of threads to be used for the algorithms globally, | 
|  | use the function <code class="function">omp_set_num_threads</code>. An example: | 
|  | </p><pre class="programlisting"> | 
|  | #include <stdlib.h> | 
|  | #include <omp.h> | 
|  |  | 
|  | int main() | 
|  | { | 
|  | // Explicitly set number of threads. | 
|  | const int threads_wanted = 20; | 
|  | omp_set_dynamic(false); | 
|  | omp_set_num_threads(threads_wanted); | 
|  |  | 
|  | // Call parallel mode algorithms. | 
|  |  | 
|  | return 0; | 
|  | } | 
|  | </pre><p> | 
|  | Some algorithms allow the number of threads being set for a particular call, | 
|  | by augmenting the algorithm variant. | 
|  | See the next section for further information. | 
|  | </p><p> | 
|  | Other parts of the runtime environment able to be manipulated include | 
|  | nested parallelism (<code class="function">omp_set_nested</code>), schedule kind | 
|  | (<code class="function">omp_set_schedule</code>), and others. See the OpenMP | 
|  | documentation for more information. | 
|  | </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.compile"></a>Compile Time Switches</h4></div></div></div><p> | 
|  | To force an algorithm to execute sequentially, even though parallelism | 
|  | is switched on in general via the macro <code class="constant">_GLIBCXX_PARALLEL</code>, | 
|  | add <code class="classname">__gnu_parallel::sequential_tag()</code> to the end | 
|  | of the algorithm's argument list. | 
|  | </p><p> | 
|  | Like so: | 
|  | </p><pre class="programlisting"> | 
|  | std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); | 
|  | </pre><p> | 
|  | Some parallel algorithm variants can be excluded from compilation by | 
|  | preprocessor defines. See the doxygen documentation on | 
|  | <code class="code">compiletime_settings.h</code> and <code class="code">features.h</code> for details. | 
|  | </p><p> | 
|  | For some algorithms, the desired variant can be chosen at compile-time by | 
|  | appending a tag object. The available options are specific to the particular | 
|  | algorithm (class). | 
|  | </p><p> | 
|  | For the "embarrassingly parallel" algorithms, there is only one "tag object | 
|  | type", the enum _Parallelism. | 
|  | It takes one of the following values, | 
|  | <code class="code">__gnu_parallel::parallel_tag</code>, | 
|  | <code class="code">__gnu_parallel::balanced_tag</code>, | 
|  | <code class="code">__gnu_parallel::unbalanced_tag</code>, | 
|  | <code class="code">__gnu_parallel::omp_loop_tag</code>, | 
|  | <code class="code">__gnu_parallel::omp_loop_static_tag</code>. | 
|  | This means that the actual parallelization strategy is chosen at run-time. | 
|  | (Choosing the variants at compile-time will come soon.) | 
|  | </p><p> | 
|  | For the following algorithms in general, we have | 
|  | <code class="code">__gnu_parallel::parallel_tag</code> and | 
|  | <code class="code">__gnu_parallel::default_parallel_tag</code>, in addition to | 
|  | <code class="code">__gnu_parallel::sequential_tag</code>. | 
|  | <code class="code">__gnu_parallel::default_parallel_tag</code> chooses the default | 
|  | algorithm at compiletime, as does omitting the tag. | 
|  | <code class="code">__gnu_parallel::parallel_tag</code> postpones the decision to runtime | 
|  | (see next section). | 
|  | For all tags, the number of threads desired for this call can optionally be | 
|  | passed to the respective tag's constructor. | 
|  | </p><p> | 
|  | The <code class="code">multiway_merge</code> algorithm comes with the additional choices, | 
|  | <code class="code">__gnu_parallel::exact_tag</code> and | 
|  | <code class="code">__gnu_parallel::sampling_tag</code>. | 
|  | Exact and sampling are the two available splitting strategies. | 
|  | </p><p> | 
|  | For the <code class="code">sort</code> and <code class="code">stable_sort</code> algorithms, there are | 
|  | several additional choices, namely | 
|  | <code class="code">__gnu_parallel::multiway_mergesort_tag</code>, | 
|  | <code class="code">__gnu_parallel::multiway_mergesort_exact_tag</code>, | 
|  | <code class="code">__gnu_parallel::multiway_mergesort_sampling_tag</code>, | 
|  | <code class="code">__gnu_parallel::quicksort_tag</code>, and | 
|  | <code class="code">__gnu_parallel::balanced_quicksort_tag</code>. | 
|  | Multiway mergesort comes with the two splitting strategies for multi-way | 
|  | merging. The quicksort options cannot be used for <code class="code">stable_sort</code>. | 
|  | </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.settings"></a>Run Time Settings and Defaults</h4></div></div></div><p> | 
|  | The default parallelization strategy, the choice of specific algorithm | 
|  | strategy, the minimum threshold limits for individual parallel | 
|  | algorithms, and aspects of the underlying hardware can be specified as | 
|  | desired via manipulation | 
|  | of <code class="classname">__gnu_parallel::_Settings</code> member data. | 
|  | </p><p> | 
|  | First off, the choice of parallelization strategy: serial, parallel, | 
|  | or heuristically deduced. This corresponds | 
|  | to <code class="code">__gnu_parallel::_Settings::algorithm_strategy</code> and is a | 
|  | value of enum <span class="type">__gnu_parallel::_AlgorithmStrategy</span> | 
|  | type. Choices | 
|  | include: <span class="type">heuristic</span>, <span class="type">force_sequential</span>, | 
|  | and <span class="type">force_parallel</span>. The default is <span class="type">heuristic</span>. | 
|  | </p><p> | 
|  | Next, the sub-choices for algorithm variant, if not fixed at compile-time. | 
|  | Specific algorithms like <code class="function">find</code> or <code class="function">sort</code> | 
|  | can be implemented in multiple ways: when this is the case, | 
|  | a <code class="classname">__gnu_parallel::_Settings</code> member exists to | 
|  | pick the default strategy. For | 
|  | example, <code class="code">__gnu_parallel::_Settings::sort_algorithm</code> can | 
|  | have any values of | 
|  | enum <span class="type">__gnu_parallel::_SortAlgorithm</span>: <span class="type">MWMS</span>, <span class="type">QS</span>, | 
|  | or <span class="type">QS_BALANCED</span>. | 
|  | </p><p> | 
|  | Likewise for setting the minimal threshold for algorithm | 
|  | parallelization.  Parallelism always incurs some overhead. Thus, it is | 
|  | not helpful to parallelize operations on very small sets of | 
|  | data. Because of this, measures are taken to avoid parallelizing below | 
|  | a certain, pre-determined threshold. For each algorithm, a minimum | 
|  | problem size is encoded as a variable in the | 
|  | active <code class="classname">__gnu_parallel::_Settings</code> object.  This | 
|  | threshold variable follows the following naming scheme: | 
|  | <code class="code">__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So, | 
|  | for <code class="function">fill</code>, the threshold variable | 
|  | is <code class="code">__gnu_parallel::_Settings::fill_minimal_n</code>, | 
|  | </p><p> | 
|  | Finally, hardware details like L1/L2 cache size can be hardwired | 
|  | via <code class="code">__gnu_parallel::_Settings::L1_cache_size</code> and friends. | 
|  | </p><p> | 
|  | </p><p> | 
|  | All these configuration variables can be changed by the user, if | 
|  | desired. | 
|  | There exists one global instance of the class <code class="classname">_Settings</code>, | 
|  | i. e. it is a singleton. It can be read and written by calling | 
|  | <code class="code">__gnu_parallel::_Settings::get</code> and | 
|  | <code class="code">__gnu_parallel::_Settings::set</code>, respectively. | 
|  | Please note that the first call return a const object, so direct manipulation | 
|  | is forbidden. | 
|  | See <a class="link" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html" target="_top"> | 
|  | <code class="filename"><parallel/settings.h></code></a> | 
|  | for complete details. | 
|  | </p><p> | 
|  | A small example of tuning the default: | 
|  | </p><pre class="programlisting"> | 
|  | #include <parallel/algorithm> | 
|  | #include <parallel/settings.h> | 
|  |  | 
|  | int main() | 
|  | { | 
|  | __gnu_parallel::_Settings s; | 
|  | s.algorithm_strategy = __gnu_parallel::force_parallel; | 
|  | __gnu_parallel::_Settings::set(s); | 
|  |  | 
|  | // Do work... all algorithms will be parallelized, always. | 
|  |  | 
|  | return 0; | 
|  | } | 
|  | </pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.impl"></a>Implementation Namespaces</h3></div></div></div><p> One namespace contain versions of code that are always | 
|  | explicitly sequential: | 
|  | <code class="code">__gnu_serial</code>. | 
|  | </p><p> Two namespaces contain the parallel mode: | 
|  | <code class="code">std::__parallel</code> and <code class="code">__gnu_parallel</code>. | 
|  | </p><p> Parallel implementations of standard components, including | 
|  | template helpers to select parallelism, are defined in <code class="code">namespace | 
|  | std::__parallel</code>. For instance, <code class="function">std::transform</code> from <code class="filename">algorithm</code> has a parallel counterpart in | 
|  | <code class="function">std::__parallel::transform</code> from <code class="filename">parallel/algorithm</code>. In addition, these parallel | 
|  | implementations are injected into <code class="code">namespace | 
|  | __gnu_parallel</code> with using declarations. | 
|  | </p><p> Support and general infrastructure is in <code class="code">namespace | 
|  | __gnu_parallel</code>. | 
|  | </p><p> More information, and an organized index of types and functions | 
|  | related to the parallel mode on a per-namespace basis, can be found in | 
|  | the generated source documentation. | 
|  | </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="parallel_mode_using.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="parallel_mode.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="parallel_mode_test.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Using </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Testing</td></tr></table></div></body></html> |