Parallel operations

All Your Base
All Your Base in C++

The key for this group of approaches is to convert one digit at a time. You will use the remainder operator repeatedly in tandem with the division operator or make use of the pow.

Note

You might have heard Donald Knuth's famous words ["Premature Optimization Is the Root of All Evil"][pre-opt-blog]. With all the talk about best performance, please take care of the readability and therefore maintainability of your code.

Keep in mind, that the number of digits you need to process is the most important factor for your algorithm's performance. A simple serial for-loop can be a lot faster for a small number than a parallel algorithm that has more overhead to be set up.

Converting the input_digits

Numeric's accumulate

The for-loop with * and + can be written with the accumulate function from the numeric header.

#include <numeric>
unsigned int intermediate = std::accumulate(input_digits.begin(),
                                            input_digits.end(), 0U,
        [&input_base] (unsigned int sum, unsigned int d) {
            return sum * input_base + d;
        });

accumulate is guaranteed to run in order and thus is strictly serial as well.

Algorithm's transform_reduce in parallel

There are several functions that can be run with a parallel policy in the algorithm header. As you need the position to calculate the correct exponent for the pow function, you do need a second container with the position information. To fill that container we can use numeric's iota, which will produce something like {0, 1, 2, 3, ... } for us.

std::vector <unsigned int> positions(input_digits.size());
std::iota(positions.begin(), positions.end(), 0);
unsigned int intermediate = std::transform_reduce(
    std::execution::par_unseq,  // execution policy
    input_digits.cbegin(),      // first input container start
    input_digits.cend(),        // first input container end
    positions.crbegin(),        // start second container from the rear
    0U,                          // initial value
    std::plus {},               // reduce function
                                // transformation function:
    [ & input_base](unsigned int d, unsigned int pos) {
        return d * std::pow(input_base, pos);
    });

transform_reduce can work on two containers at the same time. In this case we use the input_digits and the positions from the back as exponents for the pow function in the transformation function. The reduce function is combining each result from the transformation with a simple addition (std::plus{}).

The most important part is the execution policy. The example uses std::execution::par_unseq to indicate that the transform_reduce function can be run in parallel and vectorized. With this statement you promise, that the execution does not violate any data dependencies. The function calls are only reading, we do not need to set any locks.

Note

If the compiler will transform the code into something that actually works in parallel and/or vectorized is a whole different chapter. There is a lot of ifs and whens that lead to the final code. One of the more direct ways to parallel processing is the use of [Intel's thread building blocks][tbb].

Converting into output_digits

For-loop with pow

As discussed in the serial approach and above, you can use pow to move to an independent approach, that does not change the intermediate variable.

To set up the for-loop, you need to calculate the number of digits beforehand with std::log(intermediate) / std::log(output_base).

std::vector <unsigned int> output_digits{};
unsigned int digit_limit{static_cast<unsigned int>(std::log(intermediate) / std::log(output_base) + 1)};
for(unsigned int digit{digit_limit}; digit > 0; --digit) {
    output_digits.emplace_back(intermediate / static_cast<unsigned int>(std::pow(output_base, digit - 1)) % output_base);
}

With this approach you do not need to reverse the resulting vector.

Algorithm's for_each in parallel

With the pow approach, you gained the option to move to parallel territory. You cannot decrement an integer like above, as the std container functions in C++17 expect an iterator. iota helps to set up the vector with the correct values.

unsigned int digit_limit{static_cast<unsigned int>(std::log(intermediate) / std::log(output_base)) + 1};
std::vector <unsigned int> output_digits(digit_limit);
std::iota(output_digits.rbegin(), output_digits.rend(), 1);
std::for_each(std::execution::par_unseq,
                output_digits.begin(),   
                output_digits.end(),    
                [&](unsigned int digit) {
                output_digits[digit_limit - digit] = 
                    intermediate
                        / static_cast<unsigned int>(std::pow(output_base, digit - 1))
                        % output_base;
});

The for-each function then uses the prepared vector in the fashion of approach #4. The most important part is the execution policy. The approach uses std::execution::par_unseq to indicate that the transform_reduce function can be run in parallel and vectorized. With this statement you promise, that the execution does not violate any data dependencies. The function calls are never writing to the same vector element, we do not need to set any locks.

All together

#include <algorithm>
#include <cmath>
#include <execution>
#include <numeric>
#include <stdexcept>

std::vector<unsigned int> convert(unsigned int input_base,
    const std::vector<unsigned int>& input_digits, unsigned int output_base) {
    if (input_base < 2 || output_base < 2) {
        throw std::invalid_argument("Bases must be at least 2");
    }
    if (std::any_of(input_digits.begin(), input_digits.end(),
            [input_base](unsigned int digit) {
                return digit >= input_base;
            }))
        throw std::invalid_argument("Digits cannot be bigger than the base");

    std::vector <unsigned int> positions(input_digits.size());
    std::iota(positions.begin(), positions.end(), 0U);
    unsigned int intermediate = std::transform_reduce(
        std::execution::par_unseq,
        input_digits.cbegin(),
        input_digits.cend(),
        positions.crbegin(),
        0U,
        std::plus {},
        [&input_base](unsigned int d, unsigned int pos) {
            return d * std::pow(input_base, pos);
        });

    unsigned int digit_limit{static_cast<unsigned int>(std::log(intermediate) / std::log(output_base) + 1)};
    std::vector <unsigned int> output_digits(digit_limit);
    std::iota(output_digits.rbegin(), output_digits.rend(), 1);
    std::for_each(
        std::execution::par_unseq,
        output_digits.begin(),
        output_digits.end(),
        [&](unsigned int digit) {
            output_digits[digit_limit - digit] = 
                intermediate / static_cast<unsigned int>(std::pow(output_base, digit - 1)) % output_base;
    });

    return output_digits;
}
27th Nov 2024 · Found it useful?