Lumiera  0.pre.03
»edit your freedom«
microbenchmark.hpp File Reference


Description

Functions to perform (multithreaded) timing measurement on a given functor.

This helper simplifies microbenchmarks of isolated implementation details. The test subject, given as a function object or lambda, is invoked many times within a tight loop. In the multithreaded variant, the lambda is copied into N threads and executed in each thread in parallel; once all test threads have terminated, the results are summed up and then averaged into microseconds per single invocation. The actual timing measurement relies on chrono::duration scaled to microseconds (see CLOCK_SCALE), which amounts to counting micro ticks of the OS clock.

Warning
Care has to be taken when optimisation is involved! Optimisation usually has a considerable impact on the results, and since this function is inline, the lambda can typically be inlined and the loop may be optimised away altogether. A simple workaround is to define a volatile variable in the call context, capture it by reference in the lambda, and perform a comparison with that volatile variable in each invocation. The compiler is then required to actually access the value of the volatile each time.
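The workaround described in this warning can be sketched as follows. This is a minimal, self-contained illustration and not Lumiera code: the names `repeatInvoke` and `countMatches` are hypothetical stand-ins, and the marker value is arbitrary.

```cpp
#include <cstddef>

// Hypothetical stand-in for an inline benchmark loop (not the Lumiera code).
template<class FUN>
inline void repeatInvoke (FUN const& subject, std::size_t repeatCnt)
{
    for (std::size_t i = 0; i < repeatCnt; ++i)
        subject();
}

// Counts how many invocations observed the expected marker value.
inline std::size_t countMatches (std::size_t repeatCnt)
{
    volatile int marker = 23;      // volatile: each read must really be performed
    std::size_t matches = 0;

    // The lambda captures by reference and compares against the volatile
    // in every invocation, so the loop body cannot be folded away entirely.
    repeatInvoke ([&]{ if (marker == 23) ++matches; }, repeatCnt);
    return matches;
}
```

The returned count is an observable result tied to the loop, which gives the compiler a second reason to keep the iterations.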
Remarks
Some interesting observations (in my setup, an 8-core AMD FX-8350):
  • if we replace the global volatile by a local variable within the test subject, the initialisation of that local typically costs +5ns per invocation.
  • incrementing the volatile costs +10ns.
  • multithreaded (unlocked) incrementing of the global volatile creates massive overhead and increases the running time by a factor of 100. This nicely confirms that the x86_64 platform has strong cache coherence.
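The contention effect from the last remark can be explored with a sketch like the one below. It deliberately swaps the unlocked volatile for a `std::atomic` counter with relaxed ordering, because concurrent unsynchronised writes to a plain volatile are a data race in standard C++. The names `countShared` and `countLocal` are hypothetical, and no timings are claimed here; wrap each call in a timer of your choice to compare.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// All threads increment one shared counter; the cache line holding it
// typically has to travel between cores on each increment.
inline std::size_t countShared (std::size_t nThreads, std::size_t repeatCnt)
{
    std::atomic<std::size_t> shared{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t)
        workers.emplace_back ([&shared, repeatCnt]
            {
                for (std::size_t i = 0; i < repeatCnt; ++i)
                    shared.fetch_add (1, std::memory_order_relaxed);
            });
    for (auto& w : workers) w.join();
    return shared.load();
}

// Each thread counts privately and publishes its result once at the end.
// Note: the compiler is free to collapse this private loop completely.
inline std::size_t countLocal (std::size_t nThreads, std::size_t repeatCnt)
{
    std::vector<std::size_t> partial (nThreads, 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t)
        workers.emplace_back ([&partial, t, repeatCnt]
            {
                std::size_t local = 0;
                for (std::size_t i = 0; i < repeatCnt; ++i)
                    ++local;
                partial[t] = local;
            });
    for (auto& w : workers) w.join();

    std::size_t sum = 0;
    for (auto p : partial) sum += p;
    return sum;
}
```

Both functions yield the same total; only the memory traffic differs.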

Definition in file microbenchmark.hpp.

#include "lib/meta/function.hpp"
#include "lib/scoped-collection.hpp"
#include "lib/sync-barrier.hpp"
#include "lib/thread.hpp"
#include "lib/test/microbenchmark-adaptor.hpp"
#include <chrono>

Typedefs

using CLOCK_SCALE = std::micro
 

Functions

template<class FUN >
size_t benchmarkLoop (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)
 Benchmark building block to invoke a functor or λ in a tight loop, passing the current loop index and capturing a result checksum value. More...
 
template<class FUN >
double benchmarkTime (FUN const &invokeTestCode, const size_t repeatCnt=1)
 Helper to invoke a functor or λ to observe its running time. More...
 
template<class FUN >
auto microBenchmark (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)
 Perform a simple looped microbenchmark. More...
 
template<size_t nThreads, class FUN >
auto threadBenchmark (FUN const &subject, const size_t repeatCnt=DEFAULT_RUNS)
 Perform a multithreaded microbenchmark. More...
 

Variables

constexpr size_t DEFAULT_RUNS = 10'000'000
 

Namespaces

 lib
 Implementation namespace for support and library code.
 

Function Documentation

◆ benchmarkTime()

double lib::test::benchmarkTime ( FUN const &  invokeTestCode,
const size_t  repeatCnt = 1 
)
inline

Helper to invoke a functor or λ to observe its running time.

Parameters
invokeTestCode  the test (complete, including its loop), invoked once
repeatCnt  number of repetitions by which the timing measurement is divided
Returns
averaged time for one repetition, in microseconds
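The measurement idea can be sketched with standard `<chrono>` facilities alone. The following is an assumption about the general approach, not the code at the definition site: `timePerRepetition` is a hypothetical name, and the choice of `steady_clock` is made only for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical re-creation of the idea; name and clock choice are assumptions.
template<class FUN>
inline double timePerRepetition (FUN const& invokeTestCode, std::size_t repeatCnt = 1)
{
    assert (repeatCnt > 0);
    using Clock  = std::chrono::steady_clock;
    using Micros = std::chrono::duration<double, std::micro>;

    auto start = Clock::now();
    invokeTestCode();                      // invoked once; brings its own loop
    Micros elapsed = Clock::now() - start;
    return elapsed.count() / repeatCnt;    // microseconds per repetition
}
```

The caller is responsible for passing a `repeatCnt` that matches the number of iterations performed inside the functor; otherwise the average is meaningless.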

Definition at line 85 of file microbenchmark.hpp.

References lib::test::benchmarkTime().

Referenced by lib::test::benchmarkTime().


◆ benchmarkLoop()

size_t lib::test::benchmarkLoop ( FUN const &  testSubject,
const size_t  repeatCnt = DEFAULT_RUNS 
)
inline

Benchmark building block to invoke a functor or λ in a tight loop, passing the current loop index and capturing a result checksum value.

Returns
sum of all individual invocation results as checksum
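A loop of this kind might look like the sketch below. It assumes that the test subject accepts the loop index and returns a value convertible to `size_t`; the actual header includes an adaptor header (lib/test/microbenchmark-adaptor.hpp) that is not reproduced here. `checksumLoop` is a hypothetical name.

```cpp
#include <cstddef>

// Hypothetical sketch: invoke the subject with the loop index
// and accumulate the results into a checksum.
template<class FUN>
inline std::size_t checksumLoop (FUN const& testSubject, std::size_t repeatCnt)
{
    std::size_t checksum = 0;
    for (std::size_t i = 0; i < repeatCnt; ++i)
        checksum += testSubject (i);
    return checksum;
}
```

Returning the checksum makes every invocation contribute to an observable result, which helps to keep the optimiser from discarding the loop.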

Definition at line 104 of file microbenchmark.hpp.

References lib::test::benchmarkLoop().

Referenced by lib::test::benchmarkLoop().


◆ microBenchmark()

auto lib::test::microBenchmark ( FUN const &  testSubject,
const size_t  repeatCnt = DEFAULT_RUNS 
)
inline

Perform a simple looped microbenchmark.

Parameters
testSubject  the operation to test, as functor or λ
Returns
a pair (microseconds, checksum)
Warning
this setup is only usable under strong optimisation; moreover, the scaffolding without actual operation should also be tested for comparison, to get a feeling for the setup overhead. For very small test subjects (single operations) it is recommended to use a direct loop without any lambdas and building blocks.
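To illustrate the (microseconds, checksum) result and the advice to measure the bare scaffolding for comparison, here is a self-contained sketch. `simpleBenchmark` is a hypothetical stand-in written only with the standard library; it assumes a subject that takes the loop index and returns a `size_t`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical stand-in: time a checksum loop and return
// (average microseconds per invocation, checksum).
template<class FUN>
inline std::pair<double, std::size_t>
simpleBenchmark (FUN const& testSubject, std::size_t repeatCnt)
{
    assert (repeatCnt > 0);
    using Clock  = std::chrono::steady_clock;
    using Micros = std::chrono::duration<double, std::micro>;

    std::size_t checksum = 0;
    auto start = Clock::now();
    for (std::size_t i = 0; i < repeatCnt; ++i)
        checksum += testSubject (i);
    Micros elapsed = Clock::now() - start;

    return { elapsed.count() / repeatCnt, checksum };
}
```

Running the same function once with a trivial subject, such as a lambda that just returns its argument, gives a rough baseline for the scaffolding; the difference between the two averages is then an estimate of the net cost of the real operation.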

Definition at line 127 of file microbenchmark.hpp.

References lib::test::microBenchmark().

Referenced by lib::test::microBenchmark().


◆ threadBenchmark()

auto lib::test::threadBenchmark ( FUN const &  subject,
const size_t  repeatCnt = DEFAULT_RUNS 
)
inline

Perform a multithreaded microbenchmark.

This function fires up a number of threads and invokes the given test subject repeatedly.

Template Parameters
nThreads  number of threads to run in parallel
Parameters
subject  function to be timed in parallel
repeatCnt  loop count within each thread
Returns
a pair (microseconds, checksum) combining the averaged invocation time and a compounded checksum from all threads.
Remarks
  • the subject function will be copied into each thread, so nThreads copies of this function will run in parallel.
  • consider locking if this function accesses a shared closure.
  • if you pass a lambda, it is eligible for inlining followed by loop optimisation – be sure to include an observable effect, like returning a value tied to the actual computation, to prevent the compiler from optimising it away altogether.
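The overall scheme (copy the subject into each thread, time each thread's loop, then combine) can be sketched with plain `std::thread`. This is an illustration under assumptions, not the Lumiera implementation: `parallelBenchmark` is a hypothetical name, the subject is assumed to take the loop index and return a `size_t`, and the synchronised start that the header's includes (lib/sync-barrier.hpp) suggest is left out for brevity.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: run nThreads copies of the subject in parallel and
// return (average microseconds per invocation, combined checksum).
template<std::size_t nThreads, class FUN>
inline std::pair<double, std::size_t>
parallelBenchmark (FUN const& subject, std::size_t repeatCnt)
{
    using Clock  = std::chrono::steady_clock;
    using Micros = std::chrono::duration<double, std::micro>;

    std::array<double, nThreads>      micros{};
    std::array<std::size_t, nThreads> sums{};
    std::vector<std::thread> workers;
    workers.reserve (nThreads);

    for (std::size_t t = 0; t < nThreads; ++t)
        workers.emplace_back ([&micros, &sums, t, repeatCnt, fun = subject]
            {                                   // fun: private copy per thread
                std::size_t checksum = 0;
                auto start = Clock::now();
                for (std::size_t i = 0; i < repeatCnt; ++i)
                    checksum += fun (i);
                Micros elapsed = Clock::now() - start;
                micros[t] = elapsed.count();    // each thread writes its own slot
                sums[t]   = checksum;
            });
    for (auto& w : workers) w.join();           // wait for all test threads

    double total = 0.0;
    std::size_t checksum = 0;
    for (std::size_t t = 0; t < nThreads; ++t)
    {
        total    += micros[t];
        checksum += sums[t];
    }
    return { total / (double(nThreads) * repeatCnt), checksum };
}
```

Each thread writes only to its own array slot, so the result collection itself needs no locking; any state shared through the subject's closure still does.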

Definition at line 156 of file microbenchmark.hpp.

References lib::test::threadBenchmark().

Referenced by lib::test::threadBenchmark().
