Description

Functions to perform (multithreaded) timing measurement on a given functor.

This helper simplifies microbenchmarks of isolated implementation details. The test subject, given as a function object or lambda, is invoked numerous times within a tight loop. In the multithreaded variant, the lambda is copied into N threads and executed in each thread in parallel; after waiting for the test threads to terminate, the results are summed up and then averaged into microseconds per single invocation. The actual timing measurement relies on chrono::duration, which means counting the micro ticks reported by the OS.

Warning: care has to be taken when optimisation is involved! Optimisation usually has a considerable impact on the results, but since these functions are inline, the lambda can typically be inlined and the loop may be optimised away altogether. A simple workaround is to define a volatile variable in the call context, capture it in the lambda by reference, and perform a comparison with that volatile variable in each invocation. The compiler is then required to actually access the value of the volatile each time.
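The workaround described in the warning can be sketched roughly as follows. This is a minimal illustration, not the code from microbenchmark.hpp: `timedLoop` and `measureGuarded` are hypothetical names, and `timedLoop` is a simplified stand-in for the helpers documented on this page.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the timing helpers in this header: times a tight
// loop and returns (microseconds per call, sum of all results).
template<class FUN>
std::pair<double, std::size_t>
timedLoop (FUN const& subject, std::size_t repeatCnt)
{
    using Micros = std::chrono::duration<double, std::micro>;
    std::size_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repeatCnt; ++i)
        checksum += subject (i);
    Micros elapsed = std::chrono::steady_clock::now() - start;
    return {elapsed.count() / repeatCnt, checksum};
}

std::pair<double, std::size_t>
measureGuarded (std::size_t repeatCnt)
{
    volatile std::size_t sentinel = 0;          // volatile variable in the call context
    auto subject = [&sentinel](std::size_t i) -> std::size_t
        {                                       // captured by reference
            return i == sentinel ? 1 : 0;       // forces one volatile read per invocation
        };
    return timedLoop (subject, repeatCnt);
}
```

Because every invocation has to read `sentinel`, the compiler cannot collapse the loop into a constant, even when the lambda is fully inlined.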

Remarks

Some interesting observations (in my setup, 8-core AMD FX-8350):

- If we replace the global volatile by a local variable within the test subject, the initialisation of that local typically costs +5ns per invocation.
- Incrementing the volatile costs +10ns.
- Multithreaded (unlocked) incrementing of the global volatile creates massive overhead and increases the running time by a factor of 100. This nicely confirms that the x86_64 platform has strong cache coherence.

Definition in file microbenchmark.hpp.

#include "lib/meta/function.hpp"
#include "lib/scoped-collection.hpp"
#include "lib/sync-barrier.hpp"
#include "lib/thread.hpp"
#include "lib/test/microbenchmark-adaptor.hpp"
#include <chrono>

Typedefs
using	CLOCK_SCALE = std::micro

Functions
template<class FUN >
size_t	benchmarkLoop (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)
	Benchmark building block to invoke a functor or λ in a tight loop, passing the current loop index and capturing a result checksum value.

template<class FUN >
double	benchmarkTime (FUN const &invokeTestCode, const size_t repeatCnt=1)
	Helper to invoke a functor or λ to observe its running time.

template<class FUN >
auto	microBenchmark (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)
	perform a simple looped microbenchmark.

template<size_t nThreads, class FUN >
auto	threadBenchmark (FUN const &subject, const size_t repeatCnt=DEFAULT_RUNS)
	perform a multithreaded microbenchmark.

Variables
constexpr size_t	DEFAULT_RUNS = 10'000'000

Namespaces
	lib
	Implementation namespace for support and library code.

Function Documentation

◆ benchmarkTime()

double lib::test::benchmarkTime (FUN const &invokeTestCode, const size_t repeatCnt=1)

inline

Helper to invoke a functor or λ to observe its running time.

Parameters

invokeTestCode	the test (complete, including its loop), invoked once
repeatCnt	number of repetitions by which the timing measurement is divided

Returns: averaged time for one repetition, in microseconds
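The behaviour described above (invoke the complete test once, then divide the elapsed time by `repeatCnt`) could look roughly like the sketch below. It is an assumption about the general shape, not the actual implementation; `sketchBenchmarkTime` and `demoBenchmarkTime` are hypothetical names, and `std::chrono::steady_clock` is chosen here for illustration only.

```cpp
#include <chrono>
#include <cstddef>

using CLOCK_SCALE = std::micro;     // same scale as the typedef listed on this page

// Hypothetical re-implementation for illustration only.
template<class FUN>
double
sketchBenchmarkTime (FUN const& invokeTestCode, const std::size_t repeatCnt = 1)
{
    using Dur = std::chrono::duration<double, CLOCK_SCALE>;
    auto start = std::chrono::steady_clock::now();
    invokeTestCode();                               // complete test, invoked exactly once
    Dur elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeatCnt;             // averaged microseconds per repetition
}

// Usage: the functor contains the loop; repeatCnt only scales the result.
double
demoBenchmarkTime()
{
    const std::size_t runs = 100000;
    volatile std::size_t sink = 0;                  // observable effect keeps the loop alive
    auto loop = [&]{ for (std::size_t i = 0; i < runs; ++i) sink = sink + i; };
    return sketchBenchmarkTime (loop, runs);
}
```

Note that `repeatCnt` does not cause any repetition by itself; the functor is expected to perform that many iterations internally.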

Definition at line 85 of file microbenchmark.hpp.

References lib::test::benchmarkTime().

Referenced by lib::test::benchmarkTime().

◆ benchmarkLoop()

size_t lib::test::benchmarkLoop (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)

inline

Benchmark building block to invoke a functor or λ in a tight loop, passing the current loop index and capturing a result checksum value.

Returns: sum of all individual invocation results as checksum
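As a rough sketch of this building block: the loop index is passed to the test subject and the results are accumulated into a checksum. The name `sketchBenchmarkLoop` is hypothetical, and the sketch assumes a subject callable as `size_t(size_t)`; the real header includes microbenchmark-adaptor.hpp, which presumably handles other signatures.

```cpp
#include <cstddef>

constexpr std::size_t DEFAULT_RUNS = 10'000'000;    // value as listed under Variables

// Hypothetical re-implementation for illustration only.
template<class FUN>
std::size_t
sketchBenchmarkLoop (FUN const& testSubject, const std::size_t repeatCnt = DEFAULT_RUNS)
{
    std::size_t checksum = 0;
    for (std::size_t i = 0; i < repeatCnt; ++i)
        checksum += testSubject (i);                // pass loop index, accumulate result
    return checksum;
}
```

Returning the checksum gives the loop an observable result, which makes it harder for the optimiser to discard the invocations.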

Definition at line 104 of file microbenchmark.hpp.

References lib::test::benchmarkLoop().

Referenced by lib::test::benchmarkLoop().

◆ microBenchmark()

auto lib::test::microBenchmark (FUN const &testSubject, const size_t repeatCnt=DEFAULT_RUNS)

inline

perform a simple looped microbenchmark.

Parameters

testSubject	the operation to test as functor or λ

Returns: a pair (microseconds, checksum)

Warning: this setup is only usable under strong optimisation; moreover, the scaffolding without the actual operation should also be measured for comparison, to get a feeling for the setup overhead. For very small test subjects (single operations), a direct loop without any lambdas or building blocks is recommended.
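The advice to measure the bare scaffolding as well can be illustrated with the following sketch. All names (`sketchMicroBenchmark`, `Comparison`, `compareWithBaseline`) are hypothetical, and the timing function is a simplified stand-in that returns the same kind of (microseconds, checksum) pair.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical stand-in: returns (microseconds per invocation, checksum).
template<class FUN>
std::pair<double, std::size_t>
sketchMicroBenchmark (FUN const& testSubject, const std::size_t repeatCnt)
{
    using Micros = std::chrono::duration<double, std::micro>;
    std::size_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repeatCnt; ++i)
        checksum += testSubject (i);
    Micros elapsed = std::chrono::steady_clock::now() - start;
    return {elapsed.count() / repeatCnt, checksum};
}

struct Comparison
{
    double subjectMicros;       // loop with the actual operation
    double baselineMicros;      // same scaffolding without the operation
};

Comparison
compareWithBaseline (std::size_t repeatCnt)
{
    volatile std::size_t guard = 7;     // volatile read keeps both loops alive
    auto baseline = [&guard](std::size_t)   -> std::size_t { return guard; };
    auto subject  = [&guard](std::size_t i) -> std::size_t { return (i * i) % guard; };
    return { sketchMicroBenchmark (subject,  repeatCnt).first
           , sketchMicroBenchmark (baseline, repeatCnt).first };
}
```

The difference between the two figures gives a rough estimate of the cost of the operation itself; for single operations that difference can easily drown in measurement noise.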

Definition at line 127 of file microbenchmark.hpp.

References lib::test::microBenchmark().

Referenced by lib::test::microBenchmark().

◆ threadBenchmark()

auto lib::test::threadBenchmark (FUN const &subject, const size_t repeatCnt=DEFAULT_RUNS)

inline

perform a multithreaded microbenchmark.

This function fires up a number of threads and invokes the given test subject repeatedly.

Template Parameters

nThreads	number of threads to run in parallel

Parameters

subject	function to be timed in parallel
repeatCnt	loop-count within each thread

Returns: a pair (microseconds, checksum) combining the averaged invocation time and a compounded checksum from all threads.

Remarks

- The subject function will be copied into each thread, so nThreads copies of this function will run in parallel.
- Consider locking if this function accesses a shared closure.
- If you pass a lambda, it is eligible for inlining followed by loop optimisation; be sure to include an observable effect, like returning a value tied to the actual computation, to prevent the compiler from optimising it away altogether.
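The overall scheme (copy the subject into each thread, time the loop per thread, then sum and average) might be sketched with plain `std::thread` as shown below. This is an assumption-laden illustration: `sketchThreadBenchmark` is a hypothetical name, and the sketch omits the start synchronisation that the real header presumably achieves with lib/sync-barrier.hpp.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical re-implementation for illustration only.
template<std::size_t nThreads, class FUN>
std::pair<double, std::size_t>
sketchThreadBenchmark (FUN const& subject, const std::size_t repeatCnt)
{
    using Micros = std::chrono::duration<double, std::micro>;
    struct Result { double micros = 0; std::size_t checksum = 0; };

    std::array<Result, nThreads> results{};         // one private slot per thread
    std::vector<std::thread> threads;
    threads.reserve (nThreads);
    for (std::size_t t = 0; t < nThreads; ++t)
        threads.emplace_back ([subject, repeatCnt, slot = &results[t]]
            {                                       // subject is copied into each thread
                std::size_t sum = 0;
                auto start = std::chrono::steady_clock::now();
                for (std::size_t i = 0; i < repeatCnt; ++i)
                    sum += subject (i);
                Micros elapsed = std::chrono::steady_clock::now() - start;
                slot->micros = elapsed.count();
                slot->checksum = sum;
            });
    for (auto& thread : threads)
        thread.join();                              // wait for all test threads

    double totalMicros = 0;
    std::size_t checksum = 0;
    for (auto const& res : results)
    {
        totalMicros += res.micros;
        checksum += res.checksum;
    }
    return {totalMicros / (nThreads * repeatCnt), checksum};
}
```

A call such as `sketchThreadBenchmark<4> (subject, 1000)` then runs four copies of `subject` in parallel. Each thread writes only to its own result slot, so the sketch itself needs no locking; a subject that touches shared state through a by-reference capture would still need its own synchronisation.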

Definition at line 156 of file microbenchmark.hpp.

References lib::test::threadBenchmark().

Referenced by lib::test::threadBenchmark().
