The new emerging NLE for GNU/Linux

We encounter dependencies as an issue at implementation level: In order to deal with some task at hand, sometimes we need to arrange matters way beyond the scope of that task. We could just thoughtlessly reach out and settle those extraneous concerns, yet this kind of pragmatism has a price tag: we are now mutually dependent with internals of some other part of the system we do not even care much about. A more prudent choice would be to let “that other part” provide a service for us, focussed on what we actually need right here to get our work done. In essence, we create a dependency to resolve issues of coupling and to reduce complexity (»divide et impera«).

Unfortunately this solution created a new problem: how do we get at our dependencies? We can not just step ahead and create them or manage them, because then we’d be “back to square one”. Rather someone else has to care. Someone needs to connect us with those dependencies, so we can use them. This is a special meta-service known as Dependency Injection. A dedicated part of the application wires all other components, so each component can focus on its specific concern and abstract away everything else. Dependency Injection can be seen as application of the principle »Inversion Of Control«: each part is sovereign within its own realm, but becomes a client (asks for help) for anything beyond that.

However, in the Lumiera code base, we refrain from building or using a full-blown Dependency Injection Container. A lot of FUD has been spread regarding Dependency Injection and Singletons, to the point that a majority of developers confuses and conflates the Inversion-of-Control principle (which is essential) with the use of a DI-Container. Nowadays, you can not even utter the word “Singleton” without everyone yelling out “Evil! Evil!” — while most of these people at the same time feel just comfortable living in the metadata hell.

Singletons as such are not problematic; rather, the coupling of the Singleton class itself with the instantiation and lifecycle mechanism is what creates the problems. This situation is similar to the use of global variables, which likewise are not evil as such; the problems arise from an imperative, operation driven and data centric mindset, combined with hostility towards any abstraction. In C++ such problems can be mitigated by use of a generic Singleton Factory, which can be augmented into a Dependency Factory for those rare cases where we actually need more instance and lifecycle management beyond lazy initialisation. Client code indicates the dependence on some other service by planting an instance of that Dependency Factory (for Lumiera this is lib::Depend<TY>) and remains unaware whether the instance is created lazily in singleton style (which is the default) or has been reconfigured to expose a service instance explicitly created by some subsystem lifecycle. The essence of a “dependency” of this kind is that we access a service by name. And this service name or service ID is in our case a type name.


Our DependencyFactory satisfies the following requirements:

  • client code is able to access some service by-name — where the name is actually the type name of the service interface.

  • client code remains agnostic with regard to the lifecycle or backing context of the service it relies on.

  • in the simplest (and most prominent) case, nothing has to be done at all by anyone to manage that lifecycle.
    By default, the Dependency Factory creates a singleton instance lazily (heap allocated) on demand and it ensures thread-safe initialisation and access.

  • we establish a policy to disallow any significant functionality during application shutdown. After leaving main(), only trivial dtors are invoked and possibly a few resource handles are dropped. No filesystem writes, no clean-up and reorganisation, not even any logging is allowed. For this reason, we established a Subsystem concept with explicit shutdown hooks, which are invoked beforehand.

  • the Dependency Factory can be re-configured for individual services (type names) to refer to an explicitly installed service instance. In those cases, access while the service is not available will raise an exception. There is a simple one-shot mechanism to reconfigure Dependency Factory and create a link to an actual service implementation, including automatic deregistration.
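The first three requirements can be illustrated with a minimal sketch. It is not Lumiera's lib::Depend; the names MiniDepend, Counter, ClientA and ClientB are hypothetical and exist only for this illustration. Thread safety is obtained here the simple way, by always taking a mutex; the cost of that choice is the topic of the performance considerations further below.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical miniature stand-in for a dependency factory (not lib::Depend).
// The service is addressed purely by its type name SRV.
template<class SRV>
class MiniDepend
{
    static std::unique_ptr<SRV> instance_;   // one shared slot per service type
    static std::mutex           initLock_;

public:
    // Access the service; the instance is created lazily on first use.
    SRV& operator()()
    {
        std::lock_guard<std::mutex> guard{initLock_};
        if (!instance_)
            instance_.reset(new SRV());      // heap allocated singleton
        return *instance_;
    }
};

template<class SRV> std::unique_ptr<SRV> MiniDepend<SRV>::instance_;
template<class SRV> std::mutex           MiniDepend<SRV>::initLock_;

// Hypothetical service, used only for illustration.
struct Counter
{
    int calls = 0;
    int next() { return ++calls; }
};

// Two unrelated clients: each plants its own factory front-end
// and knows nothing about creation or lifecycle of the Counter.
struct ClientA { MiniDepend<Counter> counter;  int work() { return counter().next(); } };
struct ClientB { MiniDepend<Counter> counter;  int work() { return counter().next(); } };

int demoSharedAccess()
{
    ClientA a;
    ClientB b;
    a.work();          // first access creates the Counter
    return b.work();   // same instance is reached from the other client
}
```

On the first call, demoSharedAccess() yields 2, since both clients reach the same lazily created instance.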


The DependencyFactory and thus the behaviour of dependency injection can be reconfigured, ad hoc, at runtime.
Deliberately, we do not enforce global consistency statically (since that would lead to one central static configuration). However, a runtime sanity check is performed to ensure configuration actually happens prior to any use, which means any invocation to retrieve (and thus lazily create) the service instance. The following flavours can be configured:


default

a singleton instance of the designated type is created lazily, on first access

  • define an instance for access (preferably static): Depend<Blah> theBla;

  • access the singleton instance as theBla().doIt()

singleton subclass

causes the dependency factory Depend<Blah> to create a SubBlah singleton instance from now on

attach to service

DependInject<Blah>::ServiceInstance<SubBlah> service{p1, p2, p3};

  • build and manage an instance of SubBlah in heap memory immediately (not lazily)

  • configure the dependency factory to return a reference to this instance

  • the instantiated ServiceInstance<SubBlah> object itself acts as lifecycle handle (and managing smart-ptr)

  • when it is destroyed, the dependency factory is automatically cleared, and further access will trigger an error
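The lifecycle coupling described in this list can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual DependInject implementation: ServiceSlot, ServiceHandle, Greeter and LoudGreeter are hypothetical names, and the sketch ignores thread safety.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical per-type slot holding a non-owning link to the active service.
template<class SRV>
struct ServiceSlot
{
    static SRV* active;

    static SRV& access()
    {
        if (!active)
            throw std::logic_error{"service not available"};
        return *active;
    }
};
template<class SRV> SRV* ServiceSlot<SRV>::active = nullptr;

// Hypothetical lifecycle handle: builds the implementation immediately,
// links the slot to it and clears the link again on destruction.
template<class SRV, class IMP>
class ServiceHandle
{
    std::unique_ptr<IMP> impl_;

public:
    template<typename... ARGS>
    explicit ServiceHandle(ARGS&&... args)
        : impl_{new IMP(std::forward<ARGS>(args)...)}
    {
        ServiceSlot<SRV>::active = impl_.get();
    }

    ~ServiceHandle() { ServiceSlot<SRV>::active = nullptr; }

    ServiceHandle(ServiceHandle const&)            = delete;
    ServiceHandle& operator=(ServiceHandle const&) = delete;

    IMP* operator->() { return impl_.get(); }      // smart-ptr style access
};

// Hypothetical service interface and implementation.
struct Greeter { virtual ~Greeter() = default;  virtual int level() const = 0; };
struct LoudGreeter : Greeter
{
    int volume;
    explicit LoudGreeter(int v) : volume{v} {}
    int level() const override { return volume; }
};

int demoServiceLifecycle()
{
    int seen = 0;
    {
        ServiceHandle<Greeter, LoudGreeter> service{11};
        seen = ServiceSlot<Greeter>::access().level();   // service is reachable
    }                                                    // handle gone: link cleared
    try { ServiceSlot<Greeter>::access(); }
    catch (std::logic_error const&) { seen += 100; }     // access now raises
    return seen;
}
```

demoServiceLifecycle() returns 111: the value 11 read while the handle was alive, plus 100 for the exception caught after its destruction.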

support for test mocking

DependInject<Blah>::Local<SubBlah> mock;

  • temporarily shadows whatever configuration resides within the dependency factory

  • the next access will create a (non singleton) SubBlah instance in heap memory and return a Blah&

  • the instantiated mock handle object again acts as lifecycle handle and smart-ptr to access the SubBlah instance like mock->doItSpecial()

  • when this handle goes out of scope, the original configuration of the dependency factory is restored
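The shadow-and-restore behaviour can be approximated with a scope guard. The following sketch is an assumption about one possible structure, not the code used in Lumiera; Factory, LocalMock, Clock and FakeClock are hypothetical names, and concurrency is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical service plus a mock subclass.
struct Clock     { virtual ~Clock() = default;  virtual int now() { return 1000; } };
struct FakeClock : Clock { int fixed = 42;  int now() override { return fixed; } };

// Hypothetical factory state for one service type:
// a creator function and the lazily created instance.
template<class SRV>
struct Factory
{
    static std::function<SRV*()> creator;
    static std::unique_ptr<SRV>  instance;

    static SRV& access()
    {
        if (!instance)
            instance.reset(creator());
        return *instance;
    }
};
template<class SRV> std::function<SRV*()> Factory<SRV>::creator = []{ return new SRV(); };
template<class SRV> std::unique_ptr<SRV>  Factory<SRV>::instance;

// Hypothetical scope guard: stashes the current configuration,
// installs a mock creator and restores everything on destruction.
template<class SRV, class MOCK>
class LocalMock
{
    std::function<SRV*()> origCreator_;
    std::unique_ptr<SRV>  origInstance_;
    MOCK*                 mock_ = nullptr;

public:
    LocalMock()
        : origCreator_(std::move(Factory<SRV>::creator))
        , origInstance_(std::move(Factory<SRV>::instance))
    {
        Factory<SRV>::creator = [this]{ mock_ = new MOCK(); return mock_; };
    }

    ~LocalMock()
    {
        Factory<SRV>::instance = std::move(origInstance_);   // destroys the mock
        Factory<SRV>::creator  = std::move(origCreator_);
    }

    LocalMock(LocalMock const&)            = delete;
    LocalMock& operator=(LocalMock const&) = delete;

    // Access the mock through the handle, creating it if necessary.
    MOCK* operator->() { Factory<SRV>::access();  return mock_; }
};

int demoMocking()
{
    int before = Factory<Clock>::access().now();    // regular singleton
    int during = 0;
    {
        LocalMock<Clock, FakeClock> mock;
        mock->fixed = 7;                            // prepare the mock via the handle
        during = Factory<Clock>::access().now();    // clients now see the mock
    }
    int after = Factory<Clock>::access().now();     // original instance is back
    return before + during + after;
}
```

demoMocking() returns 2007, that is 1000 before, 7 while the mock is in place, and 1000 again after the handle went out of scope.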

custom constructors

both the subclass singleton configuration and the test mock support optionally accept a functor or lambda argument with signature SubBlah*(). The contract is for this construction functor to return a heap allocated object, which will be owned and managed by the DependencyFactory. Especially this enables use of subclasses with non default ctor and / or binding to some additional hidden context. Please note that this closure will be invoked later, on-demand.
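To illustrate the contract for such a construction functor, here is a small sketch. LazySlot, Codec and TunedCodec are hypothetical names chosen for this example; only the signature (a callable returning a heap allocated object, invoked later on demand) is taken from the description above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical service interface and a subclass without default ctor.
struct Codec { virtual ~Codec() = default;  virtual std::string name() const = 0; };

struct TunedCodec : Codec
{
    int quality;
    explicit TunedCodec(int q) : quality{q} {}
    std::string name() const override { return "tuned-" + std::to_string(quality); }
};

// Hypothetical holder: stores the construction functor and takes
// ownership of whatever the functor returns on first access.
template<class SRV>
class LazySlot
{
    std::function<SRV*()> ctor_;
    std::unique_ptr<SRV>  instance_;

public:
    explicit LazySlot(std::function<SRV*()> ctor) : ctor_(std::move(ctor)) {}

    bool isCreated() const { return bool(instance_); }

    SRV& operator()()
    {
        if (!instance_)
            instance_.reset(ctor_());    // the closure runs here, on demand
        return *instance_;
    }
};

std::string demoCustomCtor()
{
    int quality = 3;
    // The closure binds some additional context (here by reference).
    LazySlot<Codec> codec{[&quality]() -> Codec* { return new TunedCodec{quality}; }};
    quality = 9;                         // changed before the first access
    assert(!codec.isCreated());          // nothing has been constructed yet
    return codec().name();               // construction happens now
}
```

Since the closure is invoked only at first access, demoCustomCtor() returns "tuned-9" rather than "tuned-3". The same property means that any context bound by reference must outlive that first access.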

We consider the usage pattern of dependencies to be a question of architecture, which can not be solved by any mechanism at implementation level. For this reason, Lumiera’s Dependency Factory prevents reconfiguration after use, but does nothing beyond such basic sanity checks.

Performance considerations

We acknowledge that such a dependency or service will be accessed frequently and even from rather performance critical parts of the application. We have to optimise for low overhead on access, while initialisation happens only once and can be arbitrarily expensive. It is more important that configuration, setup and initialisation code remains readable. And it is important to place such configuration at a location within the code where the related concerns are treated, which is not at the usage site, and which is likewise not within some global central core application setup. At which point precisely initialisation happens is a question of architecture: lazy initialisation can be used to avoid expensive setup of rarely used services, or it can be employed to simplify the bootstrap of complex subsystems, or to break service dependency cycles. All of this builds on the assumption that the global application structure is fixed and finite and well-known; we assume we are in full control of when and how parts of the application start and stop.

Our requirements on (optional) reconfigurability have some impact on the implementation technique though, since we need access to the instance pointer for individual service types. This basically rules out Meyers Singleton — and so the adequate implementation technique for our usage pattern is Double Checked Locking. In the past, there was much debate about DCL being broken — which indeed was true when assuming full portability and arbitrary target platform. Since our focus is primarily on PC-with-Linux systems, this argument seems to lean more to the theoretical side though, since the x86/64 platform is known to employ rather strong memory and cache coherency constraints. With the recent advent of ARM systems, the situation has changed however. Anyway, since C++11 there is now a portable solution available for writing a correct DCL implementation, based on std::atomic.

The idea underlying Double Checked Locking is to optimise for the access path, which is achieved by moving the expensive locking entirely out of that path. However, any kind of concurrent consistency assertion requires us to establish a »happens before« relation between two events of information exchange. Both traditional locking and lock-free concurrency implement this relation by establishing a synchronises-with relation between two actions on a common guard entity. For traditional locking, this would be a Lock, Mutex, Monitor or Semaphore, while lock-free concurrency uses the notion of a fence connected with some well defined action on a userspace guard variable. In modern C++, typically we use Atomic variables as guard. In addition to well defined semantics regarding concurrent visibility of changes, these "atomics" offer indivisible access and exchange operations. A correct concurrent interaction must involve some kind of well defined handshake to establish the aforementioned synchronises-with relation; otherwise we just can not assume anything. Herein lies the problem with Double Checked Locking: when we move all concurrency precautions away from the optimised access path, we get performance close to a direct local memory access, but we can not give any correctness assertions in this setup. If we are lucky (and the underlying hardware does much to yield predictable behaviour), everything works as expected, but we can never be sure about that. A correct solution thus inevitably needs to take away some of the performance from the optimised access path. Fortunately, with properly used atomics this price tag is known to be low. At the end of the day, correctness is more important than some superficial performance boost.
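A minimal sketch of Double Checked Locking based on std::atomic might look as follows. It shows the general technique only and is not taken from the Lumiera code base; LazyService and Service are hypothetical names. The acquire load on the fast path pairs with the release store that publishes the new object, which establishes the synchronises-with relation discussed above, while the mutex is taken only during initialisation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical service; the construction counter exists only for the demo.
struct Service
{
    static std::atomic<int> created;
    int value = 42;
    Service() { ++created; }
};
std::atomic<int> Service::created{0};

class LazyService
{
    std::atomic<Service*> instance_{nullptr};   // guard variable and instance pointer
    std::mutex            initLock_;            // used for initialisation only

public:
    Service& get()
    {
        // First check: lock-free fast path, acquire pairs with the release below.
        Service* obj = instance_.load(std::memory_order_acquire);
        if (!obj)
        {
            std::lock_guard<std::mutex> guard{initLock_};
            // Second check, now protected by the mutex.
            obj = instance_.load(std::memory_order_relaxed);
            if (!obj)
            {
                obj = new Service();
                // Publish the fully constructed object.
                instance_.store(obj, std::memory_order_release);
            }
        }
        return *obj;
    }

    ~LazyService() { delete instance_.load(); }
};

int demoConcurrentAccess()
{
    LazyService lazy;
    std::atomic<int> sum{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)
        workers.emplace_back([&]{ sum += lazy.get().value; });
    for (auto& w : workers)
        w.join();
    // Encodes both findings: number of constructions and sum of the values read.
    return Service::created * 1000 + sum;
}
```

On the first call, demoConcurrentAccess() returns 1336: exactly one construction, and eight threads each reading the value 42.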

To gain insight into the rough proportions of performance impact, in 2018 we conducted some micro benchmarks (using an 8-core AMD FX-8350 64bit CPU running Debian/Jessie and the GCC 4.9 compiler). The following table lists averaged results in relative numbers, in relation to a single threaded optimised direct non virtual member function invocation (≈ 0.3ns).

Access Technique (table columns: development, optimised; the measured values are missing here)

  • direct invoke on shared local object

  • invoke existing object through unique_ptr

  • lazy init unprotected (not threadsafe)

  • lazy init always mutex protected

  • Double Checked Locking with mutex

  • DCL with std::atomic and mutex for init

These benchmarks used a dummy service class holding a volatile int, initialised to a random value. The complete code was visible to the compiler and thus eligible for inlining. Repeatedly the benchmarked code accessed this dummy object through the means listed in the table, then retrieved the (actually constant) value from the private volatile variable within the service and compared it to zero. This setup ensures the optimiser can not remove the code altogether, while the access to the service dominates the measured time. The concurrent measurement used 8 threads (number of cores), each performing the same timing loop on a local instance. The number of invocations within each thread was high enough (several millions) to amortise the actual costs of object allocation. Some observations:

  • The numbers obtained pretty much confirm other people’s measurements.

  • Synchronisation is indeed necessary; the unprotected lazy init crashed several times randomly during multithreaded tests.

  • Contention on concurrent access is very tangible; even for unguarded access the cache and memory hardware has to perform additional work

  • However, the concurrency situation in this example is rather extreme and deliberately provokes collisions; in practice we’d be closer to the single threaded case

  • Double Checked Locking is a very effective implementation strategy and results in timings within the same order of magnitude as direct unprotected access

  • Unprotected lazy initialisation performs spurious duplicate initialisations, which can be avoided by DCL

  • Naïve Mutex locking is slow even with non-recursive Mutex without contention

  • Optimisation achieves access times of ≈ 1ns
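The measurement setup described above can be sketched roughly as follows. This is a reconstruction from the description, not the original benchmark code; Dummy, Timing, measure and demoDirectAccess are hypothetical names, and only the single threaded, direct access case is shown.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>

// Dummy service holding a volatile int, so the read can not be optimised away.
class Dummy
{
    volatile int val_;
public:
    explicit Dummy(int v) : val_{v} {}
    int readVal() const { return val_; }
};

struct Timing
{
    double        nanosPerCall;   // averaged time per access
    std::uint64_t hits;           // how often the value compared equal to zero
};

// Repeatedly access the service through the given access functor,
// read the value and compare it to zero.
template<class ACCESS>
Timing measure(ACCESS&& access, std::uint64_t rounds)
{
    using Clock = std::chrono::steady_clock;
    std::uint64_t hits = 0;
    auto start = Clock::now();
    for (std::uint64_t i = 0; i < rounds; ++i)
        if (0 == access().readVal())
            ++hits;
    auto stop = Clock::now();
    double nanos = std::chrono::duration<double, std::nano>(stop - start).count();
    return Timing{nanos / double(rounds), hits};
}

Timing demoDirectAccess()
{
    Dummy target{1 + std::rand() % 1000};    // random, but never zero
    // Baseline: direct access to an existing local object.
    return measure([&]() -> Dummy& { return target; }, 5000000);
}
```

Other access techniques can be compared by passing a different access functor to measure(). The absolute numbers depend on compiler, optimisation level and hardware, so only the proportions between techniques are meaningful.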


Dependency management does not define the architecture, nor can it solve architecture problems. Rather, its purpose is to enact the architecture. A dependency is something we need in order to perform the task at hand, yet the essence of a dependency lies outside the scope and theme of this actual task and relates to concerns beyond it. A naïve functional approach (pass everything you need as argument) would be as harmful as thoughtlessly manipulating some off-site data to fit current needs. The local function would be splendid, strict and referentially transparent, yet anyone using it would be infected with issues of tangling and tight coupling. As remedy, a global context can be introduced, which works well as long as this global context does not exhibit any other state than “being available”. The root of those problems however lies in the drive to conceive matters as simpler than they are.

  • collaboration typically leads to indirect mutual dependency. We can only define precisely what is required locally, and then pull our requirements on demand.

  • a given local action can be part of a process, or a conversation or interaction chain, which in turn might originate from various, quite distinct contexts. At that level, we might find a simpler structure to hinge questions of lifecycle on.

In Lumiera we encounter both these kinds of circumstances. On a global level, we have a simple and well defined order of dependencies, cast into Subsystem relations. We know e.g. that mutating changes to the session can originate from scripts or from UI interactions. It suffices thus when the leading subsystem (the UI or the script runner) refrains from emitting any further external activities prior to reaching that point in the lifecycle where everything is “basically set”. Yet however self evident this insight might be, it yields some unsettling and challenging consequences: The UI must not assume the presence of specific data structures within the lower layers, nor is it allowed to “pull” session contents as a dependency while starting up. Rather the UI-Layer is bound to bootstrap itself into a completely usable and operative state, without the ability to attach anything onto existing tangible content structures. This runs completely counter to the common practice of UI programming, where it is customary to wire most of the application internals somehow directly below the UI “shell”. Rather, in Lumiera the UI must be conceived as a collection of services; when running, a population request can be issued to fill the prepared UI framework with content. This is Inversion-of-Control at work.