Description

Service for coordination and dispatch of render activities.

The implementation of scheduling services is provided by an integration of two layers of functionality:

Layer-1 enqueues and prioritises render activity records
Layer-2 connects and coordinates activities to conduct complex calculations

Additionally, a custom allocation scheme is involved, along with a notification service and the execution environment for the low-level »Activity Language«. Some operational control and load management is delegated to the LoadController. The purpose of the »Scheduler Service« in the Lumiera Render Engine is to coordinate the execution of »Render Jobs«, which can be controlled by a timing scheme, but also triggered in response to some prerequisite event, most notably the completion of IO work.

Thread coordination

The typical situation found when rendering media is the demand to distribute rather scarce computation resources to various self-contained tasks sequenced in temporal and dependency order. In addition, some internal management work must be conducted to order these tasks, generate further tasks and coordinate the dependencies. Overall, any such internal work is by orders of magnitude less expensive than the actual media calculations, which reach up into the range of 1-10 milliseconds, possibly even far more (seconds for expensive computations). For this reason, the Scheduler in the Lumiera Render Engine uses a pool of workers, each representing one unit of computation resource (a »core«), and these workers pull work actively, instead of having tasks distributed, queued and dispatched to a passive set of workers. Notably, the »management work« is also performed by the workers themselves, to the degree necessary to retrieve the next piece of computation. So there is no dedicated »queue manager«; scheduling is driven by the workers.
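
The pull model described above might be sketched roughly as follows. This is a minimal illustration only, not the actual worker pool of the Render Engine: the names `Verdict`, `runWorkers`, `pullWork` and `drainDemo` are hypothetical and were chosen for this example. Each worker thread repeatedly invokes a work-function, and the return value tells the worker whether to call back immediately, to sleep, or to terminate.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical instruction returned by the work-function to the calling worker
enum class Verdict { AGAIN, SLEEP, HALT };

// Start `workerCount` threads which actively pull work until told to halt.
// There is no dispatcher thread: all scheduling happens inside pullWork().
inline void runWorkers (unsigned workerCount,
                        std::function<Verdict()> pullWork,
                        std::chrono::milliseconds idleWait = std::chrono::milliseconds{1})
{
    assert (workerCount > 0);
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workerCount; ++i)
        pool.emplace_back ([&] {
            for (;;)
                switch (pullWork()) {
                    case Verdict::AGAIN: break;                                       // call back at once
                    case Verdict::SLEEP: std::this_thread::sleep_for (idleWait); break;
                    case Verdict::HALT:  return;                                      // worker terminates
                }
        });
    for (auto& worker : pool)
        worker.join();
}

// Usage sketch: four workers drain 100 dummy tasks and then halt.
inline int drainDemo()
{
    std::atomic<int> remaining{100};
    std::atomic<int> completed{0};
    runWorkers (4, [&]() -> Verdict {
        if (remaining.fetch_sub (1) <= 0) return Verdict::HALT;
        ++completed;                       // stands in for an actual media calculation
        return Verdict::AGAIN;
    });
    return completed.load();
}
```

In this sketch the bookkeeping (decrementing the task counter) is done by whichever worker happens to call in, which mirrors the idea that management work is performed by the workers themselves.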

Assuming that this internal work is comparatively cheap to perform, a choice was made to handle any internal state changes of the Scheduler exclusively in single-threaded mode. This is achieved by an atomic lock, maintained in Layer-2 of the Scheduler implementation. Any thread looking for more work will pull a pre-configured functor, which is implemented by the work-function. The thread will attempt to acquire the lock, designated as »grooming-token«, but only if this is necessary to perform internal changes. Since workers are calling in randomly, in many cases there might be no task to perform at the moment, and the worker can be instructed to enter a sleep cycle and call back later. On the other hand, when load is high, workers are instructed to call back immediately to find the next piece of work. Based on an assessment of the current »head time«, a quick decision is made whether the thread's capacity is useful right now, or whether this capacity will be re-focussed onto another zone of the scheduler's time axis, based on the distance to the next task.
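
To illustrate the idea of an atomic lock that confines state changes to one thread at a time, the following sketch models a grooming-token as an atomic variable holding the ID of the owning thread. It is an assumption made for this example, not the Layer-2 implementation: the class name `GroomingToken` and its member functions are hypothetical. Acquisition is a single non-blocking compare-and-swap, so a worker which fails to get the token can simply move on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of the »grooming-token«: at most one thread holds it
class GroomingToken
{
    // a default-constructed thread::id means "no holder"
    std::atomic<std::thread::id> holder_{std::thread::id{}};

public:
    // Try to take the token without blocking; returns true on success
    bool acquire() noexcept
    {
        std::thread::id expected{};                       // only succeed if currently free
        return holder_.compare_exchange_strong (expected, std::this_thread::get_id(),
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed);
    }

    bool heldByCaller() const noexcept
    {
        return holder_.load (std::memory_order_relaxed) == std::this_thread::get_id();
    }

    // Release the token; a no-op when the caller is not the holder
    void drop() noexcept
    {
        if (heldByCaller())
            holder_.store (std::thread::id{}, std::memory_order_release);
    }
};

// Usage sketch: the token is exclusive until dropped deliberately
inline void tokenDemo()
{
    GroomingToken token;
    assert (token.acquire());
    assert (token.heldByCaller());
    token.drop();
    assert (!token.heldByCaller());
}
```

The acquire/release memory ordering in this sketch is meant to ensure that changes made by one holder are visible to the next thread that acquires the token.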

If, however, a thread is put to work, it will dequeue an entry from the head of the priority queue and interpret this entry as a chain of render activities, with the help of the »Activity Language«. In the typical scenario, after some preparatory checks and notifications, the thread transitions into work mode, which entails dropping the grooming-token. Since the scheduler queue only stores references to render activities, which are allocated in a special arrangement exploiting the known deadline time of each task, further processing can commence concurrently.
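
A queue that stores only references and hands out the head entry once it is due could look like the sketch below. All names here (`Activity`, `Entry`, `HeadQueue`, `pullDue`) are hypothetical, and ordering by start time with plain integer timestamps is an assumption made for the example. The activity records themselves live elsewhere, so a dequeued entry can be processed without further access to the queue.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

struct Activity { int id; };          // placeholder for a chain of render activities

struct Entry
{
    std::int64_t start;               // assumed: time (in arbitrary ticks) when the entry becomes due
    Activity*    chain;               // reference only; storage is managed elsewhere
};

// Order entries so that the earliest start time sits at the head
struct StartsLater
{
    bool operator() (Entry const& a, Entry const& b) const { return a.start > b.start; }
};

class HeadQueue
{
    std::priority_queue<Entry, std::vector<Entry>, StartsLater> queue_;

public:
    void schedule (std::int64_t start, Activity& chain) { queue_.push (Entry{start, &chain}); }

    // Dequeue the head entry if it is due at `now`, else return nullptr
    Activity* pullDue (std::int64_t now)
    {
        if (queue_.empty() || queue_.top().start > now)
            return nullptr;
        Activity* chain = queue_.top().chain;
        queue_.pop();
        return chain;
    }
};

// Usage sketch: entries come out in start-time order, and only when due
inline void headQueueDemo()
{
    Activity a{1}, b{2};
    HeadQueue queue;
    queue.schedule (20, a);
    queue.schedule (10, b);
    assert (queue.pullDue (5) == nullptr);    // nothing due yet
    assert (queue.pullDue (15) == &b);        // earliest entry first
}
```

In a real scheduler the call to `pullDue` would be made while holding the grooming-token, and the token would be dropped before the returned chain is processed.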

Note: The grooming-token should always be dropped by a deliberate state transition. Notably internal processing (e.g. planning of new jobs) will not drop the token, since it must be able to change the schedule. Such internal tasks can be processed in a row and will be confined to a single thread (there is a special treatment at the end of #doWork() to achieve that). As a safety net, the grooming-token will automatically be dropped after catching an exception, or when a thread is sent to sleep.
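
The safety net mentioned in this note might be sketched as a wrapper around the activity invocation. This is a hedged illustration under simplifying assumptions: the token is reduced to a boolean flag, and the names `TokenFlag` and `invokeGuarded` are hypothetical. A normally completing activity leaves the token untouched (internal planning work keeps it), while any exception causes the token to be released.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the grooming-token: true while a worker holds it
using TokenFlag = std::atomic<bool>;

// Invoke an activity; on any exception release the token as a safety net.
// Returns true when the activity completed without throwing.
template<class FUN>
bool invokeGuarded (TokenFlag& tokenHeld, FUN&& activity)
{
    try {
        activity();
        return true;                   // token state is left to the activity's own transitions
    }
    catch (...) {
        tokenHeld.store (false, std::memory_order_release);
        return false;
    }
}

// Usage sketch: planning work keeps the token, a failing activity loses it
inline void safetyNetDemo()
{
    TokenFlag tokenHeld{true};
    bool ok = invokeGuarded (tokenHeld, []{ /* e.g. planning of new jobs */ });
    assert (ok && tokenHeld.load());
    ok = invokeGuarded (tokenHeld, []{ throw std::runtime_error{"activity failed"}; });
    assert (!ok && !tokenHeld.load());
}
```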

See also: SchedulerService_test Component integration test; SchedulerStress_test; SchedulerUsage_test; SchedulerInvocation Layer-1; SchedulerCommutator Layer-2; activity.hpp description of »Render Activities«

Todo:

WIP 11/2024 »Playback Vertical Slice«

initial version of Scheduler was built and validated by scheduler-stress-test.cpp
now awaiting integration with Render-Node invocation and Job-Planning
very likely we'll extract a Scheduler-Interface (and this file then becomes a service-impl)

Definition in file scheduler.hpp.

#include "lib/error.hpp"
#include "vault/gear/block-flow.hpp"
#include "vault/gear/work-force.hpp"
#include "vault/gear/activity-lang.hpp"
#include "vault/gear/scheduler-commutator.hpp"
#include "vault/gear/scheduler-invocation.hpp"
#include "vault/gear/load-controller.hpp"
#include "vault/gear/engine-observer.hpp"
#include "vault/real-clock.hpp"
#include "lib/nocopy.hpp"
#include <optional>
#include <utility>

Classes
class	Scheduler::ExecutionCtx

class	Scheduler
	»Scheduler-Service« : coordinate render activities.

class	ScheduleSpec

struct	Scheduler::Setup
	Binding of worker callbacks to the scheduler implementation.

class	WorkTiming
	work-timing event for performance observation

Variables
const size_t	DISMISS_CYCLES = 100
	number of wait cycles before an idle worker terminates completely

Offset	DUTY_CYCLE_PERIOD {FSecs(1,20)}
	period of the regular scheduler »tick« for state maintenance.

Offset	DUTY_CYCLE_TOLERANCE {FSecs(2,10)}
	maximum slip tolerated on duty-cycle start before triggering Scheduler-emergency

Offset	FUTURE_PLANNING_LIMIT {FSecs{20}}
	limit timespan of deadline into the future (~360 MiB max)

const auto	IDLE_WAIT = 20ms
	sleep-recheck cycle for workers deemed idle

Namespaces
	vault
	Vault-Layer implementation namespace root.

	vault::gear
	Active working gear and plumbing.
