SchedulerRequirements

State	Draft
Date	Wed 09 Jan 2013 12:04:03 CET
Proposed by	Ichthyostega <prg@ichthyostega.de>

Abstract

Define the expected core properties and requirements of the Scheduler service.

The rendering and playback subsystem relies on a Scheduler component to invoke individual frame rendering and resource fetching jobs. This RfC summarises the general assumptions and requirements that other parts of the application rely on.

Description

The Scheduler is responsible for getting the individual render jobs to run. The basic idea is that individual render jobs should never block, and thus the calculation of a single frame might be split into several atomic jobs, including resource fetching. This expected usage should be considered together with the data exchange protocol defined for data output through the OutputSlot instances. Moreover, the extended data of the low-level model can be hot-swapped while rendering continues, which makes it necessary to release blocks of superseded model data at well-defined points. Combining all these known usage constraints leads to the following requirements for the scheduler:

ordering of jobs: the scheduler has to ensure all prerequisites of a given job are met
job time window: when it is not possible to run a job within the defined target time window, it must not be run any more but rather be marked as failed
failure propagation: when a job fails, either due to a job-internal error or due to a timing glitch, the effect of this failure needs to propagate reliably; we need a mechanism for dependent jobs to receive a notification of such a failure state
conditional scheduling: we need to provide some way to tie the activity of jobs to external conditions, notable examples being the availability of cached data, or the arrival of data loaded from storage
superseding of planned jobs: changes in playback modes require us to “change the plan on-the-fly” — essentially this means we need to supersede a group of already planned jobs. Moreover, we need certain ordering guarantees to ensure the resulting switch in the effective output data happens once and without glitches.
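
The time window rule from the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Time`, `TimeWindow`, `Verdict` and `checkWindow` are hypothetical and not taken from the Lumiera code base, and time is assumed to be an integer count of microseconds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration; not the actual Lumiera job interface.
using Time = std::int64_t;          // assumption: microseconds on a monotonic clock

struct TimeWindow {
    Time start;                     // earliest point the job may be invoked
    Time deadline;                  // latest point the job may be invoked
};

enum class Verdict { WAIT, RUN, EXPIRED };

// Classify a planned job at time `now`.
// EXPIRED means: do not run the job any more, mark it as failed instead.
inline Verdict checkWindow(TimeWindow const& window, Time now)
{
    assert(window.start <= window.deadline);
    if (now < window.start)    return Verdict::WAIT;
    if (now > window.deadline) return Verdict::EXPIRED;
    return Verdict::RUN;
}
```

A scheduler built along these lines would treat `Verdict::EXPIRED` like any other job failure, so that dependent jobs get notified in the same way.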

The scheduler interface and specification establish a kind of micro-language to encode the patterns of behaviour prompted by the playback control and the interpretation of the render node model. Together, these basic requirements help to address some relevant themes.

dependency on prerequisites

Render tasks don’t exist in isolation; they depend on prerequisites, both preceding calculations and the availability of data. Since our primary decision is to avoid blocking waits, these prerequisites need to be modelled as other jobs, which leads to dependencies and conditional scheduling.
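
To make the idea of prerequisites modelled as jobs more concrete, here is a minimal single-threaded sketch. All names (`JobGraph`, `depend`, `succeed`, `fail`) are hypothetical, and the sketch assumes that all dependencies are declared before any job terminates. An external condition, such as the arrival of data from storage, would simply be one more job that some I/O facility marks as succeeded or failed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration; names and structure are assumptions.
enum class State { WAITING, READY, DONE, FAILED };

struct Job {
    State state = State::READY;            // ready unless prerequisites are pending
    std::size_t pending = 0;               // number of unfinished prerequisites
    std::vector<std::size_t> dependents;   // jobs waiting on this one
};

class JobGraph {
    std::vector<Job> jobs_;
public:
    std::size_t add() { jobs_.emplace_back(); return jobs_.size() - 1; }

    // Declare that `job` must not run before `prerequisite` has succeeded.
    // Assumption: called before any of the involved jobs has terminated.
    void depend(std::size_t job, std::size_t prerequisite) {
        assert(job < jobs_.size() && prerequisite < jobs_.size());
        jobs_[prerequisite].dependents.push_back(job);
        ++jobs_[job].pending;
        jobs_[job].state = State::WAITING;
    }

    // Report success: dependents whose last prerequisite this was become READY.
    void succeed(std::size_t id) {
        jobs_[id].state = State::DONE;
        for (std::size_t d : jobs_[id].dependents) {
            Job& dep = jobs_[d];
            if (dep.state == State::WAITING && --dep.pending == 0)
                dep.state = State::READY;
        }
    }

    // Report failure: propagate to all direct and indirect dependents.
    void fail(std::size_t id) {
        if (jobs_[id].state == State::FAILED) return;
        jobs_[id].state = State::FAILED;
        for (std::size_t d : jobs_[id].dependents) fail(d);
    }

    State state(std::size_t id) const { return jobs_[id].state; }
};
```

For example, with a chain "load data, calculate frame, deliver output", a failure of the calculation job leaves the output job in state `FAILED` without it ever being invoked. No job has to block while waiting for its predecessor.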

detecting termination

The way other parts of the system are built requires us to obtain guaranteed knowledge of some specific job’s termination. More precisely, we need to find out when a “stream of calculations” has left a well-defined domain, and this can be modelled by the activation of specific marker jobs. It is possible to obtain that knowledge with some timing leeway, but in the end, this information needs to arrive with absolute reliability (violations lead to a segfault).
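
One conceivable way to model such a marker job is a counter over all jobs belonging to a stream of calculations, with the marker activated exactly once when the last of them has terminated. The following sketch is an assumption for illustration only: `CalcStream` and its member functions are hypothetical names, and the code is single-threaded, whereas a real scheduler would need proper synchronisation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical illustration; not Lumiera API.
class CalcStream {
    std::size_t active_ = 0;        // planned jobs which have not yet terminated
    bool closed_ = false;           // true once no further jobs will be planned
    bool fired_  = false;           // true once the marker has been activated
    std::function<void()> marker_;  // e.g. releases a block of superseded model data

    void maybeFire() {
        if (closed_ && active_ == 0 && !fired_) {
            fired_ = true;
            marker_();
        }
    }
public:
    // Assumption: `marker` is a non-empty callable.
    explicit CalcStream(std::function<void()> marker) : marker_(std::move(marker)) {}

    // Register one more job as part of this stream.
    void plan() { assert(!closed_); ++active_; }

    // Must be called for every job of the stream once it has terminated,
    // regardless of whether it succeeded, failed or missed its time window.
    void terminated() { assert(active_ > 0); --active_; maybeFire(); }

    // Declare that no further jobs will be added to this stream.
    void close() { closed_ = true; maybeFire(); }
};
```

The marker may fire somewhat later than the last actual calculation, which matches the timing leeway mentioned above. What matters is that it fires once, and only after every job of the stream has been accounted for.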

job scheduling modes

The scheduler offers various modes of treatment on a per-job basis. The default is to handle jobs time-based with a moderate latency. Alternatively, jobs can be handled as background jobs, as freewheeling jobs (maximum usage of performance and bandwidth), or as low-latency timed jobs.
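
As a rough sketch, the four modes could be represented as an enumeration attached to each job. The identifiers below are hypothetical. The classification into time-bound and not time-bound modes is also an assumption, read off from the wording above rather than defined by this RfC.

```cpp
#include <cassert>

// Hypothetical illustration; identifiers are assumptions.
enum class JobMode {
    TIMED,              // default: time based, moderate latency
    TIMED_LOW_LATENCY,  // time based, low latency
    BACKGROUND,         // handled as background job
    FREEWHEELING        // maximum usage of performance and bandwidth
};

struct JobSpec {
    JobMode mode = JobMode::TIMED;   // the default mode named in the text
};

// Assumption: only the two timed modes are subject to a target time window.
inline bool isTimeBound(JobMode mode)
{
    switch (mode) {
        case JobMode::TIMED:
        case JobMode::TIMED_LOW_LATENCY: return true;
        case JobMode::BACKGROUND:
        case JobMode::FREEWHEELING:      return false;
    }
    assert(!"unknown JobMode");
    return false;
}
```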

latency, reliability and precision

By involving a scheduler component we employ an asynchronous calculation model. This both allows and requires us to define special guarantees regarding various properties of job execution.

it is acknowledged that every scheduling involves some latency, which needs to be included in any calculation of deadlines. The latency limits the minimum time window we can target for scheduling an operation.
it is acknowledged that timing specifications include some degree of fuzziness, but it is possible to give guarantees regarding correctness. The defined state transitions and notifications will happen reliably.
it is acknowledged that the behaviour of the scheduler is non-deterministic (in the sense this term is used in computer science). Nevertheless, we impose some ordering guarantees, which will be observed with precision: both the adding and the superseding of a group of jobs happen in a transactional way, to retain the ordering according to dependency and job time.
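
Two of these points lend themselves to a small sketch: accounting for latency when checking a deadline, and superseding a group of jobs in one transactional step while keeping the plan ordered by job time. The code below is an assumption for illustration. `PlannedJob`, `feasible` and `Plan` are hypothetical names, a plain mutex stands in for whatever mechanism an actual scheduler would use, and ordering by dependency is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

using Time = std::int64_t;   // assumption: microseconds on a monotonic clock

// Hypothetical illustration; not the Lumiera job descriptor.
struct PlannedJob {
    int  group;      // jobs planned together, e.g. for one playback mode
    Time start;      // nominal job time, used as ordering key
    Time deadline;   // latest permissible invocation
};

// A job can only be targeted if the scheduling latency still fits before its deadline.
inline bool feasible(PlannedJob const& job, Time now, Time latency)
{
    assert(latency >= 0);
    return now + latency <= job.deadline;
}

class Plan {
    std::mutex guard_;
    std::multimap<Time, PlannedJob> byTime_;   // always ordered by job time
public:
    // Remove all jobs of `group` and add `replacement` as one transaction:
    // observers taking a snapshot see either the old or the new plan, never a mixture.
    // Adding a fresh group is the special case where nothing gets removed.
    void supersede(int group, std::vector<PlannedJob> const& replacement) {
        std::lock_guard<std::mutex> lock(guard_);
        for (auto it = byTime_.begin(); it != byTime_.end(); ) {
            if (it->second.group == group) it = byTime_.erase(it);
            else ++it;
        }
        for (PlannedJob const& job : replacement)
            byTime_.emplace(job.start, job);
    }

    // Consistent copy of the current plan, ordered by job time.
    std::vector<PlannedJob> snapshot() {
        std::lock_guard<std::mutex> lock(guard_);
        std::vector<PlannedJob> result;
        for (auto const& entry : byTime_) result.push_back(entry.second);
        return result;
    }
};
```

The order in which worker threads actually pick up jobs remains non-deterministic. The sketch only shows that the plan itself can change atomically and stay ordered.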

Tasks

define the job interface (✔ done)
define a protocol for job state handling TBD
define the representation of dependencies and the notifications in practice TBD
verify the proposed requirements by a scheduler implementation sketch TBD

Discussion

Pros

the entity “job with defined properties” serves as an interface
open and complex patterns of behaviour can be built on top
a proper scheduler replaces several other mechanisms (threaded output backend, producer-consumer queue with locking, GUI animation services)
providing an atomic execution service allows us to control various aspects of execution explicitly
in the end, this enables us to scale and to use various kinds of hardware

Cons

there is no “for-loop” to base any playback control structures on
compliance with externally imposed deadlines and memory management are challenging

Alternatives

1. use a synchronous player with buffering
2. use a simplistic scheduler with entirely atomic jobs

We do not want (1), since it is tied to an obsolete hardware model and cannot be adapted to the new kinds of hardware available today or to be expected in the near future. We do not want (2), since it essentially doesn’t solve any problem, but rather pushes complexity into the higher layers (Session, Stage), which lack the information about individual jobs and timing.

Rationale

We use a scheduler to gain flexibility in controlling various aspects of computation and I/O usage. Moreover, we turn the scheduler into an interface between the Vault and Steam-Layer: while the exact outfitting of the individual jobs highly depends on internals of the Session and Engine models, the properties of actual job execution, being closely related to system programming, are akin to the Vault. The actual requirements outlined in this RfC are derived from the internals of the player implementation, while the way these requirements are defined, and especially the aspects omitted from the specification, are derived from knowledge regarding the possible scheduler and vault layer implementation.

Comments

State → Draft

This RfC emerged from the work on the player implementation, which is the immediate client built on top of the scheduler service. At FrOSCon 2013 and the following developer meeting, we had an extended discussion covering various aspects of the possible scheduler implementation. The goal is to settle on an interface definition, so that the player and engine implementation can be developed independently of the scheduler implementation.

Ichthyostega: Thu 19 Sep 2013 21:31:07 CEST <prg@ichthyostega.de>
