State | Draft |
Date | Mi 09 Jan 2013 12:04:03 CET |
Proposed by | Ichthyostega <prg@ichthyostega.de> |
The rendering and playback subsystem relies on a Scheduler component to invoke individual frame rendering and resource fetching jobs. This RfC summarises the general assumptions and requirements other parts of the application are relying on
Description
The Scheduler is responsible for getting the individual render jobs to run.
The basic idea is that individual render jobs should never block — and thus
the calculation of a single frame might be split into several atomic jobs,
including resource fetching. This expected usage should be considered together
with the data exchange protocol defined for data output through the OutputSlot
instances; moreover the extended data of the low-level model can be hot-swapped
while rendering continues to go on, necessitating to release blocks of superseded
model data at well defined points. Combining all these known usage constraints
leads to the following requirements for the scheduler:
- ordering of jobs
-
the scheduler has to ensure all prerequisites of a given job are met
- job time window
-
when it is not possible to run a job within the defined target time window, it must not be run any more but rather be marked as failure
- failure propagation
-
when a job fails, either due to an job internal error, or by timing glitch, the effect of this failure needs to propagate reliably; we need a mechanism for dependent jobs to receive a notification of such a failure state
- conditional scheduling
-
we need to provide some way to tie the activity of jobs to external conditions, notable examples being the availability of cached data, or the arrival of data loaded from storage
- superseding of planned jobs
-
changes in playback modes require us to “change the plan on-the-fly” — essentially this means we need to supersede a group of already planned jobs. Moreover, we need certain ordering guarantees to ensure the resulting switch in the effective output data happens once and without glitches.
The scheduler interface and specification establishes some kind of micro-language to encode the patterns of behaviour prompted by the playback control and the interpretation of the render node model. Together these basic requirements help to address some relevant themes
dependency on prerequisites
Render tasks don’t exist in isolation; they depend on prerequisites, both preceding calculations and the availability of data. Since our primary decision is to avoid blocking waits, these prerequisites need to be modelled as other jobs, which leads to dependencies and conditional scheduling.
detecting termination
The way other parts of the system are built, requires us to obtain a guaranteed knowledge of some specific job’s termination. More precisely, we need to find out when a “stream of calculations” has left a well defined domain — and this can be modelled by the activation of specific marker jobs. It is possible to obtain that knowledge with some timing leeway, but in the end, this information needs to arrive with absolutely reliability (violations leading to segfault).
job scheduling modes
The scheduler offers various modes of treatment on a per job base. The default is to handle jobs time based with a moderate latency. Alternatively jobs can be handled as background jobs, as freewheeling jobs (maximum usage of performance and bandwidth), or as low-latency timed jobs.
latency, reliability and precision
By involving a scheduler component we employ an asynchronous calculation model. This allows and necessitates to define special guarantees regarding various properties of job execution.
-
it is acknowledged that every scheduling involves some latency — which needs to be included into any calculation of deadlines. The latency limits the minimum time window we can target for scheduling an operation
-
it is acknowledged that timing specifications include some degree of fuzziness — but it is possible to give guarantees regarding correctness. The defined state transitions and notifications will happen reliably
-
it is acknowledged that the behaviour of the scheduler is non-deterministic (the way this term is used in computer science). Yet still we’ll impose some ordering guarantees, which will be observed with precision: both the adding and the superseding of a group of jobs happens in a transactional way, to retain the ordering according to dependency and job time.
Tasks
-
define the job interface (✔ done)
-
define a protocol for job state handling TBD
-
define the representation of dependencies and the notifications in practice TBD
-
verify the proposed requirements by an scheduler implementation sketch TBD
Discussion
Pros
-
the entity “job with defined properties” serves as an interface
-
open and complex patterns of behaviour can be built on top
-
a proper scheduler replaces several other mechanisms (threaded output backend, producer-consumer queue with locking, GUI animation services)
-
to provide an atomic execution service allows us to control various aspects of execution explicitly
-
in the end, this enables to scale and use various kinds of hardware
Cons
-
there is no “for-loop” to base any playback control structures on
-
compliance to externally imposed deadlines and memory management are challenging.
Alternatives
-
use a synchronous player with buffering
-
use a simplistic scheduler with entirely atomic jobs
We do not want (1), since it is tied to an obsolete hardware model and lacks the ability to be adapted to the new kinds of hardware available today or to be expected in near future. We do not want (2) since it essentially doesn’t solve any problem, but rather pushes complexity into the higher layers (Session, Stage), which are lacking the information about individual jobs and timing.
Rationale
We use a scheduler to gain flexibility in controlling various aspects of computation and I/O usage. Moreover, we turn the scheduler into an interface between the Vault and Steam-Layer; while the exact outfitting of the individual jobs highly depends on internals of the Session and Engine models, the properties of actual job execution, closely related to system programming are akin to the Vault. The actual requirements outlined in this RfC are derived from the internals of the player implementation, while the way these requirements are defined, and especially the aspects omitted from specification are derived from knowledge regarding the possible scheduler and vault layer implementation.
Comments
This RfC emerged from the work on the player implementation, which is the immediate client built on top of the scheduler service. At FrOSCon 2013 and the following developer meeting, we had an extended discussion covering various aspects of the possible scheduler implementation. The goal is to settle down on an interface definition, so the player and engine implementation can be developed independently of the scheduler implementation
- Ichthyostega
-
Do 19 Sep 2013 21:31:07 CEST <prg@ichthyostega.de>
Back to Lumiera Design Process overview