The new emerging NLE for GNU/Linux

to symbolically represent hierarchically structured elements, without actually implementing them.

The purpose of this “external” description is to remove the need of a central data model to work against. We consider such a foundation data model as a good starting point, yet harmful for the evolution of any larger structure to be built. According to the subsidiarity principle, we prefer to turn the working data representation into a local concern. Which leaves us with the issue of collaboration. Any collaboration requires, as an underlying, some kind of common understanding. And any formalised, mechanical collaboration requires to represent that common point of attachment — at least as symbolic representation. The »External Tree Description« is shaped to fulfil this need: in theory, the whole field could be represented, symbolically, as a network of hierarchically structured elements. Yet, in practice, all we need is to conceive the presence of such a representation, as a backdrop to work against. And we do so — we work against that symbolic representation, by describing changes made to the structure and its elements. Thus, the description of changes, the diff language, refers to and partially embodies such symbolically represented elements and relations.

Elements, Nodes and Records

We have to deal with entities and relationships. Entities are considered the building blocks, the elements, which are related by directional links. Within the symbolic representation, elements are conceived as generic nodes (GenNode), while the directed relations are impersonated as being attached or rooted at the originating side, so the target of a relation has no traces or knowledge of being part of that relation. Moreover, each of our nodes bears a relatively clear-cut identity. That is to say, within the relevant scope in question, this identity is unique. Together, these are the building blocks to represent any graph.

For practical purposes, we have to introduce some distinctions and limitations.

  • we have to differentiate the generic node to be either a mere data element, or an object-like record

  • the former, a mere data element, is considered to be “just data”, to be “right here” and without further meta information. You need to know what it is to deal with it.

  • to the contrary, a Record has an associated, symbolic and typed ID, plus it can potentially be associated with and thus relate to further elements, with the relation originating at the Record.

  • and indeed we distinguish two different kinds of relations possibly originating from a Record:

    • attributes are known by-name; they can be addressed through this name-ID as a key, while the value is again a generic node, possibly even another record.

    • children to the contrary can only be enumerated; they are considered to be within (and form) the scope of the given record (“object”).

And there is a further limitation: The domain of possible data is fixed, even hard wired.
[ Implementation-wise, this turns the data within the generic node into a »Variant« (typesafe union).]
Basically, this opens two different ways to access the data within a given GenNode: either you know the type to expect beforehand.
[and the validity of this assumption is checked on each access; please recall, all of this is meant for symbolic representation, not for implementation of high performance computing]
Or we offer the ability for generic access through a double dispatch (»Visitor«). The latter includes the option to handle just some of the possible content types and to ignore the other.
[making the variant visitor a partial function — as in any non exhaustive pattern match]

data elements

Basically, we can expect to encounter the following kinds of fundamental data elements

  • int, int64_t, short, char

  • bool

  • double

  • std::string

  • time::Time, time::Offset, time::Duration, time::TimeSpan

  • hash::LuidH (to address and refer to elements known by ID)

  • diff::Record<GenNode>

The last option is what makes our representation recursive.
[Regarding the implementation, all these data elements are embedded inline, as values. With the exception of the record, which, like any std::vector implicitly uses heap allocations for the members of the collection.]

names, identity and typing

It was a design decision that the generic node shall not embody a readable type field, just a type selector within the variant to hold the actual data elements. This decision more or less limits the usefulness of simple values as children to those cases, where all children are of uniform type, or where we agree to deal with all children through variant visitation solely. Of course, we can still use simple values as attributes, since those are known and addressed by name.
[As an extension, we could use filtering by type to limit access to some children of type Record, since every record does indeed embody a symbolic type name, an attribute named "type". It must be that way, since otherwise, records would be pretty much useless as representation for any object like entity.]

The discriminating ID of any GenNode can serve as a name, and indeed will be used as the name of an attribute within a record. This entry-ID of the node is comprised of a human readable symbolic part, and a hash ID (LUID). The calculation of the latter, the hash, includes the symbolic ID and a type information. This is what constitutes the full identity — so two nodes with the same name but different payload type are treated as different elements.

A somewhat related design question is that of ordering and uniqueness of children. While attributes — due to the usage of the attribute node’s ID as name-key — are bound to be unique within a given Record, children within the scope of a record could be required to be unique too, making the scope a set. And, of course, children could be forcibly ordered, or just retain the initial ordering, or even be completely unordered. On a second thought, it seems wise not to impose any guarantees in that regard, beyond the simple notion of retaining an initial sequence order, the way a “stable” sorting algorithm does. All these more specific ordering properties can be considered the concern of some specific kinds of objects — which then just happen to “supply” a list of children for symbolic representation as they see fit.