rapidyaml 0.15.2
parse and emit YAML, and do it fast
Shows how to implement serialization for custom user types.
rapidyaml provides a serialization implementation for all fundamental types, and optionally for some STL containers. To enable use with any type, rapidyaml uses argument-dependent lookup (ADL) to dispatch to type-specific function overloads. To serialize or deserialize custom user types, you only need to provide the appropriate overloads, which are explained here.
There are two distinct type categories to consider regarding YAML serialization: container types and scalar types.
A container type will always require child nodes in the tree. A scalar type will always be a leaf (childless) node in the tree. Most of the time, a scalar will be converted to string and not require any meta info (like tags) or style flags set in the tree, but occasionally this will be needed.
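So in fact, from the implementation point of view, the categories are the following: general types (containers, plus scalars that need extra NodeType flags set in the tree), and scalar types that only need a conversion to and from a string.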
To have rapidyaml interact with your types, you need to define the functions that perform the conversion. rapidyaml calls these functions unqualified, so C++'s ADL rules make the compiler resolve those calls to your overloads.
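The ADL mechanism itself can be sketched without rapidyaml. In the sketch below, `lib`, `serialize()`, `to_text()` and `app::Celsius` are all hypothetical names chosen for illustration; they are not part of rapidyaml's API. The point is only that a library template can reach an overload that lives in the user's namespace:

```cpp
#include <string>

// Hypothetical stand-in for a library namespace (NOT rapidyaml's code).
namespace lib {
// Generic entry point. The call to to_text() is unqualified on purpose:
// the overload is looked up by ADL in the namespace of T.
template<class T>
std::string serialize(T const& value)
{
    return to_text(value);
}
} // namespace lib

// User code: the overload lives in the same namespace as the type,
// which is what allows ADL to find it.
namespace app {
struct Celsius { int degrees; };

std::string to_text(Celsius const& c)
{
    return std::to_string(c.degrees) + "C";
}
} // namespace app
```

Calling `lib::serialize(app::Celsius{21})` picks up `app::to_text()` even though `lib` knows nothing about `app`. Moving `to_text()` out of the `app` namespace would make the lookup fail, which is why the overloads need to sit under your type's namespace.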
Briefly stated, these are the functions you need to implement, under your type's namespace:
You may also implement read/write() using the node API instead of the tree API (but read the following section for details):
Read on for details.
You may have noticed above that there are two sets of functions: one for the node API and another for the tree API. You don't need to implement both. Simply put, the choice on which one to implement comes down to which one you want to use, but for maximum flexibility the default advice is to implement the tree read/write functions.
Here are the key considerations:
If you trigger the (de)serialization from a particular API, rapidyaml directly calls the corresponding read/write() function. Further, rapidyaml's default node implementation calls into the tree read/write(), so if you only implement the tree version, it is automatically picked even when you are calling from nodes. For example:
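The forwarding from the node API to the tree API can be pictured with the following minimal sketch. Everything in it is a hypothetical stand-in (`lib::Tree`, `lib::NodeRef`, the three-argument `write()` signature, and `app::Version`); rapidyaml's actual types and signatures may differ, so treat this as an illustration of the mechanism only:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (NOT rapidyaml's real Tree/NodeRef).
namespace lib {
struct Tree { std::vector<std::string> values; };
struct NodeRef { Tree *tree; std::size_t id; };

// Default node-level write(): forwards to whatever tree-level overload
// exists for T. The unqualified call is resolved through ADL.
template<class T>
void write(NodeRef *node, T const& value)
{
    write(node->tree, node->id, value);
}
} // namespace lib

namespace app {
struct Version { int major_version; int minor_version; };

// The user implements only the tree-level overload.
void write(lib::Tree *tree, std::size_t id, Version const& v)
{
    tree->values[id] = std::to_string(v.major_version) + "."
                     + std::to_string(v.minor_version);
}
} // namespace app
```

With only the tree-level overload defined, both `lib::write(&node, v)` and `app::write(&tree, id, v)` end up in the same user function.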
So again, it is best to choose to implement the tree read/write() functions.
As explained above, general types are those that require child nodes (in the case of containers), or are scalars that require extra NodeType flags to be set along with them. For each type, the functions you will need to implement depend on whether you're reading or writing from the tree/node.
When writing general types to YAML, you need to define the following function:
Likewise, for writing keys you need to define the following function (but note the key MUST be a scalar):
The requirements for write() are less numerous than with read(). Inside write(), you may assume the node is valid, as rapidyaml will have made the required checks before calling your function, as specified by the call triggering the write (as described in How to use (de)serialization).
As for what you can do inside write(): generally you should only be setting/adding things to the node, and not to its key (that will generally have been dealt with elsewhere), typically with one of .set_seq() or .set_map() for containers, or .set_val() or .set_serialized(). Following this, for containers you should create and populate the children, with further calls to any of these functions, but now with child nodes and data structures as the targets.
To enable reading (deserialization) of a custom user type T falling into the general category, you need to define the following function:
Likewise, for reading keys you need to define the following function:
Then when you call any of NodeRef::load(), NodeRef::deserialize(), Tree::load() or Tree::deserialize() (as described in How to use (de)serialization), rapidyaml will call your read() function through the magic of C++ ADL / Koenig lookup. And likewise, when you call any of NodeRef::load_key(), NodeRef::deserialize_key(), Tree::load_key() or Tree::deserialize_key() (as described in How to use (de)serialization), rapidyaml will call your read_key() function. (But note the rapidyaml tree cannot accept containers as keys!)
The ReadResult return type is a lightweight truthy type, used to report either success or the offending node when an error happens in nested reads. It evaluates as true (empty-initialized) when there is no error, or as false on error, in which case it carries the innermost node causing the error. This enables accurate error reporting, and is very useful on large YAML files; see also sample_location_tracking() to find the original source location of the offending node.
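To make the idea of a truthy result concrete, here is a self-contained sketch of such a type and of how it propagates through nested reads. `Result`, `ToyNode` and `read_sum()` are hypothetical names invented for this illustration; they only mimic the behavior described above and are not rapidyaml's ReadResult or tree:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins (NOT rapidyaml's ReadResult or tree types).
constexpr std::size_t no_node = static_cast<std::size_t>(-1);

struct Result
{
    std::size_t failed_node = no_node; // empty-initialized means success
    Result() = default;
    explicit Result(std::size_t id) : failed_node(id) {}
    explicit operator bool() const { return failed_node == no_node; }
};

// A toy node: a leaf holding an int when it has no children,
// otherwise a container of child nodes.
struct ToyNode
{
    std::size_t id;
    bool is_valid_int;
    int value;
    std::vector<ToyNode> children;
};

// Adds up all leaves. On failure, reports the innermost offending node.
Result read_sum(ToyNode const& node, int *sum)
{
    if(node.children.empty())
    {
        if(!node.is_valid_int)
            return Result(node.id); // this leaf is the culprit
        *sum += node.value;
        return Result();
    }
    for(ToyNode const& child : node.children)
    {
        Result r = read_sum(child, sum);
        if(!r)
            return r; // pass it up unchanged so the innermost node survives
    }
    return Result();
}
```

Because each level returns the nested result unchanged, the top-level caller sees the id of the leaf that failed rather than the id of the outermost container.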
To start with an example, here is the rapidyaml implementation of read() for std::map:
The beginning rule is actually an assumption:
rapidyaml will already have checked for this as specified by the triggering call (see How to use (de)serialization).
Now the first rule:
This is needed to ensure that the node type matches the type of the destination variable. Concretely:
Why can't rapidyaml do this check for you before calling your read() function? Well, in the general case, it is impossible to know what type of node to expect, so rapidyaml can only check that the node is one of the VAL|SEQ|MAP cases above, but not concretely which one. It is up to the read() implementation for a type to specify which one.
However, note that inside read_key() you do not need a type check: the rapidyaml tree requires keys to be scalars (i.e. KEY), so rapidyaml does this check for you before calling read_key().
Now the next rule:
.load() triggers an error, while .deserialize() just returns, so you don't want to have a .deserialize() caller being aborted by a nested .load() call in your function. Let the top-level .load() caller trigger the error.
Finally,
Your implementation of read() or read_key() must return a truthy type to signify success of deserialization. The type should preferably be a ReadResult to enable accurate error reporting.
If the type is not ReadResult (like the legacy bool), rapidyaml will still work – although with the inconvenience of pointing only at the outer-most node instead of the actual error-causing node.
With this return value, rapidyaml will continue on success; on failure it will either return this value to the caller (with .deserialize()) or with .load() trigger a visit error on the reported node, as instructed by the triggering call (see How to use (de)serialization).
That's it for read()!
When a scalar type does not require any style or tags to be set in the tree, instead of defining read() / write() you can just define the direct serialization functions from_chars() and/or to_chars() to transform the scalar from/to its string representation.
To implement reading (deserialization) of scalar types, you need to define the following function:
The function receives a string fitted to the scalar, and must convert the string to the argument. To achieve this, you may find it useful to use the utilities in Charconv utilities or format: formatted string interpolation, which are very fast and efficient, and play nice with this approach. But that's not mandatory – you are also free to use any other conversion method you choose, such as fmtlib (but please do not use stringstreams; their performance is really bad).
Finally, you must return a boolean success status. rapidyaml will then react to this status in accordance with the call triggering the read.
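As an illustration of the shape such a function can take, here is a sketch for a hypothetical `app::Percent` type. It makes two assumptions that are not taken from rapidyaml: `std::string_view` stands in for rapidyaml's read-only string type, and the parsing uses the C++17 standard library `std::from_chars()` rather than rapidyaml's own charconv utilities:

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

namespace app {
// Hypothetical scalar type, written in YAML as e.g. "42%".
struct Percent { int value; };

// std::string_view is a stand-in for rapidyaml's read-only string type.
// Returns true only if the whole string was consumed successfully.
bool from_chars(std::string_view str, Percent *out)
{
    if(str.empty() || str.back() != '%')
        return false;
    str.remove_suffix(1); // drop the trailing '%'
    int parsed = 0;
    const char *first = str.data();
    const char *last = str.data() + str.size();
    std::from_chars_result res = std::from_chars(first, last, parsed);
    if(res.ec != std::errc() || res.ptr != last)
        return false;     // not a number, out of range, or trailing garbage
    out->value = parsed;  // only write to the output on success
    return true;
}
} // namespace app
```

Writing to the output only on success keeps the destination variable untouched when the conversion fails, which makes the failure easier to handle at the call site.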
To implement writing (serialization) of scalar types, you need to define the following function:
This function receives a buffer on which it is to write the serialization of var. Importantly, inside your function you cannot assume the buffer is large enough to fit the serialization of var. You must always check against its size.
You must return the number of bytes required to fit the serialization of var. Importantly, this size must not depend on the size of the buffer, which means you cannot do an early exit when you find the buffer is too small. The returned size must be invariant.
Upon returning, the caller will compare the returned size with the current buffer size. If the returned size is less than or equal to the buffer size, the serialization succeeded, and we're done. Otherwise, the buffer was too small; rapidyaml will then resize the buffer and call the function again. For an example of this call pattern, see e.g. serialize_to_arena_scalar().
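The call-then-retry pattern can be sketched as follows. This is not the code of serialize_to_arena_scalar(); `Buffer`, `app::Point` and `serialize_with_retry()` are hypothetical names, and a `std::string` plays the role of the resizable storage:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a writable character buffer
// (NOT rapidyaml's own buffer type).
struct Buffer { char *str; std::size_t len; };

namespace app {
struct Point { int x; int y; };

// Always returns the required size, whether or not the buffer was big enough.
std::size_t to_chars(Buffer buf, Point const& p)
{
    const std::string text =
        "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
    if(text.size() <= buf.len)
        std::memcpy(buf.str, text.data(), text.size());
    return text.size();
}
} // namespace app

// Caller side: try once, grow the storage if needed, then try again.
template<class T>
std::string serialize_with_retry(T const& value, std::size_t initial_capacity)
{
    std::string storage(initial_capacity, '\0');
    std::size_t required = to_chars(Buffer{&storage[0], storage.size()}, value);
    if(required > storage.size())
    {
        storage.resize(required); // now guaranteed to fit
        required = to_chars(Buffer{&storage[0], storage.size()}, value);
    }
    storage.resize(required);     // trim to the bytes actually written
    return storage;
}
```

A single retry is enough here precisely because the returned size is invariant: the second call is made with a buffer of exactly the size reported by the first.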
A typical implementation of to_chars() will look like this:
For instance, if your T is a string type, you could do:
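A string-like case could look like the sketch below. It relies on assumptions: `Buffer` is a hypothetical stand-in for rapidyaml's writable buffer type, and `app::Name` is an invented string wrapper. What it is meant to show is the size check before writing and the invariant return value:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a writable character buffer
// (NOT rapidyaml's own buffer type).
struct Buffer { char *str; std::size_t len; };

namespace app {
// Hypothetical string-like user type.
struct Name { std::string text; };

std::size_t to_chars(Buffer buf, Name const& name)
{
    const std::size_t required = name.text.size();
    // Write only when the buffer can hold everything; never write partially.
    if(required > 0 && required <= buf.len)
        std::memcpy(buf.str, name.text.data(), required);
    // The return value does not depend on buf.len.
    return required;
}
} // namespace app
```

When the buffer is too small, the function leaves it untouched and still reports the full size, so the caller knows how much room to provide on the next call.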