rapidyaml 0.15.2
parse and emit YAML, and do it fast
Serialization of user types

Shows how to implement serialization for custom user types.



rapidyaml provides a serialization implementation for all fundamental types, and optionally for some STL containers. To enable use with any type, rapidyaml uses ADL to dispatch to type-specific function overloads. For serializing or deserializing custom user types, you only need to provide the appropriate overloads, which are explained here.

See also
Serialization overview to find how the user-provided functions fit into rapidyaml's serialization pipeline.

Serialization type categories

There are two distinct type categories to consider regarding YAML serialization:

  • Container types. These represent a hierarchy of values (or containers) and must be converted to/from a YAML map (MAP) or sequence (SEQ).
  • Scalar types. These types are encoded as scalars, but need to be transformed from/to their string representation in the YAML buffer.

A container type will always require child nodes in the tree. A scalar type will always be a leaf (childless) node in the tree. Most of the time, a scalar will be converted to string and not require any meta info (like tags) or style flags set in the tree, but occasionally this will be needed.

So in fact, from the implementation point of view, the categories are the following:

  • General types. Require extra structure/info from the tree: child nodes (required by containers) and/or tags or extra NodeType flags (required by some scalars).
  • Scalar types. These merely need to be converted to string and then set as scalars on the tree, without needing to set any tags or extra NodeType flags.

To have rapidyaml interact with your types, you need to define functions where this is done, and then the compiler will have rapidyaml call your functions because of C++'s ADL rules.

Briefly stated, these are the functions you need to implement, under your type's namespace:

// IMPORTANT: define under the namespace of T. Read note below.
namespace your_namespace {
// tree API implementation for general types (containers
// or scalars requiring extra info from the tree):
//
// needed only if you're deserializing T:
c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
// needed only if you're serializing T:
void write(c4::yml::Tree * tree, c4::yml::id_type node_id, T const& var);
// or...
// special case for scalars not needing interaction with the tree:
//
// needed only if you're deserializing T:
bool from_chars(c4::yml::csubstr str, T* var);
// needed only if you're serializing T:
size_t to_chars(c4::yml::substr buffer, T const& var);
// optional:
c4::yml::type_bits scalar_flags_val(T const& var); // set extra style flags on T vals
c4::yml::type_bits scalar_flags_key(T const& var); // set extra style flags on T keys
// or...
// special case for writing string scalars: no need to convert to chars!
// mark as string
template<> struct c4::is_string<T> : std::true_type {};
// instead of to_chars()
c4::yml::csubstr to_csubstr(T const& var);
// rest as above for scalars
} // namespace
Important
Because of C++'s ADL rules, you must overload these functions in the namespace of the type you're serializing. Here's an example of an issue where failing to do this was causing problems on some platforms.
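To make the namespace requirement concrete, here is a minimal, self-contained sketch of ADL dispatch. Nothing in it is rapidyaml code: `minilib`, `serialize()`, `celsius` and the three-argument `to_chars()` are hypothetical stand-ins chosen only to show how an unqualified call inside a library template finds an overload declared next to the user's type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a serialization library (NOT rapidyaml).
namespace minilib {

// Overload shipped by the "library" for a fundamental type.
inline std::size_t to_chars(char *buf, std::size_t cap, int v)
{
    std::string s = std::to_string(v);
    if(s.size() <= cap)
        std::memcpy(buf, s.data(), s.size());
    return s.size();
}

// Generic entry point. The call to to_chars() is unqualified, so at
// instantiation the compiler also searches the namespace of T (ADL).
template<class T>
std::string serialize(T const& var)
{
    char buf[64];
    std::size_t n = to_chars(buf, sizeof(buf), var);
    assert(n <= sizeof(buf)); // this sketch uses a fixed buffer, no retry
    return std::string(buf, n);
}

} // namespace minilib

namespace your_namespace {

struct celsius { int degrees; }; // hypothetical user type

// Found by ADL because it lives in the same namespace as celsius.
inline std::size_t to_chars(char *buf, std::size_t cap, celsius const& c)
{
    std::string s = std::to_string(c.degrees) + "C";
    if(s.size() <= cap)
        std::memcpy(buf, s.data(), s.size());
    return s.size();
}

} // namespace your_namespace
```

Calling `minilib::serialize(your_namespace::celsius{21})` picks the user overload even though `minilib` never mentions `celsius`. If the overload were moved to an unrelated namespace, the same call would no longer compile, which is the kind of failure the note above warns about.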

You may also implement read/write() using the node API instead of the tree API (but read the following section for details):

// IMPORTANT: define read() under the namespace of T. Read note above.
namespace your_namespace {
// node API implementation for general types (old approach)
// needed only if you're deserializing T:
c4::yml::ReadResult read(c4::yml::ConstNodeRef const& node, T* var);
// needed only if you're serializing T:
void write(c4::yml::NodeRef * node, T const& var);
} // namespace
Note
For maximum flexibility you should prefer implementing the tree read/write.

Read on for details.



Why you should prefer implementing with tree API

You may have noticed above that there are two sets of functions: one for the node API and another for the tree API. You don't need to implement both. Simply put, the choice of which one to implement comes down to which one you want to use, but for maximum flexibility the default advice is to implement the tree read/write() functions.

Here are the key considerations:

  • If you trigger the (de)serialization from a particular API, it will directly call the corresponding read/write() function. Further, rapidyaml's default node read/write() implementation calls into the tree read/write(), so if you only implement the tree version, it is picked up automatically even when you're calling from nodes. For example:

    T var;
    // tree calls
    Tree tree = ...;
    id_type node_id = ...;
    tree.load(node_id, &var); // calls read(Tree const*,id_type,T*)
    if(!tree.deserialize(node_id, &var)) ...; // calls read(Tree const*,id_type,T*)
    tree.save(var); // calls write(Tree*,id_type,T const&)
    tree.set_serialized(&var); // calls write(Tree*,id_type,T const&)
    // node calls - forwarding to tree by default
    NodeRef node = ...;
    node.load(&var); // calls read(ConstNodeRef const&,T*)
    // -> rapidyaml calls read(Tree const*,id_type,T*)
    if(!node.deserialize(&var)) ...; // calls read(ConstNodeRef const&,T*)
    // -> rapidyaml calls read(Tree const*,id_type,T*)
    node.save(var); // calls write(NodeRef*,T const&)
    // -> rapidyaml calls write(Tree*,id_type,T const&)
    node.set_serialized(&var); // calls write(NodeRef*,T const&)
    // -> rapidyaml calls write(Tree*,id_type,T const&)
  • By default, a tree read/write() impl will get called from a node call, because rapidyaml's node impl calls into the tree impl. This means that if you implement the tree read/write(), rapidyaml will pick it up even if you are triggering it with the node API.
  • If you implement node read/write(), they will be picked up by a node call, but not by a tree call. Further, if you also implement tree read/write(), they will only be picked up by a tree call.
  • If you implement node read/write(), it hides rapidyaml's default implementation of calling the tree read/write(), so if you then want to call tree deserialization, you will also need to implement tree read/write().

So again, it is best to choose to implement the tree read/write() functions.



Implementation notes: general types

As explained above, general types are those that require child nodes (in the case of containers), or are scalars that require extra NodeType flags to be set along with them. For each type, the functions you will need to implement depend on whether you're reading from or writing to the tree/node.


Writing general types

When writing general types to YAML, you need to define the following function:

// implement these functions for T ...
namespace your_namespace { // IMPORTANT read note about namespace above
void write(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
// or, if you want to use the node API,
void write(c4::yml::NodeRef *scalar, T const& var);
} // namespace

Likewise, for writing keys you need to define the following function (but note the key MUST be a scalar):

// implement these functions for T ...
namespace your_namespace { // IMPORTANT read note about namespace above
void write_key(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
// or, if you want to use the node API,
void write_key(c4::yml::NodeRef *scalar, T const& var);
} // namespace

The requirements for write() are less numerous than with read(). Inside write(), you may assume the node is valid, as rapidyaml will have made the required checks before calling your function, as specified by the call triggering the write (as described in How to use (de)serialization).

As for what you can do inside write(): generally you should only be setting/adding things to the node, and not to its key (that will generally have been dealt with elsewhere), typically with one of .set_seq() or .set_map() for containers, or .set_val() or .set_serialized(). Following this, for containers you should create and populate the children, with further calls to any of these functions, but now with child nodes and data structures as the targets.

Note
See examples of write() implementations:


Reading general types

To enable reading (deserialization) of a custom user type T falling into the general category, you need to define the following function:

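// IMPORTANT: define read() under the namespace of T. Read note above.
namespace your_namespace {
c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
// and/or, if you prefer the node API
c4::yml::ReadResult read(c4::yml::ConstNodeRef const& node, T* var);
} // namespace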

Likewise, for reading keys you need to define the following function:

// IMPORTANT: define read_key() under the namespace of T. Read note above.
namespace your_namespace {
c4::yml::ReadResult read_key(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
// and/or, if you prefer the node API
c4::yml::ReadResult read_key(c4::yml::ConstNodeRef const& node, T* var);
} // namespace

Then when you call any of NodeRef::load(), NodeRef::deserialize(), Tree::load() or Tree::deserialize() (as described in How to use (de)serialization), rapidyaml will call your read() function through the magic of C++ ADL / Koenig lookup. And likewise, when you call any of NodeRef::load_key(), NodeRef::deserialize_key(), Tree::load_key() or Tree::deserialize_key() (as described in How to use (de)serialization), rapidyaml will call your read_key() function. (But note the rapidyaml tree cannot accept containers as keys!)

The ReadResult return type is a lightweight truthy type, used to report either success or the offending node when an error happens in nested reads. It evaluates as true (empty-initialized) when there is no error, and as false on error, in which case it holds the innermost node causing the error. This enables accurate error reporting, and is very useful on large YAML files; see also sample_location_tracking() to find the original source location of the offending node.

To start with an example, here is the rapidyaml implementation of read() for std::map:

template<class K, class V, class Less, class Alloc>
c4::yml::ReadResult read(c4::yml::Tree const* tree, c4::yml::id_type id, std::map<K, V, Less, Alloc> * m)
{
    // RULE 0. you may assume tree and id are valid.
    if(!tree->is_map(id)) // RULE 1. check node type
        return c4::yml::ReadResult(id); // report error on this id
    for(id_type child = tree->first_child(id); child != NONE; child = tree->next_sibling(child))
    {
        K k{};
        // RULE 2. use .deserialize(), not .load()
        c4::yml::ReadResult result = tree->deserialize_key(child, &k);
        if(!result)
            return result; // RULE 3. early exit on error
        result = tree->deserialize(child, &(*m)[std::move(k)]);
        if(!result)
            return result; // may refer to a deeply nested node!
    }
    return ReadResult{}; // report success
}


The beginning rule is actually an assumption:

Important
Rule 0. Inside your implementation of read() or read_key(), you may assume the node is valid (ie, that the tree and node_id are valid).

rapidyaml will already have checked for this as specified by the triggering call (see How to use (de)serialization).


Now the first rule:

Important
Rule 1. Inside read(), start with a node type check: must be exactly one of VAL (for scalars), SEQ (for sequence types) or MAP (for dictionary types). read_key() does not require a KEY check.

This is needed to ensure that the node type matches the type of the destination variable. Concretely:

  • If you're reading a scalar type like a number or a string, the node must be VAL, ie it must verify NodeType::has_val().
  • If you're reading a sequence type like a vector, the node must be a SEQ, ie it should verify NodeType::is_seq().
  • If you're reading a map type, the node should be a MAP, ie it should verify NodeType::is_map().

Why can't rapidyaml do this check for you before calling your read() function? Well, in the general case, it is impossible to know what type of node to expect, so rapidyaml can only check that the node is one of the VAL|SEQ|MAP cases above, but not concretely which one. It is up to the read() implementation for a type to specify which one.

However, note that inside read_key() you do not need a type check, as the rapidyaml tree requires that these are scalars (ie KEY), so rapidyaml does this check for you before calling read_key().


Now the next rule:

Important
Rule 2. Inside read(), use .deserialize() and not .load(), to play nice with .deserialize() callers calling your function. For read_key() it should be .deserialize_key() instead of .load_key().

.load() triggers an error, while .deserialize() just returns, so you don't want to have a .deserialize() caller being aborted by a nested .load() call in your function. Let the top-level .load() caller trigger the error.


Finally,

Important
Rule 3. Check every read and do early exit on error, adequately filling the ReadResult return type.

Your implementation of read() or read_key() must return a truthy type to signify success of deserialization. The type should preferably be a ReadResult to enable accurate error reporting.

If the type is not ReadResult (like the legacy bool), rapidyaml will still work – although with the inconvenience of pointing only at the outer-most node instead of the actual error-causing node.

With this return value, rapidyaml will continue on success. On failure it will either return the value to the caller (for .deserialize()) or trigger a visit error on the reported node (for .load()), as instructed by the triggering call (see How to use (de)serialization).
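The way rules 1 and 3 combine to report the innermost failing node can be sketched without rapidyaml at all. Everything in the `mini` namespace below (`Tree`, `Node`, `ReadResult`, `NONE`) is a hypothetical stand-in that only mimics the shape of the real types, and the nested `read()` call plays the role that `.deserialize()` plays in a real implementation.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-ins written for this sketch (NOT rapidyaml types).
namespace mini {

using id_type = std::size_t;
constexpr id_type NONE = static_cast<id_type>(-1);

// Truthy on success; on failure it carries the offending node id.
struct ReadResult
{
    id_type node = NONE;
    ReadResult() = default;
    explicit ReadResult(id_type n) : node(n) {}
    explicit operator bool() const { return node == NONE; }
};

struct Node
{
    bool is_seq = false;
    std::string val;               // used when the node is a scalar
    std::vector<id_type> children; // used when the node is a sequence
};

struct Tree { std::vector<Node> nodes; };

// Scalar read: the node must be a scalar holding a complete integer.
inline ReadResult read(Tree const* tree, id_type id, int *v)
{
    assert(id < tree->nodes.size());  // Rule 0: the caller guarantees validity
    Node const& n = tree->nodes[id];
    if(n.is_seq)                      // Rule 1: check the node type
        return ReadResult(id);
    char const* first = n.val.data();
    char const* last = first + n.val.size();
    auto r = std::from_chars(first, last, *v);
    if(r.ec != std::errc() || r.ptr != last)
        return ReadResult(id);        // report this node as the culprit
    return ReadResult{};              // success
}

// Container read: the node must be a sequence of integers.
inline ReadResult read(Tree const* tree, id_type id, std::vector<int> *out)
{
    assert(id < tree->nodes.size());
    Node const& n = tree->nodes[id];
    if(!n.is_seq)                     // Rule 1
        return ReadResult(id);
    out->clear();
    for(id_type child : n.children)
    {
        int elm = 0;
        ReadResult result = read(tree, child, &elm); // nested, non-aborting read
        if(!result)
            return result;            // Rule 3: early exit, keep the inner node id
        out->push_back(elm);
    }
    return ReadResult{};
}

} // namespace mini
```

With a sequence whose second element holds the text `oops`, the outer `read()` returns a falsy result whose `node` is the id of that element rather than the id of the sequence, which is the behavior the legacy `bool` return cannot provide.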

That's it for read()!

Note
See examples of read() implementations:



Implementation notes: scalars

When a scalar type does not require any style or tags to be set in the tree, instead of defining read() / write() you can just define the direct serialization functions from_chars() and/or to_chars() to transform the scalar from/to its string representation.

Note
Please take note of the following pitfall when using scalar serialization functions: you may have to include the header with your from_chars() / to_chars() implementation before any other headers that use functions from it.


Reading scalars

To implement reading (deserialization) of scalar types, you need to define the following function:

namespace your_namespace {
bool from_chars(c4::yml::csubstr str, T* var); // if you want to read from YAML
} // namespace

The function receives a string fitted to the scalar, and must convert the string to the argument. To achieve this, you may find it useful to use the utilities in Charconv utilities or format: formatted string interpolation, which are very fast and efficient, and play nice with this approach. But that's not mandatory – you are also free to use any other conversion method you choose, such as fmtlib (but please do not use stringstreams; their performance is really bad).

Finally, you must return a boolean success status. rapidyaml will then react to this status in accordance with the call triggering the read.
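As an illustration, here is a minimal sketch of a from_chars() for a two-component integer vector written as `(x,y)`. The `vec2` type and the text format are hypothetical, and `mini::csubstr` is a stand-in with the same pointer-plus-length shape as c4::yml::csubstr so that the block compiles on its own. It uses the standard `std::from_chars()` for the number parsing; the rapidyaml charconv utilities mentioned above could be used instead.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <system_error>

// Hypothetical stand-in for c4::yml::csubstr (pointer + length).
namespace mini {
struct csubstr { char const* str; std::size_t len; };
} // namespace mini

namespace your_namespace {

struct vec2 { int x, y; }; // hypothetical user type

// Parses exactly "(x,y)" with no surrounding whitespace.
// Returns false on any mismatch; *v may be partially written in that case.
inline bool from_chars(mini::csubstr s, vec2 *v)
{
    assert(v != nullptr);
    char const* p = s.str;
    char const* end = s.str + s.len;
    if(p == end || *p != '(')
        return false;
    ++p;
    auto r = std::from_chars(p, end, v->x);   // first component
    if(r.ec != std::errc())
        return false;
    p = r.ptr;
    if(p == end || *p != ',')
        return false;
    ++p;
    r = std::from_chars(p, end, v->y);        // second component
    if(r.ec != std::errc())
        return false;
    p = r.ptr;
    // require the closing parenthesis to be the last character
    return p != end && *p == ')' && p + 1 == end;
}

} // namespace your_namespace
```

The function never throws and never reads past `s.len`, and every failure path returns `false` so that the caller can decide how to react.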

Note
See examples of from_chars() implementations:


Writing scalars

To implement writing (serialization) of scalar types, you need to define the following function:

namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var); // if you want to write to YAML
} // namespace

This function receives a buffer on which it is to write the serialization of var. Importantly, inside your function you cannot assume the buffer is large enough to fit the serialization of var. You must always check against its size.

You must return the number of bytes required to fit the serialization of var. Importantly, this size must not depend on the size of the buffer, which means you cannot do an early exit when you find the buffer is too small. The returned size must be invariant.

Upon returning, the caller will compare the returned size with the current buffer size. If the returned size is <= the buffer size, the serialization succeeded, and we're done. Otherwise, the buffer was too small; rapidyaml will then resize the buffer and call the function again. For an example of this call pattern, see eg serialize_to_arena_scalar().
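The size contract and the resize-and-retry pattern can be sketched together as follows. This is not rapidyaml's code: `mini::substr` is a stand-in for c4::yml::substr, while `label` and `serialize_with_retry()` are hypothetical names used only to show why the returned size must not depend on the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for c4::yml::substr (pointer + length).
namespace mini {
struct substr { char *str; std::size_t len; };
} // namespace mini

namespace your_namespace {

struct label { std::string text; }; // hypothetical user type

// Writes "<text>" and ALWAYS returns the full required size,
// writing only the characters that fit in the buffer.
inline std::size_t to_chars(mini::substr buf, label const& v)
{
    std::size_t pos = 0;
    auto put = [&](char c) {
        if(pos < buf.len)
            buf.str[pos] = c;
        ++pos; // keep counting even when the buffer is full
    };
    put('<');
    for(char c : v.text)
        put(c);
    put('>');
    return pos;
}

} // namespace your_namespace

namespace mini {

// Caller side: try once, grow to the reported size, try again.
template<class T>
std::string serialize_with_retry(T const& var, std::size_t initial_capacity)
{
    std::vector<char> storage(initial_capacity);
    substr buf{storage.data(), storage.size()};
    std::size_t needed = to_chars(buf, var); // found via ADL
    if(needed > buf.len)
    {
        storage.resize(needed);
        buf = substr{storage.data(), storage.size()};
        needed = to_chars(buf, var);
        assert(needed <= buf.len); // holds only if the size is invariant
    }
    return std::string(storage.data(), needed);
}

} // namespace mini
```

If to_chars() returned early with a smaller count when the buffer was short, the second call would still not fit, and a single retry like the one above would not be enough.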

A typical implementation of to_chars() will look like this:

namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var)
{
    size_t pos = 0;
    for(... var) // iterate over var, adding characters to the buffer
    {
        // append another char to the buffer: only if possible!
        // BUT do not break the loop if the buffer is too small.
        // Continue doing a blank loop until the end, to count
        // the needed characters
        if(pos < buffer.len)
            buffer[pos] = ...;
        ++pos; // keep counting, even if we already know
               // the buffer is small!
    }
    return pos; // now we know the required size, return it
}
} // namespace

For instance, if your T is a string type, you could do:

namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var)
{
    size_t sz = var.size();
    if(sz && sz <= buffer.len)
        memcpy(buffer.str, var.data(), sz);
    return sz;
}
} // namespace
Note
See examples of to_chars() implementations:


Further reading for scalar serialization