rapidyaml 0.15.2
parse and emit YAML, and do it fast
doxy_serialization_user_types.hpp
1
2// DANGER: Keep markdown []() links in a single line!!!
3//
4// doxygen is broken and fails to render the markdown links when
5// they span multi lines.
6
7
8#include "c4/yml/tree.hpp"
9#include "c4/yml/node.hpp"
11
12namespace c4 {
13namespace yml {
14
15
16/** @addtogroup doc_serialization_user_types
17
18<br>
19<hr>
20## Serialization type categories
21
22There are two distinct type categories to consider regarding YAML
23serialization:
24
25 - **Container types**. These represent a hierarchy of values (or
26 containers) and must be converted to/from a YAML map (@ref MAP) or
27 sequence (@ref SEQ).
28
29 - **Scalar types**. These types are encoded as scalars, but need
30 to be transformed from/to their string representation in the
31 YAML buffer.
32
33
34A container type will always require child nodes in the tree. A scalar
35type will always be a leaf (childless) node in the tree. Most of the
36time, a scalar will be converted to string and not require any meta
37info (like tags) or style flags set in the tree, but occasionally this
38will be needed.
39
40So in fact, from the implementation point of view, the categories are
41the following:
42
43 - **General types**. Require extra structure/info from the tree:
44 child nodes (required by containers) and/or tags or extra @ref
45 NodeType flags (required by some scalars).
46
47 - **Scalar types**. These merely need to be converted to string and
48 then set as scalars on the tree, without needing to set any tags
49 or extra @ref NodeType flags.
50
51
52To have rapidyaml interact with your types, you need to define functions
53where this is done, and then the compiler will have rapidyaml call your
54functions because of [C++'s ADL rules](http://en.cppreference.com/w/cpp/language/adl).
55
56Briefly stated, these are the functions you need to implement, **under
57your type's namespace**:
58
59@code{c++}
60// IMPORTANT: define under the namespace of T. Read note below.
61namespace your_namespace {
62
63// tree API implementation for general types (containers
64// or scalars requiring extra info from the tree):
65//
66// needed only if you're deserializing T:
67c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
68// needed only if you're serializing T:
69void write(c4::yml::Tree * tree, c4::yml::id_type node_id, T const& var);
70
71// or...
72
73// special case for scalars not needing interaction with the tree:
74//
75// needed only if you're deserializing T:
76bool from_chars(c4::yml::csubstr str, T* var);
77// needed only if you're serializing T:
78size_t to_chars(c4::yml::substr buffer, T const& var);
79// optional:
80c4::yml::type_bits scalar_flags_val(T const& var); // set extra style flags on T vals
81c4::yml::type_bits scalar_flags_key(T const& var); // set extra style flags on T keys
82
83// or...
84
85// special case for writing string scalars: no need to convert to chars!
86// mark as string
87template<> struct c4::is_string<T> : std::true_type {};
88// instead of to_chars()
89c4::yml::csubstr to_csubstr(T const& var);
90// rest as above for scalars
91
92} // namespace
93@endcode
94
95
96@important Because of [C++'s ADL
97rules](http://en.cppreference.com/w/cpp/language/adl), **it is
98required to overload these functions in the namespace of the type**
99you're serializing. Here's an [example of an issue](https://github.com/biojppm/rapidyaml/issues/424)
100where failing to do this caused problems on some platforms.
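
The lookup rule can be seen in isolation with a minimal sketch. Everything here is a stand-in chosen for illustration (the `lib` namespace, `serialize()`, `Celsius` and the three-argument `to_chars()` are hypothetical and are not rapidyaml's API); only the pattern is the point: a library template makes an unqualified call, and the compiler finds the overload in the namespace of the user's type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a library. This is NOT rapidyaml, only the same lookup pattern.
namespace lib {
template<class T>
std::string serialize(T const& var)
{
    char buf[64];
    // Unqualified call: the overload is looked up in the namespaces
    // associated with T (argument-dependent lookup).
    std::size_t n = to_chars(buf, sizeof(buf), var);
    return std::string(buf, n <= sizeof(buf) ? n : 0);
}
} // namespace lib

namespace your_namespace {
struct Celsius { int degrees; }; // hypothetical user type

// Found by ADL because it lives in the same namespace as Celsius.
std::size_t to_chars(char *buf, std::size_t len, Celsius const& c)
{
    std::string s = std::to_string(c.degrees) + "C";
    if(s.size() <= len)
        s.copy(buf, s.size());
    return s.size();
}
} // namespace your_namespace

inline std::string adl_example()
{
    return lib::serialize(your_namespace::Celsius{21});
}
```

If the `to_chars()` overload were moved into an unrelated namespace, the call inside `lib::serialize()` would no longer find it, which is the kind of failure the note above warns about.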
101
102
103You may also implement %read/write() using the node API instead of the
104tree API (but read the following section for details):
105
106@code{c++}
107// IMPORTANT: define %read() under the namespace of T. Read note above.
108namespace your_namespace {
109
110// node API implementation for general types (old approach)
111// needed only if you're deserializing T:
112c4::yml::ReadResult read(c4::yml::ConstNodeRef node, T* var);
113// needed only if you're serializing T:
114void write(c4::yml::NodeRef * node, T const& var);
115
116} // namespace
117@endcode
118
119@note For maximum flexibility you should prefer implementing the
120tree %read/write.
121
122
123Read on for details.
124
125
126// <br>
127// <hr>
128
129## Why you should prefer implementing with tree API
130
131You may have noticed above that there are two sets of functions: one
132for the node API and another for the tree API. You don't need to
133implement both. Simply put, the choice on which one to implement comes
134down to which one you want to use, but for maximum flexibility
135the **default advice is to implement the tree %read/write functions**.
136
137Here are the key considerations:
138
139 - If you trigger the deserialization from a particular API, it will
140 directly call the corresponding %read/write() function. Further,
141 rapidyaml's default node implementation calls into the tree
142 %read/write(), so if you implement only the tree version, it is
143 picked up automatically even when you're calling from nodes. For
144 example:
145
146 @code{c++}
147 T var;
148
149 // tree calls
150 Tree tree = ...;
151 id_type node_id = ...;
152 tree.load(node_id, &var); // calls read(Tree const*,id_type,T*)
153 if(!tree.deserialize(node_id, &var)) ...; // calls read(Tree const*,id_type,T*)
154 tree.save(var); // calls write(Tree*,id_type,T const&)
155 tree.set_serialized(&var); // calls write(Tree*,id_type,T const&)
156
157 // node calls - forwarding to tree by default
158 NodeRef node = ...;
159 node.load(&var); // calls read(ConstNodeRef const&,T*)
160 // -> rapidyaml calls read(Tree const*,id_type,T*)
161 if(!node.deserialize(&var)) ...; // calls read(ConstNodeRef const&,T*)
162 // -> rapidyaml calls read(Tree const*,id_type,T*)
163 node.save(var); // calls write(NodeRef*,T const&)
164 // -> rapidyaml calls write(Tree*,id_type,T const&)
165 node.set_serialized(&var); // calls write(NodeRef*,T const&)
166 // -> rapidyaml calls write(Tree*,id_type,T const&)
167 @endcode
168
169 - By default, a tree %read/write() impl will get called from a node
170 call. rapidyaml's node impl calls into the tree impl. This means that
171 if you implement the tree %read/write(), rapidyaml will pick it up
172 **even if you are triggering it with the node API**.
173
174 - If you implement node %read/write(), they will be picked up by a
175 node call, but not by a tree call. Further, if you also implement
176 tree %read/write(), they will only be picked up by a tree call.
177
178 - If you implement node %read/write(), it hides rapidyaml's default
179 implementation of calling the tree %read/write(), so if you then
180 want to call tree deserialization, you will also need to implement
181 tree %read/write().
182
183So again, it is best to choose to implement the tree %read/write() functions.
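
The forwarding behaviour described above can be modelled with a small self-contained sketch. The types below only mimic the shape of the real classes (the `lib` namespace, its `Tree`, `ConstNodeRef` and `deserialize()` overloads, and the `Widget` type are all hypothetical stand-ins, not rapidyaml code). The sketch assumes the default node-level `read()` is a template that forwards to the tree-level overload, as the text describes.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins that mimic the shape of the real classes; NOT rapidyaml's types.
namespace lib {
using id_type = std::size_t;
struct Tree { int payload; };
struct ConstNodeRef { Tree const* tree; id_type id; };

// Default node-level read: forwards to the tree-level overload found by ADL.
template<class T>
bool read(ConstNodeRef node, T *var)
{
    return read(node.tree, node.id, var);
}

// Entry points standing in for the tree and node deserialize calls.
template<class T>
bool deserialize(Tree const& t, id_type id, T *var) { return read(&t, id, var); }
template<class T>
bool deserialize(ConstNodeRef n, T *var) { return read(n, var); }
} // namespace lib

namespace your_namespace {
struct Widget { int value = 0; }; // hypothetical user type

// Only the tree-level read is implemented by the user.
bool read(lib::Tree const* tree, lib::id_type, Widget *w)
{
    w->value = tree->payload;
    return true;
}
} // namespace your_namespace

inline int forwarding_example()
{
    lib::Tree tree{42};
    your_namespace::Widget a, b;
    bool ok_tree = lib::deserialize(tree, 0, &a);                     // tree call
    bool ok_node = lib::deserialize(lib::ConstNodeRef{&tree, 0}, &b); // node call, forwarded
    assert(ok_tree && ok_node);
    return a.value + b.value;
}
```

Adding a non-template `read(lib::ConstNodeRef, Widget*)` to `your_namespace` would be preferred over the forwarding template for node calls, while tree calls would still reach only the tree-level overload.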
184
185
186
187// <br>
188// <hr>
189
190## Implementation notes: general types
191
192As explained above, general types are those that require child nodes
193(in the case of containers), or are scalars that require extra @ref
194NodeType flags to be set along with them. For each type, the functions
195you will need to implement depend on whether you're reading from or
196writing to the tree/node.
197
198
199
200// <br>
201### Writing general types
202
203When writing general types to YAML, you need to define the following
204function:
205
206@code{c++}
207// implement these functions for T ...
208namespace your_namespace { // IMPORTANT read note about namespace above
209void write(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
210// or, if you want to use the node API,
211void write(c4::yml::NodeRef *node, T const& var);
212} // namespace
213@endcode
214
215Likewise, for writing keys you need to define the following function
216(but note the key MUST be a scalar):
217
218@code{c++}
219// implement these functions for T ...
220namespace your_namespace { // IMPORTANT read note about namespace above
221void write_key(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
222// or, if you want to use the node API,
223void write_key(c4::yml::NodeRef *scalar, T const& var);
224} // namespace
225@endcode
226
227The requirements for `%write()` are less numerous than with
228%read(). Inside `%write()`, you may assume the node is valid, as rapidyaml
229will have made the required checks before calling your function, as
230specified by the call triggering the %write (as described in @ref
231doc_serialization_using).
232
233As for what you can do inside `%write()`: generally you should only be
234setting/adding things to the node, and not to its key (that
235will generally have been dealt with elsewhere), typically with one of
236[.set_seq()](@ref Tree::set_seq()) or
237[.set_map()](@ref Tree::set_map()) for containers,
238or [.set_val()](@ref Tree::set_val()) or
239[.set_serialized()](@ref Tree::set_serialized()). Following this, for
240containers you should create and populate the children, with further
241calls to any of these functions, but now with child nodes and data
242structures as the targets.
243
244
245@note See examples of `%write()` implementations:
246 - @ref doc_serialization_tree_write
247 - @ref doc_serialization_node_write
248 - see the [vector write implementation](@ref src/c4/yml/std/vector.hpp)
249 - see the [map write implementation](@ref src/c4/yml/std/map.hpp).
250 - see the sample @ref sample_user_container_types
251 - see the sample @ref sample_std_types
252
253
254
255// <br>
256### Reading general types
257
258To enable reading (deserialization) of a custom user type T falling
259into the general category, you need to define the following function:
260
261@code{c++}
262// IMPORTANT: define read() under the namespace of T. Read warning above.
263namespace your_namespace {
264c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
265// and/or, if you prefer the node API
266c4::yml::ReadResult read(c4::yml::ConstNodeRef node, T* var);
267} // namespace
268@endcode
269
270Likewise, for reading keys you need to define the following function:
271@code{c++}
272// IMPORTANT: define %read_key() under the namespace of T. Read warning above.
273namespace your_namespace {
274c4::yml::ReadResult read_key(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
275// and/or, if you prefer the node API
276c4::yml::ReadResult read_key(c4::yml::ConstNodeRef node, T* var);
277} // namespace
278@endcode
279
280
281Then when you call any of @ref NodeRef::load(), @ref
282NodeRef::deserialize(), @ref Tree::load() or @ref Tree::deserialize()
283(as described in @ref doc_serialization_using), rapidyaml will call
284your `%read()` function through the magic of C++ ADL / Koenig
285lookup. And likewise, when you call any of @ref NodeRef::load_key(),
286@ref NodeRef::deserialize_key(), @ref Tree::load_key() or @ref
287Tree::deserialize_key() (as described in @ref
288doc_serialization_using), rapidyaml will call your `%read_key()`
289function. (**But note the rapidyaml tree cannot accept containers as
290keys!**)
291
292
293The @ref ReadResult return type is a lightweight truthy type, used to
294enable reporting either of success or of the offending node, when an
295error happens in nested reads. It evaluates as true
296(empty-initialized) when there is no error, or as false on error, in
297which case it holds the innermost node causing the error. This enables accurate error
298reporting, and is very useful on large YAML files; see also @ref
299sample_location_tracking() to find the original source location of the
300offending node.
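
To make the idea of a truthy result that carries the offending node concrete, here is a minimal sketch. `ResultSketch`, `ToyNode`, `NONE_ID` and `read_node()` are hypothetical names invented for this illustration; the real @ref ReadResult may differ in layout and interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using id_type = std::size_t;
constexpr id_type NONE_ID = static_cast<id_type>(-1);

// Hypothetical stand-in for a ReadResult-like type.
struct ResultSketch
{
    id_type bad_node = NONE_ID;  // NONE_ID means "no error"
    ResultSketch() = default;    // empty-initialized: success
    explicit ResultSketch(id_type id) : bad_node(id) {}
    explicit operator bool() const { return bad_node == NONE_ID; }
};

// A toy tree: each node lists its children and says whether it reads cleanly.
struct ToyNode { std::vector<id_type> children; bool parses; };

ResultSketch read_node(std::vector<ToyNode> const& tree, id_type id)
{
    if(!tree[id].parses)
        return ResultSketch(id); // report this node
    for(id_type child : tree[id].children)
    {
        ResultSketch r = read_node(tree, child);
        if(!r)
            return r; // propagate the innermost failing node unchanged
    }
    return ResultSketch{}; // success
}

inline id_type result_example()
{
    std::vector<ToyNode> tree = {
        {{1, 2}, true}, // node 0: root
        {{}, true},     // node 1: fine
        {{3}, true},    // node 2: fine itself, but has a bad child
        {{}, false},    // node 3: fails
    };
    return read_node(tree, 0).bad_node;
}
```

Reading from the root reports node 3, the deepest node that failed, rather than the root where the read started.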
301
302
303
304To start with an example, here is the rapidyaml implementation of `%read()` for
305`std::map`:
306
307@code{c++}
308template<class K, class V, class Less, class Alloc>
309c4::yml::ReadResult read(c4::yml::Tree const* tree, c4::yml::id_type id, std::map<K, V, Less, Alloc> * m)
310{
311 // RULE 0. you may assume tree and id are valid.
312 if(!tree->is_map(id)) // RULE 1. check node type
313 return c4::yml::ReadResult(id); // report error on this id
314 for(id_type child = tree->first_child(id); child != NONE; child = tree->next_sibling(child))
315 {
316 K k{};
317 // RULE 2. use .deserialize(), not .load()
318 c4::yml::ReadResult result = tree->deserialize_key(child, &k);
319 if(!result)
320 return result; // RULE 3. early exit on error
321 result = tree->deserialize(child, &(*m)[std::move(k)]);
322 if(!result)
323 return result; // may refer to a deeply nested node!
324 }
325 return ReadResult{}; // report success
326}
327@endcode
328
329
330<br>
331The beginning rule is actually an assumption:
332
333@important Rule 0. Inside your implementation of `%read()` or
334`%read_key()`, you may assume the node is valid (ie, that the tree and
335node_id are valid).
336
337rapidyaml will already have checked for this as specified by the
338triggering call (see @ref doc_serialization_using).
339
340
341<br>
342Now the first rule:
343
344@important Rule 1. Inside `%read()`, **start with a node type check**:
345must be exactly one of @ref VAL (for scalars), @ref SEQ (for sequence
346types) or @ref MAP (for dictionary types). `%read_key()` *does not
347require* a @ref KEY check.
348
349This is needed to ensure that the node type matches the type of the
350destination variable. Concretely:
351
352 - If you're reading a scalar type like a number or a string, the
353 node must be @ref VAL, ie it must verify @ref NodeType::has_val().
354
355 - If you're reading a sequence type like a vector, the node must be
356 a @ref SEQ, ie it should verify @ref NodeType::is_seq().
357
358 - If you're reading a map type, the node must be a @ref MAP, ie
359 it must verify @ref NodeType::is_map().
360
361Why can't rapidyaml do this check for you before calling your `%read()`
362function? Well, in the general case, it is impossible to know what type
363of node to expect, so rapidyaml can only check that the node is one of
364the @ref VAL|@ref SEQ|@ref MAP cases above, but not concretely which
365one. It is up to the `%read()` implementation for a type to specify
366which one.
367
368However, note that inside `%read_key()` you do not need a type check,
369as the rapidyaml tree requires that these are scalars (ie @ref KEY),
370so rapidyaml does this check for you before calling `%read_key()`.
371
372
373<br>
374Now the next rule:
375
376@important Rule 2. Inside `%read()`, **use
377[.deserialize()](@ref Tree::deserialize()) and not
378[.load()](@ref Tree::load())**, to play nice with `.deserialize()`
379callers calling your function. For `%read_key()` it should be
380[.deserialize_key()](@ref Tree::deserialize_key()) instead
381of [.load_key()](@ref Tree::load_key()).
382
383
384On failure, `.load()` triggers an error, while `.deserialize()` just
385returns the failure to its caller. You don't want a `.deserialize()` caller to be aborted by a
386nested `.load()` call in your function. Let the top-level `.load()`
387caller trigger the error.
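
The difference can be sketched without rapidyaml. In the stand-in below, `deserialize_sketch()` reports failure through its return value and `load_sketch()` escalates it; an exception is used purely as an illustrative stand-in for whatever error handling the real `.load()` triggers. All names (`parse_int`, `Point`, `read_point`) are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy scalar parser (no overflow handling; illustration only).
bool parse_int(std::string const& s, int *out)
{
    if(s.empty())
        return false;
    int v = 0;
    for(char c : s)
    {
        if(c < '0' || c > '9')
            return false;
        v = v * 10 + (c - '0');
    }
    *out = v;
    return true;
}

// Like .deserialize(): reports failure to the caller.
bool deserialize_sketch(std::string const& scalar, int *out)
{
    return parse_int(scalar, out);
}

// Like .load(): escalates failure (an exception stands in for the error).
void load_sketch(std::string const& scalar, int *out)
{
    if(!deserialize_sketch(scalar, out))
        throw std::runtime_error("could not read scalar");
}

struct Point { int x, y; };

// A nested read uses the returning variant, so its own caller decides
// whether a failure is fatal.
bool read_point(std::string const& xs, std::string const& ys, Point *p)
{
    if(!deserialize_sketch(xs, &p->x))
        return false;
    if(!deserialize_sketch(ys, &p->y))
        return false;
    return true;
}

inline bool rule2_example()
{
    Point p{0, 0};
    bool ok = read_point("3", "4", &p);     // succeeds
    bool bad = read_point("3", "oops", &p); // fails, but returns normally
    return ok && !bad && p.x == 3;
}
```

Had `read_point()` used `load_sketch()` internally, the second call would have escalated instead of letting the caller inspect the failure.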
388
389
390<br>
391Finally,
392
393@important Rule 3. **Check every read and do early exit on error**,
394adequately filling the @ref ReadResult return type.
395
396Your implementation of `%read()` or `%read_key()` **must return a
397truthy type to signify success of deserialization**. The type should
398preferably be a @ref ReadResult to enable accurate error reporting.
399
400If the type is not @ref ReadResult (like the legacy bool), rapidyaml
401will still work -- although with the inconvenience of pointing only at the
402outermost node instead of the actual error-causing node.
403
404With this return value, rapidyaml will continue on success; on failure
405it will either return this value to the caller (with `.deserialize()`)
406or with `.load()` trigger a visit error on the reported node, as
407instructed by the triggering call (see @ref doc_serialization_using).
408
409That's it for `%read()`!
410
411@note See examples of `%read()` implementations:
412 - @ref doc_serialization_tree_read
413 - @ref doc_serialization_node_read
414 - see the [vector read implementation](@ref src/c4/yml/std/vector.hpp)
415 - see the [map read implementation](@ref src/c4/yml/std/map.hpp).
416 - see the sample @ref sample_user_container_types
417 - see the sample @ref sample_std_types
418
419
420
421
422<br>
423<hr>
424
425## Implementation notes: scalars
426
427When a scalar type does not require any style or tags to be set in the
428tree, instead of defining `%read()` / `%write()` you can just define
429the direct serialization functions `%from_chars()` and/or
430`%to_chars()` to transform the scalar from/to its string
431representation.
432
433@note Please take note of the following pitfall when using scalar
434serialization functions: you may have to include the header with your
435`%from_chars()` / `%to_chars()` implementation before any other headers
436that use functions from it.
437
438
439<br>
440### Reading scalars
441
442To implement reading (deserialization) of scalar types, you
443need to define the following function:
444
445@code{c++}
446namespace your_namespace {
447bool from_chars(c4::yml::csubstr str, T* var); // if you want to read from YAML
448} // namespace
449@endcode
450
451The function receives a string fitted to the scalar, and must convert
452the string to the argument. To achieve this, you may find it useful to
453use the utilities in @ref doc_charconv or @ref doc_format, which are
454very fast and efficient, and play nice with this approach. But that's
455not mandatory -- you are also free to use any other conversion method
456you choose, such as fmtlib (but please do not use stringstreams; their
457performance is really bad).
458
459Finally, you must return a boolean success status. rapidyaml will then
460react to this status in accordance with the call triggering the read.
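
As an illustration, here is a minimal sketch of a `%from_chars()` for a hypothetical `Version` type written as `hi.lo`. To keep the example compilable on its own, `csubstr_sketch` is a stand-in for a read-only pointer-plus-length string and is not the real `csubstr`; the parsing is done by hand, although the charconv utilities mentioned above would normally be a better fit.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a read-only string view (pointer + length); NOT the real csubstr.
struct csubstr_sketch { const char *str; std::size_t len; };

namespace your_namespace {
struct Version { unsigned hi; unsigned lo; }; // hypothetical user type

// Parses "hi.lo" (decimal digits only, no overflow handling).
// Returns false and leaves *v untouched when the format is wrong.
bool from_chars(csubstr_sketch s, Version *v)
{
    unsigned parts[2] = {0, 0};
    std::size_t which = 0, digits = 0;
    for(std::size_t i = 0; i < s.len; ++i)
    {
        char c = s.str[i];
        if(c == '.')
        {
            if(digits == 0 || which == 1)
                return false; // empty first part, or a second dot
            which = 1;
            digits = 0;
        }
        else if(c >= '0' && c <= '9')
        {
            parts[which] = parts[which] * 10u + unsigned(c - '0');
            ++digits;
        }
        else
        {
            return false; // unexpected character
        }
    }
    if(which != 1 || digits == 0)
        return false; // no dot, or empty second part
    v->hi = parts[0];
    v->lo = parts[1];
    return true;
}
} // namespace your_namespace

inline bool from_chars_example()
{
    your_namespace::Version v{0, 0};
    bool ok = from_chars(csubstr_sketch{"1.25", 4}, &v); // found via ADL
    return ok && v.hi == 1 && v.lo == 25;
}
```

The output variable is written only after the whole string has been validated, so a failed read leaves the destination unchanged.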
461
462@note See examples of `%from_chars()` implementations:
463 - for `std::string`: @ref ext/c4core.src/c4/std/string.hpp
464 - for `std::vector<char>`: @ref ext/c4core.src/c4/std/vector.hpp
465 - for `std::span<char>`: @ref ext/c4core.src/c4/std/span.hpp
466 - see the several from_chars overloads in @ref doc_charconv
467 - see the several from_chars overloads in @ref doc_format
468
469
470<br>
471### Writing scalars
472
473To implement writing (serialization) of scalar types, you
474need to define the following function:
475
476@code{c++}
477namespace your_namespace {
478size_t to_chars(c4::yml::substr buffer, T const& var); // if you want to write to YAML
479} // namespace
480@endcode
481
482This function receives a buffer on which it is to write the
483serialization of var. Importantly, inside your function **you cannot
484assume the buffer is large enough** to fit the serialization of
485var. You must always check against its size.
486
487You must return the number of bytes required to fit the serialization
488of var. Importantly, this size must not depend on the size of the
489buffer, which means **you cannot do an early exit** when you find the
490buffer is too small. The returned size must be invariant.
491
492Upon returning, the caller will compare the returned size with the
493current buffer size. If the returned size is <= the buffer size,
494it means the serialization succeeded, and we're done. Otherwise, it
495means the buffer was too small; then rapidyaml will resize the buffer
496and call the function again. For an example of this call pattern, see
497eg @ref serialize_to_arena_scalar().
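
The caller side of this contract can be sketched as follows. This is not rapidyaml's code: `substr_sketch`, `Repeat` and `serialize_with_retry()` are hypothetical stand-ins, and the real library manages its own arena rather than a `std::vector`. The sketch only shows why the returned size must not depend on the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a writable buffer view (pointer + length); NOT the real substr.
struct substr_sketch { char *str; std::size_t len; };

// Hypothetical type that serializes as `count` copies of `c`.
struct Repeat { char c; std::size_t count; };

std::size_t to_chars(substr_sketch buf, Repeat const& r)
{
    for(std::size_t i = 0; i < r.count; ++i)
        if(i < buf.len) // write only where there is room
            buf.str[i] = r.c;
    return r.count;     // always the full required size
}

// Caller side: try once, grow the buffer if needed, then try again.
template<class T>
std::string serialize_with_retry(T const& var, std::size_t initial_capacity, int *num_calls)
{
    std::vector<char> storage(initial_capacity);
    substr_sketch buf{storage.data(), storage.size()};
    std::size_t needed = to_chars(buf, var);
    *num_calls = 1;
    if(needed > buf.len) // did not fit: resize to the reported size and retry
    {
        storage.resize(needed);
        buf = substr_sketch{storage.data(), storage.size()};
        needed = to_chars(buf, var);
        ++*num_calls;
    }
    return std::string(storage.begin(), storage.begin() + needed);
}
```

With an initial capacity of 2 and a value needing 5 characters, the function is called twice; with a capacity of 8 and a value needing 3, one call is enough.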
498
499A typical implementation of `%to_chars()` will look like this:
500
501@code{c++}
502namespace your_namespace {
503size_t to_chars(c4::yml::substr buffer, T const& var)
504{
505 size_t pos = 0;
506 for(... var) // iterate over var, adding characters to the buffer
507 {
508 // append another char to the buffer: only if possible!
509 // BUT do not break the loop if the buffer is too small.
510 // Continue doing a blank loop until the end, to count
511 // the needed characters
512 if(pos < buffer.len)
513 buffer[pos] = ...;
514 ++pos; // keep counting, even if we already know
515 // the buffer is too small!
516 }
517 return pos; // now we know the required size, return it
518}
519} // namespace
520@endcode
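
As a concrete instance of the counting pattern above, here is a self-contained sketch for a hypothetical `Version` type written as `hi.lo`. `substr_sketch` and the helpers `put()` and `put_uint()` are names invented for this example and are not part of rapidyaml; in real code the formatting utilities from @ref doc_format would likely replace them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a writable buffer view (pointer + length); NOT the real substr.
struct substr_sketch { char *str; std::size_t len; };

namespace your_namespace {
struct Version { unsigned hi; unsigned lo; }; // hypothetical user type

// Writes c at pos only if it fits, and always advances the position.
inline std::size_t put(substr_sketch buf, std::size_t pos, char c)
{
    if(pos < buf.len)
        buf.str[pos] = c;
    return pos + 1;
}

// Appends the decimal digits of v, counting even when nothing can be written.
inline std::size_t put_uint(substr_sketch buf, std::size_t pos, unsigned v)
{
    char digits[16];
    std::size_t n = 0;
    do { digits[n++] = char('0' + v % 10u); v /= 10u; } while(v != 0);
    while(n > 0)
        pos = put(buf, pos, digits[--n]); // digits were produced in reverse
    return pos;
}

std::size_t to_chars(substr_sketch buf, Version const& v)
{
    std::size_t pos = 0;
    pos = put_uint(buf, pos, v.hi);
    pos = put(buf, pos, '.');
    pos = put_uint(buf, pos, v.lo);
    return pos; // required size, independent of buf.len
}
} // namespace your_namespace

inline std::string to_chars_example()
{
    char big[8];
    std::size_t n = to_chars(substr_sketch{big, sizeof(big)}, your_namespace::Version{1, 25});
    assert(n <= sizeof(big));
    return std::string(big, n);
}
```

Because every write goes through `put()`, the function returns the same size for an empty buffer, a short buffer, or a large one.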
521
522For instance, if your T is a string type, you could do:
523
524@code{c++}
525namespace your_namespace {
526size_t to_chars(c4::yml::substr buffer, T const& var)
527{
528 size_t sz = var.size();
529 if(sz && sz <= buffer.len)
530 memcpy(buffer.str, var.data(), sz);
531 return sz;
532}
533} // namespace
534@endcode
535
536@note See examples of `%to_chars()` implementations:
537 - for `std::string`: @ref ext/c4core.src/c4/std/string.hpp
538 - for `std::string_view`: @ref ext/c4core.src/c4/std/string_view.hpp
539 - for `std::vector<char>`: @ref ext/c4core.src/c4/std/vector.hpp
540 - for `std::span<char>`: @ref ext/c4core.src/c4/std/span.hpp
541 - see the several to_chars overloads in @ref doc_charconv
542 - see the several to_chars overloads in @ref doc_format
543
544
545<br>
546### Further reading for scalar serialization
547
548 - See the sample @ref sample_user_scalar_types
549 - See the sample @ref sample_formatting for examples
550 of functions from @ref doc_format_utils that will be very
551 helpful in implementing custom @ref to_chars() / @ref from_chars()
552 functions.
553 - See @ref doc_charconv for the example implementations of
554 @ref to_chars() / @ref from_chars() for the fundamental types.
555 - See @ref doc_substr and @ref sample_substr() for the
556 many useful utilities in the substring class.
557 - See quickstart examples on how to @ref doc_sample_scalar_types
558
559*/
560
561
562} // namespace yml
563} // namespace c4