rapidyaml
0.15.2
parse and emit YAML, and do it fast
doxy_serialization_user_types.hpp
// DANGER: Keep markdown []() links in a single line!!!
//
// doxygen is broken and fails to render the markdown links when
// they span multi lines.


#include "c4/yml/tree.hpp"
#include "c4/yml/node.hpp"
#include "c4/yml/scalar_charconv.hpp"

namespace c4 {
namespace yml {


/** @addtogroup doc_serialization_user_types

<br>
<hr>
## Serialization type categories

There are two distinct type categories to consider regarding YAML
serialization:

- **Container types**. These represent a hierarchy of values (or
  containers) and must be converted to/from a YAML map (@ref MAP) or
  sequence (@ref SEQ).

- **Scalar types**. These types are encoded as scalars, but need
  to be transformed from/to their string representation in the
  YAML buffer.


A container type will always require child nodes in the tree. A scalar
type will always be a leaf (childless) node in the tree. Most of the
time, a scalar will be converted to string and not require any meta
info (like tags) or style flags set in the tree, but occasionally this
will be needed.

So in fact, from the implementation point of view, the categories are
the following:

- **General types**. Require extra structure/info from the tree:
  child nodes (required by containers) and/or tags or extra @ref
  NodeType flags (required by some scalars).

- **Scalar types**. These merely need to be converted to string and
  then set as scalars on the tree, without needing to set any tags
  or extra @ref NodeType flags.


To have rapidyaml interact with your types, you need to define the
conversion functions listed below. rapidyaml calls them unqualified, so
the compiler finds your overloads through [C++'s ADL rules](http://en.cppreference.com/w/cpp/language/adl).
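
As a rough illustration of the lookup rule this relies on, the
following standalone sketch shows an unqualified call inside a library
template finding a function declared in the namespace of the user's
type. Note that `lib::serialize()` and `to_text()` are hypothetical
names made up for this sketch; they are not part of rapidyaml:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a library entry point (NOT rapidyaml code).
namespace lib {
template<class T>
std::string serialize(T const& var)
{
    // Unqualified call: resolved when the template is instantiated,
    // through argument-dependent lookup in the namespace of T.
    return to_text(var);
}
} // namespace lib

namespace your_namespace {

struct Point { int x, y; };

// Found by ADL because it lives in the same namespace as Point.
inline std::string to_text(Point const& p)
{
    return std::to_string(p.x) + "," + std::to_string(p.y);
}

} // namespace your_namespace

// usage: the library template reaches your_namespace::to_text()
inline void usage_example()
{
    assert(lib::serialize(your_namespace::Point{1, 2}) == "1,2");
}
```

If `to_text()` were declared in an unrelated namespace, the call inside
`lib::serialize()` would not find it, which is why the overloads must
live in the namespace of T.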
55
56
Briefly stated, these are the functions you need to implement, **under
57
your type's namespace**:
58
59
@code{c++}
60
// IMPORTANT: define under the namespace of T. Read note below.
61
namespace your_namespace {
62
63
// tree API implementation for general types (containers
64
// or scalars requiring extra info from the tree):
65
//
66
// needed only if you're deserializing T:
67
c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
68
// needed only if you're serializing T:
69
void write(c4::yml::Tree * tree, c4::yml::id_type node_id, T const& var);
70
71
// or...
72
73
// special case for scalars not needing interaction with the tree:
74
//
75
// needed only if you're deserializing T:
76
bool from_chars(c4::yml::csubstr str, T* var);
77
// needed only if you're serializing T:
78
size_t to_chars(c4::yml::substr buffer, T const& var);
79
// optional:
80
c4::yml::type_bits scalar_flags_val(T const& var); // set extra style flags on T vals
81
c4::yml::type_bits scalar_flags_key(T const& var); // set extra style flags on T keys
82
83
// or...
84
85
// special case for writing string scalars: no need to convert to chars!
86
// mark as string
87
template<> struct c4::is_string<T> : std::true_type {};
88
// instead of to_chars()
89
c4::yml::csubstr to_csubstr(T const& var);
90
// rest as above for scalars
91
92
} // namespace
93
@endcode
94
95
96
@important Because of [C++'s ADL
97
rules](http://en.cppreference.com/w/cpp/language/adl), **it is
98
required to overload these functions in the namespace of the type**
99
you're serializing. Here's an [example of an issue](https://github.com/biojppm/rapidyaml/issues/424)
100
where failing to do this was causing problems in some platforms.
101
102
103
You may also implement %read/write() using the node API instead of the
104
tree API (but read the following section for details):
105
106
@code{c++}
107
// IMPORTANT: define %read() under the namespace of T. Read note above.
108
namespace your_namespace {
109
110
// node API implementation for general types (old approach)
111
// needed only if you're deserializing T:
112
c4::yml::ReadResult read(c4::yml::ConstNodeRef node, T* var);
113
// needed only if you're serializing T:
114
void write(c4::yml::NodeRef * node, T const& var);
115
116
} // namespace
117
@endcode
118
119
@note For maximum flexibility you should prefer implementing the
120
tree %read/write.
121
122
123
Read on for details.
124
125
126
// <br>
// <hr>

## Why you should prefer implementing with tree API

You may have noticed above that there are two sets of functions: one
for the node API and another for the tree API. You don't need to
implement both. Simply put, the choice on which one to implement comes
down to which one you want to use, but for maximum flexibility
the **default advice is to implement the tree %read/write functions**.

Here are the key considerations:

- If you trigger the deserialization from a particular API, it will
  directly call the corresponding %read/write() function. Further,
  rapidyaml's default node implementation calls into the tree
  %read/write(), so if you only implement the tree version, it is
  automagically picked up even if you're calling from nodes. For
  example:

  @code{c++}
  T var;

  // tree calls
  Tree tree = ...;
  id_type node_id = ...;
  tree.load(node_id, &var);                 // calls read(Tree const*,id_type,T*)
  if(!tree.deserialize(node_id, &var)) ...; // calls read(Tree const*,id_type,T*)
  tree.save(var);                           // calls write(Tree*,id_type,T const&)
  tree.set_serialized(&var);                // calls write(Tree*,id_type,T const&)

  // node calls - forwarding to tree by default
  NodeRef node = ...;
  node.load(&var);                  // calls read(ConstNodeRef const&,T*)
                                    //  -> rapidyaml calls read(Tree const*,id_type,T*)
  if(!node.deserialize(&var)) ...;  // calls read(ConstNodeRef const&,T*)
                                    //  -> rapidyaml calls read(Tree const*,id_type,T*)
  node.save(var);                   // calls write(NodeRef*,T const&)
                                    //  -> rapidyaml calls write(Tree*,id_type,T const&)
  node.set_serialized(&var);        // calls write(NodeRef*,T const&)
                                    //  -> rapidyaml calls write(Tree*,id_type,T const&)
  @endcode

- By default, a tree %read/write() impl will get called from a node
  call. rapidyaml's node impl calls into the tree impl. This means that
  if you implement the tree %read/write(), rapidyaml will pick it up
  **even if you are triggering it with the node API**.

- If you implement node %read/write(), they will be picked up by a
  node call, but not by a tree call. Further, if you also implement
  tree %read/writes, they will only be picked up by a tree call.

- If you implement node %read/write(), it hides rapidyaml's default
  implementation of calling the tree %read/write(), so if you then
  want to call tree deserialization, you will also need to implement
  tree %read/write().

So again, it is best to choose to implement the tree %read/write() functions.
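
The forwarding behaviour described above can be pictured with a toy
model. In the sketch below, `lib::Tree`, `lib::NodeRef`, `lib::read()`
and `lib::deserialize()` are hypothetical stand-ins and not rapidyaml's
actual declarations; the point is only to show how a default node
overload can forward to a user-provided tree overload found by ADL:

```cpp
#include <cassert>
#include <cstddef>

namespace lib { // hypothetical toy library, NOT rapidyaml

struct Tree {};
struct NodeRef { Tree const* tree; std::size_t id; };

// Default node-level read: forwards to the tree-level read, which is
// found by ADL in the namespace of T.
template<class T>
bool read(NodeRef node, T *var)
{
    return read(node.tree, node.id, var);
}

// tree-level entry point
template<class T>
bool deserialize(Tree const& tree, std::size_t id, T *var)
{
    return read(&tree, id, var);
}

// node-level entry point
template<class T>
bool deserialize(NodeRef node, T *var)
{
    return read(node, var);
}

} // namespace lib

namespace your_namespace {

struct Celsius { int degrees; };

// The only overload the user writes: the tree-level one.
inline bool read(lib::Tree const*, std::size_t, Celsius *var)
{
    var->degrees = 21; // a real implementation would parse the node
    return true;
}

} // namespace your_namespace

// usage: both entry points end up in your_namespace::read()
inline void usage_example()
{
    lib::Tree tree;
    your_namespace::Celsius a{0}, b{0};
    assert(lib::deserialize(tree, 0, &a) && a.degrees == 21);
    assert(lib::deserialize(lib::NodeRef{&tree, 0}, &b) && b.degrees == 21);
}
```

In this toy model, adding a user-written `read(lib::NodeRef, Celsius*)`
would win overload resolution for node calls and bypass the forwarding,
which mirrors the hiding effect listed in the considerations above.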



// <br>
// <hr>

## Implementation notes: general types

As explained above, general types are those that require child nodes
(in the case of containers), or are scalars that require extra @ref
NodeType flags to be set along with them. For each type, the functions
you need to implement depend on whether you're reading from or writing
to the tree/node.



// <br>
### Writing general types

When writing general types to YAML, you need to define the following
function:

@code{c++}
// implement these functions for T ...
namespace your_namespace { // IMPORTANT read note about namespace above
void write(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
// or, if you want to use the node API,
void write(c4::yml::NodeRef *scalar, T const& var);
} // namespace
@endcode

Likewise, for writing keys you need to define the following function
(but note the key MUST be a scalar):

@code{c++}
// implement these functions for T ...
namespace your_namespace { // IMPORTANT read note about namespace above
void write_key(c4::yml::Tree *tree, c4::yml::id_type node_id, T const& var);
// or, if you want to use the node API,
void write_key(c4::yml::NodeRef *scalar, T const& var);
} // namespace
@endcode

The requirements for `%write()` are fewer than for
`%read()`. Inside `%write()`, you may assume the node is valid, as rapidyaml
will have made the required checks before calling your function, as
specified by the call triggering the %write (as described in @ref
doc_serialization_using).

As for what you can do inside `%write()`: generally you should only be
setting/adding things to the node, and not to its key (that
will generally have been dealt with elsewhere), typically with one of
[.set_seq()](@ref Tree::set_seq()) or
[.set_map()](@ref Tree::set_map()) for containers,
or [.set_val()](@ref Tree::set_val()) or
[.set_serialized()](@ref Tree::set_serialized()). Following this, for
containers you should create and populate the children, with further
calls to any of these functions, but now with child nodes and data
structures as the targets.


@note See examples of `%write()` implementations:
- @ref doc_serialization_tree_write
- @ref doc_serialization_node_write
- see the [vector write implementation](@ref src/c4/yml/std/vector.hpp)
- see the [map write implementation](@ref src/c4/yml/std/map.hpp).
- see the sample @ref sample_user_container_types
- see the sample @ref sample_std_types



// <br>
### Reading general types

To enable reading (deserialization) of a custom user type T falling
into the general category, you need to define the following function:

@code{c++}
// IMPORTANT: define read() under the namespace of T. Read warning above.
namespace your_namespace {
c4::yml::ReadResult read(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
// and/or, if you prefer the node API
c4::yml::ReadResult read(c4::yml::ConstNodeRef node, T* var);
} // namespace
@endcode

Likewise, for reading keys you need to define the following function:
@code{c++}
// IMPORTANT: define %read_key() under the namespace of T. Read warning above.
namespace your_namespace {
c4::yml::ReadResult read_key(c4::yml::Tree const *tree, c4::yml::id_type node_id, T* var);
// and/or, if you prefer the node API
c4::yml::ReadResult read_key(c4::yml::ConstNodeRef node, T* var);
} // namespace
@endcode


Then when you call any of @ref NodeRef::load(), @ref
NodeRef::deserialize(), @ref Tree::load() or @ref Tree::deserialize()
(as described in @ref doc_serialization_using), rapidyaml will call
your `%read()` function through the magic of C++ ADL / Koenig
lookup. And likewise, when you call any of @ref NodeRef::load_key(),
@ref NodeRef::deserialize_key(), @ref Tree::load_key() or @ref
Tree::deserialize_key() (as described in @ref
doc_serialization_using), rapidyaml will call your `%read_key()`
function. (**But note the rapidyaml tree cannot accept containers as
keys!**)


The @ref ReadResult return type is a lightweight truthy type, used to
report either success or the offending node when an error happens in
nested reads. It evaluates as true (empty-initialized) when there is
no error, or as false on error, in which case it holds the innermost
node causing the error. This enables accurate error
reporting, and is very useful on large YAML files; see also @ref
sample_location_tracking() to find the original source location of the
offending node.
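
To make the idea of a truthy result carrying the offending node more
concrete, here is a minimal standalone sketch. `MiniReadResult`,
`read_leaf()` and `read_all()` are hypothetical names invented for
illustration; the real @ref ReadResult may differ in names and layout:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a truthy result type (NOT the real
// c4::yml::ReadResult).
struct MiniReadResult
{
    static constexpr std::size_t none = static_cast<std::size_t>(-1);
    std::size_t node = none;                 // offending node id, or none
    MiniReadResult() = default;              // empty-initialized: success
    explicit MiniReadResult(std::size_t id) : node(id) {}
    explicit operator bool() const { return node == none; }
};

// Toy leaf read: a negative raw value plays the role of a bad scalar.
inline MiniReadResult read_leaf(std::size_t id, int raw, int *out)
{
    if(raw < 0)
        return MiniReadResult(id);           // failure: remember the node
    *out = raw;
    return MiniReadResult{};                 // success
}

// Toy container read: leaf ids are first_id, first_id + 1, ...
inline MiniReadResult read_all(std::size_t first_id, int const* raw, std::size_t n, int *out)
{
    for(std::size_t i = 0; i < n; ++i)
    {
        MiniReadResult r = read_leaf(first_id + i, raw[i], &out[i]);
        if(!r)
            return r;                        // hand the innermost failure upwards
    }
    return MiniReadResult{};
}
```

Because the failing result is returned unchanged through each level,
the outermost caller still sees the id of the leaf that failed.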



To start with an example, here is the rapidyaml implementation of `%read()` for
`std::map`:

@code{c++}
template<class K, class V, class Less, class Alloc>
c4::yml::ReadResult read(c4::yml::Tree const* tree, c4::yml::id_type id, std::map<K, V, Less, Alloc> * m)
{
    // RULE 0. you may assume tree and id are valid.
    if(!tree->is_map(id)) // RULE 1. check node type
        return c4::yml::ReadResult(id); // report error on this id
    for(id_type child = tree->first_child(id); child != NONE; child = tree->next_sibling(child))
    {
        K k{};
        // RULE 2. use .deserialize(), not .load()
        c4::yml::ReadResult result = tree->deserialize_key(child, &k);
        if(!result)
            return result; // RULE 3. early exit on error
        result = tree->deserialize(child, &(*m)[std::move(k)]);
        if(!result)
            return result; // may refer to a deeply nested node!
    }
    return ReadResult{}; // report success
}
@endcode


<br>
Rule 0 is actually an assumption:

@important Rule 0. Inside your implementation of `%read()` or
`%read_key()`, you may assume the node is valid (ie, that the tree and
node_id are valid).

rapidyaml will already have checked for this as specified by the
triggering call (see @ref doc_serialization_using).


<br>
Now the first rule:

@important Rule 1. Inside `%read()`, **start with a node type check**:
must be exactly one of @ref VAL (for scalars), @ref SEQ (for sequence
types) or @ref MAP (for dictionary types). `%read_key()` *does not
require* a @ref KEY check.

This is needed to ensure that the node type matches the type of the
destination variable. Concretely:

- If you're reading a scalar type like a number or a string, the
  node must be @ref VAL, ie it must verify @ref NodeType::has_val().

- If you're reading a sequence type like a vector, the node must be
  a @ref SEQ, ie it must verify @ref NodeType::is_seq().

- If you're reading a map type, the node must be a @ref MAP, ie
  it must verify @ref NodeType::is_map().

Why can't rapidyaml do this check for you before calling your `%read()`
function? Well, in the general case, it is impossible to know what type
of node to expect, so rapidyaml can only check that the node is one of
the @ref VAL|@ref SEQ|@ref MAP cases above, but not concretely which
one. It is up to the `%read()` implementation for a type to specify
which one.

However, note that inside `%read_key()` you do not need a type check,
as the rapidyaml tree requires keys to be scalars (ie @ref KEY),
so rapidyaml does this check for you before calling `%read_key()`.


<br>
Now the next rule:

@important Rule 2. Inside `%read()`, **use
[.deserialize()](@ref Tree::deserialize()) and not
[.load()](@ref Tree::load())**, to play nice with `.deserialize()`
callers calling your function. For `%read_key()` it should be
[.deserialize_key()](@ref Tree::deserialize_key()) instead
of [.load_key()](@ref Tree::load_key()).


On failure, `.load()` triggers an error, while `.deserialize()` just
returns the result, so you don't want a `.deserialize()` caller to be
aborted by a nested `.load()` call in your function. Let the top-level
`.load()` caller trigger the error.
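
The difference between the two call styles can be sketched with plain
functions. The names `try_read_digit()`, `must_read_digit()` and
`try_read_pair()` below are hypothetical, and throwing an exception is
only an assumed stand-in for whatever error handling `.load()`
triggers:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// "deserialize" style: reports failure to the caller, never raises.
inline bool try_read_digit(char c, int *out)
{
    if(c < '0' || c > '9')
        return false;
    *out = c - '0';
    return true;
}

// "load" style: same work, but a failure becomes an error.
// (Assumption: modelled here with an exception.)
inline void must_read_digit(char c, int *out)
{
    if(!try_read_digit(c, out))
        throw std::runtime_error("not a digit");
}

// A nested reader written the recommended way: only try_* calls
// inside, so callers of either style stay in control of failures.
inline bool try_read_pair(std::string const& s, int *a, int *b)
{
    if(s.size() != 2)
        return false;
    if(!try_read_digit(s[0], a))
        return false;
    return try_read_digit(s[1], b);
}
```

Had `try_read_pair()` used `must_read_digit()` internally, a caller
that only wanted a boolean answer would get an error raised instead.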


<br>
Finally,

@important Rule 3. **Check every read and do early exit on error**,
adequately filling the @ref ReadResult return type.

Your implementation of `%read()` or `%read_key()` **must return a
truthy type to signify success of deserialization**. The type should
preferably be a @ref ReadResult to enable accurate error reporting.

If the type is not @ref ReadResult (like the legacy bool), rapidyaml
will still work -- although with the inconvenience of pointing only at the
outermost node instead of the actual error-causing node.

With this return value, rapidyaml will continue on success; on failure
it will either return this value to the caller (with `.deserialize()`)
or trigger a visit error on the reported node (with `.load()`), as
instructed by the triggering call (see @ref doc_serialization_using).

That's it for `%read()`!
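
To see the three rules working together outside of rapidyaml, here is
a self-contained sketch over a toy tree. Everything in namespace `toy`
(`ToyTree`, `ToyNode`, `ToyResult`, and both `read()` overloads) is
hypothetical and only mimics the shape of the functions discussed
above:

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

namespace toy { // hypothetical toy tree, NOT the rapidyaml API

enum class Kind { val, seq };
struct ToyNode { Kind kind; std::string scalar; std::vector<std::size_t> children; };
struct ToyTree { std::vector<ToyNode> nodes; };

struct ToyResult
{
    static constexpr std::size_t none = static_cast<std::size_t>(-1);
    std::size_t node = none;                      // offending node id, or none
    ToyResult() = default;
    explicit ToyResult(std::size_t id) : node(id) {}
    explicit operator bool() const { return node == none; }
};

// scalar read
inline ToyResult read(ToyTree const* tree, std::size_t id, int *var)
{
    ToyNode const& n = tree->nodes[id];
    if(n.kind != Kind::val)                       // rule 1: type check
        return ToyResult(id);
    char const* first = n.scalar.data();
    char const* last = first + n.scalar.size();
    std::from_chars_result r = std::from_chars(first, last, *var);
    if(r.ec != std::errc() || r.ptr != last)
        return ToyResult(id);                     // not a complete integer
    return ToyResult{};
}

// sequence read
inline ToyResult read(ToyTree const* tree, std::size_t id, std::vector<int> *var)
{
    ToyNode const& n = tree->nodes[id];
    if(n.kind != Kind::seq)                       // rule 1: type check
        return ToyResult(id);
    var->clear();
    for(std::size_t child : n.children)
    {
        int elm = 0;
        ToyResult r = read(tree, child, &elm);    // rule 2: nested read only returns
        if(!r)
            return r;                             // rule 3: early exit, keep the node
        var->push_back(elm);
    }
    return ToyResult{};
}

} // namespace toy
```

Reading a sequence whose second element is not a number returns a
result whose node is that element, not the sequence itself.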

@note See examples of `%read()` implementations:
- @ref doc_serialization_tree_read
- @ref doc_serialization_node_read
- see the [vector read implementation](@ref src/c4/yml/std/vector.hpp)
- see the [map read implementation](@ref src/c4/yml/std/map.hpp).
- see the sample @ref sample_user_container_types
- see the sample @ref sample_std_types




<br>
<hr>

## Implementation notes: scalars

When a scalar type does not require any style or tags to be set in the
tree, instead of defining `%read()` / `%write()` you can just define
the direct serialization functions `%from_chars()` and/or
`%to_chars()` to transform the scalar from/to its string
representation.

@note Please take note of the following pitfall when using scalar
serialization functions: you may have to include the header with your
`%from_chars()` / `%to_chars()` implementation before any other headers
that use functions from it.


<br>
### Reading scalars

To implement reading (deserialization) of scalar types, you
need to define the following function:

@code{c++}
namespace your_namespace {
bool from_chars(c4::yml::csubstr str, T* var); // if you want to read from YAML
} // namespace
@endcode

The function receives a string spanning exactly the scalar, and must
convert it into the output argument. To achieve this, you may find it useful to
use the utilities in @ref doc_charconv or @ref doc_format, which are
very fast and efficient, and play nice with this approach. But that's
not mandatory -- you are also free to use any other conversion method
you choose, such as fmtlib (but please do not use stringstreams; their
performance is really bad).

Finally, you must return a boolean success status. rapidyaml will then
react to this status in accordance with the call triggering the read.
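
As an illustration, here is one possible shape for such a function,
written against the standard library only. This is a sketch under
assumptions: `std::string_view` stands in for c4::yml::csubstr, and
the `Size` type, its `WIDTHxHEIGHT` text format and the `parse_int()`
helper are all made up for the example:

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

namespace your_namespace {

struct Size { int width = 0; int height = 0; };

// Helper: true only if the whole string is a valid integer.
inline bool parse_int(std::string_view str, int *out)
{
    char const* last = str.data() + str.size();
    std::from_chars_result r = std::from_chars(str.data(), last, *out);
    return r.ec == std::errc() && r.ptr == last;
}

// Reads "WIDTHxHEIGHT", eg "640x480". std::string_view is an assumed
// stand-in for c4::yml::csubstr.
inline bool from_chars(std::string_view str, Size *var)
{
    std::size_t sep = str.find('x');
    if(sep == std::string_view::npos)
        return false;
    Size tmp;
    if(!parse_int(str.substr(0, sep), &tmp.width))
        return false;
    if(!parse_int(str.substr(sep + 1), &tmp.height))
        return false;
    *var = tmp; // commit only after both fields parsed
    return true;
}

} // namespace your_namespace
```

Parsing into a temporary and committing at the end leaves the
destination untouched when the function reports failure.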

@note See examples of `%from_chars()` implementations:
- for `std::string`: @ref ext/c4core.src/c4/std/string.hpp
- for `std::vector<char>`: @ref ext/c4core.src/c4/std/vector.hpp
- for `std::span<char>`: @ref ext/c4core.src/c4/std/span.hpp
- see the several from_chars overloads in @ref doc_charconv
- see the several from_chars overloads in @ref doc_format


<br>
### Writing scalars

To implement writing (serialization) of scalar types, you
need to define the following function:

@code{c++}
namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var); // if you want to write to YAML
} // namespace
@endcode

This function receives a buffer on which it is to write the
serialization of var. Importantly, inside your function **you cannot
assume the buffer is large enough** to fit the serialization of
var. You must always check against its size.

You must return the number of bytes required to fit the serialization
of var. Importantly, this size must not depend on the size of the
buffer, which means **you cannot do an early exit** when you find the
buffer is too small. The returned size must be invariant.

Upon returning, the caller will compare the returned size with the
current buffer size. If the returned size is less than or equal to the
buffer size, the serialization succeeded, and we're done. Otherwise,
the buffer was too small; rapidyaml will then resize the buffer
and call the function again. For an example of this call pattern, see
eg @ref serialize_to_arena_scalar().
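
The call pattern above can be sketched as follows. `MiniBuffer` is a
hypothetical stand-in for c4::yml::substr and `serialize_with_retry()`
is a hypothetical caller, not rapidyaml's actual code; the sketch only
assumes the contract stated above, namely that `to_chars()` always
returns the full required size:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for c4::yml::substr: a pointer plus a length.
struct MiniBuffer { char *str; std::size_t len; };

namespace your_namespace {

struct Celsius { int degrees; };

// Writes only what fits, but always returns the full required size.
inline std::size_t to_chars(MiniBuffer buffer, Celsius const& var)
{
    std::string text = std::to_string(var.degrees) + "C";
    for(std::size_t i = 0; i < text.size() && i < buffer.len; ++i)
        buffer.str[i] = text[i];
    return text.size();
}

} // namespace your_namespace

// Caller side: try with the current buffer, grow on shortfall, retry.
template<class T>
std::string serialize_with_retry(T const& var, std::size_t initial_size)
{
    std::vector<char> storage(initial_size);
    std::size_t needed = to_chars(MiniBuffer{storage.data(), storage.size()}, var);
    if(needed > storage.size())
    {
        storage.resize(needed); // the returned size tells how much to grow
        needed = to_chars(MiniBuffer{storage.data(), storage.size()}, var);
    }
    return std::string(storage.data(), needed);
}
```

The retry works only because the returned size does not depend on the
buffer: a second call with a buffer of that size is certain to fit.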

A typical implementation of `%to_chars()` will look like this:

@code{c++}
namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var)
{
    size_t pos = 0;
    for(... var) // iterate over var, adding characters to the buffer
    {
        // append another char to the buffer: only if possible!
        // BUT do not break the loop if the buffer is too small.
        // Continue doing a blank loop until the end, to count
        // the needed characters
        if(pos < buffer.len)
            buffer[pos] = ...;
        ++pos; // keep counting, even if we already know
               // the buffer is small!
    }
    return pos; // now we know the required size, return it
}
} // namespace
@endcode

For instance, if your T is a string type, you could do:

@code{c++}
namespace your_namespace {
size_t to_chars(c4::yml::substr buffer, T const& var)
{
    size_t sz = var.size();
    if(sz && sz <= buffer.len)
        memcpy(buffer.str, var.data(), sz);
    return sz;
}
} // namespace
@endcode
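
For a type made of several pieces, the counting loop can be factored
into a small helper. The following is only a sketch: `MiniBuffer` is a
hypothetical stand-in for c4::yml::substr, and `Path` is a made-up
example type:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for c4::yml::substr: a pointer plus a length.
struct MiniBuffer { char *str; std::size_t len; };

namespace your_namespace {

struct Path { std::vector<std::string> parts; };

// Serializes the parts joined by '/', eg "usr/lib".
inline std::size_t to_chars(MiniBuffer buffer, Path const& var)
{
    std::size_t pos = 0;
    // Write one char if it fits; count it either way.
    auto put = [&](char c) {
        if(pos < buffer.len)
            buffer.str[pos] = c;
        ++pos;
    };
    for(std::size_t i = 0; i < var.parts.size(); ++i)
    {
        if(i > 0)
            put('/');
        for(char c : var.parts[i])
            put(c);
    }
    return pos; // required size, regardless of buffer.len
}

} // namespace your_namespace
```

With a buffer that is too small, the function fills what it can and
still reports the full size, so a caller can grow the buffer and call
again.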

@note See examples of `%to_chars()` implementations:
- for `std::string`: @ref ext/c4core.src/c4/std/string.hpp
- for `std::string_view`: @ref ext/c4core.src/c4/std/string_view.hpp
- for `std::vector<char>`: @ref ext/c4core.src/c4/std/vector.hpp
- for `std::span<char>`: @ref ext/c4core.src/c4/std/span.hpp
- see the several to_chars overloads in @ref doc_charconv
- see the several to_chars overloads in @ref doc_format


<br>
### Further reading for scalar serialization

- See the sample @ref sample_user_scalar_types
- See the sample @ref sample_formatting for examples
  of functions from @ref doc_format_utils that will be very
  helpful in implementing custom @ref to_chars() / @ref from_chars()
  functions.
- See @ref doc_charconv for the example implementations of
  @ref to_chars() / @ref from_chars() for the fundamental types.
- See @ref doc_substr and @ref sample_substr() for the
  many useful utilities in the substring class.
- See quickstart examples on how to @ref doc_sample_scalar_types

*/


} // namespace yml
} // namespace c4