rapidyaml  0.9.0
parse and emit YAML, and do it fast
Event Handlers

Classes

struct  c4::yml::EventHandlerStack< HandlerImpl, HandlerState >
 Use this class as a base for event handler implementations, to simplify the stack logic.
 
struct  c4::yml::EventHandlerTreeState
 The stack state needed specifically by EventHandlerTree.
 
struct  c4::yml::EventHandlerTree
 The event handler to create a ryml Tree.
 
struct  c4::yml::EventHandlerYamlStdState
 The stack state needed specifically by EventHandlerYamlStd.
 
struct  c4::yml::EventHandlerYamlStd
 The event handler producing standard YAML events as used in the YAML test suite.
 

Functions

void c4::yml::append_escaped (extra::string *s, csubstr val)
 

Detailed Description

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Because ParseEngine is templated on the event handler, the binding uses static polymorphism, without any virtual functions. The actual handler object can be changed at run time, but it must be of the type given as the template parameter. This makes for a very efficient architecture, and it also lets users provide their own custom handler if they wish to bypass the rapidyaml Tree.
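The binding described above can be illustrated with a minimal sketch. The names MiniEngine, CountingHandler, on_scalar and rebind are hypothetical and are not part of rapidyaml; the real ParseEngine interface is much richer. The sketch only shows an engine templated on its handler type, calling the handler without virtual functions, and swapping the bound handler object at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical handler: counts the scalars it receives. Not a ryml type.
struct CountingHandler
{
    std::size_t num_scalars = 0;
    void on_scalar(const std::string &) { ++num_scalars; }
};

// Hypothetical engine templated on the handler type, so calls to the
// handler are resolved at compile time, with no virtual dispatch.
template<class Handler>
class MiniEngine
{
    Handler *m_handler;
public:
    explicit MiniEngine(Handler *h) : m_handler(h) { assert(h != nullptr); }
    // The bound object can be swapped at run time, but only for another
    // object of the same Handler type.
    void rebind(Handler *h) { assert(h != nullptr); m_handler = h; }
    // Toy "parser": splits on commas and reports each piece as a scalar.
    void parse(const std::string &src)
    {
        std::size_t start = 0;
        while(true)
        {
            const std::size_t comma = src.find(',', start);
            if(comma == std::string::npos)
            {
                m_handler->on_scalar(src.substr(start));
                break;
            }
            m_handler->on_scalar(src.substr(start, comma - start));
            start = comma + 1;
        }
    }
};

// Usage: parse into one handler, rebind, then parse into another.
inline std::size_t demo_rebind()
{
    CountingHandler first, second;
    MiniEngine<CountingHandler> engine(&first);
    engine.parse("a,b,c");   // reported to `first`
    engine.rebind(&second);
    engine.parse("d,e");     // reported to `second`
    return first.num_scalars * 10 + second.num_scalars;
}
```

Passing a handler of a different type to rebind() would fail to compile, which is the trade-off of static polymorphism: the handler type is fixed by the template argument, and only the object can vary.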

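There are two handlers implemented in this project: EventHandlerTree, which creates a ryml Tree, and EventHandlerYamlStd, which produces standard YAML events as used in the YAML test suite.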

Event model

The event model used by the parse engine and event handlers follows very closely the event model in the YAML test suite.

Consider for example this YAML,

{foo: bar,foo2: bar2}

which would produce these events in the test-suite parlance:

+STR
+DOC
+MAP {}
=VAL :foo
=VAL :bar
=VAL :foo2
=VAL :bar2
-MAP
-DOC
-STR

For reference, the ParseEngine object will produce this sequence of calls to its bound EventHandler:

handler.begin_stream();
handler.begin_doc();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("foo");
handler.set_val_scalar_plain("bar");
handler.add_sibling();
handler.set_key_scalar_plain("foo2");
handler.set_val_scalar_plain("bar2");
handler.end_map();
handler.end_doc();
handler.end_stream();

For many other examples of all areas of YAML and how ryml's parse model corresponds to the YAML standard model, refer to the [unit tests for the parse engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
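As a rough illustration of how handler calls map to test-suite events, the sketch below replays the call sequence listed above on a small recording handler. EventRecorder, replay_flow_map and demo_events are hypothetical names, and std::string stands in for ryml's string types; this is not how EventHandlerYamlStd is implemented. The member names are the ones used in the listing above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an event handler; it appends one test-suite
// style event line per call.
struct EventRecorder
{
    std::string events;
    void begin_stream()       { events += "+STR\n"; }
    void end_stream()         { events += "-STR\n"; }
    void begin_doc()          { events += "+DOC\n"; }
    void end_doc()            { events += "-DOC\n"; }
    void begin_map_val_flow() { events += "+MAP {}\n"; }
    void end_map()            { events += "-MAP\n"; }
    void add_sibling()        {} // no event line is emitted for this call
    void set_key_scalar_plain(const std::string &s) { events += "=VAL :" + s + "\n"; }
    void set_val_scalar_plain(const std::string &s) { events += "=VAL :" + s + "\n"; }
};

// Replays the calls for {foo: bar,foo2: bar2} on any handler type that
// provides the members used here.
template<class Handler>
void replay_flow_map(Handler &handler)
{
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
}

inline std::string demo_events()
{
    EventRecorder rec;
    replay_flow_map(rec);
    return rec.events;
}
```

In this sketch, add_sibling() produces no event line: it only tells a handler that a new sibling node follows.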

Special events

Most of the parsing events adopted by rapidyaml in its event model are fairly obvious, but there are two less-obvious events requiring some explanation.

These events exist to make it easier to parse some special YAML cases. They are called by the parser when a just-handled value/container turns out to be the first key of a new map.

For example, consider an implicit map inside a seq: [a: b, c: d] which is parsed as [{a: b}, {c: d}]. The standard event sequence for this YAML would be the following:

handler.begin_seq_val_flow();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("a");
handler.set_val_scalar_plain("b");
handler.end_map();
handler.add_sibling();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("c");
handler.set_val_scalar_plain("d");
handler.end_map();
handler.end_seq();

The problem with this event sequence is that it forces the parser to delay setting a scalar until it knows whether the scalar is a key or a val, and to store the scalar in its internal state until then. In the example above, the parser would have to hold back "a" and "c", because they are in fact keys and not vals. The downside is that this complexity cost would apply even when there is no implicit map: every val in a seq would have to be delayed until one of the disambiguating subsequent tokens ,-]: is found.

By using the special event, the parser avoids this complexity: it preemptively sets the scalar as a val, and the later call to the special event creates the map and rearranges the scalar as a key. Now the cost applies only once, when an implicit map inside a seq (a seqimap) starts. The following (easier and cheaper) event sequence builds the same two maps as the event sequence above. It also starts with an extra plain val, "notmap", which needs no special treatment:

handler.begin_seq_val_flow();
handler.set_val_scalar_plain("notmap");
handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
handler.actually_as_new_map_key(); // create a map, move the "a" val as the key of the first child of the new map
handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
handler.end_map();
handler.set_val_scalar_plain("c"); // "c" also as val!
handler.actually_as_block_flow(); // likewise
handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
handler.end_map();
handler.end_seq();
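One way a handler could implement this rearrangement is sketched below on a toy tree. Node, SketchHandler and the member actually_first_key_of_new_map are hypothetical names chosen for illustration, and the node storage is an assumption; ryml's EventHandlerTree works on the ryml Tree and differs in its details. The sketch turns the last val of the current container into a map, and moves the scalar into the key of that map's first child.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy node storage (an assumption of this sketch, not the ryml Tree).
struct Node
{
    enum Kind { SEQ, MAP, VAL, KEYVAL };
    Kind kind;
    std::string key, val;
    std::size_t parent;
    std::vector<std::size_t> children;
};

class SketchHandler
{
    std::vector<Node> m_nodes;
    std::size_t m_curr = 0;       // current container
    bool m_awaiting_val = false;  // a key was set and its val is pending

    std::size_t add_child(std::size_t parent, Node::Kind kind)
    {
        m_nodes.push_back(Node{kind, "", "", parent, {}});
        const std::size_t id = m_nodes.size() - 1;
        m_nodes[parent].children.push_back(id);
        return id;
    }
public:
    void begin_seq()
    {
        m_nodes.clear();
        m_nodes.push_back(Node{Node::SEQ, "", "", 0, {}});
        m_curr = 0;
        m_awaiting_val = false;
    }
    void set_val_scalar(const std::string &s)
    {
        if(m_awaiting_val) // complete the pending key/val pair
        {
            m_nodes[m_nodes[m_curr].children.back()].val = s;
            m_awaiting_val = false;
        }
        else // preemptively store the scalar as a plain val
        {
            const std::size_t id = add_child(m_curr, Node::VAL);
            m_nodes[id].val = s;
        }
    }
    // The just-set val is actually the first key of a new map.
    void actually_first_key_of_new_map()
    {
        assert(!m_nodes[m_curr].children.empty());
        const std::size_t map_id = m_nodes[m_curr].children.back();
        assert(m_nodes[map_id].kind == Node::VAL);
        const std::string key = m_nodes[map_id].val;
        m_nodes[map_id].kind = Node::MAP;
        m_nodes[map_id].val.clear();
        const std::size_t child = add_child(map_id, Node::KEYVAL);
        m_nodes[child].key = key;
        m_curr = map_id;
        m_awaiting_val = true;
    }
    void end_map() { m_curr = m_nodes[m_curr].parent; }
    void end_seq() {}
    // Serialize as flow-style YAML, for inspection.
    std::string to_flow(std::size_t id = 0) const
    {
        const Node &n = m_nodes[id];
        if(n.kind == Node::VAL) return n.val;
        if(n.kind == Node::KEYVAL) return n.key + ": " + n.val;
        std::string out = (n.kind == Node::SEQ) ? "[" : "{";
        for(std::size_t i = 0; i < n.children.size(); ++i)
        {
            if(i) out += ",";
            out += to_flow(n.children[i]);
        }
        out += (n.kind == Node::SEQ) ? "]" : "}";
        return out;
    }
};

inline std::string demo_seqimap()
{
    SketchHandler handler;
    handler.begin_seq();
    handler.set_val_scalar("notmap");
    handler.set_val_scalar("a");             // preemptively a val
    handler.actually_first_key_of_new_map(); // "a" becomes a key
    handler.set_val_scalar("b");
    handler.end_map();
    handler.set_val_scalar("c");
    handler.actually_first_key_of_new_map();
    handler.set_val_scalar("d");
    handler.end_map();
    handler.end_seq();
    return handler.to_flow();
}
```

The extra work happens only inside actually_first_key_of_new_map(); plain vals such as "notmap" take the ordinary path through set_val_scalar().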

This also applies to container keys (although ryml's tree cannot accommodate these): the parser can preemptively set a container as a val, and call this event to turn that container into a key. For example, consider this YAML:

[aa, bb]: [cc, dd]
# ^     ^ ^
# |     | |
# (2) (1) (3)  <- event sequence

The standard event sequence for this YAML would be the following:

handler.begin_map_val_block(); // (1)
handler.begin_seq_key_flow(); // (2)
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.begin_seq_val_flow(); // (3)
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

The problem with the sequence above is that, reading from left to right, the parser can only determine the proper calls for (1) and (2) once it reaches (1) in the YAML source. So the parser would have to buffer the entire event sequence from the beginning until it reaches (1). Using the special event, the parser can instead do:

handler.begin_seq_val_flow(); // (2) -- preemptively as val!
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.actually_as_new_map_key(); // (1) -- adjust when finding that the prev val was actually a key.
handler.begin_seq_val_flow(); // (3) -- go on as before
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();
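For a handler that produces an event list rather than a tree, the adjustment at (1) can be sketched as an insertion into the already-recorded events. BufferedEvents, joined and demo_container_key are hypothetical names, and the bookkeeping shown is an assumption of this sketch rather than the approach taken by EventHandlerYamlStd. The other member names are the ones used in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recording handler: remembers where the last completed val
// (scalar or container) began, so a map start can be inserted before it.
class BufferedEvents
{
    std::vector<std::string> m_events;
    std::vector<std::size_t> m_open;   // start indices of open containers
    std::size_t m_last_val_start = 0;  // start index of the last completed val

    void close_container(const char *event)
    {
        assert(!m_open.empty());
        m_last_val_start = m_open.back(); // the container is itself a val
        m_open.pop_back();
        m_events.push_back(event);
    }
public:
    void begin_seq_val_flow()
    {
        m_open.push_back(m_events.size());
        m_events.push_back("+SEQ []");
    }
    void set_val_scalar_plain(const std::string &s)
    {
        m_last_val_start = m_events.size();
        m_events.push_back("=VAL :" + s);
    }
    void add_sibling() {}
    void end_seq() { close_container("-SEQ"); }
    void end_map() { close_container("-MAP"); }
    // The previous val was actually a key: open the map just before it.
    // Every open container started earlier, so no stored index shifts.
    void actually_as_new_map_key()
    {
        m_events.insert(m_events.begin() + static_cast<std::ptrdiff_t>(m_last_val_start),
                        std::string("+MAP"));
        m_open.push_back(m_last_val_start);
    }
    std::string joined() const
    {
        std::string out;
        for(const std::string &e : m_events)
            out += e + "\n";
        return out;
    }
};

inline std::string demo_container_key()
{
    BufferedEvents handler;
    handler.begin_seq_val_flow();      // (2) preemptively as val
    handler.set_val_scalar_plain("aa");
    handler.add_sibling();
    handler.set_val_scalar_plain("bb");
    handler.end_seq();
    handler.actually_as_new_map_key(); // (1) the seq was a key
    handler.begin_seq_val_flow();      // (3)
    handler.set_val_scalar_plain("cc");
    handler.add_sibling();
    handler.set_val_scalar_plain("dd");
    handler.end_seq();
    handler.end_map();
    return handler.joined();
}
```

The only out-of-order work is the single insertion in actually_as_new_map_key(); everything else is appended in source order.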

Function Documentation

◆ append_escaped()

void c4::yml::append_escaped (extra::string *s, csubstr val)

Definition at line 15 of file test_suite_event_handler.cpp.

void append_escaped(extra::string *es, csubstr val) // the body refers to the output string as `es`
{
    // append the pending range [prev, i), then the replacement,
    // and skip the `skip` source bytes that were replaced
    #define _c4flush_use_instead(i, repl, skip)  \
        do {                                     \
            es->append(val.range(prev, i));      \
            es->append(repl);                    \
            prev = i + skip;                     \
        }                                        \
        while(0)
    uint8_t const* C4_RESTRICT s = reinterpret_cast<uint8_t const*>(val.str);
    size_t prev = 0;
    for(size_t i = 0; i < val.len; ++i)
    {
        switch(s[i])
        {
        case UINT8_C(0x0a): // \n
            _c4flush_use_instead(i, "\\n", 1); break;
        case UINT8_C(0x5c): // '\\' (backslash)
            _c4flush_use_instead(i, "\\\\", 1); break;
        case UINT8_C(0x09): // \t
            _c4flush_use_instead(i, "\\t", 1); break;
        case UINT8_C(0x0d): // \r
            _c4flush_use_instead(i, "\\r", 1); break;
        case UINT8_C(0x00): // \0
            _c4flush_use_instead(i, "\\0", 1); break;
        case UINT8_C(0x0c): // \f (form feed)
            _c4flush_use_instead(i, "\\f", 1); break;
        case UINT8_C(0x08): // \b (backspace)
            _c4flush_use_instead(i, "\\b", 1); break;
        case UINT8_C(0x07): // \a (bell)
            _c4flush_use_instead(i, "\\a", 1); break;
        case UINT8_C(0x0b): // \v (vertical tab)
            _c4flush_use_instead(i, "\\v", 1); break;
        case UINT8_C(0x1b): // \e (escape)
            _c4flush_use_instead(i, "\\e", 1); break;
        case UINT8_C(0xc2): // lead byte of a two-byte UTF-8 sequence
            if(i+1 < val.len)
            {
                const uint8_t np1 = s[i+1];
                if(np1 == UINT8_C(0xa0)) // U+00A0, non-breaking space
                    _c4flush_use_instead(i, "\\_", 2);
                else if(np1 == UINT8_C(0x85)) // U+0085, next line
                    _c4flush_use_instead(i, "\\N", 2);
            }
            break;
        case UINT8_C(0xe2): // lead byte of a three-byte UTF-8 sequence
            if(i+2 < val.len)
            {
                if(s[i+1] == UINT8_C(0x80))
                {
                    if(s[i+2] == UINT8_C(0xa8)) // U+2028, line separator
                        _c4flush_use_instead(i, "\\L", 3);
                    else if(s[i+2] == UINT8_C(0xa9)) // U+2029, paragraph separator
                        _c4flush_use_instead(i, "\\P", 3);
                }
            }
            break;
        }
    }
    // flush the rest
    es->append(val.sub(prev));
    #undef _c4flush_use_instead
}

References _c4flush_use_instead.
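The flush-and-replace technique used by append_escaped() can be tried in isolation with the simplified sketch below. escape_sketch is a hypothetical stand-in: it uses std::string instead of extra::string and csubstr, a lambda instead of the macro, and handles only a few of the escapes, so it is not a replacement for the library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Simplified stand-in for append_escaped(): returns the escaped copy.
inline std::string escape_sketch(const std::string &val)
{
    std::string out;
    std::size_t prev = 0; // start of the pending range, not yet copied
    // copy the pending range [prev, i), then the replacement, and skip
    // the `skip` source bytes that were replaced
    auto flush_use_instead = [&](std::size_t i, const char *repl, std::size_t skip) {
        out.append(val, prev, i - prev);
        out.append(repl);
        prev = i + skip;
    };
    for(std::size_t i = 0; i < val.size(); ++i)
    {
        const std::uint8_t c = static_cast<std::uint8_t>(val[i]);
        switch(c)
        {
        case 0x0a: flush_use_instead(i, "\\n", 1); break;  // newline
        case 0x09: flush_use_instead(i, "\\t", 1); break;  // tab
        case 0x5c: flush_use_instead(i, "\\\\", 1); break; // backslash
        case 0xc2: // lead byte of a two-byte UTF-8 sequence
            if(i + 1 < val.size() && static_cast<std::uint8_t>(val[i + 1]) == 0xa0)
                flush_use_instead(i, "\\_", 2); // U+00A0, non-breaking space
            break;
        default:
            break;
        }
    }
    // flush the rest
    out.append(val, prev, std::string::npos);
    return out;
}
```

Unescaped runs are copied in bulk rather than byte by byte, because nothing is appended until an escapable byte or the end of the input is reached.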