rapidyaml  0.10.0
parse and emit YAML, and do it fast
Event Handlers

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Classes

struct  c4::yml::EventHandlerStack< HandlerImpl, HandlerState >
 Use this class as a base for event handler implementations, to simplify the stack logic.
 
struct  c4::yml::EventHandlerTree
 The event handler to create a ryml Tree.
 
struct  c4::yml::extra::EventHandlerInts
 A parser event handler that creates a compact representation of the YAML tree in a buffer of integers (see ievt::EventFlags) containing masks (to represent events) and offset+length (to represent strings in the source buffer).
 
struct  c4::yml::extra::EventHandlerTestSuite
 This event handler produces standard YAML events as used in the YAML test suite.
 

Functions

int32_t c4::yml::extra::estimate_events_ints_size (csubstr src)
 Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.
 
size_t c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, substr evts_testsuite)
 Create a testsuite event string from integer events.
 
template<class Container >
void c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, Container *evts_testsuite)
 Create a testsuite event string from integer events, writing into an output container.
 
template<class Container >
Container c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Create a testsuite event string from integer events, returning a new container with the result.
 
void c4::yml::extra::events_ints_print (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Print integer events to stdout.
 
size_t c4::yml::extra::escape_scalar (substr buffer, csubstr val)
 

Detailed Description

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Because ParseEngine is templated on the event handler, the binding uses static polymorphism, without any virtual functions. The actual handler object can be changed at run time (but of course it must be of the type given as the template parameter). This is thus a very efficient architecture, and it further enables users to provide their own custom handler if they wish to bypass the rapidyaml Tree.
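
The binding described above can be illustrated with a miniature model. This is only a sketch under stated assumptions: MiniEngine, RecordingHandler and demo_rebind are hypothetical names invented for this example, and the toy parse() does not resemble the real ParseEngine logic. It shows only that the handler type is fixed at compile time while the handler object can be swapped at run time.

```cpp
#include <string>
#include <vector>

// Hypothetical handler for this sketch; it just records the calls it receives.
struct RecordingHandler
{
    std::vector<std::string> calls;
    void begin_stream() { calls.push_back("begin_stream"); }
    void set_val_scalar_plain(std::string const& s) { calls.push_back("val:" + s); }
    void end_stream() { calls.push_back("end_stream"); }
};

// Hypothetical engine templated on the handler type. Every dispatch below
// is resolved at compile time; no virtual functions are involved.
template<class Handler>
struct MiniEngine
{
    Handler *handler; // may be re-pointed at run time, but only to a Handler

    explicit MiniEngine(Handler *h) : handler(h) {}

    // Toy "parse": treats the whole source as a single plain scalar.
    void parse(std::string const& src)
    {
        handler->begin_stream();
        handler->set_val_scalar_plain(src);
        handler->end_stream();
    }
};

// Runs the same engine against two handler objects of the same type,
// and returns the calls received by the second one.
inline std::vector<std::string> demo_rebind()
{
    RecordingHandler first, second;
    MiniEngine<RecordingHandler> engine(&first);
    engine.parse("a");
    engine.handler = &second; // swap the handler object at run time
    engine.parse("b");
    return second.calls;
}
```

Pointing the engine at a handler of a different type would require a different MiniEngine instantiation, which is the trade-off of static polymorphism.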

There are two handlers implemented in this project:

Event model

The event model used by the parse engine and event handlers follows very closely the event model in the YAML test suite.

Consider for example this YAML,

{foo: bar,foo2: bar2}

which would produce these events in the test-suite parlance:

+STR
+DOC
+MAP {}
=VAL :foo
=VAL :bar
=VAL :foo2
=VAL :bar2
-MAP
-DOC
-STR

For reference, the ParseEngine object will produce this sequence of calls to its bound EventHandler:

handler.begin_stream();
handler.begin_doc();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("foo");
handler.set_val_scalar_plain("bar");
handler.add_sibling();
handler.set_key_scalar_plain("foo2");
handler.set_val_scalar_plain("bar2");
handler.end_map();
handler.end_doc();
handler.end_stream();
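
To make the correspondence between handler calls and test-suite events concrete, here is a minimal sketch of a handler that turns the calls listed above into test-suite event lines. EventTextHandler and replay_flow_map are hypothetical names for this example; the handler implements only the callbacks used in this sequence, and is not the EventHandlerTestSuite implementation.

```cpp
#include <string>

// Hypothetical handler (not part of rapidyaml) that appends one test-suite
// event line per call, for the subset of callbacks used in this example.
struct EventTextHandler
{
    std::string out;
    void begin_stream()       { out += "+STR\n"; }
    void end_stream()         { out += "-STR\n"; }
    void begin_doc()          { out += "+DOC\n"; }
    void end_doc()            { out += "-DOC\n"; }
    void begin_map_val_flow() { out += "+MAP {}\n"; }
    void end_map()            { out += "-MAP\n"; }
    void add_sibling()        {} // has no counterpart in the test-suite events
    void set_key_scalar_plain(std::string const& s) { out += "=VAL :" + s + "\n"; }
    void set_val_scalar_plain(std::string const& s) { out += "=VAL :" + s + "\n"; }
};

// Replays the call sequence for {foo: bar,foo2: bar2}.
inline std::string replay_flow_map()
{
    EventTextHandler handler;
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
    return handler.out;
}
```

Note that add_sibling() produces no output: it exists for the benefit of handlers that build a tree, and the test-suite event stream has no equivalent event.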

For many other examples of all areas of YAML and how ryml's parse model corresponds to the YAML standard model, refer to the [unit tests for the parse engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).

Special events

Most of the parsing events adopted by rapidyaml in its event model are fairly obvious, but there are two less-obvious events requiring some explanation.

These events exist to make it easier to parse some special YAML cases. They are called by the parser when a just-handled value/container is actually the first key of a new map:

For example, consider an implicit map inside a seq: [a: b, c: d] which is parsed as [{a: b}, {c: d}]. The standard event sequence for this YAML would be the following:

handler.begin_seq_val_flow();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("a");
handler.set_val_scalar_plain("b");
handler.end_map();
handler.add_sibling();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("c");
handler.set_val_scalar_plain("d");
handler.end_map();
handler.end_seq();

The problem with this event sequence is that it forces the parser to delay setting each scalar (in this case "a" and "c") until it knows whether that scalar is a key or a val, and to store the scalar in its internal state until then. In the example above, "a" and "c" are in fact keys and not vals, so the parser could not emit them before seeing the ":" that follows. The downside is that this complexity cost would apply even when there is no implicit map: every val in a seq would have to be delayed until one of the disambiguating subsequent tokens (any of ",", "-", "]" or ":") is found. With the special events, the parser avoids this complexity by preemptively setting the scalar as a val. Then, on finding that the scalar actually starts an implicit map, a call to the special event creates the map and rearranges the scalar as its first key. Now the cost applies only once: when an implicit map in a seq (a seqimap) starts. So the (easier and cheaper) event sequence below has the same effect for the two implicit maps as the event sequence above; it also starts with an ordinary "notmap" val, which needs no special handling:

handler.begin_seq_val_flow();
handler.set_val_scalar_plain("notmap");
handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
handler.actually_as_new_map_key(); // create a map, move the "a" val as the key of the first child of the new map
handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
handler.end_map();
handler.set_val_scalar_plain("c"); // "c" also as val!
handler.actually_as_block_flow(); // likewise
handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
handler.end_map();
handler.end_seq();
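
The rearrangement performed by the special event can be sketched with a toy tree. Everything here is an assumption made for illustration: Node, ToySeqHandler and replay_seqimap are hypothetical, the handler supports only one enclosing seq, and the real EventHandlerTree works on the ryml Tree rather than on nested vectors. The method name actually_as_new_map_key() is taken from the listing above.

```cpp
#include <string>
#include <vector>

// Hypothetical toy node: either a plain val, or a map with key/val children.
struct Node
{
    bool is_map = false;
    std::string key, val;
    std::vector<Node> children;
};

// Hypothetical handler for a single enclosing seq.
struct ToySeqHandler
{
    Node seq;                 // the enclosing sequence
    Node *curr_map = nullptr; // non-null while inside an implicit map

    void set_val_scalar_plain(std::string const& s)
    {
        if(curr_map) // inside a map: this is the val of the pending key
        {
            curr_map->children.back().val = s;
        }
        else         // directly in the seq: preemptively store as a val
        {
            Node n;
            n.val = s;
            seq.children.push_back(n);
        }
    }

    // The just-set val was actually the first key of a new map: turn the
    // last seq element into a map and move its scalar to the first key.
    void actually_as_new_map_key()
    {
        Node &last = seq.children.back();
        Node child;
        child.key = last.val;
        last.val.clear();
        last.is_map = true;
        last.children.push_back(child);
        curr_map = &last;
    }

    void end_map() { curr_map = nullptr; }
};

// Replays the cheaper sequence for [notmap, a: b, c: d].
inline Node replay_seqimap()
{
    ToySeqHandler h;
    h.set_val_scalar_plain("notmap");
    h.set_val_scalar_plain("a");
    h.actually_as_new_map_key();
    h.set_val_scalar_plain("b");
    h.end_map();
    h.set_val_scalar_plain("c");
    h.actually_as_new_map_key();
    h.set_val_scalar_plain("d");
    h.end_map();
    return h.seq;
}
```

The "notmap" element never pays for the rearrangement; only the two elements that turn out to be implicit maps do.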

This also applies to container keys (although ryml's tree cannot accommodate these): the parser can preemptively set a container as a val, and call this event to turn that container into a key. For example, consider this YAML:

[aa, bb]: [cc, dd]
# ^     ^ ^
# |     | |
# (2) (1) (3) <- event sequence

The standard event sequence for this YAML would be the following:

handler.begin_map_val_block(); // (1)
handler.begin_seq_key_flow(); // (2)
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.begin_seq_val_flow(); // (3)
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

The problem with the sequence above is that, reading from left to right, the parser can only determine the proper calls at (1) and (2) once it reaches (1) in the YAML source. So the parser would have to buffer the entire event sequence from the beginning until it reaches (1). Using the special event, the parser can instead do:

handler.begin_seq_val_flow(); // (2) -- preemptively as val!
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.actually_as_new_map_key(); // (1) -- adjust when finding that the prev val was actually a key.
handler.begin_seq_val_flow(); // (3) -- go on as before
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

Function Documentation

◆ estimate_events_ints_size()

int32_t c4::yml::extra::estimate_events_ints_size ( csubstr  src)

Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.

This estimation is meant to exceed the actual number of required events.

Note
This function must overpredict. It does so for every case in the hundreds/thousands of extensive tests of rapidyaml – both for the YAML test suite and the internal cases. If you find a case where that does not hold, it is a bug. Please report it at https://github.com/biojppm/rapidyaml/issues!

Definition at line 25 of file event_handler_ints.cpp.

{
    int32_t count = 7; // BSTR + BDOC + =VAL + EDOC + ESTR
    for(size_t i = 0; i < src.len; ++i)
    {
        switch(src.str[i])
        {
        // this has strings preceding/following it
        case ':':
        case ',': // overestimate, assume map
            count += 6;
            break;
        // these have (or are likely to have) a string following it
        case '-':
        case '&':
        case '*':
        case '<':
        case '!':
        case '\'':
        case '"':
        case '|':
        case '>':
        case '?':
        case '\n':
            count += 3;
            break;
        case '[':
        case ']':
            count += 4;
            break;
        case '{':
        case '}':
            count += 7;
            break;
        }
    }
    return count;
}
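
As a worked example of the arithmetic, the sketch below restates the per-character weights from the listing above using only the standard library, so that the estimate can be checked by hand. The names weight and estimate_sketch are hypothetical, and std::string_view stands in for csubstr.

```cpp
#include <cstdint>
#include <string_view>

// Per-character weights, copied from the listing above.
inline int32_t weight(char c)
{
    switch(c)
    {
    case ':': case ',':
        return 6;
    case '-': case '&': case '*': case '<': case '!': case '\'':
    case '"': case '|': case '>': case '?': case '\n':
        return 3;
    case '[': case ']':
        return 4;
    case '{': case '}':
        return 7;
    default:
        return 0;
    }
}

// Hypothetical standalone equivalent of the estimator.
inline int32_t estimate_sketch(std::string_view src)
{
    int32_t count = 7; // same starting value as in the listing
    for(char c : src)
        count += weight(c);
    return count;
}
```

For {foo: bar,foo2: bar2} this gives 7 + 7 ('{') + 6 (':') + 6 (',') + 6 (':') + 7 ('}') = 39. Assuming one integer for each stream, document and container event and three for each scalar (consistent with the stride of 3 or 1 used when walking the buffer), the events for that source would occupy 18 integers, so the estimate overpredicts as intended.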

◆ events_ints_to_testsuite() [1/3]

size_t c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz,
substr  evts_testsuite 
)

Create a testsuite event string from integer events.

This overload receives a buffer where the string should be written, and returns the size needed for the buffer. If that size is larger than the buffer's size, the user must resize the buffer and call again.
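
The call, resize, call-again protocol can be sketched as follows. This is a standalone illustration, not the library code: write_events is a hypothetical stand-in that follows the same contract (write only if the text fits, always return the required size), and std::string stands in for the output buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in following the same contract: it writes only when
// the text fits in the buffer, and always returns the size required.
inline size_t write_events(char *buf, size_t cap)
{
    static const char text[] = "+STR\n-STR\n";
    const size_t needed = sizeof(text) - 1; // exclude the terminating zero
    if(needed <= cap)
        std::memcpy(buf, text, needed);
    return needed;
}

// First call discovers the required size; second call does the writing.
inline std::string call_resize_call()
{
    std::string out; // starts empty, so the first call cannot write
    size_t needed = write_events(&out[0], out.size());
    if(needed > out.size())
    {
        out.resize(needed);
        needed = write_events(&out[0], out.size());
    }
    out.resize(needed); // trim in case the buffer was larger than needed
    return out;
}
```

The Container overload documented further down on this page follows the same pattern, so callers who do not need to manage the buffer themselves can use it instead.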

Definition at line 36 of file ints_to_testsuite.cpp.

{
    auto getstr = [&](ievt::DataType i){
        bool in_arena = evts_ints[i] & ievt::AREN;
        csubstr region = !in_arena ? parsed_yaml : arena;
        return region.sub((size_t)evts_ints[i+1], (size_t)evts_ints[i+2]);
    };
    size_t sz = 0;
    auto append = [&](csubstr s){
        size_t next = sz + s.len;
        if (s.len && (next <= evts_test_suite.len && evts_test_suite.len))
            memcpy(evts_test_suite.str + sz, s.str, s.len);
        sz = next;
    };
    bool has_tag = false;
    csubstr tag;
    auto maybe_append_tag = [&]{
        if(has_tag)
        {
            #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
            if(tag.begins_with('<'))
            {
                append(" ");
                append(tag);
            }
            else
            #endif
            if(tag.begins_with("!<"))
            {
                append(" ");
                append(tag.sub(1));
            }
            else if(tag.begins_with('!'))
            {
                append(" <");
                append(tag);
                append(">");
            }
            else
            {
                append(" <!");
                append(tag);
                append(">");
            }
        }
        has_tag = false;
    };
    bool has_anchor = false;
    csubstr anchor;
    auto maybe_append_anchor = [&]{
        if(has_anchor)
        {
            append(" &");
            append(anchor);
        }
        has_anchor = false;
    };
    auto append_cont = [&](csubstr evt, csubstr style){
        append(evt);
        if(style.len)
        {
            append(" ");
            append(style);
        }
        maybe_append_anchor();
        maybe_append_tag();
        append("\n");
    };
    auto append_val = [&](csubstr evt, csubstr val){
        append("=VAL");
        maybe_append_anchor();
        maybe_append_tag();
        append(" ");
        append(evt);
        substr buf = sz <= evts_test_suite.len ? evts_test_suite.sub(sz) : evts_test_suite.last(0);
        sz += escape_scalar(buf, val);
        append("\n");
    };
    for(ievt::DataType i = 0; i < evts_ints_sz; )
    {
        ievt::DataType evt = evts_ints[i];
        if(evt & ievt::SCLR)
        {
            csubstr s = getstr(i);
            if(evt & ievt::SQUO)
                append_val("'", s);
            else if(evt & ievt::DQUO)
                append_val("\"", s);
            else if(evt & ievt::LITL)
                append_val("|", s);
            else if(evt & ievt::FOLD)
                append_val(">", s);
            else //if(evt & ievt::PLAI)
                append_val(":", s);
        }
        else if(evt & ievt::BSEQ)
        {
            if(evt & ievt::FLOW)
                append_cont("+SEQ", "[]");
            else
                append_cont("+SEQ", "");
        }
        else if(evt & ievt::ESEQ)
        {
            append("-SEQ\n");
        }
        else if(evt & ievt::BMAP)
        {
            if(evt & ievt::FLOW)
                append_cont("+MAP", "{}");
            else
                append_cont("+MAP", "");
        }
        else if(evt & ievt::EMAP)
        {
            append("-MAP\n");
        }
        else if(evt & ievt::ALIA)
        {
            append("=ALI *");
            append(getstr(i));
            append("\n");
        }
        else if(evt & ievt::TAG_)
        {
            has_tag = true;
            tag = getstr(i);
        }
        else if(evt & ievt::ANCH)
        {
            has_anchor = true;
            anchor = getstr(i);
        }
        else if(evt & ievt::BDOC)
        {
            if(evt & ievt::EXPL)
                append("+DOC ---\n");
            else
                append("+DOC\n");
        }
        else if(evt & ievt::EDOC)
        {
            if(evt & ievt::EXPL)
                append("-DOC ...\n");
            else
                append("-DOC\n");
        }
        else if(evt & ievt::BSTR)
        {
            append("+STR\n");
        }
        else if(evt & ievt::ESTR)
        {
            append("-STR\n");
        }

        i += (evt & ievt::WSTR) ? 3 : 1;
    }
    return sz;
}

References c4::yml::extra::ievt::ALIA, c4::yml::extra::ievt::ANCH, c4::yml::extra::ievt::AREN, c4::yml::extra::ievt::BDOC, c4::yml::extra::ievt::BMAP, c4::yml::extra::ievt::BSEQ, c4::yml::extra::ievt::BSTR, c4::yml::extra::ievt::DQUO, c4::yml::extra::ievt::EDOC, c4::yml::extra::ievt::EMAP, c4::yml::extra::escape_scalar(), c4::yml::extra::ievt::ESEQ, c4::yml::extra::ievt::ESTR, c4::yml::extra::ievt::EXPL, c4::yml::FLOW, c4::yml::extra::ievt::FOLD, c4::yml::extra::ievt::LITL, c4::yml::extra::ievt::SCLR, c4::yml::extra::ievt::SQUO, c4::yml::extra::ievt::TAG_, and c4::yml::extra::ievt::WSTR.

Referenced by c4::yml::extra::events_ints_to_testsuite().
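
The loop in the listing above relies on the layout of the integer buffer: an event carrying a string occupies three integers (flags, offset, length), and any other event occupies one. The sketch below walks such a buffer and collects the referenced strings. The flag values are assumptions invented for this example (the real masks are defined in ievt::EventFlags and are not reproduced here), and collect_strings is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flag values, chosen only for this sketch.
enum : int32_t { BSTR_ = 1, ESTR_ = 2, SCLR_ = 4, WSTR_ = 8 };

// Walks the buffer with the same stride rule as the listing above, and
// returns every string referenced by an event that carries one.
inline std::vector<std::string> collect_strings(std::string_view src,
                                                std::vector<int32_t> const& evts)
{
    std::vector<std::string> out;
    for(size_t i = 0; i < evts.size(); )
    {
        const int32_t evt = evts[i];
        if(evt & WSTR_)
        {
            if(i + 2 >= evts.size())
                break; // truncated buffer: stop instead of reading past the end
            const size_t offset = (size_t)evts[i + 1];
            const size_t length = (size_t)evts[i + 2];
            out.emplace_back(src.substr(offset, length));
        }
        i += (evt & WSTR_) ? 3 : 1;
    }
    return out;
}
```

For the source "foo: bar", a buffer such as {BSTR_, SCLR_|WSTR_, 0, 3, SCLR_|WSTR_, 5, 3, ESTR_} yields the strings "foo" and "bar" without copying anything at parse time, which is the point of storing offset and length instead of the text.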

◆ events_ints_to_testsuite() [2/3]

template<class Container >
void c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz,
Container *  evts_testsuite 
)

Create a testsuite event string from integer events, writing into an output container.

Definition at line 35 of file ints_to_testsuite.hpp.

{
    size_t len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    if(len > evts_testsuite->size())
    {
        evts_testsuite->resize(len);
        len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    }
    evts_testsuite->resize(len);
}

References c4::yml::extra::events_ints_to_testsuite(), and c4::to_substr().

◆ events_ints_to_testsuite() [3/3]

template<class Container >
Container c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz 
)

Create a testsuite event string from integer events, returning a new container with the result.

Definition at line 53 of file ints_to_testsuite.hpp.

{
    Container ret;
    events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, &ret);
    return ret;
}

References c4::yml::extra::events_ints_to_testsuite().

◆ events_ints_print()

void c4::yml::extra::events_ints_print ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts,
ievt::DataType  evts_sz 
)

Print integer events to stdout.

Definition at line 93 of file ints_utils.cpp.

{
    char buf[200];
    for(ievt::DataType evtpos = 0, evtnumber = 0;
        evtpos < evts_sz;
        ++evtnumber,
            evtpos += ((evts[evtpos] & ievt::WSTR) ? 3 : 1))
    {
        ievt::DataType evt = evts[evtpos];
        {
            csubstr str = ievt::to_chars_sub(buf, evt);
            printf("[%d][%d] %.*s(0x%x)", evtnumber, evtpos, (int)str.len, str.str, evt);
        }
        if (evt & ievt::WSTR)
        {
            bool in_arena = evt & ievt::AREN;
            csubstr region = !in_arena ? parsed_yaml : arena;
            bool safe = (evts[evtpos + 1] >= 0)
                && (evts[evtpos + 2] >= 0)
                && (evts[evtpos + 1] <= (int)region.len)
                && ((evts[evtpos + 1] + evts[evtpos + 2]) <= (int)region.len);
            const char *str = safe ? (region.str + evts[evtpos + 1]) : "ERR!!!";
            int len = safe ? evts[evtpos + 2] : 6;
            printf(": %d [%d]~~~%.*s~~~", evts[evtpos+1], evts[evtpos+2], len, str);
            if(in_arena)
                printf(" (arenasz=%zu)", arena.len);
            else
                printf(" (srcsz=%zu)", parsed_yaml.len);
        }
        printf("\n");
    }
}

References c4::yml::extra::ievt::AREN, c4::yml::extra::ievt::to_chars_sub(), and c4::yml::extra::ievt::WSTR.

◆ escape_scalar()

size_t c4::yml::extra::escape_scalar ( substr  buffer,
csubstr  val 
)

Definition at line 20 of file scalar.cpp.

{
    size_t pos = 0;
    #define _append(repl) \
        do { \
            if(repl.len && (pos + repl.len <= buffer.len)) \
                memcpy(buffer.str + pos, repl.str, repl.len); \
            pos += repl.len; \
        } while(0)
    #define _c4flush_use_instead(i, repl, skip) \
        do { \
            _append(val.range(prev, i)); \
            _append(csubstr(repl)); \
            prev = i + skip; \
        } \
        while(0)
    uint8_t const* C4_RESTRICT s = reinterpret_cast<uint8_t const*>(val.str);
    size_t prev = 0;
    for(size_t i = 0; i < val.len; ++i)
    {
        switch(s[i])
        {
        case UINT8_C(0x0a): // \n
            _c4flush_use_instead(i, "\\n", 1); break;
        case UINT8_C(0x5c): // '\\'
            _c4flush_use_instead(i, "\\\\", 1); break;
        case UINT8_C(0x09): // \t
            _c4flush_use_instead(i, "\\t", 1); break;
        case UINT8_C(0x0d): // \r
            _c4flush_use_instead(i, "\\r", 1); break;
        case UINT8_C(0x00): // \0
            _c4flush_use_instead(i, "\\0", 1); break;
        case UINT8_C(0x0c): // \f (form feed)
            _c4flush_use_instead(i, "\\f", 1); break;
        case UINT8_C(0x08): // \b (backspace)
            _c4flush_use_instead(i, "\\b", 1); break;
        case UINT8_C(0x07): // \a (bell)
            _c4flush_use_instead(i, "\\a", 1); break;
        case UINT8_C(0x0b): // \v (vertical tab)
            _c4flush_use_instead(i, "\\v", 1); break;
        case UINT8_C(0x1b): // \e (escape)
            _c4flush_use_instead(i, "\\e", 1); break;
        case UINT8_C(0xc2):
            if(i+1 < val.len)
            {
                const uint8_t np1 = s[i+1];
                if(np1 == UINT8_C(0xa0))
                    _c4flush_use_instead(i, "\\_", 2);
                else if(np1 == UINT8_C(0x85))
                    _c4flush_use_instead(i, "\\N", 2);
            }
            break;
        case UINT8_C(0xe2):
            if(i+2 < val.len)
            {
                if(s[i+1] == UINT8_C(0x80))
                {
                    if(s[i+2] == UINT8_C(0xa8))
                        _c4flush_use_instead(i, "\\L", 3);
                    else if(s[i+2] == UINT8_C(0xa9))
                        _c4flush_use_instead(i, "\\P", 3);
                }
            }
            break;
        }
    }
    // flush the rest
    _append(val.sub(prev));
    #undef _c4flush_use_instead
    #undef _append
    return pos;
}

References _append, and _c4flush_use_instead.

Referenced by c4::yml::extra::append_scalar_escaped(), and c4::yml::extra::events_ints_to_testsuite().
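
To see the effect of the escaping on concrete input, here is a simplified sketch covering a subset of the cases in the listing above (newline, tab, backslash, and the two-byte UTF-8 sequence 0xc2 0xa0). It is not the library function: escape_sketch is a hypothetical name, and it returns a std::string rather than writing into a caller-provided buffer and returning the required size.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical simplified escaper handling only a few of the cases.
inline std::string escape_sketch(std::string_view val)
{
    std::string out;
    for(size_t i = 0; i < val.size(); ++i)
    {
        const unsigned char c = (unsigned char)val[i];
        switch(c)
        {
        case 0x0a: out += "\\n"; break;  // newline
        case 0x09: out += "\\t"; break;  // tab
        case 0x5c: out += "\\\\"; break; // backslash
        case 0xc2:
            // 0xc2 0xa0 is the UTF-8 encoding of the non-breaking space
            if(i + 1 < val.size() && (unsigned char)val[i + 1] == 0xa0)
            {
                out += "\\_";
                ++i; // skip the second byte of the sequence
            }
            else
            {
                out += val[i]; // any other 0xc2 sequence is copied as is
            }
            break;
        default:
            out += val[i];
            break;
        }
    }
    return out;
}
```

The multi-byte cases are why the original tracks a skip count: the replacement consumes two or three input bytes at once, while the single-byte escapes consume one.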