rapidyaml  0.11.1
parse and emit YAML, and do it fast
Event Handlers

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Classes

struct  c4::yml::EventHandlerStack< HandlerImpl, HandlerState >
 Use this class as a base for event handler implementations, to simplify the stack logic.
 
struct  c4::yml::EventHandlerTree
 The event handler to create a ryml Tree.
 
struct  c4::yml::extra::EventHandlerInts
 A parser event handler that creates a compact representation of the YAML tree in a buffer of integers (see ievt::EventFlags) containing masks (to represent events) and offset+length (to represent strings in the source buffer).
 
struct  c4::yml::extra::EventHandlerTestSuite
 This event handler produces standard YAML events as used in the YAML test suite.
 

Functions

int32_t c4::yml::extra::estimate_events_ints_size (csubstr src)
 Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.
 
size_t c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, substr evts_testsuite)
 Create a testsuite event string from integer events.
 
template<class Container >
void c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, Container *evts_testsuite)
 Create a testsuite event string from integer events, writing into an output container.
 
template<class Container >
Container c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Create a testsuite event string from integer events, returning a new container with the result.
 
void c4::yml::extra::events_ints_print (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Print integer events to stdout.
 

Detailed Description

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Because ParseEngine is templated on the event handler, the binding uses static polymorphism, without any virtual functions. The actual handler object can be changed at run time, but it must be of the type given as the template parameter. This makes for a very efficient architecture, and also lets users provide their own custom handler if they wish to bypass the rapidyaml Tree.
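
The static binding can be sketched with a toy engine. This is not ryml's actual ParseEngine: ToyEngine, CountingHandler, demo_rebind() and their members are hypothetical names, used only to show a handler bound through a template parameter, called without virtual functions, and swapped at run time for another object of the same type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical handler: counts the scalars it receives.
struct CountingHandler
{
    std::size_t num_scalars = 0;
    void set_val_scalar_plain(std::string const&) { ++num_scalars; }
};

// Hypothetical engine, templated on the handler type. Calls into the
// handler are resolved at compile time; no vtable is involved.
template<class Handler>
struct ToyEngine
{
    Handler *handler; // the bound object; may be re-pointed at run time

    explicit ToyEngine(Handler *h) : handler(h) { assert(h != nullptr); }

    // Toy "parse": treats every comma-separated token as a plain scalar.
    void parse(std::string const& src)
    {
        std::size_t start = 0;
        while(start <= src.size())
        {
            std::size_t end = src.find(',', start);
            if(end == std::string::npos)
                end = src.size();
            handler->set_val_scalar_plain(src.substr(start, end - start));
            start = end + 1;
        }
    }
};

// Parses two inputs, rebinding the engine to a second handler object
// of the same type in between.
inline std::size_t demo_rebind()
{
    CountingHandler first, second;
    ToyEngine<CountingHandler> engine(&first);
    engine.parse("a,b,c");      // first receives 3 scalars
    engine.handler = &second;   // same type, different object
    engine.parse("d,e");        // second receives 2 scalars
    return first.num_scalars * 10 + second.num_scalars;
}
```

Binding a handler of a different type would require a different instantiation, such as ToyEngine<OtherHandler>, which is the compile-time restriction mentioned above.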

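The handlers implemented in this project are the classes listed above: EventHandlerTree, which creates the ryml Tree, and, in the extra namespace, EventHandlerInts and EventHandlerTestSuite.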

Event model

The event model used by the parse engine and event handlers follows very closely the event model in the YAML test suite.

Consider for example this YAML,

{foo: bar,foo2: bar2}

which would produce these events in the test-suite parlance:

+STR
+DOC
+MAP {}
=VAL :foo
=VAL :bar
=VAL :foo2
=VAL :bar2
-MAP
-DOC
-STR

For reference, the ParseEngine object will produce this sequence of calls to its bound EventHandler:

handler.begin_stream();
handler.begin_doc();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("foo");
handler.set_val_scalar_plain("bar");
handler.add_sibling();
handler.set_key_scalar_plain("foo2");
handler.set_val_scalar_plain("bar2");
handler.end_map();
handler.end_doc();
handler.end_stream();
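
To connect the handler calls with the test-suite notation, the toy handler below turns each call into the corresponding test-suite line. It is a sketch for this one example, not ryml's EventHandlerTestSuite: the names ToyEventPrinter and replay_flow_map() are hypothetical, only the calls used above are implemented, and all scalars are assumed to be plain.

```cpp
#include <string>

// Hypothetical handler implementing only the calls from the listing above.
struct ToyEventPrinter
{
    std::string out;

    void begin_stream()       { out += "+STR\n"; }
    void end_stream()         { out += "-STR\n"; }
    void begin_doc()          { out += "+DOC\n"; }
    void end_doc()            { out += "-DOC\n"; }
    void begin_map_val_flow() { out += "+MAP {}\n"; }
    void end_map()            { out += "-MAP\n"; }
    void add_sibling()        {} // has no test-suite counterpart
    void set_key_scalar_plain(std::string const& s) { out += "=VAL :" + s + "\n"; }
    void set_val_scalar_plain(std::string const& s) { out += "=VAL :" + s + "\n"; }
};

// Replays the call sequence for {foo: bar,foo2: bar2}.
inline std::string replay_flow_map()
{
    ToyEventPrinter handler;
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
    return handler.out;
}
```

Note that add_sibling() produces no output: the test-suite notation has no separate event for moving to the next sibling, which is one of the places where the two models differ.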

For many other examples of all areas of YAML and how ryml's parse model corresponds to the YAML standard model, refer to the [unit tests for the parse engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).

Special events

Most of the parsing events adopted by rapidyaml in its event model are fairly obvious, but there are two less-obvious events requiring some explanation.

These events exist to make it easier to parse some special YAML cases. They are called by the parser when a just-handled value/container is actually the first key of a new map.

For example, consider an implicit map inside a seq: [a: b, c: d] which is parsed as [{a: b}, {c: d}]. The standard event sequence for this YAML would be the following:

handler.begin_seq_val_flow();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("a");
handler.set_val_scalar_plain("b");
handler.end_map();
handler.add_sibling();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("c");
handler.set_val_scalar_plain("d");
handler.end_map();
handler.end_seq();

The problem with this event sequence is that it forces the parser to delay setting a val scalar until it knows whether the scalar is a key or a val, and to store the scalar in its internal state until then. In the example above, the parser would have to hold back "a" and "c", because they are in fact keys and not vals. The downside is that this complexity cost would apply even when there is no implicit map: every val in a seq would have to be delayed until one of the disambiguating subsequent tokens (',', '-', ']' or ':') is found.

By calling this function, the parser avoids that complexity: it preemptively sets the scalar as a val, and a later call to this function creates the map and rearranges the scalar as its first key. Now the cost applies only once, when a seqimap starts. So the (easier and cheaper) event sequence below has the same effect as the event sequence above:

handler.begin_seq_val_flow();
handler.set_val_scalar_plain("notmap");
handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
handler.actually_as_new_map_key(); // create a map, move the "a" val as the key of the first child of the new map
handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
handler.end_map();
handler.set_val_scalar_plain("c"); // "c" also as val!
handler.actually_as_block_flow(); // likewise
handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
handler.end_map();
handler.end_seq();
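
The rearrangement can be sketched on a toy data structure. Everything below is hypothetical and much simpler than ryml's handlers: ToyNode, ToySeqHandler and promote_last_val_to_first_key() are made-up names, the handler only supports a flow seq at the root, and each implicit map holds exactly one key/val pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node: a scalar val, or a map of key/val children.
struct ToyNode
{
    bool is_map = false;
    std::string key, val;
    std::vector<ToyNode> children;
};

// Hypothetical handler for a flow seq at the root.
struct ToySeqHandler
{
    std::vector<ToyNode> seq; // children of the seq
    bool in_map = false;      // true while filling an implicit map

    void set_val_scalar_plain(std::string const& s)
    {
        if(in_map)
        {
            seq.back().children.back().val = s; // val of the pending key
        }
        else
        {
            ToyNode n;
            n.val = s;        // preemptively stored as a val
            seq.push_back(n);
        }
    }

    // Stand-in for the special event: the last val was actually the
    // first key of a new map.
    void promote_last_val_to_first_key()
    {
        assert(!seq.empty() && !in_map);
        ToyNode &n = seq.back();
        ToyNode child;
        child.key = n.val;    // rearrange the scalar as key
        n.val.clear();
        n.is_map = true;
        n.children.push_back(child);
        in_map = true;
    }

    void end_map() { in_map = false; }
};

// Replays [a: b, c: d] with preemptive vals and renders the result.
inline std::string replay_seqimap()
{
    ToySeqHandler h;
    h.set_val_scalar_plain("a");
    h.promote_last_val_to_first_key();
    h.set_val_scalar_plain("b");
    h.end_map();
    h.set_val_scalar_plain("c");
    h.promote_last_val_to_first_key();
    h.set_val_scalar_plain("d");
    h.end_map();
    std::string out = "[";
    for(std::size_t i = 0; i < h.seq.size(); ++i)
    {
        ToyNode const& n = h.seq[i];
        if(i)
            out += ", ";
        if(n.is_map)
            out += "{" + n.children[0].key + ": " + n.children[0].val + "}";
        else
            out += n.val;
    }
    return out + "]";
}
```

The scalar is stored exactly once, and the extra work of creating the map happens only in the promotion call, so a seq without implicit maps never pays for it.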

This also applies to container keys (although ryml's tree cannot accommodate these): the parser can preemptively set a container as a val, and call this event to turn that container into a key. For example, consider this YAML:

[aa, bb]: [cc, dd]
# ^     ^ ^
# |     | |
# (2) (1) (3)   <- event sequence

The standard event sequence for this YAML would be the following:

handler.begin_map_val_block(); // (1)
handler.begin_seq_key_flow(); // (2)
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.begin_seq_val_flow(); // (3)
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

The problem with the sequence above is that, reading from left to right, the parser can only determine the proper calls at (1) and (2) once it reaches (1) in the YAML source. The parser would therefore have to buffer the entire event sequence from the beginning until it reaches (1). Using this function, the parser can instead do:

handler.begin_seq_val_flow(); // (2) -- preemptively as val!
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.actually_as_new_map_key(); // (1) -- adjust when finding that the prev val was actually a key.
handler.begin_seq_val_flow(); // (3) -- go on as before
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

Function Documentation

◆ estimate_events_ints_size()

int32_t c4::yml::extra::estimate_events_ints_size ( csubstr  src)

Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.

This estimate is meant to exceed the size that is actually required.

Note
This function must overpredict. It does so for every case in the hundreds/thousands of extensive tests of rapidyaml – both for the YAML test suite and the internal cases. If you find a case where that does not hold, it is a bug. Please report it at https://github.com/biojppm/rapidyaml/issues!

Definition at line 25 of file event_handler_ints.cpp.

int32_t estimate_events_ints_size(csubstr src)
{
    int32_t count = 7; // BSTR + BDOC + =VAL + EDOC + ESTR
    for(size_t i = 0; i < src.len; ++i)
    {
        switch(src.str[i])
        {
        case ':': // this has strings preceding/following it
        case ',': // overestimate, assume map
        case '%': // assume TAGD->string + TAGV->string
            count += 6;
            break;
        // these have (or are likely to have) a string following it
        case '-':
        case '&':
        case '*':
        case '<':
        case '!':
        case '\'':
        case '"':
        case '|':
        case '>':
        case '\n':
            count += 3;
            break;
        case '[':
        case ']':
            count += 4;
            break;
        case '{':
        case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        }
    }
    return count;
}
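
As a worked check of the overprediction, the sketch below ports the listing to std::string so that it compiles without ryml (estimate_sketch is a hypothetical name, not a library function) and can be applied to the {foo: bar,foo2: bar2} example. Assuming each scalar occupies three integers (mask, offset, length) and every other event one, as the stride in events_ints_to_testsuite() suggests, that document needs 6 + 4*3 = 18 integers, whereas the estimate adds 7 + 7 + 6 + 6 + 6 + 7 = 39.

```cpp
#include <cstdint>
#include <string>

// Port of the listing above to std::string, with the same per-character
// weights. Hypothetical helper for illustration only.
inline std::int32_t estimate_sketch(std::string const& src)
{
    std::int32_t count = 7; // initial allowance, as in the listing
    for(char c : src)
    {
        switch(c)
        {
        case ':': case ',': case '%':
            count += 6;
            break;
        case '-': case '&': case '*': case '<': case '!':
        case '\'': case '"': case '|': case '>': case '\n':
            count += 3;
            break;
        case '[': case ']':
            count += 4;
            break;
        case '{': case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        default: // all other characters add nothing
            break;
        }
    }
    return count;
}
```

For the example, estimate_sketch("{foo: bar,foo2: bar2}") returns 39, and an empty source returns the initial 7.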

◆ events_ints_to_testsuite() [1/3]

size_t c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz,
substr  evts_testsuite 
)

Create a testsuite event string from integer events.

This overload receives a buffer where the string should be written, and returns the size needed for the buffer. If that size is larger than the buffer's size, the user must resize the buffer and call again.
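
This return convention is similar to that of std::snprintf, which also reports the length that the complete output would need. A minimal sketch of the resulting call, check, resize, call-again pattern follows; write_events_sketch() and two_pass_demo() are hypothetical stand-ins with the same return convention, not part of ryml.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical writer: copies the text only if it fits, and always
// returns the size that the full output needs.
inline std::size_t write_events_sketch(char *buf, std::size_t cap)
{
    static const char text[] = "+STR\n-STR\n";
    const std::size_t needed = sizeof(text) - 1; // without the terminator
    if(needed <= cap)
        std::memcpy(buf, text, needed);
    return needed;
}

// Two-pass use: try with the current buffer, grow once if it was too small.
inline std::string two_pass_demo()
{
    std::string out(4, '\0'); // deliberately too small
    std::size_t needed = write_events_sketch(&out[0], out.size());
    if(needed > out.size())
    {
        out.resize(needed);
        needed = write_events_sketch(&out[0], out.size());
        assert(needed <= out.size());
    }
    out.resize(needed);       // trim to the exact size
    return out;
}
```

When a reasonable upper bound is known in advance, sizing the buffer with it up front makes the second pass unnecessary.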

Definition at line 36 of file ints_to_testsuite.cpp.

size_t events_ints_to_testsuite(csubstr parsed_yaml,
                                csubstr arena,
                                ievt::DataType const *evts_ints,
                                ievt::DataType evts_ints_sz,
                                substr evts_testsuite)
{
    // string of the event at position i: a subrange of the source or of the arena
    auto getstr = [&](ievt::DataType i){
        bool in_arena = evts_ints[i] & ievt::AREN;
        csubstr region = !in_arena ? parsed_yaml : arena;
        return region.sub((size_t)evts_ints[i+1], (size_t)evts_ints[i+2]);
    };
    size_t sz = 0;
    // copy only while the output fits, but always advance the required size
    auto append = [&](csubstr s){
        size_t next = sz + s.len;
        if(s.len && (next <= evts_testsuite.len && evts_testsuite.len))
            memcpy(evts_testsuite.str + sz, s.str, s.len);
        sz = next;
    };
    bool has_tag = false;
    csubstr tag;
    auto maybe_append_tag = [&]{
        if(has_tag)
        {
            if(tag.begins_with('<'))
            {
                append(" ");
                append(tag);
            }
            else if(tag.begins_with("!<"))
            {
                append(" ");
                append(tag.sub(1));
            }
            else if(tag.begins_with('!'))
            {
                append(" <");
                append(tag);
                append(">");
            }
            else
            {
                append(" <!");
                append(tag);
                append(">");
            }
        }
        has_tag = false;
    };
    bool has_anchor = false;
    csubstr anchor;
    auto maybe_append_anchor = [&]{
        if(has_anchor)
        {
            append(" &");
            append(anchor);
        }
        has_anchor = false;
    };
    auto append_cont = [&](csubstr evt, csubstr style){
        append(evt);
        if(style.len)
        {
            append(" ");
            append(style);
        }
        maybe_append_anchor();
        maybe_append_tag();
        append("\n");
    };
    auto append_esc = [&](csubstr str){
        substr buf = sz <= evts_testsuite.len ? evts_testsuite.sub(sz) : evts_testsuite.last(0);
        sz += escape_scalar(buf, str);
        append("\n");
    };
    auto append_val = [&](csubstr evt, csubstr val){
        append("=VAL");
        maybe_append_anchor();
        maybe_append_tag();
        append(" ");
        append(evt);
        append_esc(val);
    };
    for(ievt::DataType i = 0; i < evts_ints_sz; )
    {
        ievt::DataType evt = evts_ints[i];
        if(evt & ievt::SCLR)
        {
            csubstr s = getstr(i);
            if(evt & ievt::SQUO)
                append_val("'", s);
            else if(evt & ievt::DQUO)
                append_val("\"", s);
            else if(evt & ievt::LITL)
                append_val("|", s);
            else if(evt & ievt::FOLD)
                append_val(">", s);
            else //if(evt & ievt::PLAI)
                append_val(":", s);
        }
        else if((evt & ievt::BSEQ) == ievt::BSEQ)
        {
            if(evt & ievt::FLOW)
                append_cont("+SEQ", "[]");
            else
                append_cont("+SEQ", "");
        }
        else if((evt & ievt::ESEQ) == ievt::ESEQ)
        {
            append("-SEQ\n");
        }
        else if((evt & ievt::BMAP) == ievt::BMAP)
        {
            if(evt & ievt::FLOW)
                append_cont("+MAP", "{}");
            else
                append_cont("+MAP", "");
        }
        else if((evt & ievt::EMAP) == ievt::EMAP)
        {
            append("-MAP\n");
        }
        else if(evt & ievt::ALIA)
        {
            append("=ALI *");
            append(getstr(i));
            append("\n");
        }
        else if(evt & ievt::TAG_)
        {
            has_tag = true;
            tag = getstr(i);
        }
        else if(evt & ievt::ANCH)
        {
            has_anchor = true;
            anchor = getstr(i);
        }
        else if((evt & ievt::BDOC) == ievt::BDOC)
        {
            if(evt & ievt::EXPL)
                append("+DOC ---\n");
            else
                append("+DOC\n");
        }
        else if((evt & ievt::EDOC) == ievt::EDOC)
        {
            if(evt & ievt::EXPL)
                append("-DOC ...\n");
            else
                append("-DOC\n");
        }
        else if((evt & ievt::BSTR) == ievt::BSTR)
        {
            append("+STR\n");
        }
        else if((evt & ievt::ESTR) == ievt::ESTR)
        {
            append("-STR\n");
        }

        // events with a string occupy three integers (mask, offset, length)
        i += (evt & ievt::WSTR) ? 3 : 1;
    }
    return sz;
}

References c4::yml::extra::ievt::ALIA, c4::yml::extra::ievt::ANCH, c4::yml::extra::ievt::AREN, c4::yml::extra::ievt::BDOC, c4::yml::extra::ievt::BMAP, c4::yml::extra::ievt::BSEQ, c4::yml::extra::ievt::BSTR, c4::yml::extra::ievt::DQUO, c4::yml::extra::ievt::EDOC, c4::yml::extra::ievt::EMAP, c4::yml::escape_scalar(), c4::yml::extra::ievt::ESEQ, c4::yml::extra::ievt::ESTR, c4::yml::extra::ievt::EXPL, c4::yml::extra::ievt::FLOW, c4::yml::extra::ievt::FOLD, c4::yml::extra::ievt::LITL, c4::yml::extra::ievt::SCLR, c4::yml::extra::ievt::SQUO, c4::yml::extra::ievt::TAG_, and c4::yml::extra::ievt::WSTR.
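
The main loop of the listing above relies on a simple layout: an event that carries a string occupies three integers (the mask, then the offset and length of the string), and any other event occupies one. The sketch below walks such a buffer and collects the strings. The flag values kBeginSeq, kEndSeq and kScalar are made up for the illustration and are not the real ievt::EventFlags values; strings placed in the arena are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Made-up flag values, for illustration only (not ievt::EventFlags).
constexpr std::int32_t kBeginSeq = 1;
constexpr std::int32_t kEndSeq = 2;
constexpr std::int32_t kScalar = 4;

// Collects the scalars referenced by a toy integer-event buffer.
inline std::vector<std::string> collect_scalars(std::string const& src,
                                                std::vector<std::int32_t> const& evts)
{
    std::vector<std::string> out;
    for(std::size_t i = 0; i < evts.size(); )
    {
        const bool has_string = (evts[i] & kScalar) != 0;
        if(has_string)
        {
            assert(i + 2 < evts.size());
            const std::size_t offset = static_cast<std::size_t>(evts[i + 1]);
            const std::size_t length = static_cast<std::size_t>(evts[i + 2]);
            assert(offset <= src.size() && length <= src.size() - offset);
            out.push_back(src.substr(offset, length));
        }
        i += has_string ? 3 : 1; // same stride rule as in the listing
    }
    return out;
}

// Encodes [aa, bb] by hand and decodes it again.
inline std::vector<std::string> demo_collect()
{
    const std::string src = "[aa, bb]";
    const std::vector<std::int32_t> evts = {
        kBeginSeq,
        kScalar, 1, 2, // "aa": offset 1, length 2
        kScalar, 5, 2, // "bb": offset 5, length 2
        kEndSeq,
    };
    return collect_scalars(src, evts);
}
```

Storing offsets and lengths instead of copies keeps the buffer compact, at the price of requiring the source buffer to stay alive while the events are in use.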

◆ events_ints_to_testsuite() [2/3]

template<class Container >
void c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz,
Container *  evts_testsuite 
)

Create a testsuite event string from integer events, writing into an output container.

Definition at line 35 of file ints_to_testsuite.hpp.

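template<class Container>
void events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, Container *evts_testsuite)
{
    // first pass: write into the current storage and get the required size
    size_t len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    if(len > evts_testsuite->size())
    {
        // too small: grow to the required size and write again
        evts_testsuite->resize(len);
        len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    }
    evts_testsuite->resize(len);
}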

References c4::yml::extra::events_ints_to_testsuite(), and c4::to_substr().

◆ events_ints_to_testsuite() [3/3]

template<class Container >
Container c4::yml::extra::events_ints_to_testsuite ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts_ints,
ievt::DataType  evts_ints_sz 
)

Create a testsuite event string from integer events, returning a new container with the result.

Definition at line 53 of file ints_to_testsuite.hpp.

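template<class Container>
Container events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
{
    Container ret;
    events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, &ret);
    return ret;
}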

References c4::yml::extra::events_ints_to_testsuite().

◆ events_ints_print()

void c4::yml::extra::events_ints_print ( csubstr  parsed_yaml,
csubstr  arena,
ievt::DataType const *  evts,
ievt::DataType  evts_sz 
)

Print integer events to stdout.

Definition at line 94 of file ints_utils.cpp.

void events_ints_print(csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts, ievt::DataType evts_sz)
{
    char buf[200];
    for(ievt::DataType evtpos = 0, evtnumber = 0;
        evtpos < evts_sz;
        ++evtnumber,
        evtpos += ((evts[evtpos] & ievt::WSTR) ? 3 : 1))
    {
        ievt::DataType evt = evts[evtpos];
        csubstr flags = ievt::to_chars_sub(buf, evt);
        printf("[%d][%d] %.*s(0x%x)", evtnumber, evtpos, (int)flags.len, flags.str, evt);
        if(evt & ievt::WSTR)
        {
            bool in_arena = evt & ievt::AREN;
            csubstr region = !in_arena ? parsed_yaml : arena;
            // print the string only if offset and length lie within the region
            bool safe = (evts[evtpos + 1] >= 0)
                && (evts[evtpos + 2] >= 0)
                && (evts[evtpos + 1] <= (int)region.len)
                && (evts[evtpos + 2] <= ((int)region.len - evts[evtpos + 1]));
            const char *str = safe ? (region.str + evts[evtpos + 1]) : "ERR!!!";
            int len = safe ? evts[evtpos + 2] : 6;
            printf(": %d [%d]~~~%.*s~~~", evts[evtpos+1], evts[evtpos+2], len, str);
            if(in_arena)
                printf(" (arenasz=%zu)", arena.len);
            else
                printf(" (srcsz=%zu)", parsed_yaml.len);
        }
        printf("\n");
    }
}

References c4::yml::extra::ievt::AREN, c4::yml::extra::ievt::to_chars_sub(), and c4::yml::extra::ievt::WSTR.