rapidyaml 0.14.0
parse and emit YAML, and do it fast
Event Handlers

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Namespaces

namespace  c4::yml::extra::ievt

Classes

struct  c4::yml::EventHandlerStack< HandlerImpl, HandlerState >
 Use this class as a base for event handler implementations to simplify the stack logic.
struct  c4::yml::EventHandlerTree
 The event handler to create a ryml Tree.
struct  c4::yml::extra::EventHandlerInts
 A parser event handler that creates a compact representation of the YAML tree in a contiguous buffer of integers.

Functions

int32_t c4::yml::extra::estimate_events_ints_size (csubstr src)
 Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.
size_t c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, substr evts_testsuite)
 Create a testsuite event string from integer events.
template<class Container>
void c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz, Container *evts_testsuite)
 Create a testsuite event string from integer events, writing into an output container.
template<class Container>
Container c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Create a testsuite event string from integer events, returning a new container with the result.
void c4::yml::extra::events_ints_print (csubstr parsed_yaml, csubstr arena, ievt::DataType const *evts_ints, ievt::DataType evts_ints_sz)
 Print integer events to stdout.

Detailed Description

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Because ParseEngine is templated on the event handler, the binding uses static polymorphism, without any virtual functions. The actual handler object can be changed at run time, but it must be of the type given as the template parameter. This makes for a very efficient architecture, and it also lets users provide their own custom handler if they wish to bypass the rapidyaml Tree.
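The binding described above can be illustrated with a minimal sketch. MiniEngine, RecordingHandler and rebind_demo are hypothetical names invented for this illustration and are not part of rapidyaml; the point is only that the handler calls are resolved at compile time, while the bound object can still be replaced at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler (not a rapidyaml type): any class providing the
// member functions that the engine calls can be used.
struct RecordingHandler
{
    std::vector<std::string> calls;
    void begin_stream() { calls.push_back("begin_stream"); }
    void end_stream() { calls.push_back("end_stream"); }
};

// Hypothetical engine templated on the handler type. The calls below are
// resolved statically; no virtual functions are involved.
template<class Handler>
struct MiniEngine
{
    Handler *handler; // the bound object; may be swapped at run time
    explicit MiniEngine(Handler *h) : handler(h) {}
    void run()
    {
        handler->begin_stream();
        handler->end_stream();
    }
};

// Runs the engine on one handler object, rebinds it to another object of
// the same type, runs again, and returns the number of calls the second
// object received.
inline std::size_t rebind_demo()
{
    RecordingHandler first, second;
    MiniEngine<RecordingHandler> engine(&first);
    engine.run();
    engine.handler = &second; // same type, different object
    engine.run();
    return second.calls.size();
}
```

Binding a different handler type means instantiating a different engine type, which is the price paid for avoiding virtual dispatch.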

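Handlers implemented in this project include EventHandlerTree, which creates a ryml Tree, and EventHandlerInts, which creates a compact representation of the YAML tree in a contiguous buffer of integers.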

Event model

The event model used by the parse engine and event handlers follows very closely the event model in the YAML test suite.

Consider for example this YAML,

{foo: bar,foo2: bar2}

which would produce these events in the test-suite parlance:

+STR
+DOC
+MAP {}
=VAL :foo
=VAL :bar
=VAL :foo2
=VAL :bar2
-MAP
-DOC
-STR

For reference, the ParseEngine object will produce this sequence of calls to its bound EventHandler:

handler.begin_stream();
handler.begin_doc();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("foo");
handler.set_val_scalar_plain("bar");
handler.add_sibling();
handler.set_key_scalar_plain("foo2");
handler.set_val_scalar_plain("bar2");
handler.end_map();
handler.end_doc();
handler.end_stream();
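To make the correspondence between the two notations concrete, here is a small sketch of a handler that renders each call as a line of test-suite events. TestSuiteSketchHandler and replay_flow_map_example are hypothetical names for this illustration only; the handler implements just the member functions used in the call sequence above and is not rapidyaml's own implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler (not part of rapidyaml): appends one test-suite
// event line per call.
struct TestSuiteSketchHandler
{
    std::string out;
    void begin_stream() { out += "+STR\n"; }
    void end_stream() { out += "-STR\n"; }
    void begin_doc() { out += "+DOC\n"; }
    void end_doc() { out += "-DOC\n"; }
    void begin_map_val_flow() { out += "+MAP {}\n"; }
    void end_map() { out += "-MAP\n"; }
    void add_sibling() {} // has no counterpart in the test-suite notation
    void set_key_scalar_plain(const std::string &s) { out += "=VAL :" + s + "\n"; }
    void set_val_scalar_plain(const std::string &s) { out += "=VAL :" + s + "\n"; }
};

// Replays the call sequence listed above for {foo: bar,foo2: bar2}
// and returns the accumulated event text.
inline std::string replay_flow_map_example()
{
    TestSuiteSketchHandler handler;
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
    return handler.out;
}
```

Note that keys and vals map to the same =VAL event: the test-suite notation does not distinguish them, whereas the handler interface does.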

For many other examples of all areas of YAML and how ryml's parse model corresponds to the YAML standard model, refer to the [unit tests for the parse engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).

Special events

Most of the parsing events adopted by rapidyaml in its event model are fairly obvious, but there are two less-obvious events requiring some explanation.

These events exist to make it easier to parse some special YAML cases. The parser calls them when a value or container that it has just handled turns out to be the first key of a new map.

For example, consider an implicit map inside a seq: [a: b, c: d] which is parsed as [{a: b}, {c: d}]. The standard event sequence for this YAML would be the following:

handler.begin_seq_val_flow();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("a");
handler.set_val_scalar_plain("b");
handler.end_map();
handler.add_sibling();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("c");
handler.set_val_scalar_plain("d");
handler.end_map();
handler.end_seq();

The problem with this event sequence is that it forces the parser to delay setting each scalar (in this case "a" and "c") until it knows whether the scalar is a key or a val, and to store the scalar in its internal state until then. In the example above, "a" and "c" are in fact keys and not vals, so the parser could only emit them after seeing the ':' that follows each one. The downside is that this complexity cost would apply even when there is no implicit map: every val in a seq would have to be delayed until one of the disambiguating subsequent tokens ,-]: is found. With the special event, the parser avoids this complexity by preemptively setting the scalar as a val. A later call to the special event then creates the map and rearranges the scalar as the key of its first child. Now the cost applies only once, when an implicit map in a seq (a seqimap) starts. So the following (easier and cheaper) event sequence has the same effect as the one above:

handler.begin_seq_val_flow();
handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
handler.actually_as_new_map_key(); // create a map, move the "a" val as the key of the first child of the new map
handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
handler.end_map();
handler.add_sibling();
handler.set_val_scalar_plain("c"); // "c" also as val!
handler.actually_as_block_flow(); // likewise
handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
handler.end_map();
handler.end_seq();
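As an illustration of what such a retroactive event asks of a handler, the following sketch builds a tiny tree for a flow seq and converts the most recent val into a one-entry map when the event arrives. Node, SeqImapSketchHandler, render and replay_seqimap_example are hypothetical names; the data structure is an assumption made for this example and does not reflect how ryml's Tree or EventHandlerTree store nodes. For simplicity the same event name is used for both implicit maps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch only.
struct Node
{
    enum Kind { VAL, MAP, SEQ } kind = VAL;
    std::string key, val;
    std::vector<Node> children;
};

// Hypothetical handler supporting only the calls used below.
struct SeqImapSketchHandler
{
    Node root;                 // the outer seq
    std::vector<Node*> stack;  // open containers, innermost last

    void begin_seq_val_flow() { root.kind = Node::SEQ; stack.push_back(&root); }
    void add_sibling() {} // nothing to do in this simplified model
    void set_val_scalar_plain(const std::string &s)
    {
        Node *top = stack.back();
        if(top->kind == Node::MAP)
            top->children.back().val = s; // completes the pending key: val pair
        else
        {
            Node n;
            n.val = s;
            top->children.push_back(n);   // new val in the seq
        }
    }
    // The val just set was really the first key of a new map: turn that
    // node into a map whose first child carries the scalar as its key.
    void actually_as_new_map_key()
    {
        Node *prev = &stack.back()->children.back();
        Node pair;
        pair.key = prev->val;
        prev->kind = Node::MAP;
        prev->val.clear();
        prev->children.push_back(pair);
        stack.push_back(prev);
    }
    void end_map() { stack.pop_back(); }
    void end_seq() { stack.pop_back(); }
};

// Renders the sketch tree in flow style.
inline std::string render(const Node &n)
{
    if(n.kind == Node::VAL)
        return n.val;
    std::string s = (n.kind == Node::SEQ) ? "[" : "{";
    for(std::size_t i = 0; i < n.children.size(); ++i)
    {
        if(i)
            s += ", ";
        if(n.kind == Node::MAP)
            s += n.children[i].key + ": ";
        s += render(n.children[i]);
    }
    return s + ((n.kind == Node::SEQ) ? "]" : "}");
}

inline std::string replay_seqimap_example()
{
    SeqImapSketchHandler h;
    h.begin_seq_val_flow();
    h.set_val_scalar_plain("a");  // preemptively a val
    h.actually_as_new_map_key();  // "a" becomes the first key of a new map
    h.set_val_scalar_plain("b");
    h.end_map();
    h.add_sibling();
    h.set_val_scalar_plain("c");
    h.actually_as_new_map_key();
    h.set_val_scalar_plain("d");
    h.end_map();
    h.end_seq();
    return render(h.root);
}
```

The rearrangement touches only the node that was set last, which is why the cost is paid once per implicit map rather than once per val.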

This also applies to container keys (although ryml's tree cannot accommodate these): the parser can preemptively set a container as a val, and then call this event to turn that container into a key. For example, consider this YAML:

  [aa, bb]: [cc, dd]
# ^       ^ ^
# |       | |
# (2)   (1) (3)   <- event sequence

The standard event sequence for this YAML would be the following:

handler.begin_map_val_block(); // (1)
handler.begin_seq_key_flow(); // (2)
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.begin_seq_val_flow(); // (3)
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

The problem with the sequence above is that, reading from left to right, the parser can only determine the proper calls at (1) and (2) once it reaches (1) in the YAML source. The parser would therefore have to buffer the entire event sequence from the beginning until it reaches (1). Using the special event, the parser can instead do:

handler.begin_seq_val_flow(); // (2) -- preemptively as val!
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.actually_as_new_map_key(); // (1) -- adjust when finding that the prev val was actually a key.
handler.begin_seq_val_flow(); // (3) -- go on as before
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

Function Documentation

◆ estimate_events_ints_size()

int32_t c4::yml::extra::estimate_events_ints_size ( csubstr src)

Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.

This estimation is meant to exceed the actual number of required events.

Note
This function must overpredict. It does so for every case in rapidyaml's extensive tests (hundreds to thousands of cases), both for the YAML test suite and for the internal cases. If you find a case where that does not hold, it is a bug. Please report it at https://github.com/biojppm/rapidyaml/issues!

Definition at line 25 of file event_handler_ints.cpp.

int32_t estimate_events_ints_size(csubstr src)
{
    int32_t count = 7; // BSTR + BDOC + =VAL + EDOC + ESTR
    for(size_t i = 0; i < src.len; ++i)
    {
        switch(src.str[i])
        {
        case ':': // this has strings preceding/following it
        case ',': // overestimate, assume map
        case '%': // assume TAGD->string + TAGV->string
            count += 6;
            break;
        // these have (or are likely to have) a string following it
        case '-':
        case '&':
        case '*':
        case '<':
        case '!':
        case '\'':
        case '"':
        case '|':
        case '>':
        case '\n':
            count += 3;
            break;
        case '[':
        case ']':
            count += 4;
            break;
        case '{':
        case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        }
    }
    return count;
}
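As a worked example of the heuristic, the sketch below restates the character scan with std::string in place of csubstr, so that it compiles without rapidyaml, and applies it to the flow map used earlier on this page. estimate_sketch and needed_for_flow_map_example are hypothetical names. The count of integers actually needed is an assumption: it supposes one integer per structural event and three per scalar (event, offset, length), which is the stride used by the loops over integer events elsewhere on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standalone restatement of the heuristic above (std::string instead of
// csubstr). Not the library function itself.
inline int32_t estimate_sketch(const std::string &src)
{
    int32_t count = 7; // +STR, +DOC, one scalar (3 ints), -DOC, -STR
    for(char c : src)
    {
        switch(c)
        {
        case ':': case ',': case '%':
            count += 6;
            break;
        case '-': case '&': case '*': case '<': case '!':
        case '\'': case '"': case '|': case '>': case '\n':
            count += 3;
            break;
        case '[': case ']':
            count += 4;
            break;
        case '{': case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        default:
            break;
        }
    }
    return count;
}

// Integers needed for {foo: bar,foo2: bar2}, ASSUMING one integer per
// structural event and three per scalar:
// +STR +DOC +MAP (3), four scalars (4 * 3), -MAP -DOC -STR (3).
inline int32_t needed_for_flow_map_example()
{
    return 3 + 4 * 3 + 3;
}
```

For {foo: bar,foo2: bar2} the scan adds 7 for each brace and 6 for each of the two colons and the comma, on top of the initial 7, so the estimate is 7 + 7 + 6 + 6 + 6 + 7 = 39, comfortably above the 18 integers assumed to be needed.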

◆ events_ints_to_testsuite() [1/3]

size_t c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::DataType const * evts_ints,
ievt::DataType evts_ints_sz,
substr evts_testsuite )

Create a testsuite event string from integer events.

This overload receives a buffer where the string is to be written, and returns the size needed to accommodate the result. The size of the buffer is strictly respected. The caller must check that the returned size is not larger than the buffer's size to ensure that the result is complete. If it is larger, the caller must resize the buffer and call again.
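The calling convention described here (write what fits, return the size needed, let the caller resize and retry) can be sketched without rapidyaml. write_events_sketch is a hypothetical stand-in for the real function: it always produces the same fixed text, and it writes all or nothing, which is a simplification of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a function following the same convention:
// never writes past cap, and returns the size the full result requires.
inline std::size_t write_events_sketch(char *buf, std::size_t cap)
{
    static const char text[] = "+STR\n-STR\n";
    const std::size_t needed = sizeof(text) - 1; // exclude the terminator
    if(needed <= cap)
        std::memcpy(buf, text, needed);
    return needed;
}

// Caller side: try with the current buffer, grow once if it was too
// small, then trim to the exact size.
inline std::string call_with_retry()
{
    std::string out(4, '\0'); // deliberately too small
    std::size_t needed = write_events_sketch(&out[0], out.size());
    if(needed > out.size())
    {
        out.resize(needed);
        needed = write_events_sketch(&out[0], out.size());
    }
    out.resize(needed);
    return out;
}
```

Because the required size does not depend on the buffer, a single retry is enough.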

Definition at line 36 of file ints_to_testsuite.cpp.

size_t events_ints_to_testsuite(csubstr parsed_yaml,
                                csubstr arena,
                                ievt::DataType const* evts_ints,
                                ievt::DataType evts_ints_sz,
                                substr evts_test_suite)
{
    auto getstr = [&](ievt::DataType i){
        bool in_arena = evts_ints[i] & ievt::AREN;
        csubstr region = !in_arena ? parsed_yaml : arena;
        return region.sub((size_t)evts_ints[i+1], (size_t)evts_ints[i+2]);
    };
    size_t sz = 0;
    auto append = [&](csubstr s){
        size_t next = sz + s.len;
        if (s.len && (next <= evts_test_suite.len && evts_test_suite.len))
            memcpy(evts_test_suite.str + sz, s.str, s.len);
        sz = next;
    };
    bool has_tag = false;
    csubstr tag;
    auto maybe_append_tag = [&]{
        if(has_tag)
        {
            if(tag.begins_with('<'))
            {
                append(" ");
                append(tag);
            }
            else if(tag.begins_with("!<"))
            {
                append(" ");
                append(tag.sub(1));
            }
            else if(tag.begins_with('!'))
            {
                append(" <");
                append(tag);
                append(">");
            }
            else
            {
                append(" <!");
                append(tag);
                append(">");
            }
        }
        has_tag = false;
    };
    bool has_anchor = false;
    csubstr anchor;
    auto maybe_append_anchor = [&]{
        if(has_anchor)
        {
            append(" &");
            append(anchor);
        }
        has_anchor = false;
    };
    auto append_cont = [&](csubstr evt, csubstr style){
        append(evt);
        if(style.len)
        {
            append(" ");
            append(style);
        }
        maybe_append_anchor();
        maybe_append_tag();
        append("\n");
    };
    auto append_esc = [&](csubstr str){
        substr buf = sz <= evts_test_suite.len ? evts_test_suite.sub(sz) : evts_test_suite.last(0);
        sz += escape_scalar(buf, str);
        append("\n");
    };
    auto append_val = [&](csubstr evt, csubstr val){
        append("=VAL");
        maybe_append_anchor();
        maybe_append_tag();
        append(" ");
        append(evt);
        append_esc(val);
    };
    ievt::DataType evt = 0;
    for(ievt::DataType i = 0; i < evts_ints_sz; i += (evt & ievt::WSTR) ? 3 : 1)
    {
        evt = evts_ints[i];
        if(evt & ievt::SCLR)
        {
            csubstr s = getstr(i);
            if(evt & ievt::SQUO)
                append_val("'", s);
            else if(evt & ievt::DQUO)
                append_val("\"", s);
            else if(evt & ievt::LITL)
                append_val("|", s);
            else if(evt & ievt::FOLD)
                append_val(">", s);
            else //if(evt & ievt::PLAI)
                append_val(":", s);
        }
        else if((evt & ievt::BSEQ) == ievt::BSEQ)
        {
            if(evt & ievt::FLOW)
                append_cont("+SEQ", "[]");
            else
                append_cont("+SEQ", "");
        }
        else if((evt & ievt::ESEQ) == ievt::ESEQ)
        {
            append("-SEQ\n");
        }
        else if((evt & ievt::BMAP) == ievt::BMAP)
        {
            if(evt & ievt::FLOW)
                append_cont("+MAP", "{}");
            else
                append_cont("+MAP", "");
        }
        else if((evt & ievt::EMAP) == ievt::EMAP)
        {
            append("-MAP\n");
        }
        else if(evt & ievt::ALIA)
        {
            append("=ALI *");
            append(getstr(i));
            append("\n");
        }
        else if(evt & ievt::TAG_)
        {
            has_tag = true;
            tag = getstr(i);
        }
        else if(evt & ievt::ANCH)
        {
            has_anchor = true;
            anchor = getstr(i);
        }
        else if((evt & ievt::BDOC) == ievt::BDOC)
        {
            if(evt & ievt::EXPL)
                append("+DOC ---\n");
            else
                append("+DOC\n");
        }
        else if((evt & ievt::EDOC) == ievt::EDOC)
        {
            if(evt & ievt::EXPL)
                append("-DOC ...\n");
            else
                append("-DOC\n");
        }
        else if((evt & ievt::BSTR) == ievt::BSTR)
        {
            append("+STR\n");
        }
        else if((evt & ievt::ESTR) == ievt::ESTR)
        {
            append("-STR\n");
        }
    }
    return sz;
}
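The main loop above walks the integer buffer with a variable stride: an event that carries a string is followed by two more integers, an offset and a length into the source (or the arena). The sketch below isolates that traversal. The flag values and the names in namespace sketch are hypothetical and chosen only for this example; they are not the values of the real ievt flags, and the arena case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

using DataType = int32_t;

// Hypothetical bit values, for illustration only.
constexpr DataType BSEQ = 1 << 0; // begin seq
constexpr DataType ESEQ = 1 << 1; // end seq
constexpr DataType SCLR = 1 << 2; // scalar
constexpr DataType WSTR = SCLR;   // events followed by (offset, length)

// Walks the buffer with the same stride rule as the loop above and
// collects the text of every scalar event.
inline std::vector<std::string> collect_scalars(const std::string &src,
                                                const std::vector<DataType> &evts)
{
    std::vector<std::string> out;
    DataType evt = 0;
    for(std::size_t i = 0; i < evts.size(); i += (evt & WSTR) ? 3 : 1)
    {
        evt = evts[i];
        // guard against a truncated buffer before reading offset/length
        if((evt & SCLR) && i + 2 < evts.size())
            out.push_back(src.substr((std::size_t)evts[i + 1], (std::size_t)evts[i + 2]));
    }
    return out;
}

// Hand-written events for the source "[aa, bb]": "aa" is at offset 1 and
// "bb" at offset 5, both with length 2.
inline std::vector<std::string> collect_example()
{
    const std::string src = "[aa, bb]";
    const std::vector<DataType> evts = {BSEQ, SCLR, 1, 2, SCLR, 5, 2, ESEQ};
    return collect_scalars(src, evts);
}

} // namespace sketch
```

Storing offsets and lengths rather than copies of the strings is what keeps the representation compact, at the cost of requiring the source buffer to stay alive while the events are read.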

◆ events_ints_to_testsuite() [2/3]

template<class Container>
void c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::DataType const * evts_ints,
ievt::DataType evts_ints_sz,
Container * evts_testsuite )

Create a testsuite event string from integer events, writing into an output container.

Definition at line 39 of file ints_to_testsuite.hpp.

template<class Container>
void events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::DataType const* evts_ints, ievt::DataType evts_ints_sz, Container *evts_testsuite)
{
    size_t len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    if(len > evts_testsuite->size())
    {
        evts_testsuite->resize(len);
        len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    }
    evts_testsuite->resize(len);
}

◆ events_ints_to_testsuite() [3/3]

template<class Container>
Container c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::DataType const * evts_ints,
ievt::DataType evts_ints_sz )

Create a testsuite event string from integer events, returning a new container with the result.

Definition at line 58 of file ints_to_testsuite.hpp.

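template<class Container>
Container events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::DataType const* evts_ints, ievt::DataType evts_ints_sz)
{
    Container ret;
    events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, &ret);
    return ret;
}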

◆ events_ints_print()

void c4::yml::extra::events_ints_print ( csubstr parsed_yaml,
csubstr arena,
ievt::DataType const * evts,
ievt::DataType evts_sz )

Print integer events to stdout.

Definition at line 94 of file ints_utils.cpp.

void events_ints_print(csubstr parsed_yaml, csubstr arena, ievt::DataType const* evts, ievt::DataType evts_sz)
{
    char buf[200];
    for(ievt::DataType evtpos = 0, evtnumber = 0;
        evtpos < evts_sz;
        ++evtnumber,
            evtpos += ((evts[evtpos] & ievt::WSTR) ? 3 : 1))
    {
        ievt::DataType evt = evts[evtpos];
        csubstr flags = ievt::to_chars_sub(buf, evt);
        printf("[%d][%d] %.*s(0x%x)", evtnumber, evtpos, (int)flags.len, flags.str, evt);
        if (evt & ievt::WSTR)
        {
            bool in_arena = evt & ievt::AREN;
            csubstr region = !in_arena ? parsed_yaml : arena;
            bool safe = (evts[evtpos + 1] >= 0)
                && (evts[evtpos + 2] >= 0)
                && (evts[evtpos + 1] <= (int)region.len)
                && (evts[evtpos + 2] <= ((int)region.len - evts[evtpos + 1]));
            const char *str = safe ? (region.str + evts[evtpos + 1]) : "ERR!!!";
            int len = safe ? evts[evtpos + 2] : 6;
            printf(": %d [%d]~~~%.*s~~~", evts[evtpos+1], evts[evtpos+2], len, str);
            if(in_arena)
                printf(" (arenasz=%zu)", arena.len);
            else
                printf(" (srcsz=%zu)", parsed_yaml.len);
        }
        printf("\n");
    }
}
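The safe flag in the listing above guards against offsets and lengths that do not fit inside the region being read, so that a malformed buffer prints ERR!!! instead of reading out of bounds. The same check can be written as a small standalone predicate. is_safe_range is a hypothetical name, and the 64-bit arithmetic is a choice made for this sketch rather than what the listing does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when [offset, offset + length) lies inside a region of
// region_len characters. Subtracting instead of adding offset and length
// avoids overflow for large values.
inline bool is_safe_range(int32_t offset, int32_t length, std::size_t region_len)
{
    const int64_t n = (int64_t)region_len;
    return offset >= 0
        && length >= 0
        && offset <= n
        && length <= n - offset;
}
```

An empty range at the very end of the region is accepted, which matches the <= comparisons in the listing.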