rapidyaml 0.15.2
parse and emit YAML, and do it fast
Event Handlers

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Namespaces

namespace  c4::yml::extra::ievt

Classes

struct  c4::yml::EventHandlerStack< HandlerImpl, HandlerState >
 Use this class as a base for event handler implementations to simplify the stack logic.
struct  c4::yml::EventHandlerTree
 The event handler to create a ryml Tree.
struct  c4::yml::extra::EventHandlerInts
 A parser event handler that creates a compact representation of the YAML tree in a contiguous buffer of integers.

Typedefs

using c4::yml::extra::evt_size = int32_t
 data type for integer events size.

Functions

evt_size c4::yml::extra::estimate_events_ints_size (csubstr src)
 Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.
size_t c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz, substr evts_testsuite)
 Create a testsuite event string from integer events.
template<class Container>
void c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz, Container *evts_testsuite)
 Create a testsuite event string from integer events, writing into an output container.
template<class Container>
Container c4::yml::extra::events_ints_to_testsuite (csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz)
 Create a testsuite event string from integer events, returning a new container with the result.
void c4::yml::extra::events_ints_print (csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz)
 Print integer events to stdout.

Detailed Description

rapidyaml implements its parsing logic with a two-level model, where a ParseEngine object reads through the YAML source, and dispatches events to an EventHandler bound to the ParseEngine.

Because ParseEngine is templated on the event handler, the binding uses static polymorphism, without any virtual functions. The actual handler object can be changed at run time (but of course it must be of the type given as the template parameter). This makes for a very efficient architecture, and further enables users to provide their own custom handler if they wish to bypass the rapidyaml Tree.
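As a rough illustration of this kind of binding, the sketch below shows an engine templated on its handler type. MiniEngine, CountingHandler and all their members are hypothetical names made up for this example; they are not part of rapidyaml, and the real ParseEngine is far more involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler: any type offering these member functions will do.
struct CountingHandler
{
    int scalars = 0;
    std::string log;
    void begin_stream() { log += "+STR\n"; }
    void set_val_scalar_plain(std::string const& s) { ++scalars; log += "=VAL :" + s + "\n"; }
    void end_stream() { log += "-STR\n"; }
};

// Hypothetical engine: the handler type is a template parameter, so every
// handler call is resolved at compile time, without virtual dispatch.
template<class Handler>
struct MiniEngine
{
    Handler *handler = nullptr; // may be re-pointed at run time, same type only
    explicit MiniEngine(Handler *h) : handler(h) {}

    // Toy "parse": emit one scalar event per space-separated word.
    void parse(std::string const& src)
    {
        handler->begin_stream();
        std::size_t pos = 0;
        while(pos < src.size())
        {
            std::size_t end = src.find(' ', pos);
            if(end == std::string::npos)
                end = src.size();
            if(end > pos)
                handler->set_val_scalar_plain(src.substr(pos, end - pos));
            pos = end + 1;
        }
        handler->end_stream();
    }
};
```

In this sketch, swapping the handler object is a plain pointer assignment (engine.handler = &other), while swapping the handler type requires a different instantiation of the engine.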

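The following handlers implemented in this project are documented on this page:

c4::yml::EventHandlerTree, the event handler to create a ryml Tree.
c4::yml::extra::EventHandlerInts, which creates a compact representation of the YAML tree in a contiguous buffer of integers.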

Event model

The event model used by the parse engine and event handlers follows very closely the event model in the YAML test suite.

Consider for example this YAML,

{foo: bar,foo2: bar2}

which would produce these events in the test-suite parlance:

+STR
+DOC
+MAP {}
=VAL :foo
=VAL :bar
=VAL :foo2
=VAL :bar2
-MAP
-DOC
-STR

For reference, the ParseEngine object will produce this sequence of calls to its bound EventHandler:

handler.begin_stream();
handler.begin_doc();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("foo");
handler.set_val_scalar_plain("bar");
handler.add_sibling();
handler.set_key_scalar_plain("foo2");
handler.set_val_scalar_plain("bar2");
handler.end_map();
handler.end_doc();
handler.end_stream();
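To make the correspondence between handler calls and test-suite events concrete, here is a minimal sketch of a handler that logs one test-suite line per call. EventLogger and replay_example are hypothetical names, and the std::string parameters are an assumption made to keep the sketch self-contained; only the member function names are taken from the call sequence above.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler that records each call as a test-suite event line.
struct EventLogger
{
    std::string out;
    void begin_stream()       { out += "+STR\n"; }
    void end_stream()         { out += "-STR\n"; }
    void begin_doc()          { out += "+DOC\n"; }
    void end_doc()            { out += "-DOC\n"; }
    void begin_map_val_flow() { out += "+MAP {}\n"; }
    void end_map()            { out += "-MAP\n"; }
    void add_sibling()        {} // has no test-suite counterpart
    void set_key_scalar_plain(std::string const& s) { scalar(s); }
    void set_val_scalar_plain(std::string const& s) { scalar(s); }
private:
    // plain scalars are written with the ':' prefix
    void scalar(std::string const& s) { out += "=VAL :" + s + "\n"; }
};

// Replays the call sequence listed above for {foo: bar,foo2: bar2}.
inline std::string replay_example()
{
    EventLogger handler;
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
    return handler.out;
}
```

Note that add_sibling() leaves no trace in the log: in this sketch it only matters to handlers that build a node structure.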

For many other examples of all areas of YAML and how ryml's parse model corresponds to the YAML standard model, refer to the [unit tests for the parse engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).

Special events

Most of the parsing events adopted by rapidyaml in its event model are fairly obvious, but there are two less-obvious events requiring some explanation.

These events exist to make it easier to parse some special YAML cases. They are called by the parser when a just-handled value/container is actually the first key of a new map:

For example, consider an implicit map inside a seq: [a: b, c: d] which is parsed as [{a: b}, {c: d}]. The standard event sequence for this YAML would be the following:

handler.begin_seq_val_flow();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("a");
handler.set_val_scalar_plain("b");
handler.end_map();
handler.add_sibling();
handler.begin_map_val_flow();
handler.set_key_scalar_plain("c");
handler.set_val_scalar_plain("d");
handler.end_map();
handler.end_seq();

The problem with this event sequence is that it forces the parser to delay setting a scalar (in this case "a" and "c") until it knows whether the scalar is a key or a val, which means storing the scalar in the parser's internal state until then. In the example above, "a" and "c" turn out to be keys and not vals. The downside is that this complexity cost would apply even when there is no implicit map: every val in a seq would have to be delayed until one of the disambiguating subsequent tokens (any of , - ] :) is found.

By using the special event, the parser can avoid this complexity: it preemptively sets the scalar as a val, and a later call to the special event creates the map and rearranges the scalar as its first key. Now the cost applies only once: when an implicit map inside a seq (a seqimap) starts. So the following (easier and cheaper) event sequence has the same effect as the event sequence above:

handler.begin_seq_val_flow();
handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
handler.actually_as_new_map_key(); // create a map, move the "a" val as the key of the first child of the new map
handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
handler.end_map();
handler.set_val_scalar_plain("c"); // "c" also as val!
handler.actually_as_block_flow(); // likewise
handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
handler.end_map();
handler.end_seq();
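The rearrangement performed by the special event can be sketched with a toy handler. ToyNode, ToySeqHandler and to_flow are hypothetical names invented for this example, the event name actually_as_new_map_key() is taken from the listing above, and the sketch only covers scalars inside a single flow seq; it does not reflect how rapidyaml's own handlers are implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node: either a scalar val, or a map holding key/val children.
struct ToyNode
{
    bool is_map = false;
    std::string key, val;
    std::vector<ToyNode> children;
};

// Hypothetical handler for a single flow seq of scalars and implicit maps.
struct ToySeqHandler
{
    ToyNode seq;
    bool in_map = false; // true while the val of an implicit map is pending

    void begin_seq_val_flow() { seq = ToyNode{}; in_map = false; }
    void end_seq() {}
    void end_map() { in_map = false; }
    void set_val_scalar_plain(std::string const& s)
    {
        if(in_map)
        {
            seq.children.back().children.back().val = s;
            return;
        }
        ToyNode n;
        n.val = s; // preemptively stored as a val of the seq
        seq.children.push_back(n);
    }
    void actually_as_new_map_key()
    {
        ToyNode &last = seq.children.back();
        ToyNode entry;
        entry.key = last.val; // the val becomes the first key of a new map
        last.val.clear();
        last.is_map = true;
        last.children.push_back(entry);
        in_map = true;
    }
};

// Render the toy seq in flow style, for inspection.
inline std::string to_flow(ToyNode const& seq)
{
    std::string s = "[";
    for(std::size_t i = 0; i < seq.children.size(); ++i)
    {
        ToyNode const& n = seq.children[i];
        if(i)
            s += ", ";
        if(n.is_map)
            s += "{" + n.children[0].key + ": " + n.children[0].val + "}";
        else
            s += n.val;
    }
    return s + "]";
}
```

The point of the design is that the handler pays for the rearrangement only when the special event fires; plain vals take the cheap path in set_val_scalar_plain().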

This also applies to container keys (although ryml's tree cannot accommodate these): the parser can preemptively set a container as a val, and call this event to turn that container into a key. For example, consider this YAML:

[aa, bb]: [cc, dd]
# ^     ^ ^
# |     | |
# (2) (1) (3)  <- event sequence

The standard event sequence for this YAML would be the following:

handler.begin_map_val_block(); // (1)
handler.begin_seq_key_flow(); // (2)
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.begin_seq_val_flow(); // (3)
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

The problem with the sequence above is that, reading from left-to-right, the parser can only detect the proper calls at (1) and (2) once it reaches (1) in the YAML source. So, the parser would have to buffer the entire event sequence starting from the beginning until it reaches (1). Using this function, the parser can do instead:

handler.begin_seq_val_flow(); // (2) -- preemptively as val!
handler.set_val_scalar_plain("aa");
handler.add_sibling();
handler.set_val_scalar_plain("bb");
handler.end_seq();
handler.actually_as_new_map_key(); // (1) -- adjust when finding that the prev val was actually a key.
handler.begin_seq_val_flow(); // (3) -- go on as before
handler.set_val_scalar_plain("cc");
handler.add_sibling();
handler.set_val_scalar_plain("dd");
handler.end_seq();
handler.end_map();

Typedef Documentation

◆ evt_size

using c4::yml::extra::evt_size = int32_t

data type for integer events size.

This is set to an int32_t integer to allow compatibility with a wide range of processing languages.

Definition at line 38 of file event_handler_ints.hpp.

Function Documentation

◆ estimate_events_ints_size()

evt_size c4::yml::extra::estimate_events_ints_size ( csubstr src)

Read YAML source and, without undergoing a full parse, estimate the size of the integer buffer required for EventHandlerInts.

This estimation is meant to exceed the actual number of required events.

Note
This function must overpredict. It does so for every case in the hundreds/thousands of extensive tests of rapidyaml – both for the YAML test suite and the internal cases. If you find a case where that does not hold, it is a bug. Please report it at https://github.com/biojppm/rapidyaml/issues!

Definition at line 25 of file event_handler_ints.cpp.

evt_size estimate_events_ints_size(csubstr src)
{
    evt_size count = 7; // BSTR + BDOC + =VAL + EDOC + ESTR
    for(size_t i = 0; i < src.len; ++i)
    {
        switch(src.str[i])
        {
        case ':': // this has strings preceding/following it
        case ',': // overestimate, assume map
        case '%': // assume TAGD->string + TAGV->string
            count += 6;
            break;
        // these have (or are likely to have) a string following it
        case '-':
        case '&':
        case '*':
        case '<':
        case '!':
        case '\'':
        case '"':
        case '|':
        case '>':
        case '\n':
            count += 3;
            break;
        case '[':
        case ']':
            count += 4;
            break;
        case '{':
        case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        }
    }
    return count;
}
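As a worked example of the arithmetic, the stand-alone sketch below mirrors the heuristic above using std::string instead of csubstr. The name estimate_sketch is hypothetical, and the function is only a transcription for illustration, not the library entry point. For the flow map {foo: bar,foo2: bar2} it yields 7 + 7 ('{') + 6 (':') + 6 (',') + 6 (':') + 7 ('}') = 39.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-alone mirror of the estimation heuristic.
inline std::int32_t estimate_sketch(std::string const& src)
{
    std::int32_t count = 7; // baseline for stream, doc and one scalar
    for(char c : src)
    {
        switch(c)
        {
        case ':': case ',': case '%':
            count += 6;
            break;
        case '-': case '&': case '*': case '<': case '!':
        case '\'': case '"': case '|': case '>': case '\n':
            count += 3;
            break;
        case '[': case ']':
            count += 4;
            break;
        case '{': case '}':
            count += 7;
            break;
        case '?':
            count += 5;
            break;
        default: // every other character adds nothing
            break;
        }
    }
    return count;
}
```

Because only single characters are inspected, the estimate is cheap to compute, at the price of counting characters that appear inside scalars as if they were syntax.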

Referenced by estimate_events_ints_size().

◆ events_ints_to_testsuite() [1/3]

size_t c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::evt_bits const * evts_ints,
ievt::evt_bits evts_ints_sz,
substr evts_testsuite )

Create a testsuite event string from integer events.

This overload receives a buffer where the string is to be written, and returns the size needed to accommodate the result. The size of the buffer is strictly respected: nothing is ever written past its end. The caller must check that the returned size does not exceed the buffer's size to ensure that the result is complete. If it does, the caller must resize the buffer and call again.
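The resulting call, check, resize and call-again protocol can be sketched as follows. Both write_events and call_with_retry are hypothetical stand-ins written for this example: write_events imitates the contract with a fixed payload, and unlike the real function it writes nothing at all when the result does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical writer with a similar contract: never writes past cap,
// and always returns the size needed for the complete result.
inline std::size_t write_events(char *buf, std::size_t cap)
{
    static const char msg[] = "+STR\n-STR\n";
    const std::size_t needed = sizeof(msg) - 1; // exclude the terminator
    if(needed <= cap)
        std::memcpy(buf, msg, needed);
    return needed;
}

// First attempt with a small buffer; retry once with the reported size.
inline std::string call_with_retry()
{
    std::string out(4, '\0'); // deliberately too small
    std::size_t needed = write_events(&out[0], out.size());
    if(needed > out.size())
    {
        out.resize(needed);
        needed = write_events(&out[0], out.size());
    }
    out.resize(needed); // trim in case the buffer was larger than needed
    return out;
}
```

A second call is only needed when the first buffer was too small, so a caller with a good size guess pays for a single pass.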

Definition at line 33 of file ints_to_testsuite.cpp.

size_t events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz, substr evts_test_suite)
{
    auto getstr = [&](ievt::evt_bits i){
        bool in_arena = evts_ints[i] & ievt::AREN;
        csubstr region = !in_arena ? parsed_yaml : arena;
        return region.sub((size_t)evts_ints[i+1], (size_t)evts_ints[i+2]);
    };
    size_t sz = 0;
    auto append = [&](csubstr s){
        size_t next = sz + s.len;
        if (s.len && (next <= evts_test_suite.len && evts_test_suite.len))
            memcpy(evts_test_suite.str + sz, s.str, s.len);
        sz = next;
    };
    bool has_tag = false;
    csubstr tag;
    auto maybe_append_tag = [&]{
        if(has_tag)
        {
            if(tag.begins_with('<'))
            {
                append(" ");
                append(tag);
            }
            else
            {
                RYML_ASSERT_BASIC_(tag.begins_with('!'));
                append(" <");
                append(tag);
                append(">");
            }
        }
        has_tag = false;
    };
    bool has_anchor = false;
    csubstr anchor;
    auto maybe_append_anchor = [&]{
        if(has_anchor)
        {
            append(" &");
            append(anchor);
        }
        has_anchor = false;
    };
    auto append_cont = [&](csubstr evt, csubstr style){
        append(evt);
        if(style.len)
        {
            append(" ");
            append(style);
        }
        maybe_append_anchor();
        maybe_append_tag();
        append("\n");
    };
    auto append_esc = [&](csubstr str){
        substr buf = sz <= evts_test_suite.len ? evts_test_suite.sub(sz) : evts_test_suite.last(0);
        sz += escape_scalar(buf, str);
        append("\n");
    };
    auto append_val = [&](csubstr evt, csubstr val){
        append("=VAL");
        maybe_append_anchor();
        maybe_append_tag();
        append(" ");
        append(evt);
        append_esc(val);
    };
    ievt::evt_bits evt = 0;
    for(ievt::evt_bits i = 0; i < evts_ints_sz; i += (evt & ievt::WSTR) ? 3 : 1)
    {
        evt = evts_ints[i];
        if(evt & ievt::SCLR)
        {
            csubstr s = getstr(i);
            if(evt & ievt::SQUO)
                append_val("'", s);
            else if(evt & ievt::DQUO)
                append_val("\"", s);
            else if(evt & ievt::LITL)
                append_val("|", s);
            else if(evt & ievt::FOLD)
                append_val(">", s);
            else //if(evt & ievt::PLAI)
                append_val(":", s);
        }
        else if((evt & ievt::BSEQ) == ievt::BSEQ)
        {
            if(evt & ievt::FLOW)
                append_cont("+SEQ", "[]");
            else
                append_cont("+SEQ", "");
        }
        else if((evt & ievt::ESEQ) == ievt::ESEQ)
        {
            append("-SEQ\n");
        }
        else if((evt & ievt::BMAP) == ievt::BMAP)
        {
            if(evt & ievt::FLOW)
                append_cont("+MAP", "{}");
            else
                append_cont("+MAP", "");
        }
        else if((evt & ievt::EMAP) == ievt::EMAP)
        {
            append("-MAP\n");
        }
        else if(evt & ievt::ALIA)
        {
            append("=ALI *");
            append(getstr(i));
            append("\n");
        }
        else if(evt & ievt::TAG_)
        {
            has_tag = true;
            tag = getstr(i);
        }
        else if(evt & ievt::ANCH)
        {
            has_anchor = true;
            anchor = getstr(i);
        }
        else if((evt & ievt::BDOC) == ievt::BDOC)
        {
            if(evt & ievt::EXPL)
                append("+DOC ---\n");
            else
                append("+DOC\n");
        }
        else if((evt & ievt::EDOC) == ievt::EDOC)
        {
            if(evt & ievt::EXPL)
                append("-DOC ...\n");
            else
                append("-DOC\n");
        }
        else if((evt & ievt::BSTR) == ievt::BSTR)
        {
            append("+STR\n");
        }
        else if((evt & ievt::ESTR) == ievt::ESTR)
        {
            append("-STR\n");
        }
    }
    return sz;
}
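The loop above steps by three integers for events that carry a string (the event itself, then an offset and a length into the source or the arena) and by one integer otherwise. The sketch below isolates that stride logic. The toy namespace, its bit values and collect_scalars are hypothetical and chosen only for this example; the real ievt flag values may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace toy {

using evt_bits = std::int32_t;

// Hypothetical bit values for this sketch only.
constexpr evt_bits BSEQ = 1 << 0;
constexpr evt_bits ESEQ = 1 << 1;
constexpr evt_bits SCLR = 1 << 2;
constexpr evt_bits WSTR = SCLR; // here, only scalars carry a string

// Walk the buffer and collect the string of every scalar event.
inline std::vector<std::string> collect_scalars(std::string const& src,
                                                std::vector<evt_bits> const& evts)
{
    std::vector<std::string> result;
    evt_bits evt = 0;
    // the stride depends on the event that was just read
    for(std::size_t i = 0; i < evts.size(); i += (evt & WSTR) ? 3 : 1)
    {
        evt = evts[i];
        if(evt & SCLR)
        {
            // evts[i+1] is the offset and evts[i+2] the length
            result.push_back(src.substr((std::size_t)evts[i+1], (std::size_t)evts[i+2]));
        }
    }
    return result;
}

} // namespace toy
```

Since the strings are stored as offset and length pairs, the integer buffer stays compact and refers back to the parsed source instead of copying it.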

Referenced by events_ints_to_testsuite(), events_ints_to_testsuite(), and events_ints_to_testsuite().

◆ events_ints_to_testsuite() [2/3]

template<class Container>
void c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::evt_bits const * evts_ints,
ievt::evt_bits evts_ints_sz,
Container * evts_testsuite )

Create a testsuite event string from integer events, writing into an output container.

Definition at line 39 of file ints_to_testsuite.hpp.

template<class Container>
void events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz, Container *evts_testsuite)
{
    size_t len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    if(len > evts_testsuite->size())
    {
        evts_testsuite->resize(len);
        len = events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, to_substr(*evts_testsuite));
    }
    evts_testsuite->resize(len);
}

◆ events_ints_to_testsuite() [3/3]

template<class Container>
Container c4::yml::extra::events_ints_to_testsuite ( csubstr parsed_yaml,
csubstr arena,
ievt::evt_bits const * evts_ints,
ievt::evt_bits evts_ints_sz )

Create a testsuite event string from integer events, returning a new container with the result.

Definition at line 58 of file ints_to_testsuite.hpp.

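template<class Container>
Container events_ints_to_testsuite(csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts_ints, ievt::evt_bits evts_ints_sz)
{
    Container ret;
    events_ints_to_testsuite(parsed_yaml, arena, evts_ints, evts_ints_sz, &ret);
    return ret;
} // LCOV_EXCL_LINE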

◆ events_ints_print()

void c4::yml::extra::events_ints_print ( csubstr parsed_yaml,
csubstr arena,
ievt::evt_bits const * evts,
ievt::evt_bits evts_sz )

Print integer events to stdout.

Definition at line 103 of file ints_utils.cpp.

void events_ints_print(csubstr parsed_yaml, csubstr arena, ievt::evt_bits const *evts, ievt::evt_bits evts_sz)
{
    char buf[200];
    for(ievt::evt_bits evtpos = 0, evtnumber = 0;
        evtpos < evts_sz;
        ++evtnumber,
        evtpos += ((evts[evtpos] & ievt::WSTR) ? 3 : 1))
    {
        ievt::evt_bits evt = evts[evtpos];
        csubstr flags = ievt::to_str_sub(buf, evt);
        printf("[%d][%d] %.*s(0x%x)", evtnumber, evtpos, (int)flags.len, flags.str, evt);
        if (evt & ievt::WSTR)
        {
            bool in_arena = evt & ievt::AREN;
            csubstr region = !in_arena ? parsed_yaml : arena;
            bool safe = (evts[evtpos + 1] >= 0)
                && (evts[evtpos + 2] >= 0)
                && (evts[evtpos + 1] <= (ievt::evt_bits)region.len) // NOLINT
                && (evts[evtpos + 2] <= ((ievt::evt_bits)region.len - evts[evtpos + 1]));
            const char *str = safe ? (region.str + evts[evtpos + 1]) : "ERR!!!";
            ievt::evt_bits len = safe ? evts[evtpos + 2] : 6;
            printf(": %d [%d]~~~%.*s~~~", evts[evtpos+1], evts[evtpos+2], len, str);
            if(in_arena)
                printf(" (arenasz=%zu)", arena.len);
            else
                printf(" (srcsz=%zu)", parsed_yaml.len);
        }
        printf("\n");
    }
}