A parser event handler that creates a compact representation of the YAML tree in a contiguous buffer of integers.
| | EventHandlerInts (c4::yml::Callbacks const &cb) |
| |
| | EventHandlerInts () |
| |
| void | reset (substr str, substr arena, ievt::DataType *dst, int32_t dst_size) |
| |
| int | required_size_events () const |
| | get the size needed for the event buffer from the previous parse
| |
| size_t | required_size_arena () const |
| | get the size needed for the arena from the previous parse
| |
| bool | fits_buffers () const |
| | Predicate to test if the event and arena buffers successfully accommodated all the parse events.
| |
| void | reserve_arena (int) |
| |
| TagDirectives & | tag_directives () |
| |
| TagCache & | tag_cache () |
| |
|
| void | start_parse (const char *filename, substr src) |
| |
| void | finish_parse () |
| |
| void | cancel_parse () |
| |
|
| void | begin_stream () |
| |
| void | end_stream () |
| |
|
| void | begin_doc () |
| | implicit doc start (without ---)
| |
| void | end_doc () |
| | implicit doc end (without ...)
| |
| void | begin_doc_expl () |
| | explicit doc start, with ---
| |
| void | end_doc_expl () |
| | explicit doc end, with ...
| |
|
| void | begin_map_key_flow () |
| |
| void | begin_map_key_block () |
| |
| void | begin_map_val_flow () |
| |
| void | begin_map_val_block () |
| |
| void | end_map_block () |
| |
| void | end_map_flow (bool) |
| |
|
| void | begin_seq_key_flow () |
| |
| void | begin_seq_key_block () |
| |
| void | begin_seq_val_flow () |
| |
| void | begin_seq_val_block () |
| |
| void | end_seq_block () |
| |
| void | end_seq_flow (bool) |
| |
|
| void | add_sibling () |
| |
| void | actually_val_is_first_key_of_new_map_flow () |
| | set the previous val as the first key of a new map, with flow style.
| |
| void | actually_val_is_first_key_of_new_map_block () |
| | like its flow counterpart, but this function can only be called after the end of a flow-val at root or doc level.
| |
|
| void | set_key_scalar_plain_empty () |
| |
| void | set_val_scalar_plain_empty () |
| |
| void | set_key_scalar_plain (csubstr scalar) |
| |
| void | set_val_scalar_plain (csubstr scalar) |
| |
| void | set_key_scalar_dquoted (csubstr scalar) |
| |
| void | set_val_scalar_dquoted (csubstr scalar) |
| |
| void | set_key_scalar_squoted (csubstr scalar) |
| |
| void | set_val_scalar_squoted (csubstr scalar) |
| |
| void | set_key_scalar_literal (csubstr scalar) |
| |
| void | set_val_scalar_literal (csubstr scalar) |
| |
| void | set_key_scalar_folded (csubstr scalar) |
| |
| void | set_val_scalar_folded (csubstr scalar) |
| |
| void | mark_key_scalar_unfiltered () |
| |
| void | mark_val_scalar_unfiltered () |
| |
|
| void | set_key_anchor (csubstr anchor) |
| |
| void | set_val_anchor (csubstr anchor) |
| |
| void | set_key_ref (csubstr ref) |
| |
| void | set_val_ref (csubstr ref) |
| |
|
| void | set_key_tag (csubstr tag) |
| |
| void | set_val_tag (csubstr tag) |
| |
|
| void | add_directive_yaml (csubstr yaml_version) |
| |
| void | add_directive_tag (csubstr handle, csubstr prefix) |
| |
|
| substr | arena () |
| |
| substr | arena_rem () |
| |
| substr | alloc_arena (size_t len) |
| | this may fail, in which case an empty string is returned
| |
|
| void | _push () |
| | push a new parent, add a child to the new parent, and set the child as the current node
| |
| void | _pop () |
| | end the current scope
| |
| template<c4::yml::type_bits bits> |
| void | _enable__ () noexcept |
| |
| template<c4::yml::type_bits bits> |
| void | _disable__ () noexcept |
| |
| template<c4::yml::type_bits bits> |
| bool | _has_any__ () const noexcept |
| |
| int32_t | _next (int32_t pos) const noexcept |
| |
| int32_t | _prev (int32_t pos) const noexcept |
| |
| bool | _is_sub_ (csubstr str) const noexcept |
| |
| void | _send_flag_only_ (ievt::DataType flags) |
| |
| void | _send_str_ (csubstr scalar, ievt::DataType flags) |
| |
| void | _mark_parent_with_children_ () |
| |
| csubstr | _get_latest_empty_scalar () const |
| |
| int32_t | _find_last_bdoc (int32_t pos) const |
| |
| int32_t | _find_matching_open (ievt::DataType open, ievt::DataType close, int32_t pos) const |
| |
| int32_t | _extend_left_to_include_tag_and_or_anchor (int32_t pos) const |
| |
A parser event handler that creates a compact representation of the YAML tree in a contiguous buffer of integers.
The integers are ievt::EventFlags containing masks (to represent events), interleaved with offset+length (to represent strings in the source buffer).
This is meant for use by other programming languages, and supports container keys (unlike the ryml tree). It parses faster than the ryml tree parser, because the resulting data structure is much simpler.
The resulting integer buffer is a linear array of integers containing events (as a mask of ievt::EventFlags), which in some cases (see ievt::WSTR) are followed by an encoded string (encoded as an offset and length to the parsed source buffer).
For example, parsing [a, bb, ccc] results in an event buffer of 15 integers: nine events, three of which (the scalars a, bb and ccc) are each followed by an offset and a length.
Here is a sketch clarifying the meaning of this event sequence:
source : [a, bb, ccc]

                                              has a string........
                                              |              offset "a"
                                              |              |  length "a"
                                              |              |  |
          event0   event1   event2 [          event3 "a".....|..|
          |        |        |                 |              |  |
(start)   +--------+--------+-----------------+--------------+--+-----> (continued)
i :       0        1        2                 3              4  5

               has a string.............           has a string.............
               |                    offset "bb"    |                    offset "ccc"
               |                    |  length "bb" |                    |  length "ccc"
               |                    |  |           |                    |  |
               event4 "bb"..........|..|           event5 "ccc".........|..|
               |                    |  |           |                    |  |
(cont)--> -----+--------------------+--+-----------+--------------------+--+-----> (continued)
i :            6 |                  7  8           9 |                  10 11
                 |                                   |
                 prev event has string               prev event has string
                 (to get to prev, jump               (to get to prev, jump
                 back 3 slots: ie 6->3)              back 3 slots: ie 9->6)

               event6 ]      event7   event8
               |             |        |
(cont)--> -----+-------------+--------+-----| (end)
i :            12 |          13       14
                  |
                  prev event has string
                  (to get to it, jump
                  back 3 slots: ie 12->9)
Note that the buffer contains both events and strings encoded as integer pairs. That is, events that have an associated string are immediately followed by two integers providing the offset and length of that string in the source buffer. (In the example above, this happens in the events for the strings a, bb, and ccc at positions 3, 6 and 9, respectively.)
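As a minimal sketch of how a consumer might decode one of those integer pairs, the snippet below slices the source using the offset and length stored after an event. It uses std::string_view instead of ryml's csubstr so that it stands on its own, and the helper name decode_str is hypothetical (it is not part of ryml).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of ryml: given the position of an event that
// has a string, read the two integers after it and slice the source buffer.
inline std::string_view decode_str(std::string_view src, const int *events, int pos)
{
    const std::size_t offset = static_cast<std::size_t>(events[pos + 1]);
    const std::size_t length = static_cast<std::size_t>(events[pos + 2]);
    assert(offset + length <= src.size()); // the pair must stay inside the source
    return src.substr(offset, length);
}
```

Counting characters in [a, bb, ccc], the pairs stored after positions 3, 6 and 9 would be (1, 1), (4, 2) and (8, 3), so decoding them yields a, bb and ccc.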
The flag ievt::PSTR and the mask ievt::WSTR are provided to enable easier iteration over the array: you can use them to test for presence of a string when iterating over the array.
The flag ievt::PSTR announces that an event is preceded by a string. That is, the previous event has a string, so that when this flag is found while iterating right-to-left, a jump of -3 should be used to get at the bitmask of the previous event. (In the example above, this flag is present for the events for bb and ccc, and for the closing ] that follows ccc, but not for a, because a is not preceded by a string.)
Likewise, to signify that the current event is followed by a string, there is the mask ievt::WSTR, which is a mask of all the flags of events that have a string: ievt::SCLR, ievt::ALIA, ievt::ANCH and ievt::TAG_. While iterating left-to-right in the array, presence of any of the bits in the mask ievt::WSTR means that a jump of +3 should be employed to get at the bitmask of the next event.
Here's another example with the result of parsing a: bb
Typical code to iterate left-to-right over the array will look like this:
{c++}
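substr src = ...;            // the parsed source buffer
const int events[] = {...};  // the event buffer
int events_size = ...;       // number of integers in the event buffer
for(int i = 0; i < events_size; ++i)
{
    if(events[i] & ievt::WSTR) // this event is followed by a string
    {
        size_t offset = (size_t)events[i+1];
        size_t length = (size_t)events[i+2];
        // assuming the string lives in the source buffer
        csubstr str = src.sub(offset, length);
        ...
        i += 2; // skip the offset and the length
    }
    else
    {
        ...
    }
}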
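Iterating right-to-left follows the same pattern, using ievt::PSTR instead of ievt::WSTR. The sketch below is a standalone illustration of the two jump rules. The bit values are made-up stand-ins so that it compiles without ryml (in ryml, ievt::WSTR is a mask of several flags rather than a single bit), and the function names are hypothetical. It also assumes that the last slot of the buffer is an event without a string, as in the [a, bb, ccc] example.

```cpp
#include <vector>

// Stand-in bit values so that the sketch compiles on its own. These are NOT
// ryml's values; real code should use ievt::WSTR and ievt::PSTR.
constexpr int WSTR = 1 << 0; // assumption: this event is followed by a string
constexpr int PSTR = 1 << 1; // assumption: the previous event has a string

// Position of the next event: skip the offset+length pair when present.
inline int next_event(const int *events, int pos)
{
    return (events[pos] & WSTR) ? pos + 3 : pos + 1;
}

// Position of the previous event: jump back over the pair when PSTR is set.
inline int prev_event(const int *events, int pos)
{
    return (events[pos] & PSTR) ? pos - 3 : pos - 1;
}

// Collect the event positions right-to-left, starting at the last slot.
// Assumes the last slot is an event without a string.
inline std::vector<int> positions_right_to_left(const int *events, int size)
{
    std::vector<int> out;
    for(int pos = size - 1; pos >= 0; pos = prev_event(events, pos))
        out.push_back(pos);
    return out;
}
```

For a buffer laid out like the [a, bb, ccc] example, the positions visited would be 14, 13, 12, 9, 6, 3, 2, 1 and 0.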
This handler must be initialized with the input source buffer, the output arena, and the output event buffer. This handler will not take ownership of the output buffer, nor attempt to resize it. If the size required for the output buffer or for the arena is larger than its actual size, parsing goes all the way to the end, determining the required buffer sizes without writing anything past the end of the respective buffer. After parsing is finished, the user must ensure that the buffer size was enough to accommodate all the data that needs to be written into it, or react accordingly (eg, throw an error, or resize the buffer then retry the parse).
A couple of functions will be helpful to do this. After parsing, EventHandlerInts::fits_buffers() must be used to verify that the output buffers were large enough to accommodate the results. Then, EventHandlerInts::required_size_events() and EventHandlerInts::required_size_arena() can be used to retrieve the required sizes. To get an estimate of the number of events before parsing, see estimate_events_ints_size().
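The general technique behind this behaviour (keep counting after the buffer is full, but stop writing) can be sketched as follows. This is an illustration only and not ryml's implementation; the type BoundedSink and its members are hypothetical names chosen for the sketch.

```cpp
#include <cstdint>

// Illustrative only: a bounded sink that keeps counting after it is full.
// It mirrors the behaviour described above, not ryml's actual code.
struct BoundedSink
{
    int32_t *buf;     // caller-owned storage; never resized here
    int32_t capacity; // number of slots available in buf
    int32_t pos = 0;  // number of slots requested so far

    void push(int32_t value)
    {
        if(pos < capacity)
            buf[pos] = value; // write only while there is room
        ++pos;                // always count, so the required size is known
    }
    bool fits() const { return pos <= capacity; }
    int32_t required_size() const { return pos; }
};
```

With this design a single pass is enough to learn the required size, at the cost of having to repeat the pass when the buffer turns out to be too small.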
Typical code to parse YAML with this handler will look like this:
{c++}
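csubstr filename = ...;
substr src = ...; // the YAML source; mutated in place by the parse
extra::EventHandlerInts handler;
ParseEngine<extra::EventHandlerInts> parser(&handler);
// size the event buffer from the estimate
int32_t estimated_size = extra::estimate_events_ints_size(src);
std::vector<int> evts;
std::vector<char> arena(src.len); // the arena size here is only a guess
evts.resize((size_t)estimated_size);
handler.reset(src, to_substr(arena), evts.data(), (int)evts.size());
parser.parse_in_place_ev(filename, src);
if(handler.fits_buffers())
{
    // trim the buffer to the size actually used
    evts.resize((size_t)handler.required_size_events());
    ...
}
else
{
    error("buffer could not accommodate all the events");
}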
The result of estimate_events_ints_size() is meant to be an overprediction: it overpredicts in every one of the many hundreds of cases covered by the unit tests. This is deliberate, and aims at ensuring that a retry parse is not needed. But conceivably, it may underpredict in some instances not covered by our tests. What to do then?
First, open an issue to allow the estimation to be improved! Second, there are two ways to handle this situation in code:
1) throw an error (as sketched above)
2) grow the buffer to the required size (see EventHandlerInts::required_size_events()), and then parse again
If your code must handle every case, including one where the estimate undershoots and the estimate function has not yet been fixed (that is, if you are considering a parse retry), there is something important that needs attention. The YAML source buffer is mutated in place during the parse, and cannot be used to parse again. So if you want to retry, you need to keep a pristine copy of the source, and use it for the retry:
{c++}
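csubstr filename = ...;
const std::string src = ...;  // pristine copy, never handed to the parser
std::string parsed_src = src; // working copy, mutated by the parse
int32_t estimated_size = extra::estimate_events_ints_size(to_csubstr(src));
std::vector<int> evts((size_t)estimated_size);
std::vector<char> arena(src.size());
extra::EventHandlerInts handler;
ParseEngine<extra::EventHandlerInts> parser(&handler);
handler.reset(to_substr(parsed_src), to_substr(arena), evts.data(), (int)evts.size());
parser.parse_in_place_ev(filename, to_substr(parsed_src));
if(handler.fits_buffers())
{
    evts.resize((size_t)handler.required_size_events());
    ...
}
else
{
    // grow both buffers to the sizes reported by the failed parse
    evts.resize((size_t)handler.required_size_events());
    arena.resize(handler.required_size_arena());
    // restore the source from the pristine copy, then parse again
    parsed_src = src;
    handler.reset(to_substr(parsed_src), to_substr(arena), evts.data(), (int)evts.size());
    parser.parse_in_place_ev(filename, to_substr(parsed_src));
    assert(handler.fits_buffers());
}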
When bringing this to other programming languages, the semantics will be very similar to this.
Definition at line 436 of file event_handler_ints.hpp.