rapidyaml 0.15.2
parse and emit YAML, and do it fast
parse_engine.hpp
#ifndef C4_YML_PARSE_ENGINE_HPP_
#define C4_YML_PARSE_ENGINE_HPP_

#ifndef C4_YML_PARSER_STATE_HPP_
#include "c4/yml/parser_state.hpp"
#endif
#ifndef C4_YML_PARSE_OPTIONS_HPP_
#include "c4/yml/parse_options.hpp"
#endif
#ifndef C4_YML_FWD_HPP_
#include "c4/yml/fwd.hpp"
#endif


#if defined(_MSC_VER)
#   pragma warning(push)
#   pragma warning(disable: 4251/*needs to have dll-interface to be used by clients of struct*/)
#endif

// NOLINTBEGIN(hicpp-signed-bitwise)

namespace c4 {
namespace yml {

/** @addtogroup doc_parse
 * @{ */

/** @defgroup doc_event_handlers Event Handlers
 *
 * @brief rapidyaml implements its parsing logic with a two-level
 * model, where a @ref ParseEngine object reads through the YAML
 * source and dispatches events to an EventHandler bound to the @ref
 * ParseEngine. Because @ref ParseEngine is templated on the event
 * handler, the binding uses static polymorphism, without any virtual
 * functions. The actual handler object can be changed at run time
 * (but it must be of the type given as the template parameter).
 * This is an efficient architecture, and it also lets users
 * provide their own custom handler if they wish to bypass the
 * rapidyaml @ref Tree.
 *
 * The following handlers are implemented in this project:
 *
 * - @ref EventHandlerTree is the handler responsible for creating the
 * ryml @ref Tree . This is part of the library.
 *
 * - Extra handlers (not part of the library, but provided as extra classes):
 *
 * - @ref extra::EventHandlerInts parses YAML into a contiguous
 * integer array representing the YAML structure.
 * - [play.yaml.com](https://play.yaml.com/)
 * - [matrix.yaml.info/](https://matrix.yaml.info/)
 * - the CI of this project.
 *
 *
 * ### Event model
 *
 * The event model used by the parse engine and event handlers follows
 * very closely the event model in the [YAML test
 * suite](https://github.com/yaml/yaml-test-suite).
 *
 * Consider for example this YAML,
 * ```yaml
 * {foo: bar,foo2: bar2}
 * ```
 * which would produce these events in the test-suite parlance:
 * ```
 * +STR
 * +DOC
 * +MAP {}
 * =VAL :foo
 * =VAL :bar
 * =VAL :foo2
 * =VAL :bar2
 * -MAP
 * -DOC
 * -STR
 * ```
 *
 * For reference, the @ref ParseEngine object will produce this
 * sequence of calls to its bound EventHandler:
 * ```cpp
 * handler.begin_stream();
 * handler.begin_doc();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("foo");
 * handler.set_val_scalar_plain("bar");
 * handler.add_sibling();
 * handler.set_key_scalar_plain("foo2");
 * handler.set_val_scalar_plain("bar2");
 * handler.end_map();
 * handler.end_doc();
 * handler.end_stream();
 * ```
 *
 * For many other examples of all areas of YAML and how ryml's parse
 * model corresponds to the YAML standard model, refer to the [unit
 * tests for the parse
 * engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
 *
 *
 * ### Special events
 *
 * Most of the parsing events adopted by rapidyaml in its event model
 * are fairly obvious, but there are two less-obvious events requiring
 * some explanation.
 *
 * These events exist to make it easier to parse some special YAML
 * cases. They are called by the parser when a just-handled
 * value/container is actually the first key of a new map:
 *
 * - `actually_val_is_first_key_of_new_map_flow()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTree" / @ref EventHandlerInts::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerInts")
 * - `actually_val_is_first_key_of_new_map_block()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTree" / @ref EventHandlerInts::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerInts")
 *
 * For example, consider an implicit map inside a seq: `[a: b, c:
 * d]` which is parsed as `[{a: b}, {c: d}]`. The standard event
 * sequence for this YAML would be the following:
 * ```cpp
 * handler.begin_seq_val_flow();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("a");
 * handler.set_val_scalar_plain("b");
 * handler.end_map();
 * handler.add_sibling();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("c");
 * handler.set_val_scalar_plain("d");
 * handler.end_map();
 * handler.end_seq();
 * ```
 * The problem with this event sequence is that the parser cannot
 * emit anything for a scalar (in this case "a" and "c") until it
 * knows whether that scalar is a key or a val, so it would have to
 * store the scalar in its internal state until then. The downside
 * is that this complexity cost would apply even when there is no
 * implicit map: every val in a seq would have to be delayed until
 * one of the disambiguating subsequent tokens `,-]:` is found.
 * With the special event, the parser avoids this complexity by
 * preemptively setting the scalar as a val. When it then finds
 * that the scalar was in fact a key, it calls the event, and the
 * handler creates the map and rearranges the scalar as its first
 * key. Now the cost applies only once: when a seqimap (an implicit
 * map in a seq) starts. So the following (easier and cheaper) event
 * sequence has the same effect as the event sequence above:
 * ```cpp
 * handler.begin_seq_val_flow();
 * handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
 * handler.actually_val_is_first_key_of_new_map_flow(); // create a map, move the "a" val as the key of the first child of the new map
 * handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
 * handler.end_map();
 * handler.add_sibling();
 * handler.set_val_scalar_plain("c"); // "c" also as val!
 * handler.actually_val_is_first_key_of_new_map_flow(); // likewise
 * handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
 * handler.end_map();
 * handler.end_seq();
 * ```
 * This also applies to container keys (although ryml's tree
 * cannot accommodate these): the parser can preemptively set a
 * container as a val, and call this event to turn that container
 * into a key. For example, consider this yaml:
 * ```yaml
 * [aa, bb]: [cc, dd]
 * # ^     ^  ^
 * # |     |  |
 * # (2)  (1) (3)   <- event sequence
 * ```
 * The standard event sequence for this YAML would be the
 * following:
 * ```cpp
 * handler.begin_map_val_block(); // (1)
 * handler.begin_seq_key_flow(); // (2)
 * handler.set_val_scalar_plain("aa");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("bb");
 * handler.end_seq();
 * handler.begin_seq_val_flow(); // (3)
 * handler.set_val_scalar_plain("cc");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("dd");
 * handler.end_seq();
 * handler.end_map();
 * ```
 * The problem with the sequence above is that, reading from
 * left to right, the parser can only detect the proper calls at
 * (1) and (2) once it reaches (1) in the YAML source. So, the
 * parser would have to buffer the entire event sequence starting
 * from the beginning until it reaches (1). Using the special
 * event, the parser can do instead:
 * ```cpp
 * handler.begin_seq_val_flow(); // (2) -- preemptively as val!
 * handler.set_val_scalar_plain("aa");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("bb");
 * handler.end_seq();
 * handler.actually_val_is_first_key_of_new_map_block(); // (1) -- adjust when finding that the prev val was actually a key.
 * handler.begin_seq_val_flow(); // (3) -- go on as before
 * handler.set_val_scalar_plain("cc");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("dd");
 * handler.end_seq();
 * handler.end_map();
 * ```
 */
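
/* Illustrative sketch, not part of rapidyaml: the engine/handler binding
described in the group documentation above can be pictured with the toy
types below. `TraceHandler`, `MiniEngine` and `trace_flow_map_example()`
are hypothetical names introduced only for this example; the handler
member names are the ones used in the documented call sequence. The toy
engine does not scan any YAML: it simply replays the calls listed for
`{foo: bar,foo2: bar2}`, and the handler records each one in test-suite
notation.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler: records each event in YAML test-suite notation.
struct TraceHandler
{
    std::string trace;
    void begin_stream()       { trace += "+STR\n"; }
    void end_stream()         { trace += "-STR\n"; }
    void begin_doc()          { trace += "+DOC\n"; }
    void end_doc()            { trace += "-DOC\n"; }
    void begin_map_val_flow() { trace += "+MAP {}\n"; }
    void end_map()            { trace += "-MAP\n"; }
    void add_sibling()        {} // has no counterpart in the test-suite events
    void set_key_scalar_plain(const std::string& s) { trace += "=VAL :" + s + "\n"; }
    void set_val_scalar_plain(const std::string& s) { trace += "=VAL :" + s + "\n"; }
};

// Hypothetical stand-in for the engine. It is templated on the handler
// type, so every call below is resolved at compile time, without
// virtual functions.
template<class Handler>
struct MiniEngine
{
    Handler *handler; // may be re-pointed at run time to another Handler object

    explicit MiniEngine(Handler *h) : handler(h) { assert(h != nullptr); }

    // Replays the documented calls for `{foo: bar,foo2: bar2}`.
    void replay_flow_map_example()
    {
        handler->begin_stream();
        handler->begin_doc();
        handler->begin_map_val_flow();
        handler->set_key_scalar_plain("foo");
        handler->set_val_scalar_plain("bar");
        handler->add_sibling();
        handler->set_key_scalar_plain("foo2");
        handler->set_val_scalar_plain("bar2");
        handler->end_map();
        handler->end_doc();
        handler->end_stream();
    }
};

// Convenience wrapper that runs the sketch and returns the recorded trace.
inline std::string trace_flow_map_example()
{
    TraceHandler h;
    MiniEngine<TraceHandler> engine(&h);
    engine.replay_flow_map_example();
    return h.trace;
}
```

Calling `trace_flow_map_example()` returns the ten test-suite event lines
shown in the Event model section, one per line.
*/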
207
208
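/* Illustrative sketch, not part of rapidyaml: one way a handler could
honour `actually_val_is_first_key_of_new_map_flow()`. `ToySeqHandler` is a
hypothetical toy (it is not EventHandlerTree, and it only understands a
flow seq whose members are plain scalars or flow maps). It is here to
show that, for `[a: b, c: d]`, setting a scalar as a val first and
converting it afterwards builds the same structure as the standard
sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToySeqHandler // hypothetical, for illustration only
{
    struct Pair { std::string key, val; };
    struct Member { bool is_map; std::string scalar; std::vector<Pair> pairs; };

    std::vector<Member> members;
    bool in_map = false;

    void begin_seq_val_flow() { members.clear(); in_map = false; }
    void end_seq() {}
    void add_sibling() {} // members are appended on demand in this toy
    void begin_map_val_flow() { members.push_back(Member{true, "", {}}); in_map = true; }
    void end_map() { in_map = false; }
    void set_key_scalar_plain(const std::string& s)
    {
        assert(in_map);
        members.back().pairs.push_back(Pair{s, ""});
    }
    void set_val_scalar_plain(const std::string& s)
    {
        if(in_map)
        {
            assert(!members.back().pairs.empty());
            members.back().pairs.back().val = s;
        }
        else
        {
            members.push_back(Member{false, s, {}});
        }
    }
    // The special event: the scalar just set as a val becomes the first
    // key of a new map, which takes the scalar's place in the seq.
    void actually_val_is_first_key_of_new_map_flow()
    {
        assert(!in_map && !members.empty() && !members.back().is_map);
        Member &m = members.back();
        m.is_map = true;
        m.pairs.push_back(Pair{m.scalar, ""});
        m.scalar.clear();
        in_map = true;
    }
    // Render the seq in flow style, to compare results.
    std::string str() const
    {
        std::string out = "[";
        for(std::size_t i = 0; i < members.size(); ++i)
        {
            if(i) out += ",";
            const Member &m = members[i];
            if(!m.is_map) { out += m.scalar; continue; }
            out += "{";
            for(std::size_t j = 0; j < m.pairs.size(); ++j)
            {
                if(j) out += ",";
                out += m.pairs[j].key + ": " + m.pairs[j].val;
            }
            out += "}";
        }
        return out + "]";
    }
};

// Standard sequence: each map is announced before its key.
inline std::string standard_sequence()
{
    ToySeqHandler h;
    h.begin_seq_val_flow();
    h.begin_map_val_flow(); h.set_key_scalar_plain("a"); h.set_val_scalar_plain("b"); h.end_map();
    h.add_sibling();
    h.begin_map_val_flow(); h.set_key_scalar_plain("c"); h.set_val_scalar_plain("d"); h.end_map();
    h.end_seq();
    return h.str();
}

// Preemptive sequence: scalar first as val, then converted into a key.
inline std::string preemptive_sequence()
{
    ToySeqHandler h;
    h.begin_seq_val_flow();
    h.set_val_scalar_plain("a"); h.actually_val_is_first_key_of_new_map_flow();
    h.set_val_scalar_plain("b"); h.end_map();
    h.add_sibling();
    h.set_val_scalar_plain("c"); h.actually_val_is_first_key_of_new_map_flow();
    h.set_val_scalar_plain("d"); h.end_map();
    h.end_seq();
    return h.str();
}
```

Both helper functions render the same seq of two single-pair maps; the
conversion work is paid only when an implicit map actually starts.
*/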
/** Quickly inspect the source to estimate the number of nodes the
 * resulting tree is likely to have. If a tree is empty before
 * parsing, considerable time will be spent growing it, so calling
 * this to reserve the tree size prior to parsing is likely to
 * result in a time gain. We encourage using this method before
 * parsing, but as always, measure its impact on performance to
 * obtain a good trade-off.
 *
 * @note since this method is meant for optimizing performance, it
 * is approximate. The result may actually be smaller than the
 * resulting number of nodes, notably if the YAML uses implicit
 * maps as flow seq members as in `[these: are, individual:
 * maps]`. */
RYML_EXPORT id_type estimate_tree_capacity(csubstr src); // NOLINT(readability-redundant-declaration)
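
/* Illustrative sketch, not the library's implementation: a cheap
single-pass estimate of the kind described above could count the
characters that usually introduce a node. `estimate_node_count()` and
`make_node_storage()` are hypothetical names, and the choice of counted
characters is an assumption made for this example. It also shows why
implicit maps inside a flow seq lead to an underestimate: nothing in
`[these: are, individual: maps]` marks the two maps themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical heuristic: one node for the root, plus one for every
// newline, comma, or opening flow bracket found in the source.
inline std::size_t estimate_node_count(const std::string& src)
{
    std::size_t count = 1; // the root node
    for(char c : src)
        count += (c == '\n' || c == ',' || c == '[' || c == '{') ? 1u : 0u;
    return count;
}

// Pre-size some storage from the estimate before filling it; the
// std::vector stands in for whatever holds the parsed nodes.
inline std::vector<int> make_node_storage(const std::string& src)
{
    std::vector<int> nodes;
    nodes.reserve(estimate_node_count(src));
    assert(nodes.capacity() >= estimate_node_count(src));
    return nodes;
}
```

With this heuristic, `{foo: bar,foo2: bar2}` is estimated at 3 nodes, which
matches a map with two children. `[these: are, individual: maps]` is also
estimated at 3, although the parsed structure has 5 nodes (the seq, two
maps, and one key-value pair in each), so the storage would still have to
grow during parsing.
*/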
223
224
225
226/** @cond dev */
227struct FilterResult;
229typedef enum BlockChomp_ { // NOLINT
230 CHOMP_CLIP, //!< single newline at end (default)
231 CHOMP_STRIP, //!< no newline at end (-)
232 CHOMP_KEEP //!< all newlines from end (+)
233} BlockChomp_e;
234struct ScannedBlock
235{
236 substr scalar;
237 size_t indentation;
238 BlockChomp_e chomp;
239};
240struct ScannedScalar
241{
242 substr scalar;
243 bool needs_filter;
244};
245/** store pending tag or anchor/ref annotations */
246struct Annotation
247{
248 struct Entry
249 {
250 csubstr str;
251 size_t indentation;
252 size_t line;
253 csubstr orig;
254 };
255 Entry annotations[2];
256 uint8_t num_entries;
257};
258/** @endcond */
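
/* Illustrative sketch, not the library's filter code: the three chomping
modes named by BlockChomp_e differ only in what happens to the newlines
at the end of a block scalar. `Chomp` and `apply_chomp()` are hypothetical
names introduced for this example, and the function ignores indentation
and folding, which the real filters also have to handle.

```cpp
#include <cassert>
#include <string>

enum Chomp { CLIP, STRIP, KEEP }; // hypothetical mirror of BlockChomp_e

// Apply a chomping mode to the trailing newlines of an already
// de-indented block scalar.
inline std::string apply_chomp(std::string s, Chomp chomp)
{
    const std::string::size_type last = s.find_last_not_of('\n');
    const std::string::size_type content_len = (last == std::string::npos) ? 0 : last + 1;
    assert(content_len <= s.size());
    const bool had_trailing_newline = content_len < s.size();
    if(chomp == KEEP)
        return s;             // KEEP (+): all newlines from the end stay
    s.resize(content_len);    // STRIP (-): no newline at the end
    if(chomp == CLIP && had_trailing_newline && content_len > 0)
        s += '\n';            // CLIP (default): a single newline at the end
    return s;
}
```

For the input `"text\n\n\n"`, CLIP yields `"text\n"`, STRIP yields `"text"`,
and KEEP returns the input unchanged.
*/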
259
260
261//-----------------------------------------------------------------------------
262//-----------------------------------------------------------------------------
263//-----------------------------------------------------------------------------
264
/** This is the main driver of parsing logic: it scans the YAML or
 * JSON source for tokens, and emits the appropriate sequence of
 * parsing events to its event handler. The parse engine itself has no
 * special limitations, and *can* accommodate containers as keys; it is the
 * event handler that may introduce additional constraints.
 *
 * There are two implemented handlers (see @ref doc_event_handlers,
 * which has important notes about the event model):
 *
 * - @ref EventHandlerTree is the handler responsible for creating the
 *   ryml @ref Tree
 *
 * - @ref extra::EventHandlerInts is the handler responsible for
 *   emitting integer-coded events. It is intended for implementing
 *   fully-conformant parsing in other programming languages
 *   (integration work is under way for
 *   [YamlScript](https://github.com/yaml/yamlscript) and
 *   [go-yaml](https://github.com/yaml/go-yaml/)). It is not part of
 *   the library and is not installed.
 *
 */
template<class EventHandler>
class ParseEngine
{
289public:
290
291 using handler_type = EventHandler;
292
293public:
294
295 /** @name construction and assignment */
296 /** @{ */
297
298 ParseEngine(EventHandler *evt_handler, ParserOptions const& opts={});
299 ~ParseEngine() noexcept;
300
303 ParseEngine& operator=(ParseEngine &&) noexcept;
304 ParseEngine& operator=(ParseEngine const&);
305
306 /** @} */
307
308public:
309
310 /** @name modifiers */
311 /** @{ */
312
313 /** Reserve a certain capacity for the parsing stack.
314 * This should be larger than the expected depth of the parsed
315 * YAML tree.
316 *
317 * The parsing stack is the only (potential) heap memory used
318 * directly by the parser.
319 *
320 * If the requested capacity is below the default
321 * stack size of 16, the memory is used directly in the parser
322 * object; otherwise it will be allocated from the heap.
323 *
324 * @note this reserves memory only for the parser itself; all the
325 * allocations for the parsed tree will go through the tree's
326 * allocator (when different).
327 *
328 * @note for maximum efficiency, the tree and the arena can (and
329 * should) also be reserved. */
330 void reserve_stack(id_type capacity)
331 {
332 RYML_ASSERT_BASIC_(m_evt_handler);
333 m_evt_handler->m_stack.reserve(capacity);
334 }
335
336 /** Reserve a certain capacity for the array used to track node
337 * locations in the source buffer. */
338 void reserve_locations(size_t num_source_lines)
339 {
340 _resize_locations(num_source_lines);
341 }
342
343 /** @} */
344
345public:
346
347 /** @name getters */
348 /** @{ */
349
350 /** Get the options used to build this parser object. */
351 ParserOptions const& options() const { return m_options; }
352
353 /** Get the current callbacks in the parser. */
354 Callbacks const& callbacks() const { RYML_ASSERT_BASIC_(m_evt_handler); return m_evt_handler->m_stack.m_callbacks; }
355
356 /** Get the name of the latest file parsed by this object. */
357 csubstr filename() const { return m_evt_handler->m_curr ? m_evt_handler->m_curr->pos.name : csubstr{}; }
358
359 /** Get the latest YAML buffer parsed by this object. */
360 csubstr source() const { return m_evt_handler ? m_evt_handler->m_src : csubstr{}; }
361
362 /** Get the encoding of the latest YAML buffer parsed by this object.
363 * If no encoding was specified, UTF8 is assumed as per the YAML standard. */
364 Encoding_e encoding() const { return m_encoding != NOBOM ? m_encoding : UTF8; }
365
366 id_type stack_capacity() const { RYML_ASSERT_BASIC_(m_evt_handler); return m_evt_handler->m_stack.capacity(); }
367 size_t locations_capacity() const { return m_newline_offsets_capacity; }
368
369 /** @} */
370
371public:
372
373 /** @name parse methods */
374 /** @{ */
375
376 /** parse YAML in place, emitting events to the current handler */
378
379 /** parse JSON in place, emitting events to the current handler */
381
382 /** @} */
383
384public:
385
386 /** @name locations */
387 /** @{ */
388
389 /** Get the string starting at a particular location, to the end
390 * of the parsed source buffer. */
392
393 /** Given a pointer to a buffer position, get the location.
394 * @param[in] val must be pointing to somewhere in the source
395 * buffer that was last parsed by this object. */
396 Location val_location(const char *val) const;
397
398 /** @} */
399
400public:
401
402 /** @name scalar filtering */
403 /** @{*/
404
405 /** filter a plain scalar */
406 FilterResult filter_scalar_plain(csubstr scalar, substr dst, size_t indentation);
407 /** filter a plain scalar in place */
408 FilterResult filter_scalar_plain_in_place(substr scalar, size_t cap, size_t indentation);
409
410 /** filter a single-quoted scalar */
412 /** filter a single-quoted scalar in place */
414
415 /** filter a double-quoted scalar */
417 /** filter a double-quoted scalar in place */
419
420 /** filter a block-literal scalar */
421 FilterResult filter_scalar_block_literal(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
422 /** filter a block-literal scalar in place */
423 FilterResult filter_scalar_block_literal_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
424
425 /** filter a block-folded scalar */
426 FilterResult filter_scalar_block_folded(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
427 /** filter a block-folded scalar in place */
428 FilterResult filter_scalar_block_folded_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
429
430 /** @} */
431
432private:
433
434 bool _is_doc_begin(csubstr s);
435 bool _is_doc_end(csubstr s);
436
437 bool _scan_scalar_plain_blck(ScannedScalar *C4_RESTRICT sc, size_t indentation);
438 bool _scan_scalar_plain_seq_flow(ScannedScalar *C4_RESTRICT sc);
439 bool _scan_scalar_plain_seq_blck(ScannedScalar *C4_RESTRICT sc);
440 bool _scan_scalar_plain_map_flow(ScannedScalar *C4_RESTRICT sc);
441 bool _scan_scalar_plain_map_blck(ScannedScalar *C4_RESTRICT sc);
442 bool _scan_scalar_map_json(ScannedScalar *C4_RESTRICT sc);
443 bool _scan_scalar_seq_json(ScannedScalar *C4_RESTRICT sc);
444 bool _scan_scalar_plain_unk(ScannedScalar *C4_RESTRICT sc);
445 bool _is_valid_start_scalar_plain_flow(csubstr s);
446 bool _is_valid_start_scalar_plain_flow_check_block_token(csubstr s);
447 bool _is_valid_start_scalar_plain_flow_check_qmrk(csubstr s);
448 bool _scan_scalar_plain_handle_newline(csubstr s, size_t offs);
449 void _check_valid_newline_in_quoted_scalar();
450
451 ScannedScalar _scan_scalar_squot();
452 ScannedScalar _scan_scalar_dquot();
453
454 void _scan_block(ScannedBlock *C4_RESTRICT sb, size_t indref);
455 csubstr _scan_anchor();
456 csubstr _scan_ref_seq();
457 csubstr _scan_ref_map();
458 csubstr _scan_tag();
459 csubstr _scan_tag(csubstr *orig);
460
461public: // exposed for testing
462
463 /** @cond dev */
464 csubstr _filter_scalar_plain(substr s, size_t indentation);
465 csubstr _filter_scalar_squot(substr s);
466 csubstr _filter_scalar_dquot(substr s);
467 csubstr _filter_scalar_literal(substr s, size_t indentation, BlockChomp_e chomp);
468 csubstr _filter_scalar_folded(substr s, size_t indentation, BlockChomp_e chomp);
469 csubstr _move_scalar_left_and_add_newline(substr s);
470
471 csubstr _maybe_filter_key_scalar_plain(ScannedScalar const& sc, size_t indendation);
472 csubstr _maybe_filter_val_scalar_plain(ScannedScalar const& sc, size_t indendation);
473 csubstr _maybe_filter_key_scalar_squot(ScannedScalar const& sc);
474 csubstr _maybe_filter_val_scalar_squot(ScannedScalar const& sc);
475 csubstr _maybe_filter_key_scalar_dquot(ScannedScalar const& sc);
476 csubstr _maybe_filter_val_scalar_dquot(ScannedScalar const& sc);
477 csubstr _maybe_filter_key_scalar_literal(ScannedBlock const& sb);
478 csubstr _maybe_filter_val_scalar_literal(ScannedBlock const& sb);
479 csubstr _maybe_filter_key_scalar_folded(ScannedBlock const& sb);
480 csubstr _maybe_filter_val_scalar_folded(ScannedBlock const& sb);
481 /** @endcond */
482
483private:
484
485 void _handle_map_block();
486 bool _handle_map_block_qmrk();
487 bool _handle_map_block_rkcl();
488 void _handle_seq_block();
489 void _handle_map_flow();
490 void _handle_seq_flow();
491 void _handle_seq_imap();
492 void _handle_map_json();
493 void _handle_seq_json();
494
495 void _handle_unk();
496 void _handle_unk_json();
497
498 void _handle_usty();
499
500 void _handle_flow_skip_whitespace();
501 void _handle_flow_line_beginning();
502
503 size_t _handle_unk_check_left_tokens(size_t realindent, size_t col, bool skip_annotations=true);
504 void _handle_unk_get_first_non_pending_token_pos(csubstr s, size_t *indent, size_t *first_non_token_pos);
505 void _handle_unk_begin_doc();
506
507 size_t _handle_block_skip_leading_whitespace();
508 C4_ALWAYS_INLINE
509 size_t _handle_block_get_whitespace_mark() const noexcept { return m_evt_handler->m_curr->pos.offset; }
510 void _handle_block_check_leading_tabs(size_t prev_mark) { return _handle_block_check_leading_tabs(prev_mark, m_evt_handler->m_curr->pos.offset); }
511 void _handle_block_check_leading_tabs(size_t start_mark, size_t end_mark);
512
513 void _end_map_flow();
514 void _end_seq_flow();
515 void _end_map_blck();
516 void _end_seq_blck();
517 void _end2_map();
518 void _end2_seq();
519 void _end_flow_container(size_t orig_indent, bool multiline);
520 void _flow_container_was_a_key(size_t orig_indent);
521
522 void _begin2_doc();
523 void _begin2_doc_expl();
524 void _end2_doc();
525 void _end2_doc_expl();
526 void _check_doc_end_tokens() const;
527
528 void _maybe_begin_doc();
529 void _maybe_end_doc();
530
531 void _start_doc_suddenly();
532 void _end_doc_suddenly();
533 void _end_doc_suddenly__pop();
534 void _check_trailing_doc_token();
535 void _end_stream();
536
537 void _set_indentation(size_t indentation) noexcept;
538 void _save_indentation();
539 void _mark_seqflow_val_end() noexcept;
540 void _handle_indentation_pop_from_block_seq();
541 void _handle_indentation_pop_from_block_map();
542 void _handle_indentation_pop(ParserState const* dst);
543
544 void _maybe_skip_comment();
545 void _maybe_skip_comment_strict();
546 void _skip_comment();
547 void _maybe_skip_whitespace_tokens();
548 void _maybe_skipchars(char c);
549 template<size_t N>
550 void _skipchars(const char (&chars)[N]);
551 bool _maybe_scan_following_colon() noexcept;
552
553public:
554
555 /** @cond dev */
556 template<class FilterProcessor> auto _filter_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation) -> decltype(proc.result());
557 template<class FilterProcessor> auto _filter_squoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
558 template<class FilterProcessor> auto _filter_dquoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
559 template<class FilterProcessor> auto _filter_block_literal(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
560 template<class FilterProcessor> auto _filter_block_folded(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
561 /** @endcond */
562
563public:
564
565 /** @cond dev */
566 template<class FilterProcessor> void _filter_nl_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation);
567 template<class FilterProcessor> void _filter_nl_squoted(FilterProcessor &C4_RESTRICT proc);
568 template<class FilterProcessor> void _filter_nl_dquoted(FilterProcessor &C4_RESTRICT proc);
569
570 template<class FilterProcessor> bool _filter_ws_handle_to_first_non_space(FilterProcessor &C4_RESTRICT proc);
571 template<class FilterProcessor> void _filter_ws_copy_trailing(FilterProcessor &C4_RESTRICT proc);
572 template<class FilterProcessor> void _filter_ws_skip_trailing(FilterProcessor &C4_RESTRICT proc);
573
574 template<class FilterProcessor> void _filter_dquoted_backslash(FilterProcessor &C4_RESTRICT proc);
575 template<class FilterProcessor> void _filter_dquoted_backslash_decode(FilterProcessor &C4_RESTRICT proc, size_t sz);
576
577 template<class FilterProcessor> void _filter_chomp(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp, size_t indentation);
578 template<class FilterProcessor> size_t _handle_all_whitespace(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp);
579 template<class FilterProcessor> size_t _extend_to_chomp(FilterProcessor &C4_RESTRICT proc, size_t contents_len);
580 template<class FilterProcessor> void _filter_block_indentation(FilterProcessor &C4_RESTRICT proc, size_t indentation);
581 template<class FilterProcessor> void _filter_block_folded_newlines(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
582 template<class FilterProcessor> size_t _filter_block_folded_newlines_compress(FilterProcessor &C4_RESTRICT proc, size_t num_newl, size_t wpos_at_first_newl);
583 template<class FilterProcessor> void _filter_block_folded_newlines_leading(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
584 template<class FilterProcessor> void _filter_block_folded_indented_block(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len, size_t curr_indentation) noexcept;
585
586 substr _alloc_arena(size_t len, substr *relocated=nullptr);
587 substr _alloc_arena(size_t len, csubstr *relocated) { return _alloc_arena(len, reinterpret_cast<substr*>(relocated)); } // NOLINT
588
589 /** @endcond */
590
591private:
592
593 void _line_progressed(size_t ahead);
594 void _line_ended();
595 void _line_ended_undo();
596
597 bool _finished_file() const;
598 bool _finished_line() const;
599
600 void _scan_line();
601 substr _peek_next_line(size_t pos=npos) const;
602
603 void _relocate_arena(csubstr prev_arena, substr next_arena, substr *other_string=nullptr);
604
605private:
606
607 C4_ALWAYS_INLINE substr _buf() const noexcept { return m_evt_handler->m_src; }
608
609 C4_ALWAYS_INLINE bool has_all(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == f; }
610 C4_ALWAYS_INLINE bool has_any(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) != 0; }
611 C4_ALWAYS_INLINE bool has_none(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == 0; }
612 static C4_ALWAYS_INLINE bool has_all(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == f; }
613 static C4_ALWAYS_INLINE bool has_any(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) != 0; }
614 static C4_ALWAYS_INLINE bool has_none(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == 0; }
615
616 #ifndef RYML_DBG
617 C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { m_evt_handler->m_curr->flags |= on; }
618 C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; m_evt_handler->m_curr->flags |= on; }
619 C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; }
620 #else
621 C4_ALWAYS_INLINE void add_flags(ParserFlag_t on);
622 C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off);
623 C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off);
624 #endif
625
626private:
627
628 void _prepare_locations();
629 void _resize_locations(size_t sz);
630 bool _locations_dirty() const;
631
632private:
633
634 void _reset();
635 void _free();
636 void _clr();
637
638 template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, const char *fmt, Args const& ...args) const;
639 template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, Location const& ymlloc, const char *fmt, Args const& ...args) const;
640 #ifdef RYML_DBG
641 template<class ...Args> C4_NO_INLINE void _dbg(csubstr fmt, Args const& ...args) const;
642 template<class DumpFn> C4_NO_INLINE void _fmt_msg(DumpFn &&dumpfn) const;
643 C4_NO_INLINE void _print_state_stack() const;
644 C4_NO_INLINE void _print_state_stack(substr buf) const;
645 #endif
646
647private:
648
649 void _handle_colon();
650 void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line);
651 void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line, csubstr orig);
652 void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str);
653 C4_ALWAYS_INLINE void _clear_annotations(Annotation *C4_RESTRICT dst) noexcept { dst->num_entries = 0; }
654 bool _annotations_require_key_container() const;
655 bool _handle_annotations_before_unexpected_flow_token_rkey();
656 void _handle_annotations_before_blck_key_scalar();
657 void _handle_annotations_before_blck_val_scalar();
658 void _handle_annotations_before_start_mapblck(size_t current_line);
659 void _handle_annotations_before_start_mapblck_as_key();
660 void _handle_annotations_and_indentation_after_start_mapblck(size_t key_indentation, size_t key_line);
661 size_t _select_indentation_from_annotations(size_t val_indentation, size_t val_line);
662 uint32_t _get_annotations_same_line(csubstr token_soup, csubstr * first, csubstr * second) const;
663 void _handle_keyref(csubstr alias);
664 void _handle_valref(csubstr alias);
665 csubstr _resolve_tag(csubstr tag);
666 void _handle_directive(csubstr rem);
667 bool _validate_directive_yaml(csubstr *C4_RESTRICT directive, csubstr *C4_RESTRICT version) const;
668 bool _validate_directive_tag(csubstr *C4_RESTRICT directive, csubstr *C4_RESTRICT handle, csubstr *C4_RESTRICT prefix) const;
669 bool _handle_bom();
670 void _handle_bom(Encoding_e enc);
671
672private:
673
674 ParserOptions m_options;
675
676public:
677
678 /** @cond dev */
679 EventHandler *C4_RESTRICT m_evt_handler; // NOLINT
680 /** @endcond */
681
682private:
683
684 Annotation m_pending_anchors;
685 Annotation m_pending_tags;
686
687 bool m_has_directives_yaml;
688 bool m_has_directives;
689 bool m_doc_empty;
690 size_t m_prev_colon;
691 size_t m_prev_val_end;
692
693private:
694
695 size_t m_bom_len;
696 size_t m_bom_line;
697 Encoding_e m_encoding;
698
699private:
700
701 size_t *m_newline_offsets;
702 size_t m_newline_offsets_size;
703 size_t m_newline_offsets_capacity;
704
705public:
706
707 // deprecated methods
708
709 /** @cond dev */
710 RYML_DEPRECATED("filter arena no longer needed") size_t filter_arena_capacity() const { return 0u; } // LCOV_EXCL_LINE
711 RYML_DEPRECATED("filter arena no longer needed") void reserve_filter_arena(size_t) {} // LCOV_EXCL_LINE
712
713 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t, size_t node_id);
714 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t, size_t node_id);
715 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t );
716 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t );
717 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, NodeRef node );
718 template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, NodeRef node );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(csubstr filename, substr yaml );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place( substr yaml );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, NodeRef node );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, NodeRef node );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, csubstr yaml );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( csubstr yaml );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, NodeRef node );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, NodeRef node );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, substr yaml );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( substr yaml );

    template<class U>
    RYML_DEPRECATED("moved to Tree::location(Parser const&). deliberately undefined here.")
    auto location(Tree const&, id_type node) const -> typename std::enable_if<U::is_wtree, Location>::type;

    template<class U>
    RYML_DEPRECATED("moved to ConstNodeRef::location(Parser const&), deliberately undefined here.")
    auto location(ConstNodeRef const&) const -> typename std::enable_if<U::is_wtree, Location>::type;
    /** @endcond */

};

/** @} */

/** @cond dev */
C4_SUPPRESS_WARNING_GCC_WITH_PUSH("-Wattributes")
// Scan backwards for the last newline in s that is followed by more
// than `indentation` leading spaces; return its position, or npos if
// there is none.
C4_NO_INLINE inline size_t _find_last_newline_and_larger_indentation(csubstr s, size_t indentation) noexcept
{
    if(indentation + 1 > s.len)
        return npos;
    for(size_t i = s.len-indentation-1; i != size_t(-1); --i) // NOLINT
    {
        if(s.str[i] == '\n')
        {
            csubstr rem = s.sub(i + 1);
            size_t first = rem.first_not_of(' ');
            first = (first != npos) ? first : rem.len;
            if(first > indentation)
                return i;
        }
    }
    return npos;
}
C4_SUPPRESS_WARNING_GCC_POP
/** @endcond */

} // namespace yml
} // namespace c4

// NOLINTEND(hicpp-signed-bitwise)

#if defined(_MSC_VER)
# pragma warning(pop)
#endif

#endif /* C4_YML_PARSE_ENGINE_HPP_ */