rapidyaml  0.13.0
parse and emit YAML, and do it fast
parse_engine.hpp
1 #ifndef _C4_YML_PARSE_ENGINE_HPP_
2 #define _C4_YML_PARSE_ENGINE_HPP_
3 
4 #ifndef _C4_YML_PARSER_STATE_HPP_
5 #include "c4/yml/parser_state.hpp"
6 #endif
7 
8 
9 #if defined(_MSC_VER)
10 # pragma warning(push)
11 # pragma warning(disable: 4251/*needs to have dll-interface to be used by clients of struct*/)
12 #endif
13 
14 // NOLINTBEGIN(hicpp-signed-bitwise)
15 
16 namespace c4 {
17 namespace yml {
18 
19 /** @addtogroup doc_parse
20  * @{ */
21 
22 /** @defgroup doc_event_handlers Event Handlers
23  *
24  * @brief rapidyaml implements its parsing logic with a two-level
25  * model, where a @ref ParseEngine object reads through the YAML
26  * source, and dispatches events to an EventHandler bound to the @ref
27  * ParseEngine. Because @ref ParseEngine is templated on the event
28  * handler, the binding uses static polymorphism, without any virtual
29  * functions. The actual handler object can be changed at run time
30  * (but it must be of the type given as the template parameter).
31  * This makes for a very efficient architecture, and further enables
32  * users to provide their own custom handler if they wish to bypass
33  * the rapidyaml @ref Tree.
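The binding between engine and handler can be illustrated with a minimal, self-contained sketch. `ToyEngine` and `CountingHandler` are hypothetical and are not how ParseEngine is implemented: the toy engine only understands a flat flow map of space-free plain scalars such as `{a: b, c: d}`, and it merely reuses a few event names from the examples in this section. Because the engine is a class template on the handler type, each handler call is resolved at compile time without virtual functions, while the handler object can still be swapped at run time for another object of the same type.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler: counts the events it receives. Any type providing
// these member functions can be bound to ToyEngine; no base class is needed.
struct CountingHandler
{
    int maps = 0, keys = 0, vals = 0;
    void begin_map_val_flow() { ++maps; }
    void set_key_scalar_plain(std::string const&) { ++keys; }
    void set_val_scalar_plain(std::string const&) { ++vals; }
    void end_map() {}
};

// Hypothetical engine, templated on the handler type: every handler call is
// resolved at compile time (static polymorphism, no virtual functions).
// It only understands a flat flow map of space-free plain scalars.
template<class EventHandler>
class ToyEngine
{
public:
    explicit ToyEngine(EventHandler *h) : m_handler(h) {}
    // the handler object may be replaced at run time, but not its type
    void set_handler(EventHandler *h) { m_handler = h; }

    void parse(std::string const& src)
    {
        m_handler->begin_map_val_flow();
        std::string tok;
        bool next_is_key = true;
        for(char c : src)
        {
            if(c == '{' || c == ' ')
                continue;  // skip the opening brace and separating spaces
            if(c == ':' || c == ',' || c == '}')
            {
                if(!tok.empty())
                {
                    if(next_is_key)
                        m_handler->set_key_scalar_plain(tok);
                    else
                        m_handler->set_val_scalar_plain(tok);
                    tok.clear();
                }
                next_is_key = (c != ':');  // after ':' comes a val, after ',' a key
                continue;
            }
            tok += c;
        }
        m_handler->end_map();
    }

private:
    EventHandler *m_handler;
};
```

Binding a handler of a different type (for example one that builds a tree) would require instantiating a different `ToyEngine<...>`, which is the trade-off described above.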
34  *
35  * The following handlers are implemented in this project:
36  *
37  * - @ref EventHandlerTree is the handler responsible for creating the
38  * ryml @ref Tree . This is part of the library.
39  *
40  * - Extra handlers (not part of the library, but provided as extra classes):
41  *
42  * - @ref extra::EventHandlerInts parses YAML into a contiguous
43  * integer array representing the YAML structure.
44  * - [play.yaml.com](https://play.yaml.com/)
45  * - [matrix.yaml.info/](https://matrix.yaml.info/)
46  * - the CI of this project.
47  *
48  *
49  * ### Event model
50  *
51  * The event model used by the parse engine and event handlers follows
52  * very closely the event model in the [YAML test
53  * suite](https://github.com/yaml/yaml-test-suite).
54  *
55  * Consider, for example, this YAML:
56  * ```yaml
57  * {foo: bar,foo2: bar2}
58  * ```
59  * which would produce these events in the test-suite parlance:
60  * ```
61  * +STR
62  * +DOC
63  * +MAP {}
64  * =VAL :foo
65  * =VAL :bar
66  * =VAL :foo2
67  * =VAL :bar2
68  * -MAP
69  * -DOC
70  * -STR
71  * ```
72  *
73  * For reference, the @ref ParseEngine object will produce this
74  * sequence of calls to its bound EventHandler:
75  * ```cpp
76  * handler.begin_stream();
77  * handler.begin_doc();
78  * handler.begin_map_val_flow();
79  * handler.set_key_scalar_plain("foo");
80  * handler.set_val_scalar_plain("bar");
81  * handler.add_sibling();
82  * handler.set_key_scalar_plain("foo2");
83  * handler.set_val_scalar_plain("bar2");
84  * handler.end_map();
85  * handler.end_doc();
86  * handler.end_stream();
87  * ```
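To make the correspondence between the two notations concrete, the following sketch shows a handler that turns the calls above back into test-suite event strings. `SuiteRecorder` and `replay_flow_map_example()` are hypothetical and implement only the calls used in this example; `add_sibling()` has no counterpart in the test-suite notation, so it records nothing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical handler that records YAML-test-suite event strings.
// Only the calls used in the example above are implemented.
struct SuiteRecorder
{
    std::vector<std::string> events;
    void begin_stream() { events.push_back("+STR"); }
    void end_stream() { events.push_back("-STR"); }
    void begin_doc() { events.push_back("+DOC"); }
    void end_doc() { events.push_back("-DOC"); }
    void begin_map_val_flow() { events.push_back("+MAP {}"); }
    void end_map() { events.push_back("-MAP"); }
    // plain scalars are written as "=VAL :<text>" in the test-suite notation
    void set_key_scalar_plain(std::string const& s) { events.push_back("=VAL :" + s); }
    void set_val_scalar_plain(std::string const& s) { events.push_back("=VAL :" + s); }
    // the test-suite notation has no separator event
    void add_sibling() {}
};

// Replays the call sequence listed above for {foo: bar,foo2: bar2}.
template<class Handler>
void replay_flow_map_example(Handler &handler)
{
    handler.begin_stream();
    handler.begin_doc();
    handler.begin_map_val_flow();
    handler.set_key_scalar_plain("foo");
    handler.set_val_scalar_plain("bar");
    handler.add_sibling();
    handler.set_key_scalar_plain("foo2");
    handler.set_val_scalar_plain("bar2");
    handler.end_map();
    handler.end_doc();
    handler.end_stream();
}
```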
88  *
89  * For many other examples of all areas of YAML and how ryml's parse
90  * model corresponds to the YAML standard model, refer to the [unit
91  * tests for the parse
92  * engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
93  *
94  *
95  * ### Special events
96  *
97  * Most of the parsing events adopted by rapidyaml in its event model
98  * are fairly obvious, but there are two less-obvious events requiring
99  * some explanation.
100  *
101  * These events exist to make it easier to parse some special YAML
102  * cases. They are called by the parser when a just-handled
103  * value/container is actually the first key of a new map:
104  *
105  * - `actually_val_is_first_key_of_new_map_flow()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTree" / @ref EventHandlerInts::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerInts")
106  * - `actually_val_is_first_key_of_new_map_block()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTree" / @ref EventHandlerInts::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerInts")
107  *
108  * For example, consider an implicit map inside a seq:
109  * `[a: b, c: d]`, which is parsed as `[{a: b}, {c: d}]`. The
110  * standard event sequence for this YAML would be the following:
111  * ```cpp
112  * handler.begin_seq_val_flow();
113  * handler.begin_map_val_flow();
114  * handler.set_key_scalar_plain("a");
115  * handler.set_val_scalar_plain("b");
116  * handler.end_map();
117  * handler.add_sibling();
118  * handler.begin_map_val_flow();
119  * handler.set_key_scalar_plain("c");
120  * handler.set_val_scalar_plain("d");
121  * handler.end_map();
122  * handler.end_seq();
123  * ```
124  * The problem with this event sequence is that it forces the
125  * parser to delay setting each scalar (in this case "a" and "c")
126  * until it knows whether the scalar is a key or a val: "a" and
127  * "c" are in fact keys, not vals, so the parser would have to keep
128  * them in its internal state until it finds the token that
129  * disambiguates them. The downside is that this cost would apply
130  * even when there is no implicit map: every val in a seq would
131  * have to be delayed until one of the disambiguating subsequent
132  * tokens `,-]:` is found.
133  *
134  * The special events let the parser avoid this complexity: the
135  * parser preemptively sets the scalar as a val, and if it later
136  * finds that the scalar was actually a key, it calls the event,
137  * which creates the map and moves the scalar into it as the
138  * key. The cost is now paid only when an implicit map starts
139  * inside a seq, so the following (simpler and cheaper) event
140  * sequence has the same effect as the one above:
141  * ```cpp
142  * handler.begin_seq_val_flow();
143  * handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
144  * handler.actually_val_is_first_key_of_new_map_flow(); // create a map, move the "a" val as the key of the first child of the new map
145  * handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
146  * handler.end_map();
147  * handler.add_sibling();
148  * handler.set_val_scalar_plain("c"); // "c" also as val!
149  * handler.actually_val_is_first_key_of_new_map_flow(); // likewise
150  * handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
151  * handler.end_map();
152  * handler.end_seq();
153  * ```
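Under simplifying assumptions, a tree-building handler could implement the flow variant, `actually_val_is_first_key_of_new_map_flow()`, by taking the last val out of the current seq, creating a map in its place, and re-inserting the scalar as that map's first key. The sketch below does this with hypothetical `Node`, `ToyTreeHandler` and `dump()` names (it is not how EventHandlerTree works) and handles only plain scalars and flow containers; driving it with either of the two event sequences above yields the same `[{a: b}, {c: d}]` structure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a scalar val, a seq or a map.
struct Node
{
    enum Kind { VAL, SEQ, MAP };
    Kind kind;
    std::string key;  // used when the parent is a map
    std::string val;  // used when kind == VAL
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(Kind k) : kind(k) {}
};

// Hypothetical tree-building handler with a stack of open containers.
class ToyTreeHandler
{
public:
    Node root{Node::SEQ};  // holds the top-level node
    ToyTreeHandler() { m_stack.push_back(&root); }

    void begin_seq_val_flow() { m_stack.push_back(add_child(Node::SEQ)); }
    void begin_map_val_flow() { m_stack.push_back(add_child(Node::MAP)); }
    void end_seq() { m_stack.pop_back(); }
    void end_map() { m_stack.pop_back(); }
    void add_sibling() {}
    void set_key_scalar_plain(std::string const& s) { add_child(Node::VAL)->key = s; }
    void set_val_scalar_plain(std::string const& s)
    {
        Node *parent = m_stack.back();
        if(parent->kind == Node::MAP)
            parent->children.back()->val = s;  // completes the pending key
        else
            add_child(Node::VAL)->val = s;
    }
    // The last val of the current seq was actually the first key of a new
    // flow map: replace it with a map, and move the scalar in as its key.
    void actually_val_is_first_key_of_new_map_flow()
    {
        Node *seq = m_stack.back();
        std::unique_ptr<Node> scalar = std::move(seq->children.back());
        seq->children.pop_back();
        scalar->key = std::move(scalar->val);
        scalar->val.clear();
        Node *map = add_child(Node::MAP);
        map->children.push_back(std::move(scalar));
        m_stack.push_back(map);
    }

private:
    std::vector<Node*> m_stack;
    Node *add_child(Node::Kind k)
    {
        auto &children = m_stack.back()->children;
        children.push_back(std::make_unique<Node>(k));
        return children.back().get();
    }
};

// Render a node in flow style, e.g. [{a: b}, {c: d}].
std::string dump(Node const& n)
{
    if(n.kind == Node::VAL)
        return n.val;
    std::string out(n.kind == Node::SEQ ? "[" : "{");
    for(std::size_t i = 0; i < n.children.size(); ++i)
    {
        Node const& child = *n.children[i];
        if(i)
            out += ", ";
        if(n.kind == Node::MAP)
            out += child.key + ": ";
        out += dump(child);
    }
    out += (n.kind == Node::SEQ ? "]" : "}");
    return out;
}
```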
154  * This also applies to container keys (although ryml's tree
155  * cannot accommodate these): the parser can preemptively set a
156  * container as a val, and then call the event to turn that container
157  * into a key. For example, consider this YAML:
158  * ```yaml
159  *   [aa, bb]: [cc, dd]
160  * # ^       ^ ^
161  * # |       | |
162  * # (2)    (1)(3)  <- event sequence
163  * ```
164  * The standard event sequence for this YAML would be the
165  * following:
166  * ```cpp
167  * handler.begin_map_val_block(); // (1)
168  * handler.begin_seq_key_flow(); // (2)
169  * handler.set_val_scalar_plain("aa");
170  * handler.add_sibling();
171  * handler.set_val_scalar_plain("bb");
172  * handler.end_seq();
173  * handler.begin_seq_val_flow(); // (3)
174  * handler.set_val_scalar_plain("cc");
175  * handler.add_sibling();
176  * handler.set_val_scalar_plain("dd");
177  * handler.end_seq();
178  * handler.end_map();
179  * ```
180  * The problem with the sequence above is that, reading from
181  * left to right, the parser can only detect the proper calls at
182  * (1) and (2) once it reaches (1) in the YAML source. So, the
183  * parser would have to buffer the entire event sequence starting
184  * from the beginning until it reaches (1). Using this event,
185  * the parser can instead do the following:
186  * ```cpp
187  * handler.begin_seq_val_flow(); // (2) -- preemptively as val!
188  * handler.set_val_scalar_plain("aa");
189  * handler.add_sibling();
190  * handler.set_val_scalar_plain("bb");
191  * handler.end_seq();
192  * handler.actually_val_is_first_key_of_new_map_block(); // (1) -- adjust when finding that the prev val was actually a key.
193  * handler.begin_seq_val_flow(); // (3) -- go on as before
194  * handler.set_val_scalar_plain("cc");
195  * handler.add_sibling();
196  * handler.set_val_scalar_plain("dd");
197  * handler.end_seq();
198  * handler.end_map();
199  * ```
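A similar sketch for the block variant, `actually_val_is_first_key_of_new_map_block()`, with a container key: the hypothetical `KeyRecorder` records test-suite-style event strings, remembers where the most recently closed container started, and, when the event arrives, opens the map in front of that container so the already-recorded container becomes the map's first key. Only the calls used in the two sequences above are implemented; with this approach the extra work (one insertion into the recorded events) is done only when a container actually turns out to be a key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler recording YAML-test-suite-style events. It only
// implements the calls used in the two sequences above.
struct KeyRecorder
{
    std::vector<std::string> events;
    std::vector<std::size_t> open;  // index where each open container starts
    std::size_t last_closed = 0;    // index where the last closed container started

    void begin_map_val_block() { open_container("+MAP"); }
    void begin_seq_key_flow() { open_container("+SEQ []"); }  // a key seq records the same events
    void begin_seq_val_flow() { open_container("+SEQ []"); }
    void set_val_scalar_plain(std::string const& s) { events.push_back("=VAL :" + s); }
    void add_sibling() {}
    void end_seq() { close_container("-SEQ"); }
    void end_map() { close_container("-MAP"); }

    // The container closed last was actually the first key of a new block
    // map: open the map right in front of it, so that it becomes the key.
    void actually_val_is_first_key_of_new_map_block()
    {
        events.insert(events.begin() + static_cast<std::ptrdiff_t>(last_closed), std::string("+MAP"));
        open.push_back(last_closed);
    }

private:
    void open_container(const char *ev)
    {
        open.push_back(events.size());
        events.push_back(ev);
    }
    void close_container(const char *ev)
    {
        last_closed = open.back();
        open.pop_back();
        events.push_back(ev);
    }
};
```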
200  */
201 
202 class Tree;
203 class NodeRef;
204 class ConstNodeRef;
205 struct FilterResult;
206 struct FilterResultExtending;
207 
208 /** @cond dev */
209 typedef enum BlockChomp_ { // NOLINT
210  CHOMP_CLIP, //!< single newline at end (default)
211  CHOMP_STRIP, //!< no newline at end (-)
212  CHOMP_KEEP //!< all newlines from end (+)
213 } BlockChomp_e;
214 /** @endcond */
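The three chomping modes can be illustrated on block-scalar content whose indentation has already been removed. The sketch below uses a hypothetical `apply_chomp()` and a local `Chomp` enum that mirrors `BlockChomp_e`; the real block filters also deal with indentation, folding and whitespace-only lines, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum Chomp { CLIP, STRIP, KEEP };  // hypothetical mirror of BlockChomp_e

// Apply a chomping indicator to the (already unindented) block contents.
std::string apply_chomp(std::string const& s, Chomp chomp)
{
    if(chomp == KEEP)
        return s;  // keep: all trailing newlines are kept
    std::size_t end = s.find_last_not_of('\n');
    if(end == std::string::npos)
        return std::string();  // contents are empty or only newlines
    std::string out = s.substr(0, end + 1);
    if(chomp == CLIP && end + 1 < s.size())
        out += '\n';  // clip: a single final newline, if there was one
    return out;      // strip: no final newline
}
```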
215 
216 
217 /** Quickly inspect the source to estimate the number of nodes the
218  * resulting tree is likely to have. If a tree is empty before
219  * parsing, considerable time will be spent growing it, so calling
220  * this to reserve the tree size prior to parsing is likely to
221  * result in a time gain. We encourage using this method before
222  * parsing, but, as always, measure its impact on performance to
223  * obtain a good trade-off.
224  *
225  * @note since this method is meant for optimizing performance, it
226  * is approximate. The result may actually be smaller than the
227  * resulting number of nodes, notably if the YAML uses implicit
228  * maps as flow seq members as in `[these: are, individual:
229  * maps]`. */
230 RYML_EXPORT id_type estimate_tree_capacity(csubstr src); // NOLINT(readability-redundant-declaration)
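As a rough illustration of the trade-off, the sketch below pairs a deliberately crude, hypothetical estimator `rough_node_estimate()` (not the algorithm used by `estimate_tree_capacity()`: it only counts flow-container openers, commas and newlines) with a `std::vector` standing in for the tree's node storage, and counts how often that storage is reallocated with and without reserving first. It also undershoots on `[these: are, individual: maps]`: it estimates 3 nodes, whereas a tree with one node per container and per key/val pair needs 5 (the seq, two maps, two key/val pairs), mirroring the note above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, deliberately crude estimator: one node for the root, plus
// one per flow-container opener, comma and newline.
std::size_t rough_node_estimate(std::string_view src)
{
    std::size_t n = 1;
    for(char c : src)
        if(c == '[' || c == '{' || c == ',' || c == '\n')
            ++n;
    return n;
}

// Push `count` nodes into a vector reserved to `reserved`, and return how
// many times the vector had to reallocate its storage.
int count_reallocations(std::size_t reserved, std::size_t count)
{
    std::vector<int> nodes;  // stand-in for the tree's node array
    nodes.reserve(reserved);
    int reallocs = 0;
    for(std::size_t i = 0; i < count; ++i)
    {
        std::size_t cap = nodes.capacity();
        nodes.push_back(static_cast<int>(i));
        if(nodes.capacity() != cap)
            ++reallocs;
    }
    return reallocs;
}
```

With ryml itself, the analogous step would be to pass the result of `estimate_tree_capacity()` to the tree's `reserve()` before parsing, and then measure.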
231 
232 
233 //-----------------------------------------------------------------------------
234 //-----------------------------------------------------------------------------
235 //-----------------------------------------------------------------------------
236 
237 /** This is the main driver of parsing logic: it scans the YAML or
238  * JSON source for tokens, and emits the appropriate sequence of
239  * parsing events to its event handler. The parse engine itself has no
240  * special limitations, and *can* accommodate containers as keys; it is the
241  * event handler that may introduce additional constraints.
242  *
243  * There are two implemented handlers (see @ref doc_event_handlers,
244  * which has important notes about the event model):
245  *
246  * - @ref EventHandlerTree is the handler responsible for creating the
247  * ryml @ref Tree
248  *
249  * - @ref extra::EventHandlerInts is the handler responsible for
250  * emitting integer-coded events. It is intended for implementing
251  * fully-conformant parsing in other programming languages
252  * (integration is currently in progress for
253  * [YamlScript](https://github.com/yaml/yamlscript) and
254  * [go-yaml](https://github.com/yaml/go-yaml/)). It is not part of
255  * the library and is not installed.
256  *
257  */
258 template<class EventHandler>
259 class ParseEngine
260 {
261 public:
262 
263  using handler_type = EventHandler;
264 
265 public:
266 
267  /** @name construction and assignment */
268  /** @{ */
269 
270  ParseEngine(EventHandler *evt_handler, ParserOptions opts={});
271  ~ParseEngine();
272 
273  ParseEngine(ParseEngine &&) noexcept;
274  ParseEngine(ParseEngine const&);
275  ParseEngine& operator=(ParseEngine &&) noexcept;
276  ParseEngine& operator=(ParseEngine const&);
277 
278  /** @} */
279 
280 public:
281 
282  /** @name modifiers */
283  /** @{ */
284 
285  /** Reserve a certain capacity for the parsing stack.
286  * This should be larger than the expected depth of the parsed
287  * YAML tree.
288  *
289  * The parsing stack is the only (potential) heap memory used
290  * directly by the parser.
291  *
292  * If the requested capacity is below the default
293  * stack size of 16, the memory is used directly in the parser
294  * object; otherwise it will be allocated from the heap.
295  *
296  * @note this reserves memory only for the parser itself; all the
297  * allocations for the parsed tree will go through the tree's
298  * allocator (when different).
299  *
300  * @note for maximum efficiency, the tree and the arena can (and
301  * should) also be reserved. */
302  void reserve_stack(id_type capacity)
303  {
304  _RYML_ASSERT_BASIC(m_evt_handler);
305  m_evt_handler->m_stack.reserve(capacity);
306  }
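The storage policy described for `reserve_stack()` (entries kept inside the object up to a fixed size, heap beyond it) is a small-buffer optimization. A hypothetical `SmallStack` sketch of that policy follows; it is not ryml's internal stack, it assumes trivially copyable entries, and the threshold of 16 is simply taken from the documentation above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical small-buffer stack: up to N entries are stored inside the
// object itself; growing beyond N moves the entries to a heap buffer.
template<class T, std::size_t N = 16>
class SmallStack
{
    static_assert(std::is_trivially_copyable<T>::value, "this sketch copies entries with memcpy");

public:
    void reserve(std::size_t cap)
    {
        if(cap <= m_capacity)
            return;  // enough room already, inline or on the heap
        std::unique_ptr<T[]> buf(new T[cap]);
        if(m_size)
            std::memcpy(buf.get(), data(), m_size * sizeof(T));
        m_heap = std::move(buf);
        m_capacity = cap;
    }
    void push(T const& v)
    {
        if(m_size == m_capacity)
            reserve(2 * m_capacity);
        data()[m_size++] = v;
    }
    T pop() { return data()[--m_size]; }
    std::size_t size() const { return m_size; }
    std::size_t capacity() const { return m_capacity; }
    bool is_inline() const { return m_heap == nullptr; }

private:
    T *data() { return m_heap ? m_heap.get() : m_inline; }
    T m_inline[N] = {};
    std::unique_ptr<T[]> m_heap;
    std::size_t m_size = 0;
    std::size_t m_capacity = N;
};
```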
307 
308  /** Reserve a certain capacity for the array used to track node
309  * locations in the source buffer. */
310  void reserve_locations(size_t num_source_lines)
311  {
312  _resize_locations(num_source_lines);
313  }
314 
315  /** @} */
316 
317 public:
318 
319  /** @name getters */
320  /** @{ */
321 
322  /** Get the options used to build this parser object. */
323  ParserOptions const& options() const { return m_options; }
324 
325  /** Get the current callbacks in the parser. */
326  Callbacks const& callbacks() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.m_callbacks; }
327 
328  /** Get the name of the latest file parsed by this object. */
329  csubstr filename() const { return m_evt_handler->m_curr ? m_evt_handler->m_curr->pos.name : csubstr{}; }
330 
331  /** Get the latest YAML buffer parsed by this object. */
332  csubstr source() const { return m_evt_handler ? m_evt_handler->m_src : csubstr{}; }
333 
334  /** Get the encoding of the latest YAML buffer parsed by this object.
335  * If no encoding was specified, UTF8 is assumed as per the YAML standard. */
336  Encoding_e encoding() const { return m_encoding != NOBOM ? m_encoding : UTF8; }
337 
338  id_type stack_capacity() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.capacity(); }
339  size_t locations_capacity() const { return m_newline_offsets_capacity; }
340 
341  /** @} */
342 
343 public:
344 
345  /** @name parse methods */
346  /** @{ */
347 
348  /** parse YAML in place, emitting events to the current handler */
349  void parse_in_place_ev(csubstr filename, substr src);
350 
351  /** parse JSON in place, emitting events to the current handler */
352  void parse_json_in_place_ev(csubstr filename, substr src);
353 
354  /** @} */
355 
356 public:
357 
358  /** @name locations */
359  /** @{ */
360 
361  /** Get the string starting at a particular location, to the end
362  * of the parsed source buffer. */
363  csubstr location_contents(Location const& loc) const;
364 
365  /** Given a pointer to a buffer position, get the location.
366  * @param[in] val must be pointing to somewhere in the source
367  * buffer that was last parsed by this object. */
368  Location val_location(const char *val) const;
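The parser keeps an array of newline offsets for the source buffer (see `reserve_locations()` and `m_newline_offsets`). One plausible way to turn a buffer offset into a line and column with such an array is a binary search; the sketch below assumes that approach and uses hypothetical names (`LineCol`, `newline_offsets()`, `offset_to_location()`), so it should not be read as the actual implementation of `val_location()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

struct LineCol { std::size_t line, col; };  // both zero-based

// Record the offset of every newline in the source.
std::vector<std::size_t> newline_offsets(std::string_view src)
{
    std::vector<std::size_t> offs;
    for(std::size_t i = 0; i < src.size(); ++i)
        if(src[i] == '\n')
            offs.push_back(i);
    return offs;
}

// The line index is the number of newlines before `offset`; the column is
// the distance from the character that follows the previous newline.
LineCol offset_to_location(std::vector<std::size_t> const& nl, std::size_t offset)
{
    auto it = std::lower_bound(nl.begin(), nl.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - nl.begin());
    std::size_t line_start = line ? nl[line - 1] + 1 : 0;
    return LineCol{line, offset - line_start};
}
```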
369 
370  /** @} */
371 
372 public:
373 
374  /** @name scalar filtering */
375  /** @{*/
376 
377  /** filter a plain scalar */
378  FilterResult filter_scalar_plain(csubstr scalar, substr dst, size_t indentation);
379  /** filter a plain scalar in place */
380  FilterResult filter_scalar_plain_in_place(substr scalar, size_t cap, size_t indentation);
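Filtering a multi-line plain scalar is mostly a matter of line folding: each line is trimmed, a single line break between two lines becomes a space, and each additional empty line becomes a newline. The hypothetical `fold_plain()` below illustrates only that rule; it ignores tabs, comments and the `indentation` parameter taken by `filter_scalar_plain()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Trim spaces from both ends of a line.
std::string_view trim_spaces(std::string_view s)
{
    std::size_t b = s.find_first_not_of(' ');
    if(b == std::string_view::npos)
        return {};
    std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Fold a multi-line plain scalar: a single line break between two non-empty
// lines becomes a space; each empty line in between becomes a newline.
std::string fold_plain(std::string_view src)
{
    std::string out;
    std::size_t empty_lines = 0;
    bool first = true;
    std::size_t pos = 0;
    while(pos <= src.size())
    {
        std::size_t nl = src.find('\n', pos);
        if(nl == std::string_view::npos)
            nl = src.size();
        std::string_view line = trim_spaces(src.substr(pos, nl - pos));
        pos = nl + 1;
        if(line.empty())
        {
            ++empty_lines;
            continue;
        }
        if(!first)
            out.append(empty_lines ? empty_lines : 1, empty_lines ? '\n' : ' ');
        out.append(line);
        first = false;
        empty_lines = 0;
    }
    return out;
}
```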
381 
382  /** filter a single-quoted scalar */
383  FilterResult filter_scalar_squoted(csubstr scalar, substr dst);
384  /** filter a single-quoted scalar in place */
385  FilterResult filter_scalar_squoted_in_place(substr scalar, size_t cap);
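In a single-quoted scalar the only escape is a doubled quote, so (line folding aside) the filtered result is never longer than the source and the write position never overtakes the read position; this is what makes filtering in place possible. A minimal sketch of that idea with a hypothetical `filter_squoted_in_place()` over a `std::string`, ignoring line folding:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapse each '' into ' in place, and return the filtered length.
// The write index never passes the read index, so no extra buffer is needed.
std::size_t filter_squoted_in_place(std::string &s)
{
    std::size_t w = 0;
    for(std::size_t r = 0; r < s.size(); ++r, ++w)
    {
        if(s[r] == '\'' && r + 1 < s.size() && s[r + 1] == '\'')
            ++r;  // skip the first quote of the pair
        s[w] = s[r];
    }
    s.resize(w);
    return w;
}
```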
386 
387  /** filter a double-quoted scalar */
388  FilterResult filter_scalar_dquoted(csubstr scalar, substr dst);
389  /** filter a double-quoted scalar in place */
390  FilterResult filter_scalar_dquoted_in_place(substr scalar, size_t cap);
391 
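Double-quoted scalars instead use backslash escapes. The hypothetical `unescape_dquoted()` below decodes only five of the escapes defined by YAML 1.2 (`\n`, `\t`, `\\`, `\"` and `\/`) and rejects everything else; it also ignores line folding, and is only meant to show the general shape of such a filter.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Decode a subset of YAML double-quoted escapes. Returns std::nullopt for an
// escape this sketch does not handle, or for a trailing lone backslash.
std::optional<std::string> unescape_dquoted(std::string_view s)
{
    std::string out;
    out.reserve(s.size());  // for the escapes handled here, output is never longer
    for(std::size_t i = 0; i < s.size(); ++i)
    {
        if(s[i] != '\\')
        {
            out += s[i];
            continue;
        }
        if(++i == s.size())
            return std::nullopt;
        switch(s[i])
        {
        case 'n': out += '\n'; break;
        case 't': out += '\t'; break;
        case '\\': out += '\\'; break;
        case '"': out += '"'; break;
        case '/': out += '/'; break;
        default: return std::nullopt;
        }
    }
    return out;
}
```

Unlike the single-quoted case, a complete double-quoted filter cannot assume the output is never longer than the input: for example `\L` (U+2028) occupies two bytes in the source but three bytes once encoded as UTF-8.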
392  /** filter a block-literal scalar */
393  FilterResult filter_scalar_block_literal(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
394  /** filter a block-literal scalar in place */
395  FilterResult filter_scalar_block_literal_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
396 
397  /** filter a block-folded scalar */
398  FilterResult filter_scalar_block_folded(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
399  /** filter a block-folded scalar in place */
400  FilterResult filter_scalar_block_folded_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
401 
402  /** @} */
403 
404 private:
405 
406  struct ScannedScalar
407  {
408  substr scalar;
409  bool needs_filter;
410  };
411 
412  struct ScannedBlock
413  {
414  substr scalar;
415  size_t indentation;
416  BlockChomp_e chomp;
417  };
418 
419 private:
420 
421  bool _is_doc_begin(csubstr s);
422  bool _is_doc_end(csubstr s);
423 
424  bool _scan_scalar_plain_blck(ScannedScalar *C4_RESTRICT sc, size_t indentation);
425  bool _scan_scalar_plain_seq_flow(ScannedScalar *C4_RESTRICT sc);
426  bool _scan_scalar_plain_seq_blck(ScannedScalar *C4_RESTRICT sc);
427  bool _scan_scalar_plain_map_flow(ScannedScalar *C4_RESTRICT sc);
428  bool _scan_scalar_plain_map_blck(ScannedScalar *C4_RESTRICT sc);
429  bool _scan_scalar_map_json(ScannedScalar *C4_RESTRICT sc);
430  bool _scan_scalar_seq_json(ScannedScalar *C4_RESTRICT sc);
431  bool _scan_scalar_plain_unk(ScannedScalar *C4_RESTRICT sc);
432  bool _is_valid_start_scalar_plain_flow(csubstr s);
433  bool _is_valid_start_scalar_plain_flow_check_block_token(csubstr s);
434  bool _is_valid_start_scalar_plain_flow_check_qmrk(csubstr s);
435  bool _scan_scalar_plain_handle_newline(csubstr s, size_t offs);
436  void _check_valid_newline_in_quoted_scalar();
437 
438  ScannedScalar _scan_scalar_squot();
439  ScannedScalar _scan_scalar_dquot();
440 
441  void _scan_block(ScannedBlock *C4_RESTRICT sb, size_t indref);
442  csubstr _scan_anchor();
443  csubstr _scan_ref_seq();
444  csubstr _scan_ref_map();
445  csubstr _scan_tag();
446  csubstr _scan_tag(csubstr *orig);
447 
448 public: // exposed for testing
449 
450  /** @cond dev */
451  csubstr _filter_scalar_plain(substr s, size_t indentation);
452  csubstr _filter_scalar_squot(substr s);
453  csubstr _filter_scalar_dquot(substr s);
454  csubstr _filter_scalar_literal(substr s, size_t indentation, BlockChomp_e chomp);
455  csubstr _filter_scalar_folded(substr s, size_t indentation, BlockChomp_e chomp);
456  csubstr _move_scalar_left_and_add_newline(substr s);
457 
458  csubstr _maybe_filter_key_scalar_plain(ScannedScalar const& sc, size_t indentation);
459  csubstr _maybe_filter_val_scalar_plain(ScannedScalar const& sc, size_t indentation);
460  csubstr _maybe_filter_key_scalar_squot(ScannedScalar const& sc);
461  csubstr _maybe_filter_val_scalar_squot(ScannedScalar const& sc);
462  csubstr _maybe_filter_key_scalar_dquot(ScannedScalar const& sc);
463  csubstr _maybe_filter_val_scalar_dquot(ScannedScalar const& sc);
464  csubstr _maybe_filter_key_scalar_literal(ScannedBlock const& sb);
465  csubstr _maybe_filter_val_scalar_literal(ScannedBlock const& sb);
466  csubstr _maybe_filter_key_scalar_folded(ScannedBlock const& sb);
467  csubstr _maybe_filter_val_scalar_folded(ScannedBlock const& sb);
468  /** @endcond */
469 
470 private:
471 
472  void _handle_map_block();
473  bool _handle_map_block_qmrk();
474  bool _handle_map_block_rkcl();
475  void _handle_seq_block();
476  void _handle_map_flow();
477  void _handle_seq_flow();
478  void _handle_seq_imap();
479  void _handle_map_json();
480  void _handle_seq_json();
481 
482  void _handle_unk();
483  void _handle_unk_json();
484 
485  void _handle_usty();
486 
487  void _handle_flow_skip_whitespace();
488  void _handle_flow_line_beginning();
489 
490  size_t _handle_unk_check_left_tokens(size_t realindent, size_t col, bool skip_annotations=true);
491  void _handle_unk_get_first_non_pending_token_pos(csubstr s, size_t *indent, size_t *first_non_token_pos);
492  void _handle_unk_begin_doc();
493 
494  size_t _handle_block_skip_leading_whitespace();
495  C4_ALWAYS_INLINE
496  size_t _handle_block_get_whitespace_mark() const noexcept { return m_evt_handler->m_curr->pos.offset; }
497  void _handle_block_check_leading_tabs(size_t prev_mark) { return _handle_block_check_leading_tabs(prev_mark, m_evt_handler->m_curr->pos.offset); }
498  void _handle_block_check_leading_tabs(size_t start_mark, size_t end_mark);
499 
500  void _end_map_flow();
501  void _end_seq_flow();
502  void _end_map_blck();
503  void _end_seq_blck();
504  void _end2_map();
505  void _end2_seq();
506  void _end_flow_container(size_t orig_indent, bool multiline);
507  void _flow_container_was_a_key(size_t orig_indent);
508 
509  void _begin2_doc();
510  void _begin2_doc_expl();
511  void _end2_doc();
512  void _end2_doc_expl();
513  void _check_doc_end_tokens() const;
514 
515  void _maybe_begin_doc();
516  void _maybe_end_doc();
517 
518  void _start_doc_suddenly();
519  void _end_doc_suddenly();
520  void _end_doc_suddenly__pop();
521  void _check_trailing_doc_token();
522  void _end_stream();
523 
524  void _set_indentation(size_t indentation) noexcept;
525  void _save_indentation();
526  void _mark_seqflow_val_end() noexcept;
527  void _handle_indentation_pop_from_block_seq();
528  void _handle_indentation_pop_from_block_map();
529  void _handle_indentation_pop(ParserState const* dst);
530 
531  void _maybe_skip_comment();
532  void _maybe_skip_comment_strict();
533  void _skip_comment();
534  void _maybe_skip_whitespace_tokens();
535  void _maybe_skipchars(char c);
536  template<size_t N>
537  void _skipchars(const char (&chars)[N]);
538  bool _maybe_scan_following_colon() noexcept;
539 
540 public:
541 
542  /** @cond dev */
543  template<class FilterProcessor> auto _filter_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation) -> decltype(proc.result());
544  template<class FilterProcessor> auto _filter_squoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
545  template<class FilterProcessor> auto _filter_dquoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
546  template<class FilterProcessor> auto _filter_block_literal(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
547  template<class FilterProcessor> auto _filter_block_folded(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
548  /** @endcond */
549 
550 public:
551 
552  /** @cond dev */
553  template<class FilterProcessor> void _filter_nl_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation);
554  template<class FilterProcessor> void _filter_nl_squoted(FilterProcessor &C4_RESTRICT proc);
555  template<class FilterProcessor> void _filter_nl_dquoted(FilterProcessor &C4_RESTRICT proc);
556 
557  template<class FilterProcessor> bool _filter_ws_handle_to_first_non_space(FilterProcessor &C4_RESTRICT proc);
558  template<class FilterProcessor> void _filter_ws_copy_trailing(FilterProcessor &C4_RESTRICT proc);
559  template<class FilterProcessor> void _filter_ws_skip_trailing(FilterProcessor &C4_RESTRICT proc);
560 
561  template<class FilterProcessor> void _filter_dquoted_backslash(FilterProcessor &C4_RESTRICT proc);
562  template<class FilterProcessor> void _filter_dquoted_backslash_decode(FilterProcessor &C4_RESTRICT proc, size_t sz);
563 
564  template<class FilterProcessor> void _filter_chomp(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp, size_t indentation);
565  template<class FilterProcessor> size_t _handle_all_whitespace(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp);
566  template<class FilterProcessor> size_t _extend_to_chomp(FilterProcessor &C4_RESTRICT proc, size_t contents_len);
567  template<class FilterProcessor> void _filter_block_indentation(FilterProcessor &C4_RESTRICT proc, size_t indentation);
568  template<class FilterProcessor> void _filter_block_folded_newlines(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
569  template<class FilterProcessor> size_t _filter_block_folded_newlines_compress(FilterProcessor &C4_RESTRICT proc, size_t num_newl, size_t wpos_at_first_newl);
570  template<class FilterProcessor> void _filter_block_folded_newlines_leading(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
571  template<class FilterProcessor> void _filter_block_folded_indented_block(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len, size_t curr_indentation) noexcept;
572 
573  substr _alloc_arena(size_t len, substr *relocated=nullptr);
574  substr _alloc_arena(size_t len, csubstr *relocated) { return _alloc_arena(len, reinterpret_cast<substr*>(relocated)); } // NOLINT
575 
576  /** @endcond */
577 
578 private:
579 
580  void _line_progressed(size_t ahead);
581  void _line_ended();
582  void _line_ended_undo();
583 
584  bool _finished_file() const;
585  bool _finished_line() const;
586 
587  void _scan_line();
588  substr _peek_next_line(size_t pos=npos) const;
589 
590  void _relocate_arena(csubstr prev_arena, substr next_arena, substr *other_string=nullptr);
591 
592 private:
593 
594  C4_ALWAYS_INLINE substr _buf() const noexcept { return m_evt_handler->m_src; }
595 
596  C4_ALWAYS_INLINE bool has_all(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == f; }
597  C4_ALWAYS_INLINE bool has_any(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) != 0; }
598  C4_ALWAYS_INLINE bool has_none(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == 0; }
599  static C4_ALWAYS_INLINE bool has_all(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == f; }
600  static C4_ALWAYS_INLINE bool has_any(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) != 0; }
601  static C4_ALWAYS_INLINE bool has_none(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == 0; }
602 
603  #ifndef RYML_DBG
604  C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { m_evt_handler->m_curr->flags |= on; }
605  C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; m_evt_handler->m_curr->flags |= on; }
606  C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; }
607  #else
608  C4_ALWAYS_INLINE void add_flags(ParserFlag_t on);
609  C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off);
610  C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off);
611  #endif
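The flag helpers above are plain bitmask operations on the flags of the current parser state. A stand-alone sketch of the same bit logic follows; the `Flags` alias and the `F_MAP`, `F_SEQ`, `F_FLOW` values are hypothetical stand-ins for `ParserFlag_t` and the real state flags, and these functions take the state explicitly instead of reading it from the event handler.

```cpp
#include <cassert>
#include <cstdint>

using Flags = std::uint32_t;  // hypothetical stand-in for ParserFlag_t
enum : Flags { F_MAP = 1u << 0, F_SEQ = 1u << 1, F_FLOW = 1u << 2 };  // hypothetical flags

// Same bit logic as has_all/has_any/has_none and add/addrem/rem_flags.
constexpr bool has_all(Flags state, Flags f) { return (state & f) == f; }
constexpr bool has_any(Flags state, Flags f) { return (state & f) != 0; }
constexpr bool has_none(Flags state, Flags f) { return (state & f) == 0; }
constexpr Flags add_flags(Flags state, Flags on) { return state | on; }
constexpr Flags rem_flags(Flags state, Flags off) { return state & ~off; }
// remove first, then add: a flag present in both `on` and `off` ends up set
constexpr Flags addrem_flags(Flags state, Flags on, Flags off) { return (state & ~off) | on; }
```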
612 
613 private:
614 
615  void _prepare_locations();
616  void _resize_locations(size_t sz);
617  bool _locations_dirty() const;
618 
619 private:
620 
621  void _reset();
622  void _free();
623  void _clr();
624 
625  template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, const char *fmt, Args const& ...args) const;
626  template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, Location const& ymlloc, const char *fmt, Args const& ...args) const;
627  #ifdef RYML_DBG
628  template<class ...Args> C4_NO_INLINE void _dbg(csubstr fmt, Args const& ...args) const;
629  template<class DumpFn> C4_NO_INLINE void _fmt_msg(DumpFn &&dumpfn) const;
630  C4_NO_INLINE void _print_state_stack() const;
631  C4_NO_INLINE void _print_state_stack(substr buf) const;
632  #endif
633 
634 
635 private:
636 
637  /** store pending tag or anchor/ref annotations */
638  struct Annotation
639  {
640  struct Entry
641  {
642  csubstr str;
643  size_t indentation;
644  size_t line;
645  csubstr orig;
646  };
647  Entry annotations[2];
648  uint8_t num_entries;
649  };
650 
651  void _handle_colon();
652  void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line);
653  void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line, csubstr orig);
654  void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str);
655  C4_ALWAYS_INLINE void _clear_annotations(Annotation *C4_RESTRICT dst) noexcept { dst->num_entries = 0; }
656  bool _annotations_require_key_container() const;
657  bool _handle_annotations_before_unexpected_flow_token_rkey();
658  void _handle_annotations_before_blck_key_scalar();
659  void _handle_annotations_before_blck_val_scalar();
660  void _handle_annotations_before_start_mapblck(size_t current_line);
661  void _handle_annotations_before_start_mapblck_as_key();
662  void _handle_annotations_and_indentation_after_start_mapblck(size_t key_indentation, size_t key_line);
663  size_t _select_indentation_from_annotations(size_t val_indentation, size_t val_line);
664  uint32_t _get_annotations_same_line(csubstr token_soup, csubstr * first, csubstr * second) const;
665  void _handle_keyref(csubstr alias);
666  void _handle_valref(csubstr alias);
667  csubstr _resolve_tag(csubstr tag);
668  void _handle_directive(csubstr rem);
669  bool _validate_directive_yaml(csubstr *C4_RESTRICT directive, csubstr *C4_RESTRICT version) const;
670  bool _validate_directive_tag(csubstr *C4_RESTRICT directive, csubstr *C4_RESTRICT handle, csubstr *C4_RESTRICT prefix) const;
671  bool _handle_bom();
672  void _handle_bom(Encoding_e enc);
673 
674 private:
675 
676  ParserOptions m_options;
677 
678 public:
679 
680  /** @cond dev */
681  EventHandler *C4_RESTRICT m_evt_handler; // NOLINT
682  /** @endcond */
683 
684 private:
685 
686  Annotation m_pending_anchors;
687  Annotation m_pending_tags;
688 
689  bool m_has_directives_yaml;
690  bool m_has_directives;
691  bool m_doc_empty;
692  size_t m_prev_colon;
693  size_t m_prev_val_end;
694 
695 private:
696 
697  size_t m_bom_len;
698  size_t m_bom_line;
699  Encoding_e m_encoding;
700 
701 private:
702 
703  size_t *m_newline_offsets;
704  size_t m_newline_offsets_size;
705  size_t m_newline_offsets_capacity;
706 
707 public:
708 
709  // deprecated methods
710 
711  /** @cond dev */
712  RYML_DEPRECATED("filter arena no longer needed") size_t filter_arena_capacity() const { return 0u; } // LCOV_EXCL_LINE
713  RYML_DEPRECATED("filter arena no longer needed") void reserve_filter_arena(size_t) {} // LCOV_EXCL_LINE
714 
715  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t, size_t node_id);
716  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t, size_t node_id);
717  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t );
718  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t );
719  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, NodeRef node );
720  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, NodeRef node );
721  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(csubstr filename, substr yaml );
722  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place( substr yaml );
723  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t, size_t node_id);
724  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t, size_t node_id);
725  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t );
726  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t );
727  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, NodeRef node );
728  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, NodeRef node );
729  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, csubstr yaml );
730  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( csubstr yaml );
731  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t, size_t node_id);
732  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t, size_t node_id);
733  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t );
734  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t );
735  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, NodeRef node );
736  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, NodeRef node );
737  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, substr yaml );
738  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( substr yaml );
739 
740  template<class U>
741  RYML_DEPRECATED("moved to Tree::location(Parser const&). deliberately undefined here.")
742  auto location(Tree const&, id_type node) const -> typename std::enable_if<U::is_wtree, Location>::type;
743 
744  template<class U>
745  RYML_DEPRECATED("moved to ConstNodeRef::location(Parser const&), deliberately undefined here.")
746  auto location(ConstNodeRef const&) const -> typename std::enable_if<U::is_wtree, Location>::type;
747  /** @endcond */
748 
749 };
750 
751 
752 /** @} */
753 
754 } // namespace yml
755 } // namespace c4
756 
757 // NOLINTEND(hicpp-signed-bitwise)
758 
759 #if defined(_MSC_VER)
760 # pragma warning(pop)
761 #endif
762 
763 #endif /* _C4_YML_PARSE_ENGINE_HPP_ */