rapidyaml  0.9.0
parse and emit YAML, and do it fast
parse_engine.hpp
1 #ifndef _C4_YML_PARSE_ENGINE_HPP_
2 #define _C4_YML_PARSE_ENGINE_HPP_
3 
4 #ifndef _C4_YML_DETAIL_PARSER_DBG_HPP_
5 #include "c4/yml/detail/parser_dbg.hpp"
6 #endif
7 
8 #ifndef _C4_YML_PARSER_STATE_HPP_
9 #include "c4/yml/parser_state.hpp"
10 #endif
11 
12 
13 #if defined(_MSC_VER)
14 # pragma warning(push)
15 # pragma warning(disable: 4251/*needs to have dll-interface to be used by clients of struct*/)
16 #endif
17 
18 // NOLINTBEGIN(hicpp-signed-bitwise)
19 
20 namespace c4 {
21 namespace yml {
22 
23 /** @addtogroup doc_parse
24  * @{ */
25 
26 /** @defgroup doc_event_handlers Event Handlers
27  *
28  * @brief rapidyaml implements its parsing logic with a two-level
29  * model, where a @ref ParseEngine object reads through the YAML
30  * source, and dispatches events to an EventHandler bound to the @ref
31  * ParseEngine. Because @ref ParseEngine is templated on the event
32  * handler, the binding uses static polymorphism, without any virtual
33  * functions. The actual handler object can be changed at run time
34  * (but of course needs to be of the type of the template parameter).
35  * This is thus a very efficient architecture, and further enables
36  * users to provide their own custom handler if they wish to bypass
37  * the rapidyaml @ref Tree.
38  *
39  * There are two handlers implemented in this project:
40  *
41  * - @ref EventHandlerTree is the handler responsible for creating the
42  * ryml @ref Tree
43  *
44  * - @ref EventHandlerYamlStd is the handler responsible for emitting
45  * standardized [YAML test suite
46  * events](https://github.com/yaml/yaml-test-suite), used (only) in
47  * the CI of this project.
48  *
49  *
50  * ### Event model
51  *
52  * The event model used by the parse engine and event handlers follows
53  * very closely the event model in the [YAML test
54  * suite](https://github.com/yaml/yaml-test-suite).
55  *
56  * Consider for example this YAML,
57  * ```yaml
58  * {foo: bar,foo2: bar2}
59  * ```
60  * which would produce these events in the test-suite parlance:
61  * ```
62  * +STR
63  * +DOC
64  * +MAP {}
65  * =VAL :foo
66  * =VAL :bar
67  * =VAL :foo2
68  * =VAL :bar2
69  * -MAP
70  * -DOC
71  * -STR
72  * ```
73  *
74  * For reference, the @ref ParseEngine object will produce this
75  * sequence of calls to its bound EventHandler:
76  * ```cpp
77  * handler.begin_stream();
78  * handler.begin_doc();
79  * handler.begin_map_val_flow();
80  * handler.set_key_scalar_plain("foo");
81  * handler.set_val_scalar_plain("bar");
82  * handler.add_sibling();
83  * handler.set_key_scalar_plain("foo2");
84  * handler.set_val_scalar_plain("bar2");
85  * handler.end_map();
86  * handler.end_doc();
87  * handler.end_stream();
88  * ```
89  *
90  * For many other examples of all areas of YAML and how ryml's parse
91  * model corresponds to the YAML standard model, refer to the [unit
92  * tests for the parse
93  * engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
94  *
95  *
96  * ### Special events
97  *
98  * Most of the parsing events adopted by rapidyaml in its event model
99  * are fairly obvious, but there are two less-obvious events requiring
100  * some explanation.
101  *
102  * These events exist to make it easier to parse some special YAML
103  * cases. They are called by the parser when a just-handled
104  * value/container is actually the first key of a new map:
105  *
106  * - `actually_val_is_first_key_of_new_map_flow()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTree" / @ref EventHandlerYamlStd::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerYamlStd")
107  * - `actually_val_is_first_key_of_new_map_block()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTree" / @ref EventHandlerYamlStd::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerYamlStd")
108  *
109  * For example, consider an implicit map inside a seq: `[a: b, c:
110  * d]` which is parsed as `[{a: b}, {c: d}]`. The standard event
111  * sequence for this YAML would be the following:
112  * ```cpp
113  * handler.begin_seq_val_flow();
114  * handler.begin_map_val_flow();
115  * handler.set_key_scalar_plain("a");
116  * handler.set_val_scalar_plain("b");
117  * handler.end_map();
118  * handler.add_sibling();
119  * handler.begin_map_val_flow();
120  * handler.set_key_scalar_plain("c");
121  * handler.set_val_scalar_plain("d");
122  * handler.end_map();
123  * handler.end_seq();
124  * ```
125  * The problem with this event sequence is that it forces the
126  * parser to delay emitting each scalar (in this case "a" and
127  * "c") until it knows whether the scalar is a key or a val,
128  * which is only settled when one of the disambiguating tokens
129  * `,-]:` is found. In the example above, "a" and "c" turn out
130  * to be keys and not vals, so the parser would have to store
131  * them in its internal state until the following `:` is read.
132  * The downside is that this complexity cost would apply even
133  * if there is no implicit map: every val in a seq would have
134  * to be held back until its disambiguating token is reached,
135  * whether or not a map ever starts.
136  * With these events, the parser avoids this complexity by
137  * preemptively setting the scalar as a val. A later call to the
138  * event then creates the map and rearranges the scalar as its
139  * first key. Now the cost applies only once: when a seqimap (an
140  * implicit map inside a seq) starts. So the easier and cheaper
141  * event sequence below has the same effect as the one above:
142  * ```cpp
143  * handler.begin_seq_val_flow();
144  * handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
145  * handler.actually_val_is_first_key_of_new_map_flow(); // create a map, move the "a" val as the key of the first child of the new map
146  * handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
147  * handler.end_map();
148  * handler.add_sibling();
149  * handler.set_val_scalar_plain("c"); // "c" also as val!
150  * handler.actually_val_is_first_key_of_new_map_flow(); // likewise
151  * handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
152  * handler.end_map();
153  * handler.end_seq();
154  * ```
155  * This also applies to container keys (although ryml's tree
156  * cannot accommodate these): the parser can preemptively set a
157  * container as a val, and call this event to turn that container
158  * into a key. For example, consider this yaml:
159  * ```yaml
160  * [aa, bb]: [cc, dd]
161  * # ^       ^ ^
162  * # |       | |
163  * # (2)   (1) (3)     <- event sequence
164  * ```
165  * The standard event sequence for this YAML would be the
166  * following:
167  * ```cpp
168  * handler.begin_map_val_block(); // (1)
169  * handler.begin_seq_key_flow(); // (2)
170  * handler.set_val_scalar_plain("aa");
171  * handler.add_sibling();
172  * handler.set_val_scalar_plain("bb");
173  * handler.end_seq();
174  * handler.begin_seq_val_flow(); // (3)
175  * handler.set_val_scalar_plain("cc");
176  * handler.add_sibling();
177  * handler.set_val_scalar_plain("dd");
178  * handler.end_seq();
179  * handler.end_map();
180  * ```
181  * The problem with the sequence above is that, reading from
182  * left-to-right, the parser can only detect the proper calls at
183  * (1) and (2) once it reaches (1) in the YAML source. So, the
184  * parser would have to buffer the entire event sequence starting
185  * from the beginning until it reaches (1). Using this function,
186  * the parser can do instead:
187  * ```cpp
188  * handler.begin_seq_val_flow(); // (2) -- preemptively as val!
189  * handler.set_val_scalar_plain("aa");
190  * handler.add_sibling();
191  * handler.set_val_scalar_plain("bb");
192  * handler.end_seq();
193  * handler.actually_val_is_first_key_of_new_map_block(); // (1) -- adjust when finding that the prev val was actually a key.
194  * handler.begin_seq_val_flow(); // (3) -- go on as before
195  * handler.set_val_scalar_plain("cc");
196  * handler.add_sibling();
197  * handler.set_val_scalar_plain("dd");
198  * handler.end_seq();
199  * handler.end_map();
200  * ```
201  */
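
The event model documented above can be tried out without ryml at all. The following is a minimal sketch, assuming a toy handler and a hard-coded driver: `RecordingHandler`, `emit_seqimap_events()` and `record_seqimap_events()` are hypothetical names introduced only for illustration, and the handler is not ryml's `EventHandlerTree` or `EventHandlerYamlStd`. Only the event method names are taken from the documentation. The sketch shows how a driver templated on the handler type dispatches events without virtual calls, and how `actually_val_is_first_key_of_new_map_flow()` can retroactively turn an already-set val into the first key of a new map.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an event handler: it records each event as a
// test-suite-like text line. NOT one of ryml's handlers; only the
// method names mirror the events described in the documentation.
struct RecordingHandler
{
    std::vector<std::string> log;

    void begin_seq_val_flow() { log.push_back("+SEQ []"); }
    void end_seq() { log.push_back("-SEQ"); }
    void end_map() { log.push_back("-MAP"); }
    void add_sibling() {} // nothing to record in this toy model
    void set_val_scalar_plain(std::string const& s) { log.push_back("=VAL :" + s); }

    // Retroactively open a flow map just before the last recorded val,
    // so that this val becomes the first key of the new map.
    // Assumes a val was recorded immediately before this call.
    void actually_val_is_first_key_of_new_map_flow()
    {
        log.insert(log.end() - 1, std::string("+MAP {}"));
    }
};

// Hypothetical driver templated on the handler type (static
// polymorphism, no virtual functions). It does not scan any YAML: it
// hard-codes the cheaper event sequence for `[a: b, c: d]`.
template<class Handler>
void emit_seqimap_events(Handler *handler)
{
    handler->begin_seq_val_flow();
    handler->set_val_scalar_plain("a"); // preemptively a val
    handler->actually_val_is_first_key_of_new_map_flow();
    handler->set_val_scalar_plain("b");
    handler->end_map();
    handler->add_sibling();
    handler->set_val_scalar_plain("c"); // preemptively a val
    handler->actually_val_is_first_key_of_new_map_flow();
    handler->set_val_scalar_plain("d");
    handler->end_map();
    handler->end_seq();
}

// Convenience entry point: run the driver and return the recorded log.
std::vector<std::string> record_seqimap_events()
{
    RecordingHandler handler;
    emit_seqimap_events(&handler);
    return handler.log;
}
```

Calling `record_seqimap_events()` yields a log in which each `+MAP {}` entry sits before the scalar that was first recorded as a val, which is the same shape the buffered (standard) sequence would have produced for `[{a: b}, {c: d}]`.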
202 
203 class Tree;
204 class NodeRef;
205 class ConstNodeRef;
206 
207 
208 //-----------------------------------------------------------------------------
209 //-----------------------------------------------------------------------------
210 //-----------------------------------------------------------------------------
211 
212 /** Options to give to the parser to control its behavior. */
213 struct RYML_EXPORT ParserOptions
214 {
215 private:
216 
217  typedef enum : uint32_t {
218  SCALAR_FILTERING = (1u << 0u),
219  LOCATIONS = (1u << 1u),
220  DEFAULTS = SCALAR_FILTERING,
221  } Flags_e;
222 
223  uint32_t flags = DEFAULTS;
224 
225 public:
226 
227  ParserOptions() = default;
228 
229 public:
230 
231  /** @name source location tracking */
232  /** @{ */
233 
234  /** enable/disable source location tracking */
235  ParserOptions& locations(bool enabled) noexcept
236  {
237  if(enabled)
238  flags |= LOCATIONS;
239  else
240  flags &= ~LOCATIONS;
241  return *this;
242  }
243  /** query source location tracking status */
244  C4_ALWAYS_INLINE bool locations() const noexcept { return (flags & LOCATIONS); }
245 
246  /** @} */
247 
248 public:
249 
250  /** @name scalar filtering status (experimental; disable at your discretion) */
251  /** @{ */
252 
253  /** enable/disable scalar filtering while parsing */
254  ParserOptions& scalar_filtering(bool enabled) noexcept
255  {
256  if(enabled)
257  flags |= SCALAR_FILTERING;
258  else
259  flags &= ~SCALAR_FILTERING;
260  return *this;
261  }
262  /** query scalar filtering status */
263  C4_ALWAYS_INLINE bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING); }
264 
265  /** @} */
266 };
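
The setters above return a reference to the options object, so calls can be chained. The following is a minimal sketch of that pattern, assuming a standalone mirror of the flag logic so that it compiles without the ryml headers: `DemoOptions` and `make_demo_options()` are hypothetical names and are not part of the library.

```cpp
#include <cstdint>

// Standalone mirror of the flag logic shown in the listing.
// DemoOptions is a hypothetical name used only for this sketch.
struct DemoOptions
{
    enum : std::uint32_t {
        SCALAR_FILTERING = (1u << 0u),
        LOCATIONS = (1u << 1u),
        DEFAULTS = SCALAR_FILTERING, // filtering on, locations off
    };

    std::uint32_t flags = DEFAULTS;

    // enable/disable source location tracking
    DemoOptions& locations(bool enabled) noexcept
    {
        if(enabled)
            flags |= LOCATIONS;
        else
            flags &= ~LOCATIONS;
        return *this;
    }
    bool locations() const noexcept { return (flags & LOCATIONS) != 0u; }

    // enable/disable scalar filtering
    DemoOptions& scalar_filtering(bool enabled) noexcept
    {
        if(enabled)
            flags |= SCALAR_FILTERING;
        else
            flags &= ~SCALAR_FILTERING;
        return *this;
    }
    bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING) != 0u; }
};

// Chained configuration: each setter returns *this.
DemoOptions make_demo_options()
{
    return DemoOptions().locations(true).scalar_filtering(false);
}
```

With the real class the same chaining style would apply, and the resulting object would be passed as the `opts` argument of the `ParseEngine` constructor.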
267 
268 
269 //-----------------------------------------------------------------------------
270 //-----------------------------------------------------------------------------
271 //-----------------------------------------------------------------------------
272 
273 /** This is the main driver of parsing logic: it scans the YAML or
274  * JSON source for tokens, and emits the appropriate sequence of
275  * parsing events to its event handler. The parse engine itself has no
276  * special limitations, and *can* accommodate containers as keys; it is the
277  * event handler that may introduce additional constraints.
278  *
279  * There are two implemented handlers (see @ref doc_event_handlers,
280  * which has important notes about the event model):
281  *
282  * - @ref EventHandlerTree is the handler responsible for creating the
283  * ryml @ref Tree
284  *
285  * - @ref EventHandlerYamlStd is the handler responsible for emitting
286  * standardized [YAML test suite
287  * events](https://github.com/yaml/yaml-test-suite), used (only) in
288  * the CI of this project. This is not part of the library and is
289  * not installed.
290  */
291 template<class EventHandler>
292 class ParseEngine
293 {
294 public:
295 
296  using handler_type = EventHandler;
297 
298 public:
299 
300  /** @name construction and assignment */
301  /** @{ */
302 
303  ParseEngine(EventHandler *evt_handler, ParserOptions opts={});
304  ~ParseEngine();
305 
306  ParseEngine(ParseEngine &&) noexcept;
307  ParseEngine(ParseEngine const&);
308  ParseEngine& operator=(ParseEngine &&) noexcept;
309  ParseEngine& operator=(ParseEngine const&);
310 
311  /** @} */
312 
313 public:
314 
315  /** @name modifiers */
316  /** @{ */
317 
318  /** Reserve a certain capacity for the parsing stack.
319  * This should be larger than the expected depth of the parsed
320  * YAML tree.
321  *
322  * The parsing stack is the only (potential) heap memory used
323  * directly by the parser.
324  *
325  * If the requested capacity is below the default
326  * stack size of 16, the memory is used directly in the parser
327  * object; otherwise it will be allocated from the heap.
328  *
329  * @note this reserves memory only for the parser itself; all the
330  * allocations for the parsed tree will go through the tree's
331  * allocator (when different).
332  *
333  * @note for maximum efficiency, the tree and the arena can (and
334  * should) also be reserved. */
335  void reserve_stack(id_type capacity)
336  {
337  m_evt_handler->m_stack.reserve(capacity);
338  }
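
The note on `reserve_stack()` describes a small-buffer scheme: storage lives inside the object up to a fixed size and moves to the heap only when a larger capacity is requested. The following is a rough sketch of that general technique, assuming a hypothetical `SmallStack` class; it is not ryml's stack implementation, and the exact threshold behavior (here, capacities up to and including `N` stay in the object) is an assumption of the sketch.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical small-buffer stack: not ryml's implementation.
template<class T, std::size_t N>
class SmallStack
{
    T m_inline[N];               // in-object storage
    std::unique_ptr<T[]> m_heap; // heap storage, allocated on demand
    T *m_data = m_inline;        // points at whichever storage is active
    std::size_t m_size = 0;
    std::size_t m_capacity = N;

public:
    SmallStack() = default;
    // copying would leave m_data pointing into another object
    SmallStack(SmallStack const&) = delete;
    SmallStack& operator=(SmallStack const&) = delete;

    // Grow to at least `capacity` elements; never shrinks.
    void reserve(std::size_t capacity)
    {
        if(capacity <= m_capacity)
            return; // fits in the current storage, nothing to allocate
        std::unique_ptr<T[]> next(new T[capacity]);
        for(std::size_t i = 0; i < m_size; ++i)
            next[i] = std::move(m_data[i]);
        m_heap = std::move(next); // releases any previous heap block
        m_data = m_heap.get();
        m_capacity = capacity;
    }

    void push(T value)
    {
        if(m_size == m_capacity)
            reserve(2 * m_capacity);
        m_data[m_size++] = std::move(value);
    }

    T const& top() const { return m_data[m_size - 1]; } // requires size() > 0
    std::size_t size() const { return m_size; }
    std::size_t capacity() const { return m_capacity; }
    bool on_heap() const { return m_data != m_inline; }
};
```

In this sketch, reserving once up front with a capacity larger than the expected nesting depth avoids any reallocation while pushing, which is the same motivation given for reserving the parsing stack.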
339 
340  /** Reserve a certain capacity for the array used to track node
341  * locations in the source buffer. */
342  void reserve_locations(size_t num_source_lines)
343  {
344  _resize_locations(num_source_lines);
345  }
346 
347  RYML_DEPRECATED("filter arena no longer needed")
348  void reserve_filter_arena(size_t) {}
349 
350  /** @} */
351 
352 public:
353 
354  /** @name getters */
355  /** @{ */
356 
357  /** Get the options used to build this parser object. */
358  ParserOptions const& options() const { return m_options; }
359 
360  /** Get the current callbacks in the parser. */
361  Callbacks const& callbacks() const { RYML_ASSERT(m_evt_handler); return m_evt_handler->m_stack.m_callbacks; }
362 
363  /** Get the name of the latest file parsed by this object. */
364  csubstr filename() const { return m_file; }
365 
366  /** Get the latest YAML buffer parsed by this object. */
367  csubstr source() const { return m_buf; }
368 
369  /** Get the encoding of the latest YAML buffer parsed by this object.
370  * If no encoding was specified, UTF8 is assumed as per the YAML standard. */
371  Encoding_e encoding() const { return m_encoding != NOBOM ? m_encoding : UTF8; }
372 
373  id_type stack_capacity() const { RYML_ASSERT(m_evt_handler); return m_evt_handler->m_stack.capacity(); }
374  size_t locations_capacity() const { return m_newline_offsets_capacity; }
375 
376  RYML_DEPRECATED("filter arena no longer needed")
377  size_t filter_arena_capacity() const { return 0u; }
378 
379  /** @} */
380 
381 public:
382 
383  /** @name parse methods */
384  /** @{ */
385 
386  /** parse YAML in place, emitting events to the current handler */
387  void parse_in_place_ev(csubstr filename, substr src);
388 
389  /** parse JSON in place, emitting events to the current handler */
390  void parse_json_in_place_ev(csubstr filename, substr src);
391 
392  /** @} */
393 
394 public:
395 
396  /** @name deprecated parse methods
397  * @{ */
398 
399  /** @cond dev */
400  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t, size_t node_id);
401  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t, size_t node_id);
402  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t );
403  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, Tree *t );
404  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, NodeRef node );
405  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place( substr yaml, NodeRef node );
406  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(csubstr filename, substr yaml );
407  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place( substr yaml );
408  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t, size_t node_id);
409  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t, size_t node_id);
410  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t );
411  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, Tree *t );
412  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, NodeRef node );
413  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( csubstr yaml, NodeRef node );
414  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, csubstr yaml );
415  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( csubstr yaml );
416  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t, size_t node_id);
417  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t, size_t node_id);
418  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t );
419  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, Tree *t );
420  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, NodeRef node );
421  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena( substr yaml, NodeRef node );
422  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, substr yaml );
423  template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the freestanding csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena( substr yaml );
424  /** @endcond */
425 
426  /** @} */
427 
428 public:
429 
430  /** @name locations */
431  /** @{ */
432 
433  /** Get the location of a node of the last tree to be parsed by this parser. */
434  Location location(Tree const& tree, id_type node_id) const;
435  /** Get the location of a node of the last tree to be parsed by this parser. */
436  Location location(ConstNodeRef node) const;
437  /** Get the string starting at a particular location, to the end
438  * of the parsed source buffer. */
439  csubstr location_contents(Location const& loc) const;
440  /** Given a pointer to a buffer position, get the location.
441  * @param[in] val must be pointing to somewhere in the source
442  * buffer that was last parsed by this object. */
443  Location val_location(const char *val) const;
444 
445  /** @} */
446 
447 public:
448 
449  /** @name scalar filtering */
450  /** @{*/
451 
452  /** filter a plain scalar */
453  FilterResult filter_scalar_plain(csubstr scalar, substr dst, size_t indentation);
454  /** filter a plain scalar in place */
455  FilterResult filter_scalar_plain_in_place(substr scalar, size_t cap, size_t indentation);
456 
457  /** filter a single-quoted scalar */
458  FilterResult filter_scalar_squoted(csubstr scalar, substr dst);
459  /** filter a single-quoted scalar in place */
460  FilterResult filter_scalar_squoted_in_place(substr scalar, size_t cap);
461 
462  /** filter a double-quoted scalar */
463  FilterResult filter_scalar_dquoted(csubstr scalar, substr dst);
464  /** filter a double-quoted scalar in place */
465  FilterResultExtending filter_scalar_dquoted_in_place(substr scalar, size_t cap);
466 
467  /** filter a block-literal scalar */
468  FilterResult filter_scalar_block_literal(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
469  /** filter a block-literal scalar in place */
470  FilterResult filter_scalar_block_literal_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
471 
472  /** filter a block-folded scalar */
473  FilterResult filter_scalar_block_folded(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
474  /** filter a block-folded scalar in place */
475  FilterResult filter_scalar_block_folded_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
476 
477  /** @} */
478 
479 private:
480 
481  struct ScannedScalar
482  {
483  substr scalar;
484  bool needs_filter;
485  };
486 
487  struct ScannedBlock
488  {
489  substr scalar;
490  size_t indentation;
491  BlockChomp_e chomp;
492  };
493 
494  bool _is_doc_begin(csubstr s);
495  bool _is_doc_end(csubstr s);
496 
497  bool _scan_scalar_plain_blck(ScannedScalar *C4_RESTRICT sc, size_t indentation);
498  bool _scan_scalar_plain_seq_flow(ScannedScalar *C4_RESTRICT sc);
499  bool _scan_scalar_plain_seq_blck(ScannedScalar *C4_RESTRICT sc);
500  bool _scan_scalar_plain_map_flow(ScannedScalar *C4_RESTRICT sc);
501  bool _scan_scalar_plain_map_blck(ScannedScalar *C4_RESTRICT sc);
502  bool _scan_scalar_map_json(ScannedScalar *C4_RESTRICT sc);
503  bool _scan_scalar_seq_json(ScannedScalar *C4_RESTRICT sc);
504  bool _scan_scalar_plain_unk(ScannedScalar *C4_RESTRICT sc);
505  bool _is_valid_start_scalar_plain_flow(csubstr s);
506 
507  ScannedScalar _scan_scalar_squot();
508  ScannedScalar _scan_scalar_dquot();
509 
510  void _scan_block(ScannedBlock *C4_RESTRICT sb, size_t indref);
511 
512  csubstr _scan_anchor();
513  csubstr _scan_ref_seq();
514  csubstr _scan_ref_map();
515  csubstr _scan_tag();
516 
517 public: // exposed for testing
518 
519  /** @cond dev */
520  csubstr _filter_scalar_plain(substr s, size_t indentation);
521  csubstr _filter_scalar_squot(substr s);
522  csubstr _filter_scalar_dquot(substr s);
523  csubstr _filter_scalar_literal(substr s, size_t indentation, BlockChomp_e chomp);
524  csubstr _filter_scalar_folded(substr s, size_t indentation, BlockChomp_e chomp);
525 
526  csubstr _maybe_filter_key_scalar_plain(ScannedScalar const& sc, size_t indentation);
527  csubstr _maybe_filter_val_scalar_plain(ScannedScalar const& sc, size_t indentation);
528  csubstr _maybe_filter_key_scalar_squot(ScannedScalar const& sc);
529  csubstr _maybe_filter_val_scalar_squot(ScannedScalar const& sc);
530  csubstr _maybe_filter_key_scalar_dquot(ScannedScalar const& sc);
531  csubstr _maybe_filter_val_scalar_dquot(ScannedScalar const& sc);
532  csubstr _maybe_filter_key_scalar_literal(ScannedBlock const& sb);
533  csubstr _maybe_filter_val_scalar_literal(ScannedBlock const& sb);
534  csubstr _maybe_filter_key_scalar_folded(ScannedBlock const& sb);
535  csubstr _maybe_filter_val_scalar_folded(ScannedBlock const& sb);
536  /** @endcond */
537 
538 private:
539 
540  void _handle_map_block();
541  void _handle_seq_block();
542  void _handle_map_flow();
543  void _handle_seq_flow();
544  void _handle_seq_imap();
545  void _handle_map_json();
546  void _handle_seq_json();
547 
548  void _handle_unk();
549  void _handle_unk_json();
550  void _handle_usty();
551 
552  void _handle_flow_skip_whitespace();
553 
554  void _end_map_blck();
555  void _end_seq_blck();
556  void _end2_map();
557  void _end2_seq();
558 
559  void _begin2_doc();
560  void _begin2_doc_expl();
561  void _end2_doc();
562  void _end2_doc_expl();
563 
564  void _maybe_begin_doc();
565  void _maybe_end_doc();
566 
567  void _start_doc_suddenly();
568  void _end_doc_suddenly();
569  void _end_doc_suddenly__pop();
570  void _end_stream();
571 
572  void _set_indentation(size_t indentation);
573  void _save_indentation();
574  void _handle_indentation_pop_from_block_seq();
575  void _handle_indentation_pop_from_block_map();
576  void _handle_indentation_pop(ParserState const* dst);
577 
578  void _maybe_skip_comment();
579  void _skip_comment();
580  void _maybe_skip_whitespace_tokens();
581  void _maybe_skipchars(char c);
582  #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
583  void _maybe_skipchars_up_to(char c, size_t max_to_skip);
584  #endif
585  template<size_t N>
586  void _skipchars(const char (&chars)[N]);
587  bool _maybe_scan_following_colon() noexcept;
588  bool _maybe_scan_following_comma() noexcept;
589 
590 public:
591 
592  /** @cond dev */
593  template<class FilterProcessor> auto _filter_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation) -> decltype(proc.result());
594  template<class FilterProcessor> auto _filter_squoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
595  template<class FilterProcessor> auto _filter_dquoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
596  template<class FilterProcessor> auto _filter_block_literal(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
597  template<class FilterProcessor> auto _filter_block_folded(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
598  /** @endcond */
599 
600 public:
601 
602  /** @cond dev */
603  template<class FilterProcessor> void _filter_nl_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation);
604  template<class FilterProcessor> void _filter_nl_squoted(FilterProcessor &C4_RESTRICT proc);
605  template<class FilterProcessor> void _filter_nl_dquoted(FilterProcessor &C4_RESTRICT proc);
606 
607  template<class FilterProcessor> bool _filter_ws_handle_to_first_non_space(FilterProcessor &C4_RESTRICT proc);
608  template<class FilterProcessor> void _filter_ws_copy_trailing(FilterProcessor &C4_RESTRICT proc);
609  template<class FilterProcessor> void _filter_ws_skip_trailing(FilterProcessor &C4_RESTRICT proc);
610 
611  template<class FilterProcessor> void _filter_dquoted_backslash(FilterProcessor &C4_RESTRICT proc);
612 
613  template<class FilterProcessor> void _filter_chomp(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp, size_t indentation);
614  template<class FilterProcessor> size_t _handle_all_whitespace(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp);
615  template<class FilterProcessor> size_t _extend_to_chomp(FilterProcessor &C4_RESTRICT proc, size_t contents_len);
616  template<class FilterProcessor> void _filter_block_indentation(FilterProcessor &C4_RESTRICT proc, size_t indentation);
617  template<class FilterProcessor> void _filter_block_folded_newlines(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
618  template<class FilterProcessor> size_t _filter_block_folded_newlines_compress(FilterProcessor &C4_RESTRICT proc, size_t num_newl, size_t wpos_at_first_newl);
619  template<class FilterProcessor> void _filter_block_folded_newlines_leading(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
620  template<class FilterProcessor> void _filter_block_folded_indented_block(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len, size_t curr_indentation) noexcept;
621 
622  /** @endcond */
623 
624 private:
625 
626  void _line_progressed(size_t ahead);
627  void _line_ended();
628  void _line_ended_undo();
629 
630  bool _finished_file() const;
631  bool _finished_line() const;
632 
633  void _scan_line();
634  substr _peek_next_line(size_t pos=npos) const;
635 
636  bool _at_line_begin() const
637  {
638  return m_evt_handler->m_curr->line_contents.rem.begin() == m_evt_handler->m_curr->line_contents.full.begin();
639  }
640 
641  void _relocate_arena(csubstr prev_arena, substr next_arena);
642  static void _s_relocate_arena(void*, csubstr prev_arena, substr next_arena);
643 
644 private:
645 
646  C4_ALWAYS_INLINE bool has_all(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == f; }
647  C4_ALWAYS_INLINE bool has_any(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) != 0; }
648  C4_ALWAYS_INLINE bool has_none(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == 0; }
649  static C4_ALWAYS_INLINE bool has_all(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == f; }
650  static C4_ALWAYS_INLINE bool has_any(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) != 0; }
651  static C4_ALWAYS_INLINE bool has_none(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == 0; }
652 
653  #ifndef RYML_DBG
654  C4_ALWAYS_INLINE static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s) noexcept { s->flags |= on; }
655  C4_ALWAYS_INLINE static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; s->flags |= on; }
656  C4_ALWAYS_INLINE static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; }
657  C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { m_evt_handler->m_curr->flags |= on; }
658  C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; m_evt_handler->m_curr->flags |= on; }
659  C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; }
660  #else
661  static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s);
662  static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s);
663  static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s);
664  C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { add_flags(on, m_evt_handler->m_curr); }
665  C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { addrem_flags(on, off, m_evt_handler->m_curr); }
666  C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { rem_flags(off, m_evt_handler->m_curr); }
667  #endif
668 
669 private:
670 
671  void _prepare_locations();
672  void _resize_locations(size_t sz);
673  bool _locations_dirty() const;
674 
675  bool _location_from_cont(Tree const& tree, id_type node, Location *C4_RESTRICT loc) const;
676  bool _location_from_node(Tree const& tree, id_type node, Location *C4_RESTRICT loc, id_type level) const;
677 
678 private:
679 
680  void _reset();
681  void _free();
682  void _clr();
683 
684  #ifdef RYML_DBG
685  template<class ...Args> void _dbg(csubstr fmt, Args const& C4_RESTRICT ...args) const;
686  #endif
687  template<class ...Args> void _err(csubstr fmt, Args const& C4_RESTRICT ...args) const;
688  template<class ...Args> void _errloc(csubstr fmt, Location const& loc, Args const& C4_RESTRICT ...args) const;
689 
690  template<class DumpFn> void _fmt_msg(DumpFn &&dumpfn) const;
691 
692 private:
693 
694  /** store pending tag or anchor/ref annotations */
695  struct Annotation
696  {
697  struct Entry
698  {
699  csubstr str;
700  size_t indentation;
701  size_t line;
702  };
703  Entry annotations[2];
704  size_t num_entries;
705  };
706 
707  void _handle_colon();
708  void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line);
709  void _clear_annotations(Annotation *C4_RESTRICT dst);
710  bool _has_pending_annotations() const { return m_pending_tags.num_entries || m_pending_anchors.num_entries; }
711  #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
712  bool _handle_indentation_from_annotations();
713  #endif
714  bool _annotations_require_key_container() const;
715  void _handle_annotations_before_blck_key_scalar();
716  void _handle_annotations_before_blck_val_scalar();
717  void _handle_annotations_before_start_mapblck(size_t current_line);
718  void _handle_annotations_before_start_mapblck_as_key();
719  void _handle_annotations_and_indentation_after_start_mapblck(size_t key_indentation, size_t key_line);
720  size_t _select_indentation_from_annotations(size_t val_indentation, size_t val_line);
721  void _handle_directive(csubstr rem);
722  bool _handle_bom();
723  void _handle_bom(Encoding_e enc);
724 
725  void _check_tag(csubstr tag);
726 
727 private:
728 
729  ParserOptions m_options;
730 
731  csubstr m_file;
732  substr m_buf;
733 
734 public:
735 
736  /** @cond dev */
737  EventHandler *C4_RESTRICT m_evt_handler; // NOLINT
738  /** @endcond */
739 
740 private:
741 
742  Annotation m_pending_anchors;
743  Annotation m_pending_tags;
744 
745  bool m_was_inside_qmrk;
746  bool m_doc_empty = true;
747  size_t m_prev_colon = npos;
748 
749  Encoding_e m_encoding = UTF8;
750 
751 private:
752 
753  size_t *m_newline_offsets;
754  size_t m_newline_offsets_size;
755  size_t m_newline_offsets_capacity;
756  csubstr m_newline_offsets_buf;
757 
758 };
759 
760 /** @cond dev */
761 RYML_EXPORT C4_NO_INLINE size_t _find_last_newline_and_larger_indentation(csubstr s, size_t indentation) noexcept;
762 /** @endcond */
763 
764 
765 /** Quickly inspect the source to estimate the number of nodes the
766  * resulting tree is likely to have. If a tree is empty before
767  * parsing, considerable time will be spent growing it, so calling
768  * this to reserve the tree size prior to parsing is likely to
769  * result in a time gain. We encourage using this function before
770  * parsing, but as always, measure its impact on performance to
771  * obtain a good trade-off.
772  *
773  * @note since this function is meant for optimizing performance, it
774  * is approximate. The result may actually be smaller than the
775  * resulting number of nodes, notably if the YAML uses implicit
776  * maps as flow seq members as in `[these: are, individual:
777  * maps]`. */
778 RYML_EXPORT id_type estimate_tree_capacity(csubstr src); // NOLINT(readability-redundant-declaration)
779 
780 /** @} */
781 
782 } // namespace yml
783 } // namespace c4
784 
785 // NOLINTEND(hicpp-signed-bitwise)
786 
787 #if defined(_MSC_VER)
788 # pragma warning(pop)
789 #endif
790 
791 #endif /* _C4_YML_PARSE_ENGINE_HPP_ */
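
The flag helpers in the listing above (`has_any`, `has_none`, `add_flags`, `addrem_flags`, `rem_flags`) are plain bitmask operations on `ParserState::flags`. The following is a minimal standalone sketch of those operations, not the library's own code: `ParserState` is reduced to a single field, the `DEMO_*` flag names and values are invented for illustration, and `run_demo` is a hypothetical helper. Only the bodies of the five helpers mirror the non-debug branch shown in the listing.

```cpp
#include <cassert>

// Stand-in types for this sketch only. The real ParserState has more
// members, and the real flag constants live elsewhere in ryml.
using ParserFlag_t = int;

enum : ParserFlag_t {
    DEMO_RMAP = 1 << 0, // hypothetical flag: reading a map
    DEMO_RSEQ = 1 << 1, // hypothetical flag: reading a seq
    DEMO_RKEY = 1 << 2, // hypothetical flag: expecting a key
    DEMO_RVAL = 1 << 3  // hypothetical flag: expecting a value
};

struct ParserState { ParserFlag_t flags = 0; };

// true if at least one bit of f is set in the state
static bool has_any(ParserFlag_t f, ParserState const* s) noexcept { return (s->flags & f) != 0; }
// true if no bit of f is set in the state
static bool has_none(ParserFlag_t f, ParserState const* s) noexcept { return (s->flags & f) == 0; }
// set the bits in `on`
static void add_flags(ParserFlag_t on, ParserState *s) noexcept { s->flags |= on; }
// clear `off` first, then set `on`; a bit present in both ends up set
static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *s) noexcept { s->flags &= ~off; s->flags |= on; }
// clear the bits in `off`
static void rem_flags(ParserFlag_t off, ParserState *s) noexcept { s->flags &= ~off; }

// Hypothetical walk-through: enter a map expecting a key, switch to
// expecting a value, then leave the map.
ParserFlag_t run_demo()
{
    ParserState s;
    add_flags(DEMO_RMAP | DEMO_RKEY, &s);   // flags: RMAP|RKEY
    assert(has_any(DEMO_RKEY | DEMO_RVAL, &s));
    addrem_flags(DEMO_RVAL, DEMO_RKEY, &s); // flags: RMAP|RVAL
    assert(has_none(DEMO_RKEY | DEMO_RSEQ, &s));
    rem_flags(DEMO_RMAP, &s);               // flags: RVAL
    return s.flags;
}
```

Because `addrem_flags` clears before it sets, a flag passed in both arguments is left set. That ordering matches the listing; whether callers rely on it is not stated in the source.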