rapidyaml 0.14.0
parse and emit YAML, and do it fast
Changelog

Current

Changes since latest release: current.md


0.14.0

Github release: 0.14.0

  • PR#607: add file utilities:
  • PR#609: fix misbuild in gcc 16.
  • PR#610: fix clang warnings: -Weverything. Thanks @TedLyngmo!
  • PR#613: fix #612: parse error when only whitespace follows the block scalar indicators | or >:
    space after: >\space
    tab after: >\t
  • PR#614: improve base64 serialization facilities:
    std::string decoded;
    tree["node"] >> fmt::base64(decoded); // now possible
    // also can now obtain the size explicitly:
    substr buf = ...;
    size_t required = 0;
    tree["node"] >> fmt::base64(buf, &required);
  • Update c4core to 0.4.0.
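The explicit size query in PR#614 suggests the usual two-pass pattern: ask for the required size, resize the buffer, then decode into it. The following is a minimal, standard-library-only sketch of that pattern. It does not use ryml; `decode_base64()` and `decode_to_string()` are hypothetical names introduced here for illustration, and the decoder does not diagnose invalid input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a decoder that reports the required size:
// writes at most `cap` bytes to `out` and returns the full decoded length.
std::size_t decode_base64(std::string const& in, char *out, std::size_t cap)
{
    auto value = [](char c) -> int {
        if(c >= 'A' && c <= 'Z') return c - 'A';
        if(c >= 'a' && c <= 'z') return c - 'a' + 26;
        if(c >= '0' && c <= '9') return c - '0' + 52;
        if(c == '+') return 62;
        if(c == '/') return 63;
        return -1; // padding or any other character
    };
    std::size_t written = 0;
    unsigned acc = 0; // bit accumulator; only the low bits are used
    int bits = 0;     // number of pending bits in the accumulator
    for(char c : in)
    {
        int v = value(c);
        if(v < 0)
            break; // stop at '=' padding (invalid input is not diagnosed)
        acc = (acc << 6) | static_cast<unsigned>(v);
        bits += 6;
        if(bits >= 8)
        {
            bits -= 8;
            if(written < cap) // never write past the caller's buffer
                out[written] = static_cast<char>((acc >> bits) & 0xffu);
            ++written;        // but always count the byte
        }
    }
    return written;
}

// Two-pass use: query the size first, then decode into a sized buffer.
std::string decode_to_string(std::string const& encoded)
{
    std::string decoded;
    std::size_t required = decode_base64(encoded, nullptr, 0); // size only
    decoded.resize(required);
    decode_base64(encoded, &decoded[0], decoded.size());       // actual write
    assert(decoded.size() == required);
    return decoded;
}
```

With this sketch, `decode_to_string("aGVsbG8=")` yields the string `hello`; the first call never touches the output pointer because the capacity is zero.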

Thanks


0.13.0

Github release: 0.13.0

Thanks


0.12.1

Github release: 0.12.1

  • Fix #597: parse error when quoted scalars start with ... or --- (PR#598). This fixes a regression introduced in PR#587 while trying to ensure a parse error when ... or --- occur in quoted scalars at the beginning of a line.

0.12.0

Github release: 0.12.0

This release focuses on compliance with the YAML standard, mostly by ensuring parse errors in invalid YAML cases. rapidyaml is now 100% compliant with the YAML test suite, both for valid and invalid YAML cases.

General fixes and improvements

  • Narrow the scope of TAG to only the following document, as per the standard (PR#588). This required or prompted some API changes:
    • Added new type TagDirectives
    • Added ParserOptions::resolve_tags() and ParserOptions::resolve_tags_all() options to allow controlling resolution of tags while parsing. When disabled (which is the default), the tree still has Tree::resolve_tags() to perform post-parsing or programmatic resolution.
    • Changed TagDirective to point at its in-scope document, changed API to reflect this.
    • ParseEngine now has an instance of TagDirectives, and is in charge of updating it and checking for directive errors.
    • Removed the old TagDirective m_tag_directives[] members from the ints and testsuite handlers. Tree also has its own TagDirectives member, redundantly updated during parsing: this enables programmatic manipulation of the tree's tag directives.
    • The event handlers now track the current document id, in order to enable the document scope check for TAG directives. This required adding node id tracking to the ints and testsuite handlers.
    • Added Tree::ancestor_doc(node_id) and ConstNodeRef::ancestor_doc() to query for the parent document of a node. This is needed to implement resolution of tags.
    • NodeType: rename tag directive types TAGD->TAGH (tag handle) and TAGV->TAGP (tag prefix).
    • Internal changes to improve the design of event handlers, moving relocation and error checking logic to ParseEngine, where it is most suited.
    • Fix warnings with clang-tidy 22
  • PR#596: Add TagCache accelerator structure (in c4/yml/tag.hpp), used by Tree::resolve_tags() and the parser engine. This reduces significantly the arena requirements for heavily-tagged YAML by ensuring reuse of resolved tags.
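To make the document-scoping rule concrete, here is a minimal sketch of tag resolution where each directive applies to a single document only. It is a standard-library model, not ryml code: `TagDirectiveSketch` and `resolve_tag()` are hypothetical names, and the resolved form (prefix followed by the tag suffix) is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a directive bound to one document.
struct TagDirectiveSketch
{
    std::string handle;  // e.g. "!m!"
    std::string prefix;  // e.g. "my-"
    std::size_t doc_id;  // the only document where the directive applies
};

// Resolve `tag` for a node living in document `doc_id`.
// Returns false when no directive with a matching handle is in scope.
bool resolve_tag(std::vector<TagDirectiveSketch> const& directives,
                 std::string const& tag, std::size_t doc_id,
                 std::string *resolved)
{
    assert(resolved != nullptr);
    for(TagDirectiveSketch const& d : directives)
    {
        if(d.doc_id != doc_id)
            continue; // directive belongs to another document
        if(tag.compare(0, d.handle.size(), d.handle) != 0)
            continue; // handle does not match the start of the tag
        *resolved = d.prefix + tag.substr(d.handle.size());
        return true;
    }
    return false; // out of scope: the caller would report an error here
}
```

In this model, a directive registered for document 0 resolves `!m!first` to `my-first` there, while the same lookup in document 1 fails, which mirrors the out-of-scope error described for %TAG directives.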

YAML fixes: valid cases

Fix parsing of valid YAML corner cases:

  • Ambiguity of tags/anchors in ? mode (PR#587):
    ? &mapanchor
    key: val
    ?
    &keyanchor key: val
  • flow tags/anchors with omitted plain scalar (PR#587):
    # ... likewise for !tag
    - [&anchor,&anchor]
    - {&anchor,&anchor}
    - [&anchor :,&anchor :]
    - {&anchor :,&anchor :}
    - [: &anchor,: &anchor]
    - {: &anchor,: &anchor}
    ---
    ? anchor
  • flow tags/anchors terminating with : (the colon is part of the tag/anchor) (PR#587):
    # ... likewise for !tag:
    - [&anchor:,&anchor:]
    - {&anchor:,&anchor:}
    - [&anchor: :,&anchor: :]
    - {&anchor: :,&anchor: :}
    - [: &anchor:,: &anchor:]
    - {: &anchor:,: &anchor:}
    ---
    ? anchor
  • Explicit keys now allow same-line containers (PR#587):
    ? - a # same-line container key now parses successfully. both seqs and maps
    : - b # same-line container val now parses successfully. both seqs and maps
    ? ? - c # nested explicit keys were also fixed
    ? - d
  • Missing tags/anchors in some flow maps (PR#587).

YAML fixes: invalid cases

Ensure parse errors for invalid YAML cases, and improve reported error location:

  • TAG directives are valid only in the following document (PR#588).
    %TAG !m my-
    --- !m!first a
    --- !m!second a # error: %TAG directive out of scope here
    ...
    %TAG !m your-
    --- !m!second a # ok: %TAG provided now.
  • colon on newline at top level (PR#585):
    scalar
    : bad
    ---
    [seq]
    : bad
    ---
    {map: }
    : bad
  • colon on newline generally in block containers (PR#585):
    bad cases:
    scalar
    : bad colon
    [seq]
    : bad colon
    {map: }
    : bad colon
  • colon on newline in flow sequences (PR#586):
    [a
    :
    b]
  • tokens after explicit document end (PR#585):
    foo: bar
    ... bad tokens
  • - is invalid scalar in flow sequences (PR#586):
    [-]
    ---
    [-,-]
    ---
    [--,-]
    ---
    [-
    ]
  • doc start/end tokens (--- and ...) at zero indentation in flow seqs and quoted scalars (PR#587):
    [
    ---,
    --- ,
    ...,
    ... ,
    # etc
    ]
    ---
    "
    --- # error as well
    ... # error as well
    "
    ---
    '
    --- # error as well
    ... # error as well
    '
  • nested flow containers now enforce the contextual parent indentation (PR#587):
    - - - [
    a # now this is a parse error
    ]
    - - - [
    a # this is ok
    ]
  • single/double-quoted scalars now enforce the contextual parent indentation (PR#587):
    - - - "a
    b" # now this is a parse error
    - - - "a
    b" # this is ok
  • plain scalars in block mode starting with , (PR#587):
    all invalid:
    - , foo
    - ,foo
    - ,
  • references with anchors or tags (PR#587):
    all invalid:
    - &anchor *ref
    - !tag *ref
  • directives with extra tokens (PR#587):
    %YAML 1.2 blabla # invalid
    %YAML 1. # this is ok
    %TAG ! my- ! blabla # this is also wrong
    ---
  • multiline implicit keys are invalid (PR#587):
    multiline
    plain: invalid
    'multiline
    squoted': invalid
    "multiline
    dquoted": invalid
    [multiline,
    seq]: invalid
    {multiline:
    map}: invalid
  • invalid block containers after document open (PR#592):
    --- a: b # invalid
    --- - a # invalid
    --- ? a # invalid
  • invalid block containers after in-block open (PR#592):
    a: - b # invalid
    a: ? b # invalid
  • same-line repeated annotations (PR#592):
    !a
    !b foo: bar # ok
    ---
    &a
    &b foo: bar # ok
    ---
    !a !b foo: bar # invalid
    ---
    &a &b foo: bar # invalid
  • Fix parsing of invalid YAML: block scalars with deindented first line (PR#593):
    # the _ characters are not part of the YAML;
    # they are used here only to show the leading whitespace
    empty block scalar: >
    _
    _
    _
    # comment

0.11.1

Github release: 0.11.1

  • PR#583: Fix corner cases of container keys. Eg, parsing of explicit keys forming valid YAML like:
    ?
    ? # was causing a parse error
    ? # popping was also causing a parse error
    ---
    ? [a: b]: x
    : y
    With this fix, rapidyaml now has a 100% success rate for valid YAML cases in the YAML test suite.
  • PR#580: fix compilation error when RYML_NO_DEFAULT_CALLBACKS is defined (thanks @toge)
  • PR#582: fix compilation error with clang-cl
  • Fix #584: install: RYML_VERSION was missing from rymlConfig.cmake
  • Update c4core to 0.2.11

Python

  • PR#579: python packaging files and CI infrastructure were moved to a different repo, biojppm/rapidyaml-python. This was done because python packaging is notoriously hard and has always posed trouble in the CI, standing in the way of C++ development and releases.

Thanks


0.11.0

Github release: 0.11.0

New features

  • PR#550 - Implement flow multiline style (FLOW_ML):
    • The parser now detects this style automatically for flow seqs or maps when the terminating bracket sits on a line different from the opening bracket.
    • Added ParserOptions::detect_flow_ml() to enable/disable this behavior
    • Added EmitOptions::indent_flow_ml() to control indentation of FLOW_ML containers
    • The emit implementation was refactored, and is now significantly cleaner
    • Emitted YAML will now have anchors emitted before tags, as is customary (see example).
    • Added ParserOptions defaulted argument to temp-parser overloads of parse_{yaml,json}_in_{place,arena}()
    • PR#567 (fixes #566) fixes a regression from this refactor where top-level container anchors were wrongly emitted in the same line if no style was set on the container.

API changes

Fixes in YAML parsing

  • PR#561 (fixes #559) - Byte Order Mark: account for BOM length when determining block indentation
  • PR#547 - Fix parsing of implicit first documents with empty sequences, caused by a problem in Tree::set_root_as_stream():
    [] # this container was lost during parsing
    ---
    more data here
  • PR#576 - extra::events_ints_print(): Prevent integer overflow in bounds check (thanks @bytecodesky).

JSON emitting changes

  • PR#574 (fixes #535 and #312) - improve handling of .inf, .nan and some float formats when emitting to JSON. For example, the tree
    {
    inf: [inf, infinity, .inf, .Inf, .INF, -inf, -infinity, -.inf, -.Inf, -.INF],
    nan: [nan, .nan, .NaN, .NAN],
    dot: [.1, 1., .2e2, 10., -.2, -2.],
    zero: [10, 01],
    normal: [0.1, 0.2e3, 4.e5],
    }
    is now emitted to JSON as:
    {
    ".inf": [".inf",".inf",".inf",".inf",".inf","-.inf","-.inf","-.inf","-.inf","-.inf"],
    ".nan": [".nan",".nan",".nan",".nan"],
    "dot": [0.1,1.0,0.2e2,10.0,-0.2,-2.0],
    "zero": [10,"01"],
    "normal": [0.1,0.2e3,4.e5]
    }
    Previously, some inf and nan cases were emitted without quotes; now they are all emitted with the fixed strings .nan and .inf, which helps in cases where the JSON may be loaded in JavaScript. Note also the added zeroes for some floats, eg .1 or -2. turning into 0.1 and -2.0.
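The normalization described above can be sketched as a small function. This is a standard-library illustration that mirrors the listed examples, not ryml's implementation: `json_float_text()` and `to_lower()` are hypothetical helpers, and the sketch leaves out the quoting of scalars like 01.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ryml): lowercase copy of a string.
static std::string to_lower(std::string s)
{
    for(char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper (not part of ryml): map a float-like scalar to the
// text a JSON emitter could write, following the examples above.
std::string json_float_text(std::string const& scalar)
{
    bool const neg = !scalar.empty() && scalar[0] == '-';
    std::string const body = to_lower(neg ? scalar.substr(1) : scalar);
    // every spelling of infinity or nan becomes one fixed quoted string
    if(body == "inf" || body == "infinity" || body == ".inf")
        return neg ? "\"-.inf\"" : "\".inf\"";
    if(!neg && (body == "nan" || body == ".nan"))
        return "\".nan\"";
    std::string out = scalar;
    std::size_t const first = neg ? 1u : 0u; // index after an optional sign
    auto is_digit = [&](std::size_t i) {
        return std::isdigit(static_cast<unsigned char>(out[i])) != 0;
    };
    // add the missing leading zero: .1 -> 0.1, -.2 -> -0.2
    if(out.size() > first + 1 && out[first] == '.' && is_digit(first + 1))
        out.insert(first, "0");
    // add the missing trailing zero: 1. -> 1.0, -2. -> -2.0
    if(out.size() > 1 && out.back() == '.' && is_digit(out.size() - 2))
        out += '0';
    return out;
}
```

Scalars that are already valid JSON numbers pass through unchanged, so `0.1` and `4.e5` are returned as they are, matching the `normal` row of the example.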

Other changes

  • Update c4core to v0.2.10

Python improvements

  • PR#560 (see also #554): python improvements:
    • expose Tree::to_arena() in python. This allows safer and easier programmatic creation of trees in python by ensuring scalars are placed into the tree and so have the same lifetime as the tree:
      t = ryml.Tree()
      s = t.to_arena(temp_string()) # Copy/serialize a temporary string to the tree's arena.
      # Works also with integers and floats.
      t.to_val(t.root_id(), s) # Now we can safely use the scalar in the tree:
      # there is no longer any risk of it being deallocated
    • improve behavior of Tree methods accepting scalars: all standard buffer types are now accepted (ie, str, bytes, bytearray and memoryview).
  • PR#565 (fixes #564) - Tree arena: allow relocation of zero-length strings when placed at the end (relax assertions triggered in Tree::_relocated())
  • PR#563 (fixes #562) - Fix bug in NodeRef::cend()
  • PR#568 - Move escape_scalar() from c4/yml/extra/scalar.hpp to c4/yml/escape_scalar.hpp (and removed the original header)

Thanks


0.10.0

Github release: 0.10.0

Extra event handlers

PR#536 adds a new major extra feature: a parser event handler that creates a compact representation of the YAML tree in a buffer of integers containing masks (to represent events) and offset+length (to represent strings in the source buffer).

This handler is meant for use by other programming languages, and it supports container keys (unlike the ryml tree). You can find this handler among the other headers in the new src_extra folder.
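As a rough picture of what such an integer buffer can look like, the sketch below walks a flat array of ints in which string-carrying events are followed by an offset and a length into the source. The layout and the `EVT_*` bits are assumptions made for this illustration only; the real handler defines its own masks and layout, so consult its header before relying on any of this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event bits, for illustration only.
enum : int {
    EVT_BEGIN_MAP = 1 << 0,
    EVT_END_MAP   = 1 << 1,
    EVT_SCALAR    = 1 << 2,
    EVT_HAS_STR   = 1 << 3, // two more ints follow: offset, length
};

// Walk the event buffer and collect every string it refers to.
// No string is stored in the buffer itself: each one is a range of `src`.
std::vector<std::string> collect_scalars(std::string const& src,
                                         std::vector<int> const& events)
{
    std::vector<std::string> out;
    for(std::size_t i = 0; i < events.size(); )
    {
        int const mask = events[i++];
        if(mask & EVT_HAS_STR)
        {
            assert(i + 1 < events.size()); // offset and length must follow
            std::size_t const offset = static_cast<std::size_t>(events[i++]);
            std::size_t const length = static_cast<std::size_t>(events[i++]);
            out.push_back(src.substr(offset, length));
        }
    }
    return out;
}
```

For the source `{a: b}`, a buffer holding a map-begin event, two scalar events with ranges (1,1) and (4,1), and a map-end event yields the strings `a` and `b`. A flat layout of this kind is easy to hand across a language boundary, since it needs no pointers or allocations on the receiving side.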

Changes

  • In PR#536 the location functions were moved from ParseEngine to Tree and ConstNodeRef. The parser engine is now fully agnostic vis-a-vis the type of the event handler. (The location functions in the parser engine were a legacy of the initial implementation of the parser, which was meant to create only ryml trees.)
  • The tool ryml-yaml-events was updated to also dump integer events (and its command line options were changed to enable the different choices).

Fixes

  • Fix #524 (PR#525): problem parsing nested map value in complex map. Kudos to @MatthewSteel!
  • PR#542: \x Unicode sequences were not decoded. Thanks to @mutativesystems!
  • PR#541: std::is_trivial deprecated in c++26. Thanks to @P3RK4N!
  • Fix #529 (PR#530): double-quoted "<<" was mistaken for an inheriting reference.
  • PR#543: improvements to experimental style API:
    • Add getters to NodeType, Tree, NodeRef, and ConstNodeRef:
      • .key_style(): get the style flags in a node's key
      • .val_style(): get the style flags in a node's val
      • .container_style(): get the style flags in a node's container
    • Add style modifiers to NodeType, Tree, NodeRef, and ConstNodeRef:
      • .clear_style(bool recurse)
      • .set_style_conditionally(bool recurse)
  • Fix argument handling in ryml-parse-emit.

Thanks


0.9.0

Github release: 0.9.0

Fixes

  • Fix #400 (PR#506): clear anchors after resolving.
  • Fix #484 (PR#506): fix merge key order for last element to be overridden.
  • PR#503: ensure parse error on a: b: c and similar cases containing nested maps opening on the same line.
  • PR#502: fix parse errors or missing tags:
    • missing tags in empty documents:
      !yamlscript/v0/bare
      --- !code
      42
    • trailing empty keys or vals:
      a:
      :
    • missing tags in trailing empty keys or vals:
      a: !tag
      !tag : !tag
    • missing tags in complex maps:
      ? a: !tag
      !tag : !tag
      :
      !tag : !tag
  • PR#501: fix missing tag in - !!seq [].
  • PR#508: fix build with cmake 4.
  • PR#517 (fixes #516): fix python wheels for windows and macosx.
  • Fix #120 (again): add workaround for #define emit in Qt

Thanks


0.8.0

Github release: 0.8.0

Breaking changes

Fixes

  • PR#488:
    • add workarounds for problems with codegen of gcc 11,12,13.
    • improve CI coverage of gcc and clang optimization levels.
  • PR#496 and c4core PR#148: Add CI-proven support for CPU architectures:
    • mips, mipsel, mips64, mips64el
    • sparc, sparc64
    • riscv64
    • loongarch64
  • Fix #476 (PR#493): add handling of Byte Order Marks.
  • PR#492: fix emit of explicit keys when indented:
    fixed:
    ? explicit key
    : value
    previously:
    ? explicit key
    : value # this was not indented
  • PR#492: fix parser reset for full reuse (m_doc_empty was not reset), which would cause problems under specific scenarios in subsequent reuse.
  • PR#485: improve the CI workflows (thanks to @ingydotnet):
    • amazing code reuse and organization, thanks to the use of YamlScript to generate the final workflows
    • all optimization levels are now covered for gcc, clang and Visual Studio.
  • PR#499: fix warnings with -Wundef.

Thanks


0.7.2

Fixes

  • Fix #464: test failures with g++14 -O2 in ppc64le (PR#467)

Thanks


0.7.1

Github release: 0.7.1

New features

  • PR#459: Add version functions and macros:
    #define RYML_VERSION "0.7.1"
    #define RYML_VERSION_MAJOR 0
    #define RYML_VERSION_MINOR 7
    #define RYML_VERSION_PATCH 1
    csubstr version();
    int version_major();
    int version_minor();
    int version_patch();
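One common use of such macros is compile-time feature gating. The sketch below packs the version triple into a single integer so it can be compared in a preprocessor condition. The `MY_*` macros and `has_version_functions` are hypothetical names, and the `RYML_VERSION_*` values are defined locally only so the sketch compiles without the ryml headers.

```cpp
#include <cassert>

// Stand-in definitions so the sketch compiles on its own; in real code
// these come from the ryml headers.
#ifndef RYML_VERSION_MAJOR
#define RYML_VERSION_MAJOR 0
#define RYML_VERSION_MINOR 7
#define RYML_VERSION_PATCH 1
#endif

// Hypothetical helper (not part of ryml): pack a version triple into one
// integer; assumes minor and patch stay below 100.
#define MY_VERSION_NUMBER(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))
#define MY_RYML_VERSION \
    MY_VERSION_NUMBER(RYML_VERSION_MAJOR, RYML_VERSION_MINOR, RYML_VERSION_PATCH)

// Gate code on the release that introduced the version functions.
#if MY_RYML_VERSION >= MY_VERSION_NUMBER(0, 7, 1)
constexpr bool has_version_functions = true;
#else
constexpr bool has_version_functions = false;
#endif

static_assert(MY_VERSION_NUMBER(0, 7, 1) == 701, "packing is positional");
```

The runtime functions serve a different purpose: they report the version of the library that was actually linked, which may differ from the headers used at compile time.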

Fixes

  • Fix #455: parsing of trailing val-less nested maps when deindented to maps (PR#460)
  • Fix filtering of double-quoted keys in block maps (PR#452)
  • Fix #440: some tests failing with gcc -O2 (hypothetically due to undefined behavior)
    • This was accomplished by refactoring some internal parser functions; see the comments in #440 for further details.
    • Also, fix all warnings from scan-build.
  • Use malloc.h instead of alloca.h on MinGW (PR#447)
  • Fix #442 (PR#443):
    • Ensure leading + is accepted when deserializing numbers.
    • Ensure numbers are not quoted by fixing the heuristics in scalar_style_query_plain() and scalar_style_choose().
    • Add quickstart sample for overflow detection (only of integral types).
  • Parse engine: cleanup unused macros

Thanks


0.7.0

Github release: 0.7.0

Most of the changes are from the giant Parser refactor described below. Before getting to that, some other minor changes first.

Fixes

  • PR#431 - Emitter: prevent stack overflows when emitting malicious trees by providing a max tree depth for the emit visitor. This was done by adding an EmitOptions structure as an argument both to the emitter and to the emit functions, which is then forwarded to the emitter. This EmitOptions structure has a max tree depth setting with a default value of 64.
  • PR#431 - Fix _RYML_CB_ALLOC() using (T) in parenthesis, making the macro unusable.
  • #434 - Ensure empty vals are not deserialized (PR#436).
  • PR#433:
    • Fix some corner cases causing read-after-free in the tree's arena when it is relocated while filtering scalars.
    • Improve YAML error conformance - detect YAML-mandated parse errors when:

New features

  • PR#431 - append-emitting to existing containers in the emitrs_ functions, suggested in #345. This was achieved by adding a bool append=false as the last parameter of these functions.
  • PR#431 - add depth query methods:
    Tree::depth_asc(id_type) const; // O(log(num_tree_nodes)) get the depth of a node ascending (ie, from root to node)
    Tree::depth_desc(id_type) const; // O(num_tree_nodes) get the depth of a node descending (ie, from node to deep-most leaf node)
    ConstNodeRef::depth_asc() const; // likewise
    ConstNodeRef::depth_desc() const;
    NodeRef::depth_asc() const;
    NodeRef::depth_desc() const;
  • PR#432 - Added a function to estimate the required tree capacity, based on yaml markup:
    size_t estimate_tree_capacity(csubstr); // estimate number of nodes resulting from yaml
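The two depth queries listed above can be pictured with a small index-based tree. This is a standard-library model, not ryml code: `NodeSketch` is a hypothetical type, and the conventions that the root has ascending depth 0 and a leaf has descending depth 0 are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NONE = static_cast<std::size_t>(-1);

// Hypothetical index-based node: ids are positions in a vector.
struct NodeSketch
{
    std::size_t parent;                // NONE for the root
    std::vector<std::size_t> children; // ids of the child nodes
};

// Ascending depth: number of edges from the node up to the root.
// Only the ancestors are visited.
std::size_t depth_asc(std::vector<NodeSketch> const& t, std::size_t id)
{
    std::size_t depth = 0;
    while(t[id].parent != NONE)
    {
        id = t[id].parent;
        ++depth;
    }
    return depth;
}

// Descending depth: number of edges from the node down to its deepest
// leaf. Every descendant is visited.
std::size_t depth_desc(std::vector<NodeSketch> const& t, std::size_t id)
{
    std::size_t depth = 0;
    for(std::size_t child : t[id].children)
    {
        std::size_t const d = 1 + depth_desc(t, child);
        if(d > depth)
            depth = d;
    }
    return depth;
}
```

The difference in cost follows from the traversal: the ascending query touches one node per level, while the descending query has to visit the whole subtree.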

All other changes come from PR#414.

Parser refactor

The parser was completely refactored (PR#414). This was a large and hard job carried out over several months, but it brings important improvements.

  • The new parser is an event-based parser, based on an event dispatcher engine. This engine is templated on the event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much simpler (albeit longer), and much easier to work with and fix.
  • YAML standard-conformance was improved significantly. Along with many smaller fixes and additions, (too many to list here), the main changes are the following:
    • The parser engine can now successfully parse container keys, emitting all the events correctly, but as before, the ryml tree cannot accommodate these (and this constraint is no longer enforced by the parser, but instead by EventHandlerTree). For an example of a handler which can accommodate key containers, see the one which is used for the test suite at test/test_suite/test_suite_event_handler.hpp
    • Anchor keys can now be terminated with colon (eg, &anchor: key: val), as dictated by the standard.
  • The parser engine can now be used to create native trees in other programming languages, or in cases where the user must have container keys.
  • Performance of both parsing and emitting improved significantly; see some figures below.

Strict JSON parser

  • A strict JSON parser was added. Use the parse_json_...() family of functions to parse json in stricter mode (and faster) than flow-style YAML.

YAML style preserved while parsing

  • The YAML style information is now fully preserved through parsing/emitting round trips. This was made possible because the event model of the new parsing engine now incorporates style varieties. So, for example:
    • a scalar parsed from a plain/single-quoted/double-quoted/block-literal/block-folded scalar will be emitted always using its original style in the YAML source
    • a container parsed in block-style will always be emitted in block-style
    • a container parsed in flow-style will always be emitted in flow-style
    Because of this, the style of YAML emitted by ryml changes from previous releases.
  • Scalar filtering was improved and is now done directly in the source being parsed (which may be in place or in the arena), except in the cases where the scalar expands and does not fit its initial range, in which case the scalar is filtered out of place to the tree's arena.
    • Filtering can now be disabled while parsing, to ensure a fully-readonly parse (but this feature is still experimental and somewhat untested, given the scope of the rewrite work).
    • The parser now offers methods to filter scalars in place or out of place.
  • Style flags were added to NodeType_e:
    FLOW_SL ///< mark container with single-line flow style (seqs as '[val1,val2]', maps as '{key: val,key2: val2}')
    FLOW_ML ///< mark container with multi-line flow style (seqs as '[\n val1,\n val2\n]', maps as '{\n key: val,\n key2: val2\n}')
    BLOCK ///< mark container with block style (seqs as '- val\n', maps as 'key: val')
    KEY_LITERAL ///< mark key scalar as multiline, block literal |
    VAL_LITERAL ///< mark val scalar as multiline, block literal |
    KEY_FOLDED ///< mark key scalar as multiline, block folded >
    VAL_FOLDED ///< mark val scalar as multiline, block folded >
    KEY_SQUO ///< mark key scalar as single quoted '
    VAL_SQUO ///< mark val scalar as single quoted '
    KEY_DQUO ///< mark key scalar as double quoted "
    VAL_DQUO ///< mark val scalar as double quoted "
    KEY_PLAIN ///< mark key scalar as plain scalar (unquoted, even when multiline)
    VAL_PLAIN ///< mark val scalar as plain scalar (unquoted, even when multiline)
  • Style predicates were added to NodeType, Tree, ConstNodeRef and NodeRef:
    bool is_container_styled() const;
    bool is_block() const;
    bool is_flow_sl() const;
    bool is_flow_ml() const;
    bool is_flow() const;
    bool is_key_styled() const;
    bool is_val_styled() const;
    bool is_key_literal() const;
    bool is_val_literal() const;
    bool is_key_folded() const;
    bool is_val_folded() const;
    bool is_key_squo() const;
    bool is_val_squo() const;
    bool is_key_dquo() const;
    bool is_val_dquo() const;
    bool is_key_plain() const;
    bool is_val_plain() const;
  • Style modifiers were also added:
    void set_container_style(NodeType_e style);
    void set_key_style(NodeType_e style);
    void set_val_style(NodeType_e style);
  • Emit helper predicates were added, and are used when an emitted node was built programmatically without style flags:
    /** choose a YAML emitting style based on the scalar's contents */
    NodeType_e scalar_style_choose(csubstr scalar) noexcept;
    /** query whether a scalar can be encoded using single quotes.
    * It may not be possible, notably when there is leading
    * whitespace after a newline. */
    bool scalar_style_query_squo(csubstr s) noexcept;
    /** query whether a scalar can be encoded using plain style (no
    * quotes, not a literal/folded block scalar). */
    bool scalar_style_query_plain(csubstr s) noexcept;

Breaking changes

As a result of the refactor, there are some limited changes with impact on client code. Even though this was a large refactor, effort was directed at keeping maximal backwards compatibility, and the changes are not wide-ranging. But they still exist:

  • The existing parse_...() methods in the Parser class were all removed. Use the corresponding parse_...(Parser*, ...) function from the header c4/yml/parse.hpp.
  • When instantiated by the user, the parser now needs to receive a EventHandlerTree object, which is responsible for building the tree. Although fully functional and tested, the structure of this class is still somewhat experimental and is still likely to change. There is an alternative event handler implementation responsible for producing the events for the YAML test suite in test/test_suite/test_suite_event_handler.hpp.
  • The declaration and definition of NodeType was moved to a separate header file c4/yml/node_type.hpp (previously it was in c4/yml/tree.hpp).
  • Some of the node type flags were removed, and several flags (and combination flags) were added.
    • Most of the existing flags are kept, as well as their meaning.
    • KEYQUO and VALQUO are now masks of the several style flags for quoted scalars. In general, however, client code using these flags and .is_val_quoted() or .is_key_quoted() is not likely to require any changes.
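The idea of a flag that is a mask over several style flags can be sketched as follows. The bit values are hypothetical, and the exact set of styles covered by the mask is an assumption of this sketch (here: every non-plain scalar style); the names live in a `sketch` namespace to make clear they are not the library's definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical bit values; only the mask relationship matters here.
enum StyleSketch : std::uint32_t {
    VAL_LITERAL = 1u << 0,
    VAL_FOLDED  = 1u << 1,
    VAL_SQUO    = 1u << 2,
    VAL_DQUO    = 1u << 3,
    VAL_PLAIN   = 1u << 4,
    // assumption: the mask covers every non-plain scalar style
    VALQUO = VAL_LITERAL | VAL_FOLDED | VAL_SQUO | VAL_DQUO,
};

// A query against the mask is true when any of the covered bits is set,
// so callers need not know which concrete style was used.
inline bool is_val_quoted(std::uint32_t flags)
{
    return (flags & VALQUO) != 0;
}

} // namespace sketch
```

This is why code that only asks whether a value is quoted tends to survive the change: the predicate keeps its meaning even though the underlying flag became a combination of finer-grained bits.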

New type for node IDs

A type id_type was added to signify the integer type for the node id, defaulting to the backwards-compatible size_t which was previously used in the tree. In the future, this type is likely to change, and probably to a signed type, so client code is encouraged to always use id_type instead of the size_t, and specifically not to rely on the signedness of this type.
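A typical place where signedness leaks into client code is a reverse loop over node ids. The sketch below shows a loop shape that behaves the same for signed and unsigned id types; `id_type` is defined locally as a stand-in for the library typedef, and `reversed_ids()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the library typedef, so the sketch compiles on its own.
using id_type = std::size_t;

// Visit the ids count-1 ... 0 without relying on signedness.
// The tempting form `for(Id i = count - 1; i >= 0; --i)` never ends
// when Id is unsigned, because `i >= 0` is then always true.
template<class Id>
std::vector<Id> reversed_ids(Id count)
{
    std::vector<Id> out;
    for(Id i = count; i > 0; --i)
        out.push_back(static_cast<Id>(i - 1));
    return out;
}
```

Calling `reversed_ids<id_type>(3)` and `reversed_ids<int>(3)` both produce the ids 2, 1, 0, so the loop keeps working if the typedef later becomes a signed type.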

Reference resolver is now exposed

The reference (ie, alias) resolver object is now exposed in c4/yml/reference_resolver.hpp. Previously this object was temporarily instantiated in Tree::resolve(). Exposing it now enables the user to reuse this object through different calls, saving a potential allocation on every call.

Tag utilities

Tag utilities were moved to the new header c4/yml/tag.hpp. The types Tree::tag_directive_const_iterator and Tree::TagDirectiveProxy were deprecated. Also fixed a missing-initialization problem with Tree::m_tag_directives.

Performance improvements

To compare performance before and after this changeset, the benchmarks were run (on the same PC), and the results were collected into these two files:

There are a lot of results in these files, and many insights can be obtained by browsing them; too many to list here. Below we show only some selected results.

Parsing

Here are some figures for parsing performance, for bm_ryml_inplace_reuse (name before) / bm_ryml_yaml_inplace_reuse (name after):

| case                              | B/s before newparser | B/s after newparser | improv % |
|-----------------------------------|----------------------|---------------------|----------|
| PARSE/appveyor.yml                | 168.628Mi/s          | 232.017Mi/s         | ~+40%    |
| PARSE/compile_commands.json       | 630.17Mi/s           | 609.877Mi/s         | ~-3%     |
| PARSE/travis.yml                  | 193.674Mi/s          | 271.598Mi/s         | ~+50%    |
| PARSE/scalar_dquot_multiline.yml  | 224.796Mi/s          | 187.335Mi/s         | ~-10%    |
| PARSE/scalar_dquot_singleline.yml | 339.889Mi/s          | 388.924Mi/s         | ~-16%    |

Some conclusions:

  • parse performance improved by ~30%-50% for YAML without filtering-heavy parsing.
  • parse performance decreased by ~10%-15% for YAML with filtering-heavy parsing. There is still some scope for improvement in the parsing code, so this cost may hopefully be minimized in the future.

Emitting

Here are some figures for emitting performance, retrieved from these files, for bm_ryml_str_reserve (name before) / bm_ryml_yaml_str_reserve (name after):

| case                                        | B/s before newparser | B/s after newparser |
|---------------------------------------------|----------------------|---------------------|
| EMIT/appveyor.yml                           | 311.718Mi/s          | 1018.44Mi/s         |
| EMIT/compile_commands.json                  | 434.206Mi/s          | 771.682Mi/s         |
| EMIT/travis.yml                             | 333.322Mi/s          | 1.41597Gi/s         |
| EMIT/scalar_dquot_multiline.yml             | 868.6Mi/s            | 692.564Mi/s         |
| EMIT/scalar_dquot_singleline.yml            | 336.98Mi/s           | 638.368Mi/s         |
| EMIT/style_seqs_flow_outer1000_inner100.yml | 136.826Mi/s          | 279.487Mi/s         |

Emit performance improved everywhere by over 1.5x and as much as 3x-4x for YAML without filtering-heavy parsing.


0.6.0

Github release: 0.6.0

Add API documentation

  • PR#423: add Doxygen-based API documentation, now hosted in https://rapidyaml.readthedocs.io/!
  • It uses the base doxygen docs, as I couldn't get doxyrest or breathe or exhale to produce anything meaningful using the doxygen groups already defined in the source code.

Error handling

Fix major error handling problem reported in #389 (PR#411):

  • The NodeRef and ConstNodeRef classes are now conditionally noexcept using RYML_NOEXCEPT, which evaluates to nothing when assertions are enabled, and to noexcept otherwise. The problem was that these classes had many methods explicitly marked noexcept, but were doing assertions which could throw exceptions, causing an abort instead of a throw whenever the assertion called an exception-throwing error callback.
  • This problem was compounded by assertions being enabled in every build type – despite the intention to have them only in debug builds. There was a problem in the preprocessor code to enable assertions which led to assertions being enabled in release builds even when RYML_USE_ASSERT was defined to 0. Thanks to @jdrouhard for reporting this.
  • Although the code is and was extensively tested, the testing was addressing mostly the happy path. Tests were added to ensure that the error behavior is as intended.
  • Together with this changeset, a major revision was carried out of the asserting/checking status of each function in the node classes. In most cases, assertions were added to functions that were missing them. So beware - some user code that was invalid will now assert or error out. Also, assertions and checks are now directed as much as possible to the callbacks of the closest scope: ie, if a tree has custom callbacks, errors within the tree class should go through those callbacks.
  • Also, the intended assertion behavior is now in place: no assertions in release builds. Beware as well - user code which was relying on this will now silently succeed and return garbage in release builds. See the next points, which may help.
  • Added new methods to the NodeRef/ConstNodeRef classes:
  • The state for NodeRef was refined, and now there are three mutually exclusive states (and class predicates) for an object of this class:
    • .invalid() when the object was not initialized to any node
    • .readable() when the object points at an existing tree+node
    • .is_seed() when the object points at a hypothetical tree+node
    • The previous state .valid() was deprecated: its semantics were confusing as it actually could be any of .readable() or .is_seed()
  • Deprecated also the following methods for NodeRef/ConstNodeRef:
    RYML_DEPRECATED() bool operator== (std::nullptr_t) const;
    RYML_DEPRECATED() bool operator!= (std::nullptr_t) const;
    RYML_DEPRECATED() bool operator== (csubstr val) const;
    RYML_DEPRECATED() bool operator!= (csubstr val) const;
  • Added macros and respective cmake options to control error handling:
    • RYML_USE_ASSERT - enable assertions regardless of build type. This is disabled by default. This macro was already defined; the current PR adds the cmake option.
    • RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS - make the default error handler provided by ryml throw exceptions instead of calling std::abort(). This is disabled by default.
  • Also, RYML_DEBUG_BREAK() is now enabled only if RYML_DBG is defined, as reported in #362.
  • As part of PR#423, to improve linters and codegen:
    • annotate the error handlers with [[noreturn]]/C4_NORETURN
    • annotate some error sites with C4_UNREACHABLE_AFTER_ERR()
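The conditional noexcept described in this section can be illustrated with a small stand-alone sketch. The `SKETCH_*` macros and `RefSketch` are hypothetical stand-ins, not the library's definitions; the assertion here throws directly, whereas in the library the throw would come from a user-provided error callback.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Stand-in switch; assertions are enabled by default in this sketch.
#ifndef SKETCH_USE_ASSERT
#define SKETCH_USE_ASSERT 1
#endif

#if SKETCH_USE_ASSERT
// assertions may throw, so the functions must not promise noexcept
#define SKETCH_NOEXCEPT
#define SKETCH_ASSERT(cond) \
    do { if(!(cond)) throw std::runtime_error(#cond); } while(0)
#else
// nothing can throw, so the promise is safe (and no check is made)
#define SKETCH_NOEXCEPT noexcept
#define SKETCH_ASSERT(cond) ((void)0)
#endif

struct RefSketch
{
    int const* ptr = nullptr;
    int get() const SKETCH_NOEXCEPT
    {
        SKETCH_ASSERT(ptr != nullptr); // would terminate if get() were noexcept
        return *ptr;
    }
};

// true only when get() is declared noexcept
constexpr bool get_is_noexcept =
    noexcept(std::declval<RefSketch const&>().get());
```

With assertions enabled, calling `get()` on a default-constructed `RefSketch` throws `std::runtime_error` and the caller can catch it. If the method were unconditionally noexcept, the same throw would end in `std::terminate`, which is the abort behavior the fix removes.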

More fixes

  • Tree::arena() const was returning a substr; this was an error. This function was changed to:
    csubstr Tree::arena() const;
    substr Tree::arena();
  • Fix #390 - csubstr::first_real_span() failed on scientific numbers with one digit in the exponent (PR#415).
  • Fix #361 - parse error on map scalars containing : and starting on the next line:
    ---
    # failed to parse:
    description:
    foo:bar
    ---
    # but this was ok:
    description: foo:bar
  • PR#368 - fix pedantic compiler warnings.
  • Fix #373 - false parse error with empty quoted keys in block-style map (PR#374).
  • Fix #356 - fix overzealous check in emit_as(). An id may be larger than the tree's size, eg when nodes were removed. (PR#357).
  • Fix #417 - add quickstart example explaining how to avoid precision loss while serializing floats (PR#420).
  • Fix #380 - Debug visualizer .natvis file for Visual Studio was missing ConstNodeRef (PR#383).
  • FR #403 - install is now optional when using cmake. The relevant option is RYML_INSTALL.

Python

  • Fix #428/#412 - Parse errors now throw RuntimeError instead of aborting.

Thanks


0.5.0

Github release: 0.5.0

Breaking changes

  • Make the node API const-correct (PR#267): added ConstNodeRef to hold a constant reference to a node. As the name implies, a ConstNodeRef object cannot be used in any tree-mutating operation. It is also smaller than the existing NodeRef (and faster because it does not need to check its own validity on every access). As a result of this change, there are now some constraints when obtaining a ref from a tree, and existing code is likely to break in this type of situation:
    const Tree const_tree = ...;
    NodeRef nr = const_tree.rootref(); // ERROR (was ok): cannot obtain a mutating NodeRef from a const Tree
    ConstNodeRef cnr = const_tree.rootref(); // ok
    Tree tree = ...;
    NodeRef nr = tree.rootref(); // ok
    ConstNodeRef cnr = tree.rootref(); // ok (implicit conversion from NodeRef to ConstNodeRef)
    // to obtain a ConstNodeRef from a mutable Tree
    // while avoiding implicit conversion, use the `c`
    // prefix:
    ConstNodeRef cnr = tree.crootref();
    // likewise for tree.ref() and tree.cref().
    nr = cnr; // ERROR: cannot obtain NodeRef from ConstNodeRef
    cnr = nr; // ok
    The use of ConstNodeRef also needs to be propagated through client code. One such place is when deserializing types:
    // needs to be changed from:
    template<class T> bool read(ryml::NodeRef const& n, T *var);
    // ... to:
    template<class T> bool read(ryml::ConstNodeRef const& n, T *var);
    • The initial version of ConstNodeRef/NodeRef had the problem that const methods in the CRTP base did not participate in overload resolution (#294), preventing calls from const NodeRef objects. This was fixed by moving non-const methods to the CRTP base and disabling them with SFINAE (PR#295).
    • Also added disambiguation iteration methods: .cbegin(), .cend(), .cchildren(), .csiblings() (PR#295).
  • Deprecate emit() and emitrs() (#120, PR#303): use emit_yaml() and emitrs_yaml() instead. This was done to improve compatibility with Qt, which leaks a macro named emit. For more information, see #120.
    • In the Python API:
      • Deprecate emit(), add emit_yaml() and emit_json().
      • Deprecate compute_emit_length(), add compute_emit_yaml_length() and compute_emit_json_length().
      • Deprecate emit_in_place(), add emit_yaml_in_place() and emit_json_in_place().
      • Calling the deprecated functions will now trigger a warning.
  • Location querying is no longer done lazily (#260, PR#307). It now requires explicit opt-in when instantiating the parser. With this change, the accelerator structure for location querying is now built when parsing:
    Parser parser(ParserOptions().locations(true));
    // now parsing also builds location lookup:
    Tree t = parser.parse_in_arena("myfile.yml", "foo: bar");
    assert(parser.location(t["foo"]).line == 0u);
    • Locations are disabled by default:
      Parser parser;
      assert(parser.options().locations() == false);
  • Deprecate Tree::arena_pos(): use Tree::arena_size() instead (PR#290).
  • Deprecate pointless has_siblings(): use Tree::has_other_siblings() instead (PR#330).

Performance improvements

  • Improve performance of integer serialization and deserialization (in c4core). Eg, on Linux/g++11.2, with integral types:
    • c4::to_chars() can be expected to be roughly...
      • ~40% to 2x faster than std::to_chars()
      • ~10x-30x faster than sprintf()
      • ~50x-100x faster than a naive stringstream::operator<<() followed by stringstream::str()
    • c4::from_chars() can be expected to be roughly...
      • ~10%-30% faster than std::from_chars()
      • ~10x faster than scanf()
      • ~30x-50x faster than a naive stringstream::str() followed by stringstream::operator>>()
    For more details, see the changelog for c4core 0.1.10.
  • Fix #289 and #331 - parsing of single-line flow-style sequences had quadratic complexity, causing long parse times on very long lines (PR#293/PR#332).
    • This was due to scanning for the token : before scanning for , or ], which caused line-length scans on every scalar scan. Changing the order of the checks was enough to address the quadratic complexity, and the parse times for flow-style are now in line with block-style.
    • As part of this changeset, a significant number of runtime branches was eliminated by separating Parser::_scan_scalar() into several different {seq,map}x{block,flow} functions specific for each context. Expect some improvement in parse times.
    • Also, on Debug builds (or assertion-enabled builds) there was a paranoid assertion calling Tree::has_child() in Tree::insert_child() that caused quadratic behavior because the assertion had linear complexity. It was replaced with a somewhat equivalent O(1) assertion.
    • Now the byte throughput is independent of line size for styles and containers. This can be seen in the table below, which shows parse throughputs in MB/s of 1000 containers of different styles and sizes (flow containers are in a single line):

      Container   Style   10 elms    100 elms   1000 elms
      1000 Maps   block   50.8MB/s   57.8MB/s   63.9MB/s
      1000 Maps   flow    58.2MB/s   65.9MB/s   74.5MB/s
      1000 Seqs   block   55.7MB/s   59.2MB/s   60.0MB/s
      1000 Seqs   flow    52.8MB/s   55.6MB/s   54.5MB/s

  • Fix #329: complexity of has_sibling() and has_child() is now O(1), previously was linear (PR#330).
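
To see why the order of the checks mattered for single-line flow sequences, here is a toy model that counts the characters inspected under each order. It is only a sketch under simplifying assumptions (one-character scalars, no nesting, no quotes); all sketch_* names are hypothetical and this is not the code of Parser::_scan_scalar().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct SketchScanStats
{
    std::size_t scalars = 0;         // number of scalars found
    std::size_t chars_inspected = 0; // total characters looked at
};

// Searches [pos, end) for any character in `set`, counting every
// character it looks at. Returns `end` when nothing is found.
inline std::size_t sketch_find_first_of(const std::string& s, std::size_t pos,
                                        std::size_t end, const char* set,
                                        std::size_t* inspected)
{
    for(std::size_t i = pos; i < end; ++i)
    {
        ++*inspected;
        for(const char* c = set; *c; ++c)
            if(s[i] == *c)
                return i;
    }
    return end;
}

// colon_first=true models the old order: look for ':' first (which runs
// to the end of the line when there is none), then for ',' or ']'.
// colon_first=false models the new order: find ',' or ']' first, then
// look for ':' only inside the scalar.
inline SketchScanStats sketch_scan_flow_seq(const std::string& line, bool colon_first)
{
    SketchScanStats st;
    std::size_t pos = 1; // skip the opening '['
    while(pos < line.size())
    {
        if(colon_first)
            (void)sketch_find_first_of(line, pos, line.size(), ":", &st.chars_inspected);
        const std::size_t end = sketch_find_first_of(line, pos, line.size(), ",]", &st.chars_inspected);
        if(!colon_first)
            (void)sketch_find_first_of(line, pos, end, ":", &st.chars_inspected);
        if(end > pos)
            ++st.scalars;
        pos = end + 1;
    }
    return st;
}

// Builds "[a,a,...,a]" with n elements in a single line.
inline std::string sketch_make_flow_seq(std::size_t n)
{
    std::string s = "[";
    for(std::size_t i = 0; i < n; ++i)
    {
        if(i)
            s += ',';
        s += 'a';
    }
    s += ']';
    return s;
}
```

In this model the new order inspects three characters per scalar regardless of line length, while the old order rescans the remainder of the line for every scalar, so its cost grows with the square of the number of elements.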

Fixes

  • Fix #233 - accept leading colon in the first key of a flow map (UNK node) PR#234:
    :foo: # parse error on the leading colon
    :bar: a # parse error on the leading colon
    :barbar: b # was ok
    :barbarbar: c # was ok
    foo: # was ok
    bar: a # was ok
    :barbar: b # was ok
    :barbarbar: c # was ok
  • Fix #253: double-quoted emitter should encode carriage-return \r to preserve roundtrip equivalence:
    Tree tree;
    NodeRef root = tree.rootref();
    root |= MAP;
    root["s"] = "t\rt";
    root["s"] |= _WIP_VAL_DQUO;
    std::string s = emitrs<std::string>(tree);
    EXPECT_EQ(s, "s: \"t\\rt\"\n");
    Tree tree2 = parse_in_arena(to_csubstr(s));
    EXPECT_EQ(tree2["s"].val(), tree["s"].val());
  • Fix parsing of empty block folded+literal scalars when they are the last child of a container (part of PR#264):
    seq:
    - ""
    - ''
    - >
    - | # error, the resulting val included all the YAML from the next node
    seq2:
    - ""
    - ''
    - |
    - > # error, the resulting val included all the YAML from the next node
    map:
    a: ""
    b: ''
    c: >
    d: | # error, the resulting val included all the YAML from the next node
    map2:
    a: ""
    b: ''
    c: |
    d: > # error, the resulting val included all the YAML from the next node
    lastly: the last
  • Fix #274 (PR#296): Lists with unindented items and trailing empty values parse incorrectly:
    foo:
    - bar
    -
    baz: qux
    was wrongly parsed as
    foo:
    - bar
    - baz: qux
  • Fix #277 (PR#340): merge fails with duplicate keys.
  • Fix #337 (PR#338): empty lines in block scalars shall not have tab characters \t.
  • Fix #268 (PR#339): don't override key type_bits when copying val. This was causing problematic resolution of anchors/references.
  • Fix #309 (PR#310): emitted scalars containing @ or ` should be quoted.
    • The quotes should be added only when they lead the scalar. See #320 and PR#334.
  • Fix #297 (PR#298): JSON emitter should escape control characters.
  • Fix #292 (PR#299): JSON emitter should quote version string scalars like 0.1.2.
  • Fix #291 (PR#299): JSON emitter should quote scalars with leading zero, eg 048.
  • Fix #280 (PR#281): deserialization of std::vector<bool> failed because its operator[] returns a reference instead of value_type.
  • Fix #288 (PR#290): segfault on successive calls to Tree::_grow_arena(), caused by using the arena position instead of its length as starting point for the new arena capacity.
  • Fix #324 (PR#328): eager assertion prevented moving nodes to the first position in a parent.
  • Fix Tree::_clear_val(): was clearing key instead (PR#335).
  • YAML test suite events emitter: fix emission of inheriting nodes. The events for {<<: *anchor, foo: bar} are now correctly emitted as:
    =VAL :<< # previously was =ALI <<
    =ALI *anchor
    =VAL :foo
    =VAL :bar
  • Fix #246: add missing #define for the include guard of the amalgamated header.
  • Fix #326: honor runtime settings for calling debugbreak, add option to disable any calls to debugbreak.
  • Fix cmake#8: SOVERSION missing from shared libraries.
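
Regarding #280 above: std::vector<bool>::operator[] returns a proxy object rather than a bool&, so code that needs a bool* or bool& to write into cannot use the element directly. The sketch below shows the usual workaround of reading into a temporary; sketch_read_bool() and sketch_read_bools() are hypothetical helpers, and this is not necessarily how the fix was done in ryml.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// operator[] of a regular vector yields a true reference...
static_assert(std::is_same<decltype(std::declval<std::vector<int>&>()[0]), int&>::value,
              "vector<int>::operator[] returns int&");
// ...but vector<bool> yields a proxy class, not bool&.
static_assert(!std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
              "vector<bool>::operator[] does not return bool&");

// Hypothetical scalar reader: writes through a plain pointer.
inline bool sketch_read_bool(const std::string& text, bool* out)
{
    if(text == "true")  { *out = true;  return true; }
    if(text == "false") { *out = false; return true; }
    return false;
}

// Reads a sequence of bools. `&(*out)[i]` would not compile as a bool*,
// so each element is read into a temporary and then assigned.
inline bool sketch_read_bools(const std::vector<std::string>& texts, std::vector<bool>* out)
{
    out->resize(texts.size());
    for(std::size_t i = 0; i < texts.size(); ++i)
    {
        bool tmp = false;
        if(!sketch_read_bool(texts[i], &tmp))
            return false;
        (*out)[i] = tmp; // assignment goes through the proxy
    }
    return true;
}
```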

Python

  • The Python packages for Windows and MacOSX are causing problems in the CI, and were mostly disabled. The problematic packages are successfully made, but then fail to be imported. This was impossible to reproduce outside of the CI, and they were disabled since they were delaying the release. As a consequence, the Python release will have very limited compiled packages for Windows (only Python 3.6 and 3.7) or MacOSX. Help would be appreciated from those interested in these packages.

Thanks


0.4.1

Github release: 0.4.1

Fixes

  • Fix #223: assertion peeking into the last line when it was whitespaces only.

0.4.0

Github release: 0.4.0

This release improves compliance with the YAML test suite (thanks @ingydotnet and @perlpunk for extensive and helpful cooperation), and adds node location tracking using the parser.

Breaking changes

As part of the new feature to track source locations, opportunity was taken to address a number of pre-existing API issues. These changes consisted of:

  • Deprecate c4::yml::parse() and c4::yml::Parser::parse() overloads; all these functions will be removed in short order. Until removal, any call from client code will trigger a compiler warning.
  • Add parse() alternatives, either parse_in_place() or parse_in_arena():
    • parse_in_place() receives only substr buffers, ie mutable YAML source buffers. Trying to pass a csubstr buffer to parse_in_place() will cause a compile error:
      substr readwrite = /*...*/;
      Tree tree = parse_in_place(readwrite); // OK
      csubstr readonly = /*...*/;
      Tree tree = parse_in_place(readonly); // compile error
    • parse_in_arena() receives only csubstr buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because parse_in_arena() is meant for immutable buffers, overloads receiving a substr YAML buffer are now declared but marked deprecated, and intentionally left undefined, such that calling parse_in_arena() with a substr will cause a linker error as well as a compiler warning.
      substr readwrite = /*...*/;
      Tree tree = parse_in_arena(readwrite); // compile warning+linker error
      This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: substr is implicitly convertible to csubstr. If you really intend to parse an originally mutable buffer in the tree's arena, convert it first explicitly to immutable by assigning the substr to a csubstr prior to calling parse_in_arena():
      substr readwrite = /*...*/;
      csubstr readonly = readwrite; // ok
      Tree tree = parse_in_arena(readonly); // ok
      This problem does not occur with parse_in_place() because csubstr is not implicitly convertible to substr.
  • In the Python API, ryml.parse() was removed and not just deprecated; parse_in_arena() and parse_in_place() now replace it.
  • Callbacks: changed behavior in Parser and Tree:
    • When a tree is copy-constructed or move-constructed to another, the receiving tree will start with the callbacks of the original.
    • When a tree is copy-assigned or move-assigned to another, the receiving tree will now change its callbacks to those of the original.
    • When a parser creates a new tree, the tree will now use a copy of the parser's callbacks object.
    • When an existing tree is given directly to the parser, both the tree and the parser now retain their own callback objects; any allocation or error during parsing will go through the respective callback object.
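
The way the parse_in_place()/parse_in_arena() overloads guard against passing the wrong kind of buffer can be modeled in isolation. In the sketch below, SketchCSpan and SketchSpan are hypothetical stand-ins for csubstr and substr. Note one swapped-in technique: the sketch rejects the mutable overload with = delete (a compile error), whereas the text above describes a deprecated and undefined overload (a compiler warning plus a linker error).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for csubstr (read-only) and substr (mutable).
struct SketchCSpan { const char* str; std::size_t len; };
struct SketchSpan
{
    char* str;
    std::size_t len;
    // like substr, converts implicitly to the read-only flavor
    operator SketchCSpan() const { return SketchCSpan{str, len}; }
};

// Parse in place: only mutable buffers are accepted. There is no
// conversion from SketchCSpan to SketchSpan, so read-only buffers
// fail to compile.
inline std::size_t sketch_parse_in_place(SketchSpan buf) { return buf.len; }

// Parse in arena: only read-only buffers are accepted...
inline std::size_t sketch_parse_in_arena(SketchCSpan buf) { return buf.len; }
// ...and the mutable overload is explicitly rejected, so that the
// implicit conversion cannot silently trigger an extra copy.
std::size_t sketch_parse_in_arena(SketchSpan buf) = delete;

// Detects at compile time whether sketch_parse_in_arena(Arg) is callable.
template<class Arg, class = void>
struct sketch_arena_accepts : std::false_type {};
template<class Arg>
struct sketch_arena_accepts<Arg, decltype(void(sketch_parse_in_arena(std::declval<Arg>())))>
    : std::true_type {};

static_assert(sketch_arena_accepts<SketchCSpan>::value, "read-only buffers are accepted");
static_assert(!sketch_arena_accepts<SketchSpan>::value, "mutable buffers are rejected");
```

As in the real API, the caller opts in to the copy by first converting the mutable buffer to the read-only type and then passing that.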

New features

  • Add tracking of source code locations. This is useful for reporting semantic errors after the parsing phase (ie where the YAML is syntactically valid and parsing is successful, but the tree contents are semantically invalid). The locations can be obtained lazily from the parser when the first location is queried:
    // To obtain locations, use of the parser is needed:
    ryml::Parser parser;
    ryml::Tree tree = parser.parse_in_arena("source.yml", R"({
    aa: contents,
    foo: [one, [two, three]]
    })");
    // After parsing, on the first call to obtain a location,
    // the parser will cache a lookup structure to accelerate
    // tracking the location of a node, with complexity
    // O(numchars(srcbuffer)). Then it will do the lookup, with
    // complexity O(log(numlines(srcbuffer))).
    ryml::Location loc = parser.location(tree.rootref());
    assert(parser.location_contents(loc).begins_with("{"));
    // note the location members are zero-based:
    assert(loc.offset == 0u);
    assert(loc.line == 0u);
    assert(loc.col == 0u);
    // On the next call to location(), the accelerator is reused
    // and only the lookup is done.
    loc = parser.location(tree["aa"]);
    assert(parser.location_contents(loc).begins_with("aa"));
    assert(loc.offset == 2u);
    assert(loc.line == 1u);
    assert(loc.col == 0u);
    // KEYSEQ in flow style: points at the key
    loc = parser.location(tree["foo"]);
    assert(parser.location_contents(loc).begins_with("foo"));
    assert(loc.offset == 16u);
    assert(loc.line == 2u);
    assert(loc.col == 0u);
    loc = parser.location(tree["foo"][0]);
    assert(parser.location_contents(loc).begins_with("one"));
    assert(loc.line == 2u);
    assert(loc.col == 6u);
    // SEQ in flow style: location points at the opening '[' (there's no key)
    loc = parser.location(tree["foo"][1]);
    assert(parser.location_contents(loc).begins_with("["));
    assert(loc.line == 2u);
    assert(loc.col == 11u);
    loc = parser.location(tree["foo"][1][0]);
    assert(parser.location_contents(loc).begins_with("two"));
    assert(loc.line == 2u);
    assert(loc.col == 12u);
    loc = parser.location(tree["foo"][1][1]);
    assert(parser.location_contents(loc).begins_with("three"));
    assert(loc.line == 2u);
    assert(loc.col == 17u);
    // NOTE: reusing the parser with a new YAML source buffer
    // will invalidate the accelerator.
    See more details in the quickstart sample. Thanks to @cschreib for submitting a working example proving how simple it could be to achieve this.
  • Parser:
    • add source() and filename() to get the latest buffer and filename to be parsed
    • add callbacks() to get the parser's callbacks
  • Add from_tag_long() and normalize_tag_long():
    assert(from_tag_long(TAG_MAP) == "<tag:yaml.org,2002:map>");
    assert(normalize_tag_long("!!map") == "<tag:yaml.org,2002:map>");
  • Add an experimental API to resolve tags based on the tree's tag directives. This API is still immature and will likely be subject to changes, so we won't document it yet.
  • Regarding emit styles (see issue #37): add an experimental API to force flow/block style on container nodes, as well as block-literal/block-folded/double-quoted/single-quoted/plain styles on scalar nodes. This API is also immature and will likely be subject to changes, so we won't document it yet. But if you are desperate for this functionality, the new facilities will let you go further.
  • Add preliminary support for bare-metal ARM architectures, with CI tests pending implementation of QEMU action. (#193, c4core#63).
  • Add preliminary support for RISC-V architectures, with CI tests pending availability of RISC-V based github actions. (c4core#69).
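
The cost profile quoted for location tracking above (one pass over the source to build the accelerator, then a logarithmic lookup per query) matches that of a table of line-start offsets searched with binary search. The sketch below illustrates that general idea only; SketchLineIndex and SketchLocation are hypothetical names, and ryml's actual accelerator may be organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Zero-based position in the source buffer.
struct SketchLocation { std::size_t offset, line, col; };

class SketchLineIndex
{
public:
    // Build step: one pass over the buffer, O(numchars).
    explicit SketchLineIndex(const std::string& src)
    {
        m_line_starts.push_back(0);
        for(std::size_t i = 0; i < src.size(); ++i)
            if(src[i] == '\n')
                m_line_starts.push_back(i + 1);
    }

    // Lookup step: binary search over the line starts, O(log(numlines)).
    SketchLocation locate(std::size_t offset) const
    {
        // first line start greater than offset; the line containing
        // the offset is the one just before it
        auto it = std::upper_bound(m_line_starts.begin(), m_line_starts.end(), offset);
        const std::size_t line = static_cast<std::size_t>(it - m_line_starts.begin()) - 1;
        return SketchLocation{offset, line, offset - m_line_starts[line]};
    }

private:
    std::vector<std::size_t> m_line_starts; // offset of the first char of each line
};
```

Like the parser's accelerator, such an index is tied to one source buffer and has to be rebuilt when a different buffer is parsed.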

Fixes

  • Fix edge cases of parsing of explicit keys (ie keys after ?) (PR#212):
    # all these were fixed:
    ? : # empty
    ? explicit key # this comment was not parsed correctly
    ? # trailing empty key was not added to the map
  • Fixed parsing of tabs used as whitespace tokens after : or -. This feature is costly (see some benchmark results here) and thus it is disabled by default, and requires defining a macro or cmake option RYML_WITH_TAB_TOKENS to enable (PR#211).
  • Allow tab indentation in flow seqs (PR#215) (6CA3).
  • ryml now parses successfully compact JSON code {"like":"this"} without any need for preprocessing. This code was not valid YAML 1.1, but was made valid in YAML 1.2. So the preprocess_json() functions, used to insert spaces after : are no longer necessary and have been removed. If you were using these functions, remove the calls and just pass the original source directly to ryml's parser (PR#210).
  • Fix handling of indentation when parsing block scalars (PR#210):
    ---
    |
    hello
    there
    ---
    |
    ciao
    qua
    ---
    - |
    hello
    there
    - |
    ciao
    qua
    ---
    foo: |
    hello
    there
    bar: |
    ciao
    qua
  • Fix parsing of maps when opening a scope with whitespace before the colon (PR#210):
    foo0 : bar
    ---
    foo1 : bar # the " :" was causing an assert
    ---
    foo2 : bar
    ---
    foo3 : bar
    ---
    foo4 : bar
  • Ensure container keys preserve quote flags when the key is quoted (PR#210).
  • Ensure scalars beginning with % are emitted with quotes (PR#216).
  • Fix #203: when parsing, do not convert null or ~ to null scalar strings. Now the scalar strings contain the verbatim contents of the original scalar; to query whether a scalar value is null, use Tree::key_is_null()/val_is_null() and NodeRef::key_is_null()/val_is_null() which return true if it is empty or any of the unquoted strings ~, null, Null, or NULL (PR#207).
  • Fix #205: fix parsing of escaped characters in double-quoted strings: "\\\"\n\r\t\<TAB>\/\<SPC>\0\b\f\a\v\e\_\N\L\P" (PR#207).
  • Fix #204: add decoding of unicode codepoints \x, \u and \U in double-quoted scalars:
    Tree tree = parse_in_arena(R"(["\u263A \xE2\x98\xBA \u2705 \U0001D11E"])");
    assert(tree[0].val() == "☺ ☺ ✅ 𝄞");
    This is mandated by the YAML standard and was missing from ryml (PR#207).
  • Fix emission of nested nodes which are sequences: when these are given as the emit root, the - from the parent node was added (PR#210):
    const ryml::Tree tree = ryml::parse_in_arena(R"(
    - - Rochefort 10
      - Busch
      - Leffe Rituel
      - - and so
        - many other
        - wonderful beers
    )");
    // before (error), YAML valid but not expected
    //assert(ryml::emitrs<std::string>(tree[0][3]) == R"(- - and so
    //  - many other
    //  - wonderful beers
    //)");
    // now: YAML valid and expected
    assert(ryml::emitrs<std::string>(tree[0][3]) == R"(- and so
    - many other
    - wonderful beers
    )");
  • Fix parsing of isolated !: should be an empty val tagged with ! (UKK06-02) (PR#215).
  • Fix #193: amalgamated header missing #include <stdarg.h> which prevented compilation in bare-metal arm-none-eabi (PR #195, requiring also c4core #64).
  • Accept infinity, inf and nan as special float values (but not mixed case: eg InFiNiTy or Inf or NaN are not accepted) (PR #186).
  • Accept special float values with upper or mixed case: .Inf, .INF, .NaN, .NAN. Previously, only low-case .inf and .nan were accepted (PR #186).
  • Accept null with upper or mixed case: Null or NULL. Previously, only low-case null was accepted (PR #186).
  • Fix #182: add missing export of DLL symbols, and document requirements for compiling shared library from the amalgamated header (PR #183, also PR c4core#56 and PR c4core#57).
  • Fix #185: compilation failures in earlier Xcode versions (PR #187 and PR c4core#61):
    • c4/substr_fwd.hpp: (failure in Xcode 12 and earlier) forward declaration for std::allocator is inside the inline namespace __1, unlike later versions.
    • c4/error.hpp: (failure in debug mode in Xcode 11 and earlier) __clang_major__ does not mean the same as in the common clang, and as a result the warning -Wgnu-inline-cpp-without-extern does not exist there.
  • Ensure error messages do not wrap around the buffer when the YAML source line is too long (PR#210).
  • Ensure error is emitted on unclosed flow sequence characters eg [[[ (PR#210). Same thing for []].
  • Refactor error message building and parser debug logging to use the new dump facilities in c4core (PR#212).
  • Parse: fix read-after-free when duplicating a parser state node, when pushing to the stack requires a stack buffer resize (PR#210).
  • Add support for legacy gcc 4.8 (PR#217).
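
Decoding a \u or \U escape, as in the #204 fix above, ends with writing the code point as UTF-8 bytes. The sketch below shows that standard encoding step in isolation; sketch_append_utf8() is a hypothetical helper and not a function from ryml or c4core.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of the code point `cp` to `out`.
// Returns false (appending nothing) for values that are not valid
// Unicode scalar values: surrogates, or anything above U+10FFFF.
inline bool sketch_append_utf8(std::uint32_t cp, std::string* out)
{
    if(cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return false;
    if(cp < 0x80u) // 1 byte: 0xxxxxxx
    {
        out->push_back(static_cast<char>(cp));
    }
    else if(cp < 0x800u) // 2 bytes: 110xxxxx 10xxxxxx
    {
        out->push_back(static_cast<char>(0xC0u | (cp >> 6)));
        out->push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    }
    else if(cp < 0x10000u) // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        out->push_back(static_cast<char>(0xE0u | (cp >> 12)));
        out->push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out->push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    }
    else // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        out->push_back(static_cast<char>(0xF0u | (cp >> 18)));
        out->push_back(static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu)));
        out->push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out->push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    }
    return true;
}
```

For instance, U+263A encodes to the bytes E2 98 BA, which is why \u263A and \xE2\x98\xBA in the example above produce the same character.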

Improvements

  • Rewrite filtering of scalars to improve parsing performance (PR #188). Previously the scalar strings were filtered in place, which resulted in quadratic complexity in terms of scalar length. This did not matter for small scalars fitting the cache (which is the more frequent case), but grew in cost as the scalars grew larger. To achieve linearity, the code was changed so that the strings are now filtered to a temporary scratch space in the parser, and copied back to the output buffer after filtering, if any change occurred. The improvements were large for the folded scalars; the table below shows the benchmark results of throughput (MB/s) for several files containing large scalars of a single type:

    scalar type before after improvement
    block folded 276 561 103%
    block literal 331 611 85%
    single quoted 247 267 8%
    double quoted 212 230 8%
    plain (unquoted) 173 186 8%

    The cost for small scalars is negligible, with benchmark improvement in the interval of -2% to 5%, so well within the margin of benchmark variability in a regular OS. In the future, this will be optimized again by copying each character in place, thus completely avoiding the staging arena.

  • Callbacks: add operator==() and operator!=() (PR #168).
  • Tree: on error or assert prefer the error callback stored into the tree's current Callbacks, rather than the global Callbacks (PR #168).
  • detail::stack<>: improve behavior when assigning from objects Callbacks, test all rule-of-5 scenarios (PR #168).
  • Improve formatting of error messages.
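
The scratch-space approach described for the scalar filtering rewrite above can be sketched as follows: the scalar is filtered in a single pass into a reusable scratch buffer, and copied back only if something changed. The filter rule used here (collapsing '' into ' as in single-quoted scalars) is just a simple stand-in, and sketch_filter_squoted() is a hypothetical name, not the parser's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Filters `scalar` through `scratch` in a single linear pass.
// Returns true if the scalar was changed.
inline bool sketch_filter_squoted(std::string* scalar, std::string* scratch)
{
    scratch->clear(); // the scratch buffer is reused across calls
    bool changed = false;
    for(std::size_t i = 0; i < scalar->size(); ++i)
    {
        const char c = (*scalar)[i];
        scratch->push_back(c);
        if(c == '\'' && i + 1 < scalar->size() && (*scalar)[i + 1] == '\'')
        {
            ++i; // skip the second quote of the escaped pair
            changed = true;
        }
    }
    // copy back only when the filtering actually changed something
    if(changed)
        scalar->assign(*scratch);
    return changed;
}
```

Each character is read once and written at most twice (to the scratch buffer and back), so the cost stays linear in the scalar length.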

Thanks


0.3.0

Github release: 0.3.0

Breaking changes

Despite ryml being still in a non-stable 0.x.y version, considerable effort goes into trying to avoid breaking changes. However, this release has to collect on the semantic versioning prerogative for breaking changes. This is a needed improvement, so sorry for any nuisance!

The allocation and error callback logic was revamped on the amalgamation PR. Now trees and parsers receive (and store) a full ryml::Callbacks object instead of the (now removed) ryml::Allocator which had a pointer to a (now removed) ryml::MemoryResourceCallbacks, which was a (now removed) ryml::MemoryResource. To be clear, the Callbacks class is unchanged, other than removing some unneeded helper methods.

These changes were motivated by unfortunate name clashes between c4::Allocator/ryml::Allocator and c4::MemoryResource/ryml::MemoryResource, occurring if <c4/allocator.hpp> or <c4/memory_resource.hpp> were included before <c4/yml/common.hpp>. They also significantly simplify this part of the API, making it much easier to understand.

As a consequence of the above changes, the global memory resource getters and setters for ryml were also removed: ryml::get_memory_resource()/ryml::set_memory_resource().

Here's an example of the required changes in client code. First the old client code (from the quickstart):

struct PerTreeMemoryExample : public ryml::MemoryResource
{
void *allocate(size_t len, void * hint) override;
void free(void *mem, size_t len) override;
};
PerTreeMemoryExample mrp;
PerTreeMemoryExample mr1;
PerTreeMemoryExample mr2;
ryml::Parser parser = {ryml::Allocator(&mrp)};
ryml::Tree tree1 = {ryml::Allocator(&mr1)};
ryml::Tree tree2 = {ryml::Allocator(&mr2)};

Should now be rewritten to:

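struct PerTreeMemoryExample
{
ryml::Callbacks callbacks() const; // helper to create the callbacks
};
PerTreeMemoryExample mrp;
PerTreeMemoryExample mr1;
PerTreeMemoryExample mr2;
ryml::Parser parser = {mrp.callbacks()};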
ryml::Tree tree1 = {mr1.callbacks()};
ryml::Tree tree2 = {mr2.callbacks()};

New features

  • Add amalgamation into a single header file (PR #172):
    • The amalgamated header will be available together with the deliverables from each release.
    • To generate the amalgamated header:
      $ python tools/amalgamate.py ryml_all.hpp
    • To use the amalgamated header:
      • Include at will in any header of your project.
      • In one - and only one - of your project source files, #define RYML_SINGLE_HDR_DEFINE_NOW and then #include <ryml_all.hpp>. This will enable the function and class definitions in the header file. For example, here's a sample program:
      #include <iostream>
      #define RYML_SINGLE_HDR_DEFINE_NOW // do this before the include
      #include <ryml_all.hpp>
      int main()
      {
      auto tree = ryml::parse("{foo: bar}");
      std::cout << tree["foo"].val() << "\n";
      }
  • Add Tree::change_type() and NodeRef::change_type() (PR #171):
    // clears a node and sets its type to a different type (one of `VAL`, `SEQ`, `MAP`):
    Tree t = parse("{keyval0: val0, keyval1: val1, keyval2: val2}");
    t[0].change_type(VAL);
    t[1].change_type(MAP);
    t[2].change_type(SEQ);
    Tree expected = parse("{keyval0: val0, keyval1: {}, keyval2: []}");
    assert(emitrs<std::string>(t) == emitrs<std::string>(expected));
  • Add support for compilation with emscripten (WebAssembly+javascript) (PR #176).

Fixes

  • Take block literal indentation as relative to current indentation level, rather than as an absolute indentation level (PR #178):
    foo:
    - |
    child0
    - |2
    child2 # indentation is 4, not 2
  • Fix parsing when seq member maps start without a key (PR #178):
    # previously this resulted in a parse error
    - - : empty key
    - - : another empty key
  • Prefer passing substr and csubstr by value instead of const reference (PR #171)
  • Fix #173: add alias target ryml::ryml (PR #174)
  • Speedup compilation of tests by removing linking with yaml-cpp and libyaml. (PR #177)
  • Fix c4core#53: cmake install targets were missing call to export() (PR #179).
  • Add missing export to Tree (PR #181).

Thanks


0.2.3

Github release: 0.2.3

This release is focused on bug fixes and compliance with the YAML test suite.

New features

  • Add support for CPU architectures aarch64, ppc64le, s390x.
  • Update c4core to 0.1.7
  • Tree and NodeRef: add document getter doc() and docref()
    Tree tree = parse(R"(---
    doc0
    ---
    doc1
    )");
    NodeRef stream = tree.rootref();
    assert(stream.is_stream());
    // tree.doc(i): get the index of the i-th doc node.
    // Equivalent to tree.child(tree.root_id(), i)
    assert(tree.doc(0) == 1u);
    assert(tree.doc(1) == 2u);
    // tree.docref(i), same as above, return NodeRef
    assert(tree.docref(0).val() == "doc0");
    assert(tree.docref(1).val() == "doc1");
    // stream.doc(i), same as above, given NodeRef
    assert(stream.doc(0).val() == "doc0");
    assert(stream.doc(1).val() == "doc1");

Fixes

  • Fix compilation with C4CORE_NO_FAST_FLOAT (PR #163)

Flow maps

  • Fix parse of multiline plain scalars inside flow maps (PR #161):
    # test case UT92
    # all parsed as "matches %": 20
    - { matches
    % : 20 }
    - { matches
    %: 20 }
    - { matches
    %:
    20 }

Tags

  • Fix parsing of tags followed by comments in sequences (PR #161):
    # test case 735Y
    - !!map # Block collection
    foo : bar

Quoted scalars

  • Fix filtering of tab characters in quoted scalars (PR #161):
    ---
    # test case 5GBF
    "Empty line
    <TAB>
    as a line feed"
    # now correctly parsed as "Empty line\nas a line feed"
    ---
    # test case PRH3
    ' 1st non-empty
    <SPC>2nd non-empty<SPC>
    <TAB>3rd non-empty '
    # now correctly parsed as " 1st non-empty\n2nd non-empty 3rd non-empty "
  • Fix filtering of backslash characters in double-quoted scalars (PR #161):
    # test cases NP9H, Q8AD
    "folded<SPC>
    to a space,<TAB>
    <SPC>
    to a line feed, or <TAB>\
    \ <TAB>non-content"
    # now correctly parsed as "folded to a space,\nto a line feed, or \t \tnon-content"
  • Ensure filtering of multiline quoted scalars (PR #161):
    # all scalars now correctly parsed as "quoted string",
    # both for double and single quotes
    ---
    "quoted
    string"
    --- "quoted
    string"
    ---
    - "quoted
    string"
    ---
    - "quoted
    string"
    ---
    "quoted
    string": "quoted
    string"
    ---
    "quoted
    string": "quoted
    string"

Block scalars

  • Ensure no newlines are added when emitting block scalars (PR #161)
  • Fix parsing of block spec with both chomping and indentation: chomping may come before or after the indentation (PR #161):
    # the block scalar specs below now have the same effect.
    # test cases: D83L, P2AD
    - |2-
    explicit indent and chomp
    - |-2
    chomp and explicit indent
  • Fix inference of block indentation with leading blank lines (PR #161):
    # test cases: 4QFQ, 7T8X
    - >
    # child1
    # parsed as "\n\n child1"
    --- # test case DWX9
    |
    literal
    text
    # Comment
    # parsed as "\n\nliteral\n \n\ntext\n"
  • Fix parsing of same-indentation block scalars (PR #161):
    # test case W4TN
    # all docs have the same value: "%!PS-Adobe-2.0"
    --- |
    %!PS-Adobe-2.0
    ...
    --- >
    %!PS-Adobe-2.0
    ...
    --- |
    %!PS-Adobe-2.0
    ...
    --- >
    %!PS-Adobe-2.0
    ...
    --- |
    %!PS-Adobe-2.0
    --- >
    %!PS-Adobe-2.0
    --- |
    %!PS-Adobe-2.0
    --- >
    %!PS-Adobe-2.0
  • Folded block scalars: fix folding of newlines at the border of indented parts (PR #161):
    # test case 6VJK
    # now correctly parsed as "Sammy Sosa completed another fine season with great stats.\n\n 63 Home Runs\n 0.288 Batting Average\n\nWhat a year!\n"
    >
    Sammy Sosa completed another
    fine season with great stats.
    63 Home Runs
    0.288 Batting Average
    What a year!
    ---
    # test case MJS9
    # now correctly parsed as "foo \n\n \t bar\n\nbaz\n"
    >
    foo<SPC>
    <SPC>
    <SPC><TAB><SPC>bar
    baz
  • Folded block scalars: fix folding of newlines when the indented part is at the beginning of the scalar (PR #161):
    # test case F6MC
    a: >2
    more indented
    regular
    # parsed as a: " more indented\nregular\n"
    b: >2
    more indented
    regular
    # parsed as b: "\n\n more indented\nregular\n"
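
The rule that chomping and indentation indicators may appear in either order can be illustrated with a small standalone sketch. This is not ryml's parser: `BlockHeader` and `parse_block_header()` are hypothetical names introduced here, and the sketch only looks at the header token itself (for example `|2-`), not at the scalar body.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch; not part of ryml's API.
struct BlockHeader
{
    char style;  // '|' (literal) or '>' (folded)
    char chomp;  // '-' (strip), '+' (keep), or 'c' (clip, the default)
    int  indent; // explicit indentation 1..9, or 0 when it must be inferred
};

inline bool operator==(BlockHeader const& a, BlockHeader const& b)
{
    return a.style == b.style && a.chomp == b.chomp && a.indent == b.indent;
}

// Parse a block scalar header such as "|2-" or ">-2".
// The chomping and indentation indicators are accepted in either order,
// at most once each. Returns std::nullopt for a malformed header.
inline std::optional<BlockHeader> parse_block_header(std::string_view spec)
{
    if(spec.empty() || (spec[0] != '|' && spec[0] != '>'))
        return std::nullopt;
    BlockHeader h{spec[0], 'c', 0};
    bool has_chomp = false;
    bool has_indent = false;
    for(char c : spec.substr(1))
    {
        if((c == '-' || c == '+') && !has_chomp)
        {
            h.chomp = c;
            has_chomp = true;
        }
        else if(c >= '1' && c <= '9' && !has_indent)
        {
            h.indent = c - '0';
            has_indent = true;
        }
        else
        {
            return std::nullopt; // repeated or unknown indicator
        }
    }
    return h;
}
```

With this sketch, `parse_block_header("|2-")` and `parse_block_header("|-2")` produce equal results, mirroring the two specs in the example above.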

Plain scalars

  • Fix parsing of whitespace within plain scalars (PR #161):
    ---
    # test case NB6Z
    key:
    value
    with
    tabs
    tabs
    foo
    bar
    baz
    # is now correctly parsed as "value with\ntabs tabs\nfoo\nbar baz"
    ---
    # test case 9YRD, EX5H (trailing whitespace)
    a
    b
    c
    d
    e
    # is now correctly parsed as "a b c d\ne"
  • Fix parsing of unindented plain scalars at the root level scope (PR #161):
    --- # this parsed
    Bare
    scalar
    is indented
    # was correctly parsed as "Bare scalar is indented"
    --- # but this failed to parse successfully:
    Bare
    scalar
    is not indented
    # is now correctly parsed as "Bare scalar is not indented"
    --- # test case NB6Z
    value
    with
    tabs
    tabs
    foo
    bar
    baz
    # now correctly parsed as "value with\ntabs tabs\nfoo\nbar baz"
    ---
    --- # test cases EXG3, 82AN
    ---word1
    word2
    # now correctly parsed as "---word1 word2"
  • Fix parsing of comments within plain scalars:
    # test case 7TMG
    --- # now correctly parsed as "word1"
    word1
    # comment
    --- # now correctly parsed as [word1, word2]
    [ word1
    # comment
    , word2]
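
The expected values quoted above follow the usual folding rule for multiline plain scalars: a single line break becomes a space, and each blank line becomes a line feed. The sketch below shows that rule on already-split lines. It is an illustration only, not ryml's filtering code; `trim_blanks()` and `fold_plain_scalar()` are hypothetical helpers, and comment handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Strip leading and trailing spaces and tabs from one line.
inline std::string_view trim_blanks(std::string_view s)
{
    const auto first = s.find_first_not_of(" \t");
    if(first == std::string_view::npos)
        return {};
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Fold the lines of a multiline plain scalar (hypothetical helper):
// a single line break becomes one space, and every blank line between
// two content lines becomes one "\n". Leading and trailing blank lines
// are dropped.
inline std::string fold_plain_scalar(std::vector<std::string> const& lines)
{
    std::string out;
    std::size_t pending_blank = 0; // blank lines seen since the last content line
    bool first = true;             // true until the first content line is written
    for(std::string const& raw : lines)
    {
        const std::string_view line = trim_blanks(raw);
        if(line.empty())
        {
            if(!first)
                ++pending_blank;
            continue;
        }
        if(!first)
        {
            if(pending_blank > 0)
                out.append(pending_blank, '\n');
            else
                out += ' ';
        }
        out.append(line.data(), line.size());
        pending_blank = 0;
        first = false;
    }
    return out;
}
```

Feeding the lines `a`, `b`, `c`, `d`, an empty line, and `e` to `fold_plain_scalar()` gives `"a b c d\ne"`, the value listed for test cases 9YRD and EX5H.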

Python API

  • Add missing node predicates in SWIG API definition (PR #166):
    • is_anchor_or_ref()
    • is_key_quoted()
    • is_val_quoted()
    • is_quoted()


0.2.2

Github release: 0.2.2

Yank Python package 0.2.1, which was accidentally created while iterating on the PyPI submission from the GitHub action. This release adds no changes and is functionally the same as 0.2.1.


0.2.1

Github release: 0.2.1

This release is focused on bug fixes and compliance with the YAML test suite.

Breaking changes

  • Fix parsing behavior of root-level scalars: these are now parsed into a DOCVAL, not SEQ->VAL (5ba0d56, from PR #144). For example:
    ---
    this is a scalar
    --- # previously this was parsed as
    - this is a scalar
  • Clean up the type predicate API (PR #155):
    • ensure all type predicates from Tree and NodeRef forward to the corresponding predicate in NodeType
    • remove all type predicates and methods from NodeData; use the equivalent call from Tree or NodeRef. For example, for is_map():
      Tree t = parse("{foo: bar}");
      size_t map_id = t.root_id();
      NodeRef map = t.rootref();
      t.get(map_id)->is_map(); // compile error: no longer exists
      assert(t.is_map(map_id)); // OK
      assert(map.is_map()); // OK
    • Further cleanup to the type predicate API will be done in the future, especially around the .has_*() vs corresponding .is_*() naming scheme.
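
The forwarding arrangement described above can be pictured with a deliberately tiny model. The `Mini*` types below are hypothetical stand-ins written for this sketch, not ryml's classes: the predicate logic lives in one place (the type object), the tree and the node reference forward to it, and the plain node data carries no predicates at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type bits for this sketch.
constexpr unsigned MINI_VAL = 1u << 0;
constexpr unsigned MINI_MAP = 1u << 1;
constexpr unsigned MINI_SEQ = 1u << 2;

// The single place where predicate logic is implemented.
struct MiniNodeType
{
    unsigned bits;
    bool is_map() const { return (bits & MINI_MAP) != 0; }
    bool is_seq() const { return (bits & MINI_SEQ) != 0; }
    bool is_val() const { return (bits & MINI_VAL) != 0; }
};

// Plain data: no predicates here, callers go through the tree or a reference.
struct MiniNodeData
{
    MiniNodeType type;
};

// The tree forwards each predicate to the node's type.
struct MiniTree
{
    std::vector<MiniNodeData> nodes;
    bool is_map(std::size_t id) const { return nodes[id].type.is_map(); }
    bool is_seq(std::size_t id) const { return nodes[id].type.is_seq(); }
    bool is_val(std::size_t id) const { return nodes[id].type.is_val(); }
};

// A node reference forwards to the tree, which forwards to the type.
struct MiniNodeRef
{
    MiniTree const* tree;
    std::size_t id;
    bool is_map() const { return tree->is_map(id); }
    bool is_seq() const { return tree->is_seq(id); }
    bool is_val() const { return tree->is_val(id); }
};
```

The benefit of this layout is that a predicate cannot drift between the different entry points, because only one implementation exists.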

New features & improvements

  • Tree::lookup_path_or_modify(): add overload to graft existing branches (PR #141)
  • Callbacks: improve test coverage (PR #141)
  • YAML test suite (PR #144, PR #145): big progress towards compliance with the suite. There are still a number of existing problems, which are the subject of ongoing work. See the list of current known failures in the test suite file.
  • Python wheels and source package are now uploaded to PyPI as part of the release process.

Fixes

Anchors and references

  • Fix resolving of nodes with keyref+valref (PR #144): {&a a: &b b, *b: *a}
  • Fix parsing of implicit scalars when anchors are present (PR #145):
    - &a # test case PW8X
    - a
    - &a : a
    b: &b
    - &c : &a
    - ? &d
    - ? &e
    : &a
  • Fix #151: scalars beginning with * or & or << are now correctly quoted when emitting (PR #156).
  • Also from PR #156, map inheritance nodes like <<: *anchor or <<: [*anchor1, *anchor2] now have a KEYREF flag in their type (until a call to Tree::resolve()):
    Tree tree = parse("{map: &anchor {foo: bar}, copy: {<<: *anchor}}");
    assert(tree["copy"]["<<"].is_key_ref()); // previously this did not hold
    assert(tree["copy"]["<<"].is_val_ref()); // ... but this did
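
To see why scalars beginning with `*`, `&` or `<<` need quoting on output, consider the following sketch of an emit-side check. It is not ryml's emitter: `starts_like_anchor_or_ref()` and `emit_scalar()` are hypothetical helpers, and the check covers only the three prefixes named in the fix for #151, not every case in which a YAML scalar must be quoted.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical predicate: would this scalar be misread as an alias (*),
// an anchor (&) or a merge key (<<) if it were emitted as a plain scalar?
inline bool starts_like_anchor_or_ref(std::string_view s)
{
    if(s.empty())
        return false;
    return s[0] == '*' || s[0] == '&' || s.substr(0, 2) == "<<";
}

// Emit a scalar, single-quoting it when the predicate above holds.
// Inside single quotes, a literal ' is escaped by doubling it.
inline std::string emit_scalar(std::string_view s)
{
    if(!starts_like_anchor_or_ref(s))
        return std::string(s);
    std::string out = "'";
    for(char c : s)
    {
        out += c;
        if(c == '\'')
            out += '\'';
    }
    out += "'";
    return out;
}
```

Under these assumptions, `emit_scalar("*ref")` yields `'*ref'`, so the text reads back as a string rather than as a reference, while `emit_scalar("a*b")` is left unquoted.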

Tags

  • Fix parsing of tag dense maps and seqs (PR #144):
    --- !!map {
    k: !!seq [ a, !!str b],
    j: !!seq
    [ a, !!str b]
    --- !!seq [
    !!map { !!str k: v},
    !!map { !!str ? k: v}
    ]
    --- !!map
    !!str foo: !!map # there was a parse error with the multiple tags
    !!int 1: !!float 20.0
    !!int 3: !!float 40.0
    --- !!seq
    - !!map
    !!str k1: v1
    !!str k2: v2
    !!str k3: v3

Whitespace

  • Fix parsing of double-quoted scalars with tabs (PR #145):
    "This has a\ttab"
    # is now correctly parsed as "This has a<TAB>tab"
  • Fix filtering of leading and trailing whitespace within double-quoted scalars (PR #145):
    # test case 4ZYM, 7A4E, TL85
    "
    <SPC><SPC>foo<SPC>
    <SPC>
    <SPC><TAB><SPC>bar
    <SPC><SPC>baz
    "
    # is now correctly parsed as " foo\nbar\nbaz "
  • Fix parsing of tabs within YAML tokens (PR #145):
    ---<TAB>scalar # test case K54U
    ---<TAB>{} # test case Q5MG
    --- # test case DC7X
    a: b<TAB>
    seq:<TAB>
    - a<TAB>
    c: d<TAB>#X
  • Fix parsing of flow-style maps with omitted values without any space (PR #145):
    # test case 4ABK
    - {foo: , bar: , baz: } # this was parsed correctly as {foo: ~, bar: ~, baz: ~}
    - {foo:, bar:, baz:} # ... but this was parsed as {'foo:': , 'bar:': ~, 'baz:': ~}

Scalars

  • Unescape forward slashes in double quoted string (PR #145):
    --- escaped slash: "a\/b" # test case 3UYS
    # is now parsed as:
    --- escaped slash: "a/b"
  • Fix filtering of indented regions in folded scalars (PR #145):
    # test case 7T8X
    - >
    folded
    line
    next
    line
    * bullet
    * list
    * lines
    last
    line
    # is now correctly parsed as "\nfolded line\nnext line\n * bullet\n\n * list\n * lines\n\nlast line\n"
  • Fix parsing of special characters within plain scalars (PR #145):
    # test case 3MYT
    k:#foo
    &a !t s
    !t s
    # now correctly parsed as "k:#foo &a !t s !t s"
  • Fix parsing of comments after complex keys (PR #145):
    # test case X8DW
    ? key
    # comment
    : value
    # now correctly parsed as {key: value}
  • Fix parsing of consecutive complex keys within maps (PR #145):
    # test case 7W2P, ZWK4
    ? a
    ? b
    c:
    ? d
    e:
    # now correctly parsed as {a: ~, b: ~, c: ~, d: ~, e: ~}
  • Fix #152: parse error with folded scalars that are the last in a container (PR #157):
    exec:
    command:
    # before the fix, this folded scalar failed to parse
    - |
    exec pg_isready -U "dog" -d "dbname=dog" -h 127.0.0.1 -p 5432
    parses: no
  • Fix: documents consisting of a quoted scalar now retain the VALQUO flag (PR #156):
    Tree tree = parse("'this is a quoted scalar'");
    assert(tree.rootref().is_doc());
    assert(tree.rootref().is_val());
    assert(tree.rootref().is_val_quoted());
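
The forward-slash fix above concerns escape resolution in double-quoted scalars. As a rough illustration, the sketch below resolves a handful of escapes, including `\/`. It is not ryml's filter: `unescape_double_quoted()` is a hypothetical helper, it covers only a small subset of the YAML escape sequences, and anything it does not know is kept verbatim.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Resolve a few double-quoted escapes (hypothetical helper, partial coverage).
inline std::string unescape_double_quoted(std::string_view s)
{
    std::string out;
    out.reserve(s.size());
    for(std::size_t i = 0; i < s.size(); ++i)
    {
        // ordinary character, or a lone backslash at the very end
        if(s[i] != '\\' || i + 1 == s.size())
        {
            out += s[i];
            continue;
        }
        const char next = s[++i];
        switch(next)
        {
        case '/':  out += '/';  break; // the case addressed by the fix above
        case '\\': out += '\\'; break;
        case '"':  out += '"';  break;
        case 'n':  out += '\n'; break;
        case 't':  out += '\t'; break;
        default:   // escapes not covered by this sketch are kept verbatim
            out += '\\';
            out += next;
            break;
        }
    }
    return out;
}
```

With this helper, the escaped input `a\/b` from test case 3UYS resolves to `a/b`.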

Document structure

  • Empty docs are now parsed as a docval with a null node:
    --- # test cases 6XDY, 6ZKB, 9BXL, PUW8
    ---
    ---
    is now parsed as
    --- ~
    --- ~
    --- ~
  • Prevent creation of DOC nodes from stream-level comments or tags (PR #145):
    !foo "bar"
    ...
    # Global
    %TAG ! tag:example.com,2000:app/
    ---
    !foo "bar"
    was parsed as
    ---
    !foo "bar"
    ---
    # notice the empty doc in here
    ---
    !foo "bar"
    and it is now correctly parsed as
    ---
    !foo "bar"
    ---
    !foo "bar"
    (other than the known limitation that ryml does not do tag lookup).

General

  • Fix #147: serialize/deserialize special float values .nan, .inf, -.inf (PR #149)
  • Fix #142: preprocess_json(): ensure quoted ranges are skipped when slurping containers
  • Ensure error macros expand to a single statement (PR #141)
  • Update c4core to 0.1.4
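
For the special float values mentioned in the fix for #147, the mapping between text and value can be sketched in a few lines of standard C++. This is only an illustration: `float_to_yaml()` and `yaml_to_special_float()` are hypothetical names, only the three lowercase spellings listed above are recognized, and the formatting of ordinary finite numbers is left to `std::to_string()` because it is not the point of the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <string_view>

// Serialize a double, using the YAML spellings for the special values.
inline std::string float_to_yaml(double v)
{
    if(std::isnan(v))
        return ".nan";
    if(std::isinf(v))
        return v > 0 ? ".inf" : "-.inf";
    return std::to_string(v); // finite values: formatting is not the point here
}

// Deserialize only the special values; returns false for anything else,
// so the caller can fall back to a regular number parser.
inline bool yaml_to_special_float(std::string_view s, double *out)
{
    if(s == ".nan")
    {
        *out = std::numeric_limits<double>::quiet_NaN();
        return true;
    }
    if(s == ".inf")
    {
        *out = std::numeric_limits<double>::infinity();
        return true;
    }
    if(s == "-.inf")
    {
        *out = -std::numeric_limits<double>::infinity();
        return true;
    }
    return false;
}
```

Note that a NaN read back this way still compares unequal to itself, so round-trip checks should use `std::isnan()` rather than `==`.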


0.2.0

Github release: 0.2.0

New features & improvements

  • Enable parsing into nested nodes (87f4184)
  • as_json() can now be called with tree and node id (4c23041)
  • Add Parser::reserve_stack() (f31fb9f)
  • Add uninstall target (PR #122)
  • Update c4core to v0.1.1
  • Add a quickstart sample with build examples.
  • Update README.md to refer to the quickstart
  • Add gdb visualizers
  • Add SO_VERSION to shared builds

Fixes

  • Fix #139: substr and csubstr not found in ryml namespace
  • Fix #131: resolve references to map keys
  • Fix #129: quoted strings starting with * parsed as references
  • Fix #128: segfault on nonexistent anchor
  • Fix #124: parse failure in comments with trailing colon
  • Fix #121: preserve quotes when emitting scalars
  • Fix #103: ambiguous parsing of null/empty scalars
  • Fix #90: CMAKE_CXX_STANDARD ignored
  • Fix #40: quadratic complexity from use of sscanf(f)
  • Fix emitting json to streams (dc6af83)
  • Set the global memory resource when setting global callbacks (511cba0)
  • Fix python packaging (PR #102)


0.1.0

Github release: 0.1.0

This is the first ryml release. Future releases will have a more organized changelog; for now, only recent major changes are listed.

Please be aware that there are still some anticipated breaking changes in the API before releasing the 1.0 major version. These are highlighted in the repo ROADMAP.

  • 2020/October
  • 2020/September
    • [Breaking change] MR#85: null values in YAML are now parsed to null strings instead of the YAML null token "~":
      auto tree = parse("{foo: , bar: ''}");
      // previous:
      assert(tree["foo"].val() == "~");
      assert(tree["bar"].val() == "");
      // now:
      assert(tree["foo"].val() == nullptr); // notice that this is now null
      assert(tree["bar"].val() == "");
    • MR#85: Commas after tags are now allowed:
      {foo: !!str, bar: ''} # now the comma does not cause an error
    • MR#81: Always compile with extra pedantic warnings.
  • 2020/May
    • [Breaking change] the error callback now receives a source location object:
      // previous
      using pfn_error = void (*)(const char* msg, size_t msg_len, void *user_data);
      // now:
      using pfn_error = void (*)(const char* msg, size_t msg_len, Location location, void *user_data);
    • Parser fixes to improve test suite success: MR#73, MR#71, MR#68, MR#67, MR#66
    • Fix compilation as DLL on windows MR#69
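
As a rough picture of what a callback with the newer `pfn_error` signature might look like on the user side, here is a standalone sketch. The `pfn_error` alias is copied from the entry above; everything else is an assumption made for illustration: the `Location` struct is a stand-in with guessed fields (the library's real type may differ), and `ErrorLog` and `record_error()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the library's source location type.
// The fields are assumed for this sketch and may not match the real one.
struct Location
{
    const char *name; // file or buffer name
    std::size_t line;
    std::size_t col;
};

// Callback signature as given in the changelog entry above.
using pfn_error = void (*)(const char* msg, std::size_t msg_len, Location location, void *user_data);

// Hypothetical user-side sink that receives the formatted message.
struct ErrorLog
{
    std::string text;
};

// A callback matching pfn_error: formats "name:line:col: message"
// into the ErrorLog passed through user_data.
inline void record_error(const char* msg, std::size_t msg_len, Location location, void *user_data)
{
    ErrorLog *log = static_cast<ErrorLog*>(user_data);
    log->text = std::string(location.name) + ":"
              + std::to_string(location.line) + ":"
              + std::to_string(location.col) + ": "
              + std::string(msg, msg_len);
}
```

The location argument is what lets the callback report where in the source the error occurred, which the previous three-argument signature could not convey.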