Crate json5format

A stylized formatter for JSON5 (“JSON for Humans”) documents.

The intent of this formatter is to rewrite a given valid JSON5 document, restructuring the output (if required) to conform to a consistent style.

The resulting document should preserve all data precision, data format representations, and semantic intent. Readability should be maintained, if not improved, by the consistency within and across documents.

Most importantly, all JSON5 comments should be preserved, maintaining the positional relationship with the JSON5 data elements they were intended to document.

§Example

  use json5format::*;
  use maplit::hashmap;
  use maplit::hashset;

  let json5=r##"{
      "name": {
          "last": "Smith",
          "first": "John",
          "middle": "Jacob"
      },
      "children": [
          "Buffy",
          "Biff",
          "Balto"
      ],
      // Consider adding a note field to the `other` contact option
      "contact_options": [
          {
              "home": {
                  "email": "jj@notreallygmail.com",   // This was the original user id.
                                                      // Now user id's are hash values.
                  "phone": "212-555-4321"
              },
              "other": {
                  "email": "volunteering@serviceprojectsrus.org"
              },
              "work": {
                  "phone": "212-555-1234",
                  "email": "john.j.smith@worksforme.gov"
              }
          }
      ],
      "address": {
          "city": "Anytown",
          "country": "USA",
          "state": "New York",
          "street": "101 Main Street"
          /* Update schema to support multiple addresses:
             "work": {
                 "city": "Anytown",
                 "country": "USA",
                 "state": "New York",
                 "street": "101 Main Street"
             }
          */
      }
  }
  "##;

  let options = FormatOptions {
      indent_by: 2,
      collapse_containers_of_one: true,
      options_by_path: hashmap! {
          "/*" => hashset! {
              PathOption::PropertyNameOrder(vec![
                  "name",
                  "address",
                  "contact_options",
              ]),
          },
          "/*/name" => hashset! {
              PathOption::PropertyNameOrder(vec![
                  "first",
                  "middle",
                  "last",
                  "suffix",
              ]),
          },
          "/*/children" => hashset! {
              PathOption::SortArrayItems(true),
          },
          "/*/*/*" => hashset! {
              PathOption::PropertyNameOrder(vec![
                  "work",
                  "home",
                  "other",
              ]),
          },
          "/*/*/*/*" => hashset! {
              PathOption::PropertyNameOrder(vec![
                  "phone",
                  "email",
              ]),
          },
      },
      ..Default::default()
  };

  let filename = "new_contact.json5".to_string();

  let format = Json5Format::with_options(options)?;
  let parsed_document = ParsedDocument::from_str(&json5, Some(filename))?;
  let bytes: Vec<u8> = format.to_utf8(&parsed_document)?;

  assert_eq!(std::str::from_utf8(&bytes)?, r##"{
  name: {
    first: "John",
    middle: "Jacob",
    last: "Smith",
  },
  address: {
    city: "Anytown",
    country: "USA",
    state: "New York",
    street: "101 Main Street",

    /* Update schema to support multiple addresses:
       "work": {
           "city": "Anytown",
           "country": "USA",
           "state": "New York",
           "street": "101 Main Street"
       }
    */
  },

  // Consider adding a note field to the `other` contact option
  contact_options: [
    {
      work: {
        phone: "212-555-1234",
        email: "john.j.smith@worksforme.gov",
      },
      home: {
        phone: "212-555-4321",
        email: "jj@notreallygmail.com", // This was the original user id.
                                        // Now user id's are hash values.
      },
      other: { email: "volunteering@serviceprojectsrus.org" },
    },
  ],
  children: [
    "Balto",
    "Biff",
    "Buffy",
  ],
}
"##);

§Formatter Actions

When the options above are applied to the input, the formatter will make the following changes:

  • The formatted document will be indented by 2 spaces.
  • Quotes are removed from all property names (since they are all legal ECMAScript identifiers).
  • The top-level properties will be reordered to [name, address, contact_options]. Since property name children was not included in the sort order, it will be placed at the end.
  • The name properties will be reordered to [first, middle, last].
  • The properties of the unnamed object in array contact_options will be reordered to [work, home, other].
  • The properties of the work, home, and other objects will be reordered to [phone, email].
  • The children names array of string primitives will be sorted.
  • All elements (except the top-level object, represented by the outermost curly braces) will end with a comma.
  • Since the contact_options descendant element other has only one property, the other object structure will collapse to a single line, with internal trailing comma suppressed.
  • The line comment will retain its relative position, above contact_options.
  • The block comment will retain its relative position, inside and at the end of the address object.
  • The end-of-line comment after home/email will retain its relative location (appended at the end of the email value). Any subsequent line comments with the same vertical alignment are also retained, and are shifted so that they stay left-aligned with the new position of the first comment line.
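
The reordering rule described above (listed names first, unlisted names such as children kept at the end) can be sketched with a stable sort. This is only an illustration using the standard library; the helper name order_properties is hypothetical and is not part of the json5format API, and the crate's own implementation may differ.

```rust
/// Reorder `names` so that entries listed in `preferred` come first, in that
/// order. Names not listed in `preferred` keep their original relative order
/// and are placed at the end. (Hypothetical helper, for illustration only.)
fn order_properties<'a>(names: &[&'a str], preferred: &[&str]) -> Vec<&'a str> {
    let mut ordered: Vec<&'a str> = names.to_vec();
    // `sort_by_key` is a stable sort: all unlisted names share the same key
    // (`preferred.len()`), so they stay in their original order.
    ordered.sort_by_key(|name| {
        preferred
            .iter()
            .position(|p| p == name)
            .unwrap_or(preferred.len())
    });
    ordered
}
```

Applied to the top-level property names of the example input with the preferred order [name, address, contact_options], this sketch places children last, matching the behavior listed above.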

§Formatter Behavior Details

For reference, the following sections detail how the JSON5 formatter verifies and processes JSON5 content.

§Syntax Validation

  • Structural syntax is checked, such as validating matching braces, property name-colon-value syntax, enforced separation of values by commas, properly quoted strings, and both block and line comment extraction.
  • Non-string literal value syntax is checked (null, true, false, and the various legal formats for JSON5 Numbers).
  • Syntax errors produce error messages with the line and column where the problem was encountered.
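
As a rough illustration of how a line and column can be derived for an error message, the sketch below converts a byte offset into a 1-based (line, column) pair. The function name line_and_column is hypothetical, and counting columns in characters is an assumption; the crate reports positions through its own types, which are not reproduced here.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters, not bytes. (Hypothetical helper.)
fn line_and_column(text: &str, byte_offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;
    for (index, ch) in text.char_indices() {
        if index >= byte_offset {
            break;
        }
        if ch == '\n' {
            // A newline starts the next line at column 1.
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    (line, column)
}
```

For the input "{\n  a: 1,\n  b 2\n}", where the colon after b is missing, byte offset 14 (the character 2) maps to line 3, column 5.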

§Property Names

  • Duplicate property names are retained, but may constitute errors in higher-level JSON5 parsers or schema-specific deserializers.
  • All JSON5 unquoted property name characters are supported, including ‘$’ and ‘_’. Digits are the only valid property name character that cannot be the first character. Property names can also be represented as quoted strings. All valid JSON5 strings, if quoted, are valid property names (including multi-line strings and quoted numbers).

Example:

    $_meta_prop: 'Has "double quotes" and \'single quotes\' and \
multiple lines with escaped \\ backslash',
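
The first-character rule above can be sketched as a small check for whether a name may appear unquoted. This is a simplified, ASCII-only approximation: JSON5 follows the ECMAScript identifier rules, which also admit Unicode letters and escapes, and the function name can_be_unquoted is hypothetical rather than part of the crate.

```rust
/// Return true if `name` could be written as an unquoted property name.
/// ASCII-only approximation of the JSON5 / ECMAScript identifier rules.
fn can_be_unquoted(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The first character may be a letter, '$', or '_', but not a digit.
        Some(first) if first.is_ascii_alphabetic() || first == '$' || first == '_' => {
            // Remaining characters may also include digits.
            chars.all(|c| c.is_ascii_alphanumeric() || c == '$' || c == '_')
        }
        // Empty names and names starting with any other character need quotes.
        _ => false,
    }
}
```

Under this sketch, $_meta_prop and l33t may be unquoted, while 1st and "first name" must stay quoted.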

§Literal Values

  • JSON5 supports quoting strings (literal values or quoted property names) with either double (") or single (') quotes. The formatter does not change the quotes. Double-quoting is conventional, but single quotes may be used when quoting strings containing double quotes, and leaving the single quotes as-is is preferred.
  • JSON5 literal values are retained as-is. Strings retain all spacing characters, including escaped newlines. All other literals (unquoted tokens without spaces, such as false, null, 0.234, 1337, or l33t) are not interpreted syntactically. Other schema-based tools and JSON5 deserializers may flag these invalid values.

§Optional Sorting

  • By default, array items and object properties retain their original order. (Some JSON arrays are order-dependent, and sorting them indiscriminately might change the meaning of the data.)
  • The formatter can automatically sort array items and object properties if enabled via FormatOptions:
    • To sort all arrays in the document, set FormatOptions.sort_array_items to true.
    • To sort only specific arrays in the target schema, specify the schema location under FormatOptions.options_by_path, and set its SortArrayItems option.
    • Properties are sorted based on an explicit user-supplied list of property names in the preferred order, for objects at a specified path. Specify the object’s location in the target schema using FormatOptions.options_by_path, and provide a vector of property name strings with the PropertyNameOrder option. Properties not included in this option retain their original order, behind the explicitly ordered properties, if any.
  • When sorting array items, the formatter only sorts array item literal values (strings, numbers, bools, and null). Child arrays or objects are left in their original order, after sorted literals, if any, within the same array.
  • Array items are sorted in case-insensitive Unicode lexicographic order. (Note that, since the formatter does not parse unquoted literals, number types cannot be sorted numerically.) Items that are case-insensitively equal are re-compared and ordered case-sensitively with respect to each other.
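
The two-level comparison described in the last bullet can be sketched as follows. This is an approximation using the standard library's to_lowercase, not the crate's own code; the names compare_items and sort_items are hypothetical.

```rust
/// Compare two array items case-insensitively, falling back to a
/// case-sensitive comparison when the lowercased forms are equal.
/// (Hypothetical helper, approximating the documented ordering.)
fn compare_items(a: &str, b: &str) -> std::cmp::Ordering {
    a.to_lowercase()
        .cmp(&b.to_lowercase())
        .then_with(|| a.cmp(b))
}

/// Sort literal items in place using `compare_items`.
fn sort_items(items: &mut [&str]) {
    items.sort_by(|a, b| compare_items(a, b));
}
```

Because the comparison is purely textual, an array holding 9 and 10 sorts as 10, 9 under this sketch, which illustrates why numbers cannot be sorted numerically.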

§Associated Comments

  • All comments immediately preceding an element (value or start of an array or object), and trailing line comments (starting on the same line as the element, optionally continued on successive lines if all line comments are left-aligned), are retained and move with the associated item if the item is repositioned during sorting.
  • All line and block comments are retained. Typically, the comments are re-aligned vertically (indented) with the values with which they were associated.
  • A single line comment appearing immediately after a JSON value (primitive or closing brace), on the same line, will remain appended to that value on its line after re-formatting.
  • Spaces separate block comments from blocks of contiguous line comments associated with the same entry.
  • Comments at the end of a list (after the last property or item) are retained at the end of the same list.
  • Block comments with lines that extend to the left of the opening “/*” are not re-aligned.

§Whitespace Handling

  • Unicode characters are allowed, and Unicode space characters should retain their meaning according to Unicode standards.
  • All spaces inside single- or multi-line strings are retained. All spaces in comments are retained except trailing spaces at the end of a line.
  • All other original spaces are removed.
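
The comment rule above (interior spacing kept, trailing spaces dropped) can be sketched per line. The function name trim_comment_lines is hypothetical, and this sketch trims all trailing whitespace, including tabs, which is an assumption about how "trailing spaces" is interpreted.

```rust
/// Remove trailing whitespace from each line of a comment while keeping
/// leading and interior spacing intact. (Hypothetical helper.)
fn trim_comment_lines(comment: &str) -> String {
    comment
        .lines()
        .map(|line| line.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}
```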

Macros§

  • Create a TestFailure error including the source file location of the macro call.

Structs§

  • Represents a JSON5 array of items. During parsing, this object’s state changes, as comments and items are encountered. Parsed comments are temporarily stored in contained_comments, to be transferred to the next parsed item. After the last item, if any other comments are encountered, those comments are retained in the contained_comments field, to be restored during formatting, after writing the last item.
  • A struct containing all comments associated with a specific Value.
  • Options that change the style of the formatted JSON5 output.
  • A JSON5 formatter that parses a valid JSON5 input buffer and produces a new, formatted document.
  • A location within a document buffer or document file. This module uses Location to refer to locations of JSON5 syntax errors (while parsing) and also to locations in this Rust source file, to improve unit testing output.
  • A specialized struct to represent the data of a JSON5 object, including any comments placed at the end of the object.
  • Represents the parsed state of a given JSON5 document.
  • Represents a primitive value in a JSON5 object property or array item. The parsed value is stored as a formatted string, retaining its original format, and written to the formatted document just as it appeared.
  • Represents a name-value pair for a field in a JSON5 object.

Enums§

  • Represents the variations of allowable comments.
  • Errors produced by the json5format library.
  • Options that can be applied to specific objects or arrays in the target JSON5 schema, through FormatOptions.options_by_path. Each option can be set at most once per unique path.
  • Represents the possible data types in a JSON5 object. Each variant has a field holding a specialized struct with the value’s data, and a field for comments (possibly including a line comment and comments appearing immediately before the value). For Object and Array, comments appearing at the end of the structure are encapsulated inside the appropriate specialized struct.

Functions§

  • Format a JSON5 document, applying a consistent style, with given options.