nommy

nommy is a type-based parsing crate that features a derive macro to help you utilise the power of Rust and nommy.

use nommy::{parse, text::*, Parse};

type Letters = AnyOf1<"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ">;

#[derive(Debug, Parse, PartialEq)]
#[nommy(prefix = Tag<"struct">)]
#[nommy(ignore = WhiteSpace)]
struct StructNamed {
    #[nommy(parser = Letters)]
    name: String,

    #[nommy(prefix = Tag<"{">, suffix = Tag<"}">)]
    fields: Vec<NamedField>,
}

#[derive(Debug, Parse, PartialEq)]
#[nommy(suffix = Tag<",">)]
#[nommy(ignore = WhiteSpace)]
struct NamedField {
    #[nommy(parser = Letters)]
    name: String,

    #[nommy(prefix = Tag<":">, parser = Letters)]
    ty: String,
}

fn main() {
    let input = "struct Foo {
        bar: Abc,
        baz: Xyz,
    }";

    let struct_: StructNamed = parse(input.chars()).unwrap();
    assert_eq!(
        struct_,
        StructNamed {
            name: "Foo".to_string(),
            fields: vec![
                NamedField {
                    name: "bar".to_string(),
                    ty: "Abc".to_string(),
                },
                NamedField {
                    name: "baz".to_string(),
                    ty: "Xyz".to_string(),
                },
            ]
        }
    );
}

Ideology

nommy has three main concepts worth exploring before we get into the usage:

  1. Buffers
  2. Parsers
  3. Peekers

Buffers

Buffer is a trait that wraps an Iterator. It extends upon this by requiring two extra methods:


#![allow(unused)]
fn main() {
/// eagerly drops the first `n` elements in the buffer
fn fast_forward(&mut self, n: usize);

/// finds the `i`th element in the iterator, storing any read elements into a buffer for later access
fn peek_ahead(&mut self, i: usize) -> Option<T>;
}

With these two methods, the trait can implement a third method, cursor, which returns a new Buffer type, Cursor.

Cursor reads from a buffer only using peek_ahead. It ensures that any data read through the buffer can be read again in future.


#![allow(unused)]
fn main() {
use nommy::{Buffer, IntoBuf};
let mut buffer = (0..).into_buf();
let mut cursor1 = buffer.cursor();

// cursors act exactly like an iterator
assert_eq!(cursor1.next(), Some(0));
assert_eq!(cursor1.next(), Some(1));

// cursors can be made from other cursors
let mut cursor2 = cursor1.cursor();
assert_eq!(cursor2.next(), Some(2));
assert_eq!(cursor2.next(), Some(3));

// child cursors do not move the parent's iterator position
assert_eq!(cursor1.next(), Some(2));

assert_eq!(buffer.next(), Some(0));
}

If you read from a cursor and decide that you won't need to re-read those contents, you can call fast_forward_parent. This takes the number of elements the Cursor has read ahead, and calls the parent buffer's fast_forward method with it.


#![allow(unused)]
fn main() {
use nommy::{Buffer, IntoBuf};
let mut input = "foobar".chars().into_buf();
let mut cursor = input.cursor();
assert_eq!(cursor.next(), Some('f'));
assert_eq!(cursor.next(), Some('o'));
assert_eq!(cursor.next(), Some('o'));

// Typically, the next three calls to `next` would repeat
// the first three calls because cursors read non-destructively.
// However, this method allows you to drop the already-read contents
cursor.fast_forward_parent();
assert_eq!(input.next(), Some('b'));
assert_eq!(input.next(), Some('a'));
assert_eq!(input.next(), Some('r'));
}

The standard implementation of Buffer is Buf, and can be created from any type that implements IntoIterator.
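To make the two required methods concrete, here is a minimal std-only sketch of how such a buffer could be built. SketchBuf and its fields are hypothetical names for illustration and are not nommy's Buf; the sketch only assumes the behaviour described above (peek_ahead stores what it reads, fast_forward drops elements eagerly).

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a buffer: an iterator plus a queue of
/// elements that were peeked at but not yet consumed.
pub struct SketchBuf<I: Iterator> {
    iter: I,
    peeked: VecDeque<I::Item>,
}

impl<I: Iterator> SketchBuf<I>
where
    I::Item: Clone,
{
    pub fn new(iter: I) -> Self {
        Self { iter, peeked: VecDeque::new() }
    }

    /// Returns the `i`th upcoming element without consuming it, pulling
    /// from the inner iterator and storing what it reads along the way.
    pub fn peek_ahead(&mut self, i: usize) -> Option<I::Item> {
        while self.peeked.len() <= i {
            let item = self.iter.next()?;
            self.peeked.push_back(item);
        }
        self.peeked.get(i).cloned()
    }

    /// Eagerly drops the first `n` elements, stored ones first.
    pub fn fast_forward(&mut self, n: usize) {
        for _ in 0..n {
            if self.peeked.pop_front().is_none() && self.iter.next().is_none() {
                break;
            }
        }
    }
}

impl<I: Iterator> Iterator for SketchBuf<I> {
    type Item = I::Item;

    /// Consumes stored elements before touching the inner iterator.
    fn next(&mut self) -> Option<I::Item> {
        self.peeked.pop_front().or_else(|| self.iter.next())
    }
}
```

A cursor can then be thought of as an index into such a buffer that only ever calls peek_ahead, which is why reading through it never loses data.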

Parsers

Parse is a trait that defines how to go from a Buffer to a value. It is defined as follows:


#![allow(unused)]
fn main() {
pub trait Parse<T>: Sized {
    fn parse(input: &mut impl Buffer<T>) -> eyre::Result<Self>;
    
    // Covered in the next section
    fn peek(input: &mut impl Buffer<T>) -> bool {
        Self::parse(input).is_ok()
    }
}
}

Parse isn't much on its own, but it's the basis for the rest of this crate. We piggy-back off of eyre for error handling, as parsers may have several nested levels of errors, and handling those with specific error types can get very complicated.

Example

This example implementation of Parse reads from a char Buffer, parsing a representation of a string.


#![allow(unused)]
fn main() {
use nommy::{Buffer, Parse};
/// StringParser parses a code representation of a string
struct StringParser(String);
impl Parse<char> for StringParser {
    fn parse(input: &mut impl Buffer<char>) -> eyre::Result<Self> {
        // ensure the first character is a quote mark
        if input.next() != Some('\"') {
            return Err(eyre::eyre!("starting quote not found"));
        }

        let mut output = String::new();
        let mut escaped = false;

        // read from the input until the ending quote is found
        for c in input {
            match (c, escaped) {
                ('\"', true) => output.push('\"'),
                ('n', true) => output.push('\n'),
                ('r', true) => output.push('\r'),
                ('t', true) => output.push('\t'),
                ('\\', true) => output.push('\\'),
                (c, true) => return Err(eyre::eyre!("unknown escaped character code \\{}", c)),

                ('\"', false) => return Ok(Self(output)),
                ('\\', false) => {
                    escaped = true;
                    continue;
                }
                (c, false) => output.push(c),
            }
            escaped = false;
        }

        Err(eyre::eyre!("ending quote not found"))
    }
}
}
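To try the same logic without pulling in nommy or eyre, the sketch below ports it to a plain char iterator. The function name parse_string_literal and the use of String as the error type are assumptions made for this illustration; the matching rules are copied from the example above.

```rust
/// Std-only port of the `StringParser` logic: reads a quoted string
/// from any `char` iterator and resolves the escape codes.
pub fn parse_string_literal(
    input: &mut impl Iterator<Item = char>,
) -> Result<String, String> {
    // ensure the first character is a quote mark
    if input.next() != Some('"') {
        return Err("starting quote not found".to_string());
    }

    let mut output = String::new();
    let mut escaped = false;

    // read from the input until the ending quote is found
    for c in input {
        match (c, escaped) {
            ('"', true) => output.push('"'),
            ('n', true) => output.push('\n'),
            ('r', true) => output.push('\r'),
            ('t', true) => output.push('\t'),
            ('\\', true) => output.push('\\'),
            (c, true) => return Err(format!("unknown escaped character code \\{}", c)),

            ('"', false) => return Ok(output),
            ('\\', false) => {
                escaped = true;
                continue;
            }
            (c, false) => output.push(c),
        }
        escaped = false;
    }

    Err("ending quote not found".to_string())
}
```

Parsing stops at the closing quote, so anything after it stays in the iterator for the next parser.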

Peekers

Hidden in the Parse definition in the previous chapter is the peek method. Its definition is almost exactly the same as parse, but instead of returning Result<Self>, it returns bool. It's supposed to be a faster method of determining whether a given input could be parsed. A lot of the built-in parsers utilise peek under the hood to resolve branches.


#![allow(unused)]
fn main() {
pub trait Parse<T>: Sized {
    fn parse(input: &mut impl Buffer<T>) -> eyre::Result<Self>;
    
    fn peek(input: &mut impl Buffer<T>) -> bool {
        // Default impl - override for better performance
        Self::parse(input).is_ok()
    }
}
}

Example

This is the same example from the Parsers section, but implementing peek instead. It follows a very similar implementation, but avoids a lot of the heavy work, such as dealing with errors and saving the chars to the string buffer.


#![allow(unused)]
fn main() {
use nommy::{Buffer, Parse};
/// StringParser parses a code representation of a string
struct StringParser(String);
impl Parse<char> for StringParser {
    fn parse(input: &mut impl Buffer<char>) -> eyre::Result<Self> {
        unimplemented!()
    }
    
    fn peek(input: &mut impl Buffer<char>) -> bool {
        // ensure the first character is a quote mark
        if input.next() != Some('\"') {
            return false;
        }

        let mut escaped = false;

        // read from the input until the ending quote is found
        for c in input {
            match (c, escaped) {
                ('\"', true) => { escaped = false }
                ('n', true) => { escaped = false }
                ('r', true) => { escaped = false }
                ('t', true) => { escaped = false }
                ('\\', true) => { escaped = false }
                (_, true) => return false,
                ('\"', false) => return true,
                ('\\', false) => { escaped = true; }
                _ => {},
            }
        }

        false
    }
}
}
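The relationship between a default peek and a hand-written one can be sketched without nommy. SketchParse, Number and Semi below are hypothetical names, and the trait works on plain char iterators with String errors rather than on Buffer and eyre; it is only meant to show the shape of the trade-off.

```rust
/// Hypothetical, std-only mirror of the trait shape described above.
pub trait SketchParse: Sized {
    fn parse(input: &mut impl Iterator<Item = char>) -> Result<Self, String>;

    /// Default: run the full parser and throw the value away.
    fn peek(input: &mut impl Iterator<Item = char>) -> bool {
        Self::parse(input).is_ok()
    }
}

/// One or more ASCII digits followed by a terminating `;`.
pub struct Number(pub String);

impl SketchParse for Number {
    fn parse(input: &mut impl Iterator<Item = char>) -> Result<Self, String> {
        let mut digits = String::new();
        for c in input {
            match c {
                ';' if !digits.is_empty() => return Ok(Number(digits)),
                c if c.is_ascii_digit() => digits.push(c),
                c => return Err(format!("unexpected character {:?}", c)),
            }
        }
        Err("terminating ';' not found".to_string())
    }

    /// Same walk as `parse`, but without allocating a string or
    /// formatting error messages.
    fn peek(input: &mut impl Iterator<Item = char>) -> bool {
        let mut seen_digit = false;
        for c in input {
            match c {
                ';' => return seen_digit,
                c if c.is_ascii_digit() => seen_digit = true,
                _ => return false,
            }
        }
        false
    }
}

/// A single `;`. It keeps the default `peek`, which is already cheap here.
pub struct Semi;

impl SketchParse for Semi {
    fn parse(input: &mut impl Iterator<Item = char>) -> Result<Self, String> {
        match input.next() {
            Some(';') => Ok(Semi),
            other => Err(format!("expected ';', found {:?}", other)),
        }
    }
}
```

Whichever version is used, peek and parse should agree on which inputs are accepted; only the amount of work differs.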

Basic Parsers

nommy provides a set of basic parsers to handle a lot of standard situations. Many of these make use of Rust's new const generics.

Tag

Tag matches an exact string or byte slice in the input buffer.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::Tag};
let mut buffer = "foobar".chars().into_buf();
assert!(Tag::<"foo">::peek(&mut buffer));
assert!(Tag::<"bar">::peek(&mut buffer));
assert!(buffer.next().is_none());
}

OneOf

OneOf matches one character or byte that is contained within the pattern string.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::OneOf};
let mut buffer = "bC".chars().into_buf();
assert_eq!(OneOf::<"abcd">::parse(&mut buffer).unwrap().into(), 'b');
assert_eq!(OneOf::<"ABCD">::parse(&mut buffer).unwrap().into(), 'C');
assert!(buffer.next().is_none());
}

AnyOf

AnyOf matches as many characters or bytes that are contained within the pattern string as possible.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::AnyOf};
let mut buffer = "dbacCBAD".chars().into_buf();
assert_eq!(&AnyOf::<"abcd">::parse(&mut buffer).unwrap().into(), "dbac");
assert_eq!(&AnyOf::<"ABCD">::parse(&mut buffer).unwrap().into(), "CBAD");
assert!(buffer.next().is_none());
}

AnyOf1

AnyOf1 matches as many characters or bytes that are contained within the pattern string as possible, requiring at least 1 value to match.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::AnyOf1};
let mut buffer = "dbacCBAD".chars().into_buf();
assert_eq!(&AnyOf1::<"abcd">::parse(&mut buffer).unwrap().into(), "dbac");
assert_eq!(&AnyOf1::<"ABCD">::parse(&mut buffer).unwrap().into(), "CBAD");
assert!(buffer.next().is_none());
}

WhileNot1

WhileNot1 matches as many characters or bytes that are not contained within the pattern string as possible, requiring at least 1 value to match.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::WhileNot1};
let mut buffer = "hello world!".chars().into_buf();
assert_eq!(&WhileNot1::<".?!">::parse(&mut buffer).unwrap().into(), "hello world");
assert_eq!(buffer.next(), Some('!'));
}
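The difference between AnyOf, AnyOf1 and WhileNot1 comes down to which characters are accepted and whether an empty match counts as success. The sketch below shows this with plain functions over std's Peekable; the function names are hypothetical and this is not how nommy implements these types.

```rust
use std::iter::Peekable;

/// Collects characters while `keep` accepts them, leaving the first
/// rejected character in the input.
fn take_while_peek<I: Iterator<Item = char>>(
    input: &mut Peekable<I>,
    keep: impl Fn(char) -> bool,
) -> String {
    let mut out = String::new();
    while let Some(c) = input.next_if(|&c| keep(c)) {
        out.push(c);
    }
    out
}

/// Like `AnyOf`: zero or more characters from `pattern`; never fails.
pub fn any_of<I: Iterator<Item = char>>(input: &mut Peekable<I>, pattern: &str) -> String {
    take_while_peek(input, |c| pattern.contains(c))
}

/// Like `AnyOf1`: as `any_of`, but an empty match is a failure.
pub fn any_of1<I: Iterator<Item = char>>(
    input: &mut Peekable<I>,
    pattern: &str,
) -> Option<String> {
    Some(any_of(input, pattern)).filter(|s| !s.is_empty())
}

/// Like `WhileNot1`: one or more characters that are *not* in `pattern`.
pub fn while_not1<I: Iterator<Item = char>>(
    input: &mut Peekable<I>,
    pattern: &str,
) -> Option<String> {
    Some(take_while_peek(input, |c| !pattern.contains(c))).filter(|s| !s.is_empty())
}
```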

Vec

Vec<P> parses P as many times as it can.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::Tag};
let mut buffer = "...!".chars().into_buf();
assert_eq!(Vec::<Tag<".">>::parse(&mut buffer).unwrap().len(), 3);
assert_eq!(buffer.next(), Some('!'));
}

Vec1

Vec1<P> parses P as many times as it can, requiring at least 1 match.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, Vec1, text::Tag};
let mut buffer = "...!".chars().into_buf();
assert_eq!(Vec1::<Tag<".">>::parse(&mut buffer).unwrap().len(), 3);

// the remaining input is "!", so Vec1 fails: it requires at least 1 match
Vec1::<Tag<".">>::parse(&mut buffer).unwrap_err();
}
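The repetition behind Vec and Vec1 can be sketched as two small std-only helpers. The names many, many1 and dot are hypothetical, and an element parser is modelled here as a closure returning Option rather than as a Parse impl.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Like `Vec<P>`: applies `parse_one` until it fails, keeping every success.
pub fn many<I, T>(input: &mut I, mut parse_one: impl FnMut(&mut I) -> Option<T>) -> Vec<T> {
    let mut items = Vec::new();
    while let Some(item) = parse_one(&mut *input) {
        items.push(item);
    }
    items
}

/// Like `Vec1<P>`: as `many`, but zero matches is a failure.
pub fn many1<I, T>(
    input: &mut I,
    parse_one: impl FnMut(&mut I) -> Option<T>,
) -> Option<Vec<T>> {
    Some(many(input, parse_one)).filter(|items| !items.is_empty())
}

/// Hypothetical element parser: consumes one `.` if it is next.
pub fn dot(input: &mut Peekable<Chars<'_>>) -> Option<char> {
    input.next_if_eq(&'.')
}
```

Note that the element parser must leave the input untouched when it fails, otherwise the character after the last match would be lost.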

Derive Parse

If writing Parse impls is getting a bit tiresome for some basic situations, you can make use of the provided derive macro, which covers a lot of standard situations.


#![allow(unused)]
fn main() {
use nommy::{IntoBuf, Parse, text::Tag};
#[derive(Parse)]
pub struct FooBar {
    foo: Tag<"foo">,
    bar: Tag<"bar">,
}

let mut buffer = "foobar".chars().into_buf();
assert!(FooBar::peek(&mut buffer));
assert!(buffer.next().is_none());
}

Struct

There are 3 different types of struct in Rust. All of them are supported by derive(Parse) in a very similar way.

Named Struct

This is the standard struct that most people think of:


#![allow(unused)]
fn main() {
use nommy::{Parse, text::Tag};
#[derive(Parse)]
pub struct FooBar {
    foo: Tag<"foo">,
    bar: Tag<"bar">,
}
}

This will parse the text "foo", then the text "bar". Order matters. If any single field returns an error when parsing, then the struct returns an error too.
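As a rough picture of what "order matters" means, the sketch below hand-writes a parser that does what the paragraph above describes: fields are parsed in declaration order and the first error is returned. It is not the macro's actual expansion; tag and parse_foobar are hypothetical helpers that work on a string slice instead of a Buffer.

```rust
/// Consumes `expected` from the front of `input`, or reports what was wrong.
fn tag<'a>(input: &mut &'a str, expected: &str) -> Result<&'a str, String> {
    let current: &'a str = *input;
    match current.strip_prefix(expected) {
        Some(rest) => {
            *input = rest;
            Ok(&current[..expected.len()])
        }
        None => Err(format!("expected {:?} at {:?}", expected, current)),
    }
}

#[derive(Debug, PartialEq)]
pub struct FooBar<'a> {
    pub foo: &'a str,
    pub bar: &'a str,
}

/// Parses each field in declaration order; `?` returns the first error.
pub fn parse_foobar<'a>(input: &mut &'a str) -> Result<FooBar<'a>, String> {
    let foo = tag(input, "foo")?;
    let bar = tag(input, "bar")?;
    Ok(FooBar { foo, bar })
}
```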

Unnamed/Tuple Struct

Rust also provides unnamed (tuple) structs, which are essentially the same, but whose fields are identified by position rather than by name.


#![allow(unused)]
fn main() {
use nommy::{Parse, text::Tag};
#[derive(Parse)]
pub struct FooBar (
    Tag<"foo">,
    Tag<"bar">,
);
}

This parses exactly the same as the named variety.

Unit Struct

Lastly, Rust provides unit structs. While these may seem useless in parsing, they do have uses when you configure how the macro should implement Parse.


#![allow(unused)]
fn main() {
use nommy::{Parse};
#[derive(Parse)]
pub struct Unit;
}

This currently parses nothing; the configuration section lets you expand the functionality.

Enum

An enum's parser attempts to parse each variant in order. The first variant that parses successfully is the variant that is returned. If no variant could be parsed, the parser returns an error.

Example


#![allow(unused)]
fn main() {
use nommy::{Parse, text::Tag};
#[derive(Parse)]
pub enum FooOrBar {
    Foo(Tag<"foo">),
    Bar(Tag<"bar">),
}
}

This can either parse "foo" or "bar", but not both.

First come, first served


#![allow(unused)]
fn main() {
use nommy::{Parse, text::Tag};
#[derive(Parse)]
pub enum OnlyFoo {
    Foo(Tag<"foo">),
    Foob(Tag<"foob">),
}
}

In this example, the OnlyFoo enum can never parse into a variant of Foob. To see why, let's put in the input "foob".

Since enum parsers try to parse each variant in order, it will first try to parse the Foo variant. This will match the input "foo", and that is indeed found in the input sequence, therefore the result is OnlyFoo::Foo and the input sequence will have 'b' remaining.

One way to solve this is to swap the order, but that might not always be possible. Configurable greedy evaluation might be added in the future, but it is not currently supported.
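The effect of variant order can be reproduced with a few lines of std-only Rust. The sketch below models each variant as a tag that must be a prefix of the input; first_match is a hypothetical helper, not part of nommy.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OnlyFoo {
    Foo,
    Foob,
}

/// Tries each `(tag, variant)` pair in order and returns the first variant
/// whose tag is a prefix of `input`, together with the unparsed remainder.
pub fn first_match<'a>(
    input: &'a str,
    variants: &[(&str, OnlyFoo)],
) -> Option<(OnlyFoo, &'a str)> {
    variants
        .iter()
        .find_map(|&(tag, variant)| input.strip_prefix(tag).map(|rest| (variant, rest)))
}
```

With the pairs in declaration order, the input "foob" matches Foo and leaves "b" behind; with the longer tag listed first, the same input matches Foob and nothing remains.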

Variant types

There are 3 types of variant in a Rust enum. These are analogous to the structs described in the previous chapter.


#![allow(unused)]
fn main() {
use nommy::{Parse, text::Tag};
#[derive(Parse)]
pub enum ExampleEnum {
    NamedVariant{
        foo: Tag<"foo">,
        bar: Tag<"bar">,
    },

    UnnamedVariant(
        Tag<"foo">,
        Tag<"bar">,
    ),

    UnitVariant,
}
}

Configuration

The derive(Parse) macro makes use of the nommy attribute to configure how the Parse implementation is created. There are two different types of attribute: type attributes and field attributes.

If the attribute is on a struct, enum or an enum variant definition, then it will be a type attribute. Otherwise, if the attribute is on a field definition, it will be a field attribute.

You can repeat many attribute blocks, or repeat attribute rules within the same attribute block, e.g.:

#[nommy(prefix = Tag<"(">, suffix = Tag<")">)]

// is the same as

#[nommy(prefix = Tag<"(">)]
#[nommy(suffix = Tag<")">)]

Example

This code example indicates which attributes are understood as type attributes, and which are field attributes:

use nommy::{Parse, text::Tag};

/// Named struct FooBar
#[derive(Parse)]
#[nommy("TYPE", "TYPE")]
pub struct FooBar {
    #[nommy("FIELD")]
    foo: Tag<"foo">,

    #[nommy("FIELD")]
    #[nommy("FIELD")]
    bar: Tag<"bar">,
}

/// Tuple struct Baz123
#[derive(Parse)]
#[nommy("TYPE")]
#[nommy("TYPE")]
pub struct Baz123 (
    #[nommy("FIELD", "FIELD")]
    Tag<"baz">,

    #[nommy("FIELD")]
    #[nommy("FIELD")]
    Tag<"123">,
);

/// Enum FooBarBaz123
#[derive(Parse)]
#[nommy("TYPE")]
#[nommy("TYPE")]
pub enum FooBarBaz123 {
    #[nommy("TYPE")]
    FooBar{
        #[nommy("FIELD")]
        foo: Tag<"foo">,

        #[nommy("FIELD")]
        #[nommy("FIELD")]
        bar: Tag<"bar">,
    },

    #[nommy("TYPE")]
    Baz123(
        #[nommy("FIELD")]
        Tag<"baz">,

        #[nommy("FIELD")]
        #[nommy("FIELD")]
        Tag<"123">,
    ),

    #[nommy("TYPE")]
    None,
}

Type Attributes

There are currently only 3 supported type attributes:

Ignore

ignore lets you specify how to parse the tokens that you don't care about.

For example, ignoring whitespace:


#![allow(unused)]
fn main() {
use nommy::{Parse, IntoBuf, text::{Tag, WhiteSpace}};
#[derive(Parse)]
#[nommy(ignore = WhiteSpace)]
pub struct FooBar(
    Tag<"foo">,
    Tag<"bar">,
);

let mut buffer = "foo   bar\t".chars().into_buf();
FooBar::parse(&mut buffer).unwrap();
// ignore also parses the trailing tokens
assert!(buffer.next().is_none());
}

Warning

If the type you give to ignore can succeed while parsing 0 tokens, then the program will loop forever. In the future there might be checks in place to automatically exit (or panic?) when empty parsers succeed.
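To see why an empty match is a problem, consider a skip loop that keeps applying the ignore parser until it fails. The sketch below is std-only and hypothetical (skip_ignored, whitespace0 and whitespace1 are made-up names); it shows one possible progress check, which is an assumption about how such a guard could look and not something nommy does today.

```rust
/// Repeatedly applies `ignore` to the front of `input`. `ignore` returns
/// how many bytes it matched, or `None` when it does not match at all.
/// Without the `len == 0` check, a parser that succeeds on nothing would
/// keep this loop running forever.
pub fn skip_ignored<'a>(
    mut input: &'a str,
    ignore: impl Fn(&str) -> Option<usize>,
) -> Result<&'a str, String> {
    while let Some(len) = ignore(input) {
        if len == 0 {
            return Err("ignore parser matched 0 tokens".to_string());
        }
        input = &input[len..];
    }
    Ok(input)
}

/// One or more leading whitespace characters.
pub fn whitespace1(input: &str) -> Option<usize> {
    let len = input.len() - input.trim_start().len();
    if len > 0 {
        Some(len)
    } else {
        None
    }
}

/// Zero or more leading whitespace characters: always succeeds.
pub fn whitespace0(input: &str) -> Option<usize> {
    Some(input.len() - input.trim_start().len())
}
```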

Prefix/Suffix

prefix and suffix define the parsers that you expect to match before and after the value we care about, respectively.


#![allow(unused)]
fn main() {
use nommy::{Parse, IntoBuf, text::Tag};

#[derive(Parse)]
#[nommy(prefix = Tag<"(">, suffix = Tag<")">)]
pub struct Bracketed(
    Tag<"foo">,
    Tag<"bar">,
);

let mut buffer = "(foobar)".chars().into_buf();
Bracketed::parse(&mut buffer).unwrap();
assert!(buffer.next().is_none());
}
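Conceptually, a prefix and suffix wrap the inner parser and are thrown away once matched. A std-only sketch of that idea follows; delimited and foobar are hypothetical helpers working on string slices, and the choice to leave the input untouched on failure is an assumption of this sketch.

```rust
/// Parses `inner` between a required `prefix` and `suffix`, returning only
/// the inner value. `input` is advanced only if all three parts match.
pub fn delimited<'a, T>(
    input: &mut &'a str,
    prefix: &str,
    suffix: &str,
    inner: impl FnOnce(&mut &'a str) -> Result<T, String>,
) -> Result<T, String> {
    let current: &'a str = *input;
    let mut rest = current
        .strip_prefix(prefix)
        .ok_or_else(|| format!("missing prefix {:?}", prefix))?;
    let value = inner(&mut rest)?;
    rest = rest
        .strip_prefix(suffix)
        .ok_or_else(|| format!("missing suffix {:?}", suffix))?;
    *input = rest;
    Ok(value)
}

/// Hypothetical inner parser: matches the literal text "foobar".
pub fn foobar<'a>(input: &mut &'a str) -> Result<&'a str, String> {
    let current: &'a str = *input;
    let rest = current
        .strip_prefix("foobar")
        .ok_or_else(|| "expected \"foobar\"".to_string())?;
    *input = rest;
    Ok(&current[..current.len() - rest.len()])
}
```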

Field Attributes

There are currently only 4 supported field attributes:

Parser

parser lets you specify how to parse the input into the type specified.

For example, parsing letters into a string:


#![allow(unused)]
fn main() {
use nommy::{Parse, IntoBuf, text::AnyOf1};

type Letters = AnyOf1<"abcdefghijklmnopqrstuvwxyz">;

#[derive(Debug, PartialEq)]
#[derive(Parse)]
pub struct Word (
    #[nommy(parser = Letters)]
    String,
);

let mut buffer = "foo bar".chars().into_buf();
assert_eq!(Word::parse(&mut buffer).unwrap(), Word("foo".to_string()));
}

This works because Letters implements Into<String>.

Prefix/Suffix

prefix and suffix define the parsers that you expect to match before and after the value we care about, respectively.


#![allow(unused)]
fn main() {
use nommy::{Parse, IntoBuf, text::{Tag, AnyOf1, Space}};

type Numbers = AnyOf1<"0123456789">;

#[derive(Debug, PartialEq)]
#[derive(Parse)]
#[nommy(ignore = Space)]
pub struct Add(
    #[nommy(parser = Numbers)]
    String,

    #[nommy(prefix = Tag<"+">)]
    #[nommy(parser = Numbers)]
    String,
);

let mut buffer = "4 + 7".chars().into_buf();
assert_eq!(
    Add::parse(&mut buffer).unwrap(),
    Add("4".to_string(), "7".to_string()),
);
assert!(buffer.next().is_none());
}

Inner Parser

inner_parser lets you specify how to parse each element of the Vec type specified.

For example, parsing letters into a vector of chars:


#![allow(unused)]
fn main() {
use nommy::{Parse, IntoBuf, text::OneOf};

type Letter = OneOf<"abcdefghijklmnopqrstuvwxyz">;

#[derive(Debug, PartialEq)]
#[derive(Parse)]
pub struct Letters (
    #[nommy(inner_parser = Letter)]
    Vec<char>,
);

let mut buffer = "foo bar".chars().into_buf();
assert_eq!(Letters::parse(&mut buffer).unwrap(), Letters(vec!['f', 'o', 'o']));
}

This is necessary because Vec<P> does not implement Into<Vec<Q>> even if P: Into<Q>.
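The missing conversion is a property of the standard library and can be shown without nommy. In the sketch below, Letter is a hypothetical stand-in for a parser output type that converts into char, and convert_each is a made-up helper showing the element-by-element mapping that an inner parser effectively needs.

```rust
/// Hypothetical parser output type that converts into `char`.
pub struct Letter(pub char);

impl From<Letter> for char {
    fn from(letter: Letter) -> char {
        letter.0
    }
}

/// `Vec<P>` has no blanket `Into<Vec<Q>>`, so the conversion has to be
/// applied to each element and the results collected into a new `Vec`.
pub fn convert_each<P: Into<Q>, Q>(parsed: Vec<P>) -> Vec<Q> {
    parsed.into_iter().map(Into::into).collect()
}
```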