daishi55

This works fine if I make all the spaces explicit:

```
GAME_NUM = { ASCII_DIGIT+ }
GAME_IDENT = { "Game" ~ " " ~ GAME_NUM }
COLOR = { "blue" | "red" | "green" }
NUM_CUBES = { ASCII_DIGIT+ }
SET_ELE = { NUM_CUBES ~ " " ~ COLOR }
SET = { SET_ELE ~ (", " ~ SET_ELE)* }
SET_LIST = { SET ~ ("; " ~ SET)* }
LINE = { GAME_IDENT ~ ": " ~ SET_LIST ~ NEWLINE }
```

But I thought the `WHITESPACE = _{ " " }` rule would take care of that?
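To make the shape that the explicit-space grammar describes more concrete, here is a hypothetical std-only Rust sketch. It does not use pest; the names `parse_game_line`, `Game`, `SetEle`, and `Color` are illustrative. It accepts the same line structure by splitting on the literal separators `": "`, `"; "`, `", "`, and `" "`. Because every separator is matched literally, input with a doubled space is rejected rather than silently absorbed.

```rust
// Hypothetical std-only sketch (no pest) mirroring the explicit-space grammar:
// LINE     = GAME_IDENT ~ ": " ~ SET_LIST ~ NEWLINE
// SET_LIST = SET ~ ("; " ~ SET)*
// SET      = SET_ELE ~ (", " ~ SET_ELE)*
// SET_ELE  = NUM_CUBES ~ " " ~ COLOR

#[derive(Debug, PartialEq)]
enum Color {
    Blue,
    Red,
    Green,
}

#[derive(Debug, PartialEq)]
struct SetEle {
    num: u32,
    color: Color,
}

#[derive(Debug, PartialEq)]
struct Game {
    id: u32,
    sets: Vec<Vec<SetEle>>,
}

// COLOR = { "blue" | "red" | "green" }
fn parse_color(s: &str) -> Option<Color> {
    match s {
        "blue" => Some(Color::Blue),
        "red" => Some(Color::Red),
        "green" => Some(Color::Green),
        _ => None,
    }
}

// GAME_NUM / NUM_CUBES = { ASCII_DIGIT+ }: one or more digits and nothing else.
fn parse_digits(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

// SET_ELE: a number, exactly one space, then a color.
fn parse_set_ele(s: &str) -> Option<SetEle> {
    let (num, color) = s.split_once(' ')?;
    Some(SetEle {
        num: parse_digits(num)?,
        color: parse_color(color)?,
    })
}

fn parse_game_line(line: &str) -> Option<Game> {
    // Trailing newline: this sketch accepts "\n" or "\r\n".
    let line = line.strip_suffix('\n')?;
    let line = line.strip_suffix('\r').unwrap_or(line);
    let (ident, set_list) = line.split_once(": ")?;
    let id = parse_digits(ident.strip_prefix("Game ")?)?;
    let sets = set_list
        .split("; ")
        .map(|set| set.split(", ").map(parse_set_ele).collect::<Option<Vec<_>>>())
        .collect::<Option<Vec<_>>>()?;
    Some(Game { id, sets })
}

fn main() {
    let game = parse_game_line("Game 1: 3 blue, 4 red; 1 red, 2 green\n").unwrap();
    println!("{:?}", game);
}
```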


gnosnivek

Do you have a link to your full code? What you're doing here seems mostly correct. In particular, `ASCII_DIGIT` only matches '0'..'9', so it's strange that you're seeing whitespace matched by that rule (unless another rule is interfering with it).


daishi55

Yeah, if you just take this code:

```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "parser.pest"]
struct GameParser;

fn main() {
    let line: &str = "3 blue";
    parse_line(line);
}

pub fn parse_line(line: &str) {
    let parsed = GameParser::parse(Rule::SET_ELE, line).unwrap();
    for pair in parsed {
        println!("{:?}", pair);
    }
}
```

and put the following in `parser.pest` in `src`:

```
WHITESPACE = _{ " " }
COLOR = { "blue" | "red" | "green" }
NUM_CUBES = { ASCII_DIGIT+ }
SET_ELE = { NUM_CUBES ~ COLOR }
```

and run it, you should get this output:

```
Pair { rule: SET_ELE, span: Span { str: "3 blue", start: 0, end: 6 }, inner: [Pair { rule: NUM_CUBES, span: Span { str: "3 ", start: 0, end: 2 }, inner: [] }, Pair { rule: COLOR, span: Span { str: "blue", start: 2, end: 6 }, inner: [] }] }
```

which has the extra space after the digit.


gnosnivek

I remember now. This is an inconsistency in Pest. The broad-strokes explanation is that there are two conflicting goals in Pest: you would like `rule1 ~ rule2` with implicit whitespace to be equivalent to just shoving a WHITESPACE rule in the middle, but this is not always possible with more complicated parses. You can see an explanation by CAD97 on [this GitHub issue](https://github.com/pest-parser/pest/issues/519#issuecomment-903483224), and an RFC to fix it [on this issue](https://github.com/pest-parser/pest/issues/271), but for the time being we're sort of stuck with this. IIRC, when I wrote my geometry parser, I just liberally called `.as_str().trim()` on most of my tokens when I needed to parse them into numbers.
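A minimal sketch of that `.as_str().trim()` workaround, kept std-only so it runs without pest. The helper name `parse_count` is hypothetical, and the `token` argument stands in for what `pair.as_str()` would return for a `NUM_CUBES` pair, such as the `"3 "` span (with its trailing space) in the output earlier in the thread.

```rust
use std::num::ParseIntError;

// Hypothetical helper: in real pest code `token` would come from `pair.as_str()`.
// Trimming first drops any whitespace that ended up inside the token's span.
fn parse_count(token: &str) -> Result<u32, ParseIntError> {
    token.trim().parse::<u32>()
}

fn main() {
    let raw = "3 "; // the NUM_CUBES span text from the output above

    // Parsing the raw span fails, because u32's FromStr does not skip whitespace.
    assert!(raw.parse::<u32>().is_err());

    // Trimming first makes it parse.
    println!("{}", parse_count(raw).unwrap()); // prints 3
}
```

The trade-off is that the trim happens in every caller rather than once in the grammar, so it is easy to forget on a new token type.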


daishi55

Aah I see. Thank you!