Design Doc Draft: rustfmt


Goals

  1. Automatically reformat a Rust source file to conform to the Rust Style Guidelines.

https://aturon.github.io/README.html

  2. Modularization

Non-goals

  1. High configurability
  2. Formatting code that cannot compile

Design Choices

  1. Token tree or AST?

        I suggest we use the AST, which is more precise than a regexp-based lexer.

I found that the “Guidelines by Rust feature” part of the Rust Guidelines may be difficult to implement without the AST, but we can start with the style guidelines first.

        Pitfalls of the AST approach:

  1. Macros are hard to handle (see the example below).
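
To illustrate the macro pitfall, here is a small made-up example: the tokens passed to the macro invocation are not ordinary Rust expression syntax, so AST-level formatting rules have nothing to hang on to inside the call.

// The `add ... => ...` input is not regular Rust syntax; the parser keeps it
// as an opaque token tree, so AST-based rules cannot see "inside" the call.
macro_rules! my_dsl {
    (add $a:expr => $b:expr) => { $a + $b };
}

fn main() {
    let x = my_dsl!(add 1 => 2);
    println!("{}", x);
}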

Specs

  1. Line breaks
        1. Each line must be shorter than 99 characters; break at a proper point.
  2. Indentation
        1. Use 4 spaces for indentation, no tabs.
        2. Insert the correct indentation for each line.
  3. Space around tokens
        1. No trailing whitespace at the end of lines or files. [EASY]
        2. Always use spaces around binary operators.
        3. A space after colons and commas.
        4. One-line blocks or struct expressions: a space after the opening brace and before the closing brace.
  4. Conditions
        1. No unnecessary parentheses around `if` conditions.
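
To make these rules concrete, here is a rough hand-written before/after illustration (not the output of any existing tool):

// Before formatting: no spaces around binary operators, no space after colons
// and commas, no space inside the one-line block braces, and unnecessary
// parentheses around the `if` condition.
fn before(x:i32)->i32 {
    if (x>2) {println!("big {}",x);}
    x+1
}

// After formatting: spaces around binary operators, a space after colons and
// commas, a space after `{` and before `}` in one-line blocks, and no
// parentheses around the `if` condition.
fn after(x: i32) -> i32 {
    if x > 2 { println!("big {}", x); }
    x + 1
}

fn main() {
    println!("{} {}", before(3), after(3));
}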

Algorithm

  1. Line breaks
  2. Indentation
  3. Space around tokens

Configuration

  1. A switch to turn on reporting of badly formatted lines or structures.
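
A minimal sketch of what such a configuration could look like; the names here (`Config`, `max_width`, `report_only`) are placeholders, not a settled API:

// Hypothetical configuration struct; field names are illustrative only.
pub struct Config {
    /// Maximum line width (the 99-character rule from the Specs).
    pub max_width: usize,
    /// If true, only report badly formatted lines/structures instead of rewriting them.
    pub report_only: bool,
}

impl Default for Config {
    fn default() -> Config {
        Config { max_width: 99, report_only: false }
    }
}

fn main() {
    let cfg = Config::default();
    println!("max_width={} report_only={}", cfg.max_width, cfg.report_only);
}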

Reference

  1. clang-format design document: https://docs.google.com/document/d/1gpckL2U_6QuU9YW2L1ABsc4Fcogn5UngKk7fE5dDOoA/edit
  2. gofmt: http://golang.org/src/cmd/gofmt/gofmt.go
  3. Rust Style Guidelines: https://aturon.github.io/README.html
  4. Other papers on pretty printing


Ideas


Question 1: Analyze the following toy program:

// test/case1.rs

fn main() {
  let mut i = 3;
  i = 4;
  println!("Hello world {}!", i);
}

My program gives the following sample token stream output (only the first two tokens are shown).

TokenAndSpan { tok: Comment, sp: Span { lo: BytePos(0), hi: BytePos(16), expn_id: ExpnId(4294967295) } }
TokenAndSpan { tok: Whitespace, sp: Span { lo: BytePos(16), hi: BytePos(17), expn_id: ExpnId(4294967295) } }

I can see that two things are lost: the content of the Comment token and the position of line breaks.

To recover these, I can directly slice the bytes from the source file (using the token spans) to get the Comment/Whitespace content, but I am not sure whether this is the best way.
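
A minimal sketch of the slicing idea, using plain byte offsets in place of the real BytePos/Span types (unwrapping those is assumed but not shown):

// Recover the text behind a token by slicing the original source with its span.
// Offsets must fall on character boundaries; the sample input is plain ASCII.
fn snippet<'a>(src: &'a str, lo: usize, hi: usize) -> &'a str {
    &src[lo..hi]
}

fn main() {
    let src = "// test/case1.rs\nfn main() {\n";
    // The Comment token spanned BytePos(0)..BytePos(16) in the sample output.
    assert_eq!(snippet(src, 0, 16), "// test/case1.rs");
    // The Whitespace token spanned BytePos(16)..BytePos(17): a newline, so the
    // position of the line break can be recovered from the source text itself.
    assert_eq!(snippet(src, 16, 17), "\n");
    println!("ok");
}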

Question 2: Given the token stream, how do we insert line breaks to keep each line shorter than 99 characters?

  1. Start from a token stream. (DONE)
  2. Visit the tokens one by one on each line. If a token overflows the line (a rough sketch follows below):

        Look ahead to check whether breaking the line at the beginning of that token solves the problem (taking the correct indentation of the new line into account).

        1. If it does solve the problem, insert a line break together with the correct indentation.
        2. If it does not solve the problem (e.g. a String literal that is too long), try to break from the beginning of the String literal.
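
A minimal sketch of this greedy procedure, working on plain token strings rather than libsyntax tokens; the look-ahead and the String-literal fallback are omitted:

// Sketch only: tokens are plain strings, spacing rules are ignored (a space is
// put after every token), and the String-literal fallback is not implemented.
fn break_lines(tokens: &[&str], indent: usize, max_width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = " ".repeat(indent);
    for tok in tokens {
        // If appending this token would overflow, break before it and start a
        // continuation line with one extra level of indentation (4 spaces).
        if current.len() + tok.len() > max_width && !current.trim().is_empty() {
            lines.push(current.trim_end().to_string());
            current = " ".repeat(indent + 4);
        }
        current.push_str(tok);
        current.push(' ');
    }
    lines.push(current.trim_end().to_string());
    lines
}

fn main() {
    let tokens = ["let", "x", "=", "a_rather_long_function_name", "(",
                  "first_argument", "&&", "second_argument", ")", ";"];
    // A small width is used here so the demo actually breaks; rustfmt would use 99.
    for line in break_lines(&tokens, 0, 30) {
        println!("{}", line);
    }
}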

A problem with this algorithm: if the line is only ~5 characters over the limit, the new line will be very short, which does not look good. Breaking closer to the middle of the line would be a better solution.

Another problem: for a large predicate we want to break at boolean operators. There are several style options (breaking before or after the boolean operators), but the algorithm above is unaware of boolean operators and will break the line at an arbitrary token.

So I think that for each overflowing line, some tokens should have higher priority as break points. For example, positions around 60%~70% of the line are more likely to take a line break, and positions right after boolean operators are more likely to take a line break.
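
A minimal sketch of the priority idea: score each candidate break position and choose the best one, instead of breaking at the first overflowing token. The concrete weights and the 60%~70% window are illustrative guesses.

// Sketch only: higher score means a better place to break the line.
fn break_priority(pos: usize, line_len: usize, prev_token: &str) -> i32 {
    let mut score = 0;
    let ratio = pos as f64 / line_len as f64;
    // Prefer break points around 60%~70% of the line.
    if ratio >= 0.6 && ratio <= 0.7 {
        score += 10;
    }
    // Prefer breaking right after a boolean operator.
    if prev_token == "&&" || prev_token == "||" {
        score += 20;
    }
    score
}

fn main() {
    // A candidate right after `&&` near 65% of the line beats an arbitrary one.
    assert!(break_priority(65, 100, "&&") > break_priority(40, 100, "foo"));
    println!("ok");
}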

// TODO: Read clang-format or GNU indent to see how they solve this problem.

Question 3: How do we compute the correct indentation for a newly inserted line?

Target before March 17 (Proposal submission):

Explicitly write down the algorithm for the line break problem (and possibly the indentation problem).

Target before March 27 (Proposal deadline):

Implement the short-line rule (the 99-character limit) and make it work.