Enhanced String Handling for Java
(Version 1.2)

Change History

Version   Description
1.0       Initial version
1.1       Changed the Context Annotations section to introduce a generic @DSL annotation instead of hardcoded annotation types, at the suggestion of Casper Bang
1.2       Standardized on $" as the prefix for unprocessed strings and """ as the delimiter for multi-line strings; went back to DSL-specific context annotations, plus a generic @DSL for extensibility. Also added String.collapseWhitespace() to provide functionality equivalent to YAML's whitespace handling



Java's current string literal syntax lacks two features that other languages, such as C#, Python or Groovy, already provide:

a) string literals in which escape sequences are not processed (useful e.g. for regular expressions or Windows file system paths)
b) multi-line string literals

Un-Escaped Strings

Let's look at this typical regular expression for checking whether a string is a JPG, JPEG, GIF or PNG file name:

.*(\.[Jj][Pp][Gg]|\.[Gg][Ii][Ff]|\.[Jj][Pp][Ee][Gg]|\.[Pp][Nn][Gg])

In Java, this already complex expression becomes even harder to read, since every backslash must itself be escaped:

String regex = ".*(\\.[Jj][Pp][Gg]|\\.[Gg][Ii][Ff]|\\.[Jj][Pp][Ee][Gg]|\\.[Pp][Nn][Gg])";
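For comparison, the doubled-backslash literal does compile to the intended expression at run time. A minimal check with java.util.regex.Pattern (the class and method names are illustrative only):

```java
import java.util.regex.Pattern;

// Illustrative check that the escaped Java literal yields the intended regular expression.
final class ImageNameRegex {
    // Every backslash of the regex has to be doubled inside an ordinary Java string literal.
    static final String REGEX =
        ".*(\\.[Jj][Pp][Gg]|\\.[Gg][Ii][Ff]|\\.[Jj][Pp][Ee][Gg]|\\.[Pp][Nn][Gg])";

    static boolean isImageFileName(String name) {
        // Pattern.matches requires the whole input to match, hence the leading .*
        return Pattern.matches(REGEX, name);
    }

    public static void main(String[] args) {
        System.out.println(isImageFileName("holiday.JPG")); // true
        System.out.println(isImageFileName("photo.jpeg"));  // true
        System.out.println(isImageFileName("notes.txt"));   // false
    }
}
```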

Or take this Windows file path:

C:\Program Files\My Corporation\My Application

In Java, the literal has to be written as:

String path = "C:\\Program Files\\My Corporation\\My Application";

Other languages usually solve this problem by prefixing the string literal with a special character, e.g.

a) in C#, it is @ (a "verbatim" string literal):

String path = @"C:\Program Files\My Corporation\My Application";

b) in Python, it is r (a "raw" string):

r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'

Proposed Java solution:

Use the C# approach, but with "$" instead of "@" (which Java already uses for annotations), e.g.

String path = $"C:\Program Files\My Corporation\My Application";
String regex = $".*(\.[Jj][Pp][Gg]|\.[Gg][Ii][Ff]|\.[Jj][Pp][Ee][Gg]|\.[Pp][Nn][Gg])";
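The intended difference between the two forms can be pinned down with a sketch of the escape-processing step that an ordinary literal goes through and a $"..." literal would skip. The class and method names are hypothetical, and the sketch handles only the common single-character escapes (octal escapes are omitted). It also shows why the path cannot simply be written with single backslashes today: \P is not a legal Java escape sequence, so javac rejects it.

```java
// Hypothetical sketch of the escape-processing step that a $"..." literal would skip.
final class EscapeProcessing {
    // Translates a subset of Java escapes (\\ \" \' \n \t \r \b \f) in the raw text
    // that appears between the quotes of an ordinary string literal.
    static String processEscapes(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= raw.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char next = raw.charAt(++i);
            switch (next) {
                case '\\': out.append('\\'); break;
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                default:
                    // Unknown escapes such as \P are a compile-time error in Java.
                    throw new IllegalArgumentException("illegal escape: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Source text of the ordinary literal, exactly as typed between the quotes.
        String ordinarySource = "C:\\\\Program Files\\\\My Corporation\\\\My Application";
        // Source text of the proposed $"..." literal, which would be used verbatim.
        String unprocessedSource = "C:\\Program Files\\My Corporation\\My Application";
        System.out.println(processEscapes(ordinarySource).equals(unprocessedSource)); // true
    }
}
```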


Multi-line Strings

For multi-line strings, we will adopt the """ (triple double quote) syntax, as in Scala and Groovy.

Let's look at a typical SQL query embedded in Java code:

String sql =
   "SELECT col1, col2, col3 " +
   "FROM table1 " +
   "WHERE col1 = 'Test' " +
   "AND col2 = ? " +
   "ORDER BY col1 ";

Wouldn't it look much better like this:

String sql = """
    SELECT col1, col2, col3
    FROM table1
    WHERE col1 = 'Test'
    AND col2 = ?
    ORDER BY col1""";


The tricky question is how to handle whitespace in these kinds of strings. Here I believe we should take a lesson from the YAML file format, in particular its handling of block literals:
http://en.wikipedia.org/wiki/YAML#Block_literals

Basically, in the typical case the compiler would take the indentation of the first line of the string as the base and strip it from every line. The newline directly after the opening """ is not part of the value, and since the closing """ follows the last line directly, no trailing newline is added. So the multi-line string in the example above would contain exactly the following (embedded newlines shown as \n):

SELECT col1, col2, col3\n
FROM table1\n
WHERE col1 = 'Test'\n
AND col2 = ?\n
ORDER BY col1
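The indentation rule can be sketched as a small function that a compiler (or, today, a helper method) might apply to the raw text between the delimiters. This is a minimal sketch under stated assumptions: the class name MultiLineLiteral is hypothetical, lines are assumed to end in \n, the newline right after the opening """ is dropped, and a non-blank line indented less than the first line is treated as an error, since the proposal does not say what should happen in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed YAML-style indentation handling for """ literals.
final class MultiLineLiteral {
    /**
     * @param body the raw source text between the opening and closing """ delimiters
     * @return the string value with the base indentation removed from every line
     */
    static String process(String body) {
        // The newline directly after the opening """ is not part of the value.
        if (body.startsWith("\n")) {
            body = body.substring(1);
        }
        String[] lines = body.split("\n", -1);      // -1 keeps trailing empty lines
        String base = leadingWhitespace(lines[0]);   // the first line sets the base indentation
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(base)) {
                result.add(line.substring(base.length()));
            } else if (line.trim().isEmpty()) {
                result.add("");                      // blank line shorter than the base
            } else {
                // Assumption: undefined by the proposal, so this sketch rejects it.
                throw new IllegalArgumentException("line indented less than the first line: " + line);
            }
        }
        return String.join("\n", result);
    }

    /** Returns the run of spaces and tabs at the start of the line. */
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    public static void main(String[] args) {
        // Raw text between the delimiters of the SQL example above.
        String body = "\n"
                + "    SELECT col1, col2, col3\n"
                + "    FROM table1\n"
                + "    WHERE col1 = 'Test'\n"
                + "    AND col2 = ?\n"
                + "    ORDER BY col1";
        System.out.println(process(body));
    }
}
```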

This closely follows YAML's standard handling of preserved newlines. However, YAML also offers a folded style, in which line breaks are replaced with single spaces, so that text entered on multiple lines is processed as one long string:
http://en.wikipedia.org/wiki/YAML#Newlines_folded

This could be accomplished by adding a collapseWhitespace() method directly to the String class, so that you could do:

String sql = """
    SELECT col1, col2, col3
    FROM table1
    WHERE col1 = 'Test'
    AND col2 = ?
    ORDER BY col1""".collapseWhitespace();


which would give us:

SELECT col1, col2, col3 FROM table1 WHERE col1 = 'Test' AND col2 = ? ORDER BY col1
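Since application code cannot add methods to java.lang.String, the proposed collapseWhitespace() can be approximated today by a static helper. The sketch below assumes the semantics implied by the example above: trim both ends and replace every run of whitespace (spaces, tabs, newlines) with a single space. The class name Whitespace is hypothetical.

```java
// Hypothetical stand-in for the proposed String.collapseWhitespace(), written as a
// static helper because java.lang.String cannot be extended with new methods.
final class Whitespace {
    static String collapseWhitespace(String s) {
        // Trim both ends, then replace every run of spaces, tabs and newlines with one space.
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String sql = "SELECT col1, col2, col3\n"
                   + "FROM table1\n"
                   + "WHERE col1 = 'Test'\n"
                   + "AND col2 = ?\n"
                   + "ORDER BY col1";
        // Prints: SELECT col1, col2, col3 FROM table1 WHERE col1 = 'Test' AND col2 = ? ORDER BY col1
        System.out.println(collapseWhitespace(sql));
    }
}
```

One design consequence worth noting: because every whitespace run is collapsed, runs of spaces inside quoted SQL values would also change (e.g. 'a  b' becomes 'a b'), which is more aggressive than YAML's folding of line breaks.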


Syntax Corner Cases

a) text entered on the opening line of a multi-line String:

String sql = """SELECT col1, col2
   FROM table1
      WHERE col1 ='Test'""";


In this case, the text on the opening line is kept as-is, and the indentation of the second line is used as the base for further parsing, i.e.

SELECT col1, col2\n
FROM table1\n
   WHERE col1 ='Test'
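A sketch of this variant, under the same assumptions as before (hypothetical class name, \n line endings): the opening-line text is copied verbatim, and the second line supplies the base indentation that is stripped from every following line.

```java
// Hypothetical sketch: a """ literal whose first line of text sits on the opening line.
final class OpeningLineText {
    /**
     * @param body raw text starting directly after the opening """
     * @return the value: opening-line text kept verbatim, base indentation
     *         (taken from the second line) stripped from every following line
     */
    static String process(String body) {
        String[] lines = body.split("\n", -1);
        StringBuilder out = new StringBuilder(lines[0]); // opening-line text kept as-is
        if (lines.length == 1) {
            return out.toString();                       // single-line literal
        }
        String base = leadingWhitespace(lines[1]);       // the second line sets the base
        for (int i = 1; i < lines.length; i++) {
            out.append('\n');
            String line = lines[i];
            if (line.startsWith(base)) {
                out.append(line.substring(base.length()));
            } else if (!line.trim().isEmpty()) {
                // Assumption: a line indented less than line 2 is rejected.
                throw new IllegalArgumentException("line " + (i + 1) + " is indented less than line 2");
            } // blank lines shorter than the base stay empty
        }
        return out.toString();
    }

    /** Returns the run of spaces and tabs at the start of the line. */
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    public static void main(String[] args) {
        String body = "SELECT col1, col2\n"
                    + "   FROM table1\n"
                    + "      WHERE col1 ='Test'";
        System.out.println(process(body));
    }
}
```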


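b) text entered with inconsistent leading tabs and spaces (here \t stands for a literal tab character in the source file), e.g.

String sql = """SELECT col1, col2
\t\tFROM table1
\t   \tWHERE col1 ='Test'""";


If the leading tab/space sequence of any subsequent line does not begin with the base sequence established by the first indented line, the compiler will simply reject the literal with a compile-time error. In the example above, the base is \t\t, but the third line starts with \t followed by spaces.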

However, consistent tabs would be allowed, e.g.

String sql = """SELECT col1, col2
\t\tFROM table1
\t\t\tWHERE col1 ='Test'""";


would be parsed as:

SELECT col1, col2\n
FROM table1\n
\tWHERE col1 ='Test'
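The consistency rule reduces to a prefix check: every later non-blank line must start with the exact tab/space sequence of the line that sets the base. A minimal sketch, assuming (as in both examples) that the base comes from the second line; the class and method names are hypothetical, and a returned line number stands in for the compile-time error.

```java
// Hypothetical sketch of the proposed consistency check for leading tabs/spaces.
final class IndentationCheck {
    /**
     * @param body raw text starting directly after the opening """ (opening-line text allowed)
     * @return 1-based number of the first line whose indentation does not begin with the
     *         base taken from line 2, or -1 if every line is consistent
     */
    static int firstInconsistentLine(String body) {
        String[] lines = body.split("\n", -1);
        if (lines.length < 2) {
            return -1; // nothing to compare against
        }
        String base = leadingWhitespace(lines[1]);
        for (int i = 2; i < lines.length; i++) {
            boolean blank = lines[i].trim().isEmpty();
            if (!blank && !lines[i].startsWith(base)) {
                return i + 1; // a compiler would report an error at this line
            }
        }
        return -1;
    }

    /** Returns the run of spaces and tabs at the start of the line. */
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    public static void main(String[] args) {
        String inconsistent = "SELECT col1, col2\n\t\tFROM table1\n\t   \tWHERE col1 ='Test'";
        String consistent   = "SELECT col1, col2\n\t\tFROM table1\n\t\t\tWHERE col1 ='Test'";
        System.out.println(firstInconsistentLine(inconsistent)); // 3
        System.out.println(firstInconsistentLine(consistent));   // -1
    }
}
```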


Context Annotation

For IDEs to interpret embedded strings properly, it would help to indicate what kind of data a string contains.
This could be accomplished via a set of predefined annotations, plus an extensible generic @DSL annotation, e.g.

String sql = @SQL :"""
    SELECT col1, col2, col3
    FROM table1
    WHERE col1 = 'Test'
    AND col2 = ?
    ORDER BY col1""";

or a custom DSL:

String ui = @DSL("SwingBuilder") :"""
    JFrame(title="MyFrame",defaultCloseOperation=closeOnExit):
         - JMenuBar(name=menuBar):
              - JMenuItem(name=saveMenu, text="Save\tCtrl+S", onAction=save)
""";


This would allow an IDE to interpret it and provide proper syntax highlighting (and maybe even context-sensitive code completion, e.g. table/column names like in a database editor).
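Java annotations cannot currently be attached to a string literal (only to declarations and, since Java 8, to type uses), so the @SQL :""" syntax above requires a language change. The generic @DSL annotation type itself, however, could be declared today. The sketch below is an assumption about what it might look like, applied to the field that holds the string rather than to the literal; the field and class names are hypothetical. Retention is RUNTIME here only so that the example can read the value back; a pure IDE hint could use SOURCE.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical declaration of the generic @DSL annotation, usable on declarations today.
@Retention(RetentionPolicy.RUNTIME) // RUNTIME only so main() can read it back
@Target({ElementType.FIELD, ElementType.LOCAL_VARIABLE, ElementType.PARAMETER})
@interface DSL {
    String value(); // name of the embedded language, e.g. "SQL" or "SwingBuilder"
}

final class Queries {
    @DSL("SQL")
    static final String FIND_TEST_ROWS =
        "SELECT col1, col2, col3 FROM table1 WHERE col1 = 'Test' AND col2 = ? ORDER BY col1";

    public static void main(String[] args) throws NoSuchFieldException {
        // A tool could discover the embedded language the same way, via reflection.
        DSL dsl = Queries.class.getDeclaredField("FIND_TEST_ROWS").getAnnotation(DSL.class);
        System.out.println(dsl.value()); // SQL
    }
}
```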

Suggested default DSL annotations:


Looking for community input and feedback:

This is just a first stab at the specification. I look forward to hearing community feedback so that the best, cleanest and most natural Java-style syntax can be achieved.
It could later be implemented as part of the Kijaro project.

Please post any comments to the Kijaro mailing list.