Bug 3982

Summary: SQL string parsing is incorrect
Product: SQL
Reporter: Sjoerd Mullender <sjoerd>
Component: all
Assignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: normal    
Priority: Normal    
Version: 11.21.19 (Jul2015-SP4)   
Hardware: All   
OS: All   

Description Sjoerd Mullender cwiconfidential 2016-04-14 15:15:20 CEST
User-Agent:       Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

The SQL standard has the following syntax description for strings:
<character string literal> ::=
   [ <introducer><character set specification> ]
      <quote> [ <character representation>... ] <quote>
      [ { <separator> <quote> [ <character representation>... ] <quote> }... ]
<character representation> ::=
   <nonquote character>
 | <quote> <quote>
<quote> ::= '

Ignoring the character set specification, it is clear that \ is *not* a special character in SQL strings.  Only the quote (') is special, and it needs to be doubled to include it in a string.

The MonetDB SQL parser treats \ as an escape character, the way C does.  There is no basis for that in the SQL standard.
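
To make the difference concrete, here is a minimal Python sketch of the two readings of the text between the outer quotes. It is not MonetDB's parser: the function names are hypothetical, and the set of C-like escapes handled (octal, \n, \t, \\, \') is an assumption chosen for illustration.

```python
def decode_standard(body: str) -> str:
    """Decode the text between the outer quotes as the SQL standard describes.

    Only a doubled quote is special; a backslash is an ordinary character.
    """
    return body.replace("''", "'")


def decode_c_style(body: str) -> str:
    """Decode the same text with C-like backslash escapes (illustrative subset)."""
    simple = {"n": "\n", "t": "\t", "\\": "\\", "'": "'"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body[i + 1:i + 2] == "'":
            # Doubled quote stands for one quote in both readings.
            out.append("'")
            i += 2
        elif ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "01234567":
                # Up to three octal digits, as in C.
                j = i + 1
                while j < len(body) and j < i + 4 and body[j] in "01234567":
                    j += 1
                out.append(chr(int(body[i + 1:j], 8)))
                i = j
            else:
                # Known single-letter escape, else keep the character as-is.
                out.append(simple.get(nxt, nxt))
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With these definitions, the body of '\101' stays the four characters \101 under the standard reading, while the C-like reading turns it into A (octal 101).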

Also see bug 3965, comment 5.

Reproducible: Always

Steps to Reproduce:
1. select '\101';
Actual Results:  

Expected Results:  
Comment 1 Sjoerd Mullender cwiconfidential 2016-04-20 10:26:51 CEST
Note that there is also the possibility (in the standard) of having Unicode strings, as in:
select U&'\0041';
--> 'A'
This is not implemented at all.
Incidentally, this seems to be the only place where the backslash ("reverse solidus") has any meaning in the SQL standard.
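
As an illustration of what such a Unicode string means, here is a rough Python sketch of decoding the text between the quotes of a U&'...' literal. It assumes the standard's escape forms: the escape character followed by four hex digits, the escape character followed by + and six hex digits, and a doubled escape character for a literal one. The function name and the `escape` parameter (a stand-in for the UESCAPE clause) are hypothetical, and surrogate pairs are not handled.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def _code_point(digits: str, width: int) -> str:
    """Turn exactly `width` hex digits into one character."""
    if len(digits) != width or not set(digits) <= HEX_DIGITS:
        raise ValueError("invalid Unicode escape: %r" % digits)
    return chr(int(digits, 16))


def decode_unicode_literal(body: str, escape: str = "\\") -> str:
    """Decode the text between the quotes of a U&'...' literal (sketch)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body[i + 1:i + 2] == "'":
            out.append("'")          # doubled quote
            i += 2
        elif ch != escape:
            out.append(ch)           # ordinary character
            i += 1
        elif body[i + 1:i + 2] == escape:
            out.append(escape)       # doubled escape character
            i += 2
        elif body[i + 1:i + 2] == "+":
            out.append(_code_point(body[i + 2:i + 8], 6))
            i += 8
        else:
            out.append(_code_point(body[i + 1:i + 5], 4))
            i += 5
    return "".join(out)
```

For the example above, `decode_unicode_literal(r"\0041")` gives `A`.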
Comment 2 Sjoerd Mullender cwiconfidential 2020-07-10 22:36:45 CEST
The Unicode strings have been implemented.
The parsing of strings is now in a transitional phase.  We have E-strings (E'string content') that are parsed with backslash interpretation, we have R-strings (R'string content') that are parsed according to the SQL standard (apart from the initial R), and we have plain strings that are currently equivalent to E-strings, but that will at some point change to be equivalent to R-strings.
Comment 3 Sjoerd Mullender cwiconfidential 2020-07-10 22:40:30 CEST
By the way, the E and R prefixes can of course also be spelled with lowercase.
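
The prefix rules from the last two comments can be sketched in Python as follows. This is only an illustration, not the actual lexer: the function name and the `plain_means` parameter are hypothetical, only a single quoted segment is handled, and U& literals are left out.

```python
def classify_literal(literal: str, plain_means: str = "E") -> tuple[str, str]:
    """Return (mode, body) for one quoted string literal.

    mode is "E" (backslash interpretation) or "R" (SQL standard parsing).
    plain_means says how an unprefixed string is treated: "E" for the
    transitional behavior, "R" once plain strings follow the standard.
    """
    first = literal[:1].upper()      # prefixes are case-insensitive
    if first in ("E", "R"):
        mode, rest = first, literal[1:]
    else:
        mode, rest = plain_means, literal
    if len(rest) < 2 or rest[0] != "'" or rest[-1] != "'":
        raise ValueError("not a quoted string literal: %r" % literal)
    return mode, rest[1:-1]
```

For example, `classify_literal(r"e'\101'")` yields mode `E`, `classify_literal(r"R'\101'")` yields mode `R`, and the plain `'\101'` yields `E` today but `R` when called with `plain_means="R"`.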