Bug 3982 - SQL string parsing is incorrect
Summary: SQL string parsing is incorrect
Status: NEW
Alias: None
Product: SQL
Classification: Unclassified
Component: all
Version: 11.21.19 (Jul2015-SP4)
Hardware: All / OS: All
Importance: priority Normal, severity normal
Assignee: SQL devs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-14 15:15 CEST by Sjoerd Mullender
Modified: 2020-07-10 22:40 CEST

Description Sjoerd Mullender cwiconfidential 2016-04-14 15:15:20 CEST
User-Agent:       Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build Identifier: 

The SQL standard has the following syntax description for strings:
<character string literal> ::=
   [ <introducer><character set specification> ]
      <quote> [ <character representation>... ] <quote>
      [ { <separator> <quote> [ <character representation>... ] <quote> }... ]
<character representation> ::=
   <nonquote character>
 | <quote> <quote>
<quote> ::=
   '

Ignoring the character set specification, the grammar makes it clear that the backslash (\) is *not* a special character in SQL strings. Only the quote (') is special, and it must be doubled to include it in a string.
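To make the grammar concrete, here is a minimal Python sketch of a parser that follows only these rules: a doubled quote stands for one quote, every other character (backslash included) is taken verbatim, and quoted segments separated by a separator are concatenated. The function name `parse_standard_literal` is hypothetical. It assumes, following PostgreSQL's documented reading of the standard, that the separator between two segments must contain a newline, and it does not handle comments inside the separator.

```python
import re

# Whitespace that may separate two quoted segments of one literal.
# Assumption: the separator must contain a newline for the segments to be
# joined; SQL comments inside the separator are not handled here.
_SEPARATOR = re.compile(r"[ \t\r\n]*")


def parse_standard_literal(text: str, pos: int = 0) -> tuple[str, int]:
    """Parse a standard SQL <character string literal> starting at text[pos].

    Returns (value, end_position). Only the quote is special: '' stands for
    one quote; every other character, including backslash, is literal.
    """
    if text[pos:pos + 1] != "'":
        raise ValueError("literal must start with a quote")
    value = []
    i = pos + 1
    while True:
        if i >= len(text):
            raise ValueError("unterminated string literal")
        ch = text[i]
        if ch != "'":
            value.append(ch)          # <nonquote character>, taken verbatim
            i += 1
        elif text[i + 1:i + 2] == "'":
            value.append("'")         # <quote><quote> means one quote
            i += 2
        else:
            i += 1                    # closing quote of this segment
            sep = _SEPARATOR.match(text, i)
            if "\n" in sep.group() and text[sep.end():sep.end() + 1] == "'":
                i = sep.end() + 1     # continuation segment: keep reading
            else:
                return "".join(value), i


print(parse_standard_literal(r"'\101'"))       # backslash stays a backslash
print(parse_standard_literal("'it''s'"))       # doubled quote
print(parse_standard_literal("'foo'\n'bar'"))  # segments joined
```

Under these rules the literal '\101' evaluates to the four characters \101, which is the expected result in this report.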

The MonetDB SQL parser treats \ as an escape character, the way C does in string literals. There is no basis for that in the SQL standard.

Also see bug 3965, comment 5.

Reproducible: Always

Steps to Reproduce:
1. select '\101';
Actual Results:  
'A'

Expected Results:  
'\101'
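The difference between the actual and expected results can be sketched in Python by interpreting the same string body (the text between the quotes) in two ways. `c_style_body` applies a hypothetical subset of C-style escapes, including octal `\ooo`; MonetDB's exact escape set is not given in this report, and the assumption that '' still means one quote in this mode is mine. `standard_body` applies only the standard rule.

```python
# Hypothetical subset of C-style escapes (not MonetDB's documented list).
_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}
_OCTAL = "01234567"


def c_style_body(body: str) -> str:
    """Interpret backslash escapes in a string body roughly as C does."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body[i + 1:i + 2] == "'":
            out.append("'")                 # assumption: '' still means one quote
            i += 2
        elif ch != "\\" or i + 1 == len(body):
            out.append(ch)                  # ordinary char, or trailing backslash
            i += 1
        elif body[i + 1] in _OCTAL:
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in _OCTAL:
                j += 1                      # up to three octal digits
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        else:
            nxt = body[i + 1]
            out.append(_SIMPLE_ESCAPES.get(nxt, nxt))  # unknown: drop backslash
            i += 2
    return "".join(out)


def standard_body(body: str) -> str:
    """Standard SQL: backslash is ordinary; only '' is special."""
    return body.replace("''", "'")


print(c_style_body(r"\101"))   # actual result: octal 101 is 'A'
print(standard_body(r"\101"))  # expected result: \101 unchanged
```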
Comment 1 Sjoerd Mullender cwiconfidential 2016-04-20 10:26:51 CEST
Note that there is also the possibility (in the standard) of having Unicode strings, as in:
select U&'\0041';
--> 'A'
This is not implemented at all.
This, by the way, does seem to be the only place where the backslash ("reverse solidus") has any meaning in the SQL standard.
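For reference, the standard's Unicode escapes in a U&'...' literal take the forms \XXXX (four hex digits), \+XXXXXX (six hex digits), and \\ for a literal backslash. The Python sketch below decodes a body using the default escape character. The name `decode_unicode_body` is hypothetical; the sketch assumes '' has already been reduced to one quote, ignores the UESCAPE clause, and leaves an invalid escape untouched instead of reporting an error.

```python
import re

# \XXXX, \+XXXXXX, or \\ with the default escape character (backslash).
_UNICODE_ESCAPE = re.compile(r"\\(\\|[0-9A-Fa-f]{4}|\+[0-9A-Fa-f]{6})")


def decode_unicode_body(body: str) -> str:
    """Decode Unicode escapes in the body of a U&'...' literal."""
    def replace(m: re.Match) -> str:
        esc = m.group(1)
        if esc == "\\":
            return "\\"                     # \\ is one literal backslash
        return chr(int(esc.lstrip("+"), 16))  # 4- or 6-digit code point
    return _UNICODE_ESCAPE.sub(replace, body)


print(decode_unicode_body(r"\0041"))  # the U&'\0041' example
```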
Comment 2 Sjoerd Mullender cwiconfidential 2020-07-10 22:36:45 CEST
The Unicode strings have been implemented.
The parsing of strings is now in a transitional phase.  We have E-strings (E'string content') that are parsed with backslash interpretation, we have R-strings (R'string content') that are parsed according to the SQL standard (apart from the initial R), and we have plain strings that are currently equivalent to E-strings, but that will at some point change to be equivalent to R-strings.
Comment 3 Sjoerd Mullender cwiconfidential 2020-07-10 22:40:30 CEST
By the way, the E and R prefixes can of course also be spelled with lowercase.
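The transitional scheme described in comments 2 and 3 can be sketched as a small prefix dispatch in Python: an E or e prefix selects backslash interpretation, an R or r prefix selects standard rules, and an unprefixed literal follows a configurable default that is E today and is planned to become R. The function `classify_literal` and its `plain_as` parameter are hypothetical illustrations, not MonetDB's implementation, and only single-segment literals are handled.

```python
def classify_literal(token: str, plain_as: str = "E") -> tuple[str, str]:
    """Return (mode, body) for a literal such as E'..', r'..', or '..'.

    mode is "E" (backslash escapes interpreted) or "R" (standard SQL rules).
    plain_as is the mode used for unprefixed literals.
    """
    prefix = ""
    if token[:1] in ("E", "e", "R", "r") and token[1:2] == "'":
        prefix, token = token[0].upper(), token[1:]  # prefixes are case-insensitive
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not a single-segment quoted literal")
    return (prefix or plain_as), token[1:-1]


print(classify_literal(r"e'\101'"))                 # backslash interpretation
print(classify_literal(r"R'\101'"))                 # standard rules
print(classify_literal(r"'\101'"))                  # plain string today
print(classify_literal(r"'\101'", plain_as="R"))    # plain string after the switch
```

Keeping the default for plain strings as a single switch mirrors the plan in comment 2: only that one setting changes when plain strings move from E-string to R-string semantics.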