Bug 3301

Summary: CHAR comparisons do not pad strings of different lengths
Product: SQL Reporter: Ben Reilly <ben>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: enhancement CC: niels
Priority: Normal    
Version: -- development   
Hardware: x86_64 (amd64/em64t)   
OS: Linux   

Description Ben Reilly 2013-06-06 21:35:55 CEST
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
Build Identifier: 

When comparing with a CHAR type, the strings are not padded with trailing spaces, as they should be according to the SQL-92 standard (I couldn't find a more recent specification, but I assume this hasn't changed). For example, 'ABC' does not match 'ABC   ' in MonetDB, whereas it should.

I experienced this error while running TPC-H: query 3 compares the c_mktsegment column with the string 'MACHINERY'. It appears that the loader I use actually pads the strings manually, so MonetDB does not match these values because they're stored as 'MACHINERY ' in the DB, and the comparison string is not padded to the same length.

Reproducible: Always

Steps to Reproduce:
sql>CREATE TABLE strtest (str CHAR(10));
sql>INSERT INTO strtest VALUES ('PAD  ');
sql>SELECT str FROM strtest WHERE str = 'NOPAD';
| str        |
| NOPAD      |
1 tuple (0.892ms)
sql>SELECT str FROM strtest WHERE str = 'NOPAD  ';
| str |
0 tuples (0.286ms)
sql>SELECT str FROM strtest WHERE str = 'PAD';
| str |
0 tuples (0.279ms)
sql>SELECT str FROM strtest WHERE str = 'PAD  ';
| str        |
| PAD        |
1 tuple (0.291ms)

Expected Results:  
From the SQL-92 specification, Section 8, General Rule 3
The comparison of two character strings is determined as follows:
         a) If the length in characters of X is not equal to the length
           in characters of Y, then the shorter string is effectively
           replaced, for the purposes of comparison, with a copy of
           itself that has been extended to the length of the longer
           string by concatenation on the right of one or more pad char-
           acters, where the pad character is chosen based on CS. If
           CS has the NO PAD attribute, then the pad character is an
           implementation-dependent character different from any char-
           acter in the character set of X and Y that collates less
           than any string under CS. Otherwise, the pad character is a
Comment 1 Niels Nes cwiconfidential 2013-06-12 10:22:01 CEST
although this is indeed part of the sql standard, we have no plans to change the current support for fixed sized strings. I changed this to a 'enhancement' request.

In the TPC-H standard the strings are all 'varchar' I believe.