|Summary:||CHAR comparisons do not pad strings of different lengths|
|Product:||SQL||Reporter:||Ben Reilly <ben>|
|Component:||all||Assignee:||SQL devs <bugs-sql>|
Description Ben Reilly 2013-06-06 21:35:55 CEST
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36 Build Identifier: When comparing with a CHAR type, the strings are not padded with trailing spaces, as they should be according to the SQL-92 standard (I couldn't find a more recent specification, but I assume this hasn't changed). For example, 'ABC' does not match 'ABC ' in MonetDB, whereas it should. I experienced this error while running TPC-H: query 3 compares the c_mktsegment column with the string 'MACHINERY'. It appears that the loader I use actually pads the strings manually, so MonetDB does not match these values because they're stored as 'MACHINERY ' in the DB, and the comparison string is not padded to the same length. Reproducible: Always Steps to Reproduce: sql>CREATE TABLE strtest (str CHAR(10)); sql>INSERT INTO strtest VALUES ('NOPAD'); sql>INSERT INTO strtest VALUES ('PAD '); sql>SELECT str FROM strtest WHERE str = 'NOPAD'; +------------+ | str | +============+ | NOPAD | +------------+ 1 tuple (0.892ms) sql>SELECT str FROM strtest WHERE str = 'NOPAD '; +-----+ | str | +=====+ +-----+ 0 tuples (0.286ms) sql>SELECT str FROM strtest WHERE str = 'PAD'; +-----+ | str | +=====+ +-----+ 0 tuples (0.279ms) sql>SELECT str FROM strtest WHERE str = 'PAD '; +------------+ | str | +============+ | PAD | +------------+ 1 tuple (0.291ms) Expected Results: From the SQL-92 specification, Section 8, General Rule 3 --- The comparison of two character strings is determined as follows: a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad char- acters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any char- acter in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.
Comment 1 Niels Nes 2013-06-12 10:22:01 CEST
although this is indeed part of the sql standard, we have no plans to change the current support for fixed sized strings. I changed this to a 'enhancement' request. In the TPC-H standard the strings are all 'varchar' I believe.