MonetDB type system

Latest revision as of 19:14, 3 November 2015

This page is mainly about C types.

For MonetDB's MAL type system, see also https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types .

For the SQL types supported in MonetDB, see also https://www.monetdb.org/Documentation/Manuals/SQLreference/Datatypes .


  • Type Rules
    • C types long & unsigned long are evil (i.e., NOT portable) and must NOT be used!
      While they are 32/64 bit on 32/64-bit systems under Unix, they are always 32 bit under Windows (also on 64-bit systems).
      In other words, C types long & unsigned long are always 32 bit, except on 64-bit non-Windows systems.
      If you need a type that scales from 32 bit on 32-bit systems to 64 bit on 64-bit systems, consider choosing the appropriate portable alternatives detailed below, i.e., size_t, ssize_t, ptrdiff_t, BUN, oid.
    • #include "monetdb_config.h" must be the first (non-comment) directive in each .c file; it must not be included in any .h file.
    • In C, a tuple-/object-ID (OID) is of type oid, NOT of type int, lng, size_t, BUN, etc.
      Type oid is always 32 bit (4 byte) on 32-bit systems, but can be either 32 bit (explicit choice during configure) or 64 bit (8 byte; default) on 64-bit systems.
    • In C, the number of tuples in a BAT (BATcount) is of type BUN, NOT of type int, lng, size_t, oid, etc.
      Type BUN has the same size (width) as type oid.
    • In C, the length of a string and the size of an array or memory region are of type size_t, NOT of type int, lng, oid, BUN, etc.
      Type size_t is 32 bit (4 byte) on 32-bit systems and 64 bit (8 byte) on 64-bit systems (as are types ssize_t and ptrdiff_t).
    • In C, a BAT-ID is of type bat, NOT of type int.


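{| class="wikitable" style="text-align: center;"
|+ MonetDB type system (excerpt!)
|-
! Semantics !! SQL !! MAL !! C type !! width !! signed? !! NIL value !! value range <br> (excluding NIL value!) !! format string !! availability !! C example
|-
| string length,<br>array size,<br>memory size<br>[byte] || || || <tt>size_t</tt> || 4/8-byte <br> 32/64-bit || no || || [0:2^32-1/2^64-1] || SZFMT || always ||style="text-align: left;"| size_t x = 0; <br> printf(SZFMT, x);
|-
| string length -,<br>array size -,<br>memory size -<br>- difference<br>[byte] || || || <tt>ssize_t</tt> || 4/8-byte <br> 32/64-bit || yes || || [-2^31/-2^63:2^31-1/2^63-1] || SSZFMT || always ||style="text-align: left;"| ssize_t x = 0; <br> printf(SSZFMT, x);
|-
| pointer difference || || || <tt>ptrdiff_t</tt> || 4/8-byte <br> 32/64-bit || yes || || [-2^31/-2^63:2^31-1/2^63-1] || PDFMT || always ||style="text-align: left;"| ptrdiff_t x = 0; <br> printf(PDFMT, x);
|-
| BAT-ID || || || <tt>bat</tt> || 4-byte <br> 32-bit || yes || bat_nil == <br> (bat) int_nil || (-2^31:2^31-1] || "%d" || always ||style="text-align: left;"| bat x = 0; <br> printf("%d", x);
|-
| number of tuples in a BAT <br> (count) || || || <tt>BUN</tt> || 4/8-byte <br> 32/64-bit || no || BUN_NONE == <br> (BUN) GDK_oid_max || [0:BUN_MAX == (BUN_NONE - 1)] <br> [0:2^31-2/2^63-2] || BUNFMT || always ||style="text-align: left;"| BUN x = 0; <br> printf(BUNFMT, x);
|-
| object-ID / tuple-ID || || :oid || <tt>oid</tt> || 4/8-byte <br> 32/64-bit || no || oid_nil == <br> 2^31/2^63 || [GDK_oid_min:GDK_oid_max] <br> [0:2^31-1/2^63-1] || OIDFMT || always ||style="text-align: left;"| oid x = 0; <br> printf(OIDFMT, x);
|-
| bit / boolean <br> (0/1 / false/true) || BOOLEAN || :bit || <tt>bit</tt> || 1-byte <br> 8-bit || (yes) || bit_nil == <br> (bit) bte_nil || {GDK_bit_min,GDK_bit_max} <br> {FALSE,TRUE} <br> {0,1} || "%hhd" || always ||style="text-align: left;"| bit x = 0; <br> printf("%hhd", x);
|-
| 1-byte (8-bit) <br> signed integer || TINYINT || :bte || <tt>bte</tt> || 1-byte <br> 8-bit || yes || bte_nil == <br> GDK_bte_min || (GDK_bte_min:GDK_bte_max] <br> (-2^7:2^7-1] || "%hhd" || always ||style="text-align: left;"| bte x = 0; <br> printf("%hhd", x);
|-
| 1-byte (8-bit) <br> unsigned integer || || || <tt>unsigned char</tt> || 1-byte <br> 8-bit || no || || [0:2^8-1] || "%hhu" || always ||style="text-align: left;"| unsigned char x = 0; <br> printf("%hhu", x);
|-
| 2-byte (16-bit) <br> signed integer || SMALLINT || :sht || <tt>sht</tt> || 2-byte <br> 16-bit || yes || sht_nil == <br> GDK_sht_min || (GDK_sht_min:GDK_sht_max] <br> (-2^15:2^15-1] || "%hd" || always ||style="text-align: left;"| sht x = 0; <br> printf("%hd", x);
|-
| 2-byte (16-bit) <br> unsigned integer || || || <tt>unsigned short</tt> || 2-byte <br> 16-bit || no || || [0:2^16-1] || "%hu" || always ||style="text-align: left;"| unsigned short x = 0; <br> printf("%hu", x);
|-
| 4-byte (32-bit) <br> signed integer || INT<br>INTEGER || :int || <tt>int</tt> || 4-byte <br> 32-bit || yes || int_nil == <br> GDK_int_min || (GDK_int_min:GDK_int_max] <br> (-2^31:2^31-1] || "%d" || always ||style="text-align: left;"| int x = 0; <br> printf("%d", x);
|-
| 4-byte (32-bit) <br> unsigned integer || || || <tt>unsigned int</tt> || 4-byte <br> 32-bit || no || || [0:2^32-1] || "%u" || always ||style="text-align: left;"| unsigned int x = 0; <br> printf("%u", x);
|-
| machine-word-size <br> signed integer <br> 32/64 bit on 32/64-bit systems <br> '''Deprecated''' as there is no such type in SQL. <br> Still used in MAL for counts, lacking :BUN in MAL <br> In C, use BUN for counts, otherwise ssize_t. || || :wrd || <tt>wrd</tt> || 4/8-byte <br> 32/64-bit || yes || wrd_nil == <br> GDK_wrd_min || (GDK_wrd_min:GDK_wrd_max] <br> (-2^31/63:2^31/63-1] || SSZFMT || always ||style="text-align: left;"| wrd x = 0; <br> printf(SSZFMT, x);
|-
| 8-byte (64-bit) <br> signed integer || BIGINT || :lng || <tt>lng</tt> || 8-byte <br> 64-bit || yes || lng_nil == <br> GDK_lng_min || (GDK_lng_min:GDK_lng_max] <br> (-2^63:2^63-1] || LLFMT || always ||style="text-align: left;"| lng x = 0; <br> printf(LLFMT, x);
|-
| 8-byte (64-bit) <br> unsigned integer || || || <tt>ulng</tt> || 8-byte <br> 64-bit || no || || [0:2^64-1] || ULLFMT || always ||style="text-align: left;"| ulng x = 0; <br> printf(ULLFMT, x);
|-
| 16-byte (128-bit) <br> signed integer || HUGEINT || :hge || <tt>hge</tt> || 16-byte <br> 128-bit || yes || hge_nil == <br> GDK_hge_min || (GDK_hge_min:GDK_hge_max] <br> (-2^127:2^127-1] || (none provided by compilers) || if supported by compiler<br>(configure then defines HAVE_HGE) ||style="text-align: left;"| #ifdef HAVE_HGE <br> hge x = 0; <br> printf("%.40g", (dbl) x); <br> #endif
|-
| 16-byte (128-bit) <br> unsigned integer || || || <tt>uhge</tt> || 16-byte <br> 128-bit || no || || [0:2^128-1] || (none provided by compilers) || if supported by compiler<br>(configure then defines HAVE_HGE) ||style="text-align: left;"| #ifdef HAVE_HGE <br> uhge x = 0; <br> printf("%.40g", (dbl) x); <br> #endif
|-
| 4-byte (32-bit) <br> floating-point number || REAL || :flt || <tt>flt</tt> || 4-byte <br> 32-bit || yes || flt_nil == <br> GDK_flt_min || (GDK_flt_min:GDK_flt_max] <br> (-FLT_MAX:FLT_MAX] || "%e", "%f", "%g" || always ||style="text-align: left;"| flt x = 0; <br> printf("%f", x);
|-
| 8-byte (64-bit) <br> floating-point number || FLOAT<br>DOUBLE || :dbl || <tt>dbl</tt> || 8-byte <br> 64-bit || yes || dbl_nil == <br> GDK_dbl_min || (GDK_dbl_min:GDK_dbl_max] <br> (-DBL_MAX:DBL_MAX] || "%e", "%f", "%g" || always ||style="text-align: left;"| dbl x = 0; <br> printf("%f", x);
|-
| strings <br> ''Internally, only valid UTF-8 encoded strings are supported; <br> conversion from/to other encoding has to be performed <br> before/during base data import <br> and during/after query result export.'' || CHAR <br> CHARACTER <br> VARCHAR <br> CHARACTER VARYING <br> TEXT <br> STRING <br> CLOB <br> CHARACTER LARGE OBJECT || :str || <tt>str</tt> || || || str_nil <br> const char str_nil[2] <br> = { '\200', 0 }; || || "%s" || always ||style="text-align: left;"| str x = ""; <br> printf("%s", x);
|}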

In case you have any questions about correct, proper, and portable type usage in MonetDB, please do not hesitate to ask Sjoerd or Stefan.