MonetDB type system

Latest revision as of 19:14, 3 November 2015

This page is mainly about C types.

For MonetDB's MAL type system, see also https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types .

For the SQL types supported in MonetDB, see also https://www.monetdb.org/Documentation/Manuals/SQLreference/Datatypes .


  • Type Rules
    • C types long & unsigned long are evil (i.e., NOT portable) and must NOT be used!
      While they are 32/64 bit on 32/64-bit systems under Unix, they are always 32 bit under Windows (also on 64-bit systems).
      In other words, C types long & unsigned long are always 32 bit, except on 64-bit non-Windows systems.
      If you need a type that scales from 32 bit on 32-bit systems to 64 bit on 64-bit systems, consider choosing the appropriate portable alternatives detailed below, i.e., size_t, ssize_t, ptrdiff_t, BUN, oid.
    • #include "monetdb_config.h" must be the first (non-comment) line in each .c file; it must not be included in any .h file.
    • In C, a tuple-/object-ID (OID) is of type oid, NOT of type int, lng, size_t, BUN, etc.
      Type oid is always 32 bit (4 byte) on 32-bit systems, but can be either 32 bit (explicit choice during configure) or 64 bit (8 byte; default) on 64-bit systems.
    • In C, the number of tuples in a BAT (BATcount) is of type BUN, NOT of type int, lng, size_t, oid, etc.
      Type BUN has the same size (width) as type oid.
    • In C, the length of a string and the size of an array or memory region are of type size_t, NOT of type int, lng, oid, BUN, etc.
      Type size_t is 32 bit (4 byte) on 32-bit systems and 64 bit (8 byte) on 64-bit systems (as are types ssize_t and ptrdiff_t).
    • In C, a BAT-ID is of type bat, NOT of type int.
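
The portability rules above can be illustrated with a small sketch in standard C. It deliberately includes no MonetDB header, so it compiles on its own. The function names (`text_length`, `element_distance`, `long_is_pointer_sized`, `print_type_widths`) are hypothetical, and the C99 conversion specifier `%zu` is used as an assumption in place of MonetDB's SZFMT macro, whose definition is not shown on this page.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, ptrdiff_t */
#include <stdio.h>
#include <string.h>

/* A string length is a size_t, never an int or a long. */
size_t text_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* The distance between two pointers into the same array is a ptrdiff_t. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* Returns 1 if long is as wide as a pointer on this platform, else 0.
 * On 64-bit Windows long stays 32 bit, so this returns 0 there;
 * on typical 64-bit Unix systems it returns 1. */
int long_is_pointer_sized(void)
{
    return sizeof(long) == sizeof(void *);
}

/* Prints the widths (in bytes) that the rules above talk about. */
void print_type_widths(void)
{
    printf("long=%zu size_t=%zu ptrdiff_t=%zu (bytes)\n",
           sizeof(long), sizeof(size_t), sizeof(ptrdiff_t));
}
```

Calling `print_type_widths()` on different platforms shows why `long` is unsuitable for sizes and counts: its width does not follow the pointer width everywhere, while `size_t` and `ptrdiff_t` do on the systems described above.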


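{| class="wikitable" style="text-align: center;"
|+ MonetDB type system (excerpt!)
|-
! Semantics !! SQL !! MAL !! C type !! width !! signed? !! NIL value !! value range <br> (excluding NIL value!) !! format string !! availability !! C example
|-
| string length,<br>array size,<br>memory size<br>[byte] || || || <tt>size_t</tt> || 4/8-byte <br> 32/64-bit || no || || [0:2^32-1/2^64-1] || SZFMT || always ||style="text-align: left;"| size_t x = 0; <br> printf(SZFMT, x);
|-
| difference of<br>string lengths,<br>array sizes,<br>memory sizes<br>[byte] || || || <tt>ssize_t</tt> || 4/8-byte <br> 32/64-bit || yes || || [-2^31/-2^63:2^31-1/2^63-1] || SSZFMT || always ||style="text-align: left;"| ssize_t x = 0; <br> printf(SSZFMT, x);
|-
| pointer difference || || || <tt>ptrdiff_t</tt> || 4/8-byte <br> 32/64-bit || yes || || [-2^31/-2^63:2^31-1/2^63-1] || PDFMT || always ||style="text-align: left;"| ptrdiff_t x = 0; <br> printf(PDFMT, x);
|-
| BAT-ID || || || <tt>'''bat'''</tt> || 4-byte <br> 32-bit || yes || bat_nil == <br> (bat) int_nil || (-2^31:2^31-1] || "%d" || always ||style="text-align: left;"| bat x = 0; <br> printf("%d", x);
|-
| number of tuples in a BAT <br> (count) || || || <tt>BUN</tt> || 4/8-byte <br> 32/64-bit || no || BUN_NONE == <br> (BUN) GDK_oid_max || [0:BUN_MAX == (BUN_NONE - 1)] <br> [0:2^31-2/2^63-2] || BUNFMT || always ||style="text-align: left;"| BUN x = 0; <br> printf(BUNFMT, x);
|-
| object-ID / tuple-ID || || :oid || <tt>oid</tt> || 4/8-byte <br> 32/64-bit || no || oid_nil == <br> 2^31/2^63 || [GDK_oid_min:GDK_oid_max] <br> [0:2^31-1/2^63-1] || OIDFMT || always ||style="text-align: left;"| oid x = 0; <br> printf(OIDFMT, x);
|-
| bit / boolean <br> (0/1 / false/true) || BOOLEAN || :bit || <tt>bit</tt> || 1-byte <br> 8-bit || (yes) || bit_nil == <br> (bit) bte_nil || {GDK_bit_min,GDK_bit_max} <br> {FALSE,TRUE} <br> {0,1} || "%hhd" || always ||style="text-align: left;"| bit x = 0; <br> printf("%hhd", x);
|-
| 1-byte (8-bit) <br> signed integer || TINYINT || :bte || <tt>bte</tt> || 1-byte <br> 8-bit || yes || bte_nil == <br> GDK_bte_min || (GDK_bte_min:GDK_bte_max] <br> (-2^7:2^7-1] || "%hhd" || always ||style="text-align: left;"| bte x = 0; <br> printf("%hhd", x);
|-
| 1-byte (8-bit) <br> unsigned integer || || || <tt>unsigned char</tt> || 1-byte <br> 8-bit || no || || [0:2^8-1] || "%hhu" || always ||style="text-align: left;"| unsigned char x = 0; <br> printf("%hhu", x);
|-
| 2-byte (16-bit) <br> signed integer || SMALLINT || :sht || <tt>sht</tt> || 2-byte <br> 16-bit || yes || sht_nil == <br> GDK_sht_min || (GDK_sht_min:GDK_sht_max] <br> (-2^15:2^15-1] || "%hd" || always ||style="text-align: left;"| sht x = 0; <br> printf("%hd", x);
|-
| 2-byte (16-bit) <br> unsigned integer || || || <tt>unsigned short</tt> || 2-byte <br> 16-bit || no || || [0:2^16-1] || "%hu" || always ||style="text-align: left;"| unsigned short x = 0; <br> printf("%hu", x);
|-
| 4-byte (32-bit) <br> signed integer || INT<br>INTEGER || :int || <tt>int</tt> || 4-byte <br> 32-bit || yes || int_nil == <br> GDK_int_min || (GDK_int_min:GDK_int_max] <br> (-2^31:2^31-1] || "%d" || always ||style="text-align: left;"| int x = 0; <br> printf("%d", x);
|-
| 4-byte (32-bit) <br> unsigned integer || || || <tt>unsigned int</tt> || 4-byte <br> 32-bit || no || || [0:2^32-1] || "%u" || always ||style="text-align: left;"| unsigned int x = 0; <br> printf("%u", x);
|-
| machine-word-size <br> signed integer <br> 32/64 bit on 32/64-bit systems <br> '''''Deprecated''''' as there is no such type in SQL. <br> Still used in MAL for counts, as MAL lacks :BUN. <br> In C, use BUN for counts, otherwise ssize_t. || || :wrd || <tt>wrd</tt> || 4/8-byte <br> 32/64-bit || yes || wrd_nil == <br> GDK_wrd_min || (GDK_wrd_min:GDK_wrd_max] <br> (-2^31/-2^63:2^31-1/2^63-1] || SSZFMT || always ||style="text-align: left;"| wrd x = 0; <br> printf(SSZFMT, x);
|-
| 8-byte (64-bit) <br> signed integer || BIGINT || :lng || <tt>lng</tt> || 8-byte <br> 64-bit || yes || lng_nil == <br> GDK_lng_min || (GDK_lng_min:GDK_lng_max] <br> (-2^63:2^63-1] || LLFMT || always ||style="text-align: left;"| lng x = 0; <br> printf(LLFMT, x);
|-
| 8-byte (64-bit) <br> unsigned integer || || || <tt>ulng</tt> || 8-byte <br> 64-bit || no || || [0:2^64-1] || ULLFMT || always ||style="text-align: left;"| ulng x = 0; <br> printf(ULLFMT, x);
|-
| 16-byte (128-bit) <br> signed integer || HUGEINT || :hge || <tt>hge</tt> || 16-byte <br> 128-bit || yes || hge_nil == <br> GDK_hge_min || (GDK_hge_min:GDK_hge_max] <br> (-2^127:2^127-1] || (none provided by compilers) || if supported by compiler<br>(configure then defines HAVE_HGE) ||style="text-align: left;"| #ifdef HAVE_HGE <br> hge x = 0; <br> printf("%.40g", (dbl) x); <br> #endif
|-
| 16-byte (128-bit) <br> unsigned integer || || || <tt>uhge</tt> || 16-byte <br> 128-bit || no || || [0:2^128-1] || (none provided by compilers) || if supported by compiler<br>(configure then defines HAVE_HGE) ||style="text-align: left;"| #ifdef HAVE_HGE <br> uhge x = 0; <br> printf("%.40g", (dbl) x); <br> #endif
|-
| 4-byte (32-bit) <br> floating-point number || REAL || :flt || <tt>flt</tt> || 4-byte <br> 32-bit || yes || flt_nil == <br> GDK_flt_min || (GDK_flt_min:GDK_flt_max] <br> (-FLT_MAX:FLT_MAX] || "%e", "%f", "%g" || always ||style="text-align: left;"| flt x = 0; <br> printf("%f", x);
|-
| 8-byte (64-bit) <br> floating-point number || FLOAT<br>DOUBLE || :dbl || <tt>dbl</tt> || 8-byte <br> 64-bit || yes || dbl_nil == <br> GDK_dbl_min || (GDK_dbl_min:GDK_dbl_max] <br> (-DBL_MAX:DBL_MAX] || "%e", "%f", "%g" || always ||style="text-align: left;"| dbl x = 0; <br> printf("%f", x);
|-
| strings <br> ''Internally, only valid UTF-8 encoded strings are supported; <br> conversion from/to other encodings has to be performed <br> before/during base data import <br> and during/after query result export.'' || CHAR <br> CHARACTER <br> VARCHAR <br> CHARACTER VARYING <br> TEXT <br> STRING <br> CLOB <br> CHARACTER LARGE OBJECT || :str || <tt>str</tt> || || || str_nil <br> const char str_nil[2] <br> = { '\200', 0 }; || || "%s" || always ||style="text-align: left;"| str x = ""; <br> printf("%s", x);
|}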
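
The NIL value column can be read as follows: for the signed integer types, the smallest representable value is reserved as NIL, which is why the value ranges are written half-open, e.g. (-2^7:2^7-1] for bte. The following is a minimal sketch in standard C under stated assumptions: `demo_bte`, `DEMO_BTE_NIL`, `DEMO_BTE_SMALLEST_VALID`, `DEMO_BTE_MAX` and both functions are hypothetical stand-ins, not the GDK declarations, and reporting overflow through a boolean return value is a choice of this sketch, not a description of MonetDB's behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t demo_bte;                          /* stand-in for a 1-byte signed type */
#define DEMO_BTE_NIL            INT8_MIN          /* -128 is reserved as NIL */
#define DEMO_BTE_SMALLEST_VALID (INT8_MIN + 1)    /* -127, smallest non-NIL value */
#define DEMO_BTE_MAX            INT8_MAX          /* 127 */

/* True if v is the reserved NIL sentinel. */
bool demo_bte_is_nil(demo_bte v)
{
    return v == DEMO_BTE_NIL;
}

/* Adds a and b into *out.
 * NIL operands propagate: the result is NIL and the call succeeds.
 * Returns false (leaving *out untouched) if the sum falls outside
 * the valid range, i.e. would overflow or collide with NIL. */
bool demo_bte_add(demo_bte a, demo_bte b, demo_bte *out)
{
    assert(out != NULL);
    if (demo_bte_is_nil(a) || demo_bte_is_nil(b)) {
        *out = DEMO_BTE_NIL;
        return true;
    }
    int sum = (int) a + (int) b;   /* widen first so the addition itself cannot overflow */
    if (sum < DEMO_BTE_SMALLEST_VALID || sum > DEMO_BTE_MAX)
        return false;
    *out = (demo_bte) sum;
    return true;
}
```

The same pattern carries over to the wider signed types in the table, with the sentinel and bounds taken from the corresponding row.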

In case you have any questions about correct, proper, and portable type usage in MonetDB, please do not hesitate to ask Sjoerd or Stefan.