Table column type meta-data and the MonetDB API (MAPI)

Sjoerd Mullender sjoerd at monetdb.org
Thu Jan 3 16:04:46 CET 2013


On 2013-01-03 14:38, Brandon Jackson wrote:
> Good day,
> 
> I am working on improvements to the Pentaho MonetDB Bulk Loader plugin
> and want to get some advice for how to properly use the MonetDB API to
> get the MonetDB column data type meta-data for a given table.
> 
> Here is the use case:
> 
> A row of data in Pentaho Data Integration is composed of fields which
> have a basic Java data type and other meta data kept in a helper class
> that keeps track of format strings, digits of precision and so on.  At
> the current state of development a user has to know exactly which data
> type was used for a particular column and in which order the columns
> were defined by the original SQL CREATE statement.  In PDI, every field
> (column) starts life with primitive Java data types without necessarily
> having a specific format defined when the variable is initialized, a
> user might send a TIMESTAMP (i.e. 2013-01-03  00:00:00) to MonetDB where
> really a DATE (i.e 2013-01-03) was expected.
> 
> What I would like to do is before the bulk load actually runs, ask
> MonetDB through the API what the meta data is for each of the columns
> and output to the end user a table that shows the columns reported by
> MonetDB for a given table, the data type specified in the DDL, and show
> the user a suggested format string, which they could modify if they know
> what they are doing.   Format strings look like "yyyy-MM-dd  HH:mm:ss"
> and "#,###.#" etc.
> 
> Generally, what API calls or approach is recommended to accomplish what
> I am trying to do?
> 
> As a fall back, I considered querying the meta-data out like so:
> 
> SELECT name AS column_name, type as column_data_type
> FROM "sys"."columns"
> WHERE table_id = (
> SELECT id AS TABLE_ID
> FROM "sys"."tables"
> WHERE name = 'DIM_DATE'
> and schema_id = (SELECT id AS SCHEMA_ID
>      FROM "sys"."schemas"
>      WHERE name = 'testing'))
> ORDER BY number;
> 
> This brings back two columns, name and type where I can match them to
> the outgoing fields.
> Then I can process the result set in the step and update the value
> meta-data with the appropriate format mask.
> Running such SQL should be unnecessary but at least I have a fall-back
> solution.
> 
> Thanks for the advice,
> 
> Brandon Jackson

You could study how ODBC and JDBC do this.  For ODBC, look particularly
at ODBCInitResult().  One bit of magic that happens is that ODBC calls
mapi_set_size_header(mid, 1); early on (see SQLConnect_()) to request an
extra bit of information for each result set.  Using this extra
information, ODBC can figure out what the declared size is of char and
varchar columns.

As to the format string, I guess once you know what the SQL type is, you
can just use fixed formats for each type.

If you want to use pure SQL calls to get information about existing
tables, you can study what msqldump does in the file
clients/mapiclient/dump.c.  It uses SQL queries to the system tables to
find out the exact types of the tables that are to be dumped.  But note
that these queries tend to be quite large.


-- 
Sjoerd Mullender
_______________________________________________
developers-list mailing list
developers-list at monetdb.org
http://mail.monetdb.org/mailman/listinfo/developers-list



More information about the developers-list mailing list