Bug 3539 - Configurable parsing for ETL and bulk loading
Summary: Configurable parsing for ETL and bulk loading
Status: NEW
Alias: None
Product: SQL
Classification: Unclassified
Component: all (show other bugs)
Version: -- development
Hardware: Other Linux
Importance: Normal enhancement
Assignee: SQL devs
Depends on:
Reported: 2014-08-12 20:03 CEST by Stefan de Konink
Modified: 2016-04-11 11:44 CEST (History)


Description Stefan de Konink 2014-08-12 20:03:37 CEST
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.58 Safari/537.36
Build Identifier: 

Currently MonetDB supports a variety of ways to use COPY INTO. Since it is the only efficient way to load CSV data, I wonder whether it could be extended to be more adaptive, or whether a direction could be given for how a general-purpose API could improve bulk loading into MonetDB from foreign file formats.

Primarily the problems are related to parsing. Dates and timestamps are a good example: MonetDB is already graceful about accepting either 'T' or ' ' as the separator. A general format parameter might be low-hanging fruit that would allow importing a document with a specific format.

Another example is decimal commas: MonetDB doesn't accept them in place of points as decimal separators. A way to handle them gracefully, either via the current locale or by an explicit choice of the user, would help.

Hence, what would I do if I want to parse a column in the format dd-MM-YYYY, or a numeric value written as 1,1 instead of 1.1, without changing a gazillion-line source document?
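As a workaround sketch (not a MonetDB feature), one could normalize such a document into forms COPY INTO already parses before loading. A minimal Python example, assuming hypothetical column positions for the date and decimal fields:

```python
import csv
from datetime import datetime

def normalize_row(row, date_cols=(0,), decimal_cols=(1,)):
    """Rewrite dd-MM-YYYY dates into YYYY-MM-DD and decimal commas
    (1,1) into decimal points (1.1), so MonetDB's COPY INTO can parse
    the values natively. The column indices are hypothetical and
    depend on the actual document layout."""
    out = list(row)
    for i in date_cols:
        out[i] = datetime.strptime(out[i], "%d-%m-%Y").strftime("%Y-%m-%d")
    for i in decimal_cols:
        out[i] = out[i].replace(",", ".")
    return out
```

This avoids editing the gazillion-line source by hand, but it still means a full rewrite pass over the file, which is exactly the cost a configurable parser inside COPY INTO would eliminate.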

It does not seem to be as simple as changing LC_ALL. I understand that streaming from STDIN might be an option; what does the core team find feasible?
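The STDIN route could be sketched as a streaming filter that rewrites the data on the fly and is piped into the loader (e.g. via mclient with COPY INTO ... FROM STDIN; the exact invocation depends on the setup). A minimal sketch, assuming a semicolon-delimited file with a hypothetical decimal-comma column:

```python
import csv
import sys

def stream_normalize(inp, outp, decimal_cols=(1,), delimiter=";"):
    """Stream CSV from inp to outp, rewriting decimal commas
    (1,5 -> 1.5) in the given columns. Files using decimal commas
    typically use ';' as the field separator, hence the default.
    The column indices are hypothetical."""
    reader = csv.reader(inp, delimiter=delimiter)
    writer = csv.writer(outp, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        for i in decimal_cols:
            row[i] = row[i].replace(",", ".")
        writer.writerow(row)

if __name__ == "__main__":
    stream_normalize(sys.stdin, sys.stdout)
```

Streaming keeps memory use constant and avoids a temporary copy of the file, but it adds a process to every load, which is why native support for such formats would still be preferable.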

Reproducible: Always