Bug 3539

Summary: Configurable parsing for ETL and bulk loading
Product: SQL
Reporter: Stefan de Konink <stefan>
Component: all
Assignee: SQL devs <bugs-sql>
Status: NEW
Severity: enhancement
Priority: Normal
Version: -- development
Hardware: Other
OS: Linux

Description Stefan de Konink 2014-08-12 20:03:37 CEST
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.58 Safari/537.36
Build Identifier: 

MonetDB currently supports a variety of ways to use COPY INTO, and it is the only efficient way to load CSV data. I wonder whether it could be extended to be more adaptive, or whether the team could indicate how a general-purpose API could make bulk loading into MonetDB from foreign file formats easier.

The problems are primarily related to parsing. Dates and timestamps are a good example: MonetDB is already graceful about accepting either 'T' or ' ' as the separator, but a general format parameter might be low-hanging fruit that would allow importing a document written in a specific format.
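
For illustration only, something along these lines would cover my use case; this is purely made-up syntax, not anything MonetDB accepts today:

    -- hypothetical extension, NOT existing MonetDB grammar:
    -- a per-column parse hint attached to COPY INTO
    COPY INTO t FROM '/tmp/data.csv'
      USING DELIMITERS ';', '\n', '"'
      -- hypothetical clause: interpret column "d" as dd-MM-YYYY
      COLUMN d FORMAT 'dd-MM-YYYY';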

Another example is decimal commas: MonetDB does not accept them in place of decimal points. A way to handle them gracefully, either through the current locale or adaptively according to the user's choice, would help.

So what should I do if I want to parse a column in the format dd-MM-YYYY, or a numeric value written as 1,1 instead of 1.1, without changing a gazillion-line source document?
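
The only workaround I can think of right now is a staging table that loads the problematic columns as strings and converts them in SQL afterwards; a sketch (the table and column names are just examples):

    -- staging table: problematic columns as plain strings
    CREATE TABLE t_raw (d VARCHAR(10), v VARCHAR(32));
    COPY INTO t_raw FROM '/tmp/data.csv' USING DELIMITERS ';', '\n', '"';

    -- target table with the intended types
    CREATE TABLE t (d DATE, v DECIMAL(10,2));

    -- rewrite dd-MM-YYYY into ISO yyyy-MM-dd and ',' into '.' while copying
    INSERT INTO t
    SELECT CAST(SUBSTRING(d, 7, 4) || '-' || SUBSTRING(d, 4, 2) || '-' ||
                SUBSTRING(d, 1, 2) AS DATE),
           CAST(REPLACE(v, ',', '.') AS DECIMAL(10,2))
    FROM t_raw;

But that doubles the I/O and still needs a hand-written conversion query per document.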

It does not seem to be as simple as changing LC_ALL. I understand streaming through STDIN might be an option; what does the core team consider feasible?
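
For completeness, the STDIN route I have in mind is piping a preprocessed stream into mclient, roughly like this (the exact invocation may need adjusting, and rewriting commas this way only works when the field separator is not itself a comma):

    # rewrite decimal commas on the fly; the field separator here is ';'
    sed 's/,/./g' data.csv | \
      mclient -d mydb -s "COPY INTO t FROM STDIN USING DELIMITERS ';','\n'" -

That avoids a second on-disk copy, but the date reformatting would still need a sed/awk step written per document.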

Reproducible: Always