pymonetdb or Python UDF

Mark Clements mark.clements at ki.se
Sun Jun 17 23:30:10 CEST 2018


Adam,

A solution in Python could use the following SQL snippets:

-- example data
drop table if exists d;
create table d as select 'key1=value1, key2=value2, key3=value3' as s
union all select 'key1=value4, key2=value5, key3=value6';

-- create utility function
create or replace function pcre_replace(origin string, pat string, repl
string, flags string)
  returns string external name pcre.replace;

-- split - if you know the key and the number of columns
select pcre_replace(s,'.*key1[=]\([a-zA-Z0-9-]*\),.*','\\1','') as value1,
  pcre_replace(s,'.*key2[=]\([a-zA-Z0-9-]*\),.*','\\1','') as value2,
  pcre_replace(s,'.*key3[=]\([a-zA-Z0-9-]*\).*','\\1','') as value3
  from d;

A more general solution in Python could be similar to the following R
solution:

## use DBI and MonetDB.R packages
library(DBI)
library(MonetDB.R)
## create a connection
con <- dbConnect(MonetDB.R::MonetDB.R(), dbname="testt")
## example data
dbWriteTable(con,"d",
             data.frame(s=c('key1=value1, key2=value2, key3=value3',
                            'key1=value4, key2=value5, key3=value6'),
                        stringsAsFactors=FALSE), overwrite=TRUE)
## utility function
is.na(dbExecute(con, "create or replace function pcre_replace(origin
string, pat string, repl string, flags string) returns string external
name pcre.replace"))
## function to split a column and create a new table
splitColumn <- function(con, tableName, columnName, newTableName) {
    ## read the first row
    d1 <- dbGetQuery(con, sprintf("select %s from %s limit 1",
columnName, tableName))
    ## get the number of splits
    n <- length(strsplit(d1$s[1],", ")[[1]])
    ## get the key
    key <- gsub("^(.*)1=.*","\\1",d1$s[1])
    ## create the regexs
    regex <- paste0(".*",key,1:n,"=\\([a-zA-Z0-9_-]*\\).*")
    ## create the new table
    dbExecute(con, sprintf("drop table if exists %s", newTableName))
    stmt <- sprintf("create table %s as select %s from %s",
                    newTableName,
                    paste0(sprintf("pcre_replace(%s, '%s','\\\\1','') as
value%i",
                                   columnName, regex, 1:n), collapse=", "),
                    tableName)
    is.na(dbExecute(con, stmt))
}
splitColumn(con, "d", "s", "d2")
## check the new table
print(dbGetQuery(con, "select * from d2"))
## clean up
dbDisconnect(con)

I hope that this is helpful.

-- Mark

On 06/14/2018 04:41 PM, Doherty, Adam wrote:
> Hi Stefan:
>
> Thanks for the reply.
> I created a UDF which returns a table, however when I try to run the function I get the following error - 
>
> Error: Embedded Python is enabled but an error was thrown during initialization.
> SQLState:  PY000
> ErrorCode: 0
>
> I have Python 2.7.5 installed on CentOS 7.4
>
> Adam
>
>
> -----Original Message-----
> From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca at monetdb.org] On Behalf Of Stefan Manegold
> Sent: Thursday, June 14, 2018 02:39
> To: Communication channel for MonetDB users <users-list at monetdb.org>
> Subject: Re: pymonetdb or Python UDF
>
>
> Hi Adam,
>
> if you know how to do it (efficiently) in Python,
> it's merely a matter of wrapping that Python code 
> into a suitable Python UDF;
> cf., https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
>
> Alternatively, you could consider using a (JIT-compiled) C(/C++) UDF;
> cf., https://www.monetdb.org/node/437
>
> Best,
> Stefan
>
> ----- On Jun 13, 2018, at 6:31 PM, Doherty, Adam adam.doherty at esso.ca wrote:
>
>> Hello all:
>>
>>
>>
>> This is more of curiosity but I have data in my events table currently trapped
>> in a nvarchar field.
>>
>> I am wondering if I can build a table and do execute a select into with a python
>> UDF pulling data from the existing fields plus the values of my nvarchar field.
>>
>>
>>
>> My string data has the following format –
>>
>>
>>
>> “key1=value, key2=val, key3=value”…
>>
>>
>>
>> I want to break out the data first by splitting on the , and then the =
>>
>> Keys would be the columns in the table.
>>
>>
>>
>> I can do this with a python script and pymonetdb but a UDF would be faster.
>>
>>
>>
>> Adam Doherty
>>
>> Process Control Systems Administrator
>>
>> Kearl Technical, Imperial
>>
>> 505 Quarry Park Blvd SE
>>
>> W3B.224
>>
>> Calgary, AB T2C 5N1
>>
>> adam.doherty at esso.ca P 587.476.4640 M 403.869.1636
>>
>> imperialoil.ca | Twitter | YouTube
>>
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list at monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list



More information about the users-list mailing list