Is it possible to construct aggregate functions using embedded R? Would the syntax be: "create aggregate function"? I need to pass in a pair of aligned vectors/columns to compute weighted percentiles.

I've held off on taking the R route for performance reasons, but I don't see how to pass multiple vectors into the new sub* aggregate implementation.

All of the aggregate functions in gdk_agg.c take a single column and I don't know if it is possible to alter the interface to pass in multiples.

Thanks,
Dave



-----Original Message-----
From: Roberto Cornacchia [roberto.cornacchia@gmail.com]
Sent: Wednesday, May 04, 2016 04:59 AM Eastern Standard Time
To: Communication channel for MonetDB users
Subject: Re: PCRE: references to captures in replace string

Another option is to use R. This is available in Jul2015 branch, and is very easy (https://www.monetdb.org/content/embedded-r-monetdb)

CREATE FUNCTION r_replace(s STRING, p STRING, r STRING) RETURNS STRING LANGUAGE R {
 sub(p,r,s)
};

sql>select r_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\2 \\1');
+------------------------+
| r_replace_single_value |
+========================+
| Pear Apple             |
+------------------------+

It works out of the box on columns:

select r_replace(value, '^.*(\d).*$','\\1%\\2') from obj_string limit 100;

And luckily without loops:

X_10:bat[:oid,:str] := batrapi.eval(nil,"{\n sub(p,r,s)\n};",X_9,A0,A1);


Still not very fast - I've just tried on 11M strings, 45s - probably because strings are fully copied, but for now this seems the easiest workaround.

It would be great to have it working on pcre though.
Roberto


On 4 May 2016 at 10:31, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
An interesting workaround for now could be to implement a simple UDF in python, to let python perform the regex processing: https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb

Unfortunately this is for now only available in a separate branch which is kept in sync with default. For me this is no option because I need to rely on a stable MonetDB version. But perhaps for other it is an option.

On 26 April 2016 at 16:57, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
I still find this to be an interested feature to add, as its very useful in string date manipulation. 

On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html

Probably that is still the case?

On 20 April 2016 at 16:35, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:

sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace";
operation successful (6.187ms)

I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):

sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', '');
+---------------------------+
| pcre_replace_single_value |
+===========================+
| \1 \2                     |
+---------------------------+
1 tuple (0.309ms)

The pattern is matched, but the back references are replaced literally.
Am I not getting the syntax correct?
From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.

Roberto



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list