Greetings,

We're currently evaluating MonetDB for an analytical DW, and so far we are happy with the results.

I am trying to implement a grouping function that calculates a value over a set of strings, so that my queries would read like this:

select metric, udf_aggregate(string_column)
from table
group by metric;

For a bit of background, we're using a distinct-value sketch called HyperLogLog
http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/
http://blog.aggregateknowledge.com/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/

and we're currently storing one estimation for each time period (day). HLL lets you merge/aggregate a set of estimations for an arbitrary range and still get an accurate estimation. Each estimation is a vector of numbers, which we're currently storing as a string. (I'm sure the literature doesn't call them estimations, sorry for my English.)
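
To make the merge step concrete, here is a minimal sketch in C of what the aggregate would have to do for each group. It is not MonetDB code. It assumes each estimation is stored as a comma-separated list of register values (the actual string format may differ), and HLL_REGISTERS, hll_parse and hll_merge are hypothetical names used only for illustration. The only part that comes from HLL itself is that merging two sketches of the same size means taking the element-wise maximum of their registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical register count for illustration; real sketches use 2^p registers. */
#define HLL_REGISTERS 8

/* Parse a comma-separated register string (assumed storage format) into regs.
 * Returns 0 on success, -1 if the string does not contain exactly
 * HLL_REGISTERS values in the range 0..255. */
int hll_parse(const char *s, uint8_t regs[HLL_REGISTERS])
{
    size_t i = 0;

    while (i < HLL_REGISTERS) {
        char *end;
        long v = strtol(s, &end, 10);

        if (end == s || v < 0 || v > 255)
            return -1;              /* no digits, or value out of range */
        regs[i++] = (uint8_t)v;

        if (*end == ',') {
            s = end + 1;            /* step over the separator */
        } else {
            s = end;                /* end of input or unexpected character */
            break;
        }
    }
    return (i == HLL_REGISTERS && *s == '\0') ? 0 : -1;
}

/* Merge one sketch into an accumulator: element-wise maximum of registers. */
void hll_merge(uint8_t acc[HLL_REGISTERS], const uint8_t other[HLL_REGISTERS])
{
    for (size_t i = 0; i < HLL_REGISTERS; i++) {
        if (other[i] > acc[i])
            acc[i] = other[i];
    }
}
```

An aggregate over a group would start from an all-zero accumulator, call hll_parse and hll_merge once per row, and compute the cardinality estimate from the accumulator at the end.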

What I would like is a custom UDF like the one provided in the MonetDB source (reverse), but one that operates and behaves like an aggregate function.

Right now, I'm not considering using it for types other than string (no need for polymorphism yet).
Is this possible with a UDF? I found a way of registering aggregate functions on the mailing list, but HLL is complex enough to warrant its own C implementation instead of a MAL function.

Thanks,
Miguel