Hi Puthick,

 

Last year I was in need of this function and I came across the following example on the mailing list:

 

---

 

function group_concat(b:bat[:oid, :str]):str;

       value := ";";

       barrier (h, t) := iterator.new(b);

              value := value + t;

              value := value + ";";

              redo (h, t) := iterator.next(b);

       exit (h, t);

       return value;

end group_concat;

 

function subgroup_concat(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit) :bat[:oid,:str];

       nw := aggr.count(e);

       nl := calc.lng(nw);

       bn := bat.new(:oid, :str, nl);

 

       # check 'e' has some values - this is the list of group IDs in the head

       # 'g' is the group to the data BAT head ID mapping

       # 'b' is the data column BAT that we are aggregating over

       barrier (grpid, t) := iterator.new(e);

 

              # select GID from TID->GID map to get matching TIDs

              TIDs := algebra.uselect(g,grpid);

 

              # get DATA for matching TIDs

              b_data := algebra.kintersect(b,TIDs);

 

              # aggregate

              grpval := group_concat(b_data);

 

              # Store the result for this group

              bat.insert(bn, grpid, grpval);

 

              redo (grpid, t) := iterator.next(e);

       exit (grpid, t);

       return bn;

 

end subgroup_concat;

 

---

 

You should place this function in aggr.mal to prevent issues with the mitosis optimizer, see https://www.monetdb.org/pipermail/users-list/2013-August/006722.html for details on this.

 

Register this function in SQL using:

 

CREATE AGGREGATE group_concat (val STRING) returns STRING external name aggr.group_concat;

 

My experience with this MAL implementation is that performance is suboptimal. For better performance you can implement this function as a C UDF for instance..

 

Hope this helps.

 

Regards,

 

Nik

 

Van: users-list [mailto:users-list-bounces+n.h.schuiling=students.uu.nl@monetdb.org] Namens Puthick Hok
Verzonden: donderdag 16 april 2015 9:23
Aan: users-list@monetdb.org
Onderwerp: MAL aggregate function difficulty

 

Hi,

I'm trying to define an aggregate function in MonetDB to concatenate rows in select group by (GROUP_CONCAT in MySQL). Postgres does not have this but there is an example to define this function.

I'm starting to use monetdb from mysql thanks to MonetDB performance for large data sets. For some small tables, we have the feature to allow users to store small amount of custom data in rows and have a web service to convert rows into columns. We have this feature because we can do it in MySQL and Postgres using GROUP_CONCAT. I'm working on an open source breeding system which stores breeding data in MySQL, environment data in Postgres and now marker data in MonetDB in Australia. The system is called KDDart.

Going back to my problem is that there is the same question with GROUP_CONCAT about 3 years ago. I have been googling and trying based on the information available for 2 days now. I have learnt how to define a MAL function and make it available in SQL mode by saving it in the autoload. However, when it comes to define and write a MAL aggregate function. I'm unable to find any step-by-step example. I could not start successfully with a simple aggregate solution.

I google further that we a documentation on monetdb website about defining an aggregate function in R. I installed R and recompile my MonetDB source (MonetDB-11.19.9) on my openSuSe 12.2 with a pre-compiled R (R-base-2.15.1-1.1.4.x86_64). I followed the instruction to enable rintegration and compile successfully. R integration is loaded with mserver5 instruction. But when using monetdbd it says 'set: no such property: embedr'. I did bit of C debugging but I only got Segmentation fault.

I don't know what you could do to help me out. I wish I could write the GROUP_CONCAT MAL function and being able to define it permanently in .mal file and being able to link it to the SQL.

I hope you can help me out.

Thanks in advance.
Puthick