Roberto Cornacchia roberto.cornacchia at gmail.com
Mon Mar 21 14:24:57 CET 2016

In a C UDF, while looping over a [:oid,:str] BAT, I'm tokenizing each string
tail into a str array:

BATloop(..) {
  str *token_array = /* create a str array by tokenizing the str tail of this BUN */

  /* append a histogram of token_array to result */
}

Each token array is expected to hold between 10 and 10K short strings (1 to
10 bytes each).

To get the histogram, I'd turn the token_array into a BAT b, then use

BATgroup(&gn, NULL, &hn, b, NULL, NULL, NULL);

Do you see a more efficient way? Is there a group/histogram primitive
implemented that works directly on arrays rather than BATs?
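
For illustration, an array-based histogram of this kind could be sketched in
plain C as below. This is not a GDK primitive: hist_entry, hash_str and
token_histogram are hypothetical names, and the open-addressing table with
FNV-1a hashing is just one possible choice. It assumes every token is a
non-NULL, NUL-terminated string (GDK's str is a char *).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one distinct token and its occurrence count. */
typedef struct {
    const char *token;  /* points into the caller's array, not copied */
    size_t count;
} hist_entry;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: count distinct strings in tokens[0..n-1].
 * Returns a malloc'ed array of *ndistinct entries in first-occurrence
 * order (the caller frees it), or NULL if allocation fails. */
hist_entry *token_histogram(char **tokens, size_t n, size_t *ndistinct)
{
    size_t cap = 16, mask, i, used = 0;
    size_t *slots;        /* hash table: entry index + 1, 0 means empty */
    hist_entry *entries;

    while (cap < 2 * n)   /* power of two, load factor at most 0.5 */
        cap <<= 1;
    mask = cap - 1;

    slots = calloc(cap, sizeof(*slots));
    entries = malloc((n ? n : 1) * sizeof(*entries));
    if (slots == NULL || entries == NULL) {
        free(slots);
        free(entries);
        return NULL;
    }

    for (i = 0; i < n; i++) {
        size_t pos = hash_str(tokens[i]) & mask;

        /* linear probing until we hit this token or an empty slot */
        while (slots[pos] != 0 &&
               strcmp(entries[slots[pos] - 1].token, tokens[i]) != 0)
            pos = (pos + 1) & mask;

        if (slots[pos] == 0) {          /* first time we see this token */
            entries[used].token = tokens[i];
            entries[used].count = 0;
            slots[pos] = ++used;
        }
        entries[slots[pos] - 1].count++;
    }

    free(slots);
    *ndistinct = used;
    return entries;
}
```

With at most 10K tokens per BUN the table stays small, and the entries array
could then be appended to the result BAT in one pass.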

Thanks, Roberto