aggregate function in MAL

Stefan Manegold Stefan.Manegold at cwi.nl
Fri Jul 5 13:32:46 CEST 2013


On Fri, Jul 05, 2013 at 10:08:31AM +0000, Scott Mathieson wrote:
> Hi,
> 
> I am trying to create a simple aggregate function entirely in MAL and am now struggling at the final hurdle. I have created the aggregate function such that it aggregates across an entire table but in the sub aggregate function I can't work out how to link the group to data BAT (g) to the data. The aggregate is designed to simply collect a sample of the values from the named column and return them as a pipe delimited string value. Can anyone tell me how to use the 'g' bat to join to the 'b' bat in the sub... aggregate function to get only the data rows for a specific group - currently I only get 2 rows of data per group even though I know that there should be 100k+ rows in each group?
> 
> Here's the code:
> 
> 
> function bin_sample(b:bat[:oid, :any]):str;
>        value := "";
>        bsamp := sample.uniform(b, 1000:wrd);
>        barrier (h, t) := iterator.new(bsamp);
>               tmp := "" + t;
>               type := bat.getTailType(bsamp);
>               test := str.startsWith(type, "str");
>               # Force the string type to be cast to a string or you just end up with numeric values
>               barrier xxxx := test;
>                   tmp := str.str(t);
>               exit xxxx;
> 
>               value := value + tmp;
>               value := value + "||";
>               redo (h, t) := iterator.next(bsamp);
>        exit (h, t);
>        return value;
> end bin_sample;

why not simply cast any type to string?
i.e.,

function bin_sample(b:bat[:oid, :any]):str;
       value := "";
       bsamp := sample.uniform(b, 1000:wrd);
       barrier (h, t) := iterator.new(bsamp);
              tmp := calc.str(t);
              value := value + tmp;
              value := value + "||";
              redo (h, t) := iterator.next(bsamp);
       exit (h, t);
       return value;
end bin_sample;

or even better, do the cast "in bulk":

function bin_sample(b:bat[:oid, :any]):str;
       value := "";
       bsamp := sample.uniform(b, 1000:wrd);
       bsampstr := batcalc.str(bsamp);
       barrier (h, t) := iterator.new(bsampstr);
              value := value + t;
              value := value + "||";
              redo (h, t) := iterator.next(bsampstr);
       exit (h, t);
       return value;
end bin_sample;

> 
> 
> function subbin_sample(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit) :bat[:oid,:str];
>        bn:=bat.new(:oid, :str);
> 
>        # check 'e' has some values - this is the list of group IDs in the head
>        # 'g' is the group to the data BAT head ID mapping
>        # 'b' is the data column BAT that we're aggregating over
>        barrier (grpid, t) := iterator.new(e);
> 
>               # create a single row BAT of the group id we're processing
>               bn_gid := bat.new(:oid, :oid);
>               bat.insert(bn_gid, grpid, grpid);
> 
>               # join this to the GID->DATA map BAT to get the Group DATA oids
>               b_g2dr := bat.reverse(g);
>               b_gid2data := algebra.leftjoin(bn_gid, b_g2dr);
> 
>               grpval := "";
> 
>               # Get the BAT of the actual data to be processed
>               #
>               # !!! HOW TO GET THIS TO WORK????????
>               #
>               b_data := algebra.join(b_gid2data, b);
>               grpval := bin_sample(b_data);
> 
>               # Store the result for this group
>               bat.insert(bn, grpid, grpval);
>               redo (grpid, t) := iterator.next(e);
>        exit (grpid, t);
>        return bn;
> 
> end subbin_sample;

what about:

function subbin_sample(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit) :bat[:oid,:str];
       nw := aggr.count(e);
       nl := calc.lng(nw);
       bn := bat.new(:oid, :str, nl);

       # check 'e' has some values - this is the list of group IDs in the head
       # 'g' is the group to the data BAT head ID mapping
       # 'b' is the data column BAT that we're aggregating over
       barrier (grpid, t) := iterator.new(e);

              # select GID from TID->GID map to get matching TIDs
              TIDs := algebra.uselect(g,grpid);

              # get DATA for matching TIDs
              b_data := algebra.kintersect(b,TIDs);

              # aggregate
              grpval := bin_sample(b_data);

              # Store the result for this group
              bat.insert(bn, grpid, grpval);

              redo (grpid, t) := iterator.next(e);
       exit (grpid, t);
       return bn;

end subbin_sample;

Stefan

> Scott Mathieson,
> 
> ________________________________
> 

> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> http://mail.monetdb.org/mailman/listinfo/users-list


-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
|  www.CWI.nl/~manegold  | Science Park 123 (L321) |
|   +31 (0)20 592-4212   | 1098 XG Amsterdam  (NL) |



More information about the users-list mailing list