Support for multi-column aggregate functions

Anthony Damico ajdamico at gmail.com
Sat Apr 30 07:11:08 CEST 2016


if you are able to figure out how to define that function, i would
appreciate if you could share it.  thanks

On Fri, Apr 29, 2016 at 2:43 PM, Anderson, David B <
david.b.anderson at citi.com> wrote:

> Thanks. I will look at these.
>
>
>
> My problem is that my data set is ~1 billion records, and I need to group
> by several dimensions (so I have ~2000 composite factors) computing
> weighted averages for some fields along with weighted percentiles
> (0,25,50,75,100) for a smaller subset of the fields. Currently, I’m looking
> at the MonetDB internals to see if this is possible.
>
>
>
> A “perfect” query would look like:
>
>
>
> SELECT
>
> sum(a*w)/sum(w) as wa_a,
>
> sum(b*w)/sum(w) as wa_b,
>
> weightedpercentile(a,w,0.25) as wp25_a,
>
> weightedpercentile(a,w,0.50) as wp50_a
>
> FROM
>
> mytable
>
> GROUP BY c,d HAVING sum(w) > 0
>
>
>
>
>
>
>
> *From:* users-list [mailto:users-list-bounces+david.b.anderson=
> citi.com at monetdb.org] *On Behalf Of *Anthony Damico
> *Sent:* Friday, April 29, 2016 9:13 AM
>
> *To:* Communication channel for MonetDB users
> *Subject:* Re: Support for multi-column aggregate functions
>
>
>
> two alternatives (which might require some disentangling) are
>
> install.packages("sqlsurvey", repos="http://R-Forge.R-project.org
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__R-2DForge.R-2Dproject.org&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=jRfVH1kUI4OVngyLt6WSKZEFTBpSeL1GOKOpjRQQAO4&e=>
> ")
>
> sqlsurvey:::svyquantile.sqlrepsurvey
>
> sqlsurvey:::svyquantile.sqlsurvey
>
>
>
>
>
> On Fri, Apr 29, 2016 at 8:59 AM, Anthony Damico <ajdamico at gmail.com>
> wrote:
>
> david, these queries get pretty close to wtd.quantile and svyquantile on a
> 32-record data set.  if you have lots of records, i think it'd perform even
> better?
>
>
>
>
> pctiles <- seq( 0 , 1 , .01 )
>
> library(DBI)            # load the DBI package (implements the R-database
> coding)
> library(MonetDB.R)        # load the MonetDB.R package (connects r to a
> monet database)
> library(MonetDBLite)    # load MonetDBLite package (creates database files
> in R)
>
> db <- dbConnect( MonetDBLite() )
>
> # requires a unique identifier because i could not get row_number() working
> # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_bugzilla_show-5Fbug.cgi-3Fid-3D4000&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=OdhZqkKiM9K6szPKPt2QsjZQ423hv1Fawe-p_zS5EyM&e=>
> mtcars$id <- seq( nrow( mtcars ) )
>
> # store mtcars
> dbWriteTable( db , 'mtcars' , mtcars )
>
>
> sql <-
>     "SELECT
>         t1.id
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__t1.id&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=4-JSp4ejJ7-Q1lgKW66qDGHvMTaPfTcb-Br0MuJaidM&e=>
> ,
>         t1.hp ,
>         t1.wt ,
>         SUM( t2.wt ) AS cumwt
>     FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
>     GROUP BY t1.id
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__t1.id&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=4-JSp4ejJ7-Q1lgKW66qDGHvMTaPfTcb-Br0MuJaidM&e=>
> , t1.hp , t1.wt"
>
> dbGetQuery( db , sql )
>
> sql2 <-
>     paste("( SELECT
>             row_number() over () AS id ,
>             wt ,
>             cumwt,
>             totwt,
>             hp
>     FROM
>             mtcars AS td
>             NATURAL JOIN
>             (
>                     " , sql , "
>             ) AS cj
>             ,
>             (
>                     SELECT
>                             SUM(wt) AS totwt
>                     FROM
>                             mtcars
>             ) AS tj ) ")
>
>
> dbGetQuery( db , sql2 )
>
> sql3 <-
>     paste(
>         "select pcs.percentile ,
>             min(case when cumwt >= totwt * pcs.percentile then hp end)
>             from (" , sql2 , ") AS v ,
>      (" ,
>
>      paste( "select" , pctiles , " as percentile" , collapse = " union all
> " ) ,
>      " ) pcs group by pcs.percentile" )
>
>
> library(Hmisc)
> library(survey)
> ms <- svydesign(~1,data=mtcars,weight=~wt)
>
> cbind( dbGetQuery( db , sql3 ),
> wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
> as.numeric(svyquantile(~ hp,ms,pctiles)))
>
>
>
>
>
>
>
>
>
> On Wed, Apr 27, 2016 at 11:07 AM, Anderson, David B <
> david.b.anderson at citi.com> wrote:
>
> I was hoping to use a native/low level implementation for high performance
> (my data set is on the order of 2.  It looks like multiple column
> aggregates are supported (corr & covar) so in theory I could add a new
> function.
>
>
>
> Poking around in the archives, it looks like I need to add a new MAL and
> link it to a C implementation.
>
>
>
> https://www.monetdb.org/pipermail/developers-list/2015-December/004553.html
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_pipermail_developers-2Dlist_2015-2DDecember_004553.html&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=cMiWT4sRgzhb9yWo1raPOchSYFxsA4Bu4rH6nZsXv3I&e=>
>
>
>
> Are there any examples on how to do this?
>
>
>
> *From:* users-list [mailto:users-list-bounces+david.b.anderson=
> citi.com at monetdb.org] *On Behalf Of *Anthony Damico
> *Sent:* Wednesday, April 27, 2016 9:47 AM
> *To:* Communication channel for MonetDB users
> *Subject:* Re: Support for multi-column aggregate functions
>
>
>
> possible to translate one of these?
>
>
>
> http://stackoverflow.com/questions/16450779/another-approach-to-percentiles
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_16450779_another-2Dapproach-2Dto-2Dpercentiles&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=VPKaqqy1Sox6hJP3RD8x9lYAWs4sDr5tN4jtb0tElwI&s=4T2ole-AWFy74ddKmo_2X7-1Okti6t0SGcrP3Pg1ttw&e=>
>
>
>
> On Wednesday, April 27, 2016, Anderson, David B <david.b.anderson at citi.com>
> wrote:
>
> I am really looking for an implementation of a weighted percentile within
> a group.
>
> -----Original Message-----
> From: Anderson, David B [ICG-MKTS]
> Sent: Wednesday, April 27, 2016 9:18 AM
> To: users-list at monetdb.org
> Subject: Support for multi-column aggregate functions
>
> All,
>
> Does the current version of MonetDB support multi-column aggregate
> functions? I need to create a function which returns the "quantiles" of one
> column using another column as a weighting. It looks like PostgreSQL
> supports multi-column aggregates as of 8.2.
>
> I could write an embedded R function to compute this, but want to know if
> it is even possible to pass in multiple columns.
>
> Thanks,
> Dave
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=VPKaqqy1Sox6hJP3RD8x9lYAWs4sDr5tN4jtb0tElwI&s=T43LwaiW3foyJ1zGqAUQo9P1My84NzY92cS5wHK1xeA&e=>
>
>
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=FvzQxF2GuIV0fh_pivsgjq0sdmigvtu-gXDX9TmH4ek&e=>
>
>
>
>
>
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/users-list/attachments/20160430/6b17a4f9/attachment.html>


More information about the users-list mailing list