hi, yes!  this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations




    # run this line once
    # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))


    pctiles <- seq( 0 , 1 , .01 )

    library(DBI)            # load the DBI package (implements the R-database coding)
    library(MonetDBLite)    # load MonetDBLite package (creates database files in R)

    db <- dbConnect( MonetDBLite() )

    # requires a unique identifier because i could not get row_number() working
    # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
    mtcars$id <- seq( nrow( mtcars ) )

    # store mtcars
    dbWriteTable( db , 'mtcars' , mtcars )


    sql <-
        "SELECT
            t1.id,
            t1.hp ,
            t1.wt ,
            SUM( t2.wt ) AS cumwt
        FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
        GROUP BY t1.id , t1.hp , t1.wt"

    dbGetQuery( db , sql )

    sql2 <-
        paste("( SELECT
                row_number() over () AS id ,
                wt ,
                cumwt,
                totwt,
                hp
        FROM
                mtcars AS td
                NATURAL JOIN
                (
                        " , sql , "
                ) AS cj
                ,
                (
                        SELECT
                                SUM(wt) AS totwt
                        FROM
                                mtcars
                ) AS tj ) ")


    dbGetQuery( db , sql2 )

    sql3 <-
        paste(
            "select pcs.percentile ,
                min(case when cumwt >= totwt * pcs.percentile then hp end)
                from (" , sql2 , ") AS v ,
         (" ,
         
         paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
         " ) pcs group by pcs.percentile" )


    library(Hmisc)
    library(survey)
    ms <- svydesign(~1,data=mtcars,weight=~wt)


    cbind( dbGetQuery( db , sql3 ),
    wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
    as.numeric(svyquantile(~ hp,ms,pctiles)),
    as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))





    cbind( dbGetQuery( db , sql3 )[26,],
    wtd.quantile( mtcars$hp , mtcars$wt , .25 ) ,
    as.numeric(svyquantile(~ hp,ms,.25)),
    as.numeric(svyquantile(~ hp,ms,.25,method='constant')))


    cbind( dbGetQuery( db , sql3 )[51,],
    wtd.quantile( mtcars$hp , mtcars$wt , .5 ) ,
    as.numeric(svyquantile(~ hp,ms,.5)) ,
    as.numeric(svyquantile(~ hp,ms,.5,method='constant')))


    cbind( dbGetQuery( db , sql3 )[76,],
    wtd.quantile( mtcars$hp , mtcars$wt , .75 ) ,
    as.numeric(svyquantile(~ hp,ms,.75)),
    as.numeric(svyquantile(~ hp,ms,.75,method='constant')))



On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile.
Now I understand why you need the cumsum for this.

Thank you.



On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico <ajdamico@gmail.com> wrote:
might be able to use this.. will try..  http://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum


On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico <ajdamico@gmail.com> wrote:
thanks stefan (and also kai jiang)!  i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead.  the `cumjobs` column needs to be cumulative for a weighted percentile to work--   is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach-to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Anthony,

while MonetDB indeed does not support this syntax,
I'm also puzzled what kind of semantic or result you'd expect from this syntax?

    SELECT
        batch_id,
        job_count,
        SUM(job_count) OVER (PARTITION BY duration) as cumjobs,
        SUM(job_count) over () as totjobs,
        duration
    FROM
        test_data ;

The only possible result would IMHO be a
table with as many rows as the input table test_data has,
columns batch_id, job_count, duration identical to the input,
one additional column cumjobs with the same value, i.e., sum
of all job_count within each duration, replicated for all rows
with same duration,
and one additional column totjobs, with the same value (global
sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT
        batch_id,
        job_count,
        cumjobs,
        totjobs,
        duration
FROM
        test_data AS td
        NATURAL JOIN
        (
                SELECT
                        duration,
                        SUM(job_count) AS cumjobs
                FROM
                        test_data
                GROUP BY
                        duration
        ) AS cj
        ,
        (
                SELECT
                        SUM(job_count) AS totjobs
                FROM
                        test_data
        ) AS tj
;


Best,
Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

> No, indeed, this syntax is not supported.
>
>
> On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hmm, that gives me the same error..
>
> SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS
> tot_wgt, hp FROM mtcars;
>
>
> Error in .local(conn, statement, ...) :
> Unable to execute statement 'SELECT
> mpg,
> wt,
> SUM(wt) OVER (PARTITION BY hp) AS cum_wgt,
> SUM(wt) over () AS tot_wgt,
> hp
> FROM mtc...'.
> Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
>
>
> sorry if i'm doing something silly..
>
>
>
>
> On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia <
> roberto.cornacchia@gmail.com > wrote:
>
>
>
> I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
>
> Otherwise where would the groups for the sum come from?
>
> Roberto
>
> On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hi, monetdb does not support SUM() OVER commands.. does anyone have a smart
> alternative to implement this? thanks
>
>
> SELECT
> batch_id,
> job_count,
> SUM(job_count) OVER (ORDER BY duration) as cumjobs,
> SUM(job_count) over () as totjobs,
> duration
> FROM
> test_data ;
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list