sql - Neither percentile_cont nor percentile_disc calculates the desired 75th percentile in PostgreSQL 9.6.3


I'm working with percentile functions but not getting the desired output. I'm not saying the functions are "incorrect". They are working as intended, and I'm just not understanding them properly.

These are the numbers I'm working with:

n = 32  160000 202800 240000 250000 265000 280000 285000 300000 300000 300000 300000 300000 309000 325000 350000 358625 364999.92 393750 400000 420000 425000 450000 450000 463500 475000 475000 505808 525000 550000 567300 665000 900000 

My understanding of percentile_cont is that, if the count is even, it takes the 2 middle numbers, adds them, and divides by two. My understanding of percentile_disc is that it selects the lower of the two numbers if the count is even.

This is my understanding of calculating a percentile, using the 50th (the median) as an example:

If the number of numbers (n) is odd, pick the number in the middle; if it is even, average the 2 numbers in the middle. In this case there are 32 numbers, so the median = (358625 + 364999.92) / 2 = 361812.46. percentile_cont returns the correct value, since it averages the 2 values; percentile_disc returns an incorrect value, since it picks the lower of the two.

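For other percentiles, the 10th for example, my understanding is that you multiply the percentile by the number of numbers (n) to get an index: .10 * 32 = 3.2 is the index in this case. You are then supposed to round up to the nearest whole number, and the value at that position is the percentile. If the index is a whole number, you average the number at that index with the number right after it.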
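A minimal Python sketch of the rank rule just described follows. It multiplies p by n, rounds a fractional index up, and averages with the next value when the index is whole. The helper name `percentile_averaging` is hypothetical. Exact fractions are used only so that a product like 0.10 * 32 doesn't pick up floating-point error before the whole-number check.

```python
import math
from fractions import Fraction

# The 32 prices from the question, in ascending order.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_averaging(values, p):
    """Hypothetical helper implementing the rank rule described in the question."""
    xs = sorted(values)
    n = len(xs)
    # Exact arithmetic: Fraction("0.1") * 32 is exactly 16/5, not 3.2000000000000002.
    h = Fraction(str(p)) * n          # 1-based index
    if h.denominator != 1:
        # Fractional index: round up and take the value at that position.
        return xs[math.ceil(h) - 1]
    k = int(h)
    if k == 0:
        return xs[0]                  # p == 0: nothing before the first value
    if k == n:
        return xs[-1]                 # p == 1: no "next" value to average with
    # Whole index: average the value at position k with the one right after it.
    return (xs[k - 1] + xs[k]) / 2


for p in (0.10, 0.25, 0.50, 0.65, 0.75, 0.90):
    print(p, percentile_averaging(PRICES, p))
```

With these prices the loop gives 250000 for the 10th, about 361812.46 for the median, and 469250.0 for the 75th, which are the values the question expects.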

In this case, percentile_cont is wrong because it returns 251500, which isn't a number I can arrive at. The closest I can get is averaging 240000, 250000, and 265000, which gives 251666.67. percentile_disc returns the correct result of 250000.

But the real kicker is this one: the 75th. According to my calculations it should return 469250. The index is 32 * .75 = 24, and since that is a whole number, the result should be (463500 + 475000) / 2 = 469250. percentile_disc returns 463500, and percentile_cont returns 466375, which, again, I can't arrive at for the life of me.
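The 463500 (and the 250000 for the 10th) matches PostgreSQL's documented definition of percentile_disc. It returns the first input value whose position in the ordering equals or exceeds the requested fraction, i.e. the value at 1-based position ceil(p * n), with no averaging and no special case for even n. Below is a small Python model of that description. It is not PostgreSQL's actual code, and the name `percentile_disc_model` is hypothetical.

```python
import math
from fractions import Fraction

# The 32 prices from the question, in ascending order.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_disc_model(values, p):
    """Model of percentile_disc: first value whose 1-based position k has k / n >= p."""
    xs = sorted(values)
    n = len(xs)
    # The smallest such k is ceil(p * n); clamp to 1 so that p == 0 returns the first value.
    k = max(1, math.ceil(Fraction(str(p)) * n))
    return xs[k - 1]


for p in (0.10, 0.50, 0.75):
    print(p, percentile_disc_model(PRICES, p))
```

For the median this model picks position 16 (358625), the lower of the two middle values. That is why percentile_disc appears to "pick the lowest" when n is even. For the 75th, ceil(0.75 * 32) = 24 lands exactly on 463500 and never looks at the 25th value.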

This is my query:

    select
        itemcode,
        count(itemcode) as n,
        percentile_disc(0.10) within group (order by price) as "10th",
        percentile_disc(0.25) within group (order by price) as "25th",
        percentile_cont(0.50) within group (order by price) as median,
        avg(price) as mean,
        percentile_disc(0.65) within group (order by price) as "65th",
        percentile_disc(0.75) within group (order by price) as "75th",
        percentile_disc(0.90) within group (order by price) as "90th"
    from items
    where itemcode = 26
      and removed is null
    group by itemcode;

Note: there are no cases where removed is not null.

What do I need to do to get this working correctly and consistently? Do I need to write a function that checks n first and then decides between percentile_disc and percentile_cont based on whether n is even or odd?

SQL Fiddle: http://sqlfiddle.com/#!17/aa09c/9

I posted the question on Reddit, and someone was able to help.

Apparently the percentile_cont function, like the PERCENTILE and PERCENTILE.INC functions in Excel, calculates using the C = 1 variant of linear interpolation explained on Wikipedia:
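Here is a minimal Python model for checking that explanation. It computes the C = 1 variant, where the 1-based rank is (n - 1) * p + 1, and interpolates linearly between the two neighbouring values. I'm assuming this matches percentile_cont. It reproduces the 251500, 361812.46 and 466375 reported above, but it is not PostgreSQL's source. The name `percentile_cont_model` is hypothetical.

```python
import math
from fractions import Fraction

# The 32 prices from the question, in ascending order.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_cont_model(values, p):
    """C = 1 linear interpolation between the two values around rank (n - 1) * p + 1."""
    xs = sorted(values)
    n = len(xs)
    # 0-based fractional position, equivalent to the 1-based rank (n - 1) * p + 1.
    h = Fraction(str(p)) * (n - 1)
    lo, hi = math.floor(h), math.ceil(h)
    lower, upper = Fraction(xs[lo]), Fraction(xs[hi])
    # Move (h - lo) of the way from the lower neighbour towards the upper one.
    return float(lower + (h - lo) * (upper - lower))


for p in (0.10, 0.50, 0.75):
    print(p, percentile_cont_model(PRICES, p))
```

For the 10th percentile the position is 31 * 0.10 = 3.1 (0-based), so the result is one tenth of the way from 250000 to 265000, which is 251500.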

https://en.wikipedia.org/wiki/percentile#second_variant.2c_.7f.27.22.60uniq--postmath-00000043-qinu.60.22.27.7f

Apparently I have been using what is called the empirical distribution function with averaging.
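Side by side for the 75th percentile, the two definitions differ only in where the rank h lands. Here n = 32 and x_(1) ≤ … ≤ x_(32) are the sorted prices. The first line is the averaging rule from the question and the second is the C = 1 rule described for percentile_cont.

$$
\begin{aligned}
\text{averaging:}&\quad h = np = 32 \times 0.75 = 24, &&Q = \tfrac{1}{2}\left(x_{(24)} + x_{(25)}\right) = \tfrac{1}{2}(463500 + 475000) = 469250\\
C = 1\text{:}&\quad h = (n-1)p + 1 = 31 \times 0.75 + 1 = 24.25, &&Q = x_{(24)} + 0.25\left(x_{(25)} - x_{(24)}\right) = 463500 + 0.25 \times 11500 = 466375
\end{aligned}
$$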

So the native PostgreSQL functions won't do this, and I need to write a custom function; I'll post it when it's done. (I suspect I could use the old ntile method from before 9.4, but I'm still looking into it.)

But anyway, that's why the results are off.

