Neither percentile_cont nor percentile_disc is calculating the desired 75th percentile in PostgreSQL 9.6.3

I'm working with the percentile functions and I'm not getting the desired output. I say "incorrect" below, but the functions are probably working as intended, and I'm just not understanding them properly.
These are the numbers I'm working with:

n = 32
160000 202800 240000 250000 265000 280000 285000 300000 300000 300000 300000 300000 309000 325000 350000 358625 364999.92 393750 400000 420000 425000 450000 450000 463500 475000 475000 505808 525000 550000 567300 665000 900000
My understanding of percentile_cont is that it averages the two middle numbers when the count is even: add them and divide by two. My understanding of percentile_disc is that it selects the lower of the two numbers when the count is even.
This is my understanding of calculating a percentile, using the 50th (median) as an example: if the number of numbers (n) is odd, you pick the number in the middle; if it is even, you average the 2 numbers in the middle. In this case there are 32 numbers, so the median = (358625 + 364999.92) / 2 = 361812.46. percentile_cont returns the correct value since it averages the 2 values; percentile_disc returns the incorrect value since it picks the lower of the two.
Regarding other percentiles, the 10th for example, my understanding is that you multiply the percentile by the number of numbers (n) to get the index: .10 * 32 = 3.2 is the index in this case. You're supposed to round up to the nearest whole number, and the value at that index is the percentile value. If the index is a whole number, you average the number at that index and the number right after it.

In this case, percentile_cont is wrong because it returns 251500, which isn't a number I can arrive at. The closest I can get is averaging 240000, 250000, and 265000, which gives 251666.67. percentile_disc returns the correct result of 250000.
But the real kicker is this one: the 75th. It should return 469250 according to my calculations. The index = (32 * .75) = 24 is a whole number, so it should result in (463500 + 475000) / 2 = 469250. percentile_disc returns 463500; percentile_cont returns 466375, which, again, I can't arrive at for the life of me.
This is my query:

select itemcode,
       count(itemcode) as n,
       percentile_disc(0.10) within group (order by price) as "10th",
       percentile_disc(0.25) within group (order by price) as "25th",
       percentile_cont(0.50) within group (order by price) as median,
       avg(price) as mean,
       percentile_disc(0.65) within group (order by price) as "65th",
       percentile_disc(0.75) within group (order by price) as "75th",
       percentile_disc(0.90) within group (order by price) as "90th"
from items
where itemcode = 26 and removed is null
group by itemcode;

Note: there are no cases where removed is not null.
What do I need to do to get this working correctly and with consistency? Do I need to write a function that checks n first before deciding on percentile_disc or percentile_cont based on whether n is even or odd?
sql fiddle: http://sqlfiddle.com/#!17/aa09c/9
I posted this question on Reddit and someone there was able to help.
Apparently, the percentile_cont function, like the PERCENTILE and PERCENTILE.INC functions in Excel, calculates using the C = 1 variant of linear interpolation explained on Wikipedia.
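To make that concrete, here is a minimal Python sketch of the two calculations. It is not PostgreSQL's implementation. The Python functions `percentile_cont` and `percentile_disc` are stand-ins I wrote for illustration, assuming the C = 1 rule (fractional rank = p * (n - 1), counted from zero) for the continuous case and "first value whose position reaches p * n" for the discrete case. Under those assumptions it reproduces the numbers from the question.

```python
import math

# The 32 prices from the question, already in ascending order.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_cont(values, p):
    """Linear interpolation between closest ranks (C = 1 variant)."""
    xs = sorted(values)
    rank = p * (len(xs) - 1)      # zero-based fractional rank
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo              # how far to move from xs[lo] toward xs[hi]
    return xs[lo] + frac * (xs[hi] - xs[lo])


def percentile_disc(values, p):
    """First value whose one-based position is at least p * n."""
    xs = sorted(values)
    k = max(math.ceil(p * len(xs)), 1)   # one-based index
    return xs[k - 1]


print(round(percentile_cont(PRICES, 0.75), 2))  # 466375.0
print(round(percentile_cont(PRICES, 0.10), 2))  # 251500.0
print(percentile_disc(PRICES, 0.75))            # 463500
print(percentile_disc(PRICES, 0.10))            # 250000
```

For the 75th percentile the fractional rank is 0.75 * 31 = 23.25, so the result sits a quarter of the way from 463500 to 475000, which is 466375. For the 10th it is 0.10 * 31 = 3.1, a tenth of the way from 250000 to 265000, which is 251500.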
Apparently, what I have been using is called the empirical distribution with averaging.

So the native functions of PostgreSQL won't work for me, and I need to make a custom function, which I'll post when I'm done. (I suspect I can use the old ntile method from before 9.4, but I'm still looking into it.)

But anyway, that is why I was off.
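Until the PostgreSQL version exists, here is a rough Python sketch of the rule I had been applying by hand, just to pin down the logic. The function name `percentile_avg` is made up for this example, and the handling of the extreme ends (index 0 or n) is my own assumption. The index is n * p; if it is a whole number, average the value at that index and the next one, otherwise round up and take that value.

```python
import math
from fractions import Fraction

# The 32 prices from the question, already in ascending order.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_avg(values, p):
    """Empirical distribution with averaging (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    # Use an exact fraction so that e.g. 0.75 * 32 is recognised as whole.
    pos = Fraction(str(p)) * n           # one-based index, possibly fractional
    if pos.denominator == 1:             # index is a whole number
        k = int(pos)
        if k == 0:                       # assumption: clamp at the low end
            return xs[0]
        if k >= n:                       # assumption: clamp at the high end
            return xs[-1]
        return (xs[k - 1] + xs[k]) / 2   # average index k and k + 1
    return xs[math.ceil(pos) - 1]        # otherwise round the index up


print(percentile_avg(PRICES, 0.75))            # 469250.0
print(percentile_avg(PRICES, 0.10))            # 250000
print(round(percentile_avg(PRICES, 0.50), 2))  # 361812.46
```

Note that the switch between averaging and picking a single value depends on whether n * p is a whole number, not on whether n itself is even or odd, so checking n alone to choose between percentile_disc and percentile_cont would not give this result.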