I’m getting into the Quantile section and there is something that is troubling me:
According to the material:
“the yth percentile is the value at or below which y percent of observations lie”.
Going from there, if I have 50 observations and I want the 10th percentile, I should look at observation number 5 (50*10%), at or below which 10% of my observations lie.
However, in the material’s example they use the linear interpolation formula and get (50+1)*(10/100) = 5.1, which is different from 5.
I understand the theory and why this formula is used when you cannot find an exact percentile (the 15th percentile here, for example), but I don’t understand the logic in the example above.
What troubles me even more is that if you take the 50th percentile, the formula gives you the median: (50+1)*(50/100) = 25.5, which is halfway between positions 25 and 26, i.e. (25+26)/2, so the formula is correct there.
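To make the arithmetic above concrete, here is a minimal Python sketch of the location formula from the material, L = (n+1)*y/100. The function name `percentile_location` is my own and not something from the curriculum.

```python
def percentile_location(n, y):
    """1-based location of the y-th percentile among n sorted observations."""
    # Multiply before dividing: this avoids the rounding noise that
    # (n + 1) * (y / 100) can introduce for values such as y = 10.
    return (n + 1) * y / 100


print(percentile_location(50, 10))  # 5.1
print(percentile_location(50, 50))  # 25.5
```

Note that the result is a position in the sorted data, not the percentile value itself.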
I think this is like counting the years of a project. Consider the following:
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
How many years are displayed? 2017 - 2008, or 2017 - 2008 + 1?
From the beginning of 2008 to the end of 2017 there are 10 years, not 9. So you need to adjust the count with the +1.
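If you want to check the count yourself, a quick Python sketch (the variable name `years` is just for illustration):

```python
years = list(range(2008, 2018))  # 2008 through 2017 inclusive

print(len(years))       # 10
print(2017 - 2008)      # 9: counts the gaps between years, not the years
print(2017 - 2008 + 1)  # 10: counts both endpoints
```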
Since the quantile location calculated as (n+1)*y/100 is a measure of position, we need the +1 to correctly count the first or the last observation as an observation too.
In your 50-observation example, the 10th percentile location must be 5.1 to correctly include 5 observations, and the 50th percentile location is 25.5 to correctly include 25 observations. The decimal part in both cases gives you the interpolation you need to get exactly 10% and 50% of the observations inside those percentiles.
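Here is a rough Python sketch of how the decimal part of the location drives the interpolation. It assumes hypothetical data where the k-th observation simply has the value k, and the function name `percentile_value` and the clamping at the ends are my own choices, not something taken from the material.

```python
import math


def percentile_value(sorted_obs, y):
    """Interpolated y-th percentile using the location L = (n + 1) * y / 100."""
    n = len(sorted_obs)
    loc = (n + 1) * y / 100      # 1-based position, possibly fractional
    loc = min(max(loc, 1), n)    # assumption: clamp to the first/last observation
    lower = math.floor(loc)      # whole part: position of the observation just below
    frac = loc - lower           # decimal part: how far to move toward the next one
    if lower == n:               # already at the last observation
        return sorted_obs[-1]
    below, above = sorted_obs[lower - 1], sorted_obs[lower]
    return below + frac * (above - below)


obs = list(range(1, 51))  # hypothetical data: the k-th observation has value k

print(percentile_value(obs, 10))  # approximately 5.1
print(percentile_value(obs, 50))  # 25.5
```

With real data the two neighbouring observations will not be one unit apart, so the decimal part is applied to whatever gap lies between them.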
I spent a good amount of time agonizing over this same question. I ended up reaching out to a math professor. He said that quantiles, and specifically percentiles, can be very tricky as there is no standard definition.
The CFA curriculum defines the percentile “y” as the value AT or BELOW which “y” percent of the distribution lies. Let’s call this “definition one.”
There’s an alternative definition that says the percentile “z” corresponds to the smallest value within the distribution that is greater than “z” percent of the distribution. Let’s call this “definition two.”
Here is some numerical data to help demonstrate what’s going on here:
By the first definition (the smallest value greater than or equal to 25% of the scores), the 25th percentile would be 5. That’s because rank positions 1 and 2 represent 25% of the total number of data points.
By the second definition (the smallest value greater than 25% of the scores), the 25th percentile would be 7.
There’s a third procedure for finding the percentile and it happens to be the one used by the CFA. The formula is familiar: L = (n + 1) * y/100
This procedure is the WEIGHTED AVERAGE of the first two definitions. (Follow the link below if you need help convincing yourself of that point.)
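If it helps, here is a small Python sketch that compares the three approaches. The eight scores are hypothetical (they are not the original data), chosen only so that rank 2 is 5 and rank 3 is 7, matching the values quoted above. The function names are made up for illustration.

```python
scores = [3, 5, 7, 8, 9, 11, 13, 15]  # hypothetical sorted scores, NOT the original data
n = len(scores)
y = 25


def definition_one(data, y):
    """Smallest value with at least y percent of the data at or below it."""
    return min(v for v in data if sum(x <= v for x in data) * 100 >= y * len(data))


def definition_two(data, y):
    """Smallest value that is strictly greater than at least y percent of the data."""
    return min(v for v in data if sum(x < v for x in data) * 100 >= y * len(data))


loc = (n + 1) * y / 100          # 2.25
frac = loc - int(loc)            # 0.25
d1 = definition_one(scores, y)   # 5
d2 = definition_two(scores, y)   # 7
weighted = (1 - frac) * d1 + frac * d2

print(d1, d2, loc, weighted)     # 5 7 2.25 5.5
```

This only checks one percentile on one made-up data set, so treat it as an illustration of the weighted-average idea rather than a proof.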