Adding 1 to n to calculate the position of the observation

LOS 7.f: Calculate Quantiles

The formula uses the expression (n+1) times y over 100, i.e. (n+1) * y/100.

Is there any intuition behind adding 1 to the sample size n? Do I have more DoF available? Or is it pure calculus?

Can you simplify?

Thx

Bedee

Just a thought. So suppose we focus on the Median (50th percentile) => y=50

If you had an odd number of ordered observations - e.g.: 2, 3, 4, 5, 7 => n=5

The location of the median is given by (n+1) * 50/100 = (n+1)/2 = (5+1)/2 = 3 => 3rd term (4 in this case)

Similarly now consider a case with an even number of observations - e.g.: 2, 3, 4, 5, 7,10 => n=6

The location of the median is given by (n+1) * 50/100 = (6+1)/2 = 3.5 => Midpoint of term 3 and term 4 (midpoint of 4 and 5 which is 4.5)

So I guess the use of (n+1) ensures the correct location regardless of whether n is odd or even. If you used n/2 instead, for the first series (with an odd number of observations) the location would be 2.5 (midpoint of the second and third terms, i.e. 3.5) and for the second series (with an even number of observations) the location would be 3 (third term, i.e. 4). However, neither of these locations represents the centre or midpoint of the ordered dataset.
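To check the two series above by machine, here is a rough Python sketch. The helper names location and value_at are my own, and reading a fractional location by linear interpolation between the two neighbouring terms is an assumption (it matches the "midpoint" reading used above). It only handles locations between 1 and n.

```python
import statistics


def location(n, y, add_one=True):
    """Location of the y-th percentile in an ordered sample of size n."""
    size = n + 1 if add_one else n
    return size * y / 100


def value_at(sorted_data, loc):
    """Read a 1-based, possibly fractional location by linear interpolation.

    Assumes 1 <= loc <= len(sorted_data).
    """
    lower = int(loc)      # whole part: position of the term just below
    frac = loc - lower    # fractional part: how far towards the next term
    if frac == 0:
        return sorted_data[lower - 1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)


odd = [2, 3, 4, 5, 7]
even = [2, 3, 4, 5, 7, 10]

for data in (odd, even):
    n = len(data)
    with_one = location(n, 50)                     # (n+1) * 50/100
    without_one = location(n, 50, add_one=False)   # n * 50/100
    print(
        f"n={n}: (n+1) gives L={with_one} -> {value_at(data, with_one)}, "
        f"n gives L={without_one} -> {value_at(data, without_one)}, "
        f"statistics.median -> {statistics.median(data)}"
    )

# n=5: (n+1) gives L=3.0 -> 4, n gives L=2.5 -> 3.5, statistics.median -> 4
# n=6: (n+1) gives L=3.5 -> 4.5, n gives L=3.0 -> 4, statistics.median -> 4.5
```

Only the (n+1) version agrees with the library median in both cases.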

A good thought… as far as I am concerned… great thanks to this remote and wonderful place of yours…

Good luck in the exams.

Well, the focus is on n… why do we add 1 instead of using just n? You see, the 1 is added regardless of whether n is even or odd. What does it enable?

Well, I suppose as shown in the example for the median (the 50th percentile), using just n doesn’t provide you with the correct location of the median.

You could perhaps confirm this for some other quantile as well. However, if it’s a formula, it needs to work for any quantile you pick. If using just n doesn’t work for the median (50th percentile), it doesn’t make sense to use just n in the general formula.

In the case of the median, the objective is to find the location of the observation that is the exact midpoint of the ordered dataset. Using just n doesn’t give the precise location - which is the main focus of quantiles in general.

So I think the focus is not on n, but rather on the (correct) location of an observation, below which a given proportion of the dataset lies.

Can you think of a quantile example where Ly = (n+1) * y/100 gives an incorrect location? Unlikely.

Thanks again!

I tried to use the formula for 10 observations to get the 25th percentile. The location L for 25% ==> (10+1) x 25/100 = 2.75. How come? Shouldn’t the location be 2.5?
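For what it's worth, 2.75 is exactly what (n+1) * y/100 gives for n=10 and y=25. The 2.5 comes from 10 x 25/100, i.e. from dropping the +1. Below is a small Python sketch of how 2.75 could be read off. The ten data values are made up for illustration (they are not from this thread), and treating the .75 as "three quarters of the way from the 2nd to the 3rd observation" is the usual linear-interpolation assumption.

```python
import statistics

# Hypothetical ordered sample of 10 observations (invented for illustration).
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
n = len(data)
y = 25

loc = (n + 1) * y / 100        # (10+1) * 25/100 = 2.75
whole, frac = divmod(loc, 1)   # whole part 2.0, fractional part 0.75
whole = int(whole)

lower = data[whole - 1]        # 2nd observation
upper = data[whole]            # 3rd observation
q25 = lower + frac * (upper - lower)   # 75% of the way from 2nd to 3rd

print(loc, q25)                # 2.75 27.5

# Cross-check: statistics.quantiles (Python 3.8+) with its default
# "exclusive" method is also based on n + 1 positions.
print(statistics.quantiles(data, n=4)[0])   # 27.5
```

So the location 2.75 is not an observation number by itself. It points between the 2nd and 3rd ordered observations, and the value there is found by interpolating.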