I am thinking that the 0.4/0.6/0.8 method is not very accurate.

The reason is that we have 6 questions in one item set, and the scoring bands work out as follows:

- <=50: any combination of 0/6, 1/6, 2/6, or 3/6. The average of these scores is 0.25.

- 50-70: only 4/6, i.e. roughly 0.66.

- >70: 5/6 or 6/6; the average is approximately 0.92.

If you can calculate a weighted score based on the above, that would be great!

For example, if a section is weighted 18 marks and you scored between 50 and 70, the estimated score is 18 × 0.66 ≈ 12.

Then the total score is divided by 360.
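To make the proposal concrete, here is a minimal sketch of the weighted calculation (the 18-mark example and the 360-mark total come from the post above; the function and variable names are mine):

```python
# Band multipliers proposed above: <=50 -> 0.25, 50-70 -> 0.66, >70 -> 0.92.
BAND_MULTIPLIER = {"<=50": 0.25, "50-70": 0.66, ">70": 0.92}

def estimated_score(sections, total_marks=360):
    """sections: list of (marks, band) pairs; returns an estimated percentage."""
    earned = sum(marks * BAND_MULTIPLIER[band] for marks, band in sections)
    return 100 * earned / total_marks

# An 18-mark section in the 50-70 band contributes 18 * 0.66 = 11.88,
# i.e. roughly 12 marks, matching the example above.
```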

Thanks

Based on 0.4/0.6/0.8 : 70.6%

Based on 0.25/0.66/0.92 : 78.4%

min: 58.9%

max: 86.7%

You are correct; people go through this analysis every year to get a more precise estimate.

That’s a HUGE difference!

Think about it: when you are doing mock exams, how many mocks do you have to write to improve your score from 70 to 78??

I didn’t do any mock exams from Schweser. Just did the sample essay questions and item set questions from CFAI (which contain actual exam questions).

If people can post their scores based on 0.25/0.66/0.92, that would be great.

A proper method would take into account the expected score given being in one of the three bands.

92%? Really? You think that, conditional upon scoring over 70%, your expected score is 92%? That’s ridiculous. Why are you taking the arithmetic mean?

Gosh, the two obvious ways are to use a prior distribution or a Monte Carlo simulation.

The reason I thought of this is that, using 40/60/80, there were lots of people who failed and people who passed with very similar scores.

I think 40/60/80 overestimates the score for people who got “<=50” and underestimates the score for people who got “>70”.

I read the 40/60/80 thread, and there were people scoring at band 3 who had scores similar to people who passed.

In a 6-question item set, a “<=50” result means you scored 0, 1, 2, or 3 out of 6.

In a 6-question item set, the only way into the “>70” bracket is to score 5/6 or 6/6, which means at least 83.33%; so using 80% understates the score of people in the “>70” band.

Obviously each topic might have more than one item set, but that was the reasoning behind 0.25/0.66/0.92.

Not sure what the logic of 0.4/0.6/0.8 is.

Of course, this system is not perfect, but I think it will give people who failed a more realistic idea of how they scored compared to people who passed.

I also agree that this logic works better for the Level 2 exam, because the Level 3 exam has the essay component.

Hmm.

AM: (0.25 * 16 + 0.66 * 62 + 0.92 * 102) / 180 = 77%

PM: (0.25 * 18 + 0.92 * 162) / 180 = 85.3%

Average: 81.2%

I like this method

You’re not connecting the reasoning to the numbers.

If I flip a fair coin six times, and then tell you that I got either 5 or 6 heads, on that information alone what is the expected number of heads? 5.5? Of course not; it is 36/7, about 5.1.
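For anyone who wants to verify that number, a quick sketch:

```python
from math import comb

# P(k heads) for 6 fair coin flips, then the conditional mean given k >= 5.
n, p = 6, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
band = [5, 6]
cond_expectation = sum(k * pmf[k] for k in band) / sum(pmf[k] for k in band)
# 36/7, about 5.14 -- not the arithmetic mean 5.5
```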

Thanks 1recho!

You did really well to have a huge portion of your scores in the “>70” bucket

+1 The outcomes are more likely binomially distributed, not uniformly distributed as the OP is proposing, but it is still an improvement over the 40/60/80 method. Cue the “what is your 38.10% / 66.67% / 85.71% score?” thread.

Since most of us model for a living, maybe we should create a separate thread just to discuss what is the best way to do this analysis.

I would do it based on a binomial distribution and then updated iteratively.

For any person, if you know p, the probability that they got a question selected at random right, the exercise becomes trivial.

So how to get p? Start with an “initial guess,” you can base it off the 40/60/80 method as it is close enough. Then, using the # of questions (to determine the possible percentages in that band), you can use Bayesian analysis to determine the probability that any of those particular percentages was attained. Then, use those probabilities to get the “expected score” on that section…do that for all sections to get an updated estimate for p.

Now, perform the same process using the updated p to get an even newer estimate…keep on doing this until p converges around a certain point. Then use that value of p to do a final binomial thing, and you have your answer.
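A rough sketch of that loop, assuming every item set has 6 questions and a single underlying p (the band cutoffs and the section inputs are my assumptions, not an exact spec of the method described above):

```python
from math import comb

# Which correct-counts are consistent with each band label.
BANDS = {"<=50": lambda k, n: k / n <= 0.5,
         "50-70": lambda k, n: 0.5 < k / n <= 0.7,
         ">70": lambda k, n: k / n > 0.7}

def conditional_mean(n, p, band):
    """E[correct | band] for one n-question section under Binomial(n, p)."""
    counts = [k for k in range(n + 1) if BANDS[band](k, n)]
    pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in counts}
    total = sum(pmf.values())
    return sum(k * w for k, w in pmf.items()) / total

def iterate_p(sections, p0, tol=1e-9, max_iter=1000):
    """sections: list of (n_questions, band). Iterate until p converges."""
    p = p0
    total_q = sum(n for n, _ in sections)
    for _ in range(max_iter):
        expected = sum(conditional_mean(n, p, band) for n, band in sections)
        p_new = expected / total_q
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p
```

Start p0 from the 40/60/80 estimate, as suggested; the converged p times the total question count gives the expected score.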

^ Why can’t CFAI just give us the numbers?

It’s silly to model when the real information should be available freely.

Fking CFAI likes to make a big deal about everything. “Oooooooooh our exam questions are so so special, if you discuss them we will cut your X off. But we can publish Level III AM questions, that’s OK. Only Level I, II and PM questions are unmentionable.”

Palisoc_xb… Can you give a bit more background on how you got the “38.10% / 66.67% / 85.71%” ?

I started this thread immediately after the “25%/66.67%/92%” idea struck me this morning just as I woke up (still in bed), so I haven’t put too much thought into this.

Kartelite: thanks for your suggestion, but isn’t it difficult to calculate if the only score you have is your own?

1recho, I heard from past candidates that they used to provide rankings (and scores?), but candidates began to compare rankings and scores across years, which may not be meaningful: a 70% score from one year might not mean the same as a 70% score the next year, since exam difficulty changes.

Every year, threads like these end up being about which way is best and whatnot. haha

Btw, my score is:

AM: 83.6%

PM: 86.7%

Total: 85.1%

Sorry. How I got that was by assuming that the scores in the item sets were binomially distributed with N trials and P probability of getting each answer correct, then taking the conditional expectation per band. I just used N=6 and P=50% (a crude assumption). 38.10% is the conditional expectation of the lower band, 66.67% for the middle (as there is only one possible score in the middle band), and 85.71% for the upper band.

I think this is also how kartelite got approx. 5.1 for the upper band; I just divided that by 6 to get 85.71%.

This is only applicable for the PM section, though, as the AM section is quite different.
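For what it’s worth, the three numbers reproduce directly from the stated assumptions (Binomial with N=6, P=50%, conditional expectation within each band):

```python
from math import comb

n, p = 6, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def band_pct(counts):
    """Conditional expected percentage, given the correct-count fell in `counts`."""
    return 100 * sum(k * pmf[k] for k in counts) / (n * sum(pmf[k] for k in counts))

low = band_pct([0, 1, 2, 3])   # <=50 band  -> 38.10%
mid = band_pct([4])            # 50-70 band -> 66.67%
high = band_pct([5, 6])        # >70 band   -> 85.71%
```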

That’s true. But you could get a pretty rough idea if someone ran a huge simulation over lots of scores based on your “initial” 40/60/80 score. As in, given that a 40/60/80 score was X, we can obtain a confidence interval about the score using the iterated method (which should be a good estimate itself).

This would be less accurate as it doesn’t account for the number of questions that were in a section for each particular band.

Maybe we could get a large sample size for the vignette sections and estimate P by using Maximum Likelihood Estimation but still assuming a binomial distribution?
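That could look something like this (a sketch under the stated assumptions: 6-question item sets, one shared P across the sample, and a simple grid search standing in for a proper optimizer):

```python
from math import comb, log

# Correct-counts consistent with each band, for a 6-question item set.
BANDS = {"<=50": [0, 1, 2, 3], "50-70": [4], ">70": [5, 6]}

def band_prob(band, p, n=6):
    """P(score falls in a band) under Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in BANDS[band])

def mle_p(observed_bands, grid=999):
    """Grid-search MLE of p from a sample of observed band labels."""
    best_p, best_ll = None, float("-inf")
    for i in range(1, grid + 1):
        p = i / (grid + 1)
        ll = sum(log(band_prob(b, p)) for b in observed_bands)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p
```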

300hours, may we have a peek at some of your data?

Thanks palisoc_xb.

I like your 38.10% / 66.67% / 85.71% score breakdown.