The latter is the penalized version of the former. It penalizes a model that is too complex relative to the sample size and relative to how much it actually reduces the error variation.
I’m not misleading anyone by introducing the work and guidance of well-trained practicing statisticians. Something being outside your scope doesn’t make it incorrect or misleading.
No one is arguing this is the case. The adjustment in R-squared penalizes a model that is complicated relative to the sample size (read: lots of terms with a small sample), so it encourages parsimony.
Literally, I have seen a 5-20% absolute difference with only a few variables.
R-squared doesn’t penalize for junk: R-squared never decreases (and the sum of squared errors never increases) as you add more terms to the model, irrespective of their statistical utility. This follows directly from how the model is fit. Least squares minimizes the sum of squared errors, the sum of (y - yhat)^2, where yhat is b0 + b1X1 + b2X2 + … + bkXk. The more terms in yhat, the smaller the sum of squared errors will be (at worst it stays the same, since the fit can always set a new coefficient to zero). This is straightforward. That increases R-squared. Junk can increase R-squared, but it will not necessarily increase the adjusted R-squared.
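For reference, here are the usual textbook definitions written out, as a sketch rather than a quote from any particular source. The symbols are assumptions for illustration: n is the number of observations, k is the number of predictors (not counting the intercept), SSE is the sum of squared errors, and SST is the total sum of squares of y around its mean.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2

R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
R^2_{\text{adj}} = 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)}
                 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
```

Adding a term cannot raise SSE, so R-squared cannot fall. But adding a term also shrinks n-k-1, so the adjusted version only goes up if SSE drops by enough to offset that.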
This just simply isn’t true. You need to review how R-squared is calculated.
You explicitly said that R-squared was the only way to tell.
Again, being unfamiliar with the subject doesn’t mean I’m misleading. It means you have some reading to do before continuing to reply based on personal feelings about the subject. I’m speaking strictly from 1) formal education, 2) advice from and discussion with real statisticians, and 3) personal experience and observation, not from my feelings on the subject.
I’m not asking for the formula; that’s not in contention. You have shown us that you didn’t actually conduct a simulation. A good way to do this would be to generate a set of (X, Y) data, with two true X variables, that is known to follow a regression model you specify for the simulation. Add some noise to Y so the relationship isn’t deterministic. Then generate a bunch of other, unrelated x-variables with random values (simulating independent variables that are junk). Calculate R-squared and adjusted R-squared from the multivariable model of Y on X1 and X2, since we know this is the real relationship set in our model. Then, adding the junk x-variables one at a time, refit and calculate the new values of R-squared and adjusted R-squared. (Ideally you would run this 5000 times at a range of sample sizes to see what tends to happen.) That is a simulation study. What you have done is create some numbers that don’t account for fitting a new model with the junk terms, reducing the SSE, and calculating new R-squared values (which is what would actually happen).
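A minimal sketch of the simulation described above, in Python with NumPy. Everything specific here is an assumption chosen for illustration: the coefficients (1, 2, -1.5), the noise scale, n = 30, ten junk variables, and 500 replications (bump `n_reps` to 5000 and vary `n` for the fuller study). The function names are hypothetical too.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R-squared, adjusted R-squared)."""
    n, k = X.shape  # k = number of predictors, not counting the intercept
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)                    # sum of squared errors
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

def simulate(n=30, n_junk=10, n_reps=500, seed=0):
    """Column j of each result holds the fit with the 2 true X's plus j junk X's."""
    rng = np.random.default_rng(seed)
    r2 = np.empty((n_reps, n_junk + 1))
    adj = np.empty((n_reps, n_junk + 1))
    for rep in range(n_reps):
        x_true = rng.normal(size=(n, 2))
        noise = rng.normal(scale=2.0, size=n)
        # Assumed "true" model: y = 1 + 2*X1 - 1.5*X2 + noise
        y = 1.0 + 2.0 * x_true[:, 0] - 1.5 * x_true[:, 1] + noise
        junk = rng.normal(size=(n, n_junk))       # unrelated to y by construction
        for j in range(n_junk + 1):
            X = np.column_stack([x_true, junk[:, :j]])
            r2[rep, j], adj[rep, j] = r2_and_adjusted(X, y)
    return r2, adj

r2, adj = simulate()
for j in range(r2.shape[1]):
    print(f"junk terms: {j:2d}  mean R2: {r2[:, j].mean():.3f}  "
          f"mean adj R2: {adj[:, j].mean():.3f}")
```

Each row of the printout refits the model with one more junk variable, so you can read down the columns and compare how the two measures move as junk is added.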
You make so many statements that are emotional and based on feelings, it seems. No one is talking about assumptions, so you’re definitely missing the point with this. Also, you’re ignoring that adjusted R-squared might be more relevant with less data than with more observations, even while you point out that limited sample size is a concern. You’re staring at food in front of you while saying you’re hungry!
I fight against them because even for finance and econ they do a terrible job. A good econometrics book demonstrates that. They are often flat-out incorrect. Fun fact: every ANOVA table has an underlying regression model, so again, the CFA Institute does a poor job of demonstrating the equivalent cases. A regression equation likely has far more utility than an ANOVA table, but either way, because ANOVA is a common special case of regression, these criticisms still hold.
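To make the ANOVA/regression point concrete, here is a small sketch (Python with NumPy) that computes a one-way ANOVA F statistic from the between/within sums of squares, then gets the same number from a regression on dummy variables. The three groups, their means, and the group size of 8 are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 8)                       # 3 groups, 8 observations each
y = np.array([10.0, 12.0, 15.0])[groups] + rng.normal(size=groups.size)

# Classic one-way ANOVA table quantities
grand_mean = y.mean()
ss_between = sum(
    np.sum(groups == g) * (y[groups == g].mean() - grand_mean) ** 2
    for g in range(3)
)
ss_within = sum(
    ((y[groups == g] - y[groups == g].mean()) ** 2).sum() for g in range(3)
)
df_between, df_within = 2, y.size - 3
f_anova = (ss_between / df_between) / (ss_within / df_within)

# The same model as a regression: intercept plus two dummy variables
design = np.column_stack([np.ones(y.size), groups == 1, groups == 2]).astype(float)
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
sse = ((y - design @ beta) ** 2).sum()                 # equals ss_within
sst = ((y - grand_mean) ** 2).sum()
f_regression = ((sst - sse) / df_between) / (sse / df_within)

print(f_anova, f_regression)
```

The regression's fitted values are just the group means, which is why its SSE matches the within-group sum of squares and the two F statistics agree.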
My point is to pick a book written by someone with a PhD in stats, rather than the CFA curriculum as your reference text. Almost any would be better than the CFAI book.
Claiming “this is just for finance” doesn’t make it correct when it’s flat out wrong.
I guess I didn’t do a good job of avoiding further discussion. I will ask that, before you fire back another post based on feelings about the subject, you actually do some reading on it.
Having an understanding of how a model is fit (minimizing SSE, usually) and how the usual R-squared is calculated will pretty clearly contradict many of your points. This understanding and reading will also allow you to look past your own feeling on the subject.
Your argument is basically that spoons are only used for eating ice cream because cereal isn’t necessarily tastier and because you prefer ice cream to cereal.
Now, I’ll be good and leave it at that.
P.S. If you are genuinely interested in some book suggestions I’ll happily post a few.