
Insignificant regression coefficients

Why doesn’t the text remove the variables whose coefficients are insignificant from the regression equation, using the original equation instead to estimate the independent variable?


I don’t entirely understand your question, but I’ll try to explain something:

asubset wrote:

Why doesn’t the text remove the variables whose coefficients are insignificant from the regression equation, …

Some researchers argue that it is not always a good idea to remove an independent variable whose coefficient is insignificant from a statistical point of view if that variable still has a sound economic rationale for explaining the dependent variable. This amounts to “accepting” a looser significance threshold for that specific “insignificant” variable. It is still the researcher’s decision whether to drop or keep the variable, though.

asubset wrote:

using the original equation instead to estimate the independent variable?

I don’t understand this part of your question. Could you elaborate on it a bit further?

“The souls of all men are immortal, but the souls of the just are immortal and divine.”
Sócrates

asubset wrote:

Why doesn’t the text remove the variables whose coefficients are insignificant from the regression equation, using the original equation instead to estimate the independent variable?

Data are noisy, and a “large” p-value in one data set doesn’t actually tell you whether that variable truly belongs in the real regression function. As Harrogath mentioned, subject-matter expertise should nearly always win over an arbitrary p-value cutoff. P-values are themselves random variables with considerable variability. People unfamiliar with statistics place a false sense of confidence in things like p-values because mathematics is used to compute them, and mathematics is often precise and provable. But statistics and probability are built entirely on using mathematics and logic to estimate how imprecise and uncertain things are; people often mistake statistics (uncertain) for mathematics (much more certain).

https://www.youtube.com/watch?v=5OL1RqHrZQ8
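The point about p-values being random variables is easy to see in a quick simulation. This sketch (my own illustration, not from the thread) repeatedly samples from the same population with the same real effect and shows how wildly the resulting p-value swings:

```python
# Sketch: how much a p-value jumps around across repeated samples
# drawn from the SAME population with the SAME true effect.
import math
import random

random.seed(42)

def z_test_pvalue(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, with known sigma (z-test)."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

# The true effect is 0.3 sigma -- real, but modest for n = 40.
pvals = []
for _ in range(200):
    sample = [random.gauss(0.3, 1.0) for _ in range(40)]
    pvals.append(z_test_pvalue(sample))

pvals.sort()
print(f"min p = {pvals[0]:.4f}, median p = {pvals[100]:.4f}, max p = {pvals[-1]:.4f}")
# The same true effect produces p-values ranging from tiny to well above 0.05,
# so one "insignificant" p-value is weak evidence that a variable is useless.
```

In other words, on one data set the variable clears 0.05 and on the next it doesn’t, even though nothing about the underlying relationship changed.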

If the text did as you suggest, it would start to walk the line of p-hacking and overfitting a model to the intricacies of one particular data set. This is a common and generally terrible approach to fitting models, yet many non-statisticians use it without a care in the world.
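Here is a small sketch of why selecting variables by p-value invites p-hacking (again my own illustration, using a rough Fisher-transformation approximation to the correlation test rather than any particular textbook's procedure). Screen twenty pure-noise “predictors” against the same outcome and keep the “best” one, and it will often look significant:

```python
# Sketch: test 20 pure-noise "predictors" against the same outcome
# and keep the one with the smallest p-value -- classic p-hacking.
import math
import random

random.seed(1)

def corr_pvalue(xs, ys):
    """Approximate two-sided p-value for H0: correlation == 0
    (Fisher z-transformation with a normal approximation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

n = 50
y = [random.gauss(0, 1) for _ in range(n)]  # outcome: pure noise
pvals = [corr_pvalue([random.gauss(0, 1) for _ in range(n)], y)
         for _ in range(20)]

best = min(pvals)
print(f"smallest of 20 noise p-values: {best:.4f}")
# With 20 independent tries at the 5% level,
# P(at least one p < 0.05) is about 1 - 0.95**20, roughly 0.64.
```

Nothing here has any real relationship to the outcome, yet cherry-picking the minimum p-value routinely manufactures an apparently “significant” predictor.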

Imagine you drop a plate on the floor and it shatters into big and little pieces. Now imagine we (miraculously) put the plate back together but can still see all of the big and little cracks. If I asked you to draw a highly detailed sketch of this plate, you would capture some large, general crack patterns and then some very tiny, intricate ones.

If we dropped a second plate and repaired it in a similar manner, it might loosely resemble your drawing of the first plate. However, all of the tiny, intricate patterns would likely be very different, and surely the tiny, detailed crack pattern in your drawing isn’t an accurate representation of most broken-and-repaired plates (though the big pattern probably does an okay job).

This is like fine-tuning a model to one data set (i.e., letting the data make all of the decisions for you, which is a bad idea). Maybe some general features hold up on new, real-world data, but often you’ve fit a bunch of noise from the original data set, and that noise isn’t a real feature of what’s actually happening. Then your model’s performance tanks, because you overfit it.
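The plate analogy can be put in code. This sketch (my own, not from any text) fits the same noisy linear data two ways: a straight line that captures only the “big cracks,” and an interpolating polynomial that memorizes every “tiny crack” in the training sample, then checks both on fresh data from the same process:

```python
# Sketch: the "shattered plate" analogy as an overfitting demo.
import random

random.seed(7)

def simple_linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x (the big-crack model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def lagrange_fit(xs, ys):
    """Degree n-1 polynomial through every training point (every tiny crack)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    return 2.0 + 0.5 * x  # the real underlying process

train_x = [float(i) for i in range(10)]
train_y = [truth(x) + random.gauss(0, 1) for x in train_x]
test_x = [i + 0.5 for i in range(9)]  # new data from the same process
test_y = [truth(x) + random.gauss(0, 1) for x in test_x]

line = simple_linear_fit(train_x, train_y)
poly = lagrange_fit(train_x, train_y)

print(f"line: train MSE {mse(line, train_x, train_y):.2f}, "
      f"test MSE {mse(line, test_x, test_y):.2f}")
print(f"poly: train MSE {mse(poly, train_x, train_y):.2f}, "
      f"test MSE {mse(poly, test_x, test_y):.2f}")
# The interpolating polynomial fits the training data perfectly
# but blows up on new data; the plain line generalizes far better.
```

The polynomial’s zero training error is exactly the false comfort of a model tuned to one data set: the “tiny cracks” it reproduces are noise, and they don’t reappear in the next sample.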

Yes, I believe I have the same question. If a coefficient is found NOT to be significantly different from zero, do we remove the corresponding variable, or keep it because it may help the model overall through how it interacts with the other variables?