R-Squared is a statistical measure that represents the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. It typically ranges from 0 to 1 and indicates the goodness of fit of the model.
Understanding R-Squared
Definition and Interpretation
- Value Range: For a linear least-squares regression with an intercept term, evaluated on the data used to fit it, R-Squared ranges from 0 to 1.
- Interpretation:
- An R-Squared of 0 means that the model explains none of the variability of the response data around its mean.
- An R-Squared of 1 indicates that the model explains all the variability of the response data around its mean.
- A value closer to 1 implies a better fit, while a value closer to 0 indicates a poor fit.
Calculation of R-Squared
R-Squared can be calculated using the following formula:
R² = 1 – (SSres / SStot)
- SSres: The sum of squares of the residuals (the differences between observed and predicted values).
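- SStot: The total sum of squares (the sum of squared differences between the observed values and their mean, which is proportional to the variance of the observed data).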
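As a minimal sketch of the formula above, the following Python function computes R² from a list of observed values and a list of predicted values. The function name `r_squared` and the sample data are assumptions made for illustration and do not come from any particular library.

```python
def r_squared(observed, predicted):
    """Return R² = 1 - SSres / SStot for two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    mean_observed = sum(observed) / len(observed)

    # SSres: squared differences between observed and predicted values.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

    # SStot: squared differences between observed values and their mean.
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)

    if ss_tot == 0:
        raise ValueError("SStot is zero: all observed values are identical")

    return 1 - ss_res / ss_tot


# Hypothetical data, chosen only to illustrate the two extremes.
observed = [2.0, 4.0, 6.0, 8.0]
perfect = list(observed)                                      # predictions match exactly
mean_only = [sum(observed) / len(observed)] * len(observed)   # always predict the mean

r2_perfect = r_squared(observed, perfect)       # SSres = 0, so R² = 1
r2_mean_only = r_squared(observed, mean_only)   # SSres = SStot, so R² = 0
print(r2_perfect, r2_mean_only)
```

Predicting every observation exactly gives an R² of 1, while always predicting the mean gives an R² of 0. These match the two boundary cases described in the interpretation.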
Example of R-Squared
Consider a simple linear regression analysis where we want to analyze the relationship between the number of hours studied and the scores obtained in an exam.
- Suppose we have the following data:
- Hours Studied: [1, 2, 3, 4, 5]
- Scores Obtained: [50, 55, 65, 70, 80]
- Assume a fitted linear regression model produces the predicted scores listed below (taken as given for illustration).
- The sum of squares of the residuals (SSres) is calculated as follows:
– Predicted Scores: [52, 57, 62, 67, 72]
– Residuals: [50-52, 55-57, 65-62, 70-67, 80-72] = [-2, -2, 3, 3, 8]
– SSres = (-2)² + (-2)² + (3)² + (3)² + (8)² = 4 + 4 + 9 + 9 + 64 = 90
- The total sum of squares (SStot) is calculated as:
– Mean Score = (50 + 55 + 65 + 70 + 80) / 5 = 64
– SStot = (50-64)² + (55-64)² + (65-64)² + (70-64)² + (80-64)² = 196 + 81 + 1 + 36 + 256 = 570
- Substituting the values into the R-Squared formula:
R² = 1 – (90 / 570) ≈ 0.842
This means approximately 84.2% of the variability in the exam scores is explained by the model based on the number of hours studied, indicating a strong relationship between the two variables.
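The arithmetic in this example can be checked with a short script. The sketch below recomputes the mean, SSres, SStot, and R² directly from the raw lists, treating the predicted scores as given. For comparison, it also fits a least-squares line using the standard closed-form slope and intercept formulas. All variable names are illustrative assumptions.

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]
assumed_predictions = [52, 57, 62, 67, 72]  # taken from the example as given

n = len(scores)
mean_score = sum(scores) / n

# Total sum of squares: spread of the observed scores around their mean.
ss_tot = sum((y - mean_score) ** 2 for y in scores)

# Residual sum of squares for the predictions assumed in the example.
ss_res = sum((y - p) ** 2 for y, p in zip(scores, assumed_predictions))
r2_assumed = 1 - ss_res / ss_tot

# Least-squares line: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
mean_hours = sum(hours) / n
s_xy = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores))
s_xx = sum((x - mean_hours) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_score - slope * mean_hours

fitted = [intercept + slope * x for x in hours]
ss_res_fit = sum((y - f) ** 2 for y, f in zip(scores, fitted))
r2_fit = 1 - ss_res_fit / ss_tot

print(f"mean score      = {mean_score}")
print(f"SSres (assumed) = {ss_res}, SStot = {ss_tot}")
print(f"R² (assumed predictions) = {r2_assumed:.3f}")
print(f"least-squares line: score = {intercept} + {slope} * hours")
print(f"R² (least-squares fit)   = {r2_fit:.3f}")
```

Because a least-squares line minimizes SSres by construction, its R² on this data is at least as high as the R² of any other straight-line predictions, including the assumed ones.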