R-Squared

R-Squared (R², also called the coefficient of determination) is a statistical measure of the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. For a least-squares linear regression with an intercept, it ranges from 0 to 1 and indicates the goodness of fit of the model.

Understanding R-Squared

Definition and Interpretation

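  • Value Range: R-Squared values typically range from 0 to 1. Negative values are possible when a model fits worse than simply predicting the mean, for example a model without an intercept or one evaluated on new data.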
  • Interpretation:
    • An R-Squared of 0 means that the model explains none of the variability of the response data around its mean.
    • An R-Squared of 1 indicates that the model explains all the variability of the response data around its mean.
    • A value closer to 1 implies a better fit, while a value closer to 0 indicates a poor fit.

Calculation of R-Squared

R-Squared can be calculated using the following formula:

R² = 1 – (SSres / SStot)

  • SSres: The sum of squares of the residuals (the differences between observed and predicted values).
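  • SStot: The total sum of squares (the sum of squared differences between each observed value and the mean of the observed data, which is proportional to the variance of the data).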
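
The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name r_squared, its argument names, and the sample values are assumptions made for this example.

```python
def r_squared(observed, predicted):
    """Return R² = 1 - SSres / SStot for two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    mean_observed = sum(observed) / len(observed)
    # SSres: squared differences between observed and predicted values.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # SStot: squared differences between observed values and their mean.
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


observed = [2.0, 4.0, 6.0]  # hypothetical sample values
print(r_squared(observed, observed))         # perfect predictions → 1.0
print(r_squared(observed, [4.0, 4.0, 4.0]))  # always predicting the mean → 0.0
print(r_squared(observed, [6.0, 4.0, 2.0]))  # worse than the mean → -3.0
```

The last call shows that the formula itself can return a negative value when the predictions fit worse than the mean of the observed data.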

Example of R-Squared

Consider a simple linear regression that examines the relationship between the number of hours studied and the scores obtained in an exam.

  • Suppose we have the following data:
    • Hours Studied: [1, 2, 3, 4, 5]
    • Scores Obtained: [50, 55, 65, 70, 80]
  • Assume the fitted regression model produces the predicted scores below (illustrative values chosen for simple arithmetic rather than the exact least-squares fit).
  • The sum of squares of the residuals (SSres) might be calculated as follows:
    – Predicted Scores: [52, 57, 62, 67, 72]
    – Residuals: [50-52, 55-57, 65-62, 70-67, 80-72] = [-2, -2, 3, 3, 8]
    – SSres = (-2)² + (-2)² + (3)² + (3)² + (8)² = 4 + 4 + 9 + 9 + 64 = 90
  • The total sum of squares (SStot) is calculated as:
    – Mean Score = (50 + 55 + 65 + 70 + 80) / 5 = 320 / 5 = 64
    – SStot = (50-64)² + (55-64)² + (65-64)² + (70-64)² + (80-64)² = 196 + 81 + 1 + 36 + 256 = 570
  • Substituting the values into the R-Squared formula:
    R² = 1 – (90 / 570) ≈ 0.842

This means approximately 84.2% of the variability in the exam scores can be explained by the number of hours studied, indicating a strong relationship between the two variables.
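
The arithmetic in this example can be checked with a short Python sketch that recomputes each quantity from the raw data. The variable names are assumptions made for this illustration. Because the predicted scores in the example are illustrative, the sketch also fits a least-squares line in closed form for comparison; that second part is an addition to the example, not something the example itself computes.

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]
predicted = [52, 57, 62, 67, 72]  # predicted scores taken from the example

n = len(scores)
mean_score = sum(scores) / n

# SSres and SStot for the example's predicted scores.
ss_res = sum((y - p) ** 2 for y, p in zip(scores, predicted))
ss_tot = sum((y - mean_score) ** 2 for y in scores)
r2_example = 1 - ss_res / ss_tot
print(f"mean={mean_score}, SSres={ss_res}, SStot={ss_tot}, R²={r2_example:.3f}")

# For comparison: closed-form simple least-squares fit of scores on hours.
mean_hours = sum(hours) / n
s_xy = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores))
s_xx = sum((x - mean_hours) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_score - slope * mean_hours
fitted = [intercept + slope * x for x in hours]

ss_res_fit = sum((y - f) ** 2 for y, f in zip(scores, fitted))
r2_fit = 1 - ss_res_fit / ss_tot
print(f"slope={slope}, intercept={intercept}, R² of least-squares fit={r2_fit:.3f}")
```

A least-squares line minimizes SSres, so its R-Squared on the data it was fitted to is at least as high as that of any other straight-line prediction for the same data.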