57 `lm()` Function in R

57.1 Calculate the Line in R

The syntax for a model in R is

response variable ~ explanatory variables

where the explanatory variables component can involve several types of terms.

> myfit <- lm(weight ~ height, data=htwt)
> myfit

Call:
lm(formula = weight ~ height, data = htwt)

Coefficients:
(Intercept)       height  
    -130.91         1.15

57.2 An `lm` Object is a List

> class(myfit)
[1] "lm"
> is.list(myfit)
[1] TRUE
> names(myfit)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"

57.3 From the R Help

lm returns an object of class “lm” or for multiple responses of class c(“mlm”, “lm”).

The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.

57.4 Some of the List Items

These are some useful items to access from the lm object:

coefficients: a named vector of coefficients
residuals: the residuals, that is response minus fitted values.
fitted.values: the fitted mean values.
df.residual: the residual degrees of freedom.
call: the matched call.
model: if requested (the default), the model frame used.

57.5 `summary()`

> summary(myfit)

Call:
lm(formula = weight ~ height, data = htwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-19.658  -5.381  -0.555   4.807  42.894 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -130.91040   11.52792  -11.36   <2e-16 ***
height         1.15009    0.06749   17.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.505 on 198 degrees of freedom
Multiple R-squared:  0.5946,    Adjusted R-squared:  0.5925 
F-statistic: 290.4 on 1 and 198 DF,  p-value: < 2.2e-16

57.6 `summary()` List Elements

> mysummary <- summary(myfit)
> names(mysummary)
 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

57.7 Using `tidy()`

> library(broom)
> tidy(myfit)
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -131.     11.5        -11.4 2.44e-23
2 height          1.15    0.0675      17.0 1.12e-40

57.8 Proportion of Variation Explained

The proportion of variance explained by the fitted model is called \(R^2\) or \(r^2\). It is calculated by:

\[r^2 = \frac{s^2_{\hat{y}}}{s^2_{y}}\]

> summary(myfit)$r.squared
[1] 0.5945555
> 
> var(myfit$fitted.values)/var(htwt$weight)
[1] 0.5945555

57.9 Assumptions to Verify

The assumptions on the above linear model are really about the joint distribution of the residuals, which are not directly observed. On data, we try to verify:

The fitted values and the residuals show no trends with respect to each other
The residuals are distributed approximately Normal\((0, \sigma^2)\)
- A constant variance is called homoscedasticity
- A non-constant variance is called heteroscedascity
There are no lurking variables

There are two plots we will use in this course to investigate the first two.

57.10 Residual Distribution

> plot(myfit, which=1)

57.11 Normal Residuals Check

> plot(myfit, which=2)

Foundations of Applied Statistics

57 `lm()` Function in R

57.1 Calculate the Line in R

57.2 An `lm` Object is a List

57.3 From the R Help

57.4 Some of the List Items

57.5 `summary()`

57.6 `summary()` List Elements

57.7 Using `tidy()`

57.8 Proportion of Variation Explained

57.9 Assumptions to Verify

57.10 Residual Distribution

57.11 Normal Residuals Check

57.12 Fitted Values Vs. Obs. Residuals

57 lm() Function in R

57.1 Calculate the Line in R

57.2 An lm Object is a List

57.3 From the R Help

57.4 Some of the List Items

57.5 summary()

57.6 summary() List Elements

57.7 Using tidy()

57.8 Proportion of Variation Explained

57.9 Assumptions to Verify

57.10 Residual Distribution

57.11 Normal Residuals Check

57.12 Fitted Values Vs. Obs. Residuals

57 `lm()` Function in R

57.2 An `lm` Object is a List

57.5 `summary()`

57.6 `summary()` List Elements

57.7 Using `tidy()`