讀進資料

其中年級由 2 開始,製造新的年級變項,讓截距對應起點

dta <- read.csv("nlsy.csv",header=T)
dta$grade_c2 <- dta$grade - 2
dta$female <- as.factor(dta$female)
head(dta,12)
##      id female low_birth_weight math grade  anti grade_c2
## 1   201      1                0   38     3 FALSE        1
## 2   201      1                0   55     5 FALSE        3
## 3   303      1                0   26     2  TRUE        0
## 4   303      1                0   33     5  TRUE        3
## 5  2702      0                0   56     2 FALSE        0
## 6  2702      0                0   58     4  TRUE        2
## 7  2702      0                0   80     8  TRUE        6
## 8  4303      1                0   41     3  TRUE        1
## 9  4303      1                0   58     4  TRUE        2
## 10 5002      0                0   46     4  TRUE        2
## 11 5002      0                0   54     6  TRUE        4
## 12 5002      0                0   66     8  TRUE        6
  • 每次觀察是 level 1,個人是 level 2。年級是level 1變項,性別是level2變項。
  • 每個人的觀察次數不一定相同,有些人只有兩次

載入 lme4 與畫圖套件

library(pacman)
## Warning: 套件 'pacman' 是用 R 版本 4.2.3 來建造的
p_load(lme4,ggplot2)

畫張圖,看看隨年級變化(成長)情形

ggplot(data=dta,
   aes(x = grade, y = math, group = id)) +               
   geom_line() +                                         
   theme_bw() +                                          
   scale_x_continuous(breaks = 2:8, name = "Grade") +    
   scale_y_continuous(name = "Mathematics")       

  • 截距是起點,斜率是進步速度

先只處理每個人的資料

rst1 <- lmer(math ~ 1 + (1 | id), data = dta, REML = FALSE)
summary(rst1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + (1 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  17497.9  17515.0  -8746.0  17491.9     2218 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.79835 -0.57828  0.05408  0.57886  2.52690 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  46.92    6.85   
##  Residual             116.68   10.80   
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  45.9147     0.3237   141.8

把年級加進去

rst2 <- lmer(math ~ 1 + grade_c2 + (1 + grade_c2 | id), data = dta, REML = FALSE)
summary(rst2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + grade_c2 + (1 + grade_c2 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  15949.4  15983.6  -7968.7  15937.4     2215 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.03039 -0.52586  0.00162  0.54076  2.54222 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 64.5563  8.0347        
##           grade_c2     0.7322  0.8557   -0.03
##  Residual             36.2315  6.0193        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 35.26736    0.35499   99.35
## grade_c2     4.33932    0.08734   49.69
## 
## Correlation of Fixed Effects:
##          (Intr)
## grade_c2 -0.532
  • 請注意,每個人會有自己的截距(起點)、自己的斜率(進步速度)

紀錄每個人的截距(起點)與斜率(進步速度)

IntSlope <- coef(rst2)$id
IntSlope$id <- rownames(IntSlope)
names(IntSlope)[1:2] <- c('Intercept','Slope')
head(IntSlope)
##      Intercept    Slope   id
## 201   36.94689 4.534009  201
## 303   26.03608 4.050755  303
## 2702  49.70142 4.594245 2702
## 4303  41.04192 4.548051 4303
## 5002  37.01256 4.496700 5002
## 5005  37.68796 4.324230 5005

看一下男生與女生的平均截距(起點)與斜率(進步速度)

dta_F <- aggregate(female~id,data=dta,FUN=head, 1)
names(dta_F) <- c('id','female') 
dta_F$female <- as.factor(dta_F$female)
IntSlope <- merge(IntSlope,dta_F,by="id")
IntSlope_m <- aggregate(cbind(Intercept,Slope)~female,mean,data=IntSlope)
IntSlope_m
##   female Intercept    Slope
## 1      0  35.70421 4.353250
## 2      1  34.81719 4.324966
IntSlope_s <- aggregate(cbind(Intercept,Slope)~female,sd,data=IntSlope)
IntSlope_s
##   female Intercept     Slope
## 1      0  7.001527 0.3002893
## 2      1  6.970847 0.3282111
  • 起點或速度,會因性別而改變嗎?

畫個圖看看

ggplot(IntSlope, aes(x=Intercept, y=Slope, color=female)) + 
  geom_point(alpha=0.5) +
  labs(x= "Intercept", y="Slope(grade effect)")+
  geom_smooth()+
  geom_point(data=IntSlope_m, size=5, shape=24)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

  • 似乎不算大

以性別解釋起點與速度

rst3 <- lmer(math ~ 1 +  female + female:grade_c2 +(1 + grade_c2 | id), data = dta, REML = FALSE)
summary(rst3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + female + female:grade_c2 + (1 + grade_c2 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  15949.4  15995.0  -7966.7  15933.4     2213 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.02610 -0.52455  0.00425  0.53540  2.51795 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 64.2674  8.0167        
##           grade_c2     0.7328  0.8561   -0.03
##  Residual             36.2120  6.0176        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)       35.7507     0.4994  71.591
## female1           -0.9818     0.7091  -1.385
## female0:grade_c2   4.3818     0.1234  35.494
## female1:grade_c2   4.2962     0.1235  34.782
## 
## Correlation of Fixed Effects:
##             (Intr) femal1 fm0:_2
## female1     -0.704              
## fml0:grd_c2 -0.537  0.378       
## fml1:grd_c2  0.000 -0.375  0.000

根據模型,畫畫看預測值