Analysis of growth data

讀進資料

其中年級由 2 開始，製造新的年級變項，讓截距對應起點

dta <- read.csv("nlsy.csv",header=T)
dta$grade_c2 <- dta$grade - 2
dta$female <- as.factor(dta$female)
head(dta,12)

##      id female low_birth_weight math grade  anti grade_c2
## 1   201      1                0   38     3 FALSE        1
## 2   201      1                0   55     5 FALSE        3
## 3   303      1                0   26     2  TRUE        0
## 4   303      1                0   33     5  TRUE        3
## 5  2702      0                0   56     2 FALSE        0
## 6  2702      0                0   58     4  TRUE        2
## 7  2702      0                0   80     8  TRUE        6
## 8  4303      1                0   41     3  TRUE        1
## 9  4303      1                0   58     4  TRUE        2
## 10 5002      0                0   46     4  TRUE        2
## 11 5002      0                0   54     6  TRUE        4
## 12 5002      0                0   66     8  TRUE        6

每次觀察是 level 1，個人是 level 2。年級是level 1變項，性別是level2變項。
每個人的觀察次數不一定相同，有些人只有兩次

載入 lme4 與畫圖套件

library(pacman)

## Warning: 套件 'pacman' 是用 R 版本 4.2.3 來建造的

p_load(lme4,ggplot2)

畫張圖，看看隨年級變化（成長）情形

ggplot(data=dta,
   aes(x = grade, y = math, group = id)) +               
   geom_line() +                                         
   theme_bw() +                                          
   scale_x_continuous(breaks = 2:8, name = "Grade") +    
   scale_y_continuous(name = "Mathematics")

截距是起點，斜率是進步速度

先只處理每個人的資料

rst1 <- lmer(math ~ 1 + (1 | id), data = dta, REML = FALSE)
summary(rst1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + (1 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  17497.9  17515.0  -8746.0  17491.9     2218 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.79835 -0.57828  0.05408  0.57886  2.52690 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  46.92    6.85   
##  Residual             116.68   10.80   
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  45.9147     0.3237   141.8

把年級加進去

rst2 <- lmer(math ~ 1 + grade_c2 + (1 + grade_c2 | id), data = dta, REML = FALSE)
summary(rst2)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + grade_c2 + (1 + grade_c2 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  15949.4  15983.6  -7968.7  15937.4     2215 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.03039 -0.52586  0.00162  0.54076  2.54222 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 64.5563  8.0347        
##           grade_c2     0.7322  0.8557   -0.03
##  Residual             36.2315  6.0193        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 35.26736    0.35499   99.35
## grade_c2     4.33932    0.08734   49.69
## 
## Correlation of Fixed Effects:
##          (Intr)
## grade_c2 -0.532

請注意，每個人會有自己的截距（起點）、自己的斜率（進步速度）

紀錄每個人的截距（起點）與斜率（進步速度）

IntSlope <- coef(rst2)$id
IntSlope$id <- rownames(IntSlope)
names(IntSlope)[1:2] <- c('Intercept','Slope')
head(IntSlope)

##      Intercept    Slope   id
## 201   36.94689 4.534009  201
## 303   26.03608 4.050755  303
## 2702  49.70142 4.594245 2702
## 4303  41.04192 4.548051 4303
## 5002  37.01256 4.496700 5002
## 5005  37.68796 4.324230 5005

看一下男生與女生的平均截距（起點）與斜率（進步速度）

dta_F <- aggregate(female~id,data=dta,FUN=head, 1)
names(dta_F) <- c('id','female') 
dta_F$female <- as.factor(dta_F$female)
IntSlope <- merge(IntSlope,dta_F,by="id")
IntSlope_m <- aggregate(cbind(Intercept,Slope)~female,mean,data=IntSlope)
IntSlope_m

##   female Intercept    Slope
## 1      0  35.70421 4.353250
## 2      1  34.81719 4.324966

IntSlope_s <- aggregate(cbind(Intercept,Slope)~female,sd,data=IntSlope)
IntSlope_s

##   female Intercept     Slope
## 1      0  7.001527 0.3002893
## 2      1  6.970847 0.3282111

起點或速度，會因性別而改變嗎？

畫個圖看看

ggplot(IntSlope, aes(x=Intercept, y=Slope, color=female)) + 
  geom_point(alpha=0.5) +
  labs(x= "Intercept", y="Slope(grade effect)")+
  geom_smooth()+
  geom_point(data=IntSlope_m, size=5, shape=24)

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

似乎不算大

以性別解釋起點與速度

rst3 <- lmer(math ~ 1 +  female + female:grade_c2 +(1 + grade_c2 | id), data = dta, REML = FALSE)
summary(rst3)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: math ~ 1 + female + female:grade_c2 + (1 + grade_c2 | id)
##    Data: dta
## 
##      AIC      BIC   logLik deviance df.resid 
##  15949.4  15995.0  -7966.7  15933.4     2213 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.02610 -0.52455  0.00425  0.53540  2.51795 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 64.2674  8.0167        
##           grade_c2     0.7328  0.8561   -0.03
##  Residual             36.2120  6.0176        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)       35.7507     0.4994  71.591
## female1           -0.9818     0.7091  -1.385
## female0:grade_c2   4.3818     0.1234  35.494
## female1:grade_c2   4.2962     0.1235  34.782
## 
## Correlation of Fixed Effects:
##             (Intr) femal1 fm0:_2
## female1     -0.704              
## fml0:grd_c2 -0.537  0.378       
## fml1:grd_c2  0.000 -0.375  0.000

Analysis of growth data

CCP

2023-04-29

讀進資料

其中年級由 2 開始，製造新的年級變項，讓截距對應起點

載入 lme4 與畫圖套件

畫張圖，看看隨年級變化（成長）情形

先只處理每個人的資料

把年級加進去

紀錄每個人的截距（起點）與斜率（進步速度）

看一下男生與女生的平均截距（起點）與斜率（進步速度）

畫個圖看看

以性別解釋起點與速度

根據模型，畫畫看預測值