R in Action Study Notes: Regression
Contents
Ordinary least squares (OLS) regression: simple linear regression; polynomial regression; multiple linear regression
Regression diagnostics: standard approach; Q-Q plot normality test; residual plot; independence of errors; component-plus-residual (partial residual) plots for linearity; homoscedasticity; global validation of linear model assumptions; unusual observations (outliers, high-leverage points, influential observations); added-variable plots; bubble plot
Selecting the best model: stepwise regression; all-subsets regression; k-fold cross-validation; relative importance
Ordinary Least Squares (OLS) Regression
OLS regression predicts a quantitative response variable as a weighted sum of the predictor variables.
Simple Linear Regression
fit <- lm(weight~height,data = women)
summary(fit) ## summary of the fitted model
fitted(fit) ## fitted values
residuals(fit) ## residuals of the fitted model
plot(women$height,women$weight, ## basic scatter plot
xlab = "Height",
ylab = "Weight")
abline(fit)
anova(fit) ## ANOVA table for the fitted model
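A fitted model can also be used for prediction. A minimal sketch with predict(); the new height values below are arbitrary illustrations, not part of the women data set:

```r
# Predict weight at new heights from the simple linear model;
# the height values are hypothetical examples.
fit <- lm(weight ~ height, data = women)
predict(fit,
        newdata = data.frame(height = c(63.5, 71.5)),
        interval = "prediction")  # point estimates plus prediction intervals
```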
Polynomial Regression
fit <- lm(weight~height+I(height^2),data = women) ## quadratic regression
summary(fit)
plot(women$height,women$weight, ## basic scatter plot
xlab = "Height",
ylab = "Weight")
lines(women$height,fitted(fit))
fit <- lm(weight~height+I(height^2)+I(height^3),data = women) ## cubic regression
summary(fit)
library(car)
scatterplot(weight~height,data = women,
spread=FALSE,smoother.args=list(lty=2),pch=19,
main="Women Age 30-39",
xlab = "Height",
ylab = "Weight")
Multiple Linear Regression
state <- as.data.frame(state.x77[,c(5,1,3,2,7)])
cor(state)
library(car)
scatterplotMatrix(state,spread=FALSE,smoother.args=list(lty=2),
main="Scatter Plot Matrix")
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state) ## multiple regression
summary(fit)
fit <- lm(mpg~hp+wt+hp:wt,data = mtcars) ## regression with a significant interaction
summary(fit)
library(effects)
plot(effect("hp:wt",fit,xlevels=list(wt=c(2.2,3.2,4.2))),multiline=TRUE)
confint(fit)
Regression Diagnostics
Standard Approach
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
par(mfrow=c(2,2))
plot(fit)
Top left: residuals vs. fitted values. Top right: normal Q-Q plot; points falling on the line indicate normally distributed residuals. Bottom left: scale-location plot; a random scatter suggests constant variance (homoscedasticity). Bottom right: residuals vs. leverage, which helps identify outliers, high-leverage points, and influential observations.
Q-Q Plot Normality Test
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
par(mfrow=c(1,1))
qqPlot(fit)
Residual Plot
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
residplot <- function(fit, nbreaks=10){
  z <- rstudent(fit)                       # studentized residuals
  hist(z, breaks = nbreaks, freq = FALSE,
       xlab = "Studentized Residual",
       main = "Distribution of Errors")
  rug(jitter(z), col = "brown")
  curve(dnorm(x, mean = mean(z), sd = sd(z)),
        add = TRUE, col = "blue", lwd = 2)  # normal curve
  lines(density(z)$x, density(z)$y,
        col = "red", lwd = 2, lty = 2)      # kernel density curve
  legend("topright",
         legend = c("Normal Curve", "Kernel Density Curve"),
         lty = 1:2, col = c("blue", "red"), cex = .7)
}
residplot(fit)
Independence of Errors
library(car) ## durbinWatsonTest() is in the car package
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
durbinWatsonTest(fit)
Component-plus-Residual (Partial Residual) Plots: Linearity
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
crPlots(fit)
The component-plus-residual plots suggest the linearity assumption holds for this model.
Homoscedasticity
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
ncvTest(fit)
spreadLevelPlot(fit)
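When spreadLevelPlot() reports a suggested power transformation noticeably different from 1, car's powerTransform() can estimate a Box-Cox-style power for the response. A sketch (in practice, transform only if the estimate differs clearly from 1):

```r
# Estimate a normalizing power transformation for the response (a sketch).
library(car)
state <- as.data.frame(state.x77[, c(5, 1, 3, 2, 7)])
summary(powerTransform(state$Murder))  # maximum-likelihood estimate of lambda
```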
Global Validation of Linear Model Assumptions
library(gvlma)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
gvmodel <- gvlma(fit)
summary(gvmodel)
Unusual Observations: Outliers
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
outlierTest(fit)
High-Leverage Points
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
hat.plot <- function(fit){
  p <- length(coefficients(fit))           # number of model parameters
  n <- length(fitted(fit))                 # number of observations
  plot(hatvalues(fit), main = "Index Plot of Hat Values")
  abline(h = c(2, 3) * p/n, col = "red", lty = 2)  # 2p/n and 3p/n cutoffs
  identify(1:n, hatvalues(fit), names(hatvalues(fit)))
}
hat.plot(fit)
Influential Observations
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
cutoff <- 4/(nrow(state)-length(fit$coefficients)-2) ## Cook's D cutoff: 4/(n-k-1), k = number of predictors
plot(fit,which = 4,cook.levels = cutoff)
abline(h=cutoff,lty=2,col="red")
Added-Variable Plots
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
avPlots(fit,ask=FALSE,id.method="identify")
Bubble Plot
library(car)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
summary(fit)
influencePlot(fit,id.method="identify",main="Influence Plot",
sub="Circle size is proportional to Cook's distance")
Selecting the Best Model
fit1 <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
fit2 <- lm(Murder~Population+ Illiteracy,data = state)
anova(fit1,fit2) ## compare nested models with an F test
AIC(fit1,fit2) ## AIC comparison; smaller is better
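BIC offers the same kind of comparison with a heavier per-parameter penalty, so it tends to favor the more parsimonious model. A minimal sketch:

```r
# Compare the same two models by BIC (stronger penalty than AIC).
state <- as.data.frame(state.x77[, c(5, 1, 3, 2, 7)])
fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = state)
fit2 <- lm(Murder ~ Population + Illiteracy, data = state)
BIC(fit1, fit2)  # lower BIC indicates the preferred model
```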
Stepwise Regression
library(MASS)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
stepAIC(fit,direction = "backward")
All-Subsets Regression
library(leaps)
fit <- lm(Murder~Population+ Illiteracy+Income+Frost,data = state)
leaps <- regsubsets(Murder~Population+ Illiteracy+Income+Frost,data = state,nbest = 4)
plot(leaps,scale = "adjr2")
library(car)
subsets(leaps,statistic = "cp",
main="Cp Plot for All Subsets Regression")
abline(1,1,lty=2,col="red")
k-Fold Cross-Validation
shrinkage <- function(fit, k=10){
  require(bootstrap)
  theta.fit <- function(x, y){lsfit(x, y)}
  theta.predict <- function(fit, x){cbind(1, x) %*% fit$coef}
  x <- fit$model[, 2:ncol(fit$model)]
  y <- fit$model[, 1]
  results <- crossval(x, y, theta.fit, theta.predict, ngroup = k)
  r2 <- cor(y, fit$fitted.values)^2        # R-squared on the training data
  r2cv <- cor(y, results$cv.fit)^2         # cross-validated R-squared
  cat("Original R-square =", r2, "\n")
  cat(k, "Fold Cross-Validated R-square =", r2cv, "\n")
  cat("Change =", r2 - r2cv, "\n")
}
states <- as.data.frame(state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")])
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
fit2<-lm(Murder~Population+Illiteracy,data=states)
shrinkage(fit)
shrinkage(fit2) # the smaller the drop in R-squared, the more accurately the model predicts
Relative Importance
states <- as.data.frame(state.x77[,c("Murder", "Population",
"Illiteracy", "Income", "Frost")])
zstates <- as.data.frame(scale(states))
# scale() standardizes the data to mean 0 and standard deviation 1, so the
# regression yields standardized coefficients. (Note: scale() returns a matrix,
# while lm() requires a data frame, hence the intermediate conversion step.)
zfit <- lm(Murder~Population + Income + Illiteracy + Frost, data=zstates)
coef(zfit) # Illiteracy is the most important predictor, Frost the least
# relative importance via the relative weights method
relweights <- function(fit, ...){
  R <- cor(fit$model)
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]                 # predictor intercorrelations
  rxy <- R[2:nvar, 1]                      # predictor-response correlations
  svd <- eigen(rxx)
  evec <- svd$vectors
  ev <- svd$values
  delta <- diag(sqrt(ev))
  lambda <- evec %*% delta %*% t(evec)     # symmetric square root of rxx
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% rxy
  rsquare <- colSums(beta ^ 2)
  rawwgt <- lambdasq %*% beta ^ 2
  import <- (rawwgt / rsquare) * 100       # percentage of R-squared
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"
  import <- import[order(import$Weights), 1, drop=FALSE]
  dotchart(import$Weights, labels=row.names(import),
           xlab="% of R-Square", pch=19,
           main="Relative Importance of Predictor Variables",
           sub=paste("Total R-Square=", round(rsquare, digits=3)),
           ...)
  return(import)
}
states <- as.data.frame(state.x77[,c("Murder", "Population",
"Illiteracy", "Income", "Frost")])
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
relweights(fit, col="blue") # Illiteracy is relatively more important than Income in this model