我有一个在R中使用梯度下降的多变量线性回归的有效实现。我想看看我是否可以使用必须进行随机梯度下降的东西。我不确定这是否真的没有效率。例如,对于每个α值,我要执行500个SGD迭代,并能够指定每个迭代中随机选取的样本数。这样做很好,这样我就可以了解样本数量如何影响结果。我在使用迷你批处理时遇到了麻烦,我希望能够轻松绘制结果。
这是我到目前为止的内容:
# Read and process the datasets
# download the files from GitHub
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')
# we can standardize the x vaules using scale()
x <- scale(x)
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')
# combine the datasets
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
str(data3)
head(data3)
################ Regular Gradient Descent
# http://www.r-bloggers.com/linear-regression-by-gradient-descent/
# vector populated with 1s for the intercept coefficient
x1 <- rep(1, length(data3$area_sqft))
# appends to dfs
# create x-matrix of independent variables
x <- as.matrix(cbind(x1,x))
# create y-matrix of dependent variables
y <- as.matrix(y)
L <- length(y)
# cost gradient function: independent variables and values of thetas
cost <- function(x,y,theta){
gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
return(t(gradient))
}
# GD simultaneous update algorithm
# https://www.coursera.org/learn/machine-learning/lecture/8SpIM/gradient-descent
GD <- function(x, alpha){
theta <- matrix(c(0,0,0), nrow=1)
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
# gradient descent α = (0.001, 0.01, 0.1, 1.0) - defined for 500 iterations
alphas <- c(0.001,0.01,0.1,1.0)
# Plot price, area in square feet, and the number of bedrooms
# create empty vector theta_r
theta_r<-c()
for(i in 1:length(alphas)) {
result <- GD(x, alphas[i])
# red = price
# blue = sq ft
# green = bedrooms
plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
lines(result[,2],type="b",col="#0072B2",lwd=0.35)
lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}
找到一种使用
sgd()
的方法更实用吗?我似乎不知道如何使用sgd
包来获得所需的控制级别 最佳答案
坚持你现在所拥有的
## all of this is the same
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')
x <- scale(x)
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
x1 <- rep(1, length(data3$area_sqft))
x <- as.matrix(cbind(x1,x))
y <- as.matrix(y)
L <- length(y)
cost <- function(x,y,theta){
gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
return(t(gradient))
}
我在您的
y
函数中添加了GD
,并创建了一个包装器函数myGoD
来调用您的函数,但首先要对数据进行子集化GD <- function(x, y, alpha){
theta <- matrix(c(0,0,0), nrow=1)
theta_r <- NULL
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
myGoD <- function(x, y, alpha, n = nrow(x)) {
idx <- sample(nrow(x), n)
y <- y[idx, , drop = FALSE]
x <- x[idx, , drop = FALSE]
GD(x, y, alpha)
}
检查以确保其正常工作,然后尝试使用其他Ns
all.equal(GD(x, y, 0.001), myGoD(x, y, 0.001))
# [1] TRUE
set.seed(1)
head(myGoD(x, y, 0.001, n = 20), 2)
# x1 V1 V2
# V1 147.5978 82.54083 29.26000
# V1 295.1282 165.00924 58.48424
set.seed(1)
head(myGoD(x, y, 0.001, n = 40), 2)
# x1 V1 V2
# V1 290.6041 95.30257 59.66994
# V1 580.9537 190.49142 119.23446
这是您可以使用的方式
alphas <- c(0.001,0.01,0.1,1.0)
ns <- c(47, 40, 30, 20, 10)
par(mfrow = n2mfrow(length(alphas)))
for(i in 1:length(alphas)) {
# result <- myGoD(x, y, alphas[i]) ## original
result <- myGoD(x, y, alphas[i], ns[i])
# red = price
# blue = sq ft
# green = bedrooms
plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
lines(result[,2],type="b",col="#0072B2",lwd=0.35)
lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}
您不需要包装器功能,只需稍微更改
GD
。将参数显式传递给函数而不是依赖于作用域始终是一个好习惯。在您假定y
将被从您的全局环境中拉出之前;在这里必须提供y
,否则您将得到一个错误。这将避免以后出现许多头痛和错误。GD <- function(x, y, alpha, n = nrow(x)){
idx <- sample(nrow(x), n)
y <- y[idx, , drop = FALSE]
x <- x[idx, , drop = FALSE]
theta <- matrix(c(0,0,0), nrow=1)
theta_r <- NULL
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
关于r - R中的梯度下降实现的随机梯度下降,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/37485138/