Last time, changing the spark.yarn.cache.confArchive parameter had no effect, so I took a different approach.
The relevant source code that creates the conf archive looks like this:
private def createConfArchive(): File = {
  val hadoopConfFiles = new HashMap[String, File]()
  // Uploading $SPARK_CONF_DIR/log4j.properties file to the distributed cache to make sure that
  // the executors will use the latest configurations instead of the default values. This is
  // required when user changes log4j.properties directly to set the log configurations. If
  // configuration file is provided through --files then executors will be taking configurations
  // from --files instead of $SPARK_CONF_DIR/log4j.properties.

  // Also uploading metrics.properties to distributed cache if exists in classpath.
  // If user specify this file using --files then executors will use the one
  // from --files instead.
  for { prop <- Seq("log4j.properties", "metrics.properties")
        url <- Option(Utils.getContextOrSparkClassLoader.getResource(prop))
        if url.getProtocol == "file" } {
    hadoopConfFiles(prop) = new File(url.getPath)
  }

  Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
    sys.env.get(envKey).foreach { path =>
      val dir = new File(path)
      if (dir.isDirectory()) {
        val files = dir.listFiles()
        if (files == null) {
          logWarning("Failed to list files under directory " + dir)
        } else {
          files.foreach { file =>
            if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
              hadoopConfFiles(file.getName()) = file
            }
          }
        }
      }
    }
  }

  val confArchive = File.createTempFile(LOCALIZED_CONF_DIR, ".zip",
    new File(Utils.getLocalDir(sparkConf)))
  val confStream = new ZipOutputStream(new FileOutputStream(confArchive))
  try {
    confStream.setLevel(0)
    hadoopConfFiles.foreach { case (name, file) =>
      if (file.canRead()) {
        confStream.putNextEntry(new ZipEntry(name))
        Files.copy(file, confStream)
        confStream.closeEntry()
      }
    }
    // ... (remainder of the method elided)
  } finally {
    confStream.close()
  }
  confArchive
}
So it simply reads the files under HADOOP_CONF_DIR. But I had set that environment variable right at the start, so why wasn't it taking effect?
String dir = System.getenv("HADOOP_CONF_DIR");
System.out.println(dir);
I read the variable the same way the Spark source does, and my console printed null as well. Why?
After half an hour of fruitless searching, I remembered the network admin's motto: rebooting fixes everything.
So I tried a restart.
It actually worked. The environment variable was there, reading it succeeded, and each xml file was added to conf.zip.
No separate modification needed!
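The restart working is consistent with how environment variables behave: a JVM takes a snapshot of its environment at launch, so a shell or IDE started before the variable was exported keeps returning null from System.getenv until it is relaunched. A minimal sketch of this on a Unix box, using a made-up variable name DEMO_CONF_DIR (assumed not already exported) so the contrast is visible:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class EnvSnapshot {
    public static void main(String[] args) throws Exception {
        // The current JVM's environment was fixed at launch time,
        // so this prints null for a variable exported afterwards.
        System.out.println("parent sees: " + System.getenv("DEMO_CONF_DIR"));

        // A child process launched with an extended environment does see
        // the new value -- the same reason restarting the IDE "fixed" it.
        ProcessBuilder pb = new ProcessBuilder("printenv", "DEMO_CONF_DIR");
        pb.environment().put("DEMO_CONF_DIR", "/etc/hadoop/conf");
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            System.out.println("child sees: " + r.readLine());
        }
        p.waitFor();
    }
}
```

The value /etc/hadoop/conf here is only an illustration; the point is that the parent's map and the child's map differ.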
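If you want to double-check an archive like this yourself, java.util.zip.ZipFile can list its entries. A self-contained sketch that builds a stand-in archive on the spot (in practice you would point ZipFile at the real conf.zip from the YARN staging directory, whose exact path I'm not assuming here):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ListConfArchive {
    public static void main(String[] args) throws Exception {
        // Build a stand-in conf.zip with two empty entries.
        File archive = File.createTempFile("conf", ".zip");
        try (ZipOutputStream out =
                new ZipOutputStream(new FileOutputStream(archive))) {
            for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        // Print every entry name in the archive.
        try (ZipFile zip = new ZipFile(archive)) {
            zip.stream().forEach(e -> System.out.println(e.getName()));
        }
        archive.delete();
    }
}
```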