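I have a Docker network with containers running a namenode, datanode, job history server, YARN master, Oozie, and MySQL. My Oozie can successfully submit jobs to my Hadoop cluster. The job succeeds, but the job history server gets "Connection refused" on the Oozie callback. Some time later, the Oozie web interface and instance stop working, and any command such as "oozie job -info" gets "Connection refused", as shown below: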
bash-4.2$ oozie job -info 0000000-180822162217556-oozie-W
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.authentication.client.KerberosAuthenticator).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Job ID : 0000000-180822162217556-oozie-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : WorkflowRunnerTest
App Path : hdfs://namenode:8020/user/hadoop/oozie-jobs/WordCountTest
Status : RUNNING
Run : 0
User : hadoop
Group : -
Created : 2018-08-22 16:22 GMT
Started : 2018-08-22 16:22 GMT
Last Modified : 2018-08-22 16:23 GMT
Ended : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-180822162217556-oozie-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000000-180822162217556-oozie-W@intersection0 RUNNING job_1534954806897_0001 RUNNING -
------------------------------------------------------------------------------------------------------------------------------------
bash-4.2$ oozie job -info 0000000-180822162217556-oozie-W
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 8 sec. Retry count = 4
The job history log for the job looks like this:
Showing 4096 bytes of 69256 total. Click here for the full log.
eds:0 ContAlloc:4 ContRel:0 HostLocal:3 RackLocal:0
2018-08-22 16:25:36,630 INFO [Thread-73] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://namenode:8020 /tmp/hadoop-yarn/staging/hadoop/.staging/job_1534954806897_0002
2018-08-22 16:25:36,636 INFO [Thread-73] org.apache.hadoop.ipc.Server: Stopping server on 35021
2018-08-22 16:25:36,638 INFO [IPC Server listener on 35021] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 35021
2018-08-22 16:25:36,639 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2018-08-22 16:25:36,639 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
2018-08-22 16:25:36,641 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-22 16:25:36,653 INFO [Thread-73] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job end notification started for jobID : job_1534954806897_0002
2018-08-22 16:25:36,654 INFO [Thread-73] org.mortbay.log: Job end notification attempts left 0
2018-08-22 16:25:36,654 INFO [Thread-73] org.mortbay.log: Job end notification trying http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED
2018-08-22 16:25:36,663 WARN [Thread-73] org.mortbay.log: Job end notification to http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED failed
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1199)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.apache.hadoop.mapreduce.v2.app.JobEndNotifier.notifyURLOnce(JobEndNotifier.java:130)
at org.apache.hadoop.mapreduce.v2.app.JobEndNotifier.notify(JobEndNotifier.java:174)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.sendJobEndNotify(MRAppMaster.java:686)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:654)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:728)
2018-08-22 16:25:37,666 WARN [Thread-73] org.mortbay.log: Job end notification failed to notify : http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED
2018-08-22 16:25:42,667 INFO [Thread-73] org.apache.hadoop.ipc.Server: Stopping server on 41027
2018-08-22 16:25:42,668 INFO [IPC Server listener on 41027] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 41027
2018-08-22 16:25:42,670 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-22 16:25:42,678 INFO [Thread-73] org.mortbay.log: Stopped [email protected]:0
Is there anything in particular that could be causing this hiccup?
Here is the output of oozie.log:
2018-08-22 20:25:21,367 INFO Services:520 - SERVER[oozie] Initialized
2018-08-22 20:25:21,369 INFO Services:520 - SERVER[oozie] Running with JARs for Hadoop version [2.6.5]
2018-08-22 20:25:21,369 INFO Services:520 - SERVER[oozie] Oozie System ID [oozie] started!
2018-08-22 20:25:31,345 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:25:31,345 INFO PauseTransitService:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:25:31,348 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service first instance
2018-08-22 20:25:31,609 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service first instance
2018-08-22 20:25:31,637 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:25:31,641 INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= 2018-08-22T20:30Z, Num jobs to materialize = 0
2018-08-22 20:25:31,648 INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService]
2018-08-22 20:25:31,723 INFO PurgeXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [30] days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
2018-08-22 20:25:31,723 INFO PurgeXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] ENDED Purge deleted [0] workflows, [0] coordinatorActions, [0] coordinators, [0] bundles
2018-08-22 20:25:31,746 INFO PauseTransitService:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:26:31,571 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:26:31,572 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service from last instance time = 2018-08-22T20:25Z
2018-08-22 20:26:31,614 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service from last instance time = 2018-08-22T20:25Z
2018-08-22 20:26:31,641 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:26:31,676 INFO PauseTransitService:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:26:31,708 INFO PauseTransitService:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:31,571 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:27:31,572 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service from last instance time = 2018-08-22T20:26Z
2018-08-22 20:27:31,584 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service from last instance time = 2018-08-22T20:26Z
2018-08-22 20:27:31,589 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:27:31,639 INFO PauseTransitService:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:31,661 INFO PauseTransitService:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:47,241 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] Start action [0000000-180822202517586-oozie-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-08-22 20:27:47,242 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] [***0000000-180822202517586-oozie-W@:start:***]Action status=DONE
2018-08-22 20:27:47,242 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] [***0000000-180822202517586-oozie-W@:start:***]Action updated in DB!
2018-08-22 20:27:47,394 INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W
2018-08-22 20:27:47,394 INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W@:start:
2018-08-22 20:27:47,432 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Start action [0000000-180822202517586-oozie-W@intersection0] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-08-22 20:27:47,507 INFO HadoopAccessorService:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Processing configuration file [/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/action-conf/default.xml] for action[default] and hostPort [*]
2018-08-22 20:27:47,508 INFO HadoopAccessorService:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Processing configuration file [/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/action-conf/map-reduce.xml] for action [map-reduce] and hostPort [*]
2018-08-22 20:27:48,482 WARN JobResourceUploader:64 - SERVER[oozie] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-08-22 20:27:48,493 WARN JobResourceUploader:171 - SERVER[oozie] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-08-22 20:27:50,173 INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] checking action, hadoop job ID [job_1534969405649_0001] status [RUNNING]
2018-08-22 20:27:50,175 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] [***0000000-180822202517586-oozie-W@intersection0***]Action status=RUNNING
2018-08-22 20:27:50,176 INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] [***0000000-180822202517586-oozie-W@intersection0***]Action updated in DB!
2018-08-22 20:27:50,208 INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W@intersection0
2018-08-22 20:28:02,437 INFO CallbackServlet:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] callback for action [0000000-180822202517586-oozie-W@intersection0]
2018-08-22 20:28:06,269 INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] External ID swap, old ID [job_1534969405649_0001] new ID [job_1534969405649_0002]
2018-08-22 20:28:06,273 INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] checking action, hadoop job ID [job_1534969405649_0002] status [RUNNING]
Best answer
Have you tried adding the following to the oozie info command:
-oozie $OOZIE_URL
where OOZIE_URL is a variable set to the actual Oozie URL?
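As a rough sketch of that suggestion, the snippet below passes the server URL explicitly to the CLI. The URL http://oozie:11000/oozie is an assumption inferred from the callback URL in the job history log, and build_info_cmd is a hypothetical helper that only prints the command. The real commands run only where the oozie CLI is installed.

```shell
#!/usr/bin/env bash
# Assumption: the server URL is inferred from the callback URL in the job
# history log (http://oozie:11000/oozie/...); adjust it for your setup.
OOZIE_URL="${OOZIE_URL:-http://oozie:11000/oozie}"
export OOZIE_URL

# Hypothetical helper: prints the info command with the URL passed explicitly.
build_info_cmd() {
    printf 'oozie job -oozie %s -info %s\n' "$OOZIE_URL" "$1"
}

# Job ID taken from the question's first "oozie job -info" call.
JOB_ID="0000000-180822162217556-oozie-W"
build_info_cmd "$JOB_ID"

# Run the real commands only where the oozie CLI is installed.
if command -v oozie >/dev/null 2>&1; then
    # Check that the server answers at all before asking about the job.
    oozie admin -oozie "$OOZIE_URL" -status
    oozie job -oozie "$OOZIE_URL" -info "$JOB_ID"
fi
```

If the admin status call is also refused, the URL is probably not the issue and the Oozie server process itself may have stopped, which would match the web interface going down as described in the question.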
Regarding apache - Oozie runs MR job successfully but never receives a status update, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51971375/