Problem description
I want to read the files picked up by the GetFile processor inside the ExecuteSparkInteractive processor, apply some transformations, and put the result in some location. Below is my flow:
GetFile > ExecuteSparkInteractive > PutFile
I wrote Spark Scala code under the Code section of the Spark processor:
val sc1=sc.textFile("local_path")
sc1.foreach(println)
Nothing happens in the flow. So how can I read the files picked up by the GetFile processor in the Spark processor?
Second part:
I tried the flow below just for practice:
ExecuteScript > PutFile > LogMessage
and I put the following code in the ExecuteScript processor:
import re

path = "/home/cloudera/Desktop/sample/data"
# Read all lines first, masking every digit that follows a digit or a dot.
with open(path, "r") as src:
    masked = [re.sub(r'((?<=[0-9])[0-9]|(?<=\.)[0-9])', 'X', line.strip()) for line in src]
# Reopen for writing only after the read handle is closed.
with open(path, "w") as dst:
    dst.write("\n".join(masked))
The code runs without errors, but it doesn't write the formatted data to the destination folder. Where am I going wrong? Also, I installed pandas on the local machine and ran pandas code from the ExecuteScript processor, but NiFi does not find the pandas module. Why is that? I tried my best, and I couldn't find any relevant links that show a basic flow for this.
Recommended answer
This is not really how it works. GetFile picks up files local to the NiFi node and brings them into the NiFi flow for processing. ExecuteSparkInteractive kicks off a Spark job on a remote Spark cluster; it does not transfer data to Spark. So you would likely want to put the data somewhere Spark can access it, for example GetFile -> PutHDFS -> ExecuteSparkInteractive.
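As a rough illustration of that flow, here is a minimal sketch of what the Code property of ExecuteSparkInteractive might contain once PutHDFS has landed the files in HDFS. It is written in PySpark rather than the Scala used in the question, so it assumes the Livy session behind the processor is a PySpark session and that the session provides `sc`. The HDFS paths and the `mask_digits` and `transform` helper names are hypothetical; the masking rule is borrowed from the regex in the question only to have some transformation to show.

```python
import re

def mask_digits(line):
    # Replace every digit that directly follows another digit or a dot with "X"
    # (the same pattern as in the question's ExecuteScript code).
    return re.sub(r"(?<=[0-9])[0-9]|(?<=\.)[0-9]", "X", line.strip())

def transform(sc, input_path, output_path):
    # Read from storage the Spark cluster can reach, transform each line,
    # and write the result back to that storage.
    sc.textFile(input_path).map(mask_digits).saveAsTextFile(output_path)

# In the processor's Code property, with hypothetical paths
# (input_path would be the directory PutHDFS writes to):
# transform(sc, "hdfs:///tmp/nifi/in", "hdfs:///tmp/nifi/out")
```

The point of the sketch is only that Spark reads from and writes to a shared location such as HDFS; the content of the incoming flow file is not handed to the Spark job.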