This article covers Spark: getting multiple DStreams out of a single DStream. The question and recommended answer below may be useful to readers facing the same problem.

Problem Description

Is it possible to get multiple DStreams out of a single DStream in Spark? My use case is as follows: I am getting a stream of log data from an HDFS file. Each log line contains an id (id=xyz), and I need to process the lines differently based on that id. So I was trying to create a separate DStream for each id found in the input DStream. I couldn't find anything related in the documentation. Does anyone know how this can be achieved in Spark, or have a link that covers it?

Thanks

Recommended Answer

You cannot split a single DStream into multiple DStreams. The best you can do is:

  1. Modify your source system to publish a separate stream for each id; you can then run a separate job to process each stream.
  2. If the source cannot be changed and hands you a stream with mixed ids, write custom logic to identify each id and then take the appropriate action.

I would always prefer #1 as it is the cleaner solution, but there are cases where #2 has to be implemented.
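A minimal sketch of approach #2 in plain Python (the function names `extract_id` and `route` are illustrative, not from the original answer). In Spark Streaming the same idea is usually expressed either as one `DStream.filter(...)` per known id, or by dispatching inside `foreachRDD` on the mixed stream; this sketch only shows the routing logic itself:

```python
import re

def extract_id(line):
    """Pull the id=... token out of a log line; returns None if absent."""
    m = re.search(r"\bid=(\w+)", line)
    return m.group(1) if m else None

def route(lines, handlers, default=None):
    """Dispatch each log line to the handler registered for its id.

    `handlers` maps id -> callable; lines whose id has no registered
    handler fall through to `default` (or are dropped if default is None).
    """
    for line in lines:
        handler = handlers.get(extract_id(line), default)
        if handler is not None:
            handler(line)

# Hypothetical usage: collect lines per id instead of processing them.
buckets = {"xyz": [], "abc": []}
route(
    ["2015-01-01 id=xyz request served", "id=abc error", "no id here"],
    {"xyz": buckets["xyz"].append, "abc": buckets["abc"].append},
)
```

With known ids, the equivalent Spark formulation would be something like `xyz_stream = log_stream.filter(lambda line: extract_id(line) == "xyz")`, one filtered DStream per id.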

