Test environment:
CDH 5.15.1
CentOS 7
Python 3.7.0
Kafka 1.1.1
kafka-python: https://pypi.org/project/kafka-python/#files
Objective:
Use a Python thread to continuously pull data from a given interface and send that data to the Kafka service.
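As a rough sketch of this objective, the polling loop can run in a background thread that is stopped through a threading.Event. The names poll_forever and start_poller, and the fetch and send callables, are hypothetical placeholders for the interface request and the Kafka send; they are not part of kafka-python.

```python
import threading


def poll_forever(fetch, send, stop_event, interval=5.0):
    """Call fetch() every `interval` seconds and pass non-empty results to send()."""
    while not stop_event.is_set():
        record = fetch()              # hypothetical: returns a str, or None on failure
        if record:
            send(record.encode("utf-8"))
        # wait() returns early once stop_event is set, so shutdown is prompt
        stop_event.wait(interval)


def start_poller(fetch, send, interval=5.0):
    """Start the polling loop in a daemon thread; return the thread and its stop event."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=poll_forever,
        args=(fetch, send, stop_event, interval),
        daemon=True,
    )
    worker.start()
    return worker, stop_event
```

With a real producer, send could be something like lambda payload: producer.send("test", payload). Calling stop_event.set() followed by worker.join() ends the loop cleanly.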
Step 1:
First download and install kafka-python.
Then run a simple test that calls Kafka from Python.
Open a python3 interactive shell:
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers=["master:9092"])
>>> producer.send("test",b"Hello world")
<kafka.producer.future.FutureRecordMetadata object at 0x7f4bf56fbda0>
>>> producer.send("test",b"Hello world")
<kafka.producer.future.FutureRecordMetadata object at 0x7f4bf5715438>
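Each send call in the session above returns a FutureRecordMetadata because kafka-python sends asynchronously from a background thread. An interactive shell stays open long enough for the message to go out, but a short script may exit first, so it can help to block on the future. A minimal sketch: make_producer and send_and_wait are hypothetical helper names, and the kafka import is deferred into the function so that the helper can also be exercised with a stand-in producer.

```python
def make_producer(servers="master:9092"):
    """Create a producer for the broker used in this test (address is an assumption)."""
    from kafka import KafkaProducer  # deferred import: only needed with a real broker
    return KafkaProducer(bootstrap_servers=servers)


def send_and_wait(producer, topic, payload, timeout=10):
    """Send one message and block until the broker acknowledges it."""
    future = producer.send(topic, payload)
    # get() raises a KafkaError subclass if the send fails or times out
    metadata = future.get(timeout=timeout)
    return metadata.topic, metadata.partition, metadata.offset
```

Against a real broker, send_and_wait(make_producer(), "test", b"Hello world") should return the topic, partition and offset of the stored record.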
Start a Kafka console consumer:
kafka-console-consumer --zookeeper master:2181 --from-beginning --topic test
Output:
Hello world
Hello world
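The same check can also be made from Python instead of the console consumer. The sketch below assumes the broker at master:9092 from the producer test; make_consumer and read_values are hypothetical names, and the 5 second idle timeout is an arbitrary choice.

```python
from itertools import islice


def make_consumer(topic="test", servers="master:9092"):
    """Create a consumer that starts from the earliest available offset."""
    from kafka import KafkaConsumer  # deferred import: only needed with a real broker
    return KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",   # comparable to --from-beginning
        consumer_timeout_ms=5000,       # stop iterating after 5 s without messages
    )


def read_values(consumer, limit=10):
    """Return up to `limit` message values decoded as UTF-8 text."""
    return [record.value.decode("utf-8") for record in islice(consumer, limit)]
```

Calling read_values(make_consumer()) after the two sends above would be expected to include the two Hello world messages, along with anything else already stored in the topic.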
Step 2:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File : ParsePS.py
# @Author: cjj
# @Date : 2019/6/4
# @Desc : Request the interface, fetch the data, and clean it

import re
import threading
import time
from urllib.error import URLError

from kafka import KafkaProducer
from kafka.errors import KafkaError
from suds.client import Client


class Data_clean:
    # Fetch the data for one observation point
    @staticmethod
    def get_data(observation_point_name):
        try:
            # Fetch the data from the interface
            user_url = 'http://xxx.xxx.xxx.xxx/ServiceSL/ServiceGetInsqlData.svc?wsdl'
            client = Client(user_url)
            result = client.service.GetSingleTagInfo(observation_point_name)
            # 1. Clean the data
            # 1.1 Convert the result to a string
            str1 = str(result)
            # 1.2 Extract everything inside double quotes and convert the list to a string
            pattern = re.compile('"(.*)"')
            str2 = str(pattern.findall(str1))
            # 1.3 Remove the single quotes
            str3 = str2.replace('\'', '')
            # 1.4 Replace the commas with tabs
            str4 = str3.replace(', ', '\t')
            # 1.5 Strip the surrounding []
            str5 = str4[:-1][1:]
            return str5
        except TimeoutError as e:
            print("\033[1;31;0m>>>>>>TimeoutError ->->->->->-> request to the interface timed out<<<<<<\033[0m")
            # print(e)
        except URLError as e:
            print("\033[1;31;0m>>>>>>URLError ->->->->->-> cannot connect to the SQL server<<<<<<\033[0m")
        except Exception:
            print("\033[1;31;0m>>>>>>failed for another reason<<<<<<\033[0m")


producer = None
try:
    producer = KafkaProducer(bootstrap_servers='master:9092')
    while 1:
        msg = Data_clean.get_data("SLWS_ps_1hzybqz_WD.PV")
        print(msg)
        # get_data returns None when the request failed; skip sending in that case
        if msg is not None:
            # Specify the topic and the payload, and send the data to Kafka
            producer.send('test', msg.encode('utf-8'))
        time.sleep(5)
except KafkaError as e:
    print(e)
finally:
    # producer stays None if the connection itself failed
    if producer is not None:
        producer.close()
    print('done!!!')
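Cleaning steps 1.1 to 1.5 convert the list of matches to its string form and then strip the list punctuation, which would also alter any value that itself contains a single quote or a comma followed by a space. As an alternative sketch (not the method used in the script), the matches can be joined with tabs directly. The function name clean_tag_info is hypothetical, and the sample text in the usage note is made up, since the real layout of the GetSingleTagInfo response is not shown here.

```python
import re


def clean_tag_info(raw):
    """Return all double-quoted values in `raw`, joined by tab characters."""
    # [^"]* stops at the next quote, so several quoted values on one line stay separate
    values = re.findall(r'"([^"]*)"', str(raw))
    return "\t".join(values)
```

For a hypothetical response such as 'Name = "SLWS_ps_1hzybqz_WD.PV"' followed on the next line by 'Value = "23.5"', this yields the two values separated by one tab, and an empty string when nothing is quoted.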
Upload the code to the Linux server.
Run it: python3 ParsePS.py
Check the output in the Kafka consumer: