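This is a fairly complex question, so I will try to explain it clearly without going into unnecessary detail.
Last year I developed a Python script for work. It gathers basic system data and sends it to an HTTP/S server, which can send commands back if the user wants. The past year has been one big experiment to see what works and what does not, testing different requirements inside the company, and so on. I now have a pretty solid understanding of what we need, so I am starting on version 2.
The goal of the new version is to keep the functionality while reducing system/CPU load and bandwidth. Once the Python script is done, the rest of the work will happen on the HTTP/S server. My question is about the client side, the Python script. I am using Python 2.7.x, usually on Debian-based systems.
The v1 script gathers system data, reads a config file listing the servers to send data to, and uses a thread per server to send the data. Each server can return one or more commands (still inside those threads), and each command is then handled in its own thread as well. The script runs once a minute from crontab. You can have 5 or more servers each sending 10 commands and the script still handles everything smoothly and efficiently, without taking long to finish the commands the servers issue.
For the v2 script I am trying to make the following required changes:
It will run as a system service. So instead of cron running the code once a minute, the script will loop every few seconds.
The loop needs to gather the data once per pass and then send it to each web server (as defined in the config file).
I need persistent HTTP/S connections to optimize performance and bandwidth.
I do not want to gather the data once per HTTP/S server on every pass. I want to gather it only once per iteration of the main loop that drives the service, and then hand that data to the threads that manage the established persistent HTTP/S connections.
This is where my problem lies. How do I keep persistent connections in their own threads and get data to those threads while collecting it only once?
From the question "Does httplib reuse TCP connections?" I can see that a persistent connection can be done like this (thank you, Corey Goldberg):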

import httplib

con = httplib.HTTPConnection("myweb.com")
while True:
    con.request("GET", "/x.css", headers={"Connection": "keep-alive"})
    result = con.getresponse()
    result.read()  # the body must be read in full before the connection can be reused
    print result.reason, result.getheaders()

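The data gathering needs to happen inside this loop. But I need to talk to several different servers at the same time, in multiple threads, and I do not want to waste resources gathering the data more than once. Given my relatively limited knowledge of Python, I do not see how this is possible.
Basically, as I see it now, there needs to be a loop driving the HTTP/S connection in each thread. Then I need some kind of loop that gathers my data and prepares it to be sent over the HTTP/S connections. But how do I nest the first loop inside the second in this way? It is as if I need the persistent HTTP/S connection loop inside the data gathering loop, but I also need the data gathering loop inside the HTTP/S loop.
I would like to explore any pure 2.7.x, Pythonic way to accomplish this. Depending on external utilities could be problematic for various reasons. Once finished, this script will be deployed to 150+ Linux systems, and the less that can go wrong, the better.
Thank you for your help and consideration!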

Best answer

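I am going to leave this here for others who, like me, are looking to expand their understanding of Python. It took me a while to figure out how to approach this problem, but the solution became clear after talking with a coworker who understood this kind of issue.
In short, the answer that worked for me was to use Python 2.7.x's native threading and Queue modules.
I have a main program that manages the setup of the various threads and queues. The NetworkWorker class, which extends threading.Thread, also creates its own new queue for each instance when it is initialized. The queue references/handles are stored in a global list variable. In main.py I simply loop through the list of queues and put the data on each thread's queue. Each thread then gets the data and does what it is supposed to do. The data received back from each HTTP connection is loaded into another queue, which is processed by a single command execution thread in main.py.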
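Before the full listing, the fan-out idea described above can be sketched on its own. This is a minimal illustration rather than the code from this answer: the names `worker`, `fan_out` and `SHUTDOWN` are hypothetical, the HTTP call is replaced by putting the sample on a results queue, and the import shim is only there so the sketch also runs on Python 3.

```python
import threading

try:
    import Queue as queue   # Python 2.7, as used in this post
except ImportError:
    import queue            # Python 3 name of the same module

SHUTDOWN = None  # hypothetical sentinel telling a worker to exit


def worker(name, inbox, results):
    """Consume samples from this worker's own queue until the sentinel arrives."""
    while True:
        sample = inbox.get()          # blocks until the producer sends data
        if sample is SHUTDOWN:
            break
        # A real worker would POST `sample` over its persistent connection here.
        results.put((name, sample))


def fan_out(server_names, samples):
    """Produce each sample once and hand the same object to every worker."""
    results = queue.Queue()
    inboxes = []
    threads = []
    for name in server_names:
        inbox = queue.Queue()
        t = threading.Thread(target=worker, args=(name, inbox, results))
        t.start()
        inboxes.append(inbox)
        threads.append(t)

    for sample in samples:            # one "collection" per iteration
        for inbox in inboxes:         # same data goes to every worker queue
            inbox.put(sample)

    for inbox in inboxes:             # ask every worker to finish
        inbox.put(SHUTDOWN)
    for t in threads:
        t.join()

    received = {}
    while not results.empty():        # safe here: all workers have exited
        name, sample = results.get()
        received.setdefault(name, []).append(sample)
    return received


if __name__ == "__main__":
    print(fan_out(["ServerName1", "ServerName2"], [{"load": 1}, {"load": 2}]))
```

Each worker blocks on its own queue, so the collection loop and the connection loops never need to be nested inside one another.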
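The following code has been modified/extracted from its original context. I have tested it, and it works perfectly as long as you correctly configure your servers in the self.conf dict in main.py > my_service > __init__ and the servers respond with valid JSON. Honestly, it could use some cleanup. To make sure the code stays public and accessible, I have added a Creative Commons license. Anyone who feels this code resembles their own can contact me for proper attribution.
Apart from main.py, the names of the other two files matter. The shared_globals.py and workerThread.py filenames are case sensitive, and both files must be in the same folder as main.py.
Main executable: main.py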

#!/usr/bin/python
# encoding=utf8

from time import sleep, time
import subprocess, sys, os # used to get IP, system calls, etc
import json

# For web support
import httplib
import urllib
import zlib
import base64

# workerThread dependency
import shared_globals
from workerThread import NetworkWorker

import Queue
import threading

'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''

class my_service:

    # * * * *
    def __init__(self):

        # Manually list off the servers I want to talk to

        self.conf = {}
        self.conf['servers'] = {}

        self.conf['servers']['ServerName1'] = {}
        self.conf['servers']['ServerName1']['protocol'] = "http"
        self.conf['servers']['ServerName1']['url'] = "server.com"
        self.conf['servers']['ServerName1']['port'] = "80"
        self.conf['servers']['ServerName1']['path'] = "/somefile.php"
        # Timeout in seconds. Make it long enough for your largest OR mission critical HTTP/S transaction to finish,
        # plus the time spent waiting for data to arrive in the persistent HTTP/S thread.
        # Data comes in every 2 seconds, so 5-10 seconds should be fine. Anything slower makes the queue back up.
        self.conf['servers']['ServerName1']['timeout'] = "10"

        self.conf['servers']['ServerName2'] = {}
        self.conf['servers']['ServerName2']['protocol'] = "http"
        self.conf['servers']['ServerName2']['url'] = "otherserver.net"
        self.conf['servers']['ServerName2']['port'] = "80"
        self.conf['servers']['ServerName2']['path'] = "/dataio.php"
        self.conf['servers']['ServerName2']['timeout'] = "5"

        # Start the Threading Manager, which will manage the various threads and their components
        # All cross thread communication needs to be managed with Queues
        self.threadManager()


    def threadManager(self):

        # A place to reference all threads
        self.threads = []

        print "Loading Shared Globals"
        # This is the 3rd file in this project. I would not need this if
        # the NetworkWorker Thread was inside of this same file. But since it
        # is in another file, we use this shared_globals file to make the Queue's
        # list and other shared resources available between the main thread and the NetworkWorker Threads
        shared_globals.init()

        # Keep track of all the threads / classes we are initializing
        self.workers = {} # Keep track of all the worker threads

        print "Initializing Network Worker Threads from Config"
        # For each server we want to talk to, we start a worker thread
        # Read servers from self.conf and init threads / workers
        for t in self.conf['servers']: # Loop through servers in config
            # t = server name
            #print "T: ", self.conf['servers'][t]
            self.workers[t] = NetworkWorker()      # Save worker handlers to workers dict

            # Set the server data for each NetworkWorker Thread
            self.workers[t].set_server(self.conf['servers'][t]['url'], self.conf['servers'][t]['port'], self.conf['servers'][t]['timeout'], self.conf['servers'][t]['path'])

        print "Initializing Command Processing Queue"
        cmd_q = Queue.Queue()  # a Queue has no daemon mode, so no daemon flag is set on it
        shared_globals.cmd_active_queue = cmd_q

        print "Starting Command Processing thread"
        # Start the command processing thread
        t_cmd = threading.Thread(target=self.command_que_thread_manager)
        t_cmd.daemon = True
        self.threads.append(t_cmd)
        t_cmd.start()

        print "Start Data Gathering thread"
        # Start the data gathering thread
        t = threading.Thread(target=self.data_collector_thread)
        t.daemon = True
        self.threads.append(t)
        t.start()

        print "Starting Worker threads"
        for w in self.workers:      # Loop through all worker handlers
            self.workers[w].start() # Start the jobs

        # We have our NetworkWorker Threads running, and they init their own queues which we
        # send data to using the def below titled self.send_data_to_networkWorkers

        print "Service Started\n\n\n"

        # This keeps the main thread listening so you can perform actions like killing the application with CTRL+C
        while threading.active_count() > 0:
            try:
                sleep(0.1)
            except (KeyboardInterrupt, SystemExit): # Exits the main thread without complaint!
                print "\n"
                os._exit(0)
        os._exit(0)

    def data_collector_thread(self):
        '''
        Gather all the data we want to send to each server
        Send data to the queues for each NetworkWorker thread we init'd above
        '''
        # Loop indefinitely
        while True:

            # Gather your data and load into data Dict
            data = {"data":"values"}
            print "\n\nData to be sent to all NetworkWorker threads: ", data, "\n\n"

            # Prep the data for HTTP/S
            # If you need to do something else with the data besides sending it to the threads, do it here
            data = self.prep_data_for_HTTP(data) # Do any pre-HTTP/S processing here
            self.send_data_to_networkWorkers(data) # Send the data out to all the Threads Queue's
            sleep(2) # wait for a little bit and then iterate through the loop again. This is your main loop timer.

    def prep_data_for_HTTP(self, data):
        '''
        I am converting my data from a python dict to a JSON string
        I compress the JSON string
        I load the compressed string into another dict, as the HTTP/S object (in the NetworkWorker thread) expects a DICT
        URL encode the data for HTTP/S POST transit
        Return the manipulated data object, now ready for HTTP/S
        '''
        data = json.dumps(data, encoding='utf8') # Now continue preparing for HTTP/S
        data = zlib.compress(data, 8)
        # In PHP, get the data from the $_POST['data'] key
        data = {"data":data}
        data = urllib.urlencode(data)
        return data
    # END DEF

    def command_que_thread_manager(self):
        '''
        Run as a thread
        Send data to this thread via its queue, init'd above in threadManager
        Grabs data, and then does something to process it
        '''
        while True:
            data = shared_globals.cmd_active_queue.get()
            print "Processing Command: ", data
    # END DEF

    def send_data_to_networkWorkers(self,data):
        '''
        Send data to all the NetworkWorker threads
        '''
        for q in shared_globals.network_active_queues:
            q.put(data)

    def clean_exit(self):
        '''
        Run when exiting the program for a clean exit
        I don't think I actually call this in my example,
        but upon main thread exit it would be a good idea to do so
        '''
        for w in self.workers:      # Loop through all worker handlers
            self.workers[w].stop()  # Stop the jobs

    # END DEF

# END CLASS

if __name__ == "__main__":
    # Use a separate name so the instance does not shadow the my_service class
    service = my_service()
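
The timeout comment in self.conf warns that a slow server makes that worker's queue back up. One possible mitigation, which is not part of the original code, is to give each worker a bounded queue and drop the oldest sample when it is full. The sketch below assumes that stale system snapshots are safe to discard; `put_latest` is a hypothetical helper name, and the import shim only exists so the sketch also runs on Python 3.

```python
try:
    import Queue as queue   # Python 2.7
except ImportError:
    import queue            # Python 3


def put_latest(q, item):
    """Put item on a bounded queue, discarding the oldest entry if it is full."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()    # drop the oldest queued sample
            except queue.Empty:
                pass              # a consumer emptied it first; just retry
```

With a queue created as `Queue.Queue(maxsize=5)`, the data collector could call `put_latest(q, data)` instead of `q.put(data)`, so an unreachable server would cost at most five buffered samples instead of growing without limit.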

Shared globals file: shared_globals.py
#!/usr/bin/python
# encoding=utf8

'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''

def init():

    global network_active_queues
    global cmd_active_queue  # must match the name assigned below, otherwise the assignment creates a local
    global cmd_q

    # Keep track of the data going to the Network Worker Threads
    print "Initializing Network Active Queues"
    network_active_queues = []

    # Keep track of the commands; main.py replaces this placeholder with a Queue.Queue()
    print "Initializing Command Active Queue"
    cmd_active_queue = None

    # ?
    #cmd_q = []

NetworkWorker class: workerThread.py
#!/usr/bin/python
# encoding=utf8
'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''
import Queue
import threading

import httplib
import urllib
import json

# workerThread dependency
# Add another queue list for HTTP/S Responses
import shared_globals

class NetworkWorker(threading.Thread):

    def __init__(self):
        '''
        Extend the Threading module
        Start a new Queue for this instance of this class
        Run the thread as a daemon
        shared_globals is an external file for my globals between main script and this class.
        Append this Queue to the list of Queue's in shared_globals.network_active_queues
        Loop through shared_globals.network_active_queues to send data to all Queues that were started with this class
        '''
        threading.Thread.__init__(self)
        self.daemon = True  # the daemon flag belongs on the thread; a Queue has no daemon mode
        self.q = Queue.Queue()
        shared_globals.network_active_queues.append(self.q)
        # Init the queue for processing commands

    def run(self):
        '''
        Establish a persistent HTTP connection
        Pull data from the Queue
        When data comes in, send it to the server
        I send the response from the HTTP server to another queue / thread
        You can do what you want to do with responses from the HTTP Server
        '''
        # Set your headers
        headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain", "Connection": "keep-alive"} # "Connection": "keep-alive" for persistence
        # Init the persistent HTTP connection
        http_request = httplib.HTTPConnection( self.url, int(self.port), timeout=int(self.timeout) )
        # Start the loop
        while True:
            # The code waits here for the queue to have data. If no data, it just sleeps until you send it data via its Queue.
            data = self.q.get()
            # .... When it gets data, we proceed with the data variable.
            # Reset response_data so a failed request cannot re-process the previous response
            response_data = str()
            try:
                http_request.request( "POST", self.path, data, headers )
                response = http_request.getresponse()
                response_data = response.read()
                # This is the response from the HTTP/S Server
                print "Response: ", response_data
            except Exception, e:
                # In the event something goes wrong, close the broken connection and re-establish it
                print e, "Re-establishing HTTP/S Connection"
                http_request.close()
                http_request = httplib.HTTPConnection( self.url, int(self.port), timeout=int(self.timeout) )

            # If the HTTP transaction was successful, we will have our HTTP response data in response_data variable
            if response_data:
                # Try Except will fail on bad JSON object
                try:
                    # Validate JSON & Convert from JSON to native Python Dict
                    json_data = json.loads(response_data)

                    # Send response from server to the command thread manager
                    shared_globals.cmd_active_queue.put(json_data)

                except ValueError, e:
                    print "Bad Server Response: Discarding Invalid JSON"
                    # Repackage the invalid JSON, or some identifier thereof, and send to command processing thread
                    # Load into THIS NetworkWorker's thread queue a new data object to tell the server that there was malformed JSON and to resend the data.
                    #http_request.request( "POST", self.path, data, headers )
                    #response = http_request.getresponse()
                    #response_data = response.read()


        # Place this here for good measure, if we ever exit the while loop we will close the HTTP/S connection
        http_request.close()

    # END DEF


    def set_server(self, url, port, timeout, path):
        '''
        Use this to set the server for this class / thread instance
        Variables that are passed in are translated to class instance variables (self)
        '''
        self.url = url
        self.port = port
        self.timeout = timeout
        self.path = path
    # END DEF


    def stop(self):
        '''
        Stop this queue
        Stop this thread
        Clean up anything else as needed - tell other threads / queues to shutdown
        '''
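        shared_globals.network_active_queues.remove(self.q)
        #self.q.put("shutdown") # Do we need to tell the threads to shutdown? Perhaps if reloading the config
        # Note: run() never leaves its while loop as written, so this join() blocks until run() is given a way to exit
        self.join()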

    # END DEF

# END CLASS
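
The stop() method above calls join() on a thread whose run() loops forever, and the commented-out "shutdown" line hints at the missing piece. One common way to finish that idea is a sentinel object on the worker's own queue. The sketch below is an assumption about how it could be done, not the author's code: `StoppableWorker` and `_SHUTDOWN` are hypothetical names and the HTTP calls are left out.

```python
import threading

try:
    import Queue as queue   # Python 2.7
except ImportError:
    import queue            # Python 3

_SHUTDOWN = object()  # hypothetical sentinel; can never collide with real payloads


class StoppableWorker(threading.Thread):
    """Hypothetical stand-in for NetworkWorker with the HTTP calls left out."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.q = queue.Queue()
        self.handled = []

    def run(self):
        while True:
            data = self.q.get()       # blocks until data or the sentinel arrives
            if data is _SHUTDOWN:
                break                 # leave the loop so cleanup code can run
            self.handled.append(data)
        # A real worker would close its HTTP connection here.

    def stop(self):
        self.q.put(_SHUTDOWN)         # wake the blocked get() and end run()
        self.join()                   # returns once run() has finished
```

Because the sentinel is queued behind any pending data, the worker drains what it already received before exiting.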

Source: the Stack Overflow question "Py 2.7 arch: how to keep persistent HTTP/S connections to multiple servers without gathering the data to send multiple times?": https://stackoverflow.com/questions/37667441/
