How to use SOCKS proxies to make requests with aiohttp?

Problem description

I am trying to use aiohttp to make asynchronous HTTP requests over multiple SOCKS proxies. Basically, I am creating a pool of Tor clients with different IP addresses, and want to be able to route HTTP requests through them using aiohttp.

Based on the suggestions here and here, I have been trying to use aiosocks, but the examples in those threads do not work (if they ever did) because they are based on an old version of aiosocks with a different API. Documentation and examples of using aiosocks online are very sparse (it doesn't seem widely used). But I haven't been able to find any other solutions for using aiohttp with SOCKS proxies.

Below is the code I have so far (sorry for the large amount of code - I tried to slim down the example as much as I could!). First I initialize the Tor clients with stem:

from datetime import datetime
import stem.process

from TorUtils import printCircuits, cleanShutdown

NUM_TOR_CLIENTS = 3

# create list of (source_port, control_port) tuples
tor_ports = [(str(9050 + i), str(9050 + NUM_TOR_CLIENTS + i)) for i in range(NUM_TOR_CLIENTS)]

# Every ISO 3166 country code except for {US} and {CA}
country_codes = '{AF}, {AX}, {AL}, {DZ}, {AS}, {AD}, {AO}, {AI}, {AQ}, {AG}, {AR}, {AM}, {AW}, {AU}, {AT}, {AZ}, {BS}, {BH}, {BD}, {BB}, {BY}, {BE}, {BZ}, {BJ}, {BM}, {BT}, {BO}, {BQ}, {BA}, {BW}, {BV}, {BR}, {IO}, {BN}, {BG}, {BF}, {BI}, {KH}, {CM}, {CV}, {KY}, {CF}, {TD}, {CL}, {CN}, {CX}, {CC}, {CO}, {KM}, {CG}, {CD}, {CK}, {CR}, {CI}, {HR}, {CU}, {CW}, {CY}, {CZ}, {DK}, {DJ}, {DM}, {DO}, {EC}, {EG}, {SV}, {GQ}, {ER}, {EE}, {ET}, {FK}, {FO}, {FJ}, {FI}, {FR}, {GF}, {PF}, {TF}, {GA}, {GM}, {GE}, {DE}, {GH}, {GI}, {GR}, {GL}, {GD}, {GP}, {GU}, {GT}, {GG}, {GN}, {GW}, {GY}, {HT}, {HM}, {VA}, {HN}, {HK}, {HU}, {IS}, {IN}, {ID}, {IR}, {IQ}, {IE}, {IM}, {IL}, {IT}, {JM}, {JP}, {JE}, {JO}, {KZ}, {KE}, {KI}, {KP}, {KR}, {KW}, {KG}, {LA}, {LV}, {LB}, {LS}, {LR}, {LY}, {LI}, {LT}, {LU}, {MO}, {MK}, {MG}, {MW}, {MY}, {MV}, {ML}, {MT}, {MH}, {MQ}, {MR}, {MU}, {YT}, {MX}, {FM}, {MD}, {MC}, {MN}, {ME}, {MS}, {MA}, {MZ}, {MM}, {NA}, {NR}, {NP}, {NL}, {NC}, {NZ}, {NI}, {NE}, {NG}, {NU}, {NF}, {MP}, {NO}, {OM}, {PK}, {PW}, {PS}, {PA}, {PG}, {PY}, {PE}, {PH}, {PN}, {PL}, {PT}, {PR}, {QA}, {RE}, {RO}, {RU}, {RW}, {BL}, {SH}, {KN}, {LC}, {MF}, {PM}, {VC}, {WS}, {SM}, {ST}, {SA}, {SN}, {RS}, {SC}, {SL}, {SG}, {SX}, {SK}, {SI}, {SB}, {SO}, {ZA}, {GS}, {SS}, {ES}, {LK}, {SD}, {SR}, {SJ}, {SZ}, {SE}, {CH}, {SY}, {TW}, {TJ}, {TZ}, {TH}, {TL}, {TG}, {TK}, {TO}, {TT}, {TN}, {TR}, {TM}, {TC}, {TV}, {UG}, {UA}, {AE}, {GB}, {UM}, {UY}, {UZ}, {VU}, {VE}, {VN}, {VG}, {VI}, {WF}, {EH}, {YE}, {ZM}, {ZW}'

tor_configs = [{'SOCKSPort': p[0], 'ControlPort': p[1], 'DataDirectory': './.tordata' + p[0],
    'CookieAuthentication' : '1',  'MaxCircuitDirtiness': '3600', 'ExcludeNodes': country_codes,
    'EntryNodes': '{us}, {ca}', 'ExitNodes': '{us}, {ca}', 'StrictNodes': '1',
    'GeoIPExcludeUnknown': '1', 'EnforceDistinctSubnets': '0'
    } for p in tor_ports]

print(f"Spawning {NUM_TOR_CLIENTS} tor clients ...")
start_time = datetime.now()

tor_clients = []
for cfg in tor_configs:
    tor_clients.append({'config': cfg, 'process': stem.process.launch_tor_with_config(config = cfg)})

... and then I am trying to use the following code to make the HTTP requests with aiohttp:

from collections import defaultdict, deque
from datetime import datetime, timedelta
import asyncio
import aiohttp
import aiosocks
from aiosocks.connector import ProxyConnector, ProxyClientRequest
import async_timeout

TIMEOUT = 10

async def _get(url, session, proxy, request_limiter):
    try:
        async with request_limiter: # semaphore to limit number of concurrent requests
            async with async_timeout.timeout(TIMEOUT):
                async with session.get(url, proxy=proxy, proxy_auth=None) as resp:

                    status = int(resp.status)
                    headers = dict(resp.headers)
                    content_type = str(resp.content_type)
                    text = await resp.text()

                    return {'url': url, 'status': status, 'headers': headers, 'text': str(text), 'errors': None}

    except asyncio.TimeoutError as e:
        # report the timeout to the caller instead of raising
        return {'url': url, 'status': None, 'headers': None, 'text': None, 'errors': str(e)}


async def _getPagesTasks(url_list, tor_clients, request_limiter, loop):
    """Launch requests for all web pages."""

    #deque rotates continuously through SOCKS sessions for each tor client ...
    sessions = deque()
    for tor_client in tor_clients:
        conn = ProxyConnector()
        session = aiohttp.ClientSession(connector=conn, request_class=ProxyClientRequest)
        sessions.append({'proxy': 'http://127.0.0.1:' + tor_client['config']['SOCKSPort'], 'session': session})

    tasks = []
    task_count = 0
    for url in url_list:
        s = sessions.popleft()
        session = s['session']
        proxy = s['proxy']
        task = loop.create_task(_get(url, session, proxy, request_limiter))
        tasks.append(task)
        task_count += 1
        sessions.append(s)  # put the entry back at the end so the deque keeps rotating

    results = await asyncio.gather(*tasks)

    for s in sessions:
        await s['session'].close()  # each deque entry is a dict; close its ClientSession

    return results

def getPages(url_list, tor_clients):
    """Given a URL list, dispatch pool of tor clients to concurrently fetch URLs"""

    request_limiter = asyncio.Semaphore(len(tor_clients)) # limit to one request per client at a time

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    responses = loop.run_until_complete(_getPagesTasks(url_list, tor_clients, request_limiter, loop))

    loop.close()
    return responses

However, this code does not run. When I try to run it, I get the error below. I'm wondering if I'm doing something wrong, or if this is a problem with aiosocks (which seems to have been unmaintained for a while, and might be targeting an older version of aiohttp or something ...):

~/Code/gis project/code/TorGetQueue.py in _getPagesTasks(url_list, tor_clients, request_limiter, loop)
     50     sessions = deque()
     51     for client in tor_clients:
---> 52         conn = ProxyConnector()
     53         session = aiohttp.ClientSession(connector=conn, request_class=ProxyClientRequest)
     54         sessions.append({'proxy': 'http://127.0.0.1:' + client['config']['SOCKSPort'], 'session': session})

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/aiosocks/connector.py in __init__(self, verify_ssl, fingerprint, resolve, use_dns_cache, family, ssl_context, local_addr, resolver, keepalive_timeout, force_close, limit, limit_per_host, enable_cleanup_closed, loop, remote_resolve)
     54             force_close=force_close, limit=limit,  loop=loop,
     55             limit_per_host=limit_per_host, use_dns_cache=use_dns_cache,
---> 56             enable_cleanup_closed=enable_cleanup_closed)
     57
     58         self._remote_resolve = remote_resolve

TypeError: __init__() got an unexpected keyword argument 'resolve'

What am I doing wrong here? Is there an easier way to use SOCKS proxies with aiohttp? What do I need to change to make this code work with aiosocks?

Thanks!

Recommended answer

I tried using aiosocks for my project and got the same error as you, only to discover later that aiosocks has been abandoned.
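
The traceback in the question shows the aiosocks ProxyConnector forwarding a `resolve` keyword to a parent `__init__` that does not accept it, which is what one would expect if the installed aiohttp no longer has that parameter. Here is a minimal stand-alone sketch of that failure mode. The classes `BaseConnector` and `LegacyProxyConnector` are hypothetical stand-ins, not the real aiohttp or aiosocks code:

```python
class BaseConnector:
    """Hypothetical parent class whose newer signature has no `resolve` parameter."""

    def __init__(self, *, use_dns_cache=True, limit=100):
        self.use_dns_cache = use_dns_cache
        self.limit = limit


class LegacyProxyConnector(BaseConnector):
    """Hypothetical subclass written against an older parent signature."""

    def __init__(self, *, resolve=False, use_dns_cache=True, limit=100):
        # Forwarding `resolve` fails: the parent no longer declares that keyword.
        super().__init__(resolve=resolve, use_dns_cache=use_dns_cache, limit=limit)


def try_construct():
    """Return the TypeError message raised by the constructor, or None on success."""
    try:
        LegacyProxyConnector()
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_construct())
```

If this is indeed the cause, nothing in the calling code can fix it; the options are to pin an older aiohttp or to switch to a library that matches the installed version.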

You can use aiosocksy instead.

import asyncio
import aiohttp
from aiosocksy import Socks5Auth
from aiosocksy.connector import ProxyConnector, ProxyClientRequest


async def fetch(url):
    connector = ProxyConnector()
    socks = 'socks5://127.0.0.1:9050'
    async with aiohttp.ClientSession(connector=connector, request_class=ProxyClientRequest) as session:
        async with session.get(url, proxy=socks) as response:
            print(await response.text())


loop = asyncio.get_event_loop()
loop.run_until_complete(fetch('http://httpbin.org/ip'))
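
To extend the single-proxy example to the pool of Tor clients from the question, one option is to build one SOCKS URL per client and hand them out round-robin. The sketch below shows only the scheduling logic, with a stub in place of the network call so that it runs without Tor or aiohttp. The helper names `socks_urls`, `assign_proxies`, `fetch_stub`, and `fetch_all`, as well as the port numbers, are illustrative assumptions:

```python
import asyncio
from itertools import cycle


def socks_urls(ports, host="127.0.0.1"):
    """Build one socks5:// URL per local Tor SOCKS port."""
    return [f"socks5://{host}:{port}" for port in ports]


def assign_proxies(urls, proxies):
    """Pair each target URL with a proxy, cycling through the proxies in order."""
    return list(zip(urls, cycle(proxies)))


async def fetch_stub(url, proxy, limiter):
    """Placeholder for the real request; it only records which proxy was chosen."""
    async with limiter:  # cap the total number of requests in flight
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"url": url, "proxy": proxy}


async def fetch_all(urls, ports):
    """Dispatch every URL through the proxy pool and collect results in input order."""
    proxies = socks_urls(ports)
    limiter = asyncio.Semaphore(len(proxies))
    jobs = [fetch_stub(u, p, limiter) for u, p in assign_proxies(urls, proxies)]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    pages = ["http://httpbin.org/ip"] * 4
    for result in asyncio.run(fetch_all(pages, [9050, 9051, 9052])):
        print(result)
```

In a real version, the body of `fetch_stub` would be replaced by the `session.get(url, proxy=proxy)` call from the example above. Note that a single semaphore limits the total number of concurrent requests, not the number per Tor client.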
