『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP


进程和线程

单线程改为多线程


import requests
from lxml import etree
import time
import os

dirpath = '图片/'
if not os.path.exists(dirpath):
    os.mkdir(dirpath)  # 创建文件夹

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
}
def get_photo():
    url = 'https://www.huya.com/g/4079/'  # 目标网站
    response = requests.get(url=url, headers=header)  # 发送请求
    data = etree.HTML(response.text)  # 转化为html格式
    return data

def jiexi():
    data = get_photo()
    image_url = data.xpath('//a//img//@data-original')
    image_name = data.xpath('//a//img[@class="pic"]//@alt')
    for ur, name in zip(image_url, image_name):
        url = ur.replace('?imageview/4/0/w/338/h/190/blur/1', '')
        title = name + '.jpg'
        response = requests.get(url=url, headers=header)  # 在此发送新的请求
        with open(dirpath + title, 'wb') as f:
            f.write(response.content)
        print("下载成功" + name)
        time.sleep(2)

if __name__ == '__main__':
        jiexi()

if __name__ == "__main__":
    threads = []
    start = time.time()
    # 创建四个进程
    for i in range(1, 5):
        thread = threading.Thread(target=jiexi(), args=(i,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    running_time = end - start
    print('总共消耗时间 : %.5f 秒' % running_time)
    print("全部完成!")  # 主程序

『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP

『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP

『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP

『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP



『赠书活动 | 第十七期』《Python网络爬虫:从入门到实战》-LMLPHP

08-12 02:44