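The urllib module is a high-level library for talking to the web. Its core job is to act like a web browser or other client: it requests a resource and returns a file-like object. urllib supports several protocols, such as HTTP, FTP, and Gopher, and it can also open local files. In practice it is mostly used for writing crawlers, and the rest of this post is about writing a simple crawler with urllib. Note that scraping content generated dynamically by JavaScript, such as lazily loaded images, needs more advanced techniques; the examples here all target static HTML pages.
Everything below applies to Python 2.7. Versions differ (in Python 3 these functions moved into urllib.request and urllib.parse), so check the official documentation for your version.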
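First, if I wanted to write a crawler that grabs the images from a website, I could break the job into the following steps:
[Figure: flowchart of the image-crawling steps]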
Of course, you can also process and analyze the scraped data in all sorts of ways. For example, you could build a price-comparison site that fetches quotes from various sites and merges them. Generalizing the steps above to any crawler:
[Figure: flowchart of the general crawler steps]
Following these general steps, let's look at how to do each one.
1. Open the target site
urllib.urlopen(url[, data[, proxies[, context]]])
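Requests the given url and returns a file-like object. (Note that the request is actually sent at this point: the call goes out over the network and uses data traffic.)
url: the complete path of a remote resource, usually a website. (It must include the scheme, e.g. http://www.baidu.com/; the http:// part cannot be omitted.)
If the URL has no scheme, or its scheme is file:, the function opens a local file. If the remote address cannot be opened, IOError is raised.
data: an optional argument for http:// URLs. Supplying it turns the request into a POST (the default is GET). It must be in the standard application/x-www-form-urlencoded format, which urlencode() can generate for us.
proxies: sets the proxies to use; see the official documentation if you need it. Here is the example from the official docs: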
# Use http://www.someproxy.com:3128 for HTTP proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
# Use proxies from environment - both versions are equivalent
filehandle = urllib.urlopen(some_url, proxies=None)
filehandle = urllib.urlopen(some_url)
context: for HTTPS connections, set this to an ssl.SSLContext instance to configure SSL (the parameter was added in Python 2.7.9).
In most cases, setting url alone is enough.
For example:
f = urllib.urlopen('http://www.baidu.com/')
This gives us a file-like object, on which we can then perform the usual read operations.
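As a sketch of the same call that avoids network traffic, the snippet below opens a temporary local file through a file: URL. The temporary file, its contents, and the guarded import (so that the same lines also run on Python 3, where urlopen lives in urllib.request) are assumptions added for illustration; they are not part of the original example.

```python
import os
import tempfile

try:                      # Python 2.7, as used in this post
    from urllib import urlopen, pathname2url
except ImportError:       # Python 3: the functions moved
    from urllib.request import urlopen, pathname2url

# Create a small local file to stand in for a remote page (hypothetical data).
fd, path = tempfile.mkstemp(suffix='.html')
os.write(fd, b'<html>hello</html>')
os.close(fd)

# A file: URL goes through the same urlopen() call as an http:// URL would.
url = 'file:' + pathname2url(path)
f = urlopen(url)
try:
    page = f.read()       # read the whole body
finally:
    f.close()
os.remove(path)

print(page)
```

Swapping the file: URL for an http:// one gives the networked version shown above; the returned object is used the same way.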
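2. Working with the file-like object
The methods below behave the same as on regular files. They are listed here; for details, see my notes on Python file operations: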
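1. read([size]) -> read at most size bytes, returned as a string.
Reads the whole file (or at most size bytes, if size is given) and returns the result as a string.
2. readline([size]) -> next line from the file, as a string.
Reads one line and returns it as a string.
3. readlines([size]) -> list of strings, each a line from the file.
Reads the whole file and returns a list with one element per line.
4. readinto() -> Undocumented. Don't use this; it may go away.
A method that is slated for removal and can be ignored.
5. close() -> None or (perhaps) an integer. Close the file.
Closes the file; returns None or an integer indicating the close status.
In addition, you can iterate over the object directly, just like a file object.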
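The reading methods listed above can be tried without a network connection. This is a minimal sketch, assuming a temporary local file with made-up contents and a guarded import so it also runs on Python 3 (where the data comes back as bytes).

```python
import os
import tempfile

try:                      # Python 2.7
    from urllib import urlopen, pathname2url
except ImportError:       # Python 3
    from urllib.request import urlopen, pathname2url

# Hypothetical three-line file standing in for a fetched page.
fd, path = tempfile.mkstemp()
os.write(fd, b'line one\nline two\nline three\n')
os.close(fd)
url = 'file:' + pathname2url(path)

f = urlopen(url)
first = f.readline()        # one line, trailing newline included
rest = f.readlines()        # the remaining lines, as a list
f.close()

f = urlopen(url)
chunk = f.read(4)           # at most 4 bytes
f.close()

f = urlopen(url)
stripped = [line.strip() for line in f]   # iterate like a file object
f.close()

os.remove(path)
```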
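Besides the methods shared with file objects, there are a few special ones:
1. info()
Returns information about the file. For HTTP, this is the header of the response message.
Example: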
f = urllib.urlopen('http://www.so.com/')
print f.info()
2. geturl()
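Returns the real URL of the current page. If the server redirected the request, this is the URL of the page we were redirected to.
3. getcode()
Returns the status code of the request, e.g. 200 for a successful request. If the resource was not opened over HTTP, it returns None.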
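The three methods can be inspected on a local file too, which also shows the None case of getcode(). A minimal sketch, assuming a temporary file with made-up contents; the header name Content-Length is what urllib fills in for local files, and the exact string returned by geturl() may differ from the URL passed in.

```python
import os
import tempfile

try:                      # Python 2.7
    from urllib import urlopen, pathname2url
except ImportError:       # Python 3
    from urllib.request import urlopen, pathname2url

# Hypothetical 5-byte file.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello')
os.close(fd)

f = urlopen('file:' + pathname2url(path))
headers = f.info()        # message object holding the headers
code = f.getcode()        # None, because this is not an HTTP response
final_url = f.geturl()    # URL of the resource actually opened
f.close()
os.remove(path)

length = int(headers['Content-Length'])
print(code)
print(length)
print(final_url)
```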
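3. Working with the fetched page source
This part is just like file handling, but in most cases it is combined with the re module to filter the data, for example: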
import re
import urllib
f = urllib.urlopen('http://www.baidu.com/')
b = f.read()
p = re.compile(r'<img.*?src="//(.*?\.(?:jpg|gif|png))".*?>', re.I)
result = p.findall(b)
print result
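This tries to scrape the image URLs from Baidu's home page. There is plenty of room for improvement; it is for demonstration only.
Usage of Python's re module is not repeated here.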
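One possible improvement, sketched here on a hard-coded HTML string rather than a live page: the pattern below is a slightly stricter variant of the one above (it uses [^"] and [^>] so a match cannot run past the end of an attribute or tag). The sample HTML, the example.com URLs, and the helper name image_urls are hypothetical.

```python
import re

# Hypothetical static HTML standing in for a downloaded page.
html = (
    '<p>logo</p>\n'
    '<img src="//example.com/a.png" alt="a">\n'
    '<img class="x" src="//example.com/b.JPG">\n'
    '<img src="//example.com/c.svg">\n'
)

# Scheme-relative src attributes ending in jpg, gif, or png (case-insensitive).
IMG_SRC = re.compile(r'<img[^>]*?src="//([^"]+?\.(?:jpg|gif|png))"', re.I)


def image_urls(page, scheme='http'):
    """Return full URLs for every matching <img> tag in page."""
    return ['%s://%s' % (scheme, path) for path in IMG_SRC.findall(page)]


urls = image_urls(html)
print(urls)
```

The .svg image is skipped because its extension is not in the list, and prepending the scheme turns the scheme-relative paths into URLs that can be passed to a download function.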
4. Downloading the resources
Once the regex has picked out the image URLs we want, we can start downloading. urllib provides functions for this.
1. urllib.urlretrieve(url[, filename[, reporthook[, data]]])
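Downloads the given url to a local file. If the url points to a local file, or a valid cached copy of the object exists, the object is not copied (this refers to urllib's own cache; an existing file with the same name in the target directory is overwritten). Returns a tuple (filename, headers), where filename is the local file name the object was saved under and headers is the object that the info() method above returns.
url: the target URL.
filename: the name to save the file under locally; it may be an absolute or a relative path. If omitted, the object is saved to a temporary file with a generated name.
reporthook: a callback function. It is called once when the connection is established and once after each block that is read. It receives three arguments: 1. the number of blocks transferred so far; 2. the block size in bytes; 3. the total size of the file.
data: an optional argument for http:// URLs. Supplying it turns the request into a POST (the default is GET). It must be in the standard application/x-www-form-urlencoded format, which urlencode() can generate for us.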
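To see the return value and the reporthook arguments in action, here is a minimal sketch that "downloads" a temporary local file through a file: URL. The file names, the fake image bytes, and the report callback are assumptions for illustration; with a real http:// URL the call looks the same.

```python
import os
import tempfile

try:                      # Python 2.7
    from urllib import urlretrieve, urlcleanup, pathname2url
except ImportError:       # Python 3
    from urllib.request import urlretrieve, urlcleanup, pathname2url

calls = []


def report(block_count, block_size, total_size):
    """Record each progress callback: blocks so far, block size, total bytes."""
    calls.append((block_count, block_size, total_size))


# Hypothetical source file standing in for a remote image.
fd, source = tempfile.mkstemp(suffix='.png')
os.write(fd, b'fake image bytes')
os.close(fd)

target = source + '.copy'
saved_as, headers = urlretrieve('file:' + pathname2url(source), target, report)

with open(target, 'rb') as fh:
    data = fh.read()

urlcleanup()              # clear anything urlretrieve() may have cached
os.remove(source)
os.remove(target)

print(saved_as == target)
print(calls)
```

A real progress display would compute block_count * block_size / total_size inside the callback instead of just recording the values.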
2. urllib.urlcleanup()
Clears the cache that may have been built up by previous calls to urlretrieve().
Other commonly used functions in the module:
1. urllib.quote(string[, safe])
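Encodes string for use in a URL; safe specifies characters that are left untouched (default safe='/').
A URL may only contain a limited set of characters, so everything else has to be escaped. For example, the space in the ordinary string 'scolia good' is just a blank character, but in a URL the string becomes 'scolia%20good': the space has turned into %20.
The conversion rules are:
Letters, digits, underscores, periods, and hyphens are never converted, and by default neither is the slash. Every other character is converted: it is replaced by a percent sign (%) followed by two hexadecimal digits, i.e. "%xx", where "xx" is the character's ASCII code in hexadecimal.
Example: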
f = urllib.quote('scolia good')
print f
2. urllib.quote_plus(string[, safe])
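Almost the same as above, except that spaces are replaced with a + sign instead of %20. Its safe argument defaults to the empty string, so / is escaped as well.
Example: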
f = urllib.quote_plus('scolia good')
print f
3. urllib.unquote(string)
Decodes a URL-encoded string back to the original; the inverse of urllib.quote(string[, safe]).
4. urllib.unquote_plus(string)
Same as above; the inverse of urllib.quote_plus(string[, safe]).
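The four quoting functions can be compared side by side. A minimal sketch; the sample string is made up, and the guarded import is an assumption so the lines also run on Python 3, where these functions live in urllib.parse.

```python
try:                      # Python 2.7
    from urllib import quote, quote_plus, unquote, unquote_plus
except ImportError:       # Python 3
    from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = 'scolia good/path'

as_path = quote(text)          # space -> %20, '/' kept (safe='/')
as_form = quote_plus(text)     # space -> +,   '/' -> %2F (safe='')

print(as_path)                 # scolia%20good/path
print(as_form)                 # scolia+good%2Fpath

# Each decoder undoes its matching encoder.
round_trip_path = unquote(as_path)
round_trip_form = unquote_plus(as_form)
```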
5. urllib.urlencode(query[, doseq])
Converts a Python dict into a query string, for use as the data argument described above.
Example:
aDict = {'name': 'Georgina Garcia', 'hmdir': '~ggarcia'}
print urllib.urlencode(aDict)
Note: once data is set, the request is sent as a POST. To send a GET request instead, append ? to the url, followed by the encoded data.
For example:
GET:
import urllib
params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
print f.read()
POST:
import urllib
params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
print f.read()
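The difference between the two requests comes down to where the encoded string goes, which can be shown without contacting the server. In this sketch the fields are passed as a list of pairs (an assumption, chosen so the output order is predictable; a dict works as in the examples above), and the guarded import lets it run on Python 3 as well.

```python
try:                      # Python 2.7
    from urllib import urlencode
except ImportError:       # Python 3
    from urllib.parse import urlencode

# A sequence of (key, value) pairs keeps the field order fixed.
fields = [('spam', 1), ('eggs', 2), ('bacon', 0)]
query = urlencode(fields)

base = 'http://www.musi-cal.com/cgi-bin/query'

# GET: the encoded fields become part of the URL.
get_url = base + '?' + query

# POST: the URL stays bare and the encoded fields are passed as data,
# i.e. urlopen(base, query). On Python 3, data must be bytes instead.
post_url, post_data = base, query

print(get_url)
print(post_data)
```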
6. urllib.pathname2url(path)
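Converts a local path name to the form used in the path component of a URL. This does not produce a complete URL, and the result has already been passed through quote().
Example: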
a = r'd:\abc\def\123.txt'
print urllib.pathname2url(a)
7. urllib.url2pathname(path)
Converts a URL-style path into a local-style path; this is the inverse of the function above.
a = r'www.xxx.com/12%203/321/'
print urllib.url2pathname(a)
gwKDFoR+pzArtKSnJMneJWIF0ATCStXJo4o9WKGCMpiXG6LCiJnH6aOCPGSLUfuqPyJemfyiPGiDGyfuzclCgiJcxnviLOyPpZWE8q4go1JiBKBOvJmOVI3uZnZ0GNAskEAAzkci7BSRMJQVoZKSkpiaCJC9KSuJxSErXDSBGRIia6Ek6aSAnisitXxDjpvg6FwhARitK8xzs0KjluYWw/LDgG0gXAFHLKJTlJSYwNalJSEsUoy8QSj1DX+pL2Q/g+2M7qjRKseLk2TbpilgvUDgCQJadcShATJDmpKSUhWpJQpOWwfCspifpjxEiZar08CUZSkxIklPNuHWsLg/NNS27tXMm+/OsfrtSr/25uPzi774EUdbYidRKW4yIF6QKgnJxyaUmKSMppJZI76iO52bcSjBij55fjJe3hnh2iYctsWAZ2JUKZDS8hiXfvToLWC4P/7IL1nTFblZ6cX/FHbz5ZvX324vry12+TD05ELAct9ZYjT1cEdgAAYzyAe4uGKStCIiL65bctf/Smf33+VHcp2L6W5/IQgAdLUrnam3T34LUz6tV/tnK1L5OCvXu/u10AwN48nJxrMn+9ex8qV/t69uL62N4B8KCBcuX46PGfkC0AThAoV44vvvs7VK7Xze2x/QLgoQPlynH+8jpUrifnV8f2C4CHDpQrh3d7EWkXACcClCvHH29v+aM3H3/59tXvN/atRqRdABwXKFeOm9sP3/z4b/8M1+vm9tNVg7QLgKMD5ZrG5bv3j59qpF0AHBco12Rubj989f0V0i4AjgiUa0fOX14j7QLgWEC5AADLA8oFAFgeUC4AwPKAcgEAlgeUCwCwPKBcAIDlAeUCACwPKBcAYHlAuQAAywPKBQBYHv8DTAyfBk7QFlQAAAAASUVORK5CYII=" alt="" />
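Likewise, the result has already been decoded: url2pathname() uses unquote() (the inverse of quote()) to process the path.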
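As a rough illustration of the decoding step, the sketch below feeds a hypothetical percent-encoded relative path to url2pathname and compares the result with a plain unquote() call. The Python 3 import fallback is an addition. On Windows the separators in the result would be backslashes, so the comparison normalises them first.

```python
try:
    from urllib import url2pathname, unquote    # Python 2.7, as used in this article
except ImportError:
    from urllib.request import url2pathname     # Python 3 fallback (assumption)
    from urllib.parse import unquote

# Hypothetical URL-style path with percent-encoded spaces.
url_path = 'my%20docs/read%20me.txt'

local_path = url2pathname(url_path)
print(local_path)

# Apart from the platform's path separator, this matches unquote().
normalised = local_path.replace('\\', '/')
print(normalised == unquote(url_path))
```

Together with pathname2url this gives a round trip: encoding a local path and decoding it again returns the original path.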
For everything else, refer to the official documentation: https://docs.python.org/2/library/urllib.html