Scraping Web Page Content in PHP with QueryList


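I used to scrape web data with Java and Jsoup. A few days ago I heard that scraping with PHP is more convenient, so today I took a quick look at it, mainly using QueryList.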

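QueryList is a general-purpose list-scraping class built on phpQuery. It is a simple, flexible, and powerful scraping tool: even a complex page can usually be scraped with a single statement.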

I'll use cnblogs (博客园) as the example: http://www.cnblogs.com/. We will use QueryList to scrape the content inside the red box.

View the page source and locate the markup for the red-boxed region:

<div id="post_list">

  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('jr1993',4716308,222703,1)">
        <span class="diggnum" id="digg_count_4716308">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716308" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/jr1993/p/4716308.html" target="_blank">简单的jQuery 四级分类实用插件</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/jr1993/" target="_blank"><img width="48" height="48" class="pfs" src="http://www.bkjia.com/uploads/allimg/150830/1Z34K504-2.png" alt=""></a> 前言最近因需要自己封装了一个很简单的四级分类的jQuery插件,主要用于后台数据的传输和获取。接下来就分享一下这个实用的插件吧。正文老规矩,先看一下效果,这个就很丑了,没有美化的,因为主要还是用于后台界面使用的,同时请忽略测试数据的内容:那么下面就介绍一下使用方式:首先html代码: ...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/jr1993/" class="lightblue">郭锦荣</a>
        发布于 2015-08-09 20:40
        <span class="article_comment"><a href="http://www.cnblogs.com/jr1993/p/4716308.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/jr1993/p/4716308.html" class="gray">阅读(21)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('maybe2030',4715035,229915,1)">
        <span class="diggnum" id="digg_count_4715035">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715035" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/maybe2030/p/4715035.html" target="_blank">[Data Structure & Algorithm] 七大查找算法</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/maybe2030/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/764050/20150531155648.png" alt=""></a> 和排序算法一样,查找算法也是一种最为基本的算法。高效地查找可以使我们对数据进行更加高效地操作,熟练掌握各种查找算法也是一项基本的算法技能。
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/maybe2030/" class="lightblue">Poll的笔记</a>
        发布于 2015-08-09 20:27
        <span class="article_comment"><a href="http://www.cnblogs.com/maybe2030/p/4715035.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/maybe2030/p/4715035.html" class="gray">阅读(12)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('zhanggui',4716267,191738,1)">
        <span class="diggnum" id="digg_count_4716267">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716267" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/zhanggui/p/4716267.html" target="_blank">第二章、进程的描述与控制</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/zhanggui/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/527522/20140908182900.png" alt=""></a> 第二章、进程的描述与控制====##2.1 前趋图和程序执行### 2.1.1 前趋图####概念:所谓前趋图:指一个有向无循环图(DAG),它用于描述进程之间执行的先后顺序。###2.1.2 程序顺序执行####特征:* 顺序性* 封闭性:指程序在封闭的环境中运行,程序运行时独占全机资源,资源的状...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/zhanggui/" class="lightblue">Scottzg</a>
        发布于 2015-08-09 20:24
        <span class="article_comment"><a href="http://www.cnblogs.com/zhanggui/p/4716267.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/zhanggui/p/4716267.html" class="gray">阅读(17)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('lubiao',4716200,158099,1)">
        <span class="diggnum" id="digg_count_4716200">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716200" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/lubiao/p/4716200.html" target="_blank">树莓派入门笔记</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/lubiao/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/550600/20150808222928.png" alt=""></a> 一、关于开源硬件开源硬件的概念简单理解就是电子硬件的设计详细参数是公开的,比如电路图、材料清单和PCB布局等等。主要类型:Arduino、CubieBoard、RaspberryPi、PcDuino、BeagleBone、KiWIBoard和Mixteil开源中国社区-开源硬件专区http://ww...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/lubiao/" class="lightblue">clbiao</a>
        发布于 2015-08-09 20:05
        <span class="article_comment"><a href="http://www.cnblogs.com/lubiao/p/4716200.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/lubiao/p/4716200.html" class="gray">阅读(59)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('xiaoheimiaoer',4716191,124701,1)">
        <span class="diggnum" id="digg_count_4716191">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716191" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/xiaoheimiaoer/p/4716191.html" target="_blank">JS监听组合按键</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/xiaoheimiaoer/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/435330/20140328110126.png" alt=""></a> 有些时候,我们需要在网页上,增加一些快捷按键,方便用户使用一些常用的操作,比如:保存,撤销,复制、粘贴等等。 下面简单梳理一下思路: 我们所熟悉的按键有这么集中类型:单独的按键操作,如:delete、up、down等两位组合建,如:ctrl(cmd)+ 其他按键,alt+其他按键,shift...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/xiaoheimiaoer/" class="lightblue">黑MAO</a>
        发布于 2015-08-09 19:59
        <span class="article_comment"><a href="http://www.cnblogs.com/xiaoheimiaoer/p/4716191.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/xiaoheimiaoer/p/4716191.html" class="gray">阅读(36)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('QG-whz',4716139,205933,1)">
        <span class="diggnum" id="digg_count_4716139">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716139" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/QG-whz/p/4716139.html" target="_blank">编译器角度看C++复制构造函数</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/QG-whz/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/610439/20150502210248.png" alt=""></a> #[C++对象模型]复制构造函数的建构操作关于复制构造函数的简单介绍,可以看我以前写过的一篇文章[C++复制控制之复制构造函数](http://www.cnblogs.com/QG-whz/p/4485574.html "C++ 复制控制之复制构造函数")该文章中介绍了复制构造函数的定义、调用时机、...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/QG-whz/" class="lightblue">melonstreet</a>
        发布于 2015-08-09 19:44
        <span class="article_comment"><a href="http://www.cnblogs.com/QG-whz/p/4716139.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/QG-whz/p/4716139.html" class="gray">阅读(41)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('advances',4716089,238649,1)">
        <span class="diggnum" id="digg_count_4716089">1</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716089" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/advances/p/4716089.html" target="_blank">【Cocos2d入门教程三】HelloWorld之一目了然</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/advances/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/794244/20150804212641.png" alt=""></a> 什么程序都是从HelloWorld先开始。同样Cocos2d-x我们先从HelloWorld进行下手、下面是HelloWorld的运行完成图:建立好的Cocos游戏项目中会有两个比较常用接触的文件夹。分别为Classes与resource。Classes存取代码文件,resource存取资源文件,下...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/advances/" class="lightblue">蔡明勇</a>
        发布于 2015-08-09 19:27
        <span class="article_comment"><a href="http://www.cnblogs.com/advances/p/4716089.html#commentform" title="2015-08-09 20:37" class="gray">评论(5)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/advances/p/4716089.html" class="gray">阅读(65)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('kodoyang',4715572,180900,1)">
        <span class="diggnum" id="digg_count_4715572">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715572" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/kodoyang/p/MonteCarloMethod_PI.html" target="_blank">蒙特卡罗方法计算圆周率</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/kodoyang/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/618527/20150809193301.png" alt=""></a> 为了避免计算时间超过十秒钟,很随意的减小了样本值。 【方形中的所有像素计算】中一共计算10^8次,当在【方形中的随机像素计算】中也计算相同的次数时,就会陷入等待。 猜测原因是获取随机数的时候浪费了很多时间,也可能是循环的次数太多消耗时间。 【方形中的随机像素求平均值】中巴10^8分成了计算10...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/kodoyang/" class="lightblue">kodoyang</a>
        发布于 2015-08-09 19:24
        <span class="article_comment"><a href="http://www.cnblogs.com/kodoyang/p/MonteCarloMethod_PI.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/kodoyang/p/MonteCarloMethod_PI.html" class="gray">阅读(48)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('xyczero',4716019,198864,1)">
        <span class="diggnum" id="digg_count_4716019">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4716019" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/xyczero/p/4716019.html" target="_blank">Android 之夜间模式(多主题)的实现</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/xyczero/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/674691/20140930230333.png" alt=""></a> ##引言夜间模式其实属于多主题切换的一种,不过是最麻烦的一种。因为在夜间模式下不仅要切换主色调,次要色调等等,还要覆盖一些特殊的颜色,因为在夜间模式下总不能什么都是黑的把,那不得丑死-。-,所以当你夜间模式完成后,你的App对于日后多主题的扩展就可以轻松胜任了。##实现思路多数App由于历史原因当对...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/xyczero/" class="lightblue">xyczero</a>
        发布于 2015-08-09 18:40
        <span class="article_comment"><a href="http://www.cnblogs.com/xyczero/p/4716019.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/xyczero/p/4716019.html" class="gray">阅读(78)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('xishuai',4715000,124657,1)">
        <span class="diggnum" id="digg_count_4715000">8</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715000" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/xishuai/p/4715000.html" target="_blank">2015-写给明年现在的自己</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/xishuai/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/435188/20130715165802.png" alt=""></a> 《[2014-写给明年现在的自己](http://www.cnblogs.com/xishuai/p/3900217.html)》时间如流水,转眼又是一年,回头看去年现在的自己,仿佛还在昨天。去年的那篇博文,如果认真去读的话,你会发现我是带有情绪的,对自己以及对所看到人和事不满的一种情绪发泄,写出来...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/xishuai/" class="lightblue">田园里的蟋蟀</a>
        发布于 2015-08-09 18:08
        <span class="article_comment"><a href="http://www.cnblogs.com/xishuai/p/4715000.html#commentform" title="2015-08-09 20:17" class="gray">评论(5)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/xishuai/p/4715000.html" class="gray">阅读(289)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('anding',4715440,76293,1)">
        <span class="diggnum" id="digg_count_4715440">7</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715440" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/anding/p/4715440.html" target="_blank">Winform开发全套31个UI组件开源共享</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/anding/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/151257/20150809180411.png" alt=""></a> 一.前言 这套UI库是上一个公司(好几年前了)完成的。当时主要为开发公司内部ERP系统,重新设计实现了所有用到的Winform组建,包括Form窗体组建6个(支持换肤),基础控件25个。其中有很多参考借鉴其他开源组件,也有几个是集成的别人的组件,然后做了些调整。 现在已经好几年不做Winform.....
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/anding/" class="lightblue">/*梦里花落知多少*/</a>
        发布于 2015-08-09 18:01
        <span class="article_comment"><a href="http://www.cnblogs.com/anding/p/4715440.html#commentform" title="2015-08-09 20:33" class="gray">评论(4)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/anding/p/4715440.html" class="gray">阅读(346)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('liulun',4714858,32486,1)">
        <span class="diggnum" id="digg_count_4714858">1</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4714858" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/liulun/p/4714858.html" target="_blank">用Nim语言开发windows GUI图形界面程序</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/liulun/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/u28932.png?id=12164046" alt=""></a> 前言本文得到了“樂師”的大力支持,我们一起调试程序到深夜,要是没有他的帮忙,我不知道要多久才能迈过这道坎,另外“归心”还有其他人也提供了帮助,他们都来自于QQ群:“Nim开发集中营”469329878;感兴趣的朋友,可以加这个群一起讨论配置GUI开发环境我在这篇博客中,写到了Nim开发环境的搭建那篇...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/liulun/" class="lightblue">liulun</a>
        发布于 2015-08-09 17:47
        <span class="article_comment"><a href="http://www.cnblogs.com/liulun/p/4714858.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/liulun/p/4714858.html" class="gray">阅读(175)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('klguang',4715529,232502,1)">
        <span class="diggnum" id="digg_count_4715529">2</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715529" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/klguang/p/4715529.html" target="_blank">JSP执行过程详解</a></h3>
      <p class="post_item_summary">
        复习JSP的概念 JSP是Java Server Page的缩写,在传统的HTML页面中加入JSP标签和java的程序片段就构成了JSP。 JSP的基本语法:两种注释类型、三个脚本元素、三个元素指令、八个动作指令。 JSP的内置对象常用的有:Request、Response、Out、Session、...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/klguang/" class="lightblue">klguang</a>
        发布于 2015-08-09 17:46
        <span class="article_comment"><a href="http://www.cnblogs.com/klguang/p/4715529.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/klguang/p/4715529.html" class="gray">阅读(71)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('Lance--blog',4715495,226155,1)">
        <span class="diggnum" id="digg_count_4715495">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715495" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/Lance--blog/p/4715495.html" target="_blank">有关PHPstorm的git环境的配置和git密钥的生成总结</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/Lance--blog/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/752442/20150502152858.png" alt=""></a> phpstorm上配置git环境的配置总感觉很简单,没发现看似简单的东西浪费我好多时间。我在网上查了一下关于phpstorm的git环境的配置没有具体的总结所以我把自己的配过程简单总结了一下接下来是我的配置环境的具体步骤一:在PHPstorm中配置 git环境(1)点击phpstorm的file->...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/Lance--blog/" class="lightblue">lance--blog</a>
        发布于 2015-08-09 17:40
        <span class="article_comment"><a href="http://www.cnblogs.com/Lance--blog/p/4715495.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/Lance--blog/p/4715495.html" class="gray">阅读(40)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('now-fighting',4715432,167921,1)">
        <span class="diggnum" id="digg_count_4715432">0</span>
      </div>
      <div class="clear"></div>
      <div id="digg_tip_4715432" class="digg_tip"></div>
    </div>
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/now-fighting/p/4715432.html" target="_blank">Java的Package和Classpath</a></h3>
      <p class="post_item_summary">
        <a href="http://www.cnblogs.com/now-fighting/" target="_blank"><img width="48" height="48" class="pfs" src="http://pic.cnblogs.com/face/579605/20140306195458.png" alt=""></a> ## Package在Java中,Package是用来包含一系相关实例的集合。这些相关联的实例包括:类、接口、异常、错误以及枚举。Package主要有一些的几点作用:1. Package可以处理名字冲突,在冲突的名字前加上包的名字,通过使用名字的全限定名来访问名字的时候,可以避免名字冲突。因为在不同...
      </p>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/now-fighting/" class="lightblue">Now&Fight</a>
        发布于 2015-08-09 17:13
        <span class="article_comment"><a href="http://www.cnblogs.com/now-fighting/p/4715432.html#commentform" title="" class="gray">评论(0)</a></span>
        <span class="article_view"><a href="http://www.cnblogs.com/now-fighting/p/4715432.html" class="gray">阅读(86)</a></span>
      </div>
    </div>
    <div class="clear"></div>
  </div>
  <div class="post_item">
    <div class="digg">
      <div class="diggit" onclick="DiggPost('LBSer',4715395,149585,1)">
        <span class="diggnum" id="digg_count_4715395">2</span>
      </div>
      <div class="clear"></div>
      <
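In the source above, every post sits in its own `div.post_item`, the title and link are on `a.titlelnk`, and the author is the `a.lightblue` link inside `div.post_item_foot`. A list scraper only needs a "range" (the repeating block) plus one rule per field. The following is a minimal sketch of that idea, not the QueryList API: it swaps in PHP's built-in `DOMDocument` and `DOMXPath` so it runs without any dependency, and it uses XPath where QueryList would take jQuery-style selectors. The helper name `scrapeList`, the rule layout, and the trimmed two-post sample are assumptions made for illustration.

```php
<?php
// Trimmed, well-formed sample modeled on two of the post_item blocks above.
$html = <<<'HTML'
<div id="post_list">
  <div class="post_item">
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/xyczero/p/4716019.html" target="_blank">Android 之夜间模式(多主题)的实现</a></h3>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/xyczero/" class="lightblue">xyczero</a>
      </div>
    </div>
  </div>
  <div class="post_item">
    <div class="post_item_body">
      <h3><a class="titlelnk" href="http://www.cnblogs.com/liulun/p/4714858.html" target="_blank">用Nim语言开发windows GUI图形界面程序</a></h3>
      <div class="post_item_foot">
        <a href="http://www.cnblogs.com/liulun/" class="lightblue">liulun</a>
      </div>
    </div>
  </div>
</div>
HTML;

/**
 * Hypothetical helper (not part of QueryList): for every node matched by
 * $range, evaluate each rule relative to that node.
 * A rule is [xpath, attribute]; the pseudo-attribute 'text' means text content.
 */
function scrapeList(string $html, string $range, array $rules): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML prolog is a common way to tell libxml the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query($range) as $item) {
        $row = [];
        foreach ($rules as $field => [$path, $attr]) {
            // Take the first match inside the current item, if any.
            $node = $xpath->query($path, $item)->item(0);
            if ($node === null) {
                $row[$field] = null;
            } elseif ($attr === 'text') {
                $row[$field] = trim($node->textContent);
            } else {
                $row[$field] = $node->getAttribute($attr);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

$rows = scrapeList($html, '//div[@class="post_item"]', [
    'title'  => ['.//a[@class="titlelnk"]', 'text'],
    'link'   => ['.//a[@class="titlelnk"]', 'href'],
    'author' => ['.//div[@class="post_item_foot"]/a[@class="lightblue"]', 'text'],
]);

print_r($rows);
```

To run it against the live page instead of the embedded sample, the HTML string could be fetched first (for example with `file_get_contents`), though the real cnblogs markup may have changed since this source was captured.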
08-22 03:03