如何从网页来源中获取一个字符串?我到处都是PHP.net,但无法确定PHP是否具有可以从中获取字符串的函数或函数集。
例如,这是我目前拥有的(并且我想从"wgCategories"
到"wgMonthNamesShort"
的所有内容)从存储在$html
中的网页中获得:
<?php
error_reporting(E_ALL);
$html = file_get_contents('http://en.wikipedia.org/wiki/Los_Angeles');
$string = <>;
?>
首先,我将网页的源代码捕获到$ html变量中。现在,我需要一个可以捕获从
"wgCategories"
到"wgMonthNamesShort"
的所有内容并将其存储到$ string中的函数或函数集。所需结果:
$string = "wgCategories":["All articles with dead external links","Articles with dead external links from March 2013","Articles with dead external links from March 2014","Pages with broken reference names","Articles with dead external links from January 2014","Articles with dead external links from September 2011","Articles with dead external links from October 2011","CS1 errors: dates","Use mdy dates from May 2014","Wikipedia indefinitely semi-protected pages","Wikipedia indefinitely move-protected pages","Coordinates on Wikidata","Articles including recorded pronunciations","Articles containing Spanish-language text","All articles with unsourced statements","Articles with unsourced statements from December 2013","Spoken articles","Articles with hAudio microformats","Los Angeles, California","Cities in Los Angeles County, California","Communities on U.S. Route 66","County seats in California","Incorporated cities and towns in California","Populated coastal places in California","Populated places established in 1781","Port cities and towns of the United States Pacific coast","Butterfield Overland Mail in California","Stockton - Los Angeles Road"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort";
最后,请注意,从
"wgCategories"
到"wgMonthNamesShort"
的所有内容都存储在<script>
标记之间(不知道这是否重要,但是有人告诉我这值得一提)。让我知道是否需要澄清。
最佳答案
您可以将preg_match
与s
标志(DOTALL)结合使用,以抓取2个关键字之间的字符串:
error_reporting(E_ALL);
$html = file_get_contents('http://en.wikipedia.org/wiki/Los_Angeles');
if (preg_match('/wgCategories.*?wgMonthNamesShort/is', $html, $matches))
echo $matches[0];
您可以避免使用正则表达式,而使用PHP字符串函数(例如
stristr
)也可以这样做。上面的代码打印:
wgCategories":["All articles with dead external links","Articles with dead external links from March 2013","Articles with dead external links from March 2014","Pages with broken reference names","Articles with dead external links from January 2014","Articles with dead external links from September 2011","Articles with dead external links from October 2011","CS1 errors: dates","Use mdy dates from May 2014","Wikipedia indefinitely semi-protected pages","Wikipedia indefinitely move-protected pages","Coordinates on Wikidata","Articles including recorded pronunciations","Articles containing Spanish-language text","All articles with unsourced statements","Articles with unsourced statements from December 2013","Spoken articles","Articles with hAudio microformats","Los Angeles, California","Cities in Los Angeles County, California","Communities on U.S. Route 66","County seats in California","Incorporated cities and towns in California","Populated coastal places in California","Populated places established in 1781","Port cities and towns of the United States Pacific coast","Butterfield Overland Mail in California","Stockton - Los Angeles Road"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort
关于javascript - 如何从网页来源中获取一个字符串?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/25216541/