Question
I want to get my Google Chrome bookmarks into a database. My first step is to take the .html file exported from Chrome and read its data into variables with PHP. I am hoping for some PHP code that can run over the data below and extract the URL, ADD_DATE, ICON, and the link text, each into its own variable.
I know I will need some regex for this; can anyone help? Thanks. I will add a bounty to this when time allows.
<A HREF="http://snipt.net/public/tag/css"
ADD_DATE="1271801059"
ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACtklEQVQ4jXWSS2gTURSGvzszyaSxpsS2vhe2WosgilgVHyDqzo2iIoog+EIKCiIuFNTGjUoVBLWCiKArFcSFi7hQFLT4Qqp10SK11mKbgk3SmjSdJDNzj4s+0Fb/zTmL/3z8596jmKDElxcVYTuwxS3+Gu7O9DysqzvsTvT8KfVnP9DdvBfRZ3w3N197DqGAepV2AyePPuj9FDKNGUZBG68/dzo/Hjcm/gL0dcQrS4KRO9pzNvt+EdvUDOVdWr6lSKSdYUeFr39NhuNdP7N2KvNrZti21brF856eO7AloQAGul40iHgx3ysQsoNXP3Znih/avp6YX2lSXWESDRvprFe2fNHqfd8BdsduViQzxQ19mcxLAwAxporWKKXwXIyQJWxdMZu1i2YTjUTxsKeV2dlLsVjMALgXO5yMRqYMhE1zpjW6SBalQBSuXziyoNzC9UPk3QJaRsFa7QjOil5YWX/15Yqa6VYinc3m0vl2C0BEJxUKQQCh6Gu074MIIoIWjWhh55LipkiopDGpnVzT8UN5AGskgDRjmL74YooWEI2IIGhAA4IWQWD55prc1uo1R26P/YIBEK3e2KoM+5HCGB8ADTJSR2CC1oInXqz92anyvwAAnngNygrmRDQylmC8CogQDviIl5v7NrXg9CRAxbz17UpZTUqZiOjRNUYAQVMzNeDQ0muyL76Jg893Hdt+Y2jJ+BuMqeANXw5YJXs8d2iOiGAqTant0tVf5Mr7Wu53rsOX6ZSEvZ62nqyeeMoAJDuf1nvO4A2bQTLOMHdbolxrXUV/fiGEKFRFBm5VlfZffH66tvefgI6OuF0u7pt4a2pZ47vFfE4thWCQytLck9qy/nPNZ6veTZyZpPP3m7cF6n8K+0VKjxba6xp6d/3POynBmJaed07afs4s+tmmT7Gqwf/5fgMaeWl1u/QPfAAAAABJRU5ErkJggg=="
>Snipt - public - css | Share and store code or command snippets.</A>
UPDATE
I liked user yc's recommendation of using something like this instead of regex:
$s = '<A HREF="http://snipt.net/public/tag/css"
ADD_DATE="1271801059"
ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACtklEQVQ4jXWSS2gTURSGvzszyaSxpsS2vhe2WosgilgVHyDqzo2iIoog+EIKCiIuFNTGjUoVBLWCiKArFcSFi7hQFLT4Qqp10SK11mKbgk3SmjSdJDNzj4s+0Fb/zTmL/3z8596jmKDElxcVYTuwxS3+Gu7O9DysqzvsTvT8KfVnP9DdvBfRZ3w3N197DqGAepV2AyePPuj9FDKNGUZBG68/dzo/Hjcm/gL0dcQrS4KRO9pzNvt+EdvUDOVdWr6lSKSdYUeFr39NhuNdP7N2KvNrZti21brF856eO7AloQAGul40iHgx3ysQsoNXP3Znih/avp6YX2lSXWESDRvprFe2fNHqfd8BdsduViQzxQ19mcxLAwAxporWKKXwXIyQJWxdMZu1i2YTjUTxsKeV2dlLsVjMALgXO5yMRqYMhE1zpjW6SBalQBSuXziyoNzC9UPk3QJaRsFa7QjOil5YWX/15Yqa6VYinc3m0vl2C0BEJxUKQQCh6Gu074MIIoIWjWhh55LipkiopDGpnVzT8UN5AGskgDRjmL74YooWEI2IIGhAA4IWQWD55prc1uo1R26P/YIBEK3e2KoM+5HCGB8ADTJSR2CC1oInXqz92anyvwAAnngNygrmRDQylmC8CogQDviIl5v7NrXg9CRAxbz17UpZTUqZiOjRNUYAQVMzNeDQ0muyL76Jg893Hdt+Y2jJ+BuMqeANXw5YJXs8d2iOiGAqTant0tVf5Mr7Wu53rsOX6ZSEvZ62nqyeeMoAJDuf1nvO4A2bQTLOMHdbolxrXUV/fiGEKFRFBm5VlfZffH66tvefgI6OuF0u7pt4a2pZ47vFfE4thWCQytLck9qy/nPNZ6veTZyZpPP3m7cF6n8K+0VKjxba6xp6d/3POynBmJaed07afs4s+tmmT7Gqwf/5fgMaeWl1u/QPfAAAAABJRU5ErkJggg=="
>Snipt - public - css | Share and store code or command snippets.</A>';
$bookmarks = simplexml_load_string($s); // parse the single <A> element as XML
echo $bookmarks["HREF"]; //URL
echo '<br>';
echo $bookmarks[0]; //Name
echo '<br>';
echo $bookmarks['ICON']; //Icon
echo '<br>';
echo $bookmarks['ADD_DATE']; //Add_Date
However, I have not yet figured out how to make it work with multiple links in an HTML page or string.
I then found PHP's DOMDocument class, and I seem to have it working like this:
$html = '<DT><A HREF="http://stackapps.com/questions/518/stacktack-a-javascript-widget-you-can-stick-anywhere" ADD_DATE="1301274664" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACY0lEQVQ4jX2SS0jVQRTGv//MnJnxGjctpbCFIrgO2rRr06KiRdtKEYLwUj4gohdBZlFUEmmp0N8WIZXrIMiF25Zu27hQIaKiuHaze/93ni3ykY/bWR3O/M43zDdfghp14fpEo67HZwCx+ONby8uR28s7cayWQH0DP6G01lrruqb9LcdrcclaMzA0doqIHnESrxQnmfBkgHGeTwD4EEoxhKfWWuOc7/LGXH0y2Pd2XaCn5znl2zBPUraSJAghwAUHYxwAEGOAcw7eeVhjYapmqbQYO9K0YBkApGnBMqJJIoKSEkorKK2hlCwpJUtKa2itIZUEEUGQmEzTgt3kAZHgggSEJBBRJjjvz4Jpz4Jp5yT6BVG2ugxOgq97cO7G/ea9u5uO5nK5CaVVo9IanLG+S10nx/81a3R6ptd7N5ZVMmSVbLlcznpXvhdnWUN9wxIJMc0Za2SMgyVJyVTd1Fa3s3J1KkmSEmMcgvMGKfjrXOOuxe3fmGybbD9PNjhWWf7dZl3odCEshxAQI/JSqe6tezqnumNEPvgA50PRene2uFJsXb/v5sjULaXkkNYagihjgl8pV8vTAJBTuTPBh2Fnra5mGYyxg3cHuu4AgFgT8NZ5zzmssYiAFjE+qxPqHgAE5/PeOVhj4Z2DrTq/6cV/g5TMSyVbSQoIQRtBSoAY1oLkVoNkl34uhI40LVgOAHNz78LhI8cWInAohjCKGD947w+GELT3DtbYkvN+2JjqrLXugDfm8vjjix//6/m1By9OM+LTABCs73x4/fybnTix0xAAvn75NLOneV8FQFL5Fd/X4v4ArZQWGyLoDDcAAAAASUVORK5CYII=">app - StackTack, a JavaScript widget you can stick anywhere - Stack Apps</A>
<DT><A HREF="https://chrome.google.com/extensions/detail/paoeolblihedcagbofkkkecjilmpehmo" ADD_DATE="1301275461" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAADSklEQVQ4jW2TX2gbBQDGf5e7NGmTlOZ/xKVtbOfWrjWUNa6dOB9ay4aMqvgwEPRFRYTBGEPwwRcnfVB8ERQn+CBS6XSoVAWZImMWOttJtcZ2bdpm69qkjfnTXq7J3SV358uEyvzePvh9fC/fJ/A/OptIPPH6yMjzmq4PIgh2h9O5tZTPf33uypWJZC63vZ8V9ptwRzj0WvvhCy9EDrxUmJnxWsUilmUhOJ1IPp9VdjqTP+Ryb4+tr39xX+vQqWPdE9Pjy99cOG9lgiHrjiham0e6rc1HE9ZGLGYtgbUA1prTab3n8bzxb04ECAaD7g8n35z0NAd62FA5fPsuW0+d4nqsk6Q/TPlQB13xOL70bWyKwqDbPeR7su3Pq0v5RRFg9NzJs9E+x4t2a49SYxMlVxsflyDde4Z08DG+N0PoITcBu4F5J0tDRSMx8HBivLjzmRgOh13D50ferRibDzrRiQRb+fTaCguto4Tb++jv9qO5Olis5BG9axSG27jhCrArNHkdsZ1ZKXikNS43GO3ynkLE4aRumeQNGaVSIyfryB4RRQW5mCXQXufMcxmKp7P8nowSmbGdkMpq+YGNvVKT3aaTrZSJtuzQdTDC4i+XmZNcpIoRasVlfDevEjvmg/IWLUKGwa46yXkrIlk1dK1u1/NmzeW3a7QUFxgZOUEm+zPJ6TF2qh58lswrz/TQfzSDJWdQa40oio2KqqnS1tpmqjOnbZf8dm9KqSAKINqnOf10goOODR55qJGe7iY6W1dg5ztqpoSqSqiawPyt8i3RqOqy5LZ3O45G+zdKOUwRcms1WtLw8rMCfX2r+BzXYfcnTMNA0+wolWbWs5py8VJhzAboqW9/+9y5aeRw+5hf3WPIauXV0QLeph+hMAXKEqYpoGoOdmUPRt3kky/lr0ol/hAB0M2/95bzZufjvQNii6/heKiZ/tANqK6AUcUwLGp1G1W1Gb0mcWmiMPX+ePkikBbvLbKmFspr8uz6brg9FF0NBANoMQ55y0i2EjZTQNcl/kohv/XB7uRHl+V3gFnA/M+ZgCBw3N97YDg8EI93tYX8cVuqKuVuKlNz3P11Tp0pyFwDFoH6fW+8JwmIAFHAA9SAKpAHtoHyfvgfh8p7963YqU4AAAAASUVORK5CYII=">StackStalker - Google Chrome extension gallery</A>
<DT><A HREF="http://stackapps.com/questions/319/phpstack-a-php-wrapper-to-the-se-api" ADD_DATE="1301276371" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACY0lEQVQ4jX2SS0jVQRTGv//MnJnxGjctpbCFIrgO2rRr06KiRdtKEYLwUj4gohdBZlFUEmmp0N8WIZXrIMiF25Zu27hQIaKiuHaze/93ni3ykY/bWR3O/M43zDdfghp14fpEo67HZwCx+ONby8uR28s7cayWQH0DP6G01lrruqb9LcdrcclaMzA0doqIHnESrxQnmfBkgHGeTwD4EEoxhKfWWuOc7/LGXH0y2Pd2XaCn5znl2zBPUraSJAghwAUHYxwAEGOAcw7eeVhjYapmqbQYO9K0YBkApGnBMqJJIoKSEkorKK2hlCwpJUtKa2itIZUEEUGQmEzTgt3kAZHgggSEJBBRJjjvz4Jpz4Jp5yT6BVG2ugxOgq97cO7G/ea9u5uO5nK5CaVVo9IanLG+S10nx/81a3R6ptd7N5ZVMmSVbLlcznpXvhdnWUN9wxIJMc0Za2SMgyVJyVTd1Fa3s3J1KkmSEmMcgvMGKfjrXOOuxe3fmGybbD9PNjhWWf7dZl3odCEshxAQI/JSqe6tezqnumNEPvgA50PRene2uFJsXb/v5sjULaXkkNYagihjgl8pV8vTAJBTuTPBh2Fnra5mGYyxg3cHuu4AgFgT8NZ5zzmssYiAFjE+qxPqHgAE5/PeOVhj4Z2DrTq/6cV/g5TMSyVbSQoIQRtBSoAY1oLkVoNkl34uhI40LVgOAHNz78LhI8cWInAohjCKGD947w+GELT3DtbYkvN+2JjqrLXugDfm8vjjix//6/m1By9OM+LTABCs73x4/fybnTix0xAAvn75NLOneV8FQFL5Fd/X4v4ArZQWGyLoDDcAAAAASUVORK5CYII=">library - PHPstack - A PHP wrapper to the SE API - Stack Apps</A>
';
$dom = new DOMDocument;
$dom->loadHTML($html);
// Print the link text and attributes of every bookmark anchor
foreach ($dom->getElementsByTagName('a') as $node)
{
echo 'Title = ' . $node->nodeValue . '<br>';
echo 'URL = ' . $node->getAttribute("href") . '<br>';
echo 'Icon = ' . $node->getAttribute("icon") . '<br>';
echo 'Date Added = ' . $node->getAttribute("add_date") . '<br>';
echo '<br>';
}
Recommended answer
Don't use regex, since HTML, even if provided by Chrome, isn't a regular language.
Use an XML parser, like SimpleXML.
If the string above is $s:
$bookmarks = simplexml_load_string($s);
echo $bookmarks["HREF"]; //URL
echo $bookmarks[0]; //Name
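The SimpleXML call above handles a single, well-formed <A> element. An export with several entries contains unclosed <DT> tags, so it is unlikely to load as XML. A minimal sketch for that case, assuming the DOMDocument approach from the question's update, could collect each bookmark into an array ready for a database insert. The function name extractBookmarks, the array keys, and the shortened sample data are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): collect every
// bookmark link found in an exported HTML string into an array of values.
function extractBookmarks(string $html): array
{
    $dom = new DOMDocument();

    // The export is loose HTML, so collect libxml warnings internally
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $bookmarks[] = [
            'title'    => trim($node->textContent),
            'url'      => $node->getAttribute('href'),
            'add_date' => $node->getAttribute('add_date'),
            'icon'     => $node->getAttribute('icon'),
        ];
    }
    return $bookmarks;
}

// Made-up sample in the shape of the export shown in the question
// (URLs, titles, and icon data are placeholders).
$html = '<DT><A HREF="http://example.com/one" ADD_DATE="1301274664" ICON="data:image/png;base64,AAAA">First link</A>
<DT><A HREF="http://example.com/two" ADD_DATE="1301276371" ICON="data:image/png;base64,BBBB">Second link</A>';

$bookmarks = extractBookmarks($html);
foreach ($bookmarks as $b) {
    echo $b['title'], ' | ', $b['url'], ' | ', $b['add_date'], PHP_EOL;
}
```

Each array entry maps naturally onto one database row. To read a real export, something like file_get_contents() on the exported file could supply $html.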