This article covers converting XHTML to XML.

Problem description

I'm trying to do some "screen scraping", and am using
<http://www.oreilly.com/catalog/xmlhks/> for inspiration.

First I'd like to convert XHTML to XML, or extract XML from XHTML; I'm
not sure how to phrase that.

"Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It
for Data"
<http://hacks.oreilly.com/pub/h/2125>

is what I'd like to do down the line, but for now I'm working on
something simpler.

First,

"Convert an HTML Document to XHTML with HTML Tidy"
<http://hacks.oreilly.com/pub/h/2054>

Instead of Tidy, I went with TagSoup
<http://mercury.ccil.org/~cowan/XML/tagsoup/>.

Then I'd like to go from XHTML to XML in order to:

"Generate an XSLT Identity Stylesheet with Relaxer"
<http://hacks.oreilly.com/pub/h/2069>

How do I get the XML from the XHTML, please?

Here's what I have:

[thufir@arrakis tagSoup]$
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:13 IST 2005
[thufir@arrakis tagSoup]$ pwd
/home/thufir/Desktop/tagSoup
[thufir@arrakis tagSoup]$ ll
total 60
-rw-rw-r-- 1 thufir thufir  7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ java -jar tagsoup.jar --files google.html
src: google.html dst: google.xhtml
[thufir@arrakis tagSoup]$ ll
total 76
-rw-rw-r-- 1 thufir thufir  7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 10568 Aug 14 23:34 google.xhtml
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ cat google.xhtml -n
     1  <?xml version="1.0" standalone="yes"?>
     2
     3  <html version="-//W3C//DTD HTML 4.01 Transitional//EN" xmlns="http://www.w3.org/1999/xhtml"><head><title>Google Directory</title><style><!--
     4  body,td,a,p,.h{font-family: arial,sans-serif;} .h{color:#008000} .q{text-decoration:none; color:#0000cc;}
     5  //--></style><script>
     6  <!--
     7  function sf(){document.f.q.focus();}
     8  // -->
     9  </script></head><body bgcolor="#ffffff" text="#000000" link="#3300cc" vlink="#660066" alink="#ff0000" onload="sf();">
    10  <center>
    11  <table cellpadding="0" cellspacing="0" border="0"><tr><td align="right" colspan="1" rowspan="1" valign="bottom"><img src="http://www.google.com/images/hp0.gif" width="158" height="78" alt="Google Directory"></img></td><td colspan="1" rowspan="1" valign="bottom"><img src="http://www.google.com/images/hp1.gif" width="50" height="78" alt=""></img></td><td colspan="1" rowspan="1" valign="bottom"><img src="http://www.google.com/images/hp2.gif" width="68" height="78" alt=""></img></td></tr><tr><td align="right" colspan="1" rowspan="1" valign="top" class="h"><b>Directory</b></td><td colspan="1" rowspan="1" valign="top"><img src="http://www.google.com/images/hp3.gif" width="50" height="32" alt=""></img></td><td colspan="1" rowspan="1" valign="top" class="h"></td></tr></table><br clear="none"></br><table border="0" cellspacing="0" cellpadding="0"><tr><td colspan="1" rowspan="1" width="15"> </td><td align="center" colspan="1" nowrap="nowrap" rowspan="1" id="0" bgcolor="#efefef" width="95"><a shape="rect" class="q" id="0a" href="http://www.google.com/webhp?hl=en"><font size="-1">Web</font></a></td><td colspan="1" rowspan="1" width="15"> </td><td align="center" colspan="1" nowrap="nowrap" rowspan="1" id="1" bgcolor="#efefef" width="95"><a shape="rect" class="q" id="1a" href="http://www.google.com/imghp?hl=en"><font size="-1">Images</font></a></td><td colspan="1" rowspan="1" width="15"> </td><td align="center" colspan="1" nowrap="nowrap" rowspan="1" id="2" bgcolor="#efefef" width="95"><a shape="rect" class="q" id="2a" href="http://www.google.com/grphp?hl=en"><font size="-1">Groups</font></a></td><td colspan="1" rowspan="1" width="15"> </td><td align="center" colspan="1" nowrap="nowrap" rowspan="1" id="3" bgcolor="#008000" width="95"><font color="#ffffff" size="-1"><b>Directory</b></font></td><td colspan="1" rowspan="1" width="15"> </td><td align="center" colspan="1" nowrap="nowrap" rowspan="1" id="4" bgcolor="#efefef" width="95"><a shape="rect" class="q" id="4a" href="http://www.google.com/nwshp?hl=en"><font size="-1">News</font></a></td><td colspan="1" rowspan="1" width="15"> </td><td colspan="1" rowspan="1" width="15"> </td></tr><tr><td colspan="12" rowspan="1" bgcolor="#008000"><img width="1" height="1" alt=""></img></td></tr></table><br clear="none"></br><form enctype="application/x-www-form-urlencoded" method="get" action="http://www.google.com/search" name="f"><table cellpadding="0" cellspacing="0"><tr align="middle" valign="center"><td colspan="1" rowspan="1" width="150"> </td><td colspan="1" rowspan="1"><input maxlength="256" type="text" name="q" size="40" value=""></input><script>document.f.q.focus();</script><input type="submit" name="btnG" value="Google Search"></input><input type="hidden" name="hl" value="en"></input><input type="hidden" name="cat" value="gwd/Top"></input></td><td align="left" colspan="1" rowspan="1" width="150"><font size="-2"> • <a shape="rect" href="http://www.google.com/dirhelp.html">Directory Help</a></font></td></tr></table></form><p><font color="#008000"><b>The web organized by topic into categories.</b></font></p><p></p><table align="center" width="1%" border="0" cellspacing="7" cellpadding="0"><tr><td colspan="4" rowspan="1" bgcolor="#008000"><img width="1" height="1" alt=""></img></td></tr><tr><td colspan="1" rowspan="1"> </td><td colspan="1" nowrap="nowrap" rowspan="1">
    12  <b><a shape="rect" href="/Top/Arts/">Arts</a></b><br clear="none"></br>
    13  <font size="-1"><a shape="rect" href="/Top/Arts/Movies/">Movies</a>, <a shape="rect" href="/Top/Arts/Music/">Music</a>, <a shape="rect" href="/Top/Arts/Television/">Television</a>, ...</font><p>
    14  <b><a shape="rect" href="/Top/Business/">Business</a></b><br clear="none"></br>
    15  <font size="-1"><a shape="rect" href="/Top/Business/Major_Companies/">Companies</a>, <a shape="rect" href="/Top/Business/Financial_Services/">Finance</a>, <a shape="rect" href="/Top/Business/Employment/">Jobs</a>, ...</font></p><p>
    16  <b><a shape="rect" href="/Top/Computers/">Computers</a></b><br clear="none"></br>
    17  <font size="-1"><a shape="rect" href="/Top/Computers/Internet/">Internet</a>, <a shape="rect" href="/Top/Computers/Hardware/">Hardware</a>, <a shape="rect" href="/Top/Computers/Software/">Software</a>, ...</font></p><p>
    18  <b><a shape="rect" href="/Top/Games/">Games</a></b><br clear="none"></br>
    19  <font size="-1"><a shape="rect" href="/Top/Games/Board_Games/">Board</a>, <a shape="rect" href="/Top/Games/Roleplaying/">Roleplaying</a>, <a shape="rect" href="/Top/Games/Video_Games/">Video</a>, ...</font></p><p>
    20  <b><a shape="rect" href="/Top/Health/">Health</a></b><br clear="none"></br>
    21  <font size="-1"><a shape="rect" href="/Top/Health/Alternative/">Alternative</a>, <a shape="rect" href="/Top/Health/Fitness/">Fitness</a>, <a shape="rect" href="/Top/Health/Medicine/">Medicine</a>, ...</font></p><p>
    22  </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
    23  <b><a shape="rect" href="/Top/Home/">Home</a></b><br clear="none"></br>
    24  <font size="-1"><a shape="rect" href="/Top/Home/Consumer_Information/">Consumers</a>, <a shape="rect" href="/Top/Home/Homeowners/">Homeowners</a>, <a shape="rect" href="/Top/Home/Family/">Family</a>, ...</font><p>
    25  <b><a shape="rect" href="/Top/Kids_and_Teens/">Kids and Teens</a></b><br clear="none"></br>
    26  <font size="-1"><a shape="rect" href="/Top/Kids_and_Teens/Computers/">Computers</a>, <a shape="rect" href="/Top/Kids_and_Teens/Entertainment/">Entertainment</a>, <a shape="rect" href="/Top/Kids_and_Teens/School_Time/">School</a>, ...</font></p><p>
    27  <b><a shape="rect" href="/Top/News/">News</a></b><br clear="none"></br>
    28  <font size="-1"><a shape="rect" href="/Top/News/Media/">Media</a>, <a shape="rect" href="/Top/News/Newspapers/">Newspapers</a>, <a shape="rect" href="/Top/News/Current_Events/">Current Events</a>, ...</font></p><p>
    29  <b><a shape="rect" href="/Top/Recreation/">Recreation</a></b><br clear="none"></br>
    30  <font size="-1"><a shape="rect" href="/Top/Recreation/Food/">Food</a>, <a shape="rect" href="/Top/Recreation/Outdoors/">Outdoors</a>, <a shape="rect" href="/Top/Recreation/Travel/">Travel</a>, ...</font></p><p>
    31  <b><a shape="rect" href="/Top/Reference/">Reference</a></b><br clear="none"></br>
    32  <font size="-1"><a shape="rect" href="/Top/Reference/Education/">Education</a>, <a shape="rect" href="/Top/Reference/Libraries/">Libraries</a>, <a shape="rect" href="/Top/Reference/Maps/">Maps</a>, ...</font></p><p>
    33  </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
    34  <b><a shape="rect" href="/Top/Regional/">Regional</a></b><br clear="none"></br>
    35  <font size="-1"><a shape="rect" href="/Top/Regional/Asia/">Asia</a>, <a shape="rect" href="/Top/Regional/Europe/">Europe</a>, <a shape="rect" href="/Top/Regional/North_America/">North America</a>, ...</font><p>
    36  <b><a shape="rect" href="/Top/Science/">Science</a></b><br clear="none"></br>
    37  <font size="-1"><a shape="rect" href="/Top/Science/Biology/">Biology</a>, <a shape="rect" href="/Top/Science/Social_Sciences/Psychology/">Psychology</a>, <a shape="rect" href="/Top/Science/Physics/">Physics</a>, ...</font></p><p>
    38  <b><a shape="rect" href="/Top/Shopping/">Shopping</a></b><br clear="none"></br>
    39  <font size="-1"><a shape="rect" href="/Top/Shopping/Vehicles/Autos/">Autos</a>, <a shape="rect" href="/Top/Shopping/Clothing/">Clothing</a>, <a shape="rect" href="/Top/Shopping/Gifts/">Gifts</a>, ...</font></p><p>
    40  <b><a shape="rect" href="/Top/Society/">Society</a></b><br clear="none"></br>
    41  <font size="-1"><a shape="rect" href="/Top/Society/Issues/">Issues</a>, <a shape="rect" href="/Top/Society/People/">People</a>, <a shape="rect" href="/Top/Society/Religion_and_Spirituality/">Religion</a>, ...</font></p><p>
    42  <b><a shape="rect" href="/Top/Sports/">Sports</a></b><br clear="none"></br>
    43  <font size="-1"><a shape="rect" href="/Top/Sports/Basketball/">Basketball</a>, <a shape="rect" href="/Top/Sports/Football/">Football</a>, <a shape="rect" href="/Top/Sports/Soccer/">Soccer</a>, ...</font></p><p>
    44  </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td colspan="3" rowspan="1"><b><a shape="rect" href="/Top/World/">World</a></b><br clear="none"></br>
    45  <font size="-1"><a shape="rect" href="/Top/World/Deutsch/">Deutsch</a>, <a shape="rect" href="/Top/World/Espa%C3%B1ol/">Español</a>, <a shape="rect" href="/Top/World/Fran%C3%A7ais/">Français</a>, <a shape="rect" href="/Top/World/Italiano/">Italiano</a>, <a shape="rect" href="/Top/World/Japanese/">Japanese</a>, <a shape="rect" href="/Top/World/Korean/">Korean</a>, <a shape="rect" href="/Top/World/Nederlands/">Nederlands</a>, <a shape="rect" href="/Top/World/Polska/">Polska</a>, <a shape="rect" href="/Top/World/Svenska/">Svenska</a>, ...</font><p>
    46  </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td colspan="1" nowrap="nowrap" rowspan="1"><font size="-1"> </font></td></tr><tr><td colspan="4" rowspan="1" bgcolor="#008000"><img width="1" height="1" alt=""></img></td></tr></table><br clear="none"></br><font size="-1"><a shape="rect" href="http://www.google.com/ads/">Advertise with Us</a> - <a shape="rect" href="http://www.google.com/about.html">Jobs, Press, Cool Stuff...</a></font><p><font face="arial,sans-serif" size="-1"> ©2004 Google</font></p><br clear="none"></br><table align="center" border="0" bgcolor="#336600" cellpadding="3" cellspacing="0"><tr><td colspan="1" rowspan="1"> <table width="100%" cellpadding="2" cellspacing="0" border="0"><tr align="center"><td colspan="1" rowspan="1"><font face="sans-serif, Arial, Helvetica" size="2" color="#ffffff">Help build the largest human-edited directory on the web.</font></td></tr><tr align="center" bgcolor="#cccccc"><td colspan="1" rowspan="1"><font face="sans-serif, Arial, Helvetica" size="2">
    47  <a shape="rect" href="http://dmoz.org/add.html">
    48  Submit a Site</a> - <a shape="rect" href="http://dmoz.org/about.html"><b>Open Directory Project</b></a> -
    49  <a shape="rect" href="http://dmoz.org/cgi-bin/apply.cgi">Become an Editor</a> </font>
    50  </td></tr></table>
    51  </td></tr></table>
    52  </center></body></html>
    53
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:57 IST 2005
[thufir@arrakis tagSoup]$

Thanks,

Thufir
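A note on the "XHTML to XML" step: XHTML is itself an XML vocabulary, and the TagSoup output in the listing starts with an XML declaration and carries the XHTML namespace on <html>. So google.xhtml should already be usable by any XML parser, and there may be no separate conversion to perform. The following is only a minimal sketch of that idea, assuming Python and its standard xml.etree.ElementTree module (neither is mentioned in the post). SAMPLE_XHTML is a hypothetical, trimmed-down stand-in for google.xhtml, and extract_links is a helper name invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace that appears on the <html> element in the TagSoup listing.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, shortened stand-in for google.xhtml (assumption: the real
# file is well-formed XML in the same way this fragment is).
SAMPLE_XHTML = """<?xml version="1.0" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Google Directory</title></head>
<body><b><a shape="rect" href="/Top/Arts/">Arts</a></b><br clear="none"></br>
<font size="-1"><a shape="rect" href="/Top/Arts/Movies/">Movies</a>,
<a shape="rect" href="/Top/Arts/Music/">Music</a></font></body></html>"""


def extract_links(xhtml_text):
    """Parse XHTML with a plain XML parser and return (href, text) pairs."""
    root = ET.fromstring(xhtml_text)
    # Elements live in the XHTML namespace, so the tag name must be qualified.
    anchor_tag = "{%s}a" % XHTML_NS
    return [(a.get("href"), a.text) for a in root.iter(anchor_tag)]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_XHTML):
        print(href, text)
```

To try this against the real file, ET.parse("google.xhtml").getroot() could replace the ET.fromstring call, assuming the file parses cleanly. The namespace-qualified tag name is the usual stumbling block: a bare "a" matches nothing once xmlns is present.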