I am trying to get the DataTable headers using XPath.

My output should be:


ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2


This is the original XML:



<Result Code="OK" ErrorMessage="" FullError="">
<GlobalVars FirstBackLinkDate="2012-09-21" IndexBuildDate="2018-05-24 19:47:18" IndexType="0" MostRecentBackLinkDate="2018-04-23" QueriedRootDomains="1" QueriedSubDomains="0" QueriedURLs="0" QueriedURLsMayExist="0" ServerBuild="2018-06-11 13:52:01" ServerName="BRUNO28" ServerVersion="1.0.6736.23160" UniqueIndexID="20180524194718-HISTORICAL"/>
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2" MaxTopicsRootDomain="30" MaxTopicsSubDomain="20" MaxTopicsURL="10" TopicsCount="3">
<Row>
0|nu.nl|OK|Found|508322106|165344|508322106|-1|1|4149991|5000|512472097|3356880|59147|26204|233|3613|43|308|73|1757|4|12|False| | |True| |5|10|44|1722150| |NU - Het laatste nieuws het eerst op NU.nl|https://www.nu.nl/|nl|Dutch/Flemish|92|99.9|482980|nl,en,de|Dutch/Flemish,English,German|87,93,58|96.5,3.1,0.1|76319583|1915923|52.85.201.19|611833777|15034990|53120677|444371798|95283418|52384870|388104|53497551|5655999|552292123|102171|115787|21952|150164|49554|76|70|70|News/Breaking News|69|Sports/Resources|45|Arts/Radio|43
</Row>
</DataTable>
</DataTables>
</Result>





When I use this XPath command in Google Sheets:

=importxml("http://enterprise.majesticseo.com/api_command?privatekey=xxx&accessToken=xxx&cmd=GetIndexItemInfo&item0=nu.nl&items=1","//DataTable")


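I get the row results. That's great, but I also need the header names in the first row of the sheet.

Best answer

A short introduction to XPath :-)
With //DataTable you get the complete node of every <DataTable> found anywhere in the XML (no namespaces are involved here).
As a rule of thumb, it is best to be as specific as possible, so prefer /Result/DataTables/DataTable. But that is not the answer to your question...
Imagine an XML like this: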

<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>

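With /root/innerNode you get both <innerNode> elements, each with all of its content.
With /root/innerNode[(b/text())[1]="bbb"] you get only the one <innerNode> whose <b> has a text() of "bbb".
With /root/innerNode[@attr="1"] you get the one <innerNode> whose attribute attr has the value "1".
All three XPath samples return the whole node, including child nodes, attributes and so on.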
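The three node selections above can be tried out with a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath: paths are relative to the parsed root element, and the predicate [b='bbb'] is used here as an approximation of [(b/text())[1]="bbb"]. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>"""

root = ET.fromstring(SAMPLE)  # the parsed element is <root> itself

# /root/innerNode: both <innerNode> elements
all_nodes = root.findall("innerNode")

# /root/innerNode[(b/text())[1]="bbb"], approximated with ElementTree's
# [tag='text'] predicate (assumption: <b> holds plain text only)
by_text = root.findall("innerNode[b='bbb']")

# /root/innerNode[@attr="1"]: filter on an attribute value
by_attr = root.findall("innerNode[@attr='1']")

# Each result is a whole element, so children and attributes stay reachable
print(len(all_nodes))
print([node.get("attr") for node in by_text])
print([node.find("a").text for node in by_attr])
```

Every result is an Element object rather than a string, which mirrors the point that these expressions return complete nodes.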
If you need only the value of an attribute, you have to ask for it explicitly:
(/root/innerNode/@attr)[2]

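...returns the attribute value of the second <innerNode> (strictly speaking, the second occurrence of attr in document order), which is 2 here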
/root/innerNode[(b/text())[1]="Some b content"]/@attr

...returns the attribute value of the <innerNode> whose <b> has a text() of "Some b content"
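As a rough illustration of the two attribute lookups, here is a Python sketch using the standard-library xml.etree.ElementTree module. ElementTree cannot return attribute nodes from a path, so this is an assumption-laden equivalent rather than a literal translation: the elements are selected first and the attribute is then read with get(). The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>"""

root = ET.fromstring(SAMPLE)

# (/root/innerNode/@attr)[2]: gather every attr value in document order,
# then take the second one (XPath counts from 1, Python from 0)
attr_values = [
    node.get("attr")
    for node in root.findall("innerNode")
    if node.get("attr") is not None
]
second_attr = attr_values[1]

# /root/innerNode[(b/text())[1]="Some b content"]/@attr:
# filter the elements on the text of <b>, then read the attribute
matched_attrs = [
    node.get("attr") for node in root.findall("innerNode[b='Some b content']")
]

print(second_attr)
print(matched_attrs)
```

A full XPath 1.0 engine would accept the original expressions directly; the two-step form is only needed because of ElementTree's reduced syntax.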
Back to your question
You want to read the attribute Headers of the element <DataTable>, which sits under /Result/DataTables. Just use
/Result/DataTables/DataTable/@Headers
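To show what that expression yields and how it lines up with the row data, here is a small Python sketch built on the standard-library xml.etree.ElementTree module. The XML is an abbreviated copy of the response from the question, cut down to the first four columns for readability, and names such as RESPONSE and sheet are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated response: only the first four columns of the original are kept
RESPONSE = """<Result Code="OK" ErrorMessage="" FullError="">
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status">
<Row>
0|nu.nl|OK|Found
</Row>
</DataTable>
</DataTables>
</Result>"""

result = ET.fromstring(RESPONSE)  # the root element is <Result>

# Equivalent of /Result/DataTables/DataTable/@Headers
table = result.find("DataTables/DataTable")
headers = table.get("Headers").split("|")

# One list per <Row>, split on the same "|" separator
rows = [row.text.strip().split("|") for row in table.findall("Row")]

# Header line first, data lines after it, as wanted for the sheet
sheet = [headers] + rows
for line in sheet:
    print(line)
```

In Google Sheets itself, the same XPath should be usable as the second argument of IMPORTXML in a separate cell above the row data, although that has not been checked against this particular API.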

Regarding "xml - XPath help needed for XML output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51040348/

10-10 04:48