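In the HTML below, I can parse the table element, but I don't know how to skip the th elements.
I only want to get the td elements, but when I try to use:
foreach (HtmlNode cell in row.SelectNodes("td"))
...I get an exception.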
<table class="tab03">
  <tbody>
    <tr>
      <th class="right" rowspan="2">first</th>
    </tr>
    <tr>
      <th class="right">lp</th>
      <th class="right">name</th>
    </tr>
    <tr>
      <td class="right">1</td>
      <td class="left">house</td>
    </tr>
    <tr>
      <th class="right" rowspan="2">Second</th>
    </tr>
    <tr>
      <td class="right">2</td>
      <td class="left">door</td>
    </tr>
  </tbody>
</table>
My code:
var document = doc.DocumentNode.SelectNodes("//table");
string store = "";
if (document != null)
{
    foreach (HtmlNode table in document)
    {
        if (table != null)
        {
            foreach (HtmlNode row in table.SelectNodes("tr"))
            {
                store = "";
                foreach (HtmlNode cell in row.SelectNodes("th|td"))
                {
                    store = store + cell.InnerText + "|";
                }
                sw.Write(store);
                sw.WriteLine();
            }
        }
    }
}
sw.Flush();
sw.Close();
Best answer
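This approach uses LINQ to query for HtmlNode instances whose name is td.
I also noticed that your output comes out as val|val| (with a trailing pipe). This example uses string.Join(pipe, array) as a simple way to avoid the trailing pipe: val|val.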
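As a small standalone illustration of that difference, here is a minimal sketch comparing the two ways of building a line. The class name JoinDemo and its method names are made up for this illustration and are not part of the original code.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class JoinDemo
{
    // Original style: append a pipe after every cell,
    // which leaves one extra pipe at the end of the line.
    public static string Concatenate(IEnumerable<string> cells)
    {
        string store = "";
        foreach (string cell in cells)
        {
            store = store + cell + "|";
        }
        return store;
    }

    // string.Join only places the separator between elements,
    // so there is nothing to trim afterwards.
    public static string Join(IEnumerable<string> cells)
    {
        return string.Join("|", cells);
    }
}
```

For the cells "1" and "house", Concatenate gives 1|house| while Join gives 1|house. The full example for the table follows.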
using System;       // for StringComparison
using System.Linq;
// ...
var tablecollection = doc.DocumentNode.SelectNodes("//table");
string store = string.Empty;
if (tablecollection != null)
{
    foreach (HtmlNode table in tablecollection)
    {
        // For all rows with at least one descendant with the 'td' tag.
        foreach (HtmlNode row in table.DescendantNodes()
            .Where(desc =>
                desc.Name.Equals("tr", StringComparison.OrdinalIgnoreCase) &&
                desc.DescendantNodes().Any(child => child.Name.Equals("td",
                    StringComparison.OrdinalIgnoreCase))))
        {
            // Combine the 'td' elements of the row into a sequence and join
            // it with the pipe to create the output in 'val|val|val' format.
            store = string.Join("|", row.DescendantNodes().Where(desc =>
                desc.Name.Equals("td", StringComparison.OrdinalIgnoreCase))
                .Select(desc => desc.InnerText));
            // You can probably get rid of the 'store' variable as it's
            // no longer necessary to store the value of the table's
            // cells over the iteration.
            sw.Write(store);
            sw.WriteLine();
        }
    }
}
sw.Flush();
sw.Close();
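As for the exception in the question: as far as I know, HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so foreach over row.SelectNodes("td") fails on the rows that contain only th cells. An alternative to the LINQ filter is to let XPath pick only the rows that have a td child. The following is a minimal sketch under that assumption, not a definitive implementation. The class name TdRows and the method Extract are hypothetical, and it returns the lines instead of writing to sw. It uses //table//tr because the rows in this HTML sit inside tbody rather than directly under table.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class TdRows
{
    // Returns one "val|val" line per table row that has at least one td child.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // "tr[td]" matches only rows with a td child, so rows that
        // contain nothing but th cells are never selected.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr[td]");
        if (rows == null)
        {
            // No matching rows: SelectNodes gave null, not an empty collection.
            return lines;
        }

        foreach (HtmlNode row in rows)
        {
            // Every selected row has a td child, so this call is non-null here.
            lines.Add(string.Join("|",
                row.SelectNodes("td").Select(td => td.InnerText)));
        }
        return lines;
    }
}
```

Each returned line can then be written with sw.WriteLine(line), as in the code above.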