Question
How can I parse the Name: and Value text from within the tag with DIHtmlParser? I tried doing it with TCLHtmlParser from Clever Components, but it failed. My second question: can DIHtmlParser parse individual tags, for example loop through a tag's sub-tags? It's a total nightmare for such a simple problem.
<div class="tvRow tvFirst hasLabel tvFirst" title="example1">
<label class="tvLabel">Name:</label>
<span class="tvValue">Value</span>
<div class="clear"></div></div>
<div class="tvRow tvFirst hasLabel tvFirst" title="example2">
<label class="tvLabel">Name:</label>
<span class="tvValue">Value</span>
<div class="clear"></div></div>
Recommended answer
You could use the IHTMLDocument2 DOM to parse whatever elements you need from the HTML:
uses ActiveX, MSHTML;

const
  HTML =
    '<div class="tvRow tvFirst hasLabel tvFirst" title="example1">' +
    '<label class="tvLabel">Name:</label>' +
    '<span class="tvValue">Value</span>' +
    '<div class="clear"></div>' +
    '</div>';

procedure TForm1.Button1Click(Sender: TObject);
var
  doc: OleVariant;
  el: OleVariant;
  i: Integer;
begin
  // Create an in-memory MSHTML document and load the markup into it
  doc := coHTMLDocument.Create as IHTMLDocument2;
  doc.write(HTML);
  doc.close;
  ShowMessage(doc.body.innerHTML);
  // body.all is a flat collection of every element below <body>
  for i := 0 to doc.body.all.length - 1 do
  begin
    el := doc.body.all.item(i);
    // MSHTML reports tag names in upper case
    if (el.tagName = 'LABEL') and (el.className = 'tvLabel') then
      ShowMessage(el.innerText);
    if (el.tagName = 'SPAN') and (el.className = 'tvValue') then
      ShowMessage(el.innerText);
  end;
end;
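
To address the second question (looping through a tag's sub-tags), the same late-bound DOM can be walked one level at a time through each element's children collection. The following is only a minimal sketch: the helper name ShowRowChildren is hypothetical, and it assumes doc was created and loaded as in the example above.

```pascal
uses ActiveX, MSHTML, Variants, Dialogs;

// Hypothetical helper (not part of the original answer): for every
// <div> whose class contains "tvRow", list its direct child elements.
procedure ShowRowChildren(const doc: OleVariant);
var
  row, child: OleVariant;
  i, j: Integer;
begin
  for i := 0 to doc.body.all.length - 1 do
  begin
    row := doc.body.all.item(i);
    if (row.tagName = 'DIV') and
       (Pos('tvRow', VarToStr(row.className)) > 0) then
      // children holds only the direct child elements of the row
      for j := 0 to row.children.length - 1 do
      begin
        child := row.children.item(j);
        // VarToStr guards against Null values on empty elements
        ShowMessage(VarToStr(child.tagName) + ': ' + VarToStr(child.innerText));
      end;
  end;
end;
```

It could be called as ShowRowChildren(doc) right after doc.close in Button1Click. Matching on a substring of className rather than the full string keeps the check independent of the order of the class names.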
I wanted to mention another very nice HTML parser I found today: htmlp (Delphi Dom HTML Parser and Converter). It is obviously not as flexible as IHTMLDocument2, but it is very easy to work with, fast, free, and supports Unicode for older Delphi versions.
Sample usage:
uses HtmlParser, DomCore;

function GetDocBody(HtmlDoc: TDocument): TElement;
var
  i: integer;
  node: TNode;
begin
  Result := nil;
  for i := 0 to HtmlDoc.documentElement.childNodes.length - 1 do
  begin
    node := HtmlDoc.documentElement.childNodes.item(i);
    if node.nodeName = 'body' then
    begin
      Result := node as TElement;
      Break;
    end;
  end;
end;

procedure THTMLForm.Button2Click(Sender: TObject);
var
  HtmlParser: THtmlParser;
  HtmlDoc: TDocument;
  i: Integer;
  body, el: TElement;
  node: TNode;
begin
  HtmlParser := THtmlParser.Create;
  try
    HtmlDoc := HtmlParser.parseString(HTML);
    try
      body := GetDocBody(HtmlDoc);
      if Assigned(body) then
        for i := 0 to body.childNodes.length - 1 do
        begin
          node := body.childNodes.item(i);
          if (node is TElement) then
          begin
            el := node as TElement;
            if (el.tagName = 'div') and (el.GetAttribute('class') = 'tvRow tvFirst hasLabel tvFirst') then
            begin
              // iterate el.childNodes here...
              ShowMessage(IntToStr(el.childNodes.length));
            end;
          end;
        end;
    finally
      HtmlDoc.Free;
    end;
  finally
    HtmlParser.Free;
  end;
end;
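
The "iterate el.childNodes here..." step could look roughly like the sketch below. The helper name ShowRowFields is hypothetical, and the code assumes that htmlp's TNode exposes W3C-style firstChild and nodeValue members, which is not shown in the example above and should be checked against DomCore before relying on it.

```pascal
uses HtmlParser, DomCore, Dialogs;

// Hypothetical helper (not part of the original answer): shows the text
// of each <label> and <span> that is a direct child of a row element.
// Assumption: TNode provides firstChild and nodeValue as in the W3C DOM.
procedure ShowRowFields(row: TElement);
var
  j: Integer;
  node: TNode;
  child: TElement;
begin
  for j := 0 to row.childNodes.length - 1 do
  begin
    node := row.childNodes.item(j);
    if not (node is TElement) then
      Continue; // skip non-element nodes such as text between tags
    child := node as TElement;
    // The element's text is expected to live in its first (text) child
    if ((child.tagName = 'label') or (child.tagName = 'span')) and
       Assigned(child.firstChild) then
      ShowMessage(child.tagName + ': ' + child.firstChild.nodeValue);
  end;
end;
```

Calling ShowRowFields(el) in place of the ShowMessage line inside the matching div branch would then report the Name: and Value texts for each row.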