Problem description

How can I get only the company address block from a website's contact page?

I have tried this:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public void Extract_all_text_from_webpage(string filename)
{
    HtmlDocument document = new HtmlDocument();
    document.Load(new MemoryStream(File.ReadAllBytes(filename)));

    // Extract once and reuse the result instead of walking the DOM again for each use.
    string text = ExtractViewableTextCleaned(document.DocumentNode);
    textBox1.Text += Environment.NewLine + text;

    // Dictionary-based address filter, still disabled as in the original attempt:
    // if (_addressDictionaries.AddressDictDuplicates.Contains(text))
    {
        listBox1.Items.Add(Environment.NewLine + text);
    }
}

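// Assumed definition: the post does not show this field.
private static readonly Regex _removeRepeatedWhitespaceRegex = new Regex(@"\s+");

public static string ExtractViewableTextCleaned(HtmlNode node)
{
    // Decode entities such as &nbsp; and &copy; first. Replacing a literal " "
    // with "" (as posted) would delete every space and glue all words together.
    string textWithLotsOfWhiteSpaces = HtmlEntity.DeEntitize(ExtractViewableText(node));
    return _removeRepeatedWhitespaceRegex.Replace(textWithLotsOfWhiteSpaces, " ").Replace("©", "").Trim();
}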

public static string ExtractViewableText(HtmlNode node)
{
    StringBuilder sb = new StringBuilder();
    ExtractViewableTextHelper(sb, node);
    return sb.ToString();
}

private static void ExtractViewableTextHelper(StringBuilder sb, HtmlNode node)
{
    if (node.Name != "script" && node.Name != "style" && node.Name!="a")
    {
        if (node.NodeType == HtmlNodeType.Text)
        {
            AppendNodeText(sb, node);
        }

        foreach (HtmlNode child in node.ChildNodes)
        {
            ExtractViewableTextHelper(sb, child);
        }
    }
}

private static void AppendNodeText(StringBuilder sb, HtmlNode node)
{
    string text = ((HtmlTextNode)node).Text;
    if (!string.IsNullOrWhiteSpace(text))
    {
        sb.Append(Environment.NewLine + text);

        // If the last char isn't whitespace, add a space; otherwise words that
        // are separated only by tags would run together.
        if (!char.IsWhiteSpace(text[text.Length - 1]))
        {
            sb.Append(" ");
        }
    }
}

Recommended answer


10-30 17:54