Regex in C# for finding <a> links with specific file endings

Problem description

I need a regex pattern that finds links in a string of HTML code and returns only the links whose file ending is something like .gif or .png.

Example String:

<a href="//site.com/folder/picture.png" target="_blank">picture.png</a>

So far I get everything between the quotes of the href attribute, plus the text between <a> and </a>.

I want to get this:

Href = //site.com/folder/picture.png
String = picture.png

My code so far:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace downloader
{
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        string url = textBox1.Text;
        string s = gethtmlcode(url);
        foreach (LinkItem i in LinkFinder.Find(s))
        {
            richTextBox1.Text += Convert.ToString(i);
        }

    }


    static string gethtmlcode(string url)
    {
        using (WebClient client = new WebClient())
        {
            string htmlCode = client.DownloadString(url);
            return htmlCode;
        }
    }

    public struct LinkItem
    {
        public string Href;
        public string Text;
        public override string ToString()
        {
            return Href + "\n\t" + Text + "\n\t";
        }
    }
    static class LinkFinder
    {
        public static List<LinkItem> Find(string file)
        {
            List<LinkItem> list = new List<LinkItem>();

            // 1.
            // Find all matches in file.
            MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
                RegexOptions.Singleline);

            // 2.
            // Loop over each match.
            foreach (Match m in m1)
            {
                string value = m.Groups[1].Value;
                LinkItem i = new LinkItem();

                // 3.
                // Get href attribute.
                Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                RegexOptions.Singleline);
                if (m2.Success)
                {
                    i.Href = m2.Groups[1].Value;
                }

                // 4.
                // Remove inner tags from text.
                string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                RegexOptions.Singleline);
                i.Text = t;

                list.Add(i);
            }
            return list;
        }
    }

}

}

Solution

I suggest using HtmlAgilityPack for this task. Install it through the Manage NuGet Packages for Solution menu, then add the following method:

/// <summary>
/// Collects the href attribute value and the inner text of every a node whose inner text ends in .png or .jpg
/// </summary>
/// <param name="html">HTML string or a URL</param>
/// <returns>A list of key-value pairs: href value (key) and node text (value)</returns>
// Needs using System, System.Collections.Generic and System.IO (for Path).
private List<KeyValuePair<string, string>> GetLinksWithHtmlAgilityPack(string html)
{
    var result = new List<KeyValuePair<string, string>>();
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    // Treat both http and https addresses as URLs.
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult) &&
        (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    { // html is a URL
        var doc = new HtmlAgilityPack.HtmlWeb();
        hap = doc.Load(uriResult.AbsoluteUri);
    }
    else
    { // html is a string
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    var nodes = hap.DocumentNode.SelectNodes("//a");
    if (nodes != null) // SelectNodes returns null when no node matches
    {
        foreach (var node in nodes)
        {
            // The filter looks at the link text, not at the href value.
            string ext = Path.GetExtension(node.InnerText.Trim()).ToLower();
            if (ext == ".png" || ext == ".jpg")
                result.Add(new KeyValuePair<string, string>(node.GetAttributeValue("href", null), node.InnerText));
        }
    }
    return result;
}
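
Note that the method above decides by the link text (InnerText). If the href value should decide instead, and the question also asks for .gif, a small helper along these lines could be used for the check. This is only a sketch: the class name ImageLinkFilter, the method HasImageExtension and the list of extensions are illustrative assumptions, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of HtmlAgilityPack): checks a link target's file ending.
static class ImageLinkFilter
{
    // Accepted endings; adjust to taste (assumed list, with leading dot).
    static readonly string[] Wanted = { ".png", ".gif", ".jpg" };

    // Returns true if the path part of href ends with one of the wanted endings.
    public static bool HasImageExtension(string href)
    {
        if (string.IsNullOrWhiteSpace(href)) return false;

        // Drop a query string or fragment before comparing the ending.
        int cut = href.IndexOfAny(new[] { '?', '#' });
        string path = (cut >= 0 ? href.Substring(0, cut) : href).Trim();

        // Case-insensitive comparison, so .PNG and .png are treated alike.
        return Wanted.Any(ext => path.EndsWith(ext, StringComparison.OrdinalIgnoreCase));
    }
}
```

Inside the loop, the condition would then become something like ImageLinkFilter.HasImageExtension(node.GetAttributeValue("href", null)).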

Then call GetLinksWithHtmlAgilityPack like this (I am using a dummy string, just for demo):

var result = GetLinksWithHtmlAgilityPack("<a href=\"//site.com/folder/picture.png\" target=\"_blank\">picture.png</a><a href=\"//site.com/folder/picture.bmp\" target=\"_blank\">picture.bmp</a>");

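Output: a single pair, with key //site.com/folder/picture.png and value picture.png (the .bmp link is filtered out).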

Or, with a URL, something like:

var result = GetLinksWithHtmlAgilityPack("http://www.google.com");
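
If a plain regex is still preferred, as the question originally asked, the following is a minimal sketch that only matches a tags whose quoted href ends in .png or .gif. The class name ImageLinkRegex and the exact pattern are illustrative assumptions, not part of the answer above, and regex matching of HTML stays fragile (unquoted attributes and unusual markup will slip through), which is why a parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical regex-only alternative; a sketch, not a full HTML parser.
static class ImageLinkRegex
{
    // Group "href": quoted attribute value ending in .png or .gif (an optional query string is allowed).
    // Group "text": everything between the opening tag and the next </a>.
    static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""'](?<href>[^""']*?\.(?:png|gif)(?:\?[^""']*)?)[""'][^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<KeyValuePair<string, string>> Find(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Anchor.Matches(html))
        {
            // Strip nested tags from the link text, as the question's code does.
            string text = Regex.Replace(m.Groups["text"].Value, @"<.*?>", "",
                RegexOptions.Singleline).Trim();
            result.Add(new KeyValuePair<string, string>(m.Groups["href"].Value, text));
        }
        return result;
    }
}
```

Called with the dummy string from the demo above, ImageLinkRegex.Find should return one pair (href //site.com/folder/picture.png, text picture.png) and skip the .bmp link.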
