How to search for a string in an XML file and write it to a text file

Problem description

Hello,

I have 200 XML files. Each one consists of pathways (something like network information), and each pathway consists of entities with some attributes. How can I create a text file for each XML file that contains only the name attributes of all the entities in that XML file? My XML files are in this format:

<?xml version="1.0" ?>
<!DOCTYPE pathway (View Source for full doctype...)>
<!-- Creation date: Oct 7, 2014 11:01:31 +0900 (GMT+09:00) -->
<pathway name="path:gmx00010" org="gmx" number="00010" title="Glycolysis / Gluconeogenesis">

  <entry id="13" name="gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751" type="gene">
  </entry>

  <entry id="37" name="gmx:100777399 gmx:100778722 gmx:100782019 gmx:100783726 gmx:100784210 gmx:100786773 gmx:100798020 gmx:100798892 gmx:100800699 gmx:100803104 gmx:100808513 gmx:100809812 gmx:100811186 gmx:100811501 gmx:100811891 gmx:100816594 gmx:100817701 gmx:100819197 gmx:547717" type="gene">
  </entry>

  <entry id="38" name="ko:K01905" type="ortholog">
  </entry>

  <entry id="39" name="ko:K00129" type="ortholog">
  </entry>

I want to write a program in Visual C++ that creates a text file with the same name as each XML file. The text file should contain the name attribute values (e.g. gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751) of all entities with type="gene" and ignore entities of any other type.

Thanks.

Recommended answer

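// Replace any existing output file, then write the transform result to it.
if (File.Exists(outputPath)) File.Delete(outputPath);
string output = ApplyTransform(xmlToTransform, xslTemplate);
// Disposing the writer flushes and closes the file, even if Write throws.
using (StreamWriter writer = new StreamWriter(outputPath, false, Encoding.Unicode))
{
  writer.Write(output);
}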

This assumes you have a method, something like the one shown below, to apply the transform. There are two problems with this approach:

1 - You have to learn XSL and XPath. Neither is too difficult, but it takes time and there are a few gotchas.

2 - Everything is done in memory. This limits the size of file you can work with and can be very slow for larger XML documents.

/// <summary>
/// Apply an XSL transform to a well formed XML string
/// returning the transform output as a string.
/// </summary>
/// <param name="xmlToTransform">Well formed XML as a string.</param>
/// <param name="xslTemplate">Full path to an XSL template file.</param>
/// <returns>The output of the transform as a string.</returns>
public static string ApplyTransform(string xmlToTransform,
                                    string xslTemplate)
{

  XmlReader reader = null;
  XmlWriter writer = null;
  StringWriter sw = new StringWriter();

  try
  {

    // Using a reader allows us to use stylesheets with embedded DTD.
    XmlReaderSettings readSettings = new XmlReaderSettings();
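    // ProhibitDtd is obsolete from .NET 4 onwards; DtdProcessing.Parse is its replacement.
    readSettings.DtdProcessing = DtdProcessing.Parse;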
    reader = XmlReader.Create(xslTemplate, readSettings);

    // We want the output indented by tag.
    XmlWriterSettings writeSettings = new XmlWriterSettings();
    writeSettings.OmitXmlDeclaration = true;
    writeSettings.ConformanceLevel = ConformanceLevel.Fragment;
    writeSettings.CloseOutput = true;
    writeSettings.Indent = true;
    writeSettings.IndentChars = "  ";
    writeSettings.NewLineChars = System.Environment.NewLine;
    writeSettings.Encoding = Encoding.Unicode;
    writeSettings.CheckCharacters = false;
    writer = XmlWriter.Create(sw, writeSettings);

    // Turn the incoming string into something we can apply
    // a transform to.
    XmlDocument dbSchema = new XmlDocument();
    dbSchema.LoadXml(xmlToTransform);
    XPathNavigator xpath = dbSchema.CreateNavigator();

    // Apply the transform.
    XslCompiledTransform styleSheet = new XslCompiledTransform(true);
    styleSheet.Load(reader);
    styleSheet.Transform(xpath, null, writer, null);

  }
  catch(System.Exception ex)
  {
    #if DEBUG
    System.Diagnostics.Debugger.Break();
    #endif
    throw; // Rethrow without resetting the stack trace.
  }
  finally
  {
    if (reader != null) reader.Close();
    if (writer != null) writer.Close();
  }

  return sw.ToString();

}
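
For the files in the question, the XSL route needs a stylesheet that selects the gene entries and writes out their name attributes as plain text. The following is only a sketch of what that could look like: the class name GeneNameTransform, the method Apply and the stylesheet itself are made up for illustration and are not part of the original answer. It assumes .NET 4 or later (for DtdProcessing) and that entry elements are direct children of pathway, as in the sample.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class GeneNameTransform
{
    // Hypothetical stylesheet: writes the name attribute of every
    // <entry type="gene"> under <pathway>, one per line, as plain text.
    public const string Stylesheet = @"<?xml version=""1.0""?>
<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""text""/>
  <xsl:template match=""/"">
    <xsl:for-each select=""pathway/entry[@type='gene']"">
      <xsl:value-of select=""@name""/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet above to an XML string and returns the text output.
    public static string Apply(string xml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xslReader);
        }

        // Skip the DOCTYPE so the DTD is neither required nor processed.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;

        using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), settings))
        using (StringWriter result = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's text output method apply.
            transform.Transform(xmlReader, null, result);
            return result.ToString();
        }
    }
}
```

The XPath predicate [@type='gene'] does the filtering, so entries of any other type never reach the output. The whole document is still processed in memory, so the second problem listed above applies here as well.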

The "hard coded" route.

This can be as simple as the following:

ExtractToFile(@"c:\someDirectory\geneInfo.xml",
              @"c:\someDirectory\geneInfo.txt",
              "entry", "type", "gene", "name");

Where ExtractToFile looks like this:

/// <summary>
/// Extract the value of the specified attribute for elements of the
/// specified name where a search attribute has a specific value.
/// </summary>
/// <param name="inFile">full path to source xml</param>
/// <param name="outFile">full path spec of file to create</param>
/// <param name="elementToFind">The element to find in the XML</param>
/// <param name="attributeName">The search/filter attribute</param>
/// <param name="attributeValue">The required search/filter attribute value.</param>
/// <param name="attributeOut">The attribute for which we want the value.</param>
public static void ExtractToFile(string inFile,
                                 string outFile,
                                 string elementToFind,
                                 string attributeName,
                                 string attributeValue,
                                 string attributeOut) {

  // XML is case sensitive, but we're not.
  StringComparison ignoreCase = StringComparison.InvariantCultureIgnoreCase;

  // Decide how often we're going to dump output from buffer to disk.
  int rowCount   = 0;
  int flushCount = 1000;

  if (File.Exists(outFile)) {
    File.Delete(outFile);
  }

  // The sample files carry a DOCTYPE, which the default reader settings
  // reject, so tell the reader to skip it (DtdProcessing needs .NET 4+).
  XmlReaderSettings readSettings = new XmlReaderSettings();
  readSettings.DtdProcessing = DtdProcessing.Ignore;

  using (StreamWriter output = new StreamWriter(outFile)) {

    // We assume the file exists and that the contents are valid XML.
    // An XmlReader instance will work through all the nodes in the XML from the
    // start to the end. All we do is sit and wait for the elements we're
    // interested in to come floating past and deal with them as they do.
    using (XmlReader fileReader = XmlReader.Create(inFile, readSettings)) {

      while (fileReader.Read()) {
        if (fileReader.NodeType == XmlNodeType.Element &&
            fileReader.Name.Equals(elementToFind, ignoreCase) &&
            fileReader.HasAttributes) {

          string _find = fileReader.GetAttribute(attributeName);
          string _out  = fileReader.GetAttribute(attributeOut);

          // GetAttribute returns null when the attribute is missing.
          if (_find != null && _out != null &&
              _find.Equals(attributeValue, ignoreCase)) {
            output.WriteLine(_out);
            rowCount++;
            if (rowCount == flushCount) {
              rowCount = 0;
              output.Flush();
            }
          }
        }
      }
    }
  }
}
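
The question involves 200 files, each needing a text file with the same name, while the call above handles a single hard-coded pair of paths. One way to cover a whole folder is sketched below. The names GeneNameBatch, ExtractGeneNames and ProcessDirectory are hypothetical; the sketch carries its own small extraction method so that it stands alone, and it assumes .NET 4 or later and that all the XML files sit in one directory.

```csharp
using System;
using System.IO;
using System.Xml;

public static class GeneNameBatch
{
    // Writes the name attribute of every <entry type="gene"> in inFile
    // to outFile, one per line, and returns the number of lines written.
    public static int ExtractGeneNames(string inFile, string outFile)
    {
        // Skip the DOCTYPE; the default reader settings would reject it.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;

        int written = 0;
        using (XmlReader reader = XmlReader.Create(inFile, settings))
        using (StreamWriter output = new StreamWriter(outFile, false))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == "entry" &&
                    reader.GetAttribute("type") == "gene")
                {
                    string name = reader.GetAttribute("name");
                    if (name != null)
                    {
                        output.WriteLine(name);
                        written++;
                    }
                }
            }
        }
        return written;
    }

    // Processes every *.xml file in the directory, writing a .txt file
    // with the same base name next to it. Returns the number of files handled.
    public static int ProcessDirectory(string directory)
    {
        int files = 0;
        foreach (string xmlFile in Directory.GetFiles(directory, "*.xml"))
        {
            string textFile = Path.ChangeExtension(xmlFile, ".txt");
            ExtractGeneNames(xmlFile, textFile);
            files++;
        }
        return files;
    }
}
```

A call such as GeneNameBatch.ProcessDirectory(@"c:\someDirectory") would then turn geneInfo.xml into geneInfo.txt, and likewise for every other XML file in that folder. Existing text files with the same names are overwritten.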

I believe that XmlReader is limited to 2GB files. If your files are larger than this, you will have to consider alternative solutions. This article might be a good place to start:

Parse XML at SAX Speed without DOM or SAX
