



I have 200 XML files. Each one consists of pathways (something like network information), and each pathway consists of entities with some attributes. How can I create a text file for each XML file that contains only the name attributes of all the entities inside that XML file? The XML files are in this format:

<?xml version="1.0" ?>
<!DOCTYPE pathway (View Source for full doctype...)>
<!-- Creation date: Oct 7, 2014 11:01:31 +0900 (GMT+09:00) -->
<pathway name="path:gmx00010" org="gmx" number="00010" title="Glycolysis / Gluconeogenesis">

<entry id="13" name="gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751" type="gene" >

<entry id="37" name="gmx:100777399 gmx:100778722 gmx:100782019 gmx:100783726 gmx:100784210 gmx:100786773 gmx:100798020 gmx:100798892 gmx:100800699 gmx:100803104 gmx:100808513 gmx:100809812 gmx:100811186 gmx:100811501 gmx:100811891 gmx:100816594 gmx:100817701 gmx:100819197 gmx:547717" type="gene">

<entry id="38" name="ko:K01905" type="ortholog">

<entry id="39" name="ko:K00129" type="ortholog">

I want to write a program in Visual C++ that creates a text file with the same title as the XML file. The text file should contain the name attribute values (e.g. gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751) of all the entities with type="gene" and ignore entities of any other type.



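The XSL route.
This can be as simple as the following:

if (File.Exists(outputPath)) File.Delete(outputPath);
string output = ApplyTransform(xmlToTransform, xslTemplate);
// Write the transformed text out; the using block flushes and closes the file.
using (StreamWriter writer = new StreamWriter(outputPath, false, Encoding.Unicode)) {
  writer.Write(output);
}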


Here ApplyTransform is a method something like the one shown below. There are two problems with this approach:
1 - You have to learn XSL and XPath. That is not too difficult, but it does take time and it does have a few gotchas.
2 - Everything is done in memory. This limits the size of file you can work with, and it can be very slow for larger XML documents.

// Needs: using System.IO; using System.Text; using System.Xml;
//        using System.Xml.XPath; using System.Xml.Xsl;

/// <summary>
/// Apply an XSL transform to a well formed XML string
/// returning the transform output as a string.
/// </summary>
/// <param name="xmlToTransform">Well formed XML as a string.</param>
/// <param name="xslTemplate">Full path to an XSL template file.</param>
/// <returns>The transform output.</returns>
public static string ApplyTransform(string xmlToTransform,
                                    string xslTemplate) {

  XmlReader reader = null;
  XmlWriter writer = null;
  StringWriter sw = new StringWriter();

  try {

    // Using a reader allows us to use stylesheets with embedded DTD.
    // (On .NET 4 and later ProhibitDtd is obsolete; set
    // DtdProcessing = DtdProcessing.Parse instead.)
    XmlReaderSettings readSettings = new XmlReaderSettings();
    readSettings.ProhibitDtd = false;
    reader = XmlReader.Create(xslTemplate, readSettings);

    // We want the output indented by tag.
    XmlWriterSettings writeSettings = new XmlWriterSettings();
    writeSettings.OmitXmlDeclaration = true;
    writeSettings.ConformanceLevel = ConformanceLevel.Fragment;
    writeSettings.CloseOutput = true;
    writeSettings.Indent = true;
    writeSettings.IndentChars = "  ";
    writeSettings.NewLineChars = System.Environment.NewLine;
    writeSettings.Encoding = Encoding.Unicode;
    writeSettings.CheckCharacters = false;
    writer = XmlWriter.Create(sw, writeSettings);

    // Turn the incoming string into something we can apply a
    // transform to.
    XmlDocument dbSchema = new XmlDocument();
    dbSchema.LoadXml(xmlToTransform);
    XPathNavigator xpath = dbSchema.CreateNavigator();

    // Load the stylesheet and apply the transform.
    XslCompiledTransform styleSheet = new XslCompiledTransform(true);
    styleSheet.Load(reader);
    styleSheet.Transform(xpath, null, writer, null);

  } catch (System.Exception) {
    // Debug builds rethrow (keeping the original stack trace); release
    // builds fall through and return whatever was written so far.
    #if DEBUG
    throw;
    #endif
  } finally {
    if (reader != null) reader.Close();
    if (writer != null) writer.Close();
  }

  return sw.ToString();
}




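The "hard coded" route.
This can be as simple as the following:

// inFile and outFile are the full paths of the source XML and the text file to create.
ExtractToFile(inFile, outFile,
              "entry", "type", "gene", "name");


Where ExtractToFile looks like this: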

/// <summary>
/// Extract the value of the specified attribute for elements of the
/// specified name where a search attribute has a specific value.
/// </summary>
/// <param name="inFile">full path to source xml</param>
/// <param name="outFile">full path spec of file to create</param>
/// <param name="elementToFind">The element to find in the XML</param>
/// <param name="attributeName">The search/filter attribute</param>
/// <param name="attributeValue">The required search/filter attribute value.</param>
/// <param name="attributeOut">The attribute for which we want the value.</param>
public static void ExtractToFile(string inFile,
                                 string outFile,
                                 string elementToFind,
                                 string attributeName,
                                 string attributeValue,
                                 string attributeOut) {

  // XML is case sensitive, but we're not.
  StringComparison ignoreCase = StringComparison.InvariantCultureIgnoreCase;

  // Decide how often we're going to dump output from buffer to disk.
  int rowCount   = 0;
  int flushCount = 1000;

  if (File.Exists(outFile)) {
    File.Delete(outFile);
  }

  // The sample files carry a DOCTYPE, which XmlReader rejects by default,
  // so tell the reader to skip it (DtdProcessing needs .NET 4 or later).
  XmlReaderSettings readSettings = new XmlReaderSettings();
  readSettings.DtdProcessing = DtdProcessing.Ignore;

  using (StreamWriter output = new StreamWriter(outFile)) {

    // We assume the file exists and that the contents are valid XML.
    // An XmlReader instance will work through all the nodes in the XML from the
    // start to the end. All we do is sit and wait for the elements we're
    // interested in to come floating past and deal with them as they do.
    using (XmlReader fileReader = XmlReader.Create(inFile, readSettings)) {

      while (fileReader.Read()) {
        if (fileReader.NodeType == XmlNodeType.Element &&
            fileReader.Name.Equals(elementToFind, ignoreCase) &&
            fileReader.HasAttributes) {

          string _find = fileReader.GetAttribute(attributeName);
          string _out  = fileReader.GetAttribute(attributeOut);

          // The static string.Equals copes with a missing (null) filter attribute.
          if (string.Equals(_find, attributeValue, ignoreCase)) {
            output.WriteLine(_out);
            rowCount++;
            if (rowCount == flushCount) {
              output.Flush();
              rowCount = 0;
            }
          }
        }
      }
    }
  }
}


I believe that XmlReader is limited to 2GB files. If your files are larger than this, you will have to consider alternative solutions. This might be a good place to start:

Parse XML at SAX Speed without DOM or SAX


