Problem description
Hello,
I have 200 XML files. Each one consists of pathways (something like network information), and each pathway consists of entities with some attributes. How can I create, for each XML file, a text file that contains only the name attributes of all the entities inside that XML file? My XML files are in this format:
<?xml version="1.0" ?>
<!DOCTYPE pathway (View Source for full doctype...)>
<!-- Creation date: Oct 7, 2014 11:01:31 +0900 (GMT+09:00) -->
<pathway name="path:gmx00010" org="gmx" number="00010" title="Glycolysis / Gluconeogenesis">
    <entry id="13" name="gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751" type="gene">
    </entry>
    <entry id="37" name="gmx:100777399 gmx:100778722 gmx:100782019 gmx:100783726 gmx:100784210 gmx:100786773 gmx:100798020 gmx:100798892 gmx:100800699 gmx:100803104 gmx:100808513 gmx:100809812 gmx:100811186 gmx:100811501 gmx:100811891 gmx:100816594 gmx:100817701 gmx:100819197 gmx:547717" type="gene">
    </entry>
    <entry id="38" name="ko:K01905" type="ortholog">
    </entry>
    <entry id="39" name="ko:K00129" type="ortholog">
    </entry>
I want to write a program in Visual C++ that creates a text file with the same name as the XML file. The text file should contain the name attribute values (e.g. gmx:100527532 gmx:100775844 gmx:100778363 gmx:100786504 gmx:100792394 gmx:100795446 gmx:100798677 gmx:100802732 gmx:100815070 gmx:100818383 gmx:100818915 gmx:547751) of all the entities with type="gene", ignoring entities of any other type.
Thanks.
Recommended answer
string output = ApplyTransform(xmlToTransform, xslTemplate);
// Opening the writer with append = false overwrites any existing file,
// and the using block flushes and closes it even if Write throws.
using (StreamWriter writer = new StreamWriter(outputPath, false, Encoding.Unicode))
{
    writer.Write(output);
}
This assumes you have a method something like the one shown below to apply the transform. There are two problems with this route:
1 - You have to learn XSL and XPath. It is not too difficult, but it does take time and it does have a few gotchas.
2 - Everything is done in memory. This limits the size of file you can work with and can be very slow for larger XML documents.
/// <summary>
/// Apply an XSL transform to a well formed XML string
/// returning the transform output as a string.
/// </summary>
/// <param name="xmlToTransform">Well formed XML as a string.</param>
/// <param name="xslTemplate">Full path to an XSL template file.</param>
/// <returns>The transform output as a string.</returns>
public static string ApplyTransform(string xmlToTransform,
                                    string xslTemplate)
{
    XmlReader reader = null;
    XmlWriter writer = null;
    StringWriter sw = new StringWriter();
    try
    {
        // Using a reader allows us to use stylesheets with embedded DTD.
        // DtdProcessing replaces the obsolete ProhibitDtd property from
        // .NET 4.0 onwards (on older frameworks use ProhibitDtd = false).
        XmlReaderSettings readSettings = new XmlReaderSettings();
        readSettings.DtdProcessing = DtdProcessing.Parse;
        reader = XmlReader.Create(xslTemplate, readSettings);

        // We want the output indented by tag.
        XmlWriterSettings writeSettings = new XmlWriterSettings();
        writeSettings.OmitXmlDeclaration = true;
        writeSettings.ConformanceLevel = ConformanceLevel.Fragment;
        writeSettings.CloseOutput = true;
        writeSettings.Indent = true;
        writeSettings.IndentChars = " ";
        writeSettings.NewLineChars = System.Environment.NewLine;
        writeSettings.Encoding = Encoding.Unicode;
        writeSettings.CheckCharacters = false;
        writer = XmlWriter.Create(sw, writeSettings);

        // Turn the incoming string into something we can apply
        // a transform to.
        XmlDocument dbSchema = new XmlDocument();
        dbSchema.LoadXml(xmlToTransform);
        XPathNavigator xpath = dbSchema.CreateNavigator();

        // Apply the transform.
        XslCompiledTransform styleSheet = new XslCompiledTransform(true);
        styleSheet.Load(reader);
        styleSheet.Transform(xpath, null, writer, null);
    }
    catch (System.Exception)
    {
#if DEBUG
        System.Diagnostics.Debugger.Break();
#endif
        // A bare throw rethrows without resetting the stack trace.
        throw;
    }
    finally
    {
        if (reader != null) reader.Close();
        if (writer != null) writer.Close();
    }
    return sw.ToString();
}
The "hard coded" route.
This can be as simple as the following:
ExtractToFile(@"c:\someDirectory\geneInfo.xml",
@"c:\someDirectory\geneInfo.txt",
"entry", "type", "gene", "name");
where ExtractToFile looks like this:
/// <summary>
/// Extract the value of the specified attribute for elements of the
/// specified name where a search attribute has a specific value.
/// </summary>
/// <param name="inFile">Full path to the source XML.</param>
/// <param name="outFile">Full path of the file to create.</param>
/// <param name="elementToFind">The element to find in the XML.</param>
/// <param name="attributeName">The search/filter attribute.</param>
/// <param name="attributeValue">The required search/filter attribute value.</param>
/// <param name="attributeOut">The attribute for which we want the value.</param>
public static void ExtractToFile(string inFile,
                                 string outFile,
                                 string elementToFind,
                                 string attributeName,
                                 string attributeValue,
                                 string attributeOut) {
    // XML is case sensitive, but we're not.
    StringComparison ignoreCase = StringComparison.InvariantCultureIgnoreCase;

    // Decide how often we're going to dump output from buffer to disk.
    int rowCount = 0;
    int flushCount = 1000;

    // The sample file has a DOCTYPE, which XmlReader rejects by default.
    // Ignore skips the DTD without fetching or validating it.
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;

    if (File.Exists(outFile)) {
        File.Delete(outFile);
    }

    using (StreamWriter output = new StreamWriter(outFile))
    // We assume the file exists and that the contents are valid XML.
    // An XmlReader instance will work through all the nodes in the XML from the
    // start to the end. All we do is sit and wait for the elements we're
    // interested in to come floating past and deal with them as they do.
    using (XmlReader fileReader = XmlReader.Create(inFile, settings)) {
        while (fileReader.Read()) {
            if (fileReader.NodeType == XmlNodeType.Element &&
                fileReader.Name.Equals(elementToFind, ignoreCase) &&
                fileReader.HasAttributes) {
                // GetAttribute returns null for a missing attribute, so compare
                // with the static string.Equals to avoid a NullReferenceException.
                string _find = fileReader.GetAttribute(attributeName);
                string _out = fileReader.GetAttribute(attributeOut);
                if (_out != null && string.Equals(_find, attributeValue, ignoreCase)) {
                    output.WriteLine(_out);
                    rowCount++;  // count rows so the periodic flush actually fires
                    if (rowCount == flushCount) {
                        rowCount = 0;
                        output.Flush();
                    }
                }
            }
        }
    }
}
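Since the question asks for Visual C++ and the code above is C#, the same "wait for the element to come past" idea can be sketched in standard C++. This is only a rough illustration and not a real XML parser: it scans the text for <entry ...> start tags and assumes double-quoted attributes separated by single spaces, no '>' inside attribute values, and no entity decoding. The helper names attributeValue and extractGeneNames are made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the value of attribute `attr` found in the text
// of a single start tag, or an empty string when the attribute is absent.
// Assumes the form  attr="value"  preceded by a space, as in the sample file.
std::string attributeValue(const std::string& tag, const std::string& attr) {
    const std::string key = " " + attr + "=\"";
    const std::size_t start = tag.find(key);
    if (start == std::string::npos) return "";
    const std::size_t valueStart = start + key.size();
    const std::size_t valueEnd = tag.find('"', valueStart);
    if (valueEnd == std::string::npos) return "";
    return tag.substr(valueStart, valueEnd - valueStart);
}

// Hypothetical helper: collects the name attribute of every <entry> element
// whose type attribute is exactly "gene". Other entry types are ignored.
std::vector<std::string> extractGeneNames(const std::string& xml) {
    std::vector<std::string> names;
    const std::string open = "<entry";
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t close = xml.find('>', pos);
        if (close == std::string::npos) break;  // truncated tag: stop scanning
        const std::string tag = xml.substr(pos, close - pos + 1);
        pos = close + 1;
        // Require whitespace after "<entry" so a longer element name is skipped.
        if (tag.size() <= open.size() ||
            !std::isspace(static_cast<unsigned char>(tag[open.size()]))) {
            continue;
        }
        if (attributeValue(tag, "type") == "gene") {
            const std::string name = attributeValue(tag, "name");
            if (!name.empty()) names.push_back(name);
        }
    }
    return names;
}
```

Each returned string is one name attribute, so writing them out one per line gives the same kind of text file as the C# version. For anything that is not shaped like the sample, a proper XML library would be the safer choice.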
I believe that XmlReader is limited to 2GB files. If your files are larger than this you will have to consider alternative solutions. This article might be a good place to start:
Parse XML at SAX Speed without DOM or SAX
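The question mentions 200 files, each needing a text file with the same name. One way to drive whatever per-file extraction you settle on is a small C++17 loop over the directory, sketched below. The helper names (outputPathFor, listXmlFiles, processDirectory) are hypothetical, the extraction step is passed in as a callback and not implemented here, and the extension check is case sensitive, so a file ending in ".XML" would be skipped.

```cpp
#include <cstddef>
#include <filesystem>
#include <functional>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: maps "geneInfo.xml" to "geneInfo.txt" in the same directory.
fs::path outputPathFor(const fs::path& xmlFile) {
    fs::path out = xmlFile;
    out.replace_extension(".txt");
    return out;
}

// Hypothetical helper: lists the regular files in `dir` whose extension is ".xml".
// The list is built up front, so files created later do not disturb the loop.
std::vector<fs::path> listXmlFiles(const fs::path& dir) {
    std::vector<fs::path> files;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".xml") {
            files.push_back(entry.path());
        }
    }
    return files;
}

// Calls extract(input, output) once per XML file and returns how many files
// were handed to the callback.
std::size_t processDirectory(
    const fs::path& dir,
    const std::function<void(const fs::path&, const fs::path&)>& extract) {
    std::size_t count = 0;
    for (const fs::path& xmlFile : listXmlFiles(dir)) {
        extract(xmlFile, outputPathFor(xmlFile));
        ++count;
    }
    return count;
}
```

The callback receives the input path and the matching output path, so it can wrap any extraction routine that takes those two arguments.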