Efficient way to extract nouns from a text document

Problem description

Hey, I am currently working on a natural language project. The first task at hand was to extract the keywords from a text. That is now done, and I am putting the code here. Can anyone suggest some techniques for extracting the nouns from the text by further modifying this code?

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    namespace maxrep
    {
        class Program
        {
            static void Main(string[] args)
            {
                string filename = "hello.txt";
                // string filename1 = "text.txt";

                /*
                 * List<StreamReader> SRL = new List<StreamReader>();
                 * for (int i = 1; i < foo.number_of_files + 1; i++)
                 * {
                 *     StreamReader aa = new StreamReader(@"realtime_" + Foo.main_id + "_" + i + ".txt");
                 *     SRL.Add(aa);
                 * }
                 */

                string inputString = File.ReadAllText(filename);
                // string inputStr = File.ReadAllText(filename1);
                inputString = inputString.ToLower();

                // Define characters to strip from the input and do it
                string[] stripChars = { ";", ",", ".", "-", "_", "^", "(", ")", "[", "]",
                                        "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                                        "\n", "\t", "\r" };
                foreach (string character in stripChars)
                {
                    inputString = inputString.Replace(character, "");
                }

                // Split on spaces and drop the stop words
                List<string> wordList = inputString.Split(' ').ToList();
                string[] stopwords = new string[] { "and", "the", "she", "for", "this", "you", "but" };
                // string[] negative = new string[] { "bad", "worse", "low", "decrease", "fail", "reduce", "weak", "sad" };
                foreach (string word in stopwords)
                {
                    while (wordList.Contains(word))
                    {
                        wordList.Remove(word);
                    }
                }

                // Count occurrences of every word with at least three characters
                Dictionary<string, int> dictionary = new Dictionary<string, int>();
                foreach (string word in wordList)
                {
                    if (word.Length >= 3)
                    {
                        if (dictionary.ContainsKey(word))
                        {
                            dictionary[word]++;
                        }
                        else
                        {
                            dictionary[word] = 1;
                        }
                    }
                }

                // Order by count and convert the result back into a dictionary
                var sortedDict = (from entry in dictionary
                                  orderby entry.Value descending
                                  select entry).ToDictionary(pair => pair.Key, pair => pair.Value);

                int count = 1;
                Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
                Console.WriteLine();
                foreach (KeyValuePair<string, int> pair in sortedDict)
                {
                    Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
                    count++;
                }
                Console.ReadKey();
            }
        }
    }

Solution

I fixed the formatting of the code in your question.

However, your attempt to get a sorted dictionary will not work. Calling .ToDictionary(...) turns the query result back into a regular Dictionary, which does not guarantee any enumeration order. It looks like you can just use the query to produce an IEnumerable<KeyValuePair<string, int>> and iterate over that:

    var sortedWordCounts = from entry in dictionary
                           orderby entry.Value descending
                           select entry;

    int count = 1;
    Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
    Console.WriteLine();
    foreach (var pair in sortedWordCounts)
    {
        Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
        count++;
    }
    Console.ReadKey();

If you really need to keep the collection in sorted order, you should use .ToList() or .ToArray().
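To make the last point of the answer concrete, here is a minimal sketch of materialising the sorted word counts with .ToList(), so the order is fixed once and can be reused. The class name WordCounts, the helper SortByFrequency, and the sample counts are hypothetical and only for illustration; the alphabetical tie-break is an added assumption so that equal counts come out in a predictable order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounts
{
    // Hypothetical helper (not from the original post): returns the
    // (word, count) pairs as a list ordered by descending count.
    // A List keeps its element order, unlike Dictionary enumeration.
    public static List<KeyValuePair<string, int>> SortByFrequency(
        Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            // Assumption: break ties alphabetically for deterministic output.
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample counts standing in for the dictionary built from the file.
        var counts = new Dictionary<string, int>
        {
            ["language"] = 3,
            ["noun"] = 5,
            ["text"] = 3,
        };

        int rank = 1;
        foreach (var pair in SortByFrequency(counts))
        {
            Console.WriteLine(rank + "\t" + pair.Key + "\t" + pair.Value);
            rank++;
        }
    }
}
```

Because the result is a list, it can be indexed, passed around, or iterated several times without re-running the sort, which is the practical difference from iterating the lazy query directly.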
08-05 11:06