Machine Learning and Natural Language Processing

Question

Assume you know a student who wants to study Machine Learning and Natural Language Processing.

What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems?

I am not looking for your favorite subjects and tools, but rather industry standards.

Example: I'm guessing that knowing Prolog and Matlab might help them. They also might want to study Discrete Structures*, Calculus, and Statistics.

*Graphs and trees. Functions: properties, recursive definitions, solving recurrences. Relations: properties, equivalence, partial order. Proof techniques, inductive proof. Counting techniques and discrete probability. Logic: propositional calculus, first-order predicate calculus. Formal reasoning: natural deduction, resolution. Applications to program correctness and automatic reasoning. Introduction to algebraic structures in computing.

Recommended answer

This related stackoverflow question has some nice answers: What are good starting points for someone interested in natural language processing?

This is a very big field. The prerequisites mostly consist of probability/statistics, linear algebra, and basic computer science, although Natural Language Processing requires a more intensive computer science background to start with (frequently covering some basic AI). Regarding specific languages: Lisp was created "as an afterthought" for doing AI research, while Prolog (with its roots in formal logic) is especially aimed at Natural Language Processing, and many courses will use Prolog, Scheme, Matlab, R, or another functional language (e.g. OCaml is used for this course at Cornell), as they are well suited to this kind of analysis.

Here are some more specific pointers:

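For Machine Learning, Stanford CS 229: Machine Learning is great: it includes everything, including full videos of the lectures (also available on iTunes), course notes, problem sets, etc., and it is taught by Andrew Ng.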

Note the prerequisites:

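Students are expected to have the following background: Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program. Familiarity with basic probability theory. Familiarity with basic linear algebra.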

The course uses Matlab and/or Octave. It also recommends the following readings (although the course notes themselves are very complete):

Christopher Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed. John Wiley & Sons, 2001.
Tom Mitchell, Machine Learning. McGraw-Hill, 1997.
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.

For Natural Language Processing, the NLP group at Stanford provides many good resources. The introductory course Stanford CS 224: Natural Language Processing includes all the lectures online and has the following prerequisites:

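Adequate experience with programming and formal structures. Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required. Knowledge of standard concepts in artificial intelligence and/or computational linguistics. Basic familiarity with logic, vector spaces, and probability.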

Some recommended texts are:

Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Prentice Hall.
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2nd ed.
Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in Prolog. Addison-Wesley. (this is available online for free)
Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.

The prerequisite computational linguistics course requires basic computer programming and data structures knowledge, and uses the same textbooks. The required artificial intelligence course is also available online, along with all the lecture notes, and uses:

S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Second Edition.

This is the standard Artificial Intelligence text and is also worth reading.

I use R for machine learning myself and really recommend it. For this, I would suggest looking at The Elements of Statistical Learning, for which the full text is available online for free. You may want to refer to the Machine Learning and Natural Language Processing views on CRAN for specific functionality.
