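The following is my original work. If you repost or use it, please credit the source address. Do not use the contents of this blog commercially; personal use is fine. If you like my blog, please follow me. Thanks! Blog address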
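Introduction to MRQL
MRQL (pronounced "miracle") is a query processing and optimization system for large-scale, distributed data analysis. MRQL (MapReduce Query Language) is an SQL-like query language for large-scale data on a computer cluster. The MRQL query processing system can evaluate MRQL queries in the following three modes:
- Map-Reduce mode, using Hadoop
- BSP mode (Bulk Synchronous Parallel mode), using Apache Hama
- Spark mode, based on Apache Spark
General MRQL usage syntax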
Evaluating MRQL Queries Using Map-Reduce
Before deploying your MRQL queries on a Hadoop cluster, you can run these queries in memory on a small amount of data using the command:
which evaluates MRQL top-level commands and queries from the input until you type quit. To run MRQL in Hadoop's standalone mode (single node on local files), use:
To run MRQL in Hadoop's fully distributed mode (cluster mode), use:
// Runs MRQL in Hadoop's fully distributed mode (cluster mode)
Accessing the Data Sources
The MRQL expression that makes a directory of raw files accessible to a query is:
where path is the URI of the directory that contains the source files (a string), parser is the name of the parser to parse the files, and args are various parameters specific to the parsing method. It returns a !bag(t), for some t, that is, it returns a map-reduce type. Currently, there are four supported parsers: line, xml, json, and binary, but it is easy to define and embed your own parser (explained later).
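To make the roles of path, parser, and args more concrete, here is a small Python analogy. It is only a sketch of the idea (look up a parser by name, apply it to every file in a directory, collect the records into one bag), not how MRQL is implemented. The names PARSERS, register, parse_lines, and this Python source function are all hypothetical, and the delimiter argument is my own assumption about what a line-style parser might take.

```python
import os
import tempfile

# Hypothetical registry mapping a parser name to a function that turns
# one file into records. This mirrors the idea only, not MRQL internals.
PARSERS = {}

def register(name):
    """Decorator that embeds a user-defined parser under the given name."""
    def wrap(fn):
        PARSERS[name] = fn
        return fn
    return wrap

@register("line")
def parse_lines(file_path, delimiter=","):
    """Yield one tuple of string fields per non-empty text line."""
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield tuple(line.split(delimiter))

def source(parser, path, *args):
    """Parse every file in directory `path`; return the records as a list."""
    parse = PARSERS[parser]
    records = []
    for name in sorted(os.listdir(path)):
        records.extend(parse(os.path.join(path, name), *args))
    return records

# Usage on a throwaway directory holding two small files.
with tempfile.TemporaryDirectory() as d:
    for fname, text in [("a.txt", "1,x\n2,y\n"), ("b.txt", "3,z\n")]:
        with open(os.path.join(d, fname), "w", encoding="utf-8") as f:
            f.write(text)
    rows = source("line", d, ",")

print(rows)
```

Adding another parser in this sketch is just one more function decorated with register, which is roughly the spirit of "define and embed your own parser".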
Parsing XML Documents
The MRQL expression used for parsing an XML document is:
source( xml, path, tags, xpath )
For example, the following expression:
binds the variable XMark to the result of parsing the document "xmark.xml" and returns a list of person elements. A more complex example is:
Here is an example I made myself:
<person>
  <name>张三</name>
  <age>20</age>
</person>
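Before uploading, it can help to check locally that the file is well-formed and to see what selecting person elements from it should yield. The sketch below does this with Python's standard xml.etree.ElementTree. It is not MRQL and does not touch HDFS; the helper name extract is hypothetical, and the document is inlined as a string so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The example document, inlined so the sketch is self-contained.
DOC = "<person><name>张三</name><age>20</age></person>"

def extract(xml_text, tags):
    """Return every element whose tag is in `tags`, in document order."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return [el for el in root.iter() if el.tag in tags]

people = extract(DOC, {"person"})

# Turn each person element into a (name, age) record.
records = [(p.findtext("name"), int(p.findtext("age"))) for p in people]
print(records)
```

If the file were malformed, ET.fromstring would raise an error here, which is cheaper to discover locally than after the upload.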
Upload the 1.xml file to an HDFS directory:
hadoop fs -put ~/1.xml /user/hadoop/jl
List the jl directory: