Extracting a list of substrings from MongoDB using a regular expression


Problem description

I need to extract the part of a string that matches a regex and return it.

I have a collection of documents such as:

{"_id" :12121, "fileName" : "apple.doc"},
{"_id" :12125, "fileName" : "rap.txt"},
{"_id" :12126, "fileName" : "tap.pdf"},
{"_id" :12127, "fileName" : "cricket.txt"}

I need to extract all file extensions and return {".doc", ".txt", ".pdf"}.

I am trying to use the $regex operator to find the substrings and aggregate on the results, but I am unable to extract the required part and pass it down the pipeline.

I have tried something like this, without success:

aggregate([
  { $match: { "name": { $regex: '/\.[0-9a-z]+$/i', "$options": "i" } } },
  { $group: { _id: null, tot: { $push: "$name" } } }
])

Recommended answer

This will be possible in the upcoming version of MongoDB (as of this writing), using the aggregation framework and the $indexOfCP operator. Until then, your best bet is MapReduce.

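// Emit one (_id, extension) pair per document.
// The extension is taken from the first "." in fileName to the end of the string.
var mapper = function() {
    emit(this._id, this.fileName.substring(this.fileName.indexOf(".")));
};

db.coll.mapReduce(mapper,
                  // Reducer is never invoked here: each _id is emitted only once.
                  function(key, values) {},
                  { "out": { "inline": 1 }}
)["results"]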

Which yields:

[
    {
        "_id" : 12121,
        "value" : ".doc"
    },
    {
        "_id" : 12125,
        "value" : ".txt"
    },
    {
        "_id" : 12126,
        "value" : ".pdf"
    },
    {
        "_id" : 12127,
        "value" : ".txt"
    }
]
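
The result above contains one entry per document, whereas the question asks for the distinct set of extensions. One way to collapse it on the client is sketched below in plain JavaScript. The `results` array is copied from the output above, and the helper name `distinctExtensions` is hypothetical, introduced only for illustration.

```javascript
// Inline mapReduce results, copied from the output shown above.
const results = [
  { _id: 12121, value: ".doc" },
  { _id: 12125, value: ".txt" },
  { _id: 12126, value: ".pdf" },
  { _id: 12127, value: ".txt" },
];

// Hypothetical helper: keep each extension once, in first-seen order.
// A Set drops duplicates and preserves insertion order.
function distinctExtensions(rows) {
  return [...new Set(rows.map((row) => row.value))];
}

console.log(distinctExtensions(results)); // [ '.doc', '.txt', '.pdf' ]
```

Deduplicating on the client keeps the map function unchanged; for large collections it would likely be preferable to deduplicate on the server instead.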


For completeness, here is the solution using the aggregation framework:

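db.coll.aggregate(
    [
        // Keep only documents whose fileName ends with an extension.
        { "$match": { "fileName": /\.[0-9a-z]+$/i } },
        { "$group": {
            "_id": null,
            "extensions":  {
                "$push": {
                    // A negative length returns the rest of the string.
                    "$substr": [
                        "$fileName",
                        { "$indexOfCP": [ "$fileName", "." ] },
                        -1
                    ]
                }
            }
        }}
    ])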

Which produces:

{
    "_id" : null,
    "extensions" : [ ".doc", ".txt", ".pdf", ".txt" ]
}
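
The output above still lists ".txt" twice, because `$push` appends every value it sees. As a hedged alternative that is not part of the original answer: MongoDB 4.2 and later provide the `$regexFind` operator, which applies the regular expression inside the pipeline, and `$addToSet` keeps only distinct values. The sketch below assumes the collection is named `coll` as above; the intermediate field name `ext` is hypothetical.

```javascript
// Same pattern as in the question: a dot followed by letters or digits at the end.
const extensionPattern = /\.[0-9a-z]+$/i;

// Sketch of a pipeline, assuming MongoDB 4.2+ for $regexFind.
const pipeline = [
  // $regexFind yields { match, idx, captures }, or null when nothing matches.
  { $project: { ext: { $regexFind: { input: "$fileName", regex: extensionPattern } } } },
  // Drop documents whose fileName has no extension.
  { $match: { ext: { $ne: null } } },
  // $addToSet collects each extension once; the order of the array is not guaranteed.
  { $group: { _id: null, extensions: { $addToSet: "$ext.match" } } },
];

// In the mongo shell this would be run as: db.coll.aggregate(pipeline)

// Local check of the pattern: only the last extension is matched.
console.log("archive.tar.gz".match(extensionPattern)[0]); // .gz
```

Because the pattern is anchored to the end of the string, a name with several dots yields only its final extension, unlike the `$indexOfCP` approach, which cuts at the first dot.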


