Efficient paging in MongoDB using mgo

Question

Let's say we have a users collection in MongoDB, modeled with this Go struct:

type User struct {
    ID      bson.ObjectId `bson:"_id"`
    Name    string        `bson:"name"`
    Country string        `bson:"country"`
}

We want to sort and list users based on some criteria, with paging implemented because the result list is expected to be long.

To page through the results of a query, MongoDB and the mgo.v2 driver package have built-in support in the form of Query.Skip() and Query.Limit(), e.g.:

session, err := mgo.Dial(url) // Acquire Mongo session, handle error!

c := session.DB("").C("users")
q := c.Find(bson.M{"country": "USA"}).Sort("name", "_id").Limit(10)

// To get the nth page:
q = q.Skip((n-1)*10)

var users []*User
err = q.All(&users)

However, this becomes slow as the page number increases, because MongoDB can't just "magically" jump to the xth document in the result; it has to iterate over all the result documents and omit (not return) the first x that need to be skipped.

MongoDB provides the right solution: if the query operates on an index (and it has to), cursor.min() can be used to specify the first index entry to start listing results from.

This Stack Overflow answer shows how it can be done using a mongo client: How to do pagination using range queries in MongoDB?

Note: the required index for the above query would be:

db.users.createIndex(
    {
        country: 1,
        name: 1,
        _id: 1
    }
)

There is one problem though: the mgo.v2 package has no support for specifying this min().

How can we achieve efficient paging that uses MongoDB's cursor.min() feature with the mgo.v2 driver?

Recommended answer

Unfortunately, the mgo.v2 driver does not provide an API call to specify cursor.min().

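But there is a solution. The mgo.Database type provides a Database.Run() method to run any MongoDB command. The available commands and their documentation can be found here: Database commands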

Starting with MongoDB 3.2, a new find command is available that can be used to execute queries, and it supports specifying the min argument, which denotes the first index entry to start listing results from.

Good. After each batch (the documents of a page), we need to generate the min document from the last document of the query result. It must contain the values of the index entry that was used to execute the query. The next batch (the documents of the next page) can then be acquired by setting this min index entry before executing the query.

This index entry (let's call it the cursor from now on) may be encoded to a string and sent to the client along with the results. When the client wants the next page, it sends back the cursor, saying it wants results starting after this cursor.

The command to be executed can be in different forms, but the command name (find) must be first in the marshaled result, so we'll use bson.D (which preserves order in contrast to bson.M):

limit := 10
cmd := bson.D{
    {Name: "find", Value: "users"},
    {Name: "filter", Value: bson.M{"country": "USA"}},
    {Name: "sort", Value: bson.D{
        {Name: "name", Value: 1},
        {Name: "_id", Value: 1},
    }},
    {Name: "limit", Value: limit},
    {Name: "batchSize", Value: limit},
    {Name: "singleBatch", Value: true},
}
if min != nil {
    // min is inclusive, must skip first (which is the previous last)
    cmd = append(cmd,
        bson.DocElem{Name: "skip", Value: 1},
        bson.DocElem{Name: "min", Value: min},
    )
}
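To see why an inclusive min combined with a skip of 1 yields exactly the next page, here is an in-memory analogy using only the standard library. It is a sketch, not how MongoDB is implemented: the sorted slice stands in for the index, and the entry type, less, and nextPage are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a simplified stand-in for an index entry on (name, _id).
type entry struct {
	Name string
	ID   int
}

// less reports whether a sorts before b, comparing the name first and the
// ID second, like a compound index on (name, _id) would.
func less(a, b entry) bool {
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	return a.ID < b.ID
}

// nextPage returns up to limit entries from idx, which must be sorted by less.
// With a nil minEntry it starts at the beginning. Otherwise it seeks (by binary
// search) to the first entry >= *minEntry and skips one entry, because the
// lower bound is inclusive and that entry was the last one of the previous page.
func nextPage(idx []entry, minEntry *entry, limit int) []entry {
	start := 0
	if minEntry != nil {
		start = sort.Search(len(idx), func(i int) bool { return !less(idx[i], *minEntry) })
		start++ // skip the previous page's last entry
	}
	if start > len(idx) {
		start = len(idx)
	}
	end := start + limit
	if end > len(idx) {
		end = len(idx)
	}
	return idx[start:end]
}

func main() {
	idx := []entry{{"Ann", 1}, {"Ann", 2}, {"Bob", 3}, {"Cid", 4}, {"Dan", 5}}

	// Walk through all pages, carrying the last entry forward as the cursor.
	var cursor *entry
	for {
		page := nextPage(idx, cursor, 2)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		last := page[len(page)-1]
		cursor = &last
	}
}
```

Unlike a plain skip of (n-1)*limit, the cost of locating the start of a page here does not grow with the page number, and including the ID in the key keeps duplicate names from being skipped or repeated.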

The result of executing a MongoDB find command with Database.Run() can be captured with the following type:

var res struct {
    OK       int `bson:"ok"`
    WaitedMS int `bson:"waitedMS"`
    Cursor   struct {
        ID         interface{} `bson:"id"`
        NS         string      `bson:"ns"`
        FirstBatch []bson.Raw  `bson:"firstBatch"`
    } `bson:"cursor"`
}

db := session.DB("")
if err := db.Run(cmd, &res); err != nil {
    // Handle error (abort)
}

We now have the results, but in a slice of type []bson.Raw, while we want them in a slice of type []*User. This is where Collection.NewIter() comes in handy. It can transform (unmarshal) a value of type []bson.Raw into any type we usually pass to Query.All() or Iter.All(). Let's see it:

firstBatch := res.Cursor.FirstBatch
var users []*User
err = db.C("users").NewIter(nil, firstBatch, 0, nil).All(&users)

We now have the users of the next page. Only one thing left: generating the cursor to be used to get the subsequent page should we ever need it:

if len(users) > 0 {
    lastUser := users[len(users)-1]
    cursorData := bson.D{
        {Name: "country", Value: lastUser.Country},
        {Name: "name", Value: lastUser.Name},
        {Name: "_id", Value: lastUser.ID},
    }
} else {
    // No more users found, use the last cursor
}

This is all good, but how do we convert cursorData to a string and back? We may use bson.Marshal() and bson.Unmarshal() combined with base64 encoding; using base64.RawURLEncoding gives us a web-safe cursor string, one that can be added to URL queries without escaping.

Here's an example implementation:

// CreateCursor returns a web-safe cursor string from the specified fields.
// The returned cursor string is safe to include in URL queries without escaping.
func CreateCursor(cursorData bson.D) (string, error) {
    // bson.Marshal() never returns error, so I skip a check and early return
    // (but I do return the error if it would ever happen)
    data, err := bson.Marshal(cursorData)
    return base64.RawURLEncoding.EncodeToString(data), err
}

// ParseCursor parses the cursor string and returns the cursor data.
func ParseCursor(c string) (cursorData bson.D, err error) {
    var data []byte
    if data, err = base64.RawURLEncoding.DecodeString(c); err != nil {
        return
    }

    err = bson.Unmarshal(data, &cursorData)
    return
}

And we finally have our efficient, but not so short MongoDB mgo paging functionality. Read on...

The manual way is quite lengthy; it can be made general and automated. This is where github.com/icza/minquery comes into the picture (disclosure: I'm the author). It provides a wrapper to configure and execute a MongoDB find command, allowing you to specify a cursor, and after executing the query, it gives you back the new cursor to be used to query the next batch of results. The wrapper is the MinQuery type which is very similar to mgo.Query but it supports specifying MongoDB's min via the MinQuery.Cursor() method.

The above solution using minquery looks like this:

q := minquery.New(session.DB(""), "users", bson.M{"country": "USA"}).
    Sort("name", "_id").Limit(10)
// If this is not the first page, set cursor:
// getLastCursor() represents your logic how you acquire the last cursor.
if cursor := getLastCursor(); cursor != "" {
    q = q.Cursor(cursor)
}

var users []*User
newCursor, err := q.All(&users, "country", "name", "_id")

And that's all. newCursor is the cursor to be used to fetch the next batch.

Note #1: When calling MinQuery.All(), you have to provide the names of the cursor fields; these are used to build the cursor data (and ultimately the cursor string).

Note #2: If you're retrieving partial results (by using MinQuery.Select()), you have to include all the fields that are part of the cursor (the index entry) even if you don't intend to use them directly, else MinQuery.All() will not have all the values of the cursor fields, and so it will not be able to create the proper cursor value.

Check out the package doc of minquery here: https://godoc.org/github.com/icza/minquery, it is rather short and hopefully clean.
