How should I implement this schema in MongoDB?

Problem description

I'm trying to write a tracking script and I'm having trouble figuring out how the database should work.

In MySQL I'd create tables that look similar to:

User:
   username_name: string

Campaign:
   title: string
   description: string
   link: string

UserCampaign:
   user_id: integer
   camp_id: integer

Click:
   os: text
   referer: text
   camp_id: integer
   user_id: integer

I need to be able to:

  • See the information from each click like IP, Referer, OS, etc
  • See how often clicks are coming from X IP, X Referer, X OS
  • Associate each click with a User and a Campaign

If I do something along the lines of

User {
     Campaigns: [
         {
           Clicks: []
         }
     ]
}

I run into two problems:

  • It creates a new campaign object for each user, which is a problem: if I need to update my campaign, I'd need to update the object for each user
  • I expect the Clicks array to contain a LARGE amount of data, and I feel like having it as part of the User object will make it very slow to query

Solution

OK, I think you need to break this out into the basic "varieties".

You have two "entity"-style objects:

  • User
  • Campaign

You have one "mapping"-style object:

  • UserCampaign

You have one "transactional"-style object:

  • Click

Step 1: entity

Let's start with the easy ones: User & Campaign. These are truly two separate objects; neither one really depends on the other for its existence. There's also no implicit hierarchy between the two: Users do not belong to Campaigns, nor do Campaigns belong to Users.

When you have two top-level objects like this, they generally earn their own collection. So you'll want a Users collection and a Campaigns collection.

Step 2: mapping

UserCampaign is currently used to represent an N-to-M mapping. Now, in general, when you have an N-to-1 mapping, you can put the N inside of the 1. However, with the N-to-M mapping, you generally have to "pick a side".

In theory, you could do one of the following:

  1. Put a list of Campaign IDs inside of each User
  2. Put a list of User IDs inside of each Campaign

Personally, I would do #1. You probably have way more users than campaigns, so a list of campaign IDs per user will be much shorter than a list of user IDs per campaign, and you probably want to put the array where it will be shorter.
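A rough sketch of option #1, written as mongosh-style JavaScript. The `campaign_ids` field, the `users` and `campaigns` collection names, the sample values, and both helper functions are assumptions made for illustration. The helpers only build the filter and update documents, so the snippet runs on its own; the trailing comments show where they would be handed to the shell.

```javascript
// Hypothetical documents; plain strings stand in for ObjectId values.
const campaign = {
  _id: "camp1",
  title: "Spring sale",
  description: "Example campaign",
  link: "http://example.com/spring",
};
const user = { _id: "user1", username: "alice", campaign_ids: ["camp1"] };

// Filter matching every user who belongs to a campaign.
// Equality against an array field matches when any element equals the value.
function usersInCampaign(campaignId) {
  return { campaign_ids: campaignId };
}

// Update that adds a campaign to a user; $addToSet skips IDs already present.
function joinCampaign(campaignId) {
  return { $addToSet: { campaign_ids: campaignId } };
}

// In mongosh (assumed collection names):
//   db.users.find(usersInCampaign(campaign._id))
//   db.users.updateOne({ _id: user._id }, joinCampaign("camp2"))
//   db.campaigns.find({ _id: { $in: user.campaign_ids } })
```

Because the campaign itself lives in one place, editing its title or link is a single update on the campaigns collection rather than one per user.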

Step 3: transactional

Clicks are really a completely different beast. In object terms you could think the following: Clicks "belong to" a User, Clicks "belong to" a Campaign. So, in theory, you could just store clicks as part of either of these objects. It's easy to think that Clicks belong under Users or Campaigns.

But if you really dig deeper, the above simplification is really flawed. In your system, Clicks are really a central object. In fact, you might even be able to say that Users & Campaigns are really just "associated with" the click.

Take a look at the questions / queries that you're asking. All of those questions actually center around clicks. Users & Campaigns are not the central object in your data, Clicks are.

Additionally, Clicks are going to be the most plentiful data in your system. You're going to have way more clicks than anything else.

This is the biggest hitch when designing a schema for data like this. Sometimes you need to set the "parent" objects aside when they're not the most important thing. Imagine building a simple e-commerce system. It's clear that orders would "belong to" users, but orders are so central to the system that they're going to be a "top-level" object.
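To make the "top-level" idea a bit more concrete, here is a minimal sketch of a click stored as its own document that points at its user and campaign by ID. The field names (`ip`, `referer`, `os`, `created_at`), the `makeClick` helper, and the sample values are assumptions, not something the question specifies.

```javascript
// Build one click document. Each click is a small standalone document
// that references its user and campaign instead of living inside them.
function makeClick(userId, campaignId, request) {
  return {
    user_id: userId,
    campaign_id: campaignId,
    ip: request.ip,
    referer: request.referer,
    os: request.os,
    created_at: new Date(),
  };
}

const click = makeClick("user1", "camp1", {
  ip: "203.0.113.7",
  referer: "http://example.com/landing",
  os: "Linux",
});

// In mongosh (assumed collection name): db.clicks.insertOne(click)
```

Since every click is a separate small document, user and campaign documents stay the same size no matter how many clicks arrive.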

Wrapping it up

You'll probably want three collections:

  1. Users -> each has a list of campaign._id values
  2. Campaigns
  3. Clicks -> each contains user._id and campaign._id

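This should satisfy all of your query needs:

See the information from each click like IP, Referer, OS, etc:

db.clicks.find()

See how often clicks are coming from X IP, X Referer, X OS:

db.clicks.group() or run a Map-Reduce.

Associate each click with a User and a Campaign:

db.clicks.find({user_id : blah})

It's also possible to push click IDs into both users and campaigns (if that makes sense).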
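For the frequency question, an aggregation pipeline is one alternative to `group()` and Map-Reduce. `db.collection.group()` was removed in MongoDB 4.2, so on newer servers this is likely the more practical route. The sketch below assumes click documents carry `ip`, `referer`, `os`, and `campaign_id` fields; `clicksPerValue` is a hypothetical helper that only builds the pipeline.

```javascript
// Build a pipeline that counts clicks per distinct value of one field,
// optionally restricted by a filter, with the most frequent value first.
function clicksPerValue(field, filter = {}) {
  return [
    { $match: filter },
    { $group: { _id: "$" + field, clicks: { $sum: 1 } } },
    { $sort: { clicks: -1 } },
  ];
}

// In mongosh (assumed collection and field names):
//   db.clicks.aggregate(clicksPerValue("ip"))
//   db.clicks.aggregate(clicksPerValue("os", { campaign_id: "camp1" }))
//   db.clicks.countDocuments({ referer: "http://example.com/landing" })
```

The `countDocuments` call covers the simpler case where you only need the number of clicks for one specific value.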

Please note that if you have lots and lots of clicks, you'll really have to analyze the queries you run most. You can't index on every field, so you'll often want to run Map-Reduces to "roll-up" the data for these queries.
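One possible way to act on this, sketched under assumptions: index only the fields behind your most frequent queries, and keep pre-aggregated counters so that frequency lookups don't have to scan raw clicks. Note that this uses an upsert with `$inc` at write time instead of the Map-Reduce mentioned above. The `click_stats` collection, the per-day key layout, and the `rollupForClick` helper are all hypothetical.

```javascript
// Build the filter and update for a daily per-campaign, per-OS counter.
// Meant to be used with upsert so the counter document is created on first use.
function rollupForClick(click) {
  // UTC calendar day of the click, formatted as YYYY-MM-DD.
  const day = click.created_at.toISOString().slice(0, 10);
  return {
    filter: { campaign_id: click.campaign_id, os: click.os, day: day },
    update: { $inc: { clicks: 1 } },
  };
}

const rollup = rollupForClick({
  campaign_id: "camp1",
  os: "Linux",
  created_at: new Date("2024-01-31T12:00:00Z"),
});

// In mongosh (assumed collection names):
//   db.clicks.createIndex({ campaign_id: 1, created_at: -1 })
//   db.clicks.createIndex({ user_id: 1 })
//   db.click_stats.updateOne(rollup.filter, rollup.update, { upsert: true })
```

Which indexes and which roll-up keys make sense depends entirely on the queries you actually run most, so treat the ones above as placeholders.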
