Problem description
We've got a healthy debate going on in the office this week. We're creating a DB to store proxy information, and for the most part we have the schema worked out, except for how we should store IPs. One camp wants to use 4 smallints, one for each octet; the other wants to use a single big int (INET_ATON).
These tables are going to be huge, so performance is key. I'm in the middle here, as I normally use MS SQL and 4 smallints in my world. I don't have enough experience storing IPs at this kind of volume.
We'll be using Perl and Python scripts to access the database and further normalize the data into several other tables for top talkers, interesting traffic, etc.
I am sure there are some here in the community who have done something similar to what we are doing, and I am interested in hearing about their experiences and which route is best for IP addresses: 1 big int or 4 small ints.
EDIT - One of our concerns is space: this database is going to be huge, on the order of 500,000,000 records a day. So we are trying to weigh the space issue along with the performance issue.
EDIT 2 - Some of the conversation has turned to the volume of data we are going to store... that's not my question. The question is which is the preferable way to store an IP address, and why. As I've said in my comments, we work for a large Fortune 50 company. Our log files contain usage data from our users. This data in turn will be used within a security context to drive some metrics and several security tools.
Recommended answer
I would suggest looking at what type of queries you will be running to decide which format you adopt.
Only if you need to pull out or compare individual octets would you have to consider splitting them up into separate fields.
Otherwise, store it as a 4-byte integer. That also has the bonus of allowing you to use the MySQL built-in INET_ATON() and INET_NTOA() functions.
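If the Perl/Python scripts need the same conversion outside the database, Python's standard `ipaddress` module can stand in for these two functions. This is a minimal sketch assuming IPv4 dotted-quad input only; the function names merely mirror MySQL's and are not part of any MySQL client API. Unlike INET_ATON(), which returns NULL for input it cannot parse, this version raises `ValueError`, and MySQL's parser may accept some formats this strict version rejects.

```python
import ipaddress


def inet_aton(address: str) -> int:
    """Rough Python counterpart of MySQL INET_ATON() for dotted-quad IPv4 strings.

    Raises ValueError (ipaddress.AddressValueError) on invalid input,
    whereas MySQL would return NULL.
    """
    return int(ipaddress.IPv4Address(address))


def inet_ntoa(value: int) -> str:
    """Rough Python counterpart of MySQL INET_NTOA() for unsigned 32-bit ints."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    n = inet_aton("10.0.0.1")
    print(n)             # 167772161
    print(inet_ntoa(n))  # 10.0.0.1
```

The integer produced here is the value an UNSIGNED INT column would hold, so scripts can compute it before inserting, or call INET_ATON() in the SQL itself.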
Storage:
If you are only going to support IPv4 addresses, then your datatype in MySQL can be an UNSIGNED INT, which uses only 4 bytes of storage.
To store the individual octets, you would only need UNSIGNED TINYINT columns, which use 1 byte of storage each, not SMALLINTs, which use 2 bytes each.
Both methods use similar storage (4 bytes per address), with perhaps slightly more for the separate fields due to per-column overhead.
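The storage comparison can be checked with back-of-the-envelope arithmetic. This sketch uses Python `struct` format codes as stand-ins for MySQL's documented column widths (INT 4 bytes, TINYINT 1 byte, SMALLINT 2 bytes) and the 500,000,000 rows/day figure from the question; it counts raw column bytes only, ignoring row headers, indexes, and other engine overhead.

```python
import struct

ROWS_PER_DAY = 500_000_000  # daily volume quoted in the question

# struct format codes standing in for MySQL column widths ("!" = no padding):
LAYOUTS = {
    "1 x UNSIGNED INT": "!I",       # one 4-byte unsigned integer
    "4 x UNSIGNED TINYINT": "!4B",  # four 1-byte unsigned integers
    "4 x SMALLINT": "!4H",          # four 2-byte integers
}


def raw_bytes_per_day(fmt: str, rows: int = ROWS_PER_DAY) -> int:
    """Raw column bytes per day; ignores row headers, indexes and engine overhead."""
    return struct.calcsize(fmt) * rows


if __name__ == "__main__":
    # 192.168.1.10 as one integer and as four octets encodes to the same 4 bytes.
    as_int = struct.pack("!I", 3232235786)
    as_octets = struct.pack("!4B", 192, 168, 1, 10)
    print(as_int == as_octets)  # True
    for name, fmt in LAYOUTS.items():
        print(f"{name}: {struct.calcsize(fmt)} B/row, {raw_bytes_per_day(fmt) / 1e9:.1f} GB/day")
```

At that volume, one UNSIGNED INT and four UNSIGNED TINYINTs both come to about 2 GB/day of raw column data, while four SMALLINTs double that to about 4 GB/day.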
More info:
- Numeric Type Overview
- Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT
Performance:
Using a single field will yield much better performance: it's a single comparison instead of 4. You mentioned that you will only run queries against the whole IP address, so there should be no need to keep the octets separate. With MySQL's INET_* functions, the conversion between the text and integer representations is done once, for the comparison value.
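To make the "single comparison" point concrete: filtering on a subnet reduces to one range check on the integer column. This minimal sketch computes the bounds with Python's standard `ipaddress` module; the `proxy_log` table and `src_ip` column in the comment are hypothetical, and the `%s` placeholders assume a MySQL DB-API driver that uses that paramstyle.

```python
import ipaddress


def subnet_bounds(cidr: str) -> tuple[int, int]:
    """Return the first and last address of an IPv4 network as unsigned ints."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def in_subnet(ip_value: int, bounds: tuple[int, int]) -> bool:
    """One range check on the integer replaces per-octet comparisons."""
    low, high = bounds
    return low <= ip_value <= high


# With a MySQL DB-API driver using the %s paramstyle (an assumption), the bounds
# could feed a query such as (table and column names are hypothetical):
#   SELECT COUNT(*) FROM proxy_log WHERE src_ip BETWEEN %s AND %s

if __name__ == "__main__":
    bounds = subnet_bounds("192.168.1.0/24")
    print(bounds)  # (3232235776, 3232236031)
    print(in_subnet(int(ipaddress.IPv4Address("192.168.1.10")), bounds))  # True
    print(in_subnet(int(ipaddress.IPv4Address("192.168.2.10")), bounds))  # False
```

Because the bounds are plain integers, an ordinary index on the single column can serve the range scan directly.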