本文介绍了GeoIP表与MySQL中的IP表联接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在寻找一种快速连接表的方式时遇到了问题:

I am having a issue finding a fast way of joining the tables looking like that:

mysql> explain geo_ip;
+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| ip_start     | varchar(32)      | NO   |     | ""      |       |
| ip_end       | varchar(32)      | NO   |     | ""      |       |
| ip_num_start | int(64) unsigned | NO   | PRI | 0       |       |
| ip_num_end   | int(64) unsigned | NO   |     | 0       |       |
| country_code | varchar(3)       | NO   |     | ""      |       |
| country_name | varchar(64)      | NO   |     | ""      |       |
| ip_poly      | geometry         | NO   | MUL | NULL    |       |
+--------------+------------------+------+-----+---------+-------+


mysql> explain entity_ip;
+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| entity_id  | int(64) unsigned    | NO   | PRI | NULL    |       |
| ip_1       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_2       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_3       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_4       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_num     | int(64) unsigned    | NO   |     | 0       |       |
| ip_poly    | geometry            | NO   | MUL | NULL    |       |
+------------+---------------------+------+-----+---------+-------+

请注意,我对一次仅用一个IP地址在geo_ip中查找所需行不感兴趣,我需要一个entity_ip LEFT JOIN geo_ip(或类似/类似方式).

Please note that I am not interested in finding the needed rows in geo_ip by only ONE IP address at once, I need a entity_ip LEFT JOIN geo_ip (or similar/analogue way).

这就是我现在拥有的(使用):

This is what I have for now (using polygons as advised on http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/):

mysql> EXPLAIN SELECT li.*, gi.country_code FROM entity_ip AS li
-> LEFT JOIN geo_ip AS gi ON
-> MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`);

+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | li    | ALL  | NULL          | NULL | NULL    | NULL |   2470 |       |
|  1 | SIMPLE      | gi    | ALL  | ip_poly_index | NULL | NULL    | NULL | 155183 |       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+

mysql> SELECT li.*, gi.country_code FROM entity AS li LEFT JOIN geo_ip AS gi ON MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`) limit  0, 20;
20 rows in set (2.22 sec)

没有多边形

mysql> explain SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.`ip_num` >= gi.`ip_num_start` AND li.`ip_num` <= gi.`ip_num_end` LIMIT 0,20;
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys             | key  | key_len | ref  | rows   | Extra |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | li    | ALL  | NULL                      | NULL | NULL    | NULL |   2470 |       |
|  1 | SIMPLE      | gi    | ALL  | PRIMARY,geo_ip,geo_ip_end | NULL | NULL    | NULL | 155183 |       |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+

mysql> SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.ip_num BETWEEN gi.ip_num_start AND gi.ip_num_end limit  0, 20;
20 rows in set (2.00 sec)

(搜索中的行数更多-没有区别)

(On higher number of rows in the search - there is no difference)

当前我无法从这些查询中获得任何更快的性能,因为每个IP 0.1秒对我来说太慢了.

Currently I cannot get any faster performance from these queries as 0.1 seconds per IP is way too slow for me.

有什么方法可以使其更快?

Is there any way to make it faster?

推荐答案

此方法存在一些可扩展性问题(您应该选择移至特定于城市的geoip数据),但是对于给定的数据大小,它将提供了可观的优化.

This approach has some scalability issues (should you choose to move to, say, city-specific geoip data), but for the given size of data, it will provide considerable optimization.

您实际上面临的问题是MySQL不能很好地优化基于范围的查询.理想情况下,您希望对索引进行精确的("=")查找,而不是大于"查找,因此我们需要根据可用数据来构建类似的索引.这样,MySQL在寻找匹配项时将需要评估的行要少得多.

The problem you are facing is effectively that MySQL does not optimize range-based queries very well. Ideally you want to do an exact ("=") look-up on an index rather than "greater than", so we'll need to build an index like that from the data you have available. This way MySQL will have much fewer rows to evaluate while looking for a match.

为此,我建议您创建一个查找表,该索引表根据IP地址的第一个八位位组(1.2.3.4中的= 1)为地理位置表建立索引.这个想法是,对于每次查找,您都可以忽略所有以与所查找IP相同的八位字节开头的地理位置IP.

To do this, I suggest that you create a look-up table that indexes the geolocation table based on the first octet (=1 from 1.2.3.4) of the IP addresses. The idea is that for each look-up you have to do, you can ignore all geolocation IPs which do not begin with the same octet than the IP you are looking for.

CREATE TABLE `ip_geolocation_lookup` (
  `first_octet` int(10) unsigned NOT NULL DEFAULT '0',
  `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
  `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
  KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

接下来,我们需要获取地理位置表中可用的数据,并生成涵盖地理位置行覆盖的 all (第一个)八位字节的数据:如果您有一个带有ip_start = '5.3.0.0'和,查找表将需要包含八位字节5、6、7和8的行.所以...

Next, we need to take the data available in your geolocation table and produce data that covers all (first) octets the geolocation row covers: If you have an entry with ip_start = '5.3.0.0' and ip_end = '8.16.0.0', the lookup table will need rows for octets 5, 6, 7, and 8. So...

ip_geolocation
|ip_start       |ip_end          |ip_numeric_start|ip_numeric_end|
|72.255.119.248 |74.3.127.255    |1224701944      |1241743359    |

应转换为:

ip_geolocation_lookup
|first_octet|ip_numeric_start|ip_numeric_end|
|72         |1224701944      |1241743359    |
|73         |1224701944      |1241743359    |
|74         |1224701944      |1241743359    |

由于这里有人要求提供本机MySQL解决方案,因此以下是一个存储过程,它将为您生成该数据:

Since someone here requested for a native MySQL solution, here's a stored procedure that will generate that data for you:

DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;

CREATE PROCEDURE recalculate_ip_geolocation_lookup()
BEGIN
    DECLARE i INT DEFAULT 0;

    DELETE FROM ip_geolocation_lookup;

    WHILE i < 256 DO
       INSERT INTO ip_geolocation_lookup (first_octet, ip_numeric_start, ip_numeric_end)
                SELECT  i, ip_numeric_start, ip_numeric_end FROM ip_geolocation WHERE
                ( ip_numeric_start & 0xFF000000 ) >> 24 <= i AND
                ( ip_numeric_end & 0xFF000000 ) >> 24 >= i;

       SET i = i + 1;
    END WHILE;
END;

然后您将需要通过调用该存储过程来填充表:

And then you will need to populate the table by calling that stored procedure:

CALL recalculate_ip_geolocation_lookup();

这时,您可以删除刚刚创建的过程-不再需要它,除非您要重新计算查找表.

At this point you may delete the procedure you just created -- it is no longer needed, unless you want to recalculate the look-up table.

查找表到位后,您所要做的就是将其集成到查询中,并确保按第一个八位位组进行查询.您对查询表的查询将满足两个条件:

After the look-up table is in place, all you have to do is integrate it into your queries and make sure you're querying by the first octet. Your query to the look-up table will satisfy two conditions:

  1. 查找与您的IP地址的第一个八位位组匹配的所有行
  2. 该子集的 :找到具有与您的IP地址匹配的范围的行
  1. Find all rows which match the first octet of your IP address
  2. Of that subset: Find the row which has the the range that matches your IP address

由于第二步是对一部分数据执行的,因此比对整个数据进行范围测试要快得多.这是这种优化策略的关键.

Because the step two is carried out on a subset of data, it is considerably faster than doing the range tests on the entire data. This is the key to this optimization strategy.

有多种方法可以确定IP地址的第一个八位字节是什么;我使用了( r.ip_numeric & 0xFF000000 ) >> 24,因为我的源IP是数字形式的

There are various ways for figuring out what the first octet of an IP address is; I used ( r.ip_numeric & 0xFF000000 ) >> 24 since my source IPs are in numeric form:

SELECT
    r.*,
    g.country_code
FROM
    ip_geolocation g,
    ip_geolocation_lookup l,
    ip_random r
WHERE
    l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24 AND
    l.ip_numeric_start <= r.ip_numeric AND
    l.ip_numeric_end >= r.ip_numeric AND
    g.ip_numeric_start = l.ip_numeric_start;

现在,当然,我的确得到了一点懒惰:如果使ip_geolocation_lookup表也包含国家/地区数据,则可以轻松地完全摆脱ip_geolocation表.我猜想从该查询中删除一张表会使它更快一些.

Now, admittedly I did get a little lazy in the end: You could easily get rid of ip_geolocation table altogether if you made the ip_geolocation_lookup table also contain the country data. I'm guessing dropping one table from this query would make it a bit faster.

最后,这是我在此响应中使用的另外两个表作为参考,因为它们与您的表不同.我确定它们是兼容的.

And, finally, here are the two other tables I used in this response for reference, since they differ from your tables. I'm certain they are compatible, though.

# This table contains the original geolocation data

CREATE TABLE `ip_geolocation` (
  `ip_start` varchar(16) NOT NULL DEFAULT '',
  `ip_end` varchar(16) NOT NULL DEFAULT '',
  `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
  `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `country_name` varchar(64) NOT NULL DEFAULT '',
  PRIMARY KEY (`ip_numeric_start`),
  KEY `country_code` (`country_code`),
  KEY `ip_start` (`ip_start`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


# This table simply holds random IP data that can be used for testing

CREATE TABLE `ip_random` (
  `ip` varchar(16) NOT NULL DEFAULT '',
  `ip_numeric` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

这篇关于GeoIP表与MySQL中的IP表联接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-05 16:44