Storing a multidimensional array in a database: relational or multidimensional?

Question

I have read numerous posts about mapping multidimensional data to a single dimension, multidimensional databases, and so on, but none of the answers helped. I also found plenty of documentation through Google, but it only provided background information and did not answer the question at hand.

I have a lot of strings that are related to one another, and I need them in a PHP script. The structure is hierarchical. Here is an example:

A:
  AA:
    AAA
    AAC
  AB
  AE:
    AEA
    AEE:
      AEEB
B:
  BA:
    BAA
  BD:
    BDC:
      BDCB
      BDCE
    BDD:
      BDDA
  BE:
    BED:
      BEDA
C:
  CC:
    CCB:
      CCBC
      CCBE
    CCC:
      CCCA
      CCCE
  CE

Each level of indentation represents a new level in the multidimensional array.

The goal is to retrieve, from PHP, an element by name together with all of its descendants. For instance, if I query for A, I want to receive an array of strings: array('A', 'AA', 'AAA', 'AAC', 'AB', 'AE', 'AEA', 'AEE', 'AEEB'). The 'issue' is that queries can also target lower-level elements. If I query for AEE, I want to get array('AEE', 'AEEB').

As I understand the concept of relational databases, this means that I cannot use one, because there is no common 'key' between elements. The solution I thought might work is to assign a PARENT element to each cell. So, in a table:

CELL | PARENT
A      NULL
AA     A
AAA    AA
AAC    AA
AB     A
AE     A
AEA    AE
AEE    AE
AEEB   AEE

With this layout, I think you could query for the given string and for all items that have it as their parent, then recursively descend until no more items are found. However, this seems rather slow to me, because the whole search space would have to be scanned at each level, which is exactly what you want to avoid with a multidimensional array.
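For reference, the recursive descent described here does not have to be a loop in application code. Below is a minimal sketch, assuming an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for the real server; the table name `tree`, the index name `tree_parent_idx`, and the helper `subtree` are made up for illustration. It uses a recursive common table expression, a feature PostgreSQL also supports. With an index on the parent column, each level should be an index lookup rather than a scan of the whole table.

```python
import sqlite3

# Hypothetical data mirroring the CELL | PARENT table above.
ROWS = [
    ("A", None), ("AA", "A"), ("AAA", "AA"), ("AAC", "AA"), ("AB", "A"),
    ("AE", "A"), ("AEA", "AE"), ("AEE", "AE"), ("AEEB", "AEE"),
]


def build_db():
    """Create an in-memory adjacency-list table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (cell TEXT PRIMARY KEY, parent TEXT)")
    # Index on parent so that finding the children of a node is a lookup.
    conn.execute("CREATE INDEX tree_parent_idx ON tree (parent)")
    conn.executemany("INSERT INTO tree VALUES (?, ?)", ROWS)
    return conn


def subtree(conn, root):
    """Return root and all of its descendants, sorted by name."""
    sql = """
        WITH RECURSIVE sub(cell) AS (
            SELECT cell FROM tree WHERE cell = ?          -- start node
            UNION ALL
            SELECT t.cell FROM tree t
            JOIN sub s ON t.parent = s.cell               -- one level down
        )
        SELECT cell FROM sub ORDER BY cell
    """
    return [row[0] for row in conn.execute(sql, (root,))]


conn = build_db()
print(subtree(conn, "AEE"))
print(subtree(conn, "A"))
```

This is only a sketch of the adjacency-list idea from the question, not a benchmark; how it performs on 100,000 rows would have to be measured on the actual server.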

So I am at a bit of a loss. Note that there are actually around 100,000 strings structured in this way, so speed is important. Luckily, the database is static and will not change. How can I store such a data structure in a database without having to deal with long loops and search times? It has come to my attention that PostgreSQL is already installed on our servers, so I would rather stick with that.

As I said, I am new to databases, but I am very eager to learn. I am therefore looking for an extensive answer that goes into detail and explains the advantages and disadvantages of a given approach. Performance is key. Ideally, an answer would name the best database type and language for this use case and include a script in that language to build such a structure.

Recommended answer

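If that is all you need, you can use a LIKE search. In your example, every name starts with the name of its parent, so a prefix match on the name returns the node and all of its descendants: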

SELECT *
FROM Table1
WHERE CELL LIKE 'AEE%';

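With an index whose leading column is CELL, this is a range check, which is fast. (In PostgreSQL, a plain B-tree index supports LIKE prefix matching only under the C locale; otherwise, create the index with the text_pattern_ops or varchar_pattern_ops operator class.)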
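As a quick illustration of the prefix search, here is a minimal sketch using Python's standard sqlite3 module with an in-memory database. SQLite is only a stand-in here, and the helper name `by_prefix` and the sample rows are assumptions made for the example. The pattern is built by concatenating the queried name with `%`, the same idea as the LIKE query above.

```python
import sqlite3

# A few names from the question's example hierarchy (sample data only).
CELLS = ["A", "AA", "AAA", "AAC", "AB", "AE", "AEA", "AEE", "AEEB",
         "B", "BA", "BAA"]

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives an index with CELL as its leading column.
conn.execute("CREATE TABLE Table1 (CELL TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(c,) for c in CELLS])


def by_prefix(conn, name):
    """Return name and every row whose CELL starts with name."""
    rows = conn.execute(
        "SELECT CELL FROM Table1 WHERE CELL LIKE ? || '%' ORDER BY CELL",
        (name,),
    )
    return [r[0] for r in rows]


print(by_prefix(conn, "AEE"))
print(by_prefix(conn, "B"))
```

Note that `%` and `_` inside the queried name would themselves act as wildcards, so names containing those characters would need escaping.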

If your data doesn't look like that (that is, if names do not start with their parent's name), you can add a path column that looks like a directory path and contains the ids of all nodes on the way from the root to the element:

| id | CELL | parent_id | path     |
|====|======|===========|==========|
|  1 | A    |      NULL | 1/       |
|  2 | AA   |         1 | 1/2/     |
|  3 | AAA  |         2 | 1/2/3/   |
|  4 | AAC  |         2 | 1/2/4/   |
|  5 | AB   |         1 | 1/5/     |
|  6 | AE   |         1 | 1/6/     |
|  7 | AEA  |         6 | 1/6/7/   |
|  8 | AEE  |         6 | 1/6/8/   |
|  9 | AEEB |         8 | 1/6/8/9/ |

To retrieve all descendants of 'AE' (including the node itself), the query would be:

SELECT *
FROM tree t
WHERE path LIKE '1/6/%';

or, looking up the path by name (MySQL-specific concatenation):

SELECT t.*
FROM tree t
CROSS JOIN tree r -- root
WHERE r.CELL = 'AE'
  AND t.path LIKE CONCAT(r.path, '%');

Result:

| id | CELL | parent_id |     path |
|====|======|===========|==========|
|  6 | AE   |         1 | 1/6/     |
|  7 | AEA  |         6 | 1/6/7/   |
|  8 | AEE  |         6 | 1/6/8/   |
|  9 | AEEB |         8 | 1/6/8/9/ |
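The path column can also be derived outside the database before loading. The following is a minimal sketch, assuming rows arrive as (id, CELL, parent_id) with every parent listed before its children; the helper names `with_paths` and `descendants` are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# (id, CELL, parent_id) rows from the table above; paths are derived below.
NODES = [
    (1, "A", None), (2, "AA", 1), (3, "AAA", 2), (4, "AAC", 2), (5, "AB", 1),
    (6, "AE", 1), (7, "AEA", 6), (8, "AEE", 6), (9, "AEEB", 8),
]


def with_paths(nodes):
    """Return (id, cell, parent_id, path) rows.

    Assumes each parent appears before its children in `nodes`.
    """
    paths = {}
    rows = []
    for node_id, cell, parent_id in nodes:
        prefix = paths[parent_id] if parent_id is not None else ""
        paths[node_id] = f"{prefix}{node_id}/"
        rows.append((node_id, cell, parent_id, paths[node_id]))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, CELL TEXT UNIQUE, "
    "parent_id INTEGER, path TEXT UNIQUE)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", with_paths(NODES))


def descendants(conn, cell):
    """Return (CELL, path) for the named node and everything below it."""
    sql = """
        SELECT t.CELL, t.path
        FROM tree t
        JOIN tree r ON t.path LIKE r.path || '%'
        WHERE r.CELL = ?
        ORDER BY t.path
    """
    return conn.execute(sql, (cell,)).fetchall()


for row in descendants(conn, "AE"):
    print(row)
```

The trailing slash on every path matters: the prefix '1/6/' cannot accidentally match a node whose path starts with '1/60/'.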

Demo

I created 100K rows of fake data on MariaDB with the sequence plugin, using the following script:

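-- Table with a materialized path column; CELL and path are both indexed.
DROP TABLE IF EXISTS tree;
CREATE TABLE tree (
  `id` int primary key,
  `CELL` varchar(50),
  `parent_id` int,
  `path` varchar(255),
  unique index (`CELL`),
  unique index (`path`)
);

-- Temporary trigger that derives each row's path from its parent's path.
-- Despite its name, it fires BEFORE INSERT so that it can set new.path.
DROP TRIGGER IF EXISTS `tree_after_insert`;
DELIMITER //
CREATE TRIGGER `tree_after_insert` BEFORE INSERT ON `tree` FOR EACH ROW BEGIN
    if new.id = 1 then
        -- the root node
        set new.path := '1/';
    else
        -- parent's path followed by this node's own id
        set new.path := concat((
            select path from tree where id = new.parent_id
        ), new.id, '/');
    end if;
END//
DELIMITER ;

-- Generate 100,000 rows: CELL is the id in base 36, and the parent is a
-- random row with a lower id, so every parent is inserted before its children.
insert into tree
    select seq as id
        , conv(seq, 10, 36) as CELL
        , case
            when seq = 1 then null
            else floor(rand(1) * (seq-1)) + 1
        end as parent_id
        , null as path
    from seq_1_to_100000
;
DROP TRIGGER IF EXISTS `tree_after_insert`;
-- runtime ~ 4 sec.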

Tests

Count all elements under the root:

SELECT count(*)
FROM tree t
CROSS JOIN tree r -- root
WHERE r.CELL = '1'
  AND t.path LIKE CONCAT(r.path, '%');
-- result: 100000
-- runtime: ~ 30 ms

Get the subtree elements under a specific node:

SELECT t.*
FROM tree t
CROSS JOIN tree r -- root
WHERE r.CELL = '3B0'
  AND t.path LIKE CONCAT(r.path, '%');
-- runtime: ~ 30 ms

Result:

| id    | CELL | parent_id | path                                |
|=======|======|===========|=====================================|
|  4284 | 3B0  |       614 | 1/4/11/14/614/4284/                 |
|  6560 | 528  |      4284 | 1/4/11/14/614/4284/6560/            |
|  8054 | 67Q  |      6560 | 1/4/11/14/614/4284/6560/8054/       |
| 14358 | B2U  |      6560 | 1/4/11/14/614/4284/6560/14358/      |
| 51911 | 141Z |      4284 | 1/4/11/14/614/4284/51911/           |
| 55695 | 16Z3 |      4284 | 1/4/11/14/614/4284/55695/           |
| 80172 | 1PV0 |      8054 | 1/4/11/14/614/4284/6560/8054/80172/ |
| 87101 | 1V7H |     51911 | 1/4/11/14/614/4284/51911/87101/     |

PostgreSQL

This also works in PostgreSQL. Only the string concatenation syntax has to change:

SELECT t.*
FROM tree t
CROSS JOIN tree r -- root
WHERE r.CELL = 'AE'
  AND t.path LIKE r.path || '%';

Demos: sqlfiddle - rextester

If you look at the test example, you'll see that all paths in the result begin with '1/4/11/14/614/4284/'. That is the path of the subtree root, the row with CELL='3B0'. If the path column is indexed, the engine finds all of these rows efficiently, because the index is sorted by path and rows sharing a prefix sit next to each other. It is like looking up all the words that begin with 'pol' in a dictionary of 100K words: you don't need to read the entire dictionary.
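The dictionary analogy can be sketched in a few lines. This is not how any particular database engine implements its index; it is a minimal illustration, assuming a sorted in-memory list of paths and Python's standard bisect module, of why a prefix search on sorted data only needs two binary searches and one contiguous slice. The function name `prefix_range` is made up for the example.

```python
from bisect import bisect_left

# Highest Unicode code point, used to build an upper bound for the range.
# Assumption: no stored path contains this character right after the prefix.
MAX_CHAR = "\U0010ffff"


def prefix_range(sorted_paths, prefix):
    """Return all entries of sorted_paths that start with prefix."""
    lo = bisect_left(sorted_paths, prefix)             # first entry >= prefix
    hi = bisect_left(sorted_paths, prefix + MAX_CHAR)  # first entry past range
    return sorted_paths[lo:hi]


# Paths from the small example table, sorted the way an index would keep them.
paths = sorted([
    "1/", "1/2/", "1/2/3/", "1/2/4/", "1/5/",
    "1/6/", "1/6/7/", "1/6/8/", "1/6/8/9/",
])

print(prefix_range(paths, "1/6/"))
```

Everything between the two positions shares the prefix, so the cost depends on the size of the subtree plus two logarithmic lookups, not on the size of the whole table.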
