Designing for a large amount of data

Problem description

I'm tinkering with a data collection system and have come up with a very hackish way to store my data. For reference, I'm anticipating collecting at least 100 million different DataId whatevers per year, possibly much more.

--- 366 data tables (one for each day of the year), each row being assigned a unique DataId (unique across all 366 tables too)
--- 100 data_map tables, table 0 having all DataIds ending in 00, table 99 having all DataIds ending in 99, and so on.

This is mostly because a friend of mine who works with MySQL said it is very slow to index large tables, even if you work mostly with integers.

However, I've read that MySQL can handle millions of rows with no problem, so it seems my basic design is overly complicated and will lead to tons of slowdowns thanks to all the joins.

Another friend of mine suggested using file partitioning (though he uses MSSQL), so is that another option?

Any advice?

Recommended answer

Partitioning is good for managing very large tables because you can rebuild individual partition indexes without touching the entire table. This reduces rebuild time and intermediate space requirements. Be aware that the partitioning feature is available only in the Enterprise and Developer editions of SQL Server.

With a good indexing strategy, response time should ideally be proportional to the amount of data retrieved (barring cached data), regardless of whether or not partitioning is used. Partitioning by date can facilitate certain processes, like incremental data loads and purge/archival, as well as certain types of queries. However, with or without partitioning, indexing is the key from a performance perspective.

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"nflacco" <ma*********@gmail.com> wrote in message news:9e**********************************@d19g2000 prm.googlegroups.com...
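The answer's point that indexing matters more than splitting data across many tables can be tried out with a small sketch. It is only an illustration under stated assumptions: it uses SQLite through Python's built-in sqlite3 module as a stand-in (not MySQL or SQL Server, whose planners and partitioning features differ), and the table name data, the columns data_id, day and value, and the index name idx_data_day are all hypothetical. It stores every day in one table with an index on the day column, where the original design used 366 per-day tables, and asks the engine how it would run a single-day lookup.

```python
import sqlite3


def build_demo(rows_per_day=1000, days=5):
    """Create ONE indexed table in place of one table per day (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data ("
        " data_id INTEGER PRIMARY KEY,"  # assigned by SQLite, unique across all days
        " day INTEGER NOT NULL,"         # day of year, 1..366
        " value REAL NOT NULL)"
    )
    # The index lets a single-day query skip rows that belong to other days.
    conn.execute("CREATE INDEX idx_data_day ON data (day)")
    rows = (
        (day, float(i))
        for day in range(1, days + 1)
        for i in range(rows_per_day)
    )
    conn.executemany("INSERT INTO data (day, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def plan_for_day(conn, day):
    """Return SQLite's query-plan text for a single-day lookup."""
    cur = conn.execute(
        "EXPLAIN QUERY PLAN SELECT data_id, value FROM data WHERE day = ?",
        (day,),
    )
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in cur.fetchall())


def count_for_day(conn, day):
    """Count the rows stored for one day."""
    cur = conn.execute("SELECT COUNT(*) FROM data WHERE day = ?", (day,))
    return cur.fetchone()[0]


def purge_day(conn, day):
    """Delete one day's rows (the purge/archival case) and return how many were removed."""
    cur = conn.execute("DELETE FROM data WHERE day = ?", (day,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo()
    print(plan_for_day(conn, 3))
    print(count_for_day(conn, 3))
    print(purge_day(conn, 1))
```

If the plan text mentions idx_data_day, the lookup is being served through the index and does not scan the whole table, which is the behaviour the answer relies on. A date-partitioned table in MySQL or SQL Server would additionally make the purge step a partition-level operation, something this sketch does not model.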
The re-indexing is what worries me. If we follow the "don't use too many tables" scheme, I'll be constantly adding new data to the main table (formerly the 366 day tables), as well as to the processed-data tables. (