本文介绍了优化查询(Indexing, EXPLAIN)Mysql的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据另一位开发人员在 stackoverflow 上的建议,我更新了我的查询如下,但我仍然需要进一步优化它.有人可以指导我如何最好地将索引应用于查询.

Based on an advise from another developer here on stackoverflow, I have updated my query as below, but I still need to optimise it further. Can someone guide my on how best I can apply indexing to the query.

见下面的查询:

SELECT a.id, a.user_unique_id, a.loan_location,
          a.ippis, a.tel_no,
          a.organisation, a.branch, a.loan_agree,
          a.loan_type, a.appr, a.sold,
          a.loan_status, a.top_up, a.current_loan,
          a.date_created, a.date_updated, c.loan_id, c.user_unique_id AS tu_user_unique_id,
          c.ippis AS tu_ippis, c.top_up_approved,
           c.loan_type AS tu_loan_type, c.dse, c.status, c.current_loan AS tu_current_loan,
          c.record_category, c.date_created AS tu_date_created,
          c.date_updated AS tu_date_updated
FROM loan_applications_tbl a
LEFT JOIN topup_or_reapplication_tbl c
    ON a.ippis=c.ippis
WHERE ((c.status IN ('pending', 'corrected', 'Rejected', 'Processing', 'Captured', 'Reviewed', 'top up')
       AND MONTH(CURRENT_DATE) IN (MONTH(c.date_created), MONTH(c.date_updated)
       AND YEAR(CURRENT_DATE) IN (YEAR(c.date_created), YEAR(c.date_updated))
       AND   c.current_loan='1' ))
OR ( a.loan_status IN ('pending', 'corrected', 'Rejected', 'Processing', 'Captured', 'Reviewed', 'top up')
     AND MONTH(CURRENT_DATE) IN (MONTH(a.date_created), MONTH(a.date_updated)) )
     AND YEAR(CURRENT_DATE) IN (YEAR(a.date_created), YEAR(a.date_updated))
     AND (a.current_loan='1'
          OR (a.current_loan='0'
              AND a.loan_status IN('Approved','Closed')))))

执行时间:53s

记录数:11000

使用mysql EXPLAIN给出如下截图:(如何最大化可能的keys列中的信息

using mysql EXPLAIN gives the screenshot below: (How do I maximise the information in the possible_keys column

我已更新以下附加信息:

I HAVE UPDATED ADDITIONAL INFORMATION BELOW:

我在 c 和 a 之间使用 OR 的原因如下:

I am using the OR between c and a for the below reasons:

  1. a 是具有 66 列的父表,如果 a 上的新条目具有匹配/现有的 ippis,则该表将填充贷款条目(a 上的唯一字段)a 中的某些列被新条目中的数据更新/覆盖,而条目中的其余数据作为新插入c 中的行(ippis 在表 c 中不是唯一的).这是为了保留所有后续贷款请求的历史记录,同时不为冗余留出空间

  1. a is the parent table with 66 columns which gets populated with loan entries, if a new entry on a has a matching/existing ippis (Unique field on a) some columns in a are updated/overwriten with data from the new entry, while the remaining data in the entry are inserted as new rows in c (ippis is not unique in table c). This is to keep a history of all subsequent loan requests while not giving room for redundancy

在检索记录时,我需要大 OR 子句来让我检查 ac 表的所有实例status、date 和 current_loan 列与我的 WHERE 子句中的参数匹配的贷款记录.

While retrieving records I needed the big OR clause to enable me check both a and c tables for all instances of each loan records where the status, date and current_loan columns match the parameters in my WHERE clause.

a 总是有完整的记录,但 c 不会总是有记录,除非有更多的贷款请求相同的唯一ID.a 包含谁是帐户人,例如通过唯一 ID",以及第一笔贷款的附加/补充状态详细信息,随后在第一笔贷款c"之后.将是具有相同唯一 ID 的实际贷款申请的附加/补充状态详细信息.

a will ALWAYS have a complete record in it but c will NOT ALWAYS have a record in it, except there are more loan requests for the same unique ID. a contains the "who is the account person such as by unique ID", and the additional / supplemental status detail FOR THE FIRST LOAN, subsequently, after the first loan "c" will be additional / supplemental status detail of the actual loan applications with the same Unique ID.

如果A"于 3 月 12 日创建,并创建了一个新的c".记录是在 3 月 16 日创建的.记录也会获得最后更新的标记为 3 月 16 日,因为它有一个对其有一定影响的子附件,而新的 c 记录有它自己创建和更新的时间戳.a 记录的 Updated 字段将为空/空,直到进行更改或存在 c 记录,c 的 Updated 字段将为空/空 记录直到对 c 记录进行了一些更改

if "A" is created Mar 12, and a new "c" record is created on Mar 16. The "A" record also gets the last updated stamped with Mar 16 since it has a child attachment that has some impact against it, while the new c record has it own created and updated time stamps. The Updated field will be blank/null for the a record until changes are made or there is a c record, The Updated field will be blank/null for c record until there's some changes made to c record

我希望这是可以理解的

推荐答案

我一直忘记这个术语,因为它对我来说很少出现,但无论如何,你的索引不能使用 MONTH() 和 YEAR() 作为优化它们是基础数据的函数.通过应用日期范围,他们可以.因此,您可以保留您的月/年,例如某些内容是在 2021 年 1 月创建并在 2021 年 3 月更新,但此外,添加一个 "and c.date_created >= current_date AND current_date ,如果索引中包含创建日期,则可以使用该索引(在这种情况下,更新日期不太重要.其他表也是如此.

I keep forgetting the term as it comes up very rarely to me, but anyhow, your indexes can not be optimized by using MONTH() and YEAR() as they are functions on the underlying data. By applying a date RANGE, they can. So you can keep your month/year such as if something was created in Jan 2021 and updated in Mar 2021, but in addition, adding an "and c.date_created >= current_date AND current_date <= c.date_updated", you CAN utilize the index if it has the created date in it (less important in this case for the date updated.Similarly for your other table.

此外,当您从a"进行左连接时到c"表,然后应用 where,这几乎就像您试图强制连接但由于 OR 保持左连接.

In addition, when you have your left-join from the "a" to the "c" table, then applying where, its almost like you are trying to force the join but remains left-join due to the OR.

我会移动c"左连接的基于条件,然后只需测试在那里找到的记录是否为 NULL.

I would move the "c" based condition to the left-join, then just test for the record found there as NULL or not.

虽然不清楚(我问时没有澄清),但我认为当一个新的A"记录创建后,系统实际上可能会将创建日期同时放在创建日期和更新日期中.如果是这种情况,那么我们只需要查询/关注当前活动的当前月份/年份的最后更新日期字段.这现在是 where 子句的主要要求——不管C"的底层 OR 条件如何.表.

Although not clear (was not clarified when I asked), I THINK that when a new "A" record is created, the system may actually put the creation date into both the date created and date updated. IF THIS IS THE CASE, then we only need to query/concern the last updated date field with the current month/year of activity. That is now the PRIMARY requirement for the where clause -- REGARDLESS of the underlying OR condition to the "C" table.

此外,由于月() 和年() 不是sargeable (谢谢 Ollie),我正在做一个预查询以获得当月和下个月的开始,以便我可以构建一个

Additionally, since the month() and year() are not sargeable (Thanks Ollie), I am doing a prequery to get the beginning of the current month and next month so I can build out a

WHERE > beginning of this month and LESS than beginning of next month

至于索引,我会开始更新到

As for indexes, I would start update to

loan_applications_tbl ( date_created, date_updated, loan_status, current_loan, ippis )
topup_or_reapplication_tbl ( ippis, status, current_loan, date_created, date_updated )

要尝试的最终查询.

SELECT
        a.id,
        a.user_unique_id,
        a.loan_location,
        a.ippis,
        a.tel_no,
        a.organisation,
        a.branch,
        a.loan_agree,
        a.loan_type,
        a.appr,
        a.sold,
        a.loan_status,
        a.top_up,
        a.current_loan,
        a.date_created,
        a.date_updated,
        c.loan_id,
        c.user_unique_id tu_user_unique_id,
        c.ippis tu_ippis,
        c.top_up_approved,
        c.loan_type tu_loan_type,
        c.dse,
        c.status,
        c.current_loan tu_current_loan,
        c.record_category,
        c.date_created tu_date_created,
        c.date_updated tu_date_updated
    FROM
        -- this creates inline mySQL variables I can use for the WHERE condition
        -- by doing comma after with no explicit join, it is a single row
        -- and thus no Cartesian result, just @variables available now
        ( select
                -- first truncating any TIME portion by casting to DATE()
                @myToday := date(curdate()),
                @howFarBack := date_sub( @myToday, interval 6 month ),
                -- now subtract day of month -1 to get first of THIS month
                @beginOfMonth := date_sub( @myToday, interval dayOfMonth( @myToday ) -1 day ),
                -- and now, add 1 month for beginning of next
                @beginNextMonth := date_add( @beginOfMonth, interval 1 month ) ) SqlVars,

        loan_applications_tbl a

            LEFT JOIN topup_or_reapplication_tbl c
                ON  a.ippis = c.ippis
                AND c.current_loan='1'
                AND c.status IN ('pending', 'corrected', 'Rejected',
                                'Processing', 'Captured', 'Reviewed', 'top up')
                AND
                (
                        (@beginOfMonth <= c.date_created
                    AND c.date_created < @beginNextMonth)

                OR
                        (@beginOfMonth <= a.date_updated
                    AND a.date_updated < @beginNextMonth )
                )

    WHERE
            -- forces only activity for the single month in question
            -- since the "a" table knows of any "updates" to the "C",
            -- its updated basis will keep overall restriction to any accounts

            -- updated within this month in question only
            -- testing specifically for created OR updated within the
            -- current month in question

        a.date_created >= @howFarBack
        AND
            (
                    (@beginOfMonth <= a.date_created
                AND a.date_created < @beginNextMonth)

            OR
                    (@beginOfMonth <= a.date_updated
                AND a.date_updated < @beginNextMonth )
            )

        -- and NOW we can easily apply the OR without requiring
        -- to run against the ENTIRE set of BOTH tables.
        AND (
                    c.ippis IS NOT NULL
                OR
                    ( a.loan_status IN (  'pending', 'corrected', 'Rejected', 'Processing',
                            'Captured', 'Reviewed', 'top up')
                    AND (
                            a.current_loan = '1'
                        OR  (   a.current_loan = '0'
                            AND a.loan_status IN ('Approved', 'Closed')
                            )
                        )
                    )
            )

关闭对查询的评论

我修改了查询以及第一个表上的主索引以包含(第一个位置)记录的创建日期.我还添加了一个额外的变量 @howFarBack 作为考虑贷款的最长回溯时间.我默认为 6 个月前.您是否需要考虑贷款超过 6 个月的给定帐户?或者是a"帐户记录的东西可以追溯到 10 年前并希望包括在内?我的印象是这是一个新的贷款申请添加日期.如果是这样,在批准、最终确定、取消之前允许回溯 6 个月,仍然会阻止过去几个月的数据.

I modified the query and also the primary index on the first table to INCLUDE (first position) the date created of the record. I also added an additional variable @howFarBack to be the maximum going back time to consider for a loan. I defaulted to 6 months back. Would you ever need to consider a given account older than 6 months for a loan? Or is the "a" account records something that could go back 10 years and want to include? My impression is that it is a new LOAN APPLICATION add date. If so, allowing to go 6 months back before it is approved, finalized, cancelled would still prevent going through as many months of data historically.

在 WHERE 子句中,我为 CREATED_DATE >= @howFarBack 添加了显式添加.永远不可能创建子记录,更不用说在原始添加日期之前的任何时间更新了.这将仅强制当月活动或转发符合条件.

In the WHERE clause, I added explicit add for the CREATED_DATE >= @howFarBack. It would never be possible for a child record to be created, let alone updated any time prior to the original add date. This will force only current month activity OR FORWARD to qualify.

例如:在 4 月 28 日创建一笔贷款.因此,运行查询,月初是 4 月 1 日,但比 5 月 1 日少(这允许包含 4 月 30 日晚上 11:59:59)

Ex: Create a loan on April 28th. So running the query, the beginning of the month is April 1st but LESS than May 1st (this allows inclusion of April 30 at 11:59:59pm)

现在,我们进入了 5 月,5 月 4 日完成了贷款变更.我们在新的一个月里,@howFarBack 仍然允许 2020 年 12 月之前的旧应用程序与整个应用程序表相比,这些应用程序可能符合我们所知道的可以追溯到 2005 年的整个表.您始终使用最新的数据,并且可以轻松地将 @howFarBack 更改为最大回溯时间.这应该有助于您的性能需求.

Now, we get into May and a change on the loan is done on May 4th. We are in a new month and the @howFarBack still allows older applications as far as December 2020 to POSSIBLY qualify vs the entire table of applications that could go back as far as 2005 for all we know. You are always staying with the most current data and you can change the @howFarBack easily enough as the maximum going back time. This should help your performance needs.

这篇关于优化查询(Indexing, EXPLAIN)Mysql的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-13 13:07