论坛徽章:: 0

电梯直达

1楼 [收藏(0)] [报告]

发表于 2007-03-11 17:51 |只看该作者 |倒序浏览

In this third case we get to see some really fun stuff: The effect of
ANALYZE
on index selection for a three table join, an ORDER BY with LIMIT optimization, and documented evidence
that what we'll fix makes a night and day difference for the better. The customer in this case runs a gaming website
using the
Neocrome Land Down Under
website engine. The customer's website
averages about 7,000 unique vistors and 150,000 pages views a day. The most popular forum averages 300 - 500
people online. At least, it does now; before the fixes we're about to do the server would regularly fall down
trying to load the homepage. That is, it could take up to a minute to load the homepage. By now I hope
you've read
Case 1: Basic Indexes
and Case 2: Table Design and Index
Details so we'll jump straight into the action.
ORDER BY With LIMIT
Until now index usage by MySQL has been straightforward common sense. The following example comes from the first
easily fixed slow query:

   # Query_time: 39  Lock_time: 0  Rows_sent: 2  Rows_examined: 184753
   SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;

Without looking we already know that there's no index on fp_topicid, but I'll show it for completeness:
      mysql> SHOW INDEXES FROM ldu_forum_posts;
   +-----------------+------------+------------+--------------+-------------+-----------+-------------+
   | Table          | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
   +-----------------+------------+------------+--------------+-------------+-----------+-------------+
   | ldu_forum_posts |       0 | PRIMARY |          1 | fp_id    | A       |    187356 |
   | ldu_forum_posts |       1 | fp_updated |          1 | fp_creation | A       |    187356 |
   +-----------------+------------+------------+--------------+-------------+-----------+-------------+

   mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
   +-----------------+-------+---------------+---------+---------+------+--------+-------------+
   | table          | type  | possible_keys | key    | key_len | ref  | rows | Extra    |
   +-----------------+-------+---------------+---------+---------+------+--------+-------------+
   | ldu_forum_posts | index | NULL       | PRIMARY |    3 | NULL | 187360 | Using where |
   +-----------------+-------+---------------+---------+---------+------+--------+-------------+

Although we know the solution is index fp_topicid, fp_id, I want to consider why MySQL says no possible keys yet
it uses the PRIMARY key and does an index scan instead of a table scan. If the query matches rows on fp_topicid,
but this column is not in the PRIMARY key, why is MySQL scanning the PRIMARY key? MySQL is obviously not going
to find any fp_topicid = 24701 in the PRIMARY key. The answer is vaguely stated in the MySQL manual section
on
LIMIT optimization
: "If you use
LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather
than sorting the whole table." In short: Using ORDER BY with LIMIT causes this type of optimization. To
understand why consider how table data is stored:
      On HDD:                               PRIMARY key:
      fp_id fp_topicid HDD location    fp_id fp_topicid HDD location
      1    7          a                1    7          a
      4    7          b                2    7          d
      3    12          c                3    12          c
      2    7          d                4    7          b
      5    15          e                5    15          e

Notice on the hard drive (HDD) data is pretty much in random order; MySQL expects this. "HDD location"
is a laughable simplification to show the physical ordering of data on the hard drive. In the PRIMARY
key data is sorted by fp_id, which translates to respective HDD locations. This is, of course, the nature of
indexes: If you want fp_id = 5 MySQL simply jumps to HDD location 'e'; witout the index MySQL would have
to scan HDD locations 'a' through 'd' until arriving at 'e'. Consider our query with just the ORDER BY
condition: Either way it goes this causes MySQL to do a full table scan. MySQL could index scan and get every
row for every index record and by doing this would at least avoid a filesort because the index is already in
sorted ORDER BY fp_id. In effect it's still a table scan though. As proof that adding the LIMIT condition enables
the optimization lets table scan and filesort our test data manually:

   Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC;
   Table scan: a (match 1), b (match 4), c (no), d (match 2), e (no)
   Filesort ASC: 1, 4, 2 (a, b, d) => 1, 2, 4 (a, d, b)
   Retrieve rows: a, d, b (1, 2, 4)

Wihout LIMIT we must scan all the way to the end of the table, but if we only need the first 2 matches then it
becomes efficient to index scan and get the respective rows, and in so doing avoid a filesort:

   Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC LIMIT 2;
   PRIMARY index scan to table scan: a (match 1), d (match 2)
   Retrieve rows: a, d (1, 2)

Same end result: Rows 'a' and 'd' are retrieved first and second. ORDER BY fp_id gives us the disposition
to use the PRIMARY key, LIMIT makes it practical to do so.
With that explained, we fix the query by (unique) indexing fp_topicid, fp_id and the result is expected:
      mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
   +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
   | table          | type | possible_keys | key          | key_len | ref | rows | Extra                   |
   +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
   | ldu_forum_posts | ref  | fp_topicid_id | fp_topicid_id |    3 | const | 31 | Using where; Using index |
   +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+

Index ANALYZE Effect
The next slow query and the previous turn out to have someting in common:

   # Query_time: 18  Lock_time: 7  Rows_sent: 8  Rows_examined: 575139
   SELECT
      p.fp_id,
      t.ft_title, t.ft_id, t.ft_updated, t.ft_postcount, t.ft_lastposterid, t.ft_lastpostername,
      s.fs_id, s.fs_title
   FROM
      ldu_forum_posts AS p,
      ldu_forum_topics AS t,
      ldu_forum_sections AS s
   WHERE s.fs_minlevel
Notice p.fp_topicid = t.ft_id could use the index we just created; MySQL notices this too:
      mysql> EXPLAIN (the above query)
   +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
   | table | type | possible_keys | key    | key_len | ref          | rows | Extra                         |
   +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
   | p    | ALL | fp_topicid_id | NULL | NULL | NULL          | 188047 | Using temporary; Using filesort |
   | t    | eq_ref | PRIMARY    | PRIMARY |    3 | p.fp_topicid |    1 | Using where                   |
   | s    | eq_ref | PRIMARY    | PRIMARY |    2 | p.fp_sectionid |    1 | Using where                   |
   +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+

The two equality references (eq_ref) are great, but the table scan for 'p' is not, and "Using filesort" just
makes it worse. I couldn't figure out why MySQL wasn't using this index (and it was still logging this query
as slow). Then I remembered what the MySQL manual says about ANALYZE: "If you have a problem with incorrect
index usage, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, which can
affect the choices the optimizer makes." I ran ANALYZE on all three tables and to our delight MySQL changed:
      mysql> EXPLAIN (the above query)
   +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
   | table | type | possible_keys | key          | key_len | ref          | rows  | Extra          |
   +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
   | t    | index  | PRIMARY    | ft_updated |    4 | NULL          | 12152 | Using temporary |
   | p    | ref | fp_topicid_id | fp_topicid_id |    3 | t.ft_id       | 16 | Using where    |
   | s    | eq_ref | PRIMARY    | PRIMARY    |    2 | p.fp_sectionid |    1 | Using where    |
   +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+

A pretty big change too: Everything except for table 's' is different. The obvious benefits of these changes
are: No filesort or table scan. There is another very important benefit: 175,277 less rows read. The total
number of rows a join will produce (not to be confused with how many the query will actually match and return)
is the "product of the number of rows in each table." In our case, before ANALYZE this amounts to 188,047 rows,
after ANAZLYE 194,432 rows. Even though the query will now produce about 10,000 more rows it will read 175,000
less rows because of the way MySQL resolves joins. An explanation for this is beyond scope but I wrote about
it in
JOIN Rows Produced vs. Read
. Overall the effect is clear:
ANALYZE turned an okay join into a good join (although the MySQL manual will disagree with me, calling the
JOIN before ANAZLYE "perfect" and I disagree with it because our post-ANALYZE join is consistently 62% faster
when tested on the production server).
Evidence of Improvement
Before I changed anything on this server I recorded a 24 hour snapshot of MySQL performance using

mysqlreport
. After the above fixes I restarted MySQL, waited 24 hours, and took
another snapshot. The results speak for themselves (okay maybe not because I've annotated them; all are
percentages):

   Slow queries: -99.72
   Questions/second: +100 (22 to 44/second average or 1.9M to 3.8M/day)
   SELECT full scan: +78
   SELECT full join: +136
   Table locks waited: -39
   Slow launch threads: -100
   Connections used: -18 (although more users are connecting)

The server's load was also    cut by three-quarters. Ironically, the day the queries were fixed (around
02:00 hours in the morning) was a day the website's primary game was down. On such days the website fills with
users. That day set the record for unique visitors and page views.
Final Thoughts
I think this is what makes slow query optimization fun: Subtle optimizations like ORDER BY with LIMIT, sweeping
JOIN plan changes, and helping what is fun and entertaining for other people.

本文来自ChinaUnix博客，如果查看原文请点：http://blog.chinaunix.net/u/2389/showart_257013.html

文库|博客

返回列表

Chinaunix › 论坛 › 数据库技术 › MySQL › MySQL文档中心 › MySQL ORDER BY With LIMIT and ANALYZE

MySQL ORDER BY With LIMIT and ANALYZE [复制链接]

浏览过的版块