免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
最近访问板块 发新帖
查看: 1475 | 回复: 0
打印 上一主题 下一主题

MySQL ORDER BY With LIMIT and ANALYZE [复制链接]

论坛徽章:
0
跳转到指定楼层
1 [收藏(0)] [报告]
发表于 2007-03-11 17:51 |只看该作者 |倒序浏览

   In this third case we get to see some really fun stuff: The effect of
ANALYZE
   on index selection for a three table join, an ORDER BY with LIMIT optimization, and documented evidence
   that what we'll fix makes a night and day difference for the better. The customer in this case runs a gaming website
   using the
Neocrome Land Down Under
website engine. The customer's website
   averages about 7,000 unique vistors and 150,000 pages views a day. The most popular forum averages 300 - 500
   people online. At least, it does now; before the fixes we're about to do the server would regularly fall down
   trying to load the homepage. That is, it could take up to a minute to load the homepage. By now I hope
   you've read
Case 1: Basic Indexes
and Case 2: Table Design and Index
   Details so we'll jump straight into the action.
   ORDER BY With LIMIT
   Until now index usage by MySQL has been straightforward common sense. The following example comes from the first
   easily fixed slow query:
   
      # Query_time: 39  Lock_time: 0  Rows_sent: 2  Rows_examined: 184753
      SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
   
   Without looking we already know that there's no index on fp_topicid, but I'll show it for completeness:
         mysql> SHOW INDEXES FROM ldu_forum_posts;
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      | Table           | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality |
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      | ldu_forum_posts |          0 | PRIMARY    |            1 | fp_id       | A         |      187356 |
      | ldu_forum_posts |          1 | fp_updated |            1 | fp_creation | A         |      187356 |
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      
      mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+
      | table           | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+
      | ldu_forum_posts | index | NULL          | PRIMARY |       3 | NULL | 187360 | Using where |
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+
   
   Although we know the solution is index fp_topicid, fp_id, I want to consider why MySQL says no possible keys yet
   it uses the PRIMARY key and does an index scan instead of a table scan. If the query matches rows on fp_topicid,
   but this column is not in the PRIMARY key, why is MySQL scanning the PRIMARY key? MySQL is obviously not going
   to find any fp_topicid = 24701 in the PRIMARY key. The answer is vaguely stated in the MySQL manual section
   on
LIMIT optimization
: "If you use
   LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather
   than sorting the whole table." In short: Using ORDER BY with LIMIT causes this type of optimization. To
   understand why consider how table data is stored:
         On HDD:                                PRIMARY key:
         fp_id   fp_topicid   HDD location      fp_id   fp_topicid   HDD location
         1       7            a                 1       7            a
         4       7            b                 2       7            d
         3       12           c                 3       12           c
         2       7            d                 4       7            b
         5       15           e                 5       15           e
   
   Notice on the hard drive (HDD) data is pretty much in random order; MySQL expects this. "HDD location"
   is a laughable simplification to show the physical ordering of data on the hard drive. In the PRIMARY
   key data is sorted by fp_id, which translates to respective HDD locations. This is, of course, the nature of
   indexes: If you want fp_id = 5 MySQL simply jumps to HDD location 'e'; witout the index MySQL would have
   to scan HDD locations 'a' through 'd' until arriving at 'e'. Consider our query with just the ORDER BY
   condition: Either way it goes this causes MySQL to do a full table scan. MySQL could index scan and get every
   row for every index record and by doing this would at least avoid a filesort because the index is already in
   sorted ORDER BY fp_id. In effect it's still a table scan though. As proof that adding the LIMIT condition enables
   the optimization lets table scan and filesort our test data manually:
   
      Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC;
      Table scan: a (match 1), b (match 4), c (no), d (match 2), e (no)
      Filesort ASC: 1, 4, 2 (a, b, d) => 1, 2, 4 (a, d, b)
      Retrieve rows: a, d, b (1, 2, 4)
   
   Wihout LIMIT we must scan all the way to the end of the table, but if we only need the first 2 matches then it
   becomes efficient to index scan and get the respective rows, and in so doing avoid a filesort:
   
      Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC LIMIT 2;
      PRIMARY index scan to table scan: a (match 1), d (match 2)
      Retrieve rows: a, d (1, 2)
   
   Same end result: Rows 'a' and 'd' are retrieved first and second. ORDER BY fp_id gives us the disposition
   to use the PRIMARY key, LIMIT makes it practical to do so.
   With that explained, we fix the query by (unique) indexing fp_topicid, fp_id and the result is expected:
         mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
      | table           | type | possible_keys | key           | key_len | ref   | rows | Extra                    |
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
      | ldu_forum_posts | ref  | fp_topicid_id | fp_topicid_id |       3 | const |   31 | Using where; Using index |
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
   
   Index ANALYZE Effect
   The next slow query and the previous turn out to have someting in common:
   
      # Query_time: 18  Lock_time: 7  Rows_sent: 8  Rows_examined: 575139
      SELECT
         p.fp_id,
         t.ft_title, t.ft_id, t.ft_updated, t.ft_postcount, t.ft_lastposterid, t.ft_lastpostername,
         s.fs_id, s.fs_title
      FROM
         ldu_forum_posts AS p,
         ldu_forum_topics AS t,
         ldu_forum_sections AS s
      WHERE s.fs_minlevel
   Notice p.fp_topicid = t.ft_id could use the index we just created; MySQL notices this too:
         mysql> EXPLAIN (the above query)
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
      | table | type   | possible_keys | key     | key_len | ref            | rows   | Extra                           |
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
      | p     | ALL    | fp_topicid_id | NULL    |    NULL | NULL           | 188047 | Using temporary; Using filesort |
      | t     | eq_ref | PRIMARY       | PRIMARY |       3 | p.fp_topicid   |      1 | Using where                     |
      | s     | eq_ref | PRIMARY       | PRIMARY |       2 | p.fp_sectionid |      1 | Using where                     |
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
   
   The two equality references (eq_ref) are great, but the table scan for 'p' is not, and "Using filesort" just
   makes it worse. I couldn't figure out why MySQL wasn't using this index (and it was still logging this query
   as slow). Then I remembered what the MySQL manual says about ANALYZE: "If you have a problem with incorrect
   index usage, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, which can
   affect the choices the optimizer makes." I ran ANALYZE on all three tables and to our delight MySQL changed:
         mysql> EXPLAIN (the above query)
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
      | table | type   | possible_keys | key           | key_len | ref            | rows  | Extra           |
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
      | t     | index  | PRIMARY       | ft_updated    |       4 | NULL           | 12152 | Using temporary |
      | p     | ref    | fp_topicid_id | fp_topicid_id |       3 | t.ft_id        |    16 | Using where     |
      | s     | eq_ref | PRIMARY       | PRIMARY       |       2 | p.fp_sectionid |     1 | Using where     |
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
   
   A pretty big change too: Everything except for table 's' is different. The obvious benefits of these changes
   are: No filesort or table scan. There is another very important benefit: 175,277 less rows read. The total
   number of rows a join will produce (not to be confused with how many the query will actually match and return)
   is the "product of the number of rows in each table." In our case, before ANALYZE this amounts to 188,047 rows,
   after ANAZLYE 194,432 rows. Even though the query will now produce about 10,000 more rows it will read 175,000
   less rows because of the way MySQL resolves joins. An explanation for this is beyond scope but I wrote about
   it in
JOIN Rows Produced vs. Read
. Overall the effect is clear:
   ANALYZE turned an okay join into a good join (although the MySQL manual will disagree with me, calling the
   JOIN before ANAZLYE "perfect" and I disagree with it because our post-ANALYZE join is consistently 62% faster
   when tested on the production server).
   Evidence of Improvement
   Before I changed anything on this server I recorded a 24 hour snapshot of MySQL performance using
   
mysqlreport
. After the above fixes I restarted MySQL, waited 24 hours, and took
   another snapshot. The results speak for themselves (okay maybe not because I've annotated them; all are
   percentages):
   
      Slow queries: -99.72
      Questions/second: +100 (22 to 44/second average or 1.9M to 3.8M/day)
      SELECT full scan: +78
      SELECT full join: +136
      Table locks waited: -39
      Slow launch threads: -100
      Connections used: -18 (although more users are connecting)
   
   The server's load was also     cut by three-quarters. Ironically, the day the queries were fixed (around
   02:00 hours in the morning) was a day the website's primary game was down. On such days the website fills with
   users. That day set the record for unique visitors and page views.
   Final Thoughts
   I think this is what makes slow query optimization fun: Subtle optimizations like ORDER BY with LIMIT, sweeping
   JOIN plan changes, and helping what is fun and entertaining for other people.
               
               
               

本文来自ChinaUnix博客,如果查看原文请点:http://blog.chinaunix.net/u/2389/showart_257013.html
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP