Monday, February 25, 2013

Optimize MySQL COUNT (*) query

There is not magic solution to count table contents especially if you have a client that has a table with 40+ million records and you need to do the filtering using the 'OR' SQL statement.

Original query was the following:

SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') OR field2 IN ('val3','val4');

First benchmark before doing optimization showed me results after 4 minutes.

My first step was to create a compound key (multi-column key) using the field1 and field2.
I created it as following:
ALTER TABLE table1 ADD INDEX `field1_field2_idx` (`field1`,`field2`);

After waiting for 3 hours (you remember I had a table with 40 million records) I was able to continue.

This new compound key does not gave me any performance increase because OR statement was used in the original query.

As some of you know "ORs are notoriously bad performers, because it splinters the execution path" (from: http://stackoverflow.com/questions/6551682/mysql-indexing-in-an-or-statement).

After some thinking I decided to get rid of the OR and create 2 queries instead.

The following 2 queries gave me the same result ( I had the sum up the results):

SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') AND field2 NOT IN ('val3','val4');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4');

Now the results were ready after 2 minutes. The first query was almost instant because it used the compound key and the second query took all the time. It was because the index was missing for field2.

So, as a final step I had to add an another index on field2. I used the following command:

ALTER TABLE table1 ADD INDEX `field2_idx` (`field2`);

I had to waite another 3 hours for the index to be ready and finally:

JUST 5 SECS to count rows on 40+ million records' table result was achieved.


Hope you will find this information useful in your case.

Best regards,
Yuli



7 comments:

  1. Thanks. Great solution!

    ReplyDelete
  2. Based on what you wrote it's possible to remove the field1_field2_idx and have instead field1_idx and field2_idx. Which will result in less wasted space 'cause field1_idx will be a lot smaller than field1_field2_idx.

    ReplyDelete
  3. Yes, the field1_field2_idx index can be removed but the query will run a bit slowly.

    The system is in production now. I can not provide the benchmarks.

    Yuli

    ReplyDelete
  4. Hi Yuli,
    you can use the First index only if you just run the SQL
    SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2');
    SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4')
    AND field1 NOT IN ('val1','val2');
    That will save you having both indexes, but may give you slower performance.

    ReplyDelete
    Replies
    1. Yes IMO this will be a better solution, I don't think this will give slower performance.

      Delete
    2. Yes IMO this will be better, I don't think this will give slower performance. Why to have another Index when only first one can do.

      Delete
  5. I enϳoy what уоu guyѕ агe up tоо.
    Ѕuch cleѵег worκ and ехposurе!

    Keeρ up the very good ωorkѕ guys I've included you guys to our blogroll.

    ReplyDelete