Monday, February 25, 2013

Optimize MySQL COUNT (*) query

There is not magic solution to count table contents especially if you have a client that has a table with 40+ million records and you need to do the filtering using the 'OR' SQL statement.

Original query was the following:

SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') OR field2 IN ('val3','val4');

First benchmark before doing optimization showed me results after 4 minutes.

My first step was to create a compound key (multi-column key) using the field1 and field2.
I created it as following:
ALTER TABLE table1 ADD INDEX `field1_field2_idx` (`field1`,`field2`);

After waiting for 3 hours (you remember I had a table with 40 million records) I was able to continue.

This new compound key does not gave me any performance increase because OR statement was used in the original query.

As some of you know "ORs are notoriously bad performers, because it splinters the execution path" (from: http://stackoverflow.com/questions/6551682/mysql-indexing-in-an-or-statement).

After some thinking I decided to get rid of the OR and create 2 queries instead.

The following 2 queries gave me the same result ( I had the sum up the results):

SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') AND field2 NOT IN ('val3','val4');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4');

Now the results were ready after 2 minutes. The first query was almost instant because it used the compound key and the second query took all the time. It was because the index was missing for field2.

So, as a final step I had to add an another index on field2. I used the following command:

ALTER TABLE table1 ADD INDEX `field2_idx` (`field2`);

I had to waite another 3 hours for the index to be ready and finally:

JUST 5 SECS to count rows on 40+ million records' table result was achieved.


Hope you will find this information useful in your case.

Best regards,
Yuli