Original query was the following:
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') OR field2 IN ('val3','val4');
First benchmark before doing optimization showed me results after 4 minutes.
My first step was to create a compound key (multi-column key) using the field1 and field2.
I created it as following:
ALTER TABLE table1 ADD INDEX `field1_field2_idx` (`field1`,`field2`);
After waiting for 3 hours (you remember I had a table with 40 million records) I was able to continue.
This new compound key does not gave me any performance increase because OR statement was used in the original query.
As some of you know "OR
s are notoriously bad performers, because it splinters the execution path" (from: http://stackoverflow.com/questions/6551682/mysql-indexing-in-an-or-statement).After some thinking I decided to get rid of the OR and create 2 queries instead.
The following 2 queries gave me the same result ( I had the sum up the results):
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') AND field2 NOT IN ('val3','val4');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4');
Now the results were ready after 2 minutes. The first query was almost instant because it used the compound key and the second query took all the time. It was because the index was missing for field2.
So, as a final step I had to add an another index on field2. I used the following command:
ALTER TABLE table1 ADD INDEX `field2_idx` (`field2`);
I had to waite another 3 hours for the index to be ready and finally:
JUST 5 SECS to count rows on 40+ million records' table result was achieved.
Hope you will find this information useful in your case.
Best regards,
Yuli
Thanks. Great solution!
ReplyDeleteBased on what you wrote it's possible to remove the field1_field2_idx and have instead field1_idx and field2_idx. Which will result in less wasted space 'cause field1_idx will be a lot smaller than field1_field2_idx.
ReplyDeleteYes, the field1_field2_idx index can be removed but the query will run a bit slowly.
ReplyDeleteThe system is in production now. I can not provide the benchmarks.
Yuli
Hi Yuli,
ReplyDeleteyou can use the First index only if you just run the SQL
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4')
AND field1 NOT IN ('val1','val2');
That will save you having both indexes, but may give you slower performance.
Yes IMO this will be a better solution, I don't think this will give slower performance.
DeleteYes IMO this will be better, I don't think this will give slower performance. Why to have another Index when only first one can do.
DeleteI enϳoy what уоu guyѕ агe up tоо.
ReplyDeleteЅuch cleѵег worκ and ехposurе!
Keeρ up the very good ωorkѕ guys I've included you guys to our blogroll.