MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query

Permanent Link: MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query 17. Juni 2009 Comment Comments (10)

When using pagination for let's say a list of offers that where never clicked, you have to know the exact amount of offers (which were never clicked) in order to know how many pages you have. Now there are 2 possible ways of calculating the exact amount of offers: You can use either SQL_CALC_FOUND_ROWS or you can setup a second query with a COUNT() in it. I did the tests with SQL_NO_CACHE in order to get the best results possible. The clicks table has about 18.000.000 rows, the offer table about 800.000. Let's start with some time results.

Using SQL_CALC_FOUND_ROWS:

mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (30.84 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

Using a second query with COUNT():

mysql> SELECT SQL_NO_CACHE o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (0.03 sec)

mysql> SELECT SQL_NO_CACHE COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (30.97 sec)

At a first glance they look equally fast, both taking about 30 seconds. But: They are only equally fast, when query caching is turned off. Let's assume we're on a high-traffic website where performance matters, so we turn the query cache on. MySQL Query caching is like a key-value cache with the key being the EXACT query and the resultset being the value. Once we turn on the cache, the pagination is way faster with the second query using COUNT().

Why?

When using SQL_CALC_FOUND_ROWS the application has to calculate the found rows every single time we turn the page, because the query changes, while the COUNT()-Query always remains the same, meaning that its result comes from the query cache from the second time on. Let's emulate:

Using SQL_CALC_FOUND_ROWS:

mysql> SELECT SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (31.13 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

mysql> SELECT SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 100,50;
50 rows in set (30.71 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

Using a second query with COUNT():

mysql> SELECT o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (0.03 sec)

mysql> SELECT COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (31.11 sec)

mysql> SELECT o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 100,50;
50 rows in set (0.04 sec)

mysql> SELECT COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (0.00 sec)

Et Voilà! While the SQL_CALC_FOUND_ROWS Queries took more than one minute together, the queries with the second COUNT() query only took a bit more than 30 seconds together, meaning they are twice as fast.

I'm pretty sure there are situations where SQL_CALC_FOUND_ROWS is the way to go, but in cases like this one you definately wanna go for the COUNT()-Query.

Canonicals: What they are and how to use them

Permanent Link: Canonicals: What they are and how to use them 10. Juni 2009 Comment One Comment

Today I wanna give a short insight on canonicals. A canonical is a (fairly new) SEO item in the head section of a HTML document, which contains the parent URL. Let's say you have a page which contains a sortable list whose URL is www.chip.de/sortable-list. Besides AJAX-sorting you also have sorting with URLs like www.chip.de/sortable-list/order/price/dir/desc, which reloads the page showing the sortable list ordered by the price descending. Usually you either 2 would have pages in the Google index with the almost the same content or 1 page since you put noindex,nofollow in one of them. This is where canonical comes into play:

First off you leave the noindex, nofollow in the page with the sorted list. Second you put a canonical into the HTML head section telling search engines where it can find the original list!

<link rel="canonical" href="/sortable-list">

That way the sorted list URL will not only not be in the Google index but the original URL gets more "attention".

The latest Firefox versions already show if a canonical is defined:

Canonicals in Firefox

P.S. It does no harm to put the canonical tag in the original page itself.

Scrumfails Part 2

Permanent Link: Scrumfails Part 2 10. Juni 2009 Comment No Comment

Sprint Planning I: The product owner presents an item with 8 storypoints. Not one team member can remember to have ever estimated that item and everyone agrees that it's way bigger than an 8. How the hell could that happen? We don't know. What did we do? We excluded the item from the sprint and set up a meeting to discuss the item.

Job offer at CHIP Online as PHP Developer

Permanent Link: Job offer at CHIP Online as PHP Developer 5. Juni 2009 Comment No Comment

If you speak german and you're looking for a job as PHP Developer, we are looking for one at CHIP Online in the price comparison team.

In case you are interested, click here.

Scrumfails Part 1

Permanent Link: Scrumfails Part 1 4. Juni 2009 Comment No Comment

Estimation meeting. Cards are layed off. One 5, two 8s, one 13. Question from the one with the 5: "Why 8?"

< Mai 2009 | Juli 2009 >