Monday, March 26, 2012

Utilizing Page Caching with Query String Parameters Using Nginx

A common practice among Ruby on Rails developers is to embed all parameters in the URL path itself, so they can easily make use of page caching. With many parameters, however, this starts to get confusing, especially because the parameters must appear in the exact order the defined routes expect.

Let's elaborate with an example.

Consider an API that provides a list of entries, with possible filters:

http://api.example.com/feed.json

This can simply be page cached: Rails will write the generated feed.json file to the public directory, where the web server can serve it directly.

Now suppose we want to support the following filters:

  1. date (in the format 2012-03-25)
  2. order (by date, score or relevance)
  3. page (used to fetch subsequent pages of the feed)
  4. limit (the page size, i.e. the expected number of entries per response)
  5. type (an app-specific filter; let's assume the values can be post, comment or announcement)

If we used the simple query string form

http://api.example.com/feed.json?date=2012-03-25&page=2&limit=20&type=comment&order=score

then every request would be page cached at public/feed.json regardless of the parameter values, so the parameters would effectively be ignored. Even worse, if the cached page happened to be generated by a filtered request, every subsequent request would receive that filtered response, even when no filters were specified.
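To make the collision concrete, here is a simplified Ruby model of how a page cache turns a request URL into a file path. The function name page_cache_path is hypothetical and this is not Rails' actual implementation; it only captures the behavior that matters here: the cache file is derived from the URL path, and the query string is dropped.

```ruby
require 'uri'

# Simplified, hypothetical model of a page cache: the cache file name is
# derived from the URL path only, so the query string never affects it.
def page_cache_path(url, public_dir = 'public')
  path = URI.parse(url).path
  File.join(public_dir, path)
end

[
  'http://api.example.com/feed.json',
  'http://api.example.com/feed.json?type=post',
  'http://api.example.com/feed.json?date=2012-03-25&page=2&limit=20&type=comment&order=score'
].each do |url|
  puts page_cache_path(url) # all three URLs map to the same file
end
```

Every variant lands on public/feed.json, which is exactly why the first cached response wins for everyone.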

To avoid this, most developers put the parameters directly in the URL path, using a route like the following in config/routes.rb:

match '(/date/:date)(/type/:type)(/page/:page)(/limit/:limit)(/order/:order)/feed.json', :to=>'api/feed_entries#index'

Requests are then expected to look like this:

http://api.example.com/date/2012-03-25/type/comment/page/2/limit/20/order/score/feed.json
http://api.example.com/page/1/limit/20/order/relevance/feed.json
http://api.example.com/date/2012-03-20/page/1/limit/30/order/score/feed.json
http://api.example.com/type/post/page/2/limit/20/feed.json

With this route, each combination of parameters is cached separately and correctly, and it works like a charm. However, it makes it easy for other developers and API users to get the parameter order wrong when constructing the URL. The query string approach is definitely easier for them to use, but hard for us to cache.
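To see why the order matters, the route above can be approximated with a Ruby regular expression. This is a hand-written approximation for illustration only (Rails compiles routes its own way), and FEED_ROUTE and route_params are hypothetical names. A path with the segments in the declared order matches, while the same parameters in a different order do not.

```ruby
# Hand-written approximation of the route
#   (/date/:date)(/type/:type)(/page/:page)(/limit/:limit)(/order/:order)/feed.json
# Each optional group may be skipped, but the groups must appear in this order.
FEED_ROUTE = %r{\A
  (?:/date/(?<date>[^/]+))?
  (?:/type/(?<type>[^/]+))?
  (?:/page/(?<page>[^/]+))?
  (?:/limit/(?<limit>[^/]+))?
  (?:/order/(?<order>[^/]+))?
  /feed\.json\z}x

# Returns a hash of the captured parameters, or nil when the path does not match.
def route_params(path)
  m = FEED_ROUTE.match(path)
  return nil unless m

  m.names.each_with_object({}) { |name, h| h[name] = m[name] if m[name] }
end

p route_params('/type/post/page/2/limit/20/feed.json')
p route_params('/page/2/type/post/limit/20/feed.json') # wrong order => nil
```

The second path carries exactly the same information as the first, yet it matches nothing, which is the kind of mistake API users tend to make.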

One way to fix this is to step outside the Rails app and use the web server (in our case, nginx). With the nginx rewrite module, you can let developers use the query string parameters they find easy, map those parameters to the path expected by the Rails route, and still benefit from page caching.

Here is the nginx configuration snippet that does the mapping:

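location = /feed.json {
    # Build the path-style URI expected by the Rails route, always in
    # the order date, type, page, limit, order.
    set $apiurl '';
    if ($arg_date != '')  { set $apiurl /date/$arg_date; }
    if ($arg_type != '')  { set $apiurl $apiurl/type/$arg_type; }
    if ($arg_page != '')  { set $apiurl $apiurl/page/$arg_page; }
    if ($arg_limit != '') { set $apiurl $apiurl/limit/$arg_limit; }
    if ($arg_order != '') { set $apiurl $apiurl/order/$arg_order; }
    # "last" re-runs location matching on the new URI, so the server's
    # usual page-caching rules serve the cached file or pass to Rails.
    if ($apiurl != '') { rewrite ^ $apiurl/feed.json last; }
    # No filters: serve public/feed.json if cached, otherwise hand off
    # to the app (@rails is an assumed named location proxying to Rails).
    try_files $uri @rails;
}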
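If you want to test the mapping outside nginx, the same logic can be mirrored in Ruby. query_to_feed_path is a hypothetical helper, not part of nginx or Rails. Like the nginx snippet, it emits the segments in the route's order no matter how the query string is ordered, skips empty values, and leaves the path alone when no filter is present.

```ruby
require 'uri'

# Segment order must match the Rails route: date, type, page, limit, order.
FILTER_ORDER = %w[date type page limit order].freeze

# Mirrors the nginx mapping: builds the path-style URI from the query string.
# Values are used raw (not URL-decoded), like nginx's $arg_* variables.
def query_to_feed_path(url)
  uri = URI.parse(url)
  args = {}
  (uri.query || '').split('&').each do |pair|
    name, value = pair.split('=', 2)
    args[name] ||= value.to_s # keep the first occurrence of a repeated argument
  end
  prefix = FILTER_ORDER.reject { |f| args[f].to_s.empty? }
                       .map { |f| "/#{f}/#{args[f]}" }
                       .join
  prefix.empty? ? uri.path : "#{prefix}/feed.json"
end

puts query_to_feed_path('http://api.example.com/feed.json?order=score&page=2&date=2012-03-25')
# prints /date/2012-03-25/page/2/order/score/feed.json
```

Keeping FILTER_ORDER in one place makes it easier to spot when the route and the rewrite drift apart.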

With this in place, both you and the developers using your API will be happy. It does require you to keep your Rails routes in sync with the nginx configuration, but the gains are awesome.

I am sure the same can be achieved with a Rack middleware, which would be fine if configuring nginx is not an option. For high-traffic websites, though, handling this at the nginx level is much better than letting requests reach the Rails stack; avoiding the Rails stack is why we used page caching in the first place.
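For the case where nginx cannot be changed, a Rack middleware doing the same rewrite might look roughly like the sketch below. FeedQueryRewriter is a hypothetical class, not an existing gem. It only rewrites PATH_INFO before Rails routes the request; it does not serve cached files itself, so where it sits relative to whatever serves the public directory decides how much of the stack is actually skipped.

```ruby
# Hypothetical Rack middleware: rewrites /feed.json?date=...&type=... into the
# path-style URI expected by the Rails route, before the request is routed.
class FeedQueryRewriter
  FILTER_ORDER = %w[date type page limit order].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if env['PATH_INFO'] == '/feed.json'
      prefix = path_prefix(env['QUERY_STRING'].to_s)
      env['PATH_INFO'] = "#{prefix}/feed.json" unless prefix.empty?
    end
    @app.call(env)
  end

  private

  # Builds "/date/.../type/..." from the query string in the route's order.
  def path_prefix(query)
    args = {}
    query.split('&').each do |pair|
      name, value = pair.split('=', 2)
      args[name] ||= value.to_s
    end
    FILTER_ORDER.reject { |f| args[f].to_s.empty? }.map { |f| "/#{f}/#{args[f]}" }.join
  end
end

# Usage with a stand-in app that just echoes the path it received.
echo = ->(env) { [200, { 'content-type' => 'text/plain' }, [env['PATH_INFO']]] }
app = FeedQueryRewriter.new(echo)
_status, _headers, body = app.call({ 'PATH_INFO' => '/feed.json', 'QUERY_STRING' => 'page=2&type=post' })
puts body.first # /type/post/page/2/feed.json
```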

The point is that, as a web developer, you should never be limited by the constraints of your application framework. You can use any layer between the client's browser and your most deeply wrapped database storage; the application server is only one of those components.