How caching saved my Drupal site and how I configured it

Every web developer dreams of seeing his sites finally “going big”. After months or even years of setup, configuration, content creation, bug-fixing, and digital sweat, patience is often rewarded with an influx of new visitors and signups… only to be followed by an error similar to this one:

For those who haven’t yet experienced this bitter-sweet message, max_user_connections is a mySQL database connection limit that many webhosts enforce on their shared mySQL servers to avoid any single client from hogging more than his fair share of database resources. On my most popular site users started seeing this message multiple times every day, sometimes after trying to make a post, which meant them having to try to recover their post content and re-posting it. I also received a polite message from my webhost:

We’ve had some recent complaints of poor server performance resulting from
recurring load spikes on your shared server.

For example, during a ~6 minute period from 12:04PM to 12:10PM server time
today, during which server load spiked to ~12, we logged the following activity
for your database:

Select queries: 290845 (812qps)
Update queries: 4182 (11qps)
Insert queries: 3105 (8qps)

That is admittedly quite high. So, what can be done to increase the capacity of the database? Well, nothing, frankly, apart from upgrading the hardware or reducing the number of users per server. However, it is possible to reduce the number of requests that are made to the database. Obviously, certain optimizations can be made to the web applications themselves so that they hit the database less. But in my case, I’m not a coder, and my site offers quite a bit of advanced functionality that simply generates a lot of SQL queries. Enter caching. A good caching setup acts as a middle layer between the application (Drupal) and the database (mySQL). For Drupal, there is a mind boggling plethora of caching options, some of which are considerably more effective and stable than others.

Memcached integration for Drupal

After doing a lot of research, I first decided to try the Memcache module. This mechanism uses a separate Memcached process running on the webserver to cache certain database requests in RAM. The main reason I was interested in this functionality was to reduce the number of requests made to mySQL. But obviously a side benefit (or primary concern in certain use cases) is that accessing data in RAM is much faster than querying a SQL server with its data stored mainly on disk. Thanks to the Drupal module that utilizes the Memcached service, no changes to the Drupal site code are necessary. However, after trying the Memcache module for a while on my Drupal site, I realized that there is an important limitation to Memcached. As mentioned, a separate Memcached process must be running on the server to handle cache queries. As time goes by and more site data is accessed by users, the amount of cached data in Memcached will gradually increase and the load on the SQL server will gradually decrease. But when the machine is restarted, the RAM is obviously cleared, and all the data that Memcached has accumulated is lost. Of course, the master data still resides in the mySQL database, so no actual data loss occurs as far as Drupal is concerned. Yet, a high-traffic site that has a large amount of cached data in Memcached will experience performance problems and possibly much higher load on mySQL after a server restart until Memcached manages to re-cache data. Given this limitation, I started Googling for “memcached persistance”, which turned up another solution.

Redis Server integration for Drupal

Redis Server is an increasingly popular option for administrators that want a fast RAM caching mechanism like Memcached but with the addition of cache data persistance after a reboot. Redis stores all of its data in RAM, so it is extremely fast but not appropriate for data sets that are larger than the server’s memory. Although Redis runs entirely in RAM, it periodically saves out its cache data to a dump file on the hard disk, which it can automatically load on startup after a server reboot. It should be noted that Redis is not intended just for caching, and in fact it can be used like a database all by itself. Technically, Redis can be described as a “data structure server since keys can contain strings, hashes, lists, sets and sorted sets”1. But for the intents and purposes of a Drupal administrator that runs a site on mySQL, a Redis Server instance can be configured to act as a caching mechanism that provides the same function as as Memcached, although its data structure and algorithms are totally different from those of Memcached.

Configuring Redis Server to act as a cache

Since Redis Server can be used alternatively as a persistant database in place of SQL for certain applications, some of its default configuration options must be altered to make it behave in a way similar to the more common Memcached. Fortunately, even for a beginner like myself, the redis.conf file is easy to understand and very well documented with comments. The redis.conf file is distributed with Redis in the regular tarball download file.

Since I use a unique VPS-esque shared hosting account at Webfaction, I was able to download, compile, and install Redis Server into ~/bin/redis. Here are some instructions for doing so on Webfaction. I also put my redis.conf into that same directory. Then I changed the following options:

This seems to be optimal for my Webfaction setup:
daemonize yes

If running Redis as a daemon, it needs a process ID file. I prefer mine to be in my own ~/bin directory.
pidfile /home/myusername/bin/redis.pid

I created a new “custom app listening on a port” on my Webfaction account which serves the sole purpose of reserving a port for Redis. Then I inserted the port number that the control panel shows:
port 13129

I chose this directory to save the dump file for persistance:
dir /home/myusername/webapps/drupal_6/cache/redis

Since I’m on a shared server, I added a password for access to my Redis instance, just in case:
requirepass p123

Since we are running on a shared server with limited memory (currently 256 MB of application memory per user), we need to set a maximum amount of memory that Redis can use in megabytes. Redis will start discarding data when this limit is reached. I set mine to a very conservative value of:
maxmemory 50M

Now that Redis is configured to our liking, we just need to start an instance of it, using the config file we just saved in the Redis directory. Run this command:
cd ~/bin/redis && ./redis-server ./redis.conf

Redis module configuration on Drupal

Once I got the Redis Server instance running, I had to configure Drupal to use it instead of its default database cache. Redis Server integration for Drupal is provided by the Redis module. As noted in the documentation, the Redis module can use two different PHP backends that interface with the Redis Server instance— Predis and PhpRedis. Since I’m not on a dedicated or VPS server and therefore don’t have complete control over the LAMP stack, I opted for Predis, which is a PHP library that can be easily enabled in the Drupal settings.php file and doesn’t require changes to the server setup.

To start using Redis on my Drupal 6 site, I first had to enable the Cache Backport module. This module replaces the old Drupal 6 caching system with the much improved, modern cache system that debuted in Drupal 7. Enabling the backported cache module is different from enabling a normal Drupal module. Since the module actually comes from Drupal 7, it can’t be enabled on Drupal 6 through the admin interface. Rather, it must be enabled directly in the Drupal settings.php file. So after decompressing the Cache Backport module to the usual /sites/all/modules directory, I put the following lines at the bottom of my settings.php:

##### Enable Cache Backport #####
$conf['cache_inc'] = 'sites/all/modules/cache_backport/cache.inc';

The above configuration won’t do anything by itself. The cache system from Drupal 7 is designed to permit multiple cache engines to work in tandem on the same site, thus making it possible for the optimum cache mechanism to be used for different cacheable data types in Drupal. For now, I just want to use Redis. So I added the following lines to settings.php below the previous addition:

##### Enable different Cache Backport engines #####
$conf['cache_backends'][] = 'sites/all/modules/redis/redis.autoload.inc';
$conf['redis_client_interface'] = 'Predis';

The above lines tell the Cache Backend to include Redis as one of its available engines and to communicate with the Redis Server via the Predis backend. But the current configuration still won’t cache data. So the next task is to determine which data in Drupal is available for caching and which of those data are appropriate for storing in Redis. I found an extremely helpful article that explains that Drupal already does a lot of caching, but that by default the cache itself is stored in the mySQL database. Since I want to reduce the load on the mySQL database, I need to configure Drupal to not query the mySQL database but rather the Redis server for certain cache tables. To determine which tables in mySQL are available for caching, I used the following query in the mysql command line or in phpMyAdmin:

show tables like 'cache%';

In my case, this returned the following table names:

cache
cache_block
cache_content
cache_filter
cache_form
cache_menu
cache_page
cache_pathdst
cache_pathsrc
cache_rules
cache_update
cache_views
cache_views_data

In its INSTALL.txt file, the Cache Backport module explains what sort of data is stored in these tables. Volatile (frequently changing) tables that are frequently accessed and don’t store huge amounts of data are optimal for caching in Redis. This is because Redis is the fastest of the caching engines that are described in this article, but its storage capacity is limited by available RAM. After some experimenting with my setup, I decided to use Redis for all cache tables except cache_filter and cache_update. Here’s how this is defined in the settings.php file, immediately after the previously inserted lines:

##### Redis config #####
exec("~/bin/redis/redis-cli -h 127.0.01 -p 13129 -a p123 ping", $output); 
if ($output[0]=="PONG") {  # Use Redis for caching. 

	$conf['redis_client_host'] = '127.0.0.1';
	$conf['redis_client_port'] = 14768;
	$conf['redis_client_password'] = "p123";

	$conf['cache_prefix'] = 'mysite_redis';

	# Handle these with Redis #
	$conf['cache_default_class']          = 'Redis_Cache';
	$conf['cache_class_cache_bootstrap']  = 'Redis_Cache'; 
	$conf['cache_class_cache']            = 'Redis_Cache';
	$conf['cache_class_cache_block']      = 'Redis_Cache';
	$conf['cache_class_cache_content']    = 'Redis_Cache';
	$conf['cache_class_cache_form']       = 'Redis_Cache';
	$conf['cache_class_cache_menu']       = 'Redis_Cache';
	$conf['cache_class_cache_page']       = 'Redis_Cache';
	$conf['cache_class_cache_pathdst']    = 'Redis_Cache';
	$conf['cache_class_cache_pathsrc']    = 'Redis_Cache';
	$conf['cache_class_cache_rules']      = 'Redis_Cache';
	$conf['cache_class_cache_users']      = 'Redis_Cache';
	$conf['cache_class_cache_views']      = 'Redis_Cache';
	$conf['cache_class_cache_views_data'] = 'Redis_Cache';
}
else {  #If Redis is not running or not responding
    #kill any hanging Redis processes
    exec("pkill -u $USER redis", $output);

    #start a new Redis instance	
    exec("cd ~/bin/redis && ./redis-server ./redis.conf", $output);
}

Notice that in the above block of code, there are a bunch of lines that start with cache_class_. That’s the prefix that must be added before the name of the cache tables that I previously searched for in the Drupal database. All of the above are handled by Redis_Cache, which is the variable name defined by the Redis integration module.

One thing I prefer about the Memcache integration module for Drupal over the Redis module is that Memcache gracefully fails over to standard Drupal caching if Memcached goes offline. On the other hand, the default configuration of the Redis integration module causes Drupal to crash if Redis Server isn’t running. Although Redis seems to be highly reliable and capable of running continuously for months on end, there’s still no way to know when the webhost might reboot the server after an error or after applying security updates. So I added in some lines to the above block of code such that if Redis Server is not running or not responding, Drupal will gracefully failover to its standard caching mechanism and even try to restart Redis automatically. This is relatively easy to do since the config file for Drupal is still a PHP file. Therefore, conditional commands can be run on the server with the PHP exec() function. For example, this line from above:
exec("~/bin/redis/redis-cli -h 127.0.01 -p 13129 -a p123 ping", $output);
This command tries to ping the Redis Server on host 127.0.0.1 running on port 13129 with the password p123. If everything is working correctly, Redis should respond with the word pong. In that case, it proceeds to define which cache bins should be handled by Redis. Otherwise, it uses the pkill command to kill any Redis processes that might happen to be hanging, and it starts a new instance.

Now the settings.php file should be configured to make Drupal interface with Redis Server. I checked to make sure that everything is working at the Drupal status page:
/admin/settings/performance/cache-backend
If everything is configured and running correctly, for each of the cache tables I defined previously, it should show the Backend to be Redis_Cache, and the Status should be OK, and Has data should say yes.

Additionally, I monitored the amount of memory used by the redis-server process to see if it was actually caching data. For an active site, redis-server memory usage should constantly fluctuate. Mine usually uses between 9,000 and 15,000 kB of memory. It’s also possible to get a lot of useful information about the currently running Redis Server instance by connecting to it like this, in conformance with my previously listed configuration:
~/bin/redis/redis-cli -h 127.0.01 -p 13129 -a p123 info

Drupal file caching

When I set up Redis, you will remember that I chose to leave out two cache tables from the Redis configuration, namely, cache_filter and cache_update. The reason is that cache_filter stores a lot of data relating to the actual content of rendered nodes, and on a site with thousands of nodes this table can easily pass the 50 MB limit that I set. As for cache_update, I don’t frequently access the list of updates to my Drupal modules, so I don’t want to waste the precious fast RAM cache in Redis. For these two tables, a different cache engine for Drupal fills the bill nicely. File Cache is a module for Drupal 7 that also works with the Drupal 6 Cache Backport module. File Cache, as its name suggests, caches data on the hard disk in flat files instead of in the database. While this might sound slow and antiquated, in reality File Cache has been shown to give excellent performance compared to SQL queries on busy servers, and like Redis, reduces the amount of queries that reach the mySQL server.

To start configuring File Cache, I returned to the following section of settings.php:
##### Enable different Cache Backport engines #####
After the Redis stuff, I added two additional lines:

$conf['cache_backends'][] = 'sites/all/modules/filecache/filecache.inc';
$conf['filecache_directory'] = 'cache/filecache';

The above settings enable File Cache as an additional cache engine, and it sets the directory where the storage files for the cache will be located. In my case, I want cache files to be located in the filecache directory that is inside the cache directory inside the Drupal root. I created both of these directories myself.

Now I have to tell the Cache Backend to use File Cache for some cache tables by adding these lines at the end of the settings.php file:

##### Handle these with FileCache #####
$conf['cache_class_cache_filter']  = 'DrupalFileCache';
$conf['cache_class_cache_update']  = 'DrupalFileCache';

After saving the settings.php file, I verified that the cache/filecache directory inside my Drupal root was filling up with files that contained cached data. And that’s all. File Cache is relatively straightforward to configure and use, and it drastically reduces the amount of queries that actually reach the mySQL server.

Boost Cache

Redis and File Cache have proven to be extremely effective for caching SQL queries generated by registered users that would normally create load on the mySQL server. However, on my site, about 90% of traffic is anonymous. This is where the Boost module comes into play. When an unregistered visitor views a node, Boost saves an HTML snapshot of that node to disk. Then, the next time an anonymous visitors hits the same node, by means of an .htaccess redirect rule it simply serves up the HTML snapshot. Therefore, Boost is by far the most extreme form of caching discussed here, since it completely bypasses the PHP interpreter and the mySQL server altogether.

Boost is available natively for Drupal 6 and 7. On Drupal 6, I simply installed the module and configured it at admin/settings/performance/boost. Boost is fairly friendly to beginners, since it helps you configure the .htaccess file to redirect unregistered visitors to the Boost cached page if available. After configuring my desired settings, I visited admin/settings/performance/boost-rules and was pleased to find that it generated all of the complex .htaccess rules for me. There is even an option to automatically insert the rules into the .htaccess file, but I preferred to do so manually.

To verify that Boost was working, I logged out of Drupal and visited a page. Then I reloaded the page and viewed the page source. At the very bottom of the source, there should be a line that says something like:
<!-- Page cached by Boost 2012-11-28 12:46:54, expires 2012-11-28 13:46:54 -->

From here, I ran into a few minor issues with Boost’s caching logic. My site uses a Tracker block powered by Views on the front page to show recent post activity. I noticed on Boost cache pages that the block wasn’t updating when new nodes were posted. Likewise, on the /forum pages, new posts were not appearing on cached pages. But fortunately, Drupal was flexible enough to allow me to work around this without hacking the Boost code. I decided to use Rules to dump the Boost cache for those pages when new nodes or comments are posted. To start, I created a Rule that activates “After saving new content”. The associated action should be “Execute custom PHP code”. There I added the following lines of code:

boost_cache_expire_derivative(array(''), TRUE, TRUE);
boost_cache_expire_derivative(array('forum'), TRUE, TRUE);
boost_cache_expire_derivative(array( drupal_lookup_path('alias',"forum/".$node->tid)  ), TRUE, TRUE);

What this does is expire the Boost cache for the page in the first argument. The first line has '' as the first argument, which is for the frontpage. The second line expires the Boost cache for the URL /forum. And the third line determines what forum category a node belongs to and expires the overview list page for that category.

After saving the first Rule that activates on a node save, I created another Rule that activates “After saving a new comment”, and I used the exact same PHP code.

Conclusion

So, the million dollar question: Does it all work? The answer is a resounding yes. Even though my traffic continues to increase from month to month and is currently at almost half a million hits per day, the max_user_connections limit error has virtually disappeared. And an additional side benefit of this caching setup is noticeably better response times for users. Many thanks to the creators of Redis and its Drupal module, together with the authors of File Cache and Boost, all of which have unquestionably helped to save my Drupal site.

Footnotes

1 http://redis.io

Tagging: 

1 Comment