Category: Wordpress

Lua String Compare Performance Testing (Nginx-Lua)

In another article I wrote about my ongoing attempt to move the firewall functionality of my server’s WordPress security plugin out of PHP and into the embedded Lua environment in Nginx. While I’m certainly nowhere near the scale where the C10K problem is a real issue for me, I still do my best to ensure that I’m doing things as efficiently as possible.

In my last post, I was looking at the performance degradation between doing no firewalling at all (just building the page in WordPress and serving it), and using the embedded Lua environment to do basic application firewalling tests.

In that article, I saw a latency impact of approximately 425 microseconds from the Lua processing compared to just building the page. Of course, that was still roughly 2 orders of magnitude faster than doing the same work in PHP.

A large part of the actual processing is looking for various strings in the myriad data pushed along with each request: things like known bad user agents, key fragments used in SQL injection attacks, and so on.

Lua and Nginx both offer some options for searching strings. On the Lua side, there’s the built-in string.find() (Lua 5.1 docs) and associated functions. On the Nginx-Lua side, there’s ngx.re.find() (lua-nginx-module docs), which allows calls into Nginx’s regex engine.

I’ve done a significant amount of digging trying to find performance information about both of these methods, and I haven’t been able to find any. So I sat down and did my own testing.
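As a rough illustration of the kind of comparison described above, here is a minimal timing sketch. It is written in Python rather than Lua, so it measures Python’s substring operator and re module, not string.find() or ngx.re.find(), and it is not the test harness used for this post. The haystack, needle, and iteration count are made-up values, and any numbers it prints only mean something on the machine that produced them.

```python
import re
import timeit

# Hypothetical test data: a user-agent-like haystack and a needle to look for.
HAYSTACK = "Mozilla/5.0 (compatible; ExampleBot/1.0; +http://example.com/bot)"
NEEDLE = "ExampleBot"
# Compile once up front so the timing covers matching, not compilation.
PATTERN = re.compile(re.escape(NEEDLE))


def plain_find(haystack=HAYSTACK, needle=NEEDLE):
    """Plain substring search (no pattern matching involved)."""
    return needle in haystack


def regex_find(haystack=HAYSTACK, pattern=PATTERN):
    """Search using a pre-compiled regular expression."""
    return pattern.search(haystack) is not None


def time_per_call_us(func, iterations=100_000):
    """Return the average time per call in microseconds."""
    total_seconds = timeit.timeit(func, number=iterations)
    return total_seconds / iterations * 1_000_000


if __name__ == "__main__":
    print(f"plain: {time_per_call_us(plain_find):.3f} us/call")
    print(f"regex: {time_per_call_us(regex_find):.3f} us/call")
```

Both functions return the same answer for the same input; only the cost of getting there differs, which is the thing worth measuring.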

WordPress Hardening, Moving WP-Config

Let’s be clear about some things. Security is hard. Even the so-called experts get it wrong, surprisingly often at that. I’m not an expert, and I’m not proposing that I’m right. Take what I say with a grain of salt.

In part of WordPress’s hardening guide they discuss moving the WordPress config file (wp-config.php) out of the document root as a mechanism that can be used to make it more difficult to attack. Specifically they say:

You can move the wp-config.php file to the directory above your WordPress install. This means for a site installed in the root of your webspace, you can store wp-config.php outside the web-root folder.

Note: Some people assert that moving wp-config.php has minimal security benefits and, if not done carefully, may actually introduce serious vulnerabilities. Others disagree.

Note that wp-config.php can be stored ONE directory level above the WordPress (where wp-includes resides) installation. Also, make sure that only you (and the web server) can read this file (it generally means a 400 or 440 permission).

The linked discussion on the topic on Stack Exchange is also worth reading while you’re at it.

With that said, I’m going to throw my two cents into the discussion.

Fixing my slow Wordpress, Nginx, & WP-Supercache Setup

I probably should have caught this one a long time ago, but I didn’t. For a while now I’ve been complaining endlessly, at least in my internal monologue, about the poor performance I’ve been seeing from WP-Supercache on my VPS. Preloaded cache files simply shouldn’t take 1-1.5 seconds to serve up. They should be quick quick quick. Yet I was seeing such slow load times.

I’ve been struggling with the issue for quite some time. I had changed from WP-Supercache to W3 Total Cache, added memcached, cached DB operations, and so on, trying to figure out why my pages were so slow to load.

What struck me was that when I rolled back my W3 Total Cache implementation to no caching, responsiveness stayed about the same or got slightly better. For a non-logged-in user, the inverse should have been true: pages should have taken longer to load without a caching implementation than with one.

Then it hit me: Nginx config files are parsed in order.

Okay, let me take a step to the side here for a moment. On Dreamhost VPSes, if you’re running Nginx, the server looks for supplementary config files in ~/nginx/$domain/ for each of the virtual hosts it’s configured to serve. Knowing that Nginx config files are read in order, I organized mine using the ##-description.conf style. So I might have 10-rewrites.conf, 50-wordpress.conf, 60-supercache.conf.

On a lark the idea struck me that maybe I should try loading the supercache rules before I got to the regular Wordpress rules, which include the directives for passing requests for .php files back to the PHP back-end.

A quick rename of 45-supercache.conf to 30-supercache.conf, thus placing the supercache rules ahead of the Wordpress rules, and my page response times for non-logged-in users (i.e. reading from the static HTML cache) dropped from 1-1.5s to less than 400ms.
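To make the ordering effect concrete, here is a small sketch in Python. It assumes the config fragments are picked up in plain sorted filename order, which is what the ##-description.conf naming scheme relies on, and it reuses the illustrative file names from earlier in this post. The helper names are made up for the example.

```python
def load_order(filenames):
    """Return config fragments in plain lexical (sorted) filename order."""
    return sorted(filenames)


def loads_before(filenames, first, second):
    """True if `first` would be read before `second`."""
    order = load_order(filenames)
    return order.index(first) < order.index(second)


# Illustrative layouts: supercache rules after, then before, the Wordpress rules.
before_rename = ["10-rewrites.conf", "50-wordpress.conf", "60-supercache.conf"]
after_rename = ["10-rewrites.conf", "50-wordpress.conf", "30-supercache.conf"]

print(load_order(before_rename))
print(load_order(after_rename))
print(loads_before(after_rename, "30-supercache.conf", "50-wordpress.conf"))
```

With the lower numeric prefix, the supercache fragment sorts ahead of the Wordpress fragment, so the static-cache rules get their chance before anything hands the request to PHP.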

Suffice to say, all my grumbling about slow performance was entirely due to poor configuration on my part.

I’m sure there’s probably a note about this somewhere in one of the myriad of guides for running Wordpress and WP-Supercache on Nginx, but I missed it.

I’d still love to improve the response time for pages that are being processed with PHP, but I can live with it being a touch slower for me knowing that it’s a whole lot faster for everybody else.

Sometimes I really hate software…or maybe just Wordpress

I’ve been struggling for a while to get some kind of workflow in place to track and organize my posts on my photography site. Just having a pile of drafts with no real order to them was causing me to lose things that were almost complete but needed a finishing touch or two. Then I came across the plug-in Edit Flow, which seemed like a good solution to my problem. It would let me create custom post states (i.e. draft, pending review, etc.), which I could then use to organize my content as I was developing it.

Only problem: Edit Flow’s developers apparently do something with Wordpress’ post_date_gmt field, which Wordpress uses to indicate whether a post is scheduled or not, and what date to post it on. Because of this, they have a function that runs to normalize the post_date_gmt instead of leaving it to Wordpress’ devices. As a result, Edit Flow basically breaks the publishing and administrative post display behavior built into Wordpress.

Now I’m not sure which is worse, actually.

Wordpress really should have a post_on_date field for each post that controls the scheduling. If it’s set to null (and yes, null is a perfectly appropriate value to represent not applicable, more so than ‘0000-00-00 00:00:00‘ at least), then the post isn’t scheduled and Wordpress should behave as if it’s a publish-immediately type of thing. On the other hand, if there’s a date stamp in the field, then that’s when the post is scheduled to be published, assuming its status is set to publish or scheduled.
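A minimal sketch of that rule, in Python purely for illustration. The post_on_date field is the hypothetical column proposed above, not something Wordpress has, and treating an already-passed date as “publish now” is my own assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def publish_state(post_on_date: Optional[datetime], now: datetime) -> str:
    """Decide what to do with a post whose status is publish or scheduled.

    `post_on_date` is the hypothetical scheduling field: None means the
    post is not scheduled at all.
    """
    if post_on_date is None:
        return "publish_now"  # not scheduled: publish immediately
    if post_on_date <= now:
        return "publish_now"  # assumption: a date in the past is due
    return "scheduled"        # a future date: hold until then


now = datetime(2011, 6, 1, 12, 0, tzinfo=timezone.utc)
print(publish_state(None, now))
print(publish_state(datetime(2011, 7, 1, 9, 0, tzinfo=timezone.utc), now))
```

Unscheduling a post then becomes nothing more than setting the field back to null.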

Going off on a slight tangent here, the date situation in the wp_posts table is simply ridiculous. Granted, dates suck; time zones, daylight saving, and so on make dealing with dates a minefield for even the most experienced programmers. That said, pick a standard and stick to it: either store all the dates in GMT, or store all the dates in local time, or GMT with an offset (i.e. YYYY-MM-DD HH:MM:SS ±OFFSET). The rest can be handled through the intermediate code, either at the DB level (with stored procedures, views, or dynamic columns) or on the API side through the functions presented to the users.
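Here is a sketch of the “store everything in GMT and convert at the edges” option, in Python for illustration only. The function names and the fixed -5 hour offset are made up for the example; nothing here reflects how Wordpress actually handles its date columns.

```python
from datetime import datetime, timedelta, timezone

# One stored format everywhere: YYYY-MM-DD HH:MM:SS ±HHMM
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def to_stored(local_dt: datetime) -> str:
    """Normalize a timezone-aware datetime to GMT and format it for storage."""
    if local_dt.tzinfo is None:
        raise ValueError("refusing to store a datetime without a time zone")
    return local_dt.astimezone(timezone.utc).strftime(STAMP_FORMAT)


def from_stored(stamp: str, offset_hours: int) -> datetime:
    """Parse a stored stamp and present it in the viewer's fixed UTC offset."""
    utc_dt = datetime.strptime(stamp, STAMP_FORMAT)
    return utc_dt.astimezone(timezone(timedelta(hours=offset_hours)))


# Hypothetical author writing at 09:30 local time, five hours behind GMT.
author_tz = timezone(timedelta(hours=-5))
stamp = to_stored(datetime(2011, 3, 1, 9, 30, tzinfo=author_tz))
print(stamp)                   # 2011-03-01 14:30:00 +0000
print(from_stored(stamp, -5))  # 2011-03-01 09:30:00-05:00
```

The point is only that one canonical representation goes into the table, and every conversion happens in a single place on the way in or out.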

Getting back to the rant…

Of course the Edit Flow developers, instead of working around the issue (say by building their code to GMTify the post_date field, or creating a new field, or storing things in another table), decided to appropriate the post_date_gmt field for something without much concern or thought, or at least it seems that way, for the fact that what they do breaks Wordpress’s default behavior. I’m sure they had their reasons, and personally I don’t know if I really care what they were, but I really hate it when people do things in a way that breaks the functional, if quirky, default behavior unless the point is to deliberately change that behavior. Nor do I think that deliberately breaking the default behavior was their intent.

Oh and how do I come across this little gem?

I was writing a plug-in that would reset the “scheduled/post on date”. Why isn’t this native functionality, though? It sure as hell should be.

Slight side tangent: What’s worse than it not being native functionality is the insultingly ridiculous workaround that’s trotted out when you Google for how to unschedule a post in Wordpress. Here’s what you’re “supposed” to do: change the date the post is scheduled to be posted on. That’s been the workaround for at least a couple of years now to boot, even though there seem to have been patches submitted that fixed it almost 2 years ago.

Back to where I was, ya, so resetting the “scheduled post date” back to “post immediately” is actually as simple as zeroing out the post_date_gmt field, which is a rather trivial task to write a plug-in to do. (Yes, when I’m sure there aren’t any unintended consequences for doing what I’m doing and the code is the way I want it, I’ll probably release the plug in here, under my typical “You can use it, but don’t ask me for help” type license.) Well, at least it would be trivial if there was good documentation available. Instead, you’re left to do something like this.

Google a post hook that sounds about right, or visit this page or this page. Find something that looks reasonable, and then visit here to see where the hook occurs in the Wordpress source code, or just grep through the source with something like grep -R 'action_hook' * | grep 'do_action'. Open up the source file, find the function and the hook call, and see if there isn’t a more appropriate action to hook into. In this case there isn’t: I can’t hook into pre_post_update because it doesn’t have a way to modify the post object before it’s put in the DB, and doing the DB manipulation directly at that point would overwrite the operation.

Speaking of dropping down to writing the SQL, seriously Wordpress guys, I can do in 5 SQL statements what takes 50 odd plus lines of PHP and god knows how much more code and queries between the function calls and manipulating the data. Should I, or anybody else, do that? Probably not, but damn if it’s not an attractive proposition.

So what have I learned from this?

On one hand, I can’t imagine giving up Wordpress now that I’m familiar enough with it that I can actually make it do what I want on a programmatic level. Moreover, the fact you can get plug-ins to do so much (I picked up WYSIWYG Inline Code Command in the process of writing this to save having to drop to HTML to wrap bits in <code></code> tags) makes it really attractive to keep dealing with the annoyances.

On the other hand, some things really make me want to fork Wordpress and re-implement it in a more clean, extensible, clear, way. Then again, I don’t really want to release my plug-ins because I don’t want to feel obligated to support them after I solve my problem, dealing with a major 1100+ file 170,000 odd line (by my count) code base. Such is life.

Wordpress Live Drafts Plugin – Patch

I found Stephen Sandison’s Live Drafts Wordpress plug-in the other day and it’s been a godsend for the “publish place holder and then replace with final post” type of posts I publish on some of my other sites where I’m abusing Wordpress as a CMS. If you are looking for a way to have a draft copy of an update (including previews) of an already published post; this is the plug in you want.

That said, as a nitpicker, I found one minor detail that annoyed me to no end. The save draft button was misaligned.

Alignment of save draft button when running Live Drafts as of version 3.0.1.

How it should look.

Correct alignment of save draft button.

With that in mind, I dug into the impressively short and clear source code (at least for a Wordpress plugin) and identified the problem, which was quickly rectified with the patch below.

diff --git liveDrafts.php liveDrafts.php
index 0707b0e..93fa739 100644
--- liveDrafts.php
+++ liveDrafts.php
@@ -48,7 +48,7 @@ if (!class_exists('liveDrafts')) {
                     // Add save draft button to live pages
                     jQuery(document).ready(function() {

-                        jQuery('<input type="submit" tabindex="4" value="Save Draft" id="save-post" name="save">').appendTo('#save-action');
+                        jQuery('<input type="submit" tabindex="4" value="Save Draft" id="save-post" name="save">').prependTo('#save-action');

                     });

If you know how to apply a patch to a source code file, you’re good to go. If not, or you want to make the fix through the editor in your Wordpress install, you’re looking for line 51 where it says “jQuery('<input type....”. You want to change the appendTo in .appendTo('#save-action'); at the end of the line to prependTo so the end of the line reads .prependTo('#save-action');.

Wordpress on Nginx on Dreamhost

By the time this is done being written I’ll have been running Wordpress on Nginx on a Dreamhost VPS. Better yet, I’ll be doing it with a smaller, better-defined resource footprint, with better response times, and faster page loads than I had with Apache. The tradeoff: some more upfront configuration.

I’ve covered the broad strokes about my motivation here, here, and here. In short, Nginx offers a more consistent, dependable level of resource usage while still having the capability of scaling to serve many users, reducing the possibility of crashing the VPS during moments of heavy load.

This is long, and may end up being multiple parts, so if you’re interested follow the jump and keep reading.

Getting a feel for the Nginx Stack

Forgive me, I’m about to ramble here.

For the past several months now I’ve been dealing with trying to get my VPS configured in such a way that it was stable and used as few resources (mostly RAM) as possible. During this process I had considered switching the web server from Apache2 to one of the lighter replacements. More and more I’ve been reading about the preference for Nginx (pronounced engine-x), along with PHP-FPM, as the de facto standard for high performance PHP sites.

Time to investigate.

The Nginx Stack

I swapped Apache 2 and mod_php out on my dev machine with Nginx and php-fpm a couple of days ago. Mostly to make sure everything would go smoothly if I decided to move my VPS over and figure out what rule changes I’d have to make to get Nginx running.

To start with, Nginx doesn’t use the one-process-per-connection model Apache does; instead it uses async IO. This addresses one of the biggest Apache problems I’ve had to contend with: a sudden spike in traffic spawning off 10s of new processes is no longer an issue at all.

Nginx’s memory footprint is comparatively tiny too. I’m seeing about 10MB total for the 2 workers + the master process, instead of 4-8MB or more per process.

Couple that with a fixed number of cgi processes on the back end (either with fastcgi or php-fpm) and you can account for most if not all of resources that will be used under any load conditions.

PHP

With Apache gone, so too go mod_php and mod_fcgid. Neither is an ideal solution for running PHP sites, but those are the breaks (devsrv was running mod_php because it was what Ubuntu set up back when I installed it, and mod_fcgid is what Dreamhost uses).

Nginx does things a bit differently. PHP is run as a stand alone CGI “server” that Nginx proxies requests to. I find there are a couple of really nice advantages to this, especially if you can run php-fpm.

For example, you can pool cgi processes based on more broadly defined considerations rather than Apache’s process class. Say you have 3 vhosts, each running Wordpress; they can be served by a single pool of php processes that can share resources like a php-apc cache.

Ultimately, this means you can still control the number of backend processes that are used for PHP, but can do so while still sharing resources where it makes the most sense.

Figuring and Managing Resources

A VPS like Dreamhost’s has hard limits on memory usage, and since there’s no swap space, you can’t really deal with overages when they occur other than to have the watchdog kill your VPS. In short, you really want to be able to account for resource usage and deal with transient spikes in a way that doesn’t result in spiraling resource usage.

Apache has always made this a fun exercise. Yes it can be done, but like I said, it’s fun trying to tune a fuzzy system optimally when there are hard limits.

With Nginx, I can count on the server processes staying exactly, or very nearly exactly, where they were at initialization. 2 worker processes + a controller process nets me between 8 and 12MB of RAM used, and it’ll be that much regardless of whether I have 1 connection or 100 connections. The CGI upstream servers are likewise manageable. With php-fpm you can let it spawn more processes, but you can just as easily limit it to 1 or 2 if resources are scarce. Simply put, it’s much easier to account for how many resources you actually need.

Performance

The whole point of Nginx is performance, but I’m not used to seeing 2 processes handle a lot of requests, and that takes some getting used to.

I ran Apache Bench (ab) against my dev server to see just how it performed against Apache set up similarly to how it is in production, and the results were certainly impressive to me.

Serving static content (cached HTML files, images, css, js, etc.), 2 Nginx processes could handle more than 2000 connections per second (over a 1000 Mbit network). This is about 2x more than Apache2 was able to handle, in about 1/30th the memory footprint.

Serving dynamic content is of course considerably slower: ~3 connections/second can be handled going through the full Wordpress PHP + MySQL stack with 4 php-fpm workers and xcache running.

It’s real hard to completely quantify all the variables, especially when I’m deliberately trying not to.

Conclusions

Okay I admit this post was sparse on details, I’ll try and rectify that in the near future. For now, let me just say, if you can switch to Nginx with your php application you should see better performance and lower resource usage than the same thing running under Apache. I certainly have.

Wordpress is a Pig and Dreamhost’s VPSes aren’t configured for it.

I’ve been fighting with this for quite some time. I moved to a VPS over a year ago in hopes of a more stable Dreamhost experience, and for a while it was. Then about 9 months ago my site started crashing out of the blue. I’d be chugging along just fine then, “blam!”, site down. I started aggressively caching things with WP Super Cache, then W3 Total Cache. It helped a little, but ultimately things just got more and more unstable. About 2 months ago I gave up on using PhpMyAdmin when I needed to do SQL stuff, simply because it was an instacrash for my VPS. About 3 weeks ago, I had enough and decided it was time to seriously track down the problem.

To make a long story short, Wordpress is a massive memory hog. I’m pushing on average 30MB before I even start loading plugins. That’s not a lot if you have an 8GB server dedicated to nothing but pushing Wordpress, but 30MB is 5-10% of a small VPS. I’ve gone through all the Wordpress tuning guides I can find. I’ve manually cleaned up the database. Nothing really helps. Of course, if the server was configured for the load it has, the problem would be considerably smaller.

Which brings us back to Dreamhost’s VPS. They say it’s designed to scale with the RAM that’s allocated to it. Sure, maybe if you’re running static HTML pages. In which case the 69 concurrent clients configured on a 400MB server would get ~6MB a piece which is just barely enough for Apache to serve static HTML. Even then it doesn’t really work out, since there’s non Apache overhead. In fact, now that I think about it, by default under full load, Apache is configured in such a way that it can easily exceed a VPS’s memory allotment just serving static content. 😮

Then comes mod_fcgi. By default it’s configured to allow 20 instances per process class (I’ll come back to this), and the Apache default is 1000.

What are process classes and why are they important? Process classes are spawned by the same executable and share a common virtual host and identity. For example, if my virtual host for cult-of-tech spawns a CGI process, that process can’t be used by another virtual host on my server. Now here’s the kicker. When Wordpress gets going and everything is loaded, that 30MB+ of Wordpress and all the PHP overhead + whatever space you allot for caching (XCache, APC) is how big the fcgi instance will be. In my case, that means each php.cgi instance is 60-70MB. On a 400MB server, that means once you spawn 5-6 php processes you’ve used up the entirety of your VPS’s memory and, again, blam!
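The arithmetic behind that is simple enough to sketch, here in Python for illustration. The 400MB limit, the 60-70MB instance sizes, and the default of 20 instances per process class are the figures quoted in this post; the function names are made up.

```python
def max_instances(vps_memory_mb: int, instance_mb: int) -> int:
    """How many whole instances of a given size fit in the VPS's memory."""
    return vps_memory_mb // instance_mb


def worst_case_mb(instances_allowed: int, instance_mb: int) -> int:
    """Memory needed if every allowed instance is actually spawned."""
    return instances_allowed * instance_mb


for size_mb in (60, 70):
    print(f"{size_mb}MB instances: {max_instances(400, size_mb)} fit in 400MB")

# With the default of 20 instances per process class, one busy vhost alone
# could ask for far more than the VPS has.
print(worst_case_mb(20, 60), "MB")  # 1200 MB
```

This ignores Apache itself and everything else running on the box, so the real ceiling is lower still.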

The kicker though, is Dreamhost’s overly aggressive memory manager on their VPSes. Instead of killing off processes, or for that matter special casing it and just restarting Apache if it’s running, the watchdog merely kills off the VPS. Well it may do more, because it can take 10 minutes for the VPS to come back up unless you manually reboot it.

Interestingly enough, the answer to all of this is not to simply throw money at it. In the process of troubleshooting this I temporarily pushed my memory limits up, and even at 600MB or 800MB the config would still allow enough processes to cause the server to crash. For that matter my development server, which has a ton of RAM available, can comfortably do many of the things that were causing my VPS to crash, without exceeding a 200MB memory footprint.

Simply put, there’s no reason a lightly trafficked Wordpress site should require more than 300MB, maybe 400MB, but certainly not 600MB to simply stay upright. At least not with a properly configured server behind it.

The moral of the story is:

  • Wordpress is a memory pig, and they need to seriously consider a couple of releases focusing entirely on performance and lowering the memory footprint.
  • Dreamhost’s configuration for Apache and mod_fcgi on their VPSes is overly generous for small servers and needs to be curtailed to more reasonable numbers.
  • Dreamhost’s VPS memory watchdog is aggressive and naive, and will take down a server in a way that’s hard to recover from quickly to ensure it doesn’t use more resources than the client is paying for.

And what am I doing about this?

I’ve curtailed my Apache and mod_fcgi configs to more reasonable settings.

I’ve set mod_fcgi’s MaxProcesses directive to floor( (400 - typical_process_size) / typical_process_size) and my Apache MaxClients to floor(((typ-cgi-process size * 2) - 20) /5). I won’t be any more specific than that, because what will actually work while still being performant varies based on site, software, traffic, caching, and number of virtual hosts.
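For anyone who wants to plug in their own numbers, here is a sketch of those two formulas in Python. It assumes the 400 is the VPS memory limit in MB and that both formulas refer to the same typical CGI process size; the 65MB used in the example is just a hypothetical midpoint of the 60-70MB range mentioned earlier, not a recommendation.

```python
import math


def max_processes(vps_memory_mb: float, typical_process_mb: float) -> int:
    """floor((memory - typical_process_size) / typical_process_size)"""
    return math.floor((vps_memory_mb - typical_process_mb) / typical_process_mb)


def max_clients(typical_process_mb: float) -> int:
    """floor(((typical_process_size * 2) - 20) / 5)"""
    return math.floor((typical_process_mb * 2 - 20) / 5)


# Hypothetical 65MB process on a 400MB VPS.
print(max_processes(400, 65))  # 5
print(max_clients(65))         # 22
```

The first formula effectively reserves one process worth of memory for everything that isn’t PHP, then fills what’s left with whole processes.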

Wordpress: Add Page Stats to the Admin Bar

I find it handy, especially when developing and tuning templates and plugins, to get some info about how the page performed.

Previous to Wordpress 3, I’d define a constant in the wp-settings.php file on my development server and then have my theme check to see if that was set and if it was, insert a small fixed position div that contained memory, DB query, and page render statistics. It worked, but it wasn’t real elegant.

Wordpress 3 introduced the concept of the admin bar. Available to all logged in users, the admin bar does provide a convenient way to handle and display some simple page stats.

Currently the plugin displays 3 stats, max memory used, number of DB queries reported by $wpdb, and page render time and will appear for all logged in users.

Download: cot-admin-bar-stats.zip

Twitter Tools and Custom Shortened URLs

I’ve been using Alex King’s Twitter Tools for some time now on some other Wordpress sites I’ve developed. It’s clean, simple and just plain works. However, one thing that’s never really thrilled me about twitter is its use of bit.ly or any of the other automatic URL shortening. Mr. King provides a bit.ly plugin that lets you use your own bit.ly API, so you can track traffic through your link, but that’s only a minor improvement.

The best solution, in my opinion, is to use a custom short URL that’s provided through your own domain. This has several advantages; the biggest for me is that it makes it clear where the link is actually going. However, short canonical URLs are also being touted as a way to avoid the massive internet link-rot that the myriad of URL services like bit.ly and tinyurl can produce. In fact I’m rather opposed to those services (even if I do use them) for security and clarity reasons. You can read more about rev=”canonical” urls here and here.

Fortunately, running a custom URL shortening service on your own blog isn’t difficult. In fact there’s already a Wordpress plug-in that handles both the shortening and redirection.

All that’s left for my purposes is tying that shortened canonical URL into Twitter Tools so that my tweets don’t get bit.ly or tinyurl links.

What You’ll Need

  1. Rev Canonical – this provides an SEO and URL shortening mechanism
  2. Twitter Tools – to get the updates on twitter.
  3. My TwitterTools RevCanonical URL shortening plug-in – ties it all together