Category: Linux

Getting a feel for the Nginx Stack

Published on November 8, 2011

Forgive me, I’m about to ramble here.

For the past several months I’ve been trying to get my VPS configured so that it’s stable and uses as few resources (mostly RAM) as possible. During this process I considered switching the web server from Apache2 to one of the lighter replacements. More and more I’ve been reading about the preference for Nginx (pronounced engine-x), along with PHP-FPM, as the de facto standard for high-performance PHP sites.

Time to investigate.

The Nginx Stack

I swapped Apache 2 and mod_php out on my dev machine for Nginx and php-fpm a couple of days ago, mostly to make sure everything would go smoothly if I decided to move my VPS over, and to figure out what rule changes I’d have to make to get Nginx running.

To start with, Nginx doesn’t use the one-process-per-connection model Apache does; instead it uses async IO. This addresses one of the biggest Apache problems I’ve had to contend with: a sudden spike in traffic spawning off 10s of new processes is no longer an issue at all.

Nginx’s memory footprint is comparatively tiny too. I’m seeing about 10MB total for the 2 workers + the master process, instead of the 4-8MB or more per process I was seeing with Apache.

Couple that with a fixed number of cgi processes on the back end (either with fastcgi or php-fpm) and you can account for most, if not all, of the resources that will be used under any load conditions.

PHP

With Apache gone, so too go mod_php and mod_fcgid. Neither is an ideal solution for running PHP sites, but those are the breaks (devsrv was running mod_php because it was what Ubuntu set up back when I installed it, and mod_fcgid is what Dreamhost uses).

Nginx does things a bit differently. PHP is run as a standalone FastCGI “server” that Nginx proxies requests to. I find there are a couple of really nice advantages to this, especially if you can run php-fpm.

For example, you can pool cgi processes based on more broadly defined considerations rather than on Apache’s process model. Say you have 3 vhosts, each running WordPress; they can be served by a single pool of php processes that share resources like a php-apc cache.

Ultimately, this means you can still control the number of backend processes that are used for PHP, but can do so while still sharing resources where it makes the most sense.

Figuring and Managing Resources

With a VPS like Dreamhost’s there are hard limits on memory usage, and since there’s no swap space you can’t really deal with overages when they occur; the watchdog just kills your VPS. In short, you really want to be able to account for resource usage and deal with transient spikes in a way that doesn’t result in spiraling resource usage.

Apache has always made this a fun exercise. Yes it can be done, but like I said, it’s fun trying to tune a fuzzy system optimally when there are hard limits.

With Nginx, I can count on the server processes staying exactly, or very nearly exactly, where they were at initialization. 2 worker processes + a controller process nets me between 8 and 12MB of RAM used, and it’ll be that much regardless of whether I have 1 connection or 100 connections. The CGI upstream servers are likewise manageable. With php-fpm you can let it spawn more processes, but you can just as easily limit it to 1 or 2 if resources are scarce. Simply put, it’s much easier to account for how many resources you actually need.
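As a rough sketch of that accounting, the arithmetic can be written down as a tiny shell function. Only the 12MB Nginx figure comes from the numbers above; the 300MB cap, the 100MB reserved for everything else, and the 30MB per php-fpm worker are made-up values for illustration, and php_workers_for is a hypothetical helper name.

```shell
# Sketch: how many php-fpm workers fit under a hard memory cap.
# All figures are in MB; everything except the Nginx total is an assumed value.
php_workers_for() {
    # $1 = VPS memory cap, $2 = reserved for MySQL and the OS,
    # $3 = Nginx total (master + workers), $4 = memory per php-fpm worker
    echo $(( ($1 - $2 - $3) / $4 ))
}

php_workers_for 300 100 12 30   # prints 6
```

Measure the real per-worker figure on your own box before trusting the result; the point is only that every term in the sum is fixed and knowable.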

Performance

The whole point of Nginx is performance, but I’m not used to seeing 2 processes handle a lot of requests and that takes some getting used to.

I ran Apache Bench (ab) against my dev server to see just how it performed against Apache set up similarly to how it is in production, and the results were certainly impressive to me.

Serving static content (cached HTML files, images, css, js, etc.), 2 Nginx processes could handle more than 2000 connections per second (over a 1000Mbit network). This is about 2x what Apache2 was able to handle, in about 1/30th the memory footprint.

Serving dynamic content is of course considerably slower: ~3 connections/second can be handled going through the full WordPress PHP + MySQL stack with 4 php-fpm workers and xcache running.
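For anyone wanting to repeat the comparison, here is a minimal sketch of pulling the headline number out of ab’s report. The helper name rps_from_ab, the URL, and the -n/-c values in the usage comment are placeholders, not the settings used for the numbers above.

```shell
# Extract the "Requests per second" figure from ab's report (read on stdin).
rps_from_ab() {
    # Split on runs of colons and spaces; the number is then the 4th field.
    awk -F'[: ]+' '/^Requests per second/ { print $4 }'
}

# Hypothetical usage (1000 requests, 50 at a time, placeholder URL):
#   ab -n 1000 -c 50 http://devsrv.example/index.html | rps_from_ab
```

Running the same invocation against both servers, with the same URL and concurrency, keeps the comparison at least roughly apples to apples.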

It’s real hard to completely quantify all the variables, especially when I’m deliberately trying not to.

Conclusions

Okay, I admit this post was sparse on details; I’ll try and rectify that in the near future. For now, let me just say, if you can switch to Nginx with your PHP application you should see better performance and lower resource usage than the same thing running under Apache. I certainly have.

Finally Done with this RAID project…

Published on October 7, 2011

Three days of screwing around and as far as I can tell I’ve successfully moved all my data from one array to another while keeping the machine and data online the whole time–other than the few minutes of reboots to remove and replace hardware. Not bad for a SOHO file & web devel server.

Expanding things…

After the move was completed, and while the array was re-syncing, I expanded the LVM logical volume and the file system inside of it. Expanding the LV was simple and fast using lvextend (8) and the following command…

root@host:# lvextend -l +100%FREE /dev/data/home

It should take a couple of seconds to allocate the new extents and return, and that’s done.

Expanding the ext4 FS takes a bit longer, but as long as it’s being extended and not contracted, it can be done while the FS is online and mounted.

root@host:# resize2fs /dev/data/home

Adding -p to resize2fs (8) would be handy for printing progress information as the resize is done, if it works; I didn’t try it. (Note the lowercase: -P prints the estimated minimum size of the file system instead.)

/dev/md127, wtf?

Rebooting to remove the old, now archival, hard drive and bringing the system back up raised an interesting head scratcher. The MD device that should have come up as md0 came up as md127. The /etc/mdadm.conf file looked correct, but it wasn’t putting the device where it should have been in /dev. The fix seems to be rebuilding the initramfs…

root@host:# update-initramfs -u

With that done and a reboot, the md device shows up as md0 like it’s supposed to.

Additionally md apparently now supports assigning arrays descriptive names, like "hostname:1" and the array shows up under /dev/md/ as that name in addition to the regular /dev/md* device.
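A quick way to confirm which name an array came up under after a reboot is to look at /proc/mdstat. A minimal sketch; the helper name md_arrays is made up for illustration:

```shell
# List the md arrays the kernel currently knows about.
# Expects the contents of /proc/mdstat on stdin.
md_arrays() {
    # Array lines start with the device name, e.g. "md0 : active raid1 ..."
    awk '/^md/ { print $1 }'
}

# Usage on a live system:
#   md_arrays < /proc/mdstat
```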

A note on the Hitachi 5k3000s

I went with these drives based on this post on the Backblaze blog. Well, with the caveat that I’m still leery of 3TB drives, so I’m using 2TB drives. I don’t even think that the board or the SATA controllers I have in this box support >2TB drives (though the update later this month will).

More interestingly, the Hitachi 5k3000s are 512-byte sector drives, not 4K sector “Advanced Format” drives. While the 4K sectors do have advantages on large drives in terms of ensuring data integrity, they also become somewhat fun to partition and align around. Partitions have to be aligned to 4K boundaries (fdisk, as well as Windows’ partition tools, aligns to 1M (2048 512-byte sectors) when DOS compatibility mode is disabled). On top of that you have to be careful where the MD device places its metadata, as that can shift the FS alignment as well. And for that matter, I have no idea what kind of overhead LVM adds in terms of alignment. In short, 4K drives are, IMO, still something of a mess, and probably will be for some time.
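The partition half of that alignment check is simple arithmetic: 4096 / 512 = 8, so a partition is 4K-aligned when its starting sector (in 512-byte units) is a multiple of 8. A small sketch, with is_4k_aligned being a made-up helper name; it says nothing about the MD metadata or LVM offsets mentioned above.

```shell
# Report whether a partition's starting sector (512-byte units) sits on a 4K boundary.
is_4k_aligned() {
    # $1 = starting sector as shown by fdisk when displaying sectors
    if [ $(( $1 % 8 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_4k_aligned 2048   # the 1M default: prints "aligned"
is_4k_aligned 63     # the old DOS-compatible default: prints "misaligned"
```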

One nice thing is I’m seeing about 2x the performance from these Hitachi drives that I was getting with the WD Greens, even though I believe the Greens were properly aligned. Benchmarks show the drives I have can do 140MB/s on the outer tracks. I don’t get that yet, but I’m hopeful that a new system with a faster CPU (Xeon E3-1220) and more modern SATA controllers (not the ancient SiI3114 on this Tyan Thunder K8W) will get me closer to that.

The Brass Tacks: What I learned

mdadm’s support for RAID 10 is lacking compared to all the other levels.
LVM, especially pvmove (8), was more useful than just having resizable volumes.
Planning storage, especially growth, is a pain in the rear when you can’t just throw a ton of disks at it.

Breaking Arrays, Moving Data, LVM Good for something

Published on October 6, 2011

Who knew LVM would be good for something? Well, maybe. I’ll know for sure sometime tomorrow, or late tonight. If it works it’ll be great; if it doesn’t, I’ll be damn glad I backed up these drives.

Yeah, so back to LVM. I always wondered if creating an LVM volume over the top of an MD raid volume was a good idea or if it was just adding extra overhead. An EXT4 partition can be extended without the help of LVM and so can an MD raid device. So why add the extra layer in there?

pvmove

That’s why.

Breaking Arrays, Making Arrays

Wanting to avoid the “blow it away and restore from backup” strategy, especially since WD Caviar Greens are so damn slow compared to just about everything else, I decided the best course of action would be to split the existing unresizable md array and create a new second one. Something like….

mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing

The end result, 2 degraded but fully functional md arrays. One still hosting the data volume group with my home logical volume, and one with a big empty disk.

The trick now is to move the data.

LVM Really is Good for Something

The question of how to move the data stumped me for a bit. I could create a new volume group (VG), or at least a new logical volume in the same data VG I already had, format it, and rsync the data across. Of course then I would have to edit at least my /etc/fstab to get things pointed to the right place. The alternative that came up as I was digging through the LVM documentation is a nifty function called pvmove (8) that will move the physical extents of an LVM volume from one physical drive to another in a volume group (or to multiple drives in a volume group if needed). Moreover, as best as I can interpret the docs, it does this in a way that’s safe to do with the system online.

All told, for my system, the process looked something like this…

vgextend data /dev/md1
pvmove /dev/md0 /dev/md1

Now it’s back to the waiting game. It’ll be 5 or 6 hours before the pvmove is complete, then I have to tear down the md0 raid array and add the /dev/sdd device that’s left in md0 to md1. That will necessitate another 6-hour-or-so re-sync. After which, I’ll reboot, make sure md1 becomes md0 and everything is found properly. Then it should hopefully be a short task of expanding the logical volume from 1.5TB to 2TB and then the EXT4 file-system inside of it. If not, well, I’ll be damn glad I made that 6-hour long backup, won’t I?
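Those waiting times are easy to sanity-check with a little arithmetic. A rough sketch: the 70MB/s sustained rate is an assumed figure chosen for illustration, not a measurement from this system, and copy_minutes is a hypothetical helper name.

```shell
# Estimate how long it takes to move a given amount of data at a sustained rate.
copy_minutes() {
    # $1 = data size in GB (decimal), $2 = sustained throughput in MB/s
    # GB -> MB, divide by rate for seconds, then by 60 for minutes (integer math)
    echo $(( $1 * 1000 / $2 / 60 ))
}

copy_minutes 1500 70   # prints 357, i.e. just under 6 hours for 1.5TB
```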

MD, RAID10, ARRRRRrrrrrrgggghhhh!!!!

Published on October 6, 2011

Normally the complexity of doing something in Linux doesn’t bother me. Arcane and convoluted commands don’t scare me, they never really have; they just take some getting used to. The problem I have is when the command, or the underlying system, is only half implemented.

My current project has been replacing a pair of 1.5TB WD Caviar Greens with 2TB Hitachi 5k3000s. Yes, I see the irony in replacing WD drives with drives made by a company that just sold their drive division to WD. On the up side, 500GB more space nets me enough space to back up the rest of the computers on the network and still have as much free space as I had before, which was running down anyway; oh, and the Hitachis are faster too.

Replacing the drives in the RAID array has gone smoothly enough using the following procedure:

Fail the disk to remove using mdadm /dev/md0 --fail /dev/sdX#
Remove the disk from the array using mdadm /dev/md0 --remove /dev/sdX#
Power down the machine (hot swap is coming in a future upgrade)
Swap the physical drives
Bring the machine back up
Add the new drive to the array using mdadm /dev/md0 --add /dev/sdX#
Let it re-sync.
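The steps above can be sketched as a small dry-run helper that prints the commands for review rather than running them. The function name replace_plan is made up, and the device names in the example call are placeholders for your own array and partition.

```shell
# Print (not run) the mdadm steps for swapping one member of an array.
replace_plan() {
    # $1 = array (e.g. /dev/md0), $2 = member partition (e.g. /dev/sdb1)
    printf 'mdadm %s --fail %s\n' "$1" "$2"
    printf 'mdadm %s --remove %s\n' "$1" "$2"
    printf '# power down, swap the physical drive, bring the machine back up\n'
    printf 'mdadm %s --add %s\n' "$1" "$2"
}

replace_plan /dev/md0 /dev/sdb1
```

Printing first is deliberate: failing the wrong member of a two-disk array is not a mistake you get to make twice.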

I’ve done this for 2 1.5TB Greens, one that was failing and one that’s now going to become a proper backup target.

Now that I have two 2TB drives in there, I want to use them, and that means extending the md array to the full size of the new drives. So far as I can tell, that should be a simple…

mdadm -G /dev/md0 --size=max

…but, apparently that’s not the case if the array is configured as RAID10. RAID10 gives the performance of RAID0 with the redundancy of being able to lose a disk, which IMO is perfect for slow 5K RPM disks. MD even has a nice feature where the RAID10 array can be created in a partial 2-disk configuration then extended to the full 4+ disk configuration later. In the “partial” mode, it behaves exactly like a RAID-1 array.

Which brings me to the meat of this rant. I can re-size a RAID1 array, I can convert a RAID 1 array to 5, 6, or even 0. However, mdadm can’t re-size a RAID10 array, even if it’s running in what amounts to RAID1 mode, or convert it to RAID1 or any other RAID level for that matter.

Sigh…

Now it’s off to back up the damn thing, kill it, rebuild it, and restore everything…. At least I’ll know if my backup procedure works.