Cult of Tech.net Posts

The Sad State of Lazy Loading: A bit of a rant

Every now and again I spend some time on some of my other sites, working on new features and performance. One of the facets that comes up occasionally, especially on my photography site with its tons of images, is lazy loading the images.

Now don’t get me wrong, lazy loading images makes a lot of sense. I have pages where I might have 2 or 3 MB of large, high-quality JPEGs. As a photographer, small, heavily compressed images with lots of artifacts aren’t something I find attractive, or what I want to show off.

Problem 1: The web developer has to do it.

The problem here is that like so many things on the web, the burden of doing this has been put on the web developer.

Okay, sure, there are JavaScript scripts available from all over the place, under just about any license you could imagine. And it’s not like the JS needs to be all that big. Moreover, you can certainly make an argument that it should be the web developer doing the implementation, as some people might want, or depend on, the standard behavior of downloading everything.

That said, I’m going to make exactly the opposite argument here. Well, kind of.

Linux/Ubuntu CPU Perf Scaling with Modern Intel CPUs

I’m posting this largely because all the documentation I can find, and the discussions around this, appear to be out of date, or at least not entirely accurate.

I run an Ubuntu server as my home NAS, storage server, general do-things host, and development box for some sites I maintain. It’s built around an Intel Xeon E3-1220 v2, 16 GB of DDR3 RAM, and storage running on ZFS.

By default, Ubuntu runs a process on boot called ondemand (/etc/init.d/ondemand). The process is simple enough: it’s a 73-line shell script that basically looks at the available CPU governors (/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors).

If “interactive”, “ondemand”, or “powersave” is among the available governors, it sets the first one it finds, checking in that order.

So, for example, if you have an E3-1220 v2, the output of /sys/.../scaling_available_governors will be powersave performance.

Since powersave shows up in the available list, it will then echo that to /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor.
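
To make the selection logic concrete, here is a rough sketch of what the script does. Keep in mind the real thing is a plain shell script; this is just an illustrative Lua rendition of the same flow, and actually writing the governor requires root.

```lua
-- Illustrative sketch of the governor-selection logic described above.
-- The real /etc/init.d/ondemand is a shell script; this only mirrors its flow.
local preferred = { "interactive", "ondemand", "powersave" }

local f = assert(io.open(
  "/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors", "r"))
local available = f:read("*l")
f:close()

for _, gov in ipairs(preferred) do
  -- plain substring check; the real script loops over every cpu*/cpufreq node
  if available:find(gov, 1, true) then
    local out = assert(io.open(
      "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "w"))
    out:write(gov) -- requires root
    out:close()
    break
  end
end
```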

So far all of this seems reasonable, in a sense, and for older CPUs, or maybe AMD or ARM CPUs (I don’t know for sure, as I don’t have any AMD or ARM systems), this may be the ideal way to get the CPU clocks to scale dynamically.

An Idea: Email Expiry Header

Put this in the category of things that ought to be but aren’t.

An email expiry header.

E.g. Expires: 20180413T000011Z

My reasoning for this is simple: I subscribe to a lot of promotional stuff from various companies I’ve done, and do, business with. You know, things like discount codes, most of which have an expiration date. However, there is no standard for this date. It’s often conveyed in the fine print at the bottom of the email, and the wording always varies. Some say “good thru” and a date, some say “valid until”. Moreover, the dates themselves are almost never in a standardized format.

All of this variation makes it hard to put together any kind of automation in my mail client to do something with these messages. With a standardized Expires header (yes, of course, people would actually have to use it), I would have a standard way to deal with them.

For example, maybe there are some promotions I want to highlight while they’re less than a week old, and others I flat out want to delete once they’ve expired. Right now I have to deal with this manually on a regular basis. This is the 21st century, though; why the heck am I doing such trivial things by hand?

Ergo, an Expires header: it standardizes the date after which the email is no longer valid or useful, so the end user’s software can easily deal with it.
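
To make that concrete, here is a rough sketch (in Lua, purely as an illustration) of what client-side handling could look like, assuming the hypothetical Expires: format from the example above:

```lua
-- Sketch: act on a hypothetical "Expires:" header in the format shown above.
-- A real mail client or filter would pull the header from the message itself.
local header = "Expires: 20180413T000011Z"

local y, mo, d, h, mi, s =
  header:match("Expires:%s*(%d%d%d%d)(%d%d)(%d%d)T(%d%d)(%d%d)(%d%d)Z")

-- os.time() treats its table argument as local time, so compare against the
-- current time expressed as a UTC broken-down table to keep things consistent.
local expires = os.time{
  year = tonumber(y), month = tonumber(mo), day = tonumber(d),
  hour = tonumber(h), min = tonumber(mi), sec = tonumber(s),
}
local now = os.time(os.date("!*t"))

if now > expires then
  print("expired: safe to auto-delete or archive")
else
  print("still valid: keep, highlight, or whatever the user's rules say")
end
```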

Ending an Era of OpenBSD: Or a Brief History of my Firewalls

For something approaching 20 years, I’ve used OpenBSD to firewall my network from the internet and provide basic network services (DHCP, DNS, NTP, VPN, etc.). Just recently I decided to retire OpenBSD and standalone computers from the firewall role in favor of something smaller, lower power, and easier to manage and upgrade.

I’ve been steadily moving towards smaller and lower power systems for as long as I’ve been doing OpenBSD based firewalls. My first machines were nothing more than mid-tower desktops that I had upgraded away from. Between 2000 and 2003 I made my first moves towards building something more specialized, when I switched from using old towers to a purpose-built micro-ATX pizza-box-style machine, though still with standard Athlon XP CPUs and parts.

In 2010 I replaced the micro-ATX Athlon XP with a mini-ITX based Intel Atom D510 machine. This halved the power consumption, from somewhere around 80-100 W[1] to something closer to 40 W.

Around 2015 or so I started looking into running OpenBSD off a USB flash drive instead of a standard hard drive. Part of this was to remove the power consumption of the HDD from the equation. In this final configuration, the D510 machine with 2 NICs and 2 GB of RAM turned in a somewhat respectable 30 W, though that was hampered by an abysmally bad PSU with almost no power factor correction that pulled nearly 60 VA.

Lua String Compare Performance Testing (Nginx-Lua)

In another article I wrote about my ongoing attempt to move my WordPress security plugin’s firewall functionality out of PHP and into the embedded Lua environment in Nginx. While I’m certainly nowhere near the scale where the C10K problem is a real issue for me, I still do my best to ensure that I’m doing things as efficiently as possible.

In my last post, I was looking at the performance difference between doing no firewalling at all (just building the page in WordPress and serving it) and using the embedded Lua environment to do basic application firewalling tests.

In that article, I saw approximately a 425-microsecond latency impact from the Lua processing compared to just building the page. Of course, that was still roughly two orders of magnitude faster than doing the same work in PHP.

A large part of the actual processing being done is looking for various strings in the myriad of data that’s pushed along as part of each request: things like known bad user agents, key bits used in SQL injection attacks, and so on.

Lua and Nginx both offer some options for searching strings. On the Lua side, there’s the built-in string.find() (Lua 5.1 docs) and associated functions. On the Nginx-Lua side of things there’s ngx.re.find() (lua-nginx-module docs), which calls into Nginx’s regex engine.

I’ve done a significant amount of digging trying to find performance information about both of these methods, and I haven’t been able to find any. So I sat down and did my own testing.
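
For reference, the gist of what’s being compared looks something like the following. This is only a sketch of the two calls, meant to run inside the Nginx-Lua environment (e.g. a content_by_lua_block); it is not my actual test harness, and the needle and haystack are placeholders.

```lua
-- Rough shape of the comparison; the search strings are placeholders.
local haystack = ngx.var.http_user_agent or "Mozilla/5.0 (X11; Linux x86_64)"

local function bench(fn)
  ngx.update_time()
  local start = ngx.now()
  for _ = 1, 100000 do fn() end
  ngx.update_time()
  return ngx.now() - start
end

-- plain find: the fourth argument `true` disables Lua patterns (substring scan)
local t_plain = bench(function()
  return string.find(haystack, "sqlmap", 1, true)
end)

-- Nginx's PCRE engine; "jo" asks for PCRE JIT and compiled-pattern caching
local t_regex = bench(function()
  return ngx.re.find(haystack, [[sqlmap]], "jo")
end)

ngx.say("string.find: ", t_plain, "s  ngx.re.find: ", t_regex, "s")
```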

Nginx-Lua Module: Access Control Performance Testing

I’ve been playing with the Lua engine in Nginx for a while. My primary intent is to offload most, if not all, of my WordPress security stuff from running in the PHP environment to running in something that potentially won’t use as much in the way of resources. The first question I need to answer before I can reasonably consider doing this is what kind of performance overhead doing extended processing in Nginx-Lua imposes.

To put some perspective on this, I’ve been running the WordPress security plugin Wordfence for a while now. When I compare my production server (which has Wordfence enabled) and my development server (which doesn’t have Wordfence installed, but is otherwise running the same plugins and code base), I see on average a 10–20 ms increase in page rendering times, and nearly 20 additional database queries per page.

The overhead from Wordfence isn’t creating a performance problem per se; however, shaving even 15 ms off a 50–60 ms page render time is an appreciable improvement. Additionally, fewer resources consumed by bad actors means more resources available for actual users.

In any event, the question here is how much performance overhead the Nginx-Lua module carries for doing some reasonable processing.
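
For context, the kind of processing I have in mind looks roughly like the sketch below: a minimal access-phase hook, with a made-up rule list rather than the real plugin logic, loaded with something like access_by_lua_file in the server block.

```lua
-- access.lua: sketch of a minimal access-phase check (loaded via
-- `access_by_lua_file`). The rules here are placeholders for illustration,
-- not the actual security plugin's rule set.
local blocked_agents = { "sqlmap", "nikto", "acunetix" }

local ua = string.lower(ngx.var.http_user_agent or "")
for _, needle in ipairs(blocked_agents) do
  -- plain substring search keeps the per-request cost low
  if string.find(ua, needle, 1, true) then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
  end
end

-- crude check of the query string for an obvious SQL-injection marker
local args = ngx.var.args
if args and ngx.re.find(args, [[union\s+select]], "ijo") then
  return ngx.exit(ngx.HTTP_FORBIDDEN)
end
```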

Hidden “Features”

There’s a trend in modern computing that I don’t understand: hiding features and interactions. Actually, it goes beyond just hiding features, to making it difficult to discover or understand what features are available or what is causing things to happen. And honestly, I’m getting kind of sick of it.

Take this gem in Windows 10.

I just upgraded to the Anniversary Edition, build 1607, but this may apply to earlier builds as well.

The biggest outward change for me with AE is that I can no longer disable the lock screen with a group policy. Given that, I decided that if I can’t get rid of it, I might as well customize it a bit.

One of the options you can set on the lock screen is the image. The choices currently are Microsoft’s stream of images, a picture of your own choosing, or a slideshow of your own images. I had set a picture, but I thought that a slideshow would be kind of interesting. After all, I have a number of my own images that I wouldn’t mind seeing there randomly.

Only there’s a big hidden catch. If you turn the slideshow on for the lock screen, then after N minutes your displays still turn off, but the computer also locks and returns to the lock screen. At least that’s what it was doing to me.

Edit: There are advanced configuration options for the slideshow located on a separate screen that you get to by clicking a not-very-link-like-looking text link (this flat UI thing is honestly starting to be more of a pain than it’s worth). In there, there is an option to turn off using the lock screen instead of turning off the displays. Though as long as the slideshow is being used, the computer will still lock when it turns off the displays, and you’ll have to re-enter your password.

Setting up OpenVPN with Certificates

I did this a couple of years ago, with certificates that had a 1 year expiry date. Then my certs expired, and I’d forgotten what to do. So I figured it out again, and this time I’m writing it down.

There are two ways to set up client auth in OpenVPN: a shared secret and TLS certificates. TLS certificates are the preferred way if you can manage them, as they make it possible to revoke access for individual devices without having to change the shared secret on every other device.

To do this you need to set up a certificate authority and sign and issue your own certificates. Most OpenVPN guides tell you how to do this using OpenSSL and its associated long, cryptic commands. I like my method better.

Testing TLS Cipher Performance

As part of my investigation of TLS performance, I decided to benchmark various ciphers and hashing algorithms on my dev server. My dev machine is a Xeon E3-1220 v2 with 8 GB of RAM. For these tests I set the CPU governor to performance to ensure I wasn’t seeing effects from SpeedStep throttling the CPU up or down.

The short of it is that I was seeing significantly higher baseline CPU load than I expected after enabling H2 on my VPS: up from 0.5% to 2-3%. AWS t2.micro instances are burstable configurations designed to operate at a baseline CPU load of 10%, so going from <1% to ~3% was pretty significant. Not a deal killer, but with no change in traffic, that increase in compute load would dramatically decrease the headroom I had to grow before I had to consider a higher-tier instance.

I appear to have resolved the production problem by applying a simple principle: encryption strength is proportional to computational complexity, so if there’s a lot of computational load, turning down the encryption strength may improve performance. What I didn’t do was much in the way of actual controlled testing to see if my premise was reliable.

HTTP/2, Encryption, AES, and Load

I’ve been working slowly towards moving to HTTP/2 over the past couple of months. Why? Mostly because it’s the new shiny and it’s supported by Nginx. Partly because H2 reduces the number of network connections through built-in multiplexing, which improves the efficiency of my server and potentially the experience of visitors when loading multiple resources.

Part of HTTP/2, at least as a de facto requirement, is TLS encryption. The standard for HTTP/2 allows for unencrypted transfers, but none of the browsers that implement it support the unencrypted mode, and therefore there is functionally no unencrypted mode. Given that, phase one of moving to HTTP/2 was getting TLS certificates and getting that up and running.

One of the major counterarguments against TLS everywhere was that it adds compute overhead. Of course, pretty much every discussion I saw on the topic had the proponents shouting down the opponents, claiming that it was only a tiny percentage; hardly anything to worry about.

The reality is that the overhead of TLS can be a tiny percentage, or it can be not so tiny. It all depends on your configuration.

Phase 2 of my plan to roll out HTTP/2 for my sites was to slowly move lower-traffic stuff to HTTP/2 and see how it affected my server loads, and then move the heavier-traffic sites over once I knew what kind of CPU loads I could expect.