Leap Second Bug Strikes Linux Users
Like a thief in the night, the Leap Second Bug struck unprepared websites built on (or heavily reliant on) Java last weekend, causing them to crash and go offline. Mozilla, Australia-based airline Qantas, and Reddit were among the afflicted sites, reported Joab Jackson of IDG News.
In case you didn’t notice, the International Earth Rotation and Reference Systems Service extended our weekend by adding one second to Coordinated Universal Time to keep it in step with mean solar time. According to initial media reports, this change took down many Java-reliant websites and platforms. However, a post by Jonathan Ellis on the DataStax Developer blog, first reported on by IDG News, describes how Linux, not Java, lies at the heart of the bug.
From the DataStax Developer blog:
The primary symptom of the leap second problem was extremely high system load, with no corresponding increase in requests seen. Particularly unlucky systems would crash. Once diagnosed, a simple reboot or an even more simple reset of Linux’s timekeeping (e.g., via date `date +"%m%d%H%M%C%y.%S"`) was enough to fix the problem; the only difficulty was in determining the cause.
Initial reporting often fingered Java or even Cassandra as the culprit, which is a testament to the popularity of these systems in high-traffic web sites, but the actual problem was a kind of livelock in the Linux system calls responsible for timers. What made this non-obvious (if you weren’t one of the unlucky admins whose servers actually crashed) is that tools like top would report that the application in question was consuming the CPU; digging deeper to see that the culprit was system calls like futex_wait misbehaving is beyond the scope of most systems administration.
Ellis goes on to say this error ended up affecting “Java systems software like Cassandra, Hadoop, ElasticSearch, and Jetty, as well as non-Java code like MySQL or even client software like Firefox.”
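The clock-reset workaround quoted above can be sketched as a small shell script. This is only an illustration, not a vetted fix: it assumes a `date` that accepts the `MMDDhhmmCCYY.ss` form as a set argument (as GNU `date` does), and the `clock_stamp` and `reset_clock` helpers and the `APPLY` variable are hypothetical names introduced here. By default it only prints the command it would run, since actually setting the clock requires root.

```shell
#!/bin/sh
# Sketch: re-set the system clock to its own current value (dry run by default).

# Print the current time as MMDDhhmmCCYY.ss, the form used in the quoted command.
clock_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

# Print the reset command; run it only when APPLY=1 (hypothetical switch, needs root).
reset_clock() {
    stamp=$(clock_stamp)
    if [ "${APPLY:-0}" = "1" ]; then
        date "$stamp"
    else
        echo "date $stamp"
    fi
}

reset_clock
```

Run as-is, the script prints a line such as `date ` followed by the timestamp, so an administrator can inspect the command before re-running it with `APPLY=1` as root.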
Currently, folks are working on a possible fix over at the Linux Kernel Mailing List.

