Thursday, February 28, 2013

Signal Handlers Plus Locking Equals Evil Squared

Threads Are Evil


I'm not going to go into too much detail on the reasons why threads are evil. This topic has already been covered by more experienced authors than myself here, here, here, and perhaps most importantly here. I think I had better clarify that or I'm going to get nasty letters with inappropriate references to my mother. Threads are evil, but sometimes a necessary evil.

The basic problem with threading is one of race conditions on shared resources and non-atomic operations. If thread A begins an operation on resource R, and a context switch occurs while the operation is in progress, thread B can come in and step on its toes and corrupt resource R resulting in undefined and (usually) undesirable behavior.

There are techniques for preventing these kinds of problems, which generally involve some kind of carefully synchronized locking mechanism. Used incorrectly, these locks can lead to deadlocks, where each thread waits forever for the other to complete its work. Employed appropriately, these techniques can enable concurrent processing without interfering with the stability of the program.
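As a minimal sketch of the locking idea, assume the shared resource R is nothing more than a counter incremented by several threads. The names SharedCounter and countWithLocking are made up for this illustration and do not come from any real system.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared resource "R": a plain counter guarded by a mutex.
struct SharedCounter {
    std::mutex lock;
    long value = 0;
};

// Starts threadCount threads that each increment the counter
// incrementsPerThread times, then returns the final value.
long countWithLocking(int threadCount, int incrementsPerThread) {
    assert(threadCount >= 0 && incrementsPerThread >= 0);

    SharedCounter counter;
    std::vector<std::thread> workers;

    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                // Only one thread at a time may hold the lock, so the
                // read-modify-write below cannot be interleaved.
                std::lock_guard<std::mutex> guard(counter.lock);
                ++counter.value;
            }
        });
    }

    // Wait for every worker to finish before reading the result.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counter.value;
}
```

Without the lock_guard line, two threads could read the same old value and both write back the same new one, losing an increment. With it, four threads doing 10000 increments each always end at 40000.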

A Story From The Real World


Join me in my sorrow as I relive the pain of recent events or, more likely, snicker and laugh at me as I run around in a panic trying to put out fires.

At my work, we run a live service application that has been deployed in production for many months. I won't go into too many details for privacy reasons, but suffice it to say that hundreds of devices connect to this system daily and expect it to work around the clock. It has had, as with all non-trivial software, its share of bugs. Most of them have been minor and reasonably straightforward to reproduce and fix. This one added several new twists.

This service, which had been running for some time, suddenly froze and became unresponsive. Nothing short of the all-powerful kill -9 could terminate the deadlocked process. Fortunately, we have monitoring software in place that detects periods of inactivity and alerts us via email.

Browsing the server logs showed nothing out of the ordinary. The service was under reasonable load, but certainly not pushing the limits of the hardware. A few basic attempts to reproduce the issue came up fruitless. My next course of action was to create a new, massively active stress test. Even running this stress test failed to reproduce the issue. In the meantime, we saw the problem in production a second time. It was clearly not just somebody's imagination.

The breakthrough moment came when I enhanced the stress test application to randomly break connections and ignore messages. After running the stress test for a few hours, I finally found a frozen process in the test environment. With a little further analysis, I quickly realized that the problem was in the service's signal-handling code.

The following C++ code snippet is a rough simplification of the code being used to process signals:

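#include <csignal>
#include <queue>

static std::queue< int > signalQueue;

// Signal handler: records the signal so the main loop can process it later.
// NOTE: calling std::queue::push() from here is the bug described below.
static void enqueueSignal( int sigNum ) {

    signalQueue.push( sigNum );
    signal( sigNum, enqueueSignal ); // Re-install the handler

}

int main( int argc, char** argv ) {

    signal( SIGTERM, enqueueSignal );
    signal( SIGCHLD, enqueueSignal );
    ...

    while( true ) {

        // Drain one queued signal per pass through the main loop
        if( !signalQueue.empty() ) {

            int sigNum = signalQueue.front();
            processSignal( sigNum ); // Defined elsewhere
            signalQueue.pop();
        }

        ...

    }

}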


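Deep within the dark and disturbing mysteries of the Standard Template Library lies a secret terror. That terror is a lock designed to deal with the threaded race conditions I described above. (In many implementations the lock most likely lives in the memory allocator that the container calls when it needs more space, rather than in the container itself.) Unfortunately, it can also occasionally ensnare a program that uses STL containers inside of a signal handler.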

Signal handlers are special, as they are not run on a separate thread, but still can interrupt the main thread at any point in its processing. They can also interrupt themselves only to be popped off the stack at a later time. This can lead to deadlocks where the signal handler is waiting on a lock that will never be released by the code below it. This is clearly a case of evil squared.

A Way Out


The solution to this problem: use the sig_atomic_t variable type, declared volatile. This special variable type is intended especially for use inside of a signal handler. An object of this type can be read or written as a single uninterruptible operation, so it gives you a global resource that a signal handler can safely update.

Make sig_atomic_t your new best friend when writing C++ signal handler functions. This unfortunately means that you cannot implement particularly complex logic within a signal handler function. My recommendation is to implement a new signal queue as an array of sig_atomic_t values with a cycling index and size counter. These values can then be accessed by the main thread once the signal handler returns. From there you can do whatever complex processing may be necessary for your system.
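Here is a minimal sketch of such a queue, assuming a POSIX system. It is not the code from our service, and every name in it (kQueueCapacity, enqueueSignal, dequeueSignal, installHandler) is made up for illustration. One deliberate deviation from the description above: instead of a shared size counter, it keeps separate read and write indices, so the handler and the main loop never modify the same variable. It also uses sigaction with a full signal mask rather than signal, on the assumption that you want to stop one handler invocation from interrupting another.

```cpp
#include <cassert>
#include <signal.h>

namespace {

// Hypothetical capacity; one slot always stays empty to tell "full" from "empty".
constexpr int kQueueCapacity = 64;

volatile sig_atomic_t g_slots[kQueueCapacity];
volatile sig_atomic_t g_writeIndex = 0; // Modified only by the signal handler
volatile sig_atomic_t g_readIndex = 0;  // Modified only by the main loop

} // namespace

// Signal handler: stores the signal number and returns.
// No allocation, no locks, no library calls.
extern "C" void enqueueSignal(int sigNum) {
    const int next = (g_writeIndex + 1) % kQueueCapacity;
    if (next == g_readIndex) {
        return; // Queue is full: this sketch simply drops the signal
    }
    g_slots[g_writeIndex] = sigNum;
    g_writeIndex = next; // Publish the slot only after it has been written
}

// Called from the main loop. Returns false when the queue is empty.
bool dequeueSignal(int& sigNum) {
    assert(g_readIndex >= 0 && g_readIndex < kQueueCapacity);
    if (g_readIndex == g_writeIndex) {
        return false;
    }
    sigNum = g_slots[g_readIndex];
    g_readIndex = (g_readIndex + 1) % kQueueCapacity;
    return true;
}

// Installs the handler and blocks all other signals while it runs,
// so two handler invocations cannot race on g_writeIndex.
bool installHandler(int sigNum) {
    struct sigaction action {};
    action.sa_handler = enqueueSignal;
    sigfillset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(sigNum, &action, nullptr) == 0;
}
```

The main loop would then call dequeueSignal on each pass and hand any signal number it gets to processSignal, where allocating memory and taking locks is safe again. Dropping signals when the queue is full is a design choice of this sketch; a real service might prefer to set an overflow flag instead.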

Good luck with your ventures into the dangerous world of concurrency and signal processing.

Cheers,

Joshua Ganes

Saturday, February 23, 2013

Issues With Java Updater or Where The Heck Did This Ask.com Toolbar Come From And How Do I Get Rid Of It?

Act I - Morning

I woke up, kissed my wife, and threw on my housecoat on my way out of the bedroom. I grabbed a box of Cheerios, a bowl, a spoon, and a jug of milk and balanced them carefully as I walked toward my computer chair. I jiggled the mouse a bit and waited for the computer to wake up. I immediately opened a handful of browser tabs to find some early-morning entertainment and began sifting through the titles looking for amusement.

It was at about this point when I noticed a little flashing logo on my task bar. Oracle had come out with yet another Java update and needed my permission to go ahead and install it. Asking for permission is good, right? If Java just replaced itself inside of a critical system, who knows what could happen? We could wake up and find that all of Earth's computers had ground to a halt driving the human race into anarchy and chaos with cruel, makeshift bludgeoning weapons and the weak and helpless crying in the streets. More likely, the big boss might miss an important message about an important meeting with an important fellow and miss out on an opportunity to access all of his important money.

I understand why the Java updater errs on the side of caution here. My real problem stems from trying to find a way to change its default behavior. I have never (knowingly, anyhow) had an application that I use at home break because of a Java update. I've tried to (and perhaps some reader can help enlighten me) override this and find some configuration that allows me to install these updates automatically. So far, my search has been fruitless.

Rather than performing updates automatically, I do them mindlessly. I repeat the mantra "Shut up, Java. Yes, Java" over and over in my head with a disgruntled disposition while hammering the Next button as fast as I can. After hitting the rightmost button on the update wizard a few times, the flashing icon goes away and Java is appeased once again. This technique served me well for several months and I used it once again on this occasion.

I continued my breakfast and wake-up routine. I finished my bowl of cereal and read through a few amusing articles online until my start-of-shower deadline beckoned me on. I hurried to get myself ready and rushed out the door just in time to make my way to work for the day.

Act II - Evening

My family and I had just finished dinner. I delivered the dirty dishes to the sink and waved at them expecting that they would magically clean themselves. I have yet to master this technique. My wife was playing on the floor with our little girl. I can't quite recall what I was looking for, but I decided that I wanted to search for something online.

I opened my browser and typed my query into the Google Chrome omnibox. I paused for a moment because something didn't look quite right. Why did Google just give me an Ask.com search result as my top link? Figuring I must have typed something strange, I rephrased my query and typed again. "Who changed my default search engine?" I asked the room in general. My wife gave me a shrug of uncertainty.

I looked more closely and realized that there was some kind of new toolbar that I hadn't seen before. I couldn't remember installing anything of dubious origin recently. I was both miffed and perplexed. I thought, like with other malware, that I would have to go on a long hunt to find the culprit and eradicate it. Thankfully, it turns out that removing it wasn't too arduous a process. NB: This admission does not absolve anyone of wrongdoing.

I found the Ask.com toolbar under the Windows Control Panel Uninstall option and purged it with great vengeance. I reset my default search engine back to Google and everything was right with the world. Everything except the question that kept nagging at me - where did it come from? I finished my search and settled in for a cozy evening with my family.

Act III - Revelation

Our story continues about a week or so later. Another day, another web search... The Ask.com search and toolbar had come back. "Clever girl..."

This was surely the work of a genius criminal mind. Lull them into a false sense of security and then strike when they least expect. Play dead when they try to fight back.

This time I was determined to rid myself of the problem once and for all. A few quick searches led me to an unexpected and horrific conclusion. This was not from some random malware, but from Java. Good ol' Java - my buddy, my pal - running slowly and pretending to be as cross-platform as ever. My first university course was in Java. I was even the TA for a whole lab section on Java. This time, it was personal.

While it is technically not required, Java runs on so many personal computers that it might as well be considered an essential component. The fact that this story played out this way for me means that it probably played out similarly for thousands of others, if not more.

My problem isn't with Java bundling the Ask.com toolbar, though that's questionable enough. My problem is with how hard they seem to want to shove it down our throats.

For one, the optional toolbar installation is checked by default. This means that people who aren't paying attention (me) or people who don't know better will be force-fed a toolbar that nobody actually wants. This is particularly grievous for novice users who can't easily figure out how to undo the damage.

My second objection is that the user's preference is not even saved for the next go-around. This means that my old method of mindlessly clicking through the installation wizard is no longer safe. I now have to pay attention and deselect this option once every few weeks (probably more like two or three times as I use a few different computers). This totally exacerbates the problem I described earlier of not being able to automate the update process. I leave a back-of-the-envelope calculation on the amount of time wasted worldwide as an exercise for the reader.

Oracle is abusing Java's absurdly advantageous position as an "essential component" to cash in on some marketing money from Ask.com. This soulless cash grab inconveniences and disrespects thousands of users each and every time they are forced to disable this option.

Unfortunately, I don't believe that Oracle or Ask.com will be punished adequately for this abuse of power. I sincerely hope that I am wrong.

Cheers,

Joshua Ganes