Threads Are Evil
I'm not going to go into too much detail on the reasons why threads are evil. This topic has already been covered by more experienced authors than myself here, here, here, and perhaps most importantly here. I think I had better clarify that or I'm going to get nasty letters with inappropriate references to my mother. Threads are evil, but sometimes a necessary evil.
The basic problem with threading is one of race conditions on shared resources and non-atomic operations. If thread A begins an operation on resource R, and a context switch occurs while the operation is in progress, thread B can come in and step on its toes and corrupt resource R resulting in undefined and (usually) undesirable behavior.
There are techniques for preventing these kind of problems, which generally involve some kind of carefully-synchronized locking mechanism. Used incorrectly, these locks can lead to freezing deadlocks as each thread is waiting for the other to complete its work. Employed appropriately, these techniques can enable concurrent processing without interfering with the stability of the program.
A Story From The Real World
Join me in my sorrow as I relive the pain of recent events or, more likely, snicker and laugh at me as I run around in a panic trying to put out fires.
At my work, we run a live service application that has been deployed in production for many months. I won't go into too many details for privacy reasons, but suffice it to say that hundreds of devices connect to this system daily and expect it to work around the clock. It has had, as with all non-trivial software, its share of bugs. Most of them have been minor and reasonably straightforward to reproduce and fix. This one added several new twists.
This service, which had been running for some time now, suddenly stopped frozen and unresponsive. Anything short of the all-powerful
kill -9
call was ineffective in terminating the deadlocked process. Fortunately, we have monitoring software in place that detects periods of inactivity and alerts us via email.
Browsing the server logs showed nothing out of the ordinary. The service was under reasonable load, but certainly not pushing the limits of the hardware. A few basic attempts to reproduce the issue came up fruitless. My next course of action was to create a new, massively active stress test. Even running this stress test failed to reproduce the issue. In the meantime, we saw the problem in the production a second time. It was clearly not just somebody's imagination.
The breakthrough moment came when I enhanced the stress test application to randomly break connections and ignore messages. After running the stress test for a few hours, I finally found a frozen process in the test environment. With a little further analysis, I quickly realized that the problem was in the service's signal-handling code.
The following C++ code snippet is a rough simplification of the code being used to process signals:
static std::queue< int > signalQueue;
static void enqueueSignal( int sigNum ) {
signalQueue.push( sigNum );
signal( sigNum );
}
int main( int argc, char** argv ) {
signal( SIGTERM, enqueueSignal );
signal( SIGCHLD, enqueueSignal );
...
while( true ) {
if( !signalQueue.empty() ) {
int sigNum = signalQueue.front();
processSignal( sigNum ); // Defined elsewhere
signalQueue.pop();
}
...
}
}
Deep within the dark and disturbing mysteries of the standard template library lies a secret terror. That terror is designed to deal with the issues of threaded race conditions as I described above. Unfortunately, it can also occasionally ensnare a program that uses STL containers inside of a signal handler.
Signal handlers are special, as they are not run on a separate thread, but still can interrupt the main thread at any point in its processing. They can also interrupt themselves only to be popped off the stack at a later time. This can lead to deadlocks where the signal handler is waiting on a lock that will never be released by the code below it. This is clearly a case of evil squared.
The solution to this problem: use the
Make
Good luck with your ventures into the dangerous world of concurrency and signal processing.
Cheers,
Joshua Ganes
Signal handlers are special, as they are not run on a separate thread, but still can interrupt the main thread at any point in its processing. They can also interrupt themselves only to be popped off the stack at a later time. This can lead to deadlocks where the signal handler is waiting on a lock that will never be released by the code below it. This is clearly a case of evil squared.
A Way Out
The solution to this problem: use the
sig_atomic_t
variable type. This special variable type is intended especially for use inside of a signal handler. It is intended to guarantee an atomic global resource that can safely be updated by a signal handler.Make
sig_atomic_t
your new best friend when writing C++ signal handler functions. This unfortunately means that you cannot implement particularly complex logic within a signal handler function. My recommendation is to implement a new signal queue as an array of sig_atomic_t
values with a cycling index and size counter. These values can then be accessed by the main thread once the signal handler returns. From there you can do whatever complex processing may be necessary for your system.Good luck with your ventures into the dangerous world of concurrency and signal processing.
Cheers,
Joshua Ganes