Don't compare Boost.Function with NULL

Passenger is not an easy project to debug. Being written in C++, it occasionally runs into failures caused by low-level issues. Such is the case with a strange bug report that we received last week. Users reported that Passenger crashes during startup on FreeBSD 11.2, but not on earlier FreeBSD versions. This blog post describes my debugging journey.

Initial user analysis

Initially, users tried to debug this problem themselves. When Passenger crashes, Passenger invokes its built-in crash analysis program. Unfortunately this crash analysis did not work, which led users astray on an entirely different issue.

The crash analysis program uses GDB under the hood. But when this program was first written, GDB (at least the version located in /usr/bin/gdb) was broken. And so the program refuses to use /usr/bin/gdb on FreeBSD. This is probably no longer true.

So the crash analysis program not working on FreeBSD is a completely unrelated issue that confused users. But returning to the main issue: one user hypothesized that it may have to do with the version of Boost that Passenger supplies, namely version 1.64. This user also observed that only recent versions of Passenger crashes on FreeBSD 11.2.

It is around this time when I stepped in.

My analysis: empty function object?

The first thing I did was looking at the crash backtrace. It's something like this:

#4   boost::function1<Passenger::Json::Value, Passenger::Json::Value const&>::operator()  at src/cxx_supportlib/vendor-modified/boost/function/function_template.hpp:757
#5  Passenger::ConfigKit::Store::applyInspectFilters  at src/cxx_supportlib/ConfigKit/Store.h:219
#6   Passenger::ConfigKit::Store::inspect  at src/cxx_supportlib/ConfigKit/Store.h:542
#7   Passenger::LoggingKit::Context::inspectConfig  at src/cxx_supportlib/LoggingKit/Implementation.cpp:592
#8  0x000000000048e3fa in Passenger::Agent::Fundamentals::initializeLoggingKit  at src/agent/Shared/Fundamentals/Initialization.cpp:492
#9   Passenger::Agent::Fundamentals::initializeAgent  at src/agent/Shared/Fundamentals/Initialization.cpp:610
#10  initializeBareEssentials  at src/agent/Watchdog/WatchdogMain.cpp:804
#11  watchdogMain  at src/agent/Watchdog/WatchdogMain.cpp:1296

From bottom to top:

watchdogMain is the main function of the PassengerAgent's watchdog component.
initializeAgent is where it initializes its environment.
One of the subcomponents it tries to initialize, is LoggingKit, a subsystem responsible for – tadaa – handling logging. Hence, initializeLoggingKit.
Under the hood, LoggingKit makes use of the configuration subsystem called ConfigKit (documentation available). At some point it tries to call ConfigKit::Store::inspect().

ConfigKit has a filtering system for filtering out sensitive data from its inspect output, so it tries to apply this filtering system by calling applyInspectFilters().

And that's where things get weird. applyInspectFilters() roughly performs the following:

for (all configuration entries `entry`) {
    // If no inspect filter is defined for the corresponding
    // schema entry, then ignore and continue loop.
    if (entry.schemaEntry->inspectFilter == NULL) {
        continue;
    }

    // There is an inspect filter defined for the corresponding
    // schema entry. So apply the filter into the store.
    entry.userValue = entry.schemaEntry->inspectFilter(entry.userValue); // <--- crash occurs here
}

entry.schemaEntry->inspectFilter is a function object. If I go one level down the stack, then I see that boost/function/function_template.hpp:757 contains:

if (this->empty())
    boost::throw_exception(bad_function_call()); // <--- crash occurs here

Investigating different causes

So Boost obviously thinks that this function object is empty, i.e. it's not assigned to a specific value. But how can this be? Earlier in the loop we obviously checked whether inspectFilter == NULL.

The usual suspect for such issues is memory corruption, e.g. when we're calling a function object that resides in invalid or garbled memory. But the following things make me believe that this is not a memory corruption issue:

The crash occurs consistently. Memory corruption typically results in crashes only some of the time.
The crash occurs in a consistent location. Memory corruption often results in multiple crash locations.
Lack of other evidence of memory corruption (e.g. log messages that contain garbled data).

So my hypothesis was that there is no memory corruption, and inspectFilter == NULL legitimate evaluates to false even though inspectFilter is in actuality empty.

To test whether this hypothesis is true, I asked users to change…

if (entry.schemaEntry->inspectFilter == NULL) {

…to…

if (entry.schemaEntry->inspectFilter.empty()) {

…and then checking whether Passenger still crashes. It didn't, thereby proving my hypothesis.

Why is the comparison operator broken?

So the next question was: why does this happen? I spun up a FreeBSD 11.2 Digital Ocean droplet to investigate. A user suspected that our version of Boost is too old (1.64) and is not compatible with FreeBSD 11.2. So I upgraded to Boost 1.67, but that did not solve the problem either.

Next, I wrote a simple test program that emulates the faulty NULL comparison code and prints the comparison result:

boost::function<void ()> f;
printf("is null: %d\n", f == NULL);

I then ran this program on both macOS Sierra and FreeBSD 11.2, and observed different results! On macOS it printed 1 (as it should) while on FreeBSD it printed 0.

With the aid of a debugger, I found out that f == NULL on macOS called the following Boost comparison function:

inline bool operator!=(const function_base& f,
                       detail::function::useless_clear_type*)
{
  return !f.empty();
}

So on macOS it was a simple alias for calling !empty(). But on FreeBSD 11.2 it called a totally different function:

 #if !(defined(__GNUC__) && __GNUC__ == 3 && __GNUC_MINOR__ <= 3)
// Comparisons between boost::function objects and arbitrary function
// objects. GCC 3.3 and before has an obnoxious bug that prevents this
// from working.
template<typename Functor>
BOOST_FUNCTION_ENABLE_IF_NOT_INTEGRAL(Functor, bool)
operator==(const function_base& f, Functor g)
{
    if (const Functor* fp = f.template target<Functor>())
      return function_equal(*fp, g);
    else return false;
}
 #endif

Just looking at the #if there already gave me the feeling that this is not right. It seemed that Boost is trying to work around a GCC 3 bug, but it wrongly detects FreeBSD's Clang compiler as GCC 3.

I thought this is a bug in Boost, so I submitted a bug report to Boost.

It was Passenger's fault after all (sorry Boost)

Several days later, another user pointed out an important fact. FreeBSD and OpenBSD modified their C++ compilation system in such a way that NULL becomes an alias for nullptr.

nullptr is a new construct in C++, first introduced in C++11. Whereas NULL evaluates to an integer-typed value 0, nullptr evaluates to a pointer-typed value, and so all new C++ code is encouraged to use nullptr.

NULL and nullptr look similar, but they are not the same thing. The latter is actually an object instead of a primitive value. So I wrote the following test program to investigate what happens if I compare an empty function object with nullptr and 0 (which NULL evaluates to), respectively:

boost::function<void ()> f;
printf("is nullptr: %d\n", f == nullptr);
printf("is 0: %d\n", f == 0);

This program printed the same output on both macOS and FreeBSD 11.2, namely 0 and 1! This proved that the problem was in fact caused by FreeBSD 11.2's compiler transforming the following code…

if (entry.schemaEntry->inspectFilter == NULL) {

…into…

if (entry.schemaEntry->inspectFilter == nullptr) {

…the latter which always evaluates to false, regardless of compiler version and platform!

Is it supposed to always evaluate to false? Intuitively I would think that comparing an empty function object with nullptr should evaluate to true, but whether the Boost.Function authors implemented that behavior is another question.

So I looked up the Boost.Function documentation, which says this about comparisons:

Invoking a function object wrapper that does not actually contain a function object is a precondition violation, much like trying to call through a null function pointer, and will throw a bad_function_call exception). We can check for an empty function object wrapper by using it in a boolean context (it evaluates true if the wrapper is not empty) or compare it against 0.

It says nothing about nullptr or NULL. So this issue was 100% my fault: I wasn't supposed to compare with either NULL or nullptr in the first place! Instead, I was supposed to use empty() or I was supposed to cast the function object to a boolean.

Fixing in multiple places

Armed with this newfound knowledge of my own mistake, I fixed not only this particular instance of the problem (in ConfigKit). I also performed a deep investigation throughout the codebase and fixed all instances of this problem. You can see my fix in commit 980643609. This fix will be part of the next Passenger release, which is due soon.