Tuesday, July 30, 2013

Signals/Slots Improvements: Better and Faster

Introduction

Last time I presented basic work on my implementation of Signals/Slots which was thread safe, full-featured (well, is capable of being full-featured), and performed quite well. I had a decent implementation, but there were problems with passing iterators. I have figured out a better solution by making a small change. This improves performance, as well as easily allows the use of begin/end iterators rather than a single iterator which tracked all state.

Theory

The change I made is really simple: I removed the special head/tail node instances. The end iterator is simply nullptr. This of course means I have to be careful with write operations operations, but as it turns out the actual implementation isn't much more complicated, and possibly may even be slightly simpler. There are fewer atomic operations required to setup/end an emit, so the static overhead is significantly less.

Push Front sample implementation

connection
push_front(const SlotFunction& f)
{
    boost::shared_ptr< slot_wrapper > node =
            boost::make_shared< slot< ResultType(Args...), SlotFunction > >(f);
    connection conn(*this, node);
    // acquire unique write lock
    boost::unique_lock< Mutex > lock(_lock);
    if (head == nullptr)
    {
        tail = node;
        group_head = node;
    }
    else
    {
        node->next = head;
        head->prev = node;
    }
    boost::atomic_store(&head, std::move(node));
    return conn;
}

Emit sample implementation

typename Combiner::result_type
signal::operator()(Args ... args) const
{
    boost::shared_ptr< slot_wrapper > begin_ptr = boost::atomic_load(&head);
    const std::tuple< Args... > params(args...);
    iterator begin(std::move(begin_ptr), params);
    iterator end(boost::shared_ptr< slot_wrapper >(), params);
    return combiner(std::move(begin), std::move(end));
}

Benchmark Results

I'm using the same setup as last-time (I actually just added my new timings to the old results). Again, I am using a laptop with an Intel i5-m430 running Windows 7 x64, compiling with mingw-w64 gcc 4.8.1.

Table 1. Emit Testing
Test Case No Slots (ns) 1 Slot (ns) 2 Slots (ns) 4 Slots (ns) 8 Slots (ns) 16 Slots (ns) 32 Slots (ns) 64 Slots (ns) static overhead (ns) dynamic overhead (ns/slot)
std::function calls 0 3.5 7.5 15 29 60 116 232 0 3.6
Qt Signals/Slots 14 75 115 177 312 581 1123 2202 40 33.8
Boost Signals2 75 137 183 253 398 705 1280 2500 75 37
My Implementation - special head/tail nodes 27 57 88 150 272 525 1010 1980 27 30
My Implementation - no unique head/tail nodes 15 47 74 134 258 488 954 1900 15 30

Conclusion

The biggest improvement here is the "standard" begin/end iterator style now can be used without sacrificing performance. Additionally, the static overhead is so small that having a non-thread safe implementation becomes almost moot for signals with no slots. Tim Janik's implementation which isn't thread safe is not that much faster for empty signals, even with his faster i7 hardware. I haven't seen any other stats which would indicate any other implementation (definitely thread-safe implementations, possibly many non-thread safe implementations) perform as well or better. There's also some static memory savings for not having unique head/tail nodes. More to come soon.

No comments :

Post a Comment