Bench False Cache Sharing

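False cache sharing is when data that different cores write to independently sits on the same cache line. Every write invalidates that line in the other cores' caches, so the line gets dragged around the cores even though none of them touches the others' data. It can be mitigated by adding some empty padding at the end of the struct, so that each instance lands on a different cache line.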

#include <thread>
#include <cstdint>
#include <chrono>
#include <cstdio>   /* printf */
#include <cstdlib>  /* atoi / strtoul */


struct foo {
        uint32_t i;
        char padding[64]; /* optional: remove to measure the unpadded case; assumes 64-byte cache lines */
};


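/* Each thread increments only its own counter. Note: an optimizing compiler
   may collapse this loop into a single addition, which would hide the effect. */
void proc(foo *f, uint32_t count) {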
        for(uint32_t i = 0; i < count; ++i) {
                f->i += 1;
        }
}


constexpr int th_count = 4;

foo data[th_count];
std::thread pool[th_count];


int main(int argc, const char **argv) {
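        if(argc < 2) {
                printf("usage: %s <count>\n", argv[0]);
                return 1;
        }
        /* strtoul covers the full uint32_t range, atoi only returns int */
        uint32_t count = static_cast<uint32_t>(std::strtoul(argv[1], nullptr, 10));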

        auto begin = std::chrono::high_resolution_clock::now();

        for(int i = 0; i < th_count; ++i) {
                pool[i] = std::thread(proc, &data[i], count);
        }

        for(auto &p : pool) {
                p.join();
        }

        auto end = std::chrono::high_resolution_clock::now();

        auto diff = end - begin;
        auto time = std::chrono::duration_cast<std::chrono::nanoseconds>(diff).count();
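        /* nanoseconds overflow an int after about 2.1 seconds, so print as long long */
        printf("Time: %lld ns\n", static_cast<long long>(time));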
}

Results

Elapsed time as printed by the program above, which reports nanoseconds.

Threads  | No padding | Padding
---------|------------|---------
4        | 158042     | 127984

Not a huge loss, but padding is an easy win: here it cuts the run time by about 19% (158042 down to 127984). It's also a loss that should grow with the number of cores/threads on your system. I should try to get some results on a system with more threads.