Unlock the Power of C++ Memory: Boost Performance with Custom Allocators

Custom allocators in C++ offer control over memory management, potentially boosting performance. They optimize allocation for specific use cases, reduce fragmentation, and enable tailored strategies like pool allocation or memory tracking.

Aug 17, 2023

Memory management in C++ can be tricky, but using custom allocators can give you more control and potentially boost performance. I’ve played around with allocators a fair bit, and they’re pretty cool once you get the hang of them.

First off, let’s talk about why you’d want to use a custom allocator. The default allocator in C++ is fine for most cases, but it’s not always the most efficient. If you’re working on a project where performance is critical, or you need to manage memory in a specific way, custom allocators can be a game-changer.

One of the main benefits is that you can optimize memory allocation for your specific use case. For example, if you know you’ll be allocating a lot of small objects, you can create an allocator that’s tailored for that. This can reduce fragmentation and improve overall performance.

I remember working on a project where we were dealing with a ton of small objects, and the default allocator was causing some serious slowdowns. We switched to a custom allocator that used a pool of pre-allocated memory, and it made a huge difference. The application ran much smoother, and we saw a significant reduction in memory usage.

Now, let’s get into the nitty-gritty of how to actually implement a custom allocator. The basic idea is to write a class template that declares a value_type and provides allocate and deallocate member functions; std::allocator_traits supplies defaults for everything else. Here’s a simple example:

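#include <cstddef>
#include <new>

template <typename T>
class MyAllocator {
public:
    using value_type = T;

    MyAllocator() noexcept = default;

    // Converting constructor so containers can rebind the allocator to other types.
    template <typename U>
    MyAllocator(const MyAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// Stateless allocators always compare equal: any instance can free another's memory.
template <typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept { return false; }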

This is a very basic allocator that doesn’t do much beyond what the default allocator does. But it gives you a starting point to build on.

One cool thing you can do with custom allocators is implement different allocation strategies. For example, you could create a pool allocator that pre-allocates a chunk of memory and then parcels it out as needed. This can be super efficient for allocating lots of small objects.

Here’s a simple pool allocator:

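#include <cstddef>
#include <new>

template <typename T, size_t BlockSize = 4096>
class PoolAllocator {
    // A slot either holds a live T or, while free, links to the next free slot.
    union Slot {
        T element;
        Slot* next;
    };

    Slot* currentBlock = nullptr;  // block currently being carved up
    Slot* currentSlot = nullptr;   // next never-used slot in that block
    Slot* lastSlot = nullptr;      // end marker for that block
    Slot* freeSlots = nullptr;     // singly linked list of returned slots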

public:
    using value_type = T;

    T* allocate(size_t n) {
        if (n > 1) {
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }

        if (freeSlots != nullptr) {
            T* result = reinterpret_cast<T*>(freeSlots);
            freeSlots = freeSlots->next;
            return result;
        }

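        // Start a new block when the current one is used up (or on first use).
        // lastSlot points one past the final slot, so every slot gets handed out.
        static_assert(BlockSize >= sizeof(Slot), "BlockSize must hold at least one slot");
        if (currentSlot >= lastSlot) {
            currentBlock = reinterpret_cast<Slot*>(::operator new(BlockSize));
            currentSlot = currentBlock;
            lastSlot = currentBlock + (BlockSize / sizeof(Slot));
        }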

        return reinterpret_cast<T*>(currentSlot++);
    }

    void deallocate(T* p, size_t n) {
        if (n > 1) {
            ::operator delete(p);
            return;
        }

        reinterpret_cast<Slot*>(p)->next = freeSlots;
        freeSlots = reinterpret_cast<Slot*>(p);
    }
};

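This allocator pre-allocates blocks of memory and keeps track of free slots. When you allocate, it either returns a free slot or allocates a new one from the current block. When you deallocate, it adds the slot back to the free list. Note that this sketch never hands its blocks back to the system, so it fits best when the pool lives about as long as the program does.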
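To make the free-list part easier to see in isolation, here is a stripped-down sketch of just that mechanism. FixedPool, slotSize and the 32-byte slot are names and numbers I made up for illustration; it is not a standard allocator, just the recycling logic under the assumption that every request fits in one slot.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: one pre-allocated buffer of equal-sized slots,
// with freed slots recycled through an intrusive singly linked free list.
class FixedPool {
public:
    static constexpr std::size_t slotSize = 32;  // assumed slot size for this sketch

private:
    union Slot {
        Slot* next;  // meaningful only while the slot is free
        alignas(std::max_align_t) unsigned char storage[slotSize];  // while in use
    };

    std::vector<Slot> slots_;   // the pre-allocated chunk
    Slot* freeList_ = nullptr;  // head of the free list

public:
    explicit FixedPool(std::size_t count) : slots_(count) {
        // Thread every slot onto the free list up front.
        for (Slot& s : slots_) {
            s.next = freeList_;
            freeList_ = &s;
        }
    }

    // The free list points into slots_, so copying would leave dangling links.
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Pops the head of the free list; throws when the pool is exhausted.
    void* allocate() {
        if (freeList_ == nullptr) throw std::bad_alloc();
        Slot* s = freeList_;
        freeList_ = s->next;
        return s;
    }

    // Pushes the slot back onto the free list; no system call, no search.
    void deallocate(void* p) noexcept {
        Slot* s = static_cast<Slot*>(p);
        s->next = freeList_;
        freeList_ = s;
    }
};
```

Both operations are a couple of pointer writes, and the slot you just freed is the first one handed out again, which is friendly to the cache.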

I’ve used something similar to this in a game engine I was working on, and it made a big difference in performance, especially for particle systems where we were constantly creating and destroying small objects.

Another cool use for custom allocators is implementing memory tracking. You can create an allocator that keeps track of all allocations and deallocations, which can be super helpful for debugging memory leaks. Here’s a simple example:

template <typename T>
class TrackingAllocator {
    static size_t allocatedBytes;
    static size_t allocatedCount;

public:
    using value_type = T;

    T* allocate(size_t n) {
        size_t bytes = n * sizeof(T);
        allocatedBytes += bytes;
        allocatedCount++;
        return static_cast<T*>(::operator new(bytes));
    }

    void deallocate(T* p, size_t n) noexcept {
        allocatedBytes -= n * sizeof(T);
        allocatedCount--;
        ::operator delete(p);
    }

    static size_t getAllocatedBytes() { return allocatedBytes; }
    static size_t getAllocatedCount() { return allocatedCount; }
};

template <typename T>
size_t TrackingAllocator<T>::allocatedBytes = 0;

template <typename T>
size_t TrackingAllocator<T>::allocatedCount = 0;

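This allocator keeps track of the total number of bytes currently allocated and the number of live allocations. You can use this to get a snapshot of your memory usage at any point in your program. Keep in mind that the counters are static members of a class template, so each TrackingAllocator<T> instantiation has its own totals, and they aren’t synchronized, so concurrent use needs atomics or a lock.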
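If you want one set of totals no matter which element type the container ends up allocating, one option is to move the counters out of the template. The sketch below is a variation I put together for illustration (AllocStats, CountingAllocator and leakedBlocksAfterListUse are hypothetical names), and it is still single-threaded only.

```cpp
#include <cstddef>
#include <list>
#include <new>

// Hypothetical counters shared by every CountingAllocator<T>, whatever T is.
struct AllocStats {
    static std::size_t& liveBytes() { static std::size_t value = 0; return value; }
    static std::size_t& liveBlocks() { static std::size_t value = 0; return value; }
};

template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;

    // Lets node-based containers rebind to their internal node type.
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        AllocStats::liveBytes() += n * sizeof(T);
        ++AllocStats::liveBlocks();
        return p;
    }

    void deallocate(T* p, std::size_t n) noexcept {
        AllocStats::liveBytes() -= n * sizeof(T);
        --AllocStats::liveBlocks();
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Usage sketch: builds and destroys a list, then reports blocks still outstanding.
std::size_t leakedBlocksAfterListUse() {
    {
        std::list<int, CountingAllocator<int>> values;
        for (int i = 0; i < 3; ++i) values.push_back(i);  // allocates list nodes, not ints
    }
    return AllocStats::liveBlocks();
}
```

A nonzero result from a check like leakedBlocksAfterListUse() at shutdown is a quick hint that something was allocated and never released.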

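One thing to keep in mind when using custom allocators is how state interacts with interchangeability. Two allocators are interchangeable only if they compare equal, which means memory allocated through one can be freed through the other. A stateless allocator satisfies this trivially. Since C++11 an allocator may also carry state, but copies must still compare equal to the original, so the usual approach is to keep the real bookkeeping outside the allocator (in a pool or arena object, or in global or thread-local storage) and store only a pointer or handle to it.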
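One way to keep the bookkeeping out of the allocator itself is to have the allocator hold nothing but a pointer to an external object, and to define equality in terms of that pointer. Here is a minimal sketch of that idea; Arena, ArenaAllocator and the 1024-byte buffer are hypothetical choices for the example, and the arena only supports alignments up to alignof(std::max_align_t).

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bump arena: owns a fixed buffer and hands out aligned chunks from it.
class Arena {
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_ = 0;

public:
    void* allocate(std::size_t bytes, std::size_t alignment) {
        // Round the current offset up to the requested alignment.
        std::size_t start = (used_ + alignment - 1) / alignment * alignment;
        if (start + bytes > sizeof(buffer_)) throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_ + start;
    }

    std::size_t used() const noexcept { return used_; }
};

// The allocator stores only a pointer, so copies are cheap and refer to the same arena.
template <typename T>
class ArenaAllocator {
    Arena* arena_;

public:
    using value_type = T;

    explicit ArenaAllocator(Arena& arena) noexcept : arena_(&arena) {}

    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena_(other.arena()) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }

    void deallocate(T*, std::size_t) noexcept {}  // memory is reclaimed with the arena

    Arena* arena() const noexcept { return arena_; }
};

// Equal exactly when both allocators draw from the same arena.
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() == b.arena();
}

template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() != b.arena();
}

// Usage sketch: two vectors sharing one arena through copies of the same allocator.
std::size_t bytesUsedByTwoVectors() {
    Arena arena;
    ArenaAllocator<int> alloc(arena);
    std::vector<int, ArenaAllocator<int>> a(alloc), b(alloc);
    a.push_back(1);
    b.push_back(2);
    return arena.used();
}
```

The arena has to outlive every container that uses it, which is why it is declared first in the usage function.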

Another important consideration is thread safety. If your allocator will be used in a multi-threaded environment, you’ll need to make sure it’s thread-safe. This usually involves using some kind of synchronization primitive, like a mutex.

Here’s an example of a thread-safe version of our pool allocator:

template <typename T, size_t BlockSize = 4096>
class ThreadSafePoolAllocator {
    // ... same as before ...
    std::mutex mutex;

public:
    T* allocate(size_t n) {
        std::lock_guard<std::mutex> lock(mutex);
        // ... same allocation logic as before ...
    }

    void deallocate(T* p, size_t n) {
        std::lock_guard<std::mutex> lock(mutex);
        // ... same deallocation logic as before ...
    }
};

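This version uses a mutex to ensure that only one thread can access the allocator at a time. This is a simple approach, but it can be a performance bottleneck if you’re doing a lot of allocations. Also note that std::mutex can’t be copied or moved, so an allocator that holds one directly is no longer copyable, which standard containers require; in practice the mutex and the pool live in a shared object that each allocator copy points to. For high-performance multi-threaded scenarios, you might want to look into lock-free allocators.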
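Here is a rough sketch of how the mutex can live in shared state so the allocator itself stays copyable. To keep it short, the protected state is just a counter rather than a full pool, and SharedState, LockedAllocator and liveBlocksAfterContention are hypothetical names; treat it as an outline of the ownership pattern, not a tuned implementation.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

// Hypothetical shared state: the mutex plus whatever it protects.
struct SharedState {
    std::mutex mutex;
    std::size_t liveBlocks = 0;
};

template <typename T>
class LockedAllocator {
    std::shared_ptr<SharedState> state_;  // copies of the allocator share this

public:
    using value_type = T;

    LockedAllocator() : state_(std::make_shared<SharedState>()) {}

    template <typename U>
    LockedAllocator(const LockedAllocator<U>& other) noexcept : state_(other.state()) {}

    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        std::lock_guard<std::mutex> lock(state_->mutex);
        ++state_->liveBlocks;
        return p;
    }

    void deallocate(T* p, std::size_t) noexcept {
        {
            std::lock_guard<std::mutex> lock(state_->mutex);
            --state_->liveBlocks;
        }
        ::operator delete(p);
    }

    std::size_t liveBlocks() const {
        std::lock_guard<std::mutex> lock(state_->mutex);
        return state_->liveBlocks;
    }

    const std::shared_ptr<SharedState>& state() const noexcept { return state_; }
};

// Equal exactly when both allocators share the same state object.
template <typename T, typename U>
bool operator==(const LockedAllocator<T>& a, const LockedAllocator<U>& b) noexcept {
    return a.state() == b.state();
}

template <typename T, typename U>
bool operator!=(const LockedAllocator<T>& a, const LockedAllocator<U>& b) noexcept {
    return a.state() != b.state();
}

// Usage sketch: several threads allocate and free through copies of one allocator.
std::size_t liveBlocksAfterContention(unsigned threadCount, int iterations) {
    LockedAllocator<int> alloc;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([alloc, iterations]() mutable {
            for (int i = 0; i < iterations; ++i) {
                int* p = alloc.allocate(1);
                alloc.deallocate(p, 1);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return alloc.liveBlocks();
}
```

Every thread works through its own copy of the allocator, but all copies lock the same mutex, so the counter stays consistent.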

Custom allocators can also be used to implement more advanced memory management techniques. For example, you could create an allocator that uses a memory mapped file for its storage. This can be useful for creating persistent data structures or for working with large datasets that don’t fit in RAM.

Here’s a basic example of a memory mapped allocator:

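#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>    // size_t
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::runtime_error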

template <typename T>
class MMapAllocator {
    void* mmapStart;
    size_t mmapSize;
    size_t used;

public:
    using value_type = T;

    MMapAllocator(const char* filename, size_t size) {
        int fd = open(filename, O_RDWR | O_CREAT, (mode_t)0600);
        if (fd == -1) {
            throw std::runtime_error("Error opening file for mmap");
        }

        // Size the file to exactly `size` bytes. Unlike seeking to the end and
        // writing a byte, this leaves data already stored in the file untouched.
        if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
            close(fd);
            throw std::runtime_error("Error calling ftruncate() to size the file");
        }

        mmapStart = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (mmapStart == MAP_FAILED) {
            close(fd);
            throw std::runtime_error("Error mmapping the file");
        }

        close(fd);
        mmapSize = size;
        used = 0;
    }

    ~MMapAllocator() {
        if (mmapStart != MAP_FAILED) {
            munmap(mmapStart, mmapSize);
        }
    }

    T* allocate(size_t n) {
        size_t bytes = n * sizeof(T);
        if (used + bytes > mmapSize) {
            throw std::bad_alloc();
        }
        T* result = reinterpret_cast<T*>(static_cast<char*>(mmapStart) + used);
        used += bytes;
        return result;
    }

    void deallocate(T*, size_t) {
        // In this simple implementation, we don't actually free memory
    }
};

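This allocator maps a file into memory and then hands out memory from that mapping with a simple bump pointer; deallocate is a no-op, so space is never reused. This can be useful for creating data structures that persist between program runs or for working with datasets that are too large to fit in RAM. Two caveats for this sketch: each instance unmaps the region in its destructor, so copying the allocator (which standard containers do) would unmap it twice, and raw pointers stored in the file are only valid if the file gets mapped at the same address on the next run.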
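To show the persistence part on its own, here is a small POSIX-only sketch that writes a value through one mapping, drops the mapping, and reads the value back through a fresh one. mapFile and roundTrip are hypothetical helpers written for this example, and the file path is whatever you pass in.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>

// Hypothetical helper: maps `bytes` bytes of the file at `path` read/write,
// creating or resizing the file as needed. The caller must munmap() the result.
void* mapFile(const char* path, std::size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) throw std::runtime_error("open failed");

    // Resize without overwriting whatever the file already contains.
    if (ftruncate(fd, static_cast<off_t>(bytes)) == -1) {
        close(fd);
        throw std::runtime_error("ftruncate failed");
    }

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    return p;
}

// Writes `value` through one mapping, unmaps it, then reads it back through a new one.
int roundTrip(const char* path, int value) {
    int* first = static_cast<int*>(mapFile(path, sizeof(int)));
    *first = value;
    munmap(first, sizeof(int));

    int* second = static_cast<int*>(mapFile(path, sizeof(int)));
    int result = *second;
    munmap(second, sizeof(int));
    return result;
}
```

Because the mapping uses MAP_SHARED, the write lands in the file itself, which is what lets a later mapping (or a later run of the program) see it.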

One last thing to mention is that custom allocators can be used with standard containers. This means you can use your custom allocators with things like std::vector or std::map. Here’s an example:

std::vector<int, MyAllocator<int>> vec;

This creates a vector that uses your custom allocator for memory management. This can be really powerful, as it allows you to customize the memory behavior of standard containers.
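For a slightly fuller picture, here is a self-contained sketch with both a vector and a map. The allocator is always the last template parameter, and for std::map its value_type is the key/value pair rather than the mapped type. SimpleAllocator, IntVector and NameMap are hypothetical names for this example; the allocator just forwards to operator new and delete.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal allocator that forwards to the global operator new/delete.
template <typename T>
struct SimpleAllocator {
    using value_type = T;

    SimpleAllocator() noexcept = default;

    // std::map rebinds the allocator to its internal node type through this.
    template <typename U>
    SimpleAllocator(const SimpleAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const SimpleAllocator<T>&, const SimpleAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const SimpleAllocator<T>&, const SimpleAllocator<U>&) noexcept { return false; }

// Sequence container: the allocator follows the element type.
using IntVector = std::vector<int, SimpleAllocator<int>>;

// Associative container: the allocator comes after the comparator and
// allocates std::pair<const Key, Value>.
using NameMap = std::map<std::string, int, std::less<std::string>,
                         SimpleAllocator<std::pair<const std::string, int>>>;

// Usage sketch: the containers behave exactly like their default-allocator versions.
int totalScore() {
    IntVector bonuses = {1, 2, 3};
    NameMap scores;
    scores["alice"] = 10;
    scores["bob"] = 20;

    int total = 0;
    for (int b : bonuses) total += b;
    for (const auto& entry : scores) total += entry.second;
    return total;
}
```

One thing to watch for: a container with a custom allocator is a different type from the default one, so an IntVector can’t be passed where a std::vector<int> is expected.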

In conclusion, custom allocators are a powerful tool in C++ that can help you optimize memory usage, improve performance, and implement advanced memory management techniques. They require a bit of work to set up, but the benefits can be substantial in the right situations. Whether you’re working on a high-performance game engine, a memory-constrained embedded system, or just want more control over your program’s memory usage, custom allocators are definitely worth exploring.