Loki Astari

Thoughts of a former code monkey.

Vector - Resource Management Allocation

| Comments

A lot of new developers to C++ attempt to build a Vector like container as a learning processes. Getting a simple version of this working for POD types (like int) is not that complicated. The next step in getting this working for arbitrary data types takes a significant leap forward in thinking in C++ especially when you start looking at efficiency and exception safety. This set of five articles looks at building an efficient Vector implementation. I show some of the common mistakes and explain why and how to resolve the problems:

Note: This is not meant to replace std::vector<> this is simply meant as a teaching process.

Rule of Zero

You will notice that half the attempts below Sources are Vector implementations the other half are for Matrix implementations. I mention both because I want to emphasize the Separation of concerns. An object should be responsible for either business logic or resource management (not both). A lot of the Matrix implementations are trying to mix resource management (memory management) with the business logic of how matrices interact. So if you want to write a matrix class you should delegate resource management to a separate class (In a first pass std::vector<int> would be a good choice).

In C++ the compiler generates a couple of methods for free.

  • Destructor
  • Copy Constructor
  • Copy Assignment Operator
  • Move Constructor
  • Move Assignment Operator

These methods usually work perfectly well; unless your class contains a pointer (or a pointer like resource object). But if your class is doing business logic then it should not contain a pointer. So classes that handle business logic therefore should not be defining any of these compiler generated methods (just let the compiler generated ones work for you). Occasionally you want to delete them, to prevent copying or movement, but it is very unusual for these to need specialized implementations.

Conversely, classes that do resource management usually contain a pointer (or pointer like resource object). These classes should define all the above methods to correctly handle the resource. This is where ownership semantics of the resource are defined. The owner of the resource is responsible for destroying the resource when its lifespan is over (in terms of pointers this means the owner is responsible for calling delete on the pointer, usually in the destructor). If you are not the owner of a resource you should not have access to the resource object directly, as it may be destroyed by the owner without other objects knowing.

Rule of three

The rule of three comes from C++03 where we only had copy semantics.

Version-1 Simple Resource Management

When creating a class to manage resources; the first version created by beginner usually looks like this:

Rule of three first pass

template<typename T>
class Vector
{
    std::size_t     size;
    T*              buffer;
    Vector(int size = 100)
        : size(size)
        , buffer(new T[size])   // Allocate the resource
    {}
    ~Vector()
    {
        delete [] buffer;       // Clean up the resource
    }
};

The trouble here is that this version has a fundamental flaw because of the way the compiler generated copy constructor and copy assignment operator work with pointers (commonly referred to as the shallow copy problem).

Shallow copy problem.

int main()
{
    Vector<int>   x;
    Vector<int>   y(x);     // Compiler generate copy constructure does
                            // an element wise shallow copy of each element.
                            // This means both `x` and `y` have a buffer
                            // member that points at the same area in memory.
                            //
                            // When the objects go out of scope both will
                            // try and call delete on the memory resulting
                            // in a double delete of the memory.

    Vector<int>   z;        // Same problem after an assignment.
    z=x;
}

Version-2 Rule of Three

The rule of three simply stated is: If you define any of the methods Destructor/Copy Constructor/Copy Assignment Operator then you should define all three. When done correctly this resolves the shallow copy problem. Vector defines the destructor so we also need to define the copy constructor and copy assignment operator.

I see this as an initial attempt at defining the rule of three for vectors very often.

Rule of three second pass

template<typename T>
class Vector
{
    std::size_t     size;
    T*              buffer;
    Vector(int size = 100)
        : size(size)
        , buffer(new T[size])
    {}
    ~Vector()
    {
        delete [] buffer;
    }
    Vector(Vector const& copy)
        : size(copy.size)
        , buffer(new T[size])
    {
        // Copy constructor is simple.
        // We create a new resource area of the required size.
        // Then we copy the data from the old buffer to the new buffer.
        std::copy(copy.buffer, copy.buffer + copy.size, buffer);
    }
    Vector& operator=(Vector const& copy)
    {
        // Copy Object
        // This is relatively easy. But I want to cover this in detail in a subsquent post.
        return *this;
    }
};

Version-3 Lazy Construction of elements.

The problem with the previous version is that it forces initialization of all elements in the buffer immediately. This forces the requirement that members of the Vector (i.e. type T) must be default constructable. It also has two efficiency constraints imposed on the Vector:

  • You can't pre-allocate space for future members.
    • So resizing (larger or smaller) becomes very expensive as each resize requires copy all the elements to the newly re-sized buffer.
    • Alternatively pre-creating all the elements you need can also be expensive especially if construction of T is expensive.
  • The copy constructor is twice as expensive as it should be. Each element must be:
    • Default constructed (when the buffer is created).
    • Then copy constructed with the value from the source vector.

This attempt improves on that by allowing efficient pre-allocating of space (capacity) for the buffer. New members are then added by constructing in-place using placement new.

Rule of three third pass

template<typename T>
class Vector
{
    std::size_t     capacity;
    std::size_t     length;
    T*              buffer;
    Vector(int capacity)
        : capacity(capacity = 100)
        , length(0)
        // Allocates space but does not call the constructor
        , buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
        // Useful if the type T has an expensive constructor
        // We preallocate space without initializing it giving
        // room to grow and shrink the buffer without re-allocating.
    {}
    ~Vector()
    {
        // Because elements are constructed in-place using
        // placement new. Then we must manually call the destructor
        // on the elements.
        for(int loop = 0; loop < length; ++loop)
        {
            // Note we destroy the elements in reverse order.
            buffer[length - 1 - loop].~T();
        }
        ::operator delete(buffer);
    }
    Vector(Vector const& copy)
        : capacity(copy.length)
        , length(0)
        , buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
    {
        // Copy constructor is simple.
        // We create a new resource area of the required length.
        // But these elements are not initialized so we use push_back to copy them
        // into the new object. This is an improvement because we
        // only construct the members of the vector once.
        for(int loop = 0; loop < copy.length; ++loop)
        {
            push_back(copy.buffer[loop]);
        }
    }
    Vector& operator=(Vector const& copy)
    {
        // Copy Object
        // This is relatively easy. But I want to cover this in detail in a subsquent post.
        return *this;
    }
    void push_back(T const& value)
    {
        // Use placement new to copy buffer into the new buffer
        new (buffer + length) T(value);
        ++length;

        // Note we will handle growing the capacity later.
    }
    void pop_back()
    {
        // When removing elements need to manually call the destructor
        // because we created them using placement new.
        --length;
        buffer[length].~T();
    }
};

Rule of Five

In C++11 the language added the concept of "Move Semantics". Rather than having to copy an object (especially on return from a function) we could "move" an object. The concept here is that movement is supposed to be much cheaper than copying because you move the internal data structure of an object rather than all the elements. A good example is a std::vector. Before C++11 a return by value meant copying the object. The constructor of the new object allocates a new internal buffer and then copies all the elements from the original object's buffer to the new object's buffer. On the other hand a move simply gives the new object the internal buffer of the old object (we just move the pointer to the internal buffer). When an object is moved to another object the old object should be left in a valid state, but for efficiency the standard rarely specifies the state of an object after it has been the source of a move. Thus using an object after a move is a bad idea unless you are setting it to a specific state.

There are two new methods that allow us to specify move semantics on a class.

Vector Move Semantics.

class Vector
{
    std::size_t     capacity;
    std::size_t     length;
    T*              buffer;
    // STUFF

    // Move Constructor
    Vector(Vector&& move) noexcept;

    // Move Assignment Operator
    Vector& operator=(Vector&& move) noexcept;
};

Notice the && operator. This donates an r-value reference and means that your object is the destination of a move operation. The parameter passed is the source object and the state you should use to define your new object's state. After the move the source object must be left in a valid (but can be undefined state). For a vector this means it must no longer be the owner of the internal buffer that you are now using in your buffer.

The simplest way to achieve this goal is to set up the object in a valid (but very cheap to achieve state) and then swap the current object with the destination object.

Vector Move Semantics Implementation

class Vector
{
    std::size_t     capacity;
    std::size_t     length;
    T*              buffer;
    // STUFF

    // Move Constructor
    Vector(Vector&& move) noexcept
        : capacity(0)
        , length(0)
        , buffer(nullptr)
    {
        // The source object now has a nullptr/
        // This object has taken the state of the source object.
        move.swap(*this);
    }

    // Move Assignment Operator
    Vector& operator=(Vector&& move) noexcept
    {
        // In this case simply swap the source object
        // and this object around.
        move.swap(*this);
        return *this;
    }
};

Note I marked both move operators noexcept. Assuming the operations are guaranteed not to throw you should mark them as noexcept. If we know that certain operations are exception safe, then we can optimize resize operations and maintain the strong exception guarantee. This and some other optimizations will be documented in a subsequent post.

Final Version

Vector Final Version

template<typename T>
class Vector
{
    std::size_t     capacity;
    std::size_t     length;
    T*              buffer;

    struct Deleter
    {
        void operator()(T* buffer) const
        {
            ::operator delete(buffer);
        }
    };

    public:
        Vector(int capacity = 10)
            : capacity(capacity)
            , length(0)
            , buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
        {}
        ~Vector()
        {
            // Make sure the buffer is deleted even with exceptions
            // This will be called to release the pointer at the end
            // of scope.
            std::unique_ptr<T, Deleter>     deleter(buffer, Deleter());

            // Call the destructor on all the members in reverse order
            for(int loop = 0; loop < length; ++loop)
            {
                // Note we destroy the elements in reverse order.
                buffer[length - 1 - loop].~T();
            }
        }
        Vector(Vector const& copy)
            : capacity(copy.length)
            , length(0)
            , buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
        {
            try
            {
                for(int loop = 0; loop < copy.length; ++loop)
                {
                    push_back(copy.buffer[loop]);
                }
            }
            catch(...)
            {
                std::unique_ptr<T, Deleter>     deleter(buffer, Deleter());
                // If there was an exception then destroy everything
                // that was created to make it exception safe.
                for(int loop = 0; loop < length; ++loop)
                {
                    buffer[length - 1 - loop].~T();
                }

                // Make sure the exceptions continue propagating after
                // the cleanup has completed.
                throw;
            }
        }
        Vector& operator=(Vector const& copy)
        {
            // Covered in Part 2
            return *this;
        }
        Vector(Vector&& move) noexcept
            : capacity(0)
            , length(0)
            , buffer(nullptr)
        {
            move.swap(*this);
        }
        Vector& operator=(Vector&& move) noexcept
        {
            move.swap(*this);
            return *this;
        }
        void swap(Vector& other) noexcept
        {
            using std::swap;
            swap(capacity,      other.capacity);
            swap(length,        other.length);
            swap(buffer,        other.buffer);
        }
        void push_back(T const& value)
        {
            resizeIfRequire();
            new (buffer + length) T(value);
            ++length;
        }
        void pop_back()
        {
            --length;
            buffer[length].~T();
        }
    private:
        void resizeIfRequire()
        {
            if (length == capacity)
            {
                // Covered in Part 2
            }
        }
};

Summary

This article has shown how to handle the basic resource management required by a vector. It has covered several important principles for C++ programmers.

  • Separation Of Concerns
  • Rule of Zero
  • Rule of Three
  • Rule of Five
  • Default compiler generated methods
  • Shallow Copy Problem
  • Placement New
  • Exception Guarantees

Sources

Looking at CodeReview.stackexchange.com; reimplementing the vector class is a common goal for a first project.

Comments