A lot of new developers to C++ attempt to build a Vector
like container as a learning processes. Getting a simple version of this working for POD types (like int) is not that complicated. The next step in getting this working for arbitrary data types takes a significant leap forward in thinking in C++ especially when you start looking at efficiency and exception safety. This set of five articles looks at building an efficient Vector
implementation. I show some of the common mistakes and explain why and how to resolve the problems:
Note: This is not meant to replace std::vector<>
this is simply meant as a teaching process.
Rule of Zero
You will notice that half the attempts below Sources are Vector implementations the other half are for Matrix implementations. I mention both because I want to emphasize the Separation of concerns. An object should be responsible for either business logic or resource management (not both). A lot of the Matrix implementations are trying to mix resource management (memory management) with the business logic of how matrices interact. So if you want to write a matrix class you should delegate resource management to a separate class (In a first pass std::vector<int>
would be a good choice).
In C++ the compiler generates a couple of methods for free.
- Destructor
- Copy Constructor
- Copy Assignment Operator
- Move Constructor
- Move Assignment Operator
These methods usually work perfectly well; unless your class contains a pointer (or a pointer like resource object). But if your class is doing business logic then it should not contain a pointer. So classes that handle business logic therefore should not be defining any of these compiler generated methods (just let the compiler generated ones work for you). Occasionally you want to delete them, to prevent copying or movement, but it is very unusual for these to need specialized implementations.
Conversely, classes that do resource management usually contain a pointer (or pointer like resource object). These classes should define all the above methods to correctly handle the resource. This is where ownership semantics of the resource are defined. The owner of the resource is responsible for destroying the resource when its lifespan is over (in terms of pointers this means the owner is responsible for calling delete
on the pointer, usually in the destructor). If you are not the owner of a resource you should not have access to the resource object directly, as it may be destroyed by the owner without other objects knowing.
The rule of three comes from C++03 where we only had copy semantics.
Version-1 Simple Resource Management
When creating a class to manage resources; the first version created by beginner usually looks like this:
Rule of three first pass
template<typename T>
class Vector
{
std::size_t size;
T* buffer;
Vector(int size = 100)
: size(size)
, buffer(new T[size])
{}
~Vector()
{
delete [] buffer;
}
};
The trouble here is that this version has a fundamental flaw because of the way the compiler generated copy constructor and copy assignment operator work with pointers (commonly referred to as the shallow copy problem).
Shallow copy problem.
int main()
{
Vector<int> x;
Vector<int> y(x);
Vector<int> z;
z=x;
}
Version-2 Rule of Three
The rule of three simply stated is: If you define any of the methods Destructor/Copy Constructor/Copy Assignment Operator then you should define all three. When done correctly this resolves the shallow copy problem. Vector
defines the destructor so we also need to define the copy constructor and copy assignment operator.
I see this as an initial attempt at defining the rule of three for vectors very often.
Rule of three second pass
template<typename T>
class Vector
{
std::size_t size;
T* buffer;
Vector(int size = 100)
: size(size)
, buffer(new T[size])
{}
~Vector()
{
delete [] buffer;
}
Vector(Vector const& copy)
: size(copy.size)
, buffer(new T[size])
{
std::copy(copy.buffer, copy.buffer + copy.size, buffer);
}
Vector& operator=(Vector const& copy)
{
return *this;
}
};
Version-3 Lazy Construction of elements.
The problem with the previous version is that it forces initialization of all elements in the buffer immediately. This forces the requirement that members of the Vector
(i.e. type T
) must be default constructable. It also has two efficiency constraints imposed on the Vector:
- You can't pre-allocate space for future members.
- So resizing (larger or smaller) becomes very expensive as each resize requires copy all the elements to the newly re-sized buffer.
- Alternatively pre-creating all the elements you need can also be expensive especially if construction of
T
is expensive.
- The copy constructor is twice as expensive as it should be. Each element must be:
- Default constructed (when the buffer is created).
- Then copy constructed with the value from the source vector.
This attempt improves on that by allowing efficient pre-allocating of space (capacity
) for the buffer. New members are then added by constructing in-place using placement new.
Rule of three third pass
template<typename T>
class Vector
{
std::size_t capacity;
std::size_t length;
T* buffer;
Vector(int capacity)
: capacity(capacity = 100)
, length(0)
, buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
{}
~Vector()
{
for(int loop = 0; loop < length; ++loop)
{
buffer[length - 1 - loop].~T();
}
::operator delete(buffer);
}
Vector(Vector const& copy)
: capacity(copy.length)
, length(0)
, buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
{
for(int loop = 0; loop < copy.length; ++loop)
{
push_back(copy.buffer[loop]);
}
}
Vector& operator=(Vector const& copy)
{
return *this;
}
void push_back(T const& value)
{
new (buffer + length) T(value);
++length;
}
void pop_back()
{
--length;
buffer[length].~T();
}
};
Rule of Five
In C++11 the language added the concept of "Move Semantics". Rather than having to copy an object (especially on return from a function) we could "move" an object. The concept here is that movement is supposed to be much cheaper than copying because you move the internal data structure of an object rather than all the elements. A good example is a std::vector. Before C++11 a return by value meant copying the object. The constructor of the new object allocates a new internal buffer and then copies all the elements from the original object's buffer to the new object's buffer. On the other hand a move simply gives the new object the internal buffer of the old object (we just move the pointer to the internal buffer). When an object is moved to another object the old object should be left in a valid state, but for efficiency the standard rarely specifies the state of an object after it has been the source of a move. Thus using an object after a move is a bad idea unless you are setting it to a specific state.
There are two new methods that allow us to specify move semantics on a class.
Vector Move Semantics.
class Vector
{
std::size_t capacity;
std::size_t length;
T* buffer;
Vector(Vector&& move) noexcept;
Vector& operator=(Vector&& move) noexcept;
};
Notice the &&
operator. This donates an r-value reference and means that your object is the destination of a move operation. The parameter passed is the source object and the state you should use to define your new object's state. After the move the source object must be left in a valid (but can be undefined state). For a vector this means it must no longer be the owner of the internal buffer that you are now using in your buffer.
The simplest way to achieve this goal is to set up the object in a valid (but very cheap to achieve state) and then swap the current object with the destination object.
Vector Move Semantics Implementation
class Vector
{
std::size_t capacity;
std::size_t length;
T* buffer;
Vector(Vector&& move) noexcept
: capacity(0)
, length(0)
, buffer(nullptr)
{
move.swap(*this);
}
Vector& operator=(Vector&& move) noexcept
{
move.swap(*this);
return *this;
}
};
Note I marked both move operators noexcept
. Assuming the operations are guaranteed not to throw you should mark them as noexcept
. If we know that certain operations are exception safe, then we can optimize resize operations and maintain the strong exception guarantee. This and some other optimizations will be documented in a subsequent post.
Final Version
Vector Final Version
template<typename T>
class Vector
{
std::size_t capacity;
std::size_t length;
T* buffer;
struct Deleter
{
void operator()(T* buffer) const
{
::operator delete(buffer);
}
};
public:
Vector(int capacity = 10)
: capacity(capacity)
, length(0)
, buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
{}
~Vector()
{
std::unique_ptr<T, Deleter> deleter(buffer, Deleter());
for(int loop = 0; loop < length; ++loop)
{
buffer[length - 1 - loop].~T();
}
}
Vector(Vector const& copy)
: capacity(copy.length)
, length(0)
, buffer(static_cast<T*>(::operator new(sizeof(T) * capacity)))
{
try
{
for(int loop = 0; loop < copy.length; ++loop)
{
push_back(copy.buffer[loop]);
}
}
catch(...)
{
std::unique_ptr<T, Deleter> deleter(buffer, Deleter());
for(int loop = 0; loop < length; ++loop)
{
buffer[length - 1 - loop].~T();
}
throw;
}
}
Vector& operator=(Vector const& copy)
{
return *this;
}
Vector(Vector&& move) noexcept
: capacity(0)
, length(0)
, buffer(nullptr)
{
move.swap(*this);
}
Vector& operator=(Vector&& move) noexcept
{
move.swap(*this);
return *this;
}
void swap(Vector& other) noexcept
{
using std::swap;
swap(capacity, other.capacity);
swap(length, other.length);
swap(buffer, other.buffer);
}
void push_back(T const& value)
{
resizeIfRequire();
new (buffer + length) T(value);
++length;
}
void pop_back()
{
--length;
buffer[length].~T();
}
private:
void resizeIfRequire()
{
if (length == capacity)
{
}
}
};
Summary
This article has shown how to handle the basic resource management required by a vector. It has covered several important principles for C++ programmers.
- Separation Of Concerns
- Rule of Zero
- Rule of Three
- Rule of Five
- Default compiler generated methods
- Shallow Copy Problem
- Placement New
- Exception Guarantees
Sources
Looking at CodeReview.stackexchange.com; reimplementing the vector class is a common goal for a first project.