SOM-ITSOLUTIONS

C++

Copy-On-Write (COW)

SOMENATH MUKHOPADHYAY

som-itsolutions

#A2 1/13 South Purbachal Hospital Road Kolkata 700078 Mob: +91 9748185282

Email: som@som-itsolutions.com / som.mukhopadhyay@gmail.com

Website: http://www.som-itsolutions.com/

Blog: www.som-itsolutions.blogspot.com


As i was trying to recapitulate several important concepts of OOAD, naturally COW or Copy-On-Write got my attention for sometime. COW is a special technique used for efficient resource management in computer science. In C++ ‘98 standard, COW was used to define the std::string class. Although in C++ ‘11 standard, COW is not used to define the string class of the standard C++ library, but it is worth to have a look at the earlier implementation of the string class in GCC’s standard library. In COW, we defer creating of a new modifiable resource of a class in the case of copying if we keep the copy unchanged. We just refer to the old resource in the new copy. We write only when the copy changes that modifiable resource.

The following code is my effort to explain the COW concept in a String class. Let me explain some of the parts of this working code.

#include <string.h>
#include <cstddef>

//This is using default constructor
//and copy constructor. Hence while copying it will
//use shallow copy.
class StringHolder{
public:
        
char* data;
        
int refCount;

        
virtual ~StringHolder(){
                
delete[] data;
        }
};
class MyString{
        
private:
        StringHolder
* mStrHolder;
        
int length;

        
public:
                //default constructor
                MyString(){
                        mStrHolder
= new StringHolder();
                        length
= 0;
                        }

                MyString(
const char* inStr){
                        mStrHolder
= new StringHolder();
                        
unsigned l=strlen(inStr);
                        mStrHolder
->data=new char[l+1];
                        memcpy(mStrHolder
->data,inStr,l+1);
                        mStrHolder
->refCount = 1;
                        length
= l;

                }

                MyString(
const MyString& inStr){
                        mStrHolder
= inStr.mStrHolder;
                        mStrHolder
->refCount++;
                        length
= inStr.length;
                }

                
//Reference counted assignment operator
                MyString
& operator=(const MyString& inStr){
                        
//check against self assignment
                        
if(this == &inStr)
                                
return *this;


                        mStrHolder
->refCount--;//the data of the lhs string
                        
//does no longer point to the old char
                        
//array
                        
if(mStrHolder->refCount == 0)
                                
delete mStrHolder;

                        mStrHolder
= inStr.mStrHolder;//the lhs and rhs data
                        
//both are pointing to the RHS data.
                        mStrHolder
->refCount++;
                        length
= inStr.length;
                        
return *this;
                }

                
//mutator service
                MyString
& operator+=(const MyString& inStr){
                        Detach();
                        
const char* cStr = inStr.c_str();
                        
int l  = inStr.length;
                        
const char* oldData = mStrHolder->data;
                        
//deep copy
                        
char* tmp=new char[length + l];
                        mStrHolder
->data = tmp;
                        memcpy(mStrHolder
->data, oldData, length);
                        strcat(mStrHolder
->data,cStr);
                        mStrHolder
->refCount=1;
                        length
+= l;
                        
return *this;
                }

                
//Destructor. The virtual term is important
                
virtual ~MyString(){
                        mStrHolder
->refCount--;
                        
if(mStrHolder->refCount == 0)
                                
delete mStrHolder;
                }
                
//Call this function first for all the
                
//mutator services of this class
                
void Detach(){
                        
if (mStrHolder->refCount == 1)
                                
return;
                        StringHolder
* temp = new StringHolder();
                        temp
->data = mStrHolder->data;
                        temp
->refCount = 1;
                        mStrHolder
->refCount--;
                        mStrHolder
= temp;
                }

                
inline char* c_str()
                        {
                                
return (mStrHolder->data);
                        }

                
inline const char* c_str() const
                {
                        
return (mStrHolder->data);
                }

                
int getRefCount(){
                        
return mStrHolder->refCount;
                }
};

As you can see, there is a class called StringHolder which is responsible to store the actual data or the character array. As I have not used any constructor or copy constructor, naturally this class will take the help of compiler generated default constructor and copy constructor. And as the compiler generated copy constructor does shallow copy, hence this class will also use shallow copy in case of copy constructor. Beside the character array, this class has a reference count member variable which actually determines how many of our user defined String class actually are holding a reference to the character array at any particular time.

Now let us turn our attention to the MyString class. As you can see, there is a default constructor in this class, one copy constructor, one assignment operator and a destructor. These kinds of classes are called Canonical Classes in C++.

The class also has an one argument constructor which creates the MyString objects from a character array. Now let us examine the copy constructor. As you can see, inside the copy constructor we are simply taking the help of the shallow copy of the MyString’s member variable mStrHolder. Hence whenever we call a copy constructor, an extra reference is added to the character array of the StringHolder class. So we need to increase the refCount member variable as well.

Now let us examine the assignment operator. The first line checks for the self assignment that probably i don’t need to explain here. Next as we are assigning a new value to the LHS, the original LHS will no longer refer to the old character array of the StringHolder class. Hence we have decremented the reference count by one. Next we are doing a shallow copy of the mStrHolder member variable of the MyString class, and as a result we are just incrementing the reference count.

Now let us pay attention to a mutator member functions of this class. We have implemented only one of such functions and that is we have defined the += operator overloading. The other mutator functions can be implemented in the same fashion.

We have implemented a Detach() method to help us define the mutator methods. We have to call this Detach() method at the beginning of any mutator functions of the MyString class. Let us examine what this Detach method is doing.

//Call this function first for all the
                
//mutator services of this class
                
void Detach(){
                        
if (mStrHolder->refCount == 1)
                                
return;
                        StringHolder
* temp = new StringHolder();
                        temp
->data = mStrHolder->data;
                        temp
->refCount = 1;
                        mStrHolder
->refCount--;
                        mStrHolder
= temp;
                }

As you can see, whenever we call the Detach() method, it reduces the old mStringHolder’s refCount by one and creates a new mStrHolder object with a reference count 1.

Having explained the Detach() method, now let us focus on += mutator method of the class. As you can see from the definition of the += operator overloading method that we have first called the Detach() method. As a result the LHS will have a brand new mStrHolder object with a reference count of 1. Then later as you can see that we have used a deep copy of the mStrHolder class for the output result. Hence the output of this += operator overloading will be a brand new object having a single reference to the character array. At the same time, the original reference count of the LHS will be reduced by one.

That’s it.

So let me explain from an user’s perspective of what is happening in this COW based string class. To experiment with this class i have created few lines of code as follow.

MyString str1("Loves to study CPP...");

MyString str2("How are you?");

MyString str3("Somenath Mukhopadhyay");

Naturally all of the above strings will have a reference count of 1.

Now if we do str2 = str1, what will happen. As you can understand that now both str1 and str2 will point to the same character array. Hence the reference count will be 2. We can verify that by the following piece of code.

char* data1 = str1.c_str();

char* data2 = str2.c_str();

printf("%p\n",&data1[0]);

printf("%p\n",&data2[0]);

We will see that both of the outputs of the above code are same which implies that both str1 and str2 are referring to the same character array.

Now what will happen if we write MyString str4(str2). So now both str1, str2 and str4 are pointing to the same character array. Hence the reference count will be 3. We can again verify that by using the following code.

char* data1 = str1.c_str();

char* data2 = str2.c_str();

char* data4 = str4.c_str();

printf("%p\n",&data1[0]);

printf("%p\n",&data2[0]);

printf("%p\n",&data4[0]);

As you can see in all of the above functions we are actually not creating any new strings except for the first time. We are just incrementing the reference count.

Now the interesting part. What will happen if we do str1+=str2. In this case a brand new str1 will be created and the old str1’s reference to the character array will be lost. Hence after this operation only str2 and str4 will hold a pointer to the character array. We can again verify this by the following code.

char* data1 = str1.c_str();

char* data2 = str2.c_str();

char* data4 = str4.c_str();

printf("%p\n",&data1[0]);

printf("%p\n",&data2[0]);

printf("%p\n",&data4[0]);

All of the above steps have been pictorially described in the following diagram.Drawing.png