Poorly Designed Features of C++ Constructors -- ADTmag

Poorly Designed Features of C++ Constructors

By Eyas El-Qawasmeh, Basel Mahafzah
March 1, 2001

In C++, a constructor is a special function that is invoked when an object is created. Its objective is to initialize a new object to a valid state before any processing occurs using the object. Constructors appear to have some poorly designed features, such as the requirement that the class constructor's name be the same as the class name. These features are explored so that precautions can be taken when building a C++ compiler. In addition, any future modification to the C++ language, or the design of any new object-oriented language, can avoid these features. Resolutions for some of these features are suggested.

In C++, every class has at least one constructor member function, even when none is declared.¹,²,³ The constructor is called when an object is declared or dynamically allocated.⁴ It performs various tasks that are not visible in the source, even if no code is written for the constructor. Storage for the object is obtained on the stack (automatic allocation) or on the heap (dynamic allocation), and the constructor then sets the object's initial internal state by assigning values to some or all of its data members. A constructor's name must be the same as the class name, but a class can have many overloaded constructors, provided their parameter lists differ. A constructor may be written explicitly by the programmer; when the programmer does not declare one, the compiler generates one implicitly according to the language rules, and the generated code calls the member and base-class constructors with the right parameters. Defining a constructor explicitly is highly recommended, since the implicitly defined constructor may not behave as desired, especially if dynamic allocation of memory is involved. Constructors have no return type, even though they may contain return statements.⁵,⁶ Each constructor can initialize a class instance differently. Because every class has a constructor, at least the default constructor, each object is initialized by a corresponding constructor before it is used in a program. The activities involved when an object is created using a constructor are described in Figure 1.
Figure 1. The activities involved in the execution of a constructor.

Because a constructor is a member function, its visibility must be declared as public, protected, or private. Usually, constructors are declared public. Declaring a constructor protected or private, however, limits how the class can be instantiated. This is useful for special classes whose objects should not be created by just anyone. Constructors can contain any valid C++ statement. For example, a side effect, like a print statement, can be added to a constructor to see exactly when it is invoked.

CONSTRUCTOR TYPES
C++ constructors have several types. The most common are the default and copy constructors. Some other types of rare usage also exist. Four different kinds of constructors are described in the following subsections.

Default constructor
Default constructors are member functions without arguments. However, a default constructor can be declared with an argument list, provided all arguments have defaults. The duty of a default constructor is to initialize a class instance into a known default state. If there are no constructors explicitly provided in the class definition, the compiler will automatically define an implicit default constructor. The implicit default constructor is equivalent to a default constructor with an empty body. It does nothing but chain to the default constructor of the class members that are class instances. The invocation of the default constructor is performed automatically in many cases; for example, the declaration of an object causes a call to the default constructor.
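The chaining behavior of the implicit default constructor can be illustrated with a minimal sketch. The classes Label, Point, and Widget are hypothetical names introduced only for this illustration; they do not come from the article.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, for illustration only.
struct Label {
    std::string text;
    Label() : text("untitled") {}   // explicit default constructor
};

struct Point {
    int x, y;
    // Every parameter has a default, so this also acts as the default constructor.
    Point(int px = 0, int py = 0) : x(px), y(py) {}
};

struct Widget {
    // No constructor is declared, so the compiler defines an implicit default
    // constructor that chains to Label() and Point().
    Label label;
    Point origin;
};

void demo_default_construction() {
    Widget w;   // the declaration invokes the implicit default constructor
    assert(w.label.text == "untitled");
    assert(w.origin.x == 0 && w.origin.y == 0);
}
```

Calling demo_default_construction() from main() exercises the chain. Note that Point(3) also compiles, because only the leading argument needs to be supplied.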

Copy constructor
The copy constructor, often called X(X&) ("X of X ref"), is a special constructor that the compiler uses whenever a new object is created from another object of the same class. It has a single argument: a reference to an object of the same class, which will not be changed when the reference is const. This constructor is essential for controlling how user-defined types are passed and returned by value during function calls. The copy constructor calls the copy constructors of all base classes and member objects. It uses the constant form if it can; otherwise, the non-constant form is used.

In C++, there are three cases when an object is required to be copied by another one. In all of the following cases, the copy constructor should be used. These are:

When an object is passed by value to a function.
When an object is returned by value from a function.
When an object is initialized in its declaration based on another object.

The copy constructor is used in all three situations. In the first two cases, if a class that owns dynamically allocated memory does not have a suitable copy constructor, the copy can end up holding a pointer to deleted memory. In the third case, the copy constructor is used because C++ distinguishes between initialization and assignment. Initialization constructs a new object and gives it a value at the same time, which is the job of a copy constructor. Assignment changes the value of an object that has already been constructed. Conceptually, this kind of initialization could be implemented as a call to the default constructor followed by a call to the assignment operator, and a user-written copy constructor is sometimes implemented in exactly that way. The differences and similarities between the copy constructor and the assignment operator are described in many references.¹,²,⁷

The copy constructor should not modify the object being copied, and it must take that object by reference rather than by value, for the following reason: When an object is passed by value to a function, its copy constructor is called automatically to produce the new object, which is passed to the function. If an object were passed by value to its own copy constructor, the copy constructor would have to be called to copy the object so that a duplicate could be sent to the copy constructor. This would result in infinite recursion.

In addition to being called implicitly when an object is passed by value to a function, the copy constructor is also called implicitly when an object is returned. In other words, the object you get from a function is a copy of the object the function passed to you. But again, the copy constructor is called implicitly, so you do not have to worry about it.
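The implicit calls described above can be made visible by counting them. The following is a minimal sketch; the class Tracked and the helper functions are hypothetical names used only for illustration. Return by value is deliberately left out of the count, because compilers are allowed to elide that copy.

```cpp
#include <cassert>

// Hypothetical class that counts calls to its copy constructor.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

int read_by_value(Tracked t) { return t.value; }             // parameter is a copy
int read_by_reference(const Tracked& t) { return t.value; }  // no copy is made

// Returns how many copies the statements below trigger.
int count_copies() {
    Tracked::copies = 0;
    Tracked a(7);
    Tracked b = a;           // initialization from another object: one copy
    read_by_value(a);        // pass by value: one copy
    read_by_reference(a);    // pass by reference: no copy
    assert(b.value == 7);
    return Tracked::copies;
}
```

Passing by const reference, as in read_by_reference, is the usual way to avoid the copy when the function does not need its own object.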

If there is no copy constructor explicitly provided in the class definition, the compiler automatically defines an implicit one that performs a memberwise copy from the existing object to the new object; for members of built-in and pointer types, this amounts to a bitwise copy. The implicit copy constructor simply chains to the copy constructor of each class member. Many authors refer to this as the default copy constructor. Note that the implicit and the explicit copy constructor differ in chaining behavior. The explicit copy constructor chains only to the default constructor of the class members that are class instances, unless another constructor is invoked in the constructor's initialization list.⁷

A copy constructor keeps programs efficient because it does not need to change its input argument in order to initialize the new object. It is good style to always define a copy constructor, even when the default copy constructor supplied by the compiler would handle memory the way you want. That said, the default copy constructor has been observed to be adequate in many cases.

User-defined constructor
User-defined constructors allow the user to define and initialize a variable at the same time. This type of constructor can take any parameters. A user-defined constructor and the other constructor types are declared in class mystring in the following example:


#include <cstddef>   // size_t

class mystring
{
    // ...
public:
    mystring();                              // Default constructor
    mystring(const mystring &src);           // Copy constructor
    mystring(const char *src);               // Coercion constructor
    mystring(const char src[], size_t len);  // User-defined constructor
};

Coercion constructor
In C++, it is possible to declare a constructor that takes a single parameter for a class and use this constructor for doing type conversion. A coercion constructor defines a conversion (implicit, as well as explicit cast) from the parameter type to the object type. In other words, the compiler can call the constructor with any instance of the parameter type. The purpose of this is to create a temporary instance of the class to be used instead of the parameter type instance whenever necessary for type compatibility. Note that recent additions to the C++ standard added the keyword "explicit" which, when used preceding a one-parameter constructor, prevents the implicit type conversion. However, this is not available on all compilers. The following is an example of the coercion constructor:


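class A
{
public:
    A(int) { }          // coercion constructor: int -> A
};
void f(A) { }
void g()
{
    A My_Object = 17;   // implicit conversion in an initialization
    A a2 = A(57);       // explicit conversion
    A a3(64);           // direct initialization
    My_Object = 67;     // implicit conversion, then assignment
    f(77);              // implicit conversion of a function argument
}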

A declaration like: A My_Object = 17; means that the A(int) constructor is called to create an object from the integer value. Such a constructor is called a coercion constructor.
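The effect of the explicit keyword mentioned above can be sketched by placing two otherwise identical classes side by side. Loose and Strict are hypothetical names introduced for this illustration; the commented-out lines are the ones a conforming compiler is expected to reject.

```cpp
#include <cassert>

// Hypothetical classes, for illustration only.
struct Loose {
    int n;
    Loose(int v) : n(v) {}             // coercion constructor: int -> Loose
};

struct Strict {
    int n;
    explicit Strict(int v) : n(v) {}   // the conversion must be spelled out
};

int take_loose(Loose l) { return l.n; }
int take_strict(Strict s) { return s.n; }

void demo_explicit() {
    assert(take_loose(77) == 77);            // implicit int -> Loose is accepted
    assert(take_strict(Strict(77)) == 77);   // explicit construction is required
    // take_strict(77);   // would not compile: no implicit int -> Strict
    // Strict s = 17;     // would not compile for the same reason
}
```

Marking single-argument constructors explicit keeps the constructor usable while leaving the decision to convert with the caller.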

GENERIC FEATURES
Following is a list of C++ constructor features that are poorly designed; other such features may exist. Working around these features is possible in most cases and is discussed case by case.

Constructors can be inlined, but should be prohibited
In general, the member functions of a class can be inlined by inserting the keyword "inline" in front of the function name. Among these functions is the constructor, which can also be inlined, but should not be. A constructor is inlined by inserting the keyword "inline" and then defining it as in the following example:


class x
{
    // ...
public:
    x(int);
    // ...
};

inline x::x(int)
{
    // ...
}

In the previous code, instead of the function existing as a separate entity, its body is inserted into the program at each call site. This is efficient for functions that are very short (one or two statements), since the jump to the function and the saving of the registers are avoided.

An example showing the dangers of using inline can be seen by defining an inline static constructor. In this case, the static constructor is supposed to be called by the runtime only once. However, if a header file that contains an inline static constructor is included in multiple translation units, there will be multiple copies. All of the copies will then be invoked during program start-up, rather than the single copy the programmer intended. The basic problem here is that the static constructor is really an object in the guise of a function.

It is important to understand that inline is a request, not a command; the compiler decides whether to generate inline code. This makes inline implementation-dependent: various situations are handled differently from one compiler to another, and there is no standard behavior that a compiler must follow in honoring an inline request. In addition, there is more code in a constructor than the programmer usually writes. When a constructor is declared inline, all the constructors for contained objects and base classes must also be called, and those calls are implicitly written into the constructor. This can create very large inline segments, which is why inlining constructors is not recommended.

Constructors do not have a return type
Specifying a constructor with a return type is an error, as is taking the address of a constructor. This implies that a constructor cannot report failure through an error code: whether it has succeeded or failed cannot be determined from a return value, because it never returns one. There is, however, an existing method for determining whether memory allocation has succeeded, despite the silence of C++ constructors. This method is built into the language to handle just such an exigency. A predefined function pointer, called the _new_handler (installed with set_new_handler in standard C++), can be set by users to point to a user-defined routine that will execute if the new operator should fail. This user-defined routine can perform any action, including setting an error flag, attempting to recover memory, exiting, aborting, or throwing an exception.⁸ The _new_handler is a built-in exception handler packaged for ease of use. The best way to signal constructor failure in general, therefore, is to throw an exception. A constructor that throws should first clean up whatever objects and memory allocations it has made prior to throwing the exception.
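As a minimal sketch of the new-handler mechanism, the following uses the standard std::set_new_handler function from the <new> header. The handler out_of_memory and the counter handler_calls are hypothetical names for illustration; what a real handler should do depends on the application.

```cpp
#include <cassert>
#include <new>

// Hypothetical handler: records that it ran, then gives up by throwing.
int handler_calls = 0;

void out_of_memory()
{
    ++handler_calls;
    // A handler must make more memory available, throw, or end the program.
    throw std::bad_alloc();
}

void demo_new_handler()
{
    // set_new_handler installs a handler and returns the one it replaced.
    std::new_handler previous = std::set_new_handler(out_of_memory);
    // Restoring the old handler hands back the one installed above.
    assert(std::set_new_handler(previous) == out_of_memory);
}
```

The sketch only installs and restores the handler. It does not try to exhaust memory, since doing so portably is difficult.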

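Throwing an exception in a constructor is tricky. This is because the memory for the object itself has already been allocated by the time the constructor is called, and there is no simple way to deallocate the memory occupied by the object from within the constructor for that object. When the object is created with new, standard C++ releases that storage and destroys any fully constructed members and base classes automatically. The object's own destructor, however, is not run, so resources that the constructor body acquired before the throw may remain allocated.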

If there are operations performed in the constructor that can fail, it might be a better idea to put those operations into a separate initialization function, rather than throwing an exception in the constructor. This way, the programmer can safely construct the object and get a valid pointer to it. Then the initialization function for the object can be called. If the initialization function fails, the object can be deleted directly.
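What happens when a constructor throws can be observed with a small sketch. Resource and Fragile are hypothetical classes introduced here for illustration. The sketch assumes a standard-conforming compiler, where fully constructed members are destroyed during unwinding and the destructor of the object under construction is not run.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical member type that tracks how many instances are alive.
struct Resource {
    static int live;
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// Hypothetical class whose constructor can be told to fail.
struct Fragile {
    static int destructor_calls;
    Resource r;   // fully constructed before the constructor body runs
    explicit Fragile(bool fail) {
        if (fail) throw std::runtime_error("construction failed");
    }
    ~Fragile() { ++destructor_calls; }
};
int Fragile::destructor_calls = 0;

// Returns true if the failed construction was reported by an exception.
bool construction_fails() {
    try {
        Fragile* p = new Fragile(true);   // the constructor throws
        delete p;                         // never reached
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}

void demo_constructor_failure() {
    assert(construction_fails());
    assert(Resource::live == 0);             // member r was destroyed
    assert(Fragile::destructor_calls == 0);  // ~Fragile never ran
}
```

Because ~Fragile does not run, anything the constructor body itself acquired through a raw pointer would have to be released before the throw, which is the situation the separate initialization function is meant to avoid.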

Constructors cannot be declared static
In C++, each object of a class has its own copy of the data members of the class. However, a static data member is not duplicated for each object; rather, a single data item is shared by all objects of the class.⁹,¹⁰ A static function is one that operates on the class in general, rather than on objects of the class. It can be called with the class name and the scope-resolution operator. The constructor is an exception: it cannot be declared static, because a constructor by definition operates on a particular object, and a static constructor would violate the concept of object orientation.

A similar observation about this issue is related to a static initialized object, which is instantiated at start-up time (just before main() is called). This observation is explained through the following code:


MyClass static_object(88, 91);

 void   bar()
 {
  if (static_object.count( ) > 14) {
    ...
  }
 }

In this example, the static initialized object is instantiated at start-up time. Usually, there are two components to these objects. First there is the data segment, which is static data loaded into the global data segment of the program. The second part is an initializer function that is called by the loader before main() is called. We have found that some compilers do not implement the initializer function reliably. So you get the object data, but it is never initialized. One way to avoid this limitation is to write a wrapper function that creates a single instance of an object and to replace all references to the static initialized object with a call to the wrapper function. Thus, the above example will be as follows:


static MyClass* static_object = 0;

MyClass*
getStaticObject()
{
    // Create the single instance on first use
    if (!static_object)
        static_object =
            new MyClass(88, 91);
    return static_object;
}

void bar()
{
    if (getStaticObject()->count() > 14)
    {
        ...
    }
}
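A variation on the wrapper function, not taken from the article, is to hold the instance in a function-local static object instead of on the heap. The sketch below assumes a stand-in MyClass, since the article does not show that class; the constructor arguments and the count() member are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's MyClass.
class MyClass {
public:
    MyClass(int a, int b) : total_(a + b) { ++constructed; }
    int count() const { return total_; }
    static int constructed;   // number of constructor calls so far
private:
    int total_;
};
int MyClass::constructed = 0;

// The local static is constructed the first time control passes through
// its declaration, not at program start-up.
MyClass& static_object() {
    static MyClass instance(88, 91);
    return instance;
}

bool bar() {
    return static_object().count() > 14;
}

void demo_local_static() {
    assert(bar());
    assert(bar());
    assert(MyClass::constructed == 1);   // built once, on first use
}
```

This form needs no explicit null check and lets the runtime destroy the object at exit. In C++11 and later, the initialization of a function-local static is also thread-safe.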

Constructors cannot be virtual
A virtual constructor means the programmer is able to create an object from a class without knowing exactly what type of object he or she wishes to create until run time. Virtual construction is not possible in C++. The most common case of this arises when implementing I/O operations on objects. Even if enough information is stored in a file to represent the entire internal state of a given object, there must be some way to construct an instance of the corresponding class when that file is read just using the stored information. However, an advanced C++ programmer can simulate virtual constructors.¹¹,¹²

The simulation of virtual constructors requires the ability to specify which constructor to call at run time. The standard way of implementing specialized runtime behavior is using virtual member functions. Unfortunately, due to the semantics of C++, constructors cannot be virtual. In order to circumvent this limitation, a number of standard idioms have been developed to allow for the selection of which class to construct at run time. These techniques are known collectively as virtual constructors, even though there is really no such thing in C++.

The first idiom for implementing virtual constructors¹² is to use a switch statement or a sequence of if statements to implement the selection manually. In the following example, the selection is based on the standard library's type_info facilities, which in some compilers, such as Microsoft Visual Studio, require runtime type information support to be enabled. You could instead implement your own Run Time Type Information (RTTI) scheme based on virtual member functions:


#include <cstring>
#include <stdexcept>
#include <typeinfo>

class Base
{
public:
    virtual ~Base() { }
    virtual const char* get_type_id() const;
    static Base* make_object(const char* type_name);
};

const char* Base::get_type_id() const
{
    // name() is the portable call; raw_name() is a Microsoft extension
    return typeid(*this).name();
}

class Child1 : public Base
{
};

class Child2 : public Base
{
};

Base* Base::make_object(const char* type_name)
{
    if (std::strcmp(type_name, typeid(Child1).name()) == 0)
        return new Child1;
    else if (std::strcmp(type_name, typeid(Child2).name()) == 0)
        return new Child2;
    else
        throw std::runtime_error("unrecognized type name passed");
}

While this approach is straightforward, it requires the programmer to manually maintain a list of all supported types in the make_object function, which is a process that is both error-prone and labor-intensive. It also implies a violation of encapsulation in that a member of Base, Base::make_object(), must know about all of Base's concrete descendant classes.¹²

A more object-oriented approach to virtual constructors is a technique that Coplien calls exemplar instances.¹² The basic idea is that instead of maintaining a statically compiled function, the programmer creates a global list of specially designated instances (the exemplars) that exist only as handles on the virtual constructor mechanism:


#include <cstring>
#include <list>
#include <typeinfo>
using std::list;

class Base
{
public:
    static Base* make_object(const char* type_name)
    {
        // Search the registered exemplars for a matching type name
        for (list<Base*>::iterator iter = exemplars.begin();
             iter != exemplars.end(); ++iter)
        {
            Base* e = *iter;
            if (std::strcmp(type_name, e->get_typename()) == 0)
                return e->clone();
        }
        return 0;    // represents NULL: no matching exemplar
    }
    virtual ~Base() { }
    virtual const char* get_typename() const
    {
        return typeid(*this).name();
    }
    virtual Base* clone() const = 0;
protected:
    static list<Base*> exemplars;
};
list<Base*> Base::exemplars;

// T must be a concrete class
// derived from Base, above
template<class T>
class exemplar : public T
{
public:
    exemplar()
    {
        Base::exemplars.push_back(this);
    }
    ~exemplar()
    {
        Base::exemplars.remove(this);
    }
    // Report the exemplified type, so lookups by T's name succeed
    const char* get_typename() const
    {
        return typeid(T).name();
    }
};

class Child : public Base
{
public:
    ~Child()
    {
    }
    Base* clone() const
    {
        return new Child;
    }
};
exemplar<Child> Child_exemplar;

In this scheme, what the programmer needs to do when adding a new class to the set of those for which a virtual constructor is desired is to remember to create an instance of the corresponding exemplar<T> class. Note that in this example, the exemplar instances are themselves instances of the classes for which they are exemplars. This is not necessary as long as they are instances of classes that know how to create instances of the classes for which they are exemplars. This allows for significant optimization if the cost of instantiating a given class is too high for an instance whose only purpose is to serve as an exemplar.¹²

The need for a default constructor even when it is not called
The need for a default constructor occurs when inheritance is used. To be more specific, when an object of the most-derived class of a hierarchy is constructed, all virtual base classes are constructed first, before any of the classes derived from them. As an example, consider the following code:


#include <iostream>

class Base
{
    int x;
public:
    Base() : x(0) { }       // The default constructor
    Base(int a) : x(a) { }
};
class alpha : virtual public Base
{
    int y;
public:
    alpha(int a) : Base(a), y(2) { }
};
class beta : virtual public Base
{
    int z;
public:
    beta(int a) : Base(a), z(3) { }
};
class gamma : public alpha, public beta
{
    int w;
public:
    // No initializer for Base appears here
    gamma(int a, int b) : alpha(a), beta(b), w(4) { }
};
int main()
{
    // ...
}

In this example, we do not provide an initializer for Base in the initializer list of the gamma constructor, so the compiler uses the default constructor for Base; the Base(a) initializers in alpha and beta are ignored when a gamma object is constructed. Because the class provides another constructor for Base, the compiler will not generate the default constructor itself. That is why the code declares the default constructor explicitly. If that default constructor is removed, the compiler will complain.
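The rule can be checked with a reduced version of the hierarchy. The names VBase, Left, Right, Bottom, and BottomExplicit are hypothetical and chosen only for this sketch; the data member is made public so that its value can be inspected.

```cpp
#include <cassert>

// Hypothetical reduced hierarchy, for illustration only.
struct VBase {
    int x;
    VBase() : x(0) {}
    explicit VBase(int a) : x(a) {}
};
struct Left : virtual VBase {
    explicit Left(int a) : VBase(a) {}
};
struct Right : virtual VBase {
    explicit Right(int a) : VBase(a) {}
};
struct Bottom : Left, Right {
    // No VBase initializer: VBase() is used, and the VBase(a)
    // initializers in Left and Right are ignored.
    Bottom(int a, int b) : Left(a), Right(b) {}
};
struct BottomExplicit : Left, Right {
    // The most-derived class may name the virtual base itself.
    BottomExplicit(int a, int b) : VBase(a + b), Left(a), Right(b) {}
};

void demo_virtual_base() {
    Bottom b(5, 6);
    assert(b.x == 0);    // default constructor of VBase was used
    Left l(5);
    assert(l.x == 5);    // Left is most-derived here, so VBase(5) runs
    BottomExplicit e(5, 6);
    assert(e.x == 11);
}
```

If the default constructor is removed from VBase, Bottom no longer compiles, while BottomExplicit still does.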

If the Base constructor did something with side effects, like opening a file or allocating memory (which the programmer does not want), then the programmer has to ensure that intermediate base classes do not have initializers for virtual base classes. That is, only the default constructor for the virtual base should be used implicitly.

The default constructor for the virtual base class should do only those things that must always be done exactly once and that do not depend on any parameters of the derived class constructors. For everything else, add an init() function to the virtual base class and call it from the other constructors of the virtual base class, as well as from within the derived class constructors (you may have to ensure it is called only once).

A constructor's address cannot be taken
In C++, it is not possible to take the address of a constructor, so constructors cannot be passed around as function pointers. Allowing this would permit objects to be created later by calling through the pointers. One way around the restriction is to write static helper functions that create and return a new object. Pointers to these static functions can then be used when new objects are required. The following is an example of this case:


class A
{
public:
		A( );  // cannot take the address of this
			   // constructor directly
		static A* createA();
		// This function creates a new A object
		// on the heap and returns a pointer to it.
		// A pointer to this function can be passed
		// in lieu of a pointer to the constructor.
};

This method works very well with designs that put only abstract classes in header files (a good way to control unnecessary rebuilds). Such designs leave open the question of how to call new, since the exact type must be visible at that point. The static function above can be used to wrap up and hide the subtypes.
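The following sketch shows how a pointer to such a static creation function might be passed around and used. Shape, Triangle, Square, and ShapeFactory are hypothetical names introduced for this illustration.

```cpp
#include <cassert>

// Hypothetical abstract interface: the only part clients need to see.
class Shape {
public:
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

class Triangle : public Shape {
public:
    int sides() const { return 3; }
    static Shape* create() { return new Triangle; }  // stands in for the constructor
};

class Square : public Shape {
public:
    int sides() const { return 4; }
    static Shape* create() { return new Square; }
};

// Pointer to a creation function; this is what gets passed around.
typedef Shape* (*ShapeFactory)();

// Creates an object through the pointer and reports its number of sides.
int sides_of_new(ShapeFactory make) {
    Shape* s = make();
    int n = s->sides();
    delete s;
    return n;
}

void demo_factory_pointer() {
    assert(sides_of_new(&Triangle::create) == 3);
    assert(sides_of_new(&Square::create) == 4);
}
```

The caller of sides_of_new never names a concrete type, which is what allows the subtypes to stay out of the header files.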

Bitwise copies are unacceptable for classes that use dynamically allocated memory
In C++, if you do not provide a copy constructor, the compiler will generate one automatically. This generated copy constructor simply performs a bitwise copy of the class instance. This is fine for a class that does not contain any pointer variables. However, bitwise copies are unacceptable for classes that use dynamically allocated memory. To clarify this situation, consider an object that is passed by value to a function, or returned by value from a function. In this case, the object is copied in a bitwise fashion, which is not sufficient for objects that contain pointers to other objects (see Figure 2). When an object containing a pointer is passed by value into a function, the object is copied, including the address held in the pointer, and the new object resides within the scope of the function. At the end of the function's execution, the destructor is called to destroy the new object. If that destructor deletes the pointed-to memory, the original object is left with a pointer to freed memory, which is an error in the program. A similar error occurs when an object is returned by value from a function.

Figure 2. The automatic copy constructor that makes a bitwise copy of the class.

This problem can be avoided by creating a copy constructor for the class that allocates new memory and replicates the pointed-to object. This kind of copy is called a deep copy, where the heap memory is allocated separately for each copy.
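A deep copy can be sketched as follows. The class Buffer is a hypothetical owning string, written for illustration only. It defines the copy constructor together with the assignment operator and the destructor, since a class that needs one of them usually needs all three.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class that owns heap memory.
class Buffer {
public:
    explicit Buffer(const char* s)
        : len_(std::strlen(s)), data_(new char[len_ + 1]) {
        std::memcpy(data_, s, len_ + 1);
    }
    // Deep copy: allocate separate storage and replicate the contents.
    Buffer(const Buffer& other)
        : len_(other.len_), data_(new char[other.len_ + 1]) {
        std::memcpy(data_, other.data_, other.len_ + 1);
    }
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            char* fresh = new char[other.len_ + 1];
            std::memcpy(fresh, other.data_, other.len_ + 1);
            delete[] data_;
            data_ = fresh;
            len_ = other.len_;
        }
        return *this;
    }
    ~Buffer() { delete[] data_; }
    const char* c_str() const { return data_; }
    void set_first(char c) { if (len_ > 0) data_[0] = c; }
private:
    std::size_t len_;   // declared first, so it is initialized before data_
    char* data_;
};

// Pass by value: the parameter is a deep copy, destroyed on return.
std::size_t length_by_value(Buffer b) { return std::strlen(b.c_str()); }

void demo_deep_copy() {
    Buffer a("hello");
    assert(length_by_value(a) == 5);
    assert(std::strcmp(a.c_str(), "hello") == 0);   // original is still valid
    Buffer b = a;
    b.set_first('j');
    assert(std::strcmp(a.c_str(), "hello") == 0);   // copies do not share storage
    assert(std::strcmp(b.c_str(), "jello") == 0);
}
```

With the compiler-generated copy constructor instead, the call to length_by_value would leave a holding a pointer to memory that the parameter's destructor has already freed.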

The compiler can select a coercion constructor implicitly
Since the compiler can select a coercion constructor implicitly, you relinquish control over what functions are called and when. If it is essential to retain full control, do not declare any constructors that take a single argument; instead, define helper functions to perform conversions, as in the following example:


#include <stdio.h>
#include <stdlib.h>
class Money
{
public:
    Money() : _amount( 0.0 ) { }
    // Define conversion functions that can only be
    // called explicitly.
    static Money Convert( const char * ch )
    { return Money( ch ); }
    static Money Convert( double d )
    { return Money( d ); }
    void  Print() { printf( "\n%f", _amount ); }
private:
    // Private coercion constructors: not usable for implicit conversion
    Money( const char *ch ) { _amount = atof( ch ); }
    Money( double d ) { _amount = d; }
    double _amount;

};

int main()
{
    // Perform a conversion from type const char *
    // to type Money.
    Money Account = Money::Convert( "57.29" );
    Account.Print();
    // Perform a conversion from type double to type
    // Money.
    Account = Money::Convert( 33.29 );
    Account.Print();
    return 0;
}

In the previous code, the coercion constructors are private and cannot be used in implicit type conversions. However, they can be invoked explicitly by calling the Convert functions. Because the Convert functions are static, they are accessible without referencing a particular object.

CONCLUSION
It should be clear that the mentioned points are applicable to ANSI C++ to the best of our knowledge. Note that there are many compilers that add their own extra syntax rules in addition to ANSI C++. These points are sensitive to different compilers. It was observed that many modern compilers handle these points incorrectly. The purpose of the explored points is to take precautions during the compiler construction. This clarification will also help to remove any ambiguity in the standardization of C++.

REFERENCES

1. Stroustrup, Bjarne. The C++ Programming Language, 3rd ed., Addison-Wesley, Reading, MA, 1997.
2. Ellis, Margaret and Bjarne Stroustrup. The Annotated C++ Reference Manual, Addison-Wesley, Reading, MA, 1990.
3. Stroustrup, Bjarne. The Design and Evolution of C++, Addison-Wesley, Reading, MA, 1994.
4. Murray, Robert B. C++ Strategies and Tactics, Addison-Wesley, Reading, MA, 1993.
5. Farres-Casals, J. "Proving Correctness of Constructor Implementations," Mathematical Foundations of Computer Science 1989 Proceedings.
6. Breymann, Ulrich. Designing Components with the C++ STL, Addison-Wesley, Reading, MA, 1998.
7. Lippman, Stanley and Josée Lajoie. C++ Primer, 3rd ed., Addison-Wesley, Reading, MA, 1998.
8. Skelly, C. "Getting A Handle On The New-Handler," C++ Report, 4(2):1–18, February 1992.
9. Coggins, J. M. "Handling Failed Constructors Gracefully," C++ Report, 4(1):20–22, January 1992.
10. Sabatella, M. "Lazy Evaluation of C++ Static Constructors," SIGPLAN Notices, 27(6):29–36, June 1992.
11. Eckel, B. "Virtual Constructors," C++ Report, 4(4):13–16, May 1992.
12. Coplien, James O. Advanced C++: Programming Styles and Idioms, Addison-Wesley, Reading, MA, 1992.