Book Excerpt: streambuf: The Stream Buffer Classes
- By Angelika Langer and Klaus Kreft
- February 10, 2000
excerpted from pages 84-109 of Standard C++ IOStreams and Locales,
by Angelika Langer and Klaus Kreft
© 2000 by Addison Wesley Longman Inc.
Reproduced by permission of Addison Wesley Longman. All rights reserved.
You might wonder why on earth we devote 10+ pages of our book to the guts of stream buffers, which seem to be nothing more than an implementation detail of the IOStreams library. For an answer, let us quote Jerry Schwarz, the "inventor" of IOStreams (quote taken from the foreword):
"A major goal in my original design was that it be extensible in interesting ways. In particular, in the stream library the streambuf class was an implementation detail, but in the iostream library I intended it to be a usable class in its own right. I was hoping for the promulgation of many streambufs with varied functionality. I wrote a few myself, but almost no one else did. I answered many more questions of the form "how do I make my numbers look like this" than "how do I write a streambuf". And textbook authors also tended to ignore streambufs. Apparently they did not share my view that the architecture of the input/output library was an interesting case study."
Essentially, we agree with Jerry and find that the architecture of IOStreams is an interesting case study of an extensible framework. Within this framework, the stream buffer abstraction is much more than an irrelevant implementation detail. Deriving from the stream buffer base class basic_streambuf is a major extension point in IOStreams. It allows connection of user-specific devices to the IOStreams framework in such a way that the stream layer's rich functionality of formatting and parsing can be reused together with the newly connected device. Not only can additional hardware be made accessible through the stream buffer interface, but software abstractions can also be hooked into the IOStreams framework. In fact, every abstraction that exhibits stream-like behavior and can serve as a source or sink of characters can be seen as an external device. Examples include:
- output to a certain window in a graphical user interface for display of trace information,
- input and output to a socket or shared memory for communication between two processes,
- a special wrap-around file abstraction for logbook purposes, or
- any kind of filter functionality, such as skipping comments on input or insertion of line counts on output.
The list of conceivable extensions is endlessly long, and naturally you will find a section in the book that goes into the practical details of deriving concrete stream buffer classes. As Jerry said: It's an interesting case study.
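To give a first impression of such an extension, here is a minimal sketch of a filter that inserts line counts on output. The class name LineNumberBuf and the helper number_lines_demo() are our own inventions for illustration; the sketch assumes an unbuffered design in which every character passes through overflow(), which is only one of several possible ways to write such a filter.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: forwards every character to another stream buffer
// and inserts a line count at the start of each output line.
class LineNumberBuf : public std::streambuf {
public:
    explicit LineNumberBuf(std::streambuf* dest) : dest_(dest) {
        assert(dest_ != nullptr);
    }

protected:
    // No put area is set up, so every inserted character reaches overflow().
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);          // nothing to write
        if (at_line_start_) {
            const std::string prefix = std::to_string(++line_) + ": ";
            const auto len = static_cast<std::streamsize>(prefix.size());
            if (dest_->sputn(prefix.data(), len) != len)
                return traits_type::eof();
            at_line_start_ = false;
        }
        if (traits_type::to_char_type(c) == '\n')
            at_line_start_ = true;
        return dest_->sputc(traits_type::to_char_type(c));
    }

private:
    std::streambuf* dest_;
    int line_ = 0;
    bool at_line_start_ = true;
};

// Formatting is done by an ordinary ostream; the filter only sees characters.
std::string number_lines_demo() {
    std::stringbuf sink;
    LineNumberBuf filter(&sink);
    std::ostream out(&filter);
    out << "alpha\n" << 42 << "\n";
    return sink.str();
}
```

Note that the integer 42 is formatted by the stream layer before the filter ever sees it; the filter itself knows nothing about numbers.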
The Stream Buffer Classes
The stream buffer classes represent the abstraction of a connection to an external device. The main task of the stream buffer is transport of characters to and from this external device, and buffering of these characters in an internal buffer.
The external devices are seen as sequences of characters. In the following, we will therefore simply talk of sequences when we mean the abstraction of an external device.
Stream buffers are used by streams for actual transport of characters to and from a device, whereas the streams themselves are responsible for parsing and formatting the text input and output.
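This division of labor can be observed with the standard classes alone. The following sketch, in which the function name format_into_buffer() is ours, puts an ostream on top of a string stream buffer: the stream does the formatting, and the buffer merely receives the resulting characters.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// The ostream formats; the stream buffer only transports characters.
std::string format_into_buffer() {
    std::stringbuf buf;            // the "device": an in-memory sequence
    std::ostream out(&buf);        // the stream layer on top of it
    out << std::setw(5) << std::setfill('0') << 42 << ' ' << std::hex << 255;
    assert(out.rdbuf() == &buf);   // the stream merely refers to the buffer
    return buf.str();
}
```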
Class Hierarchy
Like the stream classes, the stream buffer classes are organized in a class hierarchy (see figure 2-5).
Figure 2-5: The stream buffer classes.
Class basic_streambuf acts as an "abstract" stream buffer class. All concrete stream buffer classes, such as file stream buffers and string stream buffers, are derived from that "abstract" stream buffer class. The concrete stream buffer classes encapsulate knowledge that is specific to the external device connected to the stream, whereas the stream buffer base class is independent of the specific device and defines the general buffering and transport interface and functionality that has to be provided by a stream buffer.
Class basic_streambuf is an "abstract" base class in the sense that no instances of this class can be constructed. Its constructor is protected and accessible only to derived classes. A number of member functions are virtual and meant to be overridden by a derived class. However, none of the virtual functions is pure virtual. Rather, all virtual member functions implement a sensible default behavior so that they need not be overridden in a derived class if the default behavior already meets the derived class's needs.
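A small experiment shows what these defaults amount to. In the following sketch, NullBuf is a hypothetical class of ours that derives from the base class only to gain access to the protected constructor and overrides nothing; all input, output, and putback requests then report failure.

```cpp
#include <cassert>
#include <streambuf>

// Deriving is the only way to reach the protected constructor.
// Nothing is overridden, so every virtual function keeps its default.
struct NullBuf : std::streambuf {};

bool defaults_report_failure() {
    NullBuf buf;
    const auto eof = std::streambuf::traits_type::eof();
    const bool no_input   = buf.sgetc() == eof;      // default underflow()
    const bool no_output  = buf.sputc('x') == eof;   // default overflow()
    const bool no_putback = buf.sungetc() == eof;    // default pbackfail()
    assert(buf.pubsync() == 0);                      // default sync() does nothing
    return no_input && no_output && no_putback;
}
```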
The file buffer classes basic_filebuf allow input and output to files. They have additional member functions open() and close() that are necessary for file handling, and they override several virtual functions that perform the actual transport of characters to and from the file.
The string stream buffer classes basic_stringbuf implement the in-memory I/O, i.e., they associate the input and output sequences with a memory location.
The following sections describe first the principles of the stream buffer abstraction in general and then the concrete mechanisms for each of the derived stream buffer classes. We concentrate on the main functionality of stream buffers, namely input, output, and putback. Other aspects such as positioning and locale management are omitted, but can be looked up in the reference part of this book if needed.
The Stream Buffer Abstraction
Two character sequences are associated with a stream buffer: the input sequence and the output sequence, which represent the external device. Internally a stream buffer maintains a character array for buffering the input and/or output sequence. If the entire sequence does not fit into this character array, which naturally is of limited length, the buffer represents a subsequence of the input and/or output sequence. This way the internal buffer can be seen as a window to the input and/or output sequence (see figure 2-6).
Figure 2-6: The stream buffer represents a subsequence of the external character sequence.
The input (sub)sequence, which is kept in the character array, is called the get area; the output (sub)sequence is called the put area. Each (sub)sequence, the input as well as the output sequence, is described by three pointers: (1) the begin_pointer, which is the address of the lowest element in the area, (2) the next_pointer, which is the address of the element that is the next candidate for reading or writing, and (3) the end_pointer, which is the address of the next element beyond the end of the area.
If an area is not available, the next_pointer is null. The way in which input and output areas are related is unspecified for the stream buffer base class. All you know is that there are two areas, each of which is described by three pointers and represents a (sub)sequence of the external device.
The interface of the stream buffer base class falls into three parts:
- Public. These functions are used by the streams for implementing their functionality on top of the stream buffer.
- Protected nonvirtual. These functions are used for implementing the stream buffer's public interface.
- Protected virtual. These functions are meant to be overridden by any derived stream buffer classes.
The protected virtual interface of a stream buffer class provides operations that access the external character sequence.1 Such operations
- perform reads directly on the associated input (sub)sequence (xsgetn(), underflow(), uflow(), etc.), or
- perform writes directly on the associated output (sub)sequence (xsputn(), overflow()),
- make put back positions available in the input (sub)sequence (pbackfail()), and
- alter "the stream position" and conversion state (seekoff(), seekpos()).
The protected nonvirtual interface of a stream buffer class provides operations that manipulate one or both of the internal sequences. Such operations
- retrieve the values of the pointers (the get area's begin_, next_, and end_pointer via eback(), gptr(), egptr(), and the put area's begin_, next_, and end_pointer via pbase(), pptr(), epptr()),
- alter the value of the pointers (by assigning new pointers via setg(), setp(), or by incrementing the next_pointer via gbump(), pbump()).
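The pointer operations are easiest to understand in a derived class that simply declares an existing character array to be its get area. The following is a minimal sketch; the class ArrayBuf and its members consumed(), remaining(), and skip() are hypothetical names that we introduce only to make the protected accessors visible.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>

// Hypothetical read-only stream buffer over an existing character array.
class ArrayBuf : public std::streambuf {
public:
    ArrayBuf(char* data, std::size_t size) {
        setg(data, data, data + size);    // begin, next, end of the get area
    }
    // Expose the protected accessors for demonstration purposes only.
    std::ptrdiff_t consumed() const  { return gptr() - eback(); }
    std::ptrdiff_t remaining() const { return egptr() - gptr(); }
    void skip(int n) { gbump(n); }        // advance the next pointer
};

std::ptrdiff_t get_area_demo() {
    char text[] = "Hello World";
    ArrayBuf buf(text, sizeof text - 1);
    assert(buf.consumed() == 0 && buf.remaining() == 11);
    buf.sbumpc();                         // consumes 'H'
    buf.skip(4);                          // skips "ello"
    assert(buf.sgetc() == ' ');           // next candidate for reading
    return buf.remaining();               // " World" is left
}
```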
The public interface is built on top of the protected interface and is used by the stream layer to implement its operations. The stream buffer's public interface includes operations for extraction and insertion of characters from/to the get/put area, stream positioning, and other functionality:
- extract characters from the get area (sgetc(), sgetn(), sbumpc(), etc.)
- insert characters to the put area (sputc(), sputn(), etc.)
- put back characters to the get area (sputbackc(), sungetc())
- stream positioning (pubseekoff(), pubseekpos()).
In addition to the functions mentioned above, stream buffers have a couple of other member functions. Only the most important and typical functions are listed above. For a complete description of the stream buffer base class's interface, see the reference section. Also, section 3.4, Adding Stream Buffer Functionality, provides more details on the protected interface.
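As a brief illustration of the public interface, the following sketch uses a string stream buffer directly, without any stream on top of it. The function name public_interface_demo() is ours.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

std::string public_interface_demo() {
    std::stringbuf buf;
    buf.sputn("Hello World", 11);                  // insert a character sequence
    buf.sputc('\n');                               // insert a single character

    char word[5];
    const std::streamsize n = buf.sgetn(word, 5);  // extract five characters
    assert(n == 5);

    // Move the read position back to the start and extract one character again.
    buf.pubseekoff(0, std::ios_base::beg, std::ios_base::in);
    const char first = static_cast<char>(buf.sbumpc());

    return std::string(word, 5) + '/' + first;
}
```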
A note on the stream buffer classes' constructors and destructors:
- The stream buffer base class's destructor is public and virtual, as is usual for a class that is designed to serve as a base class.
- The stream buffer base class has only one constructor, which is a protected default constructor. This is to ensure that only derived stream buffer objects may be constructed. The concrete stream buffer classes, of course, have public constructors.
Neither the copy constructor nor the copy assignment for any of the stream buffer classes is specified by the standard. In particular, it is not required that they are inaccessible. They will most likely not be implemented at all, which means that the compiler-generated default functionality for copying and assignment will apply. As a consequence, stream buffers, which contain pointers to their get and put areas, can be copied and assigned, meaning that the internally held pointers will be copied. Two stream buffer objects that are copies of each other would operate on the same character array without any coordination. The results are likely to be unpredictable. For this reason, avoid inadvertent copies or assignments of stream buffer objects.
Let's return to the stream buffer's core functionality and look at the principles of handling character input and output in the stream buffer classes.
EXTRACTING INPUT FROM THE INPUT SEQUENCE
A character can be requested from the input sequence by calling the stream buffer's public member function sgetc(). If the get area exists and is not empty, i.e., next_pointer != 0 and next_pointer < end_pointer, the next character from the get area is returned. If the get area does not exist or is empty, the protected virtual member function underflow() is called.
Alternatively, a character can be requested from the input sequence via the stream buffer's public member function sbumpc(). In addition to the functionality of sgetc(), namely, extraction of a character from the input sequence, sbumpc() also advances the read position. The effect is that the character extracted after a call to sbumpc() is the character at the next read position, whereas after a call to sgetc() the same character will be returned again. Roughly speaking, sgetc() means "looking at the available character" and sbumpc() means "consuming the available character." If the get area does not exist or is empty, then sbumpc() invokes the protected virtual member function uflow(), which is the counterpart to underflow() in the case of sgetc().
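The difference between "looking" and "consuming" can be seen in a few lines. The sketch below uses a string stream buffer; the function name peek_versus_consume() is ours.

```cpp
#include <cassert>
#include <sstream>
#include <string>

std::string peek_versus_consume() {
    std::stringbuf buf("abc");
    std::string seen;
    seen += static_cast<char>(buf.sgetc());    // looks at 'a'
    seen += static_cast<char>(buf.sgetc());    // still 'a'
    seen += static_cast<char>(buf.sbumpc());   // consumes 'a'
    seen += static_cast<char>(buf.sgetc());    // now looks at 'b'
    assert(buf.sgetc() == 'b');                // looking again changes nothing
    return seen;
}
```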
In the stream buffer base class basic_streambuf, the virtual function uflow() is implemented in terms of underflow(): It invokes underflow() and increments the get area's next pointer. This is a sensible default behavior that works nicely for stream buffers that have an internal character buffer. In fact, neither file buffers nor string buffers override this default behavior of uflow(), but only redefine underflow(). For this reason, we focus on the functionality of underflow() in the rest of this section.
The general purpose of underflow() is to make additional characters from the external sequence available in the internal buffer; in other words, it fills (all or part of) the get area with characters taken from the external device.
BASE CLASS. For the stream buffer base class, basic_streambuf, underflow() is in a nonoperational mode; its implementation returns traits::eof(), which indicates that the end of the stream is reached. Any useful behavior of underflow() fully depends on the characteristics of the external device, and underflow() is well defined for the derived stream buffer classes, which redefine this virtual function. The functionality of uflow() is that of underflow() plus advancing the read position.
STRING BUFFER. A string buffer cannot make additional characters available from an external device, because string streams are not connected to an external character sequence.2 A string stream buffer can make characters available for reading only when they have previously been stored in the internal buffer, for instance, as a result of a previous output operation. Such characters are made accessible by adjusting the get area pointers; more precisely, the get area's end pointer must be moved forward to include additional positions. This pointer adjustment can be done in underflow() or uflow() as part of an input operation. Alternatively, it can be performed during overflow() as part of an output operation. The standard allows both implementations.
FILE BUFFER. A file buffer's underflow() function makes additional characters available by reading new characters from the file. It then converts them to the internal character representation (if necessary), writes the result of the conversion into the get area, and returns the first newly read character.
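The principle of refilling the get area can be sketched without a real file. In the following example the "external device" is simply a string, and the internal buffer is deliberately tiny so that underflow() is called repeatedly. The class ChunkedInputBuf, its member refills(), and the helper read_all_chunked() are hypothetical; no code conversion takes place in this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical input buffer: the "device" is a std::string whose content is
// copied into a four-character internal buffer, one chunk at a time.
class ChunkedInputBuf : public std::streambuf {
public:
    explicit ChunkedInputBuf(std::string device) : device_(std::move(device)) {}
    int refills() const { return refills_; }

protected:
    int_type underflow() override {
        if (gptr() < egptr())                        // get area not exhausted
            return traits_type::to_int_type(*gptr());
        assert(pos_ <= device_.size());
        const std::size_t n = std::min(sizeof buffer_, device_.size() - pos_);
        if (n == 0)
            return traits_type::eof();               // device is exhausted
        std::copy_n(device_.data() + pos_, n, buffer_);
        pos_ += n;
        ++refills_;
        setg(buffer_, buffer_, buffer_ + n);         // new window on the sequence
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string device_;
    std::size_t pos_ = 0;
    char buffer_[4];
    int refills_ = 0;
};

// Reads everything via sbumpc(); the default uflow() calls our underflow().
std::pair<std::string, int> read_all_chunked(const std::string& text) {
    ChunkedInputBuf buf(text);
    std::string result;
    const auto eof = std::streambuf::traits_type::eof();
    for (auto c = buf.sbumpc(); c != eof; c = buf.sbumpc())
        result += static_cast<char>(c);
    return {result, buf.refills()};
}
```

Only underflow() is overridden; uflow() keeps its default implementation, just as described for file buffers and string buffers above.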
INSERTING OUTPUT TO THE OUTPUT SEQUENCE
A character is written to the output sequence via the public member function sputc(). As an argument to sputc() the stream buffer receives a character to be inserted into the output sequence. If the put area exists and is not already full, i.e., next_pointer != 0 and next_pointer < end_pointer, then the character is put to the position the next_pointer is referring to, and the next_pointer is incremented. If the put area does not exist or is full, then the protected member function overflow() is called, taking the character as an argument.
The general notion of overflow() is to make positions in the internal buffer available by writing characters to the external sequence; in other words, it empties (all or part of) the put area by writing characters to the external device. If the character received as an argument to overflow() does not equal end-of-file, this character is placed into the "fresh" internal buffer; otherwise no additional character is placed into the put area.
BASE CLASS. For the stream buffer base class, basic_streambuf, overflow() is in a nonoperational mode; its implementation returns traits::eof(), which indicates failure. Any useful behavior of overflow() fully depends on the characteristics of the external device, and overflow() is well defined for the derived classes, which override this virtual function.
STRING BUFFER. String buffers make positions in their internal buffer available by extending the buffer. The overflow() function reallocates a new, larger character array. Then the character passed to overflow() as an argument is added to the put area, and the get area's end pointer might be adjusted to include this new character.3
FILE BUFFER. A file buffer makes positions in its internal buffer available by writing to the external file. To be precise, it converts the characters contained in the put area to the external character representation (if necessary) and writes the result of the conversion to the file. After that it puts the character that was received as an argument to overflow() into the (fully or partly) emptied put area, unless it was equal to end-of-file.
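The same idea can be sketched for a made-up device. In the following example the "external device" is a string, and the put area holds only eight characters so that overflow() is easy to trigger. The class SmallOutputBuf, its member flushes(), and the helper overflow_demo() are hypothetical names; a real file buffer would additionally perform code conversion and error handling.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical output buffer: the "device" is a std::string, and the put
// area holds only eight characters.
class SmallOutputBuf : public std::streambuf {
public:
    explicit SmallOutputBuf(std::string& device) : device_(device) {
        setp(buffer_, buffer_ + sizeof buffer_);
    }
    ~SmallOutputBuf() override { flush(); }
    int flushes() const { return flushes_; }

protected:
    int_type overflow(int_type c) override {
        flush();                                      // empty the put area
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);   // store the pending character
            pbump(1);
        }
        return traits_type::not_eof(c);
    }
    int sync() override { flush(); return 0; }

private:
    void flush() {
        assert(pptr() <= epptr());
        if (pptr() == pbase()) return;                // nothing buffered
        device_.append(pbase(), pptr());              // "write" to the device
        ++flushes_;
        setp(buffer_, buffer_ + sizeof buffer_);      // reset begin/next/end
    }

    std::string& device_;
    char buffer_[8];
    int flushes_ = 0;
};

int overflow_demo(std::string& device) {
    SmallOutputBuf buf(device);
    std::ostream out(&buf);
    out << "Hello World\n";     // 12 characters, but the put area holds 8
    out.flush();                // calls pubsync(), which calls sync()
    return buf.flushes();
}
```

The first transfer to the device happens in overflow() when the ninth character arrives; the second one is caused by the explicit flush.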
PUTTING BACK CHARACTERS TO THE INPUT SEQUENCE
The stream buffer's public interface provides two functions for putting back characters to the input sequence: sputbackc() and sungetc().
sputbackc() receives a character as an argument. This character is to be put back to the input sequence, that is, stored in the input sequence before the current read position. If the get area exists and characters have already been read, i.e., next_pointer != 0 and begin_pointer < next_pointer, a putback position is available. In this case, the character is stored in the position before the one the next_pointer currently refers to, and the next_pointer is decreased so that it points to this previous position.
sungetc() does not take an argument, but simply decrements the current read position, which has the effect of putting back the previously extracted character. No actual write access to the input sequence takes place, only the next_pointer is moved one position back.
pbackfail() is a protected member function that stores a character at the previous position in the input sequence and makes available additional putback positions.
Both sputbackc() and sungetc() call the protected member function pbackfail(). sungetc() uses the second functionality of pbackfail() and invokes it, if the get area does not exist (i.e., next_pointer == 0) or if no putback position is available (i.e., begin_pointer == next_pointer). sputbackc() uses both features of pbackfail(). It invokes pbackfail() when no putback position is available and when a character is put back that is different from the previously extracted one, that is, when an actual write access to the input sequence is required.
The general notion of pbackfail() is: (1) to store the character received as argument at the previous position, and to adjust the get area pointers so that the next read request will return the character that was put back, and (2) to make a putback position available in the internal buffer.
BASE CLASS. For the stream buffer base class basic_streambuf, pbackfail() is in a nonoperational mode; its implementation returns traits::eof(), which indicates failure. Any useful behavior fully depends on the characteristics of the external device, and pbackfail() is well defined for the derived classes, which override this virtual function.
STRING BUFFER. For string stream buffers, only the functionality (1) of pbackfail(), storing a character in the input sequence, is implemented. The next_pointer is decreased, and if the character to be put back is not the previously extracted one, the new character is stored at that position.
Functionality (2), making available additional putback positions, does not make sense for a string stream buffer. Putback positions are available only if characters have previously been extracted from the string. When there are no previously extracted characters, pbackfail() cannot make any available either.
FILE BUFFER. For file stream buffers, functionality (1), storing a character in the input sequence, is implemented in the same way as for string stream buffers. The next_pointer is decreased, and if the character to be put back is not the previously extracted one, the new character is stored at that position. A file buffer might fail to actually store the character, because the associated file was opened only for input and does not allow write access.
Functionality (2), making available additional putback positions, is implementation-dependent. For a file stream buffer it is conceivable that additional putback positions are made available by reloading characters from the external file. The standard, however, does not specify any implementation details.4
The subsequent two sections describe the behavior of string buffers and file buffers in terms of an example. We explain in detail how input and output sequence, the internal character buffer, and the get and put areas are related to each other for these two derived classes. The third section describes the principle of the putback area, which is basically the same for string buffers and file buffers.
In order to show the principles, we make assumptions about the implementation of these classes. Standard compatible implementations, however, are allowed to differ and may work in a slightly different way than demonstrated in the following. Still, the general principles will be the same. The implementations of string buffers and file buffers override the virtual functions discussed above in order to achieve the results that we are going to describe. In the following, we do not aim to explain exactly how each of the virtual functions is redefined, but we intend to explain the overall net effect. Details of how to redefine which of the virtual functions, and under which circumstances, are discussed in section 3.4, Adding Stream Buffer Functionality.
String Stream Buffers
A string stream buffer maintains an internal buffer that is large enough to hold the entire external sequence; the get area contains the entire input sequence, and the put area represents the entire output sequence.
The get and put areas are related and available simultaneously. Figure 2-7 shows a typical situation:
Figure 2-7: Get and put area of a string stream buffer.
In this example the capacity of the internal character buffer is 16 characters, which is utterly unrealistic for real implementations. We do this on purpose, in order to keep the example simple yet demonstrate the crucial case of what happens if the buffer is full or empty.
The character sequence Hello World\n has been written to the output sequence, and the pointers of the put area are assigned in the following way:
- begin_pointer to the beginning of the character array
- next_pointer to the next empty position behind the text written to the output sequence
- end_pointer to the next position behind the character array
The character sequence Hello has already been read from the input sequence, and the pointers of the get area are assigned in the following way:
- begin_pointer to the beginning of the character array
- next_pointer to the next position behind the text already read from the input sequence
- end_pointer to the same position as the put area's next_pointer, because it is not possible to read text that has not already been written
Let us discuss the effect of input and output operations on the string stream buffer starting from the situation described above.
OUTPUT
"NORMAL" SITUATION. In this situation we write an additional character to the string stream buffer. The put area's next_pointer refers to the next available position in the put area. Hence the additional character is put to the position the put area's next_pointer refers to. Afterwards the next_pointer is incremented, so that it points to the next available position.
"OVERFLOW" SITUATION. If we keep on adding characters to the string stream buffer, the put area will eventually be full. When the internal buffer is full, the put area's next_pointer points to the end of the buffer area, i.e., next_pointer == end_pointer. Figure 2-8 illustrates this situation:
Figure 2-8: String stream buffer is full after output.
This situation is special, because the internal buffer is full. If we want to write an additional character, the string stream buffer needs to make available a new position in the put area. This is achieved by calling overflow(). The function overflow() acquires a new character array that can hold more characters. Figure 2-9 shows the situation after the call to overflow():
Figure 2-9: String stream buffer after call to overflow().
Afterwards the new character is put into the new position in the put area and the put area's next_pointer is incremented as always.
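How often a real string buffer has to grow depends on the implementation, but the fact that it grows via overflow() can be observed. The following sketch uses a hypothetical probe class CountingStringBuf that counts the calls and delegates the actual work to the library; it assumes nothing about the growth strategy.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical probe: counts how often the string buffer's overflow() runs.
class CountingStringBuf : public std::stringbuf {
public:
    int overflows() const { return overflows_; }

protected:
    int_type overflow(int_type c) override {
        ++overflows_;
        return std::stringbuf::overflow(c);   // let the library grow the buffer
    }

private:
    int overflows_ = 0;
};

bool stringbuf_grows_on_demand() {
    CountingStringBuf buf;
    const std::string text(10000, 'x');
    for (char c : text)
        buf.sputc(c);
    assert(buf.str() == text);                // nothing was lost while growing
    return buf.overflows() > 0;               // exact count is implementation-specific
}
```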
INPUT
During all these output operations on the string stream buffer the get area basically did not change. After the reallocation of the internal buffer due to overflow(), the get area's pointers are reassigned to the same positions relative to each other.
"NORMAL" SITUATION. If we read a character from the string stream buffer, we receive the character that the get area's next_pointer refers to. Considering the situation in figure 2-10, this is a whitespace character. Afterwards the get area's next_pointer is incremented.
Figure 2-10: String stream buffer before input.
"UNDERFLOW" SITUATION. Let us assume that we keep on extracting characters from the string stream buffer and there is no intervening insertion; i.e., the put area does not change. We will ultimately reach the end of the get area, i.e., next_pointer == end_pointer, as shown in figure 2-11.
Figure 2-11: String stream buffer with exhausted get area.
If we now try to read a new character from the string stream buffer, underflow() is called in order to make additional characters available for reading. underflow() adjusts the get area's end_pointer so that it points to the same position as the put area's next_pointer. In this way, all previously written characters are made available for subsequent read attempts. If all previously written characters have already been read, i.e., the get area's end_pointer already equals the put area's next_pointer, underflow() fails. In the situation shown in figure 2-11, additional characters can be made available, and underflow() adjusts the get area's end_pointer as shown in figure 2-12:
Figure 2-12: String stream buffer after call to underflow().
DISCLAIMER. The model explained above is just one of many ways to implement a string buffer. As an alternative, overflow() could allocate a new buffer area that holds exactly one additional position and adjusts not only the put area's pointers but also the get area's end_pointer. In this way each character written is immediately available for reading, without any pointer adjustment performed via underflow() as in the example above. In this alternative model, underflow() need not be redefined at all. Naturally, this solution is less efficient than the one described before, because the internal character buffer is always full and must be reallocated for each single character written to the string stream.
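Whichever model an implementation chooses, the observable effect is the same: characters written after the get area has been exhausted become readable. A minimal sketch, with a function name of our own choosing:

```cpp
#include <cassert>
#include <sstream>
#include <string>

std::string read_after_late_write() {
    using traits = std::stringbuf::traits_type;
    std::stringbuf buf;
    std::string seen;

    buf.sputn("Hi", 2);
    seen += static_cast<char>(buf.sbumpc());   // 'H'
    seen += static_cast<char>(buf.sbumpc());   // 'i'
    assert(buf.sgetc() == traits::eof());      // everything written has been read

    buf.sputc('!');                            // a later write ...
    seen += static_cast<char>(buf.sbumpc());   // ... is available for reading
    return seen;
}
```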
PUTBACK
Figure 2-13 shows a typical situation in which a number of characters have already been read from the input sequence. In this situation, characters can be put back to the input sequence.
Figure 2-13: String stream buffer before putback.
Only the get area is relevant to our discussion of the putback support; the pointers of the put area are not affected at all. The string Hello has been extracted, and the get area's next_pointer points to the next available read position. If a character is now requested via sbumpc(), the next character (the blank between Hello and World\n) is extracted and afterwards the next_pointer points to the character W.
"NORMAL" SITUATION. Let us see what happens if we then call sungetc(), with the intention of putting back the just extracted character, which was the blank. In this case the get area's next_pointer is simply decremented and points to the blank again. The next extraction would again return the blank character, which means that the previous extraction was reversed by the call to sungetc(). A further call to sungetc() would decrement the next_pointer even further and make available the character o for a subsequent read operation.
"PBACKFAIL" SITUATION. What if, in that situation, sputbackc('l') is called instead of sungetc()? The function sungetc() is supposed to make available the character o, whereas sputbackc('l') should overwrite the character o and put back the character l in its position. As the character that is put back is different from the character that was extracted from this position, the function pbackfail() is called, and pbackfail() performs the write access to the get area and overwrites the character o.5 The situation after sputbackc('l') looks like the one in figure 2-14.
Figure 2-14: String stream buffer after putting back the character l.
ANOTHER "PBACKFAIL" SITUATION. We can keep on putting back characters via sputbackc() or sungetc() until we hit the beginning of the get area, as shown in figure 2-15. The next attempt to put back a character triggers pbackfail(), which is supposed to make further putback positions available. The get area's next_pointer cannot be decremented any further, and pbackfail() indicates failure. Only if characters are read from the get area will putback positions become available again.
Figure 2-15: String stream buffer with no putback position available.
Note that the put area's pointers are not affected by any of the putback operations. However, overwriting characters in the get area by means of sputbackc() changes the content of the internal buffer, much like an output operation. The modifications will be visible when the content of the string buffer is retrieved via str(), for instance.
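The putback scenarios discussed in this section can be replayed with a standard string stream buffer. The following sketch follows the example step by step; the function name putback_demo() is ours.

```cpp
#include <cassert>
#include <sstream>
#include <string>

std::string putback_demo() {
    using traits = std::stringbuf::traits_type;
    std::stringbuf buf("Hello World\n");

    for (int i = 0; i < 6; ++i)
        buf.sbumpc();                       // read "Hello" and the blank
    buf.sungetc();                          // put back the blank
    assert(buf.sgetc() == ' ');

    buf.sputbackc('l');                     // differs from 'o': overwrites it
    assert(buf.sgetc() == 'l');

    // Walk back to the beginning of the get area; then putback must fail.
    while (buf.sungetc() != traits::eof()) {}
    assert(buf.sgetc() == 'H');

    return buf.str();                       // the overwrite is visible here
}
```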
File Stream Buffers
For a file buffer the internal character buffer is usually smaller than the external sequences; i.e., the internal buffer normally holds only subsequences of the external sequence as get and put areas.6
It is implementation-defined how large the internal buffer is, and whether the file stream buffer maintains two separate character arrays to represent the get and put areas respectively or a shared internal character array for both areas. The assumed sample implementation we present in the following sections is one of a variety of conceivable implementations of a file stream buffer. Your particular implementation might have implemented a different scheme.
In our assumed implementation, the file stream buffer maintains only one internal character array, which is of a fixed size and too small to hold the entire content of the external character sequences. For this reason, the internal character array holds only a subsequence of the input sequence in the get area and a subsequence of the output sequence in the put area. Logically, both the put and get areas are present simultaneously; in practice only one of them can be active at a time, because the file stream buffer has only one internal character buffer. During output operations, the internal character array represents the put area, and the get area is inactive; during input operations, the internal character array represents the get area, and the put area is inactive.
The respective inactive area does logically exist, but it may not be immediately accessible. If, for instance, the get area is active, no output operation should be triggered, because it would need access to the currently inactive put area. An output operation can only follow an input operation if the file is repositioned in between, which puts the file stream buffer into a neutral state, from which it can reactivate the put area and make its content available in the internal buffer.
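The following sketch shows the repositioning between output and input on a real file buffer. It assumes that a scratch file with the hypothetical name streambuf_demo.tmp can be created in the current working directory; the function name write_seek_read() is ours as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <ios>
#include <string>

// Writes through a file buffer, repositions, and reads back through the
// same buffer. The default path is a hypothetical scratch file.
std::string write_seek_read(const char* path = "streambuf_demo.tmp") {
    std::filebuf buf;
    const auto mode =
        std::ios_base::in | std::ios_base::out | std::ios_base::trunc;
    if (!buf.open(path, mode))
        return "<open failed>";

    buf.sputn("Hello World\n", 12);              // output: put area in use
    buf.pubseekoff(0, std::ios_base::beg);       // reposition between output and input

    char text[12] = {};
    const std::streamsize n =
        buf.sgetn(text, static_cast<std::streamsize>(sizeof text));  // input
    assert(n >= 0);
    buf.close();
    std::remove(path);                           // clean up the scratch file
    return std::string(text, static_cast<std::size_t>(n));
}
```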
Let us first explore input and output separately before we discuss the scheme for exchanging the get and put areas while switching from input to output and vice versa.
OUTPUT
Initially, neither the put nor the get area is available. An area is considered unavailable when its next_pointer is zero. The begin_pointer and the end_pointer are undefined when the next_pointer is zero; they can also be zero or have any other arbitrary value. The content of the internal character buffer is undefined, too, in this situation; it might be empty, filled with garbage, or not even allocated. Figure 2-16 shows this neutral situation.
Figure 2-16: File stream buffer in neutral state.
Any output request in that neutral situation triggers overflow(), which activates the put area, places the first character into the internal character buffer, and adjusts the put area's pointers. Afterwards, the internal buffer area is filled with the remaining characters that were passed to the output operation, and the next_pointer is advanced accordingly. Figure 2-17 shows the situation after output of the string Hello World\n.
Figure 2-17: File stream buffer after an output operation.
If we keep on writing output to the file stream buffer, the put area's next_pointer will eventually hit the end_pointer. Then overflow() is called again in order to make available additional put positions. overflow() achieves this by transferring data from the internal buffer via code conversion (if necessary) to the external file. It is implementation-dependent whether all or only parts of the data in the internal buffer are transferred to the external file. The standard requires only that overflow() make "enough" positions available in the buffer; it does not specify how many positions. For our sample implementation, we assume that the entire internal character buffer is written to the external file. Afterwards, overflow() stores the first character in the internal character buffer and adjusts the put area pointers as shown in figure 2-18.
Figure 2-18: File stream buffer immediately after overflow().
Now there is plenty of room in the put area for further output, and the output request that triggered overflow() can be completed.
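The output scheme described above can be sketched with a small user-defined stream buffer. This is only an illustration under stated assumptions, not how std::filebuf is actually implemented: the class name SketchOutBuf, the 8-character array, and the use of a std::string as a stand-in for the external file are all hypothetical, and code conversion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical sketch, not std::filebuf: an output-only stream buffer with
// one small fixed array. A std::string stands in for the external file.
class SketchOutBuf : public std::streambuf {
public:
    explicit SketchOutBuf(std::string& external) : external_(external) {}

    // The constructor never calls setp(), so pptr() is null at first:
    // this models the neutral state.
    bool put_area_active() const { return pptr() != nullptr; }

protected:
    int_type overflow(int_type c) override {
        if (pptr() != nullptr) {
            // Put area exhausted: transfer the entire internal buffer.
            external_.append(pbase(), pptr());
        }
        // Activate (or reset) the put area over the whole array.
        setp(buffer_, buffer_ + kSize);
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            // Store the character that triggered overflow().
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }
        return traits_type::not_eof(c);
    }

    int sync() override {
        if (pptr() != nullptr) {
            external_.append(pbase(), pptr());
            setp(buffer_, buffer_ + kSize);
        }
        return 0;
    }

private:
    static constexpr std::size_t kSize = 8;  // deliberately tiny
    char buffer_[kSize];
    std::string& external_;
};

// Writes the string from the figures and returns what reached the "file".
std::string demo_output() {
    std::string file;
    SketchOutBuf buf(file);
    assert(!buf.put_area_active());  // neutral state
    buf.sputn("Hello World\n", 12);  // the first character triggers overflow()
    assert(buf.put_area_active());
    assert(file == "Hello Wo");      // only one full buffer was transferred so far
    buf.pubsync();                   // flush the rest of the put area
    return file;
}
```

With the tiny buffer, the second call to overflow() happens in the middle of the string, so the transfer to the external sequence can be observed before the final flush.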
The character sequence that is transferred from the internal character buffer to the external file during overflow() is placed into successive locations on the external file starting at the current external file position. Where the external file position indicator stands depends on the circumstances.
Immediately after a file stream buffer is connected to an external file (via open()), the external file position indicator is either at the beginning of the file, which is the default situation, or at the end of the file, if the open mode included the at-end flag.
After preceding output operations (via sputc(), sputn()), the external file position indicator stands where the last output operation left it.
After an explicit repositioning of the stream position (via seekoff(), seekpos()), the external file position indicator is reset to a corresponding position in the external file.
If the open mode includes the append flag, the external file position indicator stands at the end of the file and cannot be repositioned to any other position.
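The effect of the at-end and append flags on the write position can be observed with ordinary file streams. The following is a minimal sketch; the helper names and the file path passed in are hypothetical, and the file is removed again at the end. On typical implementations the seekp() call on an append-mode stream does not fail by itself, but the subsequent output still lands at the end of the file.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Helper: read a whole file into a string.
std::string read_all(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Helper: create (or truncate) the file with the given content.
void write_all(const char* path, const char* text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << text;
}

// Append flag: output always lands at the end, even after seekp().
std::string demo_append(const char* path) {
    write_all(path, "abc");
    {
        std::ofstream out(path, std::ios::binary | std::ios::app);
        out.seekp(0);  // attempt to reposition to the beginning
        out << "XY";   // still appended
    }
    std::string result = read_all(path);
    std::remove(path);
    return result;
}

// At-end flag: the initial position is the end, but repositioning works.
std::string demo_at_end(const char* path) {
    write_all(path, "abc");
    {
        std::fstream io(path, std::ios::in | std::ios::out |
                              std::ios::ate | std::ios::binary);
        assert(static_cast<std::streamoff>(io.tellp()) == 3);  // starts at the end
        io.seekp(0);
        io << "Z";     // overwrites the first character
    }
    std::string result = read_all(path);
    std::remove(path);
    return result;
}
```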
INPUT
Input, like output, starts with a neutral situation, in which neither get nor put areas are active. Figure 2-19 shows this neutral situation.7
Figure 2-19: File stream buffer in neutral state.
An input request in this situation triggers underflow() in order to make available get positions for reading. This is achieved by transferring data from the external file via code conversion (if necessary) to the internal character buffer. It is implementation-dependent whether underflow() fills the entire internal buffer or only a part of it with characters transferred from the external file. In our sample implementation we assume that underflow() fills the entire internal buffer if possible. The get area is activated, and the get area's pointers are adjusted. Figure 2-20 shows the situation after the invocation of underflow().
Figure 2-20: File stream buffer immediately after underflow().
This is the situation after requesting the first character from the file stream via sgetc(). Had we extracted the character via sbumpc() instead of sgetc(), uflow() would have been called instead of underflow(), with basically the same result. The only difference would be that the get area's next_pointer would be advanced by one position and point to the next available read position.
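The difference between the two calls can be made visible with a small stream buffer that logs which virtual function is invoked. This is a sketch under assumptions: the class LoggingInBuf is hypothetical, it holds the whole "file" in memory, and it relies on the default uflow() of the base class, which calls underflow() and then advances the next pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical sketch: logs which virtual function serves the first read.
class LoggingInBuf : public std::streambuf {
public:
    explicit LoggingInBuf(const std::string& external) : data_(external) {}

    const std::string& log() const { return log_; }

    // How far the get area's next pointer has moved from its begin pointer.
    std::ptrdiff_t advanced() const { return gptr() ? gptr() - eback() : 0; }

protected:
    int_type underflow() override {
        log_ += "underflow ";
        if (gptr() != nullptr && gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        if (loaded_ || data_.empty()) return traits_type::eof();
        loaded_ = true;
        // Activate the get area over the in-memory "file".
        setg(&data_[0], &data_[0], &data_[0] + data_.size());
        return traits_type::to_int_type(*gptr());
    }

    int_type uflow() override {
        log_ += "uflow ";
        // The default uflow() calls underflow() and then bumps the pointer.
        return std::streambuf::uflow();
    }

private:
    std::string data_;
    std::string log_;
    bool loaded_ = false;
};

// Performs one read on a fresh buffer and reports the log plus the advance.
std::string demo_first_read(bool bump) {
    LoggingInBuf buf("Hello");
    if (bump) buf.sbumpc(); else buf.sgetc();
    return buf.log() + std::to_string(buf.advanced());
}
```

Calling demo_first_read(false) reports only underflow() and no advance; demo_first_read(true) reports uflow() followed by underflow() and an advance of one position.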
If we keep on requesting input from the file stream buffer, the get area's next_pointer will eventually hit the end of the internal buffer. underflow() or uflow() will then be triggered again. These operations discard the current content of the internal character buffer and transfer the next sequence of characters from the external file into the internal buffer.
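The repeated refill of the get area might be sketched as follows. As before, this is an illustration only: the class SketchInBuf, its 8-character array, and the std::string that plays the role of the external file are assumptions, and code conversion is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical sketch, not std::filebuf: an input-only stream buffer that
// refills one small array from a std::string standing in for the file.
class SketchInBuf : public std::streambuf {
public:
    explicit SketchInBuf(const std::string& external) : external_(external) {}

    int refills() const { return refills_; }

protected:
    int_type underflow() override {
        if (gptr() != nullptr && gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        // Take as many characters as fit, starting at the file position.
        std::size_t n = external_.size() - file_pos_;
        if (n > kSize) n = kSize;
        if (n == 0) return traits_type::eof();

        // Discard the old content and load the next subsequence.
        external_.copy(buffer_, n, file_pos_);
        file_pos_ += n;
        ++refills_;
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    static constexpr std::size_t kSize = 8;  // deliberately tiny
    char buffer_[kSize];
    std::string external_;
    std::size_t file_pos_ = 0;
    int refills_ = 0;
};

// Reads everything character by character; returns the text and the
// number of times the internal buffer had to be refilled.
std::pair<std::string, int> demo_input() {
    typedef std::streambuf::traits_type traits;
    SketchInBuf buf("Hello World, hello!!");  // 20 characters
    std::string read;
    for (traits::int_type c = buf.sbumpc();
         !traits::eq_int_type(c, traits::eof()); c = buf.sbumpc())
        read += traits::to_char_type(c);
    return std::make_pair(read, buf.refills());
}
```

With 20 characters and an 8-character array, the buffer is filled three times (8, 8, and 4 characters) before end-of-file is reported.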
The character sequence that is transferred from the external file to the internal character buffer during underflow() or uflow() is taken from successive locations on the external file starting at the current external file position. Where the external file position indicator stands depends on the circumstances.
Immediately after a file stream buffer is connected to an external file (via open()), the file position indicator is either at the beginning of the file, which is the default situation, or at the end of the file, if the open mode included the at-end flag.
After preceding input operations (via sgetc(), sbumpc()), the external file position indicator stands where the last input operation left it.
After an explicit repositioning of the stream position (via seekoff(), seekpos()), the external file position indicator is reset to a corresponding position in the external file.
SWITCHING BETWEEN INPUT AND OUTPUT
On bidirectional file streams, both input and output operations are allowed, and for this reason, a bidirectional file stream uses both the put and the get area of its file stream buffer. Switching between input and output operations must obey certain rules, which are described in section 1.4.3, Bidirectional File Streams. A brief recap:
After output, the file stream must be flushed or repositioned before any input is permitted.
After input, the file stream must be repositioned before any output is allowed, unless the preceding input operations have reached end-of-file, in which case output can immediately follow input.
In our example, where the file stream buffer has only one internal character array, which represents either the put or the get area, the file stream buffer must exchange the get and put areas with every switch between input and output operations. Again, the following explanations are based on our sample implementation; your particular implementation might work differently.
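At the level of the stream interface, the switching rules recapped above look roughly like this. The function name and the file path passed in are hypothetical; the file is created for the demonstration and removed afterwards.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Output, reposition, input, reposition, output on one bidirectional stream.
std::string demo_switching(const char* path) {
    {
        std::fstream io(path, std::ios::in | std::ios::out |
                              std::ios::trunc | std::ios::binary);
        io << "Hello World\n";       // output
        io.seekg(0);                 // reposition before any input
        std::string word;
        io >> word;                  // input stops at the blank, not at end of file
        assert(word == "Hello");
        io.seekp(0, std::ios::end);  // reposition before any output
        io << word << '\n';          // output again
    }
    std::ifstream in(path, std::ios::binary);
    std::string content((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);
    return content;
}
```

Because the extraction stops before the end of the file, the second repositioning is required by the rules; only input that has reached end-of-file could be followed by output directly.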
SWITCHING FROM OUTPUT TO INPUT
Let us assume that the last operation on the file stream buffer was an output operation, in which case the put area is active and the get area is inactive. An example is shown in figure 2-21.
Figure 2-21: File stream buffer after an output operation.
Before any input operation can follow, the file stream must be flushed or repositioned, due to the rules for file stream operations. Both operations trigger the file stream buffer to transfer the content of its internal character buffer to the external file. After this transfer, the file stream buffer is in its neutral state again, that is, both areas are inactive, as shown in figure 2-22.
Figure 2-22: File stream buffer in neutral state after flush or repositioning.
If the requested operation was a request for repositioning, the file stream buffer not only transfers the content of the internal buffer to the external file but also resets the file position indicator of the external file as requested. Resetting the external file position indicator only affects the external file but has no direct effect on the get or put areas.
An input operation following the flush or repositioning works as described earlier for input in general: The get area is not available. As a result, underflow() or uflow() is called, characters are transferred from the external file to the internal character buffer, and the get area's pointers are adjusted accordingly. The character sequence transferred from the external file starts at the current external file position. Depending on whether the preceding operation was a flush or a repositioning, the external file position is either the last write position or the position to which the file position indicator was repositioned. Figure 2-23 shows the situation after a successful input operation.
Figure 2-23: File stream buffer after an input operation.
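Where input starts after a flush, as opposed to after a repositioning, can be tried out directly on a std::filebuf. This is a minimal sketch with a hypothetical function name and file path. After the flush, the external file position is the last write position, which here is the end of the file, so the first read attempt is expected to report end-of-file; after repositioning, input starts at the requested position.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Records what the first input operation sees after a flush and after
// a repositioning, using the stream buffer interface directly.
std::string demo_flush_versus_seek(const char* path) {
    typedef std::filebuf::traits_type traits;
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::out |
                       std::ios::trunc | std::ios::binary))
        return "open failed";

    std::string seen;
    fb.sputn("Hello World\n", 12);  // output: the put area is active
    fb.pubsync();                   // flush: content goes to the external file

    // Input after the flush starts at the last write position (end of file).
    seen += traits::eq_int_type(fb.sgetc(), traits::eof()) ? "eof " : "data ";

    // Input after repositioning starts at the requested position.
    fb.pubseekoff(6, std::ios::beg);
    seen += traits::to_char_type(fb.sgetc());  // the 'W' of "World"

    fb.close();
    std::remove(path);
    return seen;
}
```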
SWITCHING FROM INPUT TO OUTPUT
After this input operation, the get area is active and the put area is inactive. The situation is exactly as shown in figure 2-23. An output operation can follow only if the input operation reached the end of the file. Otherwise, before any output operation can follow, the file stream must be repositioned.
Reaching the end of the file during input puts the file stream buffer into its neutral state, because the entire file content has been consumed, and further input is not possible without any intervening output or repositioning. For that reason, the content of the internal character buffer can be discarded and both areas deactivated. As expected, the file position indicator of the external file stands at the end of the external file in this case.
Repositioning, too, involves the file stream buffer's discarding the content of its internal character buffer and putting itself into the neutral state, in which both areas are inactive. The file position indicator of the external file is reset accordingly, which affects only the external file but has no immediate effect on the get or put areas.
No matter whether the file stream is repositioned or whether the preceding input operation has reached the end of the file, the file stream buffer is put into its neutral state, as shown in figure 2-24.
Figure 2-24: File stream buffer in neutral state after repositioning or reaching end of file during input.
An output operation in this situation works as described earlier for output in general: First, overflow() is invoked, which activates the put area. Then the respective character sequence that was passed to the output operation is stored in the internal buffer area, and the put area's pointers are adjusted. Figure 2-25 shows the situation after successful output of Hello World\n.
Figure 2-25: File stream buffer after an output operation.
DISCLAIMER. The explanations given above regarding the management of a file stream buffer's put and get areas are not to be taken literally. An implementation is free to achieve the same effect in a different way. In particular, the neutral state can be expressed in a different way, but it always exists logically. The neutral state serves as the initial state of a file stream buffer, but it is also logically reached when input operations hit the end of the file or when the stream position is reset. A file stream buffer may also put itself into the neutral state for other reasons, such as error situations. How the neutral state is expressed or how exactly an implementation of a file stream buffer uses its internal character buffer(s) to represent the put and get areas is an implementation detail left open by the standard.
PUTBACK
Putting back characters to the input sequence via sungetc() or sputbackc() can be successful only following preceding input operations. Let us consider such a situation. As a result of the preceding input operations, the get area is active, and the file stream buffer might look like the one shown in figure 2-26.
Figure 2-26: File stream buffer after an input operation.
Putting back the previously read character means decrementing the get area's next_pointer. Putting back a character different from the previously read one means decrementing the get area's next_pointer and storing the different character at that location in the internal character buffer. pbackfail() is responsible for this write access to the get area. The write access will be rejected if the file stream buffer is not connected to an open file. Figure 2-27 shows the situation after three previously read characters have successfully been put back.
Figure 2-27: File stream buffer after some putback operations.
If we keep on putting back characters, we will eventually hit the begin_pointer. Then the next_pointer cannot be decreased any further, and pbackfail() is triggered in order to make further putback positions available. What pbackfail() does in such a situation is implementation-dependent. In our example, the attempt to put back any further characters will fail, because we consider it unusual that a large number of characters is put back into the input sequence, and for that reason we do not support it. Alternatively, a file buffer implementation could make additional putback positions available by extracting previously read characters from the external file, if the underlying file system allows that.
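The three putback cases described above (same character, different character, no putback position left) can be observed through the public interface. The sketch below uses a std::stringbuf opened for input and output as a stand-in, because its get area is easy to predict; a file stream buffer may offer a different number of putback positions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads two characters, puts characters back, and returns what is then
// read from the input sequence.
std::string demo_putback() {
    typedef std::stringbuf::traits_type traits;
    const traits::int_type eof = traits::eof();

    std::stringbuf buf("abc", std::ios::in | std::ios::out);
    buf.sbumpc();  // reads 'a'
    buf.sbumpc();  // reads 'b'

    // Same character: the next pointer is simply decremented.
    const traits::int_type same = buf.sungetc();
    assert(same == traits::to_int_type('b'));

    // Different character: pbackfail() stores it in the get area.
    const traits::int_type different = buf.sputbackc('X');
    assert(different == traits::to_int_type('X'));

    // The begin pointer is reached: no further putback position.
    const traits::int_type none_left = buf.sungetc();
    assert(traits::eq_int_type(none_left, eof));

    std::string rest;
    for (traits::int_type c = buf.sbumpc();
         !traits::eq_int_type(c, eof); c = buf.sbumpc())
        rest += traits::to_char_type(c);
    return rest;
}
```

The 'a' at the first position has been replaced by 'X', so the remaining input reads as "Xbc".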
Let us discuss another situation. After successive input operations, the get area's next_pointer will eventually hit the end_pointer. Figure 2-28 shows a situation in which the get area is entirely consumed.
Figure 2-28: File stream buffer with consumed get area.
The next input operation triggers underflow() or uflow(), which then refills the internal buffer from the external file. In order to allow putback of characters even immediately after underflow() or uflow(), we can keep the first four positions in the internal character buffer reserved as putback positions in our sample implementation. The number of putback positions a file stream buffer reserves, if any, is implementation-defined. In our sample implementation, underflow() or uflow() copies the last four characters of the consumed get area to the first four locations of the internal character buffer before it fills the rest of the internal buffer with characters transferred from the external file. Figure 2-29 shows the file stream buffer after invocation of underflow().
Figure 2-29: File stream buffer after underflow(), showing the reserved putback positions.
Now it is possible to put back four characters into the get area even if it has just been refilled from the external sequence.
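A refill that preserves putback positions might be sketched like this. The class PutbackInBuf, the sizes (four putback positions in front of eight data positions), and the std::string used as the external file are assumptions made for illustration; a real file stream buffer is free to do this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <streambuf>
#include <string>

// Hypothetical sketch: underflow() keeps the last characters of the
// consumed get area in reserved positions at the front of the array.
class PutbackInBuf : public std::streambuf {
public:
    explicit PutbackInBuf(const std::string& external) : external_(external) {}

protected:
    int_type underflow() override {
        if (gptr() != nullptr && gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        std::size_t n = external_.size() - file_pos_;
        if (n > kData) n = kData;
        if (n == 0) return traits_type::eof();  // leave the old get area intact

        // Copy up to kPutback already-read characters to the reserved front.
        std::size_t keep = 0;
        if (gptr() != nullptr) {
            keep = static_cast<std::size_t>(gptr() - eback());
            if (keep > kPutback) keep = kPutback;
            std::memmove(buffer_ + (kPutback - keep), gptr() - keep, keep);
        }

        // Fill the rest of the array from the "external file".
        external_.copy(buffer_ + kPutback, n, file_pos_);
        file_pos_ += n;
        setg(buffer_ + (kPutback - keep),  // begin: first saved character
             buffer_ + kPutback,           // next: first new character
             buffer_ + kPutback + n);      // end
        return traits_type::to_int_type(*gptr());
    }

private:
    static constexpr std::size_t kPutback = 4;
    static constexpr std::size_t kData = 8;
    char buffer_[kPutback + kData];
    std::string external_;
    std::size_t file_pos_ = 0;
};

// Consumes one full get area, forces a refill, then puts back as many
// characters as possible and returns what is read afterwards.
std::string demo_reserved_putback() {
    typedef std::streambuf::traits_type traits;
    const traits::int_type eof = traits::eof();

    PutbackInBuf buf("0123456789ABCDEF");
    for (int i = 0; i < 8; ++i) buf.sbumpc();   // consume "01234567"
    const traits::int_type next = buf.sgetc();  // triggers the refill
    assert(next == traits::to_int_type('8'));

    int put_back = 0;
    while (!traits::eq_int_type(buf.sungetc(), eof)) ++put_back;
    assert(put_back == 4);                      // exactly the reserved positions

    std::string rest;
    for (traits::int_type c = buf.sbumpc();
         !traits::eq_int_type(c, eof); c = buf.sbumpc())
        rest += traits::to_char_type(c);
    return rest;
}
```

Immediately after the refill, four characters ("4567") can be put back; the fifth attempt fails because the begin pointer has been reached.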
In general, putting back characters is possible only if the get area is active, which means that for bidirectional file streams putback cannot immediately follow an output operation. The same rules as for input following output apply, that is, the file stream must be flushed or repositioned before any characters can be put back into the input sequence after an output operation.
If an output operation is performed after putting characters back into the input sequence, the entire get area, including the putback positions, is discarded to make room for the put area. As a result, any changes made to the putback positions are lost.
ACKNOWLEDGMENTS
This article is excerpted from the new book Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, © 2000 Addison Wesley Longman Inc., which contains further detailed treatments of points touched on briefly in this article, including formatted and unformatted I/O using iostreams, internationalization using locales and standard facets, techniques for user-defined I/O operations, special-purpose streams and stream buffers and user-defined facets. For the complete table of contents see http://www.awl.com/cseng/titles/0-201-18395-1/.
FOOTNOTES
1 The list of stream buffer operations is not meant to be complete. Only the most important and typical functions are listed. For a complete description of the stream buffer base class's interface, see the reference section. Also, section 3.4, Adding Stream Buffer Functionality, provides more details on the protected interface.
2 Section 2.2.3, String Stream Buffers, explains in greater detail why this is.
3 The adjustment of the get area's end pointer might alternatively be deferred to the next input operations and would then be performed during underflow().
4 Details of a typical implementation are described in section 2.2.4, File Stream Buffers.
5 Positions in the internal sequence are overwritten only if the stream buffer's open mode allows it. A stream buffer whose open mode does not include output mode will not allow any write access to the internal sequence.
6 Only in rare situations, when the file size is less than or equal to the buffer size, can the internal buffer hold the whole file.
7 Whether the initial neutral state exists in practice is implementation-defined. An implementation can also activate the get area right away and fill it with characters transferred from the external file before any actual input request.