A coder's home for Marc "Foddex" Oude Kotte, who used to be located in Enschede, The Netherlands, but now in Stockholm, Sweden!
foddex.net

C++ Lesson 3: Compiling, Compile Units And Linking

When writing C++, it is imperative to know the details of how a C++ compiler works. Though there are many differences between Unix and Microsoft platforms, the basics behind these concepts are identical. There are a few differences on a syntactical level (yes!), but you will not encounter them until you get to dynamically linked files (.dll or .so files), so they are not covered here yet.

First of all, you should be aware that, unlike scripting languages, C++ compilers convert source files into intermediate output files. All code in each file gets syntactically checked, semantically verified and then stored in an intermediate format: object files. Most compilers create a .o file (.obj with Microsoft's compiler) for each compile unit (i.e. cpp file). So if you create two cpp files named main.cpp and test.cpp, after successful compilation you'll have a main.o and a test.o file. These files are not yet in an executable form. This is very important to realize. C++ works in two phases: compiling and linking.

Compiling


The concept of compiling simply means that your code in one format (i.e. C++) is checked and then converted into a different format. This output file contains all functions, class functions and variables as you defined them in the compile unit you compiled. For example, let's create a new file test.cpp and place the following code inside:

#include <stdio.h>

void testfunction( int param1 )
{
    printf("testfunction was called with parameter %d\n", param1);
}

int counter = 0;

void callme()
{
    printf("you have called this function %d times before\n", counter );
    counter = counter + 1;
}


When you compile this file, the resulting object file contains the counter variable of type int, and two functions, void testfunction( int ) and void callme( void ). It's very important to know that type information is stored in the output file: the parameter types, and with some compilers also the result type, are encoded in each function's symbol. Next, let's create the main.cpp file for this lesson:

void testfunction( int );
void callme();

int main( int argc, char** argv )
{
    callme();
    testfunction( -1 );
    callme();
    return 0;
}


When you compile this file, the resulting object file contains the code for the int main( int, char** ) function. However, it also contains something else: a reference to a function void testfunction( int ), and a reference to a function void callme( void ). All of these "things" (i.e. variables, functions and references) are called symbols. More on this soon. As said earlier, these output files are not yet in executable form. Since C++ projects are often large, and you often wish to separate code into different files for functional or semantic reasons, this intermediate output format was devised (among many, many other reasons) so that you can recompile only small parts of the application and then re-link.
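The remark that parameter types are part of a symbol can be made concrete with overloading. The following is only a sketch, and the name describe is made up for illustration: both functions share a name, yet the compiler stores them as two separate symbols because their parameter types differ.

```cpp
#include <stdio.h>

// Hypothetical example functions, not part of the lesson's test.cpp.
// Same name, different parameter type: two distinct symbols in the object file.
int describe( int value )
{
    printf("describe(int) called with %d\n", value);
    return 1;
}

int describe( double value )
{
    printf("describe(double) called with %f\n", value);
    return 2;
}
```

Calling describe( 5 ) picks the first function and describe( 5.0 ) the second, so a reference to one of them can never be resolved by the other.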

The final step to create your executable is to link it. The program performing this task is called the "linker". In many cases this is the same executable as the compiler, called with different parameters, but the functional difference is what counts. A linker receives a set of object files - the compiler's output, which is the linker's input - and creates an output file: your executable.

Linking


The job of the linker is basically very simple. The first step it takes is to verify that no symbol appears more than once. If you define the void callme( void ) function in main.cpp as well as test.cpp, the linker will refuse to create your executable, as it wouldn't know which of the two versions it should use when it encounters an unresolved reference to the function. Next, it makes sure that all unresolved references (to either functions or variables) can be resolved! You can compile the main.cpp file without the presence of the test.cpp or test.o file just fine. However, linking main.o into an executable cannot be done without the test.o file! Please take a moment to review this concept of linking, as it is very, very important and you'll most likely run into linking issues during many phases of development.
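The split between "the compiler only needs a declaration" and "the linker needs the definition" can be sketched inside a single file. This is an illustration under assumptions: call_twice is a made-up helper, and callme returns an int here (unlike the lesson's version) purely so the effect is easy to observe.

```cpp
#include <stdio.h>

// Declaration only: this is all the compiler needs to accept calls to callme().
int callme();

// Hypothetical helper. At this point the compiler has not seen the body of
// callme(), so it just records references to it.
int call_twice()
{
    callme();
    return callme();
}

int counter = 0;

// The definition. It could just as well live in another object file; it is
// what resolves the references recorded above.
int callme()
{
    printf("you have called this function %d times before\n", counter);
    counter = counter + 1;
    return counter;
}
```

If you delete the definition at the bottom, the file still compiles, but producing an executable fails with an undefined reference.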

To make all of this more clear, have a look at the following:

void xyz( float, int );
void b00h();

int main( int argc, char** argv )
{
    b00h();
    xyz( 2.0f, 5 );
    return 0;
}


This file can be compiled just fine. However, once you try to link it into an executable, you'll get errors like this:

undefined reference to `xyz(float, int)'
undefined reference to `b00h()'


In layman's terms this means: hey, you say a function xyz is available, but I can't find it in any of the .o files!
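To get rid of these errors, some object file has to supply the missing definitions. Here is a sketch of what such a second cpp file might contain. The bodies and the calls_seen variable are invented for illustration; only the two signatures come from the example above.

```cpp
#include <stdio.h>

// Hypothetical bookkeeping variable, only here to make the calls observable.
int calls_seen = 0;

// The signature matches the declaration void xyz( float, int ) exactly.
// A definition taking ( double, int ) would be a different symbol and would
// leave the reference unresolved.
void xyz( float ratio, int amount )
{
    printf("xyz called with %f and %d\n", ratio, amount);
    calls_seen = calls_seen + 1;
}

void b00h()
{
    printf("b00h called\n");
    calls_seen = calls_seen + 1;
}
```

Compile this file to its own object file and hand both object files to the linker, and both references can be resolved.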

This brings me to the concept of compile units. Every cpp file you compile is its own small container of variables, functions and references. This is called a compile unit (the C++ standard calls it a translation unit).

Declarations


However, if compile units are self-contained code containers, how will two compile units be able to use each other's functions? You might already have noticed that functions can be declared rather than implemented (or defined). To clarify, this is a declaration:

void show_usage( int );


While this is a definition:

void show_usage( int )
{
    // ... code goes here...
}


By declaring a function, you tell the compiler: "hey, at link time (sic!), you will have a function or variable available, I swear!" The compiler will accept your promise, turn it into a reference symbol in the output file, and go on. Do note that you are allowed to declare a function or variable more than once inside a single compile unit! So the following is perfectly fine C++:

void show_usage( int );
void show_usage( int );
void show_usage( int );
void show_usage( int );
void show_usage( int );
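The same holds for variables, with one catch: a variable declaration needs the extern keyword, otherwise the line is a definition. A small sketch, in which usage_count and the body of show_usage are made up for illustration:

```cpp
#include <stdio.h>

// Function declarations may be repeated freely.
void show_usage( int );
void show_usage( int );

// Variable declarations need extern. Without it, each of these lines would
// be a definition, and defining the same variable twice is an error.
extern int usage_count;
extern int usage_count;

// Exactly one definition of each.
int usage_count = 0;

void show_usage( int code )
{
    usage_count = usage_count + 1;
    printf("usage shown (code %d), %d time(s) so far\n", code, usage_count);
}
```

Repeating the declarations is harmless. Repeating either of the two definitions in the same file is a compile error, and repeating them in another cpp file is a link error.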


Now if you have a project with 10 cpp files, you might realize that it becomes very, very cumbersome if one of these files contains a set of often re-used tools. Every single one of the 9 other cpp files will need declarations of the utility functions in order to satisfy the compiler. This becomes annoying and hard to maintain very fast!

Header files


The solution? So-called header files! Header files are literally included in your compile unit (cpp file). They contain declarations of the functions and variables of the accompanying source cpp file. All you need to do is include the header file in any cpp file that needs to use those functions and you're done!

An example:

tools.h

void show_usage();
void print_copyrights();


tools.cpp

#include <stdio.h>
#include <stdlib.h>   // needed for exit()
#include "tools.h"

void show_usage()
{
    printf("usage: helloworld\n");
    exit(1);
}

void print_copyrights()
{
    printf("(c) Copyright 2010 - You!\n");
}


main.cpp

#include <stdio.h>
#include "tools.h"

int main( int argc, char** argv )
{
    print_copyrights();
    if (argc != 1)
        show_usage();
    printf( "hello world!\n" );
}


You might have noticed that both cpp files contain the same include of the stdio.h file. Regardless of what it's for (not covered in this lesson), you might consider it redundant. Surely there's a way to do this more effectively? Of course there is. Header files can include other header files!

#include <stdio.h>

void show_usage();
void print_copyrights();


With this tools.h file, you can remove the includes of the stdio.h file from both cpp files.
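Once headers include other headers, the same header can easily end up included twice in one compile unit. Repeated declarations are harmless, but a header that also defines something (a struct, for example) would then produce a redefinition error. The usual remedy is an include guard. The sketch below imitates a double include by pasting a hypothetical, extended tools.h twice into one file. The ToolInfo struct, the tool_version function and the TOOLS_H macro name are all assumptions made for illustration.

```cpp
// ---- first copy of the hypothetical tools.h ----
#ifndef TOOLS_H
#define TOOLS_H

struct ToolInfo
{
    const char* name;
    int version;
};

int tool_version( const ToolInfo& info );

#endif // TOOLS_H

// ---- second copy, as if tools.h was included again ----
#ifndef TOOLS_H   // TOOLS_H is already defined, so everything up to #endif is skipped
#define TOOLS_H

struct ToolInfo   // without the guard this would be a redefinition error
{
    const char* name;
    int version;
};

int tool_version( const ToolInfo& info );

#endif // TOOLS_H

// ---- what the accompanying cpp file would contain ----
int tool_version( const ToolInfo& info )
{
    return info.version;
}
```

In a real project the guard lines simply wrap the contents of the header file once. The doubling above only imitates what the compiler sees after two includes.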

Statics


Up till now, we assumed that everything in a cpp file should be usable in, i.e. referenceable from, other cpp files. Yet there are plenty of cases in which you would want a function or variable to be private to a single compile unit. For example:

file1.cpp

#include <stdio.h>

int counter = 0;

void my_function()
{
    ++counter;
    printf( "\n" );
}


file2.cpp

#include <stdio.h>

int counter = 0;

void count_func()
{
    for (counter=0; counter<10; ++counter)
        printf("counter = %d\n", counter);
}


These two functions do not intend to share the counter variable. However, when these files are linked into an executable, the linker complains that it sees two counter variables of type int! This is undesired, but luckily the solution is simple: make both variables static. This keyword, which works for both functions and variables, tells the compiler that although the code is present in a compile unit, it should NOT be included in the list of symbols available to other compile units. This way the linker will no longer be confused about duplicate symbols! Variables and functions can be made static as follows:

static int counter = 0;

static void myfunc()
{
    // code goes here...
}
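Here is a slightly fuller sketch of a compile unit that keeps its helpers private and exposes a single function. The names bump, runs and count_to are made up for illustration. The unnamed namespace shown alongside static is a C++ alternative with the same effect: nothing inside it is visible to other compile units.

```cpp
// Private to this compile unit: not visible to other compile units.
static int counter = 0;

static void bump()
{
    counter = counter + 1;
}

// An unnamed namespace hides its contents from other compile units as well.
namespace
{
    int runs = 0;
}

// The only symbol of this file that other compile units can reference.
int count_to( int limit )
{
    counter = 0;
    runs = runs + 1;
    for (int i = 0; i < limit; ++i)
        bump();
    return counter;
}
```

Another cpp file can now define its own counter, bump or runs without upsetting the linker, and can still call count_to after declaring it.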



2 comment(s)


On Thu 04-02-2010 01:48 Burn wrote: Great stuff! Of all the tutorials I've ever read, none explained this basic, yet fundamental, stuff. I mean, ok syntax and semantics are important, but in the end one is unlikely to browse open source projects and find single compile units, and goes wondering where all the rest comes from and how it gets put together. Best tutorial ever!

Just a heads-up: it's not 100% clear when jumping here from tutorial #2 that we're back on VC Express 2008. It becomes evident soon, but I was still looking at my unix shell and wondering.
On Thu 04-02-2010 22:17 Foddex wrote: Burn, I'm unsure what you're getting at. All of this works the same in Linux as well as VC Express 2008. As mentioned at the start of this lesson there are syntactical differences, but they're related to how DLLs and SOs work, not regular executables.

Can you elaborate on what didn't work for you?