How does the linker read C source code?

A C program may consist of multiple separately compiled parts, which a program usually called a linker (or loader) then combines into a single unit. Because a compiler generally processes only one file at a time, it cannot detect errors that can only be found by examining several source files together. Furthermore, on many systems the linker is implemented independently of C, so when such an error stems from C-specific rules, the linker cannot catch it either. Some C implementations provide a program called lint that catches many of these errors, but unfortunately not every implementation provides one. If you have access to a lint-like tool, make full use of it; this point cannot be overstated.

A key concept in C is separate compilation: several source files can be compiled separately, at different times, and then combined at the appropriate moment. However, the linker is generally separate from the C compiler and cannot understand many details of the C language. So how does the linker manage to combine several C source files into a single unit? Although the linker does not understand C, it does understand machine language and memory layout. It is the compiler's responsibility to "translate" C source files into a form the linker can make sense of, so that the linker can, in effect, "read" C source code.

A typical linker combines several object modules produced by a compiler or assembler into a single entity called a load module or executable file, which the operating system can execute directly. Some object modules are given to the linker directly as input; others, such as those containing functions like `printf`, are pulled from library files as needed during linking. The linker typically treats an object module as a set of external objects. Each external object represents a region of machine memory and is identified by an external name. Therefore, every function and every external variable in a program is an external object unless it is declared `static`. Some C compilers rename static functions and static variables and then treat them as external objects too; because their names have been altered ("name mangling"), they do not conflict with functions or variables of the same name in other source files.

Most linkers forbid two different external objects in the same load module from having the same name. However, when several object modules are combined into one load module, those modules may well contain external objects with the same name, and handling such naming conflicts is one of the linker's key tasks. The simplest approach is to forbid conflicts entirely. That is exactly right when the external object is a function: a program that contains two different functions with the same name should be rejected. The problem is more complicated when the external object is a variable, and different linkers handle this case differently.

The linker's input is a set of object modules and library files; its output is a load module. The linker builds the load module as it reads the object modules and library files. For each external object in each object module, the linker checks whether the load module already contains an external object with that name. If it does not, the linker adds the external object to the load module; otherwise, it must handle the naming conflict.

Besides external objects, an object module may also contain references to external objects defined in other modules. For example, an object module compiled from a C program that calls `printf` contains a reference to the function `printf`; presumably, an external object in a library file satisfies this reference. As the linker builds the load module, it must keep track of these references to external objects. Whenever the linker reads an object module, it resolves every pending reference to an external object that this module defines and marks that reference as no longer undefined.

In summary: a C program may consist of multiple separately compiled parts, which a linker combines into a single unit. Because a compiler typically processes one file at a time, it cannot detect errors that require examining multiple source files. Some C implementations provide a lint utility that can catch many of these errors, but not every implementation includes one.