
Linker drops

Bogdan Deac

20 Feb 2023 • 5 min read

Some notes about linkers, targeting ELF, Linux and C/C++.

Basics

A linker is a program that takes one or more object files and combines them into a single executable, library or other object file.
An object file is generated by a compiler or an assembler and is composed of:
- sections -> code (.text), data (.data), etc.
- symbol table
- relocation records
- additional information used for debugging
The linker has two important tasks to perform:
- find all the symbols and complete the symbol table
- apply relocations.
Relocation is the process of patching the code and data so that their references use the final run-time addresses.
There are two types of linking: static and dynamic.
Static and dynamic linking differ in when, and how, the references to a library are loaded and resolved.
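To make the idea of applying a relocation concrete, here is a minimal sketch of how a linker could patch one 32-bit PC-relative field, in the style of the x86-64 R_X86_64_PC32 relocation (value = S + A - P). The function names apply_pc32 and read_field are hypothetical, and the sketch assumes a little-endian host that matches the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Patch a 32-bit PC-relative field: value = S + A - P
 *   S = final address of the target symbol
 *   A = addend stored in the relocation record
 *   P = final address of the field being patched
 * apply_pc32 is an illustrative name, not a real linker API.
 */
static void apply_pc32(uint8_t *section, uint64_t section_addr,
                       uint64_t field_offset, uint64_t sym_addr,
                       int64_t addend)
{
    uint64_t place = section_addr + field_offset;              /* P */
    int64_t value = (int64_t)sym_addr + addend - (int64_t)place;
    int32_t field = (int32_t)value;

    /* The displacement must fit in the 32-bit field. */
    assert((int64_t)field == value);

    /* Assumes host byte order == target byte order (little-endian). */
    memcpy(section + field_offset, &field, sizeof field);
}

/* Read a patched field back; helper for checking the result. */
static int32_t read_field(const uint8_t *section, uint64_t field_offset)
{
    int32_t field;
    memcpy(&field, section + field_offset, sizeof field);
    return field;
}
```

As a usage example, take a 5-byte call instruction (opcode 0xE8 followed by a 4-byte displacement) placed at address 0x401000, with its target at 0x401020 and an addend of -4. Calling apply_pc32(text, 0x401000, 1, 0x401020, -4) stores 0x1B, which is the distance from the end of the call instruction (0x401005) to the target.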

Symbol table

A symbol table is a data structure used by compilers, linkers and interpreters.
It associates keys with values: symbol (function/variable name) [key] -> address & type for that symbol [value].
In practice, an ELF symbol table entry carries more fields:

// From glibc/elf/elf.h

typedef struct
{
  Elf64_Word    st_name;   /* Symbol name (string tbl index) */
  unsigned char st_info;   /* Symbol type and binding */
  unsigned char st_other;  /* Symbol visibility */
  Elf64_Section st_shndx;  /* Section index */
  Elf64_Addr    st_value;  /* Symbol value */
  Elf64_Xword   st_size;   /* Symbol size */
} Elf64_Sym;

A symbol holds metadata that helps tools interpret the machine code.
There can be two symbol tables in an ELF file:
- .symtab
  - contains all the symbols, including the ones from .dynsym
  - resides in a non-allocable ELF section, so it is not loaded at runtime
  - is needed by linkers, debuggers and other tools, not by the running program
  - can be stripped completely from a finished executable and the program will still run fine
- .dynsym
  - contains only the symbols involved in dynamic linking
  - these symbols are usually far fewer than the rest of the symbols in an ELF file
  - is needed at runtime
  - resides in an allocable ELF section
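The st_info and st_shndx fields of the entry above are packed or encoded values, so a short sketch of decoding them may help. It uses the macros and constants from the glibc elf.h header (ELF64_ST_BIND, ELF64_ST_TYPE, STB_*, STT_*, SHN_UNDEF); the helper function names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Binding is stored in the high 4 bits of st_info. */
static const char *sym_binding_name(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    switch (ELF64_ST_BIND(sym->st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Type is stored in the low 4 bits of st_info. */
static const char *sym_type_name(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    switch (ELF64_ST_TYPE(sym->st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";   /* variable */
    case STT_FUNC:    return "FUNC";     /* function */
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* A symbol with section index SHN_UNDEF is referenced but not defined here. */
static int sym_is_undefined(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    return sym->st_shndx == SHN_UNDEF;
}
```

For an entry whose st_info was built with ELF64_ST_INFO(STB_GLOBAL, STT_FUNC) and whose st_shndx is SHN_UNDEF, the helpers report a global function that still has to be resolved by the linker.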

General linking steps

- read the input object files
- determine the length and type of the sections
- read the symbols
- build a symbol table
- resolve the undefined symbols against the definitions that were found
- analyze the sections and decide where they should be placed in the executable file
- apply the relocations
- generate the final executable
- optionally, include the symbol table in the executable file
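The layout and symbol resolution steps above can be imitated with a toy model. This is only a sketch under simplifying assumptions: each object has a single section, sections are placed back to back, and the types and functions (ToySym, ToyObject, toy_layout, toy_resolve, toy_count_unresolved) are hypothetical names, not part of any real linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    int defined;       /* 1 = defined in this object, 0 = undefined reference */
    uint64_t offset;   /* offset inside the object's section (if defined) */
} ToySym;

typedef struct {
    uint64_t size;       /* size of the object's single section, in bytes */
    const ToySym *syms;
    size_t nsyms;
    uint64_t base;       /* final address of the section, set by toy_layout */
} ToyObject;

/* Decide where each section goes: back to back, starting at `start`. */
static void toy_layout(ToyObject *objs, size_t n, uint64_t start)
{
    assert(objs != NULL);
    uint64_t addr = start;
    for (size_t i = 0; i < n; i++) {
        objs[i].base = addr;
        addr += objs[i].size;
    }
}

/* Find the definition of `name`; on success store its final address. */
static int toy_resolve(const ToyObject *objs, size_t n,
                       const char *name, uint64_t *addr)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            const ToySym *s = &objs[i].syms[j];
            if (s->defined && strcmp(s->name, name) == 0) {
                *addr = objs[i].base + s->offset;
                return 1;
            }
        }
    }
    return 0;
}

/* Count undefined references that no input object defines. */
static size_t toy_count_unresolved(const ToyObject *objs, size_t n)
{
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            uint64_t ignored;
            const ToySym *s = &objs[i].syms[j];
            if (!s->defined && !toy_resolve(objs, n, s->name, &ignored))
                missing++;
        }
    }
    return missing;
}
```

With a first object of 0x20 bytes that references helper and a second object that defines helper at offset 4, laying the sections out from 0x401000 resolves helper to 0x401024. A real linker would report an error for every reference counted by toy_count_unresolved.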

Static linking

Static linking resolves the references (symbols) from an object file at the executable's creation-time.
This process is called early binding.
Static linking is used with static libraries.
On Unix, a static library:
- has .a extension
- is prefixed with lib
- is passed to the linker using the -l option, without the lib prefix and the extension (e.g. -lfoo for libfoo.a)
- may contain multiple object files
If a particular symbol's definition is needed, the whole object file that contains that symbol is statically linked into the executable file.
This means that the needed object files of the static library are copied into the final executable file, not necessarily the whole library.
Advantages
- static references are more efficient than the dynamic ones; see below
- the application does not have external dependencies on libraries, so it is easier to move between systems that share the same architecture and kernel ABI
Disadvantages
- the executable occupies more space on the disk
- the executable occupies more memory, because the library code embedded in it cannot be shared with other programs that use the same library
- if a new version of the statically linked library is released (e.g. some bugs are fixed), the executable must be re-built to include the updates
- if an OS update requires a newer library version, the statically linked executable may stop working and must be re-built
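The static library workflow from this section can be sketched with a tiny example. The file names (square.c, main.c, libsquare.a, app) and the functions are hypothetical; the commands in the comment use the usual GNU gcc and ar options, and in a real project the two functions would live in separate source files.

```c
#include <assert.h>

/*
 * Hypothetical layout:
 *   square.c -> defines square()
 *   main.c   -> calls square(); in main.o it is an undefined symbol
 *
 * Build sketch:
 *   gcc -c square.c                  # produces square.o
 *   ar rcs libsquare.a square.o      # bundle object files into a static library
 *   gcc -c main.c                    # produces main.o
 *   gcc main.o -L. -lsquare -o app   # -lsquare searches the -L directories
 *                                    # for libsquare.so or libsquare.a
 */

/* Would live in square.c and end up as a member of libsquare.a. */
int square(int x)
{
    /* Keep the result inside the range of a 32-bit int. */
    assert(x >= -46340 && x <= 46340);
    return x * x;
}

/* Would live in main.c; the call to square() is what the linker resolves. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

Because the linker copies square.o out of the archive into app, the resulting executable does not need libsquare.a at run time.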

Dynamic linking

Dynamic linking resolves the references (symbols) from an object file at load-time.
This approach is used with dynamic libraries, also named shared libraries.
Dynamic library
- has the following extensions: .so (Unix), .dll (Windows), .dylib (macOS)
- does not have a predefined loading address; it will be determined at load-time
- the code inside a dynamic library must be able to run at different addresses, depending on the process that uses it
  - this is achieved by using:
    - position-independent code, built on the global offset table (GOT) and the procedure linkage table (PLT), or
    - load-time relocation, where the loader patches the code for the chosen address
    - more about these in another article
- usually, there is only one copy of a shared library's code in memory
- the library's code is mapped into a segment that is shared by all the processes that reference that library
Steps
- the linker finds an unresolved symbol in the user code
- if the code is linked against a shared library, the definition of the symbol is not included in the executable
- instead, a record with the name of the symbol and the dynamic library that contains it is kept
- the program is launched into execution -> load-time
- there are two methods to resolve the references to a dynamic library: dynamic linker or dynamic loading (runtime linker)
  - dynamic linker
    - if the .interp section of the launched program names a dynamic linker (ld-linux.so on Linux), the kernel loads and starts the dynamic linker/loader
    - ld-linux.so loads the required dynamic libraries, resolves the references to them and applies the relocations
    - control is then transferred to the original program
  - dynamic loading
    - in this case, the dynamic library is not loaded by the OS, but by the program itself
    - the program specifies what library to be loaded
    - ld-linux.so is involved in this case too
    - there is a Dynamic Loading API (dlopen, dlsym, dlclose, dlerror) that can be used by the application
    - after the dynamic library is loaded, the application can call functions within it
    - once the dynamic library is no longer needed, it can be unloaded
- if the dynamic library is loaded by the OS but cannot be found, the application will not start; if dynamic loading is used instead, the application can log an error message and continue its execution by taking an alternative execution path
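The dynamic loading path can be sketched with the dlopen family of functions. This example assumes a glibc-based Linux system where the math library is available under the name libm.so.6; the wrapper name call_cos_dynamically is hypothetical. On older glibc versions the program may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load libm at run time, look up cos() and call it.
 * Returns 0 on success and -1 on failure, so the caller can
 * take an alternative path instead of terminating.
 */
static int call_cos_dynamically(double x, double *result)
{
    assert(result != NULL);

    /* "libm.so.6" is an assumption about the target system. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before dlsym */

    double (*cosine)(double);
    /* Cast through void ** to avoid a direct object-to-function
     * pointer conversion, following the dlopen(3) man page example. */
    *(void **)(&cosine) = dlsym(handle, "cos");

    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *result = cosine(x);

    dlclose(handle); /* the library is no longer needed */
    return 0;
}
```

A caller would check the return value and, on -1, log the problem and fall back to another code path, which is the behaviour described in the last step above.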
Advantages
- the program's size is reduced by using dynamic linking
- less RAM is used during execution because the code of a dynamic library is loaded only once and shared between the processes that use it
- the program can be loaded faster if the referenced libraries are already present in the memory
- the program can benefit from updated libraries without recompiling the program
Disadvantages
- there can be performance penalties at load-time if the referenced library is not present in memory
- calls to functions that reside in the shared segment require more CPU work, because they involve extra instructions (an indirection through the PLT and GOT)
- if a dynamic library changes its ABI or is removed from the system, the programs that reference it will not work anymore
- dynamic library references are less efficient than static ones; the shared code may be scattered widely in memory, which can generate more page faults and TLB misses

Interesting facts

A dynamic library can define multiple versions for its symbols. That means that if a dynamic library is changed in such a way that the ABI (e.g. return type or parameter types of a function) is changed too, the library will define a new version for its new symbols. All the previous executable files that were linked to the previous library version will still work because they will use the old version of the symbols.
The linker can be controlled by a linker script, written in the linker command language. The linker script defines how the sections in the input object files should be mapped in the output executable file. If a custom linker script is not provided, the default one is used.
With lazy binding, the dynamic linker resolves a function reference only when the function is called for the first time.
If two processes use the same dynamic library, they can share the same .text (code) segment of the library, but not the same .data and .bss segments.
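The on-demand resolution of function references mentioned above can be imitated in plain C with a function pointer that starts out pointing at a resolver stub. This is only a toy analogy for a GOT slot and a PLT entry, not how ld-linux.so is implemented; all names (got_add, add_stub, real_add, call_add, toy_resolve_count) are hypothetical.

```c
#include <assert.h>

typedef int (*binop_fn)(int, int);

static int resolve_count = 0;           /* how many times resolution ran */

static int add_stub(int a, int b);      /* forward declaration */

/* Toy "GOT slot": initially points at the resolver stub. */
static binop_fn got_add = add_stub;

/* The real function, standing in for code inside a shared library. */
static int real_add(int a, int b)
{
    return a + b;
}

/* Toy resolver: runs on the first call only, then patches the slot. */
static int add_stub(int a, int b)
{
    assert(got_add == add_stub);        /* stub runs only while unresolved */
    resolve_count++;
    got_add = real_add;                 /* later calls skip the stub */
    return got_add(a, b);
}

/* Toy "PLT entry": every call goes through the slot. */
static int call_add(int a, int b)
{
    return got_add(a, b);
}

/* Accessor so callers can observe how often resolution happened. */
static int toy_resolve_count(void)
{
    return resolve_count;
}
```

The first call to call_add pays the cost of resolution; every later call jumps straight to real_add through the patched slot, so toy_resolve_count stays at 1 no matter how many calls follow.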

Tools

nm -> list the symbols from an object file; there can be symbols that are marked with U which indicates an undefined reference for a certain symbol.
ldd -> print all the required libraries and their dependencies.
dlopen -> (library function) load a dynamic shared object.
dlsym -> (library function) obtain the address of a symbol in a shared object or executable.
readelf -> inspect the fields of an ELF file; e.g. readelf --dynamic /bin/ls
objdump -> display information from object files
ar -> create, modify and extract from archives; used to create a static library.
