Kefir is an independent compiler for the C17/C23 programming language, developed by Jevgenij Protopopov. Kefir has been validated with a test suite of 100 software projects, among which are GNU core- and binutils, Curl, Nginx, OpenSSL, Perl, Postgresql, Tcl and many others. The compiler targets x86_64 architecture and System-V AMD64 ABI, supporting Linux, FreeBSD, NetBSD, OpenBSD and DragonflyBSD. The project intends to provide a well-rounded, compatible and compliant compiler, including SSA-based optimization pipeline, debug information generation, position-independent code support, and bit-identical bootstrap. Kefir integrates with the rest of system toolchain --- assembler, linker and shared library.
#At a glance
Kefir:
- Supports the C17 standard -- including complex and imaginary numbers, atomics, variable-length arrays, etc. (see Implementation quirks).
- Supports the C23 standard -- including bit-precise integers and _Decimal floating-point support (see Implementation quirks).
- Supports some of widespread GNU C built-ins, certain extensions, inline assembly, 128 bit integers.
- Is written in C11 -- runtime dependencies are limited to the standard library, bits of POSIX and the shell.
- Targets x86_64 and System-V ABI -- primarily Linux (both glibc and musl libc), secondarily FreeBSD, NetBSD, OpenBSD and DragonflyBSD (see Supported environments).
- Is extensively validated on real-world open source software test suites -- including dozens of well-known projects (see Testing and validation).
- Implements SSA-based optimization pipeline with two SSA phases -- primarily targeting local scalars: local variable promotion to registers, dead code elimination, constant folding, global value numbering, loop-invariant code motion, function inlining, tail-call optimization, but also providing conservative global memory access optimization, as well as target-specific optimizations (see Optimization and codegen).
- Supports DWARF5 debug information, position-independent code, AT&T and Intel syntaxes of GNU As and has limited support for Yasm.
- Implements bit-identical bootstrap -- within fixed environment, Kefir produces identical copies of itself.
- Is able to generate freestanding assembly code -- with the exception for thread-local storage (might require external calls per ABI), _Decimal floating-point numbers, and atomic operations of non-platform native sizes, which require libatomic-compatible library.
- Provides cc-compatible command line interface.
- Is able to output internal representations (tokens, abstract syntax tree, intermediate representation) in machine-readable JSON form.
- Provides auditable logs and all build artifacts for pre-release testing of the most recent release.
- Licensed under GNU GPLv3 (only) terms for the compiler, and BSD-3 terms for runtime includes (see License).
- Is written by a single developer.
- Is named after fermented milk drink -- no other connotations are meant or intended.
Important note: as the project is developed and maintained by a single person, unfunded and in spare time, the author warns that uses of Kefir in production settings might be undesirable due to insufficient level of support the author can provide.
Important note #2: to the best of author's knowledge, all of the claims above are true (and many are reproducibly demonstrated by the test suite). Yet even with full rigour, many bugs, unintended omissions, inconsistencies and misunderstandings may slip through. The author intends to faithfully represent capabilities of the project, and is especially sensitive to any overstatements in this regard. If you have any doubts, objections or otherwise disagree with the above, please do not hesitate to contact the author (see Author and contacts) -- corrections will be issued immediately after identifying the deficiency.
#Installation and usage
On supported platforms, Kefir is built and tested as follows:
make test all # Linux glibc
make test all USE_SHARED=no CC=musl-gcc KEFIR_TEST_USE_MUSL=yes # Linux musl
gmake test all CC=clang # FreeBSD
gmake test all CC=clang AS=gas # OpenBSD
gmake test all CC=gcc AS=gas # NetBSD
gmake test all LD=/usr/local/bin/ld AS=/usr/local/bin/as # DragonflyBSD
The installation is done via (g)make install prefix=.... The default prefix is
/opt/kefir.
Kefir build time dependencies are:
- C11 compiler -- tested with gcc and clang.
- Bash
- GNU Make
- GNU Coreutils
- Groff
- m4
- mandoc
Kefir runtime dependencies are:
- The C standard library and POSIX
- Shell
- Furthermore, for correct end-to-end compilation, Kefir requires:
- External assembler -- full support for GNU As, limited support for Yasm
- External linker -- GNU ld
- External libc -- glibc or musl libc on Linux, system libc on *BSD systems. Note that this might be different from the libc Kefir itself is linked with.
- External startfiles -- crti.o, Scrt.o, etc.
- libatomic-compatible library (i.e. libatomic of gcc or compiler_rt of clang).
- libgcc in case decimal floating-number support is desired.
Users can consult dist/Dockerfile* files that document the necessary
environment for Ubuntu (base target), Fedora and Alpine, respectively, as well
as dist/PKGBUILD for Arch Linux. For *BSD systems, consult respective
.builds/*.yml files.
Note: upon build, Kefir detects host system toolchain (assembler, linker,
include and library paths) and configures itself respectively. Upon update of
the toolchain, Kefir provides kefir-detect-host-env --environment command
whose output shall be placed into $(prefix)/etc/kefir.local file.
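The refresh step from the note above can be scripted. The following is a minimal sketch and not part of Kefir itself: the function name refresh_kefir_env is hypothetical, and the sketch assumes that kefir-detect-host-env is available on PATH and that the installation prefix is writable by the current user.

```shell
#!/bin/sh
# refresh_kefir_env [PREFIX]
# Regenerates PREFIX/etc/kefir.local after a host toolchain update.
# Assumption: kefir-detect-host-env is on PATH; PREFIX defaults to /opt/kefir.
refresh_kefir_env() (
    prefix="${1:-/opt/kefir}"
    target="$prefix/etc/kefir.local"
    tmp="$target.tmp.$$"

    mkdir -p "$prefix/etc" || exit 1

    # Write to a temporary file first, so that a failed detection
    # does not clobber the previous configuration.
    if kefir-detect-host-env --environment > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        echo "kefir-detect-host-env failed; $target left unchanged" >&2
        exit 1
    fi
)
```

Calling refresh_kefir_env without arguments targets the default /opt/kefir prefix; a different prefix can be passed as the first argument.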
Note #2: aforementioned dependencies do not include optional development and
full test suite dependencies. For these, please consult dist/Dockerfile dev
and full targets.
At the moment, Kefir is automatically tested in Ubuntu 24.04, FreeBSD 14.x, OpenBSD 7.7 and NetBSD 10.x environments; Arch Linux is used as the primary development environment. DragonflyBSD support is tested manually prior to release.
#Decimal floating-point support
Kefir provides support for _Decimal floating-point numbers relying on libgcc
arithmetic routines. In order to enable the support, Kefir shall be compiled
directly or transitively (i.e. bootstrapped) by gcc host compiler. Decimal
arithmetic code produced by Kefir requires linkage with libgcc; if conversion
between bit-precise integers and decimal floating-point numbers is desired,
libgcc of version 14 or newer is required.
Both BID and DPD encodings are supported, BID being the default one. To enable
DPD, pass the following Make option when building Kefir:
EXTRA_CFLAGS="-DKEFIR_PLATFORM_DECIMAL_DPD".
Kefir can bootstrap libgcc version 4.7.4 automatically:
make bootstrap_libgcc474 -j$(nproc)
#Libatomic
Kefir can build required libatomic routines from compiler_rt project via:
make build_libatomic -j$(nproc)
#Usage
Kefir implements cc-compatible command line interface and therefore can be
used as a near-drop-in replacement (see Implementation quirks) of cc in
standard compilation pipelines:
which kefir # Should output correct path to kefir after installation
# Example usage
kefir -O1 -g -fPIC -o hello_world ./hello_world.c
./hello_world
Furthermore, kefir provides a manual page that documents command-line options and environment considerations:
man kefir # Make sure that kefir installation directory is available to man
kefir --help # Identical to the manual page contents
#Portable Kefir
Kefir provides scripts to build portable and standalone Kefir distribution package for Linux. The package includes statically-linked Kefir C compiler, musl libc, and selected tools from GNU Binutils. The package is intended to provide a minimalistic C development toolchain independent of host system tooling.
make portable_bootstrap -j$(nproc)
# Build artifact is located in bin/portable/kefir-portable-*.tar.gz
#Supported environments
Kefir targets x86_64 instruction set architecture and System-V AMD64 ABI. Supported platforms include modern versions of Linux (glibc & musl libc), FreeBSD, OpenBSD, NetBSD and DragonflyBSD operating systems. A platform is considered supported if:
- Kefir can be built with system compiler and successfully executes own test suite (see Testing and validation).
- Kefir can be built with itself and successfully executes own test suite.
- Kefir passes c-testsuite and gcc-torture tests (see Testing and validation for the exact tests, conditions, etc).
- Kefir can compile Lua and run its base test suite.
- Kefir can perform reproducible (i.e. bit-identical bootstrap) of itself within fixed environment.
To claim a platform supported, no other requirements are imposed. Other tests
and validations described in Testing and validation section are focused
predominantly on Linux to ensure overall compilation process correctness. In
general, there are very few differences between Linux and BSD system code
generation, thus running the full testing and validation sequence on a single
platform shall suffice. Please note that libc header quirks are generally the
main offender of compatibility, thus additional macro definitions or individual
header overrides might be necessary. Musl libc provides the smoothest
experience, however Kefir has accumulated sufficient support for GNU C
extensions to use glibc and BSD libc implementations reasonably (consult
Implementation quirks and the external test suite part of Testing and
validation, as well as respective .builds/*.yml files for platform of choice
for detailed examples).
As mentioned in the Installation section, Kefir detects system toolchain configuration on build and uses it later. The compiler also supports a set of environment variables that take precedence over the built-in configuration. Consult respective section of the manual page for details of supported environment variables.
#Standard library considerations
On Linux, Kefir works with both glibc and musl libc. Musl headers are more standards-compliant and generally provide smoother compatibility. glibc, by contrast, may introduce incompatibilities with non-mainstream compilers (see Implementation quirks).
On FreeBSD, OpenBSD, and NetBSD, the system standard library can be used, though
additional macro definitions (e.g. __GNUC__, __GNUC_MINOR__) may be required
for successful builds.
#Implementation quirks
The following details need to be taken into account:
- Kefir implementation of C23 standard provides the support for _Decimal floating-point numbers relying on libgcc routines for decimal arithmetic (see Installation and usage).
- The C23 standard mandates use of Unicode for char8_t, char16_t and char32_t types and literals. Kefir relies on the standard library wide character encoding facilities, and thus implements this requirement under condition that the system locale is Unicode-based. The author believes that this is a reasonable assumption.
- In general, the author has much higher confidence in compatibility with features of C17 and earlier versions. As of current version of external test suite (see Testing and validation), the absolute majority of third-party projects do not rely on any of C23 features, which makes external validation of C23 support much more limited. Hereby, the author affirms that they have faithfully read the changes included into the C23 standard and implemented these in good conscience and to the best of their ability, including implementing own tests for respective features.
- With glibc on Linux, there exist corner cases where the library breaks certain features on non-mainstream compilers. For instance, glibc overrides __attribute__ specification with an empty macro, omits packed attributes, etc., breaking compatibility despite the fact that respective features are supported by Kefir. Kefir installation includes shim wrapper for <sys/cdefs.h> header to fix up the most prominent issues, however absolute compatibility cannot be guaranteed. The author recommends getting acquainted with project build configurations from the external test suite (see Testing and validation).
- Atomic operations predominantly implement sequentially-consistent semantics irrespective of specified memory order, with an exception for native scalar atomic stores that distinguish between release and seq_cst semantics. This behavior is safe and shall not break any software. Atomic operations of non-native sizes rely on external software atomic library (libatomic from gcc, or compiler_rt from clang). Kefir links resulting executables with the library automatically in all configurations except musl libc. Furthermore, use of <stdatomic.h> system header from Clang requires -D__GNUC__=4 -D__GNUC_MINOR__=20 command line arguments.
- Should atomic operations on long double variables (both scalar and complex) be used, care needs to be taken due to the fact that the last 48 bits of each long double storage unit may be uninitialized. Kefir implements zeroing of uninitialized padding to mitigate possible issues.
- Reliance on the host C standard library implies that Kefir needs to implement any built-ins or compiler extensions that appear in the library headers. The author has introduced a substantial number of built-ins into the compiler, however cannot guarantee completeness in that sense. Should any of standard C library functions be unusable on the supported platforms due to missing builtins or extensions, this will be treated as a bug.
- All relevant versions of the C language standard are officially available only at a substantial cost. When working on the compiler, the author has relied on publicly available drafts of the standard (see Useful resources and links). Should any of these drafts be in contradiction with the final standard, the author will be interested in hearing specific details.
#In practice
Several practical considerations users of Kefir might need to take into account:
- Kefir cannot directly compete with well-established major compilers such as
GCC or Clang in terms of raw performance, portability or breadth. The purpose
of Kefir project is producing an independent C17/C23 compiler with
well-rounded architecture and well-defined scope that is feasible for
implementation by a single developer.
- Especially, in terms of raw performance, Kefir might be lacking compared to more performance-focused projects. The author still sees many low-hanging fruits in register allocation, optimization passes, instruction selection, scheduling, etc.
- In terms of GNU C compatibility, Kefir implements sufficient amount of
extensions and builtins to be practically useful. Exhaustive list of
builtins is available at
source/tests/end2end/supported_builtins1/lib.asmgen.expected.
- After taking into account certain quirks (see Implementation quirks), Kefir can be used as a near-drop-in replacement for host C compiler to compile and successfully run major well-known C projects, as demonstrated by Testing and validation.
- In terms of C17/C23 compatibility, the author's intention is near-complete compatibility with language standards. Any behavioral divergence not documented in the Implementation quirks shall be considered a bug in the compiler.
#Testing and validation
#Own test suite
The own test suite of the Kefir compiler is maintained by the author as part of the project code base. As a general rule, own test suite is extended to cover any changes made in the compiler. Exceptions are made when existing tests already cover the change, or when a change cannot reasonably be tested (e.g., reproducing a specific bug would require a prohibitively long case). Own test suite includes the following categories of tests:
- Partial tests -- exercise individual components, subsystems or subsystem
combinations. Rely on hand-crafted initialization:
- Unit tests -- Kefir implements custom unit testing library.
- Integration tests -- uses snapshot testing techniques (i.e. comparison with expected output) to test individual subsystem.
- System tests -- initializes Kefir submodules to output assembly code, assembles and links it with a counterpart module built by host C compiler. The counterpart module provides a harness to test Kefir outputs.
- end2end tests -- cover the complete compiler pipeline, consist of a set of *.kefir.c and *.host.c modules, which are built by Kefir and host C compiler respectively, linked together and executed. Typically, host modules provide a harness to test Kefir similarly to system tests. Optionally, end2end tests might also include snapshot (asmgen) tests.
- Selected test cases generated by CSmith are also included into end2end test suite.
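The snapshot technique mentioned above (comparison of produced output with a stored expected file) can be illustrated with a short shell sketch. This is not Kefir's actual test harness: the function name snapshot_check and the UPDATE_SNAPSHOTS switch are hypothetical and serve illustration only.

```shell
#!/bin/sh
# snapshot_check EXPECTED_FILE COMMAND [ARGS...]
# Runs COMMAND, captures its standard output and compares it with EXPECTED_FILE.
# Hypothetical switch: UPDATE_SNAPSHOTS=yes rewrites the expected file instead.
snapshot_check() (
    expected="$1"
    shift
    actual=$(mktemp) || exit 2
    trap 'rm -f "$actual"' EXIT

    # A failure of the command itself is reported separately from a mismatch.
    "$@" > "$actual" || exit 2

    if [ "${UPDATE_SNAPSHOTS:-no}" = yes ]; then
        cp "$actual" "$expected"
        exit 0
    fi

    # diff prints the divergence and yields status 1 on mismatch.
    diff -u "$expected" "$actual"
)
```

Status 0 means the output matches the snapshot, 1 means it diverges (with the difference printed), and 2 means the command could not be run.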
Historically, development relied mainly on partial tests before the compiler pipeline was complete. Today, most new work is validated primarily with end2end tests.
In continuous integration environment on Linux glibc and FreeBSD platforms, own test suite is executed with Valgrind and undefined behavior sanitizer. Furthermore, on all supported platforms special "self-test" run is executed, where Kefir acts as host compiler.
Consult ubuntu.yml, ubuntu-musl.yml, ubuntu-self.yml, freebsd.yml,
freebsd-self.yml, openbsd.yml, netbsd.yml from .builds directory for
detailed setup for own test suite execution on the platform of choice.
#Bootstrap test
On all supported platforms, Kefir also executes reproducible bootstrap test:
- Kefir is built with host C compiler normally. This build is referred to as stage0.
- stage0 kefir builds itself to produce stage1. All intermediate assembly listings are preserved.
- stage1 kefir builds itself to produce stage2. All intermediate assembly listings are preserved.
- Assembly listings from stage1 and stage2 shall be identical. Furthermore, sha256 checksums for kefir executable and libkefir.so library from stage1 and stage2 shall be identical too for bootstrap test to succeed.
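The final comparison step could be sketched as follows. The directory layout (kefir, libkefir.so and an asm/ subdirectory with preserved listings in each stage directory) and the function names are assumptions made for illustration; the actual procedure is defined by the CI manifests of the project. The sketch relies on sha256sum from GNU Coreutils, which may be named differently on BSD systems.

```shell
#!/bin/sh
# Illustrative only: stage directory layout and function names are hypothetical.

# Print the sha256 digest of a file.
digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# compare_stages STAGE1_DIR STAGE2_DIR
# Succeeds only if both binaries and all assembly listings are identical.
compare_stages() (
    stage1="$1"
    stage2="$2"
    status=0

    for artifact in kefir libkefir.so; do
        if [ ! -f "$stage1/$artifact" ] || [ ! -f "$stage2/$artifact" ]; then
            echo "MISSING: $artifact" >&2
            status=1
        elif [ "$(digest "$stage1/$artifact")" != "$(digest "$stage2/$artifact")" ]; then
            echo "MISMATCH: $artifact" >&2
            status=1
        fi
    done

    # Preserved assembly listings are compared recursively, file by file.
    if ! diff -r "$stage1/asm" "$stage2/asm" > /dev/null; then
        echo "MISMATCH: assembly listings" >&2
        status=1
    fi

    exit "$status"
)
```

A call such as compare_stages bootstrap/stage1 bootstrap/stage2 (paths hypothetical) returns a non-zero status on any divergence, which makes it usable as a gate in a CI script.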
Bootstrap test is performed within fixed environment (i.e. standard library, assembler, linker versions are not changed during the test), and demonstrates that Kefir is able to produce identical copies of itself. On Ubuntu, bootstrap is performed using both GNU As and Yasm as target assemblers.
Consult ubuntu-other.yml, ubuntu-musl.yml, freebsd.yml, openbsd.yml,
netbsd.yml from .builds directory for detailed setup for bootstrap test
execution on the platform of choice.
For practical purposes, Kefir can be bootstrapped by specifying itself as a CC
compiler:
make CC=$(which kefir) -j$(nproc)
This form of bootstrap does not verify reproducibility, it simply rebuilds the compiler using itself.
#Portable bootstrap
Portable Kefir bootstrap procedure as described in the Installation section also serves as an additional test. The portable bootstrap omits the bit-identical reproducibility check, but performs iterative rebuild of complete toolchain (musl libc, GNU As, GNU ld) at each stage. Therefore, it ensures that Kefir is capable of producing a self-sustaining development environment.
#c-testsuite and gcc-torture suites
On all supported platforms, Kefir executes the following external test suites:
- c-testsuite -- a smaller test suite, relies on compiling and executing test cases, and comparing their output to an expected snapshot. Out of 220 tests, 3 test files rely on non-standard extensions and are skipped, the rest shall pass.
- GCC Torture suite -- a test suite imported from gcc 15.2.0, consists of independent test cases that perform self-testing in a form of assertions. The test suite heavily relies on gcc-specific features, therefore higher degree of failures is expected. As of current version, out of 3663 tests, Kefir fails 429 and skips 29. Note that the exact number of failed tests might slightly vary depending on the target platform and hardware performance (due to enforced 10 second timeout for execution). Reported number is the best-case result on Ubuntu glibc. Furthermore, note that in order for test to succeed, none of the failures shall be caused by fatal issues, aborts, segmentation faults or caught signals, either at runtime or in compile time.
- GCC test suite _BitInt bits -- a separate set of 71 tests imported from gcc 15.2.0 to ensure correct implementation of bit-precise integers from the C23 standard. All tests from this suite shall run successfully.
#Lua basic test suite
On all supported platforms, Kefir is used to build Lua 5.4.8/5.5.0 and execute its basic test suite, which should pass completely. Purpose of this test is demonstration that Kefir is able to successfully build non-trivial software on the target platform. Technically, this is a part of the external test suite (see below), and its inclusion into the general test runs has happened for historical reasons.
#Fuzz testing
After release 0.5.0, Kefir testing discipline has been expanded to include 20'000 randomly generated csmith cases per nightly test suite run. Thus far, Kefir has successfully passed at least 2'500'000 random tests. Testing is differential against gcc --- for all test cases that can be compiled and executed by both kefir and gcc within given timeout, outputs shall be identical. All failing cases are fixed and added to the own test suite.
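The differential check for a single generated case might look like the sketch below. It is an illustration rather than the project's actual fuzzing script: the function name, the REF_CC, TEST_CC, CFLAGS and RUN_TIMEOUT variables and the 10 second default are assumptions, and csmith-generated sources would additionally need the csmith runtime include path passed via CFLAGS. The timeout utility comes from GNU Coreutils.

```shell
#!/bin/sh
# differential_check SOURCE.c
# Status: 0 = outputs identical, 1 = mismatch, 2 = case skipped
# (not compilable by both compilers, or not finished within the timeout).
# All variable names and defaults below are hypothetical.
differential_check() (
    src="$1"
    ref_cc="${REF_CC:-gcc}"
    test_cc="${TEST_CC:-kefir}"
    limit="${RUN_TIMEOUT:-10}"
    work=$(mktemp -d) || exit 2
    trap 'rm -rf "$work"' EXIT

    # Cases rejected by either compiler are out of scope for the comparison.
    # Compiler commands and flags are left unquoted to allow word splitting.
    $ref_cc ${CFLAGS:-} -o "$work/ref" "$src" 2> /dev/null || exit 2
    $test_cc ${CFLAGS:-} -o "$work/test" "$src" 2> /dev/null || exit 2

    timeout "$limit" "$work/ref" > "$work/ref.out" 2>&1
    ref_status=$?
    timeout "$limit" "$work/test" > "$work/test.out" 2>&1
    test_status=$?

    # timeout(1) reports an expired limit with status 124.
    if [ "$ref_status" -eq 124 ] || [ "$test_status" -eq 124 ]; then
        exit 2
    fi

    if [ "$ref_status" -eq "$test_status" ] && cmp -s "$work/ref.out" "$work/test.out"; then
        exit 0
    fi
    echo "MISMATCH: $src" >&2
    exit 1
)
```

Separating the "skipped" status from "mismatch" mirrors the rule stated above: only cases that both compilers can build and run within the timeout take part in the comparison.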
#External test suite
This is a suite of 100 third-party open source projects that are built using Kefir with subsequent validation: for most projects, their test suite is executed; where this is not possible, a custom smoke test is performed; for the minority, the fact of a successful build is considered sufficient. Purpose of the external test suite is:
- Establishing correctness and real-world applicability of the Kefir compiler. Compiling many well-known software projects, such as GNU binutils, coreutils, Ruby, Python, Perl, OpenSSH, OpenSSL, zsh and others demonstrates Kefir capabilities.
- Tracking regressions during the development cycle. The external test suite is diverse enough to be sensitive to possible regressions, quickly exposing many newly introduced deficiencies. This helps the author to resolve problems quickly and establish confidence.
- Documenting resolution of challenges arising when building real-world software with Kefir on Linux glibc. Such challenges include mitigating glibc-related issues, fixing build system assumptions, etc.
Except for Lua, the external test suite is executed exclusively in Linux glibc
environment as defined by dist/Dockerfile. Primary reason for that is resource
constraints. Execution of the external test suite is fully automated:
make .EXTERNAL_TESTS_SUITE -j$(nproc)
make .EXTERNAL_EXTRA_TESTS_SUITE -j$(nproc) # only for zig-bootstrap, see below
The external test suite (except for zig-bootstrap) is executed on a daily basis on current development version of Kefir, as well as at pre-release stage.
All source archives of third-party software included in the external test suite are mirrored at project's website under release validation section for reproducibility and completeness purposes, starting from version 0.5.0. By default, all external tests still use the original upstream links to the third-party software sources, however these can optionally be replaced with an archival version. Kefir provides necessary scripts for transparent redirection of upstream links to the archive.
#Limitations
- Generally, most of the source code is built and executed "as-is" without any modifications. However, due to multitude of reasons (bypassing glibc quirks, fixing hard-coded compiler assumptions, replacing exotic non-standard idioms, ignoring deliberately suppressed test cases, etc.) patches might be applied. The author has established the following principle: any such patch shall be trivial and exceedingly small; non-trivial changes are never considered.
- Individual test cases of some projects might get suppressed. Reasons for that typically include too strong assumptions about the compiler or the environment. There also exist several test cases where the author is unsure about the exact reason of instability. However, per author's estimate, >99% of individual tests do pass successfully unmodified.
- Test suite of this size might exhibit certain degree of flakiness naturally. During some of the daily builds, the author has observed failures that were unrelated to any of the compiler changes, but due to such reasons as: network failures, calendar date changes, CPU throttling and related slowdowns (certain tests rely on timings). Such failures typically are one-off and are never observed repeatedly. Note that running the suite on different hardware might possibly expose some failures that were not observed by the author.
The author believes that outlined limitations do not undermine purpose and utility of the external test suite.
#Structure of the external test suite
The software included into the external test suite can be broadly grouped as
follows. Provided software list is not exhaustive, please look up the
source/tests/external for complete details and specific versions. As a general
rule, the author performs upgrades for most packages prior to each Kefir
release.
- Widely used software packages -- GNU Bash, Binutils, Bison, Coreutils, Curl,
GNU Awk, Git, Guile, Gzip, ImageMagick, libraries such as
expat/gmp/jpeg/png/uv/xml2, Lua, GNU Make, Memcached, Musl, Nano, Nasm, Nginx,
OCaml, OpenSSH, OpenSSL, PCRE2, Perl, PHP, PostgreSQL, Python, Redis, Ruby,
SQLite, tar, Tcl, Vim, Wget, xz, zlib, zsh, zstd, and some others. This is the
largest group, and it serves all purposes outlined above: correctness,
regression tracking, and documenting real-world build capability and
challenges. GCC 4.0.4 bootstrap procedure also belongs to this group.
- zig-bootstrap technically belongs to this group too, but due to unreasonable CPU time and memory requirements it is executed less frequently.
- Problem-specific software -- c23doku, jtckdint and couple of other small early adopters of C23. The main role of this group is testing some specific aspects of Kefir (e.g., checked arithmetic builtins for jtckdint, C23 feature support for others).
- "Reciprocal" projects -- hummingbird, libsir, oksh, slimcc, tin. These are projects that have acknowledged Kefir existence in some way, often early on. As a gesture of reciprocity, they are included in the external test suite.
#Nightly and pre-release test runs
Nightly and pre-release test runs largely coincide for Linux platform, and are
encoded by scripts/pre_release_test.sh script that encompasses all stages
described above. The script is to be executed in the environment as defined by
dist/Dockerfile. In addition, nightly runs include at least 4 CI manifests
randomly sampled from .builds directory. Pre-release run imposes additional
requirements:
- All CI manifests from .builds directory shall run and pass.
- DragonflyBSD tests as specified by .builds/dragonflybsd.sh script shall pass.
- Zig-bootstrap test shall pass in the environment as specified by dist/Dockerfile.
scripts/pre_release_test.sh discipline includes own test suite in all
configurations (with glibc & musl gcc/kefir host, clang host), reproducible
bootstrap test in all configurations (GNU As & Yasm targets with glibc & musl
libc), portable bootstrap run, run of the external test suite (with exception
for zig-bootstrap).
Nightly tests are executed upon every change to the codebase, batched per day, on a shared-processor VPS with the following specs: AMD EPYC Rome CPU (4 cores), 8 GB of RAM and 8 GB of swap.
Pre-release tests are executed upon every merge to the master branch, which
coincides with tagging a release.
#Pre-release testing
Starting from the version 0.5.0, each Kefir release will be accompanied with the following artifacts:
- A complete set of logs for the whole run of scripts/pre_release_test.sh will be preserved and published, along with an archive of external test sources and a container image with the test execution environment and all build artifacts and intermediate files.
- zig-bootstrap external test will be performed in the same environment on a different machine. A complete set of logs and a container image will be published too.
- A set of logs produced by the SourceHut build service for all continuous integration manifests from .builds directory. These builds include test runs for Linux, FreeBSD, OpenBSD and NetBSD in accordance with the requirements outlined in Supported environments section.
All artifacts will be published in auditable form along with release source code at Kefir website and signed with author's PGP key.
#Optimization and codegen
#Intermediate representations
Kefir structures compilation pipeline into multiple intermediate representations between AST and code emission.
The pipeline is segmented by abstraction level into 3 parts. Target-independent part includes high-level representations that share the same execution semantics (core set of opcodes), but differ by control & data flow representation: linear stack-based IR and structured optimizer SSA (memory SSA is complementary and derived from optimizer SSA as part of some optimization passes). Target-specific part is further segmented based on resource management strategy: virtual representations use virtualized CPU registers characterized by type and allocation constraints, whereas physical 3AC encodes actual register names. Target-specific part too includes representations with different control & data flow shape sharing the same execution semantics.
Philosophically, Kefir optimization pipeline is structured along two dimensions: abstraction level and concern. The abstraction level defines the degree of source language and machine-specific information available at a particular point, specifying set of available operations and data types. The concern defines raison d'etre for the particular intermediate representation --- executable or analytical --- and thus specifies shape of control & data flow serving stated goal. Core idea is that executable IRs (stack-based and 3-address code) shall have reasonable operational semantics allowing for direct execution by a (virtual) machine of appropriate architecture, whereas analytical representations shall be amenable for analysis and transformation. Furthermore, Kefir enforces hard boundaries between IR families sharing the same abstraction level, ensuring that each family is self-sufficient and carries all information necessary to express program semantics. Each lowering boundary targets executable form of the underlying family, thus enabling simple procedural lowering relieved from the need to construct appropriate control & data flow structures. Therefore, Kefir optimization pipeline can be imagined as vertical zigzag shape in two-dimensional space.
Such design philosophy may contradict fashionable modern approaches (e.g. MLIR). The author motivates this structure as better suitable to satisfy the following requirements:
- Evolvability --- deliberate separation between executable and analytical forms enables early code emission, thus facilitating faster feedback loop at early stages of compiler construction. Kefir project evolution follows this pattern: early versions of the compiler used stack-based IR as actual operational model for code emission, generating stack-based threaded code for x86_64, optimizer SSA and 3AC were added later, superseding the original backend, and target IR was elaborated from the devirtualization of 3AC as the final development.
- Debuggability --- each abstraction-sharing IR family provides complete set of primitives to express program semantics, which enables quicker isolation of transformation/lowering issues and facilitates reasoning about each individual stage. The author has found that approaches that mix different abstraction levels (dialects) within the same pipeline with complex legalization rules lead to worse comprehension of program semantics and available operations at each particular point, as well as duplication of equivalent operations across dialects. In Kefir pipeline, legalization largely coincides with lowering between IR families.
- Extensibility --- each individual intermediate representation might serve as a separate, stable target for adding extensions and plug-ins. Each extension is provided with a coherent and complete view of a program at desired abstraction level without need to concern itself with available operations and transformation passes that precede or succeed the extension point. While Kefir currently does not offer extension mechanism outside of front-end, such integrations are in principle possible within the current design.
- Flexibility --- while this might seem contradictory at first, the author considers this compilation scheme to be more flexible. It does not require pre-emptive commitment to exact transformation pipeline structure, beyond defining semantics of each abstraction level. Transformations for each abstraction family can be freely re-ordered and restructured, as each abstraction provides coherent view of a program with rich set of available operations that act as a substrate for any transformation at that level.
#Stack-based IR
Stack-based IR is a complete representation of an executable module. Apart from executable code, it includes symbol information, type & function signatures, global data definitions, string literals, inline assembly fragments.
From execution perspective, each function of stack-based IR is characterized by:
- A virtual unbounded stack. The stack has no fixed element type or width, scalar and complex values are handled uniformly. Aggregates are managed by-reference.
- Unstructured linear flow of instructions that operate on the stack, governed by unconditional jumps and branches. Each instruction might have optional immediate, type identifier or code reference parameters.
- A set of typed addressable local variable slots with unique identifiers and optional scopes. The slots are defined within the instruction flow and instantiated upon first reference.
The stack-based IR provides an isolation level between the frontend and middle- and backend of Kefir, encapsulating all target-specific details and providing a unified abstraction to upper layers. Beyond the container for the code, stack-based IR provides a set of APIs for the frontend to retrieve target-specific information (type layouts, sizes, alignments, etc).
List of stack-based IR opcodes is available in headers/kefir/optimizer/opcode_defs.h and headers/kefir/ir/opcode_defs.h (the former file includes several SSA-specific opcodes too).
Stack-based IR represents executable form along concern dimension. Earlier versions of kefir used it as operational model for generating stack-based threaded code.
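The execution model described above can be sketched with a toy interpreter. This is only an illustration of a stack machine with unstructured jumps and branches: the opcode names, the `toy_` prefix and the fixed stack bound are assumptions made for the example and do not correspond to Kefir's actual opcodes or data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; NOT Kefir's real opcode set. */
enum toy_opcode {
    TOY_PUSH,   /* push the immediate onto the stack */
    TOY_ADD,    /* pop two values, push their sum */
    TOY_MUL,    /* pop two values, push their product */
    TOY_BRANCH, /* pop a value; jump to the target if it is non-zero */
    TOY_JUMP,   /* unconditional jump to the target */
    TOY_RET     /* pop a value and return it */
};

struct toy_instr {
    enum toy_opcode opcode;
    int64_t arg; /* optional immediate or code reference */
};

/* Executes a linear instruction flow. The real IR assumes an unbounded
 * stack; the toy model uses a fixed bound guarded by assertions. */
int64_t toy_execute(const struct toy_instr *code, size_t length) {
    int64_t stack[64];
    size_t sp = 0, pc = 0;
    while (pc < length) {
        const struct toy_instr *instr = &code[pc++];
        switch (instr->opcode) {
            case TOY_PUSH:
                assert(sp < 64);
                stack[sp++] = instr->arg;
                break;
            case TOY_ADD:
                assert(sp >= 2);
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case TOY_MUL:
                assert(sp >= 2);
                sp--;
                stack[sp - 1] *= stack[sp];
                break;
            case TOY_BRANCH:
                assert(sp >= 1);
                if (stack[--sp] != 0) pc = (size_t) instr->arg;
                break;
            case TOY_JUMP:
                pc = (size_t) instr->arg;
                break;
            case TOY_RET:
                assert(sp >= 1);
                return stack[--sp];
        }
    }
    return 0;
}

/* (2 + 3) * 4 expressed as a linear stack program. */
int64_t toy_demo(void) {
    static const struct toy_instr program[] = {
        {TOY_PUSH, 2}, {TOY_PUSH, 3}, {TOY_ADD, 0},
        {TOY_PUSH, 4}, {TOY_MUL, 0},  {TOY_RET, 0}};
    return toy_execute(program, sizeof(program) / sizeof(program[0]));
}
```

Because such a representation has direct operational semantics, a naive backend can emit code for it one instruction at a time, which matches the role of an executable IR described in this section.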
#Optimizer IR
Optimizer IR is an analytical counterpart to the stack-based IR. It uses a flavour of SSA form with partial ordering of side-effect free operations. Optimizer IR is characterized by:
- An explicit control flow graph of basic blocks, owning individual instructions.
- Tight correspondence between instructions and values. Each instruction itself represents a value, a concept of variable and copying operation does not exist. Optimizer SSA is always in non-conventional form.
- A linear "control flow" chain of side effectful instructions within each basic block. Side effectful instructions include function calls, memory operations, inline assembly, block terminators. In principle, instruction opcode does not determine whether it belongs to the control flow chain (except for block terminators): there may exist partially ordered calls or memory accesses outside of the control flow chain, as long as they operate on a disjoint segment of program state. Rather, this chain reflects source-level side effect order of the program.
- A directed acyclic graph of side-effect free instructions (i.e. pure computations) within each basic block. Position of each individual instruction within the graph is determined solely by its incoming data flow edges without additional ordering constraints.
- Each instruction, irrespective of its type, can depend both on side effectful and side effect-free instructions from current or other basic blocks. Therefore, dominance relation is relaxed to reflect relative sequencing of instructions based on their data flow dependencies and position in control flow. Such structure provides a set of constraints for scheduling without specifying the exact schedule. Compiler is free to assume any legal linearization.
- Data flow is organized via usual phi-instructions. Phi is always side-effect free.
Outside of code representation, the optimizer IR shares other aspects of program semantics (symbols, type & function signatures, etc) with stack-based IR. List of optimizer IR opcodes is available in headers/kefir/optimizer/opcode_defs.h.
The author considers the outlined design to be the most suitable for C compilation and overall beneficially-positioned within the spectrum of SSA forms between LLVM IR and Sea-of-Nodes style extremes. In particular,
- The design naturally mirrors distinction between pure computation and side-effectful operation present in many programming languages. In C, this distinction is especially pronounced as the standard deliberately defines the abstract machine semantics in terms of sequence points and "as-if" rule.
- Side-effectful operations are organized in familiar fashion via basic blocks and linear chains of operations within them. Certain portion of criticisms of Sea-of-Nodes approach focuses on complexity of unified representation for data & control flow dependencies, which harms debuggability and comprehension. Kefir tries to avoid this by preserving traditional approach.
- Side effect-free operations use block-local directed acyclic graph without well-defined position of an instruction within its basic block. In terms of theoretical expressivity, it is not different from block-locally linear approach of LLVM which permits rescheduling respecting data flow. However, the author considers modification and transformation of such structure to be simpler task as it does not require specifying precise insertion points for new instructions. Majority of the instructions are side effect-free and therefore many transformation passes can just "throw" instruction DAG fragments into their respective basic blocks without regard for precise position, which will be determined later at scheduling stage. In principle, block-local linearization similar to LLVM can be achieved simply by over-specifying control flow chain, however Kefir does not provide any re-scheduling facilities to account for such over-specification.
- This structure naturally induces dead-code elimination upon traversal of control flow chain and transitive data dependencies of reachable basic blocks upon scheduling. Kefir provides separate DCE transformation pass only for canonicalization measure to simplify certain other passes.
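The implicit dead-code elimination mentioned in the last point can be sketched as a reachability walk. The sketch below is a simplified single-block model under stated assumptions: `struct toy_instr`, the two-operand limit and all `toy_` names are hypothetical and are not Kefir's data structures. Only instructions on the control flow chain act as roots; pure instructions become alive solely through data flow edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_OPERAND ((size_t) -1)

/* Hypothetical instruction record; not Kefir's actual representation. */
struct toy_instr {
    bool side_effect;   /* true if the instruction is on the control flow chain */
    size_t operands[2]; /* data flow inputs, or TOY_NO_OPERAND */
};

/* Marks an instruction and, transitively, everything it depends on. */
void toy_mark(const struct toy_instr *code, size_t index, bool *alive) {
    if (index == TOY_NO_OPERAND || alive[index]) return;
    alive[index] = true;
    for (size_t i = 0; i < 2; i++) toy_mark(code, code[index].operands[i], alive);
}

/* Walks the control flow chain and returns the number of alive instructions.
 * Instructions that are never reached are simply never scheduled. */
size_t toy_collect_alive(const struct toy_instr *code, size_t length, bool *alive) {
    size_t count = 0;
    for (size_t i = 0; i < length; i++) alive[i] = false;
    for (size_t i = 0; i < length; i++) {
        if (code[i].side_effect) toy_mark(code, i, alive);
    }
    for (size_t i = 0; i < length; i++) {
        if (alive[i]) count++;
    }
    return count;
}

/* %0 = const; %1 = const; %2 = add %0, %1; %3 = mul %0, %0; %4 = return %2 */
size_t toy_demo(bool *alive) {
    static const struct toy_instr code[] = {
        {false, {TOY_NO_OPERAND, TOY_NO_OPERAND}},
        {false, {TOY_NO_OPERAND, TOY_NO_OPERAND}},
        {false, {0, 1}},
        {false, {0, 0}},
        {true, {2, TOY_NO_OPERAND}}};
    return toy_collect_alive(code, 5, alive);
}
```

In this example the unused `mul` is never marked, so a scheduler that only visits marked instructions drops it without a dedicated elimination pass.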
#Memory SSA
Memory SSA is subordinate to optimizer IR and is constructed from it for certain optimization passes. Memory SSA is constructed by scanning alive instructions within optimizer IR CFG for memory effects (memory accesses, function calls, inline assembly), resulting in a graph consisting of the following nodes: root (function entry point), terminate (function return), produce (write-only memory operations), consume (read-only memory operations), produce-consume (read-write) and phi. Produce/consume nodes link back to their inducing optimizer IR instructions. Root, produce and produce-consume nodes define a new version of the entire memory which can be consumed by consume, produce-consume and terminate nodes. Distinction between produce and produce-consume nodes serves to reflect the behavior of an operation with respect to memory location it modifies.
Compared to optimizer IR, memory SSA omits basic block structure and linearizes partial ordering of optimizer IR into an arbitrary total order permitted by control & data flow. The latter transformation is valid because the optimizer IR shall ensure that any two partially ordered memory accesses necessarily operate on disjoint segments of memory. Omission of basic blocks is possible because memory SSA does not represent control flow or any other computations explicitly.
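A minimal sketch of the versioning idea, assuming a straight-line sequence of memory effects without phi nodes. The node layout and the `toy_` names are hypothetical and only mirror the node kinds named above; the root is modelled implicitly as version 0, and the return value stands for the version consumed by the terminate node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; root, terminate and phi are left implicit. */
enum toy_mem_kind { TOY_PRODUCE, TOY_CONSUME, TOY_PRODUCE_CONSUME };

struct toy_mem_node {
    enum toy_mem_kind kind;
    size_t input_version;  /* memory version that precedes the node */
    size_t output_version; /* memory version visible after the node */
};

/* Assigns memory versions along a linear sequence of memory effects.
 * Produce and produce-consume nodes define a fresh version of the entire
 * memory; consume nodes leave the current version unchanged. */
size_t toy_number_versions(struct toy_mem_node *nodes, size_t length) {
    size_t current = 0; /* version defined by the root node */
    size_t next = 1;
    for (size_t i = 0; i < length; i++) {
        nodes[i].input_version = current;
        if (nodes[i].kind != TOY_CONSUME) current = next++;
        nodes[i].output_version = current;
    }
    return current;
}

/* store; load; load; call; load */
size_t toy_demo(struct toy_mem_node *nodes) {
    nodes[0].kind = TOY_PRODUCE;
    nodes[1].kind = TOY_CONSUME;
    nodes[2].kind = TOY_CONSUME;
    nodes[3].kind = TOY_PRODUCE_CONSUME;
    nodes[4].kind = TOY_CONSUME;
    return toy_number_versions(nodes, 5);
}
```

In the demo both loads after the store observe the same memory version, which is the kind of fact a load-load optimization could exploit if the addresses also match, while the load after the call observes a newer version and cannot be merged with them.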
#Optimization pipeline
Kefir includes the following high-level optimization passes at -O1 level:
- Function inlining -- the optimizer performs inlining early on in the pipeline, guided exclusively by the annotations provided by the programmer at source code level. Kefir does not implement any heuristics for inlining a function, however certain inlinings might be disabled due to excessive inlining depth. The author does not view this as an optimization per se, but as faithful implementation of programmer's annotations.
- Local variable promotion to registers (mem2reg) -- the optimizer identifies scalar and complex local variables whose addresses never escape a function and never alias, and promotes these local variables into SSA values. This is a cornerstone optimization that effectively enables most further analyses.
- Phi removal -- the optimizer identifies redundant SSA phi nodes and webs that can be eliminated.
- Constant folding -- the optimizer identifies all constant subtrees of SSA and folds them.
- Simplification -- these optimization passes combine canonicalization, optimization and simplification of many diverse instruction shapes. Simplification is implemented as ad-hoc pattern matching upon the optimizer IR and runs until fixpoint is reached.
- Local allocation sinking -- the local variable allocations that have not been eliminated by the mem2reg pass are moved closer to their actual uses to make stack frame layout more dense.
- Memory SSA -- the optimizer constructs memory SSA representation out of optimizer IR and performs load-load, store-store, store-load and zeroing-store optimizations. Currently, the efficiency of these optimization passes is limited primarily by conservative non-path-sensitive alias and escape analyses.
- Scalar replacement of aggregates -- the optimizer pass piggybacks on mem2reg and alias analysis infrastructure, identifying segments of local variables that are accessed in disjoint non-aliasing manner and never escape. As a result, these fragments get promoted into SSA values. This optimization pass can be viewed as generalization of mem2reg.
- Global value numbering (gvn) -- the optimizer identifies instruction subtrees that are identical across the function code, and de-duplicates them. Where necessary, the de-duplicated subtrees are hoisted. The gvn pass only works on integral scalar instruction subtrees that do not contain side effects, thus it is very conservative. By preceding it with mem2reg, memory SSA and sroa passes, certain portion of memory accesses get promoted into SSA values which makes GVN transformation more capable.
- Loop-invariant code motion (licm) -- the optimizer identifies loops and groups them into nests. Instruction subtrees within a loop that are not dependent on any of loop values are hoisted to the outer levels of loop nests or outside it. As opposed to GVN, LICM integrates memory SSA to hoist side-effect free memory loads, and loop-invariant stores in cases where the loop is guaranteed to execute.
- Loop removal -- under assumption of side-effect free loop termination provided by the C language standard, certain loops are eliminated.
- Dead code and allocation elimination -- the optimizer identifies and eliminates dead instruction and local variable allocations. While this pass is not necessary for code generation within current optimizer IR framework, it simplifies certain subsequent passes that do not need to consider instruction dependencies anymore.
- Block merging -- the optimizer identifies basic blocks that can be safely merged, does the merge and eliminates the control flow edge.
- Tail call optimization -- as the final part of the optimization pipeline, the optimizer identifies potential tail calls. It performs conservative escape analysis to verify that none of local variable addresses could have escaped the function. The optimization pass does not consider any target-specific aspects, therefore the final decision to perform a tail call is done at code generation stage.
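To make the effect of several of these passes concrete, the following sketch shows a small function next to a hand-written equivalent of what promotion to registers, constant folding and loop-invariant code motion could plausibly produce in combination. The function names are made up for the example, and the second function is not actual Kefir output; it only illustrates the shape of the transformations.

```c
#include <assert.h>
#include <stddef.h>

/* As written: 'scale' is a local whose address never escapes, '2 * 8' is a
 * constant subtree, and '(base * 4 + scale)' does not depend on the loop. */
long sum_scaled(const long *values, size_t count, long base) {
    long scale = 2 * 8;
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i] + (base * 4 + scale);
    }
    return sum;
}

/* Hand-written equivalent after the transformations: the local is promoted
 * to a value, the constant is folded to 16, and the invariant subtree is
 * computed once before the loop. */
long sum_scaled_optimized(const long *values, size_t count, long base) {
    const long invariant = base * 4 + 16;
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i] + invariant;
    }
    return sum;
}
```

Both functions return the same result for inputs that do not overflow; for example, with values {1, 2, 3} and base 5 each of them yields 114.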
All optimization passes as described above are strictly optional from code
correctness perspective. In addition to these passes, Kefir implements a
lowering pass as part of the pipeline. The lowering pass is necessary to
transform arbitrary-precision arithmetic instructions (used for implementing
_BitInt from the C23 standard) and certain software floating point operations
into either optimizer-native arithmetic instructions or supporting
routine calls (see Runtime library below). Lowering does not introduce any
target-specific details into the IR.
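As an illustration of what a supporting routine for wide _BitInt arithmetic has to do, the sketch below adds two integers stored as little-endian arrays of 64-bit limbs. It is a hypothetical helper written for this example, not the interface of Kefir's own arbitrary-precision routines, and it ignores widths that are not a multiple of 64 bits, which would additionally require masking or sign extension of the top limb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lhs += rhs over 'limbs' little-endian 64-bit limbs.
 * Returns the carry out of the most significant limb. */
unsigned toy_bigint_add(uint64_t *lhs, const uint64_t *rhs, size_t limbs) {
    unsigned carry = 0;
    for (size_t i = 0; i < limbs; i++) {
        uint64_t sum = lhs[i] + rhs[i];
        unsigned carry_out = sum < lhs[i]; /* wrapped while adding the limbs */
        sum += carry;
        carry_out |= sum < carry;          /* wrapped while adding the carry */
        lhs[i] = sum;
        carry = carry_out;
    }
    return carry;
}
```

An operation on values that fit into native registers needs no such routine, which is why lowering can choose between native arithmetic instructions and a call depending on the operand width.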
Optimization levels: at the moment, Kefir supports two optimization levels
-O0 and -O1 (anything else is considered equivalent to -O1). Both levels
include function inlining, local allocation sinking, dead code and dead
allocation elimination and lowering passes. In addition, -O1 contains all
passes described above with some repetitions. Consult source/driver/driver.c
for the precise optimization pipeline, and consult the manual page for
command-line options to define the optimization pipeline passes explicitly.
#Virtual three-address code
Virtual 3AC represents a shift along the abstraction dimension axis into the target-specific family with virtualized resource management. In principle, virtual 3AC can be viewed as x86_64 assembly with virtual registers and spill area segments, but technically Kefir separates the container for 3AC (instruction structure, values, label attachment, virtual register types and constraints) from specific instantiation for x86_64. Kefir implements lowering from optimizer IR into x86_64 3AC via simple procedural instruction selection with minimal number of instruction variants and minimal fusion of particularly suitable optimizer IR opcodes. Many optimality concerns, including alternative instruction variants, larger patterns, fusion, addressing modes are shifted into target IR stage. Furthermore, virtual 3AC does not concern itself with legality of any specific instruction shape, accepting any combination of operands --- legalization happens only upon destruction of target IR into physical 3AC.
The predominant approach to encoding precise register requirements are virtual
register constraints that specify pre-coloring for register allocator. Virtual
register constraints are used to encode both ABI (e.g. calling convention) and
ISA (e.g. implicit register operands) specific requirements, therefore relieving
post-instruction selection stages from reasoning about these requirements
outside of mechanical constraint satisfaction. Typically, for constrained
virtual register, instruction selector also issues special instructions (see
below) to ensure minimum required lifetime. While 3AC provides a way to specify
physical registers directly, appearance of these at virtual stage is limited to
very specific code fragments in function prologue and epilogue, special
non-allocatable registers (e.g. rsp, rbp, segment registers), or placements
that are guarded by constraints of surrounding virtual registers (vanishingly
small number of cases). In all cases, the rest of pipeline is allowed to operate
under assumption that specified physical registers never interfere with register
allocation or any other decisions.
General set of supported x86-64 opcodes is available in
headers/kefir/target/asm/amd64/db.h and
special opcodes are in
headers/kefir/codegen/amd64/asmcmp.h.
Among special opcodes, link is used as a polymorphic mov operation between
virtual registers of any type, touch and weak_touch represent virtual
register lifetime extension operations, with the latter being reserved for
ABI-induced restrictions (erased after target IR construction), produce
represents fresh definition of a virtual register with unspecified value ---
this one is necessary because in x86-64 use-define chains are often blurry and
certain instructions (e.g. xor %eax, %eax) provide pure definitions while
technically being RMW with no-op uses.
Virtual 3AC represents executable form along concern dimension. While it shall be executable by a virtual x86-64 CPU with unbounded number of registers, historically Kefir used it in conjunction with physical 3AC, implementing simple register allocation and devirtualization scheme for legalization of instruction shapes. In current version, virtual 3AC gets converted into target IR for more powerful optimizations.
#Target IR
Target IR is an analytical counterpart to target-specific Virtual 3AC. It represents state of x86-64 machine with virtualized resource management in SSA form. Target IR is characterized by:
- An explicit control flow of basic blocks, with total linear order of instructions within basic block.
- One-to-many relationship between instructions and values. Each instruction may have zero or multiple output values, each value is uniquely identified by a pair of instruction identifier and an aspect identifier.
- Value aspect identifies class of the value: direct (represents virtualized resources such as general purpose and floating-point registers), resource (represents unique named entities within machine state such as individual flags, x87 stack) and indirect (represents memory effect). An instruction may produce multiple values with the same aspect class, distinguished by sequence numbers.
- Each value is associated with a type which is characterized by kind, variant and constraint. Kinds of direct values include general-purpose, floating-point, spill space, external memory pointers, kinds of resources and indirect values are fixed. Variant specifies portion of the output storage (e.g. 8/16/32/64 bits) computed by the instruction directly. Note that the output storage (register, spill space slot, etc) is assumed to be modified in entirety irrespective of variant, which only serves to distinguish direct computation result from tied portion transferred from the input operand. Constraint is communicated to the register allocator.
- Each instruction has zero or multiple operands, whose structure mirrors virtual 3AC operands, substituting virtual registers with value references and label with block references. Value references have variant and tied flag, where the variant indicates portion of the value used for computation directly and tied flag marks whether the instruction might transfer other bits from this input parameter onto its output value.
- Compared to virtual 3AC, each instruction might have higher number of input operands and outputs because upon construction of target IR all implicit parameters and effects are made explicit.
- Read-modify-write instruction parameters are decomposed into input parameter and outputs upon construction from virtual 3AC. Tying parameters back happens during destruction, or temporarily in certain passes that need to distinguish precise instruction shape.
- Values of direct and resource class behave identically upon construction and transformation. Distinction appears during register allocation and destruction, where direct values are assigned with appropriate backing storage automatically, whereas resources are assumed to be well-behaved without enforcement. Thus, for resources SSA form serves as explicit data flow bookkeeping to be preserved by any transformations. The difference is justified by the fact that target architecture may lack mechanisms for efficient manipulation of individual resources (e.g. individual CPU flags) which would make automatic management of backing storage problematic.
- Target IR permits presence of physical registers in principle, however these may only appear under assumption that they may never interfere with register allocation or any other decisions taken by target IR, in the same sense as explained in virtual 3AC section.
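The identification of values by an instruction identifier paired with an aspect can be sketched with plain C structures. Everything below is a hypothetical model for illustration: the type names, the encoding of flags as aspect indices and the `toy_` functions are assumptions and do not reflect Kefir's actual definitions. The example enumerates the outputs that a division instruction makes explicit: two direct values and five flag resources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aspect classes mirroring the description above. */
enum toy_aspect_class { TOY_DIRECT, TOY_RESOURCE, TOY_INDIRECT };

/* Hypothetical flag resources, used as aspect indices for TOY_RESOURCE. */
enum toy_flag { TOY_FLAG_SF, TOY_FLAG_OF, TOY_FLAG_PF, TOY_FLAG_CF, TOY_FLAG_ZF, TOY_FLAG_COUNT };

/* A value is identified by its defining instruction and an aspect. */
struct toy_value_ref {
    unsigned instr_id;
    enum toy_aspect_class aspect_class;
    unsigned aspect_index; /* sequence number or resource name */
};

/* Lists the outputs of a division-like instruction: quotient and remainder
 * as direct values, followed by the affected flags as resources. */
size_t toy_idiv_outputs(unsigned instr_id, struct toy_value_ref *out, size_t capacity) {
    size_t n = 0;
    assert(capacity >= 2 + TOY_FLAG_COUNT);
    out[n++] = (struct toy_value_ref){instr_id, TOY_DIRECT, 0};
    out[n++] = (struct toy_value_ref){instr_id, TOY_DIRECT, 1};
    for (unsigned flag = 0; flag < TOY_FLAG_COUNT; flag++) {
        out[n++] = (struct toy_value_ref){instr_id, TOY_RESOURCE, flag};
    }
    return n;
}

/* Counts values of a given aspect class. */
size_t toy_count_class(const struct toy_value_ref *values, size_t length, enum toy_aspect_class cls) {
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (values[i].aspect_class == cls) count++;
    }
    return count;
}
```

With such a scheme a register allocator would only need to assign storage to values of the direct class, while the resource values merely record data flow that transformations have to preserve.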
To illustrate the target IR structure, consider a code fragment representing
cdq -> idiv operation in x86-64 which normally includes multiple implicit
registers with RMW operations and modifies CPU flags (note: kefir prints IR in
JSON format, syntax below is semantically equivalent but manually condensed for
brevity).
(%42:direct[0] gp variant default requires rdx) = cdq (%41:direct[0] variant 32bit !tied)
(%43:direct[0] gp variant default requires rax), (%43:direct[1] gp variant default requires rdx),
(%43:flag_sf), (%43:flag_of), (%43:flag_pf), (%43:flag_cf), (%43:flag_zf) =
idiv (%40:direct[0] variant 32bit !tied) (%41:direct[0] variant 32bit tied) (%42:direct[0] variant 32bit tied)
Which can be compared against the equivalent in MachineIR of LLVM.
The author considers target IR design to have the following beneficial properties:
- Uniform representation of most relevant x86-64 machine resources. Target IR avoids the need to specify ad-hoc properties and special forms outside of SSA framework.
- Explicit and complete representation of inputs and outputs, including for RMW operations. Target IR lifts these into SSA form, and for most used resources ensures automatic backing storage management. The number of instructions with special handling is minimized.
- SSA-native live range and use tracking for all values. Target IR discourages direct encoding of physical registers, instead attaching allocation constraints to values. Therefore, standard data flow analyses are available without need to track special attributes.
With this, target IR implements following transformations:
- Global "peephole" optimizations --- many traditional peephole optimizations are relaxed to match patterns in data flow across the entire function. These include selection of different instruction shapes, various simplifications, dead code elimination, constant propagation, folding instruction sequences into sophisticated addressing modes.
- Multi-stage register allocation --- target IR undergoes reversible out-of-SSA transformation prior to register allocation. Currently target IR implements simple evicting register allocator without live range splitting. Upon obtaining preliminary register assignment, the target IR code is trivially restored back into SSA form for further optimizations guided by spilling information generated by the register allocator, before the final register allocation pass. This scheme mitigates certain weaknesses of simplistic register allocator while keeping the pipeline tractable.
- Local hot copy insertion and rematerialization guided by preliminary register allocation results.
#Physical 3AC
Physical 3AC represents the lowest-level abstraction family in its executable form. It encodes target machine-specific representation of the code with already allocated physical resources. Physical 3AC is characterized by:
- Shared container with virtual 3AC. Both IRs use the same instruction structures, with primary difference being the type of register operand. Physical 3AC disallows presence of virtual registers.
- All instructions should appear in legalized form as a result of target IR destruction. Code emission from physical 3AC is trivial single-pass operation.
- Physical 3AC permits several special forms of spill space addressing (spill slots, local variables). Code emitter is supplied with stack frame layout that is used to resolve these into frame base relative addresses.
- Physical 3AC is generally encoded as x86-64 instruction sequence with operand order corresponding to Intel syntax. Code emitter for physical 3AC implements several possible syntax targets (GNU As AT&T, Intel with/without prefixes, Yasm).
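The relationship between the internal Intel operand order and multiple output syntaxes can be sketched as follows. This is a deliberately tiny model limited to two-operand register instructions; the function and enum names are hypothetical and are not part of Kefir's code emitter. The instruction is stored destination-first, and the AT&T printer swaps the operands and adds register prefixes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical syntax selector for the example. */
enum toy_syntax { TOY_SYNTAX_INTEL, TOY_SYNTAX_ATT };

/* Formats a register-to-register instruction stored in Intel order
 * (destination first). Returns the result of snprintf. */
int toy_emit(char *buffer, size_t size, enum toy_syntax syntax, const char *mnemonic,
             const char *dst, const char *src) {
    if (syntax == TOY_SYNTAX_ATT) {
        /* AT&T syntax: source first, registers prefixed with '%'. */
        return snprintf(buffer, size, "%s %%%s, %%%s", mnemonic, src, dst);
    }
    /* Intel syntax without prefixes: destination first. */
    return snprintf(buffer, size, "%s %s, %s", mnemonic, dst, src);
}
```

For instance, the same stored instruction prints as `mov rax, rbx` in Intel syntax and as `mov %rbx, %rax` in AT&T syntax, so the choice of syntax stays local to the final emission step.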
#Debugging information
Kefir supports generation of debugging information for GNU As target assembler.
Generated debug information is in DWARF-5 format, and includes mapping between
assembly instructions and source code locations, variable locations, type
information, function signatures. The author has made best-effort attempt to
preserve variable locations across the optimizer pipeline, however certain
optimizations at -O1 level might disrupt debugging experience significantly.
#Runtime library
With exception for non-native atomic operations which require libatomic,
decimal floating-point (libgcc) and thread-local storage, Kefir generates
self-contained assembly listings and requires no runtime library of its own.
Code generator typically inlines implementations for most of operations into the
target function directly. The sole exception to this are arbitrary-precision
arithmetics operations, that are necessary to support _BitInt feature of the
C23 standard, and certain software floating-point operations for complex
numbers. For these operations, Kefir issues function calls and appends necessary
functions with internal linkage to the end of the generated assembly listing.
#Goals and priorities
As a project, Kefir has the following goals, in order of priority:
- Independence. Within its scope, which is C17/C23 source code to assembly translation, Kefir shall be independent and shall not rely on any parsing, compiler or code generation frameworks/libraries. Outside of the scope, Kefir shall integrate with system toolchain components.
- Correctness, compatibility and compliance. Kefir shall remain compliant with:
- C17 and C23 language standards. Any deviation from the standards, unless explicitly documented as a quirk, shall be considered a bug in Kefir.
- System-V AMD64 ABI. Kefir shall produce code that can be freely linked with object files or libraries produced by other compilers on the same platform without introducing any ABI issues.
- Other relevant documents (DWARF-5 standard, Thread-local storage models). Kefir shall comply where possible with other documents defining the platform binary interface or environment to facilitate complete compatibility.
- Popular C language extensions (GNU C extensions, gcc built-ins). Implementing complete set of language extensions is not a goal of Kefir, but a reasonable amount of extensions shall be supported in order to compile real-world software (such as included in the external test suite). Degree of support and compatibility with each particular extension might vary.
- Command-line interface. Kefir implements cc-compatible command line interface, extended with certain options supported by gcc or clang compilers. The goal is to serve as a drop-in replacement of cc in cases where limitations documented in Implementation quirks are observed.
- Limited scope. Kefir is focused on source-to-assembly translation and integration with other system tools. Implementation of other parts of toolchain (libc, assembler, linker) or other languages/dialects is currently not considered a goal for the project.
- "Well-roundness". Kefir shall exhibit reasonable architecture with all stages expected from a credible C compiler. Performance (both run and compile time) of certain stages might be lacking, however all stages shall be present and architecturally sound. Project architecture shall permit iterative approach at refining certain stages and improving the compiler.
- Performance. Once all other goals are reasonably observed, the project might include enhancements in performance, in the broadest sense. The notion of performance in this context includes both compiler performance and efficiency of produced code. Optimizer passes and code generated as described in the section Optimization and codegen are first and foremost defined to satisfy the correctness and well-roundness condition, with performance enhancements coming later.
- Portability. Kefir shall be portable across Unix-like x86_64 systems. Support for non-x86_64 or non-Unix platforms is currently considered non-goal, but that might change in future.
#History and future plans
The project has been in active development since November 2020. In that
time-span, the author has released several intermediate versions, with complete
descriptions available in the CHANGELOG. It shall be noted that the versioning
scheme is inconsistent, and can be characterized as "vibe-versioning" (i.e.
absence of strict versioning scheme and relying on author's personal feeling
about the release).
- 0.1.0 -- released in September 2022. Represents the first two years of working on the project, and provides basic C17 compiler with several omissions targetting Linux and BSD systems, capable of bootstrapping and building a few real projects. The compiler is based on threaded-code execution model and does not perform any optimizations.
- 0.2.0 -- released in July 2023. The project had introduced a new optimizer IR and code generator, however all optimizations are deferred to the subsequent releases.
- 0.3.0 -- released in August 2023. The project had been augmented with some optimizations and usability enhancements.
- 0.3.1 -- released in November 2023. The project had acquired a new code generator. At this point, project's architecture transforms into its current form.
- 0.4.0 -- released in September 2024. Implementation of missing C17 features, debug information generator, introduction of the external test suite.
- 0.4.1 -- released in February 2025. Major improvements in generated code correctness and compatibility, significant extension of the external test suite.
- 0.5.0 -- released in September 2025. Includes substantial refactoring and improvement of optimizer IR structure, new optimization passes, C23 support, another significant extension of real-world compatibility.
- 0.5.1 -- released in April 2026. Includes completion of C23 support (decimal floating-point, imaginary floating-point, STDC pragmas), substantial rewrite of code generation layer (see "Target IR" section of Optimization & codegen), optimizer improvements including integration of conservative memory analysis via memory SSA, 128-bit integer support, extension of real-world compatibility.
The author does not make any promises or commitments regarding future development. Any commit to the project might be the final one without prior notice. Nevertheless, if development is terminated or indefinitely paused, the author will attempt to communicate this clearly. Furthermore, should any bugs in already published code be discovered after active development cessation, the author might issue limited fixes addressing the issue.
#Distribution
Kefir is distributed exclusively as source code, which can be obtained from the following sources:
The author publishes release tarballs at the project's
website. The author recommends to obtain the
source code from master branch of any of the official mirrors, as it might
contain more up-to-date code and each merge to that branch is tested as
thoroughly as releases.
In addition, the author maintains two PKGBUILD build scripts at ArchLinux User
Repository: kefir and
kefir-git.
The author is aware of kefir packages produced by the third parties. The author is not affiliated with any of these package maintainers, so use at your own discretion. Packages might be outdated or otherwise problematic:
#License
The main body of the compiler code is licensed under GNU GPLv3 (only) terms, see
LICENSE. Please note the only part: Kefir does not include any "later
version" clause, and publication of new GNU GPL versions does not affect the
project.
The arbitrary-precision integer handling routines (headers/kefir_bigint) and
runtime headers (headers/kefir/runtime) are licensed under the terms of BSD
3-clause license. Code from these files is intended to be included into
artifacts produced by the compiler, therefore licensing requirements are
relaxed. Furthermore, when these files are used as part of normal compilation
pipeline with Kefir, their licensing can be treated as being in the spirit of
GCC Runtime Library
exception. In such cases,
the author does not intend to enforce redistribution clauses (#1 and #2) of BSD
license in any way.
For clarity, most source files in the repository include license and copyright headers.
#Contributing
The author works on the project in accordance with extreme cathedral model. Any potential external code contributions shall be discussed in advance with the author, unless the contribution is trivial and is formatted as a series of short commits that the author can review "at a glance". Any unsolicited non-trivial merge requests that did not undergo prior discussion might get rejected without any further discussion or consideration.
Nevertheless, the author welcomes non-code contributions, such as bug reports, bug reproduction samples, references to relevant materials, publications, etc.
#Useful resources and links
Fundamental information:
Useful tools:
Supplementary information:
Kefir-specific links:
Trivia:
#Acknowledgements
The author would like to acknowledge (in no particular order) the many people who have influenced the author's intention, motivation and ability to work on Kefir:
- The original authors and designers of the C programming language and UNIX: Dennis Ritchie and Ken Thompson. Undoubtedly, any work on Kefir is only possible because the author is standing on the shoulders of giants.
- The original authors of major C compilers -- Richard Matthew Stallman of GNU Compiler Collection, and Chris Lattner of Clang -- as well as Fabrice Bellard, who is the author of Tiny C Compiler. The works of these people have inspired the creation of Kefir.
- The authors and contributors of software projects that Kefir relies upon in any way in its build or development process: Linux kernel, FreeBSD, OpenBSD, NetBSD, DragonflyBSD, GNU project, Clang, Musl libc, CSmith, and any other smaller projects.
- In particular, the author wants to emphasize the Record and Replay framework project. While less known, it has been instrumental in investigating failures in Kefir and in software compiled by it, and it has had a non-trivial impact on Kefir's current state.
- Once again, Richard Matthew Stallman as the author of the GNU General Public License. The author believes that the GNU GPL has been the cornerstone in establishing the free software movement.
- Authors and contributors of all projects used in the external test suite. These are too numerous to list here, so please refer to source/tests/external.
- Hsiang-Ying Fu, the developer of slimcc, who has compiled a wonderful collection of C projects for compiler validation.
- Dr. Brian Robert Callahan, whose blog post motivated the author to resume work on the compiler.
- Anybody else who noticed and acknowledged Kefir development in its early stages.
- Friends and relatives of the author, who over the years happened to be listeners to ramblings on C compiling topics.
#Author and contacts
The project has been architected, engineered and implemented single-handedly by Jevgenij Protopopov (legal spelling: Jevgēnijs Protopopovs), with the exception of two patches obtained from third parties:
The author can be contacted by email directly, or via the mailing list.
Development of the project has been conducted independently without external sources of funding or institutional support.