How Modern Languages Trace Their Toolchains Back to C
Why C is at the center
When a new programming language is designed, its first compiler or runtime is very often written in C or C++. The reasons are practical: C is available on nearly every platform, needs only a minimal runtime, is supported by mature compilers such as GCC and Clang, and gives the implementer direct control over memory layout and calling conventions. Every major operating system exposes a C-callable API, and nearly every CPU architecture has a C compiler.
This creates a pattern: the first implementation of Language X is written in C. Once X matures, the community rewrites the implementation in X itself (self-hosting), or in a higher-level systems language like C++ or Rust. But the historical chain from X back to C remains encoded in the dataset as bootstrap or implementation edges.
The Language Lineage dataset records compiler_written_in, runtime_written_in, bootstrap_written_in, and rewritten_in edges for 152 languages and tools. The chains below are computed at generation time by following those edges backward until reaching C or C++.
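The backward walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual generator: the EDGES table, its tuple layout, and the trace_to_c function are hypothetical, and the sample edges are limited to ones this article states (Rust's first compiler in OCaml, OCaml's runtime in C, Python's reference runtime in C).

```python
# Hypothetical in-memory edge table: (language, edge_type, implemented_in).
# The edge-type names come from the dataset; the layout is an assumption.
EDGES = [
    ("Rust", "bootstrap_written_in", "OCaml"),
    ("OCaml", "runtime_written_in", "C"),
    ("Python", "runtime_written_in", "C"),
]

ROOTS = {"C", "C++"}


def trace_to_c(language, edges=EDGES):
    """Follow implementation edges backward from `language` until C or C++.

    Returns the chain ordered root-first, or None if no root is reachable.
    """
    chain = [language]
    seen = {language}
    current = language
    while current not in ROOTS:
        # Take the first unvisited parent; a real generator would rank edge types.
        parents = [dst for src, _kind, dst in edges if src == current]
        next_hop = next((p for p in parents if p not in seen), None)
        if next_hop is None:
            return None  # dead end, or only a self-hosting loop remains
        chain.append(next_hop)
        seen.add(next_hop)
        current = next_hop
    return list(reversed(chain))
```

Calling trace_to_c("Rust") on this sample table walks Rust, then OCaml, then C, and returns the chain root-first. A language with no recorded edges yields None, which is one plausible way a target could end up without a traceable chain.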
Implementation chains from the dataset
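Seven of the eight target languages have traceable chains to C or C++ in the current dataset. Chains read from root (C or C++) to target language: each line names the implementing language first, then the edge type, then the language built on it, so "C compiler written in Go" means that a Go compiler was written in C.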
Rust (rustc bootstrap origin)
- Machine Code bootstrapped from Rust
JavaScript (V8 engine)
- C runtime written in JavaScript
TypeScript (tsc transpiler chain)
- C compiler written in Go
- Go rewritten in TypeScript
The self-hosting escape hatch
Several languages in the chains above are now self-hosting: the modern compiler or runtime is written in the language itself. Rust's rustc is written in Rust. Go's gc compiler is written in Go. The TypeScript compiler tsc is written in TypeScript. But each of those languages began with a compiler or runtime written in C, C++, or OCaml. The "chain to C" is a historical record, not a current dependency for most of these languages.
There are exceptions. CPython, the reference Python runtime, is still written in C. CRuby, the reference Ruby implementation, is still written in C. These are ongoing dependencies, not just historical ones. C is not just the bootstrap ancestor; it is the active implementation language for some of the world's most widely used runtimes.
Why OCaml appears in the Rust chain
Rust's first compiler, rustboot, was written in OCaml, not C, which is unusual. OCaml's algebraic data types and pattern matching make it a common choice for writing compilers, and it suited the experimental type-system work of Rust's early years. The chain for Rust therefore passes through OCaml on its way back to C, because OCaml's own runtime is written in C. "Back to C" does not always mean a direct hop; intermediate languages appear in some chains.
Limitations of the chains
The chains above follow a single path through the implementation graph: at each step, they pick the highest-priority edge type (bootstrap preferred over compiler preferred over runtime). This means some nuance is lost. Python's chain leads to C via its runtime, but Python also has PyPy (RPython/Python), Jython (Java), and GraalPy (Java), each with different implementation chains. The chain here is for the dominant, reference implementation only.
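The edge-priority rule can be sketched as follows. The ordering (bootstrap, then compiler, then runtime) is the one stated above; the pick_edge function and the (edge_type, implemented_in) pair layout are hypothetical. The rewritten_in edge type is left out because the text does not say where it ranks.

```python
# Priority order stated in the text: bootstrap, then compiler, then runtime.
EDGE_PRIORITY = ["bootstrap_written_in", "compiler_written_in", "runtime_written_in"]


def pick_edge(candidates):
    """Return the highest-priority (edge_type, implemented_in) pair, or None.

    `candidates` is a list of (edge_type, implemented_in) pairs for one language.
    Edge types outside EDGE_PRIORITY are ignored in this sketch.
    """
    ranked = [c for c in candidates if c[0] in EDGE_PRIORITY]
    if not ranked:
        return None
    # A lower index in EDGE_PRIORITY means a higher priority.
    return min(ranked, key=lambda c: EDGE_PRIORITY.index(c[0]))
```

Given both a compiler_written_in edge to Go and a bootstrap_written_in edge to C, this picks the bootstrap edge. Picking one edge per step is what collapses the graph to a single path and hides alternative implementations such as PyPy or Jython.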
Similarly, the Go chain leads to C via the historical bootstrap origin (the original gc compiler was in C). The modern Go compiler has been self-hosted since Go 1.5. If you followed the current dependency rather than the bootstrap origin, Go would be a one-step self-referential chain. The dataset distinguishes bootstrap origins from current implementation with the bootstrap_written_in edge type.
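The difference between the two views of Go can be made concrete with a small sketch. The GO_EDGES mapping and both functions are hypothetical; in particular, the assumption that the dataset records Go's current compiler as a compiler_written_in edge pointing to Go itself is an inference from the text, not a confirmed detail.

```python
# Hypothetical edges for Go, based only on what the text states.
GO_EDGES = {
    "bootstrap_written_in": "C",   # the original gc compiler
    "compiler_written_in": "Go",   # self-hosted since Go 1.5 (assumed edge)
}


def first_hop(edges, use_bootstrap_origin):
    """Return the language reached in one step under either view."""
    if use_bootstrap_origin and "bootstrap_written_in" in edges:
        return edges["bootstrap_written_in"]
    return edges.get("compiler_written_in")


def is_self_referential(language, edges):
    """True when the current compiler is written in the language itself."""
    return first_hop(edges, use_bootstrap_origin=False) == language
```

Following the bootstrap origin reaches C in one step, while following the current implementation returns Go again, which is the one-step self-referential chain described above.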
What the chains reveal about software architecture
Looking at these chains together reveals a structural pattern in software: performance-critical runtimes descend from C, while developer-facing tooling tends to be written in the language it serves. CPython and CRuby are written in C in large part because an interpreter's core loop, object model, and memory management benefit from direct memory control. But the Python standard library is mostly Python, the Ruby standard library is mostly Ruby, and the Rust standard library is Rust. The C layer is a substrate, not the full story.
Another pattern: the higher a language sits in the abstraction hierarchy, the more likely its first implementation is written in a lower-level language, and many such languages later become self-hosting. Rust, Go, Haskell, TypeScript, and Java's javac are all self-hosting today. CPython and CRuby are notable exceptions: they remain C programs by choice, giving up self-hosting in exchange for C's performance, portability, and closeness to the OS and hardware.
The chains also show how knowledge transfers through the ecosystem. OCaml appeared as the original language for the Rust compiler because Graydon Hoare knew OCaml and its type theory was well-matched to Rust's ambitions. Go used C because Rob Pike, Ken Thompson, and the rest of the team were Unix and Plan 9 veterans who designed the language in the C tradition. The choice of bootstrap language leaves a historical record in the dataset that reflects the intellectual genealogy of the project as much as its technical requirements.
Reading the dataset yourself
Every edge shown above is in the public Language Lineage dataset. The relationship pages list all edges of each type: runtime_written_in, compiler_written_in, and bootstrap_written_in. The compiler bootstrapping guide explains the bootstrap pattern in detail. Each language page in the dataset also shows a relationship map at the top, including all implementation edges.