I've been a ferocious critic of C, C++, and other memory unsafe languages, and a booster of memory safe languages such as Swift, Go, and particularly Rust. And though I believe there is a more-than-sufficient body of evidence to support the claim that the time to start migrating is now, there are still open questions related to how we migrate systems to memory safe languages more scalably, and how we maximize the safety of code written in these languages.
The remainder of this post will outline specific areas I believe are open research questions. It is my hope that this will prove useful to software and security engineers, researchers, and funding bodies. I'll be referring to Rust throughout; however, I believe most, if not all, of my examples are equally applicable to other memory safe languages.
The first set of research projects deals with how we can best ensure more projects adopt memory safe languages.
There is presently an enormous body of existing C and C++ code. Migrating even small portions of it by hand is the work of several lifetimes. An important open question is how we can use tools to automate this migration. Projects such as C2Rust[1] begin to answer this question by providing automated conversion from C to Rust. Unsurprisingly, this does not automatically make the Rust code safe: the generated code makes heavy use of Rust's unsafe.
I believe exploration of how unsafe Rust derived from C can be converted to safe Rust, probably utilizing human guided automation, would be an incredibly valuable accomplishment. Two specific areas I believe could benefit would be converting pointer plus length pairs to Rust's slices, and converting raw malloc invocations to use Rust's Box. As an initial hypothesis, I imagine this could work something like:
Such a process could be an incredible time saver for reducing the unsafe-ty in converted code (particularly given Microsoft's research indicating that spatial safety is the most common vulnerability category).
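To make the pointer-plus-length case concrete, here is a minimal sketch of what the before and after might look like. The function names (`sum_raw`, `sum_slice`, `sum_compat`) are hypothetical, and the "before" code only approximates the style of machine-translated output; it is not taken from any real tool. The `malloc`-to-`Box` case is not covered here.

```rust
/// Roughly the shape an automated C-to-Rust translation might take:
/// a raw pointer and a separate length, with nothing linking the two.
unsafe fn sum_raw(data: *const u32, len: usize) -> u32 {
    let mut total: u32 = 0;
    let mut i: usize = 0;
    while i < len {
        // SAFETY: the caller promises `data` points to at least `len` readable values.
        total = total.wrapping_add(unsafe { *data.add(i) });
        i += 1;
    }
    total
}

/// The safe target: the slice carries its own length, so every access
/// is bounds-checked (or provably in bounds) without any `unsafe`.
fn sum_slice(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// A transitional shim for callers that still hold a raw pointer and length.
/// The single remaining `unsafe` step is building the slice.
unsafe fn sum_compat(data: *const u32, len: usize) -> u32 {
    if len == 0 {
        // `from_raw_parts` requires a non-null pointer even for empty slices,
        // while C callers commonly pass NULL with a length of zero.
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` initialized values
    // that stay valid and unmodified for the duration of this call.
    let slice = unsafe { std::slice::from_raw_parts(data, len) };
    sum_slice(slice)
}
```

The shim shows why human guidance probably matters: a tool can spot the pointer and length parameters travelling together, but someone still has to confirm that the length really describes that pointer at every call site.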
Lots of memory unsafe code is built on a small number of very popular abstractions and libraries, for example Linux kernel modules and Skia. Both are written in C or C++, and thus effectively all consumers of their APIs are as well. Major projects like these should have bindings in memory safe languages that allow new consumers to be written in memory safe languages, and existing consumers to migrate. Such projects should also, in the long term, assist with inverting this relationship, allowing these projects to be written in a memory safe language and offer C/C++ bindings for backwards compatibility.
An example of a project like this is linux-kernel-module-rust, by Geoffrey Thomas and myself, which allows writing Linux kernel modules entirely in safe Rust. Unfortunately, it currently supports only a narrow subset of the kernel APIs (exposing character devices and sysctls). Expanding its API surface and helping port real world kernel modules would have a profound impact.
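As a rough illustration of what a safe binding buys a consumer, the sketch below wraps a register/unregister pair in a type whose destructor performs the cleanup. Everything here is hypothetical: `c_register_device` and `c_unregister_device` are stand-ins written in Rust so the example runs on its own, and neither they nor `Registration` reflect the actual API of linux-kernel-module-rust or the kernel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for state owned by a C library (hypothetical).
static LIVE_DEVICES: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a C registration function; returns 0 on success.
unsafe fn c_register_device() -> i32 {
    LIVE_DEVICES.fetch_add(1, Ordering::SeqCst);
    0
}

/// Stand-in for the matching C cleanup function. Must be called exactly once
/// per successful registration, which is easy to get wrong by hand.
unsafe fn c_unregister_device() {
    LIVE_DEVICES.fetch_sub(1, Ordering::SeqCst);
}

/// Safe wrapper: holding a `Registration` means the device is registered.
pub struct Registration {
    _private: (), // prevents construction outside `new`
}

impl Registration {
    pub fn new() -> Result<Registration, i32> {
        // SAFETY: the stand-in has no preconditions.
        let rc = unsafe { c_register_device() };
        if rc == 0 {
            Ok(Registration { _private: () })
        } else {
            Err(rc)
        }
    }
}

impl Drop for Registration {
    fn drop(&mut self) {
        // SAFETY: a `Registration` only exists after a successful register call,
        // and `drop` runs at most once, so the calls are always paired.
        unsafe { c_unregister_device() }
    }
}

/// Helper for observing the stand-in's state.
pub fn live_devices() -> usize {
    LIVE_DEVICES.load(Ordering::SeqCst)
}
```

A consumer of `Registration` never writes `unsafe` and cannot forget to unregister or unregister twice; the pairing rule lives in one place inside the binding.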
Finally, we simply need more research on what strategies are effective for migrating codebases (and which strategies aren't!). This can include both writing descriptions of successful introductions of memory safety into an existing code base (e.g. Firefox and librsvg) as well as experiments which try different approaches to migration.
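The second set of research projects deals with how to improve the safety of code written in memory safe languages, particularly code that uses unsafe. My examples are in Rust, but the problems in other languages with unsafe keywords are similar.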
One of the larger challenges with using C and C++ libraries from a language like Rust is figuring out what lifetimes are required of arguments to functions within the API (truth be told, this is a large challenge when using C and C++ libraries from C and C++ as well!). Libraries often do not document these requirements, and never expose this information programmatically (since C and C++ have no notion of lifetimes or syntax for expressing them). However, correctly maintaining lifetimes across FFI is vital to maintaining memory safety of a program!
We need better tooling for squaring this circle. I do not have a specific proposal for how to address this, but I'm hopeful that bright minds dedicated to solving it could make a meaningful improvement — until all libraries are written in memory safe languages, we'll need to be able to interoperate safely.
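For context on what such tooling would need to produce: once a lifetime requirement is known, writing it down in Rust is the easy part. The sketch below assumes a hypothetical C library whose `raw_doc_title` returns a pointer that is valid only until `raw_doc_free` is called. The `raw_doc_*` functions are stand-ins written in Rust so the example is self-contained; none of these names come from a real library.

```rust
// Stand-in for a C library's object (hypothetical).
struct RawDoc {
    title: [u8; 5],
}

const TITLE_LEN: usize = 5;

/// Stand-in for a C constructor returning an owned pointer.
fn raw_doc_new() -> *mut RawDoc {
    Box::into_raw(Box::new(RawDoc { title: *b"hello" }))
}

/// Stand-in for a C accessor. The undocumented rule we are modelling:
/// the returned pointer is only valid until `raw_doc_free(doc)`.
unsafe fn raw_doc_title(doc: *const RawDoc) -> *const u8 {
    // SAFETY: the caller promises `doc` is a live pointer from `raw_doc_new`.
    let doc_ref: &RawDoc = unsafe { &*doc };
    doc_ref.title.as_ptr()
}

/// Stand-in for the C destructor.
unsafe fn raw_doc_free(doc: *mut RawDoc) {
    // SAFETY: the caller promises `doc` came from `raw_doc_new` and is freed once.
    drop(unsafe { Box::from_raw(doc) });
}

/// Safe wrapper that owns the C object.
pub struct Doc {
    raw: *mut RawDoc,
}

impl Doc {
    pub fn new() -> Doc {
        Doc { raw: raw_doc_new() }
    }

    /// The returned slice borrows from `&self`, so the compiler rejects any
    /// use of it after the `Doc` has been dropped. This signature is where
    /// the library's lifetime rule gets encoded.
    pub fn title(&self) -> &[u8] {
        // SAFETY: `self.raw` is live for as long as `self` is, and the
        // stand-in always exposes exactly `TITLE_LEN` bytes.
        unsafe { std::slice::from_raw_parts(raw_doc_title(self.raw), TITLE_LEN) }
    }
}

impl Drop for Doc {
    fn drop(&mut self) {
        // SAFETY: `self.raw` came from `raw_doc_new` and is freed only here.
        unsafe { raw_doc_free(self.raw) }
    }
}
```

The hard part is the comment on `raw_doc_title`. In a real library that rule is often unwritten, and a wrong guess in the wrapper's signature silently turns into a use-after-free that safe callers cannot see.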
In C and C++, integer overflows are a classic source of memory corruption. Memory safe languages address this by enforcing bounds checks (but not by using checked arithmetic everywhere!). However, unsafe blocks that bypass these bounds checks for performance often fail to handle integer overflow, introducing vulnerabilities.
I believe two avenues of research exist here: the first is better analysis tools for identifying unhandled integer overflow in the presence of unsafe blocks, and the second is improving code generation and other optimizations such that enabling checked arithmetic by default is practical.
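A small sketch of the failure mode, using hypothetical helper names. `naive_in_bounds` models a hand-rolled check whose addition wraps (which is what plain `+` does in a default release build, where overflow checks are off), and `read_at` shows the checked form guarding an unchecked access.

```rust
/// A hand-rolled bounds check of the kind that might guard an unsafe access.
/// `wrapping_add` models `offset + len` in a build without overflow checks.
fn naive_in_bounds(buf_len: usize, offset: usize, len: usize) -> bool {
    offset.wrapping_add(len) <= buf_len
}

/// The same access pattern with the overflow handled before the unchecked read.
fn read_at(buf: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    // `checked_add` returns None on overflow instead of wrapping around.
    let end = offset.checked_add(len)?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: the addition did not overflow, so offset <= end, and we just
    // checked end <= buf.len(); the range is therefore in bounds.
    Some(unsafe { buf.get_unchecked(offset..end) })
}
```

With a 4-byte buffer, `naive_in_bounds(4, usize::MAX, 2)` returns `true` because the sum wraps around to 1, while `read_at` rejects the same inputs. An analysis tool of the kind described above would need to flag the first pattern wherever its result gates an unchecked access.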
Finally, building on the previous project, there is a general need for better static analysis tools for finding vulnerabilities in unsafe, and for automatically suggesting safe idioms to replace unnecessary usage of unsafe. Ideally this work would be empirically driven, based on data sets from RustSec as well as review of large codebases which make use of unsafe.
I believe even without these improvements, the case for migration to memory safe languages is acute and compelling. However, we must never be satisfied by the status quo. Improvements which make adoption of memory safe languages easier, and these languages even safer, can have a profound impact on the safety of our overall computing experience.
It is my hope that engineers and researchers have found projects here they are excited to work on, and funding bodies have ideas for projects to fund.
[1] C2Rust is inspired by Corrode, an earlier effort at translating C to Rust.