EvoC2RUST: Bridging the Gap Between C and Rust with Smart Translation


Migrating from the ubiquitous C programming language to memory-safe Rust is a growing priority in software development, especially for safety-critical systems. However, translating complex, real-world C projects into idiomatic and secure Rust code presents significant challenges. A new framework, EvoC2RUST, developed by researchers from Shanghai Jiao Tong University and Huawei Technologies, tackles these hurdles and reports better results than existing methods.

The core problem in C-to-Rust translation lies in the fundamental differences between the languages. C offers flexible memory management and pointer arithmetic, which are often the source of critical security vulnerabilities. Rust, on the other hand, enforces strict compile-time memory safety through its ownership and borrowing rules. Existing translation tools often resort to generating Rust code with “unsafe” blocks or raw pointers to mimic C’s behavior, sacrificing the very safety guarantees Rust is known for. Large Language Models (LLMs) show promise, but often struggle with the vast interdependencies of entire codebases, leading to semantic errors and broken references.
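To make the contrast concrete, here is a minimal sketch of the two styles for a C loop that sums an array through a pointer. The function names `sum_raw` and `sum_safe` are hypothetical, chosen only for illustration, and neither function is taken from the paper or from the output of any specific tool.

```rust
/// Style typical of a mechanical translation: the C pointer and length
/// are carried over as a raw pointer, so the caller must uphold the
/// same invariants as in C (valid pointer, at least `n` elements).
pub unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        // Raw pointer arithmetic: nothing checks that `i` is in range.
        total += unsafe { *p.add(i) };
        i += 1;
    }
    total
}

/// Idiomatic alternative: a slice carries its own length, so the
/// compiler and runtime rule out out-of-bounds reads.
pub fn sum_safe(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```

Both functions return the same result for a valid array, but only the second lets the compiler enforce the bounds; the first merely relocates C's obligations into an `unsafe` contract.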

EvoC2RUST addresses these issues through a three-stage, skeleton-guided translation strategy. First, it decomposes the C project into functional modules and creates a compilable “skeleton” of Rust code. This skeleton includes type-checked function stubs, essentially placeholders that allow the project to compile even before the complex logic is translated.
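As a rough illustration of what such a skeleton might look like, the sketch below declares a type and two function signatures for an imagined linked-list module. All names (`Node`, `list_len`, `list_push`) are assumptions made for this example, and the use of `unimplemented!` as the placeholder body is one plausible choice, not necessarily what the framework emits.

```rust
/// Mirrors a C `struct node { int key; struct node *next; }`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
pub struct Node {
    pub key: i32,
    pub next: Option<Box<Node>>,
}

/// Stub for a C function `size_t list_len(const struct node *head)`.
/// The signature is type-checked now, so callers compile; the body is
/// a placeholder that panics if it is ever executed.
pub fn list_len(_head: Option<&Node>) -> usize {
    unimplemented!("body to be filled in during incremental translation")
}

/// A stub whose body has already been translated: prepends a key and
/// returns the new head of the list.
pub fn list_push(head: Option<Box<Node>>, key: i32) -> Box<Node> {
    Box::new(Node { key, next: head })
}
```

Because every signature already type-checks, each body can later be replaced one at a time while the rest of the project keeps compiling.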

The second stage involves incrementally translating these function stubs. Here, EvoC2RUST leverages LLMs augmented with a set of “safety-preserving linguistic mappings.” These mappings cover seven key categories, including type conversions, macro and function translation, and operator handling. For instance, C’s pointer arithmetic, a common source of errors, is mapped to a safe Ptr<T> type with methods like cast(); Ptr<T> is not part of the Rust standard library, so it appears to be a wrapper supplied by the framework. Similarly, C’s NULL macro is mapped to Rust’s Option<T> type, where None stands in for the null pointer. This structured approach guides the LLM to produce more accurate and idiomatic Rust code.
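The idea behind two of these mappings can be sketched with standard-library types alone. The example below does not reproduce the framework's Ptr<T> API, which is not described here; it swaps in plain slices and `Option` instead, and the function names `find` and `read_at` are hypothetical.

```rust
/// C: `int *find(int *arr, size_t n, int target)`, returning NULL when
/// the target is absent. The nullable pointer becomes `Option<&i32>`,
/// so the caller must handle the "not found" case explicitly.
pub fn find(arr: &[i32], target: i32) -> Option<&i32> {
    arr.iter().find(|&&x| x == target)
}

/// C: `*(p + offset)` for a pointer into an array. The slice lookup is
/// bounds-checked and yields `None` instead of reading out of range.
pub fn read_at(arr: &[i32], offset: usize) -> Option<i32> {
    arr.get(offset).copied()
}
```

A caller would typically consume these with `match` or `if let`, which is where the compiler forces the null and out-of-range cases to be considered.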

The final stage focuses on post-generation repair. EvoC2RUST employs a cascading approach that combines LLM-based refinement with static analysis. This process iteratively fixes compilation errors, starting with bracket mismatches and progressing to more complex semantic issues. This “evolutionary augmentation” blends the strengths of rule-based tools (for predictable errors) and LLMs (for nuanced corrections), resulting in robust and safe code.
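The rule-based end of that cascade can be pictured with a small bracket checker such as the one below. This is a simplified sketch rather than the framework's actual repair pass: the name `check_brackets` is hypothetical, and the function deliberately ignores brackets inside string literals, character literals, and comments.

```rust
/// Returns `Ok(())` if every `(`, `[`, and `{` in `src` is closed in
/// the right order. Otherwise returns the byte offset of the first
/// mismatched closer, or of the innermost opener left unclosed.
/// Simplification: literals and comments are not skipped.
pub fn check_brackets(src: &str) -> Result<(), usize> {
    let mut stack: Vec<(char, usize)> = Vec::new();
    for (i, c) in src.char_indices() {
        match c {
            '(' | '[' | '{' => stack.push((c, i)),
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                match stack.pop() {
                    Some((open, _)) if open == expected => {}
                    _ => return Err(i),
                }
            }
            _ => {}
        }
    }
    match stack.pop() {
        Some((_, pos)) => Err(pos),
        None => Ok(()),
    }
}
```

A cheap, deterministic check like this can locate a structural error before any compiler or LLM round trip, which is the kind of predictable fault the article attributes to rule-based handling.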

The researchers evaluated EvoC2RUST on both open-source benchmarks and six industrial projects. The results were compelling. EvoC2RUST outperformed existing LLM-based approaches by over 17% in syntax accuracy and 14% in semantic accuracy. Crucially, it achieved a code safety rate of nearly 97%, significantly higher than rule-based tools. At the module level, the framework achieved a 92.25% compilation rate and an 89.53% test pass rate on industrial projects, demonstrating its effectiveness even on complex codebases and long functions.

A case study involving the translation of an rb_tree_rotate function highlighted the framework’s superiority. Unlike other methods that introduced “unsafe” blocks or ownership errors, EvoC2RUST successfully produced a safe and correct Rust translation by leveraging its structured mappings and careful pointer management.
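For readers unfamiliar with why tree rotation is awkward in safe Rust, the sketch below shows one common way to write a left rotation without `unsafe`: nodes live in a `Vec` and refer to each other by index. This is not the translation produced in the case study, and the names `RbNode`, `Tree`, and `rotate_left` are assumptions made for illustration; node colors are omitted because rotation does not touch them.

```rust
#[derive(Debug, Clone)]
pub struct RbNode {
    pub key: i32,
    pub parent: Option<usize>,
    pub left: Option<usize>,
    pub right: Option<usize>,
}

#[derive(Debug, Default)]
pub struct Tree {
    pub nodes: Vec<RbNode>,
    pub root: Option<usize>,
}

impl Tree {
    /// Left-rotates around node `x`. Returns false and changes nothing
    /// if `x` is out of range or has no right child. Assumes the stored
    /// indices are consistent; a malformed tree may panic on indexing.
    pub fn rotate_left(&mut self, x: usize) -> bool {
        let y = match self.nodes.get(x).and_then(|n| n.right) {
            Some(y) => y,
            None => return false,
        };
        // y's left subtree becomes x's right subtree.
        let y_left = self.nodes[y].left;
        self.nodes[x].right = y_left;
        if let Some(b) = y_left {
            self.nodes[b].parent = Some(x);
        }
        // y takes x's place under x's parent, or becomes the root.
        let x_parent = self.nodes[x].parent;
        self.nodes[y].parent = x_parent;
        match x_parent {
            None => self.root = Some(y),
            Some(p) if self.nodes[p].left == Some(x) => self.nodes[p].left = Some(y),
            Some(p) => self.nodes[p].right = Some(y),
        }
        // x becomes y's left child.
        self.nodes[y].left = Some(x);
        self.nodes[x].parent = Some(y);
        true
    }
}
```

Indices sidestep the ownership conflict that parent and child pointers create, since no node holds a reference to another; the trade-off is that index validity is checked at run time rather than by the borrow checker.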

In essence, EvoC2RUST represents a significant advancement in automated C-to-Rust translation. By combining skeleton-guided translation with linguistically informed LLMs and a robust repair mechanism, it provides a practical and effective solution for migrating legacy C codebases to the memory-safe and modern Rust ecosystem.
