Pitfalls of Safe Rust

Original link: https://corrode.dev/blog/pitfalls-of-safe-rust/

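Rust provides memory safety, but building robust applications still takes care. Common pitfalls include integer overflow (use `checked_mul` and `overflow-checks = true`), dangerous type casts (prefer `TryFrom`), and panics caused by `unwrap`/`expect` (handle `Option`s and `Result`s). Domain-specific types such as `Username` help enforce invariants. Watch out for invalid combinations of state in structs; use enums for clear state management. Avoid blindly deriving `Default`, `Debug` (redact sensitive data), and `Serialize`/`Deserialize` (implement custom logic or use `#[serde(try_from)]`). Fix TOCTOU (time-of-check to time-of-use) bugs by holding on to the resource while checking it. Use constant-time equality for password verification to prevent timing attacks. Limit resource usage to prevent denial-of-service attacks. Understand how `Path::join` behaves. Finally, use `cargo-geiger` to check dependencies for unsafe code, and use a generous set of clippy lints (listed in the original article) to catch problems at compile time. Combine Rust’s safety features with rigorous testing and validation for maximum reliability.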

Original article

When people say Rust is a “safe language”, they often mean memory safety. And while memory safety is a great start, it’s far from all it takes to build robust applications.

Memory safety is important but not sufficient for overall reliability.

In this article, I want to show you a few common gotchas in safe Rust that the compiler doesn’t detect and how to avoid them.

Even in safe Rust code, you still need to handle various risks and edge cases. You need to address aspects like input validation and making sure that your business logic is correct.

Here are just a few categories of bugs that Rust doesn’t protect you from:

  • Type casting mistakes (e.g. overflows)
  • Logic bugs
  • Panics because of using unwrap or expect
  • Malicious or incorrect build.rs scripts in third-party crates
  • Incorrect unsafe code in third-party libraries
  • Race conditions

Let’s look at ways to avoid some of the more common problems. The tips are roughly ordered by how likely you are to encounter them.

Overflow errors can happen pretty easily:

fn calculate_total(price: u32, quantity: u32) -> u32 {
    price * quantity // could overflow
}

If price and quantity are large enough, the result will overflow. Rust will panic in debug mode, but in release mode, it will silently wrap around.

To avoid this, use checked arithmetic operations:

fn calculate_total(price: u32, quantity: u32) -> Result<u32, ArithmeticError> {
    price.checked_mul(quantity)
        .ok_or(ArithmeticError::Overflow)
}
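The snippet above refers to an `ArithmeticError` type that is not shown. As a minimal sketch, assuming a one-variant enum of that name (the definition below is an assumption, not taken from the article), a complete version might look like this:

```rust
// Assumed definition: the article uses `ArithmeticError` without defining it.
#[derive(Debug, PartialEq)]
enum ArithmeticError {
    Overflow,
}

fn calculate_total(price: u32, quantity: u32) -> Result<u32, ArithmeticError> {
    // `checked_mul` returns None on overflow; `ok_or` turns that into an error.
    price
        .checked_mul(quantity)
        .ok_or(ArithmeticError::Overflow)
}

fn main() {
    // 20 * 3 fits in a u32.
    println!("{:?}", calculate_total(20, 3)); // prints Ok(60)
    // u32::MAX * 2 does not fit, so the caller gets an error instead of a wrapped value.
    println!("{:?}", calculate_total(u32::MAX, 2)); // prints Err(Overflow)
}
```

The caller now has to decide what an overflow means for the business logic, rather than silently continuing with a wrong total.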

Static checks are not removed since they don’t affect the performance of generated code. So if the compiler is able to detect the problem at compile time, it will do so:

fn main() {
    let x: u8 = 2;
    let y: u8 = 128;
    let z = x * y; // compile-time error: 2 * 128 does not fit in a u8
}

The error message will be:

error: this arithmetic operation will overflow
 --> src/main.rs:4:13
  |
4 |     let z = x * y;
  |             ^^^^^ attempt to compute `2_u8 * 128_u8`, which would overflow
  |
  = note: `#[deny(arithmetic_overflow)]` on by default

For all other cases, use checked_add, checked_sub, checked_mul, and checked_div, which return None instead of wrapping around on underflow or overflow.
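As a small illustration of the other checked operations, here is a sketch with two hypothetical helpers (`apply_discount` and `price_per_item` are made-up names for this example). Note that `checked_div` also returns `None` when dividing by zero:

```rust
/// Hypothetical helper: subtracts a discount, refusing to go below zero.
fn apply_discount(price: u32, discount: u32) -> Option<u32> {
    price.checked_sub(discount)
}

/// Hypothetical helper: splits a total evenly, refusing to divide by zero.
fn price_per_item(total: u32, items: u32) -> Option<u32> {
    total.checked_div(items)
}

fn main() {
    println!("{:?}", apply_discount(100, 30)); // prints Some(70)
    println!("{:?}", apply_discount(30, 100)); // prints None (would underflow)
    println!("{:?}", price_per_item(100, 4)); // prints Some(25)
    println!("{:?}", price_per_item(100, 0)); // prints None (division by zero)
    println!("{:?}", u8::MAX.checked_add(1)); // prints None (would overflow)
}
```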

Rust carefully balances performance and safety. In scenarios where a performance hit is acceptable, memory safety takes precedence.

Integer overflows can lead to unexpected results, but they are not inherently unsafe. On top of that, overflow checks can be expensive, which is why Rust disables them in release mode.

However, you can re-enable them in case your application can trade the last 1% of performance for better overflow detection.

Put this into your Cargo.toml:

[profile.release]
overflow-checks = true 

This will enable overflow checks in release mode. As a consequence, the code will panic if an overflow occurs.

See the docs for more details.

While we’re on the topic of integer arithmetic, let’s talk about type conversions. Casting values with as is convenient but risky unless you know exactly what you are doing.

let x: i32 = 42;
let y: i8 = x as i8;  

There are three main ways to convert between numeric types in Rust:

  1. ⚠️ Using the as keyword: This approach works for both lossless and lossy conversions. In cases where data loss might occur (like converting from i64 to i32), it will simply truncate the value.

  2. Using From::from(): This method only allows lossless conversions. For example, you can convert from i32 to i64 since all 32-bit integers can fit within 64 bits. However, you cannot convert from i64 to i32 using this method since it could potentially lose data.

  3. Using TryFrom: This method is the fallible counterpart to From::from(). It returns a Result instead of silently losing data, so you can handle values that don’t fit in the target type gracefully.

If in doubt, prefer From::from() and TryFrom over as.

  • use From::from() when you can guarantee no data loss.
  • use TryFrom when you need to handle potential data loss gracefully.
  • only use as when you’re comfortable with potential truncation or know the values will fit within the target type’s range and when performance is absolutely critical.

(Adapted from StackOverflow answer by delnan and additional context.)
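The three conversion styles can be compared side by side. This is a minimal sketch; `narrow` is a hypothetical helper name introduced only for this example:

```rust
use std::num::TryFromIntError;

/// Hypothetical helper: narrows an i64 to an i32, reporting values that do not fit.
fn narrow(value: i64) -> Result<i32, TryFromIntError> {
    i32::try_from(value)
}

fn main() {
    let big: i64 = 3_000_000_000;

    // `as` keeps only the low 32 bits, so the result is a different number.
    println!("{}", big as i32); // prints -1294967296

    // `From` only exists for lossless conversions such as i32 -> i64.
    let widened: i64 = i64::from(42_i32);
    println!("{widened}"); // prints 42

    // `TryFrom` reports the problem instead of changing the value.
    println!("{}", narrow(42).is_ok()); // prints true
    println!("{}", narrow(big).is_err()); // prints true
}
```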

The as operator is not safe for narrowing conversions. It will silently truncate the value, leading to unexpected results.

What is a narrowing conversion? It’s when you convert a larger type to a smaller type, e.g. i32 to i8.

For example, see how as chops off the high bits from our value:

fn main() {
    let a: u16 = 0x1234;
    let b: u8 = a as u8;
    println!("0x{:04x}, 0x{:02x}", a, b); // prints "0x1234, 0x34"
}

So, coming back to our first example above, instead of writing

let x: i32 = 42;
let y: i8 = x as i8;  

use TryFrom instead and handle the error gracefully:

let y = i8::try_from(x).map_err(|_| "Number is too big to be used here")?;

Bounded types make it easier to express invariants and avoid invalid states.

E.g. if you have a numeric type and 0 is never a correct value, use std::num::NonZeroUsize instead.
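As a rough sketch of how this helps, consider a hypothetical `chunk_count` function (the name and use case are assumptions for illustration). Because the chunk size is a `NonZeroUsize`, the division cannot hit a zero divisor, and the zero check happens once, where the value is constructed:

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: how many chunks of `chunk_size` are needed for `total` items.
fn chunk_count(total: usize, chunk_size: NonZeroUsize) -> usize {
    // `chunk_size.get()` can never be 0, so this division cannot panic.
    total.div_ceil(chunk_size.get())
}

fn main() {
    // Construction is the only place where zero has to be handled.
    println!("{:?}", NonZeroUsize::new(0)); // prints None

    if let Some(size) = NonZeroUsize::new(4) {
        println!("{}", chunk_count(10, size)); // prints 3
    }
}
```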

You can also create your own bounded types:

// Before: a raw f64 allows negative, NaN, or infinite distances
struct Measurement {
    distance: f64,
}

// After: the invariant is checked once, in the constructor
#[derive(Debug, Clone, Copy)]
struct Distance(f64);

impl Distance {
    pub fn new(value: f64) -> Result<Self, DistanceError> {
        if value < 0.0 || !value.is_finite() {
            return Err(DistanceError::Invalid);
        }
        Ok(Distance(value))
    }
}

struct Measurement {
    distance: Distance,
}

(Rust Playground)

Whenever I see the following, I get goosebumps 😨:

let arr = [1, 2, 3];
let elem = arr[3];  

That’s a common source of bugs. Unlike C, Rust does check array bounds and prevents a security vulnerability, but it still panics at runtime.

Instead, use the get method:

let elem = arr.get(3);

It returns an Option which you can now handle gracefully.
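A short sketch of handling that `Option`; `nth_or` is a hypothetical helper name used only for this example:

```rust
/// Hypothetical helper: returns the element at `index`, or `fallback` if out of bounds.
fn nth_or(values: &[i32], index: usize, fallback: i32) -> i32 {
    values.get(index).copied().unwrap_or(fallback)
}

fn main() {
    let arr = [1, 2, 3];

    // Handle both cases explicitly instead of panicking.
    match arr.get(3) {
        Some(value) => println!("found {value}"),
        None => println!("index 3 is out of bounds"),
    }

    println!("{}", nth_or(&arr, 1, 0)); // prints 2
    println!("{}", nth_or(&arr, 3, 0)); // prints 0
}
```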

See this blog post for more info on the topic.

This issue is related to the previous one. Say you have a slice and you want to split it at a certain index.

let mid = 4;
let arr = [1, 2, 3];
let (left, right) = arr.split_at(mid);

You might expect that this returns a tuple of slices where the first slice contains all elements and the second slice is empty.

Instead, the above code will panic because the mid index is out of bounds!

To handle that more gracefully, use split_at_checked instead:

let mid = 4;
let arr = [1, 2, 3];
match arr.split_at_checked(mid) {
    Some((left, right)) => {
        // Use `left` and `right` here
    }
    None => {
        // Handle the out-of-bounds index here
    }
}

This returns an Option which allows you to handle the error case. (Rust Playground)

More info about split_at_checked here.
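If the behavior you expected was "everything on the left, nothing on the right", you can state that fallback explicitly. The following is a sketch under that assumption; `split_or_all` is a hypothetical helper name:

```rust
/// Hypothetical helper: splits at `mid`, or returns (everything, empty) when
/// `mid` is past the end of the slice.
fn split_or_all(values: &[i32], mid: usize) -> (&[i32], &[i32]) {
    let empty: &[i32] = &[];
    values.split_at_checked(mid).unwrap_or((values, empty))
}

fn main() {
    let arr = [1, 2, 3];
    println!("{:?}", split_or_all(&arr, 1)); // prints ([1], [2, 3])
    println!("{:?}", split_or_all(&arr, 4)); // prints ([1, 2, 3], [])
}
```

Whether that fallback is correct depends on the caller; the point is that the decision is now visible in the code rather than hidden behind a panic.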

It’s very tempting to use primitive types for everything. Especially Rust beginners fall into this trap.

fn authenticate_user(username: String) {
    // A raw String could be anything: empty, too long, or full of invalid characters
}

However, do you really accept any string as a valid username? What if it’s empty? What if it contains emojis or special characters?

You can create a custom type for your domain instead:

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Username(String);

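impl Username {
    pub fn new(name: &str) -> Result<Self, UsernameError> {
        // Reject empty usernames
        if name.is_empty() {
            return Err(UsernameError::Empty);
        }

        // Reject usernames longer than 30 bytes
        if name.len() > 30 {
            return Err(UsernameError::TooLong);
        }

        // Only allow alphanumeric characters and underscores
        if !name.chars().all(|c| c.is_alphanumeric() || c == '_') {
            return Err(UsernameError::InvalidCharacters);
        }

        Ok(Username(name.to_string()))
    }

    /// Read-only access to the validated inner string
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn authenticate_user(username: Username) {
    // We know the username is valid at this point
}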

(Rust playground)

The next point is closely related to the previous one.

Can you spot the bug in the following code?

struct Configuration {
    port: u16,
    host: String,
    ssl: bool,
    ssl_cert: Option<String>, 
}

The problem is that you can have ssl set to true but ssl_cert set to None. That’s an invalid state! If you try to use the SSL connection, you can’t because there’s no certificate. This kind of bug can be ruled out at compile time by using types to enforce valid states:

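enum ConnectionSecurity {
    Insecure,
    // An SSL connection cannot exist without a certificate
    Ssl { cert_path: String },
}

struct Configuration {
    port: u16,
    host: String,
    // Either SSL with a certificate or no SSL at all; no invalid combination is possible
    security: ConnectionSecurity,
}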

In comparison to the previous section, the bug was caused by an invalid combination of closely related fields. To prevent that, clearly map out all possible states and transitions between them. A simple way is to define an enum with optional metadata for each state.
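To show what consuming such a type looks like, here is a small sketch. The `describe` function and its output format are assumptions made up for this example; the point is that `match` forces every state to be handled, and the certificate path is only reachable in the `Ssl` arm:

```rust
enum ConnectionSecurity {
    Insecure,
    Ssl { cert_path: String },
}

struct Configuration {
    port: u16,
    host: String,
    security: ConnectionSecurity,
}

/// Hypothetical helper: renders a human-readable summary of the configuration.
fn describe(config: &Configuration) -> String {
    match &config.security {
        ConnectionSecurity::Insecure => {
            format!("http://{}:{}", config.host, config.port)
        }
        // The certificate is guaranteed to exist whenever SSL is selected.
        ConnectionSecurity::Ssl { cert_path } => {
            format!("https://{}:{} (cert: {})", config.host, config.port, cert_path)
        }
    }
}

fn main() {
    let config = Configuration {
        port: 443,
        host: String::from("example.com"),
        security: ConnectionSecurity::Ssl {
            cert_path: String::from("/etc/ssl/cert.pem"),
        },
    };
    println!("{}", describe(&config));
}
```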

If you’re curious to learn more, here is a more in-depth blog post on the topic.

It’s quite common to add a blanket Default implementation to your types. But that can lead to unforeseen issues.

For example, here’s a case where the port is set to 0 by default, which is not a valid port number.

#[derive(Default)] // might create invalid states
struct ServerConfig {
    port: u16, // defaults to 0, which is not a valid port
    max_connections: usize,
    timeout_seconds: u64,
}

Instead, consider if a default value makes sense for your type.

struct ServerConfig {
    port: Port,
    max_connections: NonZeroUsize,
    timeout_seconds: Duration,
}

impl ServerConfig {
    pub fn new(port: Port) -> Self {
        Self {
            port,
            max_connections: NonZeroUsize::new(100).unwrap(),
            timeout_seconds: Duration::from_secs(30),
        }
    }
}
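The `Port` type above is not defined in the article. One possible sketch, assuming it is a thin wrapper around `NonZeroU16` (this definition and the `timeout` field name are assumptions), makes the constructor the only place where port 0 has to be rejected:

```rust
use std::num::{NonZeroU16, NonZeroUsize};
use std::time::Duration;

/// Assumed definition: a port number that can never be 0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Port(NonZeroU16);

impl Port {
    fn new(value: u16) -> Option<Self> {
        NonZeroU16::new(value).map(Port)
    }

    fn get(self) -> u16 {
        self.0.get()
    }
}

struct ServerConfig {
    port: Port,
    max_connections: NonZeroUsize,
    timeout: Duration,
}

impl ServerConfig {
    // There is no `Default`: the caller must supply a valid port.
    fn new(port: Port) -> Self {
        Self {
            port,
            max_connections: NonZeroUsize::new(100).expect("100 is non-zero"),
            timeout: Duration::from_secs(30),
        }
    }
}

fn main() {
    println!("{:?}", Port::new(0)); // prints None

    if let Some(port) = Port::new(8080) {
        let config = ServerConfig::new(port);
        println!(
            "port={} max_connections={} timeout={:?}",
            config.port.get(),
            config.max_connections,
            config.timeout
        );
    }
}
```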

If you blindly derive Debug for your types, you might expose sensitive data. Instead, implement Debug manually for types that contain sensitive information.

#[derive(Debug)]
struct User {
    username: String,
    password: String, // will be printed in debug output
}

Instead, you could write:

#[derive(Debug)]
struct User {
    username: String,
    password: Password,
}

struct Password(String);

impl std::fmt::Debug for Password {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str("[REDACTED]")
    }
}

fn main() {
    let user = User {
        username: String::from(""),
        password: Password(String::from("")),
    };
    println!("{user:#?}");
}

This prints

User {
    username: "",
    password: [REDACTED],
}

(Rust playground)

For production code, use a crate like secrecy.

However, it’s not black and white either: If you implement Debug manually, you might forget to update the implementation when your struct changes. A common pattern is to destructure the struct in the Debug implementation to catch such errors.

Instead of this:

impl std::fmt::Debug for DatabaseURI {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}://{}:[REDACTED]@{}/{}", self.scheme, self.user, self.host, self.database)
    }
}

How about destructuring the struct to catch changes?

impl std::fmt::Debug for DatabaseURI {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // Destructure without `..`: adding a field to the struct now fails to
        // compile until this implementation is updated.
        let DatabaseURI { scheme, user, password: _, host, database, } = self;
        write!(f, "{scheme}://{user}:[REDACTED]@{host}/{database}")?;

        Ok(())
    }
}

(Rust playground)
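For completeness, here is a self-contained sketch of the destructuring pattern. The struct definition is an assumption inferred from the field names used in the snippet, and `sample` is a hypothetical helper with made-up values:

```rust
use std::fmt;

// Assumed definition, inferred from the fields used in the Debug implementation.
struct DatabaseURI {
    scheme: String,
    user: String,
    password: String,
    host: String,
    database: String,
}

impl fmt::Debug for DatabaseURI {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No `..` in the pattern: a new field breaks the build until it is handled here.
        let DatabaseURI { scheme, user, password: _, host, database } = self;
        write!(f, "{scheme}://{user}:[REDACTED]@{host}/{database}")
    }
}

/// Hypothetical sample value for demonstration.
fn sample() -> DatabaseURI {
    DatabaseURI {
        scheme: String::from("postgres"),
        user: String::from("admin"),
        password: String::from("hunter2"),
        host: String::from("localhost"),
        database: String::from("app"),
    }
}

fn main() {
    let uri = sample();
    // The real password is still available to code that needs it...
    assert_eq!(uri.password, "hunter2");
    // ...but it never shows up in debug output.
    println!("{uri:?}"); // prints postgres://admin:[REDACTED]@localhost/app
}
```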

Thanks to Wesley Moore (wezm) for the hint and to Simon Brüggen (m3t0r) for the example.

Don’t blindly derive Serialize and Deserialize – especially for sensitive data. The values you read/write might not be what you expect!

#[derive(Serialize, Deserialize)]
struct UserCredentials {
    #[serde(default)] // accepts missing fields and fills in an empty string
    username: String,
    #[serde(default)]
    password: String, // included as-is in serialized output
}

When deserializing, the fields might be empty. Empty credentials could potentially pass validation checks if not properly handled.

On top of that, the serialization behavior could also leak sensitive data. By default, Serialize will include the password field in the serialized output, which could expose sensitive credentials in logs, API responses, or debug output.

A common fix is to implement your own custom serialization and deserialization methods by using impl<'de> Deserialize<'de> for UserCredentials.

The advantage is that you have full control over input validation. However, the disadvantage is that you need to implement all the logic yourself.

An alternative strategy is to use the #[serde(try_from = "FromType")] attribute.

Let’s take the Password field as an example. Start by using the newtype pattern to wrap the standard types and add custom validation:

#[derive(Deserialize)]
#[serde(try_from = "String")]
pub struct Password(String);

Now implement TryFrom for Password:

impl TryFrom<String> for Password {
    type Error = PasswordError;

    /// Validate the password during deserialization
    fn try_from(value: String) -> Result<Self, Self::Error> {
        // Reject passwords shorter than 8 bytes
        if value.len() < 8 {
            return Err(PasswordError::TooShort);
        }
        Ok(Password(value))
    }
}

With this trick, you can no longer deserialize invalid passwords:

// Panics: the password is too short, so deserialization returns an Err
let password: Password = serde_json::from_str(r#""pass""#).unwrap();

(Try it on the Rust Playground)

Credits go to EqualMa’s article on dev.to and to Alex Burka (durka) for the hint.

This is a more advanced topic, but it’s important to be aware of it. TOCTOU (time-of-check to time-of-use) is a class of software bugs caused by changes that happen between when you check a condition and when you use a resource.

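fn remove_dir(path: &Path) -> io::Result<()> {
    // Check: is the path a directory?
    if !path.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotADirectory,
            "not a directory"
        ));
    }

    // TOCTOU: another process could replace the directory with a symlink
    // between the check above and the use below
    remove_dir_impl(path)
}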

(Rust playground)

The safer approach opens the directory first, ensuring we operate on what we checked:

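fn remove_dir(path: &Path) -> io::Result<()> {
    // Open the directory without following symlinks; this fails if `path`
    // is not a directory
    let handle = OpenOptions::new()
        .read(true)
        .custom_flags(O_NOFOLLOW | O_DIRECTORY)
        .open(path)?;

    // Operate on the open handle rather than on the path
    remove_dir_impl(&handle)
}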

(Rust playground)

Here’s why it’s safer: while we hold the handle, the directory can’t be replaced with a symlink. This way, the directory we’re working with is the same as the one we checked. Any attempt to replace it won’t affect us because the handle is already open.
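The same "open first, then check the handle" idea applies to ordinary files and needs nothing beyond the standard library. The following is only a sketch: `read_small_file` is a hypothetical helper, and it illustrates the pattern rather than the actual fix made in the standard library.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: reads a regular file of at most `max_len` bytes.
fn read_small_file(path: &Path, max_len: u64) -> io::Result<Vec<u8>> {
    // Open once. All later checks and reads go through this handle, so they
    // refer to the same file even if the path is swapped out afterwards.
    let file = File::open(path)?;

    // Query metadata on the handle, not on the path.
    let metadata = file.metadata()?;
    if !metadata.is_file() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"));
    }
    if metadata.len() > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "file too large"));
    }

    // The file may still grow after the size check, so cap the read as well.
    let mut contents = Vec::new();
    file.take(max_len).read_to_end(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("toctou_sketch_{}.txt", std::process::id()));
    fs::write(&path, b"hello")?;

    println!("{:?}", read_small_file(&path, 1024)?); // the five bytes of "hello"
    println!("{}", read_small_file(&path, 2).is_err()); // prints true

    fs::remove_file(&path)
}
```

Checking `path.metadata()` and then calling `fs::read(path)` would reintroduce the race, because the two calls resolve the path independently.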

You’d be forgiven if you overlooked this issue before. In fact, even the Rust core team missed it in the standard library. What you saw is a simplified version of an actual bug in the std::fs::remove_dir_all function. Read more about it in this blog post about CVE-2022-21658.

Timing attacks are a nifty way to extract information from your application. The idea is that the time it takes to compare two values can leak information about them. For example, the time it takes to compare two strings can reveal how many characters are correct. Therefore, for production code, be careful with regular equality checks when handling sensitive data like passwords.

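// Before: `==` may return at the first mismatching byte, leaking timing information
fn verify_password(stored: &[u8], provided: &[u8]) -> bool {
    stored == provided
}

// After: constant-time comparison with the `subtle` crate
use subtle::ConstantTimeEq;

fn verify_password(stored: &[u8], provided: &[u8]) -> bool {
    stored.ct_eq(provided).unwrap_u8() == 1
}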

Denial-of-service attacks can happen when you accept unbounded input, e.g. a huge request body which might not fit into memory. Protect against them with resource limits.

fn process_request(data: &[u8]) -> Result<(), Error> {
    let decoded = decode_data(data)?; // `data` could be arbitrarily large
    // Process the decoded data
    Ok(())
}

Instead, set explicit limits for your accepted payloads:

const MAX_REQUEST_SIZE: usize = 1024 * 1024; // 1 MiB

fn process_request(data: &[u8]) -> Result<(), Error> {
    if data.len() > MAX_REQUEST_SIZE {
        return Err(Error::RequestTooLarge);
    }

    let decoded = decode_data(data)?;
    // Process the decoded data
    Ok(())
}
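The check above only helps if the data is not already fully in memory by the time you measure it. When reading from a stream, one option is to cap the read itself with `Read::take`. This is a sketch with a hypothetical `read_limited` helper; the error kind chosen is an arbitrary assumption:

```rust
use std::io::{self, Read};

const MAX_REQUEST_SIZE: u64 = 1024 * 1024; // 1 MiB

/// Hypothetical helper: reads the whole stream, but never more than `limit` bytes.
fn read_limited<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so that "exactly at the limit" and "over the limit"
    // can be told apart. `saturating_add` avoids overflow for huge limits.
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;

    let read = u64::try_from(buf.len()).unwrap_or(u64::MAX);
    if read > limit {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "request too large"));
    }
    Ok(buf)
}

fn main() {
    // `&[u8]` implements `Read`, which makes it a convenient stand-in for a socket.
    let body: &[u8] = b"hello";
    println!("{}", read_limited(body, MAX_REQUEST_SIZE).is_ok()); // prints true
    println!("{}", read_limited(body, 3).is_err()); // prints true
}
```

With this shape, an oversized body costs at most `limit + 1` bytes of memory before it is rejected.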

If you pass an absolute path to Path::join, it silently discards the base path and returns the absolute path instead.

use std::path::Path;

fn main() {
    let path = Path::new("/usr").join("/local/bin");
    println!("{path:?}"); // prints "/local/bin"
}

This is because Path::join will return the second path if it is absolute.

I was not the only one who was confused by this behavior. Here’s a thread on the topic, which also includes an answer by Johannes Dahlström:

The behavior is useful because a caller […] can choose whether it wants to use a relative or absolute path, and the callee can then simply absolutize it by adding its own prefix and the absolute path is unaffected which is probably what the caller wanted. The callee doesn’t have to separately check whether the path is absolute or not.

And yet, I still think it’s a footgun. It’s easy to overlook this behavior when you use user-provided paths. Perhaps join should return a Result instead? In any case, be aware of this behavior.
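For user-provided paths, one defensive option is to validate the components before joining. The sketch below uses a hypothetical `join_relative` helper that is deliberately strict (it also rejects `..`, which is an extra assumption on top of the absolute-path problem discussed here):

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: joins `user_path` onto `base` only if it consists of
/// plain segments, rejecting absolute paths, prefixes, and `..`.
fn join_relative(base: &Path, user_path: &Path) -> Option<PathBuf> {
    let only_plain_segments = user_path
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));

    only_plain_segments.then(|| base.join(user_path))
}

fn main() {
    let base = Path::new("/srv/files");

    println!("{:?}", join_relative(base, Path::new("reports/q1.txt")));
    println!("{:?}", join_relative(base, Path::new("/etc/passwd"))); // prints None
    println!("{:?}", join_relative(base, Path::new("../secret"))); // prints None
}
```

This does not resolve symlinks, so it is not a complete sandbox; it only prevents `join` from silently discarding the base directory.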

So far, we’ve only covered issues with your own code. For production code, you also need to check your dependencies. Especially unsafe code would be a concern. This can be quite challenging, especially if you have a lot of dependencies.

cargo-geiger is a neat tool that checks your dependencies for unsafe code. It can help you identify potential security risks in your project.

cargo install cargo-geiger
cargo geiger

This will give you a report of how many unsafe functions are in your dependencies. Based on this, you can decide if you want to keep a dependency or not.

Here is a set of clippy lints that can help you catch these issues at compile time. See for yourself in the Rust playground.

Here’s the gist:

  • cargo check will not report any issues.
  • cargo run will panic or silently fail at runtime.
  • cargo clippy will catch all issues at compile time (!) 😎

// Arithmetic
#![deny(arithmetic_overflow)]
#![deny(clippy::checked_conversions)]
#![deny(clippy::cast_possible_truncation)]
#![deny(clippy::cast_sign_loss)]
#![deny(clippy::cast_possible_wrap)]
#![deny(clippy::cast_precision_loss)]
#![deny(clippy::integer_division)]
#![deny(clippy::arithmetic_side_effects)]
#![deny(clippy::unchecked_duration_subtraction)]

// Unwraps
#![warn(clippy::unwrap_used)]
#![warn(clippy::expect_used)]
#![deny(clippy::panicking_unwrap)]
#![deny(clippy::option_env_unwrap)]

// Array indexing
#![deny(clippy::indexing_slicing)]

// Path handling
#![deny(clippy::join_absolute_paths)]

// Serialization
#![deny(clippy::serde_api_misuse)]

// Uninitialized memory
#![deny(clippy::uninit_vec)]

// Transmutes
#![deny(clippy::transmute_int_to_char)]
#![deny(clippy::transmute_int_to_float)]
#![deny(clippy::transmute_ptr_to_ref)]
#![deny(clippy::transmute_undefined_repr)]
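
use std::path::Path;
use std::time::Duration;

fn main() {
    // ARITHMETIC ISSUES

    // Integer overflow: panics in debug builds, wraps in release builds
    let a: u8 = 255;
    let _b = a + 1;

    // Lossy cast: the value is truncated
    let large_number: i64 = 1_000_000_000_000;
    let _small_number: i32 = large_number as i32;

    // Sign loss: a negative value becomes a large unsigned one
    let negative: i32 = -5;
    let _unsigned: u32 = negative as u32;

    // Integer division discards the remainder
    let _result = 5 / 2; // 2, not 2.5

    // Duration subtraction panics on underflow
    let short = Duration::from_secs(1);
    let long = Duration::from_secs(2);
    let _negative = short - long;

    // UNWRAP ISSUES

    // Panics on None
    let data: Option<i32> = None;
    let _value = data.unwrap();

    // Panics on Err
    let result: Result<i32, &str> = Err("error occurred");
    let _value = result.expect("This will panic");

    // Panics if the environment variable is not set
    let _api_key = std::env::var("API_KEY").unwrap();

    // ARRAY INDEXING ISSUES

    // Panics: index out of bounds
    let numbers = vec![1, 2, 3];
    let _fourth = numbers[3];

    // Safe alternative with .get()
    if let Some(fourth) = numbers.get(3) {
        println!("{fourth}");
    }

    // PATH HANDLING ISSUES

    // Joining an absolute path discards the base path
    let base = Path::new("/home/user");
    let _full_path = base.join("/etc/config"); // "/etc/config"

    // Safe alternative: join a relative path
    let base = Path::new("/home/user");
    let relative = Path::new("config");
    let full_path = base.join(relative);
    println!("Safe path joining: {:?}", full_path);

    // UNINITIALIZED MEMORY

    // Undefined behavior: the length covers uninitialized elements
    let mut vec: Vec<String> = Vec::with_capacity(10);
    unsafe {
        vec.set_len(10);
    }
}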

Phew, that was a lot of pitfalls! How many of them did you know about?

Even if Rust is a great language for writing safe, reliable code, developers still need to be disciplined to avoid bugs.

A lot of the common mistakes we saw have to do with Rust being a systems programming language: In computing systems, a lot of operations are performance critical and inherently unsafe. We are dealing with external systems outside of our control, such as the operating system, hardware, or the network. The goal is to build safe abstractions on top of an unsafe world.

Rust shares an FFI interface with C, which means that it can do anything C can do. So, while some operations that Rust allows are theoretically possible, they might lead to unexpected results.

But not all is lost! If you are aware of these pitfalls, you can avoid them, and with the above clippy lints, you can catch most of them at compile time.

That’s why testing, linting, and fuzzing are still important in Rust.

For maximum robustness, combine Rust’s safety guarantees with strict checks and strong verification methods.

I hope you found this article helpful! If you want to take your Rust code to the next level, consider a code review by an expert. I offer code reviews for Rust projects of all sizes. Get in touch to learn more.
