Question

I want to read a line from standard input, then split it on whitespace and process the parts.

A simple read_line works, because it returns an owned String:

use std::io::stdin;

fn read_line() -> String {
    let mut line = String::new();
    stdin().read_line(&mut line).unwrap();
    line
}
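For testing, it can help to read from any `BufRead` instead of hard-coding `stdin()`, and to propagate the I/O error instead of calling `unwrap()`. This is a minimal sketch; the name `read_line_from` is hypothetical, and the demo feeds it an in-memory byte slice (which implements `BufRead`) in place of standard input:

```rust
use std::io::{self, BufRead};

/// Reads one line (including its trailing newline, if any) from `reader`.
/// Returns the owned `String`, or the I/O error instead of panicking.
/// At end of input it returns an empty string.
fn read_line_from<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}

fn main() -> io::Result<()> {
    // A real program would pass `&mut io::stdin().lock()` here;
    // a byte slice stands in for standard input in this demo.
    let mut input: &[u8] = b"hello world\n";
    let line = read_line_from(&mut input)?;
    println!("{line:?}");
    Ok(())
}
```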

But when I want to consume the String and return an owned Split whose lifetime extends beyond the function that created it, I can't get it to work.

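To be clear, I just want to get the result of the split (any similar type will do), with the memory allocated for the characters kept alive for as long as the split exists.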

I tried:

use std::{io::stdin, str::SplitWhitespace};
fn main() {
    let split = read_line_and_split();
}

// doesn't compile: missing lifetime specifier
fn read_line_and_split() -> SplitWhitespace {
    let mut str = String::new();
    stdin().read_line(&mut str);
    str.split_whitespace()
}

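It's really unclear what you mean by an "owned split". By definition, a split borrows from its parent, so you can't return it from the function unless the function borrows the string too. What you can do, however, is convert the &str pieces of the split into owned strings, e.g. map over the std::str::Split and collect it into a Vec<String>.
For your use case, collecting into a Vec<String> sounds simplest. If you have unusual requirements and want to avoid the vector allocation, you can also look at the (more advanced) solutions here: an owned version of Split has essentially the same problem as an owned version of Chars.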
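To illustrate the first comment's point: if the caller keeps ownership of the `String`, a function that merely borrows it can return the `SplitWhitespace` directly, because the compiler ties the iterator's lifetime to the borrowed input. A minimal sketch; `split_line` is a hypothetical name:

```rust
use std::str::SplitWhitespace;

/// Borrows `line` and returns an iterator over its whitespace-separated words.
/// The `'_` ties the iterator's lifetime to the borrow of `line`.
fn split_line(line: &str) -> SplitWhitespace<'_> {
    line.split_whitespace()
}

fn main() {
    // The caller owns the String, so it outlives the iterator borrowed from it.
    let line = String::from("  alpha beta\tgamma\n");
    for word in split_line(&line) {
        println!("{word}");
    }
}
```

The trade-off is that the `String` must be declared in the caller and kept alive for as long as the words are in use.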

Accepted Answer

Creating an owned version of Split is trickier than it looks. For example, suppose you try the obvious approach:

// doesn't compile
fn owned_split(s: String) -> impl Iterator<Item = String> {
    s.split_whitespace().map(|sub| sub.to_string())
}

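This doesn't compile, because the value behind the impl Iterator, which is actually a SplitWhitespace<'a> (returned by str::split_whitespace()), contains a reference into the string. This implementation of owned_split() effectively tries to return a reference to the local variable s, which doesn't compile (and shouldn't). Ideally, we would return a struct containing both the original string and a SplitWhitespace<'a> iterator pointing into it. But that doesn't work, because the borrow checker doesn't yet support self-referential structs. It can be made to work in "safe" code using one of the self-referential crates, but let's first explore other options.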
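A common way around the missing support for self-referential structs is to store byte ranges into the string instead of references, since plain indices carry no lifetime. This is a minimal sketch of that workaround, not one of the approaches in this answer; the `OwnedWords` type and its methods are hypothetical:

```rust
use std::ops::Range;

/// Owns the text and remembers where each whitespace-separated word lies.
/// Storing `Range<usize>` instead of `&str` avoids a self-referential struct.
struct OwnedWords {
    text: String,
    ranges: Vec<Range<usize>>,
}

impl OwnedWords {
    fn new(text: String) -> Self {
        let base = text.as_ptr() as usize;
        let ranges = text
            .split_whitespace()
            .map(|word| {
                // Each `word` is a sub-slice of `text`, so its start address
                // minus the start address of `text` is its byte offset.
                let start = word.as_ptr() as usize - base;
                start..start + word.len()
            })
            .collect();
        // The borrow of `text` has ended, so it can be moved into the struct.
        OwnedWords { text, ranges }
    }

    /// Borrows the words back out of the owned text.
    fn iter(&self) -> impl Iterator<Item = &str> + '_ {
        self.ranges.iter().map(move |r| &self.text[r.clone()])
    }
}

fn main() {
    let words = OwnedWords::new(String::from("one two  three\n"));
    for w in words.iter() {
        println!("{w}");
    }
}
```

Unlike an iterator, this keeps the text and all word positions around, so the words can be walked more than once, at the cost of one `Vec` of ranges.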

As pointed out in the comments, the simplest and most obvious option is to just collect into a Vec<String> and be done with it:

let split: Vec<String> = s.split_whitespace().map(|sub| sub.to_owned()).collect();
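If you still want the `impl Iterator<Item = String>` signature from the first attempt, one way is to collect first and return the vector's owning iterator. The words are copied out before `s` is dropped, so nothing borrows from it. A minimal sketch:

```rust
/// Consumes `s` and returns an iterator over owned copies of its words.
/// The words are copied into a `Vec<String>` up front, so the returned
/// iterator no longer borrows from `s`.
fn owned_split(s: String) -> impl Iterator<Item = String> {
    let words: Vec<String> = s.split_whitespace().map(str::to_owned).collect();
    words.into_iter()
}

fn main() {
    for word in owned_split(String::from("read a line\n")) {
        println!("{word}");
    }
}
```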

However, if your string is really long, or you're curious about the alternatives as a learning exercise, read on.

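Another approach is to call str::split_whitespace() again each time the next piece is requested, and return the first item from the remainder of the string. This requires some resourcefulness to figure out where to resume looking for whitespace, e.g. by looking at the address of the returned substring: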

fn owned_split(s: String) -> impl Iterator<Item = String> {
    let mut pos = 0;
    std::iter::from_fn(move || {
        let sub = s[pos..].split_whitespace().next()?;
        // The next search starts at the end of `sub`, but we need that
        // position as an index into `s`. Since `sub` is a slice of `s`, we
        // compute where its end lies in `s` by subtracting the address of the
        // start of `s` from the address of the end of `sub`.
        pos = sub.as_bytes().as_ptr_range().end as usize - s.as_ptr() as usize;
        Some(sub.to_owned())
    })
}

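The pointer subtraction looks scary, but it's still 100% safe, because we never dereference the pointers; we only query their addresses to determine an index.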
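The same idea can also be expressed without pointer arithmetic by tracking the byte offset directly: skip leading whitespace with `trim_start()`, then find where the word ends with `find(char::is_whitespace)`. This is a swapped-in variant rather than the answer's code, written as a hand-rolled iterator struct; `OwnedSplit` here is a hypothetical name. It relies on `split_whitespace()`, `trim_start()` and `char::is_whitespace` all using the Unicode `White_Space` property, so the pieces should match what `split_whitespace()` produces:

```rust
/// Owns the string and the byte offset where the next search starts.
struct OwnedSplit {
    s: String,
    pos: usize,
}

impl OwnedSplit {
    fn new(s: String) -> Self {
        OwnedSplit { s, pos: 0 }
    }
}

impl Iterator for OwnedSplit {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let rest = &self.s[self.pos..];
        // Skip leading whitespace; the length difference is how much was skipped.
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.pos + (rest.len() - trimmed.len());
        // The word ends at the next whitespace char, or at the end of the string.
        // Both `start` and `end` therefore fall on char boundaries.
        let end = trimmed
            .find(char::is_whitespace)
            .map_or(self.s.len(), |i| start + i);
        self.pos = end;
        Some(self.s[start..end].to_owned())
    }
}

fn main() {
    // U+3000 (ideographic space) is also Unicode whitespace.
    for word in OwnedSplit::new(String::from("  hello\u{3000}wide world ")) {
        println!("{word}");
    }
}
```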

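Finally, here is a version that uses self_cell to create a self-referential struct holding both the owned string and a split iterator that borrows from it: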

use std::str::SplitWhitespace;

fn owned_split(s: String) -> impl Iterator<Item = String> {
    self_cell::self_cell! {
        struct OwnedSplit {
            owner: String,
            #[not_covariant]
            dependent: SplitWhitespace,
        }
    }
    impl Iterator for OwnedSplit {
        type Item = String;
        fn next(&mut self) -> Option<String> {
            self.with_dependent_mut(|_, split| split.next().map(|s| s.to_owned()))
        }
    }
    OwnedSplit::new(s, |s| s.split_whitespace())
}

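This version lets self_cell generate code that uses unsafe to create the self-referential type in a way that cooperates with the borrow checker. Unless there is a bug in self_cell, this use of unsafe is either sound or fails to compile. Another crate, …, is very similar, but I recommend self_cell, because it generates much less code and doesn't use proc macros, which shortens compile times.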

How to define a Rust function that consumes an owned String and returns an owned Split?
