An i32 walks up to a String and asks for its number. The String replies: "Sorry, you're not my type."
Rust splits "string" across three cooperating types:
- char is one Unicode scalar value (always 4 bytes).
- &str is a borrowed view into UTF-8 text. Cheap to pass around.
- String is an owned, growable UTF-8 buffer. You own the memory.

Before we go further, two words that show up everywhere in Rust:

- Owner: the variable responsible for a value and for freeing it. The closest C++ analogy is std::unique_ptr; in Python or Java terms, it's the role of the variable that decides when the object can be collected. In Rust every heap value has exactly one owner at a time.
- Borrow: a temporary reference to a value that someone else owns, like const T& in C++, or handing out a read-only pointer in C. Borrows are written with an & (or &mut if you also want to mutate). The borrow has to end before the owner is dropped, and the compiler enforces that for you, ruling out use-after-free and dangling pointers.

Chapter 12 is dedicated to ownership and borrowing; for now just keep the mental picture of "one owner, many short-lived borrows."
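To make the "one owner, many short-lived borrows" picture concrete, here is a minimal sketch. The names byte_len and demo are hypothetical, chosen only for this illustration.

```rust
// Hypothetical helper: it only borrows the text, so the caller keeps ownership.
fn byte_len(text: &str) -> usize {
    text.len()
}

// Hypothetical walkthrough: one owner, two borrows that each end quickly.
fn demo() -> (usize, usize) {
    let owner = String::from("hello"); // `owner` owns the heap buffer
    let first = byte_len(&owner); // the borrow lasts only for this call
    let second = byte_len(&owner); // a second borrow is fine; the first has ended
    (first, second)
} // `owner` goes out of scope here and its buffer is freed
```

Both calls see the same text, and neither one takes the String away from owner.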
The split between &str and String is what makes Rust strings both
fast and safe. A function that just reads text takes &str; a
function that produces new text returns String. You'll see this
rhythm again and again:
fn shout(text: &str) -> String {
    text.to_uppercase()
}
let s = String::from("hello");
let louder = shout(&s); // &String coerces to &str
- &str ("string slice", pronounced stir) is a borrowed view into text that lives elsewhere. Taking name: &str means "I just need to read this string; I'm not taking ownership of it."
- String is owned and heap-allocated. Returning -> String means the caller gets a fresh, owned value back.

A common gotcha: s.len() returns the number of bytes, not characters.
For character counts use s.chars().count(). UTF-8 means a single visible
character can take more than one byte.
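A short sketch of the gotcha, assuming a hypothetical helper named byte_and_char_counts that returns both numbers side by side:

```rust
// Hypothetical helper: returns (byte length, number of chars) for the same text.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Hypothetical demo comparing an ASCII word with one that contains a non-ASCII letter.
fn demo_counts() -> ((usize, usize), (usize, usize)) {
    let ascii = "hello";
    // "\u{ef}" is the letter i with a diaeresis; it takes two bytes in UTF-8.
    let accented = "na\u{ef}ve";
    (byte_and_char_counts(ascii), byte_and_char_counts(accented))
}
```

For the ASCII word the two numbers agree; for the accented word the byte length is one larger than the char count.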
You'll also meet .chars() a lot. It returns an iterator of char, and
iterators have many useful methods like .next(), .count(), and
.any(...) (more on iterators in chapter 16).
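As a rough illustration of two of those methods, the sketch below uses hypothetical helpers first_char and has_digit; neither name comes from the lesson.

```rust
// Hypothetical helper: next() pulls the first item, or None for an empty string.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

// Hypothetical helper: any() returns true as soon as one char satisfies the closure.
fn has_digit(s: &str) -> bool {
    s.chars().any(|c| c.is_ascii_digit())
}
```

Note that next() returns an Option<char>, because an empty string has no first character to hand back.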
String with format!
The most convenient way to assemble a new String is the format! macro. It
works like println!, except instead of printing, it returns the
formatted text:
let name = "Alice";
let greeting: String = format!("Hello, {name}!");
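As a small sketch, the same greeting can be wrapped in functions. The names greet_captured and greet_positional are hypothetical and exist only for this illustration: the first names the variable inside the braces, the second passes it as a positional argument.

```rust
// Hypothetical helper: the variable is named directly inside the braces.
fn greet_captured(name: &str) -> String {
    format!("Hello, {name}!")
}

// Hypothetical helper: empty braces are filled by the arguments that follow, in order.
fn greet_positional(name: &str) -> String {
    format!("Hello, {}!", name)
}
```

Both forms build the same String. The inline form only accepts a plain identifier, so an expression such as a method call still has to be passed as a separate argument.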
A few things worth noticing:
- {name} inside the string is a captured identifier. Rust pulls the variable from the surrounding scope. (Code written before this syntax was stabilized in Rust 1.58 uses format!("Hello, {}!", name) instead; both still work.)
- The result is a String, ready to return from your function.
- The exclamation mark (!) means it's a macro, not a regular function call. You'll learn what that distinction buys you later; for now, treat it as a quirky bit of punctuation.

for loops
The simplest way to consume an iterator is a for loop:
for c in "hello".chars() {
    println!("{c}");
}
You can read it as "for each c produced by the iterator on the right,
run the body once." The loop variable is a fresh binding scoped to each
iteration. Anything that produces an iterator (a Vec, a slice, a
HashMap, 0..10, ...) works on the right-hand side.
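To show a few of those right-hand sides in action, here is a minimal sketch. The function names (sum_of_range, longest_len, total_stock) are hypothetical and chosen only for this example.

```rust
use std::collections::HashMap;

// Sums the integers produced by the range 0..10 (10 itself is excluded).
fn sum_of_range() -> i32 {
    let mut total = 0;
    for n in 0..10 {
        total += n;
    }
    total
}

// Borrows a slice and visits each element by reference.
fn longest_len(words: &[&str]) -> usize {
    let mut longest = 0;
    for word in words {
        if word.len() > longest {
            longest = word.len();
        }
    }
    longest
}

// Iterating a borrowed HashMap yields (key, value) pairs in no particular order.
fn total_stock(stock: &HashMap<&str, u32>) -> u32 {
    let mut total = 0;
    for (_item, count) in stock {
        total += count;
    }
    total
}
```

The loop body looks the same in every case; only the type of the loop variable changes with what the right-hand side produces.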
You won't memorize Rust's std library, and you don't need to.
Two things you can open in separate tabs right now:
Time to put &str and String together. Implement format_welcome_message
so it returns the string "Welcome, {name}!".
The signature already tells the story:
fn format_welcome_message(name: &str) -> String
You're handed a borrowed &str to read from, and you produce a fresh,
owned String to hand back. The intro mentions println!, but
println! prints; it returns (). The macro that builds a String
for you to return is format!, which uses the same {name} placeholder syntax.
Your first encounter with &str. In many languages, asking for the
"length" of a string gives you back the number of characters. In
Rust, str::len returns the number of bytes in the underlying
UTF-8 buffer, which only matches the character count for plain ASCII.
For "hello" the byte count and char count both happen to be 5, but
"café" is 5 bytes and 4 chars. Reach for chars() when you want the
character count.
Useful from the standard library
- str::chars iterates over the chars of a string. The starting point for almost any character-level work.
- Iterator::count consumes an iterator and returns how many items it produced.
- str::len is byte length, not character count. Useful, but not what you want here.
This step is the canonical "borrowed in, owned out" pattern. The
caller hands you a cheap &str view, and you give back a brand new
String that they get to keep. You'll see this pattern over and
over in real Rust code, so it's worth getting comfortable with the
signature now.
Useful from the standard library
- str::to_uppercase and str::to_lowercase return new Strings with the case changed.
- String::from and str::to_string both create an owned String from a &str. Use whichever reads better.
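A brief sketch of the functions listed above, using hypothetical helpers owned_copies and both_cases that are not part of the exercise:

```rust
// Hypothetical helper: two equivalent ways to turn a borrowed &str into an owned String.
fn owned_copies(text: &str) -> (String, String) {
    (String::from(text), text.to_string())
}

// Hypothetical helper: each call allocates a new String; `text` itself is not modified.
fn both_cases(text: &str) -> (String, String) {
    (text.to_uppercase(), text.to_lowercase())
}
```

In each case the input is only read through the borrow, and every returned String is a separate allocation that the caller owns.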
Strings aren't directly indexable in Rust (because UTF-8 characters
have varying widths), but you can iterate over their chars. A
plain for c in text.chars() loop will work, and so will the
iterator combinators like any or find, which usually express
"is there at least one ..." checks more directly.
Useful from the standard library
- Iterator::any returns true if any item in the iterator matches a predicate. Stops at the first match, so it's cheap.
- char::is_uppercase and char::is_ascii_uppercase classify a single character. The Unicode-aware version is the safer default; the ASCII version is faster when you know the input is ASCII.
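To compare the two styles, the sketch below answers a different yes/no question (does the text contain whitespace?) both ways. The names has_whitespace_loop and has_whitespace_any are hypothetical.

```rust
// Hypothetical helper: explicit loop that returns early on the first match.
fn has_whitespace_loop(text: &str) -> bool {
    for c in text.chars() {
        if c.is_whitespace() {
            return true;
        }
    }
    false
}

// Hypothetical helper: the same check written with the any combinator.
fn has_whitespace_any(text: &str) -> bool {
    text.chars().any(|c| c.is_whitespace())
}
```

Both versions stop at the first whitespace character and return false for an empty string; the any form simply states the question more directly.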
You worked with all three string types: counted UTF-8 characters
correctly, took a &str and produced a fresh String, and walked
a string character by character to answer a yes/no question.
What we learned
- &str is a borrowed view into UTF-8 text; String is an owned, growable buffer; char is one Unicode scalar value. Functions that read take &str, functions that produce return String.
- str::len is byte length, not character count. Use s.chars().count() when you mean characters.
- str::chars() returns an iterator. Anything that takes an iterator works on it: for c in s.chars(), s.chars().any(...), s.chars().count(), and so on.
- Case conversion (to_uppercase, to_lowercase) returns a new String. The original is left unchanged.
- char::is_uppercase is the Unicode-aware classifier; the is_ascii_* family is faster when you know the input is ASCII.