Chapter 19

Word frequencies

Time to extend our running word-count example. Back in the word count chapter you built word_count, char_count, and longest_word with simple for loops. The iterators chapter then showed how iterators collapse those into one-liners. This chapter goes one level deeper: instead of asking how many words a text has, we'll ask which words appear and how often each one shows up.

There's no big new concept here either. Instead, this chapter puts iterators, hashmaps, and Option to work together. Along the way you'll meet two new iterator tricks (Iterator::max_by_key and HashMap::into_iter); the rest is just applying what's already in your toolbox.

A few patterns you'll likely use:

Splitting text into words. Both split_whitespace and split return iterators of &str. The first handles any kind of whitespace and skips empties, which is usually what you want for natural text:

for word in "hello  world\nrust".split_whitespace() {
    println!("{word}"); // hello, world, rust
}
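
To see why split_whitespace is usually the better default, here is a small comparison sketch. The two helper names are made up for this illustration; only str::split and str::split_whitespace come from the standard library.

```rust
/// Splits on a single space character only (hypothetical helper).
fn split_on_space(text: &str) -> Vec<&str> {
    text.split(' ').collect()
}

/// Splits on any run of whitespace and skips empty pieces (hypothetical helper).
fn split_on_whitespace(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

#[test]
fn test_split_variants() {
    let text = "hello  world\nrust"; // two spaces, then a newline
    // split(' ') keeps the empty piece between the two spaces and does
    // not treat '\n' as a separator.
    assert_eq!(split_on_space(text), vec!["hello", "", "world\nrust"]);
    assert_eq!(split_on_whitespace(text), vec!["hello", "world", "rust"]);
}
```

With split(' ') you would end up counting an empty string and the fused "world\nrust" as words, which is rarely what you want for natural text.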

Counting things into a HashMap. Reach for entry(...).or_insert(0):

use std::collections::HashMap;

let mut counts: HashMap<String, usize> = HashMap::new();
for word in text.split_whitespace() {
    *counts.entry(word.to_lowercase()).or_insert(0) += 1;
}
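
If the entry line feels dense, it may help to see roughly what it replaces. The sketch below spells out the lookup by hand; count_words_longhand is a hypothetical name used only here, and the entry version above remains the idiomatic choice.

```rust
use std::collections::HashMap;

/// Counts words with an explicit lookup instead of the entry API.
fn count_words_longhand(text: &str) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        let key = word.to_lowercase();
        if let Some(count) = counts.get_mut(&key) {
            // Key already present: bump the stored counter in place.
            *count += 1;
        } else {
            // First sighting: insert the key with a count of 1.
            counts.insert(key, 1);
        }
    }
    counts
}

#[test]
fn test_count_words_longhand() {
    let counts = count_words_longhand("a A b");
    assert_eq!(counts.get("a"), Some(&2));
    assert_eq!(counts.get("b"), Some(&1));
}
```

The longhand form hashes the key twice on a first sighting (once for the lookup, once for the insert); entry does the lookup once and hands you a slot to fill or update.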

Finding the maximum by some property. max_by_key is the right tool for "give me the entry with the largest count":

let top = counts.iter().max_by_key(|(_, count)| *count);
// top: Option<(&String, &usize)>
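
Two details of max_by_key are easy to miss: it returns None on an empty iterator, and when several elements share the maximum key it returns the last one. The sketch below uses a slice instead of a HashMap so the order is predictable; top_entry is a hypothetical helper for this illustration.

```rust
/// Returns the pair with the largest count, or None for an empty slice.
fn top_entry<'a>(pairs: &[(&'a str, usize)]) -> Option<(&'a str, usize)> {
    pairs.iter().copied().max_by_key(|(_, count)| *count)
}

#[test]
fn test_top_entry() {
    // "b" and "c" tie at 3; max_by_key keeps the last maximum it sees.
    assert_eq!(top_entry(&[("a", 2), ("b", 3), ("c", 3)]), Some(("c", 3)));
    // No elements means no maximum.
    assert_eq!(top_entry(&[]), None);
}
```

A HashMap iterates in no particular order, so on a map the winner of a tie is effectively arbitrary.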

Computing an average. Sum the lengths, divide by the count, and watch out for the integer-division trap:

// len() counts bytes; use w.chars().count() if the text may be non-ASCII
let total_chars: usize = words.iter().map(|w| w.len()).sum();
let avg = total_chars as f64 / words.len() as f64;
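
The order of the cast and the division matters. Here is a minimal sketch of the trap with two hypothetical helpers, using 14 characters over 3 words (the numbers you get from "hello world rust"):

```rust
/// Divides first, then converts: the fraction is already lost.
fn average_truncated(total_chars: usize, word_count: usize) -> f64 {
    (total_chars / word_count) as f64
}

/// Converts both operands first, then divides in floating point.
fn average_exact(total_chars: usize, word_count: usize) -> f64 {
    total_chars as f64 / word_count as f64
}

#[test]
fn test_average_cast_order() {
    // 14 / 3 as integers is 4, so the "average" comes out as 4.0.
    assert!((average_truncated(14, 3) - 4.0).abs() < 1e-12);
    // Casting first keeps the fractional part (about 4.667).
    assert!(average_exact(14, 3) > 4.6);
}
```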

Counting words

The foundation for everything else in this chapter: take a string of text and produce a HashMap<String, usize> that maps each word to how many times it appears. Words are separated by whitespace and the count should be case-insensitive: "Hello" and "hello" are the same word.

The classic recipe is: split on whitespace, lowercase each piece, then walk the resulting iterator and bump a counter in the map. The entry API on HashMap is the idiomatic way to do that last step: *map.entry(key).or_insert(0) += 1.

Useful from the standard library

str::split_whitespace splits on any whitespace and skips empty pieces. Almost always what you want for word splitting.

str::to_lowercase returns a fresh String. Use it as the map key so Hello and hello collapse together.

HashMap::entry + Entry::or_insert is the "look up; insert default; mutate" pattern from the hashmaps chapter.
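
One thing split_whitespace does not do is strip punctuation: "hello," and "hello" stay different words. The exercise doesn't ask for this, but if you wanted to go further, a normalization step could look like the sketch below. Both function names are hypothetical, and the rule (trim non-alphanumeric characters from both ends) is just one possible choice.

```rust
use std::collections::HashMap;

/// Lowercases a word and strips leading/trailing non-alphanumeric characters.
/// Returns None when nothing is left (for example, a lone "--").
fn normalize(word: &str) -> Option<String> {
    let trimmed = word.trim_matches(|c: char| !c.is_alphanumeric());
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_lowercase())
    }
}

/// Counts words after normalizing each whitespace-separated piece.
fn count_words_normalized(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // filter_map drops the pieces that normalize away to nothing.
    for word in text.split_whitespace().filter_map(normalize) {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

#[test]
fn test_count_words_normalized() {
    let counts = count_words_normalized("Hello, hello! -- world.");
    assert_eq!(counts.get("hello"), Some(&2));
    assert_eq!(counts.get("world"), Some(&1));
    assert_eq!(counts.len(), 2);
}
```

Punctuation inside a word, such as the apostrophe in "don't", is left alone because trim_matches only touches the ends.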

Exercise 1 of 3

Full solution:

use std::collections::HashMap;

/// Counts how many times each word appears in the text.
/// Words are separated by whitespace and should be case-insensitive.
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

#[test]
fn test_count_words() {
    let text = "hello world hello rust world";
    let counts = count_words(text);
    assert_eq!(counts.get("hello"), Some(&2));
    assert_eq!(counts.get("world"), Some(&2));
    assert_eq!(counts.get("rust"), Some(&1));
}

#[test]
fn test_count_words_case_insensitive() {
    let text = "Hello HELLO hello";
    let counts = count_words(text);
    assert_eq!(counts.get("hello"), Some(&3));
}
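
If you prefer the iterator style from the iterators chapter, the same loop can also be written as a fold. This is only an alternative sketch (count_words_fold is a hypothetical name); the for loop in the solution is just as idiomatic and arguably easier to read.

```rust
use std::collections::HashMap;

/// Counts words by folding the word iterator into a map.
fn count_words_fold(text: &str) -> HashMap<String, usize> {
    text.split_whitespace()
        .map(|word| word.to_lowercase())
        .fold(HashMap::new(), |mut counts: HashMap<String, usize>, word| {
            // Same entry pattern as the loop version, one word at a time.
            *counts.entry(word).or_insert(0) += 1;
            // Hand the map back so the next step can keep using it.
            counts
        })
}

#[test]
fn test_count_words_fold() {
    let counts = count_words_fold("Hello HELLO hello world");
    assert_eq!(counts.get("hello"), Some(&3));
    assert_eq!(counts.get("world"), Some(&1));
}
```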

The most common word

Now that you can count, finding the maximum is almost a one-liner. The borrow checker has an opinion about returning data out of a HashMap, and that's the real lesson of this step.

count_words is duplicated below as a todo!() stub so this step compiles in isolation; you don't need to fill it in again. Focus on most_common_word. Once you have it, the test will drive both through unwrap().

Useful from the standard library

HashMap::into_iter consumes the map and yields owned (K, V) pairs. That's how you get an owned String out without cloning.

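Iterator::max_by_key returns the element with the largest derived key, wrapped in an Option. max_by_key(|(_, count)| *count) does the trick here. If several words tie for the top count, max_by_key returns the last of them it sees, and because a HashMap iterates in no particular order, which tied word wins is arbitrary.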

An empty input naturally produces None: count_words returns an empty map, into_iter().max_by_key(...) returns None, and the function signature already says Option<(String, usize)>. No special case needed.

Exercise 2 of 3

use std::collections::HashMap;

/// Counts how many times each word appears in the text.
/// Words are separated by whitespace and should be case-insensitive.
fn count_words(text: &str) -> HashMap<String, usize> {
    todo!()
}

/// Finds the most common word in the text.
/// Returns the word and its count, or None if the text contains no words.
///
/// Tip: this is the function where the borrow checker pushes back. To
/// return `(String, usize)` you need to own the key, but `iter()` on
/// a `HashMap` only hands out borrows. The trick is
/// [`into_iter`](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.into_iter):
/// it consumes the map and yields `(K, V)` pairs by value, so combining
/// it with `max_by_key` gives you back an owned `(String, usize)`.
fn most_common_word(text: &str) -> Option<(String, usize)> {
    // Use count_words() then find the max by count
    todo!()
}

#[test]
fn test_most_common_word() {
    let text = "apple banana apple cherry apple";
    let (word, count) = most_common_word(text).unwrap();
    assert_eq!(word, "apple");
    assert_eq!(count, 3);
}

Full solution:

use std::collections::HashMap;

/// Counts how many times each word appears in the text.
/// Words are separated by whitespace and should be case-insensitive.
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

/// Finds the most common word in the text.
/// Returns the word and its count, or None if the text contains no words.
///
/// Tip: this is the function where the borrow checker pushes back. To
/// return `(String, usize)` you need to own the key, but `iter()` on
/// a `HashMap` only hands out borrows. The trick is
/// [`into_iter`](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.into_iter):
/// it consumes the map and yields `(K, V)` pairs by value, so combining
/// it with `max_by_key` gives you back an owned `(String, usize)`.
fn most_common_word(text: &str) -> Option<(String, usize)> {
    count_words(text)
        .into_iter()
        .max_by_key(|(_, count)| *count)
}
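
Sometimes you still need the map afterwards, so consuming it with into_iter isn't an option. In that case you can borrow with iter() and clone only the winning key. The sketch below also breaks ties deterministically; most_common_in is a hypothetical helper, and the tie rule (alphabetically smallest word wins) is an assumption of this sketch, not something the exercise requires.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Picks the most common word from an existing map without consuming it.
fn most_common_in(counts: &HashMap<String, usize>) -> Option<(String, usize)> {
    counts
        .iter()
        // Compare by count first; on equal counts, Reverse makes the
        // alphabetically smaller word compare as the larger key.
        .max_by_key(|(word, count)| (**count, Reverse(*word)))
        // iter() only lends the entries, so clone the winning key.
        .map(|(word, count)| (word.clone(), *count))
}

#[test]
fn test_most_common_in() {
    let mut counts = HashMap::new();
    counts.insert("banana".to_string(), 2);
    counts.insert("apple".to_string(), 2);
    counts.insert("cherry".to_string(), 1);
    assert_eq!(most_common_in(&counts), Some(("apple".to_string(), 2)));
    // The map is still usable here because it was only borrowed.
    assert_eq!(counts.len(), 3);
}
```

The cost is one String clone for the winner, which is usually negligible next to building the map in the first place.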

Text statistics

The orchestrator step, and the one with the most aggregations in a single body. text_stats returns three numbers about a piece of text: total word count, number of unique words, and the average word length as an f64. You can compute all three from a single pass over count_words's result, or split the work; either is fine.

The average is where Rust makes you slow down. Integer division truncates, so cast to f64 before you divide, not after. And once the result is a float, the test can't check it with ==: 14/3 has no exact f64 representation, and rounding makes exact equality fragile, so the test compares against a small tolerance instead.
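
To make the float point concrete, here is a tiny sketch. approx_eq is a hypothetical helper, not a standard-library function; the tolerance values are arbitrary picks for illustration.

```rust
/// Returns true when `a` and `b` differ by less than `eps`.
fn approx_eq(a: f64, b: f64, eps: f64) -> bool {
    (a - b).abs() < eps
}

#[test]
fn test_float_comparison() {
    let sum: f64 = 0.1 + 0.2;
    // Rounding error means the sum is not bit-for-bit equal to 0.3.
    assert!(sum != 0.3);
    // Within a small tolerance, the two are as equal as we care about.
    assert!(approx_eq(sum, 0.3, 1e-9));
    // 14/3 is 4.666..., which is within 0.01 of 4.66.
    assert!(approx_eq(14.0 / 3.0, 4.66, 0.01));
}
```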

count_words is stubbed with todo!() again so this file compiles on its own. Wire text_stats up however you like. The test only cares about the returned tuple.

Useful from the standard library

The total word count is the sum of every value in the map: counts.values().sum::<usize>().

The unique-word count is counts.len().

For the average length, sum key.chars().count() * count across the map (or sum word.chars().count() straight from a fresh text.split_whitespace() pass) and divide by the total. Note that word.len() counts bytes rather than characters, so the two only agree for ASCII text. Watch the integer-division trap: cast both operands to f64 before the divide.

HashMap::values and HashMap::iter are the two iterator entry points you'll likely use here.
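
The difference between len() and chars().count() only shows up once the text leaves ASCII. A short sketch with two hypothetical helpers:

```rust
/// Length in bytes of the UTF-8 encoding.
fn byte_len(word: &str) -> usize {
    word.len()
}

/// Length in Unicode scalar values (what most people mean by "characters").
fn char_len(word: &str) -> usize {
    word.chars().count()
}

#[test]
fn test_lengths() {
    // Plain ASCII: one byte per character, so both agree.
    assert_eq!(byte_len("rust"), 4);
    assert_eq!(char_len("rust"), 4);
    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    assert_eq!(byte_len("h\u{e9}llo"), 6);
    assert_eq!(char_len("h\u{e9}llo"), 5);
}
```

Whichever measure you pick, use the same one everywhere so the average stays consistent.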

Exercise 3 of 3

use std::collections::HashMap;

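/// Counts how many times each word appears in the text.
/// Words are separated by whitespace and should be case-insensitive.
fn count_words(text: &str) -> HashMap<String, usize> {
    todo!()
}

/// Calculates basic text statistics.
/// Returns (`total_words`, `unique_words`, `average_word_length`).
fn text_stats(text: &str) -> (usize, usize, f64) {
    // Use count_words() then aggregate its result
    todo!()
}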

#[test]
fn test_text_stats() {
    let text = "hello world rust";
    let (total, unique, avg_len) = text_stats(text);
    assert_eq!(total, 3);
    assert_eq!(unique, 3);
    assert!((avg_len - 4.66).abs() < 0.01); // Average length ≈ 4.66
    // Side note: floats don't compare exactly (the value here is
    // really 14/3 = 4.666...), so we check that we're close enough
    // by taking the absolute difference and comparing to a tolerance.
    // Direct `==` on `f64` is almost always the wrong thing.
}

Full solution:

use std::collections::HashMap;

/// Calculates basic text statistics.
/// Returns (`total_words`, `unique_words`, `average_word_length`).
///
/// In real code you'd reach for a `struct TextStats { total: usize,
/// unique: usize, avg_len: f64 }` here; a 3-tuple is hard to read at
/// the call site. We're sticking with a tuple to keep the focus on the
/// iterator chain in the body.
fn text_stats(text: &str) -> (usize, usize, f64) {
    let counts = count_words(text);
    let total: usize = counts.values().sum();
    let unique = counts.len();
    let total_length: usize = counts
        .iter()
        .map(|(word, count)| word.chars().count() * count)
        .sum();
    let average_word_length = total_length as f64 / total as f64;
    (total, unique, average_word_length)
}

Wrapping up the word counter

You glued together the chapters so far: a HashMap keyed by lowercased words, an into_iter() to escape the borrow checker, a max_by_key to pick a winner, and a few aggregations to compute summary stats.

What we learned

split_whitespace() is the right default for word-splitting in natural text. It collapses runs of whitespace and skips empties.

Lowercasing keys (or any other normalization step) belongs to the same pipeline that builds the map, not to the consumer side.

into_iter on a HashMap is the standard escape hatch when you need to return owned data out of it. iter only hands out borrows.

max_by_key returns an Option, so empty input naturally collapses to None without a special-case branch.

Watch the integer-division trap when computing averages: divide after casting to f64, not before. f64 comparisons need a tolerance ((a - b).abs() < eps); never ==.

Tuples like (usize, usize, f64) work for tiny ad-hoc returns, but a named struct (TextStats { total, unique, avg_len }) reads better at the call site as soon as a function grows in scope.
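
For reference, here is one way the struct version could look. Treat it as a sketch: text_stats_named is a hypothetical name, the counting loop is inlined so the block stands alone, and returning 0.0 for empty input is an assumption of this sketch (the tuple solution would compute 0.0 / 0.0, which is NaN).

```rust
use std::collections::HashMap;

/// Named fields make the call site self-explanatory.
#[derive(Debug)]
struct TextStats {
    total: usize,
    unique: usize,
    avg_len: f64,
}

fn text_stats_named(text: &str) -> TextStats {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    let total: usize = counts.values().sum();
    let total_length: usize = counts
        .iter()
        .map(|(word, count)| word.chars().count() * count)
        .sum();
    // Assumption: report 0.0 for empty input instead of dividing by zero.
    let avg_len = if total == 0 {
        0.0
    } else {
        total_length as f64 / total as f64
    };
    TextStats { total, unique: counts.len(), avg_len }
}

#[test]
fn test_text_stats_named() {
    let stats = text_stats_named("hello world rust");
    assert_eq!(stats.total, 3);
    assert_eq!(stats.unique, 3);
    assert!((stats.avg_len - 14.0 / 3.0).abs() < 1e-9);
}
```

At the call site, stats.avg_len says what it is; with the tuple you would have to remember that the third element is the average.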

An optional detour

You now have every tool you need to build a small program from scratch: structs, enums, iterators, Option, Result, vectors, and strings. There's an optional Creative Break chapter: an open-ended password validator project rather than a guided lesson. It isn't part of the main sequence and nothing later depends on it, so take it whenever you want a change of pace, or skip straight ahead.
