go-02-word-frequency

1.000

2/2 tests · data

Challenge · difficulty 2/5

# Word frequency

Implement **`solution.go`** in `package challenge` exporting:

```go
func WordFrequency(text string) map[string]int
```

Count how many times each word occurs in `text` and return the counts in a map.

Rules:

- Split `text` into tokens on **whitespace** (spaces, tabs, newlines).
- For each token, strip any **surrounding ASCII punctuation** (leading and trailing).
  Punctuation in the middle of a token is kept.
- **Lowercase** each word before counting.
- If, after stripping, a token is empty, skip it (do not count an empty string).
- For empty or whitespace-only input, return an **empty, non-nil** map (length 0).
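
As a minimal sketch of the first and last rules, assuming `strings.Fields` is an acceptable tokenizer (it splits on every Unicode whitespace character, a superset of the spaces, tabs, and newlines named above), the hypothetical helper `tokens` shows that whitespace-only input produces no tokens at all:

```go
package main

import (
	"fmt"
	"strings"
)

// tokens is a hypothetical helper, not part of the required API.
// strings.Fields splits around runs of whitespace and never returns
// empty strings, so whitespace-only input yields a zero-length slice.
func tokens(text string) []string {
	return strings.Fields(text)
}

func main() {
	fmt.Printf("%q\n", tokens("a\tb\nc a")) // ["a" "b" "c" "a"]
	fmt.Println(len(tokens("   \t \n  ")))  // 0
}
```

With no tokens to count, a map created up front with `make(map[string]int)` is returned empty and non-nil without any special case.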

Go's `unicode.IsPunct` alone does not cover every character that must be stripped here:
symbols such as `+`, `<`, and `=` fall in the Unicode symbol categories, not the punctuation
ones. For this challenge, treat a byte as "punctuation to strip" when it is an ASCII byte
that is **not** a letter or digit.

Examples:

- `WordFrequency("the cat sat on the mat")` →
  `{"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}`
- `WordFrequency("Hello, hello! HELLO.")` → `{"hello": 3}`
- `WordFrequency("don't stop")` → `{"don't": 1, "stop": 1}` (interior apostrophe kept)
- `WordFrequency("   ")` → `{}` (empty, non-nil map)

tests/solution_test.go

```go
package challenge

import (
	"reflect"
	"testing"
)

func TestWordFrequency(t *testing.T) {
	cases := []struct {
		name string
		in   string
		want map[string]int
	}{
		{
			name: "simple repeats",
			in:   "the cat sat on the mat",
			want: map[string]int{"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1},
		},
		{
			name: "punctuation and case",
			in:   "Hello, hello! HELLO.",
			want: map[string]int{"hello": 3},
		},
		{
			name: "interior apostrophe kept",
			in:   "don't stop don't",
			want: map[string]int{"don't": 2, "stop": 1},
		},
		{
			name: "tabs and newlines as whitespace",
			in:   "a\tb\nc a",
			want: map[string]int{"a": 2, "b": 1, "c": 1},
		},
		{
			name: "leading and trailing punctuation stripped",
			in:   "(go) [go]; {GO}",
			want: map[string]int{"go": 3},
		},
		{
			name: "digits are words",
			in:   "42 42 forty-two",
			want: map[string]int{"42": 2, "forty-two": 1},
		},
		{
			name: "token that is only punctuation is skipped",
			in:   "hi --- !!! bye",
			want: map[string]int{"hi": 1, "bye": 1},
		},
		{
			name: "empty input",
			in:   "",
			want: map[string]int{},
		},
		{
			name: "whitespace only",
			in:   "   \t \n  ",
			want: map[string]int{},
		},
	}

	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			got := WordFrequency(c.in)
			if got == nil {
				t.Fatalf("WordFrequency(%q) returned nil map, want non-nil", c.in)
			}
			if !reflect.DeepEqual(got, c.want) {
				t.Errorf("WordFrequency(%q) = %v, want %v", c.in, got, c.want)
			}
		})
	}
}

func TestWordFrequencyEmptyIsNonNil(t *testing.T) {
	got := WordFrequency("")
	if got == nil {
		t.Fatal("WordFrequency(\"\") = nil, want non-nil empty map")
	}
	if len(got) != 0 {
		t.Errorf("WordFrequency(\"\") = %v, want empty map", got)
	}
}
```

System prompt

You are an expert programmer. Solve the task exactly as specified. Output your solution as fenced code blocks using the required file name(s) and the exact function/type signatures requested. Prefer correctness; do not include prose outside code unless asked.

Proposed solution

```go
package challenge

import (
	"strings"
)

func WordFrequency(text string) map[string]int {
	if text == "" {
		return make(map[string]int)
	}

	words := make(map[string]int)
	tokens := strings.Fields(text)

	for _, token := range tokens {
		word := stripPunctuation(token)
		if word == "" {
			continue
		}
		word = strings.ToLower(word)
		words[word]++
	}

	return words
}

func stripPunctuation(token string) string {
	start := 0
	end := len(token)

	for start < end && isPunctuation(token[start]) {
		start++
	}

	for end > start && isPunctuation(token[end-1]) {
		end--
	}

	return token[start:end]
}

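func isPunctuation(b byte) bool {
	// Only ASCII bytes count as punctuation; bytes >= 0x80 belong to
	// multi-byte UTF-8 sequences and are left in place.
	return b < 0x80 && !isLetterOrDigit(b)
}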

func isLetterOrDigit(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
}
```
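
The test suite only uses ASCII input, so the "ASCII byte" part of the rule is never exercised. The following sketch, with the hypothetical names `asciiPunct` and `stripASCIIPunct`, illustrates how a byte-wise strip with an explicit ASCII guard treats UTF-8 text. It assumes that non-ASCII bytes should always be kept, which is an interpretation of the rule and not something the tests confirm.

```go
package main

import "fmt"

// asciiPunct reports whether b is an ASCII byte that is not a letter or digit.
// Bytes >= 0x80 are part of multi-byte UTF-8 sequences and are never punctuation here.
func asciiPunct(b byte) bool {
	if b >= 0x80 {
		return false
	}
	isLetter := (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
	isDigit := b >= '0' && b <= '9'
	return !isLetter && !isDigit
}

// stripASCIIPunct (hypothetical name) trims leading and trailing ASCII
// punctuation from token and leaves interior bytes untouched.
func stripASCIIPunct(token string) string {
	start, end := 0, len(token)
	for start < end && asciiPunct(token[start]) {
		start++
	}
	for end > start && asciiPunct(token[end-1]) {
		end--
	}
	return token[start:end]
}

func main() {
	fmt.Println(stripASCIIPunct("café!"))     // café
	fmt.Println(stripASCIIPunct("(«go»)"))    // «go»
	fmt.Println(stripASCIIPunct("---") == "") // true
}
```

Without the `b >= 0x80` guard, the trailing bytes of `é` would be classified as punctuation and `café!` would be cut down to `caf`.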

Test output (stdout)

{"Time":"2026-07-02T04:33:59.185451271+02:00","Action":"run","Package":"challenge","Test":"TestWordFrequency/digits_are_words"}
{"Time":"2026-07-02T04:33:59.185452331+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/digits_are_words","Output":"=== RUN   TestWordFrequency/digits_are_words\n"}
{"Time":"2026-07-02T04:33:59.185453711+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/digits_are_words","Output":"--- PASS: TestWordFrequency/digits_are_words (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185454661+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequency/digits_are_words","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185457751+02:00","Action":"run","Package":"challenge","Test":"TestWordFrequency/token_that_is_only_punctuation_is_skipped"}
{"Time":"2026-07-02T04:33:59.185458831+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/token_that_is_only_punctuation_is_skipped","Output":"=== RUN   TestWordFrequency/token_that_is_only_punctuation_is_skipped\n"}
{"Time":"2026-07-02T04:33:59.185461041+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/token_that_is_only_punctuation_is_skipped","Output":"--- PASS: TestWordFrequency/token_that_is_only_punctuation_is_skipped (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185462061+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequency/token_that_is_only_punctuation_is_skipped","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185463021+02:00","Action":"run","Package":"challenge","Test":"TestWordFrequency/empty_input"}
{"Time":"2026-07-02T04:33:59.185464891+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/empty_input","Output":"=== RUN   TestWordFrequency/empty_input\n"}
{"Time":"2026-07-02T04:33:59.185466351+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/empty_input","Output":"--- PASS: TestWordFrequency/empty_input (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185467351+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequency/empty_input","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185468371+02:00","Action":"run","Package":"challenge","Test":"TestWordFrequency/whitespace_only"}
{"Time":"2026-07-02T04:33:59.185469241+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/whitespace_only","Output":"=== RUN   TestWordFrequency/whitespace_only\n"}
{"Time":"2026-07-02T04:33:59.185470351+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency/whitespace_only","Output":"--- PASS: TestWordFrequency/whitespace_only (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185471471+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequency/whitespace_only","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185472681+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequency","Output":"--- PASS: TestWordFrequency (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185474011+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequency","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185475791+02:00","Action":"run","Package":"challenge","Test":"TestWordFrequencyEmptyIsNonNil"}
{"Time":"2026-07-02T04:33:59.185476891+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequencyEmptyIsNonNil","Output":"=== RUN   TestWordFrequencyEmptyIsNonNil\n"}
{"Time":"2026-07-02T04:33:59.185478501+02:00","Action":"output","Package":"challenge","Test":"TestWordFrequencyEmptyIsNonNil","Output":"--- PASS: TestWordFrequencyEmptyIsNonNil (0.00s)\n"}
{"Time":"2026-07-02T04:33:59.185479561+02:00","Action":"pass","Package":"challenge","Test":"TestWordFrequencyEmptyIsNonNil","Elapsed":0}
{"Time":"2026-07-02T04:33:59.185480621+02:00","Action":"output","Package":"challenge","Output":"PASS\n"}
{"Time":"2026-07-02T04:33:59.185680311+02:00","Action":"output","Package":"challenge","Output":"ok  \tchallenge\t0.001s\n"}
{"Time":"2026-07-02T04:33:59.185698891+02:00","Action":"pass","Package":"challenge","Elapsed":0.001}
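
The lines above have the shape of the JSON event stream produced by `go test -json`. As a rough sketch for summarizing such a log, assuming one JSON object per line with the `Action`, `Package`, and `Test` fields seen above, the hypothetical `event` struct and `countPasses` function below count passing tests and ignore any line that is not valid JSON:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event mirrors only the fields this sketch needs; other fields in the
// log (Time, Elapsed, Output) are ignored by encoding/json.
type event struct {
	Action  string
	Package string
	Test    string
}

// countPasses returns the number of "pass" events that name a test.
// Package-level pass events have an empty Test field and are skipped,
// as are lines that fail to parse (for example a truncated line).
func countPasses(log string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		var e event
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Action == "pass" && e.Test != "" {
			n++
		}
	}
	return n
}

func main() {
	log := `this line is not JSON
{"Action":"pass","Package":"challenge","Test":"TestWordFrequency/empty_input","Elapsed":0}
{"Action":"output","Package":"challenge","Output":"PASS\n"}
{"Action":"pass","Package":"challenge","Elapsed":0.001}`
	fmt.Println(countPasses(log)) // 1
}
```

Subtests and their parent test each emit their own `pass` event, so the count includes both.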

Reasoning (chain-of-thought)

The task is to implement a `WordFrequency` function in Go that:

1. Splits text into tokens on whitespace (spaces, tabs, newlines)
2. Strips surrounding ASCII punctuation from each token
3. Lowercases each word before counting
4. Skips empty tokens after stripping
5. Returns an empty, non-nil map for empty/whitespace-only input

For punctuation, I need to treat ASCII bytes that are not letters or digits as punctuation to strip. This differs from `unicode.IsPunct`, which does not cover symbols such as `+` and `=`, so I should check whether a byte is ASCII and not alphanumeric.

Let me think about the implementation:

1. Split by whitespace - I can use `strings.Fields()` which splits on any Unicode whitespace
2. For each token, strip leading and trailing punctuation (ASCII non-alphanumeric)
3. Convert to lowercase
4. Skip empty strings
5. Count in a map

For stripping punctuation, I need to:
- Remove leading characters that are ASCII and not letters or digits
- Remove trailing characters that are ASCII and not letters or digits

Let me write the code:

```go
package challenge

import (
	"strings"
)

func WordFrequency(text string) map[string]int {
	if text == "" {
		return make(map[string]int)
	}
	
	words := make(map[string]int)
	tokens := strings.Fields(text)
	
	for _, token := range tokens {
		// Strip leading punctuation
		start := 0
		for start < len(token) && isPunctuation(token[start]) {
			start++
		}
		
		// Strip trailing punctuation
		end := len(token)
		for end > start && isPunctuation(token[end-1]) {
			end--
		}
		
		// Extract the word
		word := token[start:end]
		
		// Skip empty words
		if word == "" {
			continue
		}
		
		// Lowercase
		word = strings.ToLower(word)
		
		words[word]++
	}
	
	return words
}

func isPunctuation(b byte) bool {
	// ASCII punctuation: an ASCII byte that is not a letter or digit
	return b < 0x80 && !isLetterOrDigit(b)
}

func isLetterOrDigit(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
}
```
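
The hand-written trimming loops could also be expressed with the standard library's `strings.TrimFunc`, which walks runes instead of bytes. This is only a sketch of that alternative, not the submitted solution; `trimmable` and `normalize` are hypothetical names, and the sketch assumes that runes above `unicode.MaxASCII` are never stripped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimmable reports whether r is an ASCII rune that is not a letter or digit.
// Runes above unicode.MaxASCII are kept (an assumption of this sketch).
func trimmable(r rune) bool {
	if r > unicode.MaxASCII {
		return false
	}
	isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
	isDigit := r >= '0' && r <= '9'
	return !isLetter && !isDigit
}

// normalize strips surrounding ASCII punctuation and lowercases the rest.
// An empty result means the token should be skipped by the caller.
func normalize(token string) string {
	return strings.ToLower(strings.TrimFunc(token, trimmable))
}

func main() {
	for _, tok := range []string{"Hello,", "{GO}", "don't", "!!!"} {
		fmt.Printf("%q -> %q\n", tok, normalize(tok))
	}
}
```

`strings.TrimFunc` only removes leading and trailing runes, so the interior apostrophe in `don't` survives, matching the rule that punctuation in the middle of a token is kept.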

I'm reconsidering the whitespace splitting approach. The requirement names spaces, tabs, and newlines, and `strings.Fields()` splits on all of them (as well as any other Unicode whitespace), so it fits.

For punctuation stripping, I need to be precise about what counts as punctuation. The problem defines it as ASCII bytes that aren't letters or digits, so the check runs byte by byte from both ends of each token and stops at the first letter or digit.

The implementation covers each of the stated rules. The edge cases I still want to verify are tokens made only of punctuation, which must be skipped, and empty or whitespace-only input, which must return an empty, non-nil map.