py-06-numpy-distances

1.000

3/3 tests· math

Challenge · difficulty 3/5

# Pairwise distances (numpy)

Implement **`solution.py`** with:

```python
import numpy as np

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    ...
```

`points` is a 2-D array of shape `(n, d)` (n points in d dimensions). Return an `(n, n)`
array where entry `[i, j]` is the Euclidean distance between point `i` and point `j`.

- The result must be a NumPy array (`np.ndarray`), symmetric, with a zero diagonal.
- Use NumPy vectorization (no Python `for` loops over point pairs).
- Example: for `[[0, 0], [3, 4]]` the result is `[[0, 5], [5, 0]]`.

tests/test_distances.py

import numpy as np
from solution import pairwise_distances


def test_simple():
    pts = np.array([[0.0, 0.0], [3.0, 4.0]])
    d = pairwise_distances(pts)
    assert isinstance(d, np.ndarray)
    assert d.shape == (2, 2)
    np.testing.assert_allclose(d, [[0.0, 5.0], [5.0, 0.0]], atol=1e-9)


def test_symmetric_zero_diagonal():
    rng = np.random.default_rng(0)
    pts = rng.standard_normal((6, 3))
    d = pairwise_distances(pts)
    assert d.shape == (6, 6)
    np.testing.assert_allclose(d, d.T, atol=1e-9)
    np.testing.assert_allclose(np.diag(d), np.zeros(6), atol=1e-9)


def test_matches_bruteforce():
    rng = np.random.default_rng(42)
    pts = rng.standard_normal((5, 4))
    d = pairwise_distances(pts)
    for i in range(5):
        for j in range(5):
            expected = np.sqrt(((pts[i] - pts[j]) ** 2).sum())
            assert abs(d[i, j] - expected) < 1e-9

System prompt

You are an expert programmer. Solve the task exactly as specified. Output your solution as fenced code blocks using the required file name(s) and the exact function/type signatures requested. Prefer correctness; do not include prose outside code unless asked.

Proposed solution

```python
import numpy as np

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """
    Compute pairwise Euclidean distances between all points.

    Args:
        points: 2-D array of shape (n, d) containing n points in d dimensions.

    Returns:
        2-D array of shape (n, n) where entry [i, j] is the Euclidean distance
        between point i and point j.
    """
    # Use broadcasting to compute differences between all pairs of points
    # points[:, None, :] has shape (n, 1, d)
    # points[None, :, :] has shape (1, n, d)
    # Their difference has shape (n, n, d)
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    
    # Compute squared Euclidean distances by summing over the feature dimension
    squared_distances = np.sum(diff ** 2, axis=2)
    
    # Take square root to get Euclidean distances
    return np.sqrt(squared_distances)
```

Test output (stdout)

...                                                                      [100%]
3 passed in 0.07s

Reasoning (chain-of-thought)

The task is to implement a function that computes pairwise Euclidean distances between all points in a 2D array.

Given:
- Input: `points` of shape `(n, d)` - n points in d dimensions
- Output: `(n, n)` array where `[i, j]` is the Euclidean distance between point i and point j

The Euclidean distance between two points is:
√((x1-x2)² + (y1-y2)² + ...)

For vectorized computation without loops, I can use:
1. Compute the squared differences between all pairs
2. Sum them up
3. Take the square root

One efficient approach using NumPy broadcasting:
- For each point i, compute the squared distance to all other points
- This can be done by computing ||x_i - x_j||² for all i, j

A common vectorized approach:
```python
# Compute squared distances
# For each point, we want ||x_i - x_j||² = ||x_i||² + ||x_j||² - 2*x_i·x_j
```

Let me think about the broadcasting:
- `points` has shape `(n, d)`
- We want to compute distances between all pairs

One approach:
```python
# Expand dimensions for broadcasting
# points[:, None, :] has shape (n, 1, d)
# points[None, :, :] has shape (1, n, d)
# Subtract: points[:, None, :] - points[None, :, :] has shape (n, n, d)
# Square and sum over d, then sqrt
```

Let me verify with the example:
- points = [[0, 0], [3, 4]]
- shape = (2, 2)
- points[:, None, :] = [[0, 0], [3, 4]] with new axis = [[0, 0], [3, 4]] but with shape (2, 1, 2)


- points[None, :, :] = [[0, 0], [3, 4]] with new axis = [[0, 0], [3, 4]] but with shape (1, 2, 2)

When I subtract these, I get:
- [[0-0, 0-0], [3-0, 4-0]] = [[0, 0], [3, 4]] for row 0
- [[0-3, 0-4], [3-3, 4-4]] = [[-3, -4], [0, 0]] for row 1

Squaring and summing gives the squared distances, then taking the square root yields the final pairwise distances.

Actually, I need to reconsider the broadcasting. With `points[:, None, :]` having shape (2, 1, 2) and `points[None, :, :]` having shape (1, 2, 2), the subtraction produces (2, 2, 2). Summing over the last axis gives (2, 2), which is what I want.

Let me verify this works: for points [[0, 0], [3, 4]], the differences are [[0, 0], [3, 4]] and [[-3, -4], [0, 0]]. Squaring and summing gives [[0, 0], [9, 16]] and [[9, 16], [0, 0]], which after sqrt becomes [[0, 5], [5, 0]].

I should also consider using `np.einsum` for potentially better performance, though the broadcasting approach is clearer and more readable.

```python
import numpy as np

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))
```

This creates the difference matrix with shape (n, n, d), squares each element, sums across the feature dimension, and takes the square root to get the final distances.