py-06-numpy-distances
1.000
Challenge · difficulty 3/5
# Pairwise distances (numpy)
Implement **`solution.py`** with:
```python
import numpy as np
def pairwise_distances(points: np.ndarray) -> np.ndarray:
...
```
`points` is a 2-D array of shape `(n, d)` (n points in d dimensions). Return an `(n, n)`
array where entry `[i, j]` is the Euclidean distance between point `i` and point `j`.
- The result must be a NumPy array (`np.ndarray`), symmetric, with a zero diagonal.
- Use NumPy vectorization (no Python `for` loops over point pairs).
- Example: for `[[0, 0], [3, 4]]` the result is `[[0, 5], [5, 0]]`.
tests/test_distances.py
import numpy as np
from solution import pairwise_distances
def test_simple():
pts = np.array([[0.0, 0.0], [3.0, 4.0]])
d = pairwise_distances(pts)
assert isinstance(d, np.ndarray)
assert d.shape == (2, 2)
np.testing.assert_allclose(d, [[0.0, 5.0], [5.0, 0.0]], atol=1e-9)
def test_symmetric_zero_diagonal():
rng = np.random.default_rng(0)
pts = rng.standard_normal((6, 3))
d = pairwise_distances(pts)
assert d.shape == (6, 6)
np.testing.assert_allclose(d, d.T, atol=1e-9)
np.testing.assert_allclose(np.diag(d), np.zeros(6), atol=1e-9)
def test_matches_bruteforce():
rng = np.random.default_rng(42)
pts = rng.standard_normal((5, 4))
d = pairwise_distances(pts)
for i in range(5):
for j in range(5):
expected = np.sqrt(((pts[i] - pts[j]) ** 2).sum())
assert abs(d[i, j] - expected) < 1e-9
System prompt
You are an expert programmer. Solve the task exactly as specified. Output your solution as fenced code blocks using the required file name(s) and the exact function/type signatures requested. Prefer correctness; do not include prose outside code unless asked.
Proposed solution
```python
import numpy as np
def pairwise_distances(points: np.ndarray) -> np.ndarray:
"""
Compute pairwise Euclidean distances between all points.
Args:
points: 2-D array of shape (n, d) containing n points in d dimensions.
Returns:
2-D array of shape (n, n) where entry [i, j] is the Euclidean distance
between point i and point j.
"""
# Use broadcasting to compute differences between all pairs of points
# points[:, None, :] has shape (n, 1, d)
# points[None, :, :] has shape (1, n, d)
# Their difference has shape (n, n, d)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
# Compute squared Euclidean distances by summing over the feature dimension
squared_distances = np.sum(diff ** 2, axis=2)
# Take square root to get Euclidean distances
return np.sqrt(squared_distances)
```Test output (stdout)
... [100%] 3 passed in 0.07s
Reasoning (chain-of-thought)
The task is to implement a function that computes pairwise Euclidean distances between all points in a 2D array.
Given:
- Input: `points` of shape `(n, d)` - n points in d dimensions
- Output: `(n, n)` array where `[i, j]` is the Euclidean distance between point i and point j
The Euclidean distance between two points is:
√((x1-x2)² + (y1-y2)² + ...)
For vectorized computation without loops, I can use:
1. Compute the squared differences between all pairs
2. Sum them up
3. Take the square root
One efficient approach using NumPy broadcasting:
- For each point i, compute the squared distance to all other points
- This can be done by computing ||x_i - x_j||² for all i, j
A common vectorized approach:
```python
# Compute squared distances
# For each point, we want ||x_i - x_j||² = ||x_i||² + ||x_j||² - 2*x_i·x_j
```
Let me think about the broadcasting:
- `points` has shape `(n, d)`
- We want to compute distances between all pairs
One approach:
```python
# Expand dimensions for broadcasting
# points[:, None, :] has shape (n, 1, d)
# points[None, :, :] has shape (1, n, d)
# Subtract: points[:, None, :] - points[None, :, :] has shape (n, n, d)
# Square and sum over d, then sqrt
```
Let me verify with the example:
- points = [[0, 0], [3, 4]]
- shape = (2, 2)
- points[:, None, :] = [[0, 0], [3, 4]] with new axis = [[0, 0], [3, 4]] but with shape (2, 1, 2)
- points[None, :, :] = [[0, 0], [3, 4]] with new axis = [[0, 0], [3, 4]] but with shape (1, 2, 2)
When I subtract these, I get:
- [[0-0, 0-0], [3-0, 4-0]] = [[0, 0], [3, 4]] for row 0
- [[0-3, 0-4], [3-3, 4-4]] = [[-3, -4], [0, 0]] for row 1
Squaring and summing gives the squared distances, then taking the square root yields the final pairwise distances.
Actually, I need to reconsider the broadcasting. With `points[:, None, :]` having shape (2, 1, 2) and `points[None, :, :]` having shape (1, 2, 2), the subtraction produces (2, 2, 2). Summing over the last axis gives (2, 2), which is what I want.
Let me verify this works: for points [[0, 0], [3, 4]], the differences are [[0, 0], [3, 4]] and [[-3, -4], [0, 0]]. Squaring and summing gives [[0, 0], [9, 16]] and [[9, 16], [0, 0]], which after sqrt becomes [[0, 5], [5, 0]].
I should also consider using `np.einsum` for potentially better performance, though the broadcasting approach is clearer and more readable.
```python
import numpy as np
def pairwise_distances(points: np.ndarray) -> np.ndarray:
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
return np.sqrt(np.sum(diff**2, axis=2))
```
This creates the difference matrix with shape (n, n, d), squares each element, sums across the feature dimension, and takes the square root to get the final distances.