← run

algo-greedy-01

1.000
16/16 tests· algorithms
Challenge · difficulty 4/5
# Freelance gig scheduling for maximum payout

You are a freelancer planning your calendar. There is a queue of one-day gigs
on offer. Each gig `i` has:

- a **deadline** `d_i` (a positive integer) — the gig is only worth anything if
  you complete it on or before day `d_i`, and
- a **payout** `p_i` (a non-negative integer) — what you earn if you complete it
  in time.

Days are numbered `1, 2, 3, ...`. Every gig takes **exactly one full day**, and
you can work on **at most one gig per day**. You may accept any subset of the
gigs and choose which day to do each accepted gig, as long as no two accepted
gigs share a day and each accepted gig is done on some day `<= its deadline`.
Gigs you don't accept earn nothing.

Implement a file **`solution.py`** containing:

```python
def max_payout(gigs: list[tuple[int, int]]) -> int:
    """Return the maximum total payout achievable.

    `gigs` is a list of (deadline, payout) pairs. Return an int.
    """
```

## What to return

Return the **maximum total payout** you can earn with a valid schedule. You do
**not** need to return the schedule itself — only the best achievable total.

## Why this is subtle

A tempting greedy — "sort gigs by payout, and place each accepted gig on the
earliest free day within its deadline" — is **wrong**. Consider two gigs:

- gig A: deadline 2, payout 100
- gig B: deadline 1, payout 50

The correct answer is **150**: do A on day 2 and B on day 1. But if you place
the higher-payout gig A on the *earliest* free day (day 1), you occupy the only
day B could ever use, and you're stuck with just 100. The fix is an
exchange-argument insight: when you accept a gig, reserve it as **late** as its
deadline allows, keeping earlier days open for gigs with tighter deadlines.
Likewise, sorting purely by deadline (or accepting gigs first-come) can force
you to keep a cheap gig over a strictly better one competing for the same day.

## Constraints

- `0 <= len(gigs)`; large instances (up to `2 * 10^5` gigs) are tested, so an
  `O(n^2)` day-by-day scan will be too slow — aim for roughly `O(n log n)`.
- `1 <= d_i` and `0 <= p_i`, each up to about `10^9`. Deadlines may be far
  larger than the number of gigs.
- The empty gig list returns `0`.

## Examples

```python
assert max_payout([]) == 0
assert max_payout([(1, 42)]) == 42
assert max_payout([(2, 100), (1, 50)]) == 150          # place the rich gig late
assert max_payout([(1, 1), (1, 100)]) == 100           # one day-1 slot, keep 100
assert max_payout([(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]) == 142
assert max_payout([(1000, 5), (1000, 6), (1000, 7)]) == 18  # all fit
```
tests/test_gig_scheduler.py
import heapq
import itertools
import random

from solution import max_payout


# ----------------------------------------------------------------------------
# Independent oracles used to check the solution on random / large inputs.
# ----------------------------------------------------------------------------

def heap_oracle(gigs):
    """Classic min-heap greedy for job sequencing with deadlines.

    Process gigs in increasing deadline order, keep a min-heap of accepted
    payouts, and whenever more gigs are accepted than the current deadline
    allows, drop the smallest-payout accepted gig. The heap's sum is optimal.
    This is a different implementation from the reference (which uses a
    disjoint-set over day-slots), so agreement is a strong correctness signal.
    """
    heap = []
    for d, p in sorted(gigs, key=lambda x: x[0]):
        heapq.heappush(heap, p)
        if len(heap) > d:
            heapq.heappop(heap)
    return sum(heap)


def brute_force(gigs):
    """Exhaustive optimum for tiny inputs.

    Try every subset of gigs; a subset is schedulable iff we can match each
    gig to a distinct day <= its deadline (checked by the standard earliest
    -deadline-first feasibility greedy). Only feasible for very small n.
    """
    n = len(gigs)
    best = 0
    for r in range(n + 1):
        for subset in itertools.combinations(range(n), r):
            chosen = sorted(subset, key=lambda i: gigs[i][0])
            day = 0
            ok = True
            for i in chosen:
                day += 1
                if day > gigs[i][0]:
                    ok = False
                    break
            if ok:
                best = max(best, sum(gigs[i][1] for i in subset))
    return best


# ----------------------------------------------------------------------------
# Hand-verified small cases.
# ----------------------------------------------------------------------------

def test_empty():
    assert max_payout([]) == 0


def test_single_gig_within_deadline():
    assert max_payout([(1, 42)]) == 42
    assert max_payout([(5, 7)]) == 7


def test_two_gigs_no_conflict():
    assert max_payout([(1, 10), (2, 20)]) == 30


def test_latest_slot_subtlety():
    # The exchange-argument crux: the high-payout gig has the LATER deadline.
    # Placing it at its latest feasible day (day 2) leaves day 1 free for the
    # tight-deadline gig, yielding 150. A greedy that places the high-payout
    # gig at the EARLIEST free day (day 1) blocks the other gig -> only 100.
    assert max_payout([(2, 100), (1, 50)]) == 150


def test_must_prefer_higher_payout_on_tie_deadline():
    assert max_payout([(1, 1), (1, 100)]) == 100
    assert max_payout([(1, 100), (1, 1)]) == 100


def test_classic_five_gig_instance():
    gigs = [(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]
    assert max_payout(gigs) == 142


def test_four_gig_instance():
    assert max_payout([(1, 10), (2, 10), (2, 15), (1, 20)]) == 35


def test_all_same_deadline_one():
    assert max_payout([(1, 3), (1, 9), (1, 4), (1, 2)]) == 9


def test_deadlines_far_beyond_count():
    assert max_payout([(1000, 5), (1000, 6), (1000, 7)]) == 18


def test_greedy_by_deadline_alone_is_wrong():
    gigs = [(1, 5), (2, 6), (2, 7), (3, 4), (1, 20), (3, 3)]
    assert max_payout(gigs) == brute_force(gigs)


def test_zero_payout_gigs_are_never_harmful():
    assert max_payout([(1, 0), (1, 0)]) == 0
    assert max_payout([(2, 0), (1, 5)]) == 5


# ----------------------------------------------------------------------------
# Randomized cross-checks against brute force (small) and heap oracle.
# ----------------------------------------------------------------------------

def test_random_small_vs_bruteforce():
    rng = random.Random(1234)
    for _ in range(400):
        n = rng.randint(0, 7)
        gigs = [(rng.randint(1, 6), rng.randint(0, 50)) for _ in range(n)]
        assert max_payout(gigs) == brute_force(gigs)


def test_random_medium_vs_heap_oracle():
    rng = random.Random(99)
    for _ in range(60):
        n = rng.randint(1, 500)
        gigs = [(rng.randint(1, n + 5), rng.randint(1, 10**6)) for _ in range(n)]
        assert max_payout(gigs) == heap_oracle(gigs)


def test_random_tight_deadlines_vs_heap_oracle():
    rng = random.Random(7)
    for _ in range(60):
        n = rng.randint(1, 400)
        maxd = rng.randint(1, 5)
        gigs = [(rng.randint(1, maxd), rng.randint(1, 1000)) for _ in range(n)]
        assert max_payout(gigs) == heap_oracle(gigs)


def test_large_input_performance_and_correctness():
    # Big instance: an O(n^2) slot scan would blow the time budget, and a
    # naive greedy would give a wrong total. Must match the heap oracle.
    rng = random.Random(2024)
    n = 200_000
    gigs = [(rng.randint(1, 10**9), rng.randint(1, 10**9)) for _ in range(n)]
    assert max_payout(gigs) == heap_oracle(gigs)


def test_large_tight_deadlines():
    rng = random.Random(555)
    n = 200_000
    gigs = [(rng.randint(1, 50), rng.randint(1, 10**9)) for _ in range(n)]
    assert max_payout(gigs) == heap_oracle(gigs)
System prompt
You are an expert programmer. Solve the task exactly as specified. Output your solution as fenced code blocks using the required file name(s) and the exact function/type signatures requested. Prefer correctness; do not include prose outside code unless asked.
Proposed solution
```python
import heapq

def max_payout(gigs: list[tuple[int, int]]) -> int:
    """Return the maximum total payout achievable.

    `gigs` is a list of (deadline, payout) pairs. Return an int.
    """
    if not gigs:
        return 0
    
    # Sort gigs by deadline (ascending)
    gigs.sort(key=lambda x: x[0])
    
    # Min-heap to track payouts of accepted gigs
    heap = []
    
    for deadline, payout in gigs:
        heapq.heappush(heap, payout)
        # If we have more accepted gigs than the current deadline allows,
        # remove the smallest payout (can't do more than 'deadline' gigs by day 'deadline')
        if len(heap) > deadline:
            heapq.heappop(heap)
    
    return sum(heap)
```

Test output (stdout)

................                                                         [100%]
16 passed in 0.30s
Reasoning (chain-of-thought)
The problem is to find the maximum total payout from a set of gigs, where each gig has a deadline and a payout. Each gig takes exactly one day, and we can do at most one gig per day. A gig must be completed on or before its deadline.

This is a classic scheduling problem that can be solved using a greedy approach with a priority queue.

The key insight is:
1. Sort gigs by deadline (ascending)
2. Process gigs in order of deadline
3. For each gig, try to schedule it as late as possible within its deadline
4. Use a min-heap to track the payouts of accepted gigs

Actually, there's a better approach:
1. Sort gigs by deadline (ascending)
2. Use a min-heap to keep track of the payouts of gigs we've accepted
3. For each gig in deadline order:
   - Add the gig's payout to the heap
   - If the number of accepted gigs exceeds the current deadline, remove the smallest payout from the heap (since we can only do at most `deadline` gigs by that deadline)

Wait, let me think more carefully.

The constraint is that by day `d`, we can complete at most `d` gigs. So if we have accepted `k` gigs with deadlines all ≤ `d`, we need `k ≤ d`.

The algorithm:
1. Sort gigs by deadline (ascending)
2. Use a min-heap to track payouts of accepted gigs
3. For each gig (deadline, payout) in sorted order:
   - Add payout to heap
   - If heap size > deadline, remove the smallest payout from heap
4. Sum all payouts in the heap

Let me trace through the examples:

Example: `[(2, 100), (1, 50)]`
- Sort by deadline: `[(1, 50), (2, 100)]`
- Process (1, 50): heap = [50], size = 1 ≤ 1, OK


- Process (2, 100): heap = [50, 100], size = 2 ≤ 2, OK
- Total = 150 ✓

Example: `[(1, 1), (1, 100)]`
- Sort by deadline: `[(1, 1), (1, 100)]` (or either order since both have deadline 1)
- Process (1, 1): heap = [1], size = 1 ≤ 1, OK
- Process (1, 100): heap = [1, 100], size = 2 > 1, remove min (1), heap = [100]
- Total = 100 ✓

Example: `[(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]`
- Sort by deadline: `[(1, 19), (1, 25), (2, 100), (2, 27), (3, 15)]`
- Process (1, 19): heap = [19], size = 1 ≤ 1, OK
- Process (1, 25): heap = [19, 25], size = 2 > 1, remove 19, heap = [25]
- Process (2, 100): heap = [25, 100], size = 2 ≤ 2, OK
- Process (2, 27): heap = [25, 27, 100], size = 3 > 2, remove 25, heap = [27, 100]
- Process (3, 15): heap = [15, 27, 100], size = 3 ≤ 3, OK
- Total = 142 ✓

This approach works because we're always keeping the highest-value gigs that fit within the deadline constraints. By removing the smallest payout when we exceed the deadline limit, we ensure we're maximizing total earnings.

I'll implement this with a min-heap to efficiently track and remove the lowest-value gigs when needed.

```python
import heapq

def max_payout(gigs: list[tuple[int, int]]) -> int:
    """Return the maximum total payout achievable.

    `gigs` is a list of (deadline, payout) pairs. Return an int.
    """
    if not gigs:
        return 0
    
    # Sort by deadline (ascending)
    gigs.sort(key=lambda x: x[0])
    
    # Min-heap to track payouts of accepted gigs
    heap = []
    
    for deadline, payout in gigs:
        heapq.heappush(heap, payout)
        # If we have more gigs than the current deadline allows,
        # remove the smallest payout
        if len(heap) > deadline:
            heapq.heappop(heap)
    
    return sum(heap)
```

The key insight is that by processing gigs in deadline order and using a min-heap, we ensure we're always keeping the highest-value gigs that fit within the deadline constraints.

I'll verify the algorithm handles edge cases correctly. The empty list returns 0, and single gigs are processed straightforwardly. The heap approach efficiently manages which gigs to keep when deadlines are tight.

The time complexity is O(n log n) due to sorting and heap operations, which meets the performance requirements for large input sizes.