Data Structures: the Heap

Let's talk about something that's been bugging me lately. I keep seeing developers reinventing the wheel, writing complex sorting algorithms or building inefficient priority systems, when there's this absolutely brilliant data structure just sitting there, waiting to solve their problems.


I'm talking about heaps.


Now, before you roll your eyes and think "great, another boring CS lecture," stick with me. Because once you really get heaps, they'll change how you think about solving problems. Promise.



The "Aha!" Moment That Started Everything


Picture this: You're building a task management system. Users can add tasks with different priorities, and you need to always process the highest priority task first. Your first instinct? Maybe sort the list every time someone adds a task?


Ouch. That's O(n log n) every single time.


Or maybe you keep the list sorted and use binary search for insertions? Better, but still O(n) for each insertion because you need to shift elements.


This is where heaps come to rescue you from performance hell.



What Makes Heaps So Damn Clever


Here's the beautiful thing about heaps: they're lazy in the best possible way. They don't keep everything perfectly sorted (who has time for that?), but they make one simple promise:

"I'll always keep the most important thing at the top."

That's it. That's the magic.

In a max heap, the biggest element is always at the root. In a min heap, the smallest element sits at the top, ready to be grabbed in constant time. No searching, no sorting, just instant access to what matters most.



The Heap Property (Or: How to Be Lazy Efficiently)


The rule is beautifully simple:

  • Max Heap: Every parent is at least as big as its kids

  • Min Heap: Every parent is no bigger than its kids

That's literally it. No complex balancing, no color-coding (looking at you, red-black trees), just one simple rule that gives us incredible power.



The Array Trick That Blew My Mind


When I first learned about heaps, I expected some complex tree structure with pointers everywhere. But here's the genius part: heaps are usually implemented as plain old arrays.

Look at this:


[Figure: the heap's graphical representation, tree vs. array]

The math is elegant:

  • Node at index i

  • Left child: 2i + 1

  • Right child: 2i + 2

  • Parent: (i - 1) // 2 (floor division)
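
To see these relationships in action, here's a quick sanity check you can run (the array values are arbitrary, chosen to form a valid max heap):

heap = [90, 64, 88, 12, 22, 25, 76]  # a valid max heap

i = 1  # look at the node holding 64
left, right, parent = 2 * i + 1, 2 * i + 2, (i - 1) // 2

print(heap[i])       # 64
print(heap[left])    # 12 (left child)
print(heap[right])   # 22 (right child)
print(heap[parent])  # 90 (parent: the root)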



The Operations That Make It All Work

Adding Elements: The "Bubble Up" Dance


When you add a new element:

  1. Stick it at the end (because we're lazy, remember?)

  2. Compare it with its parent

  3. If it violates our heap property, swap them

  4. Keep bubbling up until everything's cool


It's like that ambitious intern who starts at the bottom but keeps getting promoted until they find their proper level.


Time complexity: O(log n) - because you can only bubble up as many levels as the tree is tall.



The Build Heap Superpower


Here's something that blew my mind: you can turn ANY array into a heap in O(n) time. Not O(n log n) like you'd expect, but linear time.


The secret? Start from the bottom and work your way up, fixing small violations before they become big problems. It's like fixing a messy room by starting with the corners and working toward the center.
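
Here's a minimal sketch of that bottom-up build for a max heap (the function and helper names are mine, not from any library):

def build_max_heap(arr):
    n = len(arr)
    # Leaves are already valid one-element heaps, so start at the
    # last internal node and sink each node into place
    for i in range(n // 2 - 1, -1, -1):
        _sink(arr, i, n)
    return arr

def _sink(arr, i, n):
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return  # Heap property holds here, stop sinking
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

print(build_max_heap([12, 5, 64, 90, 22]))  # [90, 22, 64, 5, 12]

The trick behind the O(n) bound: half the nodes are leaves and sink zero levels, a quarter sink at most one level, and so on. Summed across all levels, that work converges to linear time.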



Where Heaps Are Secretly Running Your Life


1. Your Operating System's Task Scheduler

Every time you're multitasking, your OS is juggling priority queues (frequently heap-backed) to decide which process gets CPU time next. High priority? Bubble up. Low priority? Sink down.


2. Dijkstra's Algorithm (AKA How Google Maps Works)

Finding the shortest path between two points? Dijkstra's algorithm uses a min heap to always pick the next closest unvisited node. Boom - optimal routes.
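
Here's a compact sketch of that idea with Python's heapq (the graph, function name, and details are illustrative assumptions, not Google's actual code):

import heapq

def dijkstra(graph, start):
    dist = {start: 0}
    pq = [(0, start)]  # min heap of (distance, node) pairs
    while pq:
        d, node = heapq.heappop(pq)  # always the closest frontier node
        if d > dist.get(node, float("inf")):
            continue  # stale entry, a shorter path was already found
        for neighbor, weight in graph[node]:
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(pq, (new_d, neighbor))
    return dist

graph = {"A": [("B", 4), ("C", 1)],
         "C": [("B", 2)],
         "B": []}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1}

Popping a stale entry and skipping it (the "lazy deletion" pattern) is the standard workaround for heapq having no decrease-key operation.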


3. Heap Sort: The Underrated Sorting Champion

Heap sort doesn't get enough love, but it's consistently O(n log n) with O(1) space complexity. No worst-case quadratic behavior like quicksort, no extra space like mergesort.
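
To make that concrete, here's a sketch of heap sort reusing the _sink helper from the build-heap sketch above (again, the names are mine):

def heapsort(arr):
    n = len(arr)
    # Phase 1: build a max heap in place, O(n)
    for i in range(n // 2 - 1, -1, -1):
        _sink(arr, i, n)
    # Phase 2: move the max to the end, shrink the heap
    # boundary, and repair the root, O(log n) per element
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        _sink(arr, 0, end)
    return arr

print(heapsort([64, 34, 25, 12, 22, 11, 90]))
# [11, 12, 22, 25, 34, 64, 90]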


4. The K-Largest Elements Problem (Interview Gold)

This is probably where you'll first encounter heaps in the wild. The problem: given an array and a number K, find the K largest elements.


Naive approach: Sort everything → O(n log n)

Heap approach: Use a min heap of size K → O(n log k)


The difference matters most when K is small. For K=10 and n=1,000,000, you go from roughly 20 million operations to about 3 million. That's a ~6x speedup, and the gap keeps widening as n grows relative to K.
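
A minimal sketch of the min-heap-of-size-K approach using heapq (the function name is mine):

import heapq

def k_largest(nums, k):
    heap = []  # min heap holding the K largest seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:  # beats the smallest of our current top K
            heapq.heapreplace(heap, x)  # pop min + push in one step
    return sorted(heap, reverse=True)

print(k_largest([64, 34, 25, 12, 22, 11, 90, 88, 76, 50], 3))
# [90, 88, 76]

heapq.heapreplace pops the minimum and pushes the new value in a single O(log k) operation, which is exactly where the O(n log k) total comes from.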


5. Priority Queues Everywhere

  • Hospital emergency rooms (triage; see the sketch after this list)

  • Network packet routing

  • Job scheduling in distributed systems
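
The triage case maps directly onto heapq's standard (priority, item) tuple pattern. A toy sketch (the cases are invented):

import heapq

# Lower number means more urgent
er_queue = []
heapq.heappush(er_queue, (3, "sprained ankle"))
heapq.heappush(er_queue, (1, "chest pain"))
heapq.heappush(er_queue, (2, "broken arm"))

while er_queue:
    priority, patient = heapq.heappop(er_queue)
    print(f"Treating (priority {priority}): {patient}")
# Treating (priority 1): chest pain
# Treating (priority 2): broken arm
# Treating (priority 3): sprained ankle

One gotcha with this pattern: if two entries share a priority, Python falls back to comparing the items themselves, so real implementations usually add an insertion counter as a tiebreaker.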



Heapify Together


Let's try writing the code step by step.

We'll build a class named MaxHeap. Every class needs a constructor, so that's where we'll start.


The Constructor: Starting Simple

def __init__(self):
    self.heap = []  # A plain list is all the storage we need

Nothing fancy here. Just an empty list to store our heap elements.


Peek: The Easy Win

def peek(self):
    return self.heap[0] if self.heap else None

This is why heaps are beautiful. Want the maximum? It's always at index 0. No searching, no comparing, just direct access. The ternary operator handles the edge case of an empty heap gracefully.


Push: Where the Magic Begins

def push(self, value):
    self.heap.append(value)
    self._bubble_up(len(self.heap) - 1)

The strategy is deceptively simple:

  1. Be lazy first: Just stick the new element at the end of the array

  2. Fix violations: Let _bubble_up handle the heap property

This is the beauty of heaps - you can break the rules temporarily as long as you fix them quickly.


The Bubble Up Algorithm: Climbing the Ladder

def _bubble_up(self, index):
    if index == 0:
        return  # We've reached the root, we're done

    parent_index = (index - 1) // 2  # The array math magic
    if self.heap[index] > self.heap[parent_index]:
        # Violation! Swap with parent
        self.heap[index], self.heap[parent_index] = \
            self.heap[parent_index], self.heap[index]
        self._bubble_up(parent_index)  # Keep going up

Here's what's happening step by step:

  1. Base case: If we're at the root (index 0), we're done

  2. Find parent: Use the mathematical relationship (index - 1) // 2

  3. Check violation: If current element is bigger than parent, we have a problem

  4. Fix and recurse: Swap them and keep bubbling up


The recursion naturally stops when either we reach the root or find a parent that's properly larger.
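
A quick design note: since the tree is only log n levels tall, the recursion depth here is tiny and perfectly safe. But if you prefer loops, the same logic works iteratively as a drop-in replacement:

def _bubble_up(self, index):
    while index > 0:
        parent_index = (index - 1) // 2
        if self.heap[index] <= self.heap[parent_index]:
            break  # Parent is big enough, heap property holds
        self.heap[index], self.heap[parent_index] = \
            self.heap[parent_index], self.heap[index]
        index = parent_index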


Pop: The Tricky Part

def pop(self):
    if not self.heap:
        return None
    
    # The clever part: swap first and last
    self.heap[0], self.heap[-1] = self.heap[-1], self.heap[0]
    max_val = self.heap.pop()  # Remove the last element (original root)
    
    # Now fix the mess we just created
    if self.heap:
        self._sink_down(0)
    
    return max_val

This is where most people get confused. Why the swap? Here's the thinking:


  1. We want the root (maximum element), but removing it leaves a hole

  2. Moving elements is expensive in the middle of an array

  3. The last element is cheap to remove (no shifting required)

  4. So we swap them! Root goes to the end (easy to remove), last element goes to root (needs fixing)


Sink Down: Restoring Order

def _sink_down(self, index):
    left_child = 2 * index + 1
    right_child = 2 * index + 2
    largest = index  # Assume current is largest
    
    # Check if left child is larger
    if (left_child < len(self.heap) and 
        self.heap[left_child] > self.heap[largest]):
        largest = left_child
    
    # Check if right child is larger
    if (right_child < len(self.heap) and 
        self.heap[right_child] > self.heap[largest]):
        largest = right_child
    
    # If we found a larger child, swap and continue
    if largest != index:
        self.heap[index], self.heap[largest] = \
            self.heap[largest], self.heap[index]
        self._sink_down(largest)

This is the most complex method, but the logic is straightforward:

  1. Find the children: Use 2*index + 1 and 2*index + 2

  2. Find the largest: Compare current element with both children

  3. If violation exists: Swap with the larger child (maintains heap property)

  4. Recurse: Keep sinking down until no more violations

The key insight: we always swap with the larger child to ensure we don't create new violations elsewhere.


The Performance Reality Check


Here's how the operations stack up:

  • Peek (get max/min): O(1)

  • Push (insert): O(log n)

  • Pop (remove top): O(log n)

  • Build heap from an array: O(n)

Compare this to keeping a sorted array (O(n) insertions) or searching an unsorted array (O(n) to find max), and you'll see why heaps are such game-changers.
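
If you'd rather measure than take my word for it, here's a quick, unscientific benchmark sketch comparing heap insertion with keeping a list sorted (the sizes and repeat counts are arbitrary; absolute numbers will vary by machine):

import bisect
import heapq
import random
import timeit

data = [random.random() for _ in range(10_000)]

def with_heap():
    h = []
    for x in data:
        heapq.heappush(h, x)   # O(log n) per insert

def with_sorted_list():
    s = []
    for x in data:
        bisect.insort(s, x)    # O(n) per insert: elements shift

print("heap:       ", timeit.timeit(with_heap, number=10))
print("sorted list:", timeit.timeit(with_sorted_list, number=10))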



The Python built-in: heapq


import heapq

# Python's heapq only does MIN heaps by default
numbers = [64, 34, 25, 12, 22, 11, 90]
heapq.heapify(numbers)  # Convert list to heap in-place
print(f"Min heap: {numbers}")  # [11, 12, 25, 34, 22, 64, 90]

# Getting the minimum is easy
print(f"Minimum: {numbers[0]}")  # 11

# Push and pop operations
heapq.heappush(numbers, 5)
print(f"After pushing 5: {numbers}")

smallest = heapq.heappop(numbers)
print(f"Popped: {smallest}, heap now: {numbers}")

# For K largest elements 
original = [64, 34, 25, 12, 22, 11, 90, 88, 76, 50]
k_largest = heapq.nlargest(3, original)
print(f"3 largest from {original}: {k_largest}")  # [90, 88, 76]

# The MAX heap workaround (negate values)
class MaxHeapq:
    def __init__(self):
        self.heap = []
    
    def push(self, val):
        heapq.heappush(self.heap, -val)  # Negate for max behavior
    
    def pop(self):
        return -heapq.heappop(self.heap)  # Negate back
    
    def peek(self):
        return -self.heap[0] if self.heap else None

# Test our max heap wrapper
max_heap = MaxHeapq()
for val in [10, 30, 20, 50, 40]:
    max_heap.push(val)

print(f"Max element: {max_heap.peek()}")  # 50
print(f"Popped max: {max_heap.pop()}")    # 50
print(f"New max: {max_heap.peek()}")      # 40

Key heapq Gotchas to Remember


  1. It's always a MIN heap - no built-in max heap option

  2. heapify() works in-place - it modifies your original list

  3. The heap property only applies to parent-child relationships - don't expect the array to be fully sorted

  4. For max heap behavior, negate your values (multiply by -1)

The heapq module is production-ready, well-tested, and optimized in C. Unless you're in a learning environment or need very specific behavior, this is what you should reach for.



The Traps


1. Heaps Aren't for General Searching

If you need to find arbitrary elements, heaps are terrible. They only guarantee the top element, nothing else.


2. The Heap Memory Confusion

The "heap" in "heap memory" has nothing to do with heap data structures. I spent an embarrassing amount of time confused about this early in my career.


3. Language-Specific Gotchas

Python's heapq only provides min heaps. Want a max heap? You'll need to negate your values or write a wrapper.



Your Next Steps


Here's my challenge for you: find one place in your current codebase where you're doing something inefficiently that a heap could solve. Maybe you're:


  • Repeatedly finding the max/min in a collection

  • Implementing a priority queue with a sorted array

  • Writing a custom sorting algorithm for partially ordered data


Try replacing it with a heap. I bet you'll be surprised by both the performance improvement and the code simplicity.



The Bottom Line


Heaps are one of those data structures that seem mysterious until they click, and then you see them everywhere. They're not the fanciest tool in the CS toolkit, but they're reliable, efficient, and solve real problems elegantly.


In a world of overengineered solutions and premature optimization, heaps are refreshingly straightforward: one simple property, a few basic operations, and the power to make your code significantly faster.


So next time you're facing a priority-based problem, remember the humble heap. It's been quietly powering the systems you use every day, and it's ready to power yours too.



What's your favorite heap use case? Hit reply and let me know - I love hearing about creative applications of fundamental data structures!
