Heaps

Heaps

The binary heap data structure is an array object that we can view as a nearly complete binary tree. Each node of the tree corresponds to an element of the array. The tree is completely filled on all levels except possibly the lowes, which is filled from the left up to a point that is the elements in the subarray \(A[\lfloor \frac{n}{2} \rfloor + 1 ... n]\) are all leaves.

An array \(A\) that represents a heap is an object with two attributes:

  1. \(A.length\): gives the number of elements in the array. \(A[1:A.length]\).
  2. \(A.heap\_size\): gives how many elements in the heap are stored within array \(A\). \(A[1:A.heap\_size:A.length]\)

Given the index \(i\) of a node, we can easily compute the indices of its parent, left child and right child by:

  1. parent: \(\lfloor \frac{i}{2} \rfloor\) (by shifting right 1 bit i >> 1)
  2. left child: \(2i\) (by shifting left 1 bit i << 1)
  3. right child: \(2i + 1\) (by shifting left 1 bit and add 1 i << 1 + 1)

There are two types of binary heap:

  1. Max heap: satisfies the max heap property that for every node \(i\) other than the root \(A[parent(i)] \geq A[i]\), that is the maximum value in the array is stored in the root and the subtree rooted at a node contains values no larger than that contained at the node itself (heap sorts).
  2. Min heap: satisfies the min heap property that is organized in the opposite way, for every node \(i\) other than the root \(A[parent(i) \leq A[i]]\) (priority queues)


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Heap:
def __init__(self, A, heap_size):
# A may not be a heap, call build_min_heap or build_max_heap to convert this instance to heap
assert isinstance(A, list)
self.heap_size = heap_size
self.length = len(A)
self._A = A

def left(self, i):
return i << 1

def right(self, i):
return i << 1 + 1

def parent(self, i):
return i >> 1

def __getitem__(self, index):
return self._A[index]

def append(self, val):
self._A.append(val)

Maintaining the Heap Property

max_heapify(A, i), when it is called, this function assumes that the binary trees rooted at \(left(i)\) and \(right(i)\) are max heaps but that \(A[i]\) might be smaller than its children:

  1. At each step, the largest of the elements \(A[i], left(i), right(i)\) is determined, and its index is stored in largest.
  2. If \(A[i]\) is largest, then the subtree rooted at node \(i\) is already a max-heap, we terminate the function.
  3. Otherwise, we swap \(A[i]\) with \(A[\arg\max(A[left(i)], A[right(i)])]\)
  4. Next, repeat on node \(A[\arg\max(A[left(i)], A[right(i)])]\) until the function terminates.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def max_heapify(A, i):
# get left and right index
l = A.left(i)
r = A.right(i)
# if left > A[i], set current largest index to l
if l <= A.heap_size and A[l] > A[i]:
largest = l
else:
largest = i

# if right > A[i], set current largest index to r
if r <= A.heap_size and A[r] > A[largest]:
largest = r

# if there largest is A[i], we have base case therefore we dump out the loop, else we repeat on largest
if largest != i:
A[i], A[largest] = A[largest], A[i]
max_heapify(A, largest)

min_heapify can be implemented in similar but opposite way.

Building a Heap

We can use the procedure max_heapify in a bottom-up manner to convert an array \(A[1 ... A.heap\_size]\) into a max heap. Recall that the elements in the subarray \(A[\lfloor \frac{n}{2} \rfloor + 1 ... n]\) are all leaves of the trees, and so each is a 1-element heap to begin with. The function build_max_heap(A) goes through the remaining nodes of the tree and runs max_heapify on each one.

1
2
3
4
5
6
def build_max_heap(A):
n = A.heap_size
# start at first non-leave node which is floor(n / 2)
start = n // 2
for i in range(start - 1, -1, -1):
max_heapify(A, i)

A min heap can be build in similar way by replace max_heapify with min_heapify.

Heap Sort

Recall that, a root of the max heap is the maximum item in the array \(A[1 ... A.heap\_size]\). if we do a build_max_heap on entire \(A\), swap \(A[1]\) with \(A[n]\) and then do a max_heapify(A[1 ... n - 1], i), we end up with a sorted array.

1
2
3
4
5
6
def heap_sort(A):
build_max_heap(A)
while A.heap_size >= 2:
A[0], A[heap_size - 1] = A[heap_size - 1], A[0]
A.heap_size = A.heap_size - 1
max_heapify(A, 0)

Priority Queues

A priority queue is a data structure for maintaining a set \(S\) of elements, each with an associated value called a \(key\). A max priority queue is based on max heap and a min prioprity queue is based on min heap.

The max priority queue (used in scheduler such as job scheduling) supports:

  1. insert(S, x): inserts the elements \(x\) into \(S\).
  2. maximum(S): returns the element of S with the largest key.
  3. extract_max(S): removes and returns the element of \(S\) with the largest key.
  4. increase_key(S, i, k): increases the value of index \(i\)'s key to the new value \(k\), which is assumed to be at least as large as \(x\)'s current key value.

The min priority queue (used in event-driven simulator) supports:

  1. insert(S, x): inserts the elements \(x\) into \(S\).
  2. minimum(S): returns the element of S with the smallest key.
  3. extract_min(S): removes and returns the element of \(S\) with the smallest key.
  4. decrease_key(S, i, k): decreases the value with index \(i\)'s key to the new value \(k\), which is assumed to be at most as large as \(x\)'s current key value.
1
2
3
4
5
6
7
8
9
10
11
12
def maximum(A):
return A[0]

def extract_max(A):
if A.heap_size < 1:
raise ValueError("Heap underflow")

max_val = maximum(A)
A[0], A[A.heap_size - 1] = A[A.heap_size - 1], A[0]
A.heap_size = A.heap_size - 1
max_heapify(A, 0)
return max_val
1
2
3
4
5
6
7
8
def increase_key(A, i, k):
n = A.heap_size
if i >= n or n < 1 or k < A[i]:
raise ValueError('invalid key or value')

A[i] = k
while i > 0 and A[i] > A[A.parent(i)]:
A[i], A[A.parent(i)], i = A[A.parent(i)], A[i], A.parent(i)
1
2
3
4
5
6
def insert(A, x):
# add a new leave, then increase key of the leave to x.
A.append(x)
A.heap_size = A.heap_size + 1
A.length = A.length + 1
increase_key(A, A.heap_size - 1, x)

Complexity

  1. max_heapify: \(O(\log (n))\)
  2. build_max_heap: \(O(n)\)
  3. heap_sort: \(O(n \log (n))\)
  4. maximum: \(O(1)\)
  5. extract_max: \(O(\log n)\)
  6. insert: \(O(\log n)\)
  7. increase_key: \(O(\log n)\)