Heap (Data Structure)

From Extremely Corporate Wiki
Jump to navigation Jump to search

A heap data structure is a binary tree which is useful to the heap-sort algorithm as well as as a priority queue.

Representation

The heap is stored as a contiguous array, but operations upon it are defined with respect to its interpretation as a tree. A heap can be thought of as a binary tree completely filled on all levels except possible the lowest. A heap has a "length" (the size of the underlying array) and a "heap size" (the upper bound on the indices of the array which are valid elements of the heap).

Like any binary tree, a single node in a heap has at most 3 nodes directly connected to it: a parent, a left child, and a right child.

Given an index [math]\displaystyle{ i }[/math] representing the location of a node in the underlying array, the indices of the directly connected nodes are given by the following operations:

  • Parent: [math]\displaystyle{ \left\lfloor\frac{i}{2}\right\rfloor }[/math]
  • Left Child: [math]\displaystyle{ 2i }[/math]
  • Right Child: [math]\displaystyle{ 2i + 1 }[/math]

The one exception is for the root node which always has index 0 and which has no parent.

The case of a leaf node or a node with only one child is captured by the "heap size," since the index of the non-existent child will be [math]\displaystyle{ \geq }[/math] the "heap size." This property is guaranteed by the invariant that if a node has just one child, it must be the left child.

There are two types of heaps: min-heaps and max-heaps. Min-heaps satisfy that every node is less than or equal to its children. Max-heaps satisfy that every node is greater than or equal to its children. Therefore, the root node is the minimum element in a min-heap and the maximum element in a max-heap.

The "height" of a node in the heap is the length of the longest downward path from that node to a leaf. The height of the whole heap is the height of the root node.

The leaves of an n-element heap are stored at indices from [math]\displaystyle{ \left\lfloor\frac{n}{2}\right\rfloor }[/math] to [math]\displaystyle{ n }[/math].

Usage

The max-heap is generally used to implement heap-sort, while the min-heap is generally used to implement priority queues.

Maintaining the Heap Property

Max Heapify

This algorithm takes 2 parameters: an array storing a max-heap, and an index i of a node in that array. It assumes both sub-trees of node i are valid max-heaps, but node i itself might be less than one of its sub nodes. The max-heapify algorithm "floats" the value of node i down into its correct place.

Keep track of an index called largest which is initially set to i.

If node i has a left child l and that child is greater than node i, set largest to l.

If node i has a right child r and that child is greater than node largest, set <largest> to r.

If largest is not i, swap the value of node i with node largest, then call the max heapify algorithm recursively on the same array, but starting at index largest this time.

This algorithm has a worst-case running time that is [math]\displaystyle{ \mathcal{O}(\text{log}_2 n) }[/math] where [math]\displaystyle{ n }[/math] is the height of the heap.

This algorithm can be modified to maintain the min-heap property by changing the comparison from checking if the child is greater than the parent to checking if the child is less than the parent.

Building a Heap

We can use the max heapify algorithm to create a max heap from an arbitrary array:

For values of i from [math]\displaystyle{ \left\lfloor\frac{length}{2}\right\rfloor }[/math] to 1, call the max heapify algorithm on node i.

This algorithm takes [math]\displaystyle{ \mathcal{O}(n) }[/math] time where [math]\displaystyle{ n }[/math] is the size of the heap.

Heap Sort

Heap sort is an unstable (the relative order of elements is not preserved) in-place (it uses [math]\displaystyle{ \mathcal{O}(1) }[/math] additional memory) sorting algorithm.

Heap sort starts by converting the given array into a max-heap using max heapify.

Then, we iterate index i from the size of the array down to 2.

For each iteration, we exchange the root node (maximum element) with the ith element of the array, i.e. we move it to the end.

Next, we max-heapify the first i elements, so the next largest element is at the first index.

The end result is a sorted list.

The algorithm takes [math]\displaystyle{ \mathcal{O}(n \text{log}_2 n) }[/math] time where [math]\displaystyle{ n }[/math] is the size of the input array.

Priority Queues

A priority queue is an abstract data type based on a heap. It is a set of elements which are a tuple of a value and a priority.

The maximum/minimum (depending on whether or not the priority queue is implemented with a max-heap or a min-heap) priority element is obtained by getting the root node.

Heap Extract Max

However, to remove the root instead of just accessing it is more involved:

Set the root node to the value of the node stored at the last valid array index. Then, decrement the heap size by 1. Finally, run max-heapify on the array starting at the root node (i.e. max-heapify the whole array/tree).

This algorithm takes [math]\displaystyle{ \mathcal{O}(\text{log}_2(n)) }[/math] time where [math]\displaystyle{ n }[/math] is the size of the heap.

Heap Increase Key

In a max heap, we can increase the priority of a node:

Given an index i, set the priority of node i to its new priority, provided that priority is higher.

Then, "bubble" the node up the tree: while the index i hasn't reached the first index and while the node at i is less than its parent, exchange the node with its parent, then set i to the parent node index ([math]\displaystyle{ \lfloor\frac{i}{2}\rfloor }[/math]).

This takes [math]\displaystyle{ \mathcal{O}(\text{log}_2 n) }[/math] time where [math]\displaystyle{ n }[/math] is the size of the heap.

Max Heap Insert

Finally, this is how we add new values to the heap:

Increment the heap size by 1, set the value of the new element at the end of the array to the desired value, set the priority of the new element to the minimum possible value, then run heap increase key on the last index with its desired priority.

Since this is just doing some constant work, then using the heap increase key algorithm, this also takes [math]\displaystyle{ \mathcal{O}(\text{log}_2 n) }[/math] time where [math]\displaystyle{ n }[/math] is the size of the heap.