Big-O

This tutorial introduces the reader to Big-O, a mathematical notation that helps us describe the limiting behavior of an algorithm's timing or spacing function as the problem size tends towards large values or infinity.

WARNING: This tutorial assumes that students have either taken or exempted MATH 1113 (precalculus). This means that students should be familiar with inequalities, logarithms, and exponents. Any new math concepts or symbols will be presented in context with examples.
INFO: Some of the material presented in this tutorial will be covered in other Computer Science classes such as Discrete Mathematics and Data Structures. In those classes, mathematical rigor is required. However, in CSCI 1302, we ignore some of the rigor so that students can get the gist of the concepts in order to apply them in practical situations. While some of this will be repeated in later courses, expect that those courses will require more rigorous explanations or even formal proofs, especially when it comes to the math.

Motivation

Given multiple algorithms that solve the same problem, which one do you choose? For example, there are many famous algorithms that can be used to sort an array of directly comparable values. This means you have a choice. There are often important trade-offs between each choice that impact, in a very practical way, the use of one algorithm over another in any given application.

You're likely familiar with deriving a timing or spacing function for an algorithm. Given such a function, we can classify the algorithm into a complexity class based on that function. We can then say that algorithms whose functions are in the same complexity class behave similarly. Furthermore, if two algorithms are in different complexity classes, then one is clearly better than the other with respect to the unit used for the function (e.g., the set of key processing steps or the unit of space measurement).

Symbols and Notation

Here is a list of symbols and notation that we will use in this tutorial:

Symbols and Notation

| Symbol / Notation | Meaning |
| --- | --- |
| $\mathbb{N}_0$ | set of natural numbers, including $0$ |
| $\mathbb{R}^+$ | set of positive real numbers |
| $x \in S$ | $x$ is an element of the set $S$ (i.e., $x$ is in $S$) |
| $f : A \to B$ | $f$ is a function with domain set $A$ and codomain set $B$ (i.e., $f$ is a function from $A$ to $B$) |
| $a \to b$ | $a$ implies $b$ |
| $a \leftrightarrow b$ | $a$ if, and only if, $b$ (sometimes written "$a$ iff $b$") |
| ${\rm O}(\cdot)$ | Big-O of $\cdot$ |

Informal Definition of Big-O

Please bear with us through the next few sentences; we'll explain the math as we go. Big-O defines a set of functions with a similar upper bound. Here is an informal definition for membership in a particular Big-O set, assuming the functions we wish to classify are strictly positive for large input values:

DEFINITION (Big-O): Let $f : \mathbb{N}_0 \to \mathbb{R}^+$ and $g : \mathbb{N}_0 \to \mathbb{R}^+$ be functions where $g(n)$ is strictly positive for large enough values of $n$. Then, we can only say that $ f(n) \in {\rm O}(g(n)) $ if we can also show that $$ f(n) \leq c \cdot g(n) $$ for some positive constant $c$ and for all sufficiently large $n$. The notation $f(n) = {\rm O}(g(n))$ is also used to denote $f(n) \in {\rm O}(g(n))$. Finally, we call $g(n)$ the characteristic function for ${\rm O}(g(n))$.

A Big-O set characterizes a function according to its growth rate -- two functions in the same Big-O set grow similarly for sufficiently large values of $n$ (i.e., in their asymptotic limit).

If we're classifying a timing function, for example, the assumption of strictly positive functions makes sense as you cannot have a negative number of computing steps in practice. The same notion intuitively holds for spacing functions as well.

There is a more mathematically formal definition for Big-O. If you're interested in seeing it, then consult the appendix near the end of this tutorial. For this class, we'll assume that the informal definition is okay for most of our purposes.

Examples

Here are some examples of how to check and/or derive the Big-O sets of simple functions:

  1. Consider $T(n) = 2n^2 + 3n - 2$. In this case, $T(n) \in {\rm O}(n^2)$. If this is true, then it means that $T(n) \leq c \cdot n^2$ for some $c$ and for all sufficiently large $n$. To prove this, we'll start by writing out the equation for $T(n)$, as provided, then take steps to strategically increase the right-hand side (RHS) of the equation. Since we're making the RHS bigger each time, we change the $=$ to $\leq$. Each step is an implication and the direction matters (i.e., you cannot simply follow the steps in reverse). We're not going to increase the terms at random -- instead, we increase them so that we can eventually factor out an $n^2$ term. $$\begin{eqnarray} && T(n) &=& 2n^2 + 3n - 2 \\ &\to& T(n) &\leq& 2n^2 + 3n + 2 \\ &\to& T(n) &\leq& 2n^2 + 3n^2 + 2 \\ &\to& T(n) &\leq& 2n^2 + 3n^2 + 2n^2 \\ &\to& T(n) &\leq& (2 + 3 + 2)n^2 \\ &\to& T(n) &\leq& 7n^2 \end{eqnarray}$$ This gives us $T(n) \leq c \cdot n^2$ where $c = 7$ (a quick numerical check of this bound appears after these examples). In a Discrete Mathematics course, you would also need to prove that the inequality holds for sufficiently large $n$, perhaps using something like mathematical induction.

  2. Now consider $T(n) \leq 3n^2 + 39$. In this case, $T(n) \in {\rm O}(n^2)$ as well. This can be shown: $$\begin{eqnarray} && T(n) &\leq& 3n^2 + 39 \\ &\to& T(n) &\leq& 3n^2 + 39n^2 \\ &\to& T(n) &\leq& (3 + 39)n^2 \\ &\to& T(n) &\leq& 42n^2 \end{eqnarray}$$ This gives us $T(n) \leq c \cdot n^2$ where $c = 42$. At this point, we also see that ${\rm O}(n^2)$ is a set of functions, as the two functions $T(n) = 2n^2 + 3n - 2$ and $T(n) \leq 3n^2 + 39$ are both in ${\rm O}(n^2)$ (i.e., in "Big-O of $n^2$").

  3. Now consider $T(n) = n^3 + 2\log_4(n) + 6$. In this case, $T(n) \in {\rm O}(n^3)$. This can be shown (this time we'll simplify the notation by omitting the implied implication symbol): $$\begin{eqnarray} T(n) &=& n^3 + 2\log_4(n) + 6 \\ &\leq& n^3 + 2n^3 + 6 \\ &\leq& n^3 + 2n^3 + 6n^3 \\ &\leq& (1 + 2 + 6)n^3 \\ &\leq& 9n^3 \end{eqnarray}$$ This gives us $T(n) \leq c \cdot n^3$ where $c = 9$.

  4. Triangle Inequality: Now consider a $T(n)$ that is any polynomial (i.e., a sum where each term is a constant multiplied by a power of $n$). If you combine like terms and write it in standard form (i.e., with decreasing powers of $n$), then you have something like this: $$ T(n) = c_{k}n^{k} + c_{k-1}n^{k-1} + \cdots + c_2n^2 + c_1n^1 + c_0n^0 $$ In this case, $T(n) \in {\rm O}(n^k)$. This can be shown using the usual strategy, taking advantage of the triangle inequality: $$\begin{eqnarray} T(n) &=& c_{k}n^{k} + c_{k-1}n^{k-1} + \cdots + c_2n^2 + c_1n^1 + c_0n^0 && \\ &\leq& |c_{k}n^{k} + c_{k-1}n^{k-1} + \cdots + c_2n^2 + c_1n^1 + c_0n^0| && \\ &\leq& |c_{k}|n^{k} + |c_{k-1}|n^{k-1} + \cdots + |c_2|n^2 + |c_1|n^1 + |c_0|n^0 && \text{by the triangle inequality} \\ &\leq& |c_{k}|n^{k} + |c_{k-1}|n^{k} + \cdots + |c_2|n^k + |c_1|n^k + |c_0|n^k && \\ &\leq& (|c_k| + |c_{k-1}| + \cdots + |c_2| + |c_1| + |c_0|)n^k \end{eqnarray}$$ This gives us $T(n) \leq c \cdot n^k$ where $c = |c_k| + |c_{k-1}| + \cdots + |c_2| + |c_1| + |c_0|$. This is a powerful result. Essentially, it means that for any polynomial, we can take the highest order term, drop the coefficient, and that gives us the Big-O.

    THEOREM (Quick Derivation of Big-O): Let $f(n)$ be a polynomial with $c_k n^k$ as its highest order term. Then, $f(n) \in {\rm O}(n^k)$ by the triangle inequality. Furthermore, let $f(n)$ be polynomial-like with $c \cdot g(n)$ as its highest order term. Then, $f(n) \in {\rm O}(g(n))$ by applying the same strategy used with the triangle inequality. In either scenario, you identify the highest order term, then drop its coefficient to obtain the relevant Big-O set.

  5. Now consider $T(n) = 4n^2 \log(n) + 3n + 2\log(n) + 5$. Then, using the quick derivation theorem, we have that $T(n) \in {\rm O}(n^2\log(n))$ since $4n^2 \log(n)$ is the highest order term in the polynomial-like function.

  6. Now consider $T(n) = n\log_2(n) + 2n + 1$. In this case, $T(n) \in {\rm O}(n\log(n))$ where the logarithm is in any base. This can be shown, taking advantage of the change of base identity for logarithms: $$\begin{eqnarray} T(n) &=& n\log_2(n) + 2n + 1 \\ &=& n\left(\frac{\log(n)}{\log(2)}\right) + 2n + 1 \\ &=& \left(\frac{1}{\log(2)}\right) n\log(n) + 2n + 1 \\ &\leq& \left(\frac{1}{\log(2)}\right) n\log(n) + 2n\log(n) + 1n\log(n) \\ &\leq& \left(\frac{1}{\log(2)} + 2 + 1\right) n\log(n) \end{eqnarray}$$ This gives us $T(n) \leq c \cdot n\log(n)$ where $c = \frac{1}{\log(2)} + 2 + 1$. The $\frac{1}{\log(2)}$ term is constant, so it's allowed to be included in $c$. Essentially, logarithms of different bases only differ by a constant factor. Therefore, we can usually factor out the change of base into the constant.

    THEOREM (Base of Logarithm in Big-O): Let $f(n) \in {\rm O}(g(n)\cdot\log_k(n))$ for some valid $g(n)$. Then, $f(n) \in {\rm O}(g(n)\cdot\log(n))$ by the change of base identity for logarithms.
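The algebra in these examples can also be sanity-checked numerically. Below is a minimal sketch in Java (the class and method names are our own, chosen for illustration) that evaluates the bound from Example 1, $T(n) = 2n^2 + 3n - 2 \leq 7n^2$, for a handful of values of $n$:

public class BigOCheck {

    // T(n) from Example 1: T(n) = 2n^2 + 3n - 2.
    static long t(long n) {
        return 2 * n * n + 3 * n - 2;
    }

    public static void main(String[] args) {
        long c = 7; // the constant derived in Example 1
        for (long n : new long[] {1, 10, 100, 1000, 1000000}) {
            long bound = c * n * n;
            System.out.println("n = " + n + ": T(n) = " + t(n)
                + ", 7n^2 = " + bound
                + ", T(n) <= 7n^2 is " + (t(n) <= bound));
        }
    }
}

A check like this is not a proof (it only samples a few values of $n$), but it is a quick way to catch an arithmetic mistake in a derivation.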

Notable Complexity Classes

There are infinitely many Big-O sets (classes). However, we usually classify the timing or spacing functions for algorithms into one of the notable complexity classes below:

Notable Complexity Classes

| Set | Name | Example |
| --- | --- | --- |
| ${\rm O}(1)$ | Constant | $f(n) = 42$ |
| ${\rm O}(\log(n))$ | Logarithmic | $f(n) = 2 \log(n) + 1$ |
| ${\rm O}(n)$ | Linear | $f(n) = 4 n + 2$ |
| ${\rm O}(n\log(n))$ | Linearithmic | $f(n) = 3 n\log(n) + 2n$ |
| ${\rm O}(n^2)$ | Quadratic | $f(n) = 5 n^2 + 3n$ |
| ${\rm O}(n^3)$ | Cubic | $f(n) = 2 n^3 + n^2 + 2$ |
| ${\rm O}(n^k)$ | Polynomial | $f(n) = c_{k}n^{k} + c_{k-1}n^{k-1} + \cdots + c_1n^1 + c_0n^0$ |
| ${\rm O}(k^n)$ | Exponential | $f(n) = 2^n + 5$ |

In the table above, we assume, for our purposes, that both $n$ and $k$ are integers greater than $1$. As an exercise, take each example in the table above and show that it has the claimed Big-O.
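To get a feel for how differently these characteristic functions grow, here is a small sketch in Java (our own illustration; the class name is made up) that tabulates a few of them for increasing $n$. The logarithm is taken in base $2$ for concreteness:

public class GrowthTable {
    public static void main(String[] args) {
        System.out.println("n\tlog(n)\tn\tn*log(n)\tn^2\t2^n");
        for (int n : new int[] {2, 8, 16, 32}) {
            // base-2 logarithm via the change of base identity
            double log = Math.log(n) / Math.log(2);
            System.out.println(n + "\t"
                + (long) log + "\t"
                + n + "\t"
                + (long) (n * log) + "\t"
                + (long) n * n + "\t"
                + (long) Math.pow(2, n));
        }
    }
}

Even for these small values of $n$, the exponential column dwarfs the others -- exactly the behavior the Big-O classification is meant to capture.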

THEOREM (Nested Big-O Classes): Suppose that $g(n) \leq h(n)$ for sufficiently large $n$ and $f(n) \in {\rm O}(g(n))$. Then, $f(n) \in {\rm O}(h(n))$. Informally, if a function is in a particular Big-O set, then it is also in all larger Big-O sets as well.
  1. Now consider $T(n) = 4n^2 + n + 2$. Then, using the quick derivation theorem, we have that $T(n) \in {\rm O}(n^2)$. By the nested classes theorem, we also have that $T(n) \in {\rm O}(n^3)$, $T(n) \in {\rm O}(n^3\log(n))$, etc. $$\begin{eqnarray} T(n) &=& 4n^2 + n + 2 && \\ &\leq& (4 + 1 + 2) n^2 \\ &\leq& c \cdot n^2 ~;~ c = 7 &\to& T(n) \in {\rm O}(n^2) \\ &\leq& c \cdot n^3 ~;~ c = 7 &\to& T(n) \in {\rm O}(n^3) \\ &\leq& c \cdot n^3 \log(n) ~;~ c = 7 &\to& T(n) \in {\rm O}(n^3 \log(n))\\ \end{eqnarray}$$ In general, while you can establish that a function is in multiple Big-O sets, you usually want to pick the best Big-O that you can establish the inequality for. Since Big-O sets convey an upper bound, we define best as the smallest set that works.

  2. Here is a diagram that shows the plots for some of the characteristic functions in the list of notable classes:

    [Figure: Comparison of computational complexity -- plots of the characteristic functions for the notable classes. Figure by Cmglee under CC BY-SA 4.0.]
    The general idea is that the functions in a particular Big-O set (if the best set is chosen for each) all behave similarly to that set's characteristic function when compared to functions in other Big-O sets; a chain of inclusions for the notable classes is sketched below.
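For the notable classes in the table above, the nested classes theorem yields such a chain of inclusions (a sketch, holding for sufficiently large $n$; the exponential class is written with base $2$ for concreteness): $$ {\rm O}(1) \subseteq {\rm O}(\log(n)) \subseteq {\rm O}(n) \subseteq {\rm O}(n\log(n)) \subseteq {\rm O}(n^2) \subseteq {\rm O}(n^3) \subseteq {\rm O}(2^n) $$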

Algorithm Example

Big-O notation can be used not only with the $T(n)$ or $S(n)$ that results from an algorithm analysis diagram, but also in the diagram process itself. Consider the following method that we want to analyze. Here, we'll denote the problem size $n$ as the length of the array and the set of key processing steps as only including print-like statements:

void printA(int[] a) {
    for(int i = 0; i < a.length; i++) { // ------------------\
        for(int j = 0; j < i; j++) { // --------------\      |
            System.out.print(a[i] + " "); // --> 1    | <= n | n
            System.out.print(a[i] + " "); // --> 1    |      |
        } // for -------------------------------------/      |
        System.out.println(); // ---------------------> 1    |
    } // for // ---------------------------------------------/
} // printA

We can substitute each part with Big-O notation, as follows:

void printA(int[] a) {
    for(int i = 0; i < a.length; i++) { // ------------------\
        for(int j = 0; j < i; j++) { // --------------\      |
            System.out.print(a[i] + " "); // --> O(1) | O(n) | O(n)
            System.out.print(a[i] + " "); //          |      |
        } // for -------------------------------------/      |
        System.out.println(); // ---------------------> O(1) |
    } // for // ---------------------------------------------/
} // printA

Now, using the usual "multiply-across and add up", we get the following:

$$\begin{eqnarray} T(n) &\leq& {\rm O}(1) \cdot {\rm O}(n) \cdot {\rm O}(n) \\ &+& {\rm O}(1) \cdot {\rm O}(n) \end{eqnarray}$$

If we simplify this, we get $T(n) \in {\rm O}(n^2)$ print-like statements. This result happens because when you multiply Big-O sets, you multiply the inner (characteristic) functions, but when you add them up, you only keep the highest order term.
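As a sanity check on this classification, here is a minimal sketch in Java (the class and method names are our own) that counts the print-like statements printA would execute for an array of length n, then compares that count against $n^2$:

public class PrintACount {

    // Mirrors the loop structure of printA, but counts the
    // print-like statements instead of actually printing.
    static long countPrintLike(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {
                count += 2; // the two print statements
            }
            count += 1; // the println statement
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println("n = " + n
                + ", count = " + countPrintLike(n)
                + ", n^2 = " + (long) n * n);
        }
    }
}

For this particular method, the count works out to exactly $n^2$ print-like statements, so the inequality $T(n) \leq c \cdot n^2$ holds with $c = 1$, consistent with $T(n) \in {\rm O}(n^2)$.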

WARNING (Concerning Units): When using Big-O to classify an algorithm, it is very important to include the units. Two algorithms are only directly comparable by their Big-O sets if they use the same units. A notable example is the collection of various sorting algorithms, which are usually analyzed according to the number of element comparisons they make or the number of element swaps they make. You would only want to compare a Big-O for swaps with another Big-O for swaps.

Congratulations! You've made it through the introductory tutorial for Big-O!

Appendix - Formal Definition of Big-O

Big-O defines a set or class of functions with a similar upper bound. The formal definition for membership in a particular Big-O set, from Discrete Mathematics, is:

DEFINITION (Big-O): Let $f : \mathbb{N}_0 \to \mathbb{R}^+$ and $g : \mathbb{N}_0 \to \mathbb{R}^+$ be functions where $g(n)$ is strictly positive for large enough values of $n$. Then, $$ f(n) \in {\rm O}(g(n)) \leftrightarrow \exists c, n_0 ~ \forall n \geq n_0 ~ \left( |f(n)| \leq c \cdot g(n) \right) $$ where $ c \in \mathbb{R}^+$ and $ n_0 \in \mathbb{N}_0$. The notation $f(n) = {\rm O}(g(n))$ is also often used to denote $f(n) \in {\rm O}(g(n))$.
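To make the quantifiers concrete, here is how Example 1 from earlier reads under the formal definition; the witnesses $c = 7$ and $n_0 = 1$ are just one valid choice: $$ \forall n \geq 1 ~ \left( |2n^2 + 3n - 2| \leq 7 \cdot n^2 \right), $$ so taking $c = 7$ and $n_0 = 1$ shows $2n^2 + 3n - 2 \in {\rm O}(n^2)$.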

It's okay if you don't know what some of these symbols mean -- if you don't, then you'll learn about them in the Discrete Mathematics class. In this class, we work with the informal definition presented earlier in this tutorial.

Appendix - Other Bounds

We have already established that Big-O can be used to describe a set of functions with a similar upper bound. There is also Big-$\Omega$ (Big-Omega) for lower bounds and Big-$\Theta$ (Big-Theta) for tight bounds. There are even "little" versions, Little-o and Little-$\omega$ (little-omega), for strict inequalities (e.g., $<$ instead of $\leq$ in the case of Little-o).