Thales’ triangles

a broad context for Möbius functions

2025-06-24T14:45:00.000-07:00

This post is about the Möbius function, which is not something I have had to think about very often. It has come up in some combinatorial problems I’ve been working on recently, and so I’ve been reading about it bit by bit. I have found one particular presentation especially helpful, and that’s what I’m going to share here.

The “classical” Möbius function $\mu$ is defined on the positive integers by \[ \mu(n) = \begin{cases} 1 & \text{if}\ n = 1 \\ (-1)^r & \text{if}\ n=p_1\cdots p_r,\ \text{a product of $r$ distinct primes} \\ 0 & \text{otherwise.} \end{cases} \] I have never particularly understood this function. Its definition is simple, but to me opaque. Why would this function be of interest? (I’m not interested in answers such as “because it works.”) Maybe it’s just because I’m not a number theorist at heart.

For independent reasons, I recently read the monograph Commutation and Rearrangements by Pierre Cartier and Dominique Foata, and therein they demonstrate the existence of a Möbius function for any monoid $M$ with finite factorization: that is, given any $x\in M$, there are only finitely many sequences $(x_1,\dots,x_n)$ of non-identity elements $x_i$ such that $x=x_1\cdots x_n$. If $1_M$ is the identity element of $M$, then it is assumed to be the product of the empty sequence.

As two examples, think of the positive integers with multiplication as the operation, or strings over a given alphabet with concatenation as the operation. In a sense, these are two “extreme” examples, because the positive integers form a commutative monoid with an infinite set of generators (the primes), while in the monoid of strings (whose generators are the elements of the alphabet) two elements commute only if they are both powers of a common string.

One consequence of the assumption that every element of $M$ has finitely many factorizations is that $1_M$ has only the empty factorization, because otherwise if $1_M=x_1\cdots x_n$ for some nonempty sequence $(x_1,\dots,x_n)$, then this could be used to produce infinitely many factorizations of every element of $M$.

Here is the definition that Cartier and Foata give for the Möbius function $\mu_M$ of $M$: given $x\in M$,

$d_+(x)$ is the number of ways $x$ can be written as a product of an even number of factors;
$d_-(x)$ is the number of ways $x$ can be written as a product of an odd number of factors;
$\mu_M(x) = d_+(x)-d_-(x)$.

This is a definition I can get behind! Now the Möbius function is revealed as the difference between two natural counting functions. Note that for any $M$, because the identity $1_M$ has only the empty factorization, with zero factors, $d_+(1_M)=1$ and $d_-(1_M)=0$, which means $\mu_M(1_M)=1$ always.

Let’s see how the rest of the values of $\mu_M$ work out for the two examples we have in mind.

First suppose that $M$ is the monoid of strings over an alphabet $A$. (This will turn out to be the easier of the two cases.) If $x\in A$ is a string with only one letter, then it can only be written as $x$, so $\mu_M(x) = -1$. If $x\in M$ has $n>1$ letters, then it can be factored in $2^{n-1}$ ways, by introducing “breaks” between adjacent letters. Half of these factorizations have an even number of factors, and half have an odd number of factors, so $d_+(x)=d_-(x)=2^{n-2}$, which means $\mu_M(x) = 0$. For example, we can factor the string $acbb$ in eight ways:

$(acbb)$	$(a)(c)(bb)$	$(a)(cb)(b)$	$(ac)(b)(b)$
$(a)(cbb)$	$(ac)(bb)$	$(acb)(b)$	$(a)(c)(b)(b)$

The top row shows the factorizations with an odd number of factors, and the bottom row shows the factorizations with an even number of factors.
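A quick way to check this count is a short Python sketch that lists the factorizations of a string by choosing where to put the breaks, then computes $d_+ - d_-$. The function names `factorizations` and `mobius_string` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

def factorizations(word):
    """Yield every way to cut a nonempty string into nonempty consecutive pieces."""
    n = len(word)
    for r in range(n):
        # Choose r of the n-1 gaps between adjacent letters as break points.
        for breaks in combinations(range(1, n), r):
            cuts = (0, *breaks, n)
            yield tuple(word[i:j] for i, j in zip(cuts, cuts[1:]))

def mobius_string(word):
    """Return d_+ - d_- for a string in the monoid of strings."""
    if not word:
        return 1  # the identity has only the empty factorization
    d_plus = d_minus = 0
    for pieces in factorizations(word):
        if len(pieces) % 2 == 0:
            d_plus += 1
        else:
            d_minus += 1
    return d_plus - d_minus

for pieces in factorizations("acbb"):
    print(len(pieces), pieces)
print(mobius_string("acbb"))
```

The loop prints the eight factorizations of $acbb$ together with their lengths, and the final value is the difference between the even and odd counts.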

Now suppose $M = \mathbb{Z}_+$ is the monoid of positive integers. If $p$ is a prime, then it can only be written as the product $p$, so $d_+(p)=0$ and $d_-(p)=1$, which means $\mu_M(p) = -1$. If $p$ and $q$ are distinct primes, then we have $pq = (pq) = (p)(q) = (q)(p)$, so $d_+(pq) = 2$ and $d_-(pq) = 1$, which means $\mu_M(pq) = 2-1 = 1$. On the other hand, $p^2 = (p^2) = (p)(p)$, so $d_+(p^2)=d_-(p^2)=1$, which means $\mu_M(p^2)=0$. More generally, if $x=p_1\cdots p_n$ with $p_1,\dots,p_n$ all distinct primes, then any factorization of $x$ is given by choosing $k_1$ prime factors, then $k_2$ prime factors, and so on, until $x$ is expressed as a product of $m$ factors as \[x=(p_1\cdots p_{k_1})(p_{k_1+1}\cdots p_{k_1+k_2})\cdots(p_{k_1+\cdots+k_{m-1}+1}\cdots p_{k_1+\cdots+k_m})\] where $k_1+\cdots+k_m=n$. Within each parenthetical factor, the order of the primes doesn’t matter, but the order of the parenthetical factors does matter. This is an ordered partition of the prime factors of $x$.

It is possible to compute $d_+(x)$ and $d_-(x)$ directly, based on the ordered partitions of the prime factors of $x$. (See OEIS A000670 for the total number of ordered partitions of $n$ elements, A089677 for the number of ordered partitions with an odd number of blocks, and A052841 for the number of ordered partitions with an even number of blocks.) However, here we will instead use induction to show the simpler result we need, that if $p_1,\dots,p_n$ are $n$ distinct primes, then \[ d_+(p_1\dots p_n) - d_-(p_1\dots p_n) = (-1)^n \] which recovers the classical Möbius function on numbers of this form. We have already shown the cases $n=1$ and $n=2$ in the previous paragraph, so suppose the result is true for some arbitrary value of $n$, and let $q$ be a prime distinct from $p_1,\dots,p_n$. Set $x=p_1\cdots p_n$. A factorization of $xq$ can be obtained from a factorization of $x$ in three ways:

by inserting $q$ into one of the existing factors;
by introducing $q$ as a new factor immediately before one of the existing factors;
by introducing $q$ as a new factor at the end.

The first operation preserves the parity of the number of factors, while the second operation switches the parity of the number of factors. Because there are the same number of these two types of operations, in computing the difference $d_+(xq)-d_-(xq)$, they cancel out. The third operation also switches the parity of the number of factors, and it is only applied once to each of the factorizations of $x$. Thus \[ d_+(xq)-d_-(xq) = d_-(x)-d_+(x) = -(-1)^n = (-1)^{n+1} \] and the result follows.

On the other hand, if $x\in\mathbb{Z}_+$ is divisible by a prime power $p^k$ with $k>1$, then we can first examine the ways to split up this power. By treating $p^k$ as a string over the alphabet $\{p\}$, we see that there are the same number of factorizations having an even number of factors as an odd number of factors, and so $d_+(p^k)-d_-(p^k)=0$. Introducing the rest of the prime factors of $x$ one at a time, the same argument as in the previous paragraph shows that $d_+(x)-d_-(x)=d_+(p^k)-d_-(p^k)=0$. (The case where more than one prime factor of $x$ appears with multiplicity higher than $1$ is left as an exercise to the reader.)
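The agreement between the counting definition and the classical one can also be tested numerically. The sketch below is an illustration under my own naming (`signed_count` and `classical_mobius` are hypothetical helpers): it computes $d_+(x)-d_-(x)$ by choosing the first factor of an ordered factorization and recursing on what remains, and compares the result with the case-by-case definition of $\mu$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def signed_count(x):
    """d_+(x) - d_-(x), counting ordered factorizations of x into factors > 1."""
    if x == 1:
        return 1  # only the empty factorization, which has zero factors
    # Pick the first factor d > 1; the rest is a factorization of x // d
    # with one fewer factor, so its parity is flipped.
    return -sum(signed_count(x // d) for d in range(2, x + 1) if x % d == 0)

def classical_mobius(n):
    """The classical definition of mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n is divisible by p squared
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

print([signed_count(x) for x in range(1, 13)])
print(all(signed_count(x) == classical_mobius(x) for x in range(1, 300)))
```

The recursion encodes the same parity flip used in the induction above: adding one factor to a factorization changes the sign of its contribution.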

The purpose of the Möbius function of $M$ is mainly to enable Möbius inversion, a way of transforming between related complex-valued functions on $M$. I won’t go into that process now, because this post has taken long enough already, but I encourage anyone interested to read Cartier and Foata’s monograph (which has a load of other great ideas!).

I will observe that in the original paper where Möbius introduced his eponymous function (“Über eine besondere Art von Umkehrung der Reihen”, 1832), he derived the expression \[ x = \frac{x}{1-x} - \frac{x^2}{1-x^2} - \frac{x^3}{1-x^3} - \frac{x^5}{1-x^5} + \frac{x^6}{1-x^6} - \frac{x^7}{1-x^7} + \frac{x^{10}}{1-x^{10}} - \cdots \] which can be written more succinctly as \[ x = \sum_{n=1}^\infty \frac{\mu(n)x^n}{1-x^n} \] (the expression on the right is a Lambert series). In this context the definition of $\mu(n)$ also makes sense to me, as a collection of coefficients, as it can be derived by an application of the inclusion–exclusion principle. Recall that the summation formula for the geometric series yields \[ \frac{x^n}{1-x^n} = \sum_{k=1}^\infty x^{nk}\text. \] So if you start with the series $x+x^2+x^3+\cdots = x/(1-x)$ and you want to reduce it to $x$, you can remove all the powers that are multiples of $2$, then remove all powers that are multiples of $3$, then add back in all powers that are multiples of $6$ because they got removed twice, then remove all powers that are multiples of $5$, then add back powers that are multiples of $10$ or $15$, then remove powers that are multiples of $30$ because they got removed three times and added back in three times, and so on…
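The cancellation in the Lambert series can be checked term by term. Since $x^n/(1-x^n)$ contributes $\mu(n)$ to the coefficient of every power $x^{nk}$, the coefficient of $x^m$ in the full sum is the sum of $\mu(d)$ over the divisors $d$ of $m$. Here is a minimal Python sketch of that bookkeeping, truncated at a chosen degree; the names `mobius` and `lambert_coefficients` are hypothetical.

```python
def mobius(n):
    """Classical Möbius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def lambert_coefficients(limit):
    """Coefficients of x^0..x^limit in the sum of mu(n) x^n / (1 - x^n)."""
    coeffs = [0] * (limit + 1)
    for n in range(1, limit + 1):
        mu = mobius(n)
        if mu:
            # x^n / (1 - x^n) = x^n + x^(2n) + x^(3n) + ...
            for multiple in range(n, limit + 1, n):
                coeffs[multiple] += mu
    return coeffs

print(lambert_coefficients(30))
```

Only the coefficient of $x^1$ survives; every other power up to the truncation degree is removed and added back equally often.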

an average number of 1s and 2s

2023-03-03T14:33:00.007-08:00

The terms of the Virahanka–Fibonacci sequence $(V(n))_{n\ge0}$ (OEIS A000045, shifted to start with $V(0)=1$ instead of 0) count the number of ways each natural number $n$ can be expressed as a sum of 1s and 2s, in which the order in the sum matters. For example, $V(5) = 8$ because \[ \begin{array}{rl} 5 &= 1 + 1 + 1 + 1 + 1 \\ &= 1 + 1 + 1 + 2 = 1 + 1 + 2 + 1 = 1 + 2 + 1 + 1 = 2 + 1 + 1 + 1 \\ &= 1 + 2 + 2 = 2 + 1 + 2 = 2 + 2 + 1 \end{array} \]

Here is one way to find an exact expression for $V(n)$. The coefficient of $x^n$ in the polynomial $(x+x^2)^k$ expresses the number of ways to write $n$ as a sum of exactly $k$ 1s and 2s. (Try it!) We can add these polynomials together as a geometric series to obtain the generating function \[ f(x) = \sum_{n=0}^\infty V(n)\, x^n = \sum_{k=0}^\infty \left(x+x^2\right)^k = \frac{1}{1 - x - x^2}. \] The denominator $1 - x - x^2$ can be factored as $(1 - \varphi x)(1 + \varphi^{-1} x)$, where $\varphi - \varphi^{-1} = 1$. The positive solution to this latter equation is the golden ratio \[ \varphi = \frac{1 + \sqrt5}{2}. \] Using partial fractions, we can write \[ \begin{array}{rl} f(x) &= \dfrac{\varphi}{\varphi+\varphi^{-1}}\cdot\dfrac{1}{1 - \varphi x} + \dfrac{\varphi^{-1}}{\varphi+\varphi^{-1}}\cdot\dfrac{1}{1 + \varphi^{-1} x} \\ &= \displaystyle \frac{\varphi}{\varphi+\varphi^{-1}}\sum_{n=0}^\infty (\varphi x)^n + \frac{\varphi^{-1}}{\varphi+\varphi^{-1}}\sum_{n=0}^\infty (-\varphi^{-1}x)^n \\ &= \displaystyle\sum_{n=0}^\infty \frac{\varphi^{n+1} - (-\varphi)^{-(n+1)}}{\varphi+\varphi^{-1}} x^n \\ \end{array} \] and by equating coefficients we get an explicit formula for $V(n)$ in terms of powers of $\varphi$: \[ V(n) = \frac{\varphi^{n+1} - (-\varphi)^{-(n+1)}}{\varphi+\varphi^{-1}}. \] This is known as Binet’s formula. (The denominator in this expression is simply $\sqrt5$. But writing it as $\varphi-(-\varphi^{-1})$ instead better shows how it relates to the numerator.)
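As a sanity check on the closed form, the following Python sketch compares it with the recurrence $V(n)=V(n-1)+V(n-2)$, $V(0)=V(1)=1$. It uses floating-point arithmetic, so the comparison rounds to the nearest integer and is only trusted here for moderate $n$; the function names are my own.

```python
from math import sqrt

def v_recurrence(n):
    """V(n) from the recurrence, with V(0) = V(1) = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def v_binet(n):
    """V(n) from the closed form in powers of the golden ratio (floating point)."""
    phi = (1 + sqrt(5)) / 2
    return (phi ** (n + 1) - (-phi) ** (-(n + 1))) / (phi + 1 / phi)

for n in range(10):
    print(n, v_recurrence(n), v_binet(n))
```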

Now, what if we want to know not only how many ways $n$ can be expressed as a sum of 1s and 2s, but how many 1s and 2s are needed, on average, to create a sum that equals $n$? For example, among the eight ways to write 5 as a sum of 1s and 2s, there are a total of twenty 1s and ten 2s, with thirty terms in all. So, on average, a sum that equals 5 has 2.5 1s and 1.25 2s, for an average of 3.75 terms.

We can adapt the method of generating functions by introducing new variables that allow us to keep track of how many times 1 and 2 appear in a sum. We’ll use $t$ to represent a 1 in a sum and $u$ to represent a 2. Then the coefficient of $x^n$ in $(tx+ux^2)^k$ will be a polynomial in $t$ and $u$ such that the coefficient of $t^p u^q$ counts the number of ways to express $n$ as a sum of $p$ 1s and $q$ 2s, with $p+q = k$. As before, we can add these polynomials together in a series to get the multivariate generating function \[ F(x,t,u) = \sum_{k=0}^\infty \left(tx+ux^2\right)^k = \frac{1}{1 - tx - ux^2}. \] Note that $F(x,1,1) = f(x)$. When $F(x,t,u)$ is expanded as a power series, the coefficient of $x^n$ is a polynomial $P_n(t,u)$ such that the coefficient of $t^p u^q$ counts the total number of ways to express $n$ as a sum of $p$ 1s and $q$ 2s.
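To see what the coefficients of $P_n(t,u)$ record, one can simply enumerate the sums and tally how many 1s and 2s each contains. The sketch below does this by brute force rather than by expanding the generating function; `compositions_12` and `coefficient_table` are hypothetical names used only here.

```python
from collections import Counter

def compositions_12(n):
    """Yield every ordered sum of 1s and 2s equal to n, as a tuple of terms."""
    if n == 0:
        yield ()
        return
    for first in (1, 2):
        if first <= n:
            for rest in compositions_12(n - first):
                yield (first, *rest)

def coefficient_table(n):
    """Map (p, q) to the number of sums equal to n with p 1s and q 2s."""
    return Counter((c.count(1), c.count(2)) for c in compositions_12(n))

print(dict(coefficient_table(5)))
```

For $n=5$ the tally corresponds to $P_5(t,u) = t^5 + 4t^3u + 3tu^2$, which matches the eight sums listed earlier.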

We can determine the coefficients of $P_n(t,u)$ explicitly using binomial coefficients. If a sum has $p+q$ terms of which $p$ are 1s and $q$ are 2s, then there are $\binom{p+q}{p}=\binom{p+q}{q}$ ways to arrange the terms (among the $p+q$ terms, select the positions for the 1s or for the 2s, respectively). If the sum is equal to $n$, then $p+2q=n$, and so \[ \begin{array}{rl} P_n(t,u) &= \displaystyle\sum_{p+2q=n} \binom{p+q}{p} t^p u^q = \displaystyle\sum_{p+2q=n} \binom{p+q}{q} t^p u^q \\ &= \displaystyle\sum_{q=0}^{\lfloor n/2 \rfloor} \binom{n-q}{q} t^{n-2q}u^q. \end{array} \] Because $P_n(1,1)=V(n)$, we thereby also obtain an expression for $V(n)$ as a sum of binomial coefficients: \[ V(n) = \sum_{q=0}^{\lfloor n/2 \rfloor} \binom{n-q}{q}. \] Here the parameter $q$ counts the number of terms equal to 2. The total number of 2s in all compositions of $n$ into 1s and 2s is thus \[ \sum_{q=0}^{\lfloor n/2 \rfloor} q\binom{n-q}{q} \] This sequence begins 0, 0, 1, 2, 5, 10, 20, 38, 71, … (OEIS A001629). If we divide these values by the corresponding terms of the sequence $V(n)$, we obtain the average number of 2s in all compositions of $n$ into 1s and 2s for particular values of $n$:

in a sum of 1s and 2s equal to	1	2	3	4	5	6	7	8
the average number of 2s is	0	1/2	2/3	1	5/4	20/13	38/21	71/34
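The table can be reproduced with exact rational arithmetic. This is a small sketch using the binomial sums for $V(n)$ and for the total number of 2s; the function names `V` and `total_twos` are my own labels.

```python
from fractions import Fraction
from math import comb

def V(n):
    """Number of ordered sums of 1s and 2s equal to n."""
    return sum(comb(n - q, q) for q in range(n // 2 + 1))

def total_twos(n):
    """Total number of 2s over all ordered sums of 1s and 2s equal to n."""
    return sum(q * comb(n - q, q) for q in range(n // 2 + 1))

averages = [Fraction(total_twos(n), V(n)) for n in range(1, 9)]
print([str(a) for a in averages])
```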

But how can we study how this average behaves asymptotically?

Let’s write $T_1(n)$ for the total number of 1s in all sums of 1s and 2s equal to $n$, $T_2(n)$ for the total number of 2s, and $T(n)=T_1(n)+T_2(n)$ for the total number of all terms. We already have an expression for $T_2(n)$ above; let’s consider its structure. It takes each coefficient of $t^p u^q$ in $P_n(t,u)$ and multiplies that coefficient by $q$, then adds together the results. The same quantity can be obtained by differentiating $P_n(t,u)$ with respect to $u$ and evaluating the result at $(t,u)=(1,1)$. That is, \[ T_2(n) = \sum_{q=0}^{\lfloor n/2 \rfloor} q\binom{n-q}{q} = \frac{\partial}{\partial u}P_n(t,u)\bigg\vert_{(t,u)=(1,1)}. \] Similarly, \[ T_1(n) = \frac{\partial}{\partial t}P_n(t,u)\bigg\vert_{(t,u)=(1,1)} \] and \[ T(n) = \frac{d}{dt}P_n(t,t)\bigg\vert_{t=1} = \frac{\partial}{\partial t}P_n(t,u)\bigg\vert_{(t,u)=(1,1)} + \frac{\partial}{\partial u}P_n(t,u)\bigg\vert_{(t,u)=(1,1)}. \] To carry out this computation for all degrees simultaneously, we can use the partial derivatives of $F(x,t,u)$ with respect to $t$ and $u$: \[ \frac{\partial}{\partial t} F(x,t,u) = \frac{\partial}{\partial t} \frac{1}{1 - tx - ux^2} = \frac{x}{(1 - tx - ux^2)^2} \] \[ \frac{\partial}{\partial u} F(x,t,u) = \frac{\partial}{\partial u} \frac{1}{1 - tx - ux^2} = \frac{x^2}{(1 - tx - ux^2)^2} \] Evaluating these at $(t,u)=(1,1)$ gives us the generating functions for $T_1(n)$ and $T_2(n)$: \[ \sum_{n=0}^\infty T_1(n)\,x^n = \frac{x}{(1 - x - x^2)^2} \] \[ \sum_{n=0}^\infty T_2(n)\,x^n = \frac{x^2}{(1 - x - x^2)^2} \] Note that these functions are respectively $x(f(x))^2$ and $x^2(f(x))^2$. 
So it is enough to determine the coefficients of $(f(x))^2$, and then shift them as appropriate: \[ (f(x))^2 = \displaystyle\left(\sum_{n=0}^\infty V(n)\, x^n\right)^2 = \displaystyle\sum_{n=0}^\infty \left(\sum_{k=0}^n V(k)V(n-k)\right) x^n \] From the formula for $V(n)$ in terms of $\varphi$, the coefficient of $x^n$ becomes \[ \begin{array}{rl} &\displaystyle\sum_{k=0}^n \left(\frac{\varphi^{k+1}-(-\varphi)^{-(k+1)}}{\varphi+\varphi^{-1}}\right) \left(\frac{\varphi^{n-k+1}-(-\varphi)^{-(n-k+1)}}{\varphi+\varphi^{-1}}\right) \\ =&\displaystyle\frac{1}{(\varphi+\varphi^{-1})^2}\sum_{k=0}^n\big(\varphi^{n+2} + \varphi^k(-\varphi)^{k-n} + \varphi^{n-k}(-\varphi)^{-k} + (-\varphi)^{-(n+2)}\big) \\ =&\displaystyle\frac{1}{5}\left(\sum_{k=0}^n\big(\varphi^{n+2} + (-\varphi)^{-(n+2)}\big) + \frac{1}{(-\varphi)^n}\sum_{k=0}^n(-\varphi^2)^k + \varphi^n\sum_{k=0}^n (-\varphi^{-2})^k\right) \\ =&\displaystyle\frac{1}{5}\left((n+1)\big(\varphi^{n+2} + (-\varphi)^{-(n+2)}\big) + 2\cdot\frac{\varphi^{n+1}-(-\varphi)^{-(n+1)}}{\varphi+\varphi^{-1}}\right) \\ =&\displaystyle\frac{1}{5}\big((n+1)V(n+2) + (n+3)V(n)\big) \end{array} \] where we have used the formula for a sum of a partial geometric series, along with the fact that \[ \big(\varphi^{n+2}+(-\varphi)^{-(n+2)}\big)\frac{\varphi+\varphi^{-1}}{\varphi+\varphi^{-1}} = \frac{\varphi^{n+3}-(-\varphi)^{-(n+3)}}{\varphi+\varphi^{-1}}+\frac{\varphi^{n+1}-(-\varphi)^{-(n+1)}}{\varphi+\varphi^{-1}} \] (the reader is encouraged to fill in other details of the calculations). 
Therefore \[ \frac{T_1(n)}{V(n)} = \frac{1}{5}\cdot\frac{nV(n+1)+(n+2)V(n-1)}{V(n)} \] \[ \frac{T_2(n)}{V(n)} = \frac{1}{5}\cdot\frac{(n-1)V(n)+(n+1)V(n-2)}{V(n)} \] and thus, knowing that $\displaystyle\lim_{n\to\infty} V(n+1)/V(n) = \varphi$, we have \[ \lim_{n\to\infty}\frac1n\cdot\frac{T_1(n)}{V(n)} = \frac{1}{5}\left(\varphi+\frac{1}{\varphi}\right) = \frac{1}{\sqrt5} \] \[ \lim_{n\to\infty}\frac1n\cdot\frac{T_2(n)}{V(n)} = \frac{1}{5}\left(1+\frac{1}{\varphi^2}\right) = \frac{2}{5+\sqrt5} \] \[ \lim_{n\to\infty}\frac1n\cdot\frac{T(n)}{V(n)} = \frac{1}{5}\big(\varphi+2\big) = \frac{5+\sqrt5}{10} \] That is, if $n$ is large, on average a sum of 1s and 2s equal to $n$ will have about $n\cdot\dfrac{5+\sqrt5}{10}\approx 0.7236n$ terms, and the ratio of 1s to 2s will on average be about $\varphi$, or about 1.618. It took some work to get here, but once we have the answer in hand, it isn’t too surprising: the prevalence of $\varphi$’s appearances on this topic makes it entirely likely that the 1s and 2s would appear in that ratio, and once we know that $p+2q=n$ and $p\approx\varphi q$, we can solve to find $p\approx n/\sqrt5$ and $q\approx n/(\varphi\sqrt5)$.
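Both the closed forms and the limits can be checked numerically. The sketch below computes $T_1(n)$ and $T_2(n)$ directly from binomial sums, tests the two identities with exact integers, and then evaluates the scaled averages at a moderately large $n$. The helper names are hypothetical, and $V(n)$ is taken to be 0 for negative $n$, which is an assumption that only matters at $n=1$.

```python
from fractions import Fraction
from math import comb, sqrt

def V(n):
    """Compositions of n into 1s and 2s (taken to be 0 for negative n)."""
    return sum(comb(n - q, q) for q in range(n // 2 + 1)) if n >= 0 else 0

def T1(n):
    """Total number of 1s: a composition with q 2s has n - 2q 1s."""
    return sum((n - 2 * q) * comb(n - q, q) for q in range(n // 2 + 1))

def T2(n):
    """Total number of 2s."""
    return sum(q * comb(n - q, q) for q in range(n // 2 + 1))

def closed_forms_hold(n):
    """Check both identities for T1 and T2 after clearing the factor 1/5."""
    return (5 * T1(n) == n * V(n + 1) + (n + 2) * V(n - 1)
            and 5 * T2(n) == (n - 1) * V(n) + (n + 1) * V(n - 2))

def scaled_averages(n):
    """(T1, T2, T1 + T2), each divided by n * V(n), as floats."""
    denom = n * V(n)
    return tuple(float(Fraction(t, denom)) for t in (T1(n), T2(n), T1(n) + T2(n)))

print(all(closed_forms_hold(n) for n in range(1, 60)))
print(scaled_averages(500))
print(1 / sqrt(5), 2 / (5 + sqrt(5)), (5 + sqrt(5)) / 10)
```

The convergence is slow, with an error of order $1/n$, so the values at $n=500$ agree with the limits only to about three decimal places.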

P.S. The polynomials $P_n(t) = P_n(t,1)$ that appear as coefficients of $x^n$ in the expansion of $1/(1-tx-x^2)$ are sometimes called Fibonacci polynomials. They satisfy a recurrence relation similar to that of the Fibonacci numbers themselves: \[ P_0(t) = 1, \qquad P_1(t) = t, \qquad P_n(t) = tP_{n-1}(t) + P_{n-2}(t). \]

P.P.S. In the book Analytic Combinatorics by P. Flajolet and R. Sedgewick, an example in chapter 3 (Proposition III.4) shows that, if one allows compositions with terms of any size, not just 1s and 2s, then for any $n\geq1$ the average number of terms in a composition of $n$ is always $(n+1)/2$, and as $n\to\infty$, the average number of times an integer $m\geq1$ appears in a composition of $n$ is asymptotic to $n/2^{m+1}$. That is, for large $n$, about half of the terms are 1s, about a quarter of the terms are 2s, and so on. My guess is that for compositions with terms no larger than $N$, the appearances of 1, 2, …, $N$ will distribute according to powers of the positive solution to $x^N + x^{N-1} + \cdots + x = 1$. When $N=2$, this solution is $1/\varphi$, and as $N\to\infty$ it approaches $1/2$.
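The $(n+1)/2$ claim for unrestricted compositions is easy to test by brute force for small $n$. This sketch is not the argument from the book; it generates each composition of $n$ from a choice of cuts among the $n-1$ gaps between $n$ unit marks, and the function names are hypothetical.

```python
from fractions import Fraction

def all_compositions(n):
    """Yield every composition of n >= 1 as a list of positive terms."""
    for mask in range(2 ** (n - 1)):
        parts, current = [], 1
        for gap in range(n - 1):
            if (mask >> gap) & 1:  # a cut in this gap ends the current term
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        yield parts

def mean_number_of_terms(n):
    """Exact average number of terms over all compositions of n."""
    total = sum(len(c) for c in all_compositions(n))
    return Fraction(total, 2 ** (n - 1))

print([str(mean_number_of_terms(n)) for n in range(1, 9)])
```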

converting between sums

2022-12-21T09:00:00.000-08:00

The number $5$ can be written in eight ways as a sum of $1$s and $2$s (in which order matters): \[ 1+1+1+1+1,\; 1+1+1+2,\; 1+1+2+1,\; 1+2+1+1, \] \[ 2+1+1+1,\; 1+2+2,\; 2+1+2,\; 2+2+1. \] The number $6$ can also be written in eight ways as a sum of odd numbers: \[ 1+1+1+1+1+1,\; 1+1+1+3,\; 1+1+3+1,\; 1+3+1+1, \] \[ 3+1+1+1,\; 1+5,\; 3+3,\; 5+1. \] Coincidence? No, and if you look carefully at the two lists of sums above, you might be able to guess how the rest of the post is going to go.

Let $a(n)$ be the number of ways to write $n$ as a sum of $1$s and $2$s, and let $b(n)$ be the number of ways to write $n$ as a sum of (positive) odd numbers. The overall goal of this essay is to show that $b(n) = a(n-1)$, in two different ways.

As it turns out, $a(n)$ and $b(n)$ are both closely related to (or, depending on your perspective, the same as) the Fibonacci sequence $f(n)$, defined recursively by $f(1) = f(2) = 1$, and for $n\geq3$ \[ f(n) = f(n-1) + f(n-2). \] The sequence $f(n)$ begins, starting at $n=1$, with the terms $1, 1, 2, 3, 5, 8, 13, \dots$. The recursive formula for this sequence famously arises from the rabbit problem posed by Leonardo of Pisa: each month, Fibonacci’s Immortal Rabbits™ produce a new pair from each pair more than a month old. The number of pairs of rabbits in month $n$ is therefore the sum of the number of pairs the previous month (all of them having stuck around), and the number of pairs from two months prior (all of these pairs having produced a new pair). Start with $f(1)=1$ in the first month, and the Fibonacci sequence counts the number of pairs of rabbits you have in the $n$th month.

We can see that $a(1) = 1$ and $a(2) = 2$, because $1$ can be written just as $1$, and $2$ can be written as $2$ or as $1+1$. For larger values of $n$, the ways of writing $n$ as a sum of $1$s and $2$s can be partitioned into the sums that start with $1$ and those that start with $2$. Therefore, $a(n)$ can be expressed as $a(n-1) + a(n-2)$ for $n \ge 3$. The sequence $a(n)$ begins $1,2,3,5,8,13,21,\dots$. This is the Fibonacci sequence again, but shifted by one position: $a(n) = f(n+1)$. It is perhaps more appropriately called the Virahanka sequence. What happened to the initial $1$ in the Fibonacci sequence, though? It shifted to the 0th position: the empty sum, having no terms, is equal to 0 by definition, so $a(0)=1$. Zero can be written as a sum of $1$s and $2$s in one way, as the empty sum. (I have begun calling both $f(n)$ and $a(n)$ the Fibonacci–Virahanka sequence.)

Now let’s try to do something similar for the sequence $b(n)$. We can check by hand that $b(1) = b(2) = 1$, because only the sum $1+1$ is allowed for $2$. Any sum of odd numbers could start with 1, or with 3, or with 5, and so on, which implies \[ b(n) = b(n-1) + b(n-3) + b(n-5) + b(n-7) + \cdots. \] While this looks potentially like an infinite sum, notice that for any particular $n$, the number of nonzero terms is finite, because $b(n) = 0$ when $n < 0$. Now see what happens when we peel off the first term of this sum for $n\ge3$, so that $b(n-2)$ can itself be expanded by the same formula: \[ \begin{array}{rl} b(n) &= b(n-1) + \big(b(n-3) + b(n-5) + b(n-7) + \cdots\big) \\ &= b(n-1) + \big(b((n-2)-1) + b((n-2)-3) + b((n-2)-5) + \cdots\big) \\ &= b(n-1) + b(n-2) \end{array} \] So, for $n\ge3$, $b(n)$ satisfies the same recursive formula as $f(n)$ and $a(n)$! Thus this sequence begins $1,1,2,3,5,8,13,\dots$, which looks the same as the start of the sequence $f(n)$. Well, almost. It’s not quite the same sequence as $f(n)$, because $b(0)$, like $a(0)$, equals 1. So, starting at $n=0$, the sequence $b(n)$ begins $1,1,1,2,3,5,8,\dots$. The “empty sum” convention is necessary for the recursive formula for $b(n)$ to work in the case of both even and odd values of $n$. However, in Fibonacci’s rabbit problem, the rabbit keeper has zero pairs of rabbits in month 0, so $f(0)=0\ne b(0)$.
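The two sequences can be computed side by side with one small counting routine. This Python sketch counts ordered sums whose terms come from an allowed set, using the empty-sum convention for $n=0$; the helper `count_compositions` and its `allowed` parameter are assumptions of this illustration.

```python
def count_compositions(n, allowed):
    """Number of ordered sums equal to n whose terms all satisfy allowed(term)."""
    counts = [1] + [0] * n  # counts[0] = 1 accounts for the empty sum
    for total in range(1, n + 1):
        # Condition on the first term of the sum.
        counts[total] = sum(counts[total - part]
                            for part in range(1, total + 1) if allowed(part))
    return counts[n]

def a(n):
    """Ordered sums of 1s and 2s equal to n."""
    return count_compositions(n, lambda part: part in (1, 2))

def b(n):
    """Ordered sums of positive odd numbers equal to n."""
    return count_compositions(n, lambda part: part % 2 == 1)

print([a(n) for n in range(10)])
print([b(n) for n in range(10)])
```

The second list is the first one shifted right by one position, with $b(0)=1$ in front.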

The conclusion of the preceding is that $a(n)=b(n+1)$ when $n\ge0$. Yet a second proof was promised. So consider a sum of $1$s and $2$s that is equal to $n$. For example, if $n=9$, we could write \[ 9 = 2 + 1 + 1 + 2 + 2 + 1. \] Now prepend a $1$ to the sum, then wherever a $2$ appears, group it with the preceding terms, back to the nearest $1$, to produce an odd number. For example, using the sum above that equals $9$, we get \[ 10 = (1 + 2) + 1 + (1 + 2 + 2) + 1 = 3 + 1 + 5 + 1. \] This process can be reversed. Given a sum of odd numbers, convert each term to a sum of $1$ followed by a sequence of $2$s, then remove the initial $1$. For example, the sum \[16 = 1 + 3 + 5 + 7\] is converted to \[1 + (1 + 2) + (1 + 2 + 2) + (1 + 2 + 2 + 2),\] which corresponds to \[15 = 1 + 2 + 1 + 2 + 2 + 1 + 2 + 2 + 2.\] Thus we have found an explicit bijection between sums of $1$s and $2$s equal to $n$ and sums of odd numbers equal to $n+1$. These sums are also called compositions. A comment on the OEIS entry for the Fibonacci sequence observes that, more generally, “the numbers of compositions [of $n$] using parts $1$ and $k$ is equivalent to the numbers of compositions of $(n+1)$ using parts $1$ mod $k$.” An exercise for the reader! Can you generalize further?
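The two directions of this correspondence are short enough to write out. In the Python sketch below, sums are represented as lists of terms; the function names `ones_twos_to_odds` and `odds_to_ones_twos` are hypothetical.

```python
def ones_twos_to_odds(terms):
    """Map a sum of 1s and 2s equal to n to a sum of odd numbers equal to n + 1."""
    odds = []
    for term in [1, *terms]:   # put an extra 1 in front
        if term == 1:
            odds.append(1)     # each 1 starts a new odd term
        else:
            odds[-1] += 2      # each 2 is absorbed into the term before it
    return odds

def odds_to_ones_twos(odds):
    """Inverse map: expand each odd term as 1 + 2 + ... + 2, then drop the first 1."""
    terms = []
    for part in odds:
        terms.append(1)
        terms.extend([2] * ((part - 1) // 2))
    return terms[1:]

print(ones_twos_to_odds([2, 1, 1, 2, 2, 1]))
print(odds_to_ones_twos([1, 3, 5, 7]))
```

The two calls reproduce the worked examples for $9 \mapsto 10$ and $16 \mapsto 15$, and composing the maps in either order returns the original list.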

in triplicate

2022-06-04T12:31:00.000-07:00

Academic grades, as conventionally understood and used, have three distinct audiences, and they serve a different function for each audience. Grades are problematic largely due to the ways these functions are at odds with each other.

The three audiences are:

The student. In the context of a single course, assessment and feedback are necessary parts of the educational process. They let the student know how they’re doing as they go along. Traditional letter grades are a crude form of feedback, and where they are used students rarely have a chance to meaningfully respond, other than to try to adapt for the next assessment. Alternative grading systems often treat the entire course duration as a training period, in which the student can respond to feedback through revision, reassessment, or other forms of demonstrated proficiency following an initial evaluation.
The institution. Let’s be as generous as possible and assume that the primary goal of the college or university in which a student is enrolled is to educate that student in such a way that they reach their fullest potential. This requires communication among the numerous instructors that student will encounter throughout their studies, and also coordination with the other offices and institutional bodies that are present to support the student. A grade can be a succinct summary to the next instructor in a sequence about how developed the student’s prerequisite skills are. It can also be a signal to the institution about how well the student is progressing in their overall academic career. (This is the reason for “mid-semester” grades that can be used to alert the school when a student needs assistance or intervention.)
The outside world. Outside of the institution (and inside as well, to an extent), grades become currency that the student can trade for prestige or opportunities. This currency may be in the form of a GPA (to which all of the individual course grades contribute, despite being largely incommensurable with each other) or, for a more granular view, a transcript (which provides the opportunity to craft a narrative about the grades). Of course, not all persons or organizations outside academia value this currency in the same way. But “good grades” provide an inexhaustible supply of recommendations, and “bad grades” are a perpetual obstacle to be minimized or navigated around.

Thus grades are expected to operate at three different social scales, and also at three different time scales. The feedback to a student within a course is short-term, the communication with the institution is medium-term, and the message to the outside world is long-term (or as they say, permanent). Much of the “objectivity theater” surrounding the assignment of grades is based on the pretence that these three purposes can be fulfilled by a single summative object.

The fact that the third use of grades has both the largest audience and the longest-lasting effects means that it becomes their dominant purpose, their telos. Student anxiety about grades, for the most part, is caused not by an intrinsic dislike of getting feedback about how they can improve their understanding and performance in a subject, but by the belief that in the end what has the greatest practical impact is the final letter or number they can show to the outside world when the course is done. Institutional concerns about “rigor” are based not on the needs of the student, but the needs of the school to present the final scores to the outside world as meaningful indicators of their students’ quality.

Those of us engaged in the practice of developing and implementing alternative grading systems are primarily focused on the first and smallest-scale purpose of grades, providing a useful feedback process to the student. Yet our systems must also interface with the institutional and outside world audiences. At those interfaces lie, in my view, the most difficult ethical questions of grading: how do we support students beyond our time as their instructor? how do we provide an honest evaluation that meets the needs of all three audiences? to whom are we primarily responsible? is the merging of the three functions into a single metric flawed in such a way that it needs to be overthrown?

Honestly, I think (at the time of this writing) that grades are most useful at the institutional level. If it were not for the outward-facing use of grades, they could serve as a quick, qualitative (not quantitative) shorthand for communicating the needs or successes of an individual student among the internal parts of a college or university. (To be supplemented by more personalized detail as necessary.) Within a course, as we’ve seen, any number of assessment/feedback systems can work, as long as they’re built on clear communication and trust in the student-faculty relationship. As for the presentation of grades to the outside world, well, that’s where the dirty work happens.

more on poetry and partitions (sort of)

2022-05-23T16:41:00.005-07:00

In my previous post I defined the ordered Virahanka numbers $V_{m,n}$ and the unordered Virahanka numbers $U_{m,n}$ recursively by \[ \begin{array}{rl} V_{m,n} = U_{m,n} = 0 &\qquad \text{for $n < 0$ or $m < 0$} \\ V_{m,0} = U_{m,0} = 1 &\qquad \text{for $m \ge 0$} \\ V_{m,n} = \displaystyle\sum_{k=1}^m V_{m,n-k} &\qquad \text{for $n > 0$} \\ U_{m,n} = \displaystyle\sum_{k=1}^m U_{k,n-k} &\qquad \text{for $n > 0$} \\ \end{array} \] The value of $V_{m,n}$ counts the number of compositions of $n$ with no term greater than $m$, and the value of $U_{m,n}$ counts the number of partitions of $n$ with no part greater than $m$. (Following a standard convention, I consider the “empty sum” having no terms to be equal to $0$, which is why $V_{m,0} = U_{m,0} = 1$; moreover, $V_{0,n} = U_{0,n} = 0$ for $n > 0$, because the recursive formulas above become empty sums.) Some special cases:

$V_{2,n}$ is the sequence of Virahanka–Fibonacci numbers $1,1,2,3,5,8,\dots$.
$V_{n,n}$ is the total number of compositions of $n$ (which equals $2^{n-1}$ for $n \ge 1$).
$U_{n,n}$ is the total number of partitions of $n$ (for which no simple closed-form expression is known).

For later reference, here are partial tables of values for $V_{m,n}$ and $U_{m,n}$. \[ V_{m,n}: \begin{array}{c|c|c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 \\ \hline 3 & 1 & 1 & 2 & 4 & 7 & 13 & 24 & 44 & 81 & 149 & 274 \\ \hline 4 & 1 & 1 & 2 & 4 & 8 & 15 & 29 & 56 & 108 & 208 & 401 \\ \hline 5 & 1 & 1 & 2 & 4 & 8 & 16 & 31 & 61 & 120 & 236 & 464 \\ \hline 6 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 125 & 248 & 492 \\ \hline 7 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 253 & 504 \\ \hline 8 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 & 509 \\ \hline 9 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 511 \\ \hline 10 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 512 \end{array} \] \[ U_{m,n}: \begin{array}{c|c|c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 & 5 & 6 \\ \hline 3 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 & 10 & 12 & 14 \\ \hline 4 & 1 & 1 & 2 & 3 & 5 & 6 & 9 & 11 & 15 & 18 & 23 \\ \hline 5 & 1 & 1 & 2 & 3 & 5 & 7 & 10 & 13 & 18 & 23 & 30 \\ \hline 6 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 14 & 20 & 26 & 35 \\ \hline 7 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 21 & 28 & 38 \\ \hline 8 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 29 & 40 \\ \hline 9 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 41 \\ \hline 10 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 42 \end{array} \]
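Both tables follow mechanically from the recursive definitions. Here is a minimal memoized Python sketch that transcribes the four defining cases directly; it is an illustration of the recurrences rather than an efficient way to compute large values.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def V(m, n):
    """Ordered Virahanka number: compositions of n with no term greater than m."""
    if m < 0 or n < 0:
        return 0
    if n == 0:
        return 1  # the empty sum
    return sum(V(m, n - k) for k in range(1, m + 1))

@lru_cache(maxsize=None)
def U(m, n):
    """Unordered Virahanka number: partitions of n with no part greater than m."""
    if m < 0 or n < 0:
        return 0
    if n == 0:
        return 1  # the empty sum
    return sum(U(k, n - k) for k in range(1, m + 1))

for m in range(11):
    print(m, [V(m, n) for n in range(11)], [U(m, n) for n in range(11)])
```

In the recurrence for `U`, the index `k` plays the role of the largest part, which is why the first argument drops to `k` in the recursive call.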

There are so many fun ways to play with these collections of numbers, it’s hard to know when to stop! To limit the scope of this post, I will only consider two related notions, and those only in part: first differences, and generating functions.

First differences

Given a collection of numbers $S_{m,n}$ indexed by integers $m$ and $n$, and a pair $(p,q) \in \mathbb{Z}^2$, let \[ S_{m,n}^{(p,q)} = S_{m,n} - S_{m-p,n-q}\text. \]

From the definitions, we have for $n > 1$ \[ \begin{array}{rl} V_{m,n}^{(0,1)} &= V_{m,n} - V_{m,n-1} \\ &= \sum_{k=1}^m V_{m,n-k} - \sum_{k=1}^m V_{m,(n-1)-k} \\ &= \sum_{k=1}^m \big(V_{m,n-k} - V_{m,(n-k)-1}\big) \\ &= \sum_{k=1}^m V_{m,n-k}^{(0,1)} \end{array} \] and so the first differences $V_{m,n}^{(0,1)}$ satisfy the same recurrence relation as the original numbers $V_{m,n}$ when $n\ge 2$. Here is the table of values for $V_{m,n}^{(0,1)}$. \[ V_{m,n}^{(0,1)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 2 & 1 & 0 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 \\ \hline 3 & 1 & 0 & 1 & 2 & 3 & 6 & 11 & 20 & 37 & 68 & 125 \\ \hline 4 & 1 & 0 & 1 & 2 & 4 & 7 & 14 & 27 & 52 & 100 & 193 \\ \hline 5 & 1 & 0 & 1 & 2 & 4 & 8 & 15 & 30 & 59 & 116 & 228 \\ \hline 6 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 31 & 62 & 123 & 244 \\ \hline 7 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 126 & 251 \\ \hline 8 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 254 \\ \hline 9 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 \\ \hline 10 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 \end{array} \] We see that the “initial conditions” of the recurrence $V_{m,n}^{(0,1)} = \sum_{k=1}^m V_{m,n-k}^{(0,1)}$ have shifted, so that before beginning to apply the recurrence relation, we insert one term equal to $0$. But what are these numbers counting? Well, $V_{m,n}$ is the number of compositions of $n$ with no term greater than $m$, and $V_{m,n-1}$ is the number of compositions of $n-1$ with no term greater than $m$. Any composition of $n-1$ can be converted to a composition of $n$ by adding a $1$ at the end. So $V_{m,n}^{(0,1)}$ equals the number of compositions of $n$ that do not end with $1$. Perhaps not the most interesting quantity to count, although having the recurrence formula is nice. 
(The same principle will lead to other interesting quantities, so don’t go away yet!)

On the other hand, $V_{m,n}^{(1,0)} = V_{m,n} - V_{m-1,n}$ does have an interesting interpretation. Because $V_{m-1,n}$ counts all compositions of $n$ whose largest term is at most $m-1$, $V_{m,n}^{(1,0)}$ equals the number of compositions of $n$ whose largest term is exactly $m$. Here’s a partial table of values. \[ V_{m,n}^{(1,0)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 0 & 0 & 1 & 2 & 4 & 7 & 12 & 20 & 33 & 54 & 88 \\ \hline 3 & 0 & 0 & 0 & 1 & 2 & 5 & 11 & 23 & 47 & 94 & 185 \\ \hline 4 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 27 & 59 & 127 \\ \hline 5 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 28 & 63 \\ \hline 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 28 \\ \hline 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 \\ \hline 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 \\ \hline 9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 \\ \hline 10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array} \] One interesting feature of this table is that the entries stabilize below the line $n = 2m$: when $n < 2m$, we see that $V_{m+1,n+1}^{(1,0)} = V_{m,n}^{(1,0)}$. The sequence $V_{m,2m-1}^{(1,0)}$ begins (starting with $m = 1$) \[ 1,\;2,\;5,\;12,\;28,\;64,\;144,\;320,\;704,\;\dots \] The OEIS says that this sequence comes from counting all the $1$s that appear in all compositions of $m$. Why should that sequence arise here? I guess that’s a mystery that will have to wait for later…
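Both the "largest term is exactly $m$" interpretation and the OEIS observation can be checked by brute force for small $m$. The following is only a minimal Python sketch, a numerical sanity check rather than an explanation of the coincidence; the helper names `compositions`, `largest_exactly`, and `ones_in_all` are made up for this illustration.

```python
def compositions(n):
    """Yield every composition of n (ordered sum of positive integers) as a tuple."""
    if n == 0:
        yield ()  # the empty composition
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def largest_exactly(m, n):
    """Count compositions of n whose largest term is exactly m."""
    return sum(1 for c in compositions(n) if max(c, default=0) == m)


def ones_in_all(m):
    """Count the 1s appearing across all compositions of m."""
    return sum(c.count(1) for c in compositions(m))


if __name__ == "__main__":
    # Compare V^{(1,0)}_{m,2m-1} with the number of 1s in all compositions of m.
    for m in range(1, 7):
        print(m, largest_exactly(m, 2 * m - 1), ones_in_all(m))
```

Since there are $2^{n-1}$ compositions of $n$, this enumeration is only practical for small values, but that is enough to compare against the table.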

Turning to first differences of the unordered Virahanka numbers, let’s start with $U_{m,n}^{(1,0)}$. \[ U_{m,n}^{(1,0)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 0 & 0 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 \\ \hline 3 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 \\ \hline 4 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 & 6 & 9 \\ \hline 5 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 & 7 \\ \hline 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 \\ \hline 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 \\ \hline 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 \\ \hline 9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ \hline 10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array} \] It looks like no new numbers have appeared in this table; each row has simply been shifted to the right, with the $m$th row having $m$ new $0$s at the start. Indeed, using the recurrence formula for $U_{m,n}$, we calculate that \[ \begin{array}{rl} U_{m,n}^{(1,0)} &= U_{m,n} - U_{m-1,n} \\ &= \sum_{k=1}^m U_{k,n-k} - \sum_{k=1}^{m-1} U_{k,n-k} \\ &= U_{m,n-m} \end{array} \] Why should this be true, from a counting perspective? Reasoning as we did with $V_{m,n}^{(1,0)}$, we see that $U_{m,n}^{(1,0)}$ counts the number of partitions of $n$ whose largest part is exactly $m$. But if the largest part is exactly $m$, then it only remains to partition $n-m$ using parts no greater than $m$, which is exactly what $U_{m,n-m}$ counts! So $U_{m,n}^{(1,0)} = U_{m,n-m}$.

Ok, now for taking first differences of $U_{m,n}$ within rows. \[ U_{m,n}^{(0,1)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 2 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \hline 3 & 1 & 0 & 1 & 1 & 1 & 1 & 2 & 1 & 2 & 2 & 2 \\ \hline 4 & 1 & 0 & 1 & 1 & 2 & 1 & 3 & 2 & 4 & 3 & 5 \\ \hline 5 & 1 & 0 & 1 & 1 & 2 & 2 & 3 & 3 & 5 & 5 & 7 \\ \hline 6 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 3 & 6 & 6 & 9 \\ \hline 7 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 6 & 7 & 10 \\ \hline 8 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 7 & 11 \\ \hline 9 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 8 & 11 \\ \hline 10 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 8 & 12 \end{array} \] Hmm, that’s curious: this is the first time we’ve seen any cases (outside of the $0$th row or the $1$st column) where the entries occasionally decrease moving left to right. What exactly are we seeing? Well, any partition of $n-1$ can be converted to a partition of $n$ by adjoining a $1$, so the difference between $U_{m,n}$ and $U_{m,n-1}$ is the number of partitions of $n$ that do not include $1$ as a part and whose largest part is at most $m$. So,

In row $2$, we are only allowing partitions using $2$, so every even number has one partition, and every odd number has zero.
In row $3$, we are only allowing partitions using $2$ and $3$. By hand, we find that $2$, $3$, $4$, and $5$ have one partition each, but $6$ has two: $6 = 2 + 2 + 2 = 3 + 3$. However, $7$ only has one: $7 = 3 + 2 + 2$. From this point, each time we add $6$, we increase the number of possible partitions by $1$, so $U_{3,n+6}^{(0,1)} = U_{3,n}^{(0,1)}+1$.
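The row $3$ pattern is easy to test numerically. Here is a small Python sketch, assuming we count solutions of $n = 2a + 3b$ in nonnegative integers directly; the function name `partitions_into_2s_and_3s` is just an illustrative choice.

```python
def partitions_into_2s_and_3s(n):
    """Count the ways to write n = 2a + 3b with integers a, b >= 0."""
    # For each number b of 3s, the remainder must be an even number of beats.
    return sum(1 for b in range(n // 3 + 1) if (n - 3 * b) % 2 == 0)


if __name__ == "__main__":
    print([partitions_into_2s_and_3s(n) for n in range(11)])
    # Adding 6 to n should add exactly one partition.
    print(all(
        partitions_into_2s_and_3s(n + 6) == partitions_into_2s_and_3s(n) + 1
        for n in range(100)
    ))
```

The first printed list can be compared with row $3$ of the table of $U_{m,n}^{(0,1)}$.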

As the rows go on, the jumps down become harder to predict, in my view. But we could plug them into the OEIS and see what comes up! Here’s what row 4 yields.

Evidently, there is more fun to be had, looking at first differences $V_{m,n}^{(p,q)}$ and $U_{m,n}^{(p,q)}$ for other values of $(p,q)$, second differences, and so on. But for now let’s turn to a new game.

Generating functions

Given a sequence of numbers $\{a_n\}$, the generating function of the sequence is the formal power series \[ g(x) = \sum a_n x^n\text. \] I have not included a starting or ending index in order to allow for whatever values of $n$ make sense. In cases where the power series converges to an analytic function on some neighborhood of $0$, this function is also called the generating function of the sequence, and the terms of the sequence are the Taylor coefficients of the function.

The Virahanka–Fibonacci numbers $V_{2,n}$ produce the generating function \[ g_2(x) = \sum_{n=0}^\infty V_{2,n}x^n\text. \] Because $V_{2,0} = 1$, $V_{2,n} = 0$ for $n < 0$, and $V_{2,n} = V_{2,n-1} + V_{2,n-2}$ for $n \ge 1$, we can rewrite the power series as \[ \begin{array}{rl} g_2(x) &= \displaystyle 1 + \sum_{n=1}^\infty (V_{2,n-1} + V_{2,n-2}) x^n \\ &= \displaystyle 1 + \sum_{n=1}^\infty V_{2,n-1} x^n + \sum_{n=1}^\infty V_{2,n-2} x^n \\ &= \displaystyle 1 + x\sum_{n=1}^\infty V_{2,n-1} x^{n-1} + x^2\sum_{n=1}^\infty V_{2,n-2} x^{n-2} \\ &= \displaystyle 1 + x\sum_{n=0}^\infty V_{2,n} x^n + x^2\sum_{n=0}^\infty V_{2,n} x^n \\ &= 1 + xg_2(x) + x^2g_2(x) \end{array} \] and therefore $g_2(x) = 1 + xg_2(x) + x^2g_2(x)$. Solving this equation, we find \[ g_2(x) = \frac{1}{1 - x - x^2}\text. \] By a similar process, if we set \[ g_m(x) = \sum_{n=0}^\infty V_{m,n}x^n \] then using $V_{m,0} = 1$ and $V_{m,n} = V_{m,n-1} + \cdots + V_{m,n-m}$ for $n > 0$, we get \[ g_m(x) = \frac{1}{1 - x - \cdots - x^m}\text. \] As $m \to \infty$, the series $x + x^2 + \dots + x^m$ that is subtracted in the denominator converges to $x/(1-x)$, so the generating functions converge to \[ g_{\infty}(x) = \frac{1}{1 - x/(1 - x)} = \frac{1 - x}{1 - 2x}\text. \] This is precisely equivalent to \[ g_{\infty}(x) = 1 + \sum_{n=1}^\infty 2^{n-1} x^n \] which is the generating function of the sequence $V_{n,n}$ to which the terms of $V_{m,n}$ stabilize as $m$ grows.
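To see the formula for $g_m(x)$ in action, one can expand the reciprocal of the denominator as a power series and compare the coefficients with the table of $V_{m,n}$. This is a minimal Python sketch under the assumption that we only want finitely many coefficients; `reciprocal_series` and `g` are hypothetical helper names.

```python
def reciprocal_series(denom, terms):
    """Coefficients of 1/denom(x) up to x^(terms-1).

    denom is a list of polynomial coefficients with denom[0] == 1.
    """
    out = []
    for n in range(terms):
        # The coefficient of x^n in denom(x) * out(x) must be 1 if n == 0, else 0.
        acc = 1 if n == 0 else 0
        for k in range(1, min(n, len(denom) - 1) + 1):
            acc -= denom[k] * out[n - k]
        out.append(acc)
    return out


def g(m, terms):
    """First coefficients of 1 / (1 - x - x^2 - ... - x^m)."""
    return reciprocal_series([1] + [-1] * m, terms)


if __name__ == "__main__":
    print(g(2, 11))   # compare with row 2 of the V table
    print(g(3, 11))   # compare with row 3
    print(g(20, 8))   # m larger than n: powers of 2, as for g_infinity
```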

Now let’s find the generating functions of the $U_{m,n}$ sequences. For each $m \ge 0$, set \[ f_m(x) = \sum_{n=0}^\infty U_{m,n}x^n \] By inspection, we have \[ f_0(x) = 1 \qquad\text{and}\qquad f_1(x) = 1 + x + x^2 + \cdots = \frac{1}{1 - x}\text. \] (These are the same as $g_0(x)$ and $g_1(x)$.) We know that $U_{2,n} = U_{1,n-1} + U_{2,n-2}$ for $n \ge 1$, and so \[ \begin{array}{rl} f_2(x) &= \displaystyle 1 + \sum_{n=1}^\infty (U_{1,n-1} + U_{2,n-2}) x^n \\ &= \displaystyle 1 + x\sum_{n=1}^\infty U_{1,n-1}x^{n-1} + x^2\sum_{n=1}^\infty U_{2,n-2} x^{n-2} \\ &= \displaystyle 1 + x\sum_{n=0}^\infty U_{1,n}x^n + x^2\sum_{n=0}^\infty U_{2,n} x^n \\ &= 1 + xf_1(x) + x^2f_2(x) \end{array} \] This implies $(1 - x^2) f_2(x) = 1 + xf_1(x)$, or \[ f_2(x) = \left(1 + \frac{x}{1 - x} \right) \frac{1}{1 - x^2} = \frac{1 - x + x}{1 - x} \frac{1}{1 - x^2} = \frac{1}{(1 - x)(1 - x^2)}\text. \] Proceeding inductively, a similar process shows that \[ f_m(x) = 1 + xf_1(x) + x^2f_2(x) + \cdots + x^mf_m(x) \] and therefore \[ f_m(x) = \frac{1}{(1-x)(1-x^2)\cdots(1-x^m)} \] for all $m \ge 2$. On compact subsets of $(-1,1)$, $x^m$ converges uniformly to $0$, and so $f_m$ converges uniformly on compact subsets of $(-1,1)$ to \[ f_\infty(x) = \prod_{k=1}^\infty \frac{1}{1 - x^k} \] which is the generating function of $U_{n,n}$, the sequence that counts all partitions of $n$.
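The product formula for $f_m(x)$ can be checked the same way. The sketch below (Python, with the made-up name `f`) multiplies by one factor $1/(1 - x^k)$ at a time, using the fact that multiplying a series by $1/(1 - x^k)$ replaces each coefficient $c_n$ by $c_n + c_{n-k} + c_{n-2k} + \cdots$.

```python
def f(m, terms):
    """First coefficients of 1 / ((1 - x)(1 - x^2)...(1 - x^m))."""
    coeffs = [1] + [0] * (terms - 1)  # start from the constant series 1
    for k in range(1, m + 1):
        # Multiply in place by 1/(1 - x^k): new[n] = old[n] + new[n - k].
        for n in range(k, terms):
            coeffs[n] += coeffs[n - k]
    return coeffs


if __name__ == "__main__":
    print(f(3, 11))    # compare with row 3 of the U table
    print(f(10, 11))   # partition numbers through n = 10
```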

Now that we have the generating functions $g_m(x)$ and $f_m(x)$ in hand, we can have more fun! Note that \[ x\sum a_n x^n = \sum a_n x^{n+1} = \sum a_{n-1} x^n \] and so multiplying a generating function by $x$ shifts all the terms of the sequence to the right by one position. (We have already used this fact above; it’s just worth making the general principle explicit.) This allows us to find the generating functions of the first differences calculated in the previous section. We’ll consider these in the reverse order from what we did previously.

For $U_{m,n}^{(1,0)}$, the generating function is \[ \begin{array}{rl} f_m(x) - f_{m-1}(x) &= \dfrac{1}{(1 - x)\cdots(1 - x^m)} - \dfrac{1}{(1 - x)\cdots(1 - x^{m-1})} \\ &= \dfrac{x^m}{(1 - x)\cdots(1 - x^m)} = x^mf_m(x) \end{array} \] Hey, that’s great! This matches what we just saw: the coefficients of $f_m(x) - f_{m-1}(x)$ are the same as those of $f_m(x)$, but shifted $m$ places to the right.

For $U_{m,n}^{(0,1)}$, the generating function is \[ \begin{array}{rl} f_m(x) - xf_m(x) &= \dfrac{1}{(1 - x)\cdots(1 - x^m)} - \dfrac{x}{(1 - x)\cdots(1 - x^m)} \\ &= \dfrac{1 - x}{(1 - x)\cdots(1 - x^m)} = \dfrac{1}{(1 - x^2)\cdots(1 - x^m)} \end{array} \] Oh, neat—we wanted a function that counts the number of partitions that don’t contain $1$ as a part, and the factor $1-x$ vanished! This suggests that if we want to count partitions that only allow $a_1,\dots,a_N$ then perhaps we should use the generating function $[(1-x^{a_1})\cdots(1-x^{a_N})]^{-1}$? Something to explore further…

Now for $V_{m,n}^{(1,0)}$, the generating function is \[ \begin{array}{rl} g_m(x) - g_{m-1}(x) &= \dfrac{1}{1 - x - \cdots - x^m} - \dfrac{1}{1 - x - \cdots - x^{m-1}} \\ &= \dfrac{x^m}{(1 - x - \cdots - x^m)(1 - x - \cdots - x^{m-1})} \\ &= x^m g_m(x) g_{m-1}(x)\text. \end{array} \] Hmm, this definitely tells us something interesting. Multiplying out the power series for $g_m$ and $g_{m-1}$, and equating coefficients, we find \[ V_{m,n}^{(1,0)} = \sum_{k=0}^{n-m} V_{m,k} V_{m-1,n-m-k}\text. \] Recall from our interpretation of $V_{m,n}^{(1,0)}$ that it counts the number of compositions of $n$ whose largest term is exactly equal to $m$. Indeed, since we know the largest term is $m$, we can sort the compositions by the last term that equals $m$. The sum of the terms before this will be equal to some $k \le n - m$, for which there are $V_{m,k}$ compositions with no terms greater than $m$. After the last term equal to $m$, the remaining terms will sum to $n - k - m$, for which there are $V_{m-1,n-k-m}$ compositions whose terms are all less than $m$. We thus could have found this equality earlier in our study, but the generating function suggested it immediately! (I had hoped that looking at the generating functions would also help explain why $V^{(1,0)}_{m,2m-1}$ equals the number of $1$s in all compositions of $m$, as observed above, but I haven’t quite got there yet… possible update to come!)
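The convolution identity for $V_{m,n}^{(1,0)}$ can be spot-checked numerically. The following Python sketch builds rows of $V_{m,n}$ from the recurrence and compares both sides; `V_row` and `convolution_holds` are illustrative names, and a finite check like this is of course not a substitute for the argument above.

```python
def V_row(m, terms):
    """Return [V_{m,0}, ..., V_{m,terms-1}] using the recurrence."""
    row = []
    for n in range(terms):
        if n == 0:
            row.append(1)  # the empty composition
        else:
            row.append(sum(row[n - k] for k in range(1, m + 1) if n - k >= 0))
    return row


def convolution_holds(m, terms):
    """Check V_{m,n} - V_{m-1,n} == sum_k V_{m,k} V_{m-1,n-m-k} for n < terms."""
    upper, lower = V_row(m, terms), V_row(m - 1, terms)
    for n in range(terms):
        lhs = upper[n] - lower[n]
        rhs = sum(upper[k] * lower[n - m - k] for k in range(n - m + 1))
        if lhs != rhs:
            return False
    return True


if __name__ == "__main__":
    print(all(convolution_holds(m, 15) for m in range(1, 8)))
```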

For $V_{m,n}^{(0,1)}$, the generating function is \[ \begin{array}{rl} g_m(x) - xg_m(x) &= \dfrac{1}{1 - x - \cdots - x^m} - \dfrac{x}{1 - x - \cdots - x^m} \\ &= \dfrac{1 - x}{1 - x - \cdots - x^m} \end{array} \] Oh, so changing the initial conditions of the sequence just corresponds to changing the numerator of the generating function. That’s neat! (Indeed, the Lucas numbers $L_n$ famously satisfy the same recursive formula as the Virahanka–Fibonacci numbers, $L_n = L_{n-1} + L_{n-2}$, and their generating function is $(2-x)/(1-x-x^2) = g_2(x) + (g_2(x)-xg_2(x))$.)

I can’t help doing one more thing with these generating functions, especially since we haven’t seen much interaction between the numbers $V_{m,n}$ and $U_{m,n}$. Let’s try dividing their generating functions. We know that $V_{m,n}$ is at least as large as $U_{m,n}$, so we’ll divide $g_m$ by $f_m$: \[ \frac{g_m}{f_m} = \frac{(1-x)\cdots(1-x^m)}{1-x-\cdots-x^m}\text. \] What could the coefficients of this generating function represent? Loosely speaking, they carry the information about the number of compositions (with no part greater than $m$) that is not contained in the number of partitions. That’s fairly vague, so let’s consider the case $m = 2$. We calculate \[ \frac{g_2}{f_2} = \frac{(1-x)(1-x^2)}{1-x-x^2} = 1 + \frac{x^3}{1-x-x^2} = 1 + x^3 g_2 \] which gives us the factorization $g_2 = f_2\big(1 + x^3 g_2\big)$, or $g_2 = f_2 + x^3 f_2 g_2$. Equating coefficients of the power series yields the relation \[ V_{2,n} = U_{2,n} + \sum_{k=0}^{n-3} U_{2,k} V_{2,n-3-k}\text. \] This equation has a sensible meaning. Recall that a partition of $n$ can be thought of as a composition (a sum of positive integers summing to $n$) in which the terms are non-increasing. (For example, $2 + 2 + 1 + 1 + 1$ is a partition of $7$, but $2 + 1 + 1 + 2 + 1$ is not.) The right-hand side of the last equation splits the number of compositions into the number of partitions $U_{2,n}$ and the number of non-partitions. If we are only allowing terms equal to $1$ or $2$, then a composition that is not a partition must have a $1$ followed by a $2$ somewhere in its expression. In poetic terms, we can think of $1$ and $2$ as short and long syllables, and $n$ as the number of beats in a line. We sort compositions by where the first appearance of $1 + 2$ occurs. Everything before that in the sum is non-increasing; say this takes $k$ beats (this is the $U_{2,k}$ factor in the sum on the right). 
Then we have $1 + 2$, which accounts for $3$ more beats, and finally the remaining $n-3-k$ beats can be subdivided in any manner using short and long syllables (which is what the $V_{2,n-3-k}$ factor counts).
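As a quick sanity check of the relation between $V_{2,n}$ and $U_{2,n}$, here is a short Python sketch. It assumes the closed form $U_{2,n} = 1 + \lfloor n/2 \rfloor$ (the number of $2$s can be anything from $0$ to $\lfloor n/2 \rfloor$); the names `fib_row`, `U2`, and `rhs` are hypothetical.

```python
def fib_row(terms):
    """Return [V_{2,0}, ..., V_{2,terms-1}] = 1, 1, 2, 3, 5, ..."""
    row = [1, 1]
    while len(row) < terms:
        row.append(row[-1] + row[-2])
    return row[:terms]


def U2(n):
    """Number of partitions of n into parts of size 1 and 2."""
    return 1 + n // 2


def rhs(n, V2):
    """U_{2,n} plus the sum over k = 0, ..., n-3 of U_{2,k} * V_{2,n-3-k}."""
    return U2(n) + sum(U2(k) * V2[n - 3 - k] for k in range(n - 2))


if __name__ == "__main__":
    V2 = fib_row(30)
    print(all(V2[n] == rhs(n, V2) for n in range(30)))
```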

The sequence of coefficients for $g_4(x)/f_4(x)$ begins \[ 1,\;0,\;0,\;1,\;2,\;5,\;8,\;16,\;30,\;58,\;113,\;217,\;418,\;\dots\text. \] As of this writing (May 2022) this sequence does not appear in the OEIS (should I submit it?). However, the sequence of coefficients for \[ \frac{g_\infty}{f_\infty} = 1 + x^3 + 2x^4 + 5x^5 + 9x^6 + 19x^7 + 37x^8 + 74x^9 + 148x^{10} + \cdots \] does appear, as A178841, which counts “the number of pure inverting compositions of $n$.” These seem to be related to a basis for the algebra of quasisymmetric functions as a module over the symmetric functions. So there’s something deep and important going on with these quotients!
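These coefficient lists can be reproduced with a few lines of series arithmetic. The Python sketch below (the name `quotient_coeffs` is made up) first expands the numerator $(1-x)\cdots(1-x^m)$, truncated to the requested number of terms, and then divides by $1 - x - \cdots - x^m$. Taking $m$ at least as large as the highest power requested gives the coefficients of $g_\infty/f_\infty$ up to that power.

```python
def quotient_coeffs(m, terms):
    """First coefficients of (1-x)(1-x^2)...(1-x^m) / (1 - x - ... - x^m)."""
    # Numerator: multiply by (1 - x^k) in place, highest index first so that
    # num[n - k] still holds the old value when it is used.
    num = [1] + [0] * (terms - 1)
    for k in range(1, m + 1):
        for n in range(terms - 1, k - 1, -1):
            num[n] -= num[n - k]
    # Division: out[n] - out[n-1] - ... - out[n-m] = num[n].
    out = []
    for n in range(terms):
        out.append(num[n] + sum(out[n - k] for k in range(1, min(m, n) + 1)))
    return out


if __name__ == "__main__":
    print(quotient_coeffs(2, 8))    # coefficients of 1 + x^3 g_2
    print(quotient_coeffs(4, 13))
    print(quotient_coeffs(10, 11))  # agrees with g_infinity/f_infinity through x^10
```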

In case it isn’t obvious from this post, I’m still learning about generating functions, and I’m kind of in awe of them. It’s entirely possible that everything in this post appears among the exercises of some book on combinatorics. (Certainly most of them are mild generalizations of facts that can be found in the OEIS.) One of my summer activities may be to finally get through Herbert Wilf’s book generatingfunctionology (the second edition is available for free download at the link).

on poetry and partitions

2022-05-18T12:10:00.005-07:00

Last fall, I learned the names of Pingala and Hemachandra from a blog post by James Propp. Both have been described as poets and mathematicians, and both produced works related to what is commonly known as the Fibonacci sequence \[ 1,\;1,\;2,\;3,\;5,\;8,\;13,\;21,\;34,\;55,\;89,\;144,\;\dots\text. \] Pingala lived around the 3rd century BCE, and Hemachandra lived in the 12th century CE. (The book Liber abaci, which contains the rabbit problem that associated Fibonacci’s name with the above sequence along with other, more useful ideas, was produced later, in 1202.) In between them came Virahanka, sometime between the 6th and 8th centuries CE; following Manjul Bhargava, it is Virahanka’s name I will emphasize in this post.

In Sanskrit poetry, each syllable is either short or long, lasting one or two beats. Some forms of poetic meter have a fixed number of syllables in each line, and some have a fixed number of beats in each line. In the latter case, a natural question (asked at least as early as Pingala) is: how many ways can a line of $n$ beats be metrically divided into short and long syllables? For example, a line with three beats could be arranged “short, short, short” or “long, short” or “short, long”. This corresponds to the equalities $3 = 1+1+1 = 2+1 = 1+2$.

Virahanka seems to have been the first to provide a complete answer, stating that each line of $n$ beats must end with either a short syllable or a long syllable; therefore, the number of possibilities equals the sum of the numbers of possibilities for a line with $n-1$ beats and a line with $n-2$ beats. Because a line with one beat can be divided in only one way, and a line with two beats can be divided in two ways (“short, short” or “long”), the resulting sequence is the same as Fibonacci’s rabbit-counting sequence.¹ (The additional initial $1$ in the sequence above will be accounted for momentarily.)

Not only does poetic meter provide a more natural setting for this counting problem than rabbits, it is more reasonably generalized. Suppose, for instance, that instead of just two lengths of syllables, we had a language with three lengths of syllables: one beat, two beats, and three beats. Now how many metrical ways are there to compose a poetic line of $n$ beats? As before, we can work out the first few cases by hand: $1 = 1$, $2 = 1 + 1 = 2$, $3 = 1 + 1 + 1 = 2 + 1 = 1 + 2 = 3$. For larger numbers of beats, as before we can sort by the length of the final syllable ($1$, $2$, or $3$), and then count how many ways there are to complete the line, using our knowledge of previous line lengths. For example, a line with four beats can be divided in seven ways, four that end with a $1$-beat syllable, two that end with a $2$-beat syllable, and one that ends with a $3$-beat syllable: \[ \begin{array}{rl} 4 &= 1 + 1 + 1 + \boxed{1} = 2 + 1 + \boxed{1} = 1 + 2 + \boxed{1} = 3 + \boxed{1} \\ &= 1 + 1 + \boxed{2} = 2 + \boxed{2} \\ &= 1 + \boxed{3}\text. \end{array} \] In this sequence, each term is therefore the sum of the previous three terms: \[ 1,\;2,\;4,\;7,\;13,\;24,\;44,\;81,\;149,\;274,\;504,\;\dots\text. \] (Has anyone seen a biological application of this sequence, even an implausible one?)²

Generalizing further, suppose we have syllables of lengths $1$, $2$, …, $m$, and we wish to count the number of rhythmic ways to compose a line of $n$ beats. Or put another way, how many sums with terms between $1$ and $m$ add up to $n$?³ A sum of positive integers equalling $n$ is called a composition of $n$. Let $V_{m,n}$ be the number of compositions of $n$ whose largest term is no greater than $m$. I will call $V_{m,1},V_{m,2},V_{m,3},\dots$ the $m$th Virahanka sequence, so that $V_{2,n}$ is the usual $n$th Fibonacci number. By the same reasoning as in the case of $V_{2,n}$, we have \[ V_{m,n} = V_{m,n-1} + V_{m,n-2} + \cdots + V_{m,n-m}, \qquad n > m. \]

We can naturally extend $V_{m,n}$ to be defined when $m$ or $n$ equals $0$, or both. Using the convention that the “empty sum” having no terms is equal to $0$, we get $V_{m,0} = 1$ for all $m$, and $V_{0,n} = 0$ for $n \ge 1$. We can even set $V_{m,n} = 0$ when $n < 0$, which allows us to relax the $n > m$ restriction in the recursive formula above to $n \ge 1$.
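With these conventions the recurrence translates directly into code. Here is a minimal Python sketch, assuming memoized recursion is acceptable for the small values considered here; the function name `V` simply mirrors the notation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def V(m, n):
    """Number of compositions of n with no term greater than m."""
    if n < 0:
        return 0  # convention: nothing sums to a negative number
    if n == 0:
        return 1  # the empty sum is the only composition of 0
    # Sort compositions by the length k of the final syllable.
    return sum(V(m, n - k) for k in range(1, m + 1))


if __name__ == "__main__":
    for m in range(5):
        print(m, [V(m, n) for n in range(11)])
```

Note that for $m = 0$ the sum is empty, which gives $V_{0,n} = 0$ for $n \ge 1$ without any special case.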

Interesting patterns⁴ emerge when we arrange the numbers $V_{m,n}$ in a single table:

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 \\ \hline 3 & 1 & 1 & 2 & 4 & 7 & 13 & 24 & 44 & 81 & 149 & 274 \\ \hline 4 & 1 & 1 & 2 & 4 & 8 & 15 & 29 & 56 & 108 & 208 & 401 \\ \hline 5 & 1 & 1 & 2 & 4 & 8 & 16 & 31 & 61 & 120 & 236 & 464 \\ \hline 6 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 125 & 248 & 492 \\ \hline 7 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 253 & 504 \\ \hline 8 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 & 509 \\ \hline 9 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 511 \\ \hline 10 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 512 \end{array} \]

Looking along and below the diagonal, for instance, we see that $V_{m,n} = 2^{n-1}$ if $1 \le n \le m$, and also $V_{n-1,n} = 2^{n-1} - 1$. These values make sense if we imagine $n$ beats in a poetic line, which may be grouped up into syllables of lengths $1$, $2$, …, $m$. If $m \ge n$, then any grouping is allowed, and so we just need to decide at which of the $n-1$ spaces between beats to break the syllables, yielding $2^{n-1}$ possibilities. Likewise, if $m = n-1$, then the only grouping that is not allowed is to put all the beats into a single syllable. (These values also follow from the recursive equation for $V_{m,n}$, using the formula for summing geometric series: $1 + 2 + \cdots + 2^{n-1} = 2^n - 1$.)

What happens if we disregard the order of terms in the sum? Or said another way, what if each syllable in a line of poetry must be no longer than the previous one (that is, since order doesn’t matter, we always sort the syllables from largest to smallest)? These correspond to sums in which the terms are non-increasing. Let us call the number of such sums equalling $n$ with no terms larger than $m$ the unordered Virahanka number $U_{m,n}$. For example, \[ \begin{array}{ll} 6 &= 1 + 1 + 1 + 1 + 1 + 1 \\ &= 2 + 2 + 2 = 2 + 2 + 1 + 1 = 2 + 1 + 1 + 1 + 1 \\ &= 3 + 3 = 3 + 2 + 1 = 3 + 1 + 1 + 1 \end{array} \] and so $U_{3,6} = 7$. Here is the table of unordered Virahanka numbers:

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 & 5 & 6 \\ \hline 3 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 & 10 & 12 & 14 \\ \hline 4 & 1 & 1 & 2 & 3 & 5 & 6 & 9 & 11 & 15 & 18 & 23 \\ \hline 5 & 1 & 1 & 2 & 3 & 5 & 7 & 10 & 13 & 18 & 23 & 30 \\ \hline 6 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 14 & 20 & 26 & 35 \\ \hline 7 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 21 & 28 & 38 \\ \hline 8 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 29 & 40 \\ \hline 9 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 41 \\ \hline 10 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 42 \end{array} \]

Again the entries stabilize below the diagonal. The entries $U_{n,n}$ are generally called the number of partitions of $n$. Correspondingly, the remaining values of $U_{m,n}$ count the number of partitions of $n$ with no part greater than $m$, which is why $U_{m,n} = U_{n,n}$ if $m \ge n$. By considering partitions whose largest part is $k$, with $k = 1, \dots, m$, we get a recursive formula for these numbers, as well: \[ U_{m,n} = U_{1,n-1} + U_{2,n-2} + \cdots + U_{m,n-m}. \] Unlike the case of $V_{m,n}$, which can be calculated using only earlier terms in the $m$th (ordered) Virahanka sequence, determining $U_{m,n}$ recursively requires knowing entries from all previous rows.
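The recursion for $U_{m,n}$ can be written out in the same style. This is a small Python sketch, again assuming memoized recursion and the conventions $U_{m,0} = 1$ and $U_{m,n} = 0$ for $n < 0$; the name `U` just mirrors the notation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def U(m, n):
    """Number of partitions of n with no part greater than m."""
    if n < 0:
        return 0
    if n == 0:
        return 1  # the empty partition
    # Sort partitions by their largest part k; the rest is a partition
    # of n - k with no part greater than k.
    return sum(U(k, n - k) for k in range(1, m + 1))


if __name__ == "__main__":
    for m in range(5):
        print(m, [U(m, n) for n in range(11)])
```

The call `U(k, n - k)` reaches into row $k$ rather than row $m$, which is the dependence on earlier rows mentioned above.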

There is a nice visualization that expresses the relationship between the unordered Virahanka numbers $U_{m,n}$ and the ordered Virahanka numbers $V_{m,n}$. A partition is determined by the number of times each integer appears. So we can think of $U_{m,n}$ as counting the number of integer points $(i_1,\dots,i_m)$ such that $i_k \ge 0$ for all $1\le k\le m$ and $\sum\nolimits_{k=1}^m ki_k = n$. This last condition defines a hyperplane orthogonal to $(1,2,\dots,m)$.

If $m = 2$, for each $n$ we count the number of integer points on the line $x + 2y = n$ in the first quadrant (including the origin and the positive $x$- and $y$- axes).

From this image, you can see the sequence $1, 1, 2, 2, 3, 3, \dots$, or $1+\lfloor n/2 \rfloor$, emerge.

In the case $m = 3$, $U_{3,n}$ counts the number of integer points in the positive octant such that $x + 2y + 3z = n$. The image below shows the 10 points corresponding to partitions of $n = 8$.

Now we need to convert partitions to compositions. A partition in which $k$ appears $i_k$ times corresponds to the $m$-tuple $(i_1,\dots,i_m)$. A composition with the same number of appearances of each $k$ can be obtained by placing the terms of the partition in any order; the number of distinguishable ways of arranging these terms is given by the multinomial coefficient \[ \binom{i_1+\cdots+i_m}{i_1,\dots,i_m} = \frac{(i_1+\cdots+i_m)!}{i_1!\cdots i_m!}\text. \] For example, the partition $8 = 3 + 2 + 2 + 1$ can be arranged into $\binom{1+2+1}{1,2,1} = \frac{4!}{1!2!1!} = 12$ different compositions, first by choosing the position of the $1$, then the positions of the two $2$s, and then placing $3$ in the remaining spot: \[ \begin{array}{rl} 8 &= 3 + 2 + 2 + 1 = 2 + 3 + 2 + 1 = 2 + 2 + 3 + 1 \\ &= 3 + 2 + 1 + 2 = 2 + 3 + 1 + 2 = 2 + 2 + 1 + 3 \\ &= 3 + 1 + 2 + 2 = 2 + 1 + 3 + 2 = 2 + 1 + 2 + 3 \\ &= 1 + 3 + 2 + 2 = 1 + 2 + 3 + 2 = 1 + 2 + 2 + 3 \end{array} \] Thus if we think of each point $(i_1,\dots,i_m)$ as being weighted by this corresponding multinomial coefficient, we can write $V_{m,n}$ as a sum having $U_{m,n}$ terms, as follows: \[ V_{m,n} = \sum_{(i_1,\dots,i_m):\sum ki_k = n} \binom{i_1+\cdots+i_m}{i_1,\dots,i_m}\text. \] When $m = 2$, we have $i_1 + 2i_2 = n$, so $i_1 + i_2 = n - i_2$ and $i_1 = n - 2i_2$, and by setting $j = i_2$ we get the following expression of $V_{2,n}$ (remember, these are the usual Fibonacci numbers) as a sum of binomial coefficients: \[ V_{2,n} = \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n - j}{n - 2j} = \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n - j}{j}\text. \] The binomial coefficients are, of course, the entries in what is commonly known as Pascal’s triangle. 
But this also was known to Indian scholars, possibly even Pingala: a later commentator, Halayudha (10th century CE), explicitly arranged the same numbers into a triangle, under the name of Meru Prastaara⁵, and clarified Pingala’s references to them in relation to poetic meter.

When $m = 3$, the formula is a little more elaborate, so let’s first look at an example. We have seen that there are seven partitions of $6$ that have no part greater than $3$, corresponding to the triples $(i_1,i_2,i_3) = (6,0,0)$, $(4,1,0)$, $(2,2,0)$, $(0,3,0)$, $(3,0,1)$, $(1,1,1)$, and $(0,0,2)$. (Remember that these are the triples of nonnegative integers such that $i_1 + 2i_2 + 3i_3 = 6$.) The resulting sum in terms of multinomial coefficients is \[ \begin{array}{rl} V_{3,6} &= \displaystyle\binom{6}{6,0,0} + \binom{5}{4,1,0} + \binom{4}{2,2,0} + \binom{3}{0,3,0} + \binom{4}{3,0,1} + \binom{3}{1,1,1} + \binom{2}{0,0,2} \\ &= 1 + 5 + 6 + 1 + 4 + 6 + 1 = 24 \end{array} \] Now for the general sum⁶: setting $j = i_2$ and $k = i_3$, we have \[ V_{3,n} = \sum_{k=0}^{\lfloor n/3 \rfloor} \sum_{j=0}^{\lfloor (n-3k)/2 \rfloor} \binom{(n-2j-3k)+j+k}{n-2j-3k,j,k} = \sum_{k=0}^{\lfloor n/3 \rfloor} \sum_{j=0}^{\lfloor (n-3k)/2 \rfloor} \binom{n-j-2k}{n-2j-3k,j,k}\text. \]
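The weighted-point picture works for any $m$, and it is short to code. The Python sketch below enumerates the integer points $(i_1,\dots,i_m)$ with $\sum k i_k = n$ and adds up their multinomial weights; the names `exponent_tuples`, `multinomial`, and `V_from_points` are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
from math import factorial, prod


def exponent_tuples(m, n):
    """Yield all (i_1, ..., i_m) of nonnegative integers with sum k * i_k = n."""
    def rec(k, remaining):
        if k == 0:
            if remaining == 0:
                yield ()
            return
        # Choose how many times the part k appears, then handle parts below k.
        for i_k in range(remaining // k + 1):
            for rest in rec(k - 1, remaining - k * i_k):
                yield rest + (i_k,)
    yield from rec(m, n)


def multinomial(counts):
    """Number of distinguishable orderings of terms with these multiplicities."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def V_from_points(m, n):
    """Sum the multinomial weights over the points counted by U_{m,n}."""
    return sum(multinomial(t) for t in exponent_tuples(m, n))


if __name__ == "__main__":
    print(sorted(exponent_tuples(3, 6)))   # the seven triples for n = 6
    print(V_from_points(3, 6))
    print([V_from_points(2, n) for n in range(11)])
```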

I’m sure there is much more depth to be explored regarding these numbers, probably all of it known to combinatorialists, but I have to pause here! I hope to post a part 2 soon.

Footnotes:

^1. I first learned about a similar way of obtaining the Virahanka–Fibonacci sequence from the book Proofs that Really Count by Arthur Benjamin and Jennifer Quinn. In that text, the problem is to count the number of ways to tile an $n\times1$ grid using $1\times1$ and $2\times1$ pieces, called squares and dominoes. Then they use this visual presentation to prove several common identities involving terms of the sequence.↩
^2. Some sources, including the OEIS, call numbers in this sequence “Tribonacci numbers.” Cute, but nonsensical, given that the “fi-” prefix refers to the “son” of Bonacci, not to any counting procedure. However, the OEIS does attribute the coining of “tribonacci” to Mark Feinberg in 1963, when he was a 9th grader, and that is honestly a high pedigree—how often does a 14-year-old get to invent a mathematical term that endures for over half a century?↩
^3. Exactly this kind of generalization seems to have been considered by the 14th century mathematician Narayana Pandita. It might therefore be reasonable to name the resulting sequences “Virahanka–Narayana” sequences. See “The So-called Fibonacci Numbers in Ancient and Medieval India” by Parmanand Singh and “Nārāyana’s Generalisation of Mātrā-vrtta-prastāra and the Generalised Virahānka-Fibonacci Representation of Numbers” by Raja Sridharan, R Sridharan and M D Srinivas.↩
^4. Hopefully I’ll write more about these patterns soon in a followup post.↩
^5. According to P. Singh in “The So-called Fibonacci Numbers in Ancient and Medieval India”, “meru is the name of the imaginary mountain that is supposed to stand at the center of the earth.”↩
^6. At least a version of this sum was known to Narayana Pandita. He derived the multinomial coefficients through something called a “needle sequence” or “arrow-head” sequence, which I have not yet fully understood. See the references in Footnote 3.↩

a geometric interpretation for the sum of a geometric series

2021-05-28T16:16:00.002-07:00

Start with the geometric series: \[ \sum_{n=0}^\infty z^n = 1 + z + z^2 + z^3 + \cdots\text. \] If $z$ is not a real number, the partial sums of this series form a spiral pattern, which appears to be self-similar. Below is the pattern for $z = (1+i)/2$.

A similarity of the plane can be expressed as a (complex) affine function $f(w) = aw + b$. To determine $a$ and $b$ for our spiral, we note that $f$ should satisfy \[ f(0) = 1 \qquad\text{and}\qquad f(1) = 1+z\text. \] The first equation implies $b = 1$. From the second equation, we then have $a + 1 = 1 + z$, so $a = z$. Our function is therefore $f(w) = zw + 1$, with $z$ treated as a constant.

Now we check whether this function $f(w)$ does in fact show that the spiral is self-similar. Indeed, $f(1+z) = z(1+z) + 1 = 1 + z + z^2$, and more generally, the sequence $1$, $f(1)$, $f(f(1))$, $f(f(f(1)))$, … coincides with the sequence of partial sums of the geometric series.
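The claim that iterating $f$ from $1$ reproduces the partial sums is easy to test numerically. Here is a minimal Python sketch, using the value $z = (1+i)/2$ from the picture above; the function names are my own and purely illustrative.

```python
# Compare the orbit 1, f(1), f(f(1)), ... with the partial sums of the
# geometric series. All names here are illustrative, not from a library.

def f(w, z):
    """The similarity f(w) = z*w + 1."""
    return z * w + 1

def partial_sums(z, count):
    """Return the first `count` partial sums 1, 1 + z, 1 + z + z^2, ..."""
    sums, total, power = [], 0, 1
    for _ in range(count):
        total += power
        sums.append(total)
        power *= z
    return sums

def orbit_of_one(z, count):
    """Return 1, f(1), f(f(1)), ... (`count` terms)."""
    orbit, w = [], 1
    for _ in range(count):
        orbit.append(w)
        w = f(w, z)
    return orbit

z = (1 + 1j) / 2
max_gap = max(abs(a - b)
              for a, b in zip(partial_sums(z, 12), orbit_of_one(z, 12)))
print(max_gap)  # should be zero up to floating-point rounding
```

With an integer value such as $z = 2$ the two lists agree exactly, which gives a quick sanity check that does not depend on rounding.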

If $z \ne 1$, then the function $f(w)$ has a fixed point. Solve $f(w) = w$, or $zw + 1 = w$, to find that the fixed point is $w = 1/(1 - z)$. When $|z| < 1$, this fixed point is the sum of the series. But in general it has a nice geometric connection to the series, even when the series diverges: it is the center of similarity for the sequence of partial sums. For example, when $z = i+1/\sqrt3$, the partial sums and center look like this:

In particular, when $z$ is a root of unity, the partial sums of the series lie at the vertices of a regular polygon, and $1/(1-z)$ is the center of this polygon. (It is also the Cesàro sum of the series in that case.) Here are the polygons and centers for $z = e^{i\pi/4}$ and $z = e^{i\,3\pi/5}$:

The centers of these polygons all have real part $1/2$, which is a special case of the observation that the function $1/(1-z)$ sends the unit circle to the line $\mathrm{Re}\,z = 1/2$.
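Both observations can be checked numerically. Writing $c = 1/(1-z)$, we have $1 - c = -zc$, so $f(w) - c = z(w - c)$ and the $n$th partial sum $s_n$ should satisfy $s_n - c = -z^{n+1}c$. The Python sketch below (helper names are mine) tests that identity and the real part of the center for the two roots of unity used above.

```python
import cmath

def center(z):
    """Fixed point of f(w) = z*w + 1, assuming z != 1."""
    return 1 / (1 - z)

def partial_sum(z, n):
    """1 + z + ... + z^n, summed term by term."""
    return sum(z**k for k in range(n + 1))

def max_identity_error(z, terms=10):
    """Largest deviation from s_n - c = -z^(n+1) * c over the first few n."""
    c = center(z)
    return max(abs(partial_sum(z, n) - c + z**(n + 1) * c)
               for n in range(terms))

roots = [cmath.exp(1j * cmath.pi / 4), cmath.exp(3j * cmath.pi / 5)]
real_parts = [center(z).real for z in roots]     # each should be close to 0.5
errors = [max_identity_error(z) for z in roots]  # each should be close to 0
```

When $|z| = 1$ the identity says every partial sum lies at distance $|c|$ from $c$, which is the polygon picture.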

When $z = -1$, the polygon collapses to a line segment, and of course the center of this segment is $1/2$, the usual value assigned to the series $1 - 1 + 1 - 1 + \cdots$.

initial thoughts on teaching in a time of crisis

2020-03-14T23:14:00.001-07:00

For those reading in the future, this is the week that the reality of the COVID-19 (coronavirus) pandemic struck the U.S. Hundreds of schools announced that their campuses would be closing, and students would be expected to continue their studies from home, or wherever else they might find to stay. The president of our university and the dean of the college announced on Wednesday that face-to-face instruction would continue through Friday, students would have to leave campus by Sunday afternoon (unless they received an exception), and instruction would resume remotely the following Wednesday. That leaves Monday and Tuesday for redesigning courses and implementing them in a new format. Some instructors managed to get an earlier start before the week ended, but my mind and time were occupied with trying to wrap up well with students in person, and also home life obligations (I have a two-year-old daughter, and my wife is a graduate student with a full-time job), so I’m really just beginning to collect my thoughts.

I am entirely in the camp of those who describe this new mode of teaching as “remote” rather than “online” learning (and I am grateful that is the language our administration has chosen to use). I do not plan to create an online class, and I do not have any pretensions that I could make a good one in the time available. Even in ordinary circumstances I forget to update course pages on our LMS half the time. I’m focused on what’s going to happen during the class meetings. I’m probably going to have to record a few lectures, but I’ll also look for other video resources that already exist, because me making a video will be either entirely off-the-cuff or hours of planning and scripting. I’m focused on the “remote” part, as a substitute for focusing on class meetings. Students are going to be studying on their own, in a wide variety of settings and living situations. The global upheaval and concomitant personal stresses make it likely that calculus or abstract algebra is not the primary concern in their lives. I want to give them the best chance to learn despite those conditions. The “online” element is present strictly to mitigate the isolation of self-study. Thus I plan to use online elements in ways that will maintain our community and help students feel like their study is meaningful. We’ll have online discussions, to the extent possible given the geographic diversity, and I plan to formulate new projects that will allow students to implement new knowledge in ways that directly affect their understanding of the world. Ideally, these projects themselves will enable assessment of the relevant skills, and I’ll be able to rely less on traditional test formats.

I am not in either camp regarding whether this switch to remote teaching is beneficial or detrimental. In fact, I would describe myself as wholeheartedly ambivalent on that matter. On the one hand, nothing about this situation is normal. Literally the entire world is focused on containing a disease that could kill tens or hundreds of millions of us. Many social institutions (schools, religious groups, local government) have shuttered their brick-and-mortar locations. Friends are instantly made distant, and any challenges to family life are made proximate. On the other hand, I believe my students truly value education. Continuing classes means maintaining some measure of normalcy. It is an element of life where one has some control, unlike almost everything else around us. And I have seen learning of complex mathematics happen in truly extraordinary circumstances, such as during the time that I taught in Peace Corps. (I do not support the argument, made by some in our community, that we can manage this “because we’ve done it before.” A year and a half ago, our college switched to remote instruction for three weeks due to wildfires; that was a more traumatic time, though a briefer one. This represents a fundamental shift in how we complete our courses, not just maintain progress in the short term, and its ramifications for higher education and our university in particular should not be minimized. The fact that, at the administrative level, so many decisions seem to be driven by the threat of future problems with accreditation exceeds my ability to worry.) For now, it is my job, and it is one way that I can contribute to bettering the world.

I am mindful of those for whom this time presents even greater challenges: people who already suffer from loneliness and isolation; people without homes, for whom the loss of public spaces and services will fray an already thin network of support; people in prison, who are neglected in the best of times, despite being in custody of the state; people in immigrant detention centers, who are constantly treated shamefully and forced to live in appalling conditions. My job and my position do not exonerate me from doing what I can to aid them, as well.

cardioid, deltoid, folium

2019-07-15T16:10:00.001-07:00

The cardioid and the deltoid are two of my favorite curves. They arise in similar ways: one is an epicycloid, and the other is a hypocycloid. In a sense, each is the simplest non-trivial example of its respective type. They make excellent examples for calculus problems. But as I learned this week, they are actually the same curve.

This post is about the claim made in italics in the previous paragraph. Obviously I don’t mean that the classical constructions mentioned above (and described below) produce the same curves in the Euclidean plane. Rather, they are the same from the perspective of complex projective geometry. When I searched for this fact on Google after uncovering it for myself, I only found one mention of it, in a textbook from 1923 entitled An Introduction to Projective Geometry. I assume it was well-known at the time, and today is probably known to certain algebraic geometers, but it seems worth explicating for a larger audience.

First, the curves. Epicycloids and hypocycloids are both examples of roulettes, curves traced out by a point marked on one curve, which is free to move, as it rolls along another curve, which is fixed, without slipping. To generate an epicycloid or hypocycloid, both the fixed curve and the moving curve are circles; the difference is that for an epicycloid, the rolling circle is outside the fixed circle, and for a hypocycloid the rolling circle is on the inside. The shape of the epicycloid or hypocycloid is determined by the ratio of the circles’ radii. For an epicycloid, we can choose a 1:1 ratio, which means the marked point on the rolling circle makes contact with the fixed circle once as the outer circle completes a circuit. A hypocycloid cannot be constructed from circles whose radii have a 1:1 ratio, and a 2:1 ratio simply produces a line segment, so the simplest hypocycloid arises from a 3:1 ratio. The construction of these simplest examples is illustrated below. (These animations were created using a Desmos graph with the help of GIFsmos.) The first is called the cardioid (“heart-like”) and the second is the deltoid (“triangle-like”).

In both cases, the rolling circle is given a radius of 1, and in both cases the centers of the two circles remain at a distance of 2. By watching carefully, one can see that in both cases the marked point makes two revolutions around the center of the rolling circle. For the cardioid, these revolutions are counterclockwise, and so the cardioid can be parameterized by \[ (2\cos\theta + \cos2\theta, 2\sin\theta + \sin2\theta)\text. \] In the case of the deltoid, the marked point’s revolutions are made clockwise, and so the deltoid can be parameterized by \[ (2\cos\theta + \cos2\theta, 2\sin\theta - \sin2\theta)\text. \] These formulas are very similar, but certainly not the same, and the pictures they produce are quite different. So how can I claim that the curves are the same?

Our first step toward understanding the claim involves switching to complex numbers. If we collect the $x$- and $y$-coordinates of the plane $\mathbb{R}^2$ into a single complex coordinate, then the parameterizations above become

$2e^{i\theta} + e^{2i\theta} \qquad$ and $\qquad 2e^{i\theta} + e^{-2i\theta}$. Now we want to extend to the complex plane $\mathbb{C}^2$ (note: I think of $\mathbb{C}$ as the complex line because it is one-dimensional as a complex vector space). A standard trick is to add a second coordinate that is conjugate to the first, which makes the parameterizations $\big(2e^{i\theta} + e^{2i\theta},2e^{-i\theta} + e^{-2i\theta}\big) \qquad$ and $\qquad \big(2e^{i\theta} + e^{-2i\theta},2e^{-i\theta} + e^{2i\theta}\big)$. Now let’s set $t = e^{i\theta}$ and allow $t$ to take on all complex values (except $0$, but we’ll take care of that later) instead of just values on the unit circle. At the same time, let’s label the parameterizations $\gamma_C$ and $\gamma_D$, with $C$ standing for cardioid and $D$ for deltoid. This gives us $\gamma_C(t) = \left(2t + t^2,\dfrac{2}{t} + \dfrac{1}{t^2}\right) \qquad$ and $\qquad \gamma_D(t) = \left(2t + \dfrac{1}{t^2},\dfrac{2}{t} + t^2\right)$. We still can see superficial similarities in these formulas, but not enough to conclude that they define equivalent curves. In order to see their equivalence, we need to see what’s happening at infinity, which means introducing some projective geometry.

The complex projective line $\mathbb{P}^1$, also known as the Riemann sphere, is obtained by adding a single point, labeled $\infty$, to the ordinary complex line $\mathbb{C}$. The points of $\mathbb{P}^1$ may be thought of as the “slopes” of lines through the origin in $\mathbb{C}^2$. Indeed, it is often useful to assign coordinates to $\mathbb{P}^1$ using non-zero vectors $(s,t)$ in $\mathbb{C}^2$, where two vectors correspond to the same point of $\mathbb{P}^1$ if they are scalar multiples of each other, $(s,t)\sim(\lambda s,\lambda t)$ if $\lambda\in\mathbb{C}\setminus\{0\}$. We write the equivalence class of $(s,t)$ as $[s:t]$; these are called homogeneous coordinates on $\mathbb{P}^1$. We can recover $\mathbb{P}^1$ as $\mathbb{C}\cup\{\infty\}$ by sending $[s:t]$ to the slope $t/s$ if $s \ne 0$; then $[0:1]$ is sent to $\infty$.

In a similar way, we can extend $\mathbb{C}^2$ to the complex projective plane $\mathbb{P}^2$ by adding points at infinity, and the most convenient way to do so is by homogeneous coordinates. We start with non-zero vectors $(u,v,w)$ in $\mathbb{C}^3$ and consider $(\lambda u, \lambda v, \lambda w)$ to define the same point of $\mathbb{P}^2$ as $(u,v,w)$ if $\lambda\in\mathbb{C}\setminus\{0\}$. Then $[u:v:w]$ are homogeneous coordinates on $\mathbb{P}^2$. The points with $u\ne0$ correspond to points of the original complex plane $\mathbb{C}^2$, by sending $[u:v:w]$ to $(v/u,w/u)$. The points with $u=0$ constitute the new line at infinity, which is just a copy of $\mathbb{P}^1$ with coordinates $[0:v:w]$.

Now we can extend the cardioid and the deltoid to curves in $\mathbb{P}^2$, not just $\mathbb{C}^2$. We start with the parameterizations $\gamma_C$ and $\gamma_D$, append an initial coordinate of 1, then clear denominators (we can do this because of the equivalence that defines homogeneous coordinates). Then we get

$\gamma_C(t) = \big[t^2:2t^3 + t^4:2t + 1\big] \qquad$ and $\qquad \gamma_D(t) = \big[t^2:2t^3 + 1:2t + t^4\big]$. These allow for the possibility of $t = 0$, but apparently leave out the point at infinity $\infty$, so we make one more modification, replacing $t$ with $t/s$ and again clearing denominators to obtain $\gamma_C([s:t]) = \big[s^2 t^2:2st^3 + t^4:2s^3t + s^4\big] \qquad$ and
$\qquad \gamma_D([s:t]) = \big[s^2t^2:2st^3 + s^4:2s^3t + t^4\big]$. Here we see a feature characteristic of maps from one projective space to another, when homogeneous coordinates are used: each component of the map must be homogeneous of the same degree (in this case, four). By expressing the parameterizations of the cardioid and the deltoid in this way, we see that both curves touch the line at infinity at the two points $[0:1:0]$ and $[0:0:1]$, corresponding to $[0:1]$ and $[1:0]$, respectively, for the cardioid, and in the reverse order for the deltoid. Still this isn’t enough to show that the curves are the same! We need one more ingredient.
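To make the homogeneous parameterizations concrete, here is a small Python sketch (function names are mine) that evaluates them with exact integer arithmetic, confirms the degree-four homogeneity, and reads off the points at infinity.

```python
# Homogeneous parameterizations of the cardioid and deltoid, as triples
# (u, v, w) representing the point [u:v:w]. Names are illustrative.

def gamma_C(s, t):
    """Cardioid: [s^2 t^2 : 2 s t^3 + t^4 : 2 s^3 t + s^4]."""
    return (s**2 * t**2, 2*s*t**3 + t**4, 2*s**3*t + s**4)

def gamma_D(s, t):
    """Deltoid: [s^2 t^2 : 2 s t^3 + s^4 : 2 s^3 t + t^4]."""
    return (s**2 * t**2, 2*s*t**3 + s**4, 2*s**3*t + t**4)

def is_homogeneous_of_degree_four(gamma, s, t, scale):
    """Check gamma(scale*s, scale*t) == scale^4 * gamma(s, t)."""
    return gamma(scale*s, scale*t) == tuple(scale**4 * c for c in gamma(s, t))

# Images of the parameters [0:1] and [1:0], in that order.
points_at_infinity = {
    "cardioid": (gamma_C(0, 1), gamma_C(1, 0)),
    "deltoid": (gamma_D(0, 1), gamma_D(1, 0)),
}
print(points_at_infinity)
```

In both cases the first coordinate $u$ vanishes exactly at these two parameters, and the two curves visit $[0:1:0]$ and $[0:0:1]$ in opposite orders.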

A projective transformation of $\mathbb{P}^1$ or $\mathbb{P}^2$ is induced by a linear transformation of the homogeneous coordinates. Readers who are already familiar with the Riemann sphere will recognize projective transformations of $\mathbb{P}^1$ as Möbius transformations (also known as fractional linear transformations): given $a,b,c,d\in\mathbb{C}$, we can convert $[s:t] \mapsto [as+bt:cs+dt]$ to a Möbius transformation in the coordinate $z = s/t$, where it becomes $z \mapsto \dfrac{az+b}{cz+d}$. The condition for this function to be invertible is $ad - bc \ne 0$, which is the same as the condition for the matrix $\begin{bmatrix}a & b \\ c & d\end{bmatrix}$ to be invertible. In the same way, projective transformations of $\mathbb{P}^2$ arise from invertible linear transformations of $\mathbb{C}^3$. Two objects in $\mathbb{P}^1$ or $\mathbb{P}^2$ are called projectively equivalent if there is a projective transformation that carries one to the other. And now we can state precisely what was meant in the opening paragraph:

The cardioid and the deltoid are projectively equivalent in $\mathbb{P}^2$.

But how do we find the projective equivalence? A clue may be found in one clear difference between the original curves drawn in the Euclidean plane, which niggled at me while I was trying to figure out their relationship. The deltoid clearly has three cusps, while the cardioid apparently only has one. If the curves are equivalent, where are the other cusps of the cardioid? The answer: on the line at infinity!

How can we tell? It’s time to apply some differential geometry and look at the tangent lines of these two curves. Returning to the parameterizations in terms of $t$, we find

$\gamma_C'(t) = \left(2 + 2t,-\dfrac{2}{t^2} - \dfrac{2}{t^3}\right) \qquad$ and $\qquad \gamma_D'(t) = \left(2 - \dfrac{2}{t^3},-\dfrac{2}{t^2} + 2t\right)$. Now a line in $\mathbb{C}^2$, with coordinates $(v,w)$, passing through $(a,b)$ in the direction $(s,t)$ has the equation $\begin{vmatrix} s & v - a \\ t & w - b \end{vmatrix} = 0$. Thus the tangent line to the cardioid at $\gamma_C(t)$ has the equation \[ \begin{vmatrix} 2 + 2t & v - \big(2t + t^2\big) \\ -\frac{2}{t^2} - \frac{2}{t^3} & w - \big(\frac{2}{t} + \frac{1}{t^2}\big) \end{vmatrix} = 0 \] which, after some simplification, becomes \[ wt^3 - 3t^2 - 3t + v = 0\text{.} \] This is the line equation of the cardioid. In a similar fashion, we can find the line equation of the deltoid, which is \[ t^3 - vt^2 + wt - 1 = 0\text{.} \]
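The "some simplification" step is where sign errors like to hide, so a sanity check is worthwhile. The Python sketch below (helper names are mine) uses exact rational arithmetic: for a parameter $t$ and a scalar $k$, the point $\gamma(t) + k\gamma'(t)$ lies on the tangent line at $\gamma(t)$, so substituting it into the line equation should give exactly zero.

```python
from fractions import Fraction

def gamma_C(t):
    return (2*t + t**2, 2/t + 1/t**2)

def dgamma_C(t):
    return (2 + 2*t, -2/t**2 - 2/t**3)

def gamma_D(t):
    return (2*t + 1/t**2, 2/t + t**2)

def dgamma_D(t):
    return (2 - 2/t**3, -2/t**2 + 2*t)

def line_C(t, v, w):
    """Left side of the cardioid's line equation."""
    return w*t**3 - 3*t**2 - 3*t + v

def line_D(t, v, w):
    """Left side of the deltoid's line equation."""
    return t**3 - v*t**2 + w*t - 1

def residual(gamma, dgamma, line, t, k):
    """Evaluate the line equation at the point gamma(t) + k * gamma'(t)."""
    (a, b), (da, db) = gamma(t), dgamma(t)
    return line(t, a + k*da, b + k*db)

params = [Fraction(2, 3), Fraction(-5, 2), Fraction(7, 4)]   # sample values of t
offsets = [Fraction(0), Fraction(1, 3), Fraction(-2)]        # sample values of k
residuals_C = [residual(gamma_C, dgamma_C, line_C, t, k)
               for t in params for k in offsets]
residuals_D = [residual(gamma_D, dgamma_D, line_D, t, k)
               for t in params for k in offsets]
```

Rational sample points only probe finitely many cases, of course; this is a check on the algebra, not a proof.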

Having the line equation of a curve, in terms of a parameter $t$, can be useful in several ways. As $t$ varies over $\mathbb{P}^1$, it produces all the tangent lines of the curve. (We’ll clarify what happens when $t = \infty$ in a moment.) But we can also let $(v,w)$ vary over $\mathbb{C}^2$ and find, for each point, which tangent lines of the curve pass through that point. Because the line equations of the cardioid and the deltoid are cubic polynomials in $t$, most points of $\mathbb{C}^2$ will lie on three tangent lines. Those points that lie on fewer than three tangent lines play a special role.

Let’s illustrate first with the deltoid. We’ll be looking at lots of cube roots, so let $\omega = e^{i\,2\pi/3}$; this means that $\omega^3 = 1$ and $1 + \omega + \omega^2 = 0$. When $(v,w)=(0,0)$, the line equation becomes $t^3 - 1 = 0$, so the tangent lines of the deltoid that pass through the origin correspond to the parameters $1$, $\omega$, and $\omega^2$. Indeed, the three points $\gamma_D(1) = (3,3)$, $\gamma_D(\omega) = (3\omega,3\omega^2)$, and $\gamma_D(\omega^2) = (3\omega^2,3\omega)$ are the three cusps of the deltoid. On the other hand, a point that belongs to the deltoid lies on tangent lines corresponding to at most two parameters (two of the points of tangency have “coalesced”). For example, when $(v,w)=(-1,-1)$, the line equation becomes $t^3 + t^2 - t - 1 = 0$, or $(t+1)^2(t-1) = 0$. At a cusp, all three tangent lines coincide: for example, when $(v,w)=(3,3)$, the line equation is $t^3 - 3t^2 + 3t - 1 = (t-1)^3 = 0$. See the pictures below.
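The two factorizations can be confirmed by multiplying out the factors. A minimal Python sketch, with polynomials stored as coefficient lists (highest power first) and helper names of my own choosing:

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest power first."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product

def deltoid_line_coeffs(v, w):
    """Coefficients of t^3 - v t^2 + w t - 1, highest power first."""
    return [1, -v, w, -1]

t_plus_1, t_minus_1 = [1, 1], [1, -1]
double_root_case = poly_mul(poly_mul(t_plus_1, t_plus_1), t_minus_1)    # (t+1)^2 (t-1)
triple_root_case = poly_mul(poly_mul(t_minus_1, t_minus_1), t_minus_1)  # (t-1)^3

print(deltoid_line_coeffs(-1, -1) == double_root_case)
print(deltoid_line_coeffs(3, 3) == triple_root_case)
```

Both comparisons should come out true: the point $(-1,-1)$ gives a double root at $t = -1$, and the cusp $(3,3)$ gives a triple root at $t = 1$.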

We can homogenize the line equation of the deltoid by replacing $t$ with $t/s$ and $(v,w)$ with $(v/u,w/u)$ and clearing denominators to obtain: \[ ut^3 - vst^2 + ws^2t - us^3 = 0\text. \] When $[s:t] = [1:0]$ or $[0:1]$ (remember, this second point in homogeneous coordinates corresponds to $t=\infty$), we get the same equation of the tangent line, $u = 0$. This is the equation of the line at infinity, so the line at infinity is tangent to the deltoid at both $[0:0:1]$ and $[0:1:0]$! A line that is tangent to a curve at two points is called a bitangent.

The cardioid also has a bitangent, which is easier to see: when $t = \omega$ or $t = \omega^2$, respectively, the line equation of the cardioid becomes $w - 3\omega^2 - 3\omega + v = 0$ or $w - 3\omega - 3\omega^2 + v = 0$, both of which are equivalent to $v + w = -3$ (because $\omega + \omega^2 = -1$). The visible cusp occurs at $(-1,-1)$, where the line equation becomes $(t + 1)^3 = 0$. For an example of more generic behavior, look at $(-3,-3)$, where the line equation becomes $3t^3 + 3t^2 + 3t + 3 = 0$, or $3(t + 1)(t + i)(t - i) = 0$. See pictures below.
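As a numerical check on the bitangent, the following Python sketch (names are mine) evaluates $v + w$ at the points $\gamma_C(\omega)$ and $\gamma_C(\omega^2)$, and also at the velocity vectors there. By my calculation the first sum should be $-3$ at both points, and the second should be $0$, meaning the tangent direction runs along a line $v + w = \text{constant}$.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def gamma_C(t):
    return (2*t + t**2, 2/t + 1/t**2)

def dgamma_C(t):
    return (2 + 2*t, -2/t**2 - 2/t**3)

def coordinate_sums(t):
    """Return v + w at the point gamma_C(t) and at the velocity gamma_C'(t)."""
    v, w = gamma_C(t)
    dv, dw = dgamma_C(t)
    return v + w, dv + dw

sums_at_omega = coordinate_sums(OMEGA)
sums_at_omega_squared = coordinate_sums(OMEGA**2)
print(sums_at_omega, sums_at_omega_squared)
```

Floating-point arithmetic means the printed values agree with $-3$ and $0$ only up to rounding.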

The homogeneous version of the cardioid’s line equation is \[ wt^3 - 3ust^2 - 3us^2t + vs^3 = 0\text. \] When $[s:t] = [1:0]$, this becomes the $w$-axis $v = 0$, and when $[s:t] = [0:1]$, we get $w = 0$. In each of these cases, we see that only one tangent line passes through the point, just as we saw for the cusps of the deltoid. So we have identified the three cusps of the cardioid—$[1:-1:-1]$, $[0:0:1]$, and $[0:1:0]$. The tangent lines through all three of these cusps pass through the origin in $\mathbb{C}^2$, with homogeneous coordinates $[1:0:0]$.

We now have enough information to show the equivalence of the cardioid and the deltoid. To define a projective transformation from $\mathbb{P}^1$ to itself, we need to specify where three points go; to define a projective transformation from $\mathbb{P}^2$ to itself, we need to specify the images of four points, no three of which are collinear. We’ll show how to transform the line equation of the deltoid into the line equation of the cardioid via pullback.

We’re looking for projective transformations $f : \mathbb{P}^1 \to \mathbb{P}^1$ and $g : \mathbb{P}^2 \to \mathbb{P}^2$ such that $\gamma_D \circ f = g \circ \gamma_C$. Starting with $f$, we require

$f\big([1:0]\big) = [1:\omega]$, $f\big([0:1]\big) = [\omega:1]$, and $f\big([1:-1]\big) = [1:1]$, so that the parameters of the cardioid’s cusps are sent to those of the deltoid’s cusps. This can be accomplished by defining \[ f\big([s:t]\big) = [s - \omega t : \omega s - t]\text. \] Meanwhile, $g$ needs to satisfy $g\big([0:0:1]\big) = [1:3\omega:3\omega^2]$, $g\big([0:1:0]\big) = [1:3\omega^2:3\omega]$,
$g\big([1:-1:-1]\big) = [1:3:3]$, and $g\big([1:0:0]\big) = [1:0:0]$, which is accomplished by \[ g\big([u:v:w]\big) = [ 3u+v+w : 3\omega^2 v + 3\omega w : 3\omega v + 3\omega^2 w ]\text. \] Now substitute the components of $f$ and $g$ into the variables of the deltoid’s line equation, expand, and simplify. The result is the line equation of the cardioid, up to a nonzero constant factor. You can calculate this by hand, or just let SageMath do it for you.
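For readers without SageMath at hand, here is a plain-Python numerical stand-in (not the symbolic computation itself; all names are mine). It substitutes $f$ and $g$ into the homogeneous line equation of the deltoid at a few arbitrary sample points and divides by the cardioid's line equation. If the pullback works, the ratio should be the same constant every time; by my hand calculation that constant is $-3(\omega - \omega^2) = -3\sqrt3\,i$.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def line_D(s, t, u, v, w):
    """Homogeneous line equation of the deltoid."""
    return u*t**3 - v*s*t**2 + w*s**2*t - u*s**3

def line_C(s, t, u, v, w):
    """Homogeneous line equation of the cardioid."""
    return w*t**3 - 3*u*s*t**2 - 3*u*s**2*t + v*s**3

def f(s, t):
    """Projective transformation of the parameter line."""
    return (s - OMEGA*t, OMEGA*s - t)

def g(u, v, w):
    """Projective transformation of the plane."""
    return (3*u + v + w,
            3*OMEGA**2*v + 3*OMEGA*w,
            3*OMEGA*v + 3*OMEGA**2*w)

def pullback_ratio(s, t, u, v, w):
    """line_D evaluated at (f(s,t), g(u,v,w)), divided by line_C at (s,t,u,v,w)."""
    return line_D(*f(s, t), *g(u, v, w)) / line_C(s, t, u, v, w)

# Arbitrary sample points (s, t, u, v, w), chosen so that line_C is nonzero.
samples = [
    (1, 2, 1, 0.5, -1.5),
    (0.3 + 1j, -2, 2, 1j, 4),
    (1.5, 0.25 - 0.5j, -1, 3, 2 + 2j),
]
ratios = [pullback_ratio(*p) for p in samples]
print(ratios)
```

Agreement at a handful of points is only evidence, not a proof, but a cubic identity that fails would be very unlikely to survive three unrelated samples.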

One of the curves mentioned in the title of this post has been conspicuously absent so far: the folium of Descartes. This is another favorite curve of mine, invariably given in my calculus classes as an exercise in implicit differentiation. Its equation is $x^3 + y^3 = xy$.

So what’s the connection between this curve and the others? Well, if we extract the coefficients from the deltoid’s line equation and use them to define a new curve $\gamma_F$, we get \[ \gamma_F\big([s:t]\big) = [ s^3 - t^3 : st^2 : -s^2 t ]\text, \] which parameterizes \[ v^3 + w^3 = uvw\text, \] the homogeneous version of the folium’s equation. This means that the folium is dual to the deltoid (and thus also to the cardioid)! The tangent lines of the cardioid/deltoid have been converted into points of the folium, and likewise points of the cardioid/deltoid become tangent lines of the folium. Just as each point of $\mathbb{C}^2$ lies on three tangent lines of the cardioid/deltoid, counted with multiplicity, each line of $\mathbb{C}^2$ intersects the folium at three points, counted with multiplicity. The bitangent of the deltoid and cardioid has been converted into a point of self-intersection. If we look at points of the form $[1:v:\bar{v}]$, then the threefold symmetry of the folium is revealed (the three asymptotic directions correspond to the three tangent lines that pass through the origin, which as we saw are the tangent lines at the cusps).
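The claim that $\gamma_F$ parameterizes the folium can be checked with exact integer arithmetic. In the Python sketch below (names are mine), the residual $v^3 + w^3 - uvw$ is evaluated along $\gamma_F$ on a small grid of integer parameters; every value should be exactly zero.

```python
def gamma_F(s, t):
    """Coefficients of the deltoid's line equation: [s^3 - t^3 : s t^2 : -s^2 t]."""
    return (s**3 - t**3, s*t**2, -s**2*t)

def folium(u, v, w):
    """Left side minus right side of v^3 + w^3 = u v w."""
    return v**3 + w**3 - u*v*w

residuals = [folium(*gamma_F(s, t))
             for s in range(-3, 4) for t in range(-3, 4)]
print(set(residuals))

# The parameters [1:0] and [0:1] both land on the point [1:0:0].
double_point_images = (gamma_F(1, 0), gamma_F(0, 1))
print(double_point_images)
```

The last two triples differ only by the scalar $-1$, so they represent the same point of $\mathbb{P}^2$; this is the self-intersection corresponding to the bitangent.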

an IBL preface

2018-12-29T11:07:00.000-08:00

In just over a week, I will distribute to students the first piece of the complex variables notes I have been writing. Here is a preface to be included with the notes, to motivate the IBL structure. The details of the class will be spelled out in the syllabus; this is just to set the tone.

You are the creators. These notes are a guide.

The notes will not show you how to solve all the problems that are presented, but they should enable you to find solutions, on your own and working together. They will also provide historical and cultural background about the context in which some of these ideas were conceived and developed. You will see that the material you are about to study did not come together fully formed at a single moment in history. It was composed gradually over the course of centuries, with various mathematicians building on the work of others, improving the subject while increasing its breadth and depth.

Mathematics is essentially a human endeavor. Whatever you may believe about the true nature of mathematics—does it exist eternally in a transcendent Platonic realm, or is it contingent upon our shared human consciousness? is math “invented” or “discovered”?—our experience of mathematics is temporal, personal, and communal. Like music, mathematics that is encountered only as symbols on a page remains inert. Like music, mathematics must be created in the moment, and it takes time and practice to master each piece. The creation of mathematics takes place in writing, in conversations, in explanations, and most profoundly in the mental construction of its edifices on the basis of reason and observation.

To continue the musical analogy, you might think of these notes like a performer’s score. Much is included to direct you towards particular ideas, but much is missing that can only be supplied by you: participation in the creative process that will make those ideas come alive. Moreover, the success of the class will depend on the pursuit of both individual excellence and collective achievement. Like a musician in an orchestra, you should bring your best work and be prepared to blend it with others’ contributions.

In any act of creation, there must be room for experimentation, and thus allowance for mistakes, even failure. A key goal of our community is that we support each other—sharpening each other’s thinking but also bolstering each other's confidence—so that we can make failure a productive experience. Mistakes are inevitable, and they should not be an obstacle to further progress. It’s normal to struggle and be confused as you work through new material. Accepting that means you can keep working even while feeling stuck, until you overcome and reach even greater accomplishments.

These notes are a guide. You are the creators.

2018 calculus syllabus

2018-09-03T22:44:00.000-07:00

In my last post, I explained a bit about how I feel like my syllabus is a work-in-progress, even though the semester has started and I’m already using it. In this post I’ll give some more details and even more history. I’ll quote extensively from my syllabus verbatim; here is a link to the entire thing for anyone who is interested.

Revising my syllabus for this semester really began last fall. I wasn’t entirely blind to the faults that were starting to show. One major change was in restructuring the exam schedule. When I switched to standards-based grading in calculus 1, I also started weekly quizzes (which students took on their own time outside of class) and had three midterm exams plus a final. The quizzes functioned as a sort of preliminary assessment for most of the standards. Each test covered about eight standards. After the third test, there were a couple more standards we covered in class, which were only assessed on a quiz and the final exam. Even with three midterms, however, I had often felt like students were rushed in completing them. I also began to question the value of the out-of-class quizzes. So I turned the quizzes into “labs” that students were free to collaborate on, and I switched from three midterm exams to five, which would formally assess every standard before the final.

I really liked how having five midterms broke up the material. Each test became more coherent in the material it included. Exam 1 covered limits. Exam 2, definition and interpretation of derivatives. Exam 3, rules for differentiation. Exam 4, applications of derivatives. Exam 5, definition of integrals and the Fundamental Theorem of Calculus. Especially helpful was splitting up the applications of derivatives (l'Hospital's rule, optimization, related rates, and so on) from the introduction to integrals; these topics had usually been all jumbled together in the last midterm, compounding the difficulty already created by it being late in the semester. Also, by dedicating one test just to derivative rules, I was moving towards having a Differentiation Gateway Exam, as several of my colleagues at Pepperdine use. And paired with that move was an awareness that I was gravitating towards a specifications framework.

This fall, I decided to maintain the five-midterm structure and get rid of the quizzes/labs, to be replaced by an occasional more substantial homework exercise that will be used in class. I collected seven standards into the Gateway Exam, which will form the bulk of the third midterm. I split the remaining standards into 45 “tasks”, which is a term I hope will be clearer than “standards”; each standard split into approximately two tasks. The idea of tasks goes back in my mind to the list of problems Kate Owens shared from her Ph.D. advisor George McNulty. That is, a task is a specific type of problem that students will show they know how to solve. Here is the new introduction to the “Goals and Assessment” section of my syllabus:

Change is present all around us, and understanding it is an essential component of many fields of study. Calculus is fundamentally a set of tools for measuring, quantifying, estimating, and interpreting change in a variety of contexts. In this course, we will delve into some of the most profound ideas in mathematics, whose roots are from ancient times and which began to develop fully in the 17th century; they continue to form the basis for much of modern science. My hope is that this class will develop your analytical ability and deepen your appreciation for the power and elegance of mathematics.

The skills you should acquire are related to the Learning Outcomes stated on the first page of this syllabus. Your mastery of the course content will be assessed through your performance on a collection of definite tasks. A complete list of tasks is on the last two pages of this syllabus. These tasks, rather than points or percentages, will be the primary basis for grading. The following sections provide details on how the tasks will be assessed and what you should accomplish in order to earn your final grade.

My hope is that this method of assessment, called standards-based grading or mastery grading, will keep you clearly informed as to the expectations of the class and how well you are meeting them, while also removing the (often distracting) elements of linear grading that uses letters or total points. Learning is not always a straightforward process, and one of my goals is to give you as many opportunities as possible to demonstrate your understanding. I will be glad to do everything I can to help you towards your goal of mastery. If you have questions or concerns at any time, please feel free to discuss them with me.

Another potential source of confusion from my SBG system in the past was the levels of ranking. I really liked that we were using the vocabulary of mastery / proficiency / basic ability / novice to talk about students’ progress, but it was rare that a student could rate their own work with one of these levels. So this fall I opted, as many others have, for a simple pass/fail approach on tasks. I don’t like the pass/fail language, however, so I chose successful for a task completed satisfactorily and progressing or incomplete for work that has major mistakes or is absent. I also wanted to handle small mistakes through a faster revision process, an idea I picked up from MathFest; for these situations, I added a revisions needed category. Here is how I describe the rating system in my syllabus:

A task is a problem or a collection of similar problems that should be solved using calculus tools. Your progress in the class will be measured in terms of the number of tasks that you accomplish. Partial credit is not given; a task must be fully successful in order to count towards your final grade. Whenever a task is included on an exam, your work will receive one of four ratings:
S – successful: Solution is complete and correct.
R – minor revisions needed: Solution is correct except for small errors.
P – progressing: Partial understanding is evident, but solution contains substantial errors.
I – incomplete: Not enough evidence is available to provide an assessment.
A task marked “S” has been completed; you can check it off the list at the end of the syllabus.
A task marked “R” can have a small mistake such as an arithmetic error or a miscopied value. You will have 48 hours (or over the weekend if the work is returned on a Friday) to complete a Revision Form that explains how to correct the mistakes, and to submit the form along with your original work, in order to earn a successful rating.
A task marked “P” demonstrates progress in mastering the topic, but reassessment is necessary in order to successfully accomplish the task.
A task marked “I” shows little or no relevant work. Reassessment is necessary.
To show mastery of a task after it has received a rating of P or I, see the section entitled “Reassessment” on the next page.

Hopefully this simplified rating system will also make it easier for me to track student progress over the duration of the semester and analyze trends afterwards. (I agree with Kate that Drew’s “A tale of two students” chart was a moment of clear inspiration at MathFest.)

In order to help students know what is expected to prepare for reassessment, and to help me schedule them more effectively, I have introduced Reassessment Tickets:

After a task has been assessed on an exam, you may schedule a reassessment if you did not successfully complete the task. This is a two-step process:

First, pick up a Reassessment Ticket from my door or download and print one from the Courses site. Complete the form and return it to me at least 24 hours before you want a reassessment.

Second, once a meeting is scheduled, come to my office and I will give you a new opportunity to demonstrate mastery of the task. If possible, I will grade your work immediately; otherwise, I will let you know the result by the following day.

I will reassess up to two tasks per student per week. In addition, you can use exam days as opportunities for reassessment of up to three tasks, provided you let me know 48 hours in advance which ones.

I plan to use one or two class days at the end of the semester for reassessment alongside review, as well.

Another element I introduced was subcategories of tasks: “core”, “modeling”, and “additional” (not a great name; I’ll try to find a better one in the future). Again, lots of other people are already doing this, and I like what many of them are doing, which is to require two demonstrations of mastery for core skills and only one for the rest. I couldn’t figure out how to make that work with my system, so I made the following distinction: core tasks are the ones that could appear on the final exam. There are 14 of them, and I will choose seven to go on the final. (The final exam will also include a reflection essay, and a period of time for additional reassessment.) I also set higher expectations for how many core vs. additional tasks needed to be successful at each grade level.

What I learned from creating a list of tasks is that, because I state exactly what types of questions I will include on the tests, there is less wiggle room than with standards, which could always be applied to new sorts of problems. (This is the distinction between activity and ability I talked about in my last post.) I don’t know if my list of tasks, or the categorizing thereof, is ideal, but it is certainly enough to guarantee that a student who succeeds at all of them will have mastered calculus 1. (I used Robert’s classification of “core” and “supplemental” learning targets as a reference while I was sorting, but our lists don’t match up exactly.)

After all the work that goes into getting away from letter grades in a standards-based system, it’s always a bit dispiriting to turn back to them. So I start my section on “Final letter grades” with a bit of reluctance (not to say snark).

At the end of the semester, I am required to submit to the university a letter grade reflecting your achievement in this class. Here is how that grade will be determined.

To earn an A: in addition to passing the Gateway Exam and completing the Final Reflection,

Submit 20 homework reports.
Complete all modeling tasks.
Complete all core tasks.
Complete 25 additional tasks.
Complete 6 core tasks on the final exam (minor errors are acceptable).
To earn a B: in addition to passing the Gateway Exam and completing the Final Reflection,

Submit 15 homework reports.
Complete 2 modeling tasks.
Complete 12 core tasks.
Complete 21 additional tasks.
Complete 5 core tasks on the final exam (minor errors are acceptable).
Passing the Gateway Exam is required to earn a final grade of B– or higher.
To earn a C: in addition to completing the Final Reflection,

Submit 10 homework reports.
Complete 1 modeling task.
Complete 10 core tasks.
Complete 17 additional tasks OR pass the Gateway Exam and complete 14 additional tasks.
Complete 4 core tasks on the final exam (minor errors are acceptable).
Failure to complete a Final Reflection will result in a grade of D or F.
To earn a D:

Submit 5 homework reports.
Complete any 30 tasks from C.1–C.14, A.1–A.28, M.1–M.3 OR pass the Gateway Exam and complete any 23 tasks.
Complete 3 core tasks on the final exam (minor errors are acceptable).
Plusses and minuses will be assigned as follows: if all criteria for a letter grade are met as well as two or three of those for a higher letter grade, then a plus will be added. If all but one or two criteria for a letter grade are met, and the remaining items meet the criterion for one letter grade lower, then the higher letter will be given with a minus added.

I will use my discretion to assign a final letter grade in cases where a different set of conditions is met.
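For readers who find rules easier to follow as code, here is a rough sketch of the base-letter criteria above. It reads the lists literally and deliberately ignores the plus/minus adjustments and the discretionary cases; the `Record` class, its field names, and `base_letter` are hypothetical names made up for this illustration and are not part of the syllabus.

```python
from dataclasses import dataclass

ALL_CORE = 14      # C.1 through C.14
ALL_MODELING = 3   # M.1 through M.3


@dataclass
class Record:
    """One student's end-of-semester tallies (hypothetical field names)."""
    homework_reports: int
    modeling: int      # modeling tasks completed
    core: int          # core tasks completed
    additional: int    # additional tasks completed
    final_core: int    # core tasks completed on the final exam
    gateway: bool      # passed the Gateway Exam
    reflection: bool   # completed the Final Reflection


def base_letter(r: Record) -> str:
    """Base letter grade only; plusses, minuses, and discretion are left out."""
    total = r.modeling + r.core + r.additional
    if (r.gateway and r.reflection and r.homework_reports >= 20
            and r.modeling >= ALL_MODELING and r.core >= ALL_CORE
            and r.additional >= 25 and r.final_core >= 6):
        return "A"
    if (r.gateway and r.reflection and r.homework_reports >= 15
            and r.modeling >= 2 and r.core >= 12
            and r.additional >= 21 and r.final_core >= 5):
        return "B"
    if (r.reflection and r.homework_reports >= 10
            and r.modeling >= 1 and r.core >= 10 and r.final_core >= 4
            and (r.additional >= 17 or (r.gateway and r.additional >= 14))):
        return "C"
    if (r.homework_reports >= 5 and r.final_core >= 3
            and (total >= 30 or (r.gateway and total >= 23))):
        return "D"
    return "F"


print(base_letter(Record(15, 2, 12, 21, 5, gateway=True, reflection=True)))
```

The checks run from the top grade down, so the first set of criteria that is fully met determines the letter.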

So there it is. My syllabus for calculus this semester. I’m sure by December, or even October, I’ll have a much better notion of what changes I should have made. I’ll let you know how it goes.

(I should also have given more attribution in this post to the people I stole ideas from, especially at MathFest, but I don’t have those notes on hand right now. So a general word of thanks goes out to this very sharing community.)

prologue to a syllabus

2018-09-02T17:41:00.000-07:00

(This was originally supposed to be the post in which I describe my syllabus for the fall. I started writing some preliminary comments, and they got out of control. I’ll get back to the syllabus itself in my next post.)

First, I must express some gratitude. Thanks to parental leave provided by the state of California and my school, I did not have any teaching duties last spring. It was my first time not teaching first-semester calculus in four years. As I tell my students, calculus 1 is actually one of my favorite classes to teach, but I could tell by last fall that some parts of the course were getting stale. Having a semester break meant that, in addition to getting to know my newborn daughter, I could let my ideas on how to improve calculus instruction and assessment simmer for a bit.

Actually, it’s not entirely honest to refer to “my ideas” in this setting; what I really needed was a chance to reflect on ideas I’d been picking up (stealing) from others, and even better, to acquire (steal) some fresh ideas, which a workshop and conference provided over the summer. A fabulous community of college and university math teachers has formed around the question of how to improve our assessment practices, and the rate of sharing/stealing/developing ideas is remarkably fast.

Over the past few weeks, as the fall semester has started up, several people have shared their syllabi along with extensive, thoughtful commentary on how they created them. I’ve been holding back, however, because while I believe my syllabus is better than it was last year, by the time the semester started I only felt like I had gotten it to “just good enough.” Some ideas aren’t fully developed yet, some feel out of balance, and some are plain risky. Nevertheless, in the spirit of community and maintaining a growth mindset, I’ve decided to go ahead and share my syllabus, too, warts and all.

Since I see this as a long-term work in progress, I’d like to begin with a few words about that progress. (These comments will parallel somewhat my talk from MathFest last month.) I started using standards-based grading in spring 2013, largely as a way to improve the feedback I was giving students. After a reasonably successful first attempt, I began using alternative assessment methods in all of my classes. Some worked better than others, but because I was teaching calculus 1 so often, my SBG system for that class developed into a collection of 25–30 standards that became fairly stable.

Around the same time, Robert Talbert was blogging about specifications grading, a well-developed and flexible framework whose goals, in the words Linda Nilson uses to subtitle her book on the topic, are “restoring rigor, motivating students, and saving faculty time.” For a while I remained skeptical about specs grading, because I couldn’t understand why anyone would turn to something besides my beloved standards. Eventually, however, I realized that SBG as I conceived it didn’t work in every situation, and so I delved more into specs. The Google+ community initiated by Robert goes by the name SBSG, to include both standards-based and specifications grading. Today the language of the community encompasses these and other alternative assessment systems under the broader term mastery grading, which hearkens back to Bloom’s terminology of mastery learning.

At MathFest, I talked a bit about this history of my classes and did some compare-and-contrast between SBG and specs grading. Possibly the most useful contribution I made to that session was the following six-word summary of how they relate:

Standards emphasize content.
Specifications emphasize activity.

Here’s another way to phrase the distinction in my mind: When we create standards, we are answering the question what do we want students to be able to do? When we create specifications, we are answering the question what do we want students to have done? More bluntly, standards are what we want to measure, while specifications are what we can actually measure; the latter is a proxy for the former.

I guess my claim is that standards and specifications support each other: they are two sides of the same coin. We need specifications in order to determine how standards will be assessed, and a clear list of standards keeps specifications from becoming arbitrary. (Or as Drew Lewis said on Twitter, “specs are how I assign letter grades, with the primary spec being mastery of standards.”) Whether I say that an assessment system is based on specifications or standards depends on whether the description of the system focuses on the proxy or the thing for which it proxies.

By last fall, some cracks in my SBG system for calculus had started to show. Every semester, I had a couple of students at the end of the course who still thought it wasn’t clear. The homogeneity of the list of standards was mushing the most important concepts together with secondary ones. Worst of all from a practical standpoint, I was finally getting overwhelmed by reassessments after years of claiming that SBG didn’t take any more of my time than traditional grading. I knew I needed to make some changes to clarify and streamline the assessment process.

What I have for now isn’t perfect, but it will get me through the semester. With this lengthy prologue complete, in my next post I’ll share parts of my syllabus and explain what I hope it achieves.

“Create Your Own” part 1

2018-08-27T22:31:00.000-07:00

Today was the first day of calculus for the fall semester of 2018. As a first-day activity, I wanted to do something that didn’t require any calculus knowledge and could break students out of the mindset that doing math is always about solving particular problems that have been fed to you.

So I initiated a sequence of exercises I’m planning, which I’ve come to think of collectively as “Create Your Own…”. In this case, I gave the following prompt:

The number 1 can be written many different ways, for example 4 – 3 or 10/10.
Come up with ten different expressions that equal 1. Be creative!
Try to have at least four of your solutions involve some kind of algebraic expression, like a variable x.

After they had a few minutes to work individually, I had them share their answers in small groups, and each group picked out what expression by its members they thought was most creative. At the end of class, I collected all of their solutions to look at later in the day.

In having students do this exercise, I learned a lot, and I would definitely do it again, with a few tweaks. Here’s some of what I learned:

Students judge creativity differently than I do. In looking over the collective work this evening, I saw some excellent examples of splitting 1 into a sum of fractions or decimals and some elaborate expressions involving absolute values or square roots. But the groups often picked examples with the fanciest functions as most creative. Each section had some students come up with cos²(x)+sin²(x) as an answer, and some used logarithms, as in ln(e) or log₁₀(10). And I’m glad those functions were there! It gave us a chance to talk a little about them and for me to give assurance that we would review them at an appropriate time. But 1/10+2/5+1/2 is much more personal, somehow, and I’d like it to have its due.
This kind of exercise was surprising and unfamiliar. I’m not quite sure how much time I gave for the creative process; I started out in my head with the idea of 2–3 minutes, but that clearly wasn’t enough, so it was probably 4–5. In that time, not everyone came up with ten solutions. (Which is fine! We’d spent an earlier part of the class watching Jo Boaler’s “Four Key Messages” video, which emphasizes that speed isn’t essential in learning math; a couple of students added that comment to their work.) I saw a few get stuck for a while, however, and next time I’ll have some ideas for how to gently prod.
The notion of “variable” is very strongly connected with “solving an equation”. The vast majority of students interpreted the direction “involve some kind of algebraic expression” to mean “write an equation whose solution is 1.” This led to answers like 2x=2 and x+3=4, and many others (one group gave log₅(x)=0 as an answer!). There was a remarkable amount of creativity in the creation of these equations; I’d like to figure out how to leverage that. But now I also know that the distinction between an “expression” and an “equation” has not yet been made clear, and when we start simplifying algebraic expressions (e.g., to compute limits), we’ll still need to inject some flexibility into our thinking.

The main adjustments I would make next time are:

Rephrasing the instructions to say that the expressions simplify to 1, rather than equalling 1. Hopefully this will give clearer direction regarding algebraic expressions. Also, I would probably add an algebraic example like (x+1)–x.
Preparing, nonetheless, for a discussion about what the term “expression” means.
Giving a more definite and slightly longer period of time, and providing more useful interventions as the students do individual work.

That’s all for now. More updates as warranted.

dialectics in mathematics

2018-08-20T11:39:00.001-07:00

This post is part of the Virtual Conference on Mathematical Flavors, and is part of a group thinking about different cultures within mathematics, and how those relate to teaching. Our group draws its initial inspiration from writing by mathematicians that describe different camps and cultures — from problem solvers and theorists, musicians and artists, explorers, alchemists and wrestlers, to “makers of patterns”. Are each of these cultures represented in the math curriculum? Do different teachers emphasize different aspects of mathematics? Are all of these ways of thinking about math useful when thinking about teaching, or are some of them harmful? These are the sorts of questions our group is asking. (intro by Michael Pershan)

I want to talk about how we respond to polarities. Here I mean “polarity” in the philosophical sense (a pair of concepts that are apparently in conflict) rather than in a mathematical sense. When we encounter a struggle or tension between goals or ideas, we tend to create one of two things:

dichotomy — a conclusion that the two ideas are irreconcilable and the choosing of sides, or
synthesis — a selection of desirable features from each and the attempt to make those features coexist.

While each approach is at times appropriate, both have their downsides. Establishing a dichotomy means that one side tends to be silenced and its contributions lost. Creating a synthesis can mean that neither side is fully honored; everything is compromise.

I propose a third option, an alternative to dichotomy or synthesis: this approach is dialectic — upholding both sides fully, maintaining the two ideas in tension so that a conversation may arise between them. Etymologically, “dialectic” comes from the roots “dia” (“across”) and “logos” (“word” or “reason”), so its underlying meaning may be read as “speaking across a divide”. Dialectics can simply refer to discussion or debate between two opposing sides, but I use it to denote a state that seeks not resolution, but rather the fruitfulness of an irreducible struggle. Doing so acknowledges the worth, validity, and potency of both sides. It can therefore be used in the classroom to foster the inclusion of diverse perspectives, even in mathematics.

Our group’s discussion began with an essay by Timothy Gowers entitled “The Two Cultures of Mathematics”. In this piece, Gowers makes the claim that most mathematicians are either “problem solvers”, who prefer to attack specific open problems that they believe are important, or “theory builders”, who prefer to develop a large, coherent body of understanding. The former are interested in general theory mainly insofar as it provides ways to solve their problems; the latter are interested in specific problems mainly insofar as they spur deeper insights or new directions for theory.

This subdivision is similar to the pure/applied separation we often talk about in mathematics, though it is not quite the same thing. Even the problems Gowers mentions fall well within the “pure” category. But these two polarities (pure/applied, theory/problems) share the feature that adherents of one side tend to be a bit snobbish towards those of the other.

Pure mathematicians tend to look on applied mathematics as, at best, a dirty form of math or, at worst, not truly math at all. G. H. Hardy, in his famous essay A Mathematician’s Apology, describes pure mathematics as more enduring, more exciting, and more “real” than applied mathematics. (He does make clear that what he considers “applied” mathematics limits itself to “elementary” tools, which more-or-less means grade-school arithmetic up through introductory calculus, and so his notion of applied mathematics might no longer suffice. I’ll get back to Hardy shortly.)

Gowers claims that, in a similar way, theory-building is currently “more fashionable” than problem-solving in the math world. (Rather than drawing the analogy with pure and applied mathematics, however, he compares this snobbishness with one, observed by C. P. Snow in “The Two Cultures”, held by humanities toward the sciences.) He laments that “this is not an entirely healthy state of affairs” and spends most of his essay defending problem-solving areas of math (combinatorics, in particular) against some perceived criticisms. His argument suggests to me that both the theory-building and the problem-solving camps should be upheld without one attempting to overcome the other; that is, a healthier state can be reached by sustaining a dialectic.

How can we think about theory-building vs. problem-solving in our classes?

For one thing, many of our students are trained problem-solvers. For them, learning mathematics means developing an appropriate response to any given stimulus. If a problem statement includes this-or-that word or phrase, then I should use such-and-such a technique to find a solution. For many of us instructors, however, it is the abstraction of ideas that drew us to mathematics. What is possible in this situation? To what extent can the possibilities be quantified and categorized? If theory-building is currently en vogue in mathematical culture, then I suspect we who teach are not immune to that trend. But here comes the question of motivation: what will draw students into doing mathematics? In many cases, the answer is… a problem. The problem may be “applied” (e.g., how does a population grow over time) or “pure” (e.g., how does the size of a square increase when its side length increases?), but a concrete connection provides an open door to considering broader mathematical truths. Such problems can lead into developing theory (e.g., what properties do exponential and polynomial functions share, and what distinguishes them?).

But developing theory for its own sake has been a part of mathematics since at least Euclid; we do our students a disservice if we neglect this aspect of doing math. A theory crystallizes into a single lattice ideas that might otherwise have been perceived as disconnected. Algebra in particular provides a unifying framework for solving individual problems. On the other hand, non-constructive statements are by turns inspiring and infuriating. It is no small movement from the (typically algebraic) claim that “A solution exists! And you can find it by following these steps…” to the (typically analytic) claim that “A solution exists! And you may never find it exactly…” This theory in turn motivates a slew of new problems: if nothing else, how shall we find solutions as close to the true answer as we desire?

In any case, it is useful to abide by a constructivist view of knowledge: students will understand best the structures that they form in their own minds, whether by induction (problem-solving) or deduction (theory-building), and they should be presented with ample opportunities for both forms of construction.

[Side note: in his keynote post for this series, Michael describes an occasion where he side-steps, or deconstructs, the theory-building/problem-solving divide by encouraging math-doers to create their own questions based on a simple prompt, questions which could easily veer in any direction, including problem-solving or theory-building.]

It is not hard to find other places in mathematics where polarities exist and a choice must be made: dichotomy, synthesis, or dialectic? A few weeks ago, I made a bit of a fuss on Twitter, claiming that everything Hardy wrote about mathematical culture should be read skeptically. The context for my criticism was an oft-shared quote from “A Mathematician’s Apology”: “Beauty is the first test: there is no permanent place in the world for ugly mathematics.”

A question immediately presents itself: who decides what is beautiful? Any claim to objectivity is nearly always tied up with privilege. The answer cannot be “all mathematicians” because we all have such different tastes and preferences. Nor can the answer be “a special subset of mathematicians” because the choice of that subset will inevitably be determined by power structures within the mathematical community. But neither is the answer that all mathematics is equally beautiful. The standard of beauty may be subjective, but that does not mean it is arbitrary. We value beauty, but it is not the sole or even the primary standard by which we judge mathematics.

Hardy argues that all mathematics considered “useful” is essentially “dull” or “trivial”. He seeks to create a dichotomy between the beautiful and the practical. Perhaps he didn’t foresee the computing revolution. He couldn’t predict that number theory would be used in encryption, or that general relativity would be used for GPS, or that differential equations would be used for movie animations. Perhaps he would not consider these applications to be built on the deepest, truest parts of those theories. (To be fair, at the time he wrote, Hardy was distressed by the ways in which science had been used in the cause of warfare, and wanted to establish some distance between pure mathematics and that particular set of applications.)

From an evangelistic perspective, potential converts (our students in particular) may be drawn in from either side: the aesthetic or the practical. I personally was first attracted to geometric form and the lovely, counterintuitive properties of mathematical relations. Some of my students have tastes similar to mine, but many more will be convinced of the predictive power of mathematics before they accept its inherent attractiveness.

Beyond this, however, neither beauty nor usefulness can or should be subjugated to the other. Mathematics is grounded in both. They can hone each other, but they can also proceed independently. Progress flows from the pursuit of either. It would be a mistake, I believe, to claim that either is the true purpose of mathematics; we should support both of them in our minds and in our classrooms.

Not every tension needs to be handled this way, but examples of dialectical pairings abound: precision and approximation, confidence and confusion, individual and community. I encourage us all to consider times when it can be productive not to resolve such conflicts but instead to foster a breadth of understanding from them.

summer activities

2018-08-10T11:05:00.000-07:00

This summer I had a fair amount of travel and conference/workshop activities. I’ve also been working on several projects that need finishing, and of course I have an eight-month-old daughter. So I haven’t been blogging, even though several ideas for posts have been kicking around in my head. In order to get something posted, here’s a summary of some major events of the summer.

In May I taught a four-week session of “Transition to Abstract Mathematics”, our introduction-to-proofs course. I had seven students who worked very hard throughout the session. I was pleased that we were able to reach a point where the students could present results they selected from Proofs from THE BOOK during the final exam period.

In June I attended a program on “Teichmüller dynamics, mapping class groups and applications” at the Institut Fourier in Grenoble. (If any of those topics are of interest to you, videos of all the talks are available on YouTube.) I did not give a lecture, but I had a chance to talk with several people about work I did last summer with a Pepperdine student on the topic of “homothety” or “dilation” surfaces. Got a couple more projects started during this time, to mix into the three or four I was already working on. ¯\_(ツ)_/¯ It was also my first time visiting that part of France, so I traveled with family around the region. A couple of touristic highlights: tasting Chartreuse at the distillery in Voiron, and exploring the Citadel in Sisteron.

In July I participated in an IBL workshop run by the Academy of Inquiry-Based Learning. Four days with 25 enthusiastic teacher-learners and six fantastic facilitators. I started a set of IBL notes to use in the course on complex variables I’m teaching next spring and garnered several new tools and ideas for increasing student activity and engagement in the classroom.

Last weekend I joined the mastery grading session at MathFest. Tons more ideas here! It’s great to be part of so many communities of people who are generating and sharing ideas big and small.

I’ve promised to write at least one more blog post in the next week, for Sam Shah’s Virtual Conference on Mathematical Flavors. So it won’t be quite so long before I post again!

angels’ staircases

2018-02-19T18:16:00.000-08:00

Here’s a pair of facts I hadn’t much considered before the last time I taught real analysis:

The only discontinuities a monotone function can have are jump discontinuities.
A monotone function can have at most countably many discontinuities.

One thinks immediately of the floor function $f(x) = \lfloor x \rfloor$, which has a jump of size 1 at every integer. It is possible to have infinitely many jumps within a finite interval; for instance, $\frac{1}{\lfloor1/x\rfloor}$ has a jump at every unit fraction $1/n$. But can a monotone function have a dense set of discontinuities, say, the set of rationals? Sure, here’s one:

That’s the graph of \[f(x) = \sum_{n=1}^\infty \frac{\lfloor nx\rfloor}{2^n}.\] It has a jump of size $1/(2^q-1)$ at each fraction of the form $p/q$ (when $p/q$ is in reduced form, of course). Exercise. Why are these the sizes of the jumps? Keep in mind where the discontinuities of $\lfloor nx \rfloor$ appear.
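If you’d like numerical evidence for the jump sizes before working the exercise, here is a small sketch. It truncates the series after finitely many terms (the cutoff `terms=60` is an arbitrary choice of mine) and uses exact rational arithmetic; the function names are just for illustration.

```python
from fractions import Fraction
from math import floor


def f_partial(x, terms=60):
    """Partial sum of sum_{n>=1} floor(n*x) / 2**n, computed exactly."""
    return sum(Fraction(floor(n * x), 2**n) for n in range(1, terms + 1))


def jump_at(p, q, terms=60):
    """Jump of the partial sum at x = p/q, with p/q given in lowest terms."""
    x = Fraction(p, q)
    # Every other discontinuity k/n with n <= terms is at least 1/(q*terms)
    # away from p/q, so a step of half that size crosses no other jump.
    eps = Fraction(1, 2 * q * terms)
    return f_partial(x, terms) - f_partial(x - eps, terms)


for p, q in [(1, 2), (1, 3), (2, 3), (3, 5)]:
    measured = float(jump_at(p, q))
    claimed = 1 / (2**q - 1)
    print(f"{p}/{q}: measured jump {measured:.12f}, claimed {claimed:.12f}")
```

Because the series is truncated, the measured values agree with $1/(2^q-1)$ only up to a tiny error, which shrinks as `terms` grows.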

Here’s a general way to construct a monotone function $f : \mathbb{R}\to\mathbb{R}$ whose set of discontinuities is your favorite countable set $C$. Let $c_1$, $c_2$, $c_3$, $\dots$ be an enumeration of $C$, and for each $x\in\mathbb{R}$, set \[N(x) = \{ n\in\mathbb{N} : c_n \le x \}. \] Let $a_1$, $a_2$, $a_3$, $\dots$ be a sequence of strictly positive numbers such that $\sum a_n \lt \infty$. Then define \[ f(x) = \sum_{n\in N(x)} a_n. \] This function is discontinuous at each point in $C$ and continuous at each point that is not in $C$. (Exercise. Why? Keep in mind that the tail of a convergent series can be made arbitrarily small.) Essentially, we have created a distribution by placing a “delta mass” of weight $a_n$ at the point $c_n$, and $f$ is the integral of this distribution from $-\infty$ to $x$.
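Here is a finite sketch of that construction, just to make the definition concrete. It uses only the first few terms of one possible enumeration (reduced fractions in $(0,1)$ with denominator at most 7) and the weights $a_n = 2^{-n}$; both choices, and the name `staircase`, are my own assumptions for the example.

```python
from fractions import Fraction
from math import gcd


def staircase(points, weights):
    """Return f with f(x) = sum of weights[n] over all n with points[n] <= x."""
    def f(x):
        return sum(a for c, a in zip(points, weights) if c <= x)
    return f


# The start of an enumeration of the rationals in (0, 1), by denominator.
points = [Fraction(p, q) for q in range(2, 8)
          for p in range(1, q) if gcd(p, q) == 1]
# Strictly positive weights with a convergent sum: a_n = 1 / 2**n.
weights = [Fraction(1, 2**n) for n in range(1, len(points) + 1)]

f = staircase(points, weights)
c, a = points[3], weights[3]
tiny = Fraction(1, 1000)   # smaller than the gap to any other listed point
print(c, a, f(c) - f(c - tiny))
```

The last line compares the weight placed at one point with the size of the jump of $f$ there; with only finitely many points this is a step function, and the interesting behavior appears in the limit.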

When $C$ is dense, the construction above produces a strictly increasing function $f$, and moreover the image of $f$ is nowhere dense. I call the graph of such a function an “angels’ staircase”, because $f$ is a one-sided inverse of a “devil’s staircase” function $g$—that is, $g$ is a monotone increasing function that is constant except on a set of measure zero. (MathWorld, on the other hand, uses “devil’s staircase” to refer to both kinds of functions.)

boxes and fractions

2017-08-05T14:26:00.000-07:00

Last week at MathFest, Dusa McDuff gave an excellent series of lectures on symplectic geometry. I enjoyed the second one the most, because in it she described the solution to a concrete problem that had a beautiful and unexpected answer, and used several tools of varying levels of difficulty. The result was published in the Annals of Mathematics in 2012; you can get the paper here. I won’t be referring to it for the rest of the post, however. Instead I want to highlight one elementary construction she described during this lecture.

About halfway through the talk, McDuff made a comment along the lines of, “Here’s something I must have learned in elementary school, but I’ve been surprised by how many mathematicians don’t know it.” I certainly don’t recall having seen it, at least not in the generality she described, and I do think people of all ages and mathematical abilities could enjoy playing with it.

Start with any fraction—say, $30/13$—and make a rectangle whose side lengths are the numerator and the denominator of the fraction.

Now start marking off squares, as large as possible, inside the rectangle. In this example, we can cut off two $13 \times 13$ squares at first.

When no more large squares fit, start marking off squares from the remaining rectangle along the side.

Continue until you run out of space. In this example, only one more step in the process is needed.

When you’re finished, you will have filled the rectangle with squares of various sizes.

Count how many squares you have of each size and put those numbers in sequence. In this example, we have two large squares, three medium squares, and four small squares, so the sequence is 2, 3, 4. (Yes, I chose the fraction $30/13$ to get this sequence.) Now write these numbers into a continued fraction, as follows: \[ 2 + \cfrac{1}{3 + \cfrac{1}{4}}. \] To evaluate a continued fraction, we start at the bottom and work through the nested operations: \[ 2 + \cfrac{1}{3 + \cfrac{1}{4}} = 2 + \cfrac{1}{\cfrac{13}{4}} = 2 + \frac{4}{13} = \frac{30}{13}. \] The result is the fraction we started with!

Here’s another example. If we start with the fraction $25/7$ and follow the same process, we get this picture.

This time there are four different sizes of boxes. Counting the number of boxes of each size gives the sequence 3, 1, 1, 3, and we can check that \[ 3 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{3}}} = 3 + \cfrac{1}{1 + \cfrac{3}{4}} = 3 + \frac{4}{7} = \frac{25}{7}. \]

I had fun figuring out why this works, so in the interest of keeping this post short, I’ll leave that as an exercise for the reader. A hint: the process of cutting off squares is essentially Euclidean division.
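For readers who would rather experiment before working out the proof, here is a minimal Python sketch of the square-cutting process for integer side lengths. The function names `square_counts` and `evaluate` are my own inventions for illustration; the first one just repeats division with remainder, which is the hint above in code form.

```python
from fractions import Fraction

def square_counts(width, height):
    """Count the squares of each size cut from a width-by-height rectangle.

    Assumes positive integer side lengths. Each pass cuts off as many
    squares as fit along the longer side (division with remainder),
    then repeats on the leftover rectangle.
    """
    counts = []
    long_side, short_side = width, height
    while short_side != 0:
        count, leftover = divmod(long_side, short_side)
        counts.append(count)
        long_side, short_side = short_side, leftover
    return counts

def evaluate(counts):
    """Evaluate the simple continued fraction with the given terms, bottom up."""
    value = Fraction(counts[-1])
    for term in reversed(counts[:-1]):
        value = term + 1 / value
    return value

print(square_counts(30, 13))            # [2, 3, 4]
print(evaluate(square_counts(25, 7)))   # 25/7
```

Using exact `Fraction` arithmetic avoids any rounding, so the evaluated continued fraction can be compared directly with the fraction we started from.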

Although we started with rectangles whose side lengths are integers, there’s no reason to restrict the above process to that case. In fact, if this process seems familiar, you may have seen it before in the special case of a golden rectangle, in which only one square of each size can be included:

This is related to the fact that the (infinite) continued fraction of the golden ratio has all $1$s: \[ \frac{1 + \sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}} \] which I explained in another way in my last post.

One more example worth considering. Here is the start of the process when the side lengths of the rectangle are $\pi$ and $1$:

So there are three of the largest boxes, seven of the next largest, then fifteen of the next size (although the resolution of this image doesn’t let you see that). The fact that the first two steps nearly fill the entire rectangle is why $3 + 1/7 = 22/7$ is such a good approximation for $\pi$.
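The counts for the $\pi \times 1$ rectangle can be reproduced with a small floating-point sketch. The function name `square_counts_real` is hypothetical, and because it works with rounded values of $\pi$, only the first few counts should be trusted.

```python
import math

def square_counts_real(long_side, short_side, steps):
    """Run the square-cutting process for a fixed number of steps.

    Works with floating-point side lengths, so only the first few
    counts are trustworthy before rounding error takes over.
    """
    counts = []
    for _ in range(steps):
        # How many squares of side `short_side` fit along the long side.
        count = math.floor(long_side / short_side)
        counts.append(count)
        # The leftover strip becomes the new rectangle.
        long_side, short_side = short_side, long_side - count * short_side
    return counts

print(square_counts_real(math.pi, 1.0, 3))   # [3, 7, 15]
```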

P.S. Dusa McDuff is also married to John Milnor, who gave the Hedrick Lectures in 1965. What other husband-and-wife pairs (or other pairs of family members) have both given the same high-profile math lecture series?

addition of fractions is bilinear

2017-06-25T16:45:00.000-07:00

The prototypical example of a bilinear operation is multiplication, $b(x,y)=xy$. For a function of two variables, bilinear means “linear in each variable.” That means multiplication has the following properties: \[ b(x_1+x_2,y) = b(x_1,y) + b(x_2,y) \] \[ b(x,y_1+y_2) = b(x,y_1) + b(x,y_2) \] \[ b(cx,y) = b(x,cy) = cb(x,y) \] Most operations that are called some kind of “product” (e.g., inner product, cross product, wedge product) get this name, at least in part, because they have the above properties.

Addition is also a function of two variables, but it is not linear in either variable; all of the above axioms fail. So how could it be bilinear? Read on!

Fractions are curious things, because they are pairs of things: a fraction has a numerator (which numerates, meaning it counts some number of things) and a denominator (which denominates, meaning it names the type of thing being counted). And we assume that the numerator and denominator are themselves the same type of thing (whether integers, polynomials, or what have you), so that we can make sense of the equality of fractions: we say $p/q$ and $r/s$ are equal if $qr = ps$.

(I have heard that my grandfather-in-law, who was also a college math professor, claimed it was self-evident that college students should be confused by fractions, in rebuttal to those who would say that fractions are trivial once one has reached the level of college mathematics; $2/4$ and $1/2$ are either not clearly the same, or clearly not the same.)

Leaving aside the equality of fractions for a moment, addition of fractions is also a curious thing. The most obvious operation is to add the numerators and denominators separately. This is sometimes called Farey addition, which is a rich topic that truly deserves its own treatment. Let’s use $\oplus$ to write this kind of addition: \[ \frac{p}{q} \oplus \frac{r}{s} = \frac{p + r}{q + s}. \] That’s not how addition of fractions is usually defined, however. Instead, we produce a new denominator which is a product of the old denominators, and a new numerator which takes into account both the old numerators and the old denominators in a product-y sort of way: \[ \frac{p}{q} + \frac{r}{s} = \frac{ps + qr}{qs} \] (Indeed, you might recognize the numerator as the inner product of $(p,q)$ and $(s,r)$, or of $(q,p)$ and $(r,s)$. In either case, one of the pairs has its numerator and denominator switched.) Perhaps now you see where the title of the post comes from: addition of fractions is bilinear when we treat the fractions as pairs (Farey addition is, after all, essentially vector addition): \[ \left(\frac{p_1}{q_1}\oplus\frac{p_2}{q_2}\right) + \frac{r}{s} = \left(\frac{p_1}{q_1} + \frac{r}{s}\right) \oplus \left(\frac{p_2}{q_2} + \frac{r}{s}\right) \] \[ \frac{p}{q} + \left(\frac{r_1}{s_1}\oplus\frac{r_2}{s_2}\right) = \left(\frac{p}{q} + \frac{r_1}{s_1}\right) \oplus \left(\frac{p}{q} + \frac{r_2}{s_2}\right) \] \[ \frac{cp}{cq} + \frac{r}{s} = \frac{p}{q} + \frac{cr}{cs} = \frac{c(ps + qr)}{c(qs)} \] The last property above is what makes addition of fractions well-defined, meaning that if we replace one fraction in the sum with an equivalent fraction, then the resulting sum is equivalent to the original sum.
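The bilinearity identities above are easy to spot-check numerically. The sketch below stores each fraction as an unreduced (numerator, denominator) pair; the function names `farey` and `add` and the sample values are my own choices for illustration.

```python
def farey(x, y):
    """Farey addition: add numerators and denominators separately."""
    (p, q), (r, s) = x, y
    return (p + r, q + s)

def add(x, y):
    """Ordinary addition of fractions, kept unreduced as (numerator, denominator)."""
    (p, q), (r, s) = x, y
    return (p * s + q * r, q * s)

x1, x2, y = (1, 2), (3, 5), (2, 7)

# Linearity in the first argument, with Farey addition playing the role of "+".
print(add(farey(x1, x2), y) == farey(add(x1, y), add(x2, y)))   # True
# Linearity in the second argument.
print(add(y, farey(x1, x2)) == farey(add(y, x1), add(y, x2)))   # True
```

Note that the identities hold exactly on pairs, without reducing to lowest terms, which is what one expects if the pairs are behaving like vectors.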

This is all very cute and would simply be a curiosity, but I’m writing about it because this property of bilinearity recently helped me understand something about continued fractions.

Thanks to the student research I'm directing this summer, I'm finally learning some stuff about continued fractions.
— Joshua Bowman (@Thalesdisciple) June 2, 2017

A finite continued fraction is an expression of the form \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{\ddots + \cfrac{b_{n-1}}{a_{n-1} + \cfrac{b_n}{a_n}}}}} \] (In a simple continued fraction we assume that all of the $b_i$ equal $1$, but for my purposes here there’s no reason to make that assumption.) The operations are nested, so to find the value of this expression, we start from the bottom. For example, \[ 1 + \cfrac{2}{3 + \cfrac{4}{5 + \cfrac{6}{7}}} = 1 + \cfrac{2}{3 + \cfrac{4}{\cfrac{41}{7}}} = 1 + \cfrac{2}{3 + \cfrac{28}{41}} = 1 + \cfrac{2}{\cfrac{151}{41}} = 1 + \dfrac{82}{151} = \dfrac{233}{151} \] An infinite continued fraction has the same form as a finite continued fraction, except that the nesting doesn’t stop: \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} \] To make sense of this expression, we need—as is typical in such cases—to consider a sequence of finite continued fractions that should “approximate” this infinite continued fraction. If we truncate the infinite continued fraction after $a_n$, then we get its $n$th convergent, which we write as $p_n/q_n$. If the sequence of convergents converges, well, then, its limit is the value of the infinite continued fraction. (The Mathologer has an excellent video on infinite continued fractions, which includes an example of an apparent paradox that can arise from an infinite continued fraction that doesn’t converge.)
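As a sanity check on the worked example, here is a minimal sketch of the bottom-up evaluation in Python. The function name `evaluate` and the list layout are assumptions of mine; the sketch also assumes that no intermediate value is zero.

```python
from fractions import Fraction

def evaluate(a, b):
    """Evaluate a_0 + b_1/(a_1 + b_2/(... + b_n/a_n)) from the bottom up.

    `a` holds a_0, ..., a_n and `b` holds b_1, ..., b_n, so `a` is one
    entry longer than `b`.
    """
    value = Fraction(a[-1])
    for a_i, b_next in zip(reversed(a[:-1]), reversed(b)):
        value = a_i + Fraction(b_next) / value
    return value

print(evaluate([1, 3, 5, 7], [2, 4, 6]))   # 233/151
```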

The definition is nice as far as it goes, but wouldn’t it be even nicer to know how successive convergents are related? That is, how can we get $p_{n+1}/q_{n+1}$ from earlier convergents? Fortunately, there are simple recurrence relations due to Euler and Wallis: \[ p_{n+1} = a_{n+1} p_n + b_{n+1} p_{n-1}, \qquad q_{n+1} = a_{n+1} q_n + b_{n+1} q_{n-1}. \] It was in trying to understand these formulas that I got stuck. Finite continued fractions, as explained above, are computed from the bottom up. But when we move from the $n$th convergent to the $(n+1)$st convergent of an infinite continued fraction, we add a new term to the bottom. That is, we change the start of the computation, not the end of it. That throws off my instincts for how recurrence should work.
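To see the recurrence in action, here is a short Python sketch that builds the convergents from the top down. The function name `convergents` is hypothetical, and the seed values $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = a_0$, $q_0 = 1$ are a standard convention that I am assuming here.

```python
from fractions import Fraction

def convergents(a, b):
    """Return the convergents p_n/q_n as (p_n, q_n) pairs via Euler-Wallis.

    `a` holds a_0, ..., a_N and `b` holds b_1, ..., b_N.
    """
    p_prev, q_prev = 1, 0        # p_{-1}, q_{-1}
    p, q = a[0], 1               # p_0, q_0
    result = [(p, q)]
    for a_next, b_next in zip(a[1:], b):
        # Both right-hand sides use the old values of p, p_prev, q, q_prev.
        p, p_prev = a_next * p + b_next * p_prev, p
        q, q_prev = a_next * q + b_next * q_prev, q
        result.append((p, q))
    return result

for p, q in convergents([1, 3, 5, 7], [2, 4, 6]):
    print(p, q, Fraction(p, q))
```

With the coefficients from the finite example above, the last convergent comes out as $233/151$, matching the bottom-up calculation even though the work is done in the opposite order.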

It’s easy to find proofs of the Euler–Wallis recurrence relations. Most of them use induction, which is fine. But many also implicitly use the fact that continued fractions “can be” defined with real numbers for $a_i$ and $b_i$, not just integers. I never specified integers in the definitions above, either, but in many cases one wants to make that restriction, including the purpose for which I’ve been studying continued fractions. I felt it should be possible to prove such a fundamental relationship without leaving the natural “context” of whatever type of numbers was allowed to begin with.

Here is where the bilinearity of fraction addition came to my aid. One of the nice things about bilinear operations is that, if you fix one input, the result is just an ordinary linear function with respect to the other input, which can be represented by a matrix. If we fix $p/q$, say, and add to it a variable fraction $x/y$, and represent the fractions as vectors (since we’re thinking of them as pairs anyway), then the calculation of the sum $p/q+x/y$ becomes \[ \begin{pmatrix} py + qx \\ qy \end{pmatrix} = \begin{pmatrix} q & p \\ 0 & q \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] When computing continued fractions, we repeatedly start with a fraction, invert it, multiply the (new) numerator by some quantity, and add another value. All of these have representations as linear transformations: the reciprocal of $x/y$ is given by \[ \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] Multiplying the numerator of a fraction $x/y$, say by a constant $c$, is given by \[ \begin{pmatrix} cx \\ y \end{pmatrix} = \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] Putting together the above linear transformations, we find that if $a$ and $b$ are fixed, then the matrix form of the transformation that sends $x/y$ to $a+b/(x/y)$ is \[ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] We can find $p_n/q_n$, then, by starting with $a_n = a_n/1$ and applying the above procedure repeatedly: \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & b_1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & b_2 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & b_n \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_n \\ 1 \end{pmatrix} \] However, something is aesthetically “off” with this formula: the $a$ term and the $b$ term in each matrix on the right have different indices.

Let's try rearranging our steps so that like indices get grouped together: first invert the fraction, then add $a$, then multiply the denominator of the result by $b$ (which works out correctly, because in the next round the first thing we’ll do is invert again). This sequence of steps produces the formula \[ \begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & 1 \\ b & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] To make this work in computing the $n$th convergent of a continued fraction, we need to start with $a_n/b_n$ (already promising), and we need to set $b_0 = 1$ so that when we divide by it, we don’t change the final value (we were missing a $b_0$ anyway, which was also somewhat unsatisfying). Therefore, to find $p_n/q_n$ by repeated linear transformations, we have \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \end{pmatrix} \] We can improve this formula a bit more by thinking about where $a_n/b_n$ came from. If we followed the same steps as before, then we must have inverted something, added it to $a_n$, and multiplied the denominator of the result by $b_n$. The thing we must have added to $a_n$ is $0 = 0/1$, which came from “inverting” $1/0$. Thus, if we set $p_{-1} = 1$ and $q_{-1} = 0$, then the following equation holds for all $n \ge 0$: \[ \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \] Whenever you end up with a formula this beautiful, you know you must have done something right.
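The matrix formula invites a quick numerical check. Below is a minimal Python sketch that multiplies the matrices from left to right; the function names `matmul` and `convergent_matrix` are mine, and the test values reuse the coefficients of the finite example from earlier in the post, with $b_0 = 1$ prepended.

```python
def matmul(m, n):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def convergent_matrix(a, b):
    """Multiply the matrices [[a_i, 1], [b_i, 0]] for i = 0, ..., n.

    `a` holds a_0, ..., a_n and `b` holds b_0, ..., b_n with b_0 = 1.
    The result should be [[p_n, p_{n-1}], [q_n, q_{n-1}]].
    """
    product = ((1, 0), (0, 1))   # identity matrix
    for a_i, b_i in zip(a, b):
        product = matmul(product, ((a_i, 1), (b_i, 0)))
    return product

print(convergent_matrix([1, 3, 5, 7], [1, 2, 4, 6]))
# ((233, 29), (151, 19))
```

The first column is the convergent $233/151$ and the second column is the previous convergent $29/19$, as the formula predicts.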

It was an encounter with this matrix representation in a set of number theory notes that sparked the thoughts that led to this post. That set of notes did not contain a proof of the matrix formula, however. A few other sources that I have since found do use this formulation and prove it, such as this set of notes, which also includes a similar discussion to mine.

Back to the question of proving the Euler–Wallis recurrence relations. Suppose we have computed the $n$th convergent of a continued fraction, and we wish to proceed to the $(n+1)$st convergent. In our new formulation, that just means appending another matrix to the product: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} \]

If we group all but the last matrix on the right together, use the previous equation, and then carry out the final multiplication, we get: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} = \begin{pmatrix} a_{n+1} p_n + b_{n+1} p_{n-1} & p_n \\ a_{n+1} q_n + b_{n+1} q_{n-1} & q_n \end{pmatrix} \] Thus, the Euler–Wallis recurrence relations are a consequence of the fact that matrix multiplication is associative (and also the fact that addition of fractions is bilinear!).

As a bonus, the difference between successive convergents is now seen to be a consequence of the fact that the determinant is multiplicative: \[ \begin{vmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{vmatrix} = \left\vert \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \right\vert \] \[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1} b_1 b_2 \cdots b_n \] \[ \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = (-1)^{n+1} \frac{b_1 b_2 \cdots b_n}{q_n q_{n-1}} \] An infinite continued fraction can therefore be written as an alternating series: \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} = a_0 + \sum_{n=1}^\infty (-1)^{n+1} \frac{b_1 \cdots b_n}{q_n q_{n-1}}. \] When all of the $a_i$s and $b_i$s are positive, the condition for this series to converge is essentially that the $b_i$s don’t grow too quickly with respect to the denominators $q_n$. In particular, if $b_i = 1$ for all $i$ (as in the case of a simple continued fraction) and $a_i \ge 1$ for all $i$, then the $q_n$s grow exponentially fast, and so the series (and hence the infinite continued fraction) is guaranteed to converge!
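The determinant identity can be checked on a small example. This is only a sketch: the function name `check_determinants` is hypothetical, and it computes the convergents with the Euler–Wallis recurrence, seeded with $p_{-1} = 1$ and $q_{-1} = 0$.

```python
from math import prod

def check_determinants(a, b):
    """Return (lhs, rhs) pairs for n = 0, ..., N.

    lhs is p_n q_{n-1} - p_{n-1} q_n; rhs is (-1)^(n+1) b_1 ... b_n.
    `a` holds a_0, ..., a_N and `b` holds b_1, ..., b_N.
    """
    p_prev, q_prev = 1, 0
    p, q = a[0], 1
    pairs = [(p * q_prev - p_prev * q, -1)]   # n = 0: empty product of b's
    for n, (a_n, b_n) in enumerate(zip(a[1:], b), start=1):
        p, p_prev = a_n * p + b_n * p_prev, p
        q, q_prev = a_n * q + b_n * q_prev, q
        pairs.append((p * q_prev - p_prev * q, (-1) ** (n + 1) * prod(b[:n])))
    return pairs

print(check_determinants([1, 3, 5, 7], [2, 4, 6]))
# [(-1, -1), (2, 2), (-8, -8), (48, 48)]
```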

No introductory piece on continued fractions is complete without mentioning Fibonacci numbers and the golden ratio. Let $\phi$ be the number defined by the infinite continued fraction \[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}} \] that is, $a_i = b_i = 1$ for all $i$. Then we have \[ p_{-1} = 1, \qquad q_{-1} = 0, \qquad p_0 = 1, \qquad q_0 = 1 \] \[ p_{n+1} = p_n + p_{n-1}, \qquad q_{n+1} = q_n + q_{n-1} \qquad \forall\ n. \] Then, with the convention that the Fibonacci numbers $F_n$ are defined by \[ F_0 = 0, \qquad F_1 = 1, \qquad F_{n+1} = F_n + F_{n-1} \] we have $p_n = F_{n+2}$ and $q_n = F_{n+1}$ for all $n \ge -1$. Using the matrix formulation for computing convergents of $\phi$, this means \[ \begin{pmatrix} F_{n+2} & F_{n+1} \\ F_{n+1} & F_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n+1} \] (Try it!) On the other hand, the continued fraction equation for $\phi$ implies that \[ \phi = 1 + \dfrac{1}{\phi}, \] which is the defining equation for the golden ratio, and has the solution $\phi = \dfrac{1 + \sqrt{5}}{2}$. (We reject the negative solution because the value of the continued fraction is positive.) So it follows immediately that \[ \lim_{n\to\infty} \frac{F_{n+1}}{F_n} = \frac{1 + \sqrt{5}}{2} = 1 + \sum_{n=1}^\infty \frac{(-1)^{n+1}}{F_n F_{n+1}}. \]
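For anyone who wants to take up the "Try it!" with a computer, here is a minimal Python sketch. The function names `fib_matrix` and `fib` are my own; the first builds the matrix power by repeated right-multiplication, and the second computes Fibonacci numbers directly for comparison.

```python
def fib_matrix(n):
    """Return [[1, 1], [1, 0]] raised to the power n + 1, for n >= -1."""
    (a, b), (c, d) = (1, 0), (0, 1)   # identity matrix
    for _ in range(n + 1):
        # Right-multiply by [[1, 1], [1, 0]].
        (a, b), (c, d) = (a + b, a), (c + d, c)
    return ((a, b), (c, d))

def fib(n):
    """Fibonacci numbers with F_0 = 0 and F_1 = 1."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x

n = 5
print(fib_matrix(n))                                   # ((13, 8), (8, 5))
print(((fib(n + 2), fib(n + 1)), (fib(n + 1), fib(n))))  # ((13, 8), (8, 5))

# The ratio p_n/q_n from the first column approaches the golden ratio.
(p, _), (q, _) = fib_matrix(30)
print(p / q, (1 + 5 ** 0.5) / 2)
```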

If I have time, another day I’ll explain what all of this has to do with hyperbolic geometry, the Farey–Ford tessellation, and the dynamics of circle rotations.

existence and uniqueness for some simple ODEs

2017-03-15T14:59:00.000-07:00

Last week in my analysis class we discussed ordinary differential equations (ODEs) and their solutions. The proof of the existence and uniqueness of solutions to ODEs is one of my favorite examples of an extremely abstract theorem (in this case, the contraction mapping principle) being used to solve a concrete problem (how do we know if an ODE has a solution, and if so, whether it’s uniquely specified by its initial conditions?). This post is just to give some simple examples that unearth the concerns one might have.

Here are the three examples I’d like to consider: \[ y' = y, \hspace{0.5in} y' = y^2, \hspace{0.5in} (y')^2 = y. \] The first equation is familiar as characteristic of exponential functions. With the initial condition $y(t_0) = C$, we obtain the solution $y(t) = Ce^{t-t_0}$. Thus every initial condition results in a globally-defined solution. With this solution in hand, we can even show that it is unique, by a bit of trickery. Suppose $f(t)$ is any solution to the initial value problem $y'=y$, $y(t_0)=C$, and consider the function $g(t)=\frac{Ce^{t-t_0}}{f(t)}$. Then \[ g'(t) = \frac{f(t) Ce^{t-t_0} - f'(t) Ce^{t-t_0}}{(f(t))^2} = \frac{Ce^{t-t_0}}{(f(t))^2}(f(t) - f'(t)). \] But by assumption $f'(t) = f(t)$, so $g'(t) = 0$. Thus $g(t)$ is constant, and because $g(t_0) = 1$, we must have $g(t)=1$ for all $t$. Therefore $f(t) = Ce^{t-t_0}$. (How does the case where $f(t) = 0$ for some $t$ need to be handled?)

The form of the second equation suggests that its solutions, once they get above $y = 1$, should grow faster than exponential functions, because the growth rate depends quadratically on $y$ rather than linearly. A bit of thought suggests the solution $y(t) = -\frac{1}{t}$; by translating in the $t$-direction, we can satisfy any nonzero initial condition with a solution of the form $y(t) = -\frac{1}{t-a}$, while the initial condition $y(t_0) = 0$ is satisfied by the constant solution $y(t) = 0$. Again, the solution is uniquely specified by its initial condition. But the nonzero solutions are no longer globally defined; each one “blows up” at some finite time, either before or after the point at which we specify the initial condition. This shows that in general we cannot expect global solutions to ODEs; the usual existence theorem only guarantees local existence of solutions (although global existence can be guaranteed by stronger hypotheses, which are rarely satisfied).
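To make the translation explicit, here is a short worked computation, assuming the initial condition is written $y(t_0) = C$ with $C \ne 0$ (the symbol $C$ is borrowed from the first example): \[ -\frac{1}{t_0 - a} = C \quad\Longleftrightarrow\quad a = t_0 + \frac{1}{C}. \] The solution blows up at $t = a$, which lies after $t_0$ when $C > 0$ and before $t_0$ when $C < 0$.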

The third equation looks similar to the second, but squaring the $y'$ term rather than $y$ has some major consequences. First, the equation implies that solutions must be non-negative. This is not serious, however; changing the equation slightly to $(y')^2 = |y|$ allows solutions to be negative. More serious is the fact that both of the following functions are solutions that satisfy the initial condition $y(t_0) = 0$: \[ y(t) = 0 \hspace{1in} y(t) = \frac14(t - t_0)^2. \] Indeed, by piecing together these two types of solutions, we can obtain solutions that remain 0 for any length of time, then “take off” unexpectedly. In short, we have existence, but we definitely do not have uniqueness, because as soon as a solution reaches zero, it can remain zero for an indeterminate amount of time. (We do have local uniqueness for solutions that have a non-zero initial condition, however.) The issue is that the equation allows the derivatives of its solutions to grow and shrink too quickly when they are near zero; more precisely, the function $\sqrt{|y|}$ is not Lipschitz in any interval containing 0.
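One way to write down such a pieced-together solution, as a sketch with a hypothetical take-off time $c \ge t_0$ (a symbol not used above): \[ y_c(t) = \begin{cases} 0 & \text{if } t \le c \\ \frac14 (t - c)^2 & \text{if } t > c. \end{cases} \] Both pieces have value $0$ and derivative $0$ at $t = c$, so $y_c$ is differentiable there, and for $t > c$ we have $(y_c')^2 = \left(\frac12 (t - c)\right)^2 = y_c$. Every choice of $c \ge t_0$ therefore gives a different solution with $y_c(t_0) = 0$.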

As I said, my purpose here is not to expound the statement or proof of any theorem on existence and uniqueness, just to provide some simple examples that illustrate what considerations must be made in formulating such a theorem.

specifications in analysis

2016-08-19T15:45:00.000-07:00

Earlier this week, I wrote about expectations for my analysis class this fall (which also apply broadly to upper-level math classes) and some things I learned about specs grading this summer. In this post, I’ll share the specifications I have created for analysis. (I have taught real analysis before, and last time I tried a standards-based approach. Frankly, that basically turned into a point system, albeit a simplified one, which is why I’m trying something completely different this time.)

The rest of the post is taken verbatim from (the current draft of) my syllabus.

Effective learning requires effective methods of assessment. The assessments should relate as directly as possible to the expectations of the class, and they should provide both feedback on how to improve and opportunities to demonstrate improvement as the semester progresses. In my experience, “traditional” grading schemes based on assigning points or percentages to individual tasks do not serve these functions well. Therefore, this course adopts specifications grading*, in which grades are tied to specific outcomes. This is likely to be different from grading policies in other classes you have taken, so feel free to ask me questions or let me know if you have concerns. I hope that this system will make clear the connections between the expectations stated in the previous section and the ways you will be assessed.

Overall grading. At the end of the semester, I am required to submit to the university a letter grade reflecting your achievement in this class. That grade will be determined on the basis of a set of specifications in four areas: (1) class participation, (2) written proofs, (3) exams, and (4) synthesizing activities. Each of these areas will receive a simple grade of A, B, C, D, or F. The following sections describe how these grades will be determined. Your final grade will depend on your performance in all four areas, according to the following table.

Final grade	based on individual grades of
A	all As, or 3 As and 1 B
A–	two As and two Bs
B+	one A and three Bs
B	all Bs, or 3 Bs and 1 C
B–	two Bs and two Cs
C+	one B and three Cs
C	all Cs, or 3 Cs and 1 D
D–	two Cs and two Ds

I will use my discretion to assign a final letter grade to other combinations of individual letter grades.

Class participation. Attendance at every class meeting is required. Most weeks, we will alternate days between discussing reading assignments and presenting solutions to exercises. The end of this syllabus has a schedule of what we will be doing in class each day (with allowance for adjustments, as needed).

Reading. In order to participate effectively on discussion days, you will need to read the textbook before coming to class. Each reading assignment is about 10 pages. The textbook attempts to be very accessible, but that does not mean it is easy. We will be working with ideas that stretch reason and imagination. You should be prepared to spend at least 1–2 hours on each reading assignment; rereading pages, paragraphs, or sentences; working out examples; and writing questions or comments in the margins or on separate paper. You should be especially mindful of definitions. These are not always set apart from the text, so pay attention when new vocabulary is introduced. Start working on a list of definitions and theorems from the start of the semester. The chapter summaries can be an aid in this process.

Collaborating. On days with a reading assignment, you will work in small groups to discuss the material. I will assign these groups at the start of each week. You should bring your own questions and thoughts to these discussions. If there is extra time, you can also discuss the current set of exercises.

Presenting. On the remaining days, you will take turns presenting solutions to exercises distributed previously. The solution you present does not necessarily need to be entirely correct, but it should show evidence of a serious effort. You should also be prepared to answer questions from me or other students. To maintain balance, no one will be allowed to present more than once every two weeks, unless every student in the class has already presented during that time period. In exceptional cases, some of these verbal presentations may be made to me outside of class (no more than one per student).

To earn a	you must do the following
D	attend at least 75% of class meetings; present at least one proof in class
C	attend at least 85% of class meetings and contribute to discussions; present at least three proofs in class
B	attend at least 90% of class meetings and contribute to discussions; present at least four proofs in class
A	attend all class meetings (2 unexcused absences allowed) and contribute to discussions; present at least five proofs in class

Written proofs. Over the semester, you will develop a portfolio of work that you have submitted for formal assessment. Most of your contributions will be proofs. Each week I will indicate one or more exercises whose solutions could be submitted to your portfolio. You may discuss your work with other students in the class, to have them check whether it meets the standards of the class and give you feedback. A proof for the portfolio is due the Monday after it is assigned. These proofs must be typed using LaTeX, Google docs, Microsoft Word, or another system.

When you submit a written proof for your portfolio, I will judge whether it is Successful, Quasi-successful, or Unsuccessful (see the earlier section on “Proofs” under “Expectations” for details about these ratings), and mark it correspondingly with one of S/Q/U. Proofs marked Q or U will not be counted towards your grade. However, proofs can be resubmitted at the cost of one or two of your allotted tokens; see section on “Tokens” below.

To earn a	your portfolio must contain
D	at least four successful proofs
C	at least six successful proofs
B	at least eight successful proofs
A	at least ten successful proofs

Exams. There will be two midterm exams and a final exam. Each one will have a take-home portion and an in-class portion. [Dates and times, listed in syllabus, omitted here.]

The take-home portions will consist of two or three proofs that you are to complete on your own, without consulting other students. (You may discuss your work with me before turning in the exam, although I might not answer questions directly.) These will be judged as successful, quasi-successful, or unsuccessful, like the proofs in your portfolio. They cannot be resubmitted after grading, however.

The in-class portions will test your mastery of definitions and the statements of theorems. You will need to be able to state both definitions and theorems properly. You will also be asked to recognize and provide examples of situations or objects where a definition or theorem does or does not apply.

To earn a	you must do the following
D	correctly answer 60% of in-class test questions; write at least two successful proofs on take-home exams
C	correctly answer 75% of in-class test questions; write at least three successful proofs and one quasi-successful proof on take-home exams
B	correctly answer 85% of in-class test questions; write at least four successful proofs and two quasi-successful proofs on take-home exams
A	correctly answer 95% of in-class test questions; write six successful proofs on take-home exams

Synthesis. To master the ideas of the class, you must spend time synthesizing the material for yourself. The items in this graded section will be added to your portfolio, to complement the proofs. All materials in this section must be typed using LaTeX, Google docs, Microsoft Word, or another system.

List of definitions and theorems. It should be clear at this point that being able to produce accurate statements of definitions and theorems is essential to success in this class. To encourage you to practice these, I am requiring you to create a list of these statements for the entire course. Your list should be organized in some way that makes sense to you—e.g., alphabetically or chronologically.

The textbook can be used as a reference, as can the internet, but how do you quickly recall what definitions we’ve used and how they're related? How do you find the phrasing of a theorem that’s become most familiar? This list should help you in these situations. More importantly, creating it will help you review and organize the material in your own mind.

I will verify your progress on these lists at each in-class exam.

Papers. Twice during the semester, once in the first half and once in the second half, I will provide a list of topics that we have been discussing, from which you can choose one as the basis for a paper. These will be due approximately two weeks after the midterm exams.

There is a third paper that can be completed at any point in the semester on a topic of your choosing, but you must get the topic approved by me before Thanksgiving.

These papers will for the most part be expository, meaning they will present previously known mathematical results (not original research). Here are the requirements for a paper to be acceptable:

It should have 1500–4500 words.
It should use correct grammar, spelling, notation, and vocabulary.
It should be organized into paragraphs and, if you wish, sections.
It should cover the topic clearly and reasonably thoroughly, with an intended audience of other math students (who may be assumed to have studied as much analysis as you).
It should contain a proof of at least one major result.
The writing should be original to you. Of course, small pieces like definitions may be taken directly from another source, but apart from these the paper should be your own work.
Citations are generally not necessary in expository mathematical writing, except for the following: a statement of a theorem that you are not proving, a peculiar formulation of a concept/definition, or a creative idea (e.g., an uncommon metaphor or illustration) from another source.
You may choose to follow the style of our textbook, or a more formally structured math textbook, or something more journalistic or creative, as long as the previous criteria are met.

Papers that do not meet these criteria will be considered unsatisfactory and will not count towards your grade. An unsatisfactory paper can be revised and resubmitted at the cost of three tokens.

To earn a	you must do the following
D	create a list of definitions and theorems to include in your portfolio
C	create a list of definitions and theorems to include in your portfolio write a paper on one of the topics provided
B	create a list of definitions and theorems to include in your portfolio write two papers on the topics provided, one during each half of the semester
A	create a list of definitions and theorems to include in your portfolio write two papers on the topics provided, one during each half of the semester write a third paper on a topic of your own choosing related to the class

Tokens. You start out the semester with seven (7) virtual “tokens,” which can be used in various ways:

One token allows you to resubmit a written proof initially judged to be quasi-successful (must be used within one week of initial grading).
Two tokens allow you to resubmit a written proof initially judged to be unsuccessful (must be used within one week of initial grading).
Three tokens allow you to resubmit an unsatisfactory paper (must be used within one week of receiving paper back).
One token gives you a 48-hour extension past the due date for a paper.

Unused tokens may be exchanged for a prize at the end of the semester. [maybe?!?]

*Based on Linda Nilson’s book Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time.

taking specs seriously

2016-08-18T11:26:00.000-07:00

I’ve been an advocate of standards-based grading since I started using it over three years ago. It has addressed many of the concerns I had about the dominant point-based grading system and encouraged students to move forward in their understanding rather than feeling trapped by past performance.

I’m not solely an SBG proponent when it comes to grading, however. For one thing, I find it hard to adapt SBG to upper-level math courses. For another, the time seems ripe for experimentation in grading practices as more of us realize the shortcomings of what we have inherited from decades past. Not that we should constantly reinvent the grading process, but we should be open to various thoughtful ways of providing authentic assessment.

So I was certainly interested a couple of years ago when several fellow instructors began talking about specifications grading, a method espoused by Linda Nilson in her book Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time. I adopted some of the ideas I heard and appreciated the increased flexibility it offered.

However, it was not until this summer that I read through Nilson’s book. It was useful because it seems Nilson and I think differently in ways I can’t quite put my finger on, and so the book has lots of ideas I would not have intuited on my own. Here are a few of the things I garnered from reading the book that I hadn’t picked up from online discussions (not that these things weren’t said, but this time they stuck):

Sometimes it’s OK to use percentages. I’ve been highly points- and percentages-averse since starting SBG. Percentages, my argument went, were essentially meaningless, because they’re constantly being curved (so they don’t really represent a “percentage” of anything) and the difference between 80% and 81% is essentially a coin toss (so they aren’t as linearly ordered as people like to think). But that argument isn’t uniformly true. In a course where precision is important, it is possible to measure, for instance, how many definitions a student can correctly state. For my upcoming analysis class, I expect “A” students to get definitions right 95% of the time, “B” students 85% of the time, “C” students 75% of the time. This really is quantifiable, and a definition is either correct (with respect to the established conventions of the subject) or not, so each one can be graded yes/no. As long as not everything is forced into a percentages model, this can be an effective way to give feedback.
Make students work for an A, but give them some choice in how to get there. As instructors, we want an A to represent mastery, an indication that the student can think nimbly and complexly about the subject. Ideally, students who earn an A will be the ones most invested in the subject. To demonstrate all this, students should have ownership of their work. They should make meaningful choices that reflect their interests and their skills as well as the subject at hand.
Not everyone has to do everything. This is closely tied to the previous point. Nilson uses the metaphor of “hurdles”: grade levels can be differentiated by having students clear either higher hurdles (more complex, better quality work) or more hurdles (more extensive work), or a mix of the two. I’m not generally a fan of having students earn higher grades by just proving they can do more—that takes more of my time, and more of theirs. But true mastery requires a measure of initiative. Having a small number of optional assignments that give students opportunities to distinguish themselves makes sense as part of a larger grading scheme.
There are good reasons to limit reassessments. Of course, one of these reasons is the subtitular “saving faculty time.” In past upper-level classes where I’ve allowed essentially unlimited resubmission, I’ve been swamped/behind at several points in the semester as students frantically tried to get something accepted. But that’s not even the best reason. By limiting reassessments and grading work pass/fail (or pass/progressing/fail or some other variant), students are encouraged to submit their best work each time, and to spend extra time making sure they check its quality before asking me to do so. The onus is on me to establish clear expectations, and on students to meet them. We’re not negotiating what’s acceptable through repeated revision and grading.

I also found the chapter on cognitive models (Chapter 3, “Linking Grades to Outcomes”) helpful in considering what it means to have a higher level of mastery; previously I wasn’t really familiar with anything beyond Bloom’s Taxonomy.

If this post was of interest to you, I hope you’ll consider joining the ~~Google+ Community on “Standards-Based and Specifications Grading” (SBSG)~~ Slack workspace on mastery grading, where teachers of diverse disciplines are meeting to discuss how to implement these two particular alternative forms of grading.

Tomorrow I’ll share my full set of specifications for real analysis.

expectations in analysis

2016-08-16T13:46:00.000-07:00

I’m working on the syllabus for my (junior and senior level) analysis class this fall, and I’d like to share some parts of it, hopefully thereby eliciting feedback. The main thing I’m concerned about is the type of specifications grading I’m adopting for the class—I’ll share that later this week. This post is about establishing the expectations of the course, on which the specifications will be based. None of these are particular to analysis; they establish what I believe any student in an upper-level mathematics course should achieve.

The rest of the post is taken verbatim from (the current draft of) my syllabus.

To learn mathematics, it is essential to engage actively with the material. This is especially true at this stage in your mathematical careers, as the focus of study shifts from developing computational tools to examining underlying concepts and practicing abstract reasoning. This shift may be described as a move from pre-rigorous thinking, which is informal and intuitive, to rigorous thinking, which is formal and precise. (This terminology has been suggested by mathematician Terence Tao; he also includes a post-rigorous stage, in which professional mathematicians work, where one is able to make intuitive arguments that are grounded by formal training.)

The content of this course resides in definitions, theorems, and proofs. You will be expected to state both definitions and theorems accurately and to illustrate them through examples. Mathematics is not merely a collection of disconnected facts, however, and so you will also develop your logical skills by proving mathematical truths, linking definitions to their profound consequences captured by theorems. All of this will happen in the context of a community—two really, our class and the larger mathematical community.

Definitions. In mathematics, as in other sciences, it is necessary to quantify what is being studied and to be able to identify what is of interest at each moment. This is done by carefully establishing and internalizing definitions. This is not to say that definitions do not involve creativity; as a subject develops, often definitions evolve to encompass more or fewer cases, to be more precise, or to reorganize ideas.

By the end of the course, you should be able to:

state definitions accurately and explain any notation or previously-defined terms they contain;
identify whether or not an object meets the conditions of a given definition;
give examples that satisfy a given definition as well as examples that do not satisfy it;
test an unfamiliar definition using examples;
create new definitions when needed.

Theorems. A theorem has two parts: the antecedent (its assumptions) and the consequent (its conclusions). To take a familiar example, the equation $a^2 + b^2 = c^2$ by itself is not a theorem; rather, the Pythagorean Theorem states that “If $c$ is the length of the hypotenuse of a right triangle, and $a$ and $b$ are the lengths of its other two sides, then $a^2 + b^2 = c^2$.” A theorem may not always include the words “if” and “then,” but you should always be able to determine what are the antecedent and the consequent. Sometimes rephrasing the theorem’s statement can help. For example, “Every differentiable function is continuous” can be rephrased as “If a function is differentiable, then it is continuous.” In most cases, the consequent does not imply the antecedent (e.g., not every continuous function is differentiable). A theorem that says one set of conditions holds “if and only if” another set of conditions holds is logically making two statements (the antecedent and consequent can be reversed), and both must be proved.

By the end of the course, you should be able to:

state theorems accurately and identify what are their assumptions and their conclusions;
determine whether the conditions of a theorem do or do not hold in a given situation, explain why, and determine what the theorem does or does not imply in that situation;
recognize logically equivalent forms of a theorem;
formulate and test conjectures.

Proofs. Proofs are how we as individuals and as a community determine the truth of mathematical statements, i.e., theorems. Here is one definition of a proof, due to David Henderson: A proof is “a convincing communication that answers -- Why?” The extent to which a proof succeeds, therefore, depends on how well it embodies these three properties: it should be logical (does it convince?), it should be comprehensible (does it communicate?), and it should be intentional (does it answer why?). Evidently, each of these properties depends somewhat on the others. It is thus reasonable to classify proofs into an S/Q/U system:

(S) A successful proof makes an argument for the truth of a mathematical statement that is fully convincing to an informed reader or listener. It employs appropriate vocabulary and carefully chosen notation. It avoids sloppy reasoning. It makes clear use of the theorem’s assumptions and, when necessary, previously known results. The best examples provide motivation for the methods chosen. Minor revisions may be advisable, but they do not hinder the overall effectiveness.
(Q) A quasi-successful proof contains most of the ideas necessary to make a complete argument. It may have slips in logic or notation, or it may neglect a special case, or it may be hard to read. It contains sufficient evidence, however, that the argument can be “salvaged” by filling in gaps or clarifying language. Serious revision is necessary. [Not in syllabus: thanks to Dan for suggesting “quasi-”.]
(U) An unsuccessful proof does not convince an informed person of the truth of the purported theorem, for one or more of the following reasons: it makes logical leaps or omits key ideas; it demonstrates incomplete understanding of definitions or notation; it fails to reference previous results when appropriate. Complete revision is generally necessary.

In other words, a successful proof is of sufficient quality that it could reasonably be accepted as part of a paper in a professional journal. A quasi-successful proof has some merit, but it requires revision, after which it might or might not be acceptable at a professional level. An unsuccessful proof is sufficiently flawed that it would not be acceptable as part of a professional publication.

By the end of the course, you should be able to:

evaluate, on the basis of professional standards, whether a given proof is successful or not;
write original, successful proofs.

Community. Our class time will be structured primarily around discussion rather than lecture. The idea is to have a space that promotes sharing ideas, making guesses, taking risks, and sharpening our reasoning abilities. I will guide and facilitate these conversations, but everyone is responsible for contributing to discussions, both in small groups and with the entire class. That is, in this course mathematical authority resides not just with me as the instructor, but with every class member. I will give short lectures (20 minutes) when the entire class agrees it would be beneficial, but not more often than once a week.

By the end of the course, you should be able to:

engage in discussions about mathematics by sharing questions, proposals, and insights;
evaluate others' contributions critically and respond constructively;
present your own work in front of an audience and address their comments and questions.

why I do math

2016-06-29T17:00:00.000-07:00

I spent the past week on retreat with fellow faculty members. During part of this time, we each shared our “vocational journeys,” or the stories of how we have been led to this job and these fields of scholarship. I thought that part of my essay might have broader appeal, so I’m posting it here.

Why do I study and teach mathematics? My research is in the field of pure mathematics, for which it may seem harder to justify an investment of time than for its adjacent field of applied mathematics. Applied math at least tries to tie itself directly to the needs and concerns of our immediate physical world. Pure math is happy to oblige in improving how well we understand the world, but its primary concern is math for math’s sake. (The boundary between these two types of math is highly permeable, and even pure math almost always starts with inspiration from experience.) I’d like to address the question by comparing math with two other areas represented in academia: music and science.

First, math is like music. The aesthetic element in mathematics is essential, not peripheral. I’m not sure, but I think that in the minds of many people mathematics is reduced to a collection of more-or-less arbitrary facts, like the fact that the area of a circle equals pi times the square of its radius. Each of these facts, however, is like the final cadence of a symphony. It may be thrilling by itself, but it’s missing the indispensable context of “where did we start?” and “how did we get here?”

This is why mathematicians insist on proving things: the proof is a whole symphony, not a single chord. Mathematicians are lauded not for stating facts, but for demonstrating their necessity, the way composers and musicians are praised for the whole course of a piece or a performance, not just its ending. When executed well, a proof has rhythm. It has themes that are developed and interwoven. It has counterpoint. It sets up expectations that are satisfied or subverted. Economy of material is valued, but not exclusively; an argument that wanders into neighboring territory, like a modulation to a neighboring key, can provide fuller appreciation of the main theme.

Proofs have a variety of forms, some as common as sonatas and minuets: direct proof, proof by contradiction, proof by induction, proof by picture, proof by exhaustion. We have computer-generated musical compositions and computer-generated mathematical proofs, and in both communities there is healthy debate about whether these artificial creations are beautiful or desirable in such quintessentially human activities. We return over and over to the same pieces and theorems that have inspired us, whether they be simple or grand, and each performer gives her or his own interpretation and inflection to the presentation.

Second, math is like science. Often mathematics is categorized as a science, and that’s not entirely wrong. Science is built on careful observation, winnowing data from the chaff of noise. Science seeks explanation which can be turned into prediction. It invents new tools for collecting information and improves upon those that already exist. It creates models and theories that encompass and relate as many pieces of knowledge as possible.

Where science and math differ is that science deals with the world in which we live, while the world of math is imagined. Imagine that there are such things as points with no volume and perfectly straight lines that connect them. Imagine that numbers have enough solidity that we can move them around en masse by means of undetermined variables, the x, the y, the z. Imagine that once we start counting upwards 1, 2, 3, 4, 5, a thousand, a million, a trillion, a googol,…, we could never reach an end, not in any number of lifetimes in any number of universes. Or imagine that the filigree of a fractal truly exists at every scale, that we can examine it closer and closer and see the ever-increasing detail, that there is no quantum barrier to our exploration, beyond which sight and measurement cease to be meaningful.

When we imagine these things, we create the worlds in which we make our observations. The rules of these worlds are not completely arbitrary, at least not if we want to be able to know anything about them, but they are ours to choose. Each time we choose anew, we enter an undiscovered country. Once in this country, we must return to scientific methods of study. We look for patterns, try to explain them, and check that our explanations make accurate predictions. We must know when to trust the instruments we have—our minds, computer programs, results proved by other mathematicians—and when not to trust them. Like scientists, we have to winnow out the noise.

Mathematical truth persists across ages and cultures, and so it may seem timeless, but our experience of it certainly isn’t. The channels of logic through which a proof flows may be carved out once and for all in eternity or in the human mind (depending on your view of where mathematical truth lies), but like notes on a page they remain inert until they are brought to life by individual or communal study. Like the tree of life in biology or the standard model in physics, mathematical theories are crystallized around our experience and our perception of the world. As Bill Thurston wrote, “mathematics only exists in a living community of mathematicians that spreads understanding and breathes life into ideas both old and new. The real satisfaction from mathematics is in learning from others and sharing with others.”

Homology modulo 2

2016-05-31T22:16:00.000-07:00

Last week, I was chirping on Twitter about “homology modulo 2”: how closely it matches my geometric intuition of what homology should measure, despite my never having thought seriously about it before, and how its computational simplicity makes it seem like an ideal way to introduce homology to undergraduates, even those who haven’t studied linear algebra. For a very complete graduate-level introduction to homology (and cohomology) modulo 2, check out Jean-Claude Hausmann’s book. I will instead try to demonstrate how this topic can be introduced at nearly any level, with an appropriate amount of care. For the sake of brevity, I will assume familiarity with linear algebra in this post; however, the necessary elements (image, kernel, rank, row reduction) can easily be learned in the context of homology, particularly when working modulo 2.

Note: This post got long enough in the writing that I didn’t make any pictures to go with it, so you should draw your own! The idea is to discover how algebra can be used to extract geometric/topological information in a way that is really striking when you see it happen.

The space

For simplicity of exposition, I will only consider spaces $X$ that are created from finitely many convex polytopes (often simplices or hypercubes) by making some identifications (“gluings”) between their faces. The faces are not necessarily joined in pairs, however; more than two faces of the same dimension may be identified, or some faces might not be joined at all. A more careful definition is possible, but to provide one would get away from the fairly non-technical introduction I’m aiming for. Just assume no funny stuff happens, OK? The polytopes that make up $X$ are called the cells of $X$; the collection of cells includes all the faces of all the polytopes we started with (some of which, as noted above, have been identified with each other in pairs or larger groupings). Each cell, being a polytope, has a dimension, and if we wish to specify the dimension of a cell as $k$, we call it a $k$-cell.

For example, $X$ could just be a single convex polytope. Or it could be a convex polytope with the interior removed (keeping in mind that the boundary of a convex polytope is a union of convex polytopes of one dimension lower). The outside of a cube, for instance, is made up of six 2-cells (the faces), twelve 1-cells (the edges), and eight 0-cells (the vertices). A torus, when made from a rectangle by identifying opposite sides, is also such a space, with one 2-cell (the interior of the rectangle), two 1-cells (the result of identifying the edges in pairs), and one 0-cell (because all corners of the rectangle are identified to the same point).

The data

The homology of $X$ measures the difference between objects in $X$ that have no boundary (these are called cycles) and objects that are the boundaries of other objects (called, quite sensibly, boundaries). A $k$-dimensional cycle that is not a boundary is supposed to “enclose” a $k$-dimensional “hole” in $X$. The formal definitions are intended to quantify what is meant by “boundary;” the intuitive notion of “hole” floats along, generally defying proper definition (and often even intuition).

By “object” in the previous paragraph, we mean something made up from the cells of $X$. We restrict ourselves to putting together cells of the same dimension, producing objects called chains. That is, a $k$-chain is just a collection of $k$-cells in $X$. We can add together $k$-chains, but—and this is the beautifully simple part—we add modulo 2. If a particular cell appears twice, then this pair of appearances cancel each other out. The idea is that, since we’re trying to study “holes” in our space $X$, if one cell appears twice, the pair of copies can be joined up along their common boundary and safely removed. Formally, a $k$-chain is a linear combination of $k$-cells, with coefficients in the field with two elements, if you find such a formal description helpful.

We now proceed to the key combinatorial data of our space $X$ and see how it can be used to extract topological information. Because $X$ is made up of finitely many cells, for each $k = 1, \dots, n$, where $n$ is the largest dimension of a cell of $X$, we can construct a boundary matrix $\partial_k$. (Normally $\partial_k$ would be defined as a linear map between certain vector spaces; we are fully exploiting the equivalence between linear maps and matrices.) The columns of $\partial_k$ are labelled by the $k$-cells of $X$, and the rows are labelled by the $(k-1)$-cells. In each column, we put a 1 in each position where the corresponding $(k-1)$-cell lies in the boundary of the given $k$-cell, and a 0 otherwise. Exception. Sometimes the faces of a single $k$-cell may be joined to each other, meaning the resulting $(k-1)$-cell appears with multiplicity on the boundary of that $k$-cell. This multiplicity, modulo 2, is taken into account in the boundary matrix. See the boundary matrices of the torus, near the end, for examples of this phenomenon.

A concrete example: the tetrahedron

The boundary matrix, like most computational objects, is best understood through examples. Let’s start with the empty tetrahedron. Label the vertices $v_1$, $v_2$, $v_3$, $v_4$, and let $f_i$ be the triangular face opposite $v_i$. Let $e_{ij}$ be the edge joining $v_i$ to $v_j$, with $i < j$. Then we have two boundary matrices,

$ \partial_1 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} $ and $\partial_2 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}$. In $\partial_1$, the columns are labelled by the edges and the rows are labelled by the vertices. In $\partial_2$, the rows are labelled by the edges and the columns are labelled by the faces. In both matrices, the edges are listed in the order $e_{12}$, $e_{13}$, $e_{14}$, $e_{23}$, $e_{24}$, $e_{34}$. Notice that each column of $\partial_1$ has two 1s, because each edge has two endpoints, and each column of $\partial_2$ has three 1s, because each face is bounded by three edges.
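The two matrices can also be generated from the labelling rather than typed in by hand. Here is a minimal Python sketch, assuming vertices are stored as the integers 1 to 4, edges as pairs, and faces as triples; the variable names `d1` and `d2` are illustrative choices, not notation from the post.

```python
from itertools import combinations

vertices = [1, 2, 3, 4]
# Edges e_ij with i < j, in the order e12, e13, e14, e23, e24, e34.
edges = list(combinations(vertices, 2))
# Face f_i is the triangle on the three vertices other than v_i.
faces = [tuple(v for v in vertices if v != i) for i in vertices]

# Rows are vertices, columns are edges: 1 when the vertex is an endpoint.
d1 = [[1 if v in e else 0 for e in edges] for v in vertices]
# Rows are edges, columns are faces: 1 when the edge lies on the face.
d2 = [[1 if set(e) <= set(f) else 0 for f in faces] for e in edges]

for name, matrix in (("d1", d1), ("d2", d2)):
    print(name)
    for row in matrix:
        print(" ", row)
```

The printed rows should agree entry by entry with $\partial_1$ and $\partial_2$ above.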

Once we have these matrices, we can use them to find boundaries of more general chains. For instance, when joined together, the edges $e_{12}$ and $e_{23}$ form a path from $v_1$ to $v_3$, so we expect the boundary to be these two points. Indeed, adding together (modulo 2!) the corresponding entries from the first and fourth columns of $\partial_1$, we see that the 1s in the second entry cancel (which corresponds to the edges being joined at $v_2$), and we are left with 1s in the first and third entries. We can write this relation as $\partial_1(e_{12}+e_{23}) = v_1 + v_3$. Similarly, if we add together the first three columns of $\partial_2$, which correspond to $f_1$, $f_2$, and $f_3$, the result is a vector with 1s in the first, second, and fourth entries, which correspond to $e_{12}$, $e_{13}$, and $e_{23}$, producing the equation $\partial_2(f_1 + f_2 + f_3) = e_{12} + e_{13} + e_{23}$. This demonstrates that the union of three of the faces has the same boundary as the fourth face. The sum of all four columns of $\partial_2$ has all 0s for its entries, showing that the four faces of the tetrahedron, taken together, have no boundary.
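These column sums are easy to check by machine. The sketch below is one possible way to do it in Python: a chain is represented as a list of 0-based column indices, and the helper `boundary` (a name chosen here for illustration) adds the chosen columns modulo 2.

```python
# Boundary matrices of the empty tetrahedron, copied from the text.
# Edge order: e12, e13, e14, e23, e24, e34; face order: f1, f2, f3, f4.
D1 = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
D2 = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]

def boundary(matrix, chain):
    """Add the columns listed in `chain` (0-based indices) modulo 2."""
    return [sum(row[j] for j in chain) % 2 for row in matrix]

path = boundary(D1, [0, 3])             # e12 + e23
three_faces = boundary(D2, [0, 1, 2])   # f1 + f2 + f3
all_faces = boundary(D2, [0, 1, 2, 3])  # f1 + f2 + f3 + f4

print(path)         # [1, 0, 1, 0], i.e. v1 + v3
print(three_faces)  # [1, 1, 0, 1, 0, 0], i.e. e12 + e13 + e23
print(all_faces)    # [0, 0, 0, 0, 0, 0], i.e. no boundary
```

Because the sum is reduced modulo 2, a cell listed twice in `chain` cancels itself, just as in the informal description.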

How to extract information from the boundary matrix

Having illustrated some computations with boundary matrices in the above example, let’s codify some definitions. A collection of $k$-cells is called a $k$-cycle (or closed) if the sum of the corresponding columns of $\partial_k$ is the zero vector. (This is a formal way of saying “has no boundary.”) A collection of $k$-cells is called a $k$-boundary (or exact) if it can be obtained as a sum of columns of $\partial_{k+1}$. In linear algebra terms, a $k$-cycle is an element of the kernel of $\partial_k$, and a $k$-boundary is an element of the image of $\partial_{k+1}$. Again, the benefit of working modulo 2 is that these conditions can be easily checked. The set of $k$-boundaries is denoted $B_k$, and the set of $k$-cycles is denoted $Z_k$ (the notation $C_k$ generally being reserved for $k$-chains).
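To make "easily checked" concrete, here is a small Python sketch of both tests on the tetrahedron. The function names are illustrative, and `is_boundary` simply tries every collection of $(k+1)$-cells, which is only sensible for tiny examples like this one; a real implementation would solve the linear system by row reduction instead.

```python
from itertools import combinations

# Tetrahedron boundary matrices from the text
# (edges e12, e13, e14, e23, e24, e34; faces f1..f4).
D1 = [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 0],
      [0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1]]
D2 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0],
      [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0]]

def column_sum(matrix, columns):
    """Sum of the chosen columns, modulo 2."""
    return [sum(row[j] for j in columns) % 2 for row in matrix]

def is_cycle(d_k, cells):
    """True when the chosen k-cells (column indices of d_k) have no boundary."""
    return not any(column_sum(d_k, cells))

def is_boundary(d_next, cells):
    """True when the chosen k-cells (row indices of d_next) are a sum of columns.

    Brute force over all collections of (k+1)-cells.
    """
    target = [1 if i in cells else 0 for i in range(len(d_next))]
    n_cols = len(d_next[0])
    return any(
        column_sum(d_next, subset) == target
        for size in range(n_cols + 1)
        for subset in combinations(range(n_cols), size)
    )

triangle = [0, 1, 3]  # e12 + e13 + e23
path = [0, 3]         # e12 + e23
print(is_cycle(D1, triangle), is_boundary(D2, triangle))  # True True
print(is_cycle(D1, path), is_boundary(D2, path))          # False False
```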

A fundamental property is that $\partial_k \partial_{k+1} = 0$, which has the satisfying geometric interpretation that “every $k$-boundary is a $k$-cycle,” or $B_k \subseteq Z_k$. This property can be checked directly in the above example of the tetrahedron. In general, it applies because, in a $k$-dimensional polytope, each $(k-2)$-dimensional face appears in two $(k-1)$-dimensional faces (provided $k \ge 2$; if $k=1$, then there are no $(k-2)$-dimensional faces, so $\partial_0 = 0$, and the property $\partial_0 \partial_1 = 0$ holds trivially). From the perspective of homology, this means boundaries aren’t “interesting” cycles. They’re the boundaries of something, after all, so they certainly don’t enclose a “hole.”
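For the tetrahedron, the direct check amounts to multiplying the two matrices and reducing modulo 2. A minimal Python sketch, with `matmul_mod2` as an illustrative helper name:

```python
# Tetrahedron boundary matrices from the text.
D1 = [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 0],
      [0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1]]
D2 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0],
      [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0]]

def matmul_mod2(a, b):
    """Matrix product of a and b with every entry reduced modulo 2."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) % 2
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

product = matmul_mod2(D1, D2)
for row in product:
    print(row)  # each of the four rows is [0, 0, 0, 0]
```

Before reduction, every entry of the product is 0 or 2: each vertex of a triangular face lies on exactly two of its edges, which is the $k = 2$ case of the argument above.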

What we really want to measure, then, is how many cycles are not boundaries. To determine this, we first need to find out how many cycles and how many boundaries there are. Except we can add cycles together to get new cycles (in linear algebra terms, the kernel of a matrix is a subspace of the domain), and we can add boundaries to get new boundaries (the image of a matrix is also a subspace), so what we really want is to know how many independent cycles there are: that is, we want the dimension or rank of the set of cycles and the set of boundaries. I’ll use rank here, even though we’re working with vector spaces, because that terminology transfers to the case of integral homology.

The rank of the $k$-boundaries is the rank of $\partial_{k+1}$, because by definition this describes the maximal number of independent boundaries of $(k+1)$-chains. On the other hand, the rank of the $k$-cycles is the nullity of $\partial_k$, because this measures the maximal number of independent $k$-chains with no boundary. From linear algebra, we know that the rank of a matrix can be determined by row reducing to echelon form and counting the number of rows (equivalently, columns) that have leading ones.
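Row reduction modulo 2 is especially simple to code, since the only row operations needed are swapping rows and adding one row to another. The following Python sketch (the function name `rank_mod2` is an illustrative choice) counts pivots column by column and applies the count to the tetrahedron matrices.

```python
def rank_mod2(matrix):
    """Rank of a 0/1 matrix over the field with two elements."""
    rows = [row[:] for row in matrix]  # work on a copy
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear the column everywhere else by adding the pivot row modulo 2.
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [(x + y) % 2 for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Tetrahedron boundary matrices from the text.
D1 = [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 0],
      [0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1]]
D2 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0],
      [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0]]

for name, d in (("d1", D1), ("d2", D2)):
    r = rank_mod2(d)
    nullity = len(d[0]) - r  # number of columns minus rank
    print(name, "rank", r, "nullity", nullity)
# d1 rank 3 nullity 3
# d2 rank 3 nullity 1
```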

Homology gets its name from the notion of homologous cycles (“homologous” meaning, etymologically, “having the same position or structure”). Two $k$-cycles are homologous if their difference is a $k$-boundary. Modulo 2, the difference of two objects is the same as their sum, so this just means that two cycles are homologous if, when we put them together, they form the boundary of an object of one higher dimension. Boundaries are “homologically trivial” because, by definition, they are homologous to the chain consisting of no cells, $0$. The $k$th homology of $X$ is the quotient (group, vector space, module, etc.) of the cycles by the boundaries: \[ H_k = Z_k/B_k. \] The associated numeric invariant is the $k$th Betti number $\beta_k$ of $X$, which is the rank of the $k$th homology. It can thus be computed as the difference between the rank of the $k$-cycles and that of the $k$-boundaries: \[ \beta_k = \mathrm{rank}\,Z_k - \mathrm{rank}\,B_k. \] This is the number that “counts” the “$k$-dimensional holes” in our space $X$. Note that this is an ordinary natural number, not an integer modulo 2. However, when working modulo 2, the Betti numbers entirely determine the homology, up to isomorphism. (In ordinary, integral homology, this is not the case: homology may have “torsion” elements, while the Betti numbers only count the “free” part of homology. The integral homology determines the mod 2 homology, but the reverse is not true, so homology modulo 2 is undoubtedly “weaker,” and there are certainly times one would want the full theory. However, I hope this post is illustrating the benefits of using homology modulo 2 as a shortcut for introducing the key concepts.)

Examples of homology

Let’s return to the example of the tetrahedron. Using $\sim$ for row equivalence, we have

$ \partial_1 \sim \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} $ and $\partial_2 \sim \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. The rank of both matrices is $3$. The nullity of the first matrix is $6 - 3 = 3$, and the nullity of the second matrix is $4 - 3 = 1$. Thus we have \[ \mathrm{rank}\,Z_1 = 3, \qquad \mathrm{rank}\,B_1 = 3, \qquad \mathrm{rank}\,Z_2 = 1, \qquad \mathrm{rank}\,B_2 = 0, \] the last quantity following from the fact that there are no 3-cells in the empty tetrahedron. We also know that \[ \mathrm{rank}\,Z_0 = 4, \qquad \mathrm{rank}\,B_0 = 3, \] with the first of this pair of equations coming from the fact that a point has no boundary. The Betti numbers of the tetrahedron are \[ \beta_0 = 4 - 3 = 1, \qquad \beta_1 = 3 - 3 = 0, \qquad \beta_2 = 1 - 0 = 1. \] Here is a geometric interpretation of these numbers, in reverse order.

The equation $\beta_2 = 1$ means that there is one independent 2-cycle which is not a boundary. The reduced form of $\partial_2$ shows that this cycle is $f_1 + f_2 + f_3 + f_4$, i.e., the sum of all the faces of the tetrahedron. Thus, when we take all the faces together, the result is a closed cycle, and no other nonempty combination of faces has an empty boundary. Roughly speaking, the entire tetrahedron encloses a “hole.”
The equation $\beta_1 = 0$ can be read as “every 1-cycle is a 1-boundary.” A stronger form of this statement is that the tetrahedron is simply connected—every loop can be contracted to a point, or every closed loop on the tetrahedron is the boundary of something 2-dimensional. Roughly speaking, there are no holes on the surface of the tetrahedron.
The “holes” measured by the 0th homology are of a somewhat different type. Generally speaking, the Betti number $\beta_0$ measures the number of connected components. Because any point has no boundary on its own (hence is a 0-cycle), the sum of two vertices is a boundary (that is, the two vertices are homologous) if and only if they can be joined by a path of edges. Thus the equation $\beta_0 = 1$ simply means that the tetrahedron is connected.
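The ranks above can be checked mechanically. Here is a minimal sketch, assuming we label the vertices of the tetrahedron 0 through 3 and take every pair as an edge and every triple as a face (this ordering is my own choice for illustration and may not match the matrices used earlier). The helper names `rank_mod2` and `boundary_matrix` are hypothetical, not from any library.

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below the current pivot row with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear the column everywhere else by adding the pivot row modulo 2.
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def boundary_matrix(small, big):
    """Rows are (k-1)-cells, columns are k-cells; entry 1 when the small cell is a face of the big one."""
    return [[1 if set(s) <= set(b) else 0 for b in big] for s in small]

# Assumed labeling: cells of the hollow tetrahedron as subsets of {0, 1, 2, 3}.
vertices = list(combinations(range(4), 1))
edges = list(combinations(range(4), 2))
faces = list(combinations(range(4), 3))

d1 = boundary_matrix(vertices, edges)  # 4 x 6
d2 = boundary_matrix(edges, faces)     # 6 x 4
r1, r2 = rank_mod2(d1), rank_mod2(d2)

# beta_k = rank Z_k - rank B_k, with rank Z_k = (number of k-cells) - rank of d_k.
betti = (
    len(vertices) - r1,      # every vertex is a 0-cycle; B_0 has rank r1
    (len(edges) - r1) - r2,  # nullity of d1 minus rank of d2
    (len(faces) - r2) - 0,   # nullity of d2; there are no 3-cells
)
print(r1, r2, betti)
```

The same two helpers work for any complex whose cells can be listed as vertex sets, since only the ranks of the boundary matrices enter the Betti numbers.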

Now let’s turn to the example of the torus, formed from a rectangle by identifying opposite sides. This space has one 2-cell $f$ (the interior of the rectangle), two 1-cells $e_1$ and $e_2$ (the edges of the rectangle, after being identified in pairs), and one 0-cell $v$ (all four vertices of the rectangle become a single point on the torus). Each edge $e_i$ appears twice on the boundary of $f$, and the vertex $v$ appears at both ends of each edge, so the boundary matrices are \[ \partial_1 = \begin{bmatrix} 0 & 0 \end{bmatrix}, \qquad\qquad \partial_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \] Thus every $k$-chain has an empty boundary for $k = 0, 1, 2$, and the rank of the $k$-cycles equals the number of $k$-cells. The interpretations of $\beta_0 = 1$ and $\beta_2 = 1$ are the same as in the case of the tetrahedron. In this case, the equation $\beta_1 = 2$ tells us there are two different, independent 1-cycles, which can be represented by a latitude circle and a longitude circle on the torus.
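Spelled out in the same way as for the tetrahedron, the computation for the torus is short. Since both boundary matrices are zero, every boundary group has rank $0$ and every chain is a cycle: \[ \mathrm{rank}\,Z_0 = 1, \qquad \mathrm{rank}\,Z_1 = 2, \qquad \mathrm{rank}\,Z_2 = 1, \qquad \mathrm{rank}\,B_0 = \mathrm{rank}\,B_1 = \mathrm{rank}\,B_2 = 0, \] so that \[ \beta_0 = 1 - 0 = 1, \qquad \beta_1 = 2 - 0 = 2, \qquad \beta_2 = 1 - 0 = 1. \]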

Footnote on topological spaces

A few words justifying the restriction to polytopal complexes: When I was in Hatcher’s algebraic topology class, he chose to introduce cellular homology first so that we could get to computations quickly; later he introduced singular homology mainly to prove that the homology groups only depend on the underlying topological space. It thus seems entirely reasonable to me, for purposes of introduction, to work directly with CW complexes. The appendix to Hatcher’s book is a standard reference for learning about CW complexes, but in practice a CW complex usually means a topological space that is assembled from convex polytopes, attached along their faces.

Another introductory source on homology for undergraduates

I recently came across Peter Giblin’s book Graphs, Surfaces and Homology, which provides a very thorough introduction to its eponymous topics with only the prerequisite of linear algebra. However, like most treatments of homology, it first deals with integral homology, then comes around to homology modulo 2 late in the book, in Chapter 8, specifically to deal with non-orientable (or at least unoriented) surfaces and simplicial complexes. Giblin describes the theory of homology modulo 2 as “satisfactory” but “weaker than the theory with integral coefficients,” which is absolutely true.

However, if one’s goal is either to learn about homology quickly or to study new spaces (rather than, say, to prove the classification of surfaces), then I think homology modulo 2 is perfectly sufficient, particularly since the contemporary field of persistent homology, applied to study data sets in high dimensions, often works with homology modulo 2. (See this survey, or the remark on page 7 of this overview, for instance.)

Snell and Escher

2016-05-21T17:37:00.000-07:00

A few weeks ago, Grant Sanderson posted a video on the brachistochrone, with guest Steven Strogatz.

The video explains Johann Bernoulli’s solution to the problem of finding the brachistochrone, which is a clever application of Snell’s Law. I immediately wondered whether a similar argument could explain the behavior of geodesics in the hyperbolic plane, and it turns out that it can. I’m not the first to think of this, but it doesn’t seem to be well-known, so that’s what I’ll try to explain in this post. This may become my standard way of introducing hyperbolic geometry in informal settings, i.e., when formulas aren’t needed. (As an example of another exposition that describes hyperbolic geodesics this way, see the lecture notes for this geometry course.)

Snell’s Law, as represented in the above diagram (image source), applies to light traveling from one medium to another, where the interface between the two is horizontal. If light travels at speed $v_1$ in the first medium and $v_2$ in the second medium, and its trajectory meets the interface at an angle of $\theta_1$ and leaves at an angle of $\theta_2$ (both angles measured with respect to the vertical), then \[ \frac{\sin\theta_1}{v_1} = \frac{\sin\theta_2}{v_2}. \] This is the case of two distinct media. Snell’s Law has a continuous version (derived from the discrete one by a limiting process, as suggested in the video). Suppose light is traveling through a medium with the property that the speed of light at each point depends on the vertical position of the point. That is, the speed of light in this medium at a point $(x,y)$ is a function $v(y)$, which may vary continuously. At each point of a trajectory of light in this medium, let $\theta$ be the angle formed by the direction of the trajectory (i.e., the tangent line) and the vertical. Then the quantity \[ \frac{\sin\theta}{v(y)} \] is constant along the trajectory.
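As a small illustration of how the discrete law leads toward the continuous one, here is a sketch in Python. The function name `refract` and the particular speeds are my own choices for the example. It applies Snell’s Law at one horizontal interface, then at a stack of interfaces, and records $\sin\theta / v$ in each layer.

```python
import math

def refract(theta1, v1, v2):
    """Angle from the vertical after crossing a horizontal interface.

    Returns None when no refracted ray exists (total internal reflection).
    """
    s = math.sin(theta1) * v2 / v1
    if abs(s) > 1.0:
        return None
    return math.asin(s)

# A single interface: light passes from a slower medium into a faster one.
theta1 = math.radians(30)
theta2 = refract(theta1, 1.0, 1.5)

# A stack of thin horizontal layers with slowly increasing speed (hypothetical values).
speeds = [1.0, 1.1, 1.2, 1.3, 1.4]
theta = math.radians(20)
invariants = [math.sin(theta) / speeds[0]]
for v_in, v_out in zip(speeds, speeds[1:]):
    theta = refract(theta, v_in, v_out)
    invariants.append(math.sin(theta) / v_out)

print(math.degrees(theta2))
print(invariants)  # the same value in every layer, up to rounding
```

Making the layers thinner and more numerous changes nothing about this bookkeeping, which is why the ratio remains constant in the limit of a continuously varying speed $v(y)$.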

So suppose we are looking at a medium that covers the half-plane $y > 0$, in which light travels at a speed proportional to the distance from the $x$-axis: $v(y) = cy$. (The constant $c$ may be thought of as the usual speed of light in a vacuum, so that along the line $y = 1$ light moves at the speed we expect. As we shall see, this is analogous to the fact that distances along the line $y = 1$ in the hyperbolic metric match Euclidean distances. Of course, it also means that light moves faster than $c$ above this line, which is physically impossible, but we’re doing a thought experiment, so we’ll allow it.) If we imagine someone living inside this medium trying to look at an object, what direction should they face?

From our outside perspective, it seems that the observer should look “directly at” the object, in a straight (Euclidean) line. However, in this medium light does not travel along Euclidean line segments, but instead along curved arcs, as illustrated below.

Click on the graph to go to an interactive version.
It’s not too surprising that light follows a path something like this if it’s trying to minimize the time it takes to travel from the object to the observer: the light travels faster at higher vertical positions, so it’s worth going up at least slightly to take advantage of this property, and it’s also worth descending somewhat sharply so as to spend as little time as possible in the lower, slower regions.

What may come as a surprise is that the path of least time is precisely a circular arc. With Snell’s Law, however, this fact can be derived quickly. We have that $v(y) = cy$, and so along a light trajectory \[ \frac{\sin\theta}{cy} = \text{constant}. \] Multiplying both sides by $c$, we find that $\frac{\sin\theta}{y}$ is also a constant. If this constant is zero, then $\theta = 0$ everywhere, so the path is a vertical segment. Otherwise, call this constant $\frac{1}{R}$. Then $y = R \sin\theta$. Now set $x = a + R \cos \theta$. The curve \[ (x,y) = (a + R \cos\theta, R \sin\theta) \] parametrizes a circle of radius $R$ centered at $(a,0)$ by the angle between the $x$-axis and the radius through the point. It remains to see that this angle $\theta$ is the same as the angle between the vertical direction and the tangent line at the corresponding point of the circle. This equality can be shown in any number of ways from the diagram below.
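For readers who prefer to see the circle emerge from the equation rather than verify a guess, here is one possible route, sketched under the assumption that the light is moving to the right and that $\theta$ is measured from the upward vertical, so that the unit tangent vector is $(\sin\theta, \cos\theta)$. Then \[ \frac{dx}{dy} = \tan\theta, \qquad y = R\sin\theta \implies dy = R\cos\theta\,d\theta, \qquad dx = \tan\theta\,dy = R\sin\theta\,d\theta, \] and integrating gives \[ x = a - R\cos\theta, \qquad\text{hence}\qquad (x - a)^2 + y^2 = R^2. \] The sign in front of $\cos\theta$ differs from the parametrization in the text only because of the direction in which the arc is traversed; the circle is the same.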

Click on the graph to go to an interactive version.
This is not to say that this parametrization describes the speed at which light moves along the path. As previously observed, light slows as it approaches the horizontal boundary, that is, the $x$-axis.
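To make the slowdown concrete, here is a minimal numerical sketch. It uses the parametrization by $\theta$ and the speed $v = cy = cR\sin\theta$. An arc element has length $R\,d\theta$, so the time to cross it is $d\theta/(c\sin\theta)$, and $R$ drops out. The closed form used for comparison is the standard antiderivative $\int d\theta/\sin\theta = \ln\tan(\theta/2)$. The function names and the choice $c = 1$ are assumptions made for the example.

```python
import math

def travel_time(theta_start, theta_end, c=1.0, steps=20_000):
    """Midpoint-rule estimate of the time light takes along the arc between two angles."""
    h = (theta_end - theta_start) / steps
    total = 0.0
    for i in range(steps):
        theta = theta_start + (i + 0.5) * h
        total += h / (c * math.sin(theta))  # dt = R dtheta / (c R sin theta)
    return total

def travel_time_exact(theta_start, theta_end, c=1.0):
    """Closed form from the antiderivative ln(tan(theta / 2))."""
    return (math.log(math.tan(theta_end / 2)) - math.log(math.tan(theta_start / 2))) / c

numeric = travel_time(math.pi / 6, math.pi / 2)
exact = travel_time_exact(math.pi / 6, math.pi / 2)
print(numeric, exact)

# Starting closer to the x-axis (smaller theta) costs more and more time.
for start in (1e-1, 1e-3, 1e-6):
    print(start, travel_time_exact(start, math.pi / 2))
```

The time grows without bound as the starting angle tends to $0$, so from the point of view of light the $x$-axis is infinitely far away.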

But perhaps we’ve been prejudiced in assuming our perspective is the right one. We’ve been looking with our Euclidean vision and supposing light moves at different speeds depending on where it is in this half-plane. Thus it seems to us that light covers Euclidean distances more quickly the further it gets from the $x$-axis. But relativity teaches us that distance isn’t absolute: instead, the speed of light is what’s absolute. So perhaps we could gain greater insight by measuring the distance between points according to how long it takes light to travel between them. That is, we assume that the paths determined above are the geodesics of the half-plane, and by doing so we learn to “see” hyperbolically. Then we are not troubled by looking at an image like

(image source) and being told that all of the pentagonal shapes are the same size, because we’ve learned to look at things with our hyperbolic geometry glasses on.

M. C. Escher illustrated (or, more accurately, approximated) the hyperbolic geometry of the upper half-plane with his print Regular Division of the Plane VI (1958), shown below (image source).

This design was created during a period when Escher was attempting to depict infinity visually. It came shortly before he encountered the Poincaré disk in a paper by Coxeter, a discovery that led to the Circle Limit series. In this print, the geometry of each lizard is Euclidean, structured around an isosceles right triangle. Each horizontal “layer” has two sizes of triangles, one scaled down from the other by a factor of $\sqrt{2}$. The side lengths of the triangles in one layer are one-half of those in the layer above, so the heights of layers converge geometrically to the horizontal boundary at the bottom. Some of the triangles are outlined in the next image.
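To quantify the convergence, suppose (as a hypothetical normalization) that the top layer has Euclidean height $h$ and that the pattern continues forever. The layers then have heights $h, h/2, h/4, \dots$, and their total is \[ \sum_{n=0}^{\infty} \frac{h}{2^n} = 2h, \] so the boundary sits at a finite Euclidean distance. By contrast, if we place the boundary on the $x$-axis and use the speed $v(y) = cy$ from earlier, light moving vertically across a layer that runs from height $y_0/2$ to $y_0$ takes the time \[ \int_{y_0/2}^{y_0} \frac{dy}{cy} = \frac{\ln 2}{c}, \] which is the same for every layer. Measured by light, all of the layers have the same height.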

Some questions I have about Escher’s print:

How different would this image look if it were drawn according to proper hyperbolic rules, with each lizard having reflectional symmetry, and each meeting of “elbows” having true threefold symmetry? (This would give the tessellation with Schläfli symbol {3,8}, an order-8 triangular tiling.)
If we suppose that the right triangles act as prisms, with light moving at a constant speed inside each one, but this speed being proportional to the square root of the triangle’s area, then what will the trajectories of light look like as it moves through the plane? Will they approximately follow circles?
How many lizards are in the picture?

Coda: Jos Leys has taken some of Escher’s Euclidean tessellations and converted them to hyperbolic ones, in both the disk and the half-plane model.

\((acbb)\)	\((a)(c)(bb)\)	\((a)(cb)(b)\)	\((ac)(b)(b)\)
\((a)(cbb)\)	\((ac)(bb)\)	\((acb)(b)\)	\((a)(c)(b)(b)\)