Saturday, June 04, 2022

in triplicate

Academic grades, as conventionally understood and used, have three distinct audiences, and they serve a different function for each audience. Grades are problematic largely due to the ways these functions are at odds with each other.

The three audiences are:

  • The student. In the context of a single course, assessment and feedback are necessary parts of the educational process. They let the student know how they’re doing as they go along. Traditional letter grades are a crude form of feedback, and where they are used students rarely have a chance to meaningfully respond, other than to try to adapt for the next assessment. Alternative grading systems often treat the entire course duration as a traning period, in which the student can respond to feedback through revision, reassessment, or other forms of demonstrated proficiency following an initial evaluation.
  • The institution. Let’s be as generous as possible and assume that the primary goal of the college or university in which a student is enrolled is to educate that student in such a way that they reach their fullest potential. This requires communication among the numerous instructors that student will encounter throughout their studies, and also coordination with the other offices and institutional bodies that are present to support the student. A grade can be a succinct summary to the next instructor in a sequence about how developed are the student’s prerequisite skills. It can also be a signal to the institution about how well the student is progressing in their overall academic career. (This is the reason for “mid-semester” grades that can be used to alert the school when a student needs assistance or intervention.)
  • The outside world. Outside of the institution (and inside as well, to an extent), grades become currency that the student can trade for prestige or opportunities. This currency may be in the form of a GPA (to which all of the individual course grades contribute, despite being largely incommensurable with each other) or, for a more granular view, a transcript (which provides the opportunity to craft a narrative about the grades). Of course, not all persons or organizations outside academia value this currency in the same way. But “good grades” provide an inexhaustible supply of recommendations, and “bad grades” are a perpetual obstacle to be minimized or navigated around.

Thus grades are expected to operate at three different social scales, and also at three different time scales. The feedback to a student within a course is short-term, the communcation with the institution is medium-term, and the message to the outside world is long-term (or as they say, permanent). Much of the “objectivity theater” surrounding the assignment of grades is based on the pretence that these three purposes can be fulfilled by a single summative object.

The fact that the third use of grades has both the largest audience and the longest-lasting effects means that it becomes their dominant purpose, their telos. Student anxiety about grades, for the most part, is caused not by an intrinsic dislike of getting feedback about how they can improve their understanding and performance in a subject, but by the belief that in the end what has the greatest practical impact is the final letter or number they can show to the outside world when the course is done. Institutional concerns about “rigor” are based not on the needs of the student, but the needs of the school to present the final scores to the outside world as meaningful indicators of their students’ quality.

Those of us engaged in the practice of developing and implementing alternative grading systems are primarily focused on the first and smallest-scale purpose of grades, providing a useful feedback process to the student. Yet our systems must also interface with the institutional and outside world audiences. At those interfaces lie, in my view, the most difficult ethical questions of grading: how do we support students beyond our time as their instructor? how do we provide an honest evaluation that meets the needs of all three audiences? to whom are we primarily responsible? is the merging of the three functions into a single metric flawed in such a way that it needs to be overthrown?

Honestly, I think (at the time of this writing) that grades are most useful at the institutional level. If it were not for the outward-facing use of grades, they could serve as a quick, qualitative (not quantitative) shorthand in communicating among the internal parts of a college or university what are the needs or successes of an individual student. (To be supplemented by more personalized detail as necessary.) Within a course, as we’ve seen, any number of assessment/feedback systems can work, as long as they’re built on clear communication and building trust in the student-faculty relationship. As for the presentation of grades to the outside world, well, that’s where the dirty work happens.

Monday, May 23, 2022

more on poetry and partitions (sort of)

In my previous post I defined the ordered Virahanka numbers $V_{m,n}$ and the unordered Virahanka numbers $U_{m,n}$ recursively by \[ \begin{array}{rl} V_{m,n} = U_{m,n} = 0 &\qquad \text{for $n < 0$ or $m < 0$} \\ V_{m,0} = U_{m,0} = 1 &\qquad \text{for $m \ge 0$} \\ V_{m,n} = \displaystyle\sum_{k=1}^m V_{m,n-k} &\qquad \text{for $n > 0$} \\ U_{m,n} = \displaystyle\sum_{k=1}^m U_{k,n-k} &\qquad \text{for $n > 0$} \\ \end{array} \] The value of $V_{m,n}$ counts the number of compositions of $n$ with no term greater than $m$, and the value of $U_{m,n}$ counts the number of partitions of $n$ with no part greater than $m$. (Following a standard convention, I consider the “empty sum” having no terms to be equal to $0$, which is why $V_{m,0} = U_{m,0} = 1$; moreover, $V_{0,n} = U_{0,n} = 0$ for $n > 0$, because the recursive formulas above become empty sums.) Some special cases:

  • $V_{2,n}$ is the sequence of Virahanka–Fibonacci numbers $1,1,2,3,5,8,\dots$.
  • $V_{n,n}$ is the total number of compositions of $n$ (which equals $2^{n-1}$ for $n \ge 1$).
  • $U_{n,n}$ is the total number of partitions of $n$ (which has no known closed-form expression).

For later reference, here are partial tables of values for $V_{m,n}$ and $U_{m,n}$. \[ V_{m,n}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 \\ \hline 3 & 1 & 1 & 2 & 4 & 7 & 13 & 24 & 44 & 81 & 149 & 274 \\ \hline 4 & 1 & 1 & 2 & 4 & 8 & 15 & 29 & 56 & 108 & 208 & 401 \\ \hline 5 & 1 & 1 & 2 & 4 & 8 & 16 & 31 & 61 & 120 & 236 & 464 \\ \hline 6 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 125 & 248 & 492 \\ \hline 7 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 253 & 504 \\ \hline 8 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 & 509 \\ \hline 9 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 511 \\ \hline 10 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 512 \end{array} \] \[ U_{m,n}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 & 5 & 6 \\ \hline 3 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 & 10 & 12 & 14 \\ \hline 4 & 1 & 1 & 2 & 3 & 5 & 6 & 9 & 11 & 15 & 18 & 23 \\ \hline 5 & 1 & 1 & 2 & 3 & 5 & 7 & 10 & 13 & 18 & 23 & 30 \\ \hline 6 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 14 & 20 & 26 & 35 \\ \hline 7 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 21 & 28 & 38 \\ \hline 8 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 29 & 40 \\ \hline 9 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 41 \\ \hline 10 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 42 \end{array} \]

There are so many fun ways to play with these collections of numbers, it’s hard to know when to stop! To limit the scope of this post, I will only consider two related notions, and those only in part: first differences, and generating functions.

First differences

Given a collection of numbers $S_{m,n}$ indexed by integers $m$ and $n$, and a pair $(p,q) \in \mathbb{Z}^2$, let \[ S_{m,n}^{(p,q)} = S_{m,n} - S_{m-p,n-q}\text. \]

From the definitions, we have for $n > 1$ \[ \begin{array}{rl} V_{m,n}^{(0,1)} &= V_{m,n} - V_{m,n-1} \\ &= \sum_{k=1}^m V_{m,n-k} - \sum_{k=1}^m V_{m,(n-1)-k} \\ &= \sum_{k=1}^m \big(V_{m,n-k} - V_{m,(n-k)-1}\big) \\ &= \sum_{k=1}^m V_{m,n-k}^{(0,1)} \end{array} \] and so the first differences $V_{m,n}^{(0,1)}$ satisfy the same recurrence relation as the original numbers $V_{m,n}$ when $n\ge 2$. Here is the table of values for $V_{m,n}^{(0,1)}$. \[ V_{m,n}^{(0,1)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 2 & 1 & 0 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 \\ \hline 3 & 1 & 0 & 1 & 2 & 3 & 6 & 11 & 20 & 37 & 68 & 125 \\ \hline 4 & 1 & 0 & 1 & 2 & 4 & 7 & 14 & 27 & 52 & 100 & 193 \\ \hline 5 & 1 & 0 & 1 & 2 & 4 & 8 & 15 & 30 & 59 & 116 & 228 \\ \hline 6 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 31 & 62 & 123 & 244 \\ \hline 7 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 126 & 251 \\ \hline 8 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 254 \\ \hline 9 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 \\ \hline 10 & 1 & 0 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 \end{array} \] We see that the “initial conditions” of the recurrence $V_{m,n}^{(0,1)} = \sum_{k=1}^m V_{m,n-k}^{(0,1)}$ have shifted, so that before beginning to apply the recurrence relation, we insert one term equal to $0$. But what are these numbers counting? Well, $V_{m,n}$ is the number of compositions of $n$ with no term greater than $m$, and $V_{m,n-1}$ is the number of compositions of $n-1$ with no term greater than $m$. Any composition of $n-1$ can be converted to a composition of $n$ by adding a $1$ at the end. So $V_{m,n}^{(0,1)}$ equals the number of compositions of $n$ that do not end with $1$. Perhaps not the most interesting quantity to count, although having the recurrence formula is nice. (The same principle will lead to other interesting quantities, so don’t go away yet!)

On the other hand, $V_{m,n}^{(1,0)} = V_{m,n} - V_{m-1,n}$ does have an interesting interpretation. Because $V_{m-1,n}$ counts all compositions of $n$ whose largest term is at most $m-1$, $V_{m,n}^{(1,0)}$ equals the number of compositions of $n$ whose largest term is exactly $m$. Here’s a partial table of values. \[ V_{m,n}^{(1,0)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 0 & 0 & 1 & 2 & 4 & 7 & 12 & 20 & 33 & 54 & 88 \\ \hline 3 & 0 & 0 & 0 & 1 & 2 & 5 & 11 & 23 & 47 & 94 & 185 \\ \hline 4 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 27 & 59 & 127 \\ \hline 5 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 28 & 63 \\ \hline 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 & 28 \\ \hline 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 & 12 \\ \hline 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 5 \\ \hline 9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 \\ \hline 10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array} \] One interesting feature of this table is that the entries stabilize below the line $n = 2m$: when $n < 2m$, we see that $V_{m+1,n+1}^{(1,0)} = V_{m,n}^{(1,0)}$. The sequence $V_{m,2m-1}^{(0,1)}$ begins (starting with $m = 1$) \[ 1,\;2,\;5,\;12,\;28,\;64,\;144,\;320,\;704,\;\dots \] The OEIS says that this sequence comes from counting all the $1$s that appear in all compositions of $m$. Why should that sequence arise here? I guess that’s a mystery that will have to wait for later…

Turning to first differences of the unordered Virahanka numbers, let’s start with $U_{m,n}^{(1,0)}$. \[ U_{m,n}^{(1,0)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 0 & 0 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 \\ \hline 3 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 \\ \hline 4 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 & 6 & 9 \\ \hline 5 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 & 7 \\ \hline 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 & 5 \\ \hline 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 \\ \hline 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2 \\ \hline 9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ \hline 10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array} \] It looks like no new numbers have appeared in this table; each row has simply been shifted to right, with the $m$th row having $m$ new $0$s at the start. Indeed, using the recurrence formula for $U_{m,n}$, we calculate that \[ \begin{array}{rl} U_{m,n}^{(1,0)} &= U_{m,n} - U_{m-1,n} \\ &= \sum_{k=1}^m U_{k,n-k} - \sum_{k=1}^{m-1} U_{k,n-k} \\ &= U_{m,n-m} \end{array} \] Why should this be true, from a counting perspective? Reasoning as we did with $V_{m,n}^{(1,0)}$, we see that $U_{m,n}^{(0,1)}$ counts the number of partitions of $m$ whose largest part is exactly $m$. But if the largest part is exactly $m$, then it only remains to partition $n-m$ using parts no greater than $m$, which is exactly what $U_{m,n-m}$ counts! So $U_{m,n}^{(0,1)} = U_{m,n-m}$.

Ok, now for taking first differences of $U_{m,n}$ within rows. \[ U_{m,n}^{(0,1)}: \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 2 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \hline 3 & 1 & 0 & 1 & 1 & 1 & 1 & 2 & 1 & 2 & 2 & 2 \\ \hline 4 & 1 & 0 & 1 & 1 & 2 & 1 & 3 & 2 & 4 & 3 & 5 \\ \hline 5 & 1 & 0 & 1 & 1 & 2 & 2 & 3 & 3 & 5 & 5 & 7 \\ \hline 6 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 3 & 6 & 6 & 9 \\ \hline 7 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 6 & 7 & 10 \\ \hline 8 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 7 & 11 \\ \hline 9 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 8 & 11 \\ \hline 10 & 1 & 0 & 1 & 1 & 2 & 2 & 4 & 4 & 7 & 8 & 12 \end{array} \] Hmm, that’s curious: this is the first time we’ve seen any cases (outside of the $0$th row or the $1$st column) where the entries occasionally decrease moving left to right. What exactly are we seeing? Well, any partition of $n-1$ can be converted to a partition of $n$ by adjoining a $1$, so the difference between $U_{m,n}$ and $U_{m,n-1}$ is the number of partitions of $n$ that do not include $1$ as a part and whose largest part is at most $m$. So,

  • In row $2$, we are only allowing partitions using $2$, so every even number has one partition, and every odd number has zero.
  • In row $3$, we are only allowing partitions using $2$ and $3$. By hand, we find that $2$, $3$, $4$, and $5$ have one partition each, but $6$ has two: $6 = 2 + 2 + 2 = 3 + 3$. However, $7$ only has one: $7 = 3 + 2 + 2$. From this point, each time we add $6$, we increase the number of possible partitions by $1$, so $U_{3,n+6} = U_{3,n}+1$.
As the rows go on, the jumps down become harder to predict, in my view. But we could plug them into the OEIS and see what comes up! Here’s what row 4 yields.

Evidently, there is more fun to be had, looking at first differences $V_{(m,n)}^{(p,q)}$ and $U_{m,n}^{(p,q)}$ for other values of $(p,q)$, second differences, and so on. But for now let’s turn to a new game.

Generating functions

Given a sequence of numbers $\{a_n\}$, the generating function of the sequence is the formal power series \[ g(x) = \sum a_n x^n\text. \] I have not included a starting or ending index in order to allow for whatever values of $n$ make sense. In cases where the power series converges to an analytic function on some neighborhood of $0$, this function is also called the generating function of the sequence, and the terms of the sequence are the Taylor coefficients of the function.

The Virahanka–Fibonacci numbers $V_{2,n}$ produce the generating function \[ g_2(x) = \sum_{n=0}^\infty V_{2,n}x^n\text. \] Because $V_{2,0} = 1$, $V_{2,n} = 0$ for $n < 0$, and $V_{2,n} = V_{2,n-1} + V_{2,n-2}$ for $n \ge 1$, we can rewrite the power series as \[ \begin{array}{rl} g_2(x) &= \displaystyle 1 + \sum_{n=1}^\infty (V_{2,n-1} + V_{2,n-2}) x^n \\ &= \displaystyle 1 + \sum_{n=1}^\infty V_{2,n-1} x^n + \sum_{n=1}^\infty V_{2,n-2} x^n \\ &= \displaystyle 1 + x\sum_{n=1}^\infty V_{2,n-1} x^{n-1} + x^2\sum_{n=1}^\infty V_{2,n-2} x^{n-2} \\ &= \displaystyle 1 + x\sum_{n=0}^\infty V_{2,n} x^n + x^2\sum_{n=0}^\infty V_{2,n} x^n \\ &= 1 + xg_2(x) + x^2g_2(x) \end{array} \] and therefore $g_2(x) = 1 + xg_2(x) + x^2g_2(x)$. Solving this equation, we find \[ g_2(x) = \frac{1}{1 - x - x^2}\text. \] By a similar process, if we set \[ g_m(x) = \sum_{n=0}^\infty V_{m,n}x^n \] then using $V_{m,0} = 1$ and $V_{m,n} = V_{m,n-1} + \cdots + V_{m,n-m}$ for $n > 0$, we get \[ g_m(x) = \frac{1}{1 - x - \cdots - x^m}\text. \] As $m \to \infty$, the series $x + x^2 + \dots + x^m$ that is subtracted in the denominator converges to $x/(1-x)$, so the generating functions converge to \[ g_{\infty}(x) = \frac{1}{1 - x/(1 - x)} = \frac{1 - x}{1 - 2x}\text. \] This is precisely equivalent to \[ g_{\infty}(x) = 1 + \sum_{n=1}^\infty 2^{n-1} x^n \] which is the generating function of the sequence $V_{n,n}$ to which the terms of $V_{m,n}$ stabilize as $m$ grows.

Now let’s find the generating functions of the $U_{m,n}$ sequences. For each $m \ge 0$, set \[ f_m(x) = \sum_{n=0}^\infty U_{m,n}x^n \] By inspection, we have \[ f_0(x) = 1 \qquad\text{and}\qquad f_1(x) = 1 + x + x^2 + \cdots = \frac{1}{1 - x}\text. \] (These are the same as $g_0(x)$ and $g_1(x)$.) We know that $U_{2,n} = U_{1,n-1} + U_{2,n-2}$ for $n \ge 1$, and so \[ \begin{array}{rl} f_2(x) &= \displaystyle 1 + \sum_{n=1}^\infty (U_{1,n-1} + U_{2,n-2}) x^n \\ &= \displaystyle 1 + x\sum_{n=1}^\infty U_{1,n-1}x^{n-1} + x^2\sum_{n=1}^\infty U_{2,n-2} x^{n-2} \\ &= \displaystyle 1 + x\sum_{n=0}^\infty U_{1,n}x^n + x^2\sum_{n=0}^\infty U_{2,n} x^n \\ &= 1 + xf_1(x) + x^2f_2(x) \end{array} \] This implies $(1 - x^2) f_2(x) = 1 + xf_1(x)$, or \[ f_2(x) = \left(1 + \frac{x}{1 - x} \right) \frac{1}{1 - x^2} = \frac{1 - x + x}{1 - x} \frac{1}{1 - x^2} = \frac{1}{(1 - x)(1 - x^2)}\text. \] Proceeding inductively, a similar process shows that \[ f_m(x) = 1 + xf_1(x) + x^2f_2(x) + \cdots + x^mf_m(x) \] and therefore \[ f_m(x) = \frac{1}{(1-x)(1-x^2)\cdots(1-x^m)} \] for all $m \ge 2$. On compact subsets of $(-1,1)$, $x^m$ converges uniformly to $0$, and so $f_m$ converges uniformly on compact subsets of $(-1,1)$ to \[ f_\infty(x) = \prod_{k=1}^\infty \frac{1}{1 - x^k} \] which is the generating function of $U_{n,n}$, the sequence that counts all partitions of $n$.

Now that we have the generating functions $g_m(x)$ and $f_m(x)$ in hand, we can have more fun! Note that \[ x\sum a_n x^n = \sum a_n x^{n+1} = \sum a_{n-1} x^n \] and so multiplying a generating function by $x$ shifts all the terms of the sequence to the right by one position. (We have already used this fact above; it’s just worth making the general principle explicit.) This allows us to find the generating functions of the first differences calculated in the previous section. We’ll consider these in the reverse order from what we did previously.

For $U_{m,n}^{(1,0)}$, the generating function is \[ \begin{array}{rl} f_m(x) - f_{m-1}(x) &= \dfrac{1}{(1 - x)\cdots(1 - x^m)} - \dfrac{1}{(1 - x)\cdots(1 - x^{m-1})} \\ &= \dfrac{x^m}{(1 - x)\cdots(1 - x^m)} = x^mf_m(x) \end{array} \] Hey, that’s great! This matches what we just saw: the coefficients of $f_m(x) - f_{m-1}(x)$ are the same as those of $f_m(x)$, but shifted $m$ places to the right.

For $U_{m,n}^{(0,1)}$, the generating function is \[ \begin{array}{rl} f_m(x) - xf_m(x) &= \dfrac{1}{(1 - x)\cdots(1 - x^m)} - \dfrac{x}{(1 - x)\cdots(1 - x^m)} \\ &= \dfrac{1 - x}{(1 - x)\cdots(1 - x^m)} = \dfrac{1}{(1 - x^2)\cdots(1 - x^m)} \end{array} \] Oh, neat—we wanted a function that counts the number of partitions that don’t contain $1$ as a part, and the factor $1-x$ vanished! This suggests that if we want to count partitions that only allow $a_1,\dots,a_N$ then perhaps we should use the generating function $[(1-x^{a_1})\cdots(1-x^{a_N})]^{-1}$? Something to explore further…

Now for $V_{m,n}^{(1,0)}$, the generating function is \[ \begin{array}{rl} g_m(x) - g_{m-1}(x) &= \dfrac{1}{1 - x - \cdots - x^m} - \dfrac{1}{1 - x - \cdots - x^{m-1}} \\ &= \dfrac{x^m}{(1 - x - \cdots - x^m)(1 - x - \cdots - x^{m-1})} \\ &= x^m g_m(x) g_{m-1}(x)\text. \end{array} \] Hmm, this definitely tells us something interesting. Multiplying out the power series for $g_m$ and $g_{m-1}$, and equating coefficients, we find \[ V_{m,n}^{(1,0)} = \sum_{k=0}^{n-m} V_{m,k} V_{m-1,n-m-k}\text. \] Recall from our interpretation of $V_{m,n}^{(1,0)}$ that it counts the number of compositions of $n$ whose largest term is exactly equal to $m$. Indeed, since we know the largest term is $m$, we can sort the compositions by the last term that equals $m$. The sum of the terms before this will be equal to some $k \le n - m$, for which there are $V_{m,k}$ compositions with no terms greater than $m$. After the last term equal to $m$, the remaining terms will sum to $n - k - m$, for which there are $V_{m-1,n-k-m}$ compositions whose terms are all less than $m$. We thus could have found this equality earlier in our study, but the generating function suggested it immediately! (I had hoped that looking at the generating functions would also help explain why $V^{(1,0)}_{m,2m-1}$ equals the number of $1$s in all compositions of $m$, as observed above, but I haven’t quite got there yet… possible update to come!)

For $V_{m,n}^{(0,1)}$, the generating function is \[ \begin{array}{rl} g_m(x) - xg_m(x) &= \dfrac{1}{1 - x - \cdots - x^m} - \dfrac{x}{1 - x - \cdots - x^m} \\ &= \dfrac{1 - x}{1 - x - \cdots - x^m} \end{array} \] Oh, so changing the initial conditions of the sequence just corresponds to changing the numerator of the generating function. That’s neat! (Indeed, the Lucas numbers $L_n$ famously satisfy the same recursive formula as the Virahanka–Fibonacci numbers, $L_n = L_{n-1} + L_{n-2}$, and their generating function is $(2-x)/(1-x-x^2) = g_2(x) + (g_2(x)-xg_2(x))$.)

I can’t help doing one more thing with these generating functions, especially since we haven’t seen much interaction between the numbers $V_{m,n}$ and $U_{m,n}$. Let’s try dividing their generating functions. We know that $V_{m,n}$ is at least as large as $U_{m,n}$, so we’ll divide $g_m$ by $f_m$: \[ \frac{g_m}{f_m} = \frac{(1-x)\cdots(1-x^m)}{1-x-\cdots-x^m}\text. \] What could the coefficients of this generating function represent? Loosely speaking, they carry the information about the number of compositions (with no part greater than $m$) that is not contained in the number of partitions. That’s fairly vague, so let’s consider the case $m = 2$. We calculate \[ \frac{g_2}{f_2} = \frac{(1-x)(1-x^2)}{1-x-x^2} = 1 + \frac{x^3}{1-x-x^2} = 1 + x^3 g_2 \] which gives us the factorization $g_2 = f_2\big(1 + x^3 g_2\big)$, or $g_2 = f_2 + x^3 f_2 g_2$. Equating coefficients of the power series yields the relation \[ V_{2,n} = U_{2,n} + \sum_{k=0}^{n-3} U_{2,k} V_{2,n-3-k}\text. \] This equation has a sensible meaning. Recall that a partition of $n$ can be thought of as a composition (a sum of positive integers summing to $n$) in which the terms are non-increasing. (For example, $2 + 2 + 1 + 1 + 1$ is a partition of $7$, but $2 + 1 + 1 + 2 + 1$ is not.) The right-hand side of the last equation splits the number of compositions into the number of partitions $U_{2,n}$ and the number of non-partitions. If we are only allowing terms equal to $1$ or $2$, then a composition that is not a partition must have a $1$ followed by a $2$ somewhere in its expression. In poetic terms, we can think of $1$ and $2$ as short and long syllables, and $n$ as the number of beats in a line. We sort compositions by where the first appearance of $1 + 2$ occurs. Everything before that in the sum is non-increasing; say this takes $k$ beats (this is the $U_{2,k}$ factor in the sum on the right). Then we have $1 + 2$, which accounts for $3$ more beats, and finally the remaining $n-3-k$ beats can be subdivided in any manner using shrot and long syllables (which is what the $V_{2,n-3-k}$ factor counts).

The sequence of coefficients for $g_4(x)/f_4(x)$ begins \[ 1,\;0,\;0,\;1,\;2,\;5,\;8,\;16,\;30,\;58,\;113,\;217,\;418,\;\dots\text. \] As of this writing (May 2022) this sequence does not appear in the OEIS (should I submit it?). However, the sequence of coefficients for \[ \frac{g_\infty}{f_\infty} = 1 + x^3 + 2x^4 + 5x^5 + 9x^6 + 19x^7 + 37x^8 + 74x^9 + 148x^{10} + \cdots \] does appear, as A178841, which counts “the number of pure inverting compositions of $n$.” These seem to be related to a basis for the algebra of quasisymmetric functions as a module over the symmetric functions. So there’s something deep and important going on with these quotients!

In case it isn’t obvious from this post, I’m still learning about generating functions, and I’m kind of in awe of them. It’s entirely possible that everything in this post appears among the exercises of some book on combinatorics. (Certainly most of them are mild generalizations of facts that can be found in the OEIS.) One of my summer activities may be to finally get through Herbert Wilf’s book generatingfunctionology (the second edition is available for free download at the link).

Wednesday, May 18, 2022

on poetry and partitions

Last fall, I learned the names of Pingala and Hemachandra from a blog post by James Propp. Both have been described as poets and mathematicians, and both produced works related to what are commonly known as the Fibonacci sequence \[ 1,\;1,\;2,\;3,\;5,\;8,\;13,\;21,\;34,\;55,\;89,\;144,\;\dots\text. \] Pingala lived around the 3rd century BCE, and Hemachandra lived in the 12th century CE. (The book Liber abaci—which contains the rabbit problem that associated Fibonacci’s name with the above sequence, along with other, more useful ideas—was produced later, in 1202.) In between them came Virahanka, sometime between the 6th and 8th centuries CE; following Manjul Bhargava, it is Virahanka’s name I will emphasize in this post.

In Sanskrit poetry, each syllable is either short or long, lasting one or two beats. Some forms of poetic meter have a fixed number of syllables in each line, and some have a fixed number of beats in each line. In the latter case, a natural question (asked at least as early as Pingala) is: how many ways can a line of $n$ beats be metrically divided into short and long syllables? For example, a line with three beats could be arranged “short, short, short” or “long, short” or “short, long”. This corresponds to the equalities $3 = 1+1+1 = 2+1 = 1+2$.

Virahanka seems to have been the first to provide a complete answer, stating that each line of $n$ beats must end with either a short syllable or a long syllable; therefore, the number of possibilities equals the sum of the numbers of possibilities for a line with $n-1$ beats and a line with $n-2$ beats. Because a line with one beat can be divided in only one way, and a line with two beats can be divided in two ways (“short, short” or “long”), the resulting sequence is the same as Fibonacci’s rabbit-counting sequence.1 (The additional initial $1$ in the sequence above will be accounted for momentarily.)

Not only does poetic meter provide a more natural setting for this counting problem than rabbits, it is more reasonably generalized. Suppose, for instance, that instead of just two lengths of syllables, we had a language with three lengths of syllables: one beat, two beats, and three beats. Now how many metrical ways are there to compose a poetic line of $n$ beats? As before, we can work out the first few cases by hand: $1 = 1$, $2 = 1 + 1 = 2$, $3 = 1 + 1 + 1 = 2 + 1 = 1 + 2 = 3$. For larger numbers of beats, as before we can sort by the length of the final syllable ($1$, $2$, or $3$), and then count how many ways there are to complete the line, using our knowledge of previous line lengths. For example, a line with four beats can be divided in seven ways, four that end with a $1$-beat syllable, two that end with a $2$-beat syllable, and one that ends with a $3$-beat syllable: \[ \begin{array}{rl} 4 &= 1 + 1 + 1 + \boxed{1} = 2 + 1 + \boxed{1} = 1 + 2 + \boxed{1} = 3 + \boxed{1} \\ &= 1 + 1 + \boxed{2} = 2 + \boxed{2} \\ &= 1 + \boxed{3}\text. \end{array} \] In this sequence, each term is therefore the sum of the previous three terms: \[ 1,\;2,\;4,\;7,\;13,\;24,\;44,\;81,\;149,\;274,\;504,\;\dots\text. \] (Has anyone seen a biological application of this sequence, even an implausible one?)2

Generalizing further, suppose we have syllables of lengths $1$, $2$, …, $m$, and we wish to count the number of rhythmic ways to compose a line of $n$ beats. Or put another way, how many sums with terms between $1$ and $m$ add up to $n$?3 A sum of positive integers equalling $n$ is called a composition of $n$. Let $V_{m,n}$ be the number of compositions of $n$ whose largest term is no greater than $m$. I will call $V_{m,1},V_{m,2},V_{m,3},\dots$ the $m$th Virahanka sequence, so that $V_{2,n}$ is the usual $n$th Fibonacci number. By the same reasoning as in the case of $V_{2,n}$, we have \[ V_{m,n} = V_{m,n-1} + V_{m,n-2} + \cdots + V_{m,n-m}, \qquad n > m. \]

We can naturally extend $V_{m,n}$ to be defined when $m$ or $n$ equals $0$, or both. Using the convention that the “empty sum” having no terms is equal to $0$, we get $V_{m,0} = 1$ for all $m$, and $V_{0,n} = 0$ for $n \ge 1$. We can even set $V_{m,n} = 0$ when $n < 0$, which allows us to drop the $n > m$ restriction entirely in the recursive formula above.

Interesting patterns4 emerge when we arrange the numbers $V_{m,n}$ in a single table:

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 \\ \hline 3 & 1 & 1 & 2 & 4 & 7 & 13 & 24 & 44 & 81 & 149 & 274 \\ \hline 4 & 1 & 1 & 2 & 4 & 8 & 15 & 29 & 56 & 108 & 208 & 401 \\ \hline 5 & 1 & 1 & 2 & 4 & 8 & 16 & 31 & 61 & 120 & 236 & 464 \\ \hline 6 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 63 & 125 & 248 & 492 \\ \hline 7 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 127 & 253 & 504 \\ \hline 8 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 255 & 509 \\ \hline 9 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 511 \\ \hline 10 & 1 & 1 & 2 & 4 & 8 & 16 & 32 & 64 & 128 & 256 & 512 \end{array} \]

Looking along and below the diagonal, for instance, we see that $V_{m,n} = 2^{n-1}$ if $1 \le n \le m$, and also $V_{n-1,n} = 2^{n-1} - 1$. These values make sense if we imagine $n$ beats in a poetic line, which may be grouped up into syllables of lengths $1$, $2$, …, $m$. If $m \ge n$, then any grouping is allowed, and so we just need to decide at which of the $n-1$ spaces between beats to break the syllables, yielding $2^{n-1}$ possibilities. Likewise, if $m = n-1$, then the only grouping that is not allowed is to put all the beats into a single syllable. (These values also follow from the recursive equation for $V_{m,n}$, using the formula for summing geometric series: $1 + 2 + \cdots + 2^{n-1} = 2^n - 1$.)

What happens if we disregard the order of terms in the sum? Or said another way, what if each syllable in a line of poetry must be no longer than the previous one (that is, since order doesn’t matter, we always sort the syllables from largest to smallest)? These correspond to sums in which the terms are non-increasing. Let us call the number of such sums equalling $n$ with no terms larger than $m$ the unordered Virahanka number $U_{m,n}$. For example, \[ \begin{array}{ll} 6 &= 1 + 1 + 1 + 1 + 1 + 1 \\ &= 2 + 2 + 2 = 2 + 2 + 1 + 1 = 2 + 1 + 1 + 1 + 1 \\ &= 3 + 3 = 3 + 2 + 1 = 3 + 1 + 1 + 1 \end{array} \] and so $U_{3,6} = 7$. Here is the table of unordered Virahanka numbers:

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c} m\backslash n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 & 5 & 6 \\ \hline 3 & 1 & 1 & 2 & 3 & 4 & 5 & 7 & 8 & 10 & 12 & 14 \\ \hline 4 & 1 & 1 & 2 & 3 & 5 & 6 & 9 & 11 & 15 & 18 & 23 \\ \hline 5 & 1 & 1 & 2 & 3 & 5 & 7 & 10 & 13 & 18 & 23 & 30 \\ \hline 6 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 14 & 20 & 26 & 35 \\ \hline 7 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 21 & 28 & 38 \\ \hline 8 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 29 & 40 \\ \hline 9 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 41 \\ \hline 10 & 1 & 1 & 2 & 3 & 5 & 7 & 11 & 15 & 22 & 30 & 42 \end{array} \]

Again the entries stabilize below the diagonal. The entries $U_{n,n}$ are generally called the number of partitions of $n$. Correspondingly, the remaining values of $U_{m,n}$ count the number of partitions of $n$ with no part greater than $m$, which is why $U_{m,n} = U_{n,n}$ if $m \ge n$. By considering partitions whose largest part is $k$, with $k = 1, \dots, m$, we get a recursive formula for these numbers, as well: \[ U_{m,n} = U_{1,n-1} + U_{2,n-2} + \cdots + U_{m,n-m}. \] Unlike the case of $V_{m,n}$, which can be calculated using only earlier terms in the $m$th (ordered) Virahanka sequence, determining $U_{m,n}$ recursively requires knowing entries from all previous rows.

There is a nice visualization that expresses the relationship between the unordered Virahanka numbers $U_{m,n}$ and the ordered Virahanka numbers $V_{m,n}$. A partition is determined by the number of times each integer appears. So we can think of $U_{m,n}$ as counting the number of integer points $(i_1,\dots,i_m)$ such that $i_k \ge 0$ for all $1\le k\le m$ and $\sum\nolimits_{k=1}^m ki_k = n$. This last condition defines a hyperplane orthogonal to $(1,2,\dots,m)$.

If $m = 2$, for each $n$ we count the number of integer points on the line $x + 2y = n$ in the first quadrant (including the origin and the positive $x$- and $y$- axes).

From this image, you can see the sequence $1, 1, 2, 2, 3, 3, \dots$, or $1+\lfloor n/2 \rfloor$, emerge.

In the case $m = 3$, $U_{3,n}$ counts the number of integer points in the positive octant such that $x + 2y + 3z = n$. The image below shows the 10 points corresponding to partitions of $n = 8$.

Now we need to convert partitions to compositions. A partition in which $k$ appears $i_k$ times corresponds to the $m$-tuple $(i_1,\dots,i_m)$. A composition with the same number of appearances of each $k$ can be obtained by placing the terms of the partition in any order; the number of distinguishable ways of arranging these terms is given by the multinomial coefficient \[ \binom{i_1+\cdots+i_m}{i_1,\dots,i_m} = \frac{(i_1+\cdots+i_m)!}{i_1!\cdots i_m!}\text. \] For example, the partition $8 = 3 + 2 + 2 + 1$ can be arranged into $\binom{1+2+1}{1,2,1} = \frac{4!}{1!2!1!} = 12$ different compositions, first by choosing the position of the $1$, then the positions of the two $2$s, and then placing $3$ in the remaining spot: \[ \begin{array}{rl} 8 &= 3 + 2 + 2 + 1 = 2 + 3 + 2 + 1 = 2 + 2 + 3 + 1 \\ &= 3 + 2 + 1 + 2 = 2 + 3 + 1 + 2 = 2 + 2 + 1 + 3 \\ &= 3 + 1 + 2 + 2 = 2 + 1 + 3 + 2 = 2 + 1 + 2 + 3 \\ &= 1 + 3 + 2 + 2 = 1 + 2 + 3 + 2 = 1 + 2 + 2 + 3 \end{array} \] Thus if we think of each point $(i_1,\dots,i_m)$ as being weighted by this corresponding multinomial coefficient, we can write $V_{m,n}$ as a sum having $U_{m,n}$ terms, as follows: \[ V_{m,n} = \sum_{(i_1,\dots,i_m):\sum ki_k = n} \binom{i_1+\cdots+i_m}{i_1,\dots,i_m}\text. \] When $m = 2$, we have $i_1 + 2i_2 = n$, so $i_1 + i_2 = n - i_2$ and $i_1 = n - 2i_2$, and by setting $j = i_2$ we get the following expression of $V_{2,n}$ (remember, these are the usual Fibonacci numbers) as a sum of binomial coefficients: \[ V_{2,n} = \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n - j}{n - 2j} = \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n - j}{j}\text. \] The binomial coefficients are, of course, the entries in what is commonly known as Pascal’s triangle. But this also was known to Indian scholars, possibly even Pingala: a later commentator, Halayudha (10th century CE), explicitly arranged the same numbers into a triangle, under the name of Meru Prastaara5, and clarified Pingala’s references to them in relation to poetic meter.

When $m = 3$, the formula is a little more elaborate, so let's first look at an example. We have seen that there are seven partitions of $6$ that have no part greater than $3$, corresponding to the triples $(i_1,i_2,i_3) = (6,0,0)$, $(4,1,0)$, $(2,2,0)$, $(0,3,0)$, $(3,0,1)$, $(1,1,1)$, and $(0,0,2)$. (Remember that these are the triples of positive integers such that $i_1 + 2i_2 + 3i_3 = 6$.) The resulting sum in terms of multinomial coefficients is \[ \begin{array}{rl} V_{3,6} &= \displaystyle\binom{6}{6,0,0} + \binom{5}{4,1,0} + \binom{4}{2,2,0} + \binom{3}{0,3,0} + \binom{4}{3,0,1} + \binom{3}{1,1,1} + \binom{2}{0,0,2} \\ &= 1 + 5 + 6 + 1 + 4 + 6 + 1 = 24 \end{array} \] Now for the general sum6: setting $j = i_2$ and $k = i_3$, we have \[ V_{3,n} = \sum_{k=0}^{\lfloor n/3 \rfloor} \sum_{j=0}^{\lfloor (n-3k)/2 \rfloor} \binom{(n-2j-3k)+j+k}{n-2j-3k,j,k} = \sum_{k=0}^{\lfloor n/3 \rfloor} \sum_{j=0}^{\lfloor (n-3k)/2 \rfloor} \binom{n-j-2k}{n-2j-3k,j,k}\text. \]

I’m sure there is much more depth to be explored regarding these numbers, probably all of it known to combinatorialists, but I have to pause here! I hope to post a part 2 soon.


Footnotes:

1. I first learned about a similar way of obtaining the Virahanka–Fibonacci sequence from the book Proofs that Really Count by Arthur Benjamin and Jennifer Quinn. In that text, the problem is to count the number of ways to tile an $n\times1$ grid using $1\times1$ and $2\times1$ pieces, called squares and dominoes. Then they use this visual presentation to prove several common identities involving terms of the sequence.
2. Some sources, including the OEIS, call numbers in this sequence “Tribonacci numbers.” Cute, but nonsensical, given that the “fi-” prefix refers to the “son” of Bonacci, not to any counting procedure. However, the OEIS does attribute the coining of “tribonacci” to Mark Feinberg in 1963, when he was a 9th grader, and that is honestly a high pedigree—how often does a 14-year-old get to invent a mathematical term that endures for over half a century?
3. Exactly this kind of generalization seems to have been considered by the 14th century mathematician Narayana Pandita. It might therefore be reasonable to name the resulting sequences “Virahanka–Narayana” sequences. See “The So-called Fibonacci Numbers in Ancient and Medieval India” by Parmanand Singh and “Nārāyana’s Generalisation of Mātrā-vrtta-prastāra and the Generalised Virahānka-Fibonacci Representation of Numbers” by Raja Sridharan, R Sridharan and M D Srinivas.
4. Hopefully I’ll write more about these patterns soon in a followup post.
5. According to P. Singh in “The So-called Fibonacci Numbers in Ancient and Medieval India”, “meru is the name of the imaginary mountain that is supposed to stand at the center of the earth.”
6. At least a version of this sum was known to Narayana Pandita. He derived the multinomial coefficients through something called a “needle sequence” or “arrow-head” sequence, which I have not yet fully understood. See the references in Footnote 3.

Friday, May 28, 2021

a geometric interpretation for the sum of a geometric series

Start with the geometric series: \[ \sum_{n=0}^\infty z^n = 1 + z + z^2 + z^3 + \cdots\text. \] If $z$ is not a real number, the partial sums of this series form a spiral pattern, which appears to be self-similar. Below is the pattern for $z = (1+i)/2$. (Click for interactive graph.)

A similarity of the plane can be expressed as a (complex) affine function $f(w) = aw + b$. To determine $a$ and $b$ for our spiral, we note that $f$ should satisfy \[ f(0) = 1 \qquad\text{and}\qquad f(1) = 1+z\text. \] The first equation implies $b = 1$. From the second equation, we then have $a + 1 = 1 + z$, so $a = z$. Our function is therefore $f(w) = zw + 1$, with $z$ treated as a constant.

Now we check whether this function $f(w)$ does in fact show that the spiral is self-similar. Indeed, $f(1+z) = z(1+z) + 1 = 1 + z + z^2$, and more generally, the sequence $1$, $f(1)$, $f(f(1))$, $f(f(f(1)))$, … coincides with the sequence of partial sums of the geometric series.

If $z \ne 1$, then the function $f(w)$ has a fixed point. Solve $f(w) = w$, or $zw + 1 = w$, to find that the fixed point is $w = 1/(1 - z)$. When $|z| < 1$, this fixed point is the sum of the series. But in general it has a nice geometric connection to the series, even when the series diverges: it is the center of similarity for the sequence of partial sums. For example, when $z = i+1/\sqrt3$, the partial sums and center look like this:

In particular, when $z$ is a root of unity, the partial sums of the series lie at the vertices of a regular polygon, and $1/(1-z)$ is the center of this polygon. (It is also the Cesàro sum of the series in that case.) Here are the polygons and centers for $z = e^{i\pi/4}$ and $z = e^{i\,3\pi/5}$:

The centers of these polygons all have real part $1/2$, which is a special case of the observation that the function $1/(1-z)$ sends the unit circle to the line $\mathrm{Re}\,z = 1/2$.

When $z = -1$, the polygon collapes to a line segment, and of course the center of this segment is $1/2$, the usual value assigned to the series $1 - 1 + 1 - 1 + \cdots$.

Saturday, March 14, 2020

initial thoughts on teaching in a time of crisis

For those reading in the future, this is the week that the reality of the COVID-19 (coronavirus) pandemic struck the U.S. Hundreds of schools announced that their campuses would be closing, and students would be expected to continue their studies from home, or wherever else they might find to stay. The president of our university and the dean of the college announced on Wednesday that face-to-face instruction would continue through Friday, students would have to leave campus by Sunday afternoon (unless they received an exception), and instruction would resume remotely the following Wednesday. That leaves Monday and Tuesday for redesigning courses and implementing them in a new format. Some instructors managed to get an earlier start before the week ended, but my mind and time were occupied with trying to wrap up well with students in person, and also home life obligations (I have a two-year-old daughter, and my wife is a graduate student with a full-time job), so I’m really just beginning to collect my thoughts.

I am entirely in the camp of those who describe this new mode of teaching as “remote” rather than ”online” learning (and I am grateful that is the language our administration has chosen to use). I do not plan to create an online class, and I do not have any pretensions that I could make a good one in the time available. Even in ordinary circumstances I forget to update course pages on our LMS half the time. I’m focused on what’s going to happen during the class meetings. I’m probably going to have to record a few lectures, but I’ll also look for other video resources that already exist, because me making a video will be either entirely off-the-cuff or hours of planning and scripting. I’m focused on the “remote” part, as a substitute for focusing on class meetings. Students are going to be studying on their own, in a wide variety of settings and living situations. The global upheaval and concomitant personal stresses make it likely that calculus or abstract algebra is not the primary concern in their lives. I want to give them the best chance to learn despite those conditions. The “online” element is present strictly to mitigate the isolation of self-study. Thus I plan to use online elements in ways that will maintain our community and help students feel like their study is meaningful. We’ll have online discussions, to the extent possible given the geographic diversity, and I plan to formulate new projects that will allow students to implement new knowledge in ways that directly affect their understanding of the world. Ideally, these projects themselves will enable assessment of the relevant skills, and I’ll be able to rely less on traditional test formats.

I am not in either camp regarding whether this switch to remote teaching is beneficial or detrimental. In fact, I would describe myself as wholeheartedly ambivalent on that matter. On the one hand, nothing about this situation is normal. Literally the entire world is focused on containing a disease that could kill tens or hundreds of millions of us. Many social institutions (schools, religious groups, local government) have shuttered their brick-and-mortar locations. Friends are instantly made distant, and any challenges to family life are made proximate. On the other hand, I believe my students truly value education. Continuing classes means maintaining some measure of normalcy. It is an element of life where one has some control, unlike almost everything else around us. And I have seen learning of complex mathematics happen in truly extraordinary circumstances, such as during the time that I taught in Peace Corps. (I do not support the argument, made by some in our community, that we can manage this “because we’ve done it before.” A year and a half ago, our college switched to remote instruction for three weeks due to wildfires; that was a more traumatic time, though a briefer one. This represents a fundamental shift in how we complete our courses, not just maintain progress in the short term, and its ramifications for higher education and our university in particular should not be minimized. The fact that, at the administrative level, so many decisions seem to be driven by the threat of future problems with accreditation exceeds my ability to worry.) For now, it is my job, and it is one way that I can contribute to bettering the world.

I am mindful of those for whom this time presents even greater challenges: people who already suffer from loneliness and isolation; people without homes, for whom the loss of public spaces and services will fray an already thin network of support; people in prison, who are neglected in the best of times, despite being in custody of the state; people in immigrant detention centers, who are constantly treated shamefully and forced to live in appalling conditions. My job and my position do not exonerate me from doing what I can to aid them, as well.

Monday, July 15, 2019

cardioid, deltoid, folium

The cardioid and the deltoid are two of my favorite curves. They arise in similar ways: one is an epicycloid, and the other is a hypocycloid. In a sense, each is the simplest non-trivial example of their respective type. They make excellent examples for calculus problems. But as I learned this week, they are actually the same curve.

This post is about the claim made in italics in the previous paragraph. Obviously I don’t mean that the classical constructions mentioned above (and described below) produce the same curves in the Euclidean plane. Rather, they are the same from the perspective of complex projective geometry. When I searched for this fact on Google after uncovering it for myself, I only found one mention of it, in a textbook from 1923 entitled An Introduction to Projective Geometry. I assume it was well-known at the time, and today is probably known to certain algebraic geometers, but it seems worth explicating for a larger audience.

First, the curves. Epicycloids and hypocycloids are both examples of roulettes, curves traced out by a point marked on one curve, which is free to move, as it rolls along another curve, which is fixed, without slipping. To generate an epicycloid or hypocycloid, both the fixed curve and the moving curve are circles; the difference is that for an epicycloid, the rolling circle is outside the fixed circle, and for a hypocycloid the rolling circle is on the inside. The shape of the epicycloid or hypocycloid is determined by the ratio of the circles’ radii. For an epicycloid, we can choose a 1:1 ratio, which means the marked point on the rolling circle makes contact with the fixed circle once as the outer circle completes a circuit. A hypocycloid cannot be constructed from circles whose radii have a 1:1 ratio, and a 2:1 ratio simply produces a line segment, so the simplest hypocycloid arises from a 3:1 ratio. The construction of these simplest examples is illustrated below. (These animations were created using a Desmos graph with the help of GIFsmos.) The first is called the cardioid (“heart-like”) and the second is the deltoid (“triangle-like”).

In both cases, the rolling circle is given a radius of 1, and in both cases the centers of the two circles remain at a distance of 2. By watching carefully, one can see that in both cases the marked point makes two revolutions around the center of the rolling circle. For the cardioid, these revolutions are counterclockwise, and so the cardioid can be parameterized by \[ (2\cos\theta + \cos2\theta, 2\sin\theta + \sin2\theta)\text. \] In the case of the deltoid, the marked point’s revolutions are made clockwise, and so the deltoid can be parameterized by \[ (2\cos\theta + \cos2\theta, 2\sin\theta - \sin2\theta)\text. \] These formulas are very similar, but certainly not the same, and the pictures they produce are quite different. So how can I claim that the curves are the same?

Our first step toward understanding the claim involves switching to complex numbers. If we collect the \(x\)- and \(y\)-coordinates of the plane \(\mathbb{R}^2\) into a single complex coordinate, then the parameterizations above become

\(2e^{i\theta} + e^{2i\theta} \qquad\) and \(\qquad 2e^{i\theta} + e^{-2i\theta}\).
Now we want to extend to the complex plane \(\mathbb{C}^2\) (note: I think of \(\mathbb{C}\) as the complex line because it is one-dimensional as a complex vector space). A standard trick is to add a second coordinate that is conjugate to the first, which makes the parameterizations
\(\big(2e^{i\theta} + e^{2i\theta},2e^{-i\theta} + e^{-2i\theta}\big) \qquad\) and \(\qquad \big(2e^{i\theta} + e^{-2i\theta},2e^{-i\theta} + e^{2i\theta}\big)\).
Now let’s set \(t = e^{i\theta}\) and allow \(t\) to take on all complex values (except \(0\), but we’ll take care of that later) instead of just values on the unit circle. At the same time, let’s label the parameterizations \(\gamma_C\) and \(\gamma_D\), with \(C\) standing for cardioid and \(D\) for deltoid. This gives us
\(\gamma_C(t) = \left(2t + t^2,\dfrac{2}{t} + \dfrac{1}{t^2}\right) \qquad\) and \(\qquad \gamma_D(t) = \left(2t + \dfrac{1}{t^2},\dfrac{2}{t} + t^2\right)\).
We still can see superficial similarities in these formulas, but not enough to conclude that they define equivalent curves. In order to see their equivalence, we need to see what’s happening at infinity, which means introducing some projective geometry.

The complex projective line \(\mathbb{P}^1\), also known as the Riemann sphere, is obtained by adding a single point, labeled \(\infty\), to the ordinary complex line \(\mathbb{C}\). The points of \(\mathbb{P}^1\) may be thought of as the “slopes” of lines through the origin in \(\mathbb{C}^2\). Indeed, it is often useful to assign coordinates to \(\mathbb{P}^1\) using non-zero vectors \((s,t)\) in \(\mathbb{C}^2\), where two vectors correspond to the same point of \(\mathbb{P}^1\) if they are scalar multiples of each other, \((s,t)\sim(\lambda s,\lambda t)\) if \(\lambda\in\mathbb{C}\setminus\{0\}\). We write the equivalence class of \((s,t)\) as \([s:t]\); these are called homogeneous coordinates on \(\mathbb{P}^1\). We can recover \(\mathbb{P}^1\) as \(\mathbb{C}\cup\{\infty\}\) by sending \([s:t]\) to the slope \(t/s\) if \(s \ne 0\); then \([0:1]\) is sent to \(\infty\).

In a similar way, we can extend \(\mathbb{C}^2\) to the complex projective plane \(\mathbb{P}^2\) by adding points at infinity, and the most convenient way to do so is by homogenous coordinates. We start with non-zero vectors \((u,v,w)\) in \(\mathbb{C}^3\) and consider \((\lambda u, \lambda v, \lambda w)\) to define the same point of \(\mathbb{P}^2\) as \((u,v,w)\) if \(\lambda\in\mathbb{C}\setminus\{0\}\). Then \([u:v:w]\) are homogeneous coordinates on \(\mathbb{P}^2\). The points with \(u\ne0\) correspond to points of the original complex plane \(\mathbb{C}^2\), by sending \([u:v:w]\) to \((v/u,w/u)\). The points with \(u=0\) constitute the new line at infinity, which is just a copy of \(\mathbb{P}^1\) with coordinates \([0:v:w]\).

Now we can extend the cardioid and the deltoid to curves in \(\mathbb{P}^2\), not just \(\mathbb{C}^2\). We start with the parameterizations \(\gamma_C\) and \(\gamma_D\), append an initial coordinate of 1, then clear denominators (we can do this because of the equivalence that defines homogeneous coordinates). Then we get

\(\gamma_C(t) = \big[t^2:2t^3 + t^4:2t + 1\big] \qquad\) and \(\qquad \gamma_D(t) = \big[t^2:2t^3 + 1:2t + t^4\big]\).
These allow for the possibility of \(t = 0\), but apparently leave out the point at infinity \(\infty\), so we make one more modification, replacing \(t\) with \(t/s\) and again clearing denominators to obtain
\(\gamma_C([s:t]) = \big[s^2 t^2:2st^3 + t^4:2s^3t + s^4\big] \qquad\) and
\(\qquad \gamma_D([s:t]) = \big[s^2t^2:2st^3 + s^4:2s^3t + t^4\big]\).
Here we see a feature characteristic of maps from one projective space to another, when homogeneous coordinates are used: each component of the map must be homogeneous of the same degree (in this case, four). By expressing the parameterizations of the cardioid and the deltoid in this way, we see that both curves touch the line at infinity at the two points \([0:1:0]\) and \([0:0:1]\), corresponding to \([0:1]\) and \([1:0]\), respectively, for the cardioid, and in the reverse order for the deltoid. Still this isn’t enough to show that the curves are the same! We need one more ingredient.

A projective transformation of \(\mathbb{P}^1\) or \(\mathbb{P}^2\) is induced by a linear transformation of the homogeneous coordinates. Readers who are already familiar with the Riemann sphere will recognize projective transformations of \(\mathbb{P}^1\) as Möbius transformations (also known as fractional linear transformations): given \(a,b,c,d\in\mathbb{C}\), we can convert \([s:t] \mapsto [as+bt:cs+dt]\) to a Möbius transformation in the coordinate \(z = s/t\), where it becomes \(z \mapsto \dfrac{az+b}{cz+d}\). The condition for this function to be invertible is \(ad - bc \ne 0\), which is the same as the condition for the matrix \(\begin{bmatrix}a & b \\ c & d\end{bmatrix}\) to be invertible. In the same way, projective transformations of \(\mathbb{P}^2\) arise from invertible linear transformations of \(\mathbb{C}^2\). Two objects in \(\mathbb{P}^1\) or \(\mathbb{P}^2\) are called projectively equivalent if there is a projective transformation that carries one to the other. And now we can state precisely what was meant in the opening paragraph:

The cardioid and the deltoid are projectively equivalent in \(\mathbb{P}^2\).

But how do we find the projective equivalence? A clue may be found in one clear difference between the original curves drawn in the Euclidean plane, which niggled at me while I was trying to figure out their relationship. The deltoid clearly has three cusps, while the cardioid apparently only has one. If the curves are equivalent, where are the other cusps of the cardioid? The answer: on the line at infinity!

How can we tell? It’s time to apply some differential geometry and look at the tangent lines of these two curves. Returning to the parameterizations in terms of \(t\), we find

\(\gamma_C'(t) = \left(2 + 2t,-\dfrac{2}{t^2} - \dfrac{2}{t^3}\right) \qquad\) and \(\qquad \gamma_D'(t) = \left(2 - \dfrac{2}{t^3},-\dfrac{2}{t^2} + 2t\right)\).
Now a line in \(\mathbb{C}^2\), with coordinates \((v,w)\), passing through \((a,b)\) in the direction \((s,t)\) has the equation \(\begin{vmatrix} s & v - a \\ t & w - b \end{vmatrix} = 0\). Thus the tangent line to the cardioid at \(\gamma_C(t)\) has the equation \[ \begin{vmatrix} 2 + 2t & v - \big(2t + t^2\big) \\ -\frac{2}{t^2} - \frac{2}{t^3} & w - \big(\frac{2}{t} + \frac{1}{t^2}\big) \end{vmatrix} = 0 \] which, after some simplification, becomes \[ wt^3 - 3t^2 - 3t + v = 0\text{.} \] This is the line equation of the cardioid. In a similar fashion, we can find the line equation of the deltoid, which is \[ t^3 - vt^2 + wt - 1 = 0\text{.} \]

Having the line equation of a curve, in terms of a parameter \(t\), can be useful in several ways. As \(t\) varies over \(\mathbb{P}^1\), it produces all the tangent lines of the curve. (We’ll clarify what happens when \(t = \infty\) in a moment.) But we can also let \((v,w)\) vary over \(\mathbb{C}^2\) and find, for each point, which tangent lines of the curve pass through that point. Because the line equations of the cardioid and the deltoid are cubic polynomials in \(t\), most points of \(\mathbb{C}^2\) will lie on three tangent lines. Those points that lie on fewer than three tangent lines play a special role.

Let’s illustrate first with the deltoid. We’ll be looking at lots of cube roots, so let \(\omega = e^{i\,2\pi/3}\); this means that \(\omega^3 = 1\) and \(1 + \omega + \omega^2 = 0\). When \((v,w)=(0,0)\), the line equation becomes \(t^3 - 1 = 0\), so the tangent lines of the deltoid that pass through the origin correspond to the parameters \(1\), \(\omega\), and \(\omega^2\). Indeed, the three points \(\gamma_D(0) = (3,3)\), \(\gamma_D(\omega) = (3\omega,3\omega^2)\), and \(\gamma_D(\omega^2) = (3\omega^2,3\omega)\) are the three cusps of the deltoid. On the other hand, a point that belongs to the deltoid lies on tangent lines corresponding to at most two parameters (two of the points of tangency have “coalesced”). For example, when \((v,w)=(-1,-1)\), the line equation becomes \(t^3 + t^2 - t - 1 = 0\), or \((t+1)^2(t-1) = 0\). At a cusp, all three tangent lines coincide: for example, when \((v,w)=(3,3)\), the line equation is \(3t^3 - 3t^2 + 3t - 3 = 3(t-1)^3 = 0\). See the pictures below.

We can homogenize the line equation of the deltoid by replacing \(t\) with \(t/s\) and \((v,w)\) with \((v/u,w/u)\) and clearing denominators to obtain: \[ ut^3 - vst^2 + ws^2t - us^3 = 0\text. \] When \([s:t] = [1:0]\) or \([0:1]\) (remember, this second point in homogeneous coordinates corresponds to \(t=\infty\)), we get the same equation of the tangent line, \(u = 0\). This is the equation of the line at infinity, so the line at infinity is tangent to the deltoid at both \([0:0:1]\) and \([0:1:0]\)! A line that is tangent to a curve at two points is called a bitangent.

The cardioid also has a bitangent, which is easier to see: when \(t = \omega\) or \(t = \omega^2\), respectively, the line equation of the cardioid becomes \(w - 3\omega^2 - 3\omega + v = 0\) or \(w - 3\omega - 3\omega^2 + v = 0\), both of which are equivalent to \(v + w = 3\). The visible cusp occurs at \((-1,-1)\), where the line equation becomes \((t + 1)^3 = 0\). For an example of more generic behavior, look at \((-3,-3)\), where the line equation becomes \(3t^3 + 3t^2 + 3t + 3 = 0\), or \(3(t + 1)(t + i)(t - i) = 0\). See pictures below.

The homogeneous version of the cardioid’s line equation is \[ wt^3 - 3ust^2 - 3us^2t + vs^3 = 0\text. \] When \([s:t] = [1:0]\), this becomes the \(w\)-axis \(v = 0\), and when \([s:t] = [0:1]\), we get \(w = 0\). In each of these cases, we see that only one tangent line passes through the point, just as we saw for the cusps of the deltoid. So we have identified the three cusps of the cardioid—\([1:-1:-1]\), \([0:0:1]\), and \([0:1:0]\). The tangent lines through all three of these cusps pass through the origin in \(\mathbb{C}^2\), with homogeneous coordinates \([1:0:0]\).

We now have enough information to show the equivalence of the cardioid and the deltoid. To define a projective transformation from \(\mathbb{P}^1\) to itself, we need to specify where three points go; to define a projective transformation from \(\mathbb{P}^2\) to itself, we need to specify the images of four points, no three of which are collinear. We’ll show how to transform the line equation of the deltoid into the line equation of the cardioid via pullback.

We’re looking for projective transformations \(f : \mathbb{P}^1 \to \mathbb{P}^1\) and \(g : \mathbb{P}^2 \to \mathbb{P}^2\) such that \(\gamma_D \circ f = g \circ \gamma_C\). Starting with \(f\), we require

\(f\big([1:0]\big) = [1:\omega]\),   \(f\big([0:1]\big) = [\omega:1]\),   and   \(f\big([1:-1]\big) = [1:1]\),
so that the parameters of the cardioid’s cusps are sent to those of the deltoid’s cusps. This can be accomplished by defining \[ f\big([s:t]\big) = [s - \omega t : \omega s - t]\text. \] Meanwhile, \(g\) needs to satisfy
\(g\big([0:0:1]\big) = [1:3\omega:3\omega^2]\),  \(g\big([0:1:0]\big) = [1:3\omega^2:3\omega]\),
\(g\big([1:-1:-1]\big) = [1:3:3]\),   and   \(g\big([1:0:0]\big) = [1:0:0]\),
which is accomplished by \[ g\big([u:v:w]\big) = [ 3u+v+w : 3\omega^2 v + 3\omega w : 3\omega v + 3\omega^2 w ]\text. \] Now substitute the components of \(f\) and \(g\) into the variables of the deltoid’s line equation, expand, and simplify. The result is the line equation of the cardioid. You can calculate this by hand, or just let SageMath do it for you:

One of the curves mentioned in the title of this post has been conspicuously absent so far: the folium of Descartes. This is another favorite curve of mine, invariably given in my calculus classes as an exercise in implicit differentiation. Its equation is \(x^3 + y^3 = xy\).

So what’s the connection between this curve and the others? Well, if we extract the coefficients from the deltoid’s line equation and use them to define a new curve \(\gamma_F\), we get \[ \gamma_F\big([s:t]\big) = [ s^3 - t^3 : st^2 : -s^2 t ]\text, \] which parameterizes \[ v^3 + w^3 = uvw\text, \] the homogeneous version of the folium’s equation. This means that the folium is dual to the deltoid (and thus also to the cardioid)! The tangent lines of the cardioid/deltoid have been converted into points of the folium, and likewise points of the cardioid/deltoid become tangent lines of the folium. Just as each point of \(\mathbb{C}^2\) lies on three tangent lines of the cardioid/deltoid, counted with multiplicity, each line of \(\mathbb{C}^2\) intersects the folium at three points, counted with multiplicity. The bitangent of the deltoid and cardioid has been converted into a point of self-intersection. If we look at points of the form \([1:v:\bar{v}]\), then the threefold symmetry of the folium is revealed (the three asymptotic directions correspond to the three tangent lines that pass through the origin, which as we saw are the tangent lines at the cusps).

Saturday, December 29, 2018

an IBL preface

In just over a week, I will distribute to students the first piece of the complex variables notes I have been writing. Here is a preface to be included with the notes, to motivate the IBL structure. The details of the class will be spelled out in the syllabus; this is just to set the tone.

You are the creators. These notes are a guide.

The notes will not show you how to solve all the problems that are presented, but they should enable you to find solutions, on your own and working together. They will also provide historical and cultural background about the context in which some of these ideas were conceived and developed. You will see that the material you are about to study did not come together fully formed at a single moment in history. It was composed gradually over the course of centuries, with various mathematicians building on the work of others, improving the subject while increasing its breadth and depth.

Mathematics is essentially a human endeavor. Whatever you may believe about the true nature of mathematics—does it exist eternally in a transcendent Platonic realm, or is it contingent upon our shared human consciousness? is math “invented” or “discovered”?—our experience of mathematics is temporal, personal, and communal. Like music, mathematics that is encountered only on as symbols on a page remains inert. Like music, mathematics must be created in the moment, and it takes time and practice to master each piece. The creation of mathematics takes place in writing, in conversations, in explanations, and most profoundly in the mental construction of its edifices on the basis of reason and observation.

To continue the musical analogy, you might think of these notes like a performer’s score. Much is included to direct you towards particular ideas, but much is missing that can only be supplied by you: participation in the creative process that will make those ideas come alive. Moreover, the success of the class will depend on the pursuit of both individual excellence and collective achievement. Like a musician in an orchestra, you should bring your best work and be prepared to blend it with others’ contributions.

In any act of creation, there must be room for experimentation, and thus allowance for mistakes, even failure. A key goal of our community is that we support each other—sharpening each other’s thinking but also bolstering each other's confidence—so that we can make failure a productive experience. Mistakes are inevitable, and they should not be an obstacle to further progress. It’s normal to struggle and be confused as you work through new material. Accepting that means you can keep working even while feeling stuck, until you overcome and reach even greater accomplishments.

These notes are a guide. You are the creators.