Sunday, June 25, 2017

addition of fractions is bilinear

The prototypical example of a bilinear operation is multiplication, \(b(x,y)=xy\). For a function of two variables, bilinear means “linear in each variable.” That means multiplication has the following properties: \[ b(x_1+x_2,y) = b(x_1,y) + b(x_2,y) \] \[ b(x,y_1+y_2) = b(x,y_1) + b(x,y_2) \] \[ b(cx,y) = b(x,cy) = cb(x,y) \] Most operations that are called some kind of “product” (e.g., inner product, cross product, wedge product) get this name, at least in part, because they have the above properties.

Addition is also a function of two variables, but it is not linear in either variable; all of the above axioms fail. So how could it be bilinear? Read on!
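(For the skeptical: here is a throwaway numerical check, written in Python purely for illustration and not part of the original argument, that the three identities hold for multiplication and that the very first one already fails for addition.)

```python
# A quick numeric sanity check: multiplication satisfies the bilinearity
# identities, addition does not. The function name is my own invention.
def looks_bilinear(b, x1=2, x2=3, y1=5, y2=7, c=11):
    return (
        b(x1 + x2, y1) == b(x1, y1) + b(x2, y1)              # additive in the first slot
        and b(x1, y1 + y2) == b(x1, y1) + b(x1, y2)          # additive in the second slot
        and b(c * x1, y1) == c * b(x1, y1) == b(x1, c * y1)  # homogeneous in each slot
    )

print(looks_bilinear(lambda x, y: x * y))  # True
print(looks_bilinear(lambda x, y: x + y))  # False
```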

Fractions are curious things, because they are pairs of things: a fraction has a numerator (which numerates, meaning it counts some number of things) and a denominator (which denominates, meaning it names the type of thing being counted). And we assume that the numerator and denominator are themselves the same type of thing (whether integers, polynomials, or what have you), so that we can make sense of the equality of fractions: we say \(p/q\) and \(r/s\) are equal if \(qr = ps\).

(I have heard that my grandfather-in-law, who was also a college math professor, claimed it was self-evident that college students should be confused by fractions, in rebuttal to those who would say that fractions are trivial once one has reached the level of college mathematics; \(2/4\) and \(1/2\) are either not clearly the same, or clearly not the same.)

Leaving aside the equality of fractions for a moment, addition of fractions is also a curious thing. The most obvious operation is to add the numerators and denominators separately. This is sometimes called Farey addition, which is a rich topic that truly deserves its own treatment. Let’s use \(\oplus\) to write this kind of addition: \[ \frac{p}{q} \oplus \frac{r}{s} = \frac{p + r}{q + s}. \] That’s not how addition of fractions is usually defined, however. Instead, we produce a new denominator which is a product of the old denominators, and a new numerator which takes into account both the old numerators and the old denominators in a product-y sort of way: \[ \frac{p}{q} + \frac{r}{s} = \frac{ps + qr}{qs} \] (Indeed, you might recognize the numerator as the inner product of \((p,q)\) and \((s,r)\), or of \((q,p)\) and \((r,s)\). In either case, one of the pairs has its numerator and denominator switched.) Perhaps now you see where the title of the post comes from: addition of fractions is bilinear when we treat the fractions as pairs (Farey addition is, after all, essentially vector addition): \[ \left(\frac{p_1}{q_1}\oplus\frac{p_2}{q_2}\right) + \frac{r}{s} = \left(\frac{p_1}{q_1} + \frac{r}{s}\right) \oplus \left(\frac{p_2}{q_2} + \frac{r}{s}\right) \] \[ \frac{p}{q} + \left(\frac{r_1}{s_1}\oplus\frac{r_2}{s_2}\right) = \left(\frac{p}{q} + \frac{r_1}{s_1}\right) \oplus \left(\frac{p}{q} + \frac{r_2}{s_2}\right) \] \[ \frac{cp}{cq} + \frac{r}{s} = \frac{p}{q} + \frac{cr}{cs} = \frac{c(ps + qr)}{c(qs)} \] The last property above is what makes addition of fractions well-defined, meaning that if we replace one fraction in the sum with an equivalent fraction, then the resulting sum is equivalent to the original sum.
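To make the “pairs” point of view concrete, here is a small sketch (in Python, with function names of my own choosing) that stores a fraction as a raw (numerator, denominator) pair, implements both \(\oplus\) and ordinary addition, and checks one instance of each of the three identities above; the last check confirms well-definedness up to equivalence of fractions.

```python
# Fractions as raw (numerator, denominator) pairs: Farey addition, ordinary
# addition, and equivalence by cross-multiplication. My own sketch.
def farey(f, g):        # "oplus": add numerators and denominators separately
    return (f[0] + g[0], f[1] + g[1])

def add(f, g):          # ordinary fraction addition, without reducing
    (p, q), (r, s) = f, g
    return (p * s + q * r, q * s)

def equivalent(f, g):   # p/q and r/s are equal as fractions iff p*s == q*r
    return f[0] * g[1] == f[1] * g[0]

f1, f2, g, c = (1, 2), (2, 5), (3, 4), 3
print(add(farey(f1, f2), g) == farey(add(f1, g), add(f2, g)))            # True, exactly
print(add(f1, farey(f2, g)) == farey(add(f1, f2), add(f1, g)))           # True, exactly
print(add((c * f1[0], c * f1[1]), g) == add(f1, (c * g[0], c * g[1])))   # True, exactly
print(equivalent(add((c * f1[0], c * f1[1]), g), add(f1, g)))            # True: well-defined
```

Notice that the first three checks hold as exact equalities of pairs, not just up to equivalence; only the well-definedness statement needs the notion of equivalent fractions.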

This is all very cute and would simply be a curiosity, but I’m writing about it because this property of bilinearity recently helped me understand something about continued fractions.

A finite continued fraction is an expression of the form \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{\ddots + \cfrac{b_{n-1}}{a_{n-1} + \cfrac{b_n}{a_n}}}}} \] (In a simple continued fraction we assume that all of the \(b_i\) equal \(1\), but for my purposes here there’s no reason to make that assumption.) The operations are nested, so to find the value of this expression, we start from the bottom. For example, \[ 1 + \cfrac{2}{3 + \cfrac{4}{5 + \cfrac{6}{7}}} = 1 + \cfrac{2}{3 + \cfrac{4}{\cfrac{41}{7}}} = 1 + \cfrac{2}{3 + \cfrac{28}{41}} = 1 + \cfrac{2}{\cfrac{151}{41}} = 1 + \dfrac{82}{151} = \dfrac{233}{151} \] An infinite continued fraction has the same form as a finite continued fraction, except that the nesting doesn’t stop: \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} \] To make sense of this expression, we need—as is typical in such cases—to consider a sequence of finite continued fractions that should “approximate” this infinite continued fraction. If we truncate the infinite continued fraction after \(a_n\), then we get its \(n\)th convergent, which we write as \(p_n/q_n\). If the sequence of convergents converges, well, then, its limit is the value of the infinite continued fraction. (The Mathologer has an excellent video on infinite continued fractions, which includes an example of an apparent paradox that can arise from an infinite continued fraction that doesn’t converge.)
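If you would like to play with this, here is a minimal bottom-up evaluator (my own sketch in Python, using the standard fractions module), which reproduces the worked example above.

```python
from fractions import Fraction

# Evaluate a_0 + b_1/(a_1 + b_2/(... + b_n/a_n)) from the bottom up.
# A sketch of my own; a = [a_0, ..., a_n] and b = [b_1, ..., b_n].
def evaluate(a, b):
    value = Fraction(a[-1])
    for a_i, b_i in zip(reversed(a[:-1]), reversed(b)):
        value = a_i + Fraction(b_i) / value
    return value

print(evaluate([1, 3, 5, 7], [2, 4, 6]))  # 233/151, as computed above
```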

The definition is nice as far as it goes, but wouldn’t it be even nicer to know how successive convergents are related? That is, how can we get \(p_{n+1}/q_{n+1}\) from earlier convergents? Fortunately, there are simple recurrence relations due to Euler and Wallis: \[ p_{n+1} = a_{n+1} p_n + b_{n+1} p_{n-1}, \qquad q_{n+1} = a_{n+1} q_n + b_{n+1} q_{n-1}. \] It was in trying to understand these formulas that I got stuck. Finite continued fractions, as explained above, are computed from the bottom up. But when we move from the \(n\)th convergent to the \((n+1)\)st convergent of an infinite continued fraction, we add a new term to the bottom. That is, we change the start of the computation, not the end of it. That throws off my instincts for how a recurrence should work.
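To have something concrete to test against, here is the recurrence in code (my own sketch, seeded with \(p_{-1} = 1\), \(q_{-1} = 0\), \(p_0 = a_0\), \(q_0 = 1\), the same conventions I arrive at below), run on the example continued fraction; the last pair matches the bottom-up computation.

```python
# The Euler–Wallis recurrences. A sketch of my own, not from any particular source.
def convergents(a, b):
    """a = [a_0, a_1, ...], b = [b_1, b_2, ...]; yields (p_n, q_n) for n = 0, 1, ..."""
    p_prev, q_prev = 1, 0   # p_{-1}, q_{-1}
    p, q = a[0], 1          # p_0, q_0
    yield p, q
    for a_n, b_n in zip(a[1:], b):
        p, p_prev = a_n * p + b_n * p_prev, p
        q, q_prev = a_n * q + b_n * q_prev, q
        yield p, q

print(list(convergents([1, 3, 5, 7], [2, 4, 6])))
# [(1, 1), (5, 3), (29, 19), (233, 151)] -- the last pair is 233/151, as above
```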

It’s easy to find proofs of the Euler–Wallis recurrence relations. Most of them use induction, which is fine. But many also implicitly use the fact that continued fractions “can be” defined with real numbers for \(a_i\) and \(b_i\), not just integers. I never specified integers in the definitions above, either, but in many cases one wants to make that restriction, including the purpose for which I’ve been studying continued fractions. I felt it should be possible to prove such a fundamental relationship without leaving the natural “context” of whatever type of numbers was allowed to begin with.

Here is where the bilinearity of fraction addition came to my aid. One of the nice things about bilinear operations is that, if you fix one input, the result is just an ordinary linear function with respect to the other input, which can be represented by a matrix. If we fix \(p/q\), say, and add to it a variable fraction \(x/y\), and represent the fractions as vectors (since we’re thinking of them as pairs anyway), then the calculation of the sum \(p/q+x/y\) becomes \[ \begin{pmatrix} py + qx \\ qy \end{pmatrix} = \begin{pmatrix} q & p \\ 0 & q \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] When computing continued fractions, we repeatedly start with a fraction, invert it, multiply the (new) numerator by some quantity, and add another value. All of these have representations as linear transformations: the reciprocal of \(x/y\) is given by \[ \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] Multiplying the numerator of a fraction \(x/y\) by a constant \(c\) is given by \[ \begin{pmatrix} cx \\ y \end{pmatrix} = \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] Putting together the above linear transformations, we find that if \(a\) and \(b\) are fixed, then the matrix form of the transformation that sends \(x/y\) to \(a+b/(x/y)\) is \[ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] We can find \(p_n/q_n\), then, by starting with \(a_n/1\) and applying the above procedure repeatedly: \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & b_1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & b_2 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & b_n \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_n \\ 1 \end{pmatrix}. \] However, something is aesthetically “off” with this formula: the \(a\) term and the \(b\) term in each matrix on the right have different indices.
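As a quick numerical check on this first matrix formula (again a sketch of my own, with hand-rolled \(2\times 2\) products so nothing outside the standard library is needed), the example continued fraction from earlier comes out to \(233/151\) again:

```python
# Check the matrix product formula for p_n/q_n on the running example.
# Matrices are plain tuples of tuples; my own sketch.
def matmul(M, N):
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

a, b = [1, 3, 5, 7], [2, 4, 6]
M = ((1, 0), (0, 1))                       # start from the identity
for a_i, b_next in zip(a[:-1], b):         # each matrix pairs a_i with b_{i+1}
    M = matmul(M, ((a_i, b_next), (1, 0)))

x, y = a[-1], 1                            # the starting fraction a_n / 1
print((M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y))   # (233, 151)
```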

Let's try rearranging our steps so that like indices get grouped together: first invert the fraction, then add \(a\), then multiply the denominator of the result by \(b\) (which works out correctly, because in the next round the first thing we’ll do is invert again). This sequence of steps produces the formula \[ \begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & 1 \\ b & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] To make this work in computing the \(n\)th convergent of a continued fraction, we need to start with \(a_n/b_n\) (already promising), and we need to set \(b_0 = 1\) so that when we divide by it, we don’t change the final value (we were missing a \(b_0\) anyway, which was also somewhat unsatisfying). Therefore, to find \(p_n/q_n\) by repeated linear transformations, we have \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \end{pmatrix} \] We can improve this formula a bit more by thinking about where \(a_n/b_n\) came from. If we followed the same steps as before, then we must have inverted something, added it to \(a_n\), and multiplied the denominator of the result by \(b_n\). The thing we must have added to \(a_n\) is \(0 = 0/1\), which came from “inverting” \(1/0\). Thus, if we set \(p_{-1} = 1\) and \(q_{-1} = 0\), then the following equation holds for all \(n \ge 0\): \[ \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \] Whenever you end up with a formula this beautiful, you know you must have done something right.
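Here is the same kind of numerical check for the two-column identity (my own sketch, with \(b_0 = 1\) prepended); the columns of the product are the last two convergents of the running example:

```python
# Check the two-column matrix identity on the running example; my own sketch.
def matmul(M, N):
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

a = [1, 3, 5, 7]
b = [1, 2, 4, 6]                            # b_0 = 1 prepended to [b_1, b_2, b_3]
M = ((1, 0), (0, 1))
for a_i, b_i in zip(a, b):
    M = matmul(M, ((a_i, 1), (b_i, 0)))
print(M)   # ((233, 29), (151, 19)): columns are (p_3, q_3) and (p_2, q_2)
```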

It was an encounter with this matrix representation in a set of number theory notes that sparked the thoughts that led to this post. That set of notes did not contain a proof of the matrix formula, however. A few other sources that I have since found do use this formulation and prove it, such as this set of notes, which also includes a similar discussion to mine.

Back to the question of proving the Euler–Wallis recurrence relations. Suppose we have computed the \(n\)th convergent of a continued fraction, and we wish to proceed to the \((n+1)\)st convergent. In our new formulation, that just means appending another matrix to the product: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} \]

If we group all but the last matrix on the right together, use the previous equation, and then carry out the final multiplication, we get: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} = \begin{pmatrix} a_{n+1} p_n + b_{n+1} p_{n-1} & p_n \\ a_{n+1} q_n + b_{n+1} q_{n-1} & q_n \end{pmatrix} \] Thus, the Euler–Wallis recurrence relations are a consequence of the fact that matrix multiplication is associative (and also the fact that addition of fractions is bilinear!).

As a bonus, the difference between successive convergents is now seen to be a consequence of the fact that the determinant is multiplicative: \[ \begin{vmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{vmatrix} = \left\vert \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \right\vert \] Each factor on the right has determinant \(-b_i\), and \(b_0 = 1\), so \[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1} b_1 b_2 \cdots b_n \] and hence \[ \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = (-1)^{n+1} \frac{b_1 b_2 \cdots b_n}{q_n q_{n-1}}. \] An infinite continued fraction can therefore be written as an alternating series, since the partial sums telescope to the convergents \(p_n/q_n\): \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} = a_0 + \sum_{n=1}^\infty (-1)^{n+1} \frac{b_1 \cdots b_n}{q_n q_{n-1}}. \] When all of the \(a_i\)s and \(b_i\)s are positive, the condition for this series to converge is essentially that the \(b_i\)s don’t grow too quickly with respect to the denominators \(q_n\). In particular, if \(b_i = 1\) for all \(i\) (as in the case of a simple continued fraction) and \(a_i \ge 1\) for all \(i\), then the \(q_n\)s grow exponentially fast (at least as fast as the Fibonacci numbers, since \(q_{n+1} \ge q_n + q_{n-1}\)), and so the series (and hence the infinite continued fraction) is guaranteed to converge!
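To watch the telescoping happen, here is one more sketch of my own: it builds the \(q_n\) from the recurrence and accumulates the alternating series for the running example, and each partial sum is exactly a convergent.

```python
from fractions import Fraction

# Partial sums of the alternating series are exactly the convergents p_n/q_n.
# My own sketch, using the running example.
a, b = [1, 3, 5, 7], [2, 4, 6]

qs = [0, 1]                                  # q_{-1}, q_0
for a_n, b_n in zip(a[1:], b):
    qs.append(a_n * qs[-1] + b_n * qs[-2])   # Euler–Wallis recurrence for q_n

total, b_product = Fraction(a[0]), 1
print(total)                                 # 1 = p_0/q_0
for n in range(1, len(a)):
    b_product *= b[n - 1]                    # b_1 b_2 ... b_n
    total += Fraction((-1) ** (n + 1) * b_product, qs[n + 1] * qs[n])
    print(total)                             # 5/3, 29/19, 233/151 = p_n/q_n
```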

No introductory piece on continued fractions is complete without mentioning Fibonacci numbers and the golden ratio. Let \(\phi\) be the number defined by the infinite continued fraction \[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}} \] that is, \(a_i = b_i = 1\) for all \(i\). Then we have \[ p_{-1} = 1, \qquad q_{-1} = 0, \qquad p_0 = 1, \qquad q_0 = 1 \] \[ p_{n+1} = p_n + p_{n-1}, \qquad q_{n+1} = q_n + q_{n-1} \qquad \forall\ n. \] With the convention that the Fibonacci numbers \(F_n\) are defined by \[ F_0 = 0, \qquad F_1 = 1, \qquad F_{n+1} = F_n + F_{n-1} \] we have \(p_n = F_{n+2}\) and \(q_n = F_{n+1}\) for all \(n \ge -1\). Using the matrix formulation for computing convergents of \(\phi\), this means \[ \begin{pmatrix} F_{n+2} & F_{n+1} \\ F_{n+1} & F_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n+1} \] (Try it!) On the other hand, the continued fraction equation for \(\phi\) implies that \[ \phi = 1 + \dfrac{1}{\phi}, \] which is the defining equation for the golden ratio, and has the solution \(\phi = \dfrac{1 + \sqrt{5}}{2}\). (We reject the negative solution because the value of the continued fraction is positive.) So it follows immediately that \[ \lim_{n\to\infty} \frac{F_{n+1}}{F_n} = \frac{1 + \sqrt{5}}{2} = 1 + \sum_{n=1}^\infty \frac{(-1)^{n+1}}{F_n F_{n+1}}. \]
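Finally, a quick numerical illustration (my own sketch) of how fast the Fibonacci ratios close in on the golden ratio; since the series alternates with shrinking terms, the error after the \(n\)th convergent is at most the first omitted term, \(1/(F_{n+1}F_{n+2})\).

```python
from fractions import Fraction

# Convergents of the all-ones continued fraction are ratios of consecutive
# Fibonacci numbers; watch them approach (1 + sqrt(5))/2. My own sketch.
phi = (1 + 5 ** 0.5) / 2

F = [0, 1]
while len(F) < 25:
    F.append(F[-1] + F[-2])

for n in (2, 5, 10, 20):
    convergent = Fraction(F[n + 2], F[n + 1])      # p_n / q_n = F_{n+2} / F_{n+1}
    print(n, convergent, abs(float(convergent) - phi))
```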

If I have time, another day I’ll explain what all of this has to do with hyperbolic geometry, the Farey–Ford tessellation, and the dynamics of circle rotations.
