Thales’ triangles
Teaching and doing mathematics in a liberal arts context. Exploring the meaning of life. Occasionally posting chronicles and observations.
Joshua Bowman

addition of fractions is bilinear (2017-06-25)
<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>The prototypical example of a bilinear operation is multiplication, \(b(x,y)=xy\). For a function of two variables, <em>bilinear</em> means “linear in each variable.” That means multiplication has the following properties: \[ b(x_1+x_2,y) = b(x_1,y) + b(x_2,y) \] \[ b(x,y_1+y_2) = b(x,y_1) + b(x,y_2) \] \[ b(cx,y) = b(x,cy) = cb(x,y) \] Most operations that are called some kind of “product” (e.g., inner product, cross product, wedge product) get this name, at least in part, because they have the above properties.</p> <p>Addition is also a function of two variables, but it is not linear in either variable; all of the above axioms fail. So how could it be <u>bi</u>linear? Read on!</p> <p>Fractions are curious things, because they are <em>pairs</em> of things: a fraction has a <em>numerator</em> (which <em>numerates</em>, meaning it counts some number of things) and a <em>denominator</em> (which <em>denominates</em>, meaning it names the type of thing being counted). 
And we assume that the numerator and denominator are themselves the same type of thing (whether integers, polynomials, or what have you), so that we can make sense of the equality of fractions: we say \(p/q\) and \(r/s\) are equal if \(qr = ps\).</p> <p>(I have heard that my grandfather-in-law, who was also a college math professor, claimed it was self-evident that college students should be confused by fractions, in rebuttal to those who would say that fractions are trivial once one has reached the level of college mathematics; \(2/4\) and \(1/2\) are either not clearly the same, or clearly not the same.)</p> <p>Leaving aside the equality of fractions for a moment, addition of fractions is also a curious thing. The most obvious operation is to add the numerators and denominators separately. This is sometimes called <em><a href="http://mathworld.wolfram.com/FareySequence.html" target="_blank">Farey addition</a></em>, which is a rich topic that truly deserves <a href="https://www.youtube.com/watch?v=0hlvhQZIOQw" target="_blank">its own treatment</a>. Let’s use \(\oplus\) to write this kind of addition: \[ \frac{p}{q} \oplus \frac{r}{s} = \frac{p + r}{q + s}. \] That’s not how addition of fractions is usually defined, however. Instead, we produce a new denominator which is a product of the old denominators, and a new numerator which takes into account both the old numerators and the old denominators in a product-y sort of way: \[ \frac{p}{q} + \frac{r}{s} = \frac{ps + qr}{qs} \] (Indeed, you might recognize the numerator as the inner product of \((p,q)\) and \((s,r)\), or of \((q,p)\) and \((r,s)\). In either case, one of the pairs has its numerator and denominator switched.) 
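For a concrete comparison (the fractions \(1/2\) and \(1/3\) are just an illustrative choice), the two operations give \[ \frac{1}{2} \oplus \frac{1}{3} = \frac{1 + 1}{2 + 3} = \frac{2}{5}, \qquad \frac{1}{2} + \frac{1}{3} = \frac{1\cdot 3 + 2\cdot 1}{2\cdot 3} = \frac{5}{6}. \] Replacing \(1/2\) by the equal fraction \(2/4\) changes the Farey sum to \(3/7\), which is not equal to \(2/5\), while the ordinary sum becomes \(10/12\), which is still equal to \(5/6\). 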
Perhaps now you see where the title of the post comes from: <u>addition of fractions is bilinear when we treat the fractions as <em>pairs</em></u> (Farey addition is, after all, essentially vector addition): \[ \left(\frac{p_1}{q_1}\oplus\frac{p_2}{q_2}\right) + \frac{r}{s} = \left(\frac{p_1}{q_1} + \frac{r}{s}\right) \oplus \left(\frac{p_2}{q_2} + \frac{r}{s}\right) \] \[ \frac{p}{q} + \left(\frac{r_1}{s_1}\oplus\frac{r_2}{s_2}\right) = \left(\frac{p}{q} + \frac{r_1}{s_1}\right) \oplus \left(\frac{p}{q} + \frac{r_2}{s_2}\right) \] \[ \frac{cp}{cq} + \frac{r}{s} = \frac{p}{q} + \frac{cr}{cs} = \frac{c(ps + qr)}{c(qs)} \] The last property above is what makes addition of fractions well-defined, meaning that if we replace one fraction in the sum with an equivalent fraction, then the resulting sum is equivalent to the original sum.</p> <p>This is all very cute and would simply be a curiosity, but I’m writing about it because this property of bilinearity recently helped me understand something about continued fractions.</p> <blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Thanks to the student research I'm directing this summer, I'm finally learning some stuff about continued fractions.</p>— Joshua Bowman (@Thalesdisciple) <a href="https://twitter.com/Thalesdisciple/status/870752207494197249">June 2, 2017</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> <p>A <em>finite continued fraction</em> is an expression of the form \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{\ddots + \cfrac{b_{n-1}}{a_{n-1} + \cfrac{b_n}{a_n}}}}} \] (In a <em>simple continued fraction</em> we assume that all of the \(b_i\) equal \(1\), but for my purposes here there’s no reason to make that assumption.) The operations are nested, so to find the value of this expression, we start from the bottom. 
For example, \[ 1 + \cfrac{2}{3 + \cfrac{4}{5 + \cfrac{6}{7}}} = 1 + \cfrac{2}{3 + \cfrac{4}{\cfrac{41}{7}}} = 1 + \cfrac{2}{3 + \cfrac{28}{41}} = 1 + \cfrac{2}{\cfrac{151}{41}} = 1 + \dfrac{82}{151} = \dfrac{233}{151} \] An <em><a href="http://people.math.binghamton.edu/dikran/478/Ch7.pdf" target="_blank">infinite continued fraction</a></em> has the same form as a finite continued fraction, except that the nesting doesn’t stop: \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} \] To make sense of this expression, we need—as is typical in such cases—to consider a sequence of finite continued fractions that should “approximate” this infinite continued fraction. If we truncate the infinite continued fraction after \(a_n\), then we get its <em>\(n\)th convergent</em>, which we write as \(p_n/q_n\). If the sequence of convergents converges, well, then, its limit is the value of the infinite continued fraction. (The <a href="https://twitter.com/mathologer" target="_blank">Mathologer</a> has an excellent <a href="https://www.youtube.com/watch?v=CaasbfdJdJg" target="_blank">video on infinite continued fractions</a>, which includes an example of an apparent paradox that can arise from an infinite continued fraction that doesn’t converge.) </p> <p>The definition is nice as far as it goes, but wouldn’t it be even nicer to know how successive convergents are related? That is, how can we get \(p_{n+1}/q_{n+1}\) from earlier convergents? Fortunately, there are simple recurrence relations due to <a href="https://en.wikipedia.org/wiki/Leonhard_Euler" target="_blank">Euler</a> and <a href="https://en.wikipedia.org/wiki/John_Wallis" target="_blank">Wallis</a>: \[ p_{n+1} = a_{n+1} p_n + b_{n+1} p_{n-1}, \qquad q_{n+1} = a_{n+1} q_n + b_{n+1} q_{n-1}. \] It was in trying to understand these formulas that I got stuck. Finite continued fractions, as explained above, are computed from the bottom up. 
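(As a check on the example above, where \(a_0, a_1, a_2, a_3 = 1, 3, 5, 7\) and \(b_1, b_2, b_3 = 2, 4, 6\), evaluating each truncation separately from the bottom up gives \[ \frac{p_0}{q_0} = \frac{1}{1}, \qquad \frac{p_1}{q_1} = 1 + \frac{2}{3} = \frac{5}{3}, \qquad \frac{p_2}{q_2} = 1 + \cfrac{2}{3 + \cfrac{4}{5}} = \frac{29}{19}, \qquad \frac{p_3}{q_3} = \frac{233}{151}, \] and the recurrence does reproduce the last of these from the two before it: \(p_3 = 7\cdot 29 + 6\cdot 5 = 233\) and \(q_3 = 7\cdot 19 + 6\cdot 3 = 151\).) 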
But when we move from the \(n\)th convergent to the \((n+1)\)st convergent of an infinite continued fraction, we add a new term <em>to the bottom</em>. That is, we change the <em>start</em> of the computation, not the end of it. That throws off my instincts for how recurrence should work.</p> <p>It’s easy to find proofs of the Euler–Wallis recurrence relations. Most of them use induction, which is fine. But many also implicitly use the fact that continued fractions “can be” defined with real numbers for \(a_i\) and \(b_i\), not just integers. I never specified integers in the definitions above, either, but in many cases one wants to make that restriction, including the purpose for which I’ve been studying continued fractions. I felt it should be possible to prove such a fundamental relationship without leaving the natural “context” of whatever type of numbers was allowed to begin with.</p> <p>Here is where the bilinearity of fraction addition came to my aid. One of the nice things about bilinear operations is that, if you fix one input, the result is just an ordinary linear function with respect to the other input, which can be represented by a matrix. If we fix \(p/q\), say, and add to it a variable fraction \(x/y\), and represent the fractions as vectors (since we’re thinking of them as pairs anyway), then the calculation of the sum \(p/q+x/y\) becomes \[ \begin{pmatrix} py + qx \\ qy \end{pmatrix} = \begin{pmatrix} q & p \\ 0 & q \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \] When computing continued fractions, we repeatedly start with a fraction, invert it, multiply the (new) numerator by some quantity, and add another value. All of these have representations as linear transformations: the reciprocal of \(x/y\) is given by \[ \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. 
\] Multiplying the numerator of a fraction \(x/y\), say by a constant \(c\), is given by \[ \begin{pmatrix} cx \\ y \end{pmatrix} = \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] Putting together the above linear transformations, we find that if \(a\) and \(b\) are fixed, then the matrix form of the transformation that sends \(x/y\) to \(a+b/(x/y)\) is \[ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] We can find \(p_n/q_n\), then, by starting with \(a_n = a_n/1\) and applying the above procedure repeatedly: \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & b_1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & b_2 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & b_n \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_n \\ 1 \end{pmatrix} \] However, something is aesthetically “off” with this formula: the \(a\) term and the \(b\) term in each matrix on the right have different indices.</p> <p>Let's try rearranging our steps so that like indices get grouped together: first invert the fraction, then add \(a\), then multiply the <em>denominator</em> of the result by \(b\) (which works out correctly, because in the next round the first thing we’ll do is invert again). 
This sequence of steps produces the formula \[ \begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & 1 \\ b & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \] To make this work in computing the \(n\)th convergent of a continued fraction, we need to start with \(a_n/b_n\) (already promising), and we need to set \(b_0 = 1\) so that when we divide by it, we don’t change the final value (we were missing a \(b_0\) anyway, which was also somewhat unsatisfying). Therefore, to find \(p_n/q_n\) by repeated linear transformations, we have \[ \begin{pmatrix} p_n \\ q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \end{pmatrix} \] We can improve this formula a bit more by thinking about where \(a_n/b_n\) came from. If we followed the same steps as before, then we must have inverted something, added it to \(a_n\), and multiplied the denominator of the result by \(b_n\). The thing we must have added to \(a_n\) is \(0 = 0/1\), which came from “inverting” \(1/0\). Thus, if we set \(p_{-1} = 1\) and \(q_{-1} = 0\), then the following equation holds for all \(n \ge 0\): \[ \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \] Whenever you end up with a formula this beautiful, you know you must have done something right.</p> <p>It was an encounter with this matrix representation in <a href="http://www.math.illinois.edu/~ajh/453/nt-notes6.pdf" target="_blank">a set of number theory notes</a> that sparked the thoughts that led to this post. 
That set of notes did not contain a proof of the matrix formula, however. A few other sources that I have since found do use this formulation and prove it, such as <a href="http://alpha.math.uga.edu/~vandehey/Notes%20on%20CFs.pdf" target="_blank">this set of notes</a>, which also includes a discussion similar to mine.</p> <p>Back to the question of proving the Euler–Wallis recurrence relations. Suppose we have computed the \(n\)th convergent of a continued fraction, and we wish to proceed to the \((n+1)\)st convergent. In our new formulation, that just means appending another matrix to the product: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} \]</p> <p>If we group all but the last matrix on the right together, use the previous equation, and then carry out the final multiplication, we get: \[ \begin{pmatrix} p_{n+1} & p_n \\ q_{n+1} & q_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} \begin{pmatrix} a_{n+1} & 1 \\ b_{n+1} & 0 \end{pmatrix} = \begin{pmatrix} a_{n+1} p_n + b_{n+1} p_{n-1} & p_n \\ a_{n+1} q_n + b_{n+1} q_{n-1} & q_n \end{pmatrix} \] Thus, <u>the Euler–Wallis recurrence relations are a consequence of the fact that matrix multiplication is associative</u> (and also the fact that addition of fractions is bilinear!).</p> <p>As a bonus, the difference between successive convergents is now seen to be a consequence of the fact that the determinant is multiplicative: \[ \begin{vmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{vmatrix} = \left\vert \begin{pmatrix} a_0 & 1 \\ b_0 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ b_1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{n-1} & 1 \\ b_{n-1} & 0 \end{pmatrix} \begin{pmatrix} a_n & 1 \\ b_n & 0 \end{pmatrix} \right\vert \] \[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1} 
b_1 b_2 \cdots b_n \] \[ \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = (-1)^{n+1} \frac{b_1 b_2 \cdots b_n}{q_n q_{n-1}} \] An infinite continued fraction can therefore be written as an alternating series: \[ a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}} = a_0 + \sum_{n=1}^\infty (-1)^{n+1} \frac{b_1 \cdots b_n}{q_n q_{n-1}}. \] When all of the \(a_i\)s and \(b_i\)s are positive, the condition for this series to converge is essentially that the \(b_i\)s don’t grow too quickly with respect to the denominators \(q_n\). In particular, if \(b_i = 1\) for all \(i\) (as in the case of a simple continued fraction) and \(a_i \ge 1\) for all \(i\), then the \(q_n\)s grow exponentially fast, and so the series (and hence the infinite continued fraction) is guaranteed to converge!</p> <p>No introductory piece on continued fractions is complete without mentioning Fibonacci numbers and the golden ratio. Let \(\phi\) be the number defined by the infinite continued fraction \[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}} \] that is, \(a_i = b_i = 1\) for all \(i\). Then we have \[ p_{-1} = 1, \qquad q_{-1} = 0, \qquad p_0 = 1, \qquad q_0 = 1 \] \[ p_{n+1} = p_n + p_{n-1}, \qquad q_{n+1} = q_n + q_{n-1} \qquad \forall\ n. \] Then, with the convention that the Fibonacci numbers \(F_n\) are defined by \[ F_0 = 0, \qquad F_1 = 1, \qquad F_{n+1} = F_n + F_{n-1} \] we have \(p_n = F_{n+2}\) and \(q_n = F_{n+1}\) for all \(n \ge -1\). Using the matrix formulation for computing convergents of \(\phi\), this means \[ \begin{pmatrix} F_{n+2} & F_{n+1} \\ F_{n+1} & F_n \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n+1} \] (Try it!) On the other hand, the continued fraction equation for \(\phi\) implies that \[ \phi = 1 + \dfrac{1}{\phi}, \] which is the defining equation for the golden ratio, and has the solution \(\phi = \dfrac{1 + \sqrt{5}}{2}\). 
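Explicitly, multiplying both sides of \(\phi = 1 + 1/\phi\) by \(\phi\) turns it into a quadratic equation, whose two roots come from the quadratic formula: \[ \phi^2 - \phi - 1 = 0, \qquad \phi = \frac{1 \pm \sqrt{5}}{2}. \] 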
(We reject the negative solution because the value of the continued fraction is positive.) So it follows immediately that \[ \lim_{n\to\infty} \frac{F_{n+1}}{F_n} = \frac{1 + \sqrt{5}}{2} = 1 + \sum_{n=1}^\infty \frac{(-1)^{n+1}}{F_n F_{n+1}}. \] </p> <p>If I have time, another day I’ll explain what all of this has to do with <a href="https://en.wikipedia.org/wiki/Hyperbolic_geometry" target="_blank">hyperbolic geometry</a>, the <a href="http://www-bcf.usc.edu/~fbonahon/STML49/FareyFord.html" target="_blank">Farey–Ford tessellation</a>, and the <a href="https://www.youtube.com/watch?v=S1UE7jYa5mI" target="_blank">dynamics of circle rotations</a>.</p>

existence and uniqueness for some simple ODEs (2017-03-15)
<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>Last week in my analysis class we discussed ordinary differential equations (ODEs) and their solutions. The <a href="https://en.wikipedia.org/wiki/Picard%E2%80%93Lindel%C3%B6f_theorem" target="_blank">proof of the existence and uniqueness of solutions to ODEs</a> is one of my favorite examples of an extremely abstract theorem (in this case, the <a href="https://en.wikipedia.org/wiki/Banach_fixed-point_theorem" target="_blank">contraction mapping principle</a>) being used to solve a concrete problem (how do we know if an ODE has a solution, and if so, whether it’s uniquely specified by its initial conditions?). This post is just to give some simple examples that unearth the concerns one might have.</p> <p>Here are the three examples I’d like to consider: \[ y' = y, \hspace{0.5in} y' = y^2, \hspace{0.5in} (y')^2 = y. 
\] The first equation is familiar as characteristic of exponential functions. With the initial condition \(y(t_0) = C\), we obtain the solution \(y(t) = Ce^{t-t_0}\). Thus every initial condition results in a globally-defined solution. With this solution in hand, we can even show that it is unique, by a bit of trickery. Suppose \(f(t)\) is any solution to the initial value problem \(y'=y\), \(y(t_0)=C\), and consider the function \(g(t)=\frac{Ce^{t-t_0}}{f(t)}\). Then \[ g'(t) = \frac{f(t) Ce^{t-t_0} - f'(t) Ce^{t-t_0}}{(f(t))^2} = \frac{Ce^{t-t_0}}{(f(t))^2}(f(t) - f'(t)). \] But by assumption \(f'(t) = f(t)\), so \(g'(t) = 0\). Thus \(g(t)\) is constant, and because \(g(t_0) = 1\), we must have \(g(t)=1\) for all \(t\). Therefore \(f(t) = Ce^{t-t_0}\). (How does the case where \(f(t) = 0\) for some \(t\) need to be handled?)</p> <p>The form of the second equation suggests that its solutions, once they get above \(y = 1\), should grow <em>faster</em> than exponential functions, because the growth rate depends quadratically on \(y\) rather than linearly. A bit of thought suggests the solution \(y(t) = -\frac{1}{t}\); by translating in the \(t\)-direction, we can satisfy any nonzero initial condition \(y(t_0) = C\) with a solution of the form \(y(t) = -\frac{1}{t-a}\) (take \(a = t_0 + 1/C\)), while the initial condition \(y(t_0) = 0\) is satisfied by the constant solution \(y(t) = 0\). Again, the solution is uniquely specified by its initial condition. But the nonzero solutions are no longer <em>globally</em> defined; each one “blows up” at some finite time, either before or after the point at which we specify the initial condition. This shows that in general we cannot expect global solutions to ODEs; the <a href="https://en.wikipedia.org/wiki/Picard%E2%80%93Lindel%C3%B6f_theorem" target="_blank">usual existence theorem</a> only guarantees <em>local</em> existence of solutions (although global existence can be guaranteed by stronger hypotheses, which are rarely satisfied).</p> <p>The third equation looks similar to the second, but squaring the \(y'\) term rather than \(y\) has some major consequences. 
First, the equation implies that solutions must be non-negative. This is not serious, however; changing the equation slightly to \((y')^2 = |y|\) allows solutions to be negative. More serious is the fact that both of the following functions are solutions that satisfy the initial condition \(y(t_0) = 0\): \[ y(t) = 0 \hspace{1in} y(t) = \frac14(t - t_0)^2. \] Indeed, by piecing together these two types of solutions, we can obtain solutions that remain 0 for any length of time, then “take off” unexpectedly. In short, we have existence, but we definitely do <em>not</em> have uniqueness, because as soon as a solution reaches zero, it can remain zero for an indeterminate amount of time. (We do have <em>local uniqueness</em> for solutions that have a non-zero initial condition, however.) The issue is that the equation allows the derivatives of its solutions to grow and shrink too quickly when they are near zero; more precisely, the function \(\sqrt{|y|}\) is not <a href="https://en.wikipedia.org/wiki/Lipschitz_continuity" target="_blank">Lipschitz</a> in any interval containing 0.</p> <p>As I said, my purpose here is not to expound the statement or proof of any theorem on existence and uniqueness, just to provide some simple examples that illustrate what considerations must be made in formulating such a theorem.</p>

specifications in analysis (2016-08-19)

<p>Earlier this week, I wrote about <a href="http://thalestriangles.blogspot.com/2016/08/expectations-in-analysis.html">expectations for my analysis class this fall</a> (which also apply broadly to upper-level math classes) and <a href="http://thalestriangles.blogspot.com/2016/08/taking-specs-seriously.html">some things I learned about specs grading this summer</a>. In this post, I’ll share the specifications I have created for analysis. 
(I have taught real analysis before, and <a href="http://thalestriangles.blogspot.com/2014/08/standards-for-analysis.html">last time I tried a standards-based approach</a>. Frankly, that basically turned into a point system, albeit a simplified one, which is why I’m trying something completely different this time.) </p> <p>The rest of the post is taken verbatim from (the current draft of) my syllabus. </p> <hr> <p>Effective learning requires effective methods of assessment. The assessments should relate as directly as possible to the expectations of the class, and they should provide both feedback on how to improve and opportunities to demonstrate improvement as the semester progresses. In my experience, “traditional” grading schemes based on assigning points or percentages to individual tasks do not serve these functions well. Therefore, this course adopts <em>specifications grading</em>*, in which grades are tied to specific outcomes. This is likely to be different from grading policies in other classes you have taken, so feel free to ask me questions or let me know if you have concerns. I hope that this system will make clear the connections between the expectations stated in the previous section and the ways you will be assessed. </p> <p><b>Overall grading.</b> At the end of the semester, I am required to submit to the university a letter grade reflecting your achievement in this class. That grade will be determined on the basis of a set of specifications in four areas: (1) class participation, (2) written proofs, (3) exams, and (4) synthesizing activities. Each of these areas will receive a simple grade of A, B, C, D, or F. The following sections describe how these grades will be determined. Your final grade will depend on your performance in all four areas, according to the following table. 
<table border="1"><tr><th>Final grade</th><th>based on individual grades of</th></tr><tr><td>A</td><td>all As, or three As and one B</td></tr><tr><td>A–</td><td>two As and two Bs</td></tr><tr><td>B+</td><td>one A and three Bs</td></tr><tr><td>B</td><td>all Bs, or three Bs and one C</td></tr><tr><td>B–</td><td>two Bs and two Cs</td></tr><tr><td>C+</td><td>one B and three Cs</td></tr><tr><td>C</td><td>all Cs, or three Cs and one D</td></tr><tr><td>C–</td><td>two Cs and two Ds</td></tr></table>I will use my discretion to assign a final letter grade to other combinations of individual letter grades. </p> <p><b>Class participation.</b> Attendance at every class meeting is required. Most weeks, we will alternate days between discussing reading assignments and presenting solutions to exercises. The end of this syllabus has a schedule of what we will be doing in class each day (with allowance for adjustments, as needed). </p> <p><em>Reading.</em> In order to participate effectively on discussion days, you will need to read the textbook before coming to class. Each reading assignment is about 10 pages. The <a href="https://www.amazon.com/Analysis-Revised-Jones-Bartlett-Mathematics/dp/0763714976" target="_blank">textbook</a> attempts to be very accessible, but that does not mean it is easy. We will be working with ideas that stretch reason and imagination. You should be prepared to spend at least 1–2 hours on each reading assignment; rereading pages, paragraphs, or sentences; working out examples; and writing questions or comments in the margins or on separate paper. You should be especially mindful of definitions. These are not always set apart from the text, so pay attention when new vocabulary is introduced. Start working on a list of definitions and theorems from the start of the semester. The chapter summaries can be an aid in this process. </p> <p><em>Collaborating.</em> On days with a reading assignment, you will work in small groups to discuss the material. 
I will assign these groups at the start of each week. You should bring your own questions and thoughts to these discussions. If there is extra time, you can also discuss the current set of exercises. </p> <p><em>Presenting.</em> On the remaining days, you will take turns presenting solutions to exercises distributed previously. The solution you present does not necessarily need to be entirely correct, but it should show evidence of a serious effort. You should also be prepared to answer questions from me or other students. To maintain balance, no one will be allowed to present more than once every two weeks, unless every student in the class has already presented during that time period. In exceptional cases, some of these verbal presentations may be made to me outside of class (no more than one per student). </p> <table border="1"><tr><th>To earn a</th><th>you must do the following</th></tr><tr><td>D</td><td>attend at least 75% of class meetings<br> present at least one proof in class</td></tr><tr><td>C</td><td>attend at least 85% of class meetings and contribute to discussions<br> present at least three proofs in class</td></tr><tr><td>B</td><td>attend at least 90% of class meetings and contribute to discussions<br> present at least four proofs in class</td></tr><tr><td>A</td> <td>attend all class meetings (2 unexcused absences allowed) and contribute to discussions<br> present at least five proofs in class</td></tr></table> <p><b>Written proofs.</b> Over the semester, you will develop a portfolio of work that you have submitted for formal assessment. Most of your contributions will be proofs. Each week I will indicate one or more exercises whose solutions could be submitted to your portfolio. You may discuss your work with other students in the class, to have them check whether it meets the standards of the class and give you feedback. A proof for the portfolio is due the Monday after it is assigned. 
These proofs <em>must be typed</em> using LaTeX, Google docs, Microsoft Word, or another system. </p> <p>When you submit a written proof for your portfolio, I will judge whether it is Successful, Quasi-successful, or Unsuccessful (see the <a href="http://thalestriangles.blogspot.com/2016/08/expectations-in-analysis.html">earlier section on “Proofs” under “Expectations”</a> for details about these ratings), and mark it correspondingly with one of S/Q/U. Proofs marked Q or U will not be counted towards your grade. However, proofs can be resubmitted at the cost of one or two of your allotted tokens; see the section on “Tokens” below. </p> <table border="1"><tr><th>To earn a</th><th>your portfolio must contain</th></tr><tr><td>D</td><td>at least four successful proofs</td></tr><tr><td>C</td><td>at least six successful proofs</td></tr><tr><td>B</td><td>at least eight successful proofs</td></tr><tr><td>A</td> <td>at least ten successful proofs</td></tr></table> <p><b>Exams.</b> There will be two midterm exams and a final exam. Each one will have a take-home portion and an in-class portion. [Dates and times, listed in syllabus, omitted here.] </p> <p>The take-home portions will consist of two or three proofs that you are to complete <em>on your own</em>, without consulting other students. (You may discuss your work with me before turning in the exam, although I might not answer questions directly.) These will be judged as successful, quasi-successful, or unsuccessful, like the proofs in your portfolio. They cannot be resubmitted after grading, however. </p> <p>The in-class portions will test your mastery of definitions and the statements of theorems. You will need to be able to state both definitions and theorems properly. You will also be asked to recognize and provide examples of situations or objects where a definition or theorem <em>does</em> or <em>does not</em> apply. 
</p> <table border="1"><tr><th>To earn a</th><th>you must do the following</th></tr><tr><td>D</td><td>correctly answer 60% of in-class test questions<br> write at least two successful proofs on take-home exams</td></tr><tr><td>C</td><td>correctly answer 75% of in-class test questions<br> write at least three successful proofs and one quasi-successful proof on take-home exams</td></tr><tr><td>B</td><td>correctly answer 85% of in-class test questions<br> write at least four successful proofs and two quasi-successful proofs on take-home exams</td></tr><tr><td>A</td> <td>correctly answer 95% of in-class test questions, write six successful proofs on take-home exams</td></tr></table> <p><b>Synthesis.</b> To master the ideas of the class, you must spend time synthesizing the material for yourself. The items in this graded section will be added to your portfolio, to complement the proofs. All materials in this section <em>must be typed</em> using LaTeX, Google docs, Microsoft Word, or another system. </p> <p><em>List of definitions and theorems.</em> It should be clear at this point that being able to produce accurate statements of definitions and theorems is essential to success in this class. To encourage you to practice these, I am requiring you to create a list of these statements for the entire course. Your list should be organized in some way that makes sense to you—e.g., alphabetically or chronologically. </p> <p>The textbook can be used as a reference, as can the internet, but how do you quickly recall what definitions we’ve used and how they're related? How do you find the phrasing of a theorem that’s become most familiar? This list should help you in these situations. More importantly, creating it will help you review and organize the material in your own mind. </p> <p>I will verify your progress on these lists at each in-class exam. 
</p> <p><em>Papers.</em> Twice during the semester, once in the first half and once in the second half, I will provide a list of topics that we have been discussing, from which you can choose one as the basis for a paper. These will be due approximately two weeks after the midterm exams. </p> <p>There is a third paper that can be completed at any point in the semester on a topic of your choosing, but you must get the topic approved by me before Thanksgiving. </p> <p>These papers will for the most part be <em>expository</em>, meaning they will present previously known mathematical results (not original research). Here are the requirements for a paper to be acceptable: <ul><li>It should have 1500–4500 words.</li><li>It should use correct grammar, spelling, notation, and vocabulary.</li><li>It should be organized into paragraphs and, if you wish, sections.</li><li>It should cover the topic clearly and reasonably thoroughly, with an intended audience of other math students (who may be assumed to have studied as much analysis as you).</li><li>It should contain a proof of at least one major result.</li><li>The writing should be original to you. Of course, small pieces like definitions may be taken directly from another source, but apart from these the paper should be your own work.</li><li>Citations are generally not necessary in expository mathematical writing, except for the following: a statement of a theorem that you are not proving, a peculiar formulation of a concept/definition, or a creative idea (e.g., an uncommon metaphor or illustration) from another source.</li><li>You may choose to follow the style of our textbook, or a more formally structured math textbook, or something more journalistic or creative, as long as the previous criteria are met.</li></ul>Papers that do not meet these criteria will be considered unsatisfactory and will not count towards your grade. An unsatisfactory paper can be revised and resubmitted at the cost of three tokens. 
</p> <table border="1"><tr><th>To earn a</th><th>you must do the following</th></tr><tr><td>D</td><td>create a list of definitions and theorems to include in your portfolio</td></tr><tr><td>C</td><td>create a list of definitions and theorems to include in your portfolio<br> write a paper on one of the topics provided</td></tr><tr><td>B</td><td>create a list of definitions and theorems to include in your portfolio<br> write two papers on the topics provided, one during each half of the semester</td></tr><tr><td>A</td> <td>create a list of definitions and theorems to include in your portfolio <br>write two papers on the topics provided, one during each half of the semester <br>write a third paper on a topic of your own choosing related to the class</td></tr></table> <p><b>Tokens.</b> You start out the semester with seven (7) virtual “tokens,” which can be used in various ways: <ul><li>One token allows you to resubmit a written proof initially judged to be quasi-successful (must be used within one week of initial grading).</li><li>Two tokens allow you to resubmit a written proof initially judged to be unsuccessful (must be used within one week of initial grading).</li><li>Three tokens allow you to resubmit an unsatisfactory paper (must be used within one week of receiving paper back).</li><li>One token gives you a 48 hour extension past the due date for a paper.</li></ul>Unused tokens may be exchanged for a prize at the end of the semester. [maybe?!?] </p> <p>*Based on Linda Nilson’s book <em><a href="https://www.amazon.com/Specifications-Grading-Restoring-Motivating-Students/dp/1620362422" target="_blank">Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time</a></em>. 
</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com5tag:blogger.com,1999:blog-30611202.post-42542583514038512492016-08-18T11:26:00.000-07:002016-08-18T11:26:51.631-07:00taking specs seriously<p>I’ve been an advocate of <a href="http://thalestriangles.blogspot.com/search/label/sbg">standards-based grading</a> since I started using it over three years ago. It has addressed many of the concerns I had about the dominant point-based grading system and encouraged students to move forward in their understanding rather than feeling trapped by past performance. </p> <p>I’m not solely an SBG proponent when it comes to grading, however. For one thing, I find it hard to adapt SBG to upper-level math courses. For another, the time seems ripe for experimentation in grading practices as more of us realize the shortcomings of what we have inherited from decades past. Not that we should constantly reinvent the grading process, but we should be open to various thoughtful ways of providing authentic assessment. </p> <p>So I was certainly interested a couple of years ago when several fellow instructors began talking about <a href="https://www.insidehighered.com/views/2016/01/19/new-ways-grade-more-effectively-essay" target="_blank">specifications grading</a>, a method espoused by <a href="http://chronicle.com/blognetwork/castingoutnines/2014/11/25/41-interview-linda-nilson/" target="_blank">Linda Nilson</a> in her book <a href="https://www.amazon.com/Specifications-Grading-Restoring-Motivating-Students/dp/1620362422" target="_blank"><em>Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time</em></a>. I adopted some of the ideas I heard and appreciated the increased flexibility it offered. </p> <p>However, it was not until this summer that I read through Nilson’s book. 
It was useful because it seems Nilson and I think differently in ways I can’t quite put my finger on, and so the book has lots of ideas I would not have intuited on my own. Here are a few of the things I garnered from reading the book that I hadn’t picked up from online discussions (not that these things weren’t said, but this time they stuck): <ul><li><em>Sometimes it’s OK to use percentages.</em> I’ve been highly points- and percentages-averse since starting SBG. Percentages, my argument went, were essentially meaningless, because they’re constantly being curved (so they don’t really represent a “percentage” of anything) and the difference between 80% and 81% is essentially a coin toss (so they aren’t as linearly ordered as people like to think). But that argument isn’t uniformly true. In a course where precision is important, it is possible to measure, for instance, <em>how many</em> definitions a student can correctly state. For my upcoming analysis class, I expect “A” students to get definitions right 95% of the time, “B” students 85% of the time, “C” students 75% of the time. This really is quantifiable, and a definition is either correct (with respect to the established conventions of the subject) or not, so each one can be graded yes/no. As long as not <em>everything</em> is forced into a percentages model, this can be an effective way to give feedback.</li><li><em>Make students work for an A, but give them some choice in how to get there.</em> As instructors, we want an A to represent mastery, an indication that the student can think nimbly and complexly about the subject. Ideally, students who earn an A will be the ones <em>most invested</em> in the subject. To demonstrate all this, students should have ownership of their work. They should make meaningful choices that reflect their interests and their skills as well as the subject at hand.</li><li><em>Not everyone has to do everything.</em> This is closely tied to the previous point. 
Nilson uses the metaphor of “hurdles”: grade levels can be differentiated by having students clear either <em>higher</em> hurdles (more complex, better quality work) or <em>more</em> hurdles (more extensive work), or a mix of the two. I’m not generally a fan of having students earn higher grades by just proving they can do more—that takes more of my time, and more of theirs. But true mastery requires a measure of initiative. Having a small number of optional assignments that give students opportunities to distinguish themselves makes sense as part of a larger grading scheme.</li><li><em>There are good reasons to limit reassessments.</em> Of course, one of these reasons is the subtitular “saving faculty time.” In past upper-level classes where I’ve allowed essentially unlimited resubmission, I’ve been swamped/behind at several points in the semester as students frantically tried to get something accepted. But that’s not even the best reason. By limiting reassessments and grading work pass/fail (or pass/progressing/fail or some other variant), students are encouraged to submit their best work each time, and to spend extra time making sure they check its quality before asking me to do so. The onus is on me to establish clear expectations, and on students to meet them. We’re not negotiating what’s acceptable through repeated revision and grading.</li></ul>I also found the chapter on cognitive models (Chapter 3, “Linking Grades to Outcomes”) helpful in considering what it means to have a higher level of mastery; previously I wasn’t really familiar with anything beyond <a href="https://en.wikipedia.org/wiki/Bloom%27s_taxonomy" target="_blank">Bloom’s Taxonomy</a>. 
</p> <p>If this post was of interest to you, I hope you’ll consider joining the Google+ Community on <a href="https://plus.google.com/u/0/communities/117099673102877564377">“Standards-Based and Specifications Grading”</a> (SBSG), where teachers of diverse disciplines are meeting to discuss how to implement these two particular alternative forms of grading. </p> <p>Tomorrow I’ll share my full set of specifications for real analysis. </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-6363334532250015372016-08-16T13:46:00.000-07:002016-08-16T13:52:37.259-07:00expectations in analysis<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> <p>I’m working on the syllabus for my (junior and senior level) analysis class this fall, and I’d like to share some parts of it, hopefully thereby eliciting feedback. The main thing I’m concerned about is the type of specifications grading I’m adopting for the class—I’ll share that later this week. This post is about establishing the <em>expectations</em> of the course, on which the specifications will be based. None of these are particular to analysis; they establish what I believe any student in an upper-level mathematics course should achieve.</p> <p>The rest of the post is taken verbatim from (the current draft of) my syllabus.</p> <hr> <p>To learn mathematics, it is essential to engage actively with the material. This is especially true at this stage in your mathematical careers, as the focus of study shifts from developing computational tools to examining underlying concepts and practicing abstract reasoning. This shift may be described as a move from <em>pre-rigorous</em> thinking, which is informal and intuitive, to <em>rigorous</em> thinking, which is formal and precise. 
(This terminology has been suggested by mathematician Terence Tao; he also includes a <em>post-rigorous</em> stage, in which professional mathematicians work, where one is able to make intuitive arguments that are grounded by formal training.) </p> <p>The content of this course resides in <em>definitions</em>, <em>theorems</em>, and <em>proofs</em>. You will be expected to state both definitions and theorems accurately and to illustrate them through examples. Mathematics is not merely a collection of disconnected facts, however, and so you will also develop your logical skills by proving mathematical truths, linking definitions to their profound consequences captured by theorems. All of this will happen in the context of a <em>community</em>—two really, our class and the larger mathematical community. </p> <p><b>Definitions.</b> In mathematics, as in other sciences, it is necessary to quantify what is being studied and to be able to identify what is of interest at each moment. This is done by carefully establishing and <em>internalizing</em> definitions. This is not to say that definitions do not involve creativity; as a subject develops, often definitions evolve to encompass more or fewer cases, to be more precise, or to reorganize ideas. </p> <p>By the end of the course, you should be able to: <ul><li> state definitions accurately and explain any notation or previously-defined terms they contain;</li><li> identify whether or not an object meets the conditions of a given definition;</li><li> give examples that satisfy a given definition as well as examples that do not satisfy it;</li><li> test an unfamiliar definition using examples;</li><li> create new definitions when needed.</li></ul></p> <p><b>Theorems.</b> A theorem has two parts: the <em>antecedent</em> (its assumptions) and the <em>consequent</em> (its conclusions). 
To take a familiar example, the equation \(a^2 + b^2 = c^2\) by itself is not a theorem; rather, the Pythagorean Theorem states that “<u>If</u> \(c\) is the length of the hypotenuse of a right triangle, and \(a\) and \(b\) are the lengths of its other two sides, <u>then</u> \(a^2 + b^2 = c^2\).” A theorem may not always include the words “if” and “then,” but you should always be able to determine what are the antecedent and the consequent. Sometimes rephrasing the theorem’s statement can help. For example, “Every differentiable function is continuous” can be rephrased as “If a function is differentiable, then it is continuous.” In most cases, the consequent does not imply the antecedent (e.g., not every continuous function is differentiable). A theorem that says one set of conditions holds “if and only if” another set of conditions holds is logically making two statements (the antecedent and consequent can be reversed), and both must be proved. </p> <p>By the end of the course, you should be able to: <ul><li> state theorems accurately and identify what are their assumptions and their conclusions;</li><li> determine whether the conditions of a theorem do or do not hold in a given situation, explain why, and determine what the theorem does or does not imply in that situation;</li><li> recognize logically equivalent forms of a theorem;</li><li> formulate and test conjectures.</li></ul></p> <p><b>Proofs.</b> Proofs are how we as individuals and as a community determine the truth of mathematical statements, i.e., theorems. Here is one definition of a proof, due to David Henderson: A proof is “a convincing communication that answers -- Why?” The extent to which a proof succeeds, therefore, depends on how well it embodies these three properties: it should be <em>logical</em> (does it <em>convince?</em>), it should be <em>comprehensible</em> (does it <em>communicate?</em>), and it should be <em>intentional</em> (does it <em>answer why?</em>). 
Evidently, each of these properties depends somewhat on the others. It is thus reasonable to classify proofs into an S/Q/U system: <ul><li> (S) A <em>successful</em> proof makes an argument for the truth of a mathematical statement that is fully convincing to an informed reader or listener. It employs appropriate vocabulary and carefully chosen notation. It avoids sloppy reasoning. It makes clear use of the theorem’s assumptions and, when necessary, previously known results. The best examples provide motivation for the methods chosen. Minor revisions may be advisable, but they do not hinder the overall effectiveness.</li><li> (Q) A <em>quasi-successful</em> proof contains most of the ideas necessary to make a complete argument. It may have slips in logic or notation, or it may neglect a special case, or it may be hard to read. It contains sufficient evidence, however, that the argument can be “salvaged” by filling in gaps or clarifying language. Serious revision is necessary. [Not in syllabus: thanks to <a href="https://twitter.com/dancuzz/status/763154393487200257" target="_blank">Dan</a> for suggesting “quasi-”.]</li><li> (U) An <em>unsuccessful</em> proof does not convince an informed person of the truth of the purported theorem, for one or more of the following reasons: <ul><li>It makes logical leaps or omits key ideas.</li><li>It demonstrates incomplete understanding of definitions or notation.</li><li>It fails to reference previous results when appropriate.</li></ul>Complete revision is generally necessary.</li></ul>In other words, a successful proof is of sufficient quality that it could reasonably be accepted as part of a paper in a professional journal. A quasi-successful proof has some merit, but it requires revision, after which it might or might not be acceptable at a professional level. An unsuccessful proof is sufficiently flawed that it would not be acceptable as part of a professional publication. 
</p> <p>By the end of the course, you should be able to: <ul><li> evaluate, on the basis of professional standards, whether a given proof is successful or not;</li><li> write original, successful proofs.</li></ul></p> <p><b>Community.</b> Our class time will be structured primarily around discussion rather than lecture. The idea is to have a space that promotes sharing ideas, making guesses, taking risks, and sharpening our reasoning abilities. I will guide and facilitate these conversations, but everyone is responsible for contributing to discussions, both in small groups and with the entire class. That is, in this course <em>mathematical authority</em> resides not just with me as the instructor, but with every class member. I will give short lectures (20 minutes) when the entire class agrees it would be beneficial, but not more often than once a week. </p> <p>By the end of the course, you should be able to: <ul><li> engage in discussions about mathematics by sharing questions, proposals, and insights;</li><li> evaluate others' contributions critically and respond constructively;</li><li> present your own work in front of an audience and address their comments and questions.</li></ul></p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com7tag:blogger.com,1999:blog-30611202.post-65989464607682292882016-06-29T17:00:00.000-07:002016-06-30T07:08:57.238-07:00why I do math<p>I spent the past week on retreat with fellow faculty members. During part of this time, we each shared our “vocational journeys,” or the stories of how we have been led to this job and these fields of scholarship. I thought that part of my essay might have broader appeal, so I’m posting it here. </p> <p><u>Why do I study and teach mathematics?</u> My research is in the field of <em>pure</em> mathematics, for which it may seem harder to justify an investment of time than for its adjacent field of <em>applied</em> mathematics. 
Applied math at least tries to tie itself directly to the needs and concerns of our immediate physical world. Pure math is happy to oblige in improving how well we understand the world, but its primary concern is math for math’s sake. (The boundary between these two types of math is highly permeable, and even pure math almost always starts with inspiration from experience.) I’d like to address the question by comparing math with two other areas represented in academia: music and science. </p> <p>First, math is <u>like music</u>. The aesthetic element in mathematics is essential, not peripheral. I’m not sure, but I think that in the minds of many people mathematics is reduced to a collection of more-or-less arbitrary facts, like the fact that the area of a circle equals pi times the square of its radius. Each of these facts, however, is like the final cadence of a symphony. It may be thrilling by itself, but it’s missing the indispensable context of “where did we start?” and “how did we get here?” </p> <p>This is why mathematicians insist on proving things: the proof is a whole symphony, not a single chord. Mathematicians are lauded not for stating facts, but for demonstrating their necessity, the way composers and musicians are praised for the whole course of a piece or a performance, not just its ending. When executed well, a proof has rhythm. It has themes that are developed and interwoven. It has counterpoint. It sets up expectations that are satisfied or subverted. Economy of material is valued, but not exclusively; an argument that wanders into neighboring territory, like a modulation to a neighboring key, can provide fuller appreciation of the main theme. </p> <p>Proofs have a variety of forms, some as common as sonatas and minuets: direct proof, proof by contradiction, proof by induction, proof by picture, proof by exhaustion. 
We have computer-generated musical compositions and computer-generated mathematical proofs, and in both communities there is healthy debate about whether these artificial creations are beautiful or desirable in such quintessentially human activities. We return over and over to the same pieces and theorems that have inspired us, whether they be simple or grand, and each performer gives her or his own interpretation and inflection to the presentation. </p> <p>Second, math is <u>like science</u>. Often mathematics is categorized as a science, and that’s not entirely wrong. Science is built on careful observation, winnowing data from the chaff of noise. Science seeks explanation which can be turned into prediction. It invents new tools for collecting information and improves upon those that already exist. It creates models and theories that encompass and relate as many pieces of knowledge as possible. </p> <p>Where science and math differ is that science deals with the world in which we live, while the world of math is imagined. <em>Imagine</em> that there are such things as points with no volume and perfectly straight lines that connect them. <em>Imagine</em> that numbers have enough solidity that we can move them around en masse by means of undetermined variables, the <i>x</i>, the <i>y</i>, the <i>z</i>. <em>Imagine</em> that once we start counting upwards <em>1, 2, 3, 4, 5, a thousand, a million, a trillion, a googol,…</em>, we could never reach an end, not in any number of lifetimes in any number of universes. Or <em>imagine</em> that the filigree of a fractal truly exists at every scale, that we can examine it closer and closer and see the ever-increasing detail, that there is no quantum barrier to our exploration, beyond which sight and measurement cease to be meaningful. </p> <p>When we imagine these things, we create the worlds in which we make our observations. 
The rules of these worlds are not completely arbitrary, at least not if we want to be able to know anything about them, but they are ours to choose. Each time we choose anew, we enter an undiscovered country. Once in this country, we must return to scientific methods of study. We look for patterns, try to explain them, and check that our explanations make accurate predictions. We must know when to trust the instruments we have—our minds, computer programs, results proved by other mathematicians—and when not to trust them. Like scientists, we have to winnow out the noise. </p> <p>Mathematical truth persists across ages and cultures, and so it may seem timeless, but our experience of it certainly isn’t. The channels of logic through which a proof flows may be carved out once and for all in eternity or in the human mind (depending on your view of where mathematical truth lies), but like notes on a page they remain inert until they are brought to life by individual or communal study. Like the tree of life in biology or the standard model in physics, mathematical theories are crystallized around <em>our</em> experience and <em>our</em> perception of the world. As Bill Thurston wrote, “mathematics only exists in a living community of mathematicians that spreads understanding and breathes life into ideas both old and new. 
The real satisfaction from mathematics is in learning from others and sharing with others.” </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com1tag:blogger.com,1999:blog-30611202.post-56148247909398550312016-05-31T22:16:00.000-07:002016-06-01T17:02:46.320-07:00Homology modulo 2<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> <p>Last week, I was <a href="https://twitter.com/Thalesdisciple/status/735187572205322240" target="_blank">chirping</a> on Twitter about “homology modulo 2”: how closely it matches my geometric intuition of what homology should measure, despite my never having thought seriously about it before, and how its computational simplicity makes it seem like an ideal way to introduce homology to undergraduates, even those who haven’t studied linear algebra. For a very complete graduate-level introduction to homology (and cohomology) modulo 2, check out <a href="https://www.unige.ch/math/folks/hausmann/hausmannBook.pdf" target="_blank">Jean-Claude Hausmann’s book</a>. I will instead try to demonstrate how this topic can be introduced at nearly any level, with an appropriate amount of care. For the sake of brevity, I will assume familiarity with linear algebra in this post; however, the necessary elements (image, kernel, rank, row reduction) can easily be learned in the context of homology, particularly when working modulo 2. </p> <p><em>Note:</em> This post got long enough in the writing that I didn’t make any pictures to go with it, so you should draw your own! The idea is to discover how algebra can be used to extract geometric/topological information in a way that is really striking when you see it happen. 
</p> <h2>The space</h2> <p>For simplicity of exposition, I will only consider spaces \(X\) that are created from finitely many <a href="https://en.wikipedia.org/wiki/Convex_polytope" target="_blank">convex polytopes</a> (often <a href="https://en.wikipedia.org/wiki/Simplex" target="_blank">simplices</a> or <a href="https://en.wikipedia.org/wiki/Hypercube" target="_blank">hypercubes</a>) by making some identifications (“gluings”) between their faces. The faces are not necessarily joined in pairs, however; more than two faces of the same dimension may be identified, or some faces might not be joined at all. A more careful definition is possible, but to provide one would get away from the fairly non-technical introduction I’m aiming for. Just assume no funny stuff happens, OK? The polytopes that make up \(X\) are called the <em>cells</em> of \(X\); the collection of cells includes all the faces of all the polytopes we started with (some of which, as noted above, have been identified with each other in pairs or larger groupings). Each cell, being a polytope, has a dimension, and if we wish to specify the dimension of a cell as \(k\), we call it a \(k\)-cell. </p> <p>For example, \(X\) could just be a single convex polytope. Or it could be a convex polytope with the interior removed (keeping in mind that the boundary of a convex polytope is a union of convex polytopes of one dimension lower). The outside of a cube, for instance, is made up of six 2-cells (the faces), twelve 1-cells (the edges), and eight 0-cells (the vertices). A torus, when <a href="https://youtu.be/0H5_h-RB0T8" target="_blank">made from a rectangle by identifying opposite sides</a>, is also such a space, with one 2-cell (the interior of the rectangle), two 1-cells (the result of identifying the edges in pairs), and one 0-cell (because all corners of the rectangle are identified to the same point). 
</p> <h2>The data</h2> <p>The homology of \(X\) measures the difference between objects in \(X\) that have no boundary (these are called <em>cycles</em>) and objects that are the boundaries of other objects (called, quite sensibly, <em>boundaries</em>). A \(k\)-dimensional cycle that is not a boundary is supposed to “enclose” a \(k\)-dimensional “hole” in \(X\). The formal definitions are intended to quantify what is meant by “boundary”; the intuitive notion of “hole” floats along, generally defying proper definition (<a href="http://math.stackexchange.com/questions/40149/intuition-of-the-meaning-of-homology-groups" target="_blank">and often even intuition</a>). </p> <p>By “object” in the previous paragraph, we mean something made up from the cells of \(X\). We restrict ourselves to putting together cells of the same dimension, producing objects called <em>chains</em>. That is, a <em>\(k\)-chain</em> is just a collection of \(k\)-cells in \(X\). We can add together \(k\)-chains, but (and this is the beautifully simple part) we add <em>modulo 2</em>. If a particular cell appears twice, then the two appearances cancel each other out. The idea is that, since we’re trying to study “holes” in our space \(X\), if one cell appears twice, the pair of copies can be joined up along their common boundary and safely removed. Formally, a \(k\)-chain is a <a href="https://en.wikipedia.org/wiki/Linear_combination" target="_blank">linear combination</a> of \(k\)-cells, with coefficients in the <a href="https://en.wikipedia.org/wiki/GF(2)" target="_blank">field with two elements</a>, if you find such a formal description helpful. </p> <p>We now proceed to the key combinatorial data of our space \(X\) and see how it can be used to extract topological information. Because \(X\) is made up of finitely many cells, for each \(k = 1, \dots, n\), where \(n\) is the largest dimension of any cell of \(X\), we can construct a <em>boundary matrix</em> \(\partial_k\). 
(Normally \(\partial_k\) would be defined as a linear map between certain vector spaces; we are fully exploiting the equivalence between linear maps and matrices.) The columns of \(\partial_k\) are labelled by the \(k\)-cells of \(X\), and the rows are labelled by the \((k-1)\)-cells. In each column, we put a 1 in each position where the corresponding \((k-1)\)-cell lies in the boundary of the given \(k\)-cell, and a 0 otherwise. <b>Exception.</b> Sometimes the faces of a single \(k\)-cell may be joined <em>to each other,</em> meaning the resulting \((k-1)\)-cell appears <em>with multiplicity</em> on the boundary of that \(k\)-cell. This multiplicity, modulo 2, is taken into account in the boundary matrix. See the boundary matrices of the torus, near the end, for examples of this phenomenon. </p> <h2>A concrete example: the tetrahedron</h2> <p>The boundary matrix, like most computational objects, is best understood through examples. Let’s start with the empty tetrahedron. Label the vertices \(v_1\), \(v_2\), \(v_3\), \(v_4\), and let \(f_i\) be the triangular face opposite \(v_i\). Let \(e_{ij}\) be the edge joining \(v_i\) to \(v_j\), with \(i < j\). Then we have two boundary matrices, <center>\( \partial_1 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} \) and \(\partial_2 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}\). </center>In \(\partial_1\), the columns are labelled by the edges and the rows are labelled by the vertices. In \(\partial_2\), the rows are labelled by the edges and the columns are labelled by the faces. In both matrices, the edges are listed in the order \(e_{12}\), \(e_{13}\), \(e_{14}\), \(e_{23}\), \(e_{24}\), \(e_{34}\). 
Notice that each column of \(\partial_1\) has two 1s, because each edge has two endpoints, and each column of \(\partial_2\) has three 1s, because each face is bounded by three edges. </p> <p>Once we have these matrices, we can use them to find boundaries of more general chains. For instance, when joined together, the edges \(e_{12}\) and \(e_{23}\) form a path from \(v_1\) to \(v_3\), so we expect the boundary to be these two points. Indeed, adding together (modulo 2!) the corresponding entries from the first and fourth columns of \(\partial_1\), we see that the 1s in the second entry cancel (which corresponds to the edges being joined at \(v_2\)), and we are left with 1s in the first and third entries. We can write this relation as \(\partial_1(e_{12}+e_{23}) = v_1 + v_3\). Similarly, if we add together the first three columns of \(\partial_2\), which correspond to \(f_1\), \(f_2\), and \(f_3\), the result is a vector with 1s in the first, second, and fourth entries, which correspond to \(e_{12}\), \(e_{13}\), and \(e_{23}\), producing the equation \(\partial_2(f_1 + f_2 + f_3) = e_{12} + e_{13} + e_{23}\). This demonstrates that the union of three of the faces has the same boundary as the fourth face. The sum of all four columns of \(\partial_2\) has all 0s for its entries, showing that the four faces of the tetrahedron, taken together, have no boundary. </p> <h2>How to extract information from the boundary matrix</h2> <p>Having illustrated some computations with boundary matrices in the above example, let’s codify some definitions. A collection of \(k\)-cells is called a <em>\(k\)-cycle</em> (or <em>closed</em>) if the sum of the corresponding columns of \(\partial_k\) is the zero vector. (This is a formal way of saying “has no boundary.”) A collection of \(k\)-cells is called a <em>\(k\)-boundary</em> (or <em>exact</em>) if it can be obtained as a sum of columns of \(\partial_{k+1}\). 
In linear algebra terms, a \(k\)-cycle is an element of the kernel of \(\partial_k\), and a \(k\)-boundary is an element of the image of \(\partial_{k+1}\). Again, the benefit of working modulo 2 is that these conditions can be easily checked. The set of \(k\)-boundaries is denoted \(B_k\), and the set of \(k\)-cycles is denoted \(Z_k\) (the notation \(C_k\) generally being reserved for \(k\)-chains). </p> <p>A fundamental property is that \(\partial_k \partial_{k+1} = 0\), which has the satisfying geometric interpretation that “every \(k\)-boundary is a \(k\)-cycle,” or \(B_k \subseteq Z_k\). This property can be checked directly in the above example of the tetrahedron. In general, it applies because, in a \(k\)-dimensional polytope, each \((k-2)\)-dimensional face appears in two \((k-1)\)-dimensional faces (provided \(k \ge 2\); if \(k=1\), then there are no \((k-2)\)-dimensional faces, so \(\partial_0 = 0\), and the property \(\partial_0 \partial_1 = 0\) holds trivially). From the perspective of homology, this means boundaries aren’t “interesting” cycles. They’re the <em>boundaries</em> of something, after all, so they certainly don’t enclose a “hole.” </p> <p>What we really want to measure, then, is <u>how many cycles are not boundaries</u>. To determine this, we first need to find out how many cycles and how many boundaries there are. Except we can add cycles together to get new cycles (in linear algebra terms, the kernel of a matrix is a subspace of the domain), and we can add boundaries to get new boundaries (the image of a matrix is also a subspace), so what we really want is to know how many <em>independent</em> cycles there are: that is, we want the <em>dimension</em> or <em>rank</em> of the set of cycles and the set of boundaries. I’ll use rank here, even though we’re working with vector spaces, because that terminology transfers to the case of integral homology. 
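</p> <p>Since everything here is finite and modulo 2, these conditions are easy to hand to a computer. What follows is only a minimal Python sketch, assuming the tetrahedron’s boundary matrices exactly as written above; the helper names <code>boundary</code>, <code>product_mod2</code>, and <code>rank_mod2</code> are made up for illustration and do not come from any library. It checks that \(\partial_1 \partial_2 = 0\) and counts independent 1-boundaries and 1-cycles.</p>

```python
# Boundary matrices of the empty tetrahedron, copied from the text above.
# Edge order throughout: e12, e13, e14, e23, e24, e34.
D1 = [  # rows: v1..v4, columns: edges
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
D2 = [  # rows: edges, columns: f1..f4
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]


def boundary(matrix, chain):
    """Apply a boundary matrix to a chain (a 0/1 vector), modulo 2."""
    return [sum(a * x for a, x in zip(row, chain)) % 2 for row in matrix]


def product_mod2(a, b):
    """Matrix product a*b with every entry reduced modulo 2."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))]
            for i in range(len(a))]


def rank_mod2(matrix):
    """Rank over the field with two elements, by row reduction."""
    rows = [row[:] for row in matrix]  # work on a copy
    rank = 0
    for col in range(len(rows[0])):
        # Look for a row at or below position `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in every other row (addition modulo 2).
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [(x + y) % 2 for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


path = boundary(D1, [1, 0, 0, 1, 0, 0])            # boundary of e12 + e23
composite = product_mod2(D1, D2)                   # the product D1 * D2
independent_boundaries = rank_mod2(D2)             # dimension of B_1
independent_cycles = len(D1[0]) - rank_mod2(D1)    # dimension of Z_1
```

<p>With these matrices, <code>path</code> comes out as <code>[1, 0, 1, 0]</code>, matching \(\partial_1(e_{12}+e_{23}) = v_1 + v_3\), and <code>composite</code> is the \(4 \times 4\) zero matrix, which is the statement \(\partial_1 \partial_2 = 0\) for the tetrahedron. The count of independent 1-cycles relies on the rank-nullity relation: the dimension of the kernel is the number of columns minus the rank. 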
</p> <p>The rank of the \(k\)-boundaries is the <a href="https://en.wikipedia.org/wiki/Rank_(linear_algebra)" target="_blank">rank</a> of \(\partial_{k+1}\), because by definition this describes the maximal number of independent boundaries of \((k+1)\)-chains. On the other hand, the rank of the \(k\)-cycles is the <a href="https://en.wikipedia.org/wiki/Kernel_(linear_algebra)" target="_blank">nullity</a> of \(\partial_k\), because this measures the maximal number of independent \(k\)-chains with no boundary. From linear algebra, we know that the rank of a matrix can be determined by <a href="https://en.wikibooks.org/wiki/Linear_Algebra/Row_Reduction_and_Echelon_Forms" target="_blank">row reducing to echelon form</a> and counting the number of rows (equivalently, columns) that have leading ones. </p> <p>Homology gets its name from the notion of homologous cycles (“homologous” meaning, etymologically, <a href="http://www.etymonline.com/index.php?term=homologous" target="_blank">“having the same position or structure”</a>). Two \(k\)-cycles are <em>homologous</em> if their difference is a \(k\)-boundary. Modulo 2, the difference of two objects is the same as their sum, so this just means that two cycles are homologous if, when we put them together, they form the boundary of an object of one higher dimension. Boundaries are “homologically trivial” because, by definition, they are homologous to the chain consisting of no cells, \(0\). The <em>\(k\)th homology</em> of \(X\) is the <a href="https://en.wikipedia.org/wiki/Equivalence_class" target="_blank">quotient (group, vector space, module, etc.)</a> of the cycles and the boundaries: \[ H_k = Z_k/B_k. \] The associated numeric invariant is the <em>\(k\)th Betti number</em> \(\beta_k\) of \(X\), which is the rank of the \(k\)th homology. It can thus be computed as the difference between the rank of the \(k\)-cycles and that of the \(k\)-boundaries: \[ \beta_k = \mathrm{rank}\,Z_k - \mathrm{rank}\,B_k. 
\] This is the number that “counts” the “\(k\)-dimensional holes” in our space \(X\). Note that this is an ordinary natural number, not an integer modulo 2. However, when working modulo 2, the Betti numbers entirely determine the homology, up to isomorphism. (In ordinary, integral homology, this is not the case: homology may have “torsion” elements, while the Betti numbers only count the “free” part of homology. The integral homology determines the mod 2 homology, but the reverse is not true, so homology modulo 2 is undoubtedly “weaker,” and there are certainly times one would want the full theory. However, I hope this post is illustrating the benefits of using homology modulo 2 as a shortcut for introducing the key concepts.) </p> <h2>Examples of homology</h2> <p>Let’s return to the example of the tetrahedron. Using \(\sim\) for row equivalence, we have <center>\( \partial_1 \sim \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \) and \(\partial_2 \sim \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\). </center>The rank of both matrices is \(3\). The nullity of the first matrix is \(6 - 3 = 3\), and the nullity of the second matrix is \(4 - 3 = 1\). Thus we have \[ \mathrm{rank}\,Z_1 = 3, \qquad \mathrm{rank}\,B_1 = 3, \qquad \mathrm{rank}\,Z_2 = 1, \qquad \mathrm{rank}\,B_2 = 0, \] the last quantity following from the fact that there are no 3-cells in the empty tetrahedron. We also know that \[ \mathrm{rank}\,Z_0 = 4, \qquad \mathrm{rank}\,B_0 = 3, \] with the first of this pair of equations coming from the fact that a point has no boundary. The Betti numbers of the tetrahedron are \[ \beta_0 = 4 - 3 = 1, \qquad \beta_1 = 3 - 3 = 0, \qquad \beta_2 = 1 - 0 = 1. \] Here is a geometric interpretation of these numbers, in reverse order. 
<ul><li>The equation \(\beta_2 = 1\) means that there is one independent 2-cycle which is not a boundary. The reduced form of \(\partial_2\) shows that this cycle is \(f_1 + f_2 + f_3 + f_4\), i.e., the sum of all the faces of the tetrahedron. Thus, when we take all the faces together, the result is a closed cycle, and no other nonempty combination of faces has an empty boundary. Roughly speaking, the entire tetrahedron encloses a “hole.”</li><li>The equation \(\beta_1 = 0\) can be read as “every 1-cycle is a 1-boundary.” A stronger form of this statement is that the tetrahedron is <a href="https://en.wikipedia.org/wiki/Simply_connected_space" target="_blank">simply connected</a>: every loop can be contracted to a point, or every closed loop on the tetrahedron is the boundary of something 2-dimensional. Roughly speaking, there are no holes on the surface of the tetrahedron.</li><li>The “holes” measured by the 0th homology are of a somewhat different type. Generally speaking, the Betti number \(\beta_0\) measures the number of connected components. Because any point has no boundary on its own (hence is a 0-cycle), two vertices together form a boundary if and only if they can be joined by a path of edges. Thus the equation \(\beta_0 = 1\) simply means that the tetrahedron is connected.</li></ul></p> <p>Now let’s turn to the example of the torus, formed from a rectangle by identifying opposite sides. This space has one 2-cell \(f\) (the interior of the rectangle), two 1-cells \(e_1\) and \(e_2\) (the edges of the rectangle, after being identified in pairs), and one 0-cell \(v\) (all four vertices of the rectangle become a single point on the torus). Each edge \(e_i\) appears twice on the boundary of \(f\), and the vertex \(v\) appears at both ends of each edge, so, working modulo 2, the boundary matrices are \[ \partial_1 = \begin{bmatrix} 0 & 0 \end{bmatrix}, \qquad\qquad \partial_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. 
\] Thus every \(k\)-chain has an empty boundary for \(k = 0, 1, 2\), and the rank of the \(k\)-cycles equals the number of \(k\)-cells. The interpretations of \(\beta_0 = 1\) and \(\beta_2 = 1\) are the same as in the case of the tetrahedron. In this case, the equation \(\beta_1 = 2\) tells us there are two different, independent 1-cycles, which can be represented by a latitude circle and a longitude circle on the torus. </p> <h3>Footnote on topological spaces</h3> <p>A few words justifying the restriction to polytopal complexes: When I was in <a href="https://www.math.cornell.edu/~hatcher/AT/ATpage.html" target="_blank">Hatcher’s algebraic topology</a> class, he chose to introduce cellular homology first so that we could get to computations quickly; later he introduced singular homology mainly to prove that the homology groups only depend on the underlying topological space. It thus seems entirely reasonable to me, for purposes of introduction, to work directly with CW complexes. The <a href="https://www.math.cornell.edu/~hatcher/AT/ATapp.pdf" target="_blank">appendix to Hatcher’s book</a> is a standard reference for learning about CW complexes, but in practice a CW complex usually means a <a href="https://en.wikipedia.org/wiki/Topological_space" target="_blank">topological space</a> that is assembled from convex polytopes, attached along their faces. </p> <h3>Another introductory source on homology for undergraduates</h3> <p>I recently came across <a href="http://www.cambridge.org/us/academic/subjects/mathematics/geometry-and-topology/graphs-surfaces-and-homology-3rd-edition?format=PB" target="_blank">Peter Giblin’s book <em>Graphs, Surfaces and Homology</em></a>, which provides a very thorough introduction to its eponymous topics with only the prerequisite of linear algebra. 
However, like most treatments of homology, it first deals with integral homology, then comes around to homology modulo 2 late in the book, in Chapter 8, specifically to deal with non-orientable (or at least unoriented) surfaces and simplicial complexes. Giblin describes the theory of homology modulo 2 as “satisfactory” but “weaker than the theory with integral coefficients,” which is absolutely true. </p> <p>However, if one’s goal is either to learn about homology quickly or to study new spaces (rather than, say, to prove the classification of surfaces), then I think homology modulo 2 is perfectly sufficient, particularly since the contemporary field of persistent homology, applied to study data sets in large dimensions, often works with homology modulo 2. (See <a href="https://users.cs.duke.edu/~edels/Papers/2008-B-02-PersistentHomology.pdf" target="_blank">this survey</a>, or the remark on page 7 of <a href="http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf" target="_blank">this overview</a>, for instance.) </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-16977259576345864692016-05-21T17:37:00.000-07:002016-05-22T16:49:47.220-07:00Snell and Escher<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> <p>A few weeks ago, <a href="https://twitter.com/3Blue1Brown" target="_blank">Grant Sanderson</a> posted a <a href="https://youtu.be/Cld0p3a43fU" target="_blank">video on the brachistochrone</a>, with guest <a href="https://twitter.com/stevenstrogatz" target="_blank">Steven Strogatz</a>. 
<br><center><iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/Cld0p3a43fU" width="560"></iframe></center><br>The video explains Johann Bernoulli’s solution to the problem of finding the brachistochrone, which is a clever application of <a href="https://en.wikipedia.org/wiki/Snell%27s_law" target="_blank">Snell’s Law</a>. I immediately wondered if a similar application could be used to explain the behavior of <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_half-plane_model" target="_blank">geodesics in the hyperbolic plane</a>, which it turns out is true. I’m not the first to think of this, but it doesn’t seem to be well-known, so that’s what I’ll try to explain in this post. This may become my standard way of introducing hyperbolic geometry in informal settings, i.e., when formulas aren’t needed. (As an example of another exposition that describes hyperbolic geodesics this way, see the <a href="http://people.ucsc.edu/~rmont/classes/clGeom2013/Lectures/" target="_blank">lecture notes</a> for <a href="http://people.ucsc.edu/~rmont/classes/clGeom2013/" target="_blank">this geometry course</a>.) </p> <p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VckBpLCUftY/Vxu2ulGc1gI/AAAAAAAACUQ/q6Vrv8zJiB0Ss4EwgTXdkHbL_P_HS0HsACLcB/s1600/Snells_law2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-VckBpLCUftY/Vxu2ulGc1gI/AAAAAAAACUQ/q6Vrv8zJiB0Ss4EwgTXdkHbL_P_HS0HsACLcB/s320/Snells_law2.png" /></a></div>Snell’s Law, as represented in the above diagram (<a href="https://en.wikipedia.org/wiki/Snell%27s_law#/media/File:Snells_law2.svg" target="_blank">image source</a>), applies to light traveling from one medium to another, where the interface between the two is horizontal. 
If light travels at speed \(v_1\) in the first medium and \(v_2\) in the second medium, and its trajectory meets the interface at an angle of \(\theta_1\) and leaves at an angle of \(\theta_2\) (both angles measured with respect to the vertical), then \[ \frac{\sin\theta_1}{v_1} = \frac{\sin\theta_2}{v_2}. \] This is the case of two distinct media. Snell’s Law has a continuous version (derived from the discrete one by a limiting process, as suggested in the video). Suppose light is traveling through a medium with the property that the speed of light at each point depends on the vertical position of the point. That is, the speed of light in this medium at a point \((x,y)\) is a function \(v(y)\), which may vary continuously. At each point of a trajectory of light in this medium, let \(\theta\) be the angle formed by the direction of the trajectory (i.e., the tangent line) and the vertical. Then the quantity \[ \frac{\sin\theta}{v(y)} \] is constant along the trajectory. </p> <p>So suppose we are looking at a medium that covers the half-plane \(y > 0\), in which light travels at a speed proportional to the distance from the \(x\)-axis: \(v(y) = cy\). (The constant \(c\) may be thought of as the usual speed of light in a vacuum, so that along the line \(y = 1\) light moves at the speed we expect. As we shall see, this is analogous to the fact that distances along the line \(y = 1\) in the hyperbolic metric match Euclidean distances. Of course, it also means that light moves faster than \(c\) above this line, which is physically impossible, but we’re doing a thought experiment, so we’ll allow it.) If we imagine someone living inside this medium trying to look at an object, what direction should they face? </p> <p>From our outside perspective, it seems that the observer should look “directly at” the object, in a straight (Euclidean) line. However, in this medium light does not travel along Euclidean line segments, but instead along curved arcs, as illustrated below. 
<br><center><div class="separator" style="clear: both; text-align: center;"><a href="https://www.desmos.com/calculator/pdbcgxevrk" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="https://3.bp.blogspot.com/-orhDCjmbnDg/VziDgLkfrwI/AAAAAAAACVM/NCD8O-S38Ygo4JH4kWoMp2pb59cnBtGmQCLcB/s320/desmos_hyp_geod.png" /></a></div><em>Click on the graph to go to an interactive version.</em></center><br>It’s not too surprising that light follows a path something like this if it’s trying to minimize the time it takes to travel from the object to the observer: the light travels faster at higher vertical positions, so it’s worth going up at least slightly to take advantage of this property, and it’s also worth descending somewhat sharply so as to spend as little time as possible in the lower, slower regions. </p> <p>What may come as a surprise is that the path of least time is precisely a circular arc. With Snell’s Law, however, this fact can be derived quickly. We have that \(v(y) = cy\), and so along a light trajectory \[ \frac{\sin\theta}{cy} = \text{constant}. \] Multiplying both sides by \(c\), we find that \(\frac{\sin\theta}{y}\) is also a constant. If this constant is zero, then \(\theta = 0\) constantly, so the path is a vertical segment. Otherwise, call this constant \(\frac{1}{R}\). Then \(y = R \sin\theta\). Now set \(x = a + R \cos \theta\). The curve \[ (x,y) = (a + R \cos\theta, R \sin\theta) \] parametrizes a circle centered at \((a,0)\) by the angle between the \(x\)-axis and the diameter. It remains to see that this angle \(\theta\) is the same as the angle between the vertical direction and the tangent line at the corresponding point of the circle. This equality can be shown in any number of ways from the diagram below. 
<br><center><div class="separator" style="clear: both; text-align: center;"><a href="https://www.desmos.com/calculator/yh4ysj28as" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="https://3.bp.blogspot.com/-jX6AudRysz4/VzpBFOhPmpI/AAAAAAAACVw/B3BGoH90iH4iuMzE3bsklnEDGmWfFiM4ACLcB/s320/desmos_cong_angles.png" /></a></div><em>Click on the graph to go to an interactive version.</em></center><br>This is <em>not</em> to say that this parametrization describes the speed at which light moves along the path. As previously observed, light slows as it approaches the horizontal boundary, that is, the \(x\)-axis. </p> <p>But perhaps we’ve been prejudiced in assuming our perspective is the right one. We’ve been looking with our Euclidean vision and supposing light moves at different speeds depending on where it is in this half-plane. Thus it seems to us that light covers Euclidean distances more quickly the further it gets from the \(x\)-axis. But relativity teaches us that distance isn’t absolute: instead, the speed of light is what’s absolute. So perhaps we could gain greater insight by measuring the distance between points according to how long it takes light to travel between them. That is, we <em>assume</em> that the paths determined above are the geodesics of the half-plane, and by doing so we learn to “see” hyperbolically. 
Then we are not troubled by looking at an image like <div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-tgnebmVBXAs/V0D8vrKujXI/AAAAAAAACWc/O3fPau8NwPQhGWlXHOTS0zZ1iYuqLtvLgCLcB/s1600/tiling_425_uhp.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-tgnebmVBXAs/V0D8vrKujXI/AAAAAAAACWc/O3fPau8NwPQhGWlXHOTS0zZ1iYuqLtvLgCLcB/s400/tiling_425_uhp.gif" /></a></div>(<a href="http://bulatov.org/math/1001/" target="_blank">image source</a>) and being told that all of the pentagonal shapes are the same size, because we’ve learned to look at things with our hyperbolic geometry glasses on. </p> <p>M. C. Escher illustrated (or, more accurately, approximated) the hyperbolic geometry of the upper half-plane with his print <em>Regular Division of the Plane VI</em> (1958), shown below (<a href="http://www.wikiart.org/en/m-c-escher/regular-division-of-the-plane-vi" target="_blank">image source</a>). <br><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-kKu1CxvKCfw/Vzj14rter9I/AAAAAAAACVg/E8YStYahrAoN0-l-nteT3LQkz6psyuHRQCLcB/s1600/Escher6.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-kKu1CxvKCfw/Vzj14rter9I/AAAAAAAACVg/E8YStYahrAoN0-l-nteT3LQkz6psyuHRQCLcB/s400/Escher6.jpg" /></a></div>This design was created during a time Escher was attempting to visually depict infinity. It was shortly before he had encountered the <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model" target="_blank">Poincaré disk</a> in a paper by Coxeter, which discovery led to the <a href="http://euler.slu.edu/escher/index.php/Escher%27s_Circle_Limit_Exploration" target="_blank"><em>Circle Limit</em></a> series. In this print, the geometry of each lizard is Euclidean, structured around an isosceles right triangle. 
Each horizontal “layer” has two sizes of triangles, one scaled down from the other by a factor of \(\sqrt{2}\). The side lengths of the triangles in one layer are one-half of those in the layer above, so the heights of layers converge geometrically to the horizontal boundary at the bottom. Some of the triangles are outlined in the next image. <div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3jPUmDw2X0c/V0DxzwxlQNI/AAAAAAAACWI/hkyLy1gik_AoNnaZlldF6z6awb8Wsmm1gCLcB/s1600/Escher6-tiles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-3jPUmDw2X0c/V0DxzwxlQNI/AAAAAAAACWI/hkyLy1gik_AoNnaZlldF6z6awb8Wsmm1gCLcB/s400/Escher6-tiles.png" /></a></div></p> <p>Some questions I have about Escher’s print: <ul><li>How different would this image look if it were drawn according to proper hyperbolic rules, with each lizard having reflectional symmetry, and each meeting of “elbows” having true threefold symmetry? (This would give the tessellation with Schläfli symbol {3,8}, an <a href="https://en.wikipedia.org/wiki/Order-8_triangular_tiling" target="_blank">order-8 triangular tiling</a>.)</li><li>If we suppose that the right triangles act as prisms, with light moving at a constant speed inside each one, but this speed being proportional to the square root of the triangle’s area, then what will the trajectories of light look like as it moves through the plane? Will they approximately follow circles?</li><li>How many lizards are in the picture?</li></ul></p> <p><b>Coda:</b> <a href="http://www.josleys.com/index.php" target="_blank">Jos Leys</a> has <a href="http://www.josleys.com/show_gallery.php?galid=325" target="_blank">taken some of Escher’s Euclidean tessellations and converted them to hyperbolic ones</a>, in both the disk and the half-plane model. 
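</p> <p><b>A numerical footnote:</b> the circular-arc claim from the Snell’s Law derivation earlier in this post can also be tested by machine. The following Python sketch is only an illustration under stated assumptions: the function names <code>snell_invariant</code> and <code>travel_time</code> are hypothetical, the constant \(c\) is set to 1, and the travel time is approximated by chopping each path into many short chords. It checks that \(\frac{\sin\theta}{y}\) stays equal to \(\frac{1}{R}\) along a circle centered on the \(x\)-axis, and it compares the time light needs along such an arc with the time along the Euclidean segment joining the same endpoints.</p>

```python
import math


def snell_invariant(a, R, phi):
    """sin(theta)/y at the point (a + R cos(phi), R sin(phi)), for 0 < phi < pi.

    theta is the angle between the tangent line and the vertical direction.
    """
    y = R * math.sin(phi)
    # Tangent vector to the circle at parameter phi.
    tx, ty = -R * math.sin(phi), R * math.cos(phi)
    # The sine of the angle with the vertical is the horizontal share of the tangent.
    sin_theta = abs(tx) / math.hypot(tx, ty)
    return sin_theta / y


def travel_time(path, n=20000, c=1.0):
    """Approximate travel time along path(t), 0 <= t <= 1, when the speed is v(y) = c*y."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, n + 1):
        x1, y1 = path(i / n)
        ds = math.hypot(x1 - x0, y1 - y0)        # length of a short chord
        total += ds / (c * 0.5 * (y0 + y1))      # divided by the average speed on it
        x0, y0 = x1, y1
    return total


# The invariant is the same at every sampled point of a circle with a = 2, R = 3.
values = [snell_invariant(2.0, 3.0, k * math.pi / 10) for k in range(1, 10)]
assert all(abs(v - 1 / 3.0) < 1e-12 for v in values)

# Two routes from (1, 1) to (-1, 1): a circular arc centered at the origin,
# and the horizontal Euclidean segment.
R = math.sqrt(2.0)
arc = lambda t: (R * math.cos(math.pi / 4 + t * math.pi / 2),
                 R * math.sin(math.pi / 4 + t * math.pi / 2))
segment = lambda t: (1.0 - 2.0 * t, 1.0)

print("time along arc:    ", travel_time(arc))      # about 1.76
print("time along segment:", travel_time(segment))  # about 2.0
```

<p>For the arc the integral can be done by hand, giving \(2\ln(1+\sqrt{2})\), which is smaller than the time \(2\) along the segment; the numerical estimate agrees with this value. 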
</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com2tag:blogger.com,1999:blog-30611202.post-88770204618705300112016-04-02T22:18:00.001-07:002016-04-02T23:14:57.934-07:00using calculus to understand the world<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>In my <a href="http://thalestriangles.blogspot.com/2015/11/remove-antithesis.html">last post</a>, I wrote about how I returned to teaching related rates in my calculus class and ranted a bit about the inanity of most related rates problems. There I mainly discussed the difficulty in reading the statement of such problems and how to make the questions they raise seem more natural. I’d like to expand on this theme with some more examples.</p> <p>One feature of mathematics that doesn’t get emphasized enough, IMHO, is that it is a <em>science,</em> and as such is based in <em>observation.</em> Often, either we lead students through abstract reasoning to a previously unanticipated result, or we prove things that are so self-evident that the notion they need proof is itself baffling. Now, in the world of professional mathematics, it is true that even apparently obvious facts need proving (remember that “to prove” just means “to test”), and we often do get excited when we are led to something unexpected and beautiful. That is because we have learned how to use and trust our logical skills to examine the truth of something, and we delight in the uncovering of new truth by means of those skills. But even when a result is surprising to an audience, and even if it was at first surprising to the speaker, it is no longer so. A mathematical researcher plays with ideas until she notices something interesting, and then she tries to understand why it is so. 
That’s the exciting part of math, and that is what I believe we can share with our students through the process of modeling.</p> <p>My goal in teaching related rates has become to ground as many questions as possible in direct observation. When I ask about the sliding ladder, as I described in my <a href="http://thalestriangles.blogspot.com/2015/11/remove-antithesis.html">last post</a>, before setting up the math but after asking students what they think will happen, I demonstrate by leaning a ruler against a book and slowly pulling the bottom end away. What one notices in this experiment is that the top end of the ruler moves <em>very slowly</em> at first, and <em>very quickly</em> just before reaching the ground. The speed in the final moment is so great that one is tempted to think the person pulling the bottom end lost control, and gravity took over. (This is even more credible when using the much larger, heavier ladder in a demonstration.) But the math shows that even if the person keeps complete control and moves at precisely the same speed, the same effect will occur. Let’s see why.</p> <p>The exact length of the ladder doesn’t matter, of course, so call it $L$. If $x$ measures the distance from the wall to the bottom end of the ladder and $y$ measures the distance from the floor to the top of the ladder, then we have $x^2 + y^2 = L^2$. Then we differentiate both sides with respect to time and get $2x\frac{dx}{dt} + 2y\frac{dy}{dt} = 0$, or \[ \frac{dy}{dt} = -\frac{x}{y} \frac{dx}{dt}. \] At this point most related rates problems would ask you about the size of $dy/dt$ for some particular values of $x$, $y$, and $dx/dt$, but look at how much we can determine just from this related rates equation: when $y$ is larger than $x$, the top end is moving more slowly than the bottom end, and conversely when $x$ is greater than $y$, the top end is moving more quickly than the bottom end. 
There is just one moment when the two ends are moving at the same speed, which is when $y = x$, or in other words, when the ladder is at a 45 degree angle. And as the distance between the top end and the floor approaches zero, the speed of the top end approaches infinity. (Not physically possible, of course, but it explains why there’s such a quick movement at the end of the process.) There; now I feel like I’ve learned something!</p> <p>I have five more examples to illustrate how much more interesting I think related rates are when tied to direct observation. This will probably belabor the point, but unfortunately these examples are also stripped of any interest by focusing too much on a single moment in time, which is what every standard textbook does with them.</p> <p>The next example involves inflating a balloon. This, again, is easy to demonstrate. I can’t take a deep enough breath to fill the whole balloon at once, but even with two puffs, exhaled at a near-constant rate, it’s obvious to students that the size (i.e., diameter, or radius) of the balloon grows more quickly at first, then more slowly. Anyone who’s worked with an air or helium tank has surely experienced this phenomenon. Why is this happening? And how much more slowly is the diameter increasing as time goes on? Here there’s very little modeling involved; essentially the entire model is provided by assuming the balloon is a sphere and using the formula for the volume of a sphere in terms of its radius, $V = \frac{4}{3}\pi r^3$. Differentiating with respect to time gives the relation $\frac{dV}{dt} = 4\pi r^2 \frac{dr}{dt}$, so \[ \frac{dr}{dt} = \frac{1}{4\pi r^2}\frac{dV}{dt}. \] Some books reach this equation, but as with the ladder they again jump to plugging in values for a specific time, rather than noting the following: if $dV/dt$ is constant, then $dr/dt$ is inversely proportional to the <em>square</em> of the radius! 
And even more, $4\pi r^2$ is the <em>surface area</em> of a sphere with radius $r$, so this relationship between rates is <em>directly related</em> to the fact that the derivative of the volume of a sphere with respect to its radius is the surface area! That is, the size of the surface of the balloon is what determines, together with the rate the volume is increasing, how quickly the radius is increasing.</p> <p>A similar phenomenon happens with the standard filling-an-inverted-cone problem. The demonstration here involves a martini glass and some colored water. (As I promise my students when doing this experiment, it’s just water.) My martini glass is about 12 cm across on top, and about 8 cm deep, giving it a volume of 300 milliliters. (That’s about 10 ounces, the size of two regular martinis—you don’t want to fill this glass with gin and drink it too quickly.) Having a nice big glass is useful for this demonstration: if I pour at a constant rate, the water level rises much more slowly near the top than near the bottom. The math shows just how much more slowly. The volume of a cone with height $h$ and base radius $r$ is $V = \frac{1}{3}\pi r^2 h$. From the geometry of this situation (using similar triangles, for instance), for my glass the radius of the surface of the water is always three-quarters of the water’s depth (here interpreted as height). We could use the relation $r = \frac{3}{4}h$ and substitute into the volume formula to get rid of the variable $r$, but there’s also no harm (as my students taught me) in differentiating first, using the product rule: \[ \frac{dV}{dt} = \frac{\pi}{3} \left( 2rh\frac{dr}{dt} + r^2\frac{dh}{dt}\right). \] Notice that this formula is valid for <em>all</em> cones varying in height, radius, and volume, whether or not the height and radius are linearly related at all times. The most obvious quantity of interest (assuming constant $dV/dt$) is $dh/dt$. 
From $r = \frac{3}{4}h$ we get $\frac{dr}{dt} = \frac{3}{4}\frac{dh}{dt}$, and also $h = \frac{4}{3}r$. The reason to solve for both of these quantities is that, by keeping both $dh/dt$ and $r$ in the equation and substituting out $h$ and $dr/dt$, we get $\frac{dV}{dt} = \frac{\pi}{3}\left(2r^2\frac{dh}{dt}+r^2\frac{dh}{dt}\right)$, or, after solving for $dh/dt$, \[ \frac{dh}{dt} = \frac{1}{\pi r^2} \frac{dV}{dt}. \] First of all, the ratio between the height and the radius has disappeared, so this formula now works for any inverted cone, not just my martini glass. And second of all, just as with the balloon, the rate at which the height increases depends on the “surface area” that is expanding, which in this case is just the base of the cone! Thus, again, the reason the water level rises more slowly near the top of the glass has a clear geometric interpretation. (Here’s a <b>real-world application</b>: I argue this works to the benefit of bartenders, who can pour into a martini glass fairly quickly without risk of overflowing, because the beverage level rises slowly near the top of the glass.)</p> <p>I took the next two examples from Cornell’s <a href="http://www.math.cornell.edu/~GoodQuestions/materials.html" target="_blank">Good Questions Project</a>; come to think of it, it may be these questions that first planted in my head the idea of looking at related rates problems over time, without numbers. The situations are again standard for related rates problems, but the conclusions are much more interesting than a single rate at a single moment.</p> <p>Consider an actor (say, Benedict) on a stage, illuminated by a light at the foot of the stage. Benedict casts a shadow on the back wall; how does the length of his shadow vary if he walks towards the light at a constant speed? 
The demonstration of this situation is particularly exciting, because you get to turn off the classroom lights, pull out a flashlight and a doll or figurine, and watch what happens to the shadow of the doll/figurine/actor on the wall as it moves towards the flashlight. Students observe that at first the shadow grows slowly (when the figure is close to the wall), then more quickly as he approaches the light. Modeling this situation generally provides the first major geometric hurdle for my students, because it involves the imagined line that emanates from the light, passes by Benedict’s head, and finally reaches the back wall, thereby determining the height of the shadow. (I wonder if many of them have never thought about the geometry of how shadows relate to the objects that cast them.) I’ll let the reader work out the fact that, if Benedict’s height is $h$, the distance from the light to the back wall is $D$, the distance from Benedict to the light is $x$, and the height of the shadow is $s$, then $\frac{s}{D} = \frac{h}{x}$. (Hint: use similar triangles.) Here the only variables are $x$ and $s$, so the related rates equation is \[ \frac{ds}{dt} = -\frac{hD}{x^2}\frac{dx}{dt}. \] Students are at first perplexed by the negative sign: shouldn’t the shadow be increasing? If so, why does its derivative appear to be negative? Then they realize: ah, if Benedict is walking <em>towards</em> the light, then $dx/dt$ is negative, so $ds/dt$ is in fact positive! And so it becomes clear that the height of the shadow increases much more rapidly when Benedict is near the light than when he is near the wall. 
(I generally give specific values for the height of the actor and the distance from the wall to the light in this question, so that it’s more obvious which values are constant.)</p> <p>I don’t have a standard demonstration for this next problem, because I use it as a quiz question (although maybe not anymore, now that I’ve written about it here), but it’s easy enough to devise an experiment. This situation is similar enough to the previous one that its result is a bit surprising. Suppose a streetlight at height $L$ is the only source of illumination nearby, and a woman (say, Agatha) of height $h$ walks at a constant speed away from the light. As she gets farther away from the light, does her shadow grow more quickly, more slowly, or does it grow at a constant rate? If $x$ again denotes the distance to the light (well, really from Agatha’s feet to the base of the lamp, which is not the same as her distance to the source of illumination), and $s$ is the length of Agatha’s shadow, then similar triangles produce the relation $\frac{s}{h} = \frac{s + x}{L}$. We can rearrange this into a simple proportion between $s$ and $x$: $s = \frac{h}{L-h} x$. (Here’s an interesting feature of this equation already: it only makes sense if $h < L$, that is, if Agatha is shorter than the lamppost!) Now we differentiate to get \[ \frac{ds}{dt} = \frac{h}{L - h} \frac{dx}{dt}. \] So if Agatha’s speed is constant, then her shadow’s length is also increasing at a constant rate. This example shows especially well why it’s dumb to look at related rates at a single moment in time. Most book exercises of this sort ask how quickly the shadow is growing when Agatha is at a particular distance from the lamp. <em>But it doesn’t matter how far away she is,</em> and the math proves that it doesn’t matter.</p> <p>There’s a risk in related rates exercises to always resort to problems that only involve differentiating polynomials, so here’s an example that uses trigonometric functions. 
The demonstration I use: I walk back and forth in front of the class and tell the students to be mindful of what their heads do as they follow my movement. After a couple of times, several of them observe that their heads must turn more quickly when I’m closer to them. I point out that this is something anyone who’s had to run a video camera at a race must be aware of. (It’s also apparent to someone riding in the passenger seat of a car, keeping their gaze fixed on a single tree or other immobile object: for a long time, your head turns little, but when you’re close to the object, you have to turn quickly to keep it in view.) I generally set up the problem on the board as though it is taking place at a racetrack. Suppose a runner is moving along a track (let’s assume it’s straight for simplicity) at $v$ feet per second. You’re watching from a position $D$ feet away from the track. How quickly does your head need to turn to keep following the runner? The answer depends on how far away the runner is. One has to introduce a reasonable coordinate system and some useful variables: good choices are the position $x$ of the runner relative to the point of the track closest to you, and the angle $\theta$ by which your head is turned from looking at this closest point. Then we get the relation $\tan\theta = \frac{x}{D}$, and differentiating with respect to time results in the equation $\sec^2\theta \frac{d\theta}{dt} = \frac{1}{D} \frac{dx}{dt}$, or \[ \frac{d\theta}{dt} = \frac{v}{D} \cos^2\theta \] (using the assumption that $dx/dt = v$). When $\theta = 0$, so that the runner is closest to you, the rate at which your head turns is $v/D$, which depends only on how fast the runner is going and how far away from the track you are. (Notice that the units work out: the radian measure of an angle is technically dimensionless, and so we expect its rate of change not to have any dimension other than 1/time. 
Since $v$ has the dimension of distance/time and $D$ has the dimension of distance, $v/D$ has the dimension 1/time.) As $\theta$ increases (in this scenario, $\theta$ is never greater than a right angle), the change in the angle of your head to follow the runner happens more slowly, because $\cos^2\theta$ is closer to zero.</p> <p>These are just a few examples of standard situations involving related rates that become much more interesting when the myopic attention to a single moment in time is removed. I’m sure most readers of this post can do the calculations I’ve shown on their own, but the tendency to home in on a single rate at a single point in time is so entrenched that I wanted to show what is gained when that element is removed. I don’t know that my students are better at solving related rates problems than other students, but I have noticed that they’re much less likely to insert specific quantities into a relation before it’s necessary than when I taught the subject years ago. I haven’t had time to strip all such problems of the detritus that comes with wanting a numeric answer, but I believe our understanding (and our calculus students’ understanding) of the world will be much improved by making the effort to transform these problems into meaningful questions.</p> <p>Here are two other examples that I won’t work out in detail. One scenario has a boat being pulled into a dock by a rope attached to a pulley elevated some distance above the boat. If the rope is pulled at a constant rate, the boat in fact <em>speeds up</em> as it approaches the dock! (I tried demonstrating this once with a string tied to a stuffed animal pulled across a desk, with moderate success.) Another common type of problem considers two boats moving in perpendicular directions (or cars moving along perpendicular roads), and asks at a certain point in time whether the distance between them is increasing or decreasing. That’s silly. 
Why not establish the relation between them, and ask at what times the distance is increasing, and at what times the distance is decreasing? If there’s a time when the rate of change in distance is zero, then the boats (or cars) are at their closest (or farthest) positions, which connects to the study of optimization, which has its own set of issues…</p><br><p>P.S. I should have known better than to look at <a href="https://www.khanacademy.org/math/differential-calculus/derivative-applications/rates-of-change/" target="_blank">Khan Academy’s treatment of related rates</a>. His videos show all the marks of what is classically wrong with these problems: the irrelevant information of what variables equal at a single moment in time is presented up front along with everything that’s constant in the situation, and in the end the answer is a single, uninformative number. Even when an interesting equation is present on the screen, Khan rushes past it to get to the final number. How can we get our students to ask and answer more interesting questions than these, about the same situations?</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com1tag:blogger.com,1999:blog-30611202.post-58686923498244217662015-11-02T21:46:00.000-08:002015-11-02T21:59:47.833-08:00remove the antithesis<p>Today, for the first time in years, I included a <a href="http://tutorial.math.lamar.edu/Classes/CalcI/RelatedRates.aspx" target="_blank">related rates</a> lesson in my calculus class. I had never liked related rates, and when I got my own class and could create my own syllabus, I dropped the topic. This fall I’m at a new school, though, and I decided while revamping my course plans to give related rates another shot.</p> <p>Background: related rates didn’t sit well with me for a long time before I could enunciate why. 
Then I learned about the notion of <a href="https://nrich.maths.org/7701" target="_blank">“low-threshold, high-ceiling” tasks</a>, which provide multiple levels of entry for students, as well as a lot of space for growth and exploration. I realized that the classic related rates problems fail both tests. They generally have a <em>high</em> threshold, because students have to understand the entire process of translating the word problem into symbols, then differentiating, then solving, before they have any measurable confidence that they can begin such a problem. And they generally also have a <em>low</em> ceiling, because once the immediate question has been answered, there is no enticement to do further analysis, or even any indication such analysis is possible.</p> <p>As an example, consider the problem of the sliding ladder. This is included in almost every textbook section on related rates, in almost exactly the following form. <blockquote>A 10-foot long ladder is leaning against a wall. If the bottom of the ladder is sliding away from the wall at 1 foot per second when it is six feet away from the wall, how quickly is the top of the ladder sliding downward at that instant?</blockquote>Now, that is an incredibly difficult problem to read. Some books may have slightly better phrasing (I decided not to quote any book in particular, so as not to single out just one malefactor), but the gist is the same. Before you’ve even gotten a sense of what the situation is and what’s changing, you’re asked a question involving bits of data that seem to come out of nowhere, and whose answer is completely uninspiring.</p> <p>Like I said, for a few years my solution was to avoid these types of problems entirely. I had seen too many students struggle to set up these problems and go through the motions of solving them, only to get a single number at the end that showed nothing other than their ability to set up and solve a contrived problem. 
What I realized while preparing for today’s class is that, when the problem is done, you don’t feel like you’ve <em>learned</em> anything about how the world works. Calculus is supposed to be about <em>change</em>, yet the problem above feels static because it only captures a single moment in a process. A static answer is antithetical to the subject of calculus. Moreover, most related rates problems arise out of nowhere, flinging information at the reader willy-nilly to answer a single question, despite a decidedly unnatural feel to these questions. Unnatural questions are antithetical to mathematics. So I decided to remove these elements of antithesis as best as I could.</p> <p>Here is the question I posed at the start of class today. <blockquote>A 10-foot long ladder is leaning against a wall. Suppose you pull the bottom away from the wall at a rate of 1 foot per second. At the same time, the top of the ladder slides down the wall. Does it:<ul><li>slide down at a constant rate,</li><li>start out slowly, then speed up, or</li><li>start out quickly, then slow down?</li></ul></blockquote>I claim this version is both more natural and easier to start discussing than the near-ubiquitous original. It almost seems like a question one might come up with on one’s own. It’s clear what quantities are changing, and that there is a relationship between them. The process itself can be demonstrated; I used a ruler and a book, rather than bringing a ladder to class. (How would you demonstrate that instantaneous rate of change in the original problem?) No overly specific information is given. And best of all, the answer is a bit surprising, at least to some. (When I asked my students what they thought after a couple minutes of discussion, about half thought the top would start slowly, then speed up, and about half thought it would slide at a constant rate.)</p> <p>I don’t claim any originality in this idea. 
Probably many other excellent math teachers have made exactly this change. I may have encountered it as one of <a href="https://youtu.be/BlvKWEvKSi8" target="_blank">Dan Meyer’s examples</a>, or somewhere else, and it stuck in the back of my mind. I should emphasize that I <em>really</em> hated related rates problems, and I saw little chance of rehabilitating them. I’m glad to have realized that they can be interesting and reveal interesting things about the world, when they are restored to the state of natural questions.</p> <p>I doubt today’s lesson was perfect. I still probably talked too much and introduced symbols too quickly. But it was good enough that I’m going to keep teaching related rates in my calculus classes from now on.</p> <p>If you have other examples of this better type of related rates problem, please share in the comments!</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com1tag:blogger.com,1999:blog-30611202.post-32847804070399546732014-10-11T07:10:00.000-07:002014-10-16T06:55:58.412-07:00geometry at the fair<p>Last month, West Springfield once again hosted the Eastern States Exposition (or <a href="http://www.thebige.com/fair/" target="_blank">“The Big E”</a>), which brings together fair activities from six states: Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island. It’s great fun to attend, and includes displays of the finest crafts to have competed in county and state fairs from all over the northeastern U.S. in the past year. This means, for instance, that there are a bunch of great quilts. </p><p>Symmetry naturally plays a large part in the design of these quilts. The interplay between large-scale and small-scale, and between shapes and colors, creates aesthetic interest. This quilt, for instance, presents squares laid out in a basic tiling pattern (a <a href="http://en.wikipedia.org/wiki/Square_lattice" target="_blank">square lattice</a>). Each square contains a star-shaped figure. 
The star itself has fourfold <a href="http://en.wikipedia.org/wiki/Dihedral_group" target="_blank">dihedral symmetry</a>, which matches the symmetry of the lattice, but the choice of colors in the stars breaks the symmetry of the reflections, resulting in <a href="https://en.wikipedia.org/wiki/Rotational_symmetry" target="_blank">cyclic (i.e., pure rotational) symmetry</a>. <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-B5SaeT9s9AA/VDkgoN3ouFI/AAAAAAAABrA/oKt5S771aOg/s1600/IMG_0031.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-B5SaeT9s9AA/VDkgoN3ouFI/AAAAAAAABrA/oKt5S771aOg/s320/IMG_0031.jpg" /></a></div>This quilt also shows fourfold dihedral symmetry in the shapes, which is broken into cyclic symmetry by the colors. It hints at eightfold (octagonal) symmetry in some places, but this is broken into fourfold symmetry by the colors and by the relationship of these shapes to the surrounding stars. <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-psuc9DqueL0/VDkmgoxOrAI/AAAAAAAABrU/s27fWH4mPE0/s1600/IMG_0025.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-psuc9DqueL0/VDkmgoxOrAI/AAAAAAAABrU/s27fWH4mPE0/s320/IMG_0025.jpg" /></a></div>This pattern shows fourfold cyclic symmetry at the corners, but that’s not what first caught my eye. The basic tile is a rectangle, which has the symmetry of the <a href="http://en.wikipedia.org/wiki/Klein_four-group" target="_blank">Klein four-group</a> (no, not <i>that</i> <a href="http://youtu.be/BipvGD-LCjU" target="_blank">Klein Four Group</a>). For the two quilts above, I first noticed the large-scale symmetry that was broken at the small scale; here I first saw the limited small-scale symmetry that is arranged in such a way as to produce large-scale symmetry. 
(I think this is because I tend to notice shapes before colors.) <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-pmmW-9W2co4/VDkozUDilSI/AAAAAAAABrk/zbphYRAvu4c/s1600/IMG_0027.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-pmmW-9W2co4/VDkozUDilSI/AAAAAAAABrk/zbphYRAvu4c/s320/IMG_0027.jpg" /></a></div>This quilt uses the square lattice on the large scale, but varies the type of small-scale symmetry. Each square contains the same shapes, but they are colored differently so that sometimes the symmetry is dihedral, sometimes cyclic. <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-oPetaC47JJE/VDkyK0YonbI/AAAAAAAABr0/F_pCU3GjHLc/s1600/IMG_0032.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-oPetaC47JJE/VDkyK0YonbI/AAAAAAAABr0/F_pCU3GjHLc/s320/IMG_0032.jpg" /></a></div>This next quilt is geometrically clever in many ways. It has no reflection symmetries, even disregarding the colors, although the basic shapes that comprise it (squares and a shape with four curved edges, two concave and two convex, for which I have no name <b><i>Edit 10/15</i>:</b> In <a href="https://twitter.com/Thalesdisciple/status/521070961131810816" target="_blank">an amusing exchange on Twitter</a>, I learned that this shape is described among quilters as an <a href="https://www.google.com/search?q=quilt+apple+core" target="_blank">“apple core”</a>) do have reflection symmetries. (I am disregarding the straight lines that cut the <strike>curved shapes</strike> apple cores into smaller, non-symmetric pieces.) The centers of the squares lie on a lattice that matches the orientation of the sides of the quilt, but the sides of the squares are not parallel to the sides of the quilt. 
The introduction of curved shapes also acts in tension with the rectangular frame provided by the quilt medium. <div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-DArU_x5vdP0/VDkmgm1ZybI/AAAAAAAABrQ/tcLmAQdSKc8/s1600/IMG_0024.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-DArU_x5vdP0/VDkmgm1ZybI/AAAAAAAABrQ/tcLmAQdSKc8/s320/IMG_0024.jpg" /></a></div>Some of the quilt designs rejected fourfold symmetry altogether. Here is one based on a hexagonal lattice: <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-PK97Xc07sCU/VDkzX1wKQZI/AAAAAAAABsA/WLAiNhvCa3Q/s1600/IMG_0026.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-PK97Xc07sCU/VDkzX1wKQZI/AAAAAAAABsA/WLAiNhvCa3Q/s320/IMG_0026.jpg" /></a></div>and another based on a triangular lattice: <div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-86_PNJO7OgU/VDkzXwxgXHI/AAAAAAAABsE/TosLz4lc6cE/s1600/IMG_0033.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-86_PNJO7OgU/VDkzXwxgXHI/AAAAAAAABsE/TosLz4lc6cE/s320/IMG_0033.jpg" /></a></div>(<a href="http://en.wikipedia.org/wiki/Hexagonal_lattice" target="_blank">These two lattices have the same symmetries.</a>) </p> <p>Here is a quilt that stands out. It appears to simply be pixellated: <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-Xzu-M5JdUY0/VDk33sBIKmI/AAAAAAAABsw/IbHwvOfdBw8/s1600/IMG_0038.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-Xzu-M5JdUY0/VDk33sBIKmI/AAAAAAAABsw/IbHwvOfdBw8/s320/IMG_0038.jpg" /></a></div>but if you look closely, you’ll see that the “pixels” are not squares, but miniature trapezoids. 
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-lkoNzv-P5c0/VDk33KaD9mI/AAAAAAAABso/2Ic1hd-qnnM/s1600/IMG_0038%2Bdetail.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-lkoNzv-P5c0/VDk33KaD9mI/AAAAAAAABso/2Ic1hd-qnnM/s320/IMG_0038%2Bdetail.jpeg" /></a></div>It therefore has <i>no</i> points that display fourfold symmetry. All rotational symmetries are of order two.</p> <p>All of the types of symmetries of the above quilts (except, perhaps, the one that used some tiles with dihedral symmetry, some with merely cyclic) can be described using <a href="http://en.wikipedia.org/wiki/Wallpaper_group" target="_blank">wallpaper groups</a>, which I leave as an exercise for the reader. </p> <p>This next design seems more topological than geometric: it is full of <a href="http://en.wikipedia.org/wiki/Knot_(mathematics)" target="_blank">knots</a> and <a href="http://en.wikipedia.org/wiki/Link_(knot_theory)" target="_blank">links</a>. <div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-GrfVPmJ05NA/VDk0j2LnpZI/AAAAAAAABsU/rDaNl82fVNk/s1600/IMG_0028.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-GrfVPmJ05NA/VDk0j2LnpZI/AAAAAAAABsU/rDaNl82fVNk/s320/IMG_0028.jpg" /></a></div>This quilt has an underlying square lattice pattern, but the use of circles again evokes links, at least for me. <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-JY75VYgPSt4/VDk28IAxCKI/AAAAAAAABsg/dX4x9JeT7ZA/s1600/IMG_0039.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-JY75VYgPSt4/VDk28IAxCKI/AAAAAAAABsg/dX4x9JeT7ZA/s320/IMG_0039.jpg" /></a></div></p> <p>It was a surprise to come across a quilt with fivefold symmetry, but it makes perfect sense for a tablecloth. 
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-GIbePmzawXU/VDk5L3KoyCI/AAAAAAAABs8/1EI7I93mfrg/s1600/IMG_0037.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-GIbePmzawXU/VDk5L3KoyCI/AAAAAAAABs8/1EI7I93mfrg/s320/IMG_0037.jpg" /></a></div></p> <p>Finally, this quilt was just gorgeous. The underlying pattern is simple—again a square lattice—but the diagonal translations are highlighted by the arrangement of the butterflies. <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-9CrHqsGmNDk/VDk6M4eweII/AAAAAAAABtM/hIqzjAiypxg/s1600/IMG_0034.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-9CrHqsGmNDk/VDk6M4eweII/AAAAAAAABtM/hIqzjAiypxg/s400/IMG_0034.jpg" /></a></div>As you can see, it was decorated as “Best of Show”. We were particularly happy to see it receive this prize, because we had previously seen it in Northampton’s own <a href="http://www.threecountyfair.com/" target="_blank">3 County Fair</a>! </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com2tag:blogger.com,1999:blog-30611202.post-72677794733494170422014-08-22T06:54:00.000-07:002014-08-22T07:07:23.004-07:00formative assessment isn’t scary<p>I get a little jumpy around nomenclature. This probably comes from being a mathematician; we spend a lot of time coming up with names for complex ideas so that they’re easier to talk about. <a href="http://www.amphilsoc.org/sites/default/files/proceedings/1570204Graham.pdf" target="_blank">Naming a thing gives you power over it</a> and all that. So when we come across a new name, it could take anywhere between a few minutes and a few months to unpack it. 
An <a href="http://simple.wikipedia.org/wiki/Abelian_group" target="_blank"><em>abelian group</em></a>, for instance, can be completely and formally defined very quickly, whereas a rigorous definition of <a href="https://www.math.ucdavis.edu/~kapovich/EPR/T.pdf" target="_blank"><em>Teichmüller space</em></a> often takes several weeks in a course to reach. Some things are in between, easy to define but not-so-easy to figure out why the object has a special name (see <a href="http://www.ams.org/notices/200307/what-is.pdf" target="_blank"><em>dessin d’enfant</em></a>). Very often a major step along the way to understanding something is grasping the simplicity—the inevitability, even—of its definition. </p> <p>So it is with <em>formative assessment</em>. When I first learned about the formative/summative assessment distinction, I got nervous. I thought, “So, besides giving tests and quizzes, I need to be doing a whole bunch of other things in class to find out what students are thinking? How much more class time will this take? How much more preparation will it take? How will I ever incorporate this new feature into my class, and how bad will it be if I don’t manage to?” I think I got caught up in the impressiveness of the term <em>assessment</em>; that seemed like a big “thing”, and doing any kind of assessment must require a carefully crafted and substantial process. </p> <p>So let’s back up a bit. In teaching, assessment means anything that provides an idea of students’ level of understanding. If it’s not graded, it’s formative. </p> <p>That’s it. </p> <p>As a teacher, unless you have literally never asked “Are there any questions?”, you have done formative assessment. Asking “Are there any questions?” is a crude and often ineffective means of formative assessment, but it is assessment nonetheless. 
You and I are <em>already doing</em> formative assessment, which means that we don’t have to <em>start</em> doing it; we can instead turn to ways of doing it <em>better</em>. Somehow I find that easier. </p> <p>“Formative assessment” is more like “abelian group” than “Teichmüller space”. If you have ever added integers, you have worked with an abelian group. But having an easily-grasped definition doesn’t have to mean that a concept is limited. In fact, simple definitions can often encompass a broad range of ideas, which happen to share a few common features. There are entire theorems and theories built on abelian groups. Naming a thing gives you power over it. Now that we’ve named formative assessment, let’s see how we can build on it. </p> <p><a href="https://twitter.com/davidwees" target="_blank">David Wees</a> has a collection of <a href="http://www.edutopia.org/groups/assessment/250941" target="_blank">56 different examples of formative assessment</a>, which range from the “Quick nod” (“You ask students if they understand, and they nod yes or no”; possibly virtually, which enables anonymity) to “Clickers” to “Extension projects” (“Such as: diorama, poster, fancy file folder, collage, abc books. Any creative ideas students can come up with to demonstrate additional understanding of a topic.”) <a href="https://twitter.com/thescamdog" target="_blank">John Scammell</a> has a similar collection of <a href="https://www.dropbox.com/s/ju5ls63sioxgmog/Practical%20Formative%20Assessment%20Strategies.docx" target="_blank">Practical Formative Assessment Strategies</a> (some overlap with Wees’s list), grouped into sections like “Whole Class Strategies”, “Individual Student Strategies”, “Peer Feedback Strategies”, “Engineering Classroom Discussion Strategies”, and so on. </p> <p>Formative assessment doesn’t have to take much time or preparation. You’re probably already doing it without realizing it. 
Adding some variety to the methods of assessment, however, can provide a more complete picture of students’ understanding, to their benefit. Feel free to add more resources in the comments. </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com3tag:blogger.com,1999:blog-30611202.post-61993641891952529742014-08-19T08:07:00.000-07:002014-08-21T10:31:04.788-07:00a reflection on course structure, and standards for calculus<p>Here’s what I’ve learned about writing standards: it’s hard to get them balanced properly. This challenge is inherent in developing any grading system. I used to fret about whether quizzes should count for 15% or 20% of the final grade; now I fret about whether the product, quotient, and chain rules should be assessed together or separately. (I’m happier trying to solve the latter.) </p> <p>Another challenge is in setting up standards so that assessments have some coherence. I’ll explain. My first couple of times creating standards, I sat down and made a list of all the things I wanted my students to be able to do by the end of the semester, grouped into related sets, with an eye towards having each standard be of roughly equal importance (as I mentioned in the previous paragraph). After all, that’s what standards are, right? All the skills we want students to develop? That done, I told myself, “Okay, now every assessment—every homework, quiz, and test—will have to be graded on the basis of items in this list.” In principle, it’s nice to have this platonic vision of what students should do and know, including all the connections between related ideas (<em>parametrization means imposing coordinates on an object; it doesn’t really matter what dimension it has, so parametrizing curves and surfaces should go together as a single standard</em>). However, while this list said a lot about what I thought <em>students</em> should do, it didn’t say much about what <em>I</em> was going to do. 
It didn’t fit the structure of the course, just of the ideas (<em>oh, wait, we’re parametrizing curves in week 2 and surfaces in week 10—why didn’t I notice that before?</em>). Looking back, I can see that a lack of contiguousness within a standard <em>does</em> reflect a conceptual distinction between the concepts involved (<em>hmmm, maybe the idea of drawing a curve through space is conceptually different from laying out a coordinate system on a curvy surface</em>). I ended up assessing “partial” standards at various points in the semester, which is absurd on the face of it. It’s one thing to assert that a standard may be assessed at different points in the semester, based on how the skills are needed for the task at hand; it’s another to say, “Well, you’re learning <u>part</u> of a skill now, and I’ll test you on that, and you’ll learn <u>the rest of this same skill</u> later.” </p> <p>I’ve had fewer slip-ups of this sort as time goes on, but I’ve never quite been happy with how the standards match up with the time spent in class. Both of the problems above keep rearing their heads. So for this fall, I decided to look at the schedule of the class and write standards based on what we do in 1–2 days of class. (Reading <a href="http://arundquist.wordpress.com/2014/07/14/1-standard-per-day/" target="_blank">this blog post</a> by Andy Rundquist earlier in the summer helped push me in this direction.) If it seemed like too little or too much was getting done in a day, well, that’s an indication that the schedule should be modified. In a semester with 38 class meetings, there should be sufficient time allotted for review, flexibility, and a few in-depth investigations, which leads me to having 25–30 content standards for the course. That’s a few more than I’ve had in the past, but not by many. 
</p> <p>Here’s the conclusion I’m coming to: <em>standards both shape and are shaped by the structure of the class.</em> Part of what we as instructors bring to a class is a personal view of how the subject is organized and holds together. If you and I are both teaching calculus, there will be a great deal of overlap in what skills we believe should be assessed, but there will be differences, and we’ll find different dependencies. A fringe benefit of writing out standards is that we can see this structure clearly—even better, I believe, than just by looking at the order of topics. They force us to be honest about our expectations, thereby combatting a certain tendency, observed by Steven Krantz in <em>How to Teach Mathematics</em>, to give tests based on “<em>questions that would amuse a mathematician</em>—by which I mean questions about material that is secondary or tertiary. … In the students’ eyes, such a test is <em>not</em> about the main ideas in the course.” You may want students to use calculus mostly in applied settings where exact formulas for the functions involved are not known, whereas I may be primarily concerned with students’ ability to deal formally with closed-form expressions and to deeply understand classical functions. We can both be right. We should both let our students know what we expect of them, rather than making them guess. In short, standards are not completely standardized—they highlight the commonalities and the particularities among courses that treat basically the same material. </p> <p>With all that said, here I will share my list of standards for Calculus 1 this semester. Because of the length of the list, I’ll just link to a Google document that contains them: <a href="https://docs.google.com/document/d/1DYAo7Lygu4NfBRmNeEngBcaLoBvDtVestGijIn8NnoU/edit?usp=sharing" target="_blank">Standards for MTH 111, Fall 2014</a>. They are grouped into twenty-six “Content standards” and three “General standards”. 
Over time, I’ve settled on these last three as skills that I want to assess on every graded assignment: <b>Presentation</b>, <b>Arithmetic and algebra</b>, and <b>Mathematical literacy and numeracy</b>. These are essential skills for doing anything in calculus, and struggles in calculus can often be attributed to weaknesses in these areas. We’ve all had students who are fine at applying the quotient rule to a rational function, but are stymied when it comes to expanding and simplifying the numerator of the result. That can hamper solving certain kinds of problems, and I want to be able to point to “algebra”, not anything calculus-related, as the area that needs attention. The descriptions of the content standards are shaped in part by our textbook, <em>Calculus: Single Variable</em> by Deborah Hughes-Hallett et al. I like to introduce differential equations fairly early in the course—this follows a tradition at my college, too—so some standards related to that are sprinkled throughout. I should also confess an <a href="https://docs.google.com/document/d/1tlMAZVJHYiaI68Aj54mdqjNvx4lC1LJMwrX5yzSay1A/edit" target="_blank">indebtedness to Theron Hitchman</a> for the language of using verb clauses to complete the sentence “Student will be able to …” </p> <p>In addition to the 29 standards in the document linked above, I have one more for this class: <b>Homework</b>. Oh, homework. The calls to treat homework purely formatively and to <a href="http://shawncornally.com/wordpress/?p=583" target="_blank">stop grading it</a> (link goes to Shawn Cornally’s blog) have not quite reached the halls of post-secondary education. Many college and university instructors believe homework is so important that they make it worth a substantial fraction of the students’ grades. And it is important, but solely as a means for practicing, taking risks, developing understanding, and <em>making mistakes</em>. 
(See this video by Jo Boaler* on the importance of making mistakes: <a href="http://youtu.be/yysOlVWDzoU" target="_blank">“Mistakes & Persistence”</a>.) Grading homework almost always means that its usefulness as a place to take risks is undermined. Last semester I didn’t grade homework at all, although I did have a grader, who made comments on the homework that was submitted. On average, about 1/3 of the class turned anything in. At the end of the semester, I got two kinds of feedback on homework. A few students expressed appreciation that the pressure to make sure that everything in the homework was exactly right was relieved. Several, however, said they realized how important doing homework is to their understanding—often because they let it slip at some point—and urged me to again make it “required”. I want to honor both of these sentiments. I want to encourage students to do the homework and to feel like it is the safest of places to practice and make mistakes, and thereby improvements. So I will count both <u>submissions</u> and <u>resubmissions</u> of homework towards this standard. A student who turns in 20 homework assignments or thoughtfully revised assignments will earn a 4 on this standard, 15 will earn a 3, and so on. I hope this will have the desired effect of giving students maximum flexibility and responsibility in their own learning, while also acknowledging the work and practice they do. </p> <p>All of the rest of the standards, general and content, will also be graded out of 4 points, with the following interpretations: 1 – novice ability, 2 – basic ability, 3 – proficiency, 4 – mastery. (I’ve adapted this language from that used by several other SBG instructors). At the end of the semester, to guarantee an “A” in the class, a student must have reached “mastery” in at least 90% of the standards (that is, have 4s in 27 out of 30 standards), and have no grades below “proficiency”. 
To guarantee a “B”, she must have reached “proficiency” in at least 90% of the standards, and “basic ability” in the rest. A final grade of at least “C” is guaranteed by reaching “basic ability” in at least 90% of the standards. </p> <p>Two other blog posts about standards in college-level math classes went up yesterday: <ul><li><a href="http://symmetricblog.wordpress.com/2014/08/18/assessment-idea-for-calculus-i-near-final-draft/" target="_blank">Bret Benesh</a> wrote about his near-final list of standards for calculus 1, and again explained his idea to have <em>students</em> identify for which standards they have demonstrated aptitude when they complete a test or quiz. I really like this idea, as it essentially builds <a href="http://thalestriangles.blogspot.com/2013/01/thinking-about-thinking-about-thinking.html" target="_blank">metacognition</a> into the assessment system. I will have to consider this for future semesters.</li><li><a href="http://blogs.cofc.edu/owensks/2014/08/18/list-reboot/" target="_blank">Kate Owens</a> posted her list of standards for calculus 2, which she has organized around a set of “Big Questions” that highlight the main themes of the course. This is particularly important in calculus 2, which can sometimes seem like a collection of disconnected topics. In an <a href="https://twitter.com/ProfNoodlearms/status/501486790427942912" target="_blank">ensuing discussion on Twitter</a>, it was pointed out that these kinds of Big Ideas are what can really stick with students, far beyond the details of what was covered.</li></ul>After reading Kate’s post, I looked at my monolithic list of standards, and attempted to organize them into groups based on three big questions: “What does it mean to study change?” (concepts of calculus), “What are some methods for calculating change?” (computational tools), and “What are some situations in which it’s useful to measure change?” (applications). 
I was not particularly successful at sorting my standards into these categories, but I like the questions. I may ask the students how they would use the various standards to answer these questions. There are trade-offs in any method of developing a set of standards. I am grateful for these other instructors who are also working on changing how we think about grading and sharing their ideas. </p> <hr> <p>* Jo Boaler’s online courses on “How to Learn Math” are currently open:<br><a href="http://online.stanford.edu/course/how-to-learn-math-for-teachers-and-parents-s14" target="_blank">For teachers and parents until October 15 ($125)</a><br><a href="https://class.stanford.edu/courses/Education/EDUC115-S/Spring2014/about" target="_blank">For students until December 15 (free)</a></p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com2tag:blogger.com,1999:blog-30611202.post-50904686016784905972014-08-18T06:20:00.001-07:002014-08-19T08:13:21.453-07:00standards for analysis<p>Writing standards for a proof-based class is a different beast than for <a href="http://thalestriangles.blogspot.com/2014/08/a-reflection-on-course-structure-and.html">introductory calculus</a>, or even <a href="http://thalestriangles.blogspot.com/2013/09/probability-skills.html">probability</a>. In <a href="http://thalestriangles.blogspot.com/2014/08/low-threshold-exercises-for-analysis.html">my last post</a>, I described a bit of the structure of the analysis class I’m teaching this fall: inquiry-based, primarily structured around group work, running on a weekly cycle of tackling a problem, agreeing on an approach, and presenting a solution to the class for discussion. My usual way of compiling standards—looking through the course content and breaking it into 20–30 skill sets of roughly equal importance—sort of falls apart here. 
Do I want students to be able to prove that every Cauchy sequence in the set of real numbers is convergent, and to explain what this implies about the completeness of the reals? Yes, but what I really want is for them to be able to assimilate new concepts and make sense of them by creating examples and fitting the definitions into proofs. Do I want them to be able to compute integrals with respect to both Lebesgue measure and singular Dirac measures? Yes, but what I really want is for them to see how these represent the interplay of mathematics and other sciences—how the exigencies of other fields of science led to the development of both the Lebesgue integral and the Dirac delta—and to feel part of a scientific community, both in and out of the classroom. </p> <p>While considering these questions, I determined that there are <b>six standards</b> I want students to actively develop during the semester, and on which I want to be giving targeted feedback. These skills will be grounded in the content of the course, but they will also provide the benchmarks of success in mastering the content. 
Here they are: <ol><li><b>Correct use of vocabulary and notation:</b> Using mathematical terminology and symbols, especially those particular to analysis, correctly and appropriately.</li><li><b>Correct and convincing argumentation:</b> Creating and recognizing complete proofs, with their various pieces presented in a logical order.</li><li><b>Clear written exposition:</b> Organizing a paper for the benefit of the reader, making it easy to read and using proper English grammar.</li><li><b>Broad vision of the subject:</b> Providing context in papers, including statements of solved problems, a guide to the structure of proofs, and connections with other ideas in the class (previous work or larger themes).</li><li><b>Effective verbal presentation:</b> Using good speaking habits (e.g., speaking confidently, talking to the class and not to the board, being sensitive to the audience, handling questions well) to present mathematical content.</li><li><b>Collaboration and participation in discussion:</b> Attending class regularly, engaging in discussion through questions and critical feedback, seeking ways to serve the overall community.</li></ol>(As usual, I’m grateful to <a href="https://plus.google.com/u/0/+BretBenesh/posts" target="_blank">Bret Benesh</a> and <a href="https://plus.google.com/u/0/+TheronHitchman/posts" target="_blank">Theron Hitchman</a> for helping me think through these at an early stage.) As I will acknowledge to my students, some of these standards depend to a certain extent on others. For example, it’s hard to make an effective presentation without mastering the vocabulary of the topic. But I believe these are distinguishable skills, all of which are important for students’ development as mathematicians. And I believe the students should be reflecting on their mastery of <em>these</em> skills as much as their mastery of analysis, and have the chance to show when they’ve improved. </p> <p>My grading scheme for this class is somewhat of a compromise. 
I am keeping as many of the features of standards-based grading as I can—including scoring individual assignments by standards and providing opportunities for reassessment—but in order to take into account how well the <em>content</em> has been mastered, at the end of the semester I will weight and total points to determine a final grade. This last step is a kludge made necessary by the continued use of letter grades. If I had my druthers, I would leave the final assessment in terms of the students’ demonstrated mastery of the standards on the individual assignments, so that their focus would always be on improving in those areas rather than reaching a particular grade. I have tried to set this up in a way that, to <a href="https://docs.google.com/document/d/1tlMAZVJHYiaI68Aj54mdqjNvx4lC1LJMwrX5yzSay1A/edit" target="_blank">quote T.J.</a>, “if you tried to ‘game the system’ to improve your grade, you would be doing exactly the kinds of things I wanted you to do, and improving your abilities as a mathematician.” (This suggests that we’re having to work against the current grading system to encourage students to grow in the ways we want. I suppose it’s a bit idealistic to believe that we can create a grading and reporting method that will provide both useful feedback to students and a helpful summary to those outside, but I digress.) </p> <p>Of the standards I’ve listed, 1–4 are basically about writing and 5–6 are basically about active involvement. They will be handled separately in the grading scheme. Each student will write, as part of a group, eleven papers that state and solve a particular problem. These papers will be graded on the basis of standards 1–4, with each standard receiving either a 0 or a 1. After a paper has been graded, the groups will have the benefit of feedback from me and from their classmates, and they will revise, if necessary, until the paper merits at least 3 of the possible 4 points. 
This final version will be included in a document for the whole class to share. There will be a midterm and a final exam, as required by the college. Both will be take-home, and the individual problems on the exams will be graded according to the same standards as the papers. Following the midterm, students will have the chance to revise their solutions, as they do with the group papers. </p> <p>Standards 5 and 6 will be graded over the whole semester. Each student will have approximately four chances to present in front of the class; although they will be presenting as part of a group, I will give individual presentation grades, again out of 4 points. The baseline will be 2 points. Grades of 3 or 4 will be achieved based on the quality of the presentation and adherence to the principles stated in the description of the standard. I’ll only consider the highest presentation grade at the end of the semester. For the participation grade, the baseline will again be 2 points, for regular attendance. (This is my first time giving an attendance grade. I generally believe college students should be free to decide for themselves whether coming to class is useful or not. In this case, however, the presence and participation of individual members is essential for the class to work, so I think this grade is justified.) Grades of 3 or 4 will be achieved based on involvement in class discussion, either during meetings or online in the class forum (where each week’s papers will be posted), and in general contributing to a supportive, scientific atmosphere. Since this grade is not given on any particular assignment, I will meet with students individually a couple of times during the semester to gauge their progress and experiences, and to discuss their level of participation. 
</p> <p>Now, at the end of the semester, I want students’ work on the group papers and the exams to count about equally towards their final grade, and I want each of those to count about four times as much as their presentation and participation grades. So I will convert everything to a 40-point scale (16 possible points for papers, 16 for exams, 4 for presentation, and 4 for participation <b><i>Edit:</i></b> I’ve clarified these numbers in the comments). A letter grade of A will require at least 38 points, with no grades lower than 3 on any assignment (paper or exam problem) or standard (presentation and participation). A B will require at least 28 points, with no grades lower than 3. A C will require at least 18 points. This is as close as I can get to <a href="http://thalestriangles.blogspot.com/2013/01/assessing-standards.html">my usual way</a> of assigning final grades: a 4 on 80% of standards (or 90%, depending on the class), with no grade below 3, and so on. It also follows relatively closely the <a href="http://en.wikipedia.org/wiki/Academic_grading_in_France" target="_blank">French grading system</a> based on 20 points, with 10 required for passing. </p> <p>It’s not perfect, but that’s my current grading plan for this inquiry-based Introduction to Analysis course. Thoughts? </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com8tag:blogger.com,1999:blog-30611202.post-49013403894698695942014-08-11T10:20:00.001-07:002014-08-11T10:20:48.714-07:00low-threshold exercises for analysis<p>This fall, one of my courses will be Introduction to Analysis. At my school, this has been taught using a modified-Moore method for the last few years, and I will be largely adopting the structure and content of these previous years. In this IBL implementation, students work in groups on one problem per week. Each week has three assigned problems (so generally multiple groups are working on the same problem) that are loosely related. 
At the end of the week one class period is devoted to presentations: for each problem, one group is selected to present their solution in about 20 minutes, and the rest of the class is expected to be engaged in discussion with the presenters. Many of the problems were developed by <a href="http://www.math.smith.edu/faculty_cohen.php" target="_blank">David Cohen</a> (now professor emeritus), who described the method in <a href="http://cs.smith.edu/~dwcohen/modifiedmoore.pdf" target="_blank">an article for the American Mathematical Monthly</a>. Further developments were made by <a href="http://www.math.smith.edu/~cgole/" target="_blank">Christophe Golé</a>, with whom I co-taught the course two years ago. From my first exposure to the materials for this class, I have been impressed by the clever way students are led through standard material by a non-standard path. </p><p>As with many introductory analysis courses, one goal of this class is to help students transition to more formal mathematics, giving them experience with absorbing definitions and writing proofs. The problems themselves guide students through much of this process. I felt, however, that at times students could benefit from having exercises that allow them to interact more rapidly and immediately with new definitions. So one aspect I’m adding this year is a collection of “Warm-up exercises”, one per week. These are intended to be “low-threshold” activities, in the sense that a student should be able to work on them and produce results even with just a superficial understanding of the definitions involved. My hope is that by interacting with the definitions in a meaningful and productive way, they will feel more prepared to grapple with the assigned problems. </p><p>Here is a list of the exercises I’ve written, together with a rough description of the corresponding week’s topic. 
In addition to being “low-threshold”, several of these are also “high-ceiling”, meaning that immediate extensions and generalizations are evident. (For most of the course, however, the “high ceiling” is provided by the main set of problems.) <ul><li>(<em>Counting and cardinality</em>) Prove that the sets {1,2,3} and {4,5,6} have the same cardinality. Prove that {1,2} and {1,2,3} do not.</li><li>(<em>Balls in metric spaces</em>) Recall |<i>x</i>|=<i>x</i> if <i>x</i>≥0 and |<i>x</i>|=−<i>x</i> if <i>x</i> < 0. Prove |<i>x</i>+<i>y</i>|≤|<i>x</i>|+|<i>y</i>| for any real numbers <i>x,y</i>.</li><li>(<em>Topology of real numbers</em>) Prove that if <i>x</i> is isolated from a set <i>T</i> ⊂ <b>R</b>, then <i>x</i> cannot be an accumulation point of <i>T</i>.</li><li>(<em>Topological properties</em>) In <b>R</b>, is a set that contains just one point compact? (A bit of clarification here: in this course, the definition given for “compact” is a variant of sequential compactness, namely, that every infinite subset has an accumulation point.)</li><li>(<em>Continuity</em>) Prove that <i>x^n</i> is continuous at zero for any <i>n</i>∈<b>N</b>.</li><li>(<em>Properties of functions</em>) Prove that <i>x^n</i> is differentiable at zero for any <i>n</i>∈<b>N</b>.</li><li>(<em>Sequences of functions</em>) Use the algebraic identity (1–r)(1+r+r^2+…+r^n) = 1–r^(n+1) to prove that the series 1+r+r^2+r^3+… converges to 1/(1–r) if |r| < 1. (I keep finding that students have forgotten the sum of a geometric series in classes after calculus, so I figured it made sense to remind them of this fact while also suggesting they prove it.)</li><li>(<em>Uniform convergence and degrees of differentiability</em>) For any <i>k</i>∈<b>N</b>, give an example of a function that is <i>C^k</i> but not <i>C^(k+1)</i>.</li><li>(<em>Borel sets</em>) Suppose <i>X</i> is any set and <i>A</i> is the power set of <i>X</i>, i.e., the collection of all subsets of <i>X</i> (including ∅ and <i>X</i> itself). 
Show that <i>A</i> is a <a href="http://planetmath.org/completebooleanalgebra" target="_blank">countably complete Boolean algebra</a>.</li><li>(<em>Lebesgue integration</em>) Show that a sum of <a href="http://en.wikipedia.org/wiki/Simple_function#Definition" target="_blank">simple functions</a> is a simple function.</li></ul>There’s one more week’s worth of problems—all focused on properties of the Cantor set—which don’t require any new definitions. </p> <p>I’m not quite sure what role to give these in the course. I don’t want them to be required, and I definitely don’t want to make them “extra credit”. I do want them to provide a useful entry into playing around with definitions and not seem like extra work. Thoughts? </p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com2tag:blogger.com,1999:blog-30611202.post-44682327565169065262014-07-18T11:40:00.000-07:002014-08-26T04:22:36.187-07:00my one goal for teaching next year<p>Over the last few years, I’ve introduced several new aspects to my teaching: standards-based grading, student essays, prompting class discussion with questions on slips of paper, explorations with Desmos, assignments through Google docs, and so on. Some of these have had real positive effects, and I definitely believe in continuing to try new things. However, this year I’ve decided to focus on just one element of my teaching, which is <b>to engage every student in every class</b>. This means not worrying about all the potential newness, paying attention to what happens each time the class meets, and figuring out from those observations how to make every class meeting productive for everyone. This doesn’t mean I won’t try new things, but I want my focus to be on student engagement rather than experimentation.</p> <p>Here are a few specific things I think this entails: <ul><li><em>More preparation before the semester begins.</em> I’m doing more work ahead of time to prepare my classes than I have before. 
Usually I make sure my syllabus has an outline of the topics in rough chronological order, a description of when homework is due and exams will be given, and a litany of other policies and expectations. Then, during the semester, I choose homework assignments as we go along and follow the schedule with some fluidity, which means lots of time spent figuring out just what the next class can cover. I want my time outside of class to be more reflective. That is, instead of emerging from class and picking a homework assignment that goes with what we did, I want to have time to think about what each student did during the class and what might encourage them next time to be even more involved in the work. Instead of spending prep time picking topics, I want to look at the topics already before me and think about how each student might connect with them. (<a href="http://thalestriangles.blogspot.com/2014/08/a-reflection-on-course-structure-and.html">Writing standards</a> is already a big help towards this: when I consider what skills I want the students to demonstrate by the end of the semester, it forces me to balance the material, on a global scale, in terms of importance and time invested.)</li><li><em>More peer-instruction methods, like think-pair-share.</em> In other words, I should talk less (but PI is the positive formulation of this principle). How many answers can the students generate on their own? While some might think having students come up with the answers rather than providing a nice clear explanation myself would take more time, I am thinking of the fact that even in my “good” classes any explanation I give usually has to be given multiple times, because not everyone is focused at the same time. The next level would be to see how many <em>questions</em> the students can generate on their own before they start coming up with the answers, and I have that goal in mind. 
Nothing like trying to answer your own question to keep you engaged!</li><li><em>Effective use of silence.</em> I have absolutely no problem with periods of silence in my class. If nothing else, stopping the flow of information for a few moments now and then underlines the message that “class is not an info dump”. But I want to be sensitive to what <em>kind</em> of silence is occurring. The best kind is when you know there’s cogitation going on: the students are faced with a new idea or a collision of ideas and are trying to sort it out in some way they can enunciate. But there’s also the kind where everyone is just so baffled and lost that they can’t come up with answers, questions, or anything else. And sometimes in the silence you sense that the students know the prompt they’ve been given is banal, and responding to it proves nothing other than that they’re not literally asleep. I want to be attuned enough to know which is happening. Even better, I want the students attuned enough that they can tell me which is happening and whether the period of silence is worthwhile.</li><li><em>Finding and using “low-floor, high-ceiling” activities.</em> These are the kind of things anyone can get excited about. A student who is floundering should have something to grasp on to. A student who has mastered the material so far should have somewhere to grow. One way to do this is to have a whole bunch of questions of increasing “difficulty”, and I’ve used that tactic, but it conflicts with some of these other goals. In particular, someone who has trouble getting started on the list might feel at the end of class like they’ve failed if they don’t get to all the questions, and someone who rushes through and gets to the end might get the sense that there’s nowhere further to go. Moreover, when I ask more questions it leaves less room for students to ask theirs. 
I guess what I’m saying is that tracking down these types of activities is hard, and defies the way in-class activities are often done in calculus. (Possibly the objection I have to many traditional types of calculus problems, like Optimization and Related Rates, is that they have such a high floor and low ceiling. They’re basically puzzles, aimed at a particular level of understanding, which means they’re fun for some but not really broadly useful for learning.)</li><li><em>Being more deliberate about <a href="http://thalestriangles.blogspot.com/2014/08/formative-assessment-isnt-scary.html">formative assessment</a>.</em> This might be the hardest one for me, and yet I think it’s key to the whole endeavor. It’s easy to have a sense of how a few particular students and the class as a whole are doing. It’s easy to grade a quiz or a test and look over the results to draw conclusions about students’ understanding (a.k.a., summative assessment). It’s harder to come up with ways that encourage students to work independently, take risks, and also produce something concrete I can assess and provide feedback on. 
So I’ll be mining the math-twitter-blogosphere for ideas on a variety of ways to make formative assessments!</li></ul></p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com4tag:blogger.com,1999:blog-30611202.post-37302488142574998532014-02-27T13:19:00.002-08:002014-02-27T13:19:49.915-08:00big mistake or little mistake?<p>One of my friends shared this picture on Facebook— <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-GrR_ypvFsuw/Uw-p2l-JcbI/AAAAAAAABbo/w8U9UrwOZ7E/s1600/board_sawing_problem.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-GrR_ypvFsuw/Uw-p2l-JcbI/AAAAAAAABbo/w8U9UrwOZ7E/s400/board_sawing_problem.jpg" /></a></div>(which came via <a href="https://www.facebook.com/pages/mathtricksorg/123606280919" target="_blank">mathtricks.org</a>)—and suggested that the teacher who graded this assignment should not be teaching math at all. I suspected that the grading had fallen prey to a heavy teaching load for an elementary school teacher who might not be as comfortable as they’d like to be with mathematical concepts, so I wrote this response (which I’ve edited slightly):</p><blockquote><p>The teacher is doing something rather sophisticated—solving a more general problem—which is what makes it easy to trip up on the apparent simplicity of this question. Consider the following similar questions:</p> <p>“It took Marie 10 minutes to paint two boards. If she works just as fast, how long will it take her to paint three boards of the same size?”</p> <p>“It took Marie 20 minutes to saw a board into 5 pieces. If she works just as fast, how long will it take her to saw another board into 6 pieces?”</p> <p>In the case of the first alternative question I’ve proposed, the teacher’s reasoning would be entirely correct: 10 minutes for 2 boards means 15 minutes for three boards. 
Although this is not the question that’s being asked, sometimes it’s helpful to think of situations where an incorrect sequence of reasoning becomes correct in order to identify where the mistakes are.</p> <p>In the case of the second alternative question I’ve proposed, think about how you would solve it. Would you divide the 20 minutes into 5 equal periods of time, or 4? Would you blame someone for dividing by 5 the first time they attempted to solve the problem? Once you figure out that what’s important is the 4 cuts it takes, rather than the 5 pieces that are produced, then you can solve any such problem. For example, “If Marie takes an hour to cut a board into 6 pieces, then how long will it take to saw another board into 12 pieces?” (The answer, btw, is <i>not</i> 2 hours.)</p> <p>The reasoning the teacher wrote on the paper is clearly of this latter kind. Their mistake is not in <i>computation</i>, but in choosing what aspect of the problem deserves attention, namely the cuts in the wood and not the resulting pieces. This leads to nothing more than an “off by 1” error, which is easily corrected. I would be happy to see this reasoning written on a student’s paper, because I would know that only a small correction is needed, after which the student could solve the much more general problem, thanks to a demonstrated understanding of proportion.</p> <p>Math teachers have to be prepared to look for this kind of demonstrated understanding in order to home in on where a student is making mistakes in their reasoning. This particular case is an example of someone who is teaching math, but probably also a lot of other subjects, and may or may not have training in mathematical thinking. So the more sophisticated concept—proportionality—steps in and overrides a simpler formulation of the problem, which just involves counting. This kind of mix-up is common not just in students, but among all people. 
Which is why I don't think it’s incompetence, but a symptom of the need for more mathematical training for teachers.</p></blockquote> <p>I’m curious how other math teachers would have responded to this discussion. There are certainly those in the math community that can give clearer expression to what I was trying to say. Other commenters on Facebook seemed baffled that a teacher could make this mistake in grading, but I think it’s not such a serious error in reasoning (except that the teacher should have been correcting this on students’ papers, rather than making the mistake on their own).</p> <p>So, what do you think?</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com10tag:blogger.com,1999:blog-30611202.post-21805138055866592172014-02-21T14:44:00.000-08:002014-02-22T12:06:30.155-08:00a bit of ex-spline-ation<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p><em>Splines</em> are piecewise-polynomial functions that interpolate between a finite set of specified points $(x_1,y_1)$, …, $(x_n,y_n)$. <em>Cubic</em> splines assume that each piece has at most third degree; this allows the formation of curves that appear quite smooth to the eye, as one has sufficient freedom to match both first and second derivatives at the joining points. Either the first or second derivative may be chosen freely at the first and last points; in the case of <em>natural</em> cubic splines, the assumption is that the second derivative vanishes at those points. 
I spent part of this week trying to understand how they work, and so I decided to make Desmos graphs that would illustrate how to interpolate by cubic splines for sets of three and four points.</p> <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/yvgfpq2prf" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/yvgfpq2prf.png" width="250px" height="250px" style="border:1px solid #ccc; border-radius:5px" /></a><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/qnr2doc0su" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/qnr2doc0su.png" width="250px" height="250px" style="border:1px solid #ccc; border-radius:5px" /></a><br><i>Click on the left image to go to the graph with three points,<br>or the right image to go to the graph with four points.</i></center> <p>The process I followed for three points was straightforward, if not lovely. Suppose the interpolating functions are $f_1$ and $f_2$. The assumption that $(x_1,y_1)$ and $(x_3,y_3)$ are inflection points means that $f_1$ and $f_2$ have the form \[ f_1(x) = a_1 (x - x_1)^3 + b_1 (x - x_1) + y_1 \] and \[ f_2(x) = a_2 (x - x_3)^3 + b_2 (x - x_3) + y_3 \] (think in terms of Taylor polynomials around $x_1$ and $x_3$). We need to find the coefficients $a_1, b_1, a_2, b_2$. Two conditions arise from the fact that $f_1(x_2) = f_2(x_2) = y_2$. The condition that the second derivatives match at $x_2$ implies $a_1 (x_2 - x_1) = a_2 (x_2 - x_3)$. 
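(To spell out that step, using only the forms of $f_1$ and $f_2$ assumed above: differentiating twice gives \[ f_1''(x) = 6a_1 (x - x_1), \qquad f_2''(x) = 6a_2 (x - x_3), \] so $f_1''(x_1) = 0$ and $f_2''(x_3) = 0$ hold automatically, which is the “natural” condition at the two endpoints. Setting $f_1''(x_2) = f_2''(x_2)$ and cancelling the common factor of 6 gives the relation between $a_1$ and $a_2$.) 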
The fourth and final condition is that the first derivatives match at $x_2$, and now <a href="http://wolfr.am/1fkEvSL" target="_blank">the system can be solved</a> to find $f_1$ and $f_2$ entirely.</p> <p>While preparing to make a graph for four points, I came across <a href="http://calculus7.org/2014/02/14/connecting-dots-naturally/" target="_blank">a post on the Calculus VII blog</a> that breaks down the whole process of computing splines in a clever and beautiful way, which also reduces the complexity of the computation. In addition, the post provides, in rough outline, a motivation for why natural cubic splines are a good choice for interpolation, and I recommend reading the whole thing. I did have to work out several of the details for myself, however, particularly since that post only deals with $x$-values spaced one unit apart. I thought that it might be useful for others to see the process that led to the formulas I use on the graph. Lots of algebra ahead.</p> <p>We start with four points, $(x_1,y_1)$, $(x_2,y_2)$, $(x_3,y_3)$, and $(x_4,y_4)$, with $x_1 < x_2 < x_3 < x_4$. 
The first observation is that the easiest kind of interpolation is piecewise-linear, so we compute the three slopes \[ m_1 = \frac{y_2 - y_1}{x_2 - x_1}, \qquad m_2 = \frac{y_3 - y_2}{x_3 - x_2}, \qquad m_3 = \frac{y_4 - y_3}{x_4 - x_3} \] for the three segments between successive pairs of points, and the linear functions $L_1$, $L_2$, and $L_3$ corresponding to this interpolation, $L_i(x) = m_i (x - x_i) + y_i$.</p> <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/kpi6ui8xje" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/kpi6ui8xje.png" width="300px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a><br><i>Linear interpolation</i></center> <p>The next big idea is that we want to adjust the piecewise-linear approximation by adding cubic “correction” terms $C_1$, $C_2$, and $C_3$, so that our final interpolating functions become $f_i = L_i + C_i$, $i = 1,2,3$, where $C_i(x_i) = C_i(x_{i+1}) = 0$. These latter conditions imply that $C_i$ can be written in the form \[ C_i(x) = a_i (x - x_i) (x - x_{i+1})^2 + b_i (x - x_i)^2 (x - x_{i+1}), \] which means that the first and second derivatives are \[ C_i'(x) = a_i (x - x_{i+1})^2 + 2(a_i + b_i) (x - x_i) (x - x_{i+1}) + b_i (x - x_i)^2 \] and \[ C_i''(x) = (4a_i + 2b_i) (x - x_{i+1}) + (2a_i + 4b_i) (x - x_i). \] Note also that $f_i' = m_i + C_i'$ and $f_i'' = C_i''$. </p> <p>What other properties do we want these cubic functions to have? 
<ul><li>For the derivatives of the $f_i$s to match at $x_2$ and $x_3$, we must have $m_i + C_i'(x_{i+1}) = m_{i+1} + C_{i+1}'(x_{i+1})$ for $i = 1,2$.</li><li>We want the second derivatives of the $C_i$s to match at $x_2$ and $x_3$ (this is the same as matching the second derivatives of the $f_i$s).</li><li>We also require that the second derivatives be zero at the outer endpoints.</li></ul>Now a curious twofold effect comes into play: <ul><li>The coefficients $a_i$ and $b_i$ are linear combinations of $z_i = C_i''(x_i)$ and $z_{i+1} = C_i''(x_{i+1})$. To wit, solving the system \[ \begin{cases} z_i &= (4 a_i + 2b_i) (x_i - x_{i+1}) \\ z_{i+1} &= (2a_i + 4 b_i) (x_{i+1} - x_i) \end{cases} \] for $a_i$ and $b_i$ yields \[ a_i = \frac{2z_i + z_{i+1}}{6 (x_i - x_{i+1})}, \qquad b_i = \frac{2z_{i+1} + z_i}{6 (x_{i+1} - x_i)} \] </li><li>The condition that the second derivatives be equal is exceedingly simple; we have already used it implicitly in labeling them as $z_1$, $z_2$, $z_3$, $z_4$.</li></ul>Our assumption at the endpoints is that $z_1 = z_4 = 0$. Thus, the whole problem reduces to finding what should be the second derivatives at the “interior” points $x_2$ and $x_3$.</p> <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/ioavughcmv" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/ioavughcmv.png" width="300px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a><br><i>Cubic correction terms</i></center> <p>The idea of parametrizing by the second derivatives, after removing the “linear” effects, is where the beauty and cleverness in this solution lie. We use the assumption about matching first derivatives (a linear condition in the coefficients) to set up the remaining conditions on the second derivatives (which themselves depend linearly on the coefficients). 
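</p> <p>The formulas for $a_i$ and $b_i$ are easy to sanity-check numerically. Here is a minimal Python sketch (it is not part of the Desmos graphs; the function names and the sample values are hypothetical, chosen only for illustration) that computes the coefficients from given end values $z_i$ and $z_{i+1}$, then confirms that $C_i''$ takes those values at $x_i$ and $x_{i+1}$:</p>

```python
def correction_coeffs(x_i, x_next, z_i, z_next):
    """Return (a_i, b_i) from the second derivatives z_i = C_i''(x_i), z_next = C_i''(x_next)."""
    a = (2 * z_i + z_next) / (6 * (x_i - x_next))
    b = (2 * z_next + z_i) / (6 * (x_next - x_i))
    return a, b


def second_derivative(x, x_i, x_next, a, b):
    """C_i''(x) for C_i(x) = a (x - x_i)(x - x_next)^2 + b (x - x_i)^2 (x - x_next)."""
    return (4 * a + 2 * b) * (x - x_next) + (2 * a + 4 * b) * (x - x_i)


if __name__ == "__main__":
    # Hypothetical sample data: one segment from x = 1 to x = 3.
    x_i, x_next = 1.0, 3.0
    z_i, z_next = 0.5, -2.0
    a, b = correction_coeffs(x_i, x_next, z_i, z_next)
    # These two values should agree with z_i and z_next up to rounding.
    print(second_derivative(x_i, x_i, x_next, a, b))
    print(second_derivative(x_next, x_i, x_next, a, b))
```

<p>Because $C_i''$ is linear on each segment, matching its values at the two ends determines it completely, which is why the two numbers $z_i$ and $z_{i+1}$ are enough to recover both coefficients.</p> <p>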
Looking back, this is essentially what I did for three points, but I missed out on dealing with the linear effects separately, so I had to solve for three variables simultaneously. At this point, we only need to solve for two: $z_2$ and $z_3$.</p> <p>Since $C_i'(x_i) = a_i (x_i - x_{i+1})^2 = \frac{1}{6} (2z_i + z_{i+1})(x_i - x_{i+1})$ and $C_i'(x_{i+1}) = b_i (x_{i+1} - x_i)^2 = \frac{1}{6} (2z_{i+1} + z_i)(x_{i+1} - x_i)$, the equations $m_i + C_i'(x_{i+1}) = m_{i+1} + C_{i+1}'(x_{i+1})$ become \[ \begin{cases} (x_2 - x_1) (2z_2 + z_1) + (x_3 - x_2) (2 z_2 + z_3) = 6(m_2 - m_1) \\ (x_3 - x_2) (2z_3 + z_2) + (x_4 - x_3) (2 z_3 + z_4) = 6(m_3 - m_2) \end{cases} \] (in this form, it is easy to see how to generalize to $n$ points, and it shows the origin of what the other post called the “tridiagonal” form of the system). Now we set $z_1$ and $z_4$ to zero and solve for $z_2$ and $z_3$. For this two-variable system it isn’t too bad to write down <a href="http://wolfr.am/1gTkNhu" target="_blank">the explicit solution</a>, which is what is used in the <a href="https://www.desmos.com/calculator/o0qbclbspw" target="_blank">Desmos graph</a>: \begin{gather*} z_2 = 6 \frac{3 m_2 x_2 + 2 m_1 x_4 + m_3 x_3 - 2 m_1 x_2 - 2 m_2 x_4 - m_2 x_3 - m_3 x_2}{(x_2 + x_3)^2 - 4 (x_1 x_2 + x_3 x_4 - x_1 x_4)} \\ z_3 = 6 \frac{3 m_2 x_3 + 2 m_3 x_1 + m_1 x_2 - 2 m_2 x_1 - 2 m_3 x_3 - m_1 x_3 - m_2 x_2}{(x_2 + x_3)^2 - 4 (x_1 x_2 + x_3 x_4 - x_1 x_4)} \end{gather*} </p> <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/hycpcbwtbq" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/hycpcbwtbq.png" width="300px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a><br><i>In this graph, blue plus green equals red.</i></center> <p>Finally, to check that we have actually created a spline with the desired properties, we can look at the graphs of the first and second derivatives to make sure they’re 
continuous.</p> <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/rg1ywpexvh" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/rg1ywpexvh.png" width="300px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a><br><i>The spline is red. The first derivative is purple, and the second derivative is orange.</i></center> <p>Notice that the second derivative is piecewise linear (naturally, since the spline is piecewise cubic) and zero at the endpoints (as we chose it to be). I particularly like seeing how the derivatives change as the points are moved.</p> <p>Anyway, I learned a lot from putting together the graphs, and almost as much from writing this post. I think there are lots of interesting explorations one could do with these graphs, but for now I’ll just release them to the wild and hope people enjoy them!</p><br> <p><b>P.S.</b> Please pardon the bad pun in the title. I’m working on making my post titles more… interesting?</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-17822552812167778922014-02-15T12:15:00.000-08:002014-02-16T05:14:08.500-08:00be careful with computers<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>This week in my calculus class, we were studying examples of separable differential equations: <a href="http://en.wikipedia.org/wiki/Exponential_growth#Differential_equation" target="_blank">exponential growth and decay</a>, <a href="http://en.wikipedia.org/wiki/Convective_heat_transfer#Newton.27s_law_of_cooling" target="_blank">Newton’s Law of Heating and Cooling</a>, and <a href="http://mathworld.wolfram.com/LogisticEquation.html" target="_blank">the logistic model</a>, respectively \[ 
\frac{dy}{dt} = ky, \qquad \frac{dy}{dt} = k(M-y), \qquad\text{and}\qquad \frac{dy}{dt} = ky\left(1-\frac{y}{L}\right). \] <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/cvb1jdrini" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/cvb1jdrini.png" width="180px" height="170px" style="border:1px solid #ccc; border-radius:5px" /></a><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/h14wywokc5" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/h14wywokc5.png" width="180px" height="170px" style="border:1px solid #ccc; border-radius:5px" /></a><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/ff6ghswn9z" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/ff6ghswn9z.png" width="180px" height="170px" style="border:1px solid #ccc; border-radius:5px" /></a></center>We explored how the shapes of solutions depend on the parameters in the equations, which parameters are most physically meaningful in various situations, and what extra parameters appear in the course of solving them.</p> <p>I had several goals for this course of study: <ul><li>to show the versatility of differential equations in modeling physical situations;</li><li>to show the usefulness of integration techniques for solving real-world problems;</li><li>to practice understanding the behavior of solutions to differential equations by examining the equations themselves;</li><li>to get students more familiar with <a href="http://www.wolframalpha.com/" target="_blank">Wolfram Alpha</a> and <a href="https://www.desmos.com/calculator" target="_blank">Desmos</a> as computational tools.</li></ul>This last goal led to a curious discovery, which is what prompted me to write this post.</p> <p>In the age of widely-accessible <a href="http://en.wikipedia.org/wiki/List_of_computer_algebra_systems" target="_blank">computer algebra 
systems</a>, we are finally freed from spending a month in calculus class mastering techniques of integration. Substitution and integration by parts are indispensable, but as far as I’m concerned, the rest of the methods are used infrequently enough that students should be made aware that other techniques exist and they can learn them when needed. In particular, solving the logistic equation is about the only reason I can conjure to justify learning partial fractions in introductory calculus. So we learned how to compute partial fractions for a rational function with two linear factors in the denominator. Here’s how it gets used.</p> <p>First, rearrange the logistic equation a bit and separate variables to get \[ \frac{dy}{y(y-L)} = -\frac{k}{L}dt. \] The right side clearly integrates to $-\frac{k}{L}t$. Using partial fractions, the integral of the left side is \[ \int\frac{dy}{y(y-L)} = \int \frac{1}{L} \frac{dy}{y - L} - \int\frac{1}{L} \frac{dy}{y} = \frac{1}{L} \big( \ln|y - L| - \ln|y| \big) = \frac{1}{L} \ln \left|\frac{y - L}{y}\right|. \] Tossing in the ever-present arbitrary constant of integration $+C$ (which really matters very little until one starts solving differential equations, as here), we have \[ \frac{1}{L} \ln \left|\frac{y - L}{y}\right| = -\frac{k}{L}t + C. \] Multiply both sides by $L$ and exponentiate both sides to get \[ \left|\frac{y - L}{y}\right| = e^{-kt + LC} = e^{LC} e^{-kt}. \] Now, $e^{LC}$ is always positive, but when we drop the absolute value signs, we get \[ \frac{y - L}{y} = Be^{-kt}, \] where $B$ can be either positive or negative. Finally, solve for $y$: \[ y = \frac{L}{1 + Ae^{-kt}} \] (where $A = -B$). Technically, throughout this process we had to assume that $y \ne 0$ and $y \ne L$. 
That’s okay, though, because $y = 0$ and $y = L$ are evident as equilibrium (constant) solutions from the equation itself.</p> <p>There are three qualitatively different behaviors that solutions to the logistic equation can have, depending on their initial values. <ol><li>The solutions $y = 0$ and $y = L$, as mentioned above, are constant.</li><li>A solution that starts out between $0$ and $L$ will increase over time, approaching the value $L$ asymptotically. This behavior corresponds to $A > 0$ in the solution above.</li><li>A solution that starts out above $L$ will decrease over time, again approaching the value $L$ asymptotically. This behavior corresponds to $A < 0$ in the solution above.</li></ol>The case of an initial value below zero is generally not physically meaningful, and in any case it follows the same formula as 3. (See the red curve in the third image at the top.) The most interesting solutions are type 2, whose graphs follow <a href="http://en.wikipedia.org/wiki/Sigmoid_curve" target="_blank">what Wikipedia describes as a “sigmoid” shape</a> (the blue curve in the image at top). These are the solutions that provide the most meaningful applications of the logistic equation: growth in a constrained environment.</p> <p>So far, so good, and I haven’t said anything that isn’t easily found elsewhere. But before we solved the logistic equation by hand, I wanted my students to explore the solutions graphically, so I had them <a href="http://wolfr.am/1kJFwYa" target="_blank">plug the equation into Wolfram Alpha</a>. Here’s the solution it provides: \[ y(t) = \frac{L e^{c_1 L+k t}}{e^{c_1 L+k t}-1}. \] However, if you <a href="https://www.desmos.com/calculator/cndmpgptex" target="_blank">plot this solution</a> (link goes to a Desmos graph) and vary the parameters (particularly $c_1$, since $k$ and $L$ are constants given in the equation), you will only see one kind of solution: the third kind. 
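</p> <p>A quick numerical experiment makes the same point. The following Python sketch (the function names, the grid of parameter values, and the choices $k = 1$, $L = 2$ are all hypothetical, picked only for illustration) evaluates Wolfram Alpha’s formula for a range of real $c_1$ and $t$ and collects any values that land strictly between $0$ and $L$. For comparison, it also evaluates our formula with $A = 1$.</p>

```python
import math


def wolfram_solution(t, c1, k, L):
    """Closed form returned by W|A: y(t) = L e^{c1 L + k t} / (e^{c1 L + k t} - 1)."""
    E = math.exp(c1 * L + k * t)
    return L * E / (E - 1)


def our_solution(t, A, k, L):
    """Closed form derived by hand: y(t) = L / (1 + A e^{-k t})."""
    return L / (1 + A * math.exp(-k * t))


k, L = 1.0, 2.0  # hypothetical parameter values

# Real values of c1 from -3 to 3 and times t from 0 to 5.
c1_values = [n / 10 for n in range(-30, 31)]
t_values = [n / 4 for n in range(0, 21)]

# Skip the points where c1*L + k*t = 0, since the formula has a pole there.
between_0_and_L = [
    (c1, t)
    for c1 in c1_values
    for t in t_values
    if c1 * L + k * t != 0 and 0 < wolfram_solution(t, c1, k, L) < L
]

if __name__ == "__main__":
    print(len(between_0_and_L))          # how many grid points give a type 2 value
    print(our_solution(0.0, 1.0, k, L))  # our formula with A = 1, evaluated at t = 0
```

<p>With real $c_1$, the exponential $e^{c_1 L + kt}$ is positive, so the quotient is either greater than $L$ (when the exponential exceeds $1$) or negative (when it is less than $1$), and the list comes out empty. Our formula with $A > 0$, by contrast, stays between $0$ and $L$.</p> <p>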
Type 2 solutions, the most interesting ones, don’t even show up in Wolfram Alpha’s formula! What’s going on?</p> <p>If you sign in to Wolfram Alpha, you can have it produce step-by-step solutions. The results from this query:<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-qeakwBiGVhE/Uv-vJ4ErWJI/AAAAAAAABao/pkzR6bmMCZk/s1600/WA-logistic-solution.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-qeakwBiGVhE/Uv-vJ4ErWJI/AAAAAAAABao/pkzR6bmMCZk/s1600/WA-logistic-solution.png" /></a></div>This doesn’t look so different from our solution: separate variables, integrate, solve for $y$. In W|A, $\log$ denotes the natural logarithm, so nothing’s amiss there. What’s changed?</p> <p>It’s a small thing, one that could easily be missed, even if you’re looking step-by-step. A hint comes from the placement of the additional parameter that arises from the constant of integration: in our solution, this parameter ended up in front of the exponential function, whereas Wolfram Alpha left it in the exponent. That’s not really the issue though… ah, there it is:<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-7zAi1UmNF4A/Uv_GSr93eXI/AAAAAAAABbA/1-p8CCvn4n8/s1600/WA-logistic-solution-log.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-7zAi1UmNF4A/Uv_GSr93eXI/AAAAAAAABbA/1-p8CCvn4n8/s640/WA-logistic-solution-log.png" /></a></div>After integrating, <em>there’s no absolute value inside the logarithm</em> (gasp!). I know, we try to convince our students time and time again that the standard antiderivative of $\frac{1}{x}$ is $\ln|x|$, and here Wolfram is leaving out the absolute value. 
And it turns out to make a difference—go back and read the solution we went through, and you’ll see that the moment the absolute value was dropped from the equation is precisely when our formula gained the flexibility to accommodate both type 2 and type 3 solutions. Wolfram Alpha <em>missed solutions</em> by leaving out the absolute value.</p> <p>This isn’t all that disastrous. I know that Wolfram Alpha generally assumes complex arithmetic, in which case the logarithm requires a branch cut anyway. It also assumes a fair amount of mathematical sophistication on the part of its users. It wasn’t too hard for me to figure out why we didn’t get the answer we expected. [For more on these points, see the addendum, below.] But this example does suggest caution when we try to use W|A for educational purposes. In fact, it reinforces the message that as we’re training our students to use computing tools, we need to make sure they’re doing so intelligently. One doesn’t need to fully anticipate the answer provided, but one should have some idea of what to expect, to check the answer’s reasonableness.</p> <p>I’m also not arguing that one has to be a stickler about constants of integration or absolute values in logarithms from the moment that they are introduced. Such matters are generally secondary in the early days of learning integration. But when motivation for such secondary matters naturally arises in examples of interest, that should be seized.</p> <p><b><i>Addendum (2/16):</i></b> When I first posted this, I <a href="https://twitter.com/Thalesdisciple/status/434787110034235392">suggested</a> that it was evidence of a bug in Wolfram Alpha. 
Later I realized that this is not technically the case, because with complex numbers, <a href="http://mathworld.wolfram.com/EulerFormula.html" target="_blank">Euler’s formula</a> shows that we <em>can</em> get all of the solutions; for instance, if we let $c_1 L = i\pi + c_2$, then Wolfram Alpha’s answer becomes $L e^{c_2+k t}/(e^{c_2+k t}+1)$. Indeed, if we add the initial condition $y(0) = L/2$, then <a href="http://wolfr.am/1dWeHaT" target="_blank">W|A returns</a> $y(t) = Le^{kt}/(e^{kt}+1)$, as expected. It’s not enough, however, to specify that the equation be <a href="http://wolfr.am/1jIsHgH" target="_blank">solved over the reals</a>; doing so gives the same answer as at first. The pedagogical points I made about using technology still stand. They are perhaps even made stronger, as defaulting to computations over the complex numbers <a href="http://blog.wolframalpha.com/2013/04/26/get-real-with-wolframalpha-computing-roots/" target="_blank">often produces results that can be confusing for students</a>. This doesn’t mean we should avoid using such tools, but we should prepare our students to adapt to unexpected output.</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com3tag:blogger.com,1999:blog-30611202.post-31106896475299333662013-12-31T07:21:00.000-08:002013-12-31T09:39:11.648-08:00the best “real-life” use of geometry I saw this year<p>On May 3, 2003, the Old Man of the Mountain—a rock formation that had been known for at least two centuries as one of the natural wonders of New Hampshire—<a href="http://www.nhstateparks.org/explore/state-parks/old-man-mountain.aspx" target="_blank">collapsed</a>. No one saw it happen; that morning two park rangers looked up and realized he was gone. It had been expected that this day would arrive. The Old Man’s face was a remnant of ancient glacial movements, and it was not stable, thanks to erosion and freezing; it had already been repaired multiple times since the 1920s. 
In 2007, a project was begun to memorialize the Old Man, and in 2011 the “Profiler Plaza” was dedicated.</p> <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-o8U-4Gv-GLY/UsJXiFdThQI/AAAAAAAABX4/5QJLqELiteo/s1600/IMG_1271.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-o8U-4Gv-GLY/UsJXiFdThQI/AAAAAAAABX4/5QJLqELiteo/s400/IMG_1271.jpg" /></a></div> <p>Over fall break this year, my wife and I made a trip to the western edge of the White Mountains, where the Old Man of the Mountain used to reside. We stopped by the memorial to the Old Man that is now located on the edge of “Profile Lake”, where I was astounded by the ingenuity of the project that had been created. Not content with photographs or descriptive plaques, the <a href="http://www.oldmanofthemountainlegacyfund.org/" target="_blank">Old Man of the Mountain Legacy Fund</a> sought to recreate the experience of viewing the famous visage.</p> <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-XLEfXyDw0WI/UsJWv7TXBgI/AAAAAAAABXo/8W0XlgTIDgA/s1600/IMG_1264.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-XLEfXyDw0WI/UsJWv7TXBgI/AAAAAAAABXo/8W0XlgTIDgA/s400/IMG_1264.jpg" /></a></div> <p>This optical illusion is created by looking along any of several different steel structures, called “profilers”.</p> <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-Qaxw2YclAfs/UsLbe8YdE3I/AAAAAAAABYs/jIO7GV_o4zU/s1600/IMG_1266.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-Qaxw2YclAfs/UsLbe8YdE3I/AAAAAAAABYs/jIO7GV_o4zU/s320/IMG_1266.jpg" /></a><a href="http://3.bp.blogspot.com/-3-fCSSbhS00/UsLbjSWqgYI/AAAAAAAABY0/1r5ToiWrjU4/s1600/IMG_1265.jpg" imageanchor="1" style="margin-left: 1em; 
margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-3-fCSSbhS00/UsLbjSWqgYI/AAAAAAAABY0/1r5ToiWrjU4/s320/IMG_1265.jpg" /></a></div> <p>Each profiler has an array of raised features that, when viewed from an appropriate angle, line up to recreate the face on the mountain from the viewer’s perspective.</p> <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-TOWKj5GMqes/UsJYogkNuUI/AAAAAAAABYE/ZqJND8fsSEY/s1600/IMG_1269.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-TOWKj5GMqes/UsJYogkNuUI/AAAAAAAABYE/ZqJND8fsSEY/s400/IMG_1269.jpg" /></a></div> <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-LHiBeKYpH4k/UsJYrQilHAI/AAAAAAAABYM/6672TMmnx7o/s1600/IMG_1267.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-LHiBeKYpH4k/UsJYrQilHAI/AAAAAAAABYM/6672TMmnx7o/s400/IMG_1267.jpg" /></a></div> <p>The distance from the Profiler Plaza to the Old Man’s former location is about half a mile, but the profile effect works only with careful placement of the viewer’s eyes. Thus each steel profiler comes equipped with three spots, marked according to the viewer’s height, so that they will be in the proper alignment. (Below is a picture of my wife looking at one of the profilers.)</p> <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-n0IaTqml4c4/UsJgNQQvLSI/AAAAAAAABYc/bdoJPPI0Egc/s1600/IMG_1268.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-n0IaTqml4c4/UsJgNQQvLSI/AAAAAAAABYc/bdoJPPI0Egc/s400/IMG_1268.jpg" /></a></div> <p>I found this application of geometry to a memorial not only ingenious, but also quite stirring. 
The Old Man of the Mountain inspired several artistic works, including Nathaniel Hawthorne’s short story <a href="http://www.gutenberg.org/files/1916/1916-h/1916-h.htm" target="_blank">“The Great Stone Face”</a>. When I was in high school, my mom directed a theatrical adaptation of this story, in which I played the role of the visiting poet who appears near the end of the tale. So I felt a special connection to this place as I visited it for the first time.</p> <p>It seems this could make a useful cross-disciplinary lesson in school, say between English, geometry, and U.S. history. Students could study the stories of the Great Stone Face and the monument’s demise in 2003. Then they might be asked to choose a location and design the memorial, working out the necessary measurements. For instance, here is a link to a map with the face’s former location marked: <a href="https://maps.google.com/maps?q=44.1606%C2%B0+N,+71.6834%C2%B0+W&hl=en&ie=UTF8&ll=44.160595,-71.683388&spn=0.017764,0.033002&sll=44.163659,-71.681006&sspn=0.008882,0.016501&t=h&z=15" target="_blank">44.1606° N, 71.6834° W</a>. The actual location of the profilers is on the <a href="https://maps.google.com/maps?q=44.1657%C2%B0+N,+71.6785%C2%B0+W&hl=en&ie=UTF8&ll=44.165675,-71.678495&spn=0.017763,0.033002&sll=44.165598,-71.678603&sspn=0.00111,0.002063&t=h&z=15" target="_blank">north shore of Profile Lake</a>. If anyone carries this out, I’d love to know how it goes!</p> <p>Thanks for reading, and Happy New Year!</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-88869415507219156462013-12-16T09:11:00.000-08:002013-12-17T06:53:22.038-08:00some of my favorite Desmos projects from this semester<p><b><i>Important note:</i></b> You can click on any graph in this post to go to an interactive version. 
The interactivity is kind of the whole point, so please do take a few moments to experiment with some of these.</p> <p>At the start of last summer, I <a href="https://twitter.com/Thalesdisciple/statuses/335389579412647937?tw_i=335389579412647937&tw_e=details&tw_p=archive" target="_blank">announced</a> that the <a href="https://www.desmos.com/" target="_blank">Desmos graphing calculator</a> had sold me on its usefulness “after just a few minutes of playing around”. Since then, the <a href="https://www.desmos.com/team" target="_blank">Desmos team</a> has added a lot more features, without ever sacrificing user-friendliness (which, for those of us using Desmos to teach, is paramount).</p> <p>During the fall semester, I used Desmos extensively in my calculus classes at Smith College. I made “worksheets” that allowed students to interact with mathematical ideas in an incredibly direct way; I also had fun creating them. Eventually I figured out that Desmos and Google Docs could be used together to make more <a href="https://docs.google.com/document/d/1q5jet6GMEWOPU6ykwCzP9lsu9qaeN_GGQOS7FgyfQvk/edit?usp=sharing" target="_blank">fully developed worksheets</a>. (A brief word about my teaching situation: at Smith, all students have a Google account for their email, and thus all have a school-related Google Drive by which documents could be shared. On days we used worksheets, about half of the students would bring in laptops, and they would work in groups of 2–3. At the end of class, or afterwards if they had sections to finish, they would share their work with me so that I could review it.) I’ve shared some of these worksheets over time, but I thought it would be nice to have some of my favorites collected in a single place. For simplicity, I’ve removed a bunch of the “worksheet” structure from these, so that they have become more like demonstrations others can use as they wish. 
Not all of these were used in class, as sometimes I just had to play around with some ideas.</p> <p>First, some play. That you could not only define variables but also define functions in a Desmos graph and use them elsewhere came as a revelation to me, as did the fact that you could create sums with a variable number of terms. I first learned this while adding up sine functions à la Fourier sine series, which I wrote about <a href="http://thalestriangles.blogspot.com/2013/06/another-experiment.html">here</a>. After that, I made the <a href="http://en.wikipedia.org/wiki/Blancmange_curve" target="_blank">blancmange curve</a>, a classic example of a continuous but nowhere-differentiable function: <center><a title="Blancmange curve" href="https://www.desmos.com/calculator/nrqwgqonmw" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/iytfzz1jh5.png" width="400px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center>The ever-responsive and ever-creative Desmos team turned the blancmange curve into a mountain range, with a setting sun and moving train (you’ll definitely want to play with this one): <center><a title="Landscape with the Blancmange Curve" href="https://www.desmos.com/calculator/3ytwdkdi0m" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/3ytwdkdi0m.png" width="400px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center></p> <p>Onto the calculus demonstrations. Teaching calculus in the fall almost always leads to introducing derivatives near the equinox, around the time that days are getting shorter at their fastest rate. I have in the past just mentioned this as an illustration of the derivative. This fall, however, I had students explore how the changing amount of daylight is affected by time of year, latitude, axial tilt, etc. 
Here, for instance, is a graph depicting the amount of daylight on each day of the year at latitude 35°N: <center><a title="Daylight Hours Explorer" href="https://www.desmos.com/calculator/jqp1beehos" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/jqp1beehos.png" width="400px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center>Here is the amount of daylight each day at latitude 50°S: <center><a title="Daylight Hours Explorer" href="https://www.desmos.com/calculator/gkql6w0lmj" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/gkql6w0lmj.png" width="400px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center>And here’s what the amount of daylight would be like just a few degrees away from the equator, if Earth had the same axial tilt as Uranus (about 82 degrees): <center><a title="Daylight Hours Explorer" href="https://www.desmos.com/calculator/24kcuh6dlt" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/24kcuh6dlt.png" width="400px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center></p> <p>Optimization takes up a chunk of time after derivatives have been introduced. Several classic problems deal with boxes whose surface area must be minimized, or whose volume must be maximized, under various constraints. I’ve always suspected students have trouble imagining what it means, for instance, to require a box have a square base and a fixed volume. What do the various shapes of such boxes look like? 
So I made a simple model of an open-topped box whose volume and base side length could be manipulated: <center><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/yfcap9dvw4" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/yfcap9dvw4.png" width="350px" height="300px" style="border:1px solid #ccc; border-radius:5px" /></a></center></p> <p>Then integration rolls around, with the requisite Riemann sums. Between the introduction of sigma notation, Δx’s, and a host of other notation, it’s easy for students to feel like they have no idea what is going on. A picture can clear things up, because the idea is quite simple, but drawing enough pictures to show what it means for Riemann sums to converge can take an incredibly long time. Isn’t it nice that we can just show this now? <center><a title="Riemann sums" href="https://www.desmos.com/calculator/tgyr42ezjq" target="_blank"><img src="https://s3.amazonaws.com/grapher/exports/9clb9gsaxf.png" width="220px" height="165px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/xbqjll2gj8.png" width="220px" height="165px" style="border:1px solid #ccc; border-radius:5px" /><br><img src="https://s3.amazonaws.com/grapher/exports/tnpe8gnaxk.png" width="220px" height="165px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/eq1ksx88pm.png" width="220px" height="165px" style="border:1px solid #ccc; border-radius:5px" /></a></center>(I adapted this from <a href="https://www.desmos.com/calculator/j5626pgxtx" target="_blank">another Riemann sums demonstration</a>, made by Evan R.)</p> <p>In discussing differential equations, we took a day to look at the <a href="http://www.azimuthproject.org/azimuth/show/Logistic+equation" target="_blank">logistic model of population growth</a>. 
I <a href="https://twitter.com/Thalesdisciple/status/407216494901071872" target="_blank">asked on Twitter</a> if anyone had a suggestion for real-world data to base a project on. <a href="https://twitter.com/LiaSantilli/status/407264274864549889" target="_blank">Lia Santilli came up with a great idea</a> I would never have considered: the number of Starbucks locations open <i>t</i> years after the company started. I had the students create a table using the data available <a href="http://globalassets.starbucks.com/assets/e56b2a6b08244aaab0632dc6ac25ad0d.pdf" target="_blank">here</a>, then try to match the data as nearly as possible with a logistic curve. Here was my attempt: <center><a title="Starbucks logistic growth" href="https://www.desmos.com/calculator/lrajxtefxw" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/lrajxtefxw.png" width="400px" height="275px" style="border:1px solid #ccc; border-radius:5px" /></a></center>What I like about this example is that you can see how difficult it is to distinguish between exponential and logistic growth early on. Right up until the inflection point of the logistic curve, the growth seems exponential, and so naturally the company continues that growth trend for another few years. But as the market reaches saturation, it becomes clear that they’ve overshot the mark, and one year (2009) they actually have to close more locations than they open. After that, the growth is more restrained. I don’t know if the 20,000 locations I built into my model is actually the largest sustainable number for this “population”, but I like the challenges for management highlighted by this analysis.</p> <p>I also used Desmos a few times in my probability class for illustrations. I wrote about one example <a href="http://thalestriangles.blogspot.com/2013/09/surprises-part-2.html">here</a>. 
By using the built-in floor functions and combinations, it’s easy to show what various probability distributions look like, and how they change with the various parameters. For example, here is the probability distribution for the number of times a coin comes up heads in 20 tosses, if it is weighted so that it comes up heads 70% of the time (the vertical dotted line indicates the expected value of 14): <center><a title="Discrete distributions" href="https://www.desmos.com/calculator/gyl3z0aptn" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/gyl3z0aptn.png" width="400px" height="275px" style="border:1px solid #ccc; border-radius:5px" /></a></center>The ability to change parameters also makes it possible to nicely illustrate the <a href="http://en.wikipedia.org/wiki/Central_limit_theorem" target="_blank">Central Limit Theorem</a>. Here, for instance, is a graph showing the standard normal distribution (black), the distribution of an exponential random variable (blue), the sum of ten such random variables (green, heading off the right side of the graph), and a normalization of the sum to have mean 0 and variance 1 (red): <center><a title="Central Limit Theorem Demonstration" href="https://www.desmos.com/calculator/sopyrud31g" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/sopyrud31g.png" width="400px" height="275px" style="border:1px solid #ccc; border-radius:5px" /></a></center>You can see the convergence of the sum to a normal distribution beginning.</p> <p>Finally, I also found Desmos useful for illustrating parts of my research. 
One of the dynamical systems I’m studying is related to the three-cusped hypocycloid, or “deltoid”, which is traced out by a point marked on the circumference of a circle rolling around the inside of a circle three times the size: <center><a title="2- and 3-cusped hypocycloids" href="https://www.desmos.com/calculator/guhq4lbkua" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/wbdtdjdke6.png" width="180px" height="150px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/uc4plegojn.png" width="180px" height="150px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/ijltniarge.png" width="180px" height="150px" style="border:1px solid #ccc; border-radius:5px" /></a></center>Each point inside the deltoid lies on three tangent segments: <center><a title="2- and 3-cusped hypocycloids" href="https://www.desmos.com/calculator/l5x5nikjfj" target="_blank"><img src="https://s3.amazonaws.com/grapher/exports/l5x5nikjfj.png" width="175px" height="160px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/aojxr4njg9.png" width="175px" height="160px" style="border:1px solid #ccc; border-radius:5px" /></a></center>Perhaps most exciting for me was when I discovered that all the pedal curves of the deltoid could be easily seen and manipulated. 
A pedal curve is determined by the orthogonal projections of a fixed point onto the tangent lines of the deltoid: <center><a title="2- and 3-cusped hypocycloids" href="https://www.desmos.com/calculator/j6z508ef2k" target="_blank"><img src="https://s3.amazonaws.com/grapher/exports/j6z508ef2k.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/knkzhyzljt.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/ylxy2cf4jj.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /></a><br><a title="2- and 3-cusped hypocycloids" href="https://www.desmos.com/calculator/q7j0n8otej" target="_blank"><img src="https://s3.amazonaws.com/grapher/exports/q7j0n8otej.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/tg4yujaolb.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /><img src="https://s3.amazonaws.com/grapher/exports/3amisfurg1.png" width="170px" height="160px" style="border:1px solid #ccc; border-radius:5px" /></a></center>The pedal curves in the above examples were drawn using explicit parametrizations. They can also be defined implicitly by fourth-degree polynomial equations. Since Desmos recently added the ability to plot implicit curves where both variables appear with degree greater than 2, I thought I’d share another, simpler graph that illustrates this functionality: <center><a title="Implicit pedal curves" href="https://www.desmos.com/calculator/mzt1yzdhcs" target="_blank"><img src="https://s3.amazonaws.com/grapher/exports/mzt1yzdhcs.png" width="365px" height="350px" style="border:1px solid #ccc; border-radius:5px" /></a></center></p> <p>So that’s an assortment of things I’ve done with Desmos over the past few months, some big, some small. 
For teachers planning to use Desmos with their students, I would make the following suggestions: <ol><li>Draw them in with something interactive and manipulable. Teach them early to recognize that different shapes can be given by the same formula simply by changing a few parameters, and to explore the effects that the parameters have.</li><li>Get them to create their own graphs. In the past, we had to do all the work to create the worksheets and the models, but now students can be enabled to build their own; when they do, they will benefit from creating, not just responding.</li><li>Give them questions that require thoughtful use of the technology they have; simply having access is not a panacea. For example, real-world problems often have models that call for very different scales on the vertical and horizontal axes. Students can be tempted just to use the zoom buttons, causing them to miss important details. Make sure they know they have to think about the graphs they’re creating, not just rely on the computer to show them everything, because it won’t.</li></ol>For everyone, I encourage widespread use of Desmos and similar tools for education, illustration, research, and entertainment. 
The Desmos development team deserves an immense amount of thanks for providing us with such graphing and computational power.</p> <p><i>Added 12/17–</i> Check out these other graphs for calculus, made by Patrick Honner: <a href="http://mrhonner.com/desmos" target="_blank">http://mrhonner.com/desmos</a></p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-34037177963184475812013-09-26T09:52:00.000-07:002013-09-26T10:45:02.008-07:00surprises, part 2<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>In my <a href="http://thalestriangles.blogspot.com/2013/09/my-favorite-surprise.html">last post</a>, I described a classic problem in probability: suppose a certain disease occurs in 0.5% of a given population, and there is a test that detects it with 99% accuracy, but also returns a false positive 1% of the time it is used on a healthy person. What is the conditional probability that an individual has the disease, given that they have a positive result from the test? The answer, somewhat surprisingly, turns out to be <a href="http://wolfr.am/18rZgs7" target="_blank">less than a third</a>.</p> <p>When we discussed this in my probability class, one student asked a very sensible question: <em>What if we test the person twice?</em></p> <p>This question seemed worth investigating. As I see it, the question can be interpreted two ways. On one hand, what if we tested <em>everyone</em> twice? How would that affect the conditional probability given above? On the other hand, what if we <em>only</em> gave the test a second time to those who had a positive first test? Would we be more likely to filter out those who are actually ill in that case, having restricted to a population in which the disease is more prevalent? 
Do these two methods produce different results?</p> <p>To begin with, let’s return to the original question and analyze it more thoroughly by introducing some variables. Let $r$ be the prevalence of the disease in the total population (which can be interpreted as the probability that any particular individual has the disease). Suppose the test we have returns a <em>true positive</em> (a positive result for someone who is ill) with probability $p$ (called the <a href="http://en.wikipedia.org/wiki/Sensitivity_and_specificity#Sensitivity" target="_blank">sensitivity</a> of the test), and it returns a <em>false positive</em> (a positive result for someone who is well) with probability $q$ (the value $1 - q$ is called the test’s <a href="http://en.wikipedia.org/wiki/Sensitivity_and_specificity#Specificity" target="_blank">specificity</a>). <a href="http://thalestriangles.blogspot.com/2013/09/my-favorite-surprise.html" target="_blank">Bayes’ formula</a> then says that the probability of having the illness given a positive test result is \[ P(r) = \frac{r \cdot p}{r \cdot p + (1 - r) \cdot q}. \] If we fix $p$ and $q$ and let $r$ vary, we get a graph like the following:<br><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/ssqr1yed3w" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/ssqr1yed3w.png" width="400px" height="400px" style="border:1px solid #ccc; border-radius:5px" /></a><br>(drawn here with $p = 0.98$ and $q = 0.05$; you can click on the graph to go to an interactive version). Notice the large derivative for small values of $r$; that low conditional probability we got at the beginning was essentially an artifact of the disease itself being fairly uncommon. (As one student slyly put it, “so the way to make a positive test more likely to mean you’re sick is to give more people the disease?”) Raising the value of $p$ doesn’t change the graph much. 
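For anyone who would rather poke at numbers than at a graph, here is a small Python sketch of the function $P(r)$ above. The name `posterior` and the particular values tried are my own choices for illustration, not part of the Desmos graph; the starting point $p = 0.98$, $q = 0.05$ matches the picture, and the prevalence $r = 0.005$ is borrowed from the original problem.

```python
def posterior(r, p, q):
    """Probability of illness given one positive test (Bayes' formula).

    r: prevalence, p: sensitivity (true-positive rate), q: false-positive rate.
    """
    return r * p / (r * p + (1 - r) * q)


r = 0.005  # prevalence from the original problem (an assumed choice here)

baseline = posterior(r, p=0.98, q=0.05)
better_p = posterior(r, p=0.999, q=0.05)   # improve the sensitivity only
better_q = posterior(r, p=0.98, q=0.01)    # reduce the false positives only

print(f"baseline      : {baseline:.4f}")
print(f"raise p only  : {better_p:.4f}")
print(f"lower q only  : {better_q:.4f}")
```

Running it with a few different prevalences gives a feel for which parameter the curve actually responds to when the disease is rare.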
The real problem lies in the false positives; if the disease is sufficiently rare, then having any chance at all of false positives ($q > 0$) means that the false positives will outnumber the true positives.</p> <p>If we change the situation so that every time an individual is tested we administer the test twice, then a few things happen. First, the chance of getting two false positives when testing a healthy individual is $q^2$, which is generally much smaller than $q$. Meanwhile, the chance of getting two positives when testing a sick individual is $p^2$, smaller than $p$ but not by much. The result is a much steeper curve for low-prevalence diseases:<br><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/bbj7rzrfz5" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/bbj7rzrfz5.png" width="400px" height="400px" style="border:1px solid #ccc; border-radius:5px" /></a><br>(the red curve is the same as before; the purple curve represents the probability of having the illness given two positive tests). Effectively, we have created a <em>new</em> test with a much reduced chance of false positives.</p> <p>But testing <em>everyone</em> twice seems unnecessary. Just as a low prevalence leads to a reduced probability that a positive result means the disease is actually present, so it also reduces the probability that one is ill given a negative result. Here is the graph of this latter conditional probability (that is, the prevalence of the disease among those who have a negative test):<br><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/mf9x7v3kjo" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/mf9x7v3kjo.png" width="400px" height="400px" style="border:1px solid #ccc; border-radius:5px" /></a><br>So we shouldn’t worry too much about those who have a negative test. We can give the test a second time just to those who have a positive first test. 
In effect, rather than creating a new test as before, we have restricted to a new <em>population</em>, in which the disease is far more prevalent (as given by the original conditional probability $P(r)$). Here is the graph of the original function $P(r)$ (again in red) together with the graph (in orange) of the probability of having the disease given a positive result and being among those who had a first positive test:<br><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/vhayu89z2r" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/vhayu89z2r.png" width="400px" height="400px" style="border:1px solid #ccc; border-radius:5px" /></a></p> <p>Do you notice something about the purple and orange curves in the graphs above? They are the same. I admit, this surprised me at first. I thought that having a second positive result when restricted to those who already had one would make it more likely that one had the disease than if we tested everyone twice indiscriminately. But <a href="http://wolfr.am/1aYX3rC" target="_blank">the algebra bears out this coincidence of graphs</a>. It doesn’t matter whether everyone is tested twice or just those who first have a positive result; the conditional probability of having the disease after two positive tests is the same either way. In the latter case, of course, far fewer total tests are administered.</p> <p>Something we haven’t considered yet is what it means to have one positive and one negative test. Here the relative sizes of $p$ and $1-q$ matter. You can check that if $p + q = 1$, then having one positive and one negative test returns one’s likelihood of having the disease back to that of the overall population (because a sick person and a healthy person have the same chance of getting one positive and one negative result). 
However, if $q$ is greater than $1-p$ (that is, if a healthy person is more likely to have a false positive than a sick person is to have a false negative), then obtaining different results on two tests means one’s chance of having the disease is slightly <em>less</em> than in the overall population. One last graph, in which the red and blue curves from before reappear, together with a green curve representing the probability of having the disease given one positive and one negative test:<br><a title="View with the Desmos Graphing Calculator" href="https://www.desmos.com/calculator/5kp9g1tkgf" target="_blank"> <img src="https://s3.amazonaws.com/grapher/exports/5kp9g1tkgf.png" width="400px" height="400px" style="border:1px solid #ccc; border-radius:5px" /></a><br>Conversely, if $q$ is less than $1 - p$, then the green curve would lie slightly above the diagonal.</p> <p>The ideas we have been exploring are at the heart of <em><a href="http://mathworld.wolfram.com/BayesianAnalysis.html" target="_blank">Bayesian analysis</a></em>, in which a certain assumption (called a <em>prior</em>) about how some characteristic is distributed is fed into a conditional probability model, and a new distribution is obtained. The new distribution becomes the new prior, and the process may be repeated. This kind of analysis depends on a Bayesian view of probability, in which the distribution represents a measure of belief (rather than any necessarily objective knowledge), and how that belief changes with the introduction of new knowledge. In our case, our prior was the assumption that the disease had prevalence $r$, and the new knowledge we introduced was the result of a medical test. This is the same kind of analysis—at a much more elementary level—that <a href="http://scientopia.org/blogs/goodmath/2012/11/09/debunking-two-nate-silver-myths/" target="_blank">Nate Silver made famous</a> (or perhaps that made Nate Silver famous) during recent election seasons. 
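Here is a short Python sketch of that "posterior becomes the new prior" loop, applied to the two-test question. The helper `update` is a hypothetical name of mine, and the sketch assumes (as the $p^2$ and $q^2$ above already do) that repeated tests on the same person are independent given whether that person is ill.

```python
import math


def update(prior, p, q, positive=True):
    """One Bayesian update of the probability of illness after a test result.

    p: sensitivity, q: false-positive rate (both assumed fixed between tests).
    """
    like_ill = p if positive else 1 - p     # P(result | ill)
    like_well = q if positive else 1 - q    # P(result | well)
    return prior * like_ill / (prior * like_ill + (1 - prior) * like_well)


p, q = 0.98, 0.05  # the values used for the graphs
for r in (0.001, 0.01, 0.1, 0.5):
    # Retest only the positives: the first posterior is the new prior.
    retest = update(update(r, p, q), p, q)
    # Test everyone twice: a "new test" with rates p^2 and q^2.
    both = r * p**2 / (r * p**2 + (1 - r) * q**2)
    # One positive followed by one negative result.
    mixed = update(update(r, p, q), p, q, positive=False)
    assert math.isclose(retest, both)
    print(f"r={r:<6} two positives: {retest:.4f}   one of each: {mixed:.4f}")
```

The assertion inside the loop is the numerical counterpart of the coincidence of the purple and orange curves.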
I must say, I was pleased that a student’s question led so neatly into this timely topic.</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-14555666144002880892013-09-17T19:15:00.000-07:002013-09-17T21:07:35.709-07:00my favorite surprise<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>I’m excited about tomorrow. Tomorrow in my probability class, we’re going to start discussing <a href="http://en.wikipedia.org/wiki/Bayes'_theorem" target="_blank">Bayes’ Formula</a>. This is the main thing I remember about my college probability class. While we have already seen some surprising results in particular cases where the rules of probability are applied, this is, to me, the first truly surprising <em>general</em> result. It changes everything I think about probability.</p> <p>Here’s my motivating example: suppose we have before us <a href="http://en.wikipedia.org/wiki/Urn_problem" target="_blank">an urn</a> that contains five blue balls and five red balls. We draw two balls, in order, and record their colors. (To be clear, this is <a href="http://stattrek.com/statistics/dictionary.aspx?definition=sampling_without_replacement" target="_blank">sampling without replacement</a>.) What is the probability that the first ball is red? “Well,” you say, “evidently the likelihood of that is 1/2, because half of the balls are red.” “Very well,” I say, “then what is the probability that the first ball is red, <em>assuming that the second ball is blue?</em>” “What does that have to do with it?” you ask. 
“When the second ball is drawn, the first one has already been chosen, so how could the second ball turning up blue have anything to do with the probability that the first ball is red?”</p> <p>Let’s throw some formulas in here. Suppose $E$ and $F$ are two events in the sample space $S$ of an experiment. (I discussed the definitions of these terms in <a href="http://thalestriangles.blogspot.com/2013/09/a-biased-game-of-war.html" target="_blank">my previous post</a>.) The <em>conditional probability of $E$ given $F$</em>, written $P(E\mid F)$, is the quotient $\dfrac{P(E \cap F)}{P(F)}$, meaning, loosely speaking, that we consider all the ways <em>both</em> $E$ and $F$ can occur (weighted by their individual probabilities), and think of this as just a subset of the outcomes where $F$ occurs (rather than all of $S$). “Sensible enough,” (I hope) I hear you say. Now, you will hopefully also agree that we can split $F$ into two parts: the one that intersects $E$ and the one that does not, i.e., $F = (F \cap E) \cup (F \cap E^c)$. “Aren’t you overcomplicating things?” you demur. “Just wait,” I plead. The events $F \cap E$ and $F \cap E^c$ are <em>mutually exclusive</em> (i.e., disjoint), and so we have $P(F) = P(F \cap E) + P(F \cap E^c)$. Interesting, no? So we can write \[ P(E \mid F) = \frac{P(E \cap F)}{P(E \cap F) + P(E^c \cap F)} \] (using the fact that $E \cap F = F \cap E$). And now perhaps it seems like this manipulation isn’t so weird, because in our “motivating case”, each of the terms in that expression isn’t so hard to compute, and in fact one of them appears twice!</p> <p>So what happens? Let’s return to our urn and say $E$ is the event “the first ball is red”, while $F$ is the event “the second ball is blue”. 
Then $P(E \cap F) = \big(\frac{5}{10}\big)\big(\frac{5}{9}\big)$ and $P(E^c \cap F) = \big(\frac{5}{10}\big)\big(\frac{4}{9}\big)$, so \[ P(E \mid F) = \frac{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)}{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)+\big(\frac{5}{10}\big)\big(\frac{4}{9}\big)} = \frac{25}{25+20} = \frac{5}{9}. \] Since 5/9 > 1/2, it is <em>more</em> likely that the first ball was red if we know that the second ball is blue! (Surprised? Think about what happens if there are only two balls to begin with, one blue and one red. Once that’s sunk in, try the above again starting with $m$ blue and $n$ red balls in the urn.)</p> <p>So far I’m cool with everything that’s happened. The realization that later events provide information about earlier ones is a bit of a jolt, but not so far-fetched after a little reflection. Bayes, however, endeavors to turn our minds further inside-out. We just need one new idea, just as simple as everything we’ve done to this point: the equation for conditional probability can be rewritten as $P(E \cap F) = P(F) \cdot P(E \mid F)$. And of course, because $E \cap F = F \cap E$, we could just as well write $P(E \cap F) = P(E) \cdot P(F \mid E)$. Now, as before, let’s split $F$ into $E \cap F$ and $E^c \cap F$. Using our most recent observation, we have \[ P(E \mid F) = \frac{P(E) \cdot P(F \mid E)}{P(E) \cdot P(F \mid E) + P(E^c) \cdot P(F \mid E^c)}. \] “Now why on Earth…?” you splutter, to which I reply, “Because sometimes the knowledge you have is more suited to computing the conditional probabilities on the right than finding the one on the left directly from the definition.”</p> <p>Here’s a classic example. Suppose there is an uncommon illness that occurs in the general population with probability 0.005 (half of one percent). Suppose further that there is a medical test for this affliction that is 99% accurate. 
That is, 99% of the time the test is used on a sick patient, the test returns positive, and 99% of the time it is used on a healthy patient, it returns negative. You are concerned that you might have this illness, and so you have the test. It comes back positive. What is the probability that you have the illness?</p> <p>Do you see where this is going? You’re interested (well, we both are, really, because I care about your health) in the event $E$ “I have the illness.” The information we have, though, is that the event $F$ “the test came back positive” occurred. And what we know about the test is <em>how its results depend on the patient being sick or well</em>. That is, we know $P(F \mid E)$ and $P(F \mid E^c)$, and fortunately we also know $P(E)$ (ergo we also know $P(E^c) = 1 - P(E)$). We can compute the likelihood of your being ill as \[ P(E \mid F) = \frac{(0.005)(0.99)}{(0.005)(0.99) + (0.995)(0.01)} \approx 0.3322. \] Far from it being a certainty that you have this particular illness, your chances are better than 2 in 3 that you don’t! Even if the illness were twice as common and occurred in 1% of the population, your chances of being sick are only 1 in 2 after the test comes back positive. 
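If you would like to check that arithmetic yourself, here is a minimal Python sketch using exact fractions. The function name `bayes` and its argument names are mine, chosen only for this illustration; the inputs are the numbers from the example above.

```python
from fractions import Fraction


def bayes(prior, p_pos_if_ill, p_pos_if_well):
    """P(E | F) computed from P(E), P(F | E), and P(F | E^c)."""
    ill_and_pos = prior * p_pos_if_ill          # P(E) * P(F | E)
    well_and_pos = (1 - prior) * p_pos_if_well  # P(E^c) * P(F | E^c)
    return ill_and_pos / (ill_and_pos + well_and_pos)


# Prevalence 0.5%: the test is 99% accurate in both directions.
rare = bayes(Fraction(5, 1000), Fraction(99, 100), Fraction(1, 100))
# Same test, but the illness is twice as common (1%).
common = bayes(Fraction(1, 100), Fraction(99, 100), Fraction(1, 100))

print(rare, float(rare))
print(common, float(common))
```

Working with `Fraction` instead of floats keeps the answers exact, so the "1 in 2" in the second case comes out as a true one half and not as a rounded decimal.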
(Notice that this probability—this <em>conditional</em> probability—depends not only on the efficacy of the test, but also on the prevalence of the illness.)</p> <p>And that’s my favorite surprise in probability.</p> <p>(If you haven’t read it yet, you should go look at the <a href="http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/" target="_blank">article Steven Strogatz wrote</a> for the New York Times about Bayes’ Theorem, in which he makes it seem—somewhat—less surprising.)</p>Joshua Bowmanhttps://plus.google.com/103262883740888913105noreply@blogger.com0tag:blogger.com,1999:blog-30611202.post-6835723897401689272013-09-11T19:30:00.000-07:002013-09-12T06:34:18.621-07:00a (biased) game of war<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <p>Here was the problem that took up half our class time today trying to solve (time well spent, I believe): Two players, Alexa and Beatrice, take turns drawing cards from a deck. The first one to draw an ace wins. Alexa draws first. What is the probability Alexa wins?</p> <p>This problem could be solved with a naïve view of probability, but because I assigned it in order for the class to practice using modern definitions, I’d like to spell those out. First, a <em>sample space</em> is the set $S$ of all possible outcomes of a given experiment (here, the game). A subset of the sample space is called an <em>event</em>. (One might think of an “event” as “all the possible ways the experiment can succeed,” for whatever definition of “success” we might like.) 
A <em>probability measure</em> on $S$ is a function that takes each event $E$ and assigns to it a number $P(E)$ with the following properties: <ul><li>For every event $E$, $0 \le P(E) \le 1$.</li><li>$P(S) = 1$ (I think of this as the statement “whenever you perform the experiment, something happens.”)</li><li>If $E_1$, $E_2$, $E_3$, …, is any sequence of <em>mutually exclusive</em> events (meaning that $E_i \cap E_j = \varnothing$ whenever $i \ne j$), then \[ P \left( \bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty P(E_i). \] (Note that this still holds if the sequence only has finitely many events; for example, if $E$ and $F$ are mutually exclusive, then $P(E \cup F) = P(E) + P(F)$.) </li></ul>(This definition works fine if $S$ is finite; if $S$ is uncountably infinite, then we have to be more careful about what kinds of events we can allow. But in our situation $S$ is finite, so no worries.) This is the modern, axiomatic view of probability, essentially as formulated by Kolmogorov. (If you haven’t read Slava Gerovitch’s recent, excellent <a href="http://nautil.us/issue/4/the-unlikely/the-man-who-invented-modern-probability" target="_blank">article on Kolmogorov’s life</a>, you should.) The key realization is that the probability of an event is not given <em>a priori</em>; we must <em>always</em> make assumptions about the likelihood of an event.</p> <p>The first question that arises in analyzing the game between Alexa and Beatrice is therefore, What is the sample space? That is, what are the possible outcomes? 
The students in my probability class came up with essentially three possibilities: <ol><li>The set “Alexa wins on her $n$th draw, or Beatrice wins on her $n$th draw”.</li><li>The set of all sequences of cards drawn from a deck, with the last card drawn an ace.</li><li>The set of all possible orderings of the 52 cards in the deck.</li></ol>In class, I added the following analysis of these choices: <ol><li>This is the simplest description of the set of possible outcomes; one is only concerned with who won, and at which point in the game she won.</li><li>This is the most “natural” (i.e., experiential) description of the possible outcomes; one pays attention only to which cards appear in the course of the game.</li><li>This is the most “equitable” description of possible outcomes; we generally assume that, if the deck has been well-shuffled, then all orderings are equally likely.</li></ol>(One could add the set of outcomes, “Alexa wins” or “Beatrice wins”, as a potential choice of sample space, but I had already suggested to the students that it is worthwhile to consider <em>when</em> a particular player wins.) Another way to distinguish the third choice of sample space from the first two is philosophical: is the experiment done when the players have finished drawing cards, or before that, when the deck is placed in front of them? After all, once the deck is shuffled and laid down, if one knows the order of the cards, then one knows who will win. One student delivered a description of how to find the probability that Alexa wins, when the sample space is chosen to be “all possible orderings of the deck”: \[ P(\text{Alexa wins}) = \frac{\#(\text{orderings in which the first ace appears in an odd position})} {\#(\text{possible orderings of the whole deck})}. \] The denominator equals, of course, 52!. Counting the numerator then becomes the challenge. 
It is equivalent to the solution proposed below.</p> <p>A useful approach, applicable to many situations, is to think about the sequence of events, “Alexa wins on her <em>first</em> turn”, “Alexa wins on her <em>second</em> turn”, etc. These events are clearly mutually exclusive: if Alexa wins after drawing her seventh card, then the game is over, and she won’t win after drawing ten cards. So if we can find the probability of each of these events, then we can add them up to find the total probability that Alexa wins.</p> <p>Finding the probability that Alexa wins on the first turn is straightforward enough; she just has to draw an ace from the deck. This will happen with probability 4/52 = 1/13.</p> <p>Finding the probability that Alexa wins on her second turn is a bit trickier, but contains the germ of the complete solution. First, she must draw anything but an ace; the probability of this is 48/52. Then Beatrice must draw anything but an ace; since Alexa has already drawn one card, the probability of this is 47/51. Then Alexa draws an ace; with only 50 cards remaining, her chances are 4/50. Thus the probability she wins after drawing two cards is (48/52)(47/51)(4/50).</p> <p>Now we can see how the general case works: in order for Alexa to win on her $n$th turn, she and Beatrice must each have drawn $(n-1)$ cards that are not aces, and Alexa must draw an ace after that. The probability of this happening is $\frac{48}{52}\cdot\frac{47}{51}\cdot\frac{46}{50} \cdots \frac{48-2n+3}{52-2n+3}\cdot\frac{4}{52-2n+2}$ (as long as $n > 1$).</p> <p>How many turns must we consider? Well, suppose all four aces were at the end of the deck. Then Alexa and Beatrice would each draw 24 cards before reaching an ace, and Alexa would win on her 25th turn. The probability of this happening is \[ \frac{48!4!}{52!} = \frac{1}{270\,725} \approx 0.0000037, \] since the 48 non-aces can appear in any order, followed by the four aces. 
Thus we need to add together the probability that Alexa wins on each of her 25 (potential) turns to get the total probability that she wins.</p> <p>Before we compute this total, what do we think the result should be? Does Alexa have a greater chance of winning than Beatrice, or the reverse? Or are their chances equal? The students came up with the following arguments: <ul><li>Alexa has more potential turns than Beatrice—25 as opposed to 24, because Beatrice could never win on her 25th—so Alexa is more likely to win. (Given how small is the probability that Alexa wins on her 25th turn, I find this argument not very convincing.)</li><li>Each time Beatrice draws a card, she has fewer cards to choose from, but the same number of aces, so her likelihood of drawing an ace at each turn is greater than Alexa’s, and she has a better chance of winning. (This, however, disregards the sequence of events necessary for Beatrice to have the opportunity to draw.)</li><li>As previously mentioned, we just want to know how likely it is that the first ace appears in an odd position. It seems that there should be no preference for the first ace to appear in an odd versus an even position, so the two players have equal chances of winning. (This does make sense to me, intuitively, but there’s something skewed about the “first ace” that makes it questionable. As we’ll see, the computations do not bear this estimation out, but they nearly do.)</li></ul>Apparently, Alexa’s advantage in drawing first has the potential to be balanced out by Beatrice’s greater chance of drawing an ace due to the reduced number of cards in the deck on her turn.</p> <p>I don’t know how to resolve this philosophical conundrum without actually doing the computation, so here goes. 
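For those who want to follow along at a keyboard, the per-turn products described above can be added up exactly with a few lines of Python. This is only a sketch: the function name `first_player_wins` and its parameters `deck` and `winners` are my own inventions for illustration, and the loop walks through the deck one card position at a time instead of one turn at a time.

```python
from fractions import Fraction


def first_player_wins(deck=52, winners=4):
    """Exact probability that the player who draws first gets the first winning card."""
    total = Fraction(0)
    no_winner_yet = Fraction(1)  # P(no winning card among the cards drawn so far)
    for position in range(deck):  # 0-based position of the card being drawn
        remaining = deck - position
        if position % 2 == 0:  # the first player draws at positions 0, 2, 4, ...
            total += no_winner_yet * Fraction(winners, remaining)
        # Probability that this card is also not a winning card.
        no_winner_yet *= Fraction(remaining - winners, remaining)
    return total


alexa = first_player_wins()
beatrice = 1 - alexa
print("Alexa:   ", alexa, float(alexa))
print("Beatrice:", beatrice, float(beatrice))
```

The `winners` parameter is there so that other winning conditions (a single designated card, say, or any card at all) can be tried without rewriting anything.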
With a bit of reindexing, the probability that Alexa wins can be computed by the sum \[ \sum_{N=0}^{24} \frac{48!}{(48-2N)!}\cdot\frac{(52-2N)!}{52!}\cdot\frac{4}{52-2N} \] (here $N$ is the number of turns each player has completed <em>before</em> Alexa draws her $(N+1)$st card). <a href="http://wolfr.am/17RcJcC" target="_blank">Using Wolfram|Alpha</a>, we find that this sum is about <b>0.5198</b>. Therefore Alexa has a small but clear advantage. In fact, her chances of winning are better than the house’s chances of winning at <a href="http://en.wikipedia.org/wiki/Roulette" target="_blank">French roulette</a> if you, the gambler, bet on all reds, which are 19/37, or about 0.5135. We can check our work (i.e., do we have the right formula?) by applying similar reasoning to compute Beatrice’s chances of winning as \[ \sum_{N=0}^{23} \frac{48!}{(48-2N-1)!}\cdot\frac{(52-2N-1)!}{52!}\cdot\frac{4}{52-2N-1}, \] <a href="http://wolfr.am/1b7x8gD" target="_blank">which is about 0.4802</a>, as we would expect, since the two probabilities must add up to 1. </p> <p>I have two lingering questions about this scenario: <ul><li>Is there a way to determine that the first player has an advantage in the game without doing the full computation? That is, can one provide a “rigorous” (but not overly computational) argument that Alexa will win more than half the time?</li><li>The probability that the first player wins in this game must necessarily be a rational number, because it is a finite sum of rational numbers. As it turns out, this fraction is exactly 433/833, which has a remarkably small denominator (especially considering that the last term in the sum is 1/270725). Is the number 833 intrinsically meaningful in this situation, or is this a coincidence?</li></ul>Comments are welcome.</p> <p><b>Added 9/12:</b> I have made perhaps a small step towards understanding heuristically why the first player should have a small advantage. Suppose instead of the winner being the first one to draw an ace, it is the first one to draw the ace of spades. 
Then the players have equal chances: a specific card is equally likely to be in either an even or odd position. Now suppose that the winning condition is to draw a red card; then the first player already has 1/2 probability of winning on her first turn, apart from any advantage she has later in the game. It seems plausible that if the winning condition depends on drawing one of $k$ designated cards before the other player does, then the first player’s probability of winning is an increasing function of $k$, reaching probability 1 when $k$ is 52 (i.e., the winning condition is “draw a card”). I may get around to calculating how the answer varies with $k$.</p> <p><b>some Smith student summer projects</b> (2013-09-05)</p> <p>We just had our first weekly math lunch, during which several of our math majors explained what they had done over the summer. Here are just a few of the projects they described (I couldn’t remember them all!): <ul><li>investigating the sensitivity of face-recognition algorithms to certain properties, like whether the face is turned towards the camera or not;</li><li>restoring and classifying polylink models, originally created by Alan Holden in the 1970s;</li><li>turning research from the spring into a professional-level paper for publication;</li><li>starting a research project on dynamics, game theory, and biology to describe interaction between snails and crabs;</li><li>an REU about integrating monomials over the Cantor set;</li><li>helping a faculty member teach statistics to 14–15-year-old girls in an intensive two-week program;</li><li>much more!</li></ul>Needless to say, it’s always an honor and a pleasure to be working with these students.</p>