Thursday, February 27, 2014

big mistake or little mistake?

One of my friends shared this picture on Facebook—

(which came via mathtricks.org)—and suggested that the teacher who graded this assignment should not be teaching math at all. I suspected instead that the grading reflected the heavy load of an elementary school teacher who might not be as comfortable with mathematical concepts as they’d like to be, so I wrote this response (which I’ve edited slightly):

The teacher is doing something rather sophisticated—solving a more general problem—which is what makes it easy to trip up on the apparent simplicity of this question. Consider the following similar questions:

“It took Marie 10 minutes to paint two boards. If she works just as fast, how long will it take her to paint three boards of the same size?”

“It took Marie 20 minutes to saw a board into 5 pieces. If she works just as fast, how long will it take her to saw another board into 6 pieces?”

In the case of the first alternative question I’ve proposed, the teacher’s reasoning would be entirely correct: 10 minutes for 2 boards means 15 minutes for three boards. Although this is not the question that’s being asked, sometimes it’s helpful to think of situations where an incorrect sequence of reasoning becomes correct in order to identify where the mistakes are.

In the case of the second alternative question I’ve proposed, think about how you would solve it. Would you divide the 20 minutes into 5 equal periods of time, or 4? Would you blame someone for dividing by 5 the first time they attempted to solve the problem? Once you figure out that what's important is the 4 cuts it takes, rather than the 5 pieces that are produced, then you can solve any such problem. For example, “If Marie takes an hour to cut a board into 6 pieces, then how long will it take to saw another board into 12 pieces?” (The answer, btw, is not 2 hours.)
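Just to spell out that last example (a quick check of the claim): \[ 6 \text{ pieces} = 5 \text{ cuts in } 60 \text{ minutes} \;\Longrightarrow\; 12 \text{ minutes per cut}, \qquad 12 \text{ pieces} = 11 \text{ cuts} \;\Longrightarrow\; 11 \times 12 = 132 \text{ minutes}, \] not $2 \times 60 = 120$ minutes.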

The reasoning the teacher wrote on the paper is clearly of this latter kind. Their mistake is not in computation, but in choosing which aspect of the problem deserves attention: the cuts in the wood, rather than the resulting pieces. This leads to nothing more than an “off by 1” error, which is easily corrected. I would be happy to see this reasoning written on a student’s paper, because I would know that only a small correction is needed, after which the student could solve the much more general problem, thanks to a demonstrated understanding of proportion.

Math teachers have to be prepared to look for this kind of demonstrated understanding in order to home in on where a student is making mistakes in their reasoning. This particular case is an example of someone who is teaching math, but probably also a lot of other subjects, and may or may not have training in mathematical thinking. So the more sophisticated concept—proportionality—steps in and overrides a simpler formulation of the problem, which just involves counting. This kind of mix-up is common not just in students, but among all people. That is why I don’t think it’s incompetence, but rather a symptom of the need for more mathematical training for teachers.

I’m curious how other math teachers would have responded to this discussion. There are certainly those in the math community who can give clearer expression to what I was trying to say. Other commenters on Facebook seemed baffled that a teacher could make this mistake in grading, but I think it’s not such a serious error in reasoning (except that the teacher should have been correcting it on students’ papers, rather than making the mistake themselves).

So, what do you think?

Friday, February 21, 2014

a bit of ex-spline-ation

Splines are piecewise-polynomial functions that interpolate between a finite set of specified points $(x_1,y_1)$, …, $(x_n,y_n)$. Cubic splines require each piece to have degree at most three; this allows the formation of curves that appear quite smooth to the eye, as one has sufficient freedom to match both first and second derivatives at the joining points. Either the first or second derivative may be chosen freely at the first and last points; in the case of natural cubic splines, the assumption is that the second derivative vanishes at those points. I spent part of this week trying to understand how they work, so I decided to make Desmos graphs that would illustrate how to interpolate by cubic splines for sets of three and four points.


Click on the left image to go to the graph with three points,
or the right image to go to the graph with four points.

The process I followed for three points was straightforward, if not lovely. Suppose the interpolating functions are $f_1$ and $f_2$. The assumption that $(x_1,y_1)$ and $(x_3,y_3)$ are inflection points means that $f_1$ and $f_2$ have the form \[ f_1(x) = a_1 (x - x_1)^3 + b_1 (x - x_1) + y_1 \] and \[ f_2(x) = a_2 (x - x_3)^3 + b_2 (x - x_3) + y_3 \] (think in terms of Taylor polynomials around $x_1$ and $x_3$). We need to find the coefficients $a_1, b_1, a_2, b_2$. Two conditions arise from the fact that $f_1(x_2) = f_2(x_2) = y_2$. The condition that the second derivatives match at $x_2$ implies $a_1 (x_2 - x_1) = a_2 (x_2 - x_3)$. The fourth and final condition is that the first derivatives match at $x_2$, and now the system can be solved to find $f_1$ and $f_2$ entirely.
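In case it’s useful, here is a minimal sketch of that computation in code (my own illustration, not the Desmos construction itself): it sets up the four conditions above as a linear system and solves it numerically.

```python
import numpy as np

def natural_spline_3pts(x1, y1, x2, y2, x3, y3):
    """Solve for a1, b1, a2, b2 in
         f1(x) = a1*(x - x1)**3 + b1*(x - x1) + y1,
         f2(x) = a2*(x - x3)**3 + b2*(x - x3) + y3,
    using the four conditions described above."""
    d1, d2 = x2 - x1, x2 - x3
    # unknowns ordered as [a1, b1, a2, b2]
    A = np.array([
        [d1**3,    d1,  0.0,       0.0],   # f1(x2) = y2
        [0.0,      0.0, d2**3,     d2 ],   # f2(x2) = y2
        [d1,       0.0, -d2,       0.0],   # f1''(x2) = f2''(x2)
        [3*d1**2,  1.0, -3*d2**2, -1.0],   # f1'(x2) = f2'(x2)
    ])
    rhs = np.array([y2 - y1, y2 - y3, 0.0, 0.0])
    return np.linalg.solve(A, rhs)

print(natural_spline_3pts(0, 0, 1, 2, 3, 1))   # a1, b1, a2, b2 for an example triple of points
```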

While preparing to make a graph for four points, I came across a post on the Calculus VII blog that breaks down the whole process of computing splines in a clever and beautiful way, which also reduces the complexity of the computation. In addition, the post provides, in rough outline, a motivation for why natural cubic splines are a good choice for interpolation, and I recommend reading the whole thing. I did have to work out several of the details for myself, however, particularly since that post only deals with $x$-values spaced one unit apart. I thought that it might be useful for others to see the process that led to the formulas I use on the graph. Lots of algebra ahead.

We start with four points, $(x_1,y_1)$, $(x_2,y_2)$, $(x_3,y_3)$, and $(x_4,y_4)$, with $x_1 < x_2 < x_3 < x_4$. The first observation is that the easiest kind of interpolation is piecewise-linear, so we compute the three slopes \[ m_1 = \frac{y_2 - y_1}{x_2 - x_1}, \qquad m_2 = \frac{y_3 - y_2}{x_3 - x_2}, \qquad m_3 = \frac{y_4 - y_3}{x_4 - x_3} \] for the three segments between successive pairs of points, and the linear functions $L_1$, $L_2$, and $L_3$ corresponding to this interpolation, $L_i(x) = m_i (x - x_i) + y_i$.


Linear interpolation

The next big idea is that we want to adjust the piecewise-linear approximation by adding cubic “correction” terms $C_1$, $C_2$, and $C_3$, so that our final interpolating functions become $f_i = L_i + C_i$, $i = 1,2,3$, where $C_i(x_i) = C_i(x_{i+1}) = 0$. These latter conditions imply that $C_i$ can be written in the form \[ C_i(x) = a_i (x - x_i) (x - x_{i+1})^2 + b_i (x - x_i)^2 (x - x_{i+1}), \] which means that the first and second derivatives are \[ C_i'(x) = a_i (x - x_{i+1})^2 + 2(a_i + b_i) (x - x_i) (x - x_{i+1}) + b_i (x - x_i)^2 \] and \[ C_i''(x) = (4a_i + 2b_i) (x - x_{i+1}) + (2a_i + 4b_i) (x - x_i). \] Note also that $f_i' = m_i + C_i'$ and $f_i'' = C_i''$.
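If you would rather not expand those derivatives by hand, here is a quick symbolic check (just a sanity check with SymPy, not part of the construction) that the expressions above for $C_i'$ and $C_i''$ are right:

```python
import sympy as sp

x, xi, xj, a, b = sp.symbols('x x_i x_j a_i b_i')   # xj plays the role of x_{i+1}
C = a*(x - xi)*(x - xj)**2 + b*(x - xi)**2*(x - xj)

C1 = a*(x - xj)**2 + 2*(a + b)*(x - xi)*(x - xj) + b*(x - xi)**2   # claimed first derivative
C2 = (4*a + 2*b)*(x - xj) + (2*a + 4*b)*(x - xi)                   # claimed second derivative

print(sp.simplify(sp.diff(C, x) - C1))      # 0
print(sp.simplify(sp.diff(C, x, 2) - C2))   # 0
```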

What other properties do we want these cubic functions to have?

  • For the derivatives of the $f_i$s to match at $x_2$ and $x_3$, we must have $m_i + C_i'(x_{i+1}) = m_{i+1} + C_{i+1}'(x_{i+1})$ for $i = 1,2$.
  • We want the second derivatives of the $C_i$s to match at $x_2$ and $x_3$ (this is the same as matching the second derivatives of the $f_i$s).
  • We also require that the second derivatives be zero at the outer endpoints.
Now a curious twofold effect comes into play:
  • The coefficients $a_i$ and $b_i$ are linear combinations of $z_i = C_i''(x_i)$ and $z_{i+1} = C_i''(x_{i+1})$. To wit, solving the system \[ \begin{cases} z_i &= (4 a_i + 2b_i) (x_i - x_{i+1}) \\ z_{i+1} &= (2a_i + 4 b_i) (x_{i+1} - x_i) \end{cases} \] for $a_i$ and $b_i$ yields \[ a_i = \frac{2z_i + z_{i+1}}{6 (x_i - x_{i+1})}, \qquad b_i = \frac{2z_{i+1} + z_i}{6 (x_{i+1} - x_i)} \]
  • The condition that the second derivatives be equal is exceedingly simple; we have already used it implicitly in labeling them as $z_1$, $z_2$, $z_3$, $z_4$.
Our assumption at the endpoints is that $z_1 = z_4 = 0$. Thus, the whole problem reduces to finding what should be the second derivatives at the “interior” points $x_2$ and $x_3$.


Cubic correction terms

The idea of parametrizing by the second derivatives, after removing the “linear” effects, is where the beauty and cleverness in this solution lie. We use the assumption about matching first derivatives (a linear condition in the coefficients) to set up the remaining conditions on the second derivatives (which themselves depend linearly on the coefficients). Looking back, this is essentially what I did for three points, but I missed out on dealing with the linear effects separately, so I had to solve for three variables simultaneously. At this point, we only need to solve for two: $z_2$ and $z_3$.

Since $C_i'(x_i) = a_i (x_i - x_{i+1})^2 = \frac{1}{6} (2z_i + z_{i+1})(x_i - x_{i+1})$ and $C_i'(x_{i+1}) = b_i (x_{i+1} - x_i)^2 = \frac{1}{6} (2z_{i+1} + z_i)(x_{i+1} - x_i)$, the equations $m_i + C_i'(x_{i+1}) = m_{i+1} + C_{i+1}'(x_{i+1})$ become \[ \begin{cases} (x_2 - x_1) (2z_2 + z_1) + (x_3 - x_2) (2 z_2 + z_3) = 6(m_2 - m_1) \\ (x_3 - x_2) (2z_3 + z_2) + (x_4 - x_3) (2 z_3 + z_4) = 6(m_3 - m_2) \end{cases} \] (in this form, it is easy to see how to generalize to $n$ points, and it shows the origin of what the other post called the “tridiagonal” form of the system). Now we set $z_1$ and $z_4$ to zero and solve for $z_2$ and $z_3$. For this two-variable system it isn’t too bad to write down the explicit solution, which is what is used in the Desmos graph: \begin{gather*} z_2 = 6 \frac{3 m_2 x_2 + 2 m_1 x_4 + m_3 x_3 - 2 m_1 x_2 - 2 m_2 x_4 - m_2 x_3 - m_3 x_2}{(x_2 + x_3)^2 - 4 (x_1 x_2 + x_3 x_4 - x_1 x_4)} \\ z_3 = 6 \frac{3 m_2 x_3 + 2 m_3 x_1 + m_1 x_2 - 2 m_2 x_1 - 2 m_3 x_3 - m_1 x_3 - m_2 x_2}{(x_2 + x_3)^2 - 4 (x_1 x_2 + x_3 x_4 - x_1 x_4)} \end{gather*}
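To make the recipe concrete, here is a short sketch in code (mine, not the formulas as they appear in the Desmos graph) that carries out the whole procedure for four points: compute the slopes $m_i$, solve the two-variable system for $z_2$ and $z_3$, recover $a_i$ and $b_i$, and compare against SciPy’s natural cubic spline as a sanity check.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def natural_spline_4pts(xs, ys):
    """Natural cubic spline through four points, following the z-parametrization above."""
    h = np.diff(xs)                     # [x2-x1, x3-x2, x4-x3]
    m = np.diff(ys) / h                 # slopes m1, m2, m3
    # the tridiagonal system with z1 = z4 = 0:
    #   2(h1+h2) z2 + h2 z3 = 6(m2 - m1)
    #   h2 z2 + 2(h2+h3) z3 = 6(m3 - m2)
    A = np.array([[2*(h[0] + h[1]), h[1]],
                  [h[1], 2*(h[1] + h[2])]])
    rhs = 6*np.array([m[1] - m[0], m[2] - m[1]])
    z2, z3 = np.linalg.solve(A, rhs)
    z = np.array([0.0, z2, z3, 0.0])    # second derivatives at the knots

    def f(x):
        i = min(max(np.searchsorted(xs, x) - 1, 0), 2)             # which piece
        xi, xj = xs[i], xs[i + 1]
        ai = (2*z[i] + z[i+1]) / (6*(xi - xj))
        bi = (2*z[i+1] + z[i]) / (6*(xj - xi))
        lin = m[i]*(x - xi) + ys[i]                                # L_i
        cub = ai*(x - xi)*(x - xj)**2 + bi*(x - xi)**2*(x - xj)    # C_i
        return lin + cub
    return f

xs, ys = np.array([0.0, 1.0, 2.5, 4.0]), np.array([1.0, 3.0, 2.0, 4.0])
f = natural_spline_4pts(xs, ys)
g = CubicSpline(xs, ys, bc_type='natural')
assert all(abs(f(t) - g(t)) < 1e-9 for t in np.linspace(0, 4, 17))   # the two constructions agree
```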


In this graph, blue plus green equals red.

Finally, to check that we have actually created a spline with the desired properties, we can look at the graphs of the first and second derivatives to make sure they’re continuous.


The spline is red. The first derivative is purple, and the second derivative is orange.

Notice that the second derivative is piecewise linear (naturally, since the spline is piecewise cubic) and zero at the endpoints (as we chose it to be). I particularly like seeing how the derivatives change as the points are moved.

Anyway, I learned a lot from putting together the graphs, and almost as much from writing this post. I think there are lots of interesting explorations one could do with these graphs, but for now I’ll just release them to the wild and hope people enjoy them!


P.S. Please pardon the bad pun in the title. I’m working on making my post titles more… interesting?

Saturday, February 15, 2014

be careful with computers

This week in my calculus class, we were studying examples of separable differential equations: exponential growth and decay, Newton’s Law of Heating and Cooling, and the logistic model, respectively \[ \frac{dy}{dt} = ky, \qquad \frac{dy}{dt} = k(M-y), \qquad\text{and}\qquad \frac{dy}{dt} = ky\left(1-\frac{y}{L}\right). \]

We explored how the shapes of solutions depend on the parameters in the equations, which parameters are most physically meaningful in various situations, and what extra parameters appear in the course of solving them.
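For anyone who wants to experiment outside Desmos, here is a small sketch (my own illustration, with arbitrary parameter values) that integrates the three models numerically with SciPy so their solution shapes can be compared:

```python
import numpy as np
from scipy.integrate import solve_ivp

k, M, L = 0.5, 70.0, 100.0   # arbitrary illustrative parameters

models = {
    "exponential": lambda t, y: k*y,              # dy/dt = k y
    "Newton":      lambda t, y: k*(M - y),        # dy/dt = k (M - y)
    "logistic":    lambda t, y: k*y*(1 - y/L),    # dy/dt = k y (1 - y/L)
}

t_eval = np.linspace(0, 15, 200)
for name, rhs in models.items():
    sol = solve_ivp(rhs, (0, 15), [5.0], t_eval=t_eval)   # initial value y(0) = 5
    print(f"{name}: y(15) ≈ {sol.y[0, -1]:.2f}")
```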

I had several goals for this course of study:

  • to show the versatility of differential equations in modeling physical situations;
  • to show the usefulness of integration techniques for solving real-world problems;
  • to practice understanding the behavior of solutions to differential equations by examining the equations themselves;
  • to get students more familiar with Wolfram Alpha and Desmos as computational tools.
This last goal led to a curious discovery, which is what prompted me to write this post.

In the age of widely accessible computer algebra systems, we are finally freed from spending a month in calculus class mastering techniques of integration. Substitution and integration by parts are indispensable, but as far as I’m concerned, the remaining methods are used infrequently enough that it suffices to make students aware that other techniques exist, so they can learn them when needed. In particular, solving the logistic equation is about the only reason I can conjure to justify learning partial fractions in introductory calculus. So we learned how to compute partial fractions for a rational function with two linear factors in the denominator. Here’s how it gets used.

First, rearrange the logistic equation a bit and separate variables to get \[ \frac{dy}{y(y-L)} = -\frac{k}{L}dt. \] The right side clearly integrates to $-\frac{k}{L}t$. Using partial fractions, the integral of the left side is \[ \int\frac{dy}{y(y-L)} = \int \frac{1}{L} \frac{dy}{y - L} - \int\frac{1}{L} \frac{dy}{y} = \frac{1}{L} \big( \ln|y - L| - \ln|y| \big) = \frac{1}{L} \ln \left|\frac{y - L}{y}\right|. \] Tossing in the ever-present arbitrary constant of integration $+C$ (which really matters very little until one starts solving differential equations, as here), we have \[ \frac{1}{L} \ln \left|\frac{y - L}{y}\right| = -\frac{k}{L}t + C. \] Multiply both sides by $L$ and exponentiate both sides to get \[ \left|\frac{y - L}{y}\right| = e^{-kt + LC} = e^{LC} e^{-kt}. \] Now, $e^{LC}$ is always positive, but when we drop the absolute value signs, we get \[ \frac{y - L}{y} = Be^{-kt}, \] where $B$ can be either positive or negative. Finally, solve for $y$: \[ y = \frac{L}{1 + Ae^{-kt}} \] (where $A = -B$). Technically, throughout this process we had to assume that $y \ne 0$ and $y \ne L$. That’s okay, though, because $y = 0$ and $y = L$ are evident as equilibrium (constant) solutions from the equation itself.
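As a check on all that algebra (a verification sketch only), SymPy confirms that $y = L/(1 + Ae^{-kt})$ satisfies the logistic equation for any value of $A$:

```python
import sympy as sp

t, k, L = sp.symbols('t k L', positive=True)
A = sp.symbols('A', real=True)
y = L / (1 + A*sp.exp(-k*t))

# residual of dy/dt - k*y*(1 - y/L); simplifies to zero
print(sp.simplify(sp.diff(y, t) - k*y*(1 - y/L)))   # 0
```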

There are three qualitatively different behaviors that solutions to the logistic equation can have, depending on their initial values.

  1. The solutions $y = 0$ and $y = L$, as mentioned above, are constant.
  2. A solution that starts out between $0$ and $L$ will increase over time, approaching the value $L$ asymptotically. This behavior corresponds to $A > 0$ in the solution above.
  3. A solution that starts out above $L$ will decrease over time, again approaching the value $L$ asymptotically. This behavior corresponds to $A < 0$ in the solution above.
The case of an initial value below zero is generally not physically meaningful, and in any case it follows the same formula as 3. (See the red curve in the third image at the top.) The most interesting solutions are type 2, whose graphs follow what Wikipedia describes as a “sigmoid” shape (the blue curve in the image at top). These are the solutions that provide the most meaningful applications of the logistic equation: growth in a constrained environment.

So far, so good, and I haven’t said anything that isn’t easily found elsewhere. But before we solved the logistic equation by hand, I wanted my students to explore the solutions graphically, so I had them plug the equation into Wolfram Alpha. Here’s the solution it provides: \[ y(t) = \frac{L e^{c_1 L+k t}}{e^{c_1 L+k t}-1}. \] However, if you plot this solution (link goes to a Desmos graph) and vary the parameters (particularly $c_1$, since $k$ and $L$ are constants given in the equation), you will only see one kind of solution: the third kind. Type 2 solutions, the most interesting ones, don’t even show up in Wolfram Alpha’s formula! What’s going on?
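Before digging into why, here is a quick numerical confirmation (a sketch with my own parameter choices) that for real values of $c_1$ the formula never produces a value strictly between $0$ and $L$:

```python
import numpy as np

k, L = 0.5, 100.0
t = np.linspace(0.0, 20.0, 201)

with np.errstate(divide='ignore', invalid='ignore'):      # guard against e^(...) = 1 exactly
    for c1 in (-0.033, -0.017, -0.005, 0.005, 0.017, 0.033):   # real values of the constant
        u = np.exp(c1*L + k*t)
        y = L*u/(u - 1)                                   # Wolfram Alpha's formula
        assert not np.any((y > 0) & (y < L))              # never a "type 2" (sigmoid) value
print("no values strictly between 0 and L")
```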

If you sign in to Wolfram Alpha, you can have it produce step-by-step solutions. The results from this query:

This doesn’t look so different from our solution: separate variables, integrate, solve for $y$. In W|A, $\log$ denotes the natural logarithm, so nothing’s amiss there. What’s changed?

It’s a small thing, one that could easily be missed, even if you’re looking step-by-step. A hint comes from the placement of the additional parameter that arises from the constant of integration: in our solution, this parameter ended up in front of the exponential function, whereas Wolfram Alpha left it in the exponent. That’s not really the issue though… ah, there it is:

After integrating, there’s no absolute value inside the logarithm (gasp!). I know, we try to convince our students time and time again that the standard antiderivative of $\frac{1}{x}$ is $\ln|x|$, and here Wolfram is leaving out the absolute value. And it turns out to make a difference—go back and read the solution we went through, and you’ll see that the moment the absolute value was dropped from the equation is precisely when our formula gained the flexibility to accommodate both type 2 and type 3 solutions. Wolfram Alpha missed solutions by leaving out the absolute value.

This isn’t all that disastrous. I know that Wolfram Alpha generally assumes complex arithmetic, in which case the logarithm requires a branch cut anyway. It also assumes a fair amount of mathematical sophistication on the part of its users. It wasn’t too hard for me to figure out why we didn’t get the answer we expected. [For more on these points, see the addendum, below.] But this example does suggest caution when we try to use W|A for educational purposes. In fact, it reinforces the message that as we’re training our students to use computing tools, we need to make sure they’re doing so intelligently. One doesn’t need to fully anticipate the answer provided, but one should have some idea of what to expect, to check the answer’s reasonableness.

I’m also not arguing that one has to be a stickler about constants of integration or absolute values in logarithms from the moment they are introduced. Such matters are generally secondary in the early days of learning integration. But when motivation for these secondary matters arises naturally in an interesting example, that opportunity should be seized.

Addendum (2/16): When I first posted this, I suggested that it was evidence of a bug in Wolfram Alpha. Later I realized that this is not technically the case, because with complex numbers, Euler’s formula shows that we can get all of the solutions; for instance, if we let $c_1 = c_2 + i\pi/L$, then Wolfram Alpha’s answer becomes $L e^{c_2 L+k t}/(e^{c_2 L+k t}+1)$. Indeed, if we add the initial condition $y(0) = L/2$, then W|A returns $y(t) = Le^{kt}/(e^{kt}+1)$, as expected. It’s not enough, however, to specify that the equation be solved over the reals; doing so gives the same answer as at first. The pedagogical points I made about using technology still stand. They are perhaps even made stronger, as defaulting to computations over the complex numbers often produces results that can be confusing for students. This doesn’t mean we should avoid using such tools, but we should prepare our students to adapt to unexpected output.
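Here is a quick numerical confirmation of that substitution (a sketch with made-up values), using complex arithmetic: shifting the constant by $i\pi/L$ turns Wolfram Alpha’s formula into the familiar one with $+1$ in the denominator.

```python
import numpy as np

k, L, c2, t = 0.5, 100.0, -0.02, 3.0        # made-up values for a spot check
c1 = c2 + 1j*np.pi/L                        # shift the constant by i*pi/L

wa   = L*np.exp(c1*L + k*t)/(np.exp(c1*L + k*t) - 1)     # Wolfram Alpha's formula
ours = L*np.exp(c2*L + k*t)/(np.exp(c2*L + k*t) + 1)     # the hand-derived form

print(np.isclose(wa, ours))   # True (the imaginary part is numerically zero)
```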

Tuesday, December 31, 2013

the best “real-life” use of geometry I saw this year

On May 3, 2003, the Old Man of the Mountain—a rock formation that had been known for at least two centuries as one of the natural wonders of New Hampshire—collapsed. No one saw it happen; that morning two park rangers looked up and realized he was gone. It had been expected that this day would arrive. The Old Man’s face was a remnant of ancient glacial movements, and it was not stable, thanks to erosion and freezing; it had already been repaired multiple times since the 1920s. In 2007, a project was begun to memorialize the Old Man, and in 2011 the “Profiler Plaza” was dedicated.

Over fall break this year, my wife and I made a trip to the western edge of the White Mountains, where the Old Man of the Mountain used to reside. We stopped by the memorial to the Old Man that is now located on the edge of “Profile Lake”, where I was astounded by the ingenuity of the project that had been created. Not content with photographs or descriptive plaques, the Old Man of the Mountain Legacy Fund sought to recreate the experience of viewing the famous visage.

This optical illusion is created by looking along any of several different steel structures, called “profilers”.

Each profiler has an array of raised features that, when viewed from an appropriate angle, line up to recreate the face on the mountain from the viewer’s perspective.

The distance from the Profiler Plaza to the Old Man’s former location is about half a mile, but the profile effect only works if the viewer’s eyes are carefully placed. Thus each steel profiler comes equipped with three spots, marked according to the viewer’s height, so that the viewer will be in the proper alignment. (Below is a picture of my wife looking at one of the profilers.)

I found this application of geometry to a memorial not only ingenious, but also quite stirring. The Old Man of the Mountain inspired several artistic works, including Nathaniel Hawthorne’s short story “The Great Stone Face”. When I was in high school, my mom directed a theatrical adaptation of this story, in which I played the role of the visiting poet who appears near the end of the tale. So I felt a special connection to this place as I visited it for the first time.

It seems this could make a useful cross-disciplinary lesson in school, say between English, geometry, and U.S. history. Students could study the stories of the Great Stone Face and the monument’s demise in 2003. Then they might be asked to choose a location and design the memorial, working out the necessary measurements. For instance, here is a link to a map with the face’s former location marked: 44.1606° N, 71.6834° W. The actual location of the profilers is on the north shore of Profile Lake. If anyone carries this out, I’d love to know how it goes!
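If it helps get such a lesson started, here is a rough sketch of the distance computation (my own illustration; the plaza coordinates below are only a placeholder to be replaced with values read off the map):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    R = 6371000.0                                # mean Earth radius, meters
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp/2)**2 + cos(p1)*cos(p2)*sin(dl/2)**2
    return 2*R*asin(sqrt(a))

old_man = (44.1606, -71.6834)    # the face's former location, from the coordinates above
plaza   = (44.1670, -71.6810)    # placeholder only; replace with coordinates read off the map

d = haversine_m(*old_man, *plaza)
print(f"{d:.0f} m, or about {d/1609.34:.2f} miles")
```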

Thanks for reading, and Happy New Year!

Monday, December 16, 2013

some of my favorite Desmos projects from this semester

Important note: You can click on any graph in this post to go to an interactive version. The interactivity is kind of the whole point, so please do take a few moments to experiment with some of these.

At the start of last summer, I announced that the Desmos graphing calculator had sold me on its usefulness “after just a few minutes of playing around”. Since then, the Desmos team has added a lot more features, without ever sacrificing user-friendliness (which, for those of us using Desmos to teach, is paramount).

During the fall semester, I used Desmos extensively in my calculus classes at Smith College. I made “worksheets” that allowed students to interact with mathematical ideas in an incredibly direct way; I also had fun creating them. Eventually I figured out that Desmos and Google docs could be used together to make more fully developed worksheets. (A brief word about my teaching situation: at Smith, all students have a Google account for their email, and thus all have a school-related Google drive by which documents could be shared. On days we used worksheets, about half of the students would bring in laptops, and they would work in groups of 2–3. At the end of class, or afterwards if they had sections to finish, they would share their work with me so that I could review it.) I’ve shared some of these worksheets over time, but I thought it would be nice to have some of my favorites collected in a single place. For simplicity, I’ve removed a bunch of the “worksheet” structure from these, so that they have become more like demonstrations others can use as they wish. Not all of these were used in class, as sometimes I just had to play around with some ideas.

First, some play. That you could not only define variables but also define functions in a Desmos graph and use them elsewhere came as a revelation to me, as did the fact that you could create sums with a variable number of terms. I first learned this while adding up sine functions à la Fourier sine series, which I wrote about here. After that, I made the blancmange curve, a classic example of a continuous but nowhere-differentiable function:

The ever-responsive and ever-creative Desmos team turned the blancmange curve into a mountain range, with a setting sun and moving train (you’ll definitely want to play with this one):

On to the calculus demonstrations. Teaching calculus in the fall almost always leads to introducing derivatives near the equinox, around the time that days are getting shorter at their fastest rate. I have in the past just mentioned this as an illustration of the derivative. This fall, however, I had students explore how the changing amount of daylight is affected by time of year, latitude, axial tilt, etc. Here, for instance, is a graph depicting the amount of daylight on each day of the year at latitude 35°N:

Here is the amount of daylight each day at latitude 50°S:
And here’s what the amount of daylight would be like just a few degrees away from the equator, if Earth had the same axial tilt as Uranus (about 82 degrees):
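For anyone who wants to recreate graphs like these, here is a sketch of one standard way to model daylight hours, via the sunrise equation with a sinusoidal solar declination (not necessarily the exact formula behind my Desmos graphs); day of year, latitude, and axial tilt are the parameters:

```python
import numpy as np

def daylight_hours(day, latitude_deg, tilt_deg=23.44):
    """Approximate hours of daylight on a given day of the year (1-365)."""
    decl = -tilt_deg*np.cos(2*np.pi*(day + 10)/365)   # solar declination, degrees
    lat, dec = np.radians(latitude_deg), np.radians(decl)
    cos_h = np.clip(-np.tan(lat)*np.tan(dec), -1, 1)  # clip handles polar day and night
    return 2*np.degrees(np.arccos(cos_h))/15          # half-day hour angle -> hours of daylight

print(daylight_hours(172, 35))      # near the June solstice at 35°N: about 14.4 hours
print(daylight_hours(172, -50))     # the same date at 50°S: about 7.9 hours
print(daylight_hours(172, 5, 82))   # near the equator with a Uranus-like tilt
```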

Optimization takes up a chunk of time after derivatives have been introduced. Several classic problems deal with boxes whose surface area must be minimized, or whose volume must be maximized, under various constraints. I’ve always suspected students have trouble imagining what it means, for instance, to require that a box have a square base and a fixed volume. What do the various shapes of such boxes look like? So I made a simple model of an open-topped box whose volume and base side length could be manipulated:
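As a companion to that model, here is a sketch of the classic computation itself (volume kept symbolic): minimize the surface area of an open-topped, square-based box of fixed volume.

```python
import sympy as sp

x, V = sp.symbols('x V', positive=True)   # x = side of the square base, V = fixed volume
h = V/x**2                                # height forced by the volume constraint
S = x**2 + 4*x*h                          # open top: base plus four sides

x_star = sp.solve(sp.diff(S, x), x)[0]    # critical point of the surface area
print(x_star)                             # (2V)^(1/3)
print(sp.simplify(h.subs(x, x_star)/x_star))   # 1/2, i.e. the optimal height is half the base side
```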

Then integration rolls around, with the requisite Riemann sums. Between the introduction of sigma notation, Δx’s, and a host of other notation, it’s easy for students to feel like they have no idea what is going on. A picture can clear things up, because the idea is quite simple, but drawing enough pictures to show what it means for Riemann sums to converge can take an incredibly long time. Isn’t it nice that we can just show this now?
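Here’s a bare-bones version of the same idea in code (a sketch using $\int_0^1 x^2\,dx = 1/3$ as the test case): as the number of rectangles grows, the Riemann sums close in on the exact value.

```python
import numpy as np

def left_riemann_sum(f, a, b, n):
    """Sum of n rectangle areas of width (b - a)/n, using left endpoints."""
    dx = (b - a)/n
    x = a + dx*np.arange(n)
    return float(np.sum(f(x))*dx)

f = lambda x: x**2
for n in (10, 100, 1000, 10000):
    print(n, left_riemann_sum(f, 0, 1, n))   # approaches 1/3
```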


(I adapted this from another Riemann sums demonstration, made by Evan R.)

In discussing differential equations, we took a day to look at the logistic model of population growth. I asked on Twitter if anyone had a suggestion for real-world data to base a project on. Lia Santilli came up with a great idea I would never have considered: the number of Starbucks locations open t years after the company started. I had the students create a table using the data available here, then try to match the data as nearly as possible with a logistic curve. Here was my attempt:

What I like about this example is that you can see how difficult it is to distinguish between exponential and logistic growth early on. Right up until the inflection point of the logistic curve, the growth seems exponential, and so naturally the company continues that growth trend for another few years. But as the market reaches saturation, it becomes clear that they’ve overshot the mark, and one year (2009) they actually have to close more locations than they open. After that, the growth is more restrained. I don’t know if the 20,000 locations I built into my model is actually the largest sustainable number for this “population”, but I like the challenges for management highlighted by this analysis.
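For a class that wants to automate the eyeballing, here is a sketch of how one might fit a logistic curve with scipy.optimize.curve_fit; the data below are synthetic stand-ins (noisy samples from a known logistic curve), since the actual store counts live at the link above.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, A):
    return L/(1 + A*np.exp(-k*t))

# synthetic stand-in data: NOT the real store counts
rng = np.random.default_rng(0)
t_data = np.arange(0, 40, dtype=float)
y_data = logistic(t_data, 20000, 0.35, 400)*(1 + 0.03*rng.standard_normal(t_data.size))

(L_fit, k_fit, A_fit), _ = curve_fit(logistic, t_data, y_data, p0=[15000, 0.3, 300])
print(f"L ≈ {L_fit:.0f}, k ≈ {k_fit:.3f}, A ≈ {A_fit:.1f}")
```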

I also used Desmos a few times in my probability class for illustrations. I wrote about one example here. By using the built-in floor functions and combinations, it’s easy to show what various probability distributions look like, and how they change with the various parameters. For example, here is the probability distribution for the number of times a coin comes up heads in 20 tosses, if it is weighted so that it comes up heads 70% of the time (the vertical dotted line indicates the expected value of 14):
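The same distribution is only a couple of lines in code (a sketch using SciPy rather than Desmos’s floor-and-combinations trick):

```python
from scipy.stats import binom

n, p = 20, 0.7
pmf = [binom.pmf(k, n, p) for k in range(n + 1)]

print(max(range(n + 1), key=lambda k: pmf[k]))   # most likely count of heads: 14
print(binom.mean(n, p))                          # expected value: 14.0
```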

The ability to change parameters also makes it possible to nicely illustrate the Central Limit Theorem. Here, for instance, is a graph showing the standard normal distribution (black), the distribution of an exponential random variable (blue), the sum of ten such random variables (green, heading off the right side of the graph), and a normalization of the sum to have mean 0 and variance 1 (red):
You can see the convergence of the sum to a normal distribution beginning.
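Here is a simulation sketch of the same comparison (sample sizes are my own choice): sum ten standard exponential random variables, normalize the sum to mean 0 and variance 1, and compare a few cumulative probabilities with the standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, trials = 10, 200_000

sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)   # each sum has mean n, variance n
z = (sums - n)/np.sqrt(n)                                         # normalized sums

for c in (-1.0, 0.0, 1.0):
    print(c, round(float(np.mean(z <= c)), 3), round(norm.cdf(c), 3))   # empirical vs. normal CDF
```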

Finally, I also found Desmos useful for illustrating parts of my research. One of the dynamical systems I’m studying is related to the three-cusped hypocycloid, or “deltoid”, which is traced out by a point marked on the circumference of a circle rolling around the inside of a circle three times the size:

Each point inside the deltoid lies on three tangent segments:
Perhaps most exciting for me was when I discovered that all the pedal curves of the deltoid could be easily seen and manipulated. A pedal curve is determined by the orthogonal projections of a fixed point onto the tangent lines of the deltoid:

The pedal curves in the above examples were drawn using explicit parametrizations. They can also be defined implicitly by fourth-degree polynomial equations. Since Desmos recently added the ability to plot implicit curves where both variables appear with degree greater than 2, I thought I’d share another, simpler graph that illustrates this functionality:
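For the curious, here is a sketch of how one might generate these curves outside Desmos: the standard parametrization of the deltoid, together with the foot-of-perpendicular construction for a pedal curve (the pedal point below is chosen arbitrarily).

```python
import numpy as np

# a small offset avoids the three cusps, where the tangent direction vanishes
t = np.linspace(0, 2*np.pi, 2000, endpoint=False) + 1e-3

# deltoid traced by a circle of radius 1 rolling inside a circle of radius 3
x  =  2*np.cos(t) + np.cos(2*t)
y  =  2*np.sin(t) - np.sin(2*t)
dx = -2*np.sin(t) - 2*np.sin(2*t)    # tangent direction
dy =  2*np.cos(t) - 2*np.cos(2*t)

def pedal_curve(px, py):
    """Foot of the perpendicular from (px, py) to each tangent line of the deltoid."""
    s = ((px - x)*dx + (py - y)*dy)/(dx**2 + dy**2)
    return x + s*dx, y + s*dy

pedal_x, pedal_y = pedal_curve(0.5, 0.0)    # arbitrary pedal point inside the deltoid
print(pedal_x[:3], pedal_y[:3])
```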

So that’s an assortment of things I’ve done with Desmos over the past few months, some big, some small. For teachers planning to use Desmos with their students, I would make the following suggestions:

  1. Draw them in with something interactive and manipulable. Teach them early to recognize that different shapes can be given by the same formula simply by changing a few parameters, and to explore the effects that the parameters have.
  2. Get them to create their own graphs. In the past, we had to do all the work to create the worksheets and the models, but now students can be enabled to build their own; when they do, they will benefit from creating, not just responding.
  3. Give them questions that require thoughtful use of the technology they have; simply having access is not a panacea. For example, real-world problems often have models that call for very different scales on the vertical and horizontal axes. Students can be tempted just to use the zoom buttons, causing them to miss important details. Make sure they know they have to think about the graphs they’re creating, not just rely on the computer to show them everything, because it won’t.
For everyone, I encourage widespread use of Desmos and similar tools for education, illustration, research, and entertainment. The Desmos development team deserves an immense amount of thanks for providing us with such graphing and computational power.

Added 12/17– Check out these other graphs for calculus, made by Patrick Honner: http://mrhonner.com/desmos

Thursday, September 26, 2013

surprises, part 2

In my last post, I described a classic problem in probability: suppose a certain disease occurs in 0.5% of a given population, and there is a test that detects it with 99% accuracy, but also returns a false positive 1% of the time it is used on a healthy person. What is the conditional probability that an individual has the disease, given that they have a positive result from the test? The answer, somewhat surprisingly, turns out to be less than a third.

When we discussed this in my probability class, one student asked a very sensible question: What if we test the person twice?

This question seemed worth investigating. As I see it, the question can be interpreted two ways. On one hand, what if we tested everyone twice? How would that affect the conditional probability given above? On the other hand, what if we only gave the test a second time to those who had a positive first test? Would we be more likely to filter out those who are actually ill in that case, having restricted to a population in which the disease is more prevalent? Do these two methods produce different results?

To begin with, let’s return to the original question and analyze it more thoroughly by introducing some variables. Let $r$ be the prevalence of the disease in the total population (which can be interpreted as the probability that any particular individual has the disease). Suppose the test we have returns a true positive (a positive result for someone who is ill) with probability $p$ (called the sensitivity of the test), and it returns a false positive (a positive result for someone who is well) with probability $q$ (the value $1 - q$ is called the test’s specificity). Bayes’ formula then says that the probability of having the illness given a positive test result is \[ P(r) = \frac{r \cdot p}{r \cdot p + (1 - r) \cdot q}. \] If we fix $p$ and $q$ and let $r$ vary, we get a graph like the following:

(drawn here with $p = 0.98$ and $q = 0.05$; you can click on the graph to go to an interactive version). Notice the large derivative for small values of $r$: that low conditional probability we got at the beginning was essentially an artifact of the disease itself being fairly uncommon. (As one student slyly put it, “so the way to make a positive test more likely to mean you’re sick is to give more people the disease?”) Raising the value of $p$ doesn’t change the graph much. The real problem lies in the false positives; if the disease is sufficiently rare, then having any chance at all of false positives ($q > 0$) means that the false positives will outnumber the true positives.
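In code, the formula is a one-liner (a quick sketch; the surprising number from the last post drops right out):

```python
def posterior(r, p, q):
    """P(disease | positive test) for prevalence r, sensitivity p, false-positive rate q."""
    return r*p/(r*p + (1 - r)*q)

print(round(posterior(0.005, 0.99, 0.01), 4))       # 0.3322, the example from the last post
for r in (0.001, 0.01, 0.1, 0.5):
    print(r, round(posterior(r, 0.98, 0.05), 3))    # the parameters used in the graph above
```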

If we change the situation so that every time an individual is tested we administer the test twice (assuming the two results are independent, given whether the person is sick or well), then a few things happen. First, the chance of getting two false positives when testing a healthy individual is $q^2$, which is generally much smaller than $q$. Meanwhile, the chance of getting two positives when testing a sick individual is $p^2$, smaller than $p$ but not by much. The result is a much steeper curve for low-prevalence diseases:

(the red curve is the same as before; the purple curve represents the probability of having the illness given two positive tests). Effectively, we have created a new test with a much reduced chance of false positives.

But testing everyone twice seems unnecessary. Just as a low prevalence leads to a reduced probability that a positive result means the disease is actually present, so it also reduces the probability that one is ill given a negative result. Here is the graph of this latter conditional probability (that is, the prevalence of the disease among those who have a negative test):

So we shouldn’t worry too much about those who have a negative test. We can give the test a second time just to those who have a positive first test. In effect, rather than creating a new test as before, we have restricted to a new population, in which the disease is far more prevalent (as given by the original conditional probability $P(r)$). Here is the graph of the original function $P(r)$ (again in red) together with the graph (in orange) of the probability of having the disease given a positive result and being among those who had a first positive test:

Do you notice something about the purple and orange curves in the graphs above? They are the same. I admit, this surprised me at first. I thought that having a second positive result when restricted to those who already had one would make it more likely that one had the disease than if we tested everyone twice indiscriminately. But the algebra bears out this coincidence of graphs. It doesn’t matter whether everyone is tested twice or just those who first have a positive result; the conditional probability of having the disease after two positive tests is the same either way. In the latter case, of course, far fewer total tests are administered.
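Here is the algebra behind that coincidence, checked symbolically (a verification sketch): applying the single-test update twice in a row simplifies to exactly the “new test” update with sensitivity $p^2$ and false-positive rate $q^2$.

```python
import sympy as sp

r, p, q = sp.symbols('r p q', positive=True)

def update(prior, sens, fpr):
    """Bayes update of the prevalence after one positive test."""
    return prior*sens/(prior*sens + (1 - prior)*fpr)

both_at_once  = update(r, p**2, q**2)             # test everyone twice: one combined test
one_at_a_time = update(update(r, p, q), p, q)     # retest only those with a first positive

print(sp.simplify(both_at_once - one_at_a_time))  # 0
```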

Something we haven’t considered yet is what it means to have one positive and one negative test. Here the relative sizes of $p$ and $1-q$ matter. You can check that if $p + q = 1$, then having one positive and one negative test returns one’s likelihood of having the disease back to that of the overall population (because a sick person and a healthy person have the same chance of getting one positive and one negative result). However, if $q$ is greater than $1-p$ (that is, if a healthy person is more likely to have a false positive than a sick person is to have a false negative), then obtaining different results on two tests means one’s chance of having the disease is slightly less than in the overall population. One last graph, in which the red and blue curves from before reappear, together with a green curve representing the probability of having the disease given one positive and one negative test:

Conversely, if $q$ is less than $1 - p$, then the green curve would lie slightly above the diagonal.

The ideas we have been exploring are at the heart of Bayesian analysis, in which a certain assumption (called a prior) about how some characteristic is distributed is fed into a conditional probability model, and a new distribution is obtained. The new distribution becomes the new prior, and the process may be repeated. This kind of analysis depends on a Bayesian view of probability, in which the distribution represents a measure of belief (rather than any necessarily objective knowledge), and how that belief changes with the introduction of new knowledge. In our case, our prior was the assumption that the disease had prevalence $r$, and the new knowledge we introduced was the result of a medical test. This is the same kind of analysis—at a much more elementary level—that Nate Silver made famous (or perhaps that made Nate Silver famous) during recent election seasons. I must say, I was pleased that a student’s question led so neatly into this timely topic.

Tuesday, September 17, 2013

my favorite surprise

I’m excited about tomorrow. Tomorrow in my probability class, we’re going to start discussing Bayes’ Formula. This is the main thing I remember about my college probability class. While we have already seen some surprising results in particular cases where the rules of probability are applied, this is, to me, the first truly surprising general result. It changes everything I think about probability.

Here’s my motivating example: suppose we have before us an urn that contains five blue balls and five red balls. We draw two balls, in order, and record their colors. (To be clear, this is sampling without replacement.) What is the probability that the first ball is red? “Well,” you say, “evidently the likelihood of that is 1/2, because half of the balls are red.” “Very well,” I say, “then what is the probability that the first ball is red, assuming that the second ball is blue?” “What does that have to do with it?” you ask. “When the second ball is drawn, the first one has already been chosen, so how could the second ball turning up blue have anything to do with the probability that the first ball is red?”

Let’s throw some formulas in here. Suppose $E$ and $F$ are two events in the sample space $S$ of an experiment. (I discussed the definitions of these terms in my previous post.) The conditional probability of $E$ given $F$, written $P(E\mid F)$, is the quotient $\dfrac{P(E \cap F)}{P(F)}$, meaning, loosely speaking, that we consider all the ways both $E$ and $F$ can occur (weighted by their individual probabilities), and think of this as just a subset of the outcomes where $F$ occurs (rather than all of $S$). “Sensible enough,” (I hope) I hear you say. Now, you will hopefully also agree that we can split $F$ into two parts: the one that intersects $E$ and the one that does not, i.e., $F = (F \cap E) \cup (F \cap E^c)$. “Aren’t you overcomplicating things?” you demur. “Just wait,” I plead. The events $F \cap E$ and $F \cap E^c$ are mutually exclusive (i.e., disjoint), and so we have $P(F) = P(F \cap E) + P(F \cap E^c)$. Interesting, no? So we can write \[ P(E \mid F) = \frac{P(E \cap F)}{P(E \cap F) + P(E^c \cap F)} \] (using the fact that $E \cap F = F \cap E$). And now perhaps it seems like this manipulation isn’t so weird, because in our “motivating case”, each of the terms in that expression isn’t so hard to compute, and in fact one of them appears twice!

So what happens? Let’s return to our urn and say $E$ is the event “the first ball is red”, while $F$ is the event “the second ball is blue”. Then $P(E \cap F) = \big(\frac{5}{10}\big)\big(\frac{5}{9}\big)$ and $P(E^c \cap F) = \big(\frac{5}{10}\big)\big(\frac{4}{9}\big)$, so \[ P(E \mid F) = \frac{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)}{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)+\big(\frac{5}{10}\big)\big(\frac{4}{9}\big)} = \frac{25}{25+20} = \frac{5}{9}. \] Since 5/9 > 1/2, it is more likely that the first ball was red if we know that the second ball is blue! (Surprised? Think about what happens if there are only two balls to begin with, one blue and one red. Once that’s sunk in, try the above again starting with $m$ blue and $n$ red balls in the urn.)
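If the 5/9 still feels suspicious, a brute-force check agrees (a quick sketch that enumerates all ordered draws of two distinct balls):

```python
from fractions import Fraction
from itertools import permutations

balls = ['R']*5 + ['B']*5
draws = list(permutations(range(10), 2))                  # all ordered pairs of distinct balls

second_blue  = [d for d in draws if balls[d[1]] == 'B']
also_red_1st = [d for d in second_blue if balls[d[0]] == 'R']

print(Fraction(len(also_red_1st), len(second_blue)))      # 5/9
```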

So far I’m cool with everything that’s happened. The realization that later events provide information about earlier ones is a bit of a jolt, but not so far-fetched after a little reflection. Bayes, however, endeavors to turn our minds further inside-out. We just need one new idea, just as simple as everything we’ve done to this point: the equation for conditional probability can be rewritten as $P(E \cap F) = P(F) \cdot P(E \mid F)$. And of course, because $E \cap F = F \cap E$, we could just as well write $P(E \cap F) = P(E) \cdot P(F \mid E)$. Now, as before, let’s split $F$ into $E \cap F$ and $E^c \cap F$. Using our most recent observation, we have \[ P(E \mid F) = \frac{P(E) \cdot P(F \mid E)}{P(E) \cdot P(F \mid E) + P(E^c) \cdot P(F \mid E^c)}. \] “Now why on Earth…?” you splutter, to which I reply, “Because sometimes the knowledge you have is more suited to computing the conditional probabilities on the right than finding the one on the left directly from the definition.”

Here’s a classic example. Suppose there is an uncommon illness that occurs in the general population with probability 0.005 (half of a percent). Suppose further that there is a medical test for this affliction that is 99% accurate. That is, 99% of the time the test is used on a sick patient, it returns positive, and 99% of the time it is used on a healthy patient, it returns negative. You are concerned that you might have this illness, and so you have the test. It comes back positive. What is the probability that you have the illness?

Do you see where this is going? You’re interested (well, we both are, really, because I care about your health) in the event $E$ “I have the illness.” The information we have, though, is that the event $F$ “the test came back positive” occurred. And what we know about the test is how its results depend on the patient being sick or well. That is, we know $P(F \mid E)$ and $P(F \mid E^c)$, and fortunately we also know $P(E)$ (ergo we also know $P(E^c) = 1 - P(E)$). We can compute the likelihood of your being ill as \[ P(E \mid F) = \frac{(0.005)(.99)}{(0.005)(0.99) + (0.995)(0.01)} \approx 0.3322. \] Far from it being a certainty that you have this particular illness, your chances are better than 2 in 3 that you don’t! Even if the illness were twice as common and occurred in 1% of the population, your chances of being sick are only 1 in 2 after the test comes back positive. (Notice that this probability—this conditional probability—depends not only on the efficacy of the test, but also on the prevalence of the illness.)

And that’s my favorite surprise in probability.

(If you haven’t read it yet, you should go look at the article Steven Strogatz wrote for the New York Times about Bayes’ Theorem, in which he makes it seem—somewhat—less surprising.)