Tuesday, August 27, 2013

iambic pentameter and stratified sampling

This post is to remind me later of a cool idea I just had. I have no idea if it will ever be carried out, but if it’s written down somewhere obvious, it’s more likely to happen. (I should do this more often.)

Pentametron 2013 is a Twitter bot account that searches for tweets that happen to be written in iambic pentameter and compiles them into sonnets. They’re really just grouped into rhyming couplets, but you can see the most recently compiled sonnets at pentametron.com. (You can also read the creator’s description here.) Much of the time, this method of culling messages from the 400 million tweets produced each day results in an odd mishmash: random song lyrics (no surprise those tend to fall into the Bard’s classic style), expressions of irritation or frustration (maybe those emotions lead to the punchier iambs?), snippets of musings (the author is in a poetic frame of mind). But sometimes you get a glimpse into how someone very far removed from you personally is reacting to a common cultural event. For instance, I noticed today that several of the tweets were about back-to-school topics.

So here is my question, which I believe should be quantifiable to an extent:

What is the chance that a message retweeted by Pentametron will be about a trending topic?

This seems like a good project for a statistics class, or perhaps even my probability class this fall.

Here’s why I find this question interesting: Pentametron’s algorithm amounts to choosing a relatively small number of tweets from the vast panoply of Twitter users in a systematic way that isn’t biased too much towards one type of tweet or another (except perhaps for the above-noted frequency with which song lyrics appear). In statistics, this is called stratified sampling, and it provides the basis for essentially all polling from large populations. Suppose the sampling method draws from as many different kinds of groups (“strata”) as possible. Suppose also that it does so randomly, or systematically in a way unrelated to the division into groups. Then the results are often more representative of the entire population than a simple random sample of the same size drawn from the whole pool without stratification. I expect this effect can be explained by some theoretical arguments, which I now intend to learn.
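A quick simulation can make the “more representative” claim concrete. The sketch below is a hypothetical illustration, not anything Pentametron does. It builds a made-up population split into three strata with different means. It then compares how much the estimated population mean varies from trial to trial under two schemes:

- simple random sampling;
- stratified sampling with proportional allocation, where each stratum is sampled in proportion to its size and the stratum means are weighted by their shares.

The stratum sizes, means, sample size, and function names are all assumptions chosen for illustration.

```python
import random
import statistics


def srs_mean(population, n, rng):
    """Mean of a simple random sample of size n (without replacement)."""
    return statistics.fmean(rng.sample(population, n))


def stratified_mean(strata, n, rng):
    """Stratified estimate of the population mean, proportional allocation.

    Each stratum gets about n * (stratum size / population size) draws
    (at least one), and its sample mean is weighted by its population share.
    Rounding means the total number of draws can differ slightly from n.
    """
    total = sum(len(s) for s in strata)
    estimate = 0.0
    for s in strata:
        n_h = max(1, round(n * len(s) / total))
        estimate += (len(s) / total) * statistics.fmean(rng.sample(s, n_h))
    return estimate


def compare(trials=2000, n=30, seed=1):
    """Return (variance of SRS estimates, variance of stratified estimates)."""
    rng = random.Random(seed)
    # Hypothetical population: three strata with different sizes and means.
    strata = [
        [rng.gauss(mu, 1.0) for _ in range(size)]
        for mu, size in ((0.0, 5000), (5.0, 3000), (10.0, 2000))
    ]
    population = [x for s in strata for x in s]
    srs = [srs_mean(population, n, rng) for _ in range(trials)]
    strat = [stratified_mean(strata, n, rng) for _ in range(trials)]
    return statistics.pvariance(srs), statistics.pvariance(strat)


if __name__ == "__main__":
    v_srs, v_strat = compare()
    print(f"simple random sampling: {v_srs:.3f}")
    print(f"stratified sampling:    {v_strat:.3f}")
```

When the strata differ a lot, the stratified estimate should vary much less from trial to trial. When the strata look alike, the two schemes perform about the same.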

(By a strange quirk, I learned about stratified sampling from the introduction to 3:16 Bible Texts Illuminated, a book by none other than Donald Knuth. He explains that by selecting verse 16 of chapter 3 from every book of the Bible, insofar as possible, we should theoretically get a relatively good picture of the Bible’s message as a whole. I recommend reading his description of the process, and his justification of it as one way of studying the Bible, alongside others of course.)

A relatively small number of tweets turn out to be in iambic pentameter, but they are spread out among all of the different types of Twitter users. Thus, perhaps by looking at just these tweets, we can get a sense of what Twitter as a whole is talking about. Another indicator of these large-scale trends is the set of so-called trending topics: words, phrases, or hashtags that appear with high frequency in a particular region or worldwide. My question above can be rephrased as: how often do these two indicators (the subjects of tweets in iambic pentameter and the trending topics) align? A priori, it may seem like there is no connection. But I think the above discussion suggests that a correlation is likely, measurable, and even estimable if one knows the right guesses and assumptions to make.
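One simple way to start quantifying the question is to treat the Pentametron retweets as a sample. We then estimate the proportion that mention a current trending topic, with a confidence interval. The sketch below rests on a crude assumption: a tweet counts as “about” a trend if it contains the trend’s text, ignoring case. It uses the Wilson score interval for a binomial proportion. The tweets and trends in the usage example are invented placeholders, not real data.

```python
import math


def mentions_trend(tweet, trends):
    """Crude rule (an assumption): a tweet is about a trend if it contains
    the trend's text, ignoring case."""
    text = tweet.lower()
    return any(trend.lower() in text for trend in trends)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def estimate_trend_fraction(tweets, trends):
    """Return (observed fraction, (low, high)) for tweets mentioning a trend."""
    hits = sum(mentions_trend(t, trends) for t in tweets)
    return hits / len(tweets), wilson_interval(hits, len(tweets))


# Invented placeholder data, for illustration only.
tweets = [
    "I guess it's back to school for me tomorrow",
    "I never want to hear that song again",
    "the first day of classes always feels so long",
    "I wish that I could sleep a little more",
]
trends = ["back to school", "first day"]
fraction, (low, high) = estimate_trend_fraction(tweets, trends)
print(fraction, round(low, 3), round(high, 3))
```

Keyword matching will miss paraphrases and catch coincidences. A real study would therefore need a more careful definition of “about a trending topic,” along with far more than four tweets.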

Any thoughts on how to go about studying this?

a brief teaching statement

I have in the past spent a long time preparing lectures. Like many teachers, I started out believing that the most important thing was for me to present an ideal amount of information in an ideal order with an ideal set of illustrative examples. I have always made time for student practice during class, but I somehow believed that this was a break for the students to reflect on my masterful presentation and benefit from it through exercises. This means that if I want to vary my teaching style, I have already set aside the time, and I can use that time in many different ways.

I will now use that time to prepare questions instead of lectures. Good questions require as much preparation as good lectures. But they flip the students’ classroom experience from essentially passive to essentially active, and activity is what leads to learning. To put a fancy name to it, I will be designing my classes for inquiry-based learning. (I am grateful especially to Dana Ernst and Bret Benesh for providing some helpful thoughts and resources, as well as references to research.) On particularly good days, I hope I won’t even have to state the questions, because the students will come up with them and start to answer them on their own.

I don’t mean that I will entirely give up making presentations (though they will never last longer than 25 minutes). For one thing, it’s pleasant to give a well-received lecture, and lectures can model mathematical thinking and expression (although those skills are rarely learned purely by osmosis). For another, certain types of material are learned just as well through mimicry as through discovery (if I understand correctly, techniques for calculation generally fall into this category). And lastly, different students learn in different ways; for some of them, a clear presentation by the instructor really is best, and I want to respect that. (I say this with confidence, based on feedback I have received from students.)

I want the best education for my students. In today’s world, where massive amounts of information are freely available, that means more class time should be spent finding answers, not accumulating facts. And to find answers, one must first have questions. Real questions. Fruitful questions. Questions that are better the closer they are to the mind and heart of the learner.

So my question is, how do I do this?

Monday, August 05, 2013

what I did last week: math in Mexico

Last Saturday I returned from a 10-day visit to the Centro de Ciencias Matemáticas on the campus of Universidad Nacional Autónoma de México in Morelia, where we held the International conference and workshop on surfaces of infinite type, bringing together about 45 participants for lectures, mini-courses, and collaboration.

It was a great success; all of the talks were good, and several people expressed gratitude that we had organized a conference on this particular topic. Our goal was to gather people interested in dynamical or geometric aspects of infinite-type surfaces, so we structured the conference around two mini-course series. The first, on infinite-dimensional Teichmüller spaces, was taught by Alastair Fletcher. The second, on a particular dynamical system known as the wind-tree model, was taught by Vincent Delecroix and Samuel Lelièvre. The conference was especially interesting for us because infinite-type surfaces are usually outliers at the conferences we attend. First, a bit of background.

Surfaces are spaces that, on a small scale, look just like the plane. Some familiar examples of surfaces are the plane itself, the sphere, and the torus. These are all examples of finite-type surfaces. The sphere and the torus are both compact, and the plane is homeomorphic to (i.e., “just like” in an appropriate sense) the sphere with one point removed (as can be seen via stereographic projection). Compact surfaces were classified in the late 19th and early 20th centuries, during the early development of the field of topology. In the orientable case, which is the main situation we’re interested in, every type of compact surface can be obtained essentially by “adding handles” to a sphere. The number of handles added is called the genus of the surface. So a sphere has genus zero, a torus has genus one, a two-holed torus has genus two, and so on. Labeling a surface by its genus is a convenient way to describe it, because any two compact orientable surfaces with the same genus can be continuously deformed from one to the other. Finite-type surfaces are obtained from compact surfaces by removing finitely many points, called punctures. They are therefore labeled by a pair (g,n), where g is the genus of the starting surface and n is the number of punctures.

Infinite-type surfaces, by contrast, have either infinite genus or infinitely many punctures (or both). Think of the plane with all points of the form (a,b) removed, where a and b are integers, or of an infinite chain of tori, each attached to the next. Such surfaces were classified by Kerékjártó in the 1920s, although his proof later needed improvements and corrections. Essentially, any non-compact surface can be characterized (up to continuous deformation, i.e., homeomorphism) by its genus (which may be finite or infinite) and its set of ends. Roughly speaking, the ends of a surface describe the ways that a sequence of points can “escape to infinity” while remaining on the surface. (“Escaping to infinity” means leaving every compact subset of the surface and not returning.) In the case of finite-type surfaces, the ends correspond to the punctures; a compact surface has no ends. More generally, the set of ends of any surface is homeomorphic to a closed subset of a Cantor set. To get a surface whose ends form a Cantor set, for instance, imagine starting with a Y-shaped pipe, which has a “base” and two “arms”.

Attach two more Y-shaped pipes by gluing their bases onto the arms of the first one. Now there are four “free” arms, to which four more Y-shaped pipes may be attached. Continue this process ad infinitum. (A picture of the result appears in Spivak’s Differential Geometry.)

If you’re familiar with the Cantor set construction, you can see that the ends of the surface line up with a Cantor set. Each end (except for the base of the original pipe we started with) corresponds to following a particular path through an infinite sequence of pipes. This surface has genus zero, however. A surface with infinite genus must have at least one end. The unique (up to homeomorphism) surface with infinite genus and one end has been humorously named the Loch Ness Monster by Étienne Ghys. Any closed subset of the Cantor set can arise as the set of ends of some surface. Even so, whenever infinite-type surfaces appear in our field, the Loch Ness Monster seems to be the one that shows up most naturally.
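The correspondence between ends and points of the Cantor set can be made concrete. Record an end (other than the base) as the infinite sequence of left/right arm choices it follows. Then send each finite prefix of that sequence to the interval of the middle-thirds Cantor construction that the same choices pick out. Longer prefixes give nested intervals shrinking to a single point, and different paths end up at different points. The labels 'L' and 'R' and the function name below are illustrative assumptions.

```python
from fractions import Fraction
from typing import Tuple


def cantor_interval(path: str) -> Tuple[Fraction, Fraction]:
    """Map a finite path of arm choices ('L' or 'R') through the tree of
    Y-shaped pipes to the closed interval of the middle-thirds Cantor
    construction that it selects: 'L' keeps the left third, 'R' the right."""
    lo, width = Fraction(0), Fraction(1)
    for choice in path:
        width /= 3
        if choice == "R":
            lo += 2 * width
        elif choice != "L":
            raise ValueError(f"unexpected choice {choice!r}")
    return lo, lo + width


# Always taking the right arm homes in on the point 1.
print(cantor_interval("RRR"))  # (Fraction(26, 27), Fraction(1, 1))
# Alternating left/right homes in on 1/4 = 0.020202... in base 3.
lo, hi = cantor_interval("LR" * 5)
print(float(lo), float(hi))
```

An infinite path corresponds to the single point lying in all of these nested intervals. That point is the number whose base-3 digits are 0 for each 'L' and 2 for each 'R'.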

Now let’s return to the topics of the conference mini-courses. Although there is only one compact orientable surface of genus g up to homeomorphism, such a surface can be given many distinct geometric structures. Think, for example, of a long, skinny torus as opposed to a short, fat one.


These are distinguished in their geometry, for instance, by the lengths of curves going around the two surfaces. Consider a surface of genus greater than 1 (or, more generally, one where 2×(genus of the surface)+(number of punctures) is at least three). On such a surface, any complete geometric structure (complete meaning the ends are infinitely far away) in which the surface looks the same near every point must be hyperbolic. In more technical terms, the structure has constant curvature, and that curvature must be negative. (Curvature measures whether the surface is more like a saddle or more like a sphere near each point.) Teichmüller space describes all the ways that a surface of a fixed type can be given such a hyperbolic geometry. When a surface has finite type, its Teichmüller space is finite-dimensional; the required number of parameters is 6×(genus)+2×(number of punctures)-6. When the surface has infinite type, as one might expect, the Teichmüller space is infinite-dimensional, and some care is needed to describe it. Our course on Teichmüller spaces dealt with some properties common to both the finite- and infinite-type cases, as well as some peculiarities of the infinite-type case.
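The hyperbolicity condition and the parameter count are simple enough to tabulate. In the sketch below the function names are made up. Restricting the formula to surfaces with 2×(genus)+(number of punctures) at least three reflects the hyperbolic setting described above. The torus, for example, is flat rather than hyperbolic, so the formula does not apply to it.

```python
def is_hyperbolic_type(genus: int, punctures: int) -> bool:
    """The condition stated above: 2*(genus) + (punctures) >= 3."""
    return 2 * genus + punctures >= 3


def teichmuller_dimension(genus: int, punctures: int) -> int:
    """Number of real parameters of Teichmüller space for a finite-type
    surface of hyperbolic type: 6*(genus) + 2*(punctures) - 6."""
    if genus < 0 or punctures < 0:
        raise ValueError("genus and punctures must be non-negative")
    if not is_hyperbolic_type(genus, punctures):
        raise ValueError("formula applies only when 2*genus + punctures >= 3")
    return 6 * genus + 2 * punctures - 6


for g, n in [(0, 3), (0, 4), (1, 1), (2, 0), (3, 2)]:
    print(f"(g, n) = ({g}, {n}): {teichmuller_dimension(g, n)} parameters")
```

The sphere with three punctures gets zero parameters, reflecting the fact that it has exactly one hyperbolic structure. The closed genus-2 surface needs six.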

The second course, on wind-tree models, addressed a family of surfaces with a specific geometric construction. The name refers not to actual wind blowing through trees, but to a roughly analogous process that occurs in the so-called Lorentz gas, which consists of particles striking obstacles according to billiard dynamics. Various forms of this model have appeared since the early 20th century (starting with Paul and Tatiana Ehrenfest, after whom the model is sometimes named). In the version this course dealt with, the obstacles are identical rectangles with vertical and horizontal sides, centered at the integer lattice points in the plane. The particles are modeled by straight-line trajectories that reflect off the obstacles in the usual way: “angle of incidence equals angle of reflection”.
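Readers who want to experiment with the behavior of particles in this model can trace trajectories numerically. The following is a minimal, hypothetical sketch, not the approach taken in the mini-course. It assumes each obstacle is an a-by-b rectangle, with 0 < a, b < 1 (our own parameter names), centered at every integer lattice point. The simulation works in three steps:

1. It walks through the unit cells the particle passes (a standard grid traversal).
2. It finds the first obstacle hit using a ray/rectangle slab test.
3. It reflects the particle by flipping the horizontal or vertical component of its velocity.

Floating-point error and exact corner hits are ignored.

```python
import math
from typing import List, Optional, Tuple


def first_hit(x, y, vx, vy, cx, cy, a, b) -> Optional[Tuple[float, str]]:
    """Slab test: earliest time t > 0 at which the ray (x, y) + t*(vx, vy)
    enters the a-by-b rectangle centered at (cx, cy), plus the kind of side
    it enters through ('vertical' flips vx, 'horizontal' flips vy)."""
    t_enter, t_exit, side = -math.inf, math.inf, None
    for p, v, lo, hi, kind in ((x, vx, cx - a / 2, cx + a / 2, "vertical"),
                               (y, vy, cy - b / 2, cy + b / 2, "horizontal")):
        if v == 0:
            if not lo < p < hi:
                return None  # moving parallel to this slab and outside it
            continue
        t1, t2 = (lo - p) / v, (hi - p) / v
        if t1 > t2:
            t1, t2 = t2, t1
        if t1 > t_enter:
            t_enter, side = t1, kind
        t_exit = min(t_exit, t2)
    if t_enter < t_exit and t_enter > 1e-12:
        return t_enter, side
    return None


def next_collision(x, y, vx, vy, a, b, max_cells=10_000):
    """Visit the unit cells centered at lattice points in the order the ray
    crosses them; return the first obstacle hit, or None within max_cells.
    Assumes a nonzero velocity and 0 < a, b < 1, so each rectangle lies
    inside its own cell and the first cell with a hit gives the first hit."""
    m, n = math.floor(x + 0.5), math.floor(y + 0.5)
    step_x = 1 if vx > 0 else -1
    step_y = 1 if vy > 0 else -1
    # Times at which the ray crosses the next vertical / horizontal cell wall.
    t_wall_x = (m + step_x / 2 - x) / vx if vx != 0 else math.inf
    t_wall_y = (n + step_y / 2 - y) / vy if vy != 0 else math.inf
    dt_x = 1 / abs(vx) if vx != 0 else math.inf
    dt_y = 1 / abs(vy) if vy != 0 else math.inf
    for _ in range(max_cells):
        hit = first_hit(x, y, vx, vy, m, n, a, b)
        if hit is not None:
            return hit
        if t_wall_x < t_wall_y:
            m, t_wall_x = m + step_x, t_wall_x + dt_x
        else:
            n, t_wall_y = n + step_y, t_wall_y + dt_y
    return None


def simulate(x, y, vx, vy, a, b, bounces) -> List[Tuple[float, float]]:
    """Follow a particle for up to `bounces` reflections; return the
    collision points, starting with the initial position."""
    path = [(x, y)]
    for _ in range(bounces):
        hit = next_collision(x, y, vx, vy, a, b)
        if hit is None:
            break  # no obstacle found within the search limit
        t, side = hit
        x, y = x + t * vx, y + t * vy
        if side == "vertical":
            vx = -vx
        else:
            vy = -vy
        path.append((x, y))
    return path


# A particle bouncing back and forth between two neighboring obstacles.
print(simulate(0.5, 0.0, 1.0, 0.0, 0.5, 0.5, 3))
# [(0.5, 0.0), (0.75, 0.0), (0.25, 0.0), (0.75, 0.0)]
```

Because each reflection flips only one component of the velocity, a trajectory only ever moves in the four directions (±vx, ±vy).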


One might ask several questions about this model: Which particles escape to infinity? Which remain bounded? Which return close to their starting position infinitely often? How do these answers depend on the angle at which a particle is traveling? How do they depend on the shape of the obstacles? This system has an associated surface of infinite type which facilitates the study of its behavior. By a clever trick, much of the study can be related to a finite-type surface (in fact, of genus 2), where much is already known. In order to use the theory of the finite-type surface to draw conclusions about the infinite-type surface (and thus the wind-tree model itself), several new applications of dynamical and topological tools have been needed, with quite beautiful results.

In my own work, I tend to study surfaces where a connection with finite-type surfaces is not as apparent. The field of infinite-type surfaces in a dynamical setting is relatively new and growing, which is why we felt the time was right to hold this conference. I am grateful to my fellow organizers, the staff at UNAM, and all the participants for making it a great week.