So please hop over to davidlowryduda.com if you’re interested.

(I migrated over a year ago, but there’s been a sudden boom of interest in this site)

**PLEASE NOTE** that this note was written and typeset for my (now) main site: davidlowryduda.com. You can find it here. Because of this, some of the math shown here will display better there, where I have more control. This will also serve as the announcement that davidlowryduda.com is now on an improved webhost and should be fast and fully operational. Now, back to the math.

We care about Taylor series because they allow us to approximate other functions in predictable ways. Sometimes, these approximations can be made to be very, very, very accurate without requiring too much computing power. You might have heard that computers/calculators routinely use Taylor series to calculate things like (which is more or less often true). But up to this point in most students’ mathematical development, most mathematics has been clean and perfect; everything has been exact algorithms yielding exact answers for years and years. This is simply not the way of the world.

Here’s a fundamental fact to both mathematics and life: almost anything worth doing is probably pretty hard and pretty messy.

For a very recognizable example, let’s think about finding zeroes of polynomials. Finding roots of linear polynomials is very easy. If we see $ax + b = 0$ with $a \neq 0$, we see that $x = -b/a$ is the zero. Similarly, finding roots of quadratic polynomials is very easy, and many of us have memorized the quadratic formula to this end. Thus $ax^2 + bx + c = 0$ has solutions $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. These are both nice, algorithmic, and exact. But I will guess that the vast majority of those who read this have never seen a “cubic polynomial formula” for finding roots of cubic polynomials (although it *does* exist, it is horrendously messy – look up Cardano’s formula). There is even an algorithmic way of finding the roots of quartic polynomials. But here’s something amazing: there is no general formula, built from the coefficients using only arithmetic and radicals, for the exact roots of 5th degree polynomials (or higher degree).

I don’t mean *We haven’t found it yet, but there may be one*, or even *You’ll have to use one of these myriad ways* – I mean it has been shown (this is the Abel–Ruffini theorem) that there is no general formula in radicals for the exact roots of degree 5 or higher polynomials. But we certainly can approximate them arbitrarily well. So even something as simple as finding roots of polynomials, which we’ve been doing since we were in middle school, gets incredibly and unbelievably complicated.
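To make “approximate them arbitrarily well” concrete, here is a minimal sketch using bisection. The polynomial $x^5 - x - 1$ and the function names are illustrative choices of mine, not something from the note. Bisection is just one of many root-finding methods.

```python
def quintic(x):
    """Example polynomial x^5 - x - 1 (an illustrative choice, not from the note)."""
    return x**5 - x - 1


def bisect_root(f, lo, hi, tol=1e-12):
    """Approximate a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the sign change is in the left half
        else:
            lo, f_lo = mid, f_mid    # the sign change is in the right half
    return (lo + hi) / 2


root = bisect_root(quintic, 1.0, 2.0)
print(root, quintic(root))
```

Each pass halves the interval, so shrinking `tol` buys as many correct digits as floating point allows, even though no exact formula is available.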

So before we hop into Taylor series directly, I want to get into the mindset of approximating functions with other functions.

**1. Approximating functions with other functions**

We like working with polynomials because they’re so easy to calculate and manipulate. So sometimes we try to approximate complicated functions with polynomials, a problem sometimes called “polynomial interpolation”.

Suppose we wanted to approximate $f(x) = \sin x$. The most naive approximation that we might do is see that $f(0) = 0$, so we might approximate $f(x)$ by $p_0(x) = 0$. We know that it’s right at least once, and since $\sin x$ is periodic, it’s going to be right many times. I write $p_0$ to indicate that this is a degree $0$ polynomial, that is, a constant polynomial. Clearly though, this is a terrible approximation, and we can do better.

**1.1. Linear approximation**

So instead of a constant function, let’s approximate it with a line. We know that and , and two points determine a line. What is the line that goes through and ?. It’s , and so this is one possible degree approximation:

You might see that our line depends on what two points we use, and this is very true. Intuitively, you would expect that if we chose our points very close together, you would get an approximation that is very accurate near those two points. If we used and instead, we get this picture:

Two points determine a line. I’ve said this a few times. Alternatively, a point on the line and the slope of the line are enough to determine the line. What if we used that to give us the point , and used that to give us our direction? Then we would have a line that goes through the point and has the right slope, at least at the point – this is a pretty good approximation. I’ll call this line , and in this case we see that . *Aside: Some people call the derivative the “best linear approximator” because of how accurate this approximation is for near (as seen in the picture below). In fact, the derivative actually is the “best” in this sense – you can’t do better.*

These two graphs look almost the same! In fact, they are very nearly the same, and this isn’t a fluke. If you think about it, the secant line going through and is an approximation of the derivative of at – so of course they are very similar!

**1.2. Parabolic approximation**

But again, we could do better. Three points determine a parabola. Let’s try to use , and to come up with a parabola. One way we could do this would be to use that we know both roots of the parabola, and . So our degree approximation must be of the form , since all degree polynomials with zeroes at and are of this form (*I am implicitly using something called the Factor Theorem here, which says that a polynomial has a root if and only if for some other polynomial of degree one less than *). What is ? We want it to pass through the point , so we want . This leads us to , so that . And so

As we can see, the picture looks like a reasonable approximation for near , but as we get farther away from , it gets worse.

Earlier, we managed to determine a line with either 2 points, or a point and a slope. Intuitively, this is because lines look like , so there are two unknowns. Thus it takes pieces of information to figure out the line, be they points or a point and a direction. So for a parabola , it takes three pieces of information. We’ve done it for three points. We can also do it for a point, a slope, and a concavity, or rather a point, the derivative at that point, and the second derivative at that point.

Let’s follow this idea to find another quadratic approximation, which I’ll denote by in parallel to my notation above, for around . We’ll want , , and , since and . How do we find such a polynomial? The “long” way would be to call it and create the system of linear equations

and to solve them. We immediately see that , which lets us see that , which lets us see that . This yields

The “clever” way is to write the polynomial in the form , since then (as the other terms have a factor of ). When you differentiate in this form, you get , so that . And when you differentiate again, you get , so that . This has the advantage of allowing us to simply read off the answer without worrying about solving the system of linear equations. Putting these together gives

And if you check (you should!), you see that these two forms of are equal. This feels very “mathlike” to me. These approximations look like

Comparing the two is interesting. is okay near , then gets a bit worse, but then gets a little better again near and . But is *extremely* good near , and then gets worse and worse.

**1.3. Cubic approximation**

You can see that gives us a better approximation than for close to , just as was better than for near . Let’s do one more: cubic polynomials. Let’s use the points and to generate . This yields

Showing this calculation is true leads us a bit further afield, but we’ll jump right in. Four points, so if we wanted to we could create a system of linear equations as we did above for and solve. It may even be a good refresher on solving systems of linear equations. But that’s not how we are going to approach this problem today. Instead, we’re going to be a bit “clever” again.

Here’s the plan: find a polynomial-part that takes the right value at and is at and . Do the same for the other three points. Add these together. This is reasonable since it’s easy to find a cubic that’s at the points and : it’s . We want to choose the value of so that this piece is when . This leads us to choosing

In other words, this polynomial part is

Written in this form, it’s easy to see that it is at the three points where we want it to be , and it takes the right value at . Notice how similar this was in feel to our work for the quadratic part. Doing the same sort of thing for the other three points and adding all four together yields a cubic polynomial (since all four subparts are cubic) that takes the correct values at 4 points. Since there is only 1 cubic that goes through those four points (since four points determine a cubic), we have found it. *Aside: this is just a few steps away from being a full note about Lagrange polynomial interpolation* This looks like

These points were chosen symmetrically around so that this might be a reasonable approximation when is close to . How will compare to , the approximating polynomial determined by and his derivatives at ? Our four pieces of information now will be and . Writing first as , we see that , and . This lets us read off the coefficients: we have , so that in total

which looks like

These look very similar. Which one is better? Let’s compare the differences between and with the differences between and in graph form. This next picture shows , or rather the error in the approximation of by . What we want is for this graph to be , since this means that is exactly , or as close to as possible. Since the approximation is so good, the picture is very zoomed in.

Notice that the y-axis is labelled by “1e-3”, which means $10^{-3}$. So the error is within , which is very small (we really zoomed in). As you can see, the error is a bit weird. It oscillates a little, and is a bit hard to predict. Now let’s look at the picture of .

This error is very easy to predict, and again we can see that it is *extremely* accurate near the point we used to generate the approximation (which is in this case). As we get further away, it gets worse, but it is far more accurate and more predictably accurate in the center. Further, was easier to generate than , and it gives four decimal places of accuracy near while can only give two. So maybe there is something special to these polynomials. Let’s look into them more.

*Aside: people spend a lot of time on these interpolating problems. If this is something you are interested in, let me know and I can direct you to further avenues of learning about them*
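For readers who want to experiment with the “add up one piece per point” construction from the cubic section, here is a minimal sketch of Lagrange interpolation. The four nodes below are hypothetical, chosen symmetrically around $0$. They are not necessarily the points used to make the pictures above.

```python
import math


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis piece: equals 1 at xi and 0 at every other node.
        piece = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                piece *= (x - xj) / (xi - xj)
        total += yi * piece
    return total


# Hypothetical nodes, symmetric around 0 (an assumption for illustration).
nodes = [-math.pi / 2, -math.pi / 4, math.pi / 4, math.pi / 2]
values = [math.sin(t) for t in nodes]

for x in (0.0, 0.3, 1.0):
    approx = lagrange_interpolate(nodes, values, x)
    print(f"x={x}: cubic={approx:.6f}  sin={math.sin(x):.6f}")
```

Each basis piece plays the role of the “polynomial-part” in the text: it takes the right value at one node and vanishes at the other three.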

**2. Taylor Polynomials**

The idea that the polynomials built from derivatives at a single point were better approximations around the central point than the polynomials built from several points, as we saw above, led to the former getting a special name: they are called “Taylor Polynomials.” This might not be how you’ve seen Taylor polynomials introduced before, but this is where they really come from. And this is what keeps the right intuition. Just as the first derivative is the “best linear approximation,” these Taylor Polynomials give the best quadratic approximation, cubic approximation, etc. Let’s get the next few Taylor polynomials for $\sin x$ for $x$ near $0$.

We saw that . And the way we got this was by calculating and its first three derivatives at (giving us 4 pieces of information), and finding the unique cubic polynomial satisfying those four pieces of information. So to get more accuracy, we might try to include more derivatives. So let’s use the and its first *four* derivatives at . These are , and . Writing , we expect . Since the fourth derivative is , we get the same polynomial as before! We get . But before we go on, notice that the constant term came from evaluating , the coefficient of came from evaluating , the coefficient of came from , the coefficient of came from , and the coefficient of came from . These are exactly the same expressions that came up for and from before.

This is something very convenient. We see that the coefficient of $x^n$ in our polynomials depends only on the $n$th derivative of $\sin x$ at $0$. So to find $P_5$, I don’t need to recalculate all the coefficients. I just need to realize that the coefficient of $x^5$ will come from the fifth derivative of $\sin x$ at $0$ (which happens to be $\cos 0 = 1$). And to get this, we would have differentiated $x^5$ five times, giving us an extra $5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5!$, so that the coefficient is $1/5!$. All told, this means that

$$P_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!},$$

and that this is the best degree $5$ approximation around the point $0$. Pictorially, we see

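Now that we’ve seen the pattern, we can write the general degree $n$ Taylor polynomial, $P_n$, approximation for $\sin x$:

$$P_n(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{0 \leq k \leq (n-1)/2} (-1)^k \frac{x^{2k+1}}{(2k+1)!},$$

where the sum stops at the largest odd power not exceeding $n$.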

The next image shows increasingly higher order Taylor approximations to $\sin x$. Worse approximations are more orange, better are closer to blue.
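Since the animation does not survive in every format, here is a small numerical stand-in: a sketch that evaluates the Taylor polynomials of $\sin x$ at $0$ and prints how the error shrinks as the degree grows. The function name `sin_taylor` is mine.

```python
import math


def sin_taylor(x, degree):
    """Evaluate the Taylor polynomial of sin at 0, keeping odd powers up to `degree`."""
    total = 0.0
    for k in range(1, degree + 1, 2):            # k = 1, 3, 5, ...
        sign = -1.0 if (k // 2) % 2 else 1.0     # signs alternate +, -, +, ...
        total += sign * x**k / math.factorial(k)
    return total


x = 1.0
for degree in (1, 3, 5, 7):
    approx = sin_taylor(x, degree)
    print(f"degree {degree}: {approx:.8f}  error {abs(approx - math.sin(x)):.2e}")
```

Only odd powers appear because every even-order derivative of $\sin x$ vanishes at $0$.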

What we’ve done so far extends very generally. The degree $n$ polynomial that agrees with the value and first $n$ derivatives of a function $f$ at $0$ approximates the function for values of $x$ near $0$, and in general these polynomials are given by

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k.$$

This prompts two big questions. First: how good are these approximations? And second, since using more derivatives gives us better approximations, does using “all” the derivatives give us the whole function? Rather, if $f$ is infinitely differentiable, is it true that $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n$? Or is it at least true that this gives us the “best” approximation we can get from a single point?

These are big questions, and they are all inter-related. The first question leads to considering the remainder, or error, of Taylor polynomials. And the second leads us to consider infinite Taylor series associated to a function.

**3. Estimating the Error of Taylor Polynomial Approximations**

If we want to know how well we can expect Taylor polynomials to approximate their associated function, then we need to understand the error, or differences, between the two. We might hope that Taylor polynomials always give very good approximations, or that if we use enough terms, then we can get whatever accuracy we want.

But this is not true.

One good example is the function:

This is often called the bump function, because it looks like

which is just a little bump. The weird thing about this is that is differentiable everywhere. This isn’t obvious, because we defined it in two pieces. But it turns out that this function has a derivative even at , where the two “pieces” touch. In fact, this function is infinitely differentiable everywhere. So that’s already sort of weird. It’s weirder, though, in that all the derivatives of at are exactly (so that these derivatives are more like the derivatives of the constant function to its right than the “bump” part to its left).

So if you look at the Taylor polynomials for generated at the point , then you would get for all (all the derivatives are ). These Taylor polynomials do a terrible job of approximating the bump function.

It turns out that there are many, many functions that are very “nice” in that they have many derivatives, but which don’t have well behaved Taylor approximations. There are even functions that have many derivatives, but whose Taylor polynomials *never* provide good approximations, regardless of what point you choose to generate the polynomials.

To make good use of Taylor polynomial approximations, we therefore have to be a bit careful about the error in the approximations. To do this, we will use the mean value theorem. (*Aside: For those keeping score, the mean value theorem gives us so much. In my previous note, I talk about how the mean value theorem gives us the fundamental theorem of calculus. Now we’ll see that it gives us Taylor series and their remainder too. This is sort of crazy, since the statement of the mean value theorem is so underwhelming.*)

This also marks a turning point in this note. Suddenly, things become much less experimental and visual. We will prove things – but don’t be afraid!

The mean value theorem says that if a function $f$ is differentiable, and we choose any two numbers in the domain $a$ and $b$, then there is a point $c$ between $a$ and $b$ so that $f'(c)$ is equal to the slope of the secant line from $(a, f(a))$ to $(b, f(b))$. Stated differently, there is a point $c$ between $a$ and $b$ so that $f'(c) = \frac{f(b) - f(a)}{b - a}$.

If we suppose that $a = 0$ and $b = x$ (or you could keep $a$ as it is for Taylor polynomials generated at points other than $0$), then we get that there is some $c$ between $0$ and $x$ such that $f'(c) = \frac{f(x) - f(0)}{x}$. Rearranging yields $f(x) = f(0) + f'(c)x$. This serves as a bound on the error of the degree $0$ Taylor polynomial at $0$, which is just $P_0(x) = f(0)$. How so? This says that the difference between $f(x)$ and $f(0)$ (which is our approximation in this case) is at most $|f'(c) x|$ for some $c$. If we happen to know that $|f'(c)| \leq M$ for all $c$ in $[0, x]$, then we know that $|f(x) - f(0)| \leq M|x|$. So our error at the point $x$ is at most $M|x|$.

This bound really isn’t very good, but then a constant is also a very bad approximation. On the other hand, this semi-trivial example contains the intuition for the proof.

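To get a bound on linear approximations, we start off with the derivative of $f$. If we apply the mean value theorem to $f'$, we get that there is some $c$ between $0$ and $x$ so that $f''(c) = \frac{f'(x) - f'(0)}{x}$, or rather that $f'(x) = f'(0) + f''(c)x$. (This shouldn’t be a surprise – $f'$ is just a function too, so we expect to get this in the same form as we had above). Here’s where we can do something interesting: let’s integrate this. In particular, let’s rewrite this in $t$:

$$f'(t) = f'(0) + f''(c)t,$$

and let’s integrate him:

$$\int_0^x f'(t)\,dt = \int_0^x \big( f'(0) + f''(c)t \big)\,dt.$$

(Here we treat $c$ as a constant while integrating. Strictly, $c$ depends on $t$, but the bound we extract below only uses a bound on $f''$ over the whole interval, so the estimate survives.)

The second fundamental theorem of calculus (which is also essentially the mean value theorem!) says that $\int_0^x f'(t)\,dt = f(x) - f(0)$. So the equation above becomes

$$f(x) - f(0) = f'(0)x + f''(c)\frac{x^2}{2},$$

which rearranges to

$$f(x) = f(0) + f'(0)x + f''(c)\frac{x^2}{2}.$$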
This says that the error in our linear Taylor polynomial is at most $|f''(c)| x^2 / 2$ for some $c$ in $[0, x]$. So as before, if we know a bound $M$ on the second derivative, then our error is at most $M x^2 / 2$. This is much better than what we had above. In particular, if our $x$ value is close to $0$, like say less than $1/10$, then the $x^2$ bit in the error means the error has a factor of $1/100$. But as we get further away from $0$, we expect worse error. This is why in the pictures above we see that the Taylor polynomials give very good approximations near the center of the expansion, and then get predictably worse.
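As a quick sanity check on the linear bound, here is a sketch for $f(x) = \sin x$, whose linear Taylor polynomial at $0$ is $P_1(x) = x$. Since $|f''| = |\sin| \leq 1$, I take $M = 1$. The helper name is mine.

```python
import math


def linear_error_and_bound(x, M=1.0):
    """Return (actual error of P_1(x) = x for sin, the bound M * x^2 / 2)."""
    error = abs(math.sin(x) - x)
    bound = M * x**2 / 2
    return error, bound


for x in (1.0, 0.5, 0.1, 0.01):
    error, bound = linear_error_and_bound(x)
    print(f"x={x}: error={error:.3e}  bound={bound:.3e}")
```

The actual error sits well under the bound, which is expected: the bound only uses the worst case of the second derivative.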

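Let’s do one more, to see how this works. Now we will start with the second derivative. The mean value theorem says again that $f''(x) = f''(0) + f'''(c)x$. Writing this in $t$ and integrating both sides, we get

$$\int_0^x f''(t)\,dt = \int_0^x \big( f''(0) + f'''(c)t \big)\,dt.$$

Rearranging gives

$$f'(x) = f'(0) + f''(0)x + f'''(c)\frac{x^2}{2}.$$

(This shouldn’t surprise us either, since the derivative $f'$ is just a function. So of course it has a first order expansion just like we saw above.) Let’s again write this in $t$ and integrate both sides,

$$\int_0^x f'(t)\,dt = \int_0^x \Big( f'(0) + f''(0)t + f'''(c)\frac{t^2}{2} \Big)\,dt.$$

Rearranging yields

$$f(x) = f(0) + f'(0)x + f''(0)\frac{x^2}{2} + f'''(c)\frac{x^3}{3!}$$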
for some $c$ between $0$ and $x$. Now we see that the approximations are getting better. Firstly, for $x$ near $0$, we now get a cubic factor in the error term. So if we were interested in $x$ around $1/10$, now the error would get a factor of $1/1000$. That’s pretty good. We also see that there is a rising factorial in the denominator of the error term. If there is anything to know about factorials, it’s that they grow very, very fast. So we might hope that this factorial increases with higher derivatives so that the error is even better. (And this is the case).

So the only possible bad thing in this error term is that $f'''(c)$ is not well-behaved, or is not very small, or is very hard to understand. And in these cases, the error might be very large. This is the case with the bump function above – the derivatives grow too rapidly.

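If you were to continue this process, you would see that the general pattern is

$$f(x) = f(0) + f'(0)x + f''(0)\frac{x^2}{2!} + \cdots + f^{(n)}(0)\frac{x^n}{n!} + f^{(n+1)}(c)\frac{x^{n+1}}{(n+1)!}$$

for some $c$ between $0$ and $x$. And this is where the commonly quoted Taylor remainder estimate comes from:

$$|f(x) - P_n(x)| \leq M \frac{|x|^{n+1}}{(n+1)!},$$

where $M$ is the maximum value that $|f^{(n+1)}|$ takes on the interval $[0, x]$.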

There are other ways of getting estimates on the error of Taylor polynomials, but this is by far my favorite. Now that we have estimates for the error, how can we put that to good use?

Let’s go back to thinking about the Taylor polynomials for $\sin x$ that were the source of all the pictures above. What can we say about the error of the degree $n$ Taylor polynomial for $\sin x$? We can say a lot! The derivatives of $\sin x$ go in a circle: $\sin x \to \cos x \to -\sin x \to -\cos x \to \sin x$. And all of these are always bounded below by $-1$ and above by $1$. So the $(n+1)$st derivative of $\sin x$ is always bounded by $1$ in absolute value. By the remainder we derived above, this means that

$$|\sin x - P_n(x)| \leq \frac{|x|^{n+1}}{(n+1)!}.$$

For small $x$, this converges really, really fast – this is why the successive approximations above were so accurate. But it also happens that as $n$ gets bigger, the factorial in the denominator grows very fast too – so we get better and better approximations even for not small $x$. This prompts another question: what would happen if we kept on including more and more terms? What could we say then? This brings us to the next topic.
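To see the interplay between the size of $x$ and the factorial, here is a sketch that finds the smallest degree $n$ for which the remainder bound $|x|^{n+1}/(n+1)!$ for $\sin x$ drops below a chosen tolerance. The function name and the tolerance are my own choices.

```python
import math


def degree_needed(x, tol):
    """Smallest n with |x|^(n+1) / (n+1)! <= tol (the remainder bound for sin)."""
    n = 0
    term = abs(x)                    # |x|^(n+1) / (n+1)! when n = 0
    while term > tol:
        n += 1
        term *= abs(x) / (n + 1)     # step from |x|^n / n! to |x|^(n+1) / (n+1)!
    return n


for x in (0.1, 1.0, 10.0):
    print(f"x={x}: degree {degree_needed(x, 1e-6)} suffices for error <= 1e-6")
```

Larger $x$ needs more terms, but the loop always terminates: once $n + 1$ exceeds $|x|$, each step shrinks the bound.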

**4. Taylor Series**

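Now we know the $n$th Taylor polynomial approximates $\sin x$ very well, and increasingly well as $n$ increases. So what if we considered the limit of the Taylor polynomials? Rather, what if we considered

$$\lim_{n \to \infty} P_n(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}?$$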

This is a portal into much deeper mathematics, and is the path that many of the mathematical giants of the past followed. You see, infinite series are weird. When do they exist? Is an infinite sum of continuous functions still continuous? What about differentiable? These are big, deep questions that are beyond the scope of this note. But some of the intuition is totally within the scope of this note.

We know the error of the $n$th Taylor polynomial to $\sin x$ is bounded above by $\frac{|x|^{n+1}}{(n+1)!}$. If we take the limit as $n \to \infty$ of the sequence of remainders, we get

$$\lim_{n \to \infty} \frac{|x|^{n+1}}{(n+1)!},$$

and it goes to $0$ no matter what $x$ is. (Factorials grow larger than the numerator for any fixed $x$). So no matter what $x$ is, as we use more and more terms, the approximations become arbitrarily good. In fact, since the error goes to $0$ and $\sin x$ is well-behaved, we have a powerful result:

$$\sin x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}.$$

That is equality. Not approximately equal to, but total equality. Some people even go so far as to define $\sin x$ by that Taylor series. When I say Taylor series, I mean the infinite sum. People study Taylor series to try to see what information they can glean about functions from their Taylor series. For example, we know that $\sin x$ is periodic. But how would you determine that from its Taylor series? (It’s not very easy).
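Periodicity is hard to *prove* from the series, but it is easy to observe numerically. Here is a sketch that sums the first several terms of the series for $\sin x$ and compares the result at $x$ and at $x + 2\pi$. The cutoff of 60 terms is an arbitrary choice of mine that is more than enough for moderate $x$ in double precision.

```python
import math


def sin_series(x, terms=60):
    """Partial sum of the Taylor series of sin at 0, using `terms` nonzero terms."""
    total = 0.0
    term = x                                             # first term: x^1 / 1!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))     # next odd-power term
    return total


x = 2.0
print(sin_series(x), math.sin(x))
print(sin_series(x + 2 * math.pi), sin_series(x))
```

Each term is built from the previous one, which avoids computing large powers and factorials separately.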

It turns out that you can tell a lot about a function from its Taylor series (although not as much as we might like, and we don’t really study this in math 100). Functions that are completely equal to their Taylor series are called “analytic” functions, and analytic functions are awesome. But even Taylor polynomials are good, and used to approximate hard-to-understand things.

*Aside: I do a lot of work with complex numbers (where we allow and things like that) instead of real numbers. One reason why I prefer complex numbers is that complex analytic functions, which are complex-valued functions that are exactly equal to their Taylor series, are miraculous things. Being complex analytic is totally amazing, and you can tell essentially anything you want from the Taylor series of a complex analytic function. This is to say that you are on the precipice of exciting and very deep mathematics. Of course, that’s where introductory classes stop.*

**5. Conclusion**

We saw Taylor polynomials, some approximations, some proofs, and some Taylor series. One thing I would like to mention is that for applications, people normally use finite Taylor polynomials and show (or use, if it’s well known) that the error is small enough for the application. But Taylor polynomials are useless without some understanding of the error. In Math 100, we give a totally lackluster treatment of error estimates. Serious math and physics concentrators will probably need and use more than what we have taught – even though now is perhaps the best time to learn it. So it goes! (Others might very well use Taylor polynomials and their error, but many useful functions and their remainders are understood just as well as those for sine, which we saw here; it is not always necessary to reinvent the wheel, although it can be rewarding in less immediate ways).

I hope you made it this far. If you have any comments, questions, concerns, tips, or whatnot then feel free to leave a comment or to ask me. For additional reading, I would advise you to use only google and free materials, as everything is available for free (and I mean legally and freely available). Something I did not mention but that I’ve been thinking about is presenting a large collection of applications of Taylor series. Perhaps another time.

This note can be found online at mixedmath.wordpress.com or davidlowryduda.com under the title “An Intuitive Overview of Taylor Series.” This note was written with vim in latex, and converted to html by a modified latex2wp. Thus this document also comes in pdf and .tex code . The pdf does not include my beautiful gif, which I am always proud of. The graphics were all produced using the free mathematical software sage . Interestingly, this note was the source of sage trac 15419, which will likely result in a tiny change in sage as a whole. I highly encourage people to check sage out.

And to my students – I look forward to seeing you in class. We only have a few left.

As requested, I’m uploading the last five weeks’ worth of worksheets, with (my) solutions. A comment on the solutions: not everything is presented in full detail, but most things are presented with most detail (except for the occasional one that is far far beyond what we actually expect you to be able to do). If you have any questions about anything, let me know. Even better, ask it here – maybe others have the same questions too.

Without further ado –

- Week 6 worksheet with solutions
- Week 7 worksheet with solutions
- Week 8 worksheet with solutions
- Week 9 worksheet with solutions
- Week 10 worksheet with solutions

And since we were unable to go over the quiz in my afternoon recitation today, I’m attaching a worked solution to the quiz as well.

Again, let me know if you have any questions. I will still have my office hours on Tuesday from 2:30-4:30pm in my office (I’m aware that this happens to be immediately before the exam – status not by design). And I’ll be more or less responsive by email.

Study study study!

This is the first time I’ve dealt with backend-ish things, so it took me a bit to get used to the lay of the land. Although I have the domain davidlowryduda (go figure), I am currently using a free webhost (I’m going to assume that I’m not going to suddenly out-traffic it or anything). So it’s a bit slow compared to wordpress.com, but there will be no more ads! (yay!) And I have complete control over the layout, and I can have a better site.

Further, I’m experimenting with django, and I haven’t yet decided which one I prefer or how I might want to integrate them together.

This is all to say that davidlowryduda.com is in a state of flux, but I’ll be maintaining both this and that for a while – until I get fully set up.

Before we hop into the details, I’d like to encourage you all to avail yourselves of each other, your professor, your TA, and the MRC in preparation for the first midterm (next week!).

**1. The quiz**

There were two versions of the quiz this week, but they were very similar. Both asked about a particular trig substitution

And the other was

They are very similar, so I’m only going to go over one of them. I’ll go over the first one. We know we are to use trig substitution. I see two ways to proceed: either draw a reference triangle (which I recommend), or think through the Pythagorean trig identities until you find the one that works here (which I don’t recommend).

We see a , and this is hard to deal with. Let’s draw a right triangle that has as a side. I’ve drawn one below. (Not fancy, but I need a better light).

In this picture, note that , or that , and that . If we substitute in our integral, this means that we can replace our with . But this is a substitution, so we need to think about too. Here, means that .

*Some people used the wrong trig substitution, meaning they used or , and got stuck. It’s okay to get stuck, but if you notice that something isn’t working, it’s better to try something else than to stare at the paper for 10 minutes. Other people use , which is perfectly doable and parallel to what I write below.*

*Another common error was people forgetting about the term entirely. But it’s important!*.

Substituting these into our integral gives

where I have included question marks for the limits because, as after most substitutions, they are different. You have a choice: you might go on and put everything back in terms of before you give your numerical answer; or you might find the new limits now.

*It’s not correct to continue writing down the old limits. The variable has changed, and we really don’t want to go from to .*

If you were to find the new limits, then you need to consider: if and , then we want a such that , so we might use . Similarly, when , we want such that , like . *Note that these were two arcsine calculations, which we would have to do even if we waited until after we put everything back in terms of to evaluate*.

*Some people left their answers in terms of these arcsines. As far as mistakes go, this isn’t a very serious one. But this is the sort of simplification that is expected of you on exams, quizzes, and homeworks. In particular, if something can be written in a much simpler way through the unit circle, then you should do it if you have the time.*

So we could rewrite our integral as

How do we integrate ? We need to make use of the identity . **You should know this identity for this midterm**. Now we have

The first integral is extremely simple and yields The second integral has antiderivative (*Don’t forget the on bottom!*), and we have to evaluate , which gives . **You should know the unit circle sufficiently well to evaluate this for your midterm**.

And so the final answer is . (You don’t need to be able to do that approximation).

Let’s go back a moment and suppose you didn’t re-evaluate the limits once you substituted in . Then, following the same steps as above, you’d be left with

Since , we know that . This is how we evaluate the left integral, and we are left with . This means we need to know the arcsine of and . These are exactly the same two arcsine computations that I referenced above! Following them again, we get as the answer.

We could do the same for the second part, since when is ; and when we get .

Putting these together, we see that the answer is again .

Or, throwing yet another option out there, we could do something else (a little bit wittier, maybe?). We have this term to deal with. You might recall that , the so-called double-angle identity.

Then . Going back to our reference triangle, we know that and that . Putting these together,

When , this is . When , we have .

And fortunately, we get the same answer again at the end of the day. (phew).
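If you want to convince yourself numerically that a trig substitution and its new limits are right, a sketch like the following can help. The quiz’s actual integrand and limits are not reproduced here, so this assumes the stand-in integral $\int_0^{1/2} \sqrt{1 - x^2}\,dx$ with the substitution $x = \sin\theta$. The helper `midpoint_integral` is my own simple numerical integrator.

```python
import math


def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Assumed example (not the quiz problem): integrate sqrt(1 - x^2) from 0 to 1/2.
a, b = 0.0, 0.5
direct = midpoint_integral(lambda x: math.sqrt(1 - x * x), a, b)

# With x = sin(theta): sqrt(1 - x^2) = cos(theta), dx = cos(theta) d(theta),
# and the limits become arcsin(a) and arcsin(b).
theta_a, theta_b = math.asin(a), math.asin(b)
substituted = midpoint_integral(lambda t: math.cos(t) ** 2, theta_a, theta_b)

# Antiderivative of cos^2 via the half-angle identity: theta/2 + sin(2 theta)/4.
closed_form = (theta_b - theta_a) / 2 + (math.sin(2 * theta_b) - math.sin(2 * theta_a)) / 4

print(direct, substituted, closed_form)
```

All three numbers should agree to many digits. If you keep the old limits after substituting, the middle one will not match, which is a quick way to catch that mistake.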

**2. The worksheet**

Finally, here is the worksheet for the day. I’m working on their solutions, and I’ll have that up by late this evening (sorry for the delay).

Ending tidbits – when I was last a TA, I tried to see what were the good predictors of final grade. Some things weren’t very surprising – there is a large correlation between exam scores and final grade. Some things were a bit surprising – low homework scores correlated well with low final grade, but high homework scores didn’t really have a strong correlation with final grade at all; attendance also correlated weakly. But one thing that really stuck with me was the first midterm grade vs final grade in class: it was really strong. For a bit more on that, I refer you to my final post from my Math 90 posts.

Firstly, here is the recitation work from the first three weeks:

- *(there was no recitation the first week)*
- A worksheet focusing on review.
- A worksheet focusing on integration by parts and u-substitution, with solutions.

In addition, I’d like to remind you that I have office hours from 2-4pm (right now) in Kassar 018. I’ve had multiple people set up appointments with me outside of these hours, which I’m tempted to interpret as suggesting that I change when my office hours are. If you have a preference, let me know, and I’ll try to incorporate it.

Finally, there will be an exam next Tuesday. I’ve been getting a lot of emails about what material will be on the exam. The answer is that everything you have learned up to now and by the end of this week is fair game for exam material. **This also means there could be exam questions on material that we have not discussed in recitation**. So be prepared. However, I will be setting aside a much larger portion of recitation this Thursday for questions than normal. So come prepared with your questions.

Best of luck, and I’ll see you in class on Thursday.

]]>After the war, Donald began to act in children’s programs at a radio station in Chicago. Perhaps it was because of his love of children’s education, perhaps it was the sudden visibility of the power of science, as evidenced by the nuclear bomb, or perhaps something else – but Donald had an idea for a tv show based around general science experiments. And so Watch Mr. Wizard was born on 3 March 1951 on NBC. (When I think about it, I’m surprised at how early this was in the life of television programming). Each week, a young boy or a girl would join Mr. Wizard (played by Donald) on a live tv show, where they would be shown interesting and easily-reproducible science experiments.

Watch Mr. Wizard was the first such tv program, and one might argue that its effects are still felt today. A total of 547 episodes of Watch Mr. Wizard aired. By 1956, over 5000 local Mr. Wizard science clubs had been started around the country; by 1965, when the show was cancelled by NBC, there were more than 50000. In fact, my parents have told me of Mr. Wizard and his fascinating programs. Such was the love and reach of Mr. Wizard that on the first Late Night Show with David Letterman, the guests were Bill Murray, Steve Fessler, and Mr. Wizard. He’s also mentioned in the song Walkin’ On the Sun by Smash Mouth. Were it possible for me to credit the many scientists that certainly owe their

I mention this because the legacy of Mr. Wizard was passed down. Don Herbert passed away on June 12, 2007. In an obituary published a few days later, Bill Nye writes that “Herbert’s techniques and performances helped create the United States’ first generation of homegrown rocket scientists just in time to respond to Sputnik. He sent us to the moon. He changed the world.” Reading the obituary, you cannot help but think that Bill Nye was also inspired to start his show by Mr. Wizard.

In fact, 20 years ago today, on 10 September 1993, the first episode of Bill Nye the Science Guy aired on PBS. It’s much more likely that readers of this blog have heard of Bill Nye; even though production of the show halted in 1998, PBS still airs reruns, and it’s commonly used in schools (did you know it won an incredible 19 Emmys?). I, for one, loved Bill Nye the Science Guy, and I still follow him to this day. I think it is impossible to narrow down the source of my initial interest in science, but I can certainly say that Bill Nye furthered my interest in science and experiments. He made science seem cool and powerful. To be clear, I know science is still cool and powerful, but I’m not so sure that’s the popular opinion. (*As an aside: I also think math would really benefit from having our own Bill Nye*).

When Bill Nye was preparing his team for the show in 1992, he distributed a sheet of paper that contained the objective of the show: Change the world.

Bill Nye talks a bit about the start of the show, and his plans to Change the World, in this youtube video Bill Nye: Firebrand for Science – Aims to Change the World. He talks about his conscious decision to inspire further generations of scientists (He inspired me!).

Bill Nye is still very prevalent in the science community, and still does his best to inspire. He’s also the CEO of the Planetary Society. He talks at graduations, at conventions, at schools, on tv. Really, he does whatever he can, wherever he can. I’ve twice had the chance to meet him in person, and he’s thoroughly motivating in person. He’s also a great storyteller. (This video of Bill Nye speaking at ASU shows both his passion for science and his skills of storytelling, as well as being motivational and an inspiration to Change the World. It’s perhaps my favorite video of him that I’ve seen).

So I have a question: why is it that 20 years after the initial debut of the show, we still show ‘Flight,’ the first episode of Bill Nye the Science Guy? Surely part of it is because it’s just so good (and that he said they deliberately focused on science instead of technology, so that the show would stand the test of time). But it wasn’t the most-watched show of its time, nor the show with the highest production value. But I have a theory, a theory well-stated by Hank Green in one of his famous vlogs (he’s one of the Vlogbrothers who, while we’re talking about inspiration, is one of the groups that inspired Vi Hart to begin her extraordinary set of youtube videos “Doodling in Math Class”, incidentally one of the closest treasures math has to a Bill Nye at the moment).

Hank Green says that: “PBS is probably the most innovative and successive legacy media company in online video, which at first seems odd until you realize that PBS has always focused not on content that gets views, but on content that gets loved.”

People certainly love both Bill Nye and his show. And Bill is also always looking to go that extra yard to connect with the viewer.

How does Bill Nye compare to the top 25 watched shows of 2012-2013? Of them, there are 8 reality tv shows, 2 sports shows, and 9 crime/police procedural dramas. Other than these, two remain from the top 10: The Big Bang Theory (which I enjoy, but always feel guilty about liking because of its incredible reliance and perseverance on nerd/geek stereotypes) and The Walking Dead (which I hear good things about, but have never watched).

This reminds me of some little-known trivia. What was the first reality tv show (in the modern sense, not the candid-camera or novelty-act type), and on what network did it air? When I was first asked this question, I thought of The Real World. But in fact, it’s An American Family, which aired on PBS in 1971. Isn’t that funny?

There is a concept called “channel drift” or “network decay,” which is ‘the gradual transition of a television network away from its original programming focus to either target a more lucrative audience or to include less niche programming’ (which I took straight from wikipedia). Although the term may not be common, complaints about it are – and they tend to be complaints about reality tv. My favorite example is TLC, what was once known as The Learning Channel. TLC was founded in 1972 by NASA and a publicly funded instructional network focused on providing real education through tv. At first, it was distributed at no cost by NASA satellites. It was privatized in 1980, and it began its slow transformation from providing free educational programming to the network that carries Here Comes Honey Boo Boo, a show widely considered exploitative and depraved, and absolutely no educational programming.

In fact, some argue this is occurring to other channels (some say essentially all such channels) largely thought of as ‘educational’ in nature. To highlight a few, Animal Planet’s highest ratings came from two hoax specials about finding mermaids (completely fictional documentaries); Discovery’s single largest audience was for the ‘mockumentary’ Megalodon, speculation about the existence of a giant prehistoric shark in the waters today. Or the History Channel, which once showed history, or SyFy, which once showed (and was called) sci-fi, or MTV, which once had music (instead of Jersey Shore, another weirdly faked reality tv series). The list goes on.

I don’t know what to say or think about channel drift. But I do know that it reminds me of the importance and consistency of PBS, a network that can continue to focus on lasting content that gets loved, instead of gratifying content that gets views. The Boston Globe writes that PBS is doing better now that channels that once provided educational content have drifted away. Viewers once again see that PBS provides a service that non-public stations cannot, or at least choose not to, match.

I really like Bill Nye the Science Guy (and NOVA, and many other science programs available on PBS) and I want to be able to continue to enjoy them, or be inspired by them, or inspire others with them, for years.

There’s one final cog in this story that completes a circle. In one week, the next season of Dancing with the Stars will start. Among the contestants are both Snooki, a character/person famous from Jersey Shore, and Bill Nye, likely with a new set of bow ties. Bill writes in a blog that he does this as the CEO of the Planetary Society and as a student of Carl Sagan. In other words, it seems Bill will be trying to bring inspirational messages and support of science education and funding to the main networks on his own.

He writes:

As unusual as this may seem, I believe we can broaden awareness of the Society and thereby humankind’s exploration of the Cosmos one ballroom dance at a time.

I may not understand, but I want to believe. Because it’s Bill Nye, who inspired me and so many others, and he wants to, dare I say it, Change the World.

Happy Birthday Bill Nye the Science Guy (the show’s 20 now!), and to Bill, I wish all the luck I can.

]]>Many people managed to stumble across the page before I’d finished all the graphics. I’m sorry, but they’re all done now! I was having trouble interpreting how WordPress was going to handle my gif files – it turns out that they automagically resize them if you don’t make them of the correct size, which makes them not display. It took me a bit to realize this. I’d like to mention that this actually started as a 90 minute talk I had with my wife over coffee, so perhaps an alternate title would be “*Learning calculus in 2 hours over a cup of coffee*.”

So read on if you would like to understand what calculus is, or if you’re looking for a refresher of the concepts from a first semester in calculus (like for Math 100 students at Brown), or if you’re looking for a bird’s eye view of AP Calc AB subject material.

**1. An intuitive and semicomplete introduction to calculus **

We will think of a function as something that takes an input and gives out another number, which we’ll denote by . We know functions like , which means that if I give in a number then the function returns the number . So I put in , I get , i.e. . Primary and secondary school overly conditions students to think of functions in terms of a formula or equation. The important thing to remember is that a function is really just something that gives an output when given an input, and if the same input is given later then the function spits the same output out. *As an aside, I should mention that the most common problem I’ve seen in my teaching and tutoring is a fundamental misunderstanding of functions and their graphs*

For a function that takes in and spits out numbers, we can associate a graph. A graph is a two-dimensional representation of our function, where by convention the input is put on the horizontal axis and the output is put on the vertical axis. Each axis is numbered, and in this way we can identify any point in the graph by its coordinates, i.e. its horizontal and vertical position. A graph of a function $f$ includes a point $(x, y)$ if $y = f(x)$.

Thus each point on the graph is really of the form . A large portion of algebra I and II is devoted to being able to draw graphs for a variety of functions. And if you think about it, graphs contain a huge amount of information. Graphing involves drawing an upwards-facing parabola, which really represents an infinite number of points. That’s pretty intense, but it’s not what I want to focus on here.

** 1.1. Generalizing slope – introducing the derivative **

You might recall the idea of the ‘slope’ of a line. A line has a constant ratio of how much the $y$ value changes for a specific change in $x$, which we call the slope (people always seem to remember rise over run). In particular, if a line passes through the points $(x_1, y_1)$ and $(x_2, y_2)$, then its slope will be the vertical change $y_2 - y_1$ divided by the horizontal change $x_2 - x_1$, or $\frac{y_2 - y_1}{x_2 - x_1}$.

So if the line is given by an equation $y = f(x)$, then the slope from two inputs $x_1$ and $x_2$ is $\frac{f(x_2) - f(x_1)}{x_2 - x_1}$. *As an aside, for those that remember things like the ‘standard equation’ or ‘point-slope’ but who have never thought or been taught where these come from: the claim that lines are the curves of constant slope is saying that for any choice of $(x_2, y_2)$ on the line, we expect $\frac{y_2 - y_1}{x_2 - x_1} = m$, a constant, which I denote by $m$ for no particularly good reason other than the fact that some textbook author long ago did such a thing. Since we’re allowing ourselves to choose any $(x_2, y_2)$, we might drop the subscripts – since they usually mean a constant – and rearrange our equation to give $y - y_1 = m(x - x_1)$, which is what has been so unkindly drilled into students’ heads as the ‘point-slope form.’ This is why lines have a point-slope form, and a reason that it comes up so much is that it comes so naturally from the defining characteristic of a line, i.e. constant slope.*

But one cannot speak of the ‘slope’ of a parabola.

Intuitively, we look at our parabola and see that the ‘slope,’ or an estimate of how much the function changes with a change in $x$, seems to be changing depending on what $x$ values we choose. (This should make sense – if it didn’t change, and had constant slope, then it would be a line). The first major goal of calculus is to come up with an idea of a ‘slope’ for non-linear functions. I should add that we already know a sort of ‘instantaneous rate of change’ of a nonlinear function. When we’re in a car and we’re driving somewhere, we’re usually speeding up or slowing down, and our pace isn’t usually linear. Yet our speedometer still manages to say how fast we’re going, which is an immediate rate of change. So if we had a function $p(t)$ that gave us our position at a time $t$, then the slope would give us our velocity (change in position per change in time) at a moment. So without knowing it, we’re familiar with a generalized slope already. Now in our parabola, we don’t expect a constant slope, so we want to associate a ‘slope’ to each input $x$. In other words, we want to be able to understand how rapidly the function $f(x)$ is changing at each $x$, analogous to how the slope $m$ of a line tells us that if we change our input by an amount $h$ then our output value will change by $mh$.

How does calculus do that? The idea is to get closer and closer approximations. Suppose we want to find the ‘slope’ of our parabola at the point . Let’s get an approximate answer. The slope of the line coming from inputs and is a (poor) approximation. In particular, since we’re working with , we have that and , so that the ‘approximate slope’ from and is . But looking at the graph,

we see that it feels like this slope is too large. So let’s get closer. Suppose we use inputs and . We get that the approximate slope is . If we were to graph it, this would also feel too large. So we can keep choosing smaller and smaller changes, like using and , or and , and so on. This next graphic contains these approximations, with chosen points getting closer and closer to .

Let’s look a little closer at the values we’re getting for our slopes when we use and as our inputs. We get

It looks like the approximate slopes are approaching . What if we plot the graph with a line of slope going through the point ?

It looks great! Let’s zoom in a whole lot.

That looks really close! In fact, what I’ve been allowing as the natural feeling slope, or local rate of change, is really the line tangent to the graph of our function at the point . In a calculus class, you’ll spend a bit of time making sense of what it means for the approximate slopes to ‘approach’ . This is called a ‘limit,’ and the details are not important to us right now. The important thing is that this let us get an idea of a ‘slope’ at a point on a parabola. It’s not really a slope, because a parabola isn’t a line. So we’ve given it a different name – we call this ‘the derivative.’ So the derivative of at is , i.e. right around we expect a rate of change of , so that we expect . If you think about it, we’re saying that we can approximate near the point by the line shown in the graph above: this line passes through and it’s slope is , what we’re calling the slope of at .
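Here is a minimal sketch of that shrinking-step experiment in Python. The parabola `f` below is an assumed example (I take $x^2 + 1$), and the name `secant_slope` and the particular step sizes are hypothetical choices made just for illustration.

```python
def f(x):
    # Assumed example parabola; any parabola would illustrate the same idea.
    return x**2 + 1


def secant_slope(func, x, h):
    """Slope of the line through (x, func(x)) and (x + h, func(x + h))."""
    return (func(x + h) - func(x)) / h


# Hypothetical step sizes: the second input creeps closer and closer to x = 1.
steps = [1, 0.5, 0.1, 0.01, 0.001]
slopes = [secant_slope(f, 1.0, h) for h in steps]

for h, slope in zip(steps, slopes):
    print(f"inputs 1 and {1 + h:<6} give approximate slope {slope:.4f}")
```

Running this, the printed slopes settle toward a single number as the step shrinks, which is exactly the ‘approaching’ behavior that a limit makes precise.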

Let’s generalize. We were able to speak of the derivative at one point, but how about other points? The rest of this post is below the ‘more’ tag below.

What we did was look at a sequence of points getting closer and closer to , and finding the slopes of the corresponding lines. So we were calculating , , \dots. Seen in a slightly different way, we had a decreasing , starting with and we were looking at slopes between the inputs and . So we were calculating . We did this for when and when so find the derivative at . But this idea leads to the same process for any . Let’s reparse this formula.

The quotient $\frac{f(x+h) - f(x)}{h}$ finds the slope between the points $(x, f(x))$ and $(x+h, f(x+h))$, which approximates the slope at the point $(x, f(x))$. For small $h$, this approximation should be even better. So to find the derivative (read the ‘slope’) at a value of $x$, we insert that value of $x$ and try this for decreasing $h$. And hopefully it will approach a number, like it approached $2$ above. If this *does* approach a number like above, then we say that $f$ is differentiable at $x$, and we call the resulting slope the derivative, which we denote by $f'(x)$. So above, for our parabola $f$, the derivative at $1$ is $2$, or $f'(1) = 2$.

In general, we have

Definition 1 The derivative of a function $f$ at the point $x$, which is an analogy of slope for nonlinear functions, is given by

$$\frac{f(x+h) - f(x)}{h}$$

as $h$ gets smaller and smaller (if it exists). If this does not tend to some number as $h$ gets smaller and smaller, then $f$ does not have a derivative at $x$. If $f$ has a derivative at $x$, then $f$ is said to be ‘differentiable’ at $x$.

We haven’t really talked about cases when a function doesn’t have a derivative, but not every function does. Functions with discontinuities or jumps, or that aren’t defined everywhere, etc. don’t have good ideas of local slopes. So sometimes functions have derivatives, sometimes they don’t, and sometimes they have derivatives at some points and not at others.

Another thing we haven’t yet addressed is the notation. Derivatives also have a function-like notation, and that’s because they are a function. Above, for our parabola $f$, we had that $f'(1) = 2$, i.e. the derivative of $f$ at $1$ is $2$. It turns out that $f$ has a derivative everywhere (i.e. is called ‘differentiable’), and the derivative is given by the function $f'(x) = 2x$. This is an amazingly compact presentation of information. So the ‘slope’ of $f$ at a point $x$ is given by $2x$, for any point $x$. Whoa.

But the important thing to accept now is that we have a way of talking about the ‘slope,’ or rather the local rate of change, of reasonable-behaving nonlinear functions, and the key is given by definition 1 above.

** 1.2. What can we do with a derivative? **

A good question now is so what? Why do we care about derivatives? We mentioned that the derivative gives a good linear approximation to functions (like why our line is so close to the graph of the parabola above). This linear approximation is very useful and important in its own right. We’ve also mentioned that derivatives give you a way to talk about rates of change, which is also very important in its own right. But I’ll mention 3 more things now, and a few more in Section 1.3, when we talk about ‘undoing derivatives.’

First and most commonly talked about is optimization. Sometimes you want to make something as big, as small, or as cheap as possible. These problems, when you’re trying to maximize or minimize something, are called optimization problems. Derivatives provide a method of solving optimization problems (in many ways, the best method). This relies on the key observation that if you have a differentiable (i.e. has a derivative) function $f$ that takes a maximum value when $x = c$, so that $f(c) \geq f(x)$ for $x$ near $c$, then the ‘slope’ of $f$ at $c$ will be $0$. In other words, $f'(c) = 0$. Why? Well, this slope describes the slope of the line tangent to $f$ at $c$, and if it’s not flat then the function is going up in one direction or the other – and so $f(c)$ isn’t a max after all. An example is shown in the image below:

Similarly, if $f$ takes a minimum when $x = c$, then $f'(c) = 0$.

So to maximize or minimize a function, we can calculate its derivative (well, *we* can’t because we’re not focusing on the calculations right now, but it’s possible and you learn this in a calculus course) and try to find its zeroes. This is really used all the time, and is simple enough to be done automatically. So when companies try to maximize profits, or land use is optimized, or power consumption minimized, etc., there’s probably calculus afoot.

Second (although a bit complicated – if you don’t understand, don’t worry), derivatives give good ways to find zeroes of functions through something called ‘Newton’s Method.’ The idea is that derivatives give linear approximations to our function, and it’s easy to see when a line has a zero. So you try to find a point near a zero, approximate the function with a line using a derivative, and find the zero of the line. This will be an approximate zero. Repeating this process (approximate the function with a line, find the zero, plug this zero into the function and approximate with a line again) can very quickly yield zeroes. So conceivably you’ll be optimizing a function, and thus will find its derivative and want to find its zeroes. So you then use derivatives to find the zeroes of the derivative of the original function... in other words, derivatives everywhere.
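To make the ‘approximate with a line, find the zero of the line, repeat’ loop concrete, here is a minimal sketch of Newton’s Method in Python. The function `g` (I take $x^2 - 2$, whose positive zero is $\sqrt{2}$), the starting guess, and the names `newton`, `steps` and `tol` are all assumptions chosen for illustration.

```python
def newton(func, dfunc, x0, steps=20, tol=1e-12):
    """Repeatedly replace x by the zero of the tangent line at x."""
    x = x0
    for _ in range(steps):
        slope = dfunc(x)
        if slope == 0:
            break  # a flat tangent line never crosses zero, so give up
        # The tangent line at x is func(x) + slope * (t - x); it is zero at:
        x_next = x - func(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def g(x):
    return x**2 - 2  # hypothetical example function


def dg(x):
    return 2 * x  # its derivative


root = newton(g, dg, x0=1.0)
print(root, g(root))
```

Starting from a guess of $1$, only a handful of repetitions are needed before the guess stops changing, which is the ‘very quickly’ part of the story.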

Third and perhaps most importantly (because this ends up yielding the key to the sections below, because it’s not at all obvious how important this is) is a deceptively simple statement. In the first reason above, we talked about how the derivative of a differentiable function at a max or a min is zero. This leads to Rolle’s Theorem,

which says that if $f$ is differentiable and $f(a) = f(b)$ for some $a$ and $b$, then there is a $c$ between $a$ and $b$ such that $f'(c) = 0$. The reasoning behind Rolle’s Theorem is very simple: if $f(a) = f(b)$ and $f$ is not constant, then there is a max or a min between $a$ and $b$. At this max or min, the derivative is $0$. And if $f$ is constant, then it is a line of slope zero, and thus has derivative $0$.

Using slightly more refined thinking (which amounts to ‘rotating a graph’ to be level so that we can appeal to Rolle’s Theorem), we can get a similar theorem called the Mean Value Theorem:

Theorem 2 (Mean Value Theorem) Suppose $f$ is a differentiable function and $a < b$. Then there is a point $c$ between $a$ and $b$ such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

In other words, there is a point between $a$ and $b$ whose derivative, or immediate slope, is the same as the average slope of $f$ from $a$ to $b$.

Let’s use this for a moment to really prove something that we sort of know to be true, but that now we can really justify. Let’s say that you travel 30 miles in 30 minutes. If we think of your position as a function of time $p(t)$, then we might think of $p(0) = 0$ (so at time $0$, you have gone zero distance) and $p(30) = 30$ (so 30 minutes in, you’ve gone 30 miles). Your average speed was $\frac{30 - 0}{30 - 0} = 1$ mile per minute, or $60$ miles per hour. By the mean value theorem, there was at least one time when you were going exactly $60$ miles per hour. If cops were to use this to measure speed, like have strips and/or cameras that record your positions at different times, then they could issue speeding tickets without ever actually measuring your speed. That’s sort of cool.
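As a small sanity check of the driving example, here is a sketch in Python that hunts for the promised moment. The position function `p` is purely hypothetical (a car that steadily speeds up, with $p(0) = 0$ and $p(30) = 30$), and the bisection search works only because that assumed car’s speed keeps increasing.

```python
def p(t):
    # Hypothetical position in miles after t minutes: p(0) = 0 and p(30) = 30.
    return t**2 / 30


def speed(t, h=1e-6):
    """Approximate instantaneous speed (miles per minute) from a tiny time window."""
    return (p(t + h) - p(t - h)) / (2 * h)


average = (p(30) - p(0)) / (30 - 0)  # average speed in miles per minute

# Bisection: the assumed car speeds up steadily, so its speed crosses the
# average exactly once somewhere between minute 0 and minute 30.
lo, hi = 0.0, 30.0
for _ in range(60):
    mid = (lo + hi) / 2
    if speed(mid) < average:
        lo = mid
    else:
        hi = mid
moment = (lo + hi) / 2

print(f"average speed: {average} miles per minute")
print(f"instantaneous speed matches the average near minute {moment:.3f}")
```

For a different trip the matching moment would move around, but the Mean Value Theorem says that some such moment always exists.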

There are many more reasons why derivatives are awesome – but this is what a calculus course is for.

** 1.3. Undoing derivatives – introducing the integral **

So we’re done talking about derivatives (mostly). Two big questions motivate the ‘other half’ of calculus: if I give you the derivative of a function, can you ‘undifferentiate’ it and find the original function $f$? (This is the intrinsic motivation, how you might motivate it yourself from learning derivatives for the first time). But there is also an extrinsic motivation, sort of in the same way that derivatives arise from wanting to talk about slopes of nonlinear functions. This extrinsic motivation is: how do you calculate the area under a function $f$? It’s not at all obvious that these are related (real math is full of these surprising and deep connections).

We will proceed with the second one: how do we calculate the area under a function $f$? For this section, we’ll start with the function $f(x) = 2x$.

We actually know how to calculate the area under a triangle. Suppose we want to calculate the area between the horizontal axis and $f(x) = 2x$, starting at $0$ and going as far right as $x$. So we have a triangle of width $x$ and height $2x$, so the area is $\frac{1}{2} \cdot x \cdot 2x = x^2$. Think back to earlier: we found the derivative of our parabola to be $2x$, and now the area from $0$ to $x$ under $2x$ is $x^2$. They almost undid each other! This is an incredible relationship. Let’s look deeper.

Let’s think about a generic well-behaved function , like the picture below.

We’re going to create a function called $\text{Area}(x)$, which takes in a nice always-positive function $f$ and spits out the area between the function and the horizontal axis from $0$ to $x$. So this is a function in a variable, $x$ – but it’s formulated a bit differently (it’s right around here where some people may need to adjust how they think of functions). In terms of the function $f$ from the graph above, the area represented by $\text{Area}(x)$ is the area of the shaded region in the picture below.

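Let’s do something a bit interesting: let’s try to take the derivative of $\text{Area}(x)$ at $x$. Recall that a derivative of a function $f$ is gotten from looking at $\frac{f(x+h) - f(x)}{h}$ as $h \to 0$. So we want to try to make sense of

$$\frac{\text{Area}(x+h) - \text{Area}(x)}{h}.$$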

Well, this is finding the area from to and then taking out the area from to . If you think about it, we’re just left with the area from to , or . Pictorally, this picture shows the area from to in blue, and the area from to a bit more than in red (so that the overlap is purple). So in the picture, we are treating as , so that we’re taking the derivative at .

Once we remove the image in blue from the image in red (so we take away everything in purple), we are left with a strip from to a bit more than , as shown here.

It’s time to appeal to a bit of intuition (or the intermediate value theorem). The area from $x$ to $x+h$ under $f$ is the same as the area of a rectangle of length $h$ and height within the range of $f$ on $[x, x+h]$. For example, the area from the shape above (zoomed in a bit here)

is the same as the area in this rectangle.

Now as $h$ is getting smaller, the height of the rectangle must get closer and closer to $f(x)$, i.e. the value of $f$ at the point $x$. In fact, as $h$ gets smaller, the area from $x$ to $x+h$ under $f$ gets closer and closer to $h \cdot f(x)$ (which is the same as the width of the rectangle times the approximate height), so $\text{Area}(x+h) - \text{Area}(x) \approx h \cdot f(x)$. This lets us evaluate our derivative:

$$\frac{\text{Area}(x+h) - \text{Area}(x)}{h} \approx \frac{h \cdot f(x)}{h} = f(x),$$

so that taking the derivative of this area function gives back the original function $f$. This is known as the First Fundamental Theorem of Calculus.

Theorem 3 (Fundamental Theorem of Calculus I) The derivative of the area under a function $f$ from $0$ up to $x$, which we write as $\text{Area}(x)$, is precisely $f(x)$, the value of the function at $x$.
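Here is a rough numerical check of this theorem in Python. Everything specific in it is an assumption for illustration: the positive function `f`, the point `x`, the step `h`, and the helper `area`, which just estimates the area from $0$ to $x$ using many thin rectangles.

```python
def area(func, x, pieces=100_000):
    """Estimate the area under func from 0 to x with thin midpoint rectangles."""
    width = x / pieces
    return sum(func((i + 0.5) * width) for i in range(pieces)) * width


def f(x):
    return 3 * x**2 + 1  # hypothetical always-positive function


x, h = 2.0, 1e-3

# Slope of the area function between x and x + h.
quotient = (area(f, x + h) - area(f, x)) / h

print(f"slope of the area function near x = {x}: {quotient:.4f}")
print(f"value of the original function, f({x}): {f(x):.4f}")
```

The two printed numbers nearly agree, and shrinking `h` brings them closer, just as the thin-strip argument suggests.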

*Aside: remember, this is just an intuitive introduction. There are annoying requirements and assumptions on the functions we’re using and ways to make the style arguments rigorous, but I sweep these under the rug for now.*

There is a big caveat to what we’ve just said, and it has to do with this ‘Area function.’ When does it make sense to talk about the area under a function? For example, what if we have the following function:

Does it have an area function? What about a worse function, with points all on their own? What do we mean by area? We know how to find areas of polygons and straight-sided shapes. What about non-straight-sided shapes? Just like how we developed derivatives to talk about slopes of nonlinear functions, we will now develop a method to calculate areas of non-straight-sided functions. And just like with derivatives, we’re going to do this with approximations.

We love being asked to find areas of rectangles, because it’s so easy. So given a function $f$ and a region on the horizontal axis, say from $a$ to $b$, we can approximate the area by a rectangle. Well, how do we choose how tall to make the rectangle? Let’s compare two alternatives: using the minimum value of $f$ on $[a, b]$ (using a minimum-rectangle approximation), and the maximum value of $f$ on $[a, b]$ (using a maximum-rectangle approximation). Let’s return to our generic function from above. In blue is the maximum-rectangle approximation, in red is the minimum-rectangle approximation (purple is overlap).

But this is clearly a poor approximation. How can we make it better? What if we used two rectangles? Or three? Or ten? Maybe a hundred?

In the animation above, note that as the number of rectangles increases, the approximation becomes better and better, and our two alternative area methods are getting closer and closer. If, as the number of rectangles gets huge, the area given by the minimum-rectangles tends to the same number as the area given by the maximum-rectangles, then we say that the area under $f$ from $a$ to $b$ is that number that the approximations tend to (so this agrees with our intuition in the picture). This is a clear parallel to how we thought of derivatives.

Definition 4 We call the area under a function $f$ from $a$ to $b$ the number that both the minimum-rectangles approximation and maximum-rectangles approximation tend to as the number of rectangles increases, if there is such a number. If there is such a number, we call $f$ *integrable* on $[a, b]$, and we represent this area by the symbol

$$\int_a^b f(t)\,dt,$$

where I used $t$ to emphasize that the area is not a function of $t$, but is just the area under a fixed region from $a$ to $b$.
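Here is a minimal Python sketch of the two rectangle approximations squeezing together. To keep it short I assume an increasing function (I take $2x$ on the region from $0$ to $3$), so that the minimum on each little piece sits at its left end and the maximum at its right end; a general function would need a real search for the min and max on each piece. The name `rectangle_sums` is hypothetical.

```python
def rectangle_sums(func, a, b, pieces):
    """Minimum- and maximum-rectangle areas for an INCREASING func on [a, b]."""
    width = (b - a) / pieces
    # Increasing function: min height at the left end, max at the right end.
    lower = sum(func(a + i * width) for i in range(pieces)) * width
    upper = sum(func(a + (i + 1) * width) for i in range(pieces)) * width
    return lower, upper


def f(x):
    return 2 * x  # assumed example; the region under it is a triangle


for pieces in [1, 10, 100, 1000]:
    lower, upper = rectangle_sums(f, 0.0, 3.0, pieces)
    print(f"{pieces:>5} rectangles: minimum {lower:.4f}, maximum {upper:.4f}")
```

The true area (a triangle of width $3$ and height $6$, so area $9$) is always trapped between the two columns, and the gap between them shrinks as the number of rectangles grows.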

So let’s set up the parallel: if we can find the slope of a function at a point (which we call the derivative), we call the function differentiable there; if we can find the area under a function on a region (which we call the integral), we call the function integrable there.

With new notation, we can phrase the first Fundamental Theorem of Calculus as follows: if $f$ is integrable, then the derivative of $\int_0^x f(t)\,dt$ is $f(x)$. Said another way, we can find functions that can be differentiated to give $f$. For this reason, integrals are sometimes called anti-derivatives. There is a deeper connection here too.

Suppose is a function whose derivative is another function . So . We’ve seen this relationship so far: the function has derivative . Let’s return to the task of finding the area under , this time from to . The following is a bit slick and the most non-obvious part of this post (in my opinion/memory).

Start with . Let’s carve up the segment into many little pieces of width , i.e. . Then by adding and subtracting evaluated at these points, we see that

Let’s just look at the first set of parentheses for a moment: . By the Mean Value Theorem (equation 2), we know that

for some between and , and recalling that . Rearranging this, we get that

Repeating, we see that

Here’s the magic. This sum has an interpretation. Since is between and , and we’re multiplying by , that could be the same calculation we would do if we were approximating the area under from to with rectangles: each rectangle has width , so that’s why we multiply by . Then is a reasonable height of the rectangle. So the sum of times values on the right is an approximation of the area under from to .

As we use more and more rectangles, it becomes the area, so we get that the area under from to is exactly .

Stated more generally (as there was nothing special about here):

Theorem 5 (Fundamental Theorem of Calculus II) If $F$ is a differentiable function with $F' = f$, then

$$\int_a^b f(t)\,dt = F(b) - F(a).$$

So, for example, to find the area under from to , we can compute where , since is the function whose area we want to understand. This gives . Although is not a hard function to compute areas for, this works for many many functions. In fact, integrals are the best tool we have to compute areas when available.

One way of thinking about these big theorems is that the first fundamental theorem says that antiderivatives exist, and the second says that you can use any antiderivative to calculate the area under a function (I haven’t mentioned this, but antiderivatives are not unique! also has derivative , and you can check that it gives the same area under from to ). So a large part of calculus is learning how to find antiderivatives for functions you want to study/integrate. What makes this so challenging is that there is no good, general method of finding antiderivatives – so you have to learn a lot of patterns and do a lot of computations. (We don’t do any of that here)
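Here is a small Python sketch comparing the two ways of getting an area: piling up thin rectangles versus plugging endpoints into an antiderivative. The function $2x$ with antiderivative $x^2$ comes from the discussion above, but the endpoints $1$ and $4$, the shifted antiderivative $x^2 + 1$, and the helper name `midpoint_area` are assumptions picked for illustration.

```python
def f(x):
    return 2 * x


def F(x):
    return x**2  # one antiderivative of f


def F_shifted(x):
    return x**2 + 1  # another antiderivative of f (assumed example)


def midpoint_area(func, a, b, pieces=10_000):
    """Estimate the area under func from a to b with thin midpoint rectangles."""
    width = (b - a) / pieces
    return sum(func(a + (i + 0.5) * width) for i in range(pieces)) * width


a, b = 1.0, 4.0  # hypothetical endpoints
by_rectangles = midpoint_area(f, a, b)
by_antiderivative = F(b) - F(a)
by_shifted = F_shifted(b) - F_shifted(a)

print(by_rectangles, by_antiderivative, by_shifted)
```

The rectangle sum takes ten thousand evaluations of `f`; the antiderivative needs two evaluations of `F`, and the extra constant in `F_shifted` cancels in the subtraction.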

This concludes the theoretical development of calculus in AP Calculus AB, and Math 90 at Brown for that matter. But I’d like to mention one under-emphasized fact about the material we’ve discussed here – this will be the final section.

** 1.4. Why do we care about integrals, other than to calculate area? **

Being able to compute areas is cool and useful in its own right, but I think it’s also way over-emphasized. Integrals and derivatives, the two fundamental tools of calculus, allow an entirely different method of thinking about and solving problems. Let’s look at two examples.

**Population growth**

Let’s make a model for population growth from first principles. The great strength of calculus is that we can base our calculations only on assumptions of related rates of change. For instance, suppose that $P(t)$ is the population of bacteria in a petri dish at time $t$. We might guess that if there are twice as many bacteria, then there will be twice as much growth (since there will be twice as many bacteria splitting and doing bacteria-reproductive things). Stated in terms of derivatives, we think that the rate of change in bacteria population is proportional to the size of the population, i.e. $P'(t) = kP(t)$ for some constant $k$.

Calculus allows one to ‘undo the derivative’ on $P'(t)$ using integration (and a few things that are not in the scope of this survey), and in the process actually explicitly gives that all possibilities for $P(t)$ are $P(t) = Ae^{kt}$, where $e$ is $2.718\ldots$, the base of the exponential. To reiterate: calculus allows us to show that the only functions whose size at $t$ is proportional to their slope at $t$ are functions of the form $Ae^{kt}$. Then if we measured a bacteria population at two times, we could solve for $A$ and $k$, and have an explicit model. It also turns out that this model is really good for small bacteria populations (before limiting factors like food, etc. become an issue). But it’s possible to develop more sophisticated models too, and these are not hard to create and experiment with.
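
Here is a minimal sketch of the "measure at two times and solve" step. I write the model as $P(t) = Ae^{kt}$; the names `A`, `k`, `fit_exponential`, and the two measurements are made up for illustration. Dividing the two measurements eliminates $A$, so $k = \ln(P_2/P_1)/(t_2 - t_1)$, and then $A = P_1 e^{-k t_1}$.

```python
import math

def fit_exponential(t1, p1, t2, p2):
    """Return (A, k) so that A * exp(k * t) passes through both measurements."""
    k = math.log(p2 / p1) / (t2 - t1)   # ratio of measurements removes A
    A = p1 * math.exp(-k * t1)          # back-substitute to recover A
    return A, k

# Hypothetical data: 100 bacteria at t = 0 hours, 400 bacteria at t = 2 hours.
A, k = fit_exponential(0.0, 100.0, 2.0, 400.0)

def population(t):
    """Predicted population at time t under the fitted model."""
    return A * math.exp(k * t)

print(A, k, population(3.0))
```

With these made-up numbers the population doubles every hour, so the model predicts about 800 bacteria at the three hour mark.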

**Laws of motion**

Galileo famously showed (reputedly by dropping things off the Leaning Tower of Pisa) that acceleration due to gravity is a constant and is independent of the mass of the object being dropped. Well, acceleration is the rate of change of velocity. If we call $v(t)$ the velocity at time $t$ and $a(t)$ the acceleration at time $t$, then we suspect that $a(t) = a$ is a constant. Since acceleration is the rate of change of velocity, we can say that $v'(t) = a(t)$, or that $v'(t) = a$. Integration ‘undoes’ derivatives, and it turns out the antiderivatives of $a$ are functions of the form $at + C$ for some constant $C$. So here, we suspect that $v(t) = at + C$ for some constant $C$.

Well, what is that constant? If we dropped the object at rest, then its initial velocity was $0$. So at time $t = 0$, we expect $v(0) = 0$. This means that $C = 0$ (if it didn’t start at rest, then we get a different story). Thus $v(t) = at$. More generally, if it had initial velocity $v_0$, then we expect that $v(t) = at + v_0$.

We can do more. Velocity is change in position per time. If $s(t)$ is the position at time $t$, then $s'(t) = v(t)$. It turns out that the antiderivatives of $at + v_0$ are $\frac{1}{2}at^2 + v_0 t + s_0$, where $s_0$ is some constant.
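
One way to convince yourself of these formulas without doing any antidifferentiation is to let a computer add up many tiny changes. The sketch below is my own illustration: the function name, the step count, and the numbers (acceleration $-9.8$, starting from rest at height $100$) are assumptions. It steps position and velocity forward using only "position changes at rate velocity" and "velocity changes at rate acceleration", and compares the result to velocity $at + v_0$ and position $\frac{1}{2}at^2 + v_0 t + s_0$.

```python
def simulate_fall(a, v0, s0, t_end, steps=1000):
    """Step velocity and position forward in small time increments."""
    dt = t_end / steps
    v, s = v0, s0
    for _ in range(steps):
        s += v * dt   # position changes at rate v
        v += a * dt   # velocity changes at rate a
    return v, s

# Assumed values: constant acceleration -9.8, dropped from rest at height 100.
a, v0, s0, t = -9.8, 0.0, 100.0, 2.0

v_num, s_num = simulate_fall(a, v0, s0, t)
v_exact = a * t + v0
s_exact = 0.5 * a * t**2 + v0 * t + s0

print(v_num, v_exact)
print(s_num, s_exact)
```

The stepped position is slightly off because each step uses the velocity from the start of the step, and the gap shrinks as `steps` grows.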

In short, we are able to derive formulae and equations that govern the laws of motion starting with simple, testable observations. What I’m trying to emphasize is that calculus is an essential tool for model-making, experimentation, and predictive/reactive analysis. And these few examples barely provide a hint of what calculus can do. It’s an interesting, powerful, expansive world.

**2. Concluding remarks **

I hope you made it this far. If you have any comments, questions, concerns, tips, or whatnot then feel free to leave a comment below. For additional reading, I would advise you to use only google and free materials, as everything is available for free (and I mean legally and freely available). The last section, section 1, actually details first examples of a class on Ordinary Differential Equations, which describe those equations that arise from relating the values of a function with values of its rate of change (or rates of rate of change, etc.).

This document was written with a slightly modified latex2wp, so I have pdfs available. The pdfs do not include the gifs, and the graphics are a bit too big to be natural. If you’re nice, I also have the TeX available.

The graphics were all produced using the free mathematical software SAGE. I highly encourage people to check SAGE out.

And to my students – I look forward to seeing you in class. Our first class is this coming Thursday.

]]>IdeaLab invited 20 early career mathematicians to come together for a week and to generate ideas on two very different problems: Tipping Points in Climate Systems and Efficient Fully Homomorphic Encryption. Although I plan on writing a bit more about each of these problems and the IdeaLab process in action (at least from my point of view), I should say something about what these are.

Models of Earth’s climate are used all the time: to give daily weather reports, to predict and warn about hurricanes, and to attempt to understand the effects of anthropogenic sources of carbon on long-term climate. As we know from uncertainty about weather reports, these models aren’t perfect. In particular, they don’t currently predict sudden, abrupt changes called ‘tipping points.’ But are tipping points possible? There have been warm periods following ice ages in the past, so it seems that there might be tipping points that aren’t modelled in the system. Understanding these forms the basis of the Tipping Points in Climate Systems project. This project also forms another link in Mathematics of Planet Earth.

On the other hand, homomorphic encryption is a topic in modern cryptography. To encrypt a message is to make it hard or impossible for others to read it unless they have a ‘key.’ You might think that you wouldn’t want someone holding onto encrypted data to be able to do anything with the data, and in most modern encryption algorithms this is the case. But what if we were able to give Google an encrypted dataset and ask them to perform a search on it? Is it possible to have a secure encryption that would allow Google to do some sort of search algorithm and give us the results, but without Google ever understanding the data itself? It may seem far-fetched, but this is exactly the idea behind the Efficient Fully Homomorphic Encryption group. Surprisingly enough, it is possible. But known methods are obnoxiously slow and infeasible. This is why the group was after ‘efficient’ encryption.

So 20 early career mathematicians from all sorts of areas of mathematics gathered to think about these two questions. For the rest of this post, I’d like to talk about the structure and my thoughts on the IdeaLab process. In later posts, I’ll talk about each of the two major topics and what sorts of ideas came out of the process.

20 mathematicians, 5 days, 2 intractable problems. What is to be done? The first day was spent almost entirely on introductions. Each of the 20 participants prepared a 6 minute, 3 slide introduction about his or her own work and background. There was an interesting design plan here: all 20 participants spoke to each other, even though they were going to split into 2+ groups within the next 24 hours. This might be because the 20 people hadn’t officially precommitted to one or the other of the two problems. You could still decide. Conceivably, there might be cross-pollination and members from each half could inform each other. But while this sounds like a good idea, it also seems not very likely. The two problems were very disparate: climate models seem to rely very fundamentally on dynamical systems (which I know nothing about, really). Cryptography seems to rely on one-way or trapdoor functions, thus far with a number theory bent.

Then again, the goal of the week was for cross-pollination and spreading ideas. Both tipping points and homomorphic encryption need an influx of new ideas. So perhaps it wasn’t a bad idea? What I do know is that the opening introductions did not help people bridge the gap. The introductions were full of jargon, acronyms, and a general assumption of a base level of knowledge that was perhaps true of those who would work on the same problem, but which was beyond the level of those on the other problem. For example, some did work on dynamical systems and gave explicit formulas in terms of massive systems of differential equations with dozens of parameters. An unintended consequence of having a 6-minute, 3-slide intro is that there’s not much space to say anything. So to fit an accurate representation of your thesis topic (which many did), you have to be fast and dense – not conducive to general understanding.

I don’t mean to say the number theorists were any better. On our part, there were Dirichlet series and L-functions being acted on with Hecke operators, all without definitions given due to time/space/planning consequences. As a result, I think that the introductions largely served to grow the divide between the two problem groups. I didn’t anticipate this beforehand, but I think the next IdeaLab should really think about this. Either split up the groups earlier so that there is more conversation sooner, or somehow advertise that the introductions need to be aimed at a lower audience.

After the introductions, the organizers (relative experts on the two problems) gave an overview of the problems and the current level of progress towards their solution. The group hadn’t split yet, and I must say I enjoyed learning about problems facing modelers of climate systems and the potential for tipping points. This presentation was given by Christopher Jones, and he instilled within me sufficient interest for me to have at least kept tabs on the climate group’s progress over the week (although I did not contribute). I also enjoyed Henry Cohn’s explanation of homomorphic encryption, but I was already familiar with the material: the dramatic power held less sway over me. Perhaps this is how the applied mathematicians felt about the climate talk?

This was the end of the first day, but by the following morning the participants needed to decide which group they were going to attend since almost every subsequent meeting was split. I came to work on homomorphic encryption as a student of Jeff Hoffstein, one of the organizers of the cryptography portion (and the reason why I, as a grad student, felt comfortable enough to attend a conference/workshop aimed at early-career mathematicians), so I didn’t need to make a decision. But I don’t think anyone was actually on the fence. The disciplines were very separate.

The second day started with more talks given by the experts on the subject. The material was more pointed, more technical. Unlike the first day, I learned a lot very quickly. Or perhaps I should say that many things I didn’t know were quickly taught to me, and I picked up many bits and pieces. “The goal,” they continually reminded us, “is not to completely solve fully homomorphic encryption. That’s not a reasonable goal for a week. The goal is to come up with new ideas, to reinvigorate, inform, or create new connections to the field.” Then afternoon came, and the experts were about to set us loose.

This was a very exciting time. Ten of us are sitting together, none of us cryptographers, suddenly to be set loose. I don’t know what we would have done, but someone had the foresight to have us all sit and just brainstorm a large list of potential ideas/areas/hard problems to explore or investigate. I think this stage was essential: it gave us some sort of direction. This is my second big piece of advice to future IdeaLabs: keep up this sort of brainstorming session. After we had a good sized list and nothing else struck us, we split into two smaller groups of roughly five each to pursue subsets of the ideas. The climate group did roughly the same, except that three subgroups formed (I think – I wasn’t there, so I won’t really say much more about their process throughout the week).

What this meant for the cryptography group is that the five of us with a number theory background formed a subgroup, and those with a graph theory/logic/probability background formed another. From this moment on, we were more or less left to our own devices by the experts. This was intentional – the idea is that if they were around, the temptation to look to them for guidance would likely lead group progress towards already-established ideas in the field. And why would you want that?

We worked a lot harder and on a lot more different directions than I had anticipated. We toyed with tropical geometry, Hecke algebras, lattices, rings, etc. We bounced ideas off each other, and often split into subsubgroups that varied in composition and goal. I liked the fluidity, but I don’t know if we were too unfocused. Perhaps we were too “free” in accepting that we weren’t going to solve the problem? We certainly generated a lot of ideas, albeit not wholly fleshed out ideas.

One thing I really enjoyed was seeing how others worked and were informed by their backgrounds. For the second time, I attended an ICERM program as the most junior mathematician around. It’s very inspiring to be around more experienced mathematicians. Some tried to modify existing protocols. Some sought to find new ‘hard problems’ to create new trapdoor functions. Some were excited to jump into areas of math far from their own discipline because it seemed fun or interesting. The graph theorists had some really interesting ideas that are completely different from modern cryptography (as far as I can tell). It wouldn’t surprise me if there were some really nice deep material in what they presented. All in all, it was exciting and rewarding.

All too soon, Friday came, and we presented our ideas to all 20 participants and a panel of visitors from institutions like the NSF. I didn’t expect the time to go by so soon. The visitors and experts had many pointed and well-thought questions. I was particularly impressed with one of the climate presentations and the question and answer session between them and the panel. In a later post, I’ll talk more about that.

Finally, the panel told us about grants, grant sources, institutes of collaborative mathematics (like ICERM), and so on. It was a good experience, and I would absolutely recommend it to others. Perhaps more importantly, I’m still thinking about some of the things I worked on at IdeaLab. I have two encryption schemes in particular that I really like and that seem interesting but different to current schemes. I’m also very happy to know the people I met. We run in tiny circles like hamsters sometimes, and I’m sure that we’ll cross paths again. If nothing else, I think this program allowed me to expand my collaborative sphere. I’m really tempted to delve deeper into cryptography too. There seems to be room for more mathematicians, as opposed to computer scientists and programmers, in the field.

I should also mention that another member from my subgroup, Adriana Salerno, has written a post to her blog PhD+epsilon about her IdeaLab experience too. You should read it to get another perspective!

There’s a cute story here too. During the first working day (Tuesday), Adriana was the first person daring enough to go up to the white board and just start going down idea paths. (There’s really nothing to be afraid of, but she was the first to overcome the hesitation). She filled up the whole board, and I got an idea and went up to the board. One minute later, while I’m explaining what I was thinking about, the ICERM staff takes a picture and posts it to their site (below).

An action shot of math in progress!

As a final note, I’d like to mention one last thing I would add to the IdeaLab experience. There were many talks and many slides, and the final presentations were done beamer-style in slides. But they aren’t available (or at least, not yet). But why not? Shouldn’t the ideas from the IdeaLab be made available? Nonetheless, I had a great time, I learned a lot, and I’d recommend IdeaLabs to everyone.

]]>Now that we’d established ideas about solving the modular equation , solving the linear diophantine equation , and about general modular arithmetic, we began to explore systems of modular equations. That is, we began to look at equations like

Suppose satisfies the following three modular equations (rather, the following system of linear congruences):

Can we find out what is? This is a clear parallel to solving systems of linear equations, as is usually done in algebra I or II in secondary school. A common way to solve systems of linear equations is to solve for a variable and substitute it into the next equation. We can do something similar here.

From the first equation, we know that for some . Substituting this into the second equation, we get that , or that . So will be the modular inverse of . A quick calculation (or a slightly less quick Euclidean algorithm in the general case) shows that the inverse is . Multiplying both sides by yields , or rather that for some . Back substituting, we see that this means that , or that .

Now we repeat this work, using the third equation. , so that . Another quick calculation (or Euclidean algorithm) shows that this means , or rather for some . Putting this back into yields the final answer:

And if you go back and check, you can see that this works.
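
The substitute-and-solve procedure can be sketched in a few lines of Python. This is my own illustration rather than anything from class: the function name is made up, and the sample system ($x \equiv 2 \bmod 5$, $x \equiv 3 \bmod 7$, $x \equiv 4 \bmod 9$) uses hypothetical remainders. It relies on the built-in `pow(M, -1, m)` for modular inverses, which needs Python 3.8 or later.

```python
def solve_by_substitution(congruences):
    """Solve x ≡ r (mod m) for each (r, m), assuming pairwise coprime moduli.

    Returns (x, M), meaning the solutions are exactly x mod M.
    """
    x, M = 0, 1  # every integer satisfies x ≡ 0 (mod 1)
    for r, m in congruences:
        # Everything so far says x = x + M*k for some k.
        # Substitute into the next congruence and solve for k mod m.
        k = ((r - x) * pow(M, -1, m)) % m
        x += M * k   # back-substitute
        M *= m       # the combined congruence is now mod M*m
    return x, M

# Hypothetical system, not the one from class.
x, M = solve_by_substitution([(2, 5), (3, 7), (4, 9)])
print(x, M)                 # 157 315
print(x % 5, x % 7, x % 9)  # 2 3 4
```

Each pass through the loop is one round of "solve for a variable, find a modular inverse, substitute back".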

There is another, very slick, method as well. This was a clever solution mentioned in class. The idea is to construct a solution directly. The way we’re going to do this is to set up a sum, where each part only contributes to one of the three modular equations. In particular, note that if we take something like , where this inverse means the modular inverse with respect to , then this vanishes mod and mod , but gives . Similarly vanishes mod 5 and mod 9 but leaves the right remainder mod 7, and vanishes mod 5 and mod 7, but leaves the right remainder mod 9.

Summing them together yields a solution (Do you see why?). The really nice thing about this algorithm to get the solution is that it parallelizes really well, meaning that you can give different computers separate problems, and then combine the things together to get the final answer. This is going to come up again later in this post.
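
Here is a sketch of the direct construction in Python. The function name and the sample remainders are hypothetical; the moduli 5, 7, 9 are the ones used above. Each term is (remainder) × (product of the other moduli) × (inverse of that product with respect to its own modulus), so it vanishes with respect to every other modulus and leaves the right remainder with respect to its own. It uses `math.prod` and `pow(n, -1, m)`, both available from Python 3.8.

```python
from math import prod

def crt_construct(congruences):
    """Build a solution to x ≡ r (mod m) for each (r, m) as a sum of terms.

    Assumes the moduli are pairwise coprime. Returns (x, M) with 0 <= x < M.
    """
    M = prod(m for _, m in congruences)
    total = 0
    for r, m in congruences:
        others = M // m                           # product of the other moduli
        total += r * others * pow(others, -1, m)  # this term only "sees" m
    return total % M, M

# Hypothetical remainders with the moduli 5, 7, 9.
print(crt_construct([(2, 5), (3, 7), (4, 9)]))  # (157, 315)
```

The terms in the loop do not depend on each other, which is why the work can be split across machines.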

These are two solutions that follow along the idea of the Chinese Remainder Theorem (CRT), which in general says that as long as the moduli $m_1, m_2, \ldots, m_k$ are pairwise relatively prime, then the system

$$x \equiv a_1 \pmod{m_1}, \quad x \equiv a_2 \pmod{m_2}, \quad \ldots, \quad x \equiv a_k \pmod{m_k}$$

will always have a unique solution $x \pmod{m_1 m_2 \cdots m_k}$. Note, this is two statements: there is a solution (statement 1), and the solution is unique up to modding by the product of the moduli (statement 2). *Proof Sketch:* Either of the two methods described above to solve that problem can lead to a proof here. But there is one big step that makes such a proof much easier. Once you’ve shown that the CRT is true for a system of two congruences (effectively meaning you can replace them by one congruence), this means that you can use induction. You can reduce the n+1st case to the nth case using your newfound knowledge of how to combine two equations into one. Then the inductive hypothesis completes the proof.

Note also that it’s pretty easy to go backwards. If I know that , then I know that will also be the solution to the system

In fact, a higher view of the CRT reveals that its great strength is that considering a number mod a set of relatively prime moduli is exactly the same as (isomorphic to) considering a number mod the product of the moduli.

The remainder of this post will be about why the CRT is cool and useful.

Firstly, the easier application. Suppose you have two really large integers (by really large, I mean with tens or hundreds of digits at least – for concreteness, say they each have digits). When a computer computes their product , it has to perform digit multiplications, which can be a whole lot if is big. But a computer can calculate mods of numbers in something like time, which is much much much faster. So one way to quickly compute the product of two really large numbers is to use the Chinese Remainder Theorem to represent each of and with a set of much smaller congruences. For example (though we’ll be using small numbers), say we want to multiply by . We might represent by and represent by . To find their product, calculate their product in each of the moduli: . We know we can get a solution to the resulting system of congruences using the above algorithm, and the smallest positive solution will be the actual product.

This might not feel faster, but for much larger numbers, it really is. As an aside, here’s one way to make it play nice for parallel processing (which makes things vastly faster). After you’ve computed the congruences of the two numbers for the different moduli, send the numbers mod 5 to one computer, the numbers mod 7 to another, and the numbers mod 11 to a third (but also send each computer the list of moduli: 5, 7, 11). Each computer will calculate the product in its modulus and then use the Euclidean algorithm to calculate the inverse of the product of the other two moduli, and multiply these together along with the product of the other two moduli itself. Afterwards, the computers send their data back to a central computer, which just adds the results and takes the sum mod $5 \cdot 7 \cdot 11 = 385$ (to get the smallest positive solution). Since mods are fast and all the multiplication is with small integers (never bigger than the product of the moduli), it all goes faster. And since it’s parallelized, you’re replacing a hard task with a bunch of smaller easier tasks that can all be worked on at the same time. Very powerful stuff!
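
As a toy version of this, here is a Python sketch with the moduli 5, 7, 11. The two numbers being multiplied (12 and 21) and all the function names are my own hypothetical choices. Note the assumption that the true product is smaller than $5 \cdot 7 \cdot 11 = 385$; otherwise the reconstruction only gives the product mod 385.

```python
from math import prod

MODULI = (5, 7, 11)

def to_residues(n, moduli=MODULI):
    """Represent n by its remainders with respect to each modulus."""
    return tuple(n % m for m in moduli)

def from_residues(residues, moduli=MODULI):
    """Recombine remainders into the smallest non-negative solution."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        others = M // m
        total += r * others * pow(others, -1, m)
    return total % M

def multiply_via_crt(a, b, moduli=MODULI):
    """Multiply a and b by multiplying their remainders modulus by modulus."""
    ra, rb = to_residues(a, moduli), to_residues(b, moduli)
    # Each of these small products could be handed to a separate machine.
    rp = tuple((x * y) % m for x, y, m in zip(ra, rb, moduli))
    return from_residues(rp, moduli)

print(to_residues(12), to_residues(21))  # (2, 5, 1) (1, 0, 10)
print(multiply_via_crt(12, 21))          # 252
```

For genuinely large integers one would pick many more (and much larger) moduli so that their product exceeds the true product.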

I have actually never seen someone give the optimal running time that would come from this sort of procedure, though I don’t know why. Perhaps I’ll look into that one day.

The second application, secret sharing, is really slick. Let’s lay out the situation: I have a secret. I want you, my students, to have access to the secret, but only if at least six of you decide together that you want access. So I give each of you a message, consisting of a number and a modulus. Using the CRT, I can create a scheme where if any six of you decide you want to open the message, then you can pool your six pieces together to get the message. Notice, I mean *any six* of you, instead of a designated set of six. Further, *no five people* can recover the message without a sixth in a reasonable amount of time. That’s pretty slick, right?

The basic idea is for me to encode my message as a number $P$ (I use P to mean plain-text). Then I choose a set of moduli $m_i$, one for each of you, but I choose them in such a way that the product of any $5$ of them is smaller than $P$, but the product of any $6$ of them is greater than $P$ (what this means is that I choose a lot of primes or near-primes right around the same size, all right around the fifth root of $P$). To each of you, I give you the value of $P \bmod m_i$ and the modulus $m_i$, where $m_i$ is your modulus. Since $P$ is much bigger than $m_i$, it would take you a very long time to just happen across the correct multiple that reveals a message (if you ever managed). Now, once six of you get together and put your pieces together, the CRT guarantees a solution. Since the product of your six moduli will be larger than $P$, the smallest non-negative solution will be $P$. But if only five of you get together, since the product of your moduli is less than $P$, you don’t recover $P$. In this way, we have our secret sharing network.

To get an idea of the security of this protocol, you might imagine if I gave each of you moduli around the size of a quadrillion. Then missing any single person means there are hundreds of trillions of reasonable multiples of your partial plain-text to check before getting to the correct multiple.
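
Here is a toy sketch of the scheme in Python. Everything concrete in it is an assumption for illustration: eight students, eight small primes near 1000 as moduli, a made-up secret, and hypothetical function names. Real use would need far larger moduli, and this sketch makes no claim of being secure. It checks that any six shares recover the secret and that no five shares do.

```python
from itertools import combinations
from math import prod

def crt(shares):
    """Combine (remainder, modulus) shares into the smallest non-negative solution."""
    M = prod(m for _, m in shares)
    total = 0
    for r, m in shares:
        others = M // m
        total += r * others * pow(others, -1, m)
    return total % M

# Assumed toy parameters: eight prime moduli of similar size, threshold six.
MODULI = [1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049]
THRESHOLD = 6

lower = prod(sorted(MODULI)[-(THRESHOLD - 1):])  # largest product of any five
upper = prod(sorted(MODULI)[:THRESHOLD])         # smallest product of any six

def make_shares(secret):
    """Give each modulus the remainder of the secret with respect to it."""
    if not lower < secret < upper:
        raise ValueError("secret must lie strictly between the two bounds")
    return [(secret % m, m) for m in MODULI]

secret = lower + 123456789  # hypothetical plain-text number
shares = make_shares(secret)

six_ok = all(crt(group) == secret for group in combinations(shares, THRESHOLD))
five_fail = all(crt(group) != secret for group in combinations(shares, THRESHOLD - 1))
print(six_ok, five_fail)  # True True
```

Five shares always fail here because their combined solution is smaller than the product of five moduli, which in turn is smaller than the secret.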

A similar idea, but which doesn’t really use the CRT, is to consider the following problem: suppose two millionaires Alice and Bob (two people of cryptological fame) want to see which of them is richer, but without revealing how much wealth they actually have. This might sound impossible, but indeed it is not! There is a way for them to establish which one is richer but with neither knowing how much money the other has. Similar problems exist for larger parties (more than just 2 people), but none is more famous than the original: Yao’s Millionaire Problem.

Alright – I’ll see you all in class.

]]>