The contests for HMMT team selection have now concluded. However, I started writing this blog post before performance contests 4 and 5, so you will see my thoughts progress over a week.
Performance test 3. Halfway home. Something is clearly different. This time the curve is much more spread out. This is pretty much the exact score distribution that I was going for, but I wanted this to happen on performance test 1. So when I heard that the top 15 average was almost 10, I honestly winced: "Was this test too easy?" That question is still going through my mind. I like the score distribution, but I can't help but think that I only got it because the test wasn't at the same level as the last two.
And others confirmed this. When people commented on this test, they often said something like, "I think this contest was much better than the last ones, mainly because problem 3 was completely trivial." When I look back over the test, though, I can't say that I really think it was too easy. Certainly problem 3 was easier than the problem 3s on the previous tests, but the hard problems were, I think, at the same level as before. The difference was that people had more time to work on them.
So no, this test wasn't too easy. What really gets me now is that nobody got a 4. Somehow people felt that problem 3 was easier than problem 2, and in fact more people solved problem 3 than problem 2. I don't know what to make of this. Problem 3 was a very classical problem that many people had seen before, but problem 2 was supposed to be an easy problem regardless of whether you've seen it before or not.
Problem 4 did pretty much exactly what it was meant to do: people who actually finished the problem (proved their answer) got it right, whereas people who sort of tried a few values and hazarded a guess didn't. Problem 5 turned out a bit easier than I anticipated, but that didn't affect the score distribution much at all. Finally, I was hoping more people would get problem 6, but I'm happy that one person got it right, even if he didn't quite finish the computation.
Let's go on to contests 4 and 5, which both happened on December 16, the day when all the seniors would find out about their MIT early action decisions.
The graphs really interest me because the score distributions for the later contests were much smoother than those for the earlier ones. Contest 4 has the kind of score distribution I would want if I were writing a real contest where I needed to name a top 3, because the top 3 are clearly distinct from the rest of the crowd. Contest 5's distribution would be better for choosing a top 10 (although it looks like it's really only a top 8), since there is a hump at 7: the people who did better than that hump are distinguished, and should be named in the top 8. If you were to ask my interpretation, I would say that contest 4 is better for training for the contests we will be going to, and contest 5 is better for choosing the team (although the way our system works, getting an 8 on contest 5 doesn't help very much, unfortunately).
So what caused this change in score distribution? I can think of two possibilities: either my contests got easier, or the team got used to my contests (hopefully improving in the meantime). I don't know whether the contests really got easier, though I did feel that test 5 was definitely easier than test 4. What I didn't expect was that a nonzero number of people solved the last problem on test 4, whereas as far as I know only one person came close to solving the last problem on test 5 (and he finished incorrectly). This has a simple explanation: the last problem on test 4 was a geometry problem, and I am relatively bad at geometry, so the geometry problems I judge to be the right difficulty are generally going to be easier than the non-geometry problems I judge to be the right difficulty. When I noticed that test 5 was easier than test 4, I thought about it a bit, then decided that I really liked test 5's problem 6, so I left it as it was to give people time to work on it.
For those of you reading this, take a look at the tests now, after everything is done. Did the tests really get easier? Or can I safely say that the team is improving? Perhaps both, but I hope the tests didn't really get that much easier. So now I am interested in what people thought of this experiment (since it really was an experiment). Did the performance tests go well? Did you feel that they were better or worse than the other contests for determining teams (consider all three teams - I really didn't write these for the purposes of determining C team, but they had an effect down there)? Finally: did you like them?
My guess is that next year I won't be able to write performance tests for the team, although I had a lot of fun putting them together. However, I'm still not sure what I'll be doing second semester. I might (don't hold me to this) put together some contests and run them problem-of-the-week style. Would the people currently at TJ be interested? (This would also be open to people outside of TJ, so feel free to say you would be interested even if you aren't at TJ.) If I get enough interest, I'll definitely try to do it.
I have some other blog posts that I want to write now, so this post is a bit short compared to some of my previous ones. For those of you who haven't read the solutions to performance contest 5 yet, I invite you to try this generalization of the last problem:
Brian flips a coin repeatedly. Before each flip, he decides randomly, with equal probability, either to continue flipping or to stop (so half of the time he flips 0 times). In terms of k, what is the probability that, after he stops, he has flipped exactly k more heads than tails?
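If you want to sanity-check a conjectured formula before reading the solutions, here is a minimal Monte Carlo sketch (my own addition, not from the contest materials; the function names are just for illustration) that simulates Brian's process and estimates the probability for small values of k.

```python
# Monte Carlo sketch for the generalized problem: estimate the probability
# that Brian stops with exactly k more heads than tails.
# This is only a numerical sanity check, not a solution.
import random
from collections import Counter

def heads_minus_tails_after_one_run():
    """Simulate one run of Brian's process; return (heads - tails) when he stops."""
    diff = 0
    # Before each flip he continues with probability 1/2, otherwise he stops,
    # so with probability 1/2 he stops immediately and flips 0 times.
    while random.random() < 0.5:
        diff += 1 if random.random() < 0.5 else -1  # fair coin: +1 for heads, -1 for tails
    return diff

def estimate_probabilities(trials=1_000_000):
    counts = Counter(heads_minus_tails_after_one_run() for _ in range(trials))
    return {k: counts[k] / trials for k in sorted(counts)}

if __name__ == "__main__":
    probs = estimate_probabilities()
    for k in range(6):
        print(f"k = {k}: estimated probability {probs.get(k, 0.0):.4f}")
```

Comparing the printed estimates against your closed-form answer for a few values of k should catch any off-by-one errors, especially in how the "flips 0 times half of the time" case is counted.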