The Life of a Dead Hamster: contests


Wednesday, November 24, 2010

Too Hard

Whoops, Tim already wrote a blog post with this same name. I'm using it anyway.

I was the problem czar for the Harvard-MIT November Tournament, which happened on Sunday, November 7, 2010. As problem czar, I was responsible for making sure that enough problems got written on time for the contest to run. That in itself was a huge task, made no easier by the relatively small amount of help that the February problem czars gave me. I personally wrote the majority of the test, with help from Travis in some key areas (he wrote many of the geometry problems), and from Jacob in not as key areas (he kept giving me problems about Bayesian inference...).

66 problems later, I get the (somewhat delayed) response from testsolvers: "Too Hard."

What is too hard?

Honestly I am probably too biased to comment, but I don't believe that it's possible for something to be too hard on its own. Being too hard can cause other issues, such as a lack of distinguishing power, but being too easy can also cause that. And I am fully aware of the issues that come from a test that lacks distinguishing power.

So when I see a complaint that something is too hard, what do I think? I feel that it is far more likely that something is off with the person undertaking the task than that the task is actually too hard. And it seems like others are the opposite -- they would much rather blame the test than the test takers, especially when they are the test taker.

Actually, I do that too. Lots of my blog posts last year were about things that I disagreed with in tests. Was it really the tests that were at fault or was it me? I think it was some of both. I don't think it's deniable that the HMMT calculus test was unable to distinguish between the top 4 competitors (hint: they tied), but at the same time it wasn't like the test gave them absolutely zero opportunity to distinguish themselves (the tie was at 29, not 50).

Regardless, I feel like there is this idea that if lots of people get two or fewer problems then the test is too hard. So, people say, make there be a few problems that everyone will get. But why? Now I've reduced my contest from 10 problems to 7. That doesn't seem useful at all.
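A rough sketch of this argument in Python, with an invented solve table (the three contestants and their results are hypothetical, not taken from any real test). A problem that everyone solves, or that nobody solves, shifts every score by the same amount, so it cannot separate any two contestants. Only the remaining problems do any ranking work.

```python
def distinguishing_problems(solved):
    """Return indices of problems that separate at least two contestants.

    solved: list of strings, one per contestant; character p is "1"
    if that contestant solved problem p and "0" otherwise.
    """
    num_problems = len(solved[0])
    useful = []
    for p in range(num_problems):
        column = [row[p] == "1" for row in solved]
        # A problem solved by everyone or by no one shifts all
        # scores equally, so it cannot change the ranking.
        if any(column) and not all(column):
            useful.append(p)
    return useful


# Hypothetical 10-problem test whose first 3 problems are "gimmes".
solved = [
    "1111111001",
    "1111010100",
    "1110100010",
]

useful = distinguishing_problems(solved)
print(len(useful), "of", len(solved[0]), "problems affect the ranking")
```

In this made-up table the three problems that everyone solves drop out, which leaves seven problems to do the distinguishing.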

I've rarely seen someone look at a test and say ``This is too easy,'' but many people will look at a test and say ``This is too hard.''

Tuesday, October 5, 2010

Contest Theory

The IOI this year took a radically different approach to contest programming than the IOIs in the past. The Canadian Host Scientific Committee decided that it would be a good idea for the IOI to give less of an edge to teams whose members train very hard on standard algorithmic and data structure problems. I will put off judging whether that was a good or a bad decision until later in this post.

There is one fundamental aspect of writing a contest which is up for major contention and I don't believe that people pay enough attention to it. This is the question that you have to answer before you write a single problem for your test. That question is who should win.

Who should win? It seems like a simple question. The person who is best at math should win a math contest, the person who is best at skiing should win an Olympic medal for skiing. But these answers don't really tell you who should win. If I were to run a contest and I said that the winner would be the person who is best at life, nobody would take me seriously. Why? Because there are a relatively small number of things that would be on this contest. We wouldn't be able to determine the best sports player by just playing football, or just football and soccer, or any other (proper) subset of the sports in the world. And even if we were to play every sport in the world, I'd have to give an arbitrary weighting to each one. How do I compare two people if they have different strengths? It's simply not possible.

This problem arises at a lower level too. Let's look at football. Who is the best football player? I could say let's have a competition where everyone plays football. But this won't work. Maybe one player is a very strong quarterback and another is a very strong wide receiver, but only one of them gets their favored role. Well okay, let's have them pick their positions. We still have an issue. How are we supposed to compare someone's performance as a quarterback to another person's performance as a wide receiver?

Perhaps the answer is to have a quarterback competition, solely to determine who is the best quarterback. That is certainly a possible solution, but sometimes this will make the field too small. Say we object to a spelling competition because there are words from a plethora of sources, which some people will be better at than others. So we use our technique and say okay, we will have a spelling competition consisting of only English words of Sanskrit origin. How many people are specialists of such words? There are probably a few, but not enough to make a competition.

So we need some sort of compromise. We have to accept the fact that there are different specialties within whatever activity for which we hold a competition and that these are in some ways not comparable, but at the same time compare them somehow and determine a winner. It is for this reason that it is not the job of the contest to determine the best, but rather it is the job of the contest to determine the winner.

Of course, if there is a single best competitor, that person should win the competition. If one person is better than everyone else at every position in a football team, he would win a football competition. If your method of determining the winner didn't do that, then you have some problems. But this can be resolved by essentially any performance based scheme that uses positive weights on every event.

But in practically every case, there isn't a single dominating competitor, but rather several top competitors who all do well in different areas. So which one of them should win? It's up to the contest organizers to decide.
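To make the weighting point concrete, here is a minimal sketch in Python. The competitor names, event scores, and weights are all made up for illustration. It shows that a competitor who is best at every event wins under any positive weights, while among specialists the winner depends on which weights the organizers pick.

```python
def winner(scores, weights):
    """Return the competitor with the highest weighted total.

    scores: dict mapping competitor name -> list of per-event scores
    weights: list of positive per-event weights (hypothetical values)
    """
    totals = {
        name: sum(w * s for w, s in zip(weights, events))
        for name, events in scores.items()
    }
    return max(totals, key=totals.get)


# Hypothetical data: "Dom" is strictly best at every event.
dominated_field = {
    "Dom": [9, 9, 9],
    "Ann": [8, 3, 5],
    "Bob": [2, 8, 6],
}

# Hypothetical data: two specialists, neither dominates the other.
specialists = {
    "Ann": [9, 2],
    "Bob": [2, 9],
}

# Any positive weights pick the dominant competitor.
for weights in ([1, 1, 1], [10, 1, 1], [1, 50, 2]):
    print(weights, winner(dominated_field, weights))

# For specialists, the choice of weights decides who wins.
print(winner(specialists, [2, 1]))
print(winner(specialists, [1, 2]))
```

The last two lines are the interesting ones: the same two competitors and the same performances produce a different winner when the weights are swapped.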

And this decision is often very debatable. Perhaps the most prominent example for me personally is the US IMO team selection. There is little argument to be made against the choice to favor those who will bring home the highest IMO scores. The argument stems from the fact that the US team leadership has seemingly decided that the most important subjects to be good at are algebra and geometry, casting aside combinatorics and, to a lesser extent, number theory. It's painful to look at the TST to see three geometry problems, graded in difficulty, so that it is almost certain that geometry skill will matter in team selection, in stark contrast with the single combinatorics problem, difficult enough that only Evan solved it (although I like to think I might have if I hadn't spent nearly all of my time on a geometry problem), clearly not mattering for team selection. The IMO claims to be a contest about all four main subject areas, but the US strategy says loud and clear that this is not the case.

Who is at fault? It's difficult to say. Maybe no one. Maybe everyone. It could be that the lack of combinatorics problems did not stem from a belief that combinatorics does not correlate to IMO score. It could be simply that there was not a sufficient supply of appropriate and interesting problems last year, so the TST was forced to be composed mostly of the other subjects. But I'm not convinced that this was the case.

So now we return to IOI. The Canadians looked at the IOI and decided that they did not like the choice of winner that was made in the past. They didn't want someone who simply coded problems from online judges for the past year to have an easy road to victory. They wanted someone who could think about a problem which didn't have a standard complete answer and could still perform well.

So they changed the contest.

Gennady still won, of course. IOI might be a case where a dominating force actually exists. Regardless, the IOI change hurt my personal placement, but I can understand where it came from. The Canadians did well to explicate their goals, their means to reach those goals, and why those were their goals in the first place. I agree that the IOI had been reaching a less optimal position by simply escalating the difficulty of algorithmic problems, but I also think that the Canadians went too far in the other direction.

Next time you're involved in running a competition, ask your group the question, ``Who do we want to win?''

Thursday, July 1, 2010

Contests

As many of you have probably heard by now, I did not make the IMO team. Although I was definitely sad for the hours after the TST and perhaps for a few days after that, I am not bitter about it. After all, I'm in no position to say that I deserve a spot any more than the six who got it.

There's something about contests that I've known for a while, but TST brought it up again. Contests aren't for deciding the best, as much as people would like to think that. No, the person who wins the USAMO is not necessarily the best mathematician, nor is the person who wins ARML, nor is the person who wins HMMT, nor the winner of any other competition. Math contests don't crown the best mathematician. They crown the winner.

Sure, a trip to Kazakhstan would have been great. Winning certainly does come with perks. But when it comes down to it, I know that the fact that I lost on the TST just means that I'm not on the IMO team. It doesn't mean I'm worse at math.

Look forward to a more complete post on MOP soon.

Tuesday, March 2, 2010

Medalia de Aur

As some of you know, last year I went to Târgu Mureş, Romania for the Central European Olympiad in Informatics. This year, I went to Bucharest, Romania for the Romanian Masters in Mathematics. The team consisted of Allen Yuan, Vlad Firoiu, Sam Keller, Tim Chu, Albert Gu, and myself, headed by coaches Po-Shen Loh and Yi Sun.

Two days before we were to leave, Po-Shen sent us an email that, among other things, notified us that Lufthansa was currently experiencing a strike and that if our flight out of DC was canceled, the entire trip would be also. Obviously, this did not sit well with us, as we were all strongly looking forward to the trip.

Luckily, the strike was called off before we left, although Lufthansa was still short pilots, so some of the flights got canceled, but ours wasn't one of them. The flights to Bucharest actually went pretty well, including the AMC B. I did worse on the B than the A, but it really doesn't matter. I took it mainly because I figured everyone else would also and I didn't want to be bored for those 75 minutes. Tim had gotten a 96 on the A and was worried that he didn't qualify for AIME, and he wasn't exactly relieved when he got a 96 on the B as well.

On the trip there, we were expecting to be housed at Hotel Moxa, a 4 star hotel in Bucharest. However, it turns out that it was actually Complex Moxa, which is used for college dorms and is just an annex of the hotel or something. The rooms were pretty unfortunately bad, but ours had a TV in it! (the others apparently didn't). Because of the 7 hour time difference, the Olympics were on after all of the events for a day ended, which was extremely convenient. I definitely watched more of the Olympics while in Romania than any other time.

Sam checking out the room
We also found out that the complex didn't have an open wireless access point... But Vlad had this USB thing that allowed him to get internet access in Romania. It's called Zapp or something. At least we had internet access, even though it was pretty bad.

The next day we still weren't competing. We got our first taste of Romanian breakfast, which included an interesting tea (I think it was purple) that tasted pretty good, as well as some cheese. Being American, we obviously thought the portions were way too small so we ate masses of bread with oil and vinegar.

Our first Romanian breakfast
After breakfast, we met our guides and went to the high school where we would be taking the contest in the following two days. After touring the school and dropping in on a ``superior algebra'' class, the guides asked us if we wanted to go into the gym to play some sports. Inside, there were lots of people from various teams playing volleyball, but the court was pretty full so we didn't join them. Instead, we saw a ping-pong table, but nobody had any paddles, so we started playing basketball while we waited for a guide to retrieve paddles from the complex.

For some reason, someone thought it would be a good idea to play outside, even though there were huge puddles of water on the ground and the court was not very even. There were also ping-pong tables outside, but they looked pretty bad. They were really low, weren't flat, and the nets were actually iron fences.

China plays on them anyway
Eventually we got some paddles and played some ping-pong, as did the Chinese. The Chinese team didn't know much English and the only Chinese speakers were on the US team as either a student or a coach, so they spent a lot of time with us (and also Allen and Tim were in a room with two of them).

At some point we went back to our room to hang out until dinner, after which would be the opening ceremony. But as we were just starting to chill in our room, our guides came up to inform us that the opening ceremony got moved from 2000 to 1600, and we had to go back to the school.

The opening ceremony was actually quite nice. Only a small part of it was dual-run in Romanian and English. All of the guest speakers spoke in English, so translation was unnecessary, and they also all kept it very short. It made the opening ceremony much shorter than what I expected.

The next day was competition day 1.

Go go go!
I read the problems and solved 1i on sight, as did the rest of the team except for Vlad, who apparently took 1.5 hours on it. I then spent a bit of time on 1ii, but wasn't quite getting the details. I figured it would be easy anyway and went to do number 2 before finishing.

Number 2 was dispatched rather readily, and at this point I had about 3 hours left, if I remember correctly. I drew the diagram for 3 (although I actually drew the wrong diagram, thinking ``external'' meant that the quadrilateral was external to the circle, rather than the circle is external to the quadrilateral), wrote down some random stuff, and went back to 1ii. After all, surely a number 1 number theory would be easier for me than a number 3 geometry, right?

So it turned out that I didn't solve 1ii, and didn't have anything worth partial on 3, whoops. In the last 5 minutes I wrote down some stuff for 1ii that I figured had no hope of working, but it turned out to be extremely close to the correct solution. I left the room thinking ``Man, I'm going to have to tell the rest of the team that I didn't solve number 1.''

So talking with the others after day 1, it seemed initially that most of them had solved two problems: either 1 and 2 or 1 and 3. The exceptions were Allen, who solved only 1i and 2, and Sam, who solved only 1. After talking a bit more, however, Albert determined that his 1ii was completely wrong, and so he had only solved 1.5 problems as well. After day 2, we would find out that during coordination the coordinators had thought that Albert's solution had worked too, and Yi and Po-Shen had to tell them it was wrong to keep the spirit of the contest.

Allen and I both had essentially identical progress on 1ii, and since it was so close to the correct solution, we came out of coordination with 6s...somehow. The graders were apparently pretty lenient with scoring.

Day 1 Scores

ID	Name	P1	P2	P3	Total
USA1	Timothy Chu	7	7	0	14
USA2	Vlad Firoiu	7	3	7	17
USA3	Albert Gu	3	0	7	10
USA4	Brian Hamrick	6	7	0	13
USA5	Sam Keller	7	0	0	7
USA6	Allen Yuan	6	7	3	16

After day 1, we just went back to our room to hang out, being exhausted from the competition. Nothing much interesting happened. We just watched the Olympics and played card games, mostly.

We woke up the next day for day 2 of the competition.

No geometry! Wooo!

So I read the day 2 problems and I thought ``YES! There's no geometry! Let's get a 21 on day 2! Oh wait, these problems look time consuming. 4.5 hours might not be enough...'' Anyway I looked at problem 4 and killed it in about 20 minutes. I start working on problem 5 and it dies in another 50 minutes or so. At this point it's about 1050 and I have two complete solutions written up and I'm starting to think maybe number 6 is really hard and they gave us two really easy problems to compensate (a la IOI day 1).

So I spend the next 3 hours trying various stuff on number 6, but I don't do the thing that actually leads to a solution because it looked stupidly messy. Oh well. I wrote up what I had (which wasn't exactly the cleanest thing in the first place), and then turned in the test. When I was leaving the room, I figured I probably had a pretty standard result on day 2.

However, when I talked to the rest of the team, I found out that I could hardly be more wrong. They had all solved problem 4 (except Albert, who got a 0 on day 2, unfortunately), but nobody else had solved problem 5. I was really surprised. Tim thought he solved problem 6, but none of us could really verify it since he was the only one who felt that he had made significant progress.

Later in the day, we found out (with our awesome Chinese-speaking skills) that CHN1 had been the only Chinese team member to solve either 5 or 6, and he solved both. (CHN was really Shanghai, not all of China.) Apparently 5 was supposed to be very difficult. I still don't really see why.

After day 2, we went to the mall to play some laser tag! Except that the game was actually pretty lame. At first there was only like one person on the red team, so it was just walking around for a while until the person running the thing decided to restart it. Unfortunately, the respawn time was still around 3 seconds, so whenever you killed someone they could just follow you until they respawn and kill you immediately. It made for a pretty annoying game.

We got back to the complex pretty late, so we missed the normal dinner and had to order pizza, and our discussion of day 2 with Yi and Po-Shen was at around 2230, way later than we expected.

Day 2 Scores

ID	Name	P4	P5	P6	Total
USA1	Timothy Chu	7	2	5	14
USA2	Vlad Firoiu	7	2	0	9
USA3	Albert Gu	0	0	0	0
USA4	Brian Hamrick	7	7	4	18
USA5	Sam Keller	7	2	0	9
USA6	Allen Yuan	7	2	0	9

The awards ceremony was the day right after day 2. But before that, coordination had to happen. So to get rid of us pesky contestants for a while, they sent us to the village museum: a collection of traditional Romanian houses. It would have been a really cool experience, but the ground was extremely muddy and it was simply unpleasant to walk around.

When we got back it was time for the awards ceremony. Well, almost. It was actually delayed for half an hour. Anyway, the awards ceremony, just like the opening ceremony, was very quick. The speakers knew that we didn't want to listen to a bunch of long speeches (and it was hard to understand some of their English anyway), so they went straight to the awards. Albert was the first USA competitor called up for honorable mention (solving at least one problem perfectly).

Next up were the bronze medals. There were a lot of bronzes, and Sam was among them. I was actually pretty nervous during the bronzes because I wasn't sure if I had screwed up something on day 2, in which case I would probably be in the low end of silver. As the bronzes ended, I breathed a sigh of relief.

The bronze medalists
Silvers started getting called now, and I was preparing to go up. They called the other three, and after a bit I handed my camera to Albert, expecting to be called up at any point. But the number of silver medals remaining was very clearly diminishing, and then they stopped. Stunned, I almost missed taking a picture of the silver medalists. At this point, I was just amazed.

The silver medalists
The gold medals started being announced, starting with the Chinese perfect scorer. Then the other gold medalists, and finally ending with me. The suspense was incredible. After going up to receive my gold medal, my hands were incredibly shaky. I could barely take pictures of the remainder of the ceremony, where China handed the trophy over to Russia (RMM has one trophy that the winning team keeps until another team ousts them), and then a few more short words.

After the awards ceremony, Po-Shen informed us that the ceremony had been delayed because they had to argue for my solution to #5 for about an hour. There was a step that I thought was obvious and Po-Shen thought was obvious, but the graders disagreed. Apparently they had to call in a third party to give an impartial opinion. Eventually, though, they agreed to give me a 7. Lesson from this: write more on combo problems because other people don't have the same idea of obvious as I do for combo.

Mathcamp pride!

Final USA Results

ID	Name	P1	P2	P3	Day 1	P4	P5	P6	Day 2	Total	Award
USA1	Timothy Chu	7	7	0	14	7	2	5	14	28	Silver Medal
USA2	Vlad Firoiu	7	3	7	17	7	2	0	9	26	Silver Medal
USA3	Albert Gu	3	0	7	10	0	0	0	0	10	Honorable Mention
USA4	Brian Hamrick	6	7	0	13	7	7	4	18	31	Gold Medal
USA5	Sam Keller	7	0	0	7	7	2	0	9	16	Bronze Medal
USA6	Allen Yuan	6	7	3	16	7	2	0	9	25	Silver Medal

The team with our lovely (and camera shy) guides

Tuesday, February 23, 2010

Thoughts on HMMT

Overall, HMMT was well run. However, some of the tests could definitely have been better written. I'm going to just talk about the Combinatorics and Calculus subject tests from individual, since those were the two I took, and I'll also talk about team and guts.

First up is Calculus. I think everyone should realize that a 4 way tie for first at 29 is a problem with the test. The problems that I liked on calculus were 1, 2, 3, and 8. The rest of them have some issues.

Problem 4: Everyone who thinks about this problem can probably get it, but I think it's not exactly kosher to assume that people know the equidistribution theorem.

Problem 5: Just differentiate 4 times...seriously? I mean there's the nicer approach where you can notice that you only get 4 copies of

when you differentiate the

term 4 times, so you can directly pull out the coefficient by looking at just that term. By the time problem 5 rolls around I think you should be moving away from the stupidly straightforward problems.

Problem 6: I didn't actually solve this problem, although I had enough intuition that I could have finished it rigorously somewhat quickly. I just said, ``Let's put the line through the inflection point'', which is exactly what you want to do as cubics are symmetric about the inflection point.

Problem 7: This problem shares the same issue as many of the problems on the test. The answer (set two equal and imaginary and the third one real) is guessable (although I don't think anyone did), but it's completely unreasonable to expect students to prove it in 50 minutes when there are 9 other problems to work on.

Problem 9: Nice solution, but do you really expect anyone to get it?

Problem 10: This one is definitely doable...but it basically has seeing it before as a prerequisite. I thought that was what we were trying to avoid after last year's #10. It's a nice technique, but I don't think anyone would be able to come up with it during the test.

Overall, calculus had relatively easy problems #1-#6, a doable #8, and impossible #7, #9, and #10. 29 was getting all the doable problems. It really doesn't help the test to put a bunch of impossible problems on. The difficulty just has such a huge jump between 6 and 7, with 8 in between somewhere. I'd not be surprised if there is not only a huge tie at 29, but also a huge tie at 23. Perfect scores aren't a problem; ties are.

Next up: Combinatorics. Most of this test was actually good. I only really have complaints about problems 7 and 10.

Problem 7: This problem is just so out of place at HMMT. Looking at the rest of the problems, there is absolutely no strenuous computation. This problem, in contrast, is a complete computation-fest, after a moderately silly manipulation with expected values.

Problem 10: Same issue as Calculus #7. It's somewhat possible (although I doubt anyone did) to guess the optimal configuration, but not reasonable to expect students to prove it during the test. It's made even worse by the obfuscation that $16 = 4^2$, so instead of trying things like 5x5 with 4 numbers, people would rather have tried 4x4 with 2 numbers. I really dislike the problem for this kind of test. It would make a good team round problem, though.

I would have liked the test a lot better if problem 10 were what is now problem 7, and an actual problem 7 were in the problem 7 slot, although I really don't like the current problem 7 as problem 10 either.

Now for team round. I liked the team round more than other rounds this year (although that might have been because we won), because I think there was actually a scaling of difficulty (and the ability to give partial credit helps immensely). However, some of the problems had minor issues.

Problem 1: This is pretty classic. I'm pretty sure that Dan is not wrong when he says that he has seen it before.

Problem 2: I feel like I have seen this problem before, although it may have been slightly different (and the key observation should be that every divisor of an odd number is odd).

Problem 4: I'm pretty sure this is way too classic (although I forgot to cover the case where the 2x2 system for x+y and xy is singular, oops!). Actually I'm wondering if it's even possible for A, B, C, and D to be rational except at x=0, y=0.

Problem 5: I think it was fine, except that ``decreasing'' is ambiguous because you write polynomials starting from the highest order term, so we had the (unanswerable) question of does

have decreasing coefficients or does

? We did eventually settle on the one in the official solution, luckily.

Problem 6: Okay darn, I gave a pretty bad argument for the existence of an infinite ray being inside the set (A better argument is to just look at the furthest distance at each angle. It's clearly continuous and then it should have a maximum since

is compact, but that would mean it's bounded. Contradiction.). Mine can be made rigorous when you add in a weird continuity requirement and use the fact that

is compact, but then you just get exactly the argument above. I actually like this problem, but I think that Jacob has mentioned that usually problems that have roots in college level math are rejected.

Problem 7: Maybe we're just bad at geometry, but it took Alex Zhu and me about 3 hours working together to solve this problem. Pretty sure this was harder than both 8 and 9 (and 10a, but having 10 be 10 is justified by 10b), but it was a good problem.

Problem 9: Maximum should run from i=1 to n, not i=0 to n-1, but I think that was pretty clear for most people. This problem was definitely easier than some of the ones that appear before it on the test. I'm not sure why it's a problem 9.

Problem 10: 10a is nice, but when Jacob says ``The idea for 10a works for 10b too after a few hours of work,'' it starts to look a bit unreasonable. I feel sad because I would have guessed

and now I'm wondering why I didn't write that down. Maybe we would have gotten a point!

Finally, guts.

I really liked most of the guts round (in fact, almost all of it). But there were a few issues:

Problem 12: No, it is not ``obvious'' that

does not need to be multiplied out. Replace the 9 by a 2010 and it would be. I don't see why that wasn't done.

Problem 17: Again, assuming people know (or can intuit) the equidistribution theorem (although in this case you don't actually need equidistribution) is a bit sketchy. However, I mind this a lot less in guts than in the other rounds.

Problem 32: I'm pretty sure our team had a fraction that we did not have time to turn into a decimal approximation. Without calculators, I find it a bit annoying that you would ask for a decimal to 5 places.

Problem 33: You have an exact form, so I'm not sure why the test is asking for the floor of

. I'd also like to point out that Vieta jumping tells you that

immediately (and it's odd because this recurrence was used earlier in the round). I would have rather asked for the exact form, although perhaps it is impractical to grade? Regardless, I would avoid approximation problems that can be solved exactly.

As you can see, I have many fewer issues with the guts round than the other rounds. This is probably because I consider guts to have a vastly different style, so it is easier to write problems for it and also there are so many problems that it's almost impossible to get the issues like what happened on the calculus individual test.

I guess a large part of my complaint is that the calculus test had a huge wall at 29 points that really made it hard for people who took calculus to compete with the people who took the other tests. This definitely has happened in the past (such as with the even harder wall at 50 for geometry a few years ago), and I guess I'm just a bit bitter that it happened to my tests this year. I do think (looking at results again) it affected this year's competition a lot more than last year's. Last year calculus was the test that suffered from the most ties (which was probably from the test being a bit too straightforward), but it wasn't a four way tie for first.

Overall, well done as always, but let's make next year's even better!
