Thursday, September 24, 2009

Homology

It's been a while since I gave an update on my research project, and since I got working on blogger, it's time to write a new post.

So to explain what Smith Normal Form is, I should explain what it does, and to do that I need to explain the problems that I am looking at. I'm going to assume that you are at least somewhat familiar with the concept of a group and quotient groups. If not, read the wikipedia article on it.

So the first important idea is a homology group. Suppose that we have an infinite sequence of abelian groups and homomorphisms that satisfy the condition (that is, applying two of these homomorphisms always results in the identity). Then we see immediately that . Furthermore, since these are abelian groups, we have , so the quotient group is defined. This infinite sequence of abelian groups is called a chain complex and these quotient groups are called the homology groups. The nth homology group is defined as .

Okay, that's probably pretty opaque if you don't already know what I'm talking about, so here is a simple example (which will need an example of its own to illustrate, but that will come in time) called simplicial homology. We will build up a space as follows: To begin, we start with some set of isolated points, which we will call the 0-skeleton of our space. Next, we'll draw some lines (these are topological lines, so we're assuming that none of them touch except at points in the 0-skeleton and they don't have to be straight) between points of the 0-skeleton. The collection of points and lines that results will be called the 1-skeleton. Now, we take some triangles (that are "filled in") and glue them on so that their boundary is part of the 1-skeleton. The 1-skeleton, along with these triangles, form the 2-skeleton. We create the 3-skeleton, the 4-skeleton, et cetera by gluing in tetrahedra, 4-simplices, 5-simplices, and so on. (this gets harder to visualize past the 2-skeleton, so I'll stop there).

Now let's define a chain complex on this simplicial complex as follows. Our abelian groups are the free abelian groups generated by the n-simplices (so they'll all be isomorphic to for some k). The homomorphisms are the boundary maps, with orientation done so that the composition requirement is satisfied. For example, the boundary of a line will be (as one possibility), the formal difference of the start point and the end point (the reason for using addition as the group operation here will become apparent later). Then the boundary of a triangle will be directed so that the differences all cancel out when the boundary is taken again.

So as an example consider the following simplicial complex:
This complex is made up of six points, nine lines, and three triangles. So the chain groups are , and all the rest are trivial. We can then compute the homology groups. The first homology group is . I claim that this is isomorphic to . To see this, consider the top triangle. Suppose that our element of has a component with the top right line. Then the boundary includes a contribution of the top point. We must then have an equal component of the top left line to cancel out this contribution, but we have the relation that the sum of the boundary of the top triangle is trivial (because of the quotient). That means that we can effectively replace all the components of the top left and top right lines with the bottom line of the top triangle. Similarly, we can do this for the other two triangles and reduce our element to just a combination of the middle three lines. Then for these to have trivial boundary, the three components must all be equal, but they can be anything. This means that our homology group is , since it is simply determined by how many times it contains this loop around the center. The second homology group is much simpler, since the image of is trivial because is trivial, so the second homology group is the kernel of . The kernel of is also trivial, since the boundaries of the three triangles are all independent and none of them are empty. This means that the second homology group is the trivial group.

Wooo, if you followed that you probably understand as much as I do about homology (although I was pretty lazy about keeping track of orientation, which is bad)! In the next post on this series I'll explain what this has to do with matrices and Smith normal form.

Monday, September 21, 2009

LATEX

Hurrah! I finally have working on Blogger!

Wednesday, September 16, 2009

Bezout's Identity

Before making a blog post on Smith Normal Form, I want to step back and look at an identity that many of you probably know or think is obvious. This is, as the title suggests, Bezout's Identity. Additionally, I am trying to figure out a way to put LaTeX in these entries and I haven't quite gotten it yet, so this gives me something to write about without trying to format matrices in text format.

Bezout's Identity states that for any two integers, a and b, if g is their greatest common divisor then there exist integers x and y such that ax + by = g. A consequence of this fact is that the set of possible values for ax + by for any integers x and y is simply the set of multiples of g, since g clearly divides ax + by and Bezout's identity guarantees a construction for any multiple (just multiply x and y by the same number). Let us quickly prove this identity.

Let us induct on pairs (a,b) by the magnitude of b. If |b| = 0, then g = a, x = 1, and y = 0, so our base case is done. Now suppose we know Bezout's identity is true for all pairs with |b| < k. We will establish the identity for a pair (a,b) with |b| = k. Write a = qb + r with 0 <= r < |b| = k. Then notice that gcd(a,b) = gcd(b,r). Thus by the inductive hypothesis there are integers x' and y' with bx' + ry' = g, but bx' + ry' = bx' + (a - qb)y' = b(x'- qy') + ay' = g, establishing the identity for a pair (a,b) with b = |k| and completing our induction.

This proof has the added benefit of providing us with an algorithm to compute these coefficients. Let us consider the example of a=73, b=17. We start by writing out the division algorithm repeatedly as we would to compute the gcd:

73 = 4*17 + 5
17 = 3*5 + 2
5 = 2*2 + 1

Now we start from the bottom and write them in the following way:

1 = 5 - 2*2
= 5 - 2*(17 - 3*5) = 7*5 - 2*17
= 7 * (73 - 4*17) - 2*17 = 7* 73 - 30*17

So our coefficients are 7 and -30. This algorithm is also very nice for computing modular inverses. Notice at the end that if we take our entire equation modulo 73, we immediately have that the inverse of 17 is -30, or 43. However, this algorithm, being recursive, is not optimal for a computational method due to the need to remember each step. Because of this, I want to show you another algorithm to compute these coefficients.

We will begin exactly the same as before, using the division algorithm repeatedly until we reach the gcd.

73 = 4*17 + 5
17 = 3*5 + 2
5 = 2*2 + 1

But now, instead of working our way up from 1, we're going to work our way down toward 1, writing the equations as follows:

5 = 73 - 4*17
2 = 17 - 3*5 = 17 - 3*(73 - 4*17) = 13*17 - 3*73
1 = 5 - 2*2 = (73 - 4*17) - 2*(13*17 - 3*73) = 7*73 - 30*17

Notice that we arrived at the exact same answer as before, with coefficients of 7 and -30. In general, when working down only the last two results are necessary, similar to how when you compute Fibonacci numbers you only need to keep the last two in order to compute the next one. This trick means that to compute these coefficients only a constant amount of memory is necessary, although possibly a logarithmic amount of time.

Hopefully the example of 17 and 73 is illustrative enough, I don't feel like writing out a formal algorithm right now, but as an exerpt from my code, here is the function to compute the gcd of two numbers and the coefficients for Bezout's Identity.
int gcd(int a, int b, int& x, int& y) {
int ax=1, ay=0, bx=0, by=1;
while(b != 0) {
int q = a/b, r = a%b;
int tx = ax - q*bx, ty = ay-q*by;
a = b;
b = r;
ax = bx;
ay = by;
bx = tx;
by = ty;
}
x = ax;
y = ay;
if(a < 0) {
a = -a;
x = -x;
y = -y;
}
return a;
}
You can see all the code that I've written so far at http://github.com/bhamrick/csl.

Monday, September 14, 2009

The Beginning of Research

Six days ago school started. Fourth period that Tuesday I walked in to the Computer Systems Lab to see all of the projects we had proposed last year projected on the wall. at that point, I did not know what I really wanted to do, but knew that it was not what was on the wall. I did, however, know that I wanted to do something related to math, but because I'm in a computer systems lab, I would have to relate it to computing somehow. So that day I told the teacher that I had changed my mind about what I wanted to do over the summer and that I would do some sort of computational mathematics, but I didn't know what.

So that evening I started googling. What sorts of topics could I do computationally? My first thought was group theory, and indeed a google search for computational group theory yielded results, but I quickly realized that I did not know the important questions in group theory and would be more or less at a loss for what I could actually make progress on. As a next thought, I thought about what I could do that I knew a little bit about but not much, so while doing my research on something that I know about, I could be learning some interesting tidbits about the topic at hand. My topic of choice was topology.

Now I had learned some topology at Mathcamp, but of course a couple weeks isn't enough to say that I really know topology. I had some background in homotopy theory as well, as JR had gone through and shown us several nice applications of the fundamental group, and on the last day had briefly gone over the skeletal concepts behind homology. So if I were going to do a topological project, I would need to determine a few things:

1) How am I going to represent a topological space? Practically all interesting spaces have an infinite number of points (although maybe there is something interesting computationally in enumerating the finite spaces as there is for finite groups - I haven't looked into it), so there needs to be some implicit form.
2) What am I going to compute? I had some idea that homology was much easier to compute than homotopy (and that higher homotopy groups were often computed via what I later found out is called the Hurewicz Theorem), but since I really didn't know what homology was I wasn't sure how much I could do with this and the only homology I had seen was simplicial homology, and I wasn't sure about it's applicability.

So in order to answer these questions I started searching computational topology. What I quickly found was the Computational Homology Project (CHomP) and I also found references to this program called Kenzo. After reading through the documentation on the Kenzo site and the CHomP site, I realized that most of it was beyond what I could learn in a reasonable amount of time. So I abandoned this line of thought and went to a backup: complexity.

The biggest impression that I got from reading about CHomP and Kenzo was that their algorithms sucked. I had no idea what their algorithms really were, I couldn't find any satisfactory descriptions, but I knew that they took a long time, although much faster than humans (Kenzo claims to have experimentally determined homology groups that no human has). So I thought what is the most basic question that I can study the complexity for? My decision solved the first issue above - the formatting of the space. I said, "Okay, I know the definition (at least) of Simplicial Homology, so let's see how fast Simplicial homology can be computed theoretically."

That search yielded fruitful results. The first paper I found was titled On the Complexity of Computing the Homology Type of a Triangulation. Reading through this paper I found that this was pretty much exactly what I was looking at. The authors took a finite simplicial complex in matrix form (each column of the matrix simply being the boundary of the corresponding simplex) and probabilistically computed the homology groups in O(Dn^2) time where D is the dimension of the complex and n is the number of simplices. So this was a great start. Even better was at the end the authors wrote that area for future research could include probabilistic computation of homology for regular cell complexes and CW complexes, which are not conceptually much trickier than simplicial complexes. They also say that the computation of the cohomology ring operations could be useful, since while the cohomology is determined by the homology, there are spaces with the same homology and cohomology with different ring operations. So these two areas are where I am looking toward for my research. Currently I have only actually read into the first. I don't actually yet know what they are referring to by the cohomology ring operations.

So now I have a topic, but I need a plan of attack. While reading this first paper and thinking about it a little in my head, this problem of computing homology groups is really about computing something from an integer matrix, since the boundary maps can each be simply represented as an integer matrix. What I had gathered was that this was called Smith Normal Form, but I couldn't really understand what Smith Normal Form was, why it existed, or how to compute it naively. This was what I looked for today, and found this rather readable paper that introduces the concept of Smith Normal Form and explains how to compute it in a finite amount of time (although not bounded by anything, since the fact that it's finite simply comes from the fact that the integers are Noetherian). So now I have at least a plan for the first part of my research. I can for now remove myself from the topological nature of my problem and attack a very simple problem of computing Smith Normal Form. The topology will come back in because the fact that the matrix represents a simplicial (or regular cell or CW) complex means that it may be forced to satisfy certain sparsity or regularity requirements, which (as in the paper linked in the preceding paragraph) can be exploited in order to obtain a much faster expected running time than the naive or even less naive deterministic algorithms.

This series of posts will probably run two or three times a week, but most of the posts won't be this long as I was covering essentially an entire week here.

Sunday, September 13, 2009

The Birth Of A Dead Hamster

This blog fad has been going for a while. Many of my friends have blogs. I was opposed to it at first because I had nothing useful to do with it, just as I was opposed to Facebook. Actually I'm still opposed to it, but the opposition is quickly draining. I expect I'll actually break down and get one around this time next year, maybe earlier. For now, though, I will not.

So what made me get a blog? Well, quite simply, I realized that I could do something useful with it - that is I can chronicle my work on my senior research project and at the same time dump other thoughts into here. As such, at least for this year, it is quite possible that I'll be making several updates a week as I work toward getting something useful done.

I suppose I will explain the name of my blog up front. For several years now I have used the name Hamster1800. I actually created this name in elementary school when I got my AIM account. Hamster by itself was, of course, already taken, so I put a number after it, which happened to be 1800. It was not until later that it was pointed out to me that it could be interpreted as the phone number 1-800-HAMSTER. Anyway, I use this name in many places, including iccup, probably the most competitive ladder for playing starcraft outside of Korea. However, at some point a friend of mine thought it would be cool for us to put fake clan tags on our name, so he created the name [AhungrY]Melon, and another of my friends followed with [AhungrY]Argh. However, when I tried to create the account [AhungrY]Hamster, I was told that it was too long, so I changed it to [AdeaD]Hamster. This is the account that I use currently for playing Protoss. I have another account separate from both Hamster1800 and [AdeaD]Hamster that I use for playing Zerg, but I'm not going to release that name just yet.

So I was going to make my blog and I was trying to think of what the URL should be. I first tried bhamrick and hamster, but they were both unfortunately taken. Then after quite a bit of thought I remembered that I am also [AdeaD]Hamster and sure enough adeadhamster.blogspot.com was untaken. The title of the blog followed pretty quickly from that.

So this has been a pretty long intro post. I'm going to let it end here. Look for a post either later tonight or tomorrow depending on when I decide to post again.

For your amusement in the meantime: http://sendables.jibjab.com/view/ayCdy5tARoEdgXtb