Jekyll2017-12-18T05:56:28+00:00https://nivent.github.io/Thoughts of a ProgrammerNiven Achenjang's Personal WebsiteDifference of squares2017-12-17T21:55:00+00:002017-12-17T21:55:00+00:00https://nivent.github.io/blog/difference-squares<p>Two new posts in one day? It must be Christmas. I think this post will be relatively short. I want to talk about a problem that popped in my head while I was working on the last post, and then mention some thoughts that this problem sparked which I hope are worth writing down before I forget.</p>
<h1 id="which-numbers-can-be-written-as-the-difference-of-two-squares">Which numbers can be written as the difference of two squares?</h1>
<p>Let’s just jump right into things. One natural place to start tackling this question is with the primes. With that said, let $p$ be a prime number and suppose we can write $p=x^2-y^2$ for some $x,y\in\Z_{\ge0}$. This gives<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
<script type="math/tex; mode=display">p=x^2-y^2=(x-y)(x+y)\implies p=x+y\text{ and }x-y=1</script>
<p>which means we require that $p$ is the sum of two consecutive numbers! Now, this took me longer than I’d like to admit to realize while I was working on this, but this is equivalent to saying that $p$ is odd. In other words, all primes $p\neq2$ can be written as the difference of two squares, namely $p=\lceil p/2\rceil^2-\lfloor p/2\rfloor^2$.</p>
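<p>This identity is easy to sanity-check numerically. Below is a small Python sketch (my own illustration, not part of the original argument) verifying it for a few odd primes:</p>

```python
def diff_of_consecutive_squares(p):
    # for odd p, p = ceil(p/2)^2 - floor(p/2)^2, since the two squares
    # are consecutive and (hi - lo)(hi + lo) = 1 * p = p
    hi, lo = (p + 1) // 2, p // 2
    return hi * hi - lo * lo

# spot-check a few odd primes (the identity in fact works for any odd number)
for p in [3, 5, 7, 11, 13, 97]:
    assert diff_of_consecutive_squares(p) == p
```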
<p>Since we’ve completely characterized which primes are differences of squares, we really hope that the product of differences of squares is also a difference of squares. I claim that a natural way of reaching this conclusion is to make use of the ring $\Z[\eps]\simeq\Z[x]/(x^2-1)$ where $\eps^2=1$. Using this ring lets us factor $x^2-y^2=(x+\eps y)(x-\eps y)$, and while we could factor things before, this factorization is more useful since we can easily calculate</p>
<script type="math/tex; mode=display">(a+b\eps)(c+d\eps)=(ac+bd)+\eps(ad+bc)</script>
<p>which is enough to see that $(a^2-b^2)(c^2-d^2)=(ac+bd)^2-(ad+bc)^2$ so being a difference of squares is preserved by multiplication. Since all odd primes are differences of squares, this gets us that all odd numbers are differences of squares<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<p>Now, let $z=n^2m$ where $m=x^2-y^2$. Then, $z=n^2(x^2-y^2)=(nx)^2-(ny)^2$ is also a difference of squares. Hence, any number that $2$ divides an even number of times is a difference of squares. At this point, I was tempted to think that I was done, but then I realized that $8=3^2-1^2$ is also a difference of squares. Since $4=2^2-0^2$, we know that $2^2$ and $2^3$ are differences of squares, so $2^{2a+3b}$ is also a difference of squares where $a,b\in\Z_{\ge0}$. It’s not hard to see that every positive integer except 1 can be written as $2a+3b$ <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>, so $2^1=2$ is the only power of two that cannot be written as a difference of squares.</p>
<p>Since every odd prime is a difference of squares, and all but one power of $2$ is also a difference of squares, we’ve shown that the only numbers that might not be differences of squares are those that $2$ divides exactly once. Another way of characterizing these “bad” numbers is that they are the $n\in\Z$ for which $n\equiv2\pmod4$. Now, the equation $x^2-y^2\equiv2\pmod4$ has no solutions, as one can verify by checking all 16 possible assignments of $x,y$. Thus, we’ve completely characterized the numbers that can be written as differences of squares; they are exactly the integers that are not $2\pmod4$! This is a surprisingly simple and nice outcome if you ask me.</p>
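<p>The full characterization can also be checked by brute force. Here is a short Python sketch (an illustration of the result, with a search bound chosen by hand) comparing an exhaustive search against the mod-4 criterion:</p>

```python
def is_diff_of_squares(n, bound=200):
    # brute-force: does n = x^2 - y^2 for some 0 <= y <= x <= bound?
    return any(x * x - y * y == n
               for x in range(bound + 1)
               for y in range(x + 1))

# the characterization: n is a difference of squares iff n % 4 != 2
for n in range(1, 100):
    assert is_diff_of_squares(n) == (n % 4 != 2)
```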
<h1 id="thoughts">Thoughts<sup id="fnref:5"><a href="#fn:5" class="footnote">4</a></sup></h1>
<p>With that question resolved, I wanna mention some thoughts it motivated. In the process of answering the question, it was helpful to consider the ring $\Z[\eps]$ which morally felt like the ring of integers of the number field $K=\Q(\eps)$ <sup id="fnref:4"><a href="#fn:4" class="footnote">5</a></sup>. However, this is technically wrong since $K$ isn’t a number field; it’s not a field at all (or even just a domain since $(1-\eps)(1+\eps)=0$). Despite this, we still have a natural notion of a “relative norm” on $K$ over $\Q$ given by $\knorm(x+y\eps)=x^2-y^2$, which made me wonder how much of algebraic number theory can be recovered if we study ring extensions of $\Q$ like this one <sup id="fnref:7"><a href="#fn:7" class="footnote">6</a></sup>.</p>
<p>Taking a step back to a slightly more general setting, my curiosity shifted away from number theory specifically to wonder what happens if you do Galois theory in a more general setting like this. The “ring extension” $K/\Q$ morally feels like a degree 2 Galois extension with non-trivial automorphism given by $\sigma(x+y\eps)=x-y\eps$. After having this in the back of my mind all day, this is what I’ve discovered so far as the possible beginnings of a formalism…</p>
<p>Fix some field $\F$. We want to study (commutative) rings <sup id="fnref:6"><a href="#fn:6" class="footnote">7</a></sup> containing this field, so let $R$ be such a ring. $R$ is still an $\F$-vector space, so we can still define the degree of the extension $R/\F$ as $[R:\F]:=\deg_{\F}R$. However, if we think on this more, it might make more sense to think of $R$ less as some sort of extension ring and more as an $\F$-algebra. I don’t know how beneficial this algebra viewpoint is compared to thinking in terms of ring extensions, but it does at least suggest that the true object of interest of this hypothetical generalized Galois theory should be $R$-algebras where we require that $R$ is a $k$-algebra for some field $k$ (or equivalently, that $R$ is a vector space).</p>
<p>The first major issue I see with recovering Galois theory in this setting is the behavior of towers of extensions. Classically, if we have $L/K/F$ a tower of field extensions, then we get that $[L:F]=[L:K][K:F]$ and this allows one to perform induction arguments <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>. This fact basically follows from the niceness of vector spaces, but since generally for a ring $R$, non-free $R$-modules exist, we face some issues with studying towers of ring extensions. It’s possible that $R$ being a $k$-algebra (for $k$ a field) is a strong enough restriction to force all $R$-algebras to be free $R$-modules, but I don’t know enough algebra to think of a proof or counterexample to that claim off the top of my head (although it’s almost certainly false), so this issue remains unresolved. Despite this, if one could find a way to get around this issue of towers of extensions, then<sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup> I think you can manage to recover at least a few gems from Galois theory. I’m a hopeful enough person to think that this might be possible in some nice settings, so</p>
<blockquote>
<p>Conjecture<br />
Let $\F$ be a field, and fix some $f(x)\in\F[x]$. Let $R$ be the “splitting ring” of $f(x)$ <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. Then, the number of automorphisms $\sigma:R\rightarrow R$ fixing $\F$ is at most $\deg_{\F}R$.</p>
</blockquote>
<p>I obviously don’t know if this conjecture is true, but I feel that something like it should be true. I suspect I won’t do a lot of thinking about this anytime soon, so I leave this here to one day return to it and continue my thoughts.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>It’s technically also possible that x-y=-1, but without loss of generality, assume x>y <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>This didn’t hit me until after I’d done all the work that went into this post, but the difference between the nth square and the (n+1)th square is the nth odd number (i.e. (n+1)^2-n^2=2n+1) so this conclusion is trivial <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>If n=2k is even, then take a=k and b=0. If n=2k+1 is odd, then take a=k-1 and b=1 (this fails only if k=0, i.e. if n=1) <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>of a Programmer <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>I’m pretty sure this notation technically makes no sense, but by it, I mean the ring Q(e) = {a+be | a,b are fractions} where e^2=1 <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>One immediate question I haven’t thought about the answer to is “does the traditional definition of the ring of integers still work?” In this example, one would expect that the integer ring of Q(e) should be Z[e], but I haven’t verified that this is what you get from just applying the standard definition of an integer ring. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>I’m hesitant to require commutativity because the quaternions also feel morally like an extension that should be considered in this more general theory, albeit not a Galois one <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>If you’re studying an extension K/F, you might start by picking a in K-F so you get the tower K/F(a)/F. Prove your statement for F(a)/F, get the same thing for K/F(a) by induction on degrees, and then use these together to conclude something about K/F <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>Don’t quote me on this; I haven’t thought about it too deeply <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>i.e. the smallest ring (containing F) in which f splits <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>An interesting equation2017-12-17T20:39:00+00:002017-12-17T20:39:00+00:00https://nivent.github.io/blog/interesting-equation<p>One day I will return to writing posts that are not always very algebraic in nature, but this is not that day. I want to talk today about an example of a peculiar equation, but first a little background… In my mind, number theory (at least on the algebraic side) is ultimately about solving diophantine equations and not much more. This is what originally got me interested in the subject because trying to solve these equations can often feel like some sort of puzzle or exploratory game; there’s a common set of tricks one can apply, but not much of a single path or algorithm that always gets you the solution. Among the most basic/fundamental of tricks is to use congruences. If you seek (integer) solutions to $x^2+y^2=3$, then a natural thing to do is consider this equation (mod $4$) and note that there are no solutions to $x^2+y^2\equiv3\pmod4$ <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, so there are no integer solutions to the original equation; easy. In fact, you can state this principle in general.</p>
<blockquote>
<p>Fact<br />
Let $p(x,y)$ be some polynomial with integer coefficients. If $p(x,y)\equiv0\pmod m$ has no solutions for some $m$, then $p(x,y)$ has no integer solutions.</p>
</blockquote>
<p>It’s not far-fetched to imagine that this is the only thing preventing polynomials from having integer solutions. That is, it’s natural to ask whether any polynomial that can be solved $\pmod m$ for all $m\in\Z_{>0}$ must be solvable in the integers. However, it turns out that this is not the case, and the subject of this post is a single counterexample</p>
<script type="math/tex; mode=display">\begin{align*}
x^2-82y^2=2
\end{align*}</script>
<p>For convenience, let $q(x,y)=x^2-82y^2-2$. I haven’t thought too much about this, but for understanding this post, probably <a href="../number-theory">these</a> <a href="../solving-pell">two</a> posts on number theory, and some <a href="../ring-intro">ring theory</a> should be sufficient; if you don’t feel like reading those, then just stick with this post, and if anything doesn’t make sense, you can refer back to those posts to figure it out or leave a comment asking a question.</p>
<h1 id="solutions-in-q-and-zmod-m">Solutions in $\Q$ and $\zmod m$</h1>
<p>We’ll first establish that this equation has solutions both in $\Q$ and in $\zmod m$ for all $m$. For $\Q$, we’ll actually do one better and show that it has infinitely many solutions. We will first search for a single rational solution, so let $x=a/b$ and $y=c/d$ where $a,b,c,d\in\Z$. In order to keep things simple, we’ll assume $b=d$ so we can rewrite our equation as</p>
<script type="math/tex; mode=display">\begin{align*}
x^2-82y^2=2\iff(a/b)^2-82(c/d)^2=2\iff a^2-82c^2=2b^2\iff a^2=2b^2+82c^2
\end{align*}</script>
<p>This suggests one way of finding a solution. We just need to search for integers $b,c$ such that $2b^2+82c^2$ is a perfect square. If you were to try some examples by hand or write a computer program to search, you’d eventually come across $2(3)^2+82(1)^2=10^2$ which gives $(x,y)=(10/3,1/3)$ as a solution to $q(x,y)\in\Q[x,y]$. This is one solution, but far from the only one.</p>
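<p>The computer search alluded to above might look something like the following Python sketch (the function name and search bound are my own choices):</p>

```python
import math

def find_rational_point(limit=20):
    # look for small b, c with 2*b^2 + 82*c^2 a perfect square a^2,
    # which yields the rational solution (x, y) = (a/b, c/b)
    for b in range(1, limit):
        for c in range(1, limit):
            s = 2 * b * b + 82 * c * c
            a = math.isqrt(s)
            if a * a == s:
                return a, b, c

a, b, c = find_rational_point()
# clearing denominators: (a/b)^2 - 82*(c/b)^2 = 2 iff a^2 - 82*c^2 = 2*b^2
assert a * a - 82 * c * c == 2 * b * b
```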
<blockquote>
<p>Exercise<br />
Show that $q(x,y)\in\Q[x,y]$ has infinitely many rational solutions <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
</blockquote>
<p>To show that this equation has solutions in $\zmod m$, it’ll be sufficient to show that it can be solved for $m$ a prime power. This is a consequence of the <a href="https://www.wikiwand.com/en/Chinese_remainder_theorem">Chinese Remainder Theorem</a>.</p>
<blockquote>
<p>Theorem<br />
Let $p\neq3$ be a prime. Then, $q(x,y)$ has a solution viewed as a polynomial $q(x,y)\in\zmod{p^r}[x,y]$ for all $r$. That is, there exist integers $a,b$ s.t. <center>$a^2-82b^2\equiv2\pmod{p^r}$</center></p>
</blockquote>
<div class="proof2">
Pf: Fix any positive integer $r$, and note that $\gcd(3,p^r)=\gcd(3,p)=1$, which means $3$ is a unit in $\zmod{p^r}$. Fix some $b$ s.t. $3b\equiv1\pmod{p^r}$ and note that $(x,y)=(10b,b)$ is a solution to $q(x,y)\in\zmod{p^r}[x,y]$ as <center>$$100b^2-82b^2=18b^2\equiv2(3b)(3b)\equiv2\pmod{p^r}$$</center>$\square$
</div>
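<p>The construction in this proof is easy to check numerically. A quick Python sketch (using <code>pow(3, -1, m)</code>, available in Python 3.8+, for the modular inverse):</p>

```python
# for a prime power m = p**r with p != 3, set b = 3^{-1} (mod m);
# then (x, y) = (10*b, b) satisfies x^2 - 82*y^2 = 2 in Z/m,
# since 100*b^2 - 82*b^2 = 2*(3b)^2 = 2
for m in [2, 4, 8, 5, 25, 125, 7, 49, 11, 121, 13]:
    b = pow(3, -1, m)  # exists since gcd(3, m) = 1
    x, y = (10 * b) % m, b
    assert (x * x - 82 * y * y) % m == 2 % m
```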
<p>This just leaves the case of powers of $3$. This isn’t actually a special case, and it can be handled much in the same way as all the others. We begin by noting that $(66/13,-7/13)$ is a rational solution to $q(x,y)$. Since $3$ does not divide $13$, the two are coprime, and so $13$ is a unit in $\zmod{3^r}$ for all $r$. Hence, $(66/13,-7/13)$ still makes sense as a solution in $\zmod{3^r}$. In case you’re curious where this point came from, see the footnote (though it gives away one solution to the exercise) <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>Thus, we’ve shown $q(x,y)$ has solutions (mod $p^r$) for all prime powers $p^r$, and hence it has solutions (mod $m$) for all integers $m$.</p>
<h1 id="chinese-remainder-theorem">Chinese Remainder Theorem</h1>
<p>This section can be skipped, but I wanted to give a statement and proof of CRT for completeness.</p>
<blockquote>
<p>Chinese Remainder Theorem<br />
Let $R$ be a ring, and $I_1,\dots,I_n$ be a collection of pairwise coprime two-sided ideals (i.e. $I_i+I_j=R$ for all $i\neq j$). Then, we have a ring isomorphism</p>
</blockquote>
<center>$$\Large\begin{matrix}
\frac R{I_1\cap I_2\cap\dots\cap I_n} &\longrightarrow& \frac R{I_1}\oplus\frac R{I_2}\oplus\dots\oplus\frac R{I_n}\\
r+I_1\cap I_2\cap\dots\cap I_n &\longmapsto& \left(r+I_1,r+I_2,\dots,r+I_n\right)
\end{matrix}$$</center>
<div class="proof2">
Pf: We will prove this by induction on $n$, starting with the case of two ideals and the map $\phi:(r+I_1\cap I_2)\mapsto(r+I_1,r+I_2)$. We first need to confirm that this map is well-defined. Pick some $r,s\in R$ in the same coset so $r-s\in I_1\cap I_2$. Practically by definition, this means that $r+I_1=s+I_1$ and $r+I_2=s+I_2$ so $\phi$ is well-defined. From the behavior of cosets, it's clear that $\phi$ is a homomorphism so we only need to verify injectivity and surjectivity. Now, pick some $r+I_1\cap I_2\in\ker\phi$. Then, $\phi(r)=(r+I_1,r+I_2)=(I_1,I_2)$ is the identity, so $r\in I_1$ and $r\in I_2$, i.e. $r\in I_1\cap I_2$ and the coset $r+I_1\cap I_2$ is zero; hence $\phi$ has trivial kernel, which only leaves surjectivity. Fix some $x\in I_1$ and $y\in I_2$ such that $x+y=1$ which we get from coprimality. Then, $(r+I_1,s+I_2)$ has $sx+ry$ as a preimage, so $\phi$ is surjective. Now that we've established the case of two ideals, the general case will follow by induction if we can show that $I_1$ and $I_2\cap\dots\cap I_n$ are coprime. To this end, for each $2\le j\le n$ pick $x_j\in I_1$ and $y_j\in I_j$ s.t. $x_j+y_j=1$. Then, <center>$$1=(x_2+y_2)(x_3+y_3)\dots(x_n+y_n)=\sum_{S\subseteq\{2,\dots,n\}}\left(\prod_{i\in S}x_i\right)\left(\prod_{j\not\in S}y_j\right)=X+\left(\prod_{j=2}^ny_j\right)$$</center>
where $X\in I_1$ is some linear combination of terms that each contain $x_i$ for some $i\in\{2,\dots,n\}$ and $\prod y_j\in I_2\cap\dots\cap I_n$. Thus, $1\in I_1+(I_2\cap\dots\cap I_n)$ so $I_1+(I_2\cap\dots\cap I_n)=R$ which means these ideals are coprime as claimed. $\square$
</div>
<blockquote>
<p>Corollary<br />
Fix an integer $m$ and factor it as $m=p_1^{r_1}p_2^{r_2}\dots p_n^{r_n}$ where each $p_i$ is a different prime. Then, <center>$\zmod m\simeq\zmod{p_1^{r_1}}\oplus\zmod{p_2^{r_2}}\oplus\dots\oplus\zmod{p_n^{r_n}}$</center></p>
</blockquote>
<div class="proof2">
Pf: Exercise to reader
</div>
<p>For our purposes, we only need the corollary and not the full CRT. We want to confirm that $x^2-82y^2=2$ has solutions (mod $m$) for all $m$. Well, given any $m$, we factor it into prime powers to see that $\zmod m\simeq\zmod{p_1^{r_1}}\oplus\zmod{p_2^{r_2}}\oplus\dots\oplus\zmod{p_n^{r_n}}$. In the previous section, we found solutions in each of these factors so let $(x_j,y_j)$ satisfy $x_j^2-82y_j^2\equiv2\pmod{p_j^{r_j}}$. Then, CRT guarantees the existence of some <script type="math/tex">x^*,y^*\in\zmod m</script> such that <script type="math/tex">x^*\equiv x_j\pmod{p_j^{r_j}}</script> and <script type="math/tex">y^*\equiv y_j\pmod{p_j^{r_j}}</script> for all $j$. Thus, <script type="math/tex">(x^*)^2-82(y^*)^2\equiv2\pmod{p_j^{r_j}}</script> for all $j$. Since <script type="math/tex">(x^*,y^*)</script> satisfy $q(x,y)$ in each factor of $\zmod m$ (i.e. in each $\zmod{p_j^{r_j}}$), they must satisfy it in $\zmod m$ itself, so $q(x,y)$ does indeed have solutions modulo any integer.</p>
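<p>As a concrete illustration of this gluing step, here is a Python sketch for $m=100=4\cdot25$ (the helper <code>crt_pair</code> is the standard constructive CRT, my own code rather than anything from the post):</p>

```python
def crt_pair(a1, m1, a2, m2):
    # solve x ≡ a1 (mod m1) and x ≡ a2 (mod m2) for coprime m1, m2:
    # write x = a1 + m1*t and solve m1*t ≡ a2 - a1 (mod m2)
    t = ((a2 - a1) * pow(m1, -1, m2)) % m2
    return a1 + m1 * t

# local solutions to x^2 - 82*y^2 ≡ 2 modulo 4 and modulo 25
x4, y4 = 2, 3      # 2^2 - 82*3^2 = -734 ≡ 2 (mod 4)
x25, y25 = 20, 17  # from (10*b, b) with b = 3^{-1} ≡ 17 (mod 25)

# glue them into a solution modulo 100
x = crt_pair(x4, 4, x25, 25)
y = crt_pair(y4, 4, y25, 25)
assert (x * x - 82 * y * y) % 100 == 2
```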
<h1 id="no-solutions-in-z">No Solutions in $\Z$</h1>
<p>To finish things off, we’ll show that there are no integer solutions to $x^2-82y^2=2$. This section will use some of the ideas previously touched upon in my <a href="../solving-pell">Pell’s post</a>. Our first observation is that the right setting to analyze this equation is in $\zadjs{82}$, which, using terminology from that post, is the ring of integers for $K=\qadjs{82}$. We see that solutions to this equation correspond exactly to elements of $\zadj{82}$ with norm $2$. As it turns out, understanding which numbers have norm $2$ is related to understanding how $2$ factors in $\ints K=\zadj{82}$. More specifically, we wish to factor $(2)$ into prime ideals:</p>
<script type="math/tex; mode=display">(2)=(2,\sqrt{82})^2</script>
<p>This equality is easily verified as $(2,\sqrt{82})^2=(4,2\sqrt{82},82)=(2)(2,\sqrt{82},41)=(2)$ since $(2,\sqrt{82},41)=(1)$ as $41-2(20)=1$. Now, suppose $z=x+y\sqrt{82}$ ($x,y\in\Z$) has norm $2$, i.e. assume that $x^2-82y^2=2$. It is a fact, which I will not prove, that this is possible only if $(x+y\sqrt{82})=(2,\sqrt{82})$. Given this, we see that $(z^2)=(z)^2=(2)$ so $z^2=2u$ for some unit $u\in\ints K^\times$. Taking norms of both sides gives</p>
<script type="math/tex; mode=display">4\knorm(u)=\knorm(2)\knorm(u)=\knorm(2u)=\knorm(z^2)=\knorm(z)^2=4\implies\knorm(u)=1</script>
<p>Now, note that $\zadj{82}$ has fundamental unit <script type="math/tex">\eps=9+\sqrt{82}</script> and that <script type="math/tex">\knorm(\eps)=-1</script>. Since every unit is $\pm$ a power of $\eps$, this means we can write $u=\pm\eps^{2k}$ for some $k$. Thus, we can rewrite $z^2=2u$ as $(\eps^{-k}z)^2=\pm2$. To finish things off, we will show that neither of $\pm2$ is a square in $\ints K$, giving a contradiction. This is easily seen by observing that given any $a,b\in\Z$, we have</p>
<script type="math/tex; mode=display">(a+b\sqrt{82})^2=(a^2+82b^2)+2ab\sqrt{82}</script>
<p>This can’t be $-2$ because the non-$\sqrt{82}$ part, $a^2+82b^2$, is always nonnegative, and it can’t be $+2$ since that would require $ab=0$, and neither $a^2=2$ nor $82b^2=2$ has integer solutions. Thus, neither of $\pm2$ is a square in $\ints K$, so there’s no element of norm $2$, which means that $x^2-82y^2=2$ has no integer solutions.</p>
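<p>Though the argument above is a proof, a brute-force search makes for a reassuring sanity check. A Python sketch (the search bound is an arbitrary choice of mine):</p>

```python
import math

# exhaustively check that x^2 - 82*y^2 = 2 has no solutions with |y| <= 10^4;
# by symmetry it suffices to scan y >= 0 and test whether 2 + 82*y^2 is a square
solutions = []
for y in range(10_000 + 1):
    s = 2 + 82 * y * y
    x = math.isqrt(s)
    if x * x == s:
        solutions.append((x, y))
assert solutions == []

# and a check of the fundamental unit: N(9 + sqrt(82)) = 81 - 82 = -1
assert 9 * 9 - 82 * 1 * 1 == -1
```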
<h1 id="further-work">Further Work</h1>
<p>So we’ve shown that $x^2-82y^2=2$ has infinitely many rational solutions, and solutions in $\zmod m$ for all $m$, but no integer solutions. This means congruential obstructions are not the only things that can prevent a polynomial from being solved in the integers. We might still be interested in asking questions about better understanding congruential obstructions though. For example, in our analysis of this equation, the fact that we had solutions in $\zmod m$ for all $m$ was closely related to the fact that we had (infinitely) many rational solutions, which raises the question</p>
<blockquote>
<p>Conjecture<br />
Let $p$ be a polynomial with integer coefficients. Then, $p$ has solutions in $\zmod m$ for all $m\iff p$ has infinitely many rational solutions.</p>
</blockquote>
<p>It actually turns out that this conjecture is false, and one counterexample is the polynomial</p>
<script type="math/tex; mode=display">p(x) = (x^2-2)(x^2-17)(x^2-34)</script>
<p>This has solutions (mod $m$) for all $m$ <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>, but there are visibly no rational solutions. This breaks one direction of the iff above, but it’s still possible that the other direction holds, and I encourage you to investigate this.</p>
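<p>You can at least spot-check the claim about this counterexample with a short Python brute force (small moduli only; a real argument for all $m$ needs quadratic reciprocity, as the footnote says):</p>

```python
def has_root_mod(m):
    # does (x^2 - 2)(x^2 - 17)(x^2 - 34) ≡ 0 (mod m) for some x?
    return any((x * x - 2) * (x * x - 17) * (x * x - 34) % m == 0
               for x in range(m))

# no rational roots, yet there is a root modulo every small m we try
assert all(has_root_mod(m) for m in range(2, 500))
```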
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Just try all 16 possible pairs of values for x,y <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="../number-theory">hint</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>I used the (10/3,1/3) solution to project the line y=0 onto the curve defined by x^2-82y^2=2. Under this projection/correspondence, the point (4,0) on this line gets mapped to (66/13,-7/13) on this curve <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>The easiest way to convince yourself of this claim is via <a href="https://www.wikiwand.com/en/Quadratic_reciprocity">quadratic reciprocity</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Algebra Part II2017-11-21T03:00:00+00:002017-11-21T03:00:00+00:00https://nivent.github.io/blog/ring-intro<p>This is the second post in a series that serves as an introduction to abstract algebra. In the <a href="../group-intro">last one</a>, we defined groups as certain sets endowed with a well-behaved operation. Here, we’ll look at rings, which are what you get when your set has two operations defined on it, and we’ll see that much of the theory for groups has a natural analogue in ring theory.</p>
<blockquote>
<p>Bit of a Disclaimer<br />
I can’t possibly mention everything on a particular subject in one post, and I am not a particular fan of writing insanely long posts, so some things have to be cut. In particular, I aim to introduce most of the important topics in each subject without necessarily doing a deep dive, and while I will try to mention specific examples of things, I won’t spend too much time looking at them closely. It will be up to you to take the time to make sure the example makes sense. Because of this, I’ll try to include exercises that should be good checks of understanding. Finally, as always, things are presented according to my tastes and according to whatever order they happen to pop into my head; hence, they are not necessarily done the usual way.</p>
</blockquote>
<h2 id="rings-and-better-rings">Rings and Better Rings</h2>
<p>If you’ve heard of groups, I doubt I need to motivate rings much. Things like the integers, real numbers, matrices, etc. all form groups, but when considering them as such, you have to make a choice about whether you want to consider their additive structure or their multiplicative structure; why not look at both?</p>
<blockquote>
<p>Definition<br />
A <strong>ring</strong> $(R, +, \cdot)$ is a set $R$ together with two operations $+:R\times R\rightarrow R$ and $\cdot:R\times R\rightarrow R$ satisfying the following for all $a,b,c\in R$<br />
<script type="math/tex">% <![CDATA[
\begin{align*}
&\bullet (R, +)\text{ is an abelian group with additive identity }0\\
&\bullet a\cdot(b\cdot c) = (a\cdot b)\cdot c\\
&\bullet a\cdot(b+c) = a\cdot b+a\cdot c\text{ and }(a+b)\cdot c=a\cdot c+b\cdot c
\end{align*} %]]></script><br />
If additionally, $a\cdot b=b\cdot a$ always, then we call this a <strong>commutative ring</strong></p>
</blockquote>
<p>There are a few things worth noticing about the definition of a ring. First of all, it’s kinda short; at least, it was shorter than I expected the first time I saw it. There are like four different properties you need to satisfy to be an abelian group; to be a ring, you just need associative multiplication and (both left and right) distributivity. You don’t need inverses, and you don’t even need to have a multiplicative identity. This means you can get some weird stuff happening in general rings.</p>
<ul>
<li>In $2\Z$, you have $ab\neq a$ for any nonzero $a\in2\Z$ and any $b\in2\Z$, so there is no multiplicative identity</li>
<li>In $\zmod8$, you get $4*2=0$ even though $4,2\neq0$</li>
<li>Also in $\zmod8$, you get $1^2=3^2=5^2=7^2=1$ so you have 4 different square roots of $1$</li>
</ul>
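<p>These oddities are easy to witness directly; a quick Python check of the $\zmod8$ examples (modeling the ring with <code>% 8</code> arithmetic):</p>

```python
# zero divisors in Z/8: 4 * 2 = 0 even though neither factor is 0
assert (4 * 2) % 8 == 0 and 4 % 8 != 0 and 2 % 8 != 0

# four distinct square roots of 1 in Z/8
square_roots_of_one = [a for a in range(8) if (a * a) % 8 == 1]
assert square_roots_of_one == [1, 3, 5, 7]
```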
<p>Because of this, we will see a number of different types of rings with increasingly more conditions on them, guaranteeing nice behavior. Also, in case you were wondering why we require an abelian group under addition and not just a group, it’s because general groups and noncommutative rings are ugly enough separately; having one object that doesn’t commute under addition or multiplication just sounds awful <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Now, let’s see some additional conditions we may want to place on rings</p>
<blockquote>
<p>Definition<br />
A ring $R$ is said to <strong>have unity</strong> (alternatively, $R$ is <strong>unital</strong>) if there exists some $1\in R$ such that $a\cdot1=1\cdot a=a$ for all $a\in R$.</p>
</blockquote>
<p>This gets rid of the first problematic ring I mentioned above, but not the other two. The third one may stick around for a while, but the second we really don’t like.</p>
<blockquote>
<p>Definition<br />
An element $a\in R$ is called a <strong>zero divisor</strong> if there exists some nonzero $b\in R$ such that $ab=0$</p>
</blockquote>
<blockquote>
<p>Definition<br />
An <strong>integral domain</strong> $D$ is a commutative ring with unity and no zero divisors</p>
</blockquote>
<blockquote>
<p>Question<br />
Is the zero ring an integral domain?</p>
</blockquote>
<p>Now that’s something we can work with. In practice, almost all rings you work with will have unity, the majority will be commutative <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>, and plenty of them will be integral domains. The fact that integral domains don’t have zero divisors means that we can “cancel” multiplication. Normally when we have some equation like $ab=ac$, we cancel out the $a$’s and conclude that $b=c$. However, as the $4\cdot2=0$ example from $\zmod8$ shows, this isn’t always legitimate. In high-school algebra, we justify this cancellation by saying we multiply both sides by $a^{-1}$, but $a^{-1}$ won’t exist most of the time in general! Luckily, even if it doesn’t, we can still justify cancellation most of the time.</p>
<blockquote>
<p>Theorem<br />
Let $D$ be a ring. Then left (respectively, right) cancellation holds iff $D$ has no zero divisors</p>
</blockquote>
<div class="proof2">
Pf: $(\rightarrow)$ Assume left cancellation holds. Pick $a,b\in D$ with $a$ nonzero such that $ab=0=a0$. By left cancellation, this means $b=0$ so there are no zero divisors.<br />
($\leftarrow$) Assume $D$ has no zero divisors. Pick $a,b,c\in D$ with $a$ nonzero such that $ab=ac$. Then, we can subtract to get $0=ac-ab=a(c-b)$. Since there are no zero divisors, either $a=0$ or $c-b=0$, but $a\neq0$ by assumption so $c-b=0$ and $c=b$. $\square$
</div>
<p>Above you’ll notice that we used $a0=0=0a$ for all $a$ in a ring. This is nonobvious, but also not all that profound. You can prove basic properties of rings like this <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> without much effort, so I’ll omit them. Furthermore, as with groups, we can define <strong>ring homomorphisms (or ring maps)</strong> as maps $f:R\rightarrow S$ such that $f(a+b)=f(a)+f(b)$ and $f(ab)=f(a)f(b)$; additionally, if $R,S$ both have unity we require $f(1_R)=1_S$. We can also define the <strong>kernel</strong> of a ring map $f:R\rightarrow S$ to be the subset of $R$ mapping to $0$. There’s also a notion of subring that’s exactly what you think it is.</p>
<p>For the rest of this post, I think we’ll be looking (almost) exclusively at commutative rings with unity, so unless otherwise specified, assume that’s the case. Now, before moving on to more definitions and whatnot, I want to make note of one of the most important classes of rings: polynomial rings.</p>
<blockquote>
<p>Definition<br />
Given a ring<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> $R$, the <strong>polynomial ring</strong> $R[x]$ is the ring of (formal) polynomials in $x$ with coefficients in $R$.</p>
</blockquote>
<p>The above definition isn’t all that formal, but the idea is that you have things like $3x^2+2x-7\in\Z[x]$, $\pi x^4-e\in\R[x]$, $3x-1/2\in\Q[x]$, etc. One thing to be careful about is that two polynomials are equal iff they have identical coefficients; any polynomial $p(x)\in R[x]$ gives rise to a function <script type="math/tex">R\rightarrow R</script> via evaluation, but the mapping $p\mapsto[r\mapsto p(r)]$ <sup id="fnref:6"><a href="#fn:6" class="footnote">5</a></sup> is not necessarily injective! That is, you can have distinct polynomials that determine the same function such as $p(x)=x^2$ and $q(x)=x$ in $\zmod2[x]$; $p(x)\neq q(x)$ as polynomials even though $\forall n\in\zmod2:p(n)=q(n)$.</p>
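<p>This distinction is easy to demonstrate in code. A Python sketch (representing a polynomial by its tuple of coefficients, listed from the constant term up, which is a convention of mine):</p>

```python
def evaluate(coeffs, x, m):
    # Horner evaluation of a polynomial over Z/m,
    # with coeffs listed from the constant term up
    total = 0
    for c in reversed(coeffs):
        total = (total * x + c) % m
    return total

p = (0, 0, 1)  # x^2 in (Z/2)[x]
q = (0, 1)     # x   in (Z/2)[x]
assert p != q  # distinct as formal polynomials...
assert all(evaluate(p, n, 2) == evaluate(q, n, 2) for n in range(2))  # ...same function
```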
<blockquote>
<p>Aside<br />
If you wan’t to careful define the polynomial ring, you can define it as the subset of the ring $R^{\N}$ of functions from $\N$ to $R$ consisting of elements that evaluate to 0 on all but finitely many $n\in\N$. You also have to specify what multiplication looks like because it’s not the usual componentwise product.</p>
</blockquote>
<h1 id="domains">Domains</h1>
<p>Unfortunately, unlike the previous post on groups, there isn’t some major result like Lagrange’s Theorem or the First Isomorphism Theorem that we’re working towards here; this post is more a goalless overview of some basics in ring theory.</p>
<blockquote>
<p>Proposition<br />
If $D$ is an integral domain, then so is $D[x]$.</p>
</blockquote>
<p>Before proving this, we’ll define the degree of a polynomial which is a notion we’ll see more than once.</p>
<blockquote>
<p>Definition<br />
Given some polynomial $p(x)=\sum_{k=0}^na_kx^k\in R[x]$, its <strong>degree</strong> $\deg p(x)$ is the largest $k$ such that $a_k\neq0$. Note that $\deg0$ is undefined whereas $\deg r=0$ for any nonzero $r\in R$.</p>
</blockquote>
<blockquote>
<p>Remark<br />
Given nonzero $p,q\in D[x]$ where $D$ is an integral domain, we have $\deg(pq)=\deg p+\deg q$ which is simple but important.</p>
</blockquote>
<p>The above remark is actually strong enough to imply the proposition, so we omit a formal proof of it. In the remark, we require $D$ to be an integral domain so that the leading coefficient <sup id="fnref:5"><a href="#fn:5" class="footnote">6</a></sup> of $pq$ is guaranteed to be nonzero, since it’s the product of two nonzero things (i.e. the leading coefficients of $p$ and $q$). I don’t have a good transition here, but another important thing related to integral domains is…</p>
<blockquote>
<p>Definition<br />
Given a (possibly non-commutative) ring $R$ with unity, there is a unique ring map $\Z\rightarrow R$ (why?). If $D$ is a domain, then we define the <strong>characteristic</strong> $\Char D$ of $D$ to be the least positive $n\in\Z$ mapping to 0 under this unique ring map. If no positive integer maps to 0, then we say $\Char D=0$.</p>
</blockquote>
<p>There are a few ways to think of characteristic. In a while when we define ideals, it’ll become clear that the characteristic of $D$ is the nonnegative generator of $\ker(\Z\rightarrow D)$; we can also say that $\Char D$ is the (least) number of times you can add 1 to itself in a ring before getting $0$ <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>; alternatively, remembering that rings are abelian groups, $\Char D$ is the additive order of $1$. Good examples to keep in mind here are $\Char\Z=0$ and $\Char\zmod p=p$. Vaguely put, characteristic is a good indicator for the behavior of a ring; weird things can happen in rings of low characteristic.</p>
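<p>The “additive order of 1” description is easy to compute with. Here’s a small sketch (the function name is my own invention) recovering $\Char\zmod n=n$; note that no such loop can detect characteristic 0, since it would never terminate:</p>

```python
def char_zmod(n):
    """Characteristic of Z/nZ: the least k > 0 with 1 + 1 + ... + 1 (k times) = 0."""
    total, k = 0, 0
    while True:
        k += 1
        total = (total + 1) % n
        if total == 0:
            return k

assert char_zmod(7) == 7
assert char_zmod(12) == 12
```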
<blockquote>
<p>Theorem<br />
Given an integral domain $D$ of nonzero characteristic, $\Char D$ is prime</p>
</blockquote>
<div class="proof2">
Pf: Let $D$ be an integral domain and assume $\Char D=n\neq0$. Now, write $n=ab$ ($a,b$ both positive) and let $f:\Z\rightarrow D$ be the ring map. We have $0=f(n)=f(ab)=f(a)f(b)$ but $D$ has no zero divisors so $f(a)=0$ or $f(b)=0$. Assume WLOG that $f(a)=0$. Since $a\le n$ and $n$ is the minimal integer with $f(n)=0$, we conclude that $a=n$ which means $b=1$. Thus, the only divisors of $n$ are $1,n$ so $n$ is prime. $\square$
</div>
<blockquote>
<p>Corollary<br />
$\zmod n$ is an integral domain implies that $n$ is prime</p>
</blockquote>
<p>The converse of that corollary is true as well, and proving both directions is left as an exercise. Let’s shift gears a little: instead of talking about properties of rings, we’ll look at some specific types of elements.</p>
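<p>Both directions of the exercise can at least be sanity-checked by brute force (a Python sketch of mine, testing “no zero divisors” directly):</p>

```python
def is_domain(n):
    """Is Z/nZ an integral domain (n >= 2)? Check for zero divisors directly."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, n))

# Z/nZ is an integral domain exactly when n is prime:
assert all(is_domain(n) == is_prime(n) for n in range(2, 60))
```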
<blockquote>
<p>Definitions<br />
Let $R$ be a ring. An element $r\in R$ is called a <strong>unit</strong> if it divides 1. That is, there exists some $s\in R$ such that $rs=1$. We say a non-zero non-unit $r\in R$ is <strong>irreducible</strong> if whenever we write $r=ab$, it must be the case that either $a$ or $b$ is a unit. Finally, a non-zero non-unit $r\in R$ is <strong>prime</strong> if $r\mid ab$ implies that $r\mid a$ or $r\mid b$.</p>
</blockquote>
<p>In the integers $\Z$, the only units are $\pm1$, and prime and irreducible mean the same thing. In any integral domain, prime implies irreducible.</p>
<blockquote>
<p>Theorem<br />
Let $D$ be an integral domain. Then, every prime element is irreducible</p>
</blockquote>
<div class="proof2">
Pf: Pick some prime $p\in D$ and write $p=ab$. Then, $p\mid ab$ so $p\mid a$ or $p\mid b$. Assume WLOG that $p\mid a$ and write $a=pc$. Substituting into the first equation, we have $p=pcb$ which means $1=cb$ and so $b$ was a unit. $\square$
</div>
<p>The converse of this theorem does not hold in general though. For a counterexample, we can consider the ring $\Z[\sqrt{-5}]=\{a+b\sqrt{-5}:a,b\in\Z\}$. Here, $2$ is irreducible, as can easily be proven using the norm map <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>. However, $2$ divides $6=(1+\sqrt{-5})(1-\sqrt{-5})$ but divides neither of $1\pm\sqrt{-5}$ since, for example, any multiple of $2$ will have both components even. Since the two definitions coincided for integers, we’d like to study other rings where they are the same as well. To that end, we define the following</p>
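<p>The norm argument is short enough to verify by machine. In this sketch (my own, not from the linked post), the norm is $N(a+b\sqrt{-5})=a^2+5b^2$, which is multiplicative; a proper factorization of $2$ would force a factor of norm $2$, but $a^2+5b^2=2$ has no integer solutions:</p>

```python
def norm(a, b):
    """Multiplicative norm N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    return a * a + 5 * b * b

# a^2 + 5b^2 = 2 forces |a| <= 1 and b = 0, so this small search is exhaustive:
assert all(norm(a, b) != 2 for a in range(-1, 2) for b in range(-1, 2))

def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)) = (ac - 5bd) + (ad + bc)*sqrt(-5)."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

# 2 divides 6 = (1 + sqrt(-5))(1 - sqrt(-5)), yet divides neither factor,
# since a multiple of 2 has both components even and 1 is odd:
assert mul((1, 1), (1, -1)) == (6, 0)
```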
<blockquote>
<p>Definition<br />
A <strong>unique factorization domain</strong> (or UFD) $U$ is an integral domain where every non-zero $x\in U$ can be written as a product $x=up_1p_2\dots p_n$ of a unit $u$ with irreducibles $p_i$. Furthermore, this representation is unique in the sense that given $x=wq_1q_2\dots q_m$ as well, we must have $m=n$ and (after rearrangement) $q_i=v_ip_i$ for units $v_i$.</p>
</blockquote>
<p>That definition is a bit of a mess, but the basic idea is that we have some analogue of the fundamental theorem of arithmetic. A good example of a UFD to keep in mind is $\Z[x]$, and in fact more generally</p>
<blockquote>
<p>Theorem<br />
If $U$ is a UFD, then so is $U[x]$</p>
</blockquote>
<div class="proof2">
Pf: Omitted. At this point, I don't think we've quite developed enough theory to prove this nicely, so look up a proof after finishing this post.
</div>
<p>Because I don’t want to wait too long before saying this, and because it’s related to the previously omitted proof, let’s look at just about the nicest rings known to algebra.</p>
<blockquote>
<p>Definition<br />
A <strong>field</strong> $k$ is a ring such that $(k-\{0\},\cdot)$ is an abelian group. i.e. all non-zero elements of $k$ are units.</p>
</blockquote>
<p>Examples of fields include $\Q, \R, \C,$ and $\qadjs2=\{a+b\sqrt2:a,b\in\Q\}$. Because multiplicative inverses exist, cancellation automatically holds in fields and so all fields are domains. The 4 examples I just gave all have characteristic 0, but fields with prime characteristic exist as well.</p>
<blockquote>
<p>Proposition<br />
$\zmod p$ is a field. More generally, any finite domain is a field.</p>
</blockquote>
<div class="proof2">
Pf: Let $D$ be a finite domain, and fix some non-zero $d\in D$. Consider the map $m_d:D\rightarrow D$ given by $m_d(a)=da$. We claim this map is injective. Pick some $a\in\ker m_d$. Then, $m_d(a)=da=0\implies a=0$ since $d\neq0$ by assumption. Thus, $m_d$ has trivial kernel and hence is injective. Now, any injective map between finite sets is automatically surjective, so $\image m_d=D$. In particular, there exists some $c\in D$ such that $m_d(c)=dc=1$ so $d$ has an inverse. $\square$
</div>
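<p>The proof is constructive enough to run. This sketch (mine) literally scans the image of the multiplication map $m_d$ to find the inverse:</p>

```python
def inverse_mod_p(d, p):
    """Find the inverse of d in Z/pZ by scanning the map a -> d*a,
    which is injective (no zero divisors), hence surjective (finite set)."""
    assert d % p != 0
    for a in range(p):
        if (d * a) % p == 1:
            return a

p = 13
assert all((d * inverse_mod_p(d, p)) % p == 1 for d in range(1, p))
```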
<p>When thinking of $\zmod p$ as a field, we usually denote it $\F_p$ and call it the (finite) field with $p$ elements. This is not the only connection between domains and fields. It’s clear that every subring of a field is a domain, but it turns out the converse also holds.</p>
<blockquote>
<p>Definition<br />
Given a domain $D$, its <strong>field of fractions</strong> $\Frac(D)$ is the field whose elements are formal symbols $\frac ab$ ($a,b\in D$, $b\neq0$) modded out by the relation $\sim$, with addition and multiplication given by<br /><center>
$$\begin{align*}
\frac ab+\frac cd=\frac{ad+bc}{bd} && \frac ab\frac cd=\frac{ac}{bd} && \frac ab\sim\frac cd\iff ad=bc
\end{align*}$$</center>
Note that we can embed $D$ in its field of fractions via the map $d\mapsto\frac d1$</p>
</blockquote>
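<p>For $D=\Z$, this construction recovers $\Q$, and Python’s standard library effectively implements it. A quick check of the defining relation and operations (my own illustration):</p>

```python
from fractions import Fraction  # Frac(Z) is, up to isomorphism, Q

# The defining relation: a/b ~ c/d  <=>  ad = bc
def equivalent(a, b, c, d):
    return a * d == b * c

assert equivalent(1, 2, 3, 6) and Fraction(3, 6) == Fraction(1, 2)

# Addition and multiplication as in the definition:
a, b, c, d = 1, 2, 1, 3
assert Fraction(a, b) + Fraction(c, d) == Fraction(a * d + b * c, b * d)
assert Fraction(a, b) * Fraction(c, d) == Fraction(a * c, b * d)

# The embedding d -> d/1:
assert Fraction(5, 1) == 5
```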
<p>It’s up to you to verify that this construction gives an actual field. After that, forgetting fields for a moment, we look again at the definition of a UFD and realize that units can be annoying, so we’ll turn our attention to something a little more unit-agnostic.</p>
<h1 id="ideals">Ideals</h1>
<p>The big payoffs of the group theory post were both related to the idea of quotient groups. In this section, we’ll see how to define the analogous idea of quotient rings.</p>
<blockquote>
<p>Definition<br />
Given a ring $R$, an <strong>ideal</strong> $I\subseteq R$ is an additive subgroup such that $ar\in I$ for all $a\in I$ and $r\in R$.</p>
</blockquote>
<blockquote>
<p>Remark<br />
Depending on the author, ideals may or may not be rings. They are if you don’t require rings to have unity, but aren’t otherwise.</p>
</blockquote>
<p>We could have followed in the footsteps of the group theory post and defined ideals as kernels of ring maps, but ideals have more of a life of their own than normal subgroups do, so this definition is the one always used.</p>
<blockquote>
<p>Proposition<br />
Given a ring map $f:R\rightarrow S$, its kernel $\ker f\subseteq R$ is an ideal.</p>
</blockquote>
<div class="proof2">
Pf: Exercise to reader
</div>
<p>Now, recall that in abelian groups, all subgroups are normal. Luckily for us, all rings are additive abelian groups, so ideals are automatically normal subgroups. This means we can form the quotient group $R/I$ as before; however, the additional condition that $I$ “absorbs” $R$ ensures that we can endow this quotient with a ring structure.</p>
<blockquote>
<p>Definition<br />
Given a ring $R$ and ideal $I\subseteq R$, the <strong>quotient ring</strong> $R/I$ is the quotient group endowed with the following multiplication of cosets<br /><center>$$(a+I)(b+I)=ab+I$$</center></p>
</blockquote>
<blockquote>
<p>Exercise<br />
Verify that this definition gives a well-defined ring.</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove the first isomorphism theorem for rings: Given a surjective ring map $f:R\rightarrow S$, we have $R/\ker f\simeq S$</p>
</blockquote>
<p>As there are different types of rings, we have different types of ideals depending on how nice the associated quotient ring is. It’s worth convincing yourself that any ideal containing a unit must be all of $R$, and the only ideals of a field are the trivial ones <sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>.</p>
<blockquote>
<p>Definition<br />
Let $R$ be a ring and $I\subseteq R$ an ideal of $R$. We say that $I$ is <strong>prime</strong> if $R/I$ is an integral domain, and $I$ is <strong>maximal</strong> if $R/I$ is a field.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let $R$ be a ring with ideal $I$. Then, $I$ is prime iff $ab\in I\implies a\in I$ or $b\in I$, and $I$ is maximal iff $I\neq R$ and given any ideal $J$ with $I\subseteq J\subseteq R$, either $J=I$ or $J=R$.</p>
</blockquote>
<div class="proof2">
Pf: The statement about prime ideals is left as an exercise, but we'll prove the one about maximal ideals here. $(\rightarrow)$ Assume $I$ is maximal, and pick some ideal $J$ with $I\subseteq J\subseteq R$. Let $f:R\rightarrow R/I$ be the quotient map. It's easily verified that $f(J)$ is an ideal so $f(J)=\{0\}$ or $f(J)=R/I$. In the first case, we have $J\subseteq\ker f=I\implies J=I$. In the second, any preimage of $1\in R/I$ is necessarily a unit, so $J=R$ as it contains a unit. ($\leftarrow$) Conversely, assume that $I\subseteq J\subseteq R\implies J=I$ or $J=R$. Let $f:R\rightarrow R/I$ be the quotient map again, and consider an ideal $\tilde J\subseteq R/I$. It's again easily verified that $f^{-1}(\tilde J)$ is an ideal. Since $0\in\tilde J$, we must have $I\subseteq f^{-1}(\tilde J)\subseteq R$ so $\tilde J=\{0\}$ or $R/I$. This implies that $R/I$ is a field. Indeed, given any nonzero $x\in R/I$, the ideal it generates $(x)=R/I$ so there must be some $y\in R/I$ such that $xy=1$. $\square$
</div>
<p>The above proof makes use of the <strong>ideal generated by x</strong> which is given by $(x)=Rx=\{rx:r\in R\}$. We can generalize this notion to any collection of elements</p>
<blockquote>
<p>Definition<br />
Given a (not necessarily finite) subset $S\subseteq R$, the <strong>ideal generated by S</strong> is the ideal<br /><center>$$\left\{\sum_{s\in S}a_s\cdot s:a_s\in R,\text{all but finitely many }a_s\text{ are zero}\right\}$$</center>
When $S=\{x_1,\dots,x_n\}$ is finite, this is commonly denoted<br /><center>$$(x_1,\dots,x_n)=\sum_{i=1}^nRx_i=\{r_1x_1+\dots+r_nx_n:r_i\in R\}$$</center></p>
</blockquote>
<p>With this, we can define the last special type of ideal in this post.</p>
<blockquote>
<p>Definition<br />
We call an ideal $I\subseteq R$ <strong>principal</strong> (or say it’s <strong>principally generated</strong>) if it is generated by a single element.</p>
</blockquote>
<p>Principal ideals are some of the nicest ideals, and they behave very similarly to numbers (i.e. elements of $R$). However, they have the added benefit that multiplying a generator by a unit changes nothing. Hence, we arrive at our next kind of ring</p>
<blockquote>
<p>Definition<br />
If $R$ is a domain where every ideal is principal, then we call $R$ a <strong>principal ideal domain</strong>, or <strong>PID</strong>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Every PID is a UFD</p>
</blockquote>
<div class="proof2">
Pf: One of my goals this post is to avoid writing any proofs involving UFDs, so omitted.
</div>
<p>Examples of PIDs include $\Z$ and, as we’ll see in a moment, $k[x]$ for $k$ a field. One thing that is true in general is that $(p)$ is a prime ideal if $p$ is a prime element. Given the following theorem, this means that in a PID, every nonzero prime ideal is maximal.</p>
<blockquote>
<p>Theorem<br />
In a PID, an ideal is maximal iff it is generated by an irreducible element</p>
</blockquote>
<div class="proof2">
Pf: $(\leftarrow)$ Let $I=(r)$ where $r\in R$ is irreducible and $R$ is a PID. Consider some ideal $J=(a)$ with $I\subseteq J\subseteq R$. Since $r\in(a)$, there must exist some $b\in R$ with $r=ab$. However, because $r$ is irreducible, either $a$ is a unit or $b$ is. If $a$ is a unit, then $J=R$. If $b$ is a unit, then $J=I$ since unit multiples generate the same ideal. Thus, $I$ is maximal. ($\rightarrow$) Run the same argument in reverse: $I=(r)$ is maximal and $r=ab$ implies $(r)\subseteq(a)\subseteq R$ so fill in the blank. $\square$
</div>
<p>One application of the above theorem is that it lets us generate fields of varying sizes.</p>
<blockquote>
<p>Exercise<br />
Show that if $f(x)\in\F_p[x]$ is an irreducible polynomial, then $\F_p[x]/(f(x))$ is a finite field of size $p^{\deg f}$</p>
</blockquote>
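<p>As a concrete instance of this exercise (a Python sketch of my own), take $p=2$ and $f(x)=x^2+x+1$, which is irreducible over $\F_2$; the quotient has $2^2=4$ elements, and every nonzero element is invertible:</p>

```python
def mul_F4(u, v):
    """Multiply residues a + b*x modulo x^2 + x + 1 over F_2 (so x^2 = x + 1)."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2 = (ac + bd) + (ad + bc + bd)x
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

elements = [(a, b) for a in range(2) for b in range(2)]
assert len(elements) == 4  # p^{deg f} = 2^2 elements

# Every nonzero residue has a multiplicative inverse, so the quotient is a field:
nonzero = [u for u in elements if u != (0, 0)]
assert all(any(mul_F4(u, v) == (1, 0) for v in nonzero) for u in nonzero)
```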
<p>Showing something is a PID directly can be difficult, so it’s sometimes helpful to instead show the stronger condition that your ring has a Euclidean algorithm on it.</p>
<blockquote>
<p>Definition<br />
A <strong>Euclidean domain</strong> $E$ is an integral domain with a function $f:E-\{0\}\rightarrow\Z_{\ge0}$ such that for any $a,b\in E$ with $b\neq0$, there exist $q,r\in E$ where $a=bq+r$ and either $r=0$ or $f(r)<f(b)$.</p>
</blockquote>
<p>In essence, you can perform division in $E$, and there’s a sense in which the remainder is smaller than what you started with. Examples include $\Z$ where $f(n)=|n|$ and any field $k$ with $f(x)=1$. A more interesting example is the Gaussian integers $\Z[i]$ with $f(a+bi)=a^2+b^2$. If you’ve been paying attention, you’ll notice that there was no $R\text{ PID}\implies R[x]\text{ PID}$ theorem; this is because the statement is false (for a counterexample, consider $R=\Z$: the ideal $(2,x)\subset\Z[x]$ is not principal). However, with stronger assumptions, you can get something almost like this.</p>
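<p>Division with remainder in $\Z[i]$ is easy to sketch: divide exactly in $\C$ and round to the nearest Gaussian integer, which forces $N(r)\le N(b)/2<N(b)$. (This is a standard trick; the code below is my own illustration using Python’s complex numbers.)</p>

```python
def gauss_norm(z):
    """N(a + bi) = a^2 + b^2."""
    return z.real ** 2 + z.imag ** 2

def gauss_divmod(a, b):
    """Return (q, r) with a = b*q + r in Z[i] and N(r) < N(b):
    divide exactly in C, then round each part to the nearest integer."""
    exact = a / b
    q = complex(round(exact.real), round(exact.imag))
    r = a - b * q
    return q, r

a, b = complex(7, 3), complex(2, 1)
q, r = gauss_divmod(a, b)
assert a == b * q + r and gauss_norm(r) < gauss_norm(b)
```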
<blockquote>
<p>Theorem<br />
If $k$ is a field, then $k[x]$ is a Euclidean domain</p>
</blockquote>
<div class="proof2">
Pf sketch: For your function $f:k[x]-\{0\}\rightarrow\Z_{\ge0}$, you just use $f(p)=\deg p$. With this choice, polynomial long division gets you what you need. Since we're working over a field, you can always scale the leading coefficient of the divisor to cancel out all higher order terms of the dividend so that the remainder has strictly smaller degree. $\square$
</div>
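<p>The proof sketch translates directly into code. In this illustration (my own, with polynomials as coefficient lists over $\F_p$, lowest degree first and no trailing zeros), the key step is inverting the divisor’s leading coefficient, which is exactly where we use that we’re working over a field:</p>

```python
def poly_divmod(num, den, p):
    """Long division in F_p[x] (p prime); returns (q, r) with num = den*q + r
    and deg r < deg den.  Coefficient lists are lowest degree first."""
    num = num[:]
    q = [0] * max(len(num) - len(den) + 1, 1)
    inv_lead = pow(den[-1], p - 2, p)  # leading coefficient is invertible: field!
    while len(num) >= len(den) and any(num):
        shift = len(num) - len(den)
        coef = (num[-1] * inv_lead) % p
        q[shift] = coef
        for i, d in enumerate(den):
            num[shift + i] = (num[shift + i] - coef * d) % p
        while num and num[-1] == 0:  # trim trailing zeros (the degree dropped)
            num.pop()
    return q, num

# (x^2 + 1) = (x + 1)^2 in F_2[x], so dividing leaves no remainder:
assert poly_divmod([1, 0, 1], [1, 1], 2) == ([1, 1], [])
```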
<blockquote>
<p>Theorem<br />
Every ED is a PID</p>
</blockquote>
<div class="proof2">
Pf: Let $E$ be a Euclidean domain with ideal $I$. Pick non-zero $x\in I$ so that $f(x)$ is minimal among elements of $I$. Now, consider any $a\in I$ and divide to get $a=xq+r$ where $r=0$ or $f(r)< f(x)$. We claim that $a\in(x)$. Note that $r=a-xq\in I$ so $f(r)\ge f(x)$ (if $r\neq0$) by minimality of $x$. This means that $r=0$ so $a=xq\in(x)$ as desired and so $I=(x)$ is principal. $\square$
</div>
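<p>In $\Z$, the proof above specializes to a familiar fact: the ideal generated by several integers is principal, generated by their gcd. A quick sketch (the brute-force Bézout search is my own shortcut to avoid invoking the extended Euclidean algorithm):</p>

```python
from math import gcd

a, b = 12, 18
g = gcd(a, b)  # the ideal (12, 18) in Z is principal, generated by 6

# (a, b) is contained in (g): both generators are multiples of g...
assert a % g == 0 and b % g == 0
# ...and (g) is contained in (a, b): g = ax + by for some integers x, y.
x, y = next((x, y) for x in range(-20, 21) for y in range(-20, 21)
            if a * x + b * y == g)
assert a * x + b * y == g
```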
<p>Hence, the polynomial ring over any field is a PID.</p>
<h1 id="a-glimpse-of-field-theory">A Glimpse of Field Theory</h1>
<p>Hopefully there’s nothing major I forgot to say <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. With this last bit, I want to mention one neat result about fields. For this, I’m going to need to assume you know a little linear algebra: specifically, the definition of a vector space over a field, and the fact that every vector space has a basis. We’ll use this to show that the sizes of fields are pretty constrained.</p>
<blockquote>
<p>Definition<br />
Let $F,E$ be fields and assume that $F\subseteq E$. We call $E$ an <strong>extension field</strong> of $F$ and denote this $E/F$</p>
</blockquote>
<p>One of the most important things about extension fields is that if $E/F$ is a field extension, then $E$ is an $F$-vector space! Although it’s not difficult to see, you should verify this claim. It basically boils down to the fact that multiplication is linear.</p>
<blockquote>
<p>Definition<br />
The <strong>degree</strong> of a field extension $E/F$ is $[E:F]=\dim_FE$, the dimension of $E$ as an $F$-vector space</p>
</blockquote>
<p>With that, our last result</p>
<blockquote>
<p>Theorem<br />
Let $E$ be a finite field. Then, $|E|=p^n$ for some prime $p$ and integer $n$</p>
</blockquote>
<div class="proof2">
Pf: Let $p=\Char E$ which must be prime since it's nonzero. Let $F$ be $E$'s so-called $\textit{prime subfield}$ which is the image of the map $\Z\rightarrow E$. Finally, let $n=[E:F]$ and let $e_1,\dots,e_n\in E$ be an $F$-basis for $E$. Then, every element of $E$ can be written uniquely in the form $$a_1e_1+\dots+a_ne_n$$ where $a_i\in F$. Since $|F|=\Char E=p$, and there are $n$ coefficients to choose, there are $p^n$ expressions of this form and correspondingly $|E|=p^n$. $\square$
</div>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Also, such objects don’t show up in practice that often (read: ever) <a href="#fnref:1" class="reversefootnote">&#8617;</a></p>
</li>
<li id="fn:2">
<p>Matrices are the standard example of non-commutative rings <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>(multiplicative) inverses are unique when they exist; in a ring with unity, (-1)a = -a (i.e. -1 times a is the additive inverse of a); etc. <a href="#fnref:3" class="reversefootnote">&#8617;</a></p>
</li>
<li id="fn:4">
<p>While typing this, I realized I don’t know if people ever work in polynomials over non-commutative or non-unital rings <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>This is shorthand for a map that, given a polynomial p, returns the function taking an element r to p(r), the result of evaluating p at r <a href="#fnref:6" class="reversefootnote">&#8617;</a></p>
</li>
<li id="fn:5">
<p>coefficient of $x^d$ where $d=\deg p$ <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>This is how I’ve always seen characteristic defined, but it leads to confusing notation like n1:=1+1+…+1 (n times), so you write things like n1=(ab)1=(a1)(b1) and it gets annoying to keep track of which 1’s matter and which you can drop because they’re the identity. I much prefer making explicit mention of a ring map <a href="#fnref:7" class="reversefootnote">&#8617;</a></p>
</li>
<li id="fn:8">
<p>previously defined <a href="../solving-pell">here</a> <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>the zero ideal and the field itself <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Worst-case scenario, I just edit this post later on to add in anything missing <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>This is the second post in a series that serves as an introduction to abstract algebra. In the last one, we defined groups as certain sets endowed with a well-behaved operation. Here, we’ll look at rings, which are what you get when your set has two operations defined on it, and we’ll see that much of the theory for groups has a natural analogue in ring theory.Addition Done Right2017-09-20T03:00:00+00:002017-09-20T03:00:00+00:00https://nivent.github.io/blog/addition<p>This post will go over the material already nicely covered by <a href="https://pdfs.semanticscholar.org/b44b/eb7ff396be62e548e4a6dc39df0bdf65e593.pdf">this document</a>, so if you want, you could just read that instead. The main purpose of reproducing things here is for me to think more actively about the ideas presented there, and to see what I’ll want to do differently. This post will assume as much group theory as I covered in my <a href="../group-intro">last post</a>.</p>
<div id="latex-commands" class="latex-commands">
$\DeclareMathOperator{\T}{\mathcal T}
\DeclareMathOperator{\O}{\mathcal O}
\DeclareMathOperator{\H}{\mathcal H}
\DeclareMathOperator{\B}{\mathcal B}
\DeclareMathOperator{\z}{\mathcal Z}
\newcommand{\rep}[2]{\left[#1\mid#2\right]}$
</div>
<p>We will begin by looking at how we add 2-digit numbers, and from there, we’ll consider addition rules different from the normal one we learn in grade school, and this will lead us into some of the ideas that arise in the study of group cohomology. I don’t myself actually know a lot about cohomology, so forgive any mistakes.</p>
<h1 id="the-setup">The setup</h1>
<p>The setup for the meat of this post might seem needlessly complicated, but why we present things this way will become clearer as we move into later sections. As I mentioned before, we begin with grade school addition of two-digit numbers <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Since we only care about two-digit numbers, we conduct our math here in $\Z_{100}\simeq\Z/100\Z$ <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>. $\Z_{100}$ contains $\Z_{10}$ as a subgroup realized as the multiples of $10$; we will denote this subgroup $\T$ for tens. Because getting only one copy of <script type="math/tex">\Z_{10}</script> from <script type="math/tex">\Z_{100}</script> would be boring, we note that $\Z_{100}/\T\simeq\Z_{10}$ as well and call this quotient group $\O$ for ones. For a sneak preview of what’s to come, note that we have the following short exact sequence.</p>
<script type="math/tex; mode=display">\begin{CD}
0 @>>> \T @>>> \Z_{100} @>>> \O @>>> 0
\end{CD}</script>
<p>Above, $0$ denotes the trivial group, and the maps into/out of $\Z_{100}$ are the inclusion and quotient maps, respectively.</p>
<p>Before we can start adding members of $\Z_{100}$, we need to agree on a way to represent its members. To this end, every member of $\Z_{100}$ will be represented as $\rep ab$ where $a\in\T$ and $b\in\O$. In this notation we have, for example, $43=\rep43$, $8=\rep08$, and $90=\rep90$. Now, if we want to write the usual addition law, we cannot simply say that $\rep ab+\rep cd=\rep{a+c}{b+d}$ since $\rep78+\rep03=\rep81\neq\rep71$. In particular, the addition law here comes equipped with a notion of carry so that in general</p>
<script type="math/tex; mode=display">\rep{a_1}{b_1}+\rep{a_2}{b_2}=\rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}</script>
<p>where $z:\O\times\O\rightarrow\T$ is the function defined by $z(b_1,b_2)=1$ when $b_1+b_2\ge10$ and $z(b_1,b_2)=0$ otherwise <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
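<p>This addition law is easy to implement and check exhaustively (a Python sketch of my own, representing $\rep ab$ as the pair of digits $(a,b)$):</p>

```python
def z(b1, b2):
    """The standard carrying function: one ten carried iff the ones overflow."""
    return 1 if b1 + b2 >= 10 else 0

def add(x, y):
    """Addition of symbols [a|b], represented as digit pairs (a, b)."""
    a1, b1 = x
    a2, b2 = y
    return ((a1 + a2 + z(b1, b2)) % 10, (b1 + b2) % 10)

def to_int(x):
    return 10 * x[0] + x[1]

# The example from the text: [7|8] + [0|3] = [8|1]
assert add((7, 8), (0, 3)) == (8, 1)

# And the law agrees with ordinary addition in Z/100Z everywhere:
from itertools import product
assert all(to_int(add(u, v)) == (to_int(u) + to_int(v)) % 100
           for u in product(range(10), repeat=2)
           for v in product(range(10), repeat=2))
```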
<blockquote>
<p>Aside<br />
The above definition for $z$ is technically nonsense since $b_1,b_2\in\O$ and there’s no notion of ordering in $\O$ (even if there was, $10=0$ in $\O$ so it wouldn’t be helpful here). To make the definition rigorous, we would want to introduce a specific mapping from $\O$ to $\Z$ and then use the order on $\Z$ to be able to compactly describe $z$. However, this is just boilerplate so I didn’t bother.</p>
</blockquote>
<p>This set of symbols ($\rep ab$ for $a\in\T$ and $b\in\O$) along with this addition law completely characterizes $\Z_{100}$. We will soon see what happens when we define different addition laws, but first we will observe an interesting property of $z$. Recall that addition is associative so</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(\rep{a_1}{b_1} + \rep{a_2}{b_2}) + \rep{a_3}{b_3} &= \rep{a_1}{b_1} + (\rep{a_2}{b_2} + \rep{a_3}{b_3})\\
\rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2} + \rep{a_3}{b_3} &= \rep{a_1}{b_1} + \rep{a_2+a_3+z(b_2,b_3)}{b_2+b_3}\\
\rep{a_1+a_2+a_3+z(b_1,b_2)+z(b_1+b_2,b_3)}{b_1+b_2+b_3} &= \rep{a_1+a_2+a_3+z(b_2,b_3)+z(b_1,b_2+b_3)}{b_1+b_2+b_3}\\
z(b_1,b_2) - z(b_1,b_2+b_3) &+ z(b_1+b_2,b_3) - z(b_2,b_3) = 0
\end{align*} %]]></script>
<p>The equation we got at the end we call the <strong>cocycle condition</strong>. We also observe that $z$ satisfies the so-called <strong>normalization condition</strong> which says that $z(0,b)=0=z(b,0)$ for all $b\in\O$. Inspired by this, we make the following definition.</p>
<blockquote>
<p>Definition<br />
A function $z:\O\times\O\rightarrow\T$ is called a <strong>cocycle</strong> if it satisfies the cocycle and normalization conditions.</p>
</blockquote>
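<p>Since $\O$ is finite, we can verify by exhaustive search that the standard carrying function really satisfies both conditions (a sketch of my own; the sums inside $z$’s arguments are taken mod 10 since they live in $\O$, and equality is checked in $\T\simeq\Z_{10}$):</p>

```python
def z(b1, b2):
    """The standard carrying function on ones digits."""
    return 1 if b1 + b2 >= 10 else 0

# Normalization: z(0, b) = 0 = z(b, 0)
assert all(z(0, b) == 0 == z(b, 0) for b in range(10))

# Cocycle condition: z(b1,b2) - z(b1,b2+b3) + z(b1+b2,b3) - z(b2,b3) = 0 in T
assert all(
    (z(b1, b2) - z(b1, (b2 + b3) % 10)
     + z((b1 + b2) % 10, b3) - z(b2, b3)) % 10 == 0
    for b1 in range(10) for b2 in range(10) for b3 in range(10)
)
```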
<p>So this is interesting. By looking at addition of 2-digit numbers, we arrived at a way of completely characterizing $\Z_{100}$ in terms of symbols $\rep ab$ formed from members of two groups - $\T$ and $\O$ - and a so-called cocycle. A natural question is whether or not we can get other groups from different choices of a cocycle.</p>
<p>The answer of course is yes. We will see this in more generality in a second, but first a quick example. Consider the trivial cocycle given by $z(b_1,b_2)=0$ for all $b_1,b_2\in\O$. With this choice of cocycle, addition is given by $\rep ab+\rep cd=\rep{a+c}{b+d}$ so that for example</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\rep27 &+& \rep41 &=& \rep68\\
\rep99 &+& \rep05 &=& \rep94\\
\rep43 &+& \rep27 &=& \rep60
\end{matrix} %]]></script>
<p>It is not too difficult to see that this cocycle gives rise to the group $\Z_{10}\times\Z_{10}$ with the identification given by $\rep ab\leftrightarrow(a,b)$.</p>
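<p>One way to convince yourself of this, beyond writing down the identification, is to note that under this law no element has additive order 100, so the resulting group can’t be cyclic. A quick check of my own:</p>

```python
def add(x, y):
    """Addition with the trivial cocycle: componentwise mod 10."""
    return ((x[0] + y[0]) % 10, (x[1] + y[1]) % 10)

def order(x):
    """Additive order of x."""
    total, k = x, 1
    while total != (0, 0):
        total = add(total, x)
        k += 1
    return k

# The example sums above:
assert add((2, 7), (4, 1)) == (6, 8)
assert add((9, 9), (0, 5)) == (9, 4)

# Every element has order dividing 10, so the group is Z/10 x Z/10, not Z/100:
assert max(order((a, b)) for a in range(10) for b in range(10)) == 10
```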
<h1 id="extensions-are-cocycles">Extensions are cocycles</h1>
<p>In the previous section, we saw how the standard carrying function lets us construct $\Z_{100}$ from $\T$ and $\O$. We also saw that replacing this function by the $0$ function instead lets us construct $\Z_{10}\times\Z_{10}$. In general, any cocycle $z:\O\times\O\rightarrow\T$ gives rise to some (abelian) group of order 100 constructed from $\T$ and $\O$. One simple source of cocycles is multiples of the standard carrying function, and in fact it can be shown that these get you all abelian groups of order 100. The question still remains, “which other choices of $z$ let us construct abelian groups?”. $z$ cannot be any function because it has to be a cocycle, but which functions are cocycles?</p>
<blockquote>
<p>Exercise<br />
Show that any cocycle gives rise to an abelian group. That is, if $z:\O\times\O\rightarrow\T$ is a cocycle, then we can form an abelian group on the symbols $\rep ab$ for $a\in\T$ and $b\in\O$ with addition given by<center>
$$\rep{a_1}{b_1} + \rep{a_2}{b_2} = \rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}$$</center></p>
</blockquote>
<p>Recall that an extension $E$ of $\T$ by $\O$ is an abelian group s.t. a short exact sequence of the following form exists</p>
<script type="math/tex; mode=display">\begin{CD}
0 @>>> \T @>>> E @>>> \O @>>> 0
\end{CD}</script>
<p>Furthermore, existence of such a sequence implies that $\T\le E$ and $E/\T\simeq\O$. Now, let $E$ be an arbitrary extension of $\T$ by $\O$ with quotient map $p:E\rightarrow\O$. We want to describe the possible group structures of $E$. First, since $\T\le E$, for any $a\in\T$ write its corresponding element of $E$ as $\rep a0$. Now, for each $b\in\O$, pick some element $x_b\in E$ with $p(x_b)=b$, and write this element as $x_b=\rep0b\in E$. Finally, for any $a\in\T$ and $b\in\O$, let $\rep ab$ denote the element $\rep a0+\rep0b\in E$.</p>
<blockquote>
<p>Theorem<br />
Every element of $E$ can be written uniquely in the form $\rep ab$.</p>
</blockquote>
<div class="proof3">
Pf: Pick any $x\in E$, and let $b=p(x)$. Note that $p(x-\rep0b)=p(x)-p(\rep0b)=0$. Hence, $x-\rep0b\in\ker p=\T$ so choose an $a\in\T$ such that $\rep a0=x-\rep0b$. Then, $x=\rep ab$. For uniqueness, note that there are 100 elements of $E$ of the form $\rep ab$. Furthermore, since $E/\T\simeq\O$, Lagrange's Theorem implies that $|E|/|\T|=|\O|$ so $E$ only has 100 elements. The theorem follows. $\square$
</div>
<p>The above theorem lets us safely assume elements of $E$ are written in the form $\rep ab$. We now wish to understand addition on $E$. From viewing $\T$ as a subgroup of $E$, we can see that</p>
<script type="math/tex; mode=display">\rep{a_1}0+\rep{a_2}0=\rep{a_1+a_2}0</script>
<p>We can then ask what happens when we instead add $\rep0{b_1}+\rep0{b_2}$. This must be $\rep ab$ for some $a\in\T$ and $b\in\O$. By applying $p$ to the sum, we see that $b=b_1+b_2$. However, we do not know for sure what $a$ is. It will be $0$ in the case that $E\simeq\T\times\O$ but not always.</p>
<blockquote>
<p>Definition<br />
Given an extension $E$ of $\T$ by $\O$, the <strong>associated cocycle</strong> of $E$ is the function $z:\O\times\O\rightarrow\T$ given by the formula<br /><center>
$$\rep0{b_1}+\rep0{b_2}=\rep{z(b_1,b_2)}{b_1+b_2}$$</center></p>
</blockquote>
<p>By making use of the associated cocycle, we see that addition on $E$ in general is given by</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\rep{a_1}{b_1} + \rep{a_2}{b_2} &= \rep{a_1}0 + \rep{a_2}0 + \rep0{b_1} + \rep0{b_2} \\
&= \rep{a_1+a_2}0 + \rep{z(b_1,b_2)}{b_1+b_2} \\
&= \rep{a_1+a_2}0 + \rep{z(b_1,b_2)}0 + \rep0{b_1+b_2} \\
&= \rep{a_1+a_2+z(b_1,b_2)}0 + \rep0{b_1+b_2} \\
&= \rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}
\end{align*} %]]></script>
<blockquote>
<p>Theorem<br />
Let $E$ be an extension of $\T$ by $\O$. Then, the associated cocycle $z$ of $E$ actually is a cocycle.</p>
</blockquote>
<div class="proof3">
Pf: The cocycle condition follows from associativity of addition on $E$. Normalization follows from $\rep ab+\rep00=\rep ab=\rep00+\rep ab$. $\square$
</div>
<p>Thus, every extension gives rise to a cocycle, and every cocycle gives rise to an extension <sup id="fnref:3:1"><a href="#fn:3" class="footnote">3</a></sup> !</p>
<h1 id="coboundaries">Coboundaries</h1>
<p>We’ve just shown that the problem of understanding cocycles and which functions they can be is connected to the problem of understanding extensions of $\T$ by $\O$, and so we wonder whether this connection is 1-1; that is, does every choice of cocycle give rise to a different extension?</p>
<blockquote>
<p>Definition<br />
A group homomorphism $\phi:E\rightarrow E’$ of extensions of $\T$ by $\O$ is called an <strong>isomorphism of extensions</strong> if $\phi$ restricts to the identity on $\T$ and the induced map on quotients $\bar\phi:\O\rightarrow\O$ is the identity on $\O$.</p>
</blockquote>
<p>Consider some isomorphism of extensions $\phi:E\rightarrow E’$. The condition that $\phi$ is the identity on $\T$ says that $\phi(\rep a0)=\rep a0$ for any $a\in\T$. The condition that the induced map on quotients is the identity says that $\phi(\rep ab)=\rep{a’}b$ for some $a’\in\T$ depending on $a$ and $b$.</p>
<p>To study this dependence further, let $h:\O\rightarrow\T$ be the function defined by</p>
<script type="math/tex; mode=display">\phi(\rep0b)=\rep{h(b)}b</script>
<p>Then, letting $z,z’$ denote the associated cocycles of $E,E’$ respectively, we can perform the following manipulations to determine a condition linking the associated cocycles of $E$ and $E’$</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\phi(\rep0{b_1} + \rep0{b_2}) &=& \rep{h(b_1)}{b_1} + \rep{h(b_2)}{b_2} &=& \rep{h(b_1)+h(b_2)+z'(b_1,b_2)}{b_1+b_2}\\
\phi(\rep0{b_1} + \rep0{b_2}) &=& \phi(\rep{z(b_1,b_2)}{b_1+b_2}) &=& \rep{z(b_1,b_2)+h(b_1+b_2)}{b_1+b_2}
\end{matrix} %]]></script>
<p>This lets us see that</p>
<script type="math/tex; mode=display">\begin{align*}
h(b_1) + h(b_2) + z'(b_1,b_2) = z(b_1,b_2) + h(b_1+b_2) \implies z(b_1,b_2)-z'(b_1,b_2) = h(b_1) - h(b_1+b_2) + h(b_2)
\end{align*}</script>
<p>so for any two isomorphic extensions, the difference of their cocycles must be of this form. The converse of this claim holds as well.</p>
<blockquote>
<p>Exercise<br />
Suppose that $E,E’$ are extensions with associated cocycles $z,z’:\O\times\O\rightarrow\T$ such that there exists some function $h:\O\rightarrow\T$ for which <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> <center>
$$z(b_1,b_2)-z'(b_1,b_2) = h(b_1) - h(b_1+b_2) + h(b_2)$$</center>
Then, $E$ and $E’$ are isomorphic extensions.</p>
</blockquote>
<p>This is reason enough for functions of this form to be given a definition, prompting</p>
<blockquote>
<p>Definition<br />
Given a function $h:\O\rightarrow\T$ such that $h(0)=0$, its <strong>coboundary</strong> is the function $\delta h:\O\times\O\rightarrow\T$ given by <center>$$ \delta h(b_1,b_2)=h(b_1)-h(b_1+b_2)+h(b_2)$$</center></p>
</blockquote>
<p>Given the above investigation into isomorphic extensions and the exercise, we can state, and have already proven, the following theorem.</p>
<blockquote>
<p>Theorem<br />
Two extensions are isomorphic if and only if their associated cocycles differ by a coboundary.</p>
</blockquote>
<p>One consequence of this is that the correspondence between cocycles and extensions observed in the last section is not 1-1.</p>
<h1 id="cohomology">Cohomology</h1>
<p>In this section, we’ll prove what was for me the most surprising result of the paper. To begin, let $\z(\T;\O)$ and $\B(\T;\O)$ denote the sets of cocycles and coboundaries, respectively.</p>
<blockquote>
<p>Proposition<br />
$\z(\T;\O)$ and $\B(\T;\O)$ are abelian groups.</p>
</blockquote>
<div class="proof3">
Pf: Left as an exercise
</div>
<blockquote>
<p>Proposition<br />
$\B(\T;\O)\le\z(\T;\O)$</p>
</blockquote>
<div class="proof3">
Pf: Pick some coboundary $\delta h\in\B(\T;\O)$. We need to show that it satisfies the cocycle and normalization conditions. The normalization condition follows from the requirement that $h(0)=0$ so <center>$\delta h(0,b) = h(0) - h(b) + h(b) = 0 = h(b) - h(b) + h(0) = \delta h(b,0)$</center>
For the cocycle condition, expand definitions: both $\delta h(b_1,b_2)+\delta h(b_1+b_2,b_3)$ and $\delta h(b_2,b_3)+\delta h(b_1,b_2+b_3)$ reduce to $h(b_1)+h(b_2)+h(b_3)-h(b_1+b_2+b_3)$. $\square$
</div>
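The algebra in that proof is easy to spot-check numerically. A small sketch, again assuming $\T=\O=\Z_{10}$: pick a random normalized $h$ and verify that its coboundary satisfies both conditions.

```python
# Check that coboundaries are cocycles, assuming T = O = Z_10:
# pick a random h with h(0) = 0 and test both conditions for delta h.
import itertools
import random

h = {b: random.randrange(10) for b in range(10)}
h[0] = 0  # normalization requirement on h

def dh(b1, b2):
    # delta h(b1, b2) = h(b1) - h(b1 + b2) + h(b2), computed in Z_10
    return (h[b1] - h[(b1 + b2) % 10] + h[b2]) % 10

# normalization: dh(0, b) = 0 = dh(b, 0)
assert all(dh(0, b) == 0 == dh(b, 0) for b in range(10))

# cocycle condition: dh(b1,b2) + dh(b1+b2,b3) = dh(b2,b3) + dh(b1,b2+b3)
for b1, b2, b3 in itertools.product(range(10), repeat=3):
    assert (dh(b1, b2) + dh((b1 + b2) % 10, b3)) % 10 \
        == (dh(b2, b3) + dh(b1, (b2 + b3) % 10)) % 10
```

Since the asserts hold for a random $h$, you can rerun this as many times as it takes to convince yourself.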
<p>Now, we do the following.</p>
<blockquote>
<p>Definition<br />
The <strong>cohomology group</strong> $\H(\T;\O)$ is the quotient group $\frac{\z(\T;\O)}{\B(\T;\O)}$</p>
</blockquote>
<p>This is amazing because it says exactly that the cohomology group is what you get when you take cocycles, and then consider two of them equal whenever they differ by a coboundary! In other words, the last theorem of the previous section can be restated as</p>
<blockquote>
<p>Theorem<br />
The cohomology group $\H(\T;\O)$ is in bijection with the set of isomorphism classes of extensions of $\T$ by $\O$.</p>
</blockquote>
<p>So we have two abelian groups $\T$ and $\O$. From these we can form several different extensions, some of which are equivalent. If we look at these extensions under our notion of equivalence, they themselves form an abelian group. As such, there must be some notion of addition of extensions recoverable from this identification with $\H(\T;\O)$. We can take two extensions (two abelian groups) and add them in a well-behaved way to produce a new abelian group that is also an extension. That really caught me off guard when I was reading the paper. The last thing we do will be to describe this notion of addition.</p>
<p>Let $E,E’$ be two extensions of $\T$ by $\O$ with associated cocycles $z,z’$. The isomorphism referenced in the above theorem works by sending $E\mapsto z + \B(\T;\O)$, so whatever $E+E’$ is, we should have $E+E’\mapsto z+z’$. Thus, $E+E’$ is (up to isomorphism of extensions) simply the extension with associated cocycle $z+z’$. This is perhaps unsurprising in hindsight, but is still a notion one might not think to consider before coming across this cohomology group. If you would like a more group-theoretic description of $E+E’$ instead of the one I gave in terms of cocycles, then check out page 802 of <a href="https://pdfs.semanticscholar.org/b44b/eb7ff396be62e548e4a6dc39df0bdf65e593.pdf">the paper</a> <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.</p>
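In cocycle terms, extension addition is one line of code. A sketch with $\T=\O=\Z_{10}$ and with the carrying cocycle standing in for both $z$ and $z'$ (my choice for illustration):

```python
# Extension addition via cocycles, assuming T = O = Z_10: E + E' is the
# extension whose associated cocycle is z + z' (pointwise, mod 10).
import itertools

carry = lambda b1, b2: 1 if b1 + b2 >= 10 else 0  # cocycle of Z_100

def cocycle_sum(z1, z2):
    return lambda b1, b2: (z1(b1, b2) + z2(b1, b2)) % 10

w = cocycle_sum(carry, carry)  # the cocycle of Z_100 + Z_100

# w is again a normalized cocycle, so it really does define an extension:
assert all(w(0, b) == 0 == w(b, 0) for b in range(10))
for b1, b2, b3 in itertools.product(range(10), repeat=3):
    assert (w(b1, b2) + w((b1 + b2) % 10, b3)) % 10 \
        == (w(b2, b3) + w(b1, (b2 + b3) % 10)) % 10
```

The asserts confirm that the sum of two cocycles is a cocycle, which is exactly what makes this addition well-defined.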
<h1 id="final-words">Final Words</h1>
<p>The ideas appearing in this post <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> belong to the field of group cohomology, which looks to be pretty interesting. It’s my understanding that there are many types of cohomology out there in use in different fields of mathematics, and I believe cohomology grew out of topology. There, you are interested in characterizing the holes of some space by looking at which loops can be contracted to a point <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>. It’s from considerations here that the terms cocycle and coboundary got their names.</p>
<p>Finally, to satisfy any last curiosities, it is known that $\H(\T;\O)\simeq\Z_{10}$ with an isomorphism $f:\Z_{10}\rightarrow\H(\T;\O)$ determined by $f(1)=z+\B(\T;\O)$ where $z$ is the carrying cocycle (associated to $\Z_{100}$) we started this post off with. This means that every extension of $\T$ by $\O$ arises as some multiple of $\Z_{100}$ in the sense of (repeated) extension addition.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>With a twist, the result can’t have a third digit. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>I’ll stick to the former notation because it’s more compact, but wanted both here because the latter is also common. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Previous exercise showed that cocycles give rise to abelian groups. It’s not hard to see that these groups are extensions of T by O <a href="#fnref:3" class="reversefootnote">↩</a> <a href="#fnref:3:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:4">
<p>the pdf also includes the condition that h(0)=0, but I’m pretty sure that is redundant <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>although it doesn’t say much in how they arrived at such a construction <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>cocycles and coboundaries and their quotient groups and whatnot <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>in particular, if a loop can’t be contracted to a point then this indicates the existence of some hole in the way of the would-be contraction. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>This post will go over the material already nicely covered by this document, so if you want, you could just read that instead. The main purpose of reproducing things here is for me to think more actively about the ideas presented there, and to see what I’ll want to do differently. This post will assume as much group theory as I covered in my last post.Algebra Part I2017-09-12T05:00:00+00:002017-09-12T05:00:00+00:00https://nivent.github.io/blog/group-intro<p>I have ideas for <a href="../addition">a couple</a> posts I want to write, but unfortunately, they both will require some level of familiarity with abstract algebra, and I don’t want to just assume the reader has the necessary prereq and then go on writing them. Instead, I’ve given myself the ambitious <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> goal of introducing most of the relevant algebra (spoiler: <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>) in a series of blog posts beginning with this one on group theory.</p>
<blockquote>
<p>Bit of a Disclaimer<br />
I can’t possibly mention everything on a particular subject in one post, and I am not a particular fan of writing insanely long posts, so some things have to be cut. In particular, I aim to introduce most of the important topics in each subject without necessarily doing a deep dive, and while I will try to mention specific examples of things, I won’t spend too much time looking at them closely. It will be up to you to take the time to make sure the example makes sense. Because of this, I’ll try to include exercises that should be good checks of understanding. Finally, as always, things are presented according to my tastes and according to whatever order they happen to pop into my head; hence, they are not necessarily done the usual way.</p>
</blockquote>
<h1 id="whats-a-group">What’s a group?</h1>
<p>The natural place to start is with the definition of a group. Broadly speaking, groups follow two important themes in mathematics. These are the idea that, in math, we like to study collections of objects that possess some kind of structure, and the idea that symmetry is often beneficial to doing mathematics. With that said, a group is intuitively a collection of symmetries <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> where you can think of a symmetry as some action <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> on an object that preserves its shape. We’ll see this more explicitly with our first example of a group.</p>
<p>Before we get to a formal definition, let’s look at <script type="math/tex">D_8</script>, the group of symmetries of a square. These are all the actions you can perform on a square that leave it visually unchanged. To help make sense of what they are, we’ll visualize them using a square with labelled vertices.</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/d8.jpeg" width="200" height="200" />
</center>
<p>Above is our square starting out. The simplest symmetry we can apply to it is the “do nothing” symmetry. However, that’s not a particularly exciting thing to do, so we spend a little more time thinking about how we can reposition the square, and decide to try rotating it 90 degrees (counterclockwise). After thinking about it some more, we realize we could also flip the square about its main diagonal (from 1 to 3).</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/rot.jpeg" width="200" height="200" />
<img src="https://nivent.github.io/images/blog/group-intro/diag.jpeg" width="200" height="200" />
</center>
<p>Now the interesting thing happens when we try to compose these. What if we rotate and then flip (left image), or flip and then rotate (right image)?</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/fr.jpeg" width="200" height="200" />
<img src="https://nivent.github.io/images/blog/group-intro/rf.jpeg" width="200" height="200" />
</center>
<p>The first thing we notice is that these images are different, so the order of symmetries matters. The second thing we may notice is that the image on the left is the previous right image (the flip) rotated 270 degrees, while the image on the right is the previous left image (the rotation) flipped across the other diagonal. Letting <script type="math/tex">R</script> denote a <script type="math/tex">90\deg</script> rotation, <script type="math/tex">F</script> denote a flip across the main diagonal, and <script type="math/tex">F'</script> denote a flip across the other diagonal, symbolically, we have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
FR = R^3F && RF = F'R
\end{align*} %]]></script>
<p>where <script type="math/tex">AB</script> denotes the result of first applying <script type="math/tex">B</script> and then applying <script type="math/tex">A</script>. Now, there is more that can be said about this group, but I’ll leave the exploring to you.</p>
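These relations are easy to verify mechanically by modeling each symmetry as a permutation of the four vertex positions. A quick sketch (positions numbered $0$–$3$ counterclockwise — my indexing, not the post's labels):

```python
# Symmetries of the square as permutations of vertex positions 0..3
# (counterclockwise); AB means "apply B first, then A", as in the post.
def compose(A, B):
    return tuple(A[B[i]] for i in range(4))

R  = (1, 2, 3, 0)  # rotate 90 degrees counterclockwise: i -> i+1 mod 4
F  = (0, 3, 2, 1)  # flip across the main diagonal (fixes 0 and 2)
Fp = (2, 1, 0, 3)  # flip across the other diagonal (fixes 1 and 3)

R3 = compose(R, compose(R, R))
assert compose(F, R) == compose(R3, F)  # FR = R^3 F
assert compose(R, F) == compose(Fp, R)  # RF = F'R
```

Representing symmetries as permutations like this is no accident: it previews the general fact that every group embeds into a group of permutations.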
<p>From the above example, we can see some properties we might want in order to call something a collection of symmetries (i.e. a group). First, there should be a “do nothing” symmetry that just leaves things unchanged. Second, symmetries don’t necessarily have to commute (that is, <script type="math/tex">AB\neq BA</script> in general), but should always be composable. I didn’t highlight these last two, but we should also expect that applying symmetries one after the other always gets you the same thing as long as you apply them in the same order (e.g. <script type="math/tex">A(BC)</script> and <script type="math/tex">(AB)C</script> should be the same), and we want to be able to undo any symmetry. Putting all these together, we get the following definition.</p>
<blockquote>
<p>Definition<br />
A <strong>group</strong> <script type="math/tex">(G,\star)</script> is a set <script type="math/tex">G</script> together with an operation <script type="math/tex">\star:G\times G\rightarrow G</script> satisfying the following for all <script type="math/tex">a,b,c\in G</script><br />
<script type="math/tex">% <![CDATA[
\begin{align*}
&\bullet a\star(b\star c)=(a\star b)\star c && \text{Associativity}\\
&\bullet \text{there exists an }e\in G\text{ such that }g\star e=g=e\star g\text{ for all }g\in G && \text{Identity}\\
&\bullet \text{there exists a }g\in G\text{ such that }g\star a=e=a\star g && \text{Inverses}
\end{align*} %]]></script><br />
If it turns out that additionally, <script type="math/tex">a\star b=b\star a</script> for all <script type="math/tex">a,b\in G</script>, then <script type="math/tex">G</script> is called an <strong>abelian group</strong> <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></p>
</blockquote>
<p>That’s all there is to it. 3 simple conditions, and we’ll see that groups exhibit some well-behaved properties <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>. Most of the time when talking about a specific group, the underlying operation will be understood, and so I’ll just refer to the group as <script type="math/tex">G</script> instead of <script type="math/tex">(G,\star)</script>. Furthermore, the group operation is usually denoted <script type="math/tex">ab</script> instead of <script type="math/tex">a\star b</script> cause mathematicians are lazy. If the group is abelian, then <script type="math/tex">a+b</script> is also common. Now that we have a definition, try to answer as many of these as you can.</p>
<blockquote>
<p>Question<br />
Are the following groups, abelian groups, or neither?</p>
<ul>
<li>$D_8$</li>
<li>$(\Z,+)$</li>
<li>$(\Z,*)$</li>
<li>$(\R,+)$</li>
<li>$(\R,*)$</li>
<li>$(\R-\{0\},*)$</li>
<li>$(M_2(\R),+)$ the set of <script type="math/tex">2\times2</script> matrices under addition</li>
<li>$(\Q-\{0\},*)$</li>
<li>$D_{2n}$ the set of symmetries of a regular <script type="math/tex">n</script>-gon under composition</li>
<li>$(2\Z, +)$ the even numbers under addition</li>
<li>$(\Z_{12}, +)$ the integers mod 12 under addition</li>
<li>$(\Z_7-\{0\}, *)$ the integers mod 7 (except 0) under multiplication</li>
<li>the empty set under any operation you like</li>
<li>{e} the singleton consisting of only the identity</li>
</ul>
</blockquote>
<h1 id="basic-properties-of-groups">Basic Properties of Groups</h1>
<p>Alright, now that we know what a group is, let’s see some of the benefits of studying them. The most obvious one is generality. If we show something is true for an arbitrary group <script type="math/tex">G</script>, then we automatically know it’s true for the integers or the reals or matrices or what have you. So to start off, let’s prove some basic facts about groups.</p>
<blockquote>
<p>Theorem<br />
The identity element of a group is unique</p>
</blockquote>
<div class="proof3">
Pf: Let $$G$$ be a group, and suppose that both $$e$$ and $$f$$ are identity elements. That is, for any $$a\in G$$, we have $$ae=a=ea$$ and $$af=a=fa$$. Hence, the theorem follows from<br />
$$\begin{align*}
e=ef=f
\end{align*}$$
where we used the fact that $$f$$ is the identity on the left equality, and the fact that $$e$$ is the identity on the right equality. $$\square$$
</div>
<p>The above theorem maybe isn’t too surprising. It basically says that there’s only one way to do nothing. The next theorem maybe isn’t surprising either.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">a\in G</script> be an element of a group. Then, <script type="math/tex">a</script> has a unique inverse.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<p>Now that we know that inverses are unique, we’ll denote the inverse of <script type="math/tex">a\in G</script> by <script type="math/tex">a^{-1}</script> (or <script type="math/tex">-a</script> if <script type="math/tex">G</script> is abelian). We’ll see a couple more proofs, and then we’ll get a look at something maybe a little less abstract.</p>
<blockquote>
<p>Theorem (Socks and Shoes <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>)<br />
You put socks on before wearing shoes, but you have to remove your shoes before you can remove your socks. Symbolically, in a group <script type="math/tex">G</script>, for any <script type="math/tex">a,b\in G</script>, we have <script type="math/tex">(ab)^{-1}=b^{-1}a^{-1}</script></p>
</blockquote>
<div class="proof3">
Pf: <br />
$$\begin{align*}
(ab)(b^{-1}a^{-1}) &= a(bb^{-1})a^{-1}\\ &= aea^{-1}\\ &= aa^{-1} \\ &= e
\end{align*}$$<br />
Since inverses are unique, we must have $$(ab)^{-1}=b^{-1}a^{-1}$$.$$\square$$
</div>
<blockquote>
<p>Theorem<br />
If <script type="math/tex">(G,\star_G)</script> and <script type="math/tex">(H,\star_H)</script> are groups, then we can form the group <script type="math/tex">G\times H</script> of pairs of elements of <script type="math/tex">G</script> and <script type="math/tex">H</script> with product <script type="math/tex">(g_1,h_1)(g_2,h_2)=(g_1\star_Gg_2,h_1\star_Hh_2)</script></p>
</blockquote>
<div class="proof3">
Pf: Exercise to the reader. Verify the group properties.
</div>
<p>Note that the above group is called the <strong>direct product</strong> of <script type="math/tex">G</script> and <script type="math/tex">H</script>. It is sometimes also denoted <script type="math/tex">G\oplus H</script>, and called their <strong>direct sum</strong>. These notions are the same here, but differ if you start talking about the direct product/sum of infinite collections of groups; however, we won’t get into that here.</p>
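The construction is simple enough to sketch directly. Here I use $\Z_2$ and $\Z_3$ as the two factors (my choice of example, not one from the post):

```python
# Direct product sketch: elements of G x H are pairs, multiplied
# componentwise. Illustrated with G = Z_2 and H = Z_3 under addition.
def prod_op(op_g, op_h):
    return lambda x, y: (op_g(x[0], y[0]), op_h(x[1], y[1]))

add2 = lambda a, b: (a + b) % 2
add3 = lambda a, b: (a + b) % 3
op = prod_op(add2, add3)

# Repeatedly adding (1, 2) hits all 6 elements, so Z_2 x Z_3 is cyclic
# (in fact, isomorphic to Z_6).
g, seen = (0, 0), set()
for _ in range(6):
    g = op(g, (1, 2))
    seen.add(g)
assert len(seen) == 6
```

The little loop at the end is a preview of the cyclic groups discussed in the next section: one element of the product can generate the whole thing.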
<h1 id="structure-of-groups">Structure of Groups</h1>
<p>Let’s return to our <script type="math/tex">D_8</script> example. Recall that <script type="math/tex">R</script> denotes rotation by 90 degrees and <script type="math/tex">F</script> denotes a flip about the main diagonal. A fact that I will not prove here, but that you can spend some time convincing yourself of, is that <script type="math/tex">D_8</script> is generated by <script type="math/tex">F</script> and <script type="math/tex">R</script> in the following sense.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">G</script> be a group and <script type="math/tex">A\subset G</script> be a subset of <script type="math/tex">G</script>. Then, we say <script type="math/tex">G</script> is <strong>generated by</strong> <script type="math/tex">A</script> if every element of <script type="math/tex">G</script> is some (finite) product of elements of <script type="math/tex">A</script>. Furthermore, if <script type="math/tex">A</script> is finite <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>, then we say that <script type="math/tex">G</script> is <strong>finitely generated</strong>.</p>
</blockquote>
<p>The remark about the definition amounts to saying that any symmetry of a square is really just the result of a bunch of flips and rotations. I mention this just out of curiosity. I had planned on using it as justification in giving a more explicit description of <script type="math/tex">D_8</script>, but unfortunately, the description would be longer than I’m willing to type out, so let’s look at a different group instead.</p>
<p>Our new star group is <script type="math/tex">\Z_4</script>, which is the integers under addition (mod 4). This group is special for a few reasons, but most of these are a result of it being generated by 1 element, which motivates the following definition.</p>
<blockquote>
<p>Definition<br />
A group <script type="math/tex">G</script> is called <strong>cyclic</strong> if it is generated by a single element. That is, it is cyclic if there exists some <script type="math/tex">g\in G</script> s.t. every <script type="math/tex">a\in G</script> can be written in the form <script type="math/tex">g^n</script> for some integer <script type="math/tex">n</script>.</p>
</blockquote>
<p>In order to better understand <script type="math/tex">\Z_4</script>’s structure, we will look at its “multiplication” table.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{c | c | c | c | c}
\Z_4 & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 2 & 3 \\ \hline
1 & 1 & 2 & 3 & 0 \\ \hline
2 & 2 & 3 & 0 & 1 \\ \hline
3 & 3 & 0 & 1 & 2 \\
\end{array} %]]></script>
<p>Now, let’s consider the group <script type="math/tex">\Z_{10}^\times</script>. This is the integers (mod 10) that are coprime to 10 under multiplication. Hence, its multiplication table looks like</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{c | c | c | c | c}
\Z_{10}^\times & 1 & 3 & 7 & 9 \\ \hline
1 & 1 & 3 & 7 & 9 \\ \hline
3 & 3 & 9 & 1 & 7 \\ \hline
7 & 7 & 1 & 9 & 3 \\ \hline
9 & 9 & 7 & 3 & 1 \\
\end{array} %]]></script>
<p>Now, you look at this and ask “what’s the point?” cause we’re just looking at some random tables. However, an important observation is that things like <script type="math/tex">\{0,1,2,3\}</script> and <script type="math/tex">\{1,3,7,9\}</script> are just symbols. It doesn’t really matter what we call the elements of the group; all that’s important is how they relate to each other. As an exercise in this way of thinking, let’s relabel these tables using the following mappings</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
&\Z_4 && \Z_{10}^\times\\
&0\mapsto a && 1\mapsto a\\
&1\mapsto b && 3\mapsto b\\
&2\mapsto c && 9\mapsto c\\
&3\mapsto d && 7\mapsto d
\end{align*} %]]></script>
<p>If we do that and remake the tables we will get the following for <script type="math/tex">\Z_4</script></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{ c | c | c | c | c}
\Z_4 & a & b & c & d \\ \hline
a & a & b & c & d \\ \hline
b & b & c & d & a \\ \hline
c & c & d & a & b \\ \hline
d & d & a & b & c \\
\end{array} %]]></script>
<p>Repeat this process for <script type="math/tex">\Z_{10}^\times</script> and look at the table you get. If everything went as planned, you will see you ended up with exactly the same table. Since we said that the important thing was not the symbols but how they interacted with each other, this gives us complete justification in saying that, in some sense, <script type="math/tex">\Z_4</script> and <script type="math/tex">\Z_{10}^\times</script> are the same group. This is an idea we will now make rigorous via the notion of structure-preserving maps.</p>
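Rather than relabeling tables by hand, you can also let a computer check the claim. The mapping above sends $n\mapsto 3^n\bmod 10$, and a short sketch confirms it turns addition mod 4 into multiplication mod 10:

```python
# The relabeling of Z_4 onto Z_10^x is n -> 3^n mod 10
# (0 -> 1, 1 -> 3, 2 -> 9, 3 -> 7), and it turns + mod 4 into * mod 10.
f = {n: pow(3, n, 10) for n in range(4)}
assert sorted(f.values()) == [1, 3, 7, 9]  # hits every element exactly once

for a in range(4):
    for b in range(4):
        assert f[(a + b) % 4] == (f[a] * f[b]) % 10
```

The double loop is literally comparing the two multiplication tables entry by entry, which is the same computation done above with symbols.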
<blockquote>
<p>Definition<br />
Let <script type="math/tex">(G,\star_G)</script> and <script type="math/tex">(H,\star_H)</script> be two groups. A <strong>homomorphism</strong> or <strong>group map</strong> <script type="math/tex">f:G\rightarrow H</script> is a function with the property that for any <script type="math/tex">a,b\in G</script>, we have <script type="math/tex">f(a\star_Gb)=f(a)\star_Hf(b)</script>. If furthermore, <script type="math/tex">f</script> is injective, then we call it an <strong>embedding</strong> of <script type="math/tex">G</script> into <script type="math/tex">H</script>, and if <script type="math/tex">f</script> is bijective, then it is called an <strong>isomorphism</strong> and we say that <script type="math/tex">G</script> and <script type="math/tex">H</script> are <strong>isomorphic</strong> groups and denote this <script type="math/tex">G\simeq H</script>.</p>
</blockquote>
<p>In essence, homomorphisms let us relate the structures of two groups by saying that they are doing something similar. If the homomorphism is injective, then it is essentially saying that a copy of <script type="math/tex">G</script> lives inside of <script type="math/tex">H</script>. An example of this is the following</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
f:&\Z_3 &\longrightarrow& \Z_{15}\\
&a &\longmapsto& 5a
\end{matrix} %]]></script>
<p>It is not hard to verify that this is an injective homomorphism, so it lets us realize <script type="math/tex">\Z_3</script> as sitting inside of <script type="math/tex">\Z_{15}</script> in the form <script type="math/tex">f(\Z_3)=\{0,5,10\}</script>. By looking at their multiplication tables, we showed earlier that <script type="math/tex">\Z_4\simeq\Z_{10}^\times</script>. To get a better handle on homomorphisms and see why they are the natural type of function to consider for groups, we’ll prove some quick theorems</p>
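If you'd rather not verify it by hand, the check takes only a few lines:

```python
# The map a -> 5a from Z_3 to Z_15: a homomorphism with image {0, 5, 10}.
f = lambda a: (5 * a) % 15

# homomorphism property: f(a + b) = f(a) + f(b), with + taken mod 3
# on the left and mod 15 on the right
for a in range(3):
    for b in range(3):
        assert f((a + b) % 3) == (f(a) + f(b)) % 15

image = {f(a) for a in range(3)}
assert image == {0, 5, 10} and len(image) == 3  # injective
```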
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Let <script type="math/tex">e_G,e_H</script> be the identities of <script type="math/tex">G</script> and <script type="math/tex">H</script>, respectively. Then, <script type="math/tex">f(e_G)=e_H</script></p>
</blockquote>
<div class="proof3">
Pf: Fix any $$g\in G$$ so $$f(g)=f(ge_G)=f(g)f(e_G)$$. If we multiply both sides (on the left) by $$f(g)^{-1}$$, then this equation becomes $$e_H=f(g)^{-1}f(g)=f(g)^{-1}f(g)f(e_G)=f(e_G)$$ $$\square$$
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Then, for any <script type="math/tex">a\in G</script> and <script type="math/tex">n\in\mathbb Z</script>, we have <script type="math/tex">f(a^n)=f(a)^n</script>.</p>
</blockquote>
<div class="proof3">
Pf: We will show that $$f(a^{-1})=f(a)^{-1}$$ and the theorem will follow from an easy induction argument I won't bother doing. This case is an immediate consequence of $$f(a)f(a^{-1})=f(aa^{-1})=f(e_G)=e_H$$$$\square$$
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">G</script> be a group and fix some <script type="math/tex">g\in G</script>. The function <script type="math/tex">f:G\rightarrow G</script> defined by <script type="math/tex">f(x)=gx</script> is a bijection, but not a homomorphism (unless <script type="math/tex">g=e</script>).</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be an isomorphism. Then, <script type="math/tex">f^{-1}:H\rightarrow G</script> is also an isomorphism.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\simeq</script> is an equivalence relation on the class of groups.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<h1 id="subgroups">Subgroups</h1>
<p>When introducing embeddings, I mentioned that they give us a way of viewing one group inside of another. This idea is formalized via the notion of subgroups, which are pretty much exactly what they sound like.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\subseteq G</script> be a subset of a group <script type="math/tex">G</script>. We say <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script>, denoted <script type="math/tex">H\le G</script>, if <script type="math/tex">H</script> itself is a group under the operation of <script type="math/tex">G</script>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Then, <script type="math/tex">f(G)\subseteq H</script> is a subgroup of <script type="math/tex">H</script>. Furthermore, if <script type="math/tex">f</script> is injective, then <script type="math/tex">f(G)\simeq G</script>.</p>
</blockquote>
<div class="proof3">
Pf: Let $$I:=f(G)$$. To show that $$I$$ is a group, we need to show it contains the identity, has inverses, and its multiplication is associative. Well, $$f(e_G)=e_H$$ and $$f(e_G)\in I$$ by definition, so we're good there. Furthermore, any element of $$I$$ can be written in the form of $$f(g)$$ for some $$g\in G$$. Hence, $$f(g^{-1})=f(g)^{-1}$$ is in $$I$$ as well, so we have inverses. Associativity follows from the fact that $$I$$'s multiplication is $$H$$'s multiplication, which is associative. Finally, $$f\mid_I$$ is surjective by definition and so an isomorphism when $$f$$ is injective. $$\square$$
</div>
<p>One thing worth mentioning is that while the definition of a subgroup requires that it first be a subset, we usually ignore this part and call any <script type="math/tex">H</script> embeddable into <script type="math/tex">G</script> a subgroup of <script type="math/tex">G</script>. Now that we have this notion of a subgroup, let’s see what we can do with it.</p>
<blockquote>
<p>Theorem (2-step subgroup test)<br />
Let <script type="math/tex">H\subseteq G</script> be a non-empty subset of a group. Then, <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script> if it is closed under multiplication, and contains inverses for each element.</p>
</blockquote>
<div class="proof3">
Pf: Multiplication on $$H$$ inherits associativity from multiplication on $$G$$, and it has inverses by assumption. The group operation of $$H$$ is well-defined since it is closed, so the only thing to verify is that $$H$$ contains the identity. $$H$$ is non-empty so pick some $$h\in H$$. By assumption $$h^{-1}\in H$$ as well. Since $$H$$ is closed under multiplication, $$hh^{-1}\in H$$ so $$H$$ contains the identity. $$\square$$
</div>
<blockquote>
<p>Theorem (1-step subgroup test)<br />
Let <script type="math/tex">H\subseteq G</script> be a non-empty subset of <script type="math/tex">G</script>. Then, <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script> if for all <script type="math/tex">a,b\in H</script>, we have <script type="math/tex">ab^{-1}\in H</script> as well.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">G</script> be a group and fix some element <script type="math/tex">a\in G</script>. We say <script type="math/tex">\langle a\rangle=\{a^n:n\in\Z\}</script> is the <strong>cyclic group generated by <script type="math/tex">a</script></strong>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\langle a\rangle\le G</script></p>
</blockquote>
<div class="proof3">
Pf: Pick any two elements, $a^n,a^m\in\gen a$. Then, $(a^n)(a^m)^{-1}=(a^n)(a^{-m})=a^{n-m}\in\gen a$. Furthermore, $\gen a$ is visibly non-empty, so it's a subgroup by the 1-step test. $\square$
</div>
<p>So as an example, in <script type="math/tex">\Z</script>, <script type="math/tex">\langle3\rangle=3\Z</script> the multiples of <script type="math/tex">3</script>. This brings up a good source of confusion. When considering <script type="math/tex">\Z</script> as a group under addition, <script type="math/tex">3^n</script> is not <script type="math/tex">3</script> raised to the <script type="math/tex">n</script>th power, but instead <script type="math/tex">3</script> times <script type="math/tex">n</script>. Luckily, <script type="math/tex">\Z</script> is abelian so this is normally written as <script type="math/tex">3n</script>, but I just wanted to clarify.</p>
<p>One important notion in group theory, which unfortunately plays a minor role in this post, is that of the order of an element. For a group <script type="math/tex">G</script>, the order of the group is simply its size. For an element <script type="math/tex">a\in G</script>, its order, denoted $|a|$, is the smallest positive exponent $n$ s.t. <script type="math/tex">a^n=e</script>. If no such <script type="math/tex">n</script> exists, then <script type="math/tex">a</script> is said to have infinite order. For a finite group, every element has some finite order (why?). Calling this the order of <script type="math/tex">a</script> is justified by the following.</p>
<blockquote>
<p>Exercise<br />
<script type="math/tex">|a|=|\gen a|</script></p>
</blockquote>
<p>Note that a (finite) cyclic group is one where the order of some element is the order of the group. Furthermore, since I didn’t make this an exercise before, show that any cyclic group is abelian.</p>
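<p>If you want to sanity-check the exercise numerically, here’s a small sketch of my own (Python, again writing <script type="math/tex">\Z_{12}</script> additively) verifying <script type="math/tex">|a|=|\gen a|</script> for every element:</p>

```python
def order(a, n):
    """Order of a in the additive group Z_n: least k >= 1 with k*a ≡ 0 (mod n)."""
    k, cur = 1, a % n
    while cur != 0:
        cur = (cur + a) % n
        k += 1
    return k

def gen(a, n):
    """The cyclic subgroup <a> of Z_n."""
    return {(k * a) % n for k in range(n)}

# |a| = |<a>| for every a in Z_12
assert all(order(a, 12) == len(gen(a, 12)) for a in range(12))
print("checked")
```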
<blockquote>
<p>Definition<br />
Let $f:G\rightarrow H$ be a group homomorphism. We define the <strong>kernel</strong> of $f$ as <script type="math/tex">\ker f=\{g\in G:f(g)=e\}</script> the set of elements mapped to the identity.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a group homomorphism. Then, $\ker f\le G$</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
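<p>As a concrete instance of this theorem (a Python sketch of my own, not from the post), take the homomorphism <script type="math/tex">f:\Z_{12}\rightarrow\Z_4</script> given by reduction mod <script type="math/tex">4</script>, and run its kernel through the 1-step subgroup test; written additively, <script type="math/tex">ab^{-1}</script> becomes <script type="math/tex">a-b</script>:</p>

```python
# f : Z_12 -> Z_4, f(x) = x mod 4; this is a homomorphism because 4 divides 12.
f = lambda x: x % 4

kernel = [x for x in range(12) if f(x) == 0]
print(kernel)                    # [0, 4, 8]

# 1-step subgroup test: a - b stays in the kernel for all a, b in it.
assert all((a - b) % 12 in kernel for a in kernel for b in kernel)
```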
<p>Since we now know some stuff about homomorphisms and subgroups, the majority of what follows will focus on proving two main theorems: Lagrange’s Theorem and the First Isomorphism Theorem. After that, I will mention something that will be useful for one of the things I wanna talk about in a future post.</p>
<h1 id="cosets">Cosets</h1>
<p>The goal for this section is to find a way to generalize modular arithmetic to arbitrary groups. In modular arithmetic we can say things like $7\equiv4\pmod3$ when $3\mid(7-4)$, but if you generalize division to groups in the obvious way, you don’t get anything useful; you’d end up with any two elements being equivalent as long as you modded out by something non-zero (why?). Because of this, instead of building off of division, we will follow the idea that modding out by $3$ is a way of treating $3$ as being $0$; this choice will manifest itself in the important role played by kernels.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\le G</script> be a subgroup. We say that <script type="math/tex">H</script> is <strong>normal</strong> if it is the kernel of some homomorphism. We denote this <script type="math/tex">H\trianglelefteq G</script>.</p>
</blockquote>
<p>Returning to our <script type="math/tex">7\equiv4\pmod3</script> example, from the perspective of $3$ being $0$ (i.e. $3\Z$ being normal), this equivalence is really expressing that $7=4+3\equiv4+0=4$. We can take this a step further by writing <script type="math/tex">7=1+2*3</script> and $4=1+1*3$ which makes it apparent that they are equivalent because they are both $1$ more than a multiple of $3$. In the context of general groups, if we are going to treat some subgroup as being $0$, then any elements that are a fixed amount more than members of the subgroup should similarly be considered equivalent.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\le G</script> be a subgroup, and fix some element <script type="math/tex">a\in G</script>. A <strong>(left) coset</strong> of <script type="math/tex">H</script> is a set of the form <script type="math/tex">aH=\{ah:h\in H\}</script></p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove or disprove. For any subgroup <script type="math/tex">H\le G</script> and element <script type="math/tex">a\in G</script>, we have that <script type="math/tex">aH=Ha</script> where <script type="math/tex">Ha:=\{ha:h\in H\}</script> is a right coset.</p>
</blockquote>
<p>Hopefully you did the exercise. It turns out to be false in general, but miraculously, it is true for normal subgroups.</p>
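<p>In case you skipped the exercise, here’s a quick counterexample you can run (my own Python sketch; permutations of <script type="math/tex">\{0,1,2\}</script> stored as tuples, so this is <script type="math/tex">S_3</script>):</p>

```python
from itertools import permutations

def compose(p, q):
    """(p∘q)(i) = p(q(i)), with permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

H = [(0, 1, 2), (1, 0, 2)]            # {identity, the transposition (0 1)} in S_3
a = (1, 2, 0)                         # a 3-cycle

left  = {compose(a, h) for h in H}    # aH
right = {compose(h, a) for h in H}    # Ha
print(left == right)                  # False: this H is not normal in S_3
```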
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script> be a normal subgroup. Then, every left coset of <script type="math/tex">H</script> is also a right coset (and vice versa).</p>
</blockquote>
<div class="proof3">
Pf: Let $f:G\rightarrow K$ be a homomorphism with $\ker f=H$, and fix any $a\in G$. Then, an arbitrary element of $aH$ has the form $ah$ where $h\in H$. Note that $ah=aha^{-1}a$. To complete the proof we see that $f(aha^{-1})=f(a)f(h)f(a)^{-1}=f(a)ef(a)^{-1}=e$ so $aha^{-1}\in\ker f=H$ and so $ah=(aha^{-1})a\in Ha$. This shows that $aH\subseteq Ha$. The other direction can be shown analogously, so $aH=Ha$ for all $a$, which is more than sufficient to prove the claim. $$\square$$
</div>
<p>The above theorem goes to show that normal subgroups are indeed special, and as it turns out, the converse of the theorem above is true as well <sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>, so this gives another way of characterizing normal subgroups. Now, recall that we want to come up with some notion of “modding out by a subgroup”, and so we want a way of saying when two elements of the big group are equivalent. We defined cosets with the idea that all of their members should be equivalent, and so the following shouldn’t be surprising.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\le G</script> be a (not necessarily normal) subgroup of <script type="math/tex">G</script>. Then, the relation <script type="math/tex">\sim</script> defined by <script type="math/tex">x\sim y</script> iff <script type="math/tex">xH=yH</script> is an equivalence relation.</p>
</blockquote>
<div class="proof3">
Pf: We need to show that $\sim$ is reflexive, symmetric, and transitive. It's obviously reflexive and symmetric, so we'll focus on transitivity. Suppose $x\sim y$ and $y\sim z$. Then, $xH=yH=zH$ so we have $x\sim z$, and we're done. $$\square$$
</div>
<blockquote>
<p>Corollary<br />
The cosets of <script type="math/tex">H</script> partition <script type="math/tex">G</script></p>
</blockquote>
<p>Now, we can prove our first major result of the post.</p>
<blockquote>
<p>Theorem (Lagrange)<br />
Let <script type="math/tex">H\le G</script> be a subgroup. Then, <script type="math/tex">|H|</script> divides <script type="math/tex">|G|</script></p>
</blockquote>
<div class="proof3">
Pf: Pick two cosets $aH$ and $bH$ of $H$ in $G$. Then, the map $ah\mapsto (ba^{-1})ah$ from $aH\rightarrow bH$ is injective (a consequence of every element having an inverse), and you get a similar injective map from $bH\rightarrow aH$. Thus, $|aH|=|bH|$ so all cosets have the same order (size). Since the cosets of $H$ partition $G$, assuming there are $k$ such cosets, we have $|G|=k|H|$. $$\square$$
</div>
<blockquote>
<p>Corollary<br />
The order of an element of a group divides the order of the group.</p>
</blockquote>
<blockquote>
<p>Corollary<br />
Every group of prime order is cyclic</p>
</blockquote>
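<p>As a quick empirical check of the first corollary (a Python sketch of my own), take the multiplicative group of units mod <script type="math/tex">15</script>, which has order <script type="math/tex">8</script>, and verify that every element’s order divides <script type="math/tex">8</script>:</p>

```python
from math import gcd

n = 15
G = [a for a in range(1, n) if gcd(a, n) == 1]   # units mod 15; |G| = 8

def mult_order(a, n):
    """Least k >= 1 with a^k ≡ 1 (mod n)."""
    k, cur = 1, a % n
    while cur != 1:
        cur = (cur * a) % n
        k += 1
    return k

assert all(len(G) % mult_order(a, n) == 0 for a in G)
print([mult_order(a, n) for a in G])    # every order divides |G| = 8
```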
<p>Most everything after the definition of a coset wasn’t strictly needed for this, but is still good to know. We finally say what it means to mod out by a subgroup.</p>
<blockquote>
<p>Definition<br />
The <strong>index</strong> of <script type="math/tex">H</script> in <script type="math/tex">G</script> is the number of (left) cosets of <script type="math/tex">H</script> in <script type="math/tex">G</script> and is denoted <script type="math/tex">[G:H]</script> or <script type="math/tex">|G:H|</script>.</p>
</blockquote>
<blockquote>
<p>Definition<br />
Let $H\trianglelefteq G$ be a normal subgroup. We define the <strong>quotient group</strong> $G/H$ to be the set of left cosets of $H$ together with the multiplication operation $(aH)(bH)=(ab)H$.</p>
</blockquote>
<p>When we mod out by a subgroup, we treat elements of the same coset as being equivalent, so instead of operating on individual elements, we operate on cosets instead. In practice, we can usually find a nice group that a quotient is isomorphic to, and so work with it instead of the quotient directly. This way, we can have quotients but still deal with elements instead of cosets. As a few examples</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\frac{\Z\times\Z}{\Z\times\{e\}}\simeq\Z && \frac{\Z}{n\Z}\simeq\Z_n
\end{matrix} %]]></script>
<p>Now that we have this defined, we need to show that this is a good definition.</p>
<blockquote>
<p>Theorem<br />
Multiplication of the quotient group is well-defined. That is, if <script type="math/tex">xH=x'H</script> and <script type="math/tex">yH=y'H</script>, then <script type="math/tex">(xy)H=(x'y')H</script>.</p>
</blockquote>
<div class="proof3">
Pf: Suppose $$H\trianglelefteq G$$ and pick $$x,x',y,y'\in G$$ s.t. $$xH=x'H$$ and $$yH=y'H$$. Then, $$(xy)H=x(yH)=x(Hy)=(xH)y=(x'H)y=x'(Hy)=x'(Hy')=x'(y'H)=x'y'H$$. $\square$
</div>
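<p>You can watch this well-definedness in action with a small computation (Python, my own). In <script type="math/tex">\Z_{12}</script> with the subgroup <script type="math/tex">H=\{0,4,8\}</script> (normal, since <script type="math/tex">\Z_{12}</script> is abelian), different representatives of the same cosets still combine to the same coset:</p>

```python
H = {0, 4, 8}                                   # a normal subgroup of Z_12
coset = lambda x: frozenset((x + h) % 12 for h in H)

# 1 and 5 represent the same coset, as do 2 and 10...
assert coset(1) == coset(5) and coset(2) == coset(10)
# ...and the "product" (here: sum) lands in the same coset either way:
assert coset(1 + 2) == coset(5 + 10)
print(sorted(coset(1 + 2)))                     # [3, 7, 11]
```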
<p>The proof was short, but notice that we could only switch between left and right cosets so freely because we assumed that <script type="math/tex">H</script> was normal. If <script type="math/tex">H</script> is not normal, then this theorem is false. Now that we have well-definedness, the real crucial thing to show is up to you.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script>. Then, <script type="math/tex">G/H</script> is a group, and the <strong>quotient map</strong> <script type="math/tex">f:G\rightarrow G/H</script> defined by <script type="math/tex">f(x)=x+H</script> is a surjective homomorphism with $\ker f=H$.</p>
</blockquote>
<div class="proof3">
Pf: (important) exercise to the reader.
</div>
<blockquote>
<p>Exercise<br />
Prove that a subgroup <script type="math/tex">H\le G</script> is normal iff <script type="math/tex">xH=Hx</script> for all <script type="math/tex">x\in G</script>. After this, prove that this is the case iff <script type="math/tex">x^{-1}Hx\subseteq H</script> for all <script type="math/tex">x\in G</script>.</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove that for an abelian group, every subgroup is normal.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script> be a normal subgroup of finite index, and fix any <script type="math/tex">a\in G</script>. Then, <script type="math/tex">a^{[G:H]}\in H</script>.</p>
</blockquote>
<div class="proof3">
Pf: Let $$f:G\rightarrow G/H$$ be the quotient map, and note that $$|G/H|=[G:H]$$. Thus, $$f(a)=aH$$ has order dividing $$[G:H]$$ by Lagrange, so $$f(a^{[G:H]})=(aH)^{[G:H]}=H$$ so $$a^{[G:H]}\in\ker f=H$$. $$\square$$
</div>
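<p>Here’s the theorem in miniature (Python, my own). In <script type="math/tex">\Z_{12}</script>, take <script type="math/tex">H=\{0,4,8\}</script>, which has index <script type="math/tex">[G:H]=4</script>; written additively, <script type="math/tex">a^{[G:H]}</script> becomes <script type="math/tex">[G:H]\cdot a</script>. (With <script type="math/tex">G=(\Z/n\Z)^\times</script> and <script type="math/tex">H=\{1\}</script>, this same theorem gives Euler’s theorem.)</p>

```python
H = {0, 4, 8}                   # subgroup of Z_12
index = 12 // len(H)            # [G:H] = 4

# index * a lands in H for every a in Z_12
assert all((index * a) % 12 in H for a in range(12))
print("index:", index)
```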
<h1 id="diagrams-et-al">Diagrams et al.</h1>
<p>In this section, I’m gonna include some pictures, but they’ll be different from the type of images I usually include. Here, I’ll use so-called commutative diagrams. A diagram is a collection of objects (i.e. groups) with directed arrows (i.e. homomorphisms) drawn between them that make it easier to discuss things when you have several functions going between different groups. We say such a diagram commutes when any path along arrows from one group to another gives the same result <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. This will make more sense when we see some.</p>
<p>I mentioned before that we would prove the first isomorphism theorem. Instead of proving it directly, we will derive it as a corollary of a better theorem in my opinion.</p>
<blockquote>
<p>Factor Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism, and let <script type="math/tex">K\le\ker f</script>. Then, there exists a unique homomorphism <script type="math/tex">h:G/K\rightarrow H</script> such that <script type="math/tex">f</script> factors through <script type="math/tex">h</script> in the sense that the following diagram commutes<center>
<img src="https://nivent.github.io/images/blog/group-intro/factor.png" width="200" height="200" /></center>
That is, <script type="math/tex">f</script> is the composition of <script type="math/tex">h</script> and the quotient map. Furthermore, <script type="math/tex">h</script> is injective iff <script type="math/tex">K=\ker f</script>, and <script type="math/tex">h</script> is surjective iff <script type="math/tex">f</script> is surjective.</p>
</blockquote>
<div class="proof3">
Pf: Let $$f,G,H,K$$ be as in the statement of the theorem. We first want to show that there is a unique $$h:G/K\rightarrow H$$ such that $$h(xK)=f(x)$$ for all $$x\in G$$. Well, define $$h$$ based solely off of this equation. Every element of $$G/K$$ is of the form $$xK$$ and $$f(x)$$ is unique given a choice of $$x$$, so this gives a unique satisfying $$h$$. We now need to make sure that $$h$$ is well-defined (the fact that it's a homomorphism follows from $$f$$ being one), so pick $$g,g'\in G$$ s.t. $$gK=g'K$$. Then, $$g^{-1}g'\in K$$ so $$f(g')=f(gg^{-1}g')=f(g)f(g^{-1}g')=f(g)\implies h(gK)=h(g'K)$$ so we're good. For the statements about injectivity and surjectivity, convince yourself that a homomorphism is injective iff its kernel is trivial, and that $$\image(h)=\image(f)$$. $$\square$$
</div>
<blockquote>
<p>Corollary (First isomorphism theorem)<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a surjective homomorphism. Then, <script type="math/tex">G/\ker f\simeq H</script>.</p>
</blockquote>
<div class="proof3">
Pf: By the above theorem, $$f$$ must factor through some map $$g:G/\ker f\rightarrow H$$. Furthermore, this map must be injective and surjective since we're factoring through the full kernel and $$f$$ was surjective. Thus, we have an isomorphism. $$\square$$
</div>
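<p>A tiny numerical illustration of the theorem (Python, my own): for the surjective homomorphism <script type="math/tex">f:\Z_{12}\rightarrow\Z_4</script>, <script type="math/tex">f(x)=x\bmod 4</script>, the induced map <script type="math/tex">xK\mapsto f(x)</script> on cosets of <script type="math/tex">K=\ker f</script> is a well-defined bijection:</p>

```python
f = lambda x: x % 4                                   # surjective hom Z_12 -> Z_4
ker = frozenset(x for x in range(12) if f(x) == 0)    # {0, 4, 8}
coset = lambda x: frozenset((x + k) % 12 for k in ker)

# Pair each coset xK with f(x); well-definedness means each coset shows up
# with exactly one value, and bijectivity means 4 cosets hit 4 values.
pairs = {(coset(x), f(x)) for x in range(12)}
assert len(pairs) == 4
assert len({c for c, _ in pairs}) == 4 and len({v for _, v in pairs}) == 4
print("Z_12 / ker f has", len(pairs), "cosets, matching |Z_4| = 4")
```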
<p>Lagrange’s Theorem and the first isomorphism theorem are two of the big, foundational theorems for group theory, and we’ve now proven both of them. Normally, this would be a good place to stop, but there’s one last thing I want to quickly introduce <sup id="fnref:11"><a href="#fn:11" class="footnote">11</a></sup>.</p>
<blockquote>
<p>Definition<br />
Consider a sequence of groups and homomorphisms between them<center>
$$\begin{CD}
G_1 @>f_1>> G_2 @>f_2>> G_3 @>f_3>> G_4 @>f_4>> \dots @>f_{n-1}>> G_n
\end{CD}$$</center>
We say such a sequence is <strong>exact</strong> if <script type="math/tex">\image(f_k)=\ker(f_{k+1})</script> for all <script type="math/tex">% <![CDATA[
1\le k<n %]]></script>. In particular, a <strong>short exact sequence</strong> is an exact sequence of the form<center>
$$\begin{CD}
\{e\} @>>> N @>f>> G @>g>> H @>>> \{e\}
\end{CD}$$</center>
where <script type="math/tex">\{e\}</script> is the trivial group.</p>
</blockquote>
<p>The first time I saw exact sequences, all I could think was, “Why? Who cares?” At first glance, they seem pretty artificial, but they actually give a compact way of codifying some information about how groups are related to each other. Let’s look at the short exact sequence appearing in the above definition for example. The fact that the sequence is exact at <script type="math/tex">N</script> says that the image of the incoming map (which must send the trivial element to the identity in <script type="math/tex">N</script>) is the kernel of <script type="math/tex">f</script>. This is just the statement that <script type="math/tex">\ker f=\{e\}</script> or equivalently that <script type="math/tex">f</script> is injective! Similarly, exactness at <script type="math/tex">H</script> says that the image of <script type="math/tex">g</script> is the kernel of the map sending all of <script type="math/tex">H</script> to the identity, so <script type="math/tex">\image g=H</script> and <script type="math/tex">g</script> is surjective! Finally, exactness at <script type="math/tex">G</script> says that <script type="math/tex">\image f=\ker g</script>. Since we know <script type="math/tex">f</script> is injective, this means we can embed <script type="math/tex">N</script> in <script type="math/tex">G</script> as a normal subgroup. Furthermore, since <script type="math/tex">g</script> is surjective, the first isomorphism theorem tells us that <script type="math/tex">G/N\simeq G/\image f\simeq G/\ker g\simeq H</script>, so we get the sense that <script type="math/tex">G</script> is somehow made up from <script type="math/tex">N</script> and <script type="math/tex">H</script> (the simplest example is <script type="math/tex">G\simeq N\times H</script>, and you can easily pick $f,g$ to form a short exact sequence in this case, but other choices of $G$ may work too). Because of this observation, we make our final definition.</p>
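<p>For instance (a sketch of my own in Python), the sequence <script type="math/tex">\{e\}\rightarrow\Z_3\rightarrow\Z_6\rightarrow\Z_2\rightarrow\{e\}</script> with <script type="math/tex">f(x)=2x\bmod 6</script> and <script type="math/tex">g(x)=x\bmod 2</script> is short exact, and here <script type="math/tex">\Z_6\simeq\Z_3\times\Z_2</script>; note, though, that <script type="math/tex">S_3</script> also fits in a short exact sequence with the same two ends, so the middle group is not determined by the ends alone.</p>

```python
f = lambda x: (2 * x) % 6     # Z_3 -> Z_6, injective
g = lambda x: x % 2           # Z_6 -> Z_2, surjective

image_f = {f(x) for x in range(3)}
ker_g = {x for x in range(6) if g(x) == 0}

assert len(image_f) == 3                    # f is injective
assert {g(x) for x in range(6)} == {0, 1}   # g is surjective
assert image_f == ker_g                     # exactness at Z_6
print("exact:", sorted(image_f))            # [0, 2, 4]
```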
<blockquote>
<p>Definition<br />
We say <script type="math/tex">G</script> is an <strong>extension</strong> of <script type="math/tex">N</script> by <script type="math/tex">H</script> <sup id="fnref:12"><a href="#fn:12" class="footnote">12</a></sup> if there exists a short exact sequence<br /><center>
$$\begin{CD}
\{e\} @>>> N @>f>> G @>g>> H @>>> \{e\}
\end{CD}$$</center></p>
</blockquote>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>ambitious because <a href="../modular-arithmetic">historically speaking</a> I’m bad at sticking with these kinds of things <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>I imagine one post of group theory, one on rings/fields, and then one on noetherian rings and dedekind domains <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>whatever that means <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>you won’t see this much here, but it’s important to keep in mind that groups often do perform some action on an object, and studying these group actions can lead to good math. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>not in this algebra sequence, but at some point I hope to give reason to why this isn’t the most appropriate name, and these things should really be called Z-modules instead <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>not in this algebra sequence, but at some point I hope to give reason to why actually groups in general are really ugly and do some pathological things <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Socks and Sandals if you prefer <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>This is not quite the definition of finitely generated, but works in almost all (actually, maybe all. I don’t know of a counterexample) cases <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>This might be more apparent after we define quotient groups <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Basically every diagram you see in the wild will be commutative <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>Before I forget to mention this, exercise: look up the like 3 other isomorphism theorems. Also, there’s a decent amount of group theory you’d usually learn leading up to some of the stuff I’ve mentioned here that I didn’t bring up at all. <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>Some people say G is an extension of H by N instead. Doesn’t really matter <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Solving Pell’s Equations2017-08-05T15:50:00+00:002017-08-05T15:50:00+00:00https://nivent.github.io/blog/solving-pell<p>I think this is going to end up being a long one <sup id="fnref:21"><a href="#fn:21" class="footnote">1</a></sup>, and possibly not the easiest post to follow that I’ve made; mostly because I will likely end up introducing a decent number of topics I haven’t talked about here before. I guess we’ll see how things turn out <sup id="fnref:19"><a href="#fn:19" class="footnote">2</a></sup>.</p>
<p>Historically, one topic of interest to number theorists has been diophantine equations. These are polynomial equations where you are looking for integer solutions. One famous example is the Pythagorean equation <script type="math/tex">a^2+b^2=c^2</script>. In general, there’s no overarching method to solve any diophantine equation <sup id="fnref:2"><a href="#fn:2" class="footnote">3</a></sup>, and so individual equations may be solved using ad hoc seeming methods. For example, the Pythagorean equation can be solved <a href="../number-theory">by projecting points from the unit circle onto a line</a> <sup id="fnref:3"><a href="#fn:3" class="footnote">4</a></sup>. Another (class of) well-known example(s) is due to Fermat: <script type="math/tex">a^n+b^n=c^n, n>2</script>, but we’ll put off solving this one until a later post.</p>
<p>This post is all about solving Pell’s equation (here, of course, <script type="math/tex">x,y,</script> and <script type="math/tex">d</script> are integers)</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
x^2-dy^2 = 1 && d>1
\end{align*} %]]></script>
<blockquote>
<p>Question<br />
Why do we require <script type="math/tex">d>1</script>? What happens if <script type="math/tex">d\le1</script>?</p>
</blockquote>
<blockquote>
<p>Edit<br />
I never mentioned this in the original post <sup id="fnref:37"><a href="#fn:37" class="footnote">5</a></sup>, but we also want to assume that <script type="math/tex">d</script> is not a square number. If <script type="math/tex">d=k^2</script>, then the equation becomes <script type="math/tex">(x-ky)(x+ky)=1</script> which means <script type="math/tex">x+ky=x-ky=\pm1</script> so <script type="math/tex">ky=-ky\implies y=0</script> and <script type="math/tex">(x,y)=(\pm1,0)</script> are the only solutions.</p>
</blockquote>
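<p>Before any theory, it’s worth getting a feel for the solutions by brute force (a quick Python search of my own; it says nothing about whether infinitely many solutions exist, only what lives in a small box):</p>

```python
def pell_solutions(d, bound):
    """All (x, y) with 0 <= x, y <= bound satisfying x^2 - d*y^2 = 1."""
    return [(x, y) for x in range(bound + 1) for y in range(bound + 1)
            if x * x - d * y * y == 1]

print(pell_solutions(2, 100))   # [(1, 0), (3, 2), (17, 12), (99, 70)]
print(pell_solutions(4, 100))   # d a square: only the trivial (1, 0)
```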
<h1 id="a-warm-up-problem-y2x3-2">A Warm-up Problem: <script type="math/tex">y^2=x^3-2</script></h1>
<p>Before solving Pell’s equations, we’ll start with a simpler task (although it may not be immediately obvious that this equation is any easier to solve). At this point, if it seems like things here will be really novel to you, then I recommend that you check out my <a href="../number-theory">previous post on number theory</a>. It’s not required to understand this post, and won’t necessarily add a bunch to your knowledge of the ideas used here, but I think it could serve as good motivation for seeing that both geometric reasoning and working in number systems larger than <script type="math/tex">\Z</script> can be helpful in tackling number theoretic problems <sup id="fnref:8"><a href="#fn:8" class="footnote">6</a></sup>.</p>
<p>If you were to decide to stop reading, leave this post, and go start finding and solving diophantine equations, one thing you would notice is that being able to factor makes things so much easier.</p>
<blockquote>
<p>Mini warmup<br />
<script type="math/tex">% <![CDATA[
\begin{align*}
&x^2+6xy+5y^2=10\\
\implies &(x+5y)(x+y) = 10\\
\implies &(x+y,\ x+5y)\in\{\pm(1,10),\pm(2,5),\pm(5,2),\pm(10,1)\}\\
\implies &4y=(x+5y)-(x+y)\in\{\pm9,\pm3\}
\end{align*} %]]></script><br />
Since none of <script type="math/tex">\pm9,\pm3</script> is divisible by <script type="math/tex">4</script>, this particular diophantine equation has no solutions.</p>
</blockquote>
<p>If you can set your problem up as one thing times another thing equals a third thing, then since everything is an integer, the things on the left hand side must be factors of the right hand side! This vastly reduces the number of potential solutions <sup id="fnref:4"><a href="#fn:4" class="footnote">7</a></sup>, and often can lead directly to an actual solution (or show that none exist).</p>
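<p>If you don’t trust the algebra in the mini warmup, a brute-force scan (Python, mine; any window containing all the candidate factorizations would do) agrees that there are no solutions:</p>

```python
# Any solution would have x + y and x + 5y among the divisor pairs of 10,
# all of which lie well inside this window.
sols = [(x, y) for x in range(-100, 101) for y in range(-100, 101)
        if x * x + 6 * x * y + 5 * y * y == 10]
print(sols)   # []
```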
<p>That being said, the key insight to solving our warmup problem is that we can rewrite it as <script type="math/tex">y^2+2=x^3</script>. I’ll take a second to pause so you can let out a gasp <sup id="fnref:38"><a href="#fn:38" class="footnote">8</a></sup> of amazement. Once things are in this form, we can see that the left hand side is almost a difference of squares. The only problem is that it’s not a difference and <script type="math/tex">2</script>’s not a square, but motivated by the possibility of factoring the left hand side, we ignore these constraints, stop restricting ourselves to <script type="math/tex">\Z</script>, and from here on out, do our work in <script type="math/tex">\zadjns2=\{a+b\sqrt{-2}\mid a,b\in\Z\}</script> instead <sup id="fnref:35"><a href="#fn:35" class="footnote">9</a></sup>. I don’t know if this feels illegitimate, but it shouldn’t because it’s not, so I’m gonna move on.</p>
<p>We can now write our equation as <script type="math/tex">(y+\sqrt{-2})(y-\sqrt{-2})=x^3</script>. At this point, we really hope that <script type="math/tex">y\pm\sqrt{-2}</script> are coprime so that they must both be perfect cubes; this would be a fairly restrictive condition. However, hoping this would be getting ahead of ourselves. This line of thinking works in <script type="math/tex">\Z</script> because <script type="math/tex">\Z</script> is a unique factorization domain <sup id="fnref:5"><a href="#fn:5" class="footnote">10</a></sup> (which is also the reason we can have a sensible definition of coprime in the first place), but we’re working with <script type="math/tex">\zadjns2</script> instead of just <script type="math/tex">\Z</script>. Luckily, it turns out that this is a UFD as well, but this is a non-trivial claim that could have failed if we had added a different square root instead <sup id="fnref:6"><a href="#fn:6" class="footnote">11</a></sup>.</p>
<blockquote>
<p>Exercise<br />
Show that <script type="math/tex">\zadjns2</script> is a UFD.<br />
Hint: It suffices to show that it’s a Euclidean domain, and you can do this by considering points closest to the “ambient quotient” <sup id="fnref:7"><a href="#fn:7" class="footnote">12</a></sup>. Also, you might want to read ahead a little before tackling this exercise.</p>
</blockquote>
<p>To show that <script type="math/tex">y\pm\sqrt{-2}</script> are coprime, we’ll introduce a norm.</p>
<blockquote>
<p>Definition<br />
Given <script type="math/tex">x+y\sqrt{-2}\in\zadjns2</script> with <script type="math/tex">x,y\in\Z</script>, its <strong>norm</strong> is <script type="math/tex">N(x+y\sqrt{-2}):=(x+y\sqrt{-2})(x-y\sqrt{-2})=x^2+2y^2</script></p>
</blockquote>
<p>This definition should look familiar to anyone who read my previous post, and so it should not come as a surprise that this norm is multiplicative. That is, for any <script type="math/tex">x,y\in\Z[\sqrt{-2}]</script>, we have <script type="math/tex">N(xy)=N(x)N(y)</script>. Let <script type="math/tex">p\in\Z[\sqrt{-2}]</script> be a common factor of <script type="math/tex">y\pm\sqrt{-2}</script>; this means that <script type="math/tex">p\mid(y+\sqrt{-2})-(y-\sqrt{-2})</script> so <script type="math/tex">p\mid2\sqrt{-2}=-(\sqrt{-2})^3</script>.</p>
<p>Before proceeding, a quick note. When considering factoring and related concepts (like primality), we don’t care about units (numbers dividing 1) because units are annoying and change nothing. Furthermore, a number <script type="math/tex">x\in\zadjns2</script> is a unit iff <script type="math/tex">N(x)=1</script>. Proving this is left as an exercise to the reader.</p>
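<p>Both facts just used — multiplicativity of the norm, and the unit test <script type="math/tex">N(x)=1</script> — are easy to experiment with (a Python sketch of my own; the class name <code>Zsqrtm2</code> is made up for this demo):</p>

```python
class Zsqrtm2:
    """Elements a + b*sqrt(-2) of Z[sqrt(-2)]."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __mul__(self, o):
        # (a + b*sqrt(-2))(c + d*sqrt(-2)) = (ac - 2bd) + (ad + bc)*sqrt(-2)
        return Zsqrtm2(self.a * o.a - 2 * self.b * o.b,
                       self.a * o.b + self.b * o.a)

    def norm(self):
        return self.a ** 2 + 2 * self.b ** 2

x, y = Zsqrtm2(3, 1), Zsqrtm2(2, -5)
assert (x * y).norm() == x.norm() * y.norm()        # N is multiplicative
print(Zsqrtm2(1, 0).norm(), Zsqrtm2(-1, 0).norm())  # the units +-1 have norm 1
```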
<p>Now, back to our problem. The following proposition implies that <script type="math/tex">p=u\sqrt{-2}^e</script> for some unit <script type="math/tex">u\in\Z[\sqrt{-2}]</script> and integer <script type="math/tex">0\le e\le 3</script>.</p>
<blockquote>
<p>Proposition<br />
In <script type="math/tex">\zadjns2</script>, <script type="math/tex">\sqrt{-2}</script> is prime <sup id="fnref:36"><a href="#fn:36" class="footnote">13</a></sup>.</p>
</blockquote>
<div class="proof2">
Pf: It suffices to note that \(N(\sqrt{-2})=2\) is prime (why?). \(\square\)
</div>
<p>While we’re on the subject</p>
<blockquote>
<p>Exercise<br />
Show that the only units in <script type="math/tex">\Z[\sqrt{-2}]</script> are <script type="math/tex">\pm1</script>.</p>
</blockquote>
<p>Returning to showing that those two numbers are coprime, we can now safely conclude that <script type="math/tex">p=u\sqrt{-2}^e</script> with <script type="math/tex">u,e</script> as described above. Hence, either <script type="math/tex">p</script> is a unit (in which case we win) or <script type="math/tex">\sqrt{-2}\mid p</script>, so let’s assume the latter. This then means that <script type="math/tex">\sqrt{-2}\mid y+\sqrt{-2}</script>. So, for appropriately chosen integers <script type="math/tex">u,v</script>, we have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
y+\sqrt{-2} = \sqrt{-2}(u+v\sqrt{-2}) &= -2v + u\sqrt{-2} \\
&\implies y = -2v
\end{align*} %]]></script>
<p>Finally, the following proposition shows that this is impossible so <script type="math/tex">p</script> must be a unit, and hence <script type="math/tex">y\pm\sqrt{-2}</script> are coprime.</p>
<blockquote>
<p>Proposition<br />
If <script type="math/tex">(x,y)\in\Z^2</script> satisfies <script type="math/tex">y^2=x^3-2</script>, then <script type="math/tex">y</script> is odd.</p>
</blockquote>
<div class="proof2">
Pf: Assume \(y^2=x^3-2\) for integers \(x,y\). If \(y\) is even, then \(x^3=y^2+2\) is even, so \(x\) is even too. But then \(x^3\) is divisible by \(8\) while \(y^2+2=4k+2\) is not. \(\square\)
</div>
<p>Now we’re almost there. We know that <script type="math/tex">y+\sqrt{-2}</script> and <script type="math/tex">y-\sqrt{-2}</script> are coprime, and that their product is <script type="math/tex">x^3</script>. It follows from unique factorization that <script type="math/tex">y+\sqrt{-2}</script> must be the product of a unit and a cube. However, you showed that the only units are <script type="math/tex">\pm1</script>, and both of these are themselves cubes, so <script type="math/tex">y+\sqrt{-2}=(a+b\sqrt{-2})^3</script> for some integers <script type="math/tex">a,b</script>. We can expand things to get</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
& y+\sqrt{-2} &=& (a^3-6ab^2)+\sqrt{-2}(3a^2b-2b^3)\\
\implies & 1 &=& 3a^2b-2b^3 &=& b(3a^2-2b^2)\\
\implies & b &=& \pm1\\
\implies & \pm1 &=& 3a^2-2\\
\implies & 3a^2 &=& 2\pm1\\
\implies & a &=& \pm1\\
\implies & b &=& 1
\end{matrix} %]]></script>
<p>I didn’t feel like explaining all those implications, but the moral of the story is that we have two solutions given by <script type="math/tex">(a,b)=(\pm1, 1)</script> which correspond to <script type="math/tex">y+\sqrt{-2}=(1+\sqrt{-2})^3=-5+\sqrt{-2}</script> and <script type="math/tex">y+\sqrt{-2}=(-1+\sqrt{-2})^3=5+\sqrt{-2}</script>. Both of these solutions for <script type="math/tex">y</script> correspond to <script type="math/tex">x=3</script>, so our original equation has two solutions: <script type="math/tex">(x,y)=(3,\pm5)</script>.</p>
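<p>As a final check on the warmup (my own brute-force Python scan; it only confirms there are no other solutions in a small window, while the argument above rules them out everywhere):</p>

```python
sols = [(x, y) for x in range(-50, 51) for y in range(-50, 51)
        if y * y == x ** 3 - 2]
print(sols)   # [(3, -5), (3, 5)]
```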
<h1 id="introduction-to-algebraic-integers">Introduction to Algebraic Integers</h1>
<p>Now that the warmup is done, we can start building all the theory we’ll need to solve Pell’s equations. The first thing we’ll need is the so-called ring of integers. As the warmup demonstrated, it’s often helpful to work in “large” systems of numbers instead of the relatively small <script type="math/tex">\Z</script>. However, it may not always be clear what group of numbers is the right one for a problem. To answer this question, we focus our attention on algebraic integers. The idea is that regular integers are a particular, nice subset of the field <script type="math/tex">\Q</script> of rational numbers <sup id="fnref:9"><a href="#fn:9" class="footnote">14</a></sup>, and so the right systems of numbers to work with are analogous nice subsets of fields bigger than <script type="math/tex">\Q</script>. Because we’re doing algebra, and algebraists love little more than polynomials, the characterization of nice will be done in terms of a polynomial condition inspired by the <a href="https://www.wikiwand.com/en/Rational_root_theorem">rational root theorem</a>. With all of that said, let’s see some definitions</p>
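<p>The rational root theorem is worth keeping in mind here: any rational root of a monic integer polynomial must itself be an integer. Here’s a small sketch of that candidate search (Python, mine; the helper name and the restriction to a nonzero constant term are just for the demo):</p>

```python
from fractions import Fraction

def integer_roots(coeffs):
    """Rational roots of a monic integer polynomial (leading coeff first,
    nonzero constant term). By the rational root theorem any root p/q in
    lowest terms has q | 1, so only integer divisors of the constant term
    can appear."""
    c0 = abs(coeffs[-1])
    candidates = [s * d for d in range(1, c0 + 1) if c0 % d == 0 for s in (1, -1)]

    def ev(x):                       # Horner evaluation, exact arithmetic
        acc = Fraction(0)
        for c in coeffs:
            acc = acc * x + c
        return acc

    return sorted(x for x in candidates if ev(x) == 0)

# x^2 - x - 2 = (x - 2)(x + 1): the rational roots are the integers -1 and 2.
print(integer_roots([1, -1, -2]))    # [-1, 2]
# x^2 - 2 has no rational roots at all (sqrt(2) is irrational).
print(integer_roots([1, 0, -2]))     # []
```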
<blockquote>
<p>Definition<br />
A <strong>number field</strong> <script type="math/tex">K</script> is a finite field extension <sup id="fnref:10"><a href="#fn:10" class="footnote">15</a></sup> of <script type="math/tex">\Q</script>. Furthermore, if <script type="math/tex">K</script> is of the form <script type="math/tex">K=\Q(\sqrt d)=\{a+b\sqrt d\mid a,b\in\Q\}</script> for squarefree <script type="math/tex">d\in\Z</script>, then <script type="math/tex">K</script> is called a <strong>(real or imaginary) quadratic number field</strong>.</p>
</blockquote>
<p>Number fields play the role of <script type="math/tex">\Q</script> in the big picture. From these, we extract nice subsets of so-called algebraic integers.</p>
<blockquote>
<p>Definition<br />
An <strong>algebraic integer</strong> <script type="math/tex">x</script> is a root of a monic polynomial <sup id="fnref:11"><a href="#fn:11" class="footnote">16</a></sup> <script type="math/tex">f\in\Z[X]</script> with integer coefficients. Given a number field <script type="math/tex">K</script>, its <strong>ring of integers</strong> <script type="math/tex">\ints K</script> <sup id="fnref:12"><a href="#fn:12" class="footnote">17</a></sup> is the set of algebraic integers in <script type="math/tex">K</script>.</p>
</blockquote>
<p>If this definition seems weird, then maybe the next exercise will help you see why it’s actually reasonable. If that doesn’t work, then try to come up with another definition that’s well-defined for any number field and makes sense in the case of <script type="math/tex">\Q</script>.</p>
<blockquote>
<p>Exercise<br />
Show that <script type="math/tex">\ints\Q=\Z</script>.</p>
</blockquote>
<p>Because of this exercise, mathematicians sometimes refer to <script type="math/tex">\Z</script> as the ring of “rational” integers in order to distinguish it from other rings of integers. Also, I keep calling these things rings, but it is in no way obvious that they actually do form rings (go ahead and try to prove that <script type="math/tex">ab</script> and <script type="math/tex">a+b</script> are algebraic integers if <script type="math/tex">a,b</script> are. I won’t cover the proof here, but the secret is Cramer’s rule).</p>
<p>For the purposes of this post, we’ll only need to study quadratic number fields, but it’s worth noting that number fields in general – and even arbitrary finite field extensions – have a norm.</p>
<blockquote>
<p>Definition<br />
Given a finite field extension <script type="math/tex">L/K</script>, let <script type="math/tex">\alpha\in L</script> be an arbitary element. Note that <script type="math/tex">\alpha</script> induces a map <script type="math/tex">m_\alpha:L\rightarrow L</script> given by multiplication <script type="math/tex">m_\alpha(\beta)=\alpha\beta</script>, and that <script type="math/tex">m_\alpha</script> is <script type="math/tex">K</script>-linear. We define the <strong>norm</strong> of <script type="math/tex">\alpha</script> to be the determinant of this map. That is, the norm is a map<br /></p>
</blockquote>
<center>$$\begin{matrix}
\norm_{L/K}: &L &\longrightarrow& K\\
&\alpha &\longmapsto& \det(m_\alpha)
\end{matrix}$$</center>
<p>This definition is quite a bit to digest, but we’ll unpack it in the case of quadratic number fields. One thing we can quickly glean from this definition is that it makes the multiplicativity of the norm almost trivial (why?).</p>
<p>Now, let’s see what this definition gives in the quadratic case. Fix some squarefree <script type="math/tex">d\in\Z-\{1\}</script>, and let <script type="math/tex">K=\Q(\sqrt d)</script> so <script type="math/tex">K/\Q</script> is a degree 2 <sup id="fnref:13"><a href="#fn:13" class="footnote">18</a></sup> field extension, and one <script type="math/tex">\Q</script>-basis of <script type="math/tex">K</script> is <script type="math/tex">\{1,\sqrt d\}</script>. Fix any element <script type="math/tex">\alpha=a+b\sqrt d</script> of <script type="math/tex">K</script> with <script type="math/tex">a,b\in\Q</script>. We are interested in the determinant of its multiplication map, so we’ll first find the matrix for this map. To do this we only need to compute <script type="math/tex">m_\alpha(1)=\alpha=a+b\sqrt d</script> and <script type="math/tex">m_\alpha(\sqrt d)=\alpha\sqrt d=db+a\sqrt d</script>. Hence, <script type="math/tex">m_\alpha</script> is given by this matrix (assuming we use the basis <script type="math/tex">\{1,\sqrt d\}</script>):</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{pmatrix}
a & db\\
b & a
\end{pmatrix} %]]></script>
<p>Thus, <script type="math/tex">\knorm(a+b\sqrt d)=a^2-db^2</script> which turns out to also be <script type="math/tex">(a+b\sqrt d)(a-b\sqrt d)</script> <sup id="fnref:14"><a href="#fn:14" class="footnote">19</a></sup>. The fact that the norm factors this way means the following theorem is really easy to prove in the quadratic case <sup id="fnref:15"><a href="#fn:15" class="footnote">20</a></sup>.</p>
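<p>If you’d like to see this concretely, here’s a small Python sketch (with d = 2 as an arbitrary example of mine) computing the norm both as the determinant of the multiplication matrix above and via the closed form:</p>

```python
# Compute the norm on Q(sqrt(d)) two ways, in the basis {1, sqrt(d)}:
# as det of the multiplication matrix [[a, d*b], [b, a]], and as a^2 - d*b^2.

def norm_via_matrix(a, b, d):
    m = [[a, d * b], [b, a]]                 # matrix of m_alpha from the post
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mul(p, q, d):
    # (a1 + b1*sqrt(d)) * (a2 + b2*sqrt(d))
    (a1, b1), (a2, b2) = p, q
    return (a1 * a2 + d * b1 * b2, a1 * b2 + a2 * b1)

d = 2
for a, b in [(1, 1), (3, 2), (5, -4)]:
    assert norm_via_matrix(a, b, d) == a * a - d * b * b

# multiplicativity: N(alpha*beta) = N(alpha)*N(beta)
alpha, beta = (1, 1), (3, 2)
prod = mul(alpha, beta, d)   # (7, 5), i.e. 7 + 5*sqrt(2)
assert norm_via_matrix(*prod, d) == norm_via_matrix(*alpha, d) * norm_via_matrix(*beta, d)
print("norm checks pass")
```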
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K</script> be a quadratic number field. Then, <script type="math/tex">\alpha\in\ints K</script> is a unit (in <script type="math/tex">\ints K</script>, not <script type="math/tex">K</script>. <script type="math/tex">K</script> is a field so basically everything is a unit) if and only if <script type="math/tex">\knorm(\alpha)=\pm1</script></p>
</blockquote>
<div class="proof2">
Pf: \((\rightarrow)\) Assume \(\alpha\in\ints K\) is a unit, and write \(\alpha\beta=1\). Then, \(1=\knorm(1)=\knorm(\alpha\beta)=\knorm(\alpha)\knorm(\beta)\), and since both factors are integers, \(\knorm(\alpha)=\pm1\)<br />
(\(\leftarrow\)) Conversely, assume \(\knorm(\alpha)=\pm1\). As the above discussion noted, \(\knorm(\alpha)=\alpha\conj\alpha\), so \(\pm\conj\alpha\) is an inverse of \(\alpha\) in \(\ints K\). \(\square\)
</div>
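<p>Concretely, in Z[√2] (sticking with d = 2 as a running example of my own choosing), this theorem lets us detect units with a single integer computation:</p>

```python
# Unit criterion in Z[sqrt(2)]: a + b*sqrt(2) is a unit iff |a^2 - 2b^2| = 1.

def is_unit(a, b, d=2):
    return abs(a * a - d * b * b) == 1

assert is_unit(1, 1)       # 1 + sqrt(2) has norm -1
assert is_unit(3, 2)       # 3 + 2*sqrt(2) has norm +1
assert not is_unit(3, 1)   # 3 + sqrt(2) has norm 7

# The proof even hands us the inverse: plus-or-minus the conjugate. Here
# (1 + sqrt(2)) * (-1 + sqrt(2)) = 1; multiplying out in Z[sqrt(2)]:
prod = (1 * -1 + 2 * 1 * 1, 1 * 1 + (-1) * 1)
assert prod == (1, 0)      # i.e. the product is exactly 1
print("unit checks pass")
```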
<p>The diligent reader will be somewhat bothered by the above proof. That’s because it implicitly relies upon something I forgot to prove first, which is the following (where does the above proof rely on this theorem?).</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K</script> be a quadratic number field. Then, <script type="math/tex">\knorm(\ints K)\subseteq\ints\Q</script> which is to say that the norm of an algebraic integer is a rational integer <sup id="fnref:22"><a href="#fn:22" class="footnote">21</a></sup>.</p>
</blockquote>
<div class="proof3">
Pf: First, let $$K=\qadjs d$$ and $$\alpha=a+b\sqrt d$$ with $$a,b\in\Q$$. Then, $$\alpha$$ is an algebraic integer if and only if $$\conj\alpha:=a-b\sqrt d$$ is an algebraic integer. This is because the map $${}^-:\ints K\rightarrow\ints K$$ fixes $$\Z$$ and preserves both addition and multiplication, so a polynomial $$f\in\Z[X]$$ satisfied by $$\alpha$$ is also satisfied by $$\conj\alpha$$. In particular, $$\alpha$$ satisfies a monic polynomial $$f\in\Z[X]$$ if and only if $$\conj\alpha$$ does, so they share integrality statuses. Now, the product of two algebraic integers is an algebraic integer (mentioned before, not proved), so $$\knorm(\alpha)$$ is an algebraic integer whenever $$\alpha$$ is. Since it's also a rational number, this means $$\knorm$$ maps algebraic integers into rational integers as claimed. $$\square$$
</div>
<p>Now, one last thing, and then we’ll say how all of this discussion on integers and norms relates to Pell’s equations <sup id="fnref:16"><a href="#fn:16" class="footnote">22</a></sup>. I’ve tried to be careful so far to be conscious of the fact that, a priori, an element of <script type="math/tex">\ints K</script> could look like a general member of <script type="math/tex">K</script> in the sense that its coefficients could be general rational numbers. From here on out, we’ll be a little more concrete because we’re going to actually compute <script type="math/tex">\ints K</script> in the quadratic case.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K=\Q(\sqrt d)</script> be a quadratic number field with <script type="math/tex">d\in\Z-\{1\}</script> a product of distinct primes (i.e. square free). Then,<br /></p>
</blockquote>
<center>$$\ints K=\begin{cases}
\zadjs d & \text{if } d\equiv2,3\pmod 4 \\
\zadj{\frac{1+\sqrt d}2} & \text{if }d\equiv 1\pmod 4
\end{cases}$$</center>
<div class="proof3">
Pf: Assume $$K,d$$ are as above, pick any $$\alpha\in\ints K$$, and write $$\alpha=x+y\sqrt d$$ with $$x,y\in\Q$$. Then, $$\conj\alpha=x-y\sqrt d\in\ints K$$ as well, and so is $$\alpha+\conj\alpha=2x$$. This means that $$2x\in\Q\cap\ints K=\Z$$ so $$x$$ is either an integer or half an integer.<br /><br />
Case 1: $$x\in\Z$$<br />
We also know that $$\knorm(\alpha)=\alpha\conj\alpha=x^2-dy^2\in\Z$$ so by taking a difference, this means that $$dy^2\in\Z$$. This means that $$y\in\Z$$: if it were not, then some prime $$p$$ would divide the denominator of $$y^2$$ at least twice, but $$d$$ is divisible by $$p$$ at most once, so the product $$dy^2$$ would still have a $$p$$ in its denominator and hence not be an integer. Thus, $$\alpha\in\Z[\sqrt d]$$.<br /><br />
Case 2: $$x=\frac n2$$ for some odd $$n\in\Z$$<br />
Once again, $$\knorm(\alpha)=x^2-dy^2=n^2/4-dy^2\in\Z$$. Note that $$y\not\in\Z$$ since otherwise we would have $$n^2/4\in\Z$$. Counting prime factors again shows us that we must have $$y=m/2$$ for some odd $$m\in\Z$$ which means that $$n^2-dm^2\equiv0\pmod4$$ with $$n^2,m^2\equiv1\pmod4$$ so $$d\equiv1\pmod4$$. Hence, $$\alpha=n/2+m\sqrt d/2=(1+\sqrt d)/2+(n-1)/2+\sqrt d(m-1)/2\in\Z[(1+\sqrt d)/2]$$ since $$\sqrt d=2\cdot\frac{1+\sqrt d}2-1\in\Z[(1+\sqrt d)/2]$$.<br />
<div align="right">\(\square\)</div>
</div>
<blockquote>
<p>Corollary<br />
<script type="math/tex">\ints{\Q(\sqrt{-2})}=\Z[\sqrt{-2}]</script> so this was the right setting for the warmup problem.</p>
</blockquote>
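<p>One practical takeaway from the proof is a clean membership test: α = x + y√d is an algebraic integer exactly when its trace 2x and norm x² − dy² are both rational integers (these are the coefficients of its monic quadratic over Q). Here’s a sketch of that test with exact rational arithmetic:</p>

```python
from fractions import Fraction as F

# In Q(sqrt(d)), alpha = x + y*sqrt(d) is an algebraic integer exactly when
# its trace 2x and norm x^2 - d*y^2 both land in Z.

def is_algebraic_integer(x, y, d):
    trace, norm = 2 * x, x * x - d * y * y
    return trace.denominator == 1 and norm.denominator == 1

half = F(1, 2)
print(is_algebraic_integer(half, half, 5))  # True:  (1+sqrt(5))/2, d = 5 = 1 mod 4
print(is_algebraic_integer(half, half, 2))  # False: (1+sqrt(2))/2, d = 2 = 2 mod 4
print(is_algebraic_integer(F(3), F(7), 2))  # True:  3 + 7*sqrt(2) is in Z[sqrt(2)]
```

Note how the answers match the theorem: half-integer coordinates are only allowed when d ≡ 1 mod 4.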
<p>At this point, I think we know everything about rings of integers that we’ll need <sup id="fnref:17"><a href="#fn:17" class="footnote">23</a></sup>. In case you have forgotten, our goal is to find all integer solutions to Pell’s equations which are <script type="math/tex">x^2-dy^2=1</script> for integers <script type="math/tex">x,y\in\Z</script> and positive integer <script type="math/tex">d\in\Z_{>1}</script>. As this discussion hinted at, for the time being, we’ll further restrict <script type="math/tex">d</script> to be square free. This has the advantage that Pell’s equation can then be written as <script type="math/tex">(x-y\sqrt d)(x+y\sqrt d)=1</script>, which means that we’re really just looking for units of <script type="math/tex">\Z[\sqrt d]</script>, which is convenient because <script type="math/tex">\Z[\sqrt d]=\ints{\Q(\sqrt d)}</script> for square free <script type="math/tex">d</script> (or at least it is 2 times out of 3) <sup id="fnref:18"><a href="#fn:18" class="footnote">24</a></sup>.</p>
<h1 id="geometry-of-numbers">Geometry of Numbers</h1>
<p>I debated whether I should talk about what comes next in one section or two. I ultimately decided on two because I didn’t want to introduce too much stuff all at once. A priori, the material of this section isn’t relevant to the larger discussion at hand, but in the next section, we’ll see it play a crucial role. This is the point in the post where we open up to the possibility of me throwing in some pictures.</p>
<blockquote>
<p>Definition<br />
A <strong>lattice</strong> of a real vector space is the <script type="math/tex">\Z</script>-span of some <script type="math/tex">\R</script>-basis. If <script type="math/tex">L</script> is a lattice of a real vector space <script type="math/tex">V</script>, then we say the <strong>rank</strong> of <script type="math/tex">L</script> is the dimension of <script type="math/tex">V</script> <sup id="fnref:20"><a href="#fn:20" class="footnote">25</a></sup>.</p>
</blockquote>
<p>The only (finite-dimensional) real vector spaces are <script type="math/tex">\R^n</script> for various choices of <script type="math/tex">n</script>, so a lattice is really just a set of the form <script type="math/tex">\{a_1b_1+a_2b_2+\dots+a_nb_n:a_1,\dots,a_n\in\Z\}</script> where <script type="math/tex">b_1,\dots,b_n\in\R^n</script> are <script type="math/tex">\R</script>-linearly independent. We might write such a lattice using the notation <script type="math/tex">L=\Z b_1\oplus\Z b_2\oplus\dots\oplus\Z b_n</script> <sup id="fnref:32"><a href="#fn:32" class="footnote">26</a></sup>. The canonical example (and, in some sense, the only example) of a lattice is <script type="math/tex">\Z^n</script>. Some lattices of <script type="math/tex">\R^2</script> are pictured below</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/lattices.jpg" width="650" height="200" />
</center>
<p>One property of lattices that will come up is that they are discrete.</p>
<blockquote>
<p>Definition<br />
A subset <script type="math/tex">D\subset\R^n</script> is called <strong>discrete</strong> if only finitely many of its points are contained in any bounded region. That is, it is discrete if <script type="math/tex">D\cap B_r</script> is finite for all <script type="math/tex">r\in\R</script> where <script type="math/tex">B_r</script> is the (solid) ball of radius <script type="math/tex">r</script> centered at the origin.</p>
</blockquote>
<p>I won’t give a full, formal proof that all lattices are discrete, but I will sketch one direction you could take. The idea is that in a lattice, there exists some <script type="math/tex">\eps>0</script> such that any two lattice points are a distance greater than <script type="math/tex">\eps</script> apart. So, if you have some bounded set <script type="math/tex">B</script>, you can cover it with finitely many balls <script type="math/tex">B_{\frac{\eps}2}</script> of radius <script type="math/tex">\eps/2</script>. Each of these contains at most 1 lattice point, so <script type="math/tex">B</script> contains a finite number of lattice points.</p>
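<p>For the canonical lattice Z², discreteness is easy to see by brute force (a toy count, not a proof):</p>

```python
import math

# Count the points of Z^2 inside the closed ball B_r; per the definition of
# discreteness, this count is always finite.

def points_in_ball(r):
    n = math.floor(r)
    return [(x, y) for x in range(-n, n + 1)
                   for y in range(-n, n + 1)
                   if x * x + y * y <= r * r]

for r in [1, 2.5, 10]:
    print(r, len(points_in_ball(r)))
# r = 1 gives 5 points: the origin and the four unit vectors
```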
<p>Now, lattices are discrete and look a lot like <script type="math/tex">\Z^n</script>, so it’s natural to think that they have some connection with numbers<sup id="fnref:23"><a href="#fn:23" class="footnote">27</a></sup>. Because of this, and because they have applications in number theory, some results on or relating to lattices form the so-called <a href="https://www.wikiwand.com/en/Geometry_of_numbers">geometry of numbers</a>. Here, we’ll prove and use one such theorem, but before that, we need to describe the volume of a lattice.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">\Gamma</script> be a lattice in <script type="math/tex">\R^n</script>. Then, we say the <strong>volume</strong> <script type="math/tex">\vol_{\Gamma}</script> of <script type="math/tex">\Gamma</script> is the volume of a parallelogram <sup id="fnref:24"><a href="#fn:24" class="footnote">28</a></sup> spanned by a <script type="math/tex">\Z</script>-basis of <script type="math/tex">\Gamma</script>.</p>
</blockquote>
<blockquote>
<p>Examples<br />
The standard lattice <script type="math/tex">\Gamma_1=\Z^2=\Z(1,0)\oplus\Z(0,1)</script> has volume <script type="math/tex">\vol_{\Gamma_1}=1</script> since the basis <script type="math/tex">\{(0,1),(1,0)\}</script> spans a unit square.<br /><br />
The lattice <script type="math/tex">\Gamma_2 = \Z(1,\sqrt2)\oplus\Z(0,-2\sqrt2)</script> has volume <script type="math/tex">\vol_{\Gamma_2}=2\sqrt2</script>.</p>
</blockquote>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/volumes.jpg" width="650" height="200" />
</center>
<p>I know what you’re thinking. What if I choose a different basis for my lattice? Instead of writing <script type="math/tex">\Z^2=\Z(1,0)\oplus\Z(0,1)</script>, I might want to write <script type="math/tex">\Z^2=\Z(2,-1)\oplus\Z(-3,1)</script>. Well, it doesn’t matter.</p>
<blockquote>
<p>Theorem<br />
The volume of a lattice is well-defined.</p>
</blockquote>
<div class="proof3">
Pf: Let $$\Gamma$$ be a lattice in $$\R^n$$, and let $$B_1,B_2$$ be two bases for $$\Gamma$$. Then, these bases can be represented by matrices and we have $$\vol_{\Gamma}=|\det B_1|$$ or $$\vol_{\Gamma}=|\det B_2|$$, so we only need to show that $$|\det B_1|=|\det B_2|$$. Note that, since they're bases for the same lattice, there must exist a change of basis matrix $$T$$ with $$B_1=TB_2$$. Furthermore, $$T$$ must be invertible with integer entries so $$\det T\in\Z^\times=\{\pm1\}$$, so $$|\det B_1|=|\det T\det B_2|=|\det B_2|$$. $$\square$$
</div>
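<p>We can spot-check this well-definedness on the two bases of Z² from a moment ago:</p>

```python
# Two different Z-bases of Z^2 give the same |det|, as the theorem promises.

def abs_det(b1, b2):
    # absolute determinant of the 2x2 matrix with rows b1, b2
    return abs(b1[0] * b2[1] - b1[1] * b2[0])

print(abs_det((1, 0), (0, 1)))    # 1, the unit square
print(abs_det((2, -1), (-3, 1)))  # |2*1 - (-1)*(-3)| = 1 as well
print(abs_det((2, -1), (-3, 1)) == abs_det((1, 0), (0, 1)))  # True
```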
<p>Now, a couple definitions just to make sure everyone is on the same page, and then the main theorem.</p>
<blockquote>
<p>Definitions<br />
Fix some subset <script type="math/tex">D\subseteq\R^n</script>.<br />
We say <script type="math/tex">D</script> is <b>compact</b> if it is closed <sup id="fnref:25"><a href="#fn:25" class="footnote">29</a></sup> and bounded.<br /><br />
We say <script type="math/tex">D</script> is <strong>convex</strong> if the line segment between any two points of <script type="math/tex">D</script> is contained in <script type="math/tex">D</script>. That is, for any <script type="math/tex">a,b\in D</script> and <script type="math/tex">t\in[0,1]</script>, the point <script type="math/tex">at+b(1-t)\in D</script>.<br /><br />
Fixing some point <script type="math/tex">o\in\R^n</script>, we say <script type="math/tex">D</script> is <strong>symmetric about <script type="math/tex">o</script></strong> if for all <script type="math/tex">p\in D</script>, we also have <script type="math/tex">o-p\in D</script>.</p>
</blockquote>
<ul>
<li>A ball is compact, convex, and symmetric about its center</li>
<li>A sphere is compact and symmetric about its center, but not convex</li>
<li>A triangle is compact and convex, but not symmetric about any point</li>
<li>The inside of a ball is convex and symmetric about its center, but not compact</li>
<li>A five-pointed star is compact, but neither convex nor symmetric about any point</li>
<li>A line segment is compact, convex, and symmetric about its center</li>
<li>An infinite line is convex and symmetric about many points, but not compact</li>
</ul>
<blockquote>
<p>Minkowski Convex Body Theorem<br />
Let <script type="math/tex">\Gamma</script> be a lattice in <script type="math/tex">\R^n</script>, and <script type="math/tex">D\subseteq\R^n</script> be compact, convex, and symmetric about the origin. Furthermore, assume <script type="math/tex">\vol(D)\ge2^n\vol_\Gamma</script>. Then, <script type="math/tex">\Gamma\cap D\neq\{0\}</script>.</p>
</blockquote>
<p>The idea behind the proof is that <script type="math/tex">D</script> is just too big to miss all of <script type="math/tex">\Gamma</script>. You essentially take a big parallelogram spanned by (twice) a basis of <script type="math/tex">\Gamma</script> and tile <script type="math/tex">\R^n</script> with it. After that you move all the pieces touching <script type="math/tex">D</script> back to the original piece about the origin, and if the volume of <script type="math/tex">D</script> is greater than the volume of the original piece, then two points of <script type="math/tex">D</script> must end up at the same point of the parallelogram. This means that their difference is twice a lattice point, so half their difference is a nonzero lattice point, and it lies in <script type="math/tex">D</script> by symmetry and convexity.</p>
<p>I wasn’t sure what the best way to visualize this without it being a mess was, so here’s a picture of the parallelogram to keep in mind. You can add in <script type="math/tex">D</script> and whatnot using your imagination.</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/translates.jpg" width="300" height="150" />
</center>
<p>If you follow the sketch above, in the end, it relies on <script type="math/tex">D</script> having strictly greater volume, but the theorem doesn’t. This is reconciled by the following.</p>
<blockquote>
<p>Lemma<br />
If Minkowski’s theorem holds for all <script type="math/tex">D</script> with <script type="math/tex">\vol(D)>2^n\vol_\Gamma</script>, then it holds for all <script type="math/tex">D</script> with <script type="math/tex">\vol(D)=2^n\vol_\Gamma</script> as well.</p>
</blockquote>
<div class="proof3">
Pf: Assume $$D,\Gamma$$ satisfy all the conditions of Minkowski's theorem, and that $$\vol(D)=2^n\vol_\Gamma$$. For all $$\eps>0$$, let $$D_{\eps}=(1+\eps)D$$ so $$D_{\eps}$$ is compact, convex, symmetric about the origin, and $$\vol(D_{\eps})>2^n\vol_\Gamma$$. Since $$\Gamma$$ is discrete, we know that $$D_{\eps}\cap\Gamma$$ is finite for all $$\eps>0$$. Now, note that for $$\eps<\eps'$$ we have $$D_{\eps}\cap\Gamma\subseteq D_{\eps'}\cap\Gamma$$. Because all these sets are finite, we can only lose points for so long as $$\eps$$ shrinks. Hence, for sufficiently small $$\eps$$, all $$D_{\eps}\cap\Gamma$$ are the same, so there exists some nonzero $$\ell\in\Gamma\cap\bigcap_{\eps>0}D_{\eps}$$, but since $$D$$ is closed, $$\bigcap_{\eps>0}D_{\eps}=D$$, so the lemma holds. $$\square$$
</div>
<p>Awesome, now we handle the main theorem with the further assumption that <script type="math/tex">D</script> has strictly greater volume.</p>
<div class="proof3">
Pf: Assume $$D,\Gamma$$ satisfy all the conditions of Minkowski's theorem, and that $$\vol(D)>2^n\vol_\Gamma$$. Pick some $$\Z$$-basis $$\mathbb e=\{e_i\}_{i=1}^n$$ of $$\Gamma$$, and let $$P:=\left\{\sum_{i=1}^nt_ie_i:-1\le t_i< 1\right\}$$. Note that $$\vol(P)=2^n\vol_\Gamma$$, and that $$\R^n=\bigsqcup_{\ell\in\Gamma}(P+2\ell)$$. Because $$D$$ is bounded, $$D_\ell:=D\cap(P+2\ell)$$ is nonempty for only finitely many $$\ell\in\Gamma$$. Now, consider translates $$D_\ell':=D_\ell-2\ell\subseteq P$$ and note that<br />
<center>
$$\begin{align*}
\sum_{\ell\in\Gamma}\vol(D_\ell')
= \sum_{\ell\in\Gamma}\vol(D_\ell)
= \vol(D)
> 2^n\vol_\Gamma
= \vol(P)
\end{align*}$$
</center>
Thus, there must exist distinct $$\ell_1,\ell_2\in\Gamma$$ such that $$D_{\ell_1}'\cap D_{\ell_2}'\neq\emptyset$$ so pick some $$x\in D_{\ell_1}$$ and $$y\in D_{\ell_2}$$ s.t. $$x-2\ell_1=y-2\ell_2$$. Note that $$(x-y)/2$$ is nonzero since $$\ell_1\neq\ell_2$$, and it lies in $$D$$ by convexity and symmetry about the origin. At the same time<br />
<center>
$$\begin{align*}
\frac{x-y}2 = \frac{2\ell_1-2\ell_2}2 = \ell_1-\ell_2 \in \Gamma
\end{align*}$$
</center>
Thus, the theorem holds. $$\square$$
</div>
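<p>The theorem is also easy to poke at numerically. Below, Γ = Z² (volume 1) and D is a disk whose area just exceeds 2²·1 = 4; the radius 1.13 is my own arbitrary choice, and sure enough the disk catches nonzero lattice points:</p>

```python
import math

# Minkowski for Gamma = Z^2: an origin-symmetric convex compact set of
# area >= 4 must contain a nonzero lattice point. D = disk of radius 1.13.
r = 1.13
assert math.pi * r * r >= 4  # area ~ 4.01

hits = [(x, y) for x in range(-2, 3) for y in range(-2, 3)
        if (x, y) != (0, 0) and x * x + y * y <= r * r]
print(hits)  # [(-1, 0), (0, -1), (0, 1), (1, 0)]
```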
<h1 id="real-embeddings-of-quadratic-number-fields">Real Embeddings of Quadratic Number Fields</h1>
<p>Now we know some things about quadratic number fields, and we know a little about the geometry of numbers, so let’s put the two together. The bridge between abstract fields and more concrete geometric ideas will be embeddings.</p>
<blockquote>
<p>Definition<br />
A <strong>real embedding of a number field</strong> <script type="math/tex">K</script> is a ring homomorphism <sup id="fnref:26"><a href="#fn:26" class="footnote">30</a></sup> <script type="math/tex">K\hookrightarrow\R</script>.</p>
</blockquote>
<p>If you’ll recall, in Pell’s equations, we have some <script type="math/tex">d>1</script>, so consider such a <script type="math/tex">d</script> and a number field <script type="math/tex">K=\Q(\sqrt d)</script>. This field has 2 real embeddings. An embedding is determined by where it sends <script type="math/tex">1</script> and <script type="math/tex">\sqrt d</script> (why?), but <script type="math/tex">1</script> must map to <script type="math/tex">1</script>. However, <script type="math/tex">\sqrt d</script> has two possible images corresponding to the two solutions to <script type="math/tex">x^2-d=0</script>. In fact, if <script type="math/tex">\sigma:K\hookrightarrow\R</script> is an embedding, then <script type="math/tex">\sigma(x^2-d)=\sigma(x)^2-d</script> so <script type="math/tex">\sigma(\sqrt d)^2-d=\sigma((\sqrt d)^2-d)=\sigma(0)=0</script>. This is all to say that a real quadratic field has two real embeddings. Because these are equivalent as far as algebra is concerned, instead of choosing one arbitrarily, we’ll make use of both. Define the function <sup id="fnref:27"><a href="#fn:27" class="footnote">31</a></sup></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\iota: &K &\longrightarrow& \R^2\\
&\alpha &\longmapsto& (\alpha, \conj\alpha)\\
&a+b\sqrt d &\longmapsto& (a+b\sqrt d, a-b\sqrt d) && a,b\in\Q
\end{matrix} %]]></script>
<blockquote>
<p>Theorem<br />
For a real quadratic number field <script type="math/tex">K</script>, <script type="math/tex">\iota(\ints K)</script> is a lattice.</p>
</blockquote>
<div class="proof3">
Pf: Left as an exercise for the reader. $$\square$$
</div>
<p>A natural question to ask is what is the volume of <script type="math/tex">\iota(\ints K)</script>? Once again, fix some square free <script type="math/tex">d>1</script> and let <script type="math/tex">K=\Q(\sqrt d)</script>. Then,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\vol_{\iota(\ints K)} &=&
\begin{vmatrix}
\iota(1) \\
\iota(\sqrt d)
\end{vmatrix} &=&
\begin{vmatrix}
1 & 1\\
\sqrt d & -\sqrt d
\end{vmatrix} &=&
|-2\sqrt d| &=&
2\sqrt d
&& \text{if }d\equiv2,3\pmod4\\
\vol_{\iota(\ints K)} &=&
\begin{vmatrix}
\iota(1) \\
\iota\left(\frac{1+\sqrt d}2\right)
\end{vmatrix} &=&
\begin{vmatrix}
1 & 1\\
\frac{1+\sqrt d}2 & \frac{1-\sqrt d}2
\end{vmatrix} &=&
|-\sqrt d| &=&
\sqrt d
&& \text{if }d\equiv1\pmod4
\end{matrix} %]]></script>
<p>Motivated by this calculation, we make the following definition</p>
<blockquote>
<p>Definition<br />
The <strong>discriminant</strong> of a real quadratic number field <script type="math/tex">K=\Q(\sqrt d)</script> is <script type="math/tex">4d</script> if <script type="math/tex">d\equiv2,3\pmod4</script> and <script type="math/tex">d</script> if <script type="math/tex">d\equiv1\pmod4</script>. Depending on how we feel, we may denote this <script type="math/tex">\disc(K), \zdisc(\ints K),</script> or <script type="math/tex">D_K</script>.</p>
</blockquote>
<p>Note that <script type="math/tex">\vol_{\ints K}:=\vol_{\iota(\ints K)}=\sqrt{\zdisc(\ints K)}</script>. Furthermore, the following is a neat result <sup id="fnref:28"><a href="#fn:28" class="footnote">32</a></sup></p>
<blockquote>
<p>Theorem<br />
If <script type="math/tex">K=\Q(\sqrt d)</script> is a real quadratic number field, then<br /></p>
</blockquote>
<center>$$\begin{align*}
\ints K \simeq\frac{\Z[X]}{(X^2-D_KX+(D_K^2-D_K)/4)}\simeq\zadj{\frac{D_K+\sqrt{D_K}}2}
\end{align*}$$</center>
<div class="proof3">
Pf: Exercise to the reader. Just check both cases separately. $$\square$$
</div>
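<p>Both the volume computation and the discriminant bookkeeping are easy to verify numerically; here’s a quick sketch (the particular d’s are arbitrary choices of mine):</p>

```python
import math

# Check vol_{iota(O_K)} = sqrt(disc(K)) in both congruence classes, using
# the embedding matrices computed above (rows are iota of the basis).

def vol(d):
    if d % 4 in (2, 3):
        m = [[1, 1], [math.sqrt(d), -math.sqrt(d)]]      # basis {1, sqrt(d)}
    else:  # d = 1 mod 4
        w = (1 + math.sqrt(d)) / 2
        m = [[1, 1], [w, 1 - w]]                         # basis {1, (1+sqrt(d))/2}
    return abs(m[0][0] * m[1][1] - m[0][1] * m[1][0])

def disc(d):
    return d if d % 4 == 1 else 4 * d

for d in [2, 3, 5, 13]:
    assert math.isclose(vol(d), math.sqrt(disc(d)))
print("vol = sqrt(disc) for d = 2, 3, 5, 13")
```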
<p>Recall that solutions to Pell’s equations correspond to units of <script type="math/tex">\ints K</script>, so the goal of this section is to understand the structure of these units. These units form a multiplicative group <script type="math/tex">\ints K^\times</script> <sup id="fnref:29"><a href="#fn:29" class="footnote">33</a></sup> so we’ll “embed” these in <script type="math/tex">\R^2</script> in a way that makes use of this multiplicative structure. Define</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\ell: &\ints K^\times &\longrightarrow& \R^2\\
&u &\longmapsto& (\log|u|, \log|\conj u|)\\
\\
h: &(\R^\times)^2 &\longrightarrow& \R^2\\
&(x,y) &\longmapsto& (\log|x|, \log|y|)
\end{matrix} %]]></script>
<p>Note that <script type="math/tex">\ell=h\circ\iota\mid_{\ints K^\times}</script>. Furthermore, since <script type="math/tex">\knorm(u)=u\conj u=\pm1</script> for any unit, <script type="math/tex">\iota(\ints K^\times)</script> lies on the hyperbola <script type="math/tex">xy=\pm1</script>. Since <script type="math/tex">\log</script>’s turn multiplication into addition <sup id="fnref:30"><a href="#fn:30" class="footnote">34</a></sup>, this means that <script type="math/tex">\ell(\ints K^\times)</script> lies on the line <script type="math/tex">x+y=0</script>. The picture looks something like</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/embed.jpg" width="700" height="100" />
</center>
<p>Now, stare at <script type="math/tex">\ell</script> long enough for you to be convinced that <script type="math/tex">\ker(\ell)=\{\pm1\}</script>. By the first isomorphism theorem, this means that</p>
<script type="math/tex; mode=display">\begin{align*}
\ell(\ints K^\times) \simeq \frac{\ints K^\times}{\{\pm1\}}
\end{align*}</script>
<p>Luckily for us, <script type="math/tex">\ints K^\times</script> is abelian <sup id="fnref:31"><a href="#fn:31" class="footnote">35</a></sup> so we can write</p>
<script type="math/tex; mode=display">\begin{align*}
\zmod2\oplus\ell(\ints K^\times) \simeq \{\pm1\}\oplus\ell(\ints K^\times) \simeq \ints K^\times
\end{align*}</script>
<p>Thus, once we figure out what <script type="math/tex">\ell(\ints K^\times)</script> is, we will perfectly understand the structure of units in real quadratic fields. As it turns out, this group will be infinite cyclic, but we’ll first show it’s discrete. I’ve already been kind of loose with this, but just keep in mind that <script type="math/tex">K</script> is of the form <script type="math/tex">\Q(\sqrt d)</script> for some square free <script type="math/tex">d\in\Z_{>1}</script>; I won’t always specify this.</p>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\ell(\ints K^\times)</script> is discrete.</p>
</blockquote>
<div class="proof3">
Pf: Fix any $$r\in\R_{\ge0}$$. We will show that $$\ell^{-1}([-r,r]^2)\subseteq\ints K^\times$$ is finite. Consider an arbitrary $$u\in\ints K^\times$$ s.t. $$\ell(u)\in[-r,r]^2$$. This means that $$(\log|u|,\log|\conj u|)\in[-r,r]^2$$ so $$|u|,|\conj u|\in[e^{-r},e^r]$$ which means that $$|u+\conj u|\le2e^r$$ and $$|u\conj u|\le e^{2r}$$. Now, notice that $$u$$ must satisfy the following equation<br />
<center>
$$\begin{align*}
X^2 - aX + b = 0 && a=u+\conj u\in\Z, b=u\conj u\in\Z
\end{align*}$$
</center>
There are only finitely many choices for $$a,b$$, and each choice corresponds to at most 2 such $$u$$, so there are only finitely many such $$u$$. $$\square$$
</div>
<p>Now, remember earlier when I said that <script type="math/tex">\ell(\ints K^\times)</script> lies on the line <script type="math/tex">x+y=0</script>? Well, the previous theorem says that it is a discrete subgroup of this line. This line is a 1-dimensional real vector space, and there aren’t many discrete subgroups of such a space. In fact, up to isomorphism there are two: <script type="math/tex">\{0\}</script> and <script type="math/tex">\Z</script>. Thus, if we can show that <script type="math/tex">\ell(\ints K^\times)</script> has more than one element, then we will know it must be <script type="math/tex">\Z</script> which will in turn show that Pell’s equations have infinitely many solutions. Even more than this, this will show that there exists an <script type="math/tex">\eps\in\ints K^\times</script> such that any unit has the form <script type="math/tex">\pm\eps^n</script> for some <script type="math/tex">n\in\Z</script>, so all solutions to Pell’s equations are generated from a single solution! Getting ahead of myself because this is all still conjecture at this point, we will call such an <script type="math/tex">\eps</script> a <strong>fundamental unit</strong> of <script type="math/tex">\ints K</script>.</p>
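<p>To get even further ahead of myself: for d = 2, it will turn out that ε = 1 + √2 is a fundamental unit of Z[√2] (take that on faith for now; we haven’t proved it). Since ε has norm −1, its even powers have norm +1, and they spit out solutions to x² − 2y² = 1:</p>

```python
# Generate Pell solutions for d = 2 from the (conjectured) fundamental
# unit eps = 1 + sqrt(2); its even powers eps^(2n) = x + y*sqrt(2) all
# satisfy x^2 - 2y^2 = 1.

def mul(p, q, d=2):
    (a1, b1), (a2, b2) = p, q
    return (a1 * a2 + d * b1 * b2, a1 * b2 + a2 * b1)

eps_sq = mul((1, 1), (1, 1))   # eps^2 = 3 + 2*sqrt(2), which has norm +1
sols, cur = [], eps_sq
for _ in range(4):
    sols.append(cur)
    cur = mul(cur, eps_sq)

print(sols)  # [(3, 2), (17, 12), (99, 70), (577, 408)]
assert all(x * x - 2 * y * y == 1 for x, y in sols)
```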
<p>The idea is to find two elements of <script type="math/tex">\ints K</script> that are equal in norm but not in absolute value. Then, they must differ by a unit <script type="math/tex">u</script>, and <script type="math/tex">\ell(u)\neq0</script> (why?), so the group contains a nonzero element, which means it must be <script type="math/tex">\Z</script>, and we win.</p>
<p>When writing the below proof, I got kinda lost in the details. To help me remember what everything is, and what’s going on, I quickly put together the following image. It’s not labelled or anything, but it illustrates the <script type="math/tex">\alpha_\lambda</script> (the x-coordinate of the green point) we are going to find, and why we can bound its absolute value both above and below.</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/proof.jpg" width="400" height="100" />
</center>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\ell(\ints K^\times)\simeq\Z</script></p>
</blockquote>
<div class="proof3">
Pf: Fix a real number $$\lambda>0$$ and consider the box $$B=[-\lambda,\lambda]\times[-\sqrt{D_K}/\lambda,\sqrt{D_K}/\lambda]$$. It is symmetric about the origin, compact, convex, and has area $$4\sqrt{D_K}=2^2\vol_{\ints K}$$. Our time looking at the geometry of numbers tells us that this means there's a nonzero $$\alpha_\lambda\in\ints K$$ with $$\iota(\alpha_\lambda)\in B$$. Note that this means $$|\knorm(\alpha_\lambda)|\le\sqrt{D_K}$$ as the norm of $$\alpha_\lambda$$ is the product of the coordinates of $$\iota(\alpha_\lambda)$$. Furthermore, $$|\knorm(\alpha_\lambda)|\ge1$$ since it's a nonzero (rational) integer, so $$\iota(\alpha_\lambda)$$ lies on or outside the hyperbola $$|xy|=1$$. This hyperbola intersects $$B$$ at the top and bottom when $$|x|=\lambda/\sqrt{D_K}$$ so we've shown that<br />
<center>
$$\begin{align*}
\lambda/\sqrt{D_K} \le |\alpha_\lambda| \le \lambda
\end{align*}$$
</center>
as $$\alpha_\lambda$$ is the $$x$$-coordinate of $$\iota(\alpha_\lambda)\in B$$. Now pick another $$\lambda'>\lambda\sqrt{D_K}$$ and we can find some $$\alpha_{\lambda'}$$ with <br />
<center>
$$\begin{align*}
|\alpha_\lambda| \le \lambda < \lambda'/\sqrt{D_K} \le |\alpha_{\lambda'}|
\end{align*}$$
</center>
Now, we just keep on doing this, producing a sequence $$\{\alpha_{\lambda_n}\}_{n=1}^\infty\subset\ints K$$ with the property that $$|\alpha_{\lambda_i}|<|\alpha_{\lambda_j}|$$ whenever $$i < j$$, and $$|\knorm(\alpha_{\lambda_i})|\le\sqrt{D_K}$$ for all $$i$$. All these norms are bounded integers, so there are only finitely many possible distinct values among them, but there are infinitely many $$\alpha$$'s in our sequence, so infinitely many of them must share a single norm $$m$$. One more pigeonhole: $$\ints K/m\ints K$$ is finite, so among these we can find $$i\neq j$$ with $$\knorm(\alpha_{\lambda_i})=\knorm(\alpha_{\lambda_j})=m$$ and $$\alpha_{\lambda_i}\equiv\alpha_{\lambda_j}\pmod m$$. Set $$u=\alpha_{\lambda_i}/\alpha_{\lambda_j}=\alpha_{\lambda_i}\bar\alpha_{\lambda_j}/m$$. The congruence gives $$\alpha_{\lambda_i}\bar\alpha_{\lambda_j}\equiv\alpha_{\lambda_j}\bar\alpha_{\lambda_j}=m\equiv0\pmod m$$, so $$u\in\ints K$$; by symmetry, $$u^{-1}\in\ints K$$ as well, so $$u$$ is a unit. Since $$|\alpha_{\lambda_i}|\neq|\alpha_{\lambda_j}|$$, we know $$u\neq\pm1$$. Thus, $$\ell(u)\neq0$$, and the theorem follows. $$\square$$
</div>
<p>I should mention that, by appealing to absolute values, the above proof implicitly fixes a choice of an embedding <script type="math/tex">K\hookrightarrow\R</script>. It doesn’t really matter which one is used, but it’s worth noting what’s going on behind the scenes.</p>
<h1 id="pell-at-last">Pell at Last</h1>
<p>Well, we’ve gone over a lot, and if you’re still here, kudos to you <sup id="fnref:33"><a href="#fn:33" class="footnote">36</a></sup>, but we’re finally ready to actually solve Pell’s equations. Fix any square free <script type="math/tex">d\in\Z_{>1}</script>. Integer solutions to the equation <script type="math/tex">x^2-dy^2=1</script> are units of <script type="math/tex">\ints{\Q(\sqrt d)}</script>, and these units are all of the form <script type="math/tex">\pm\eps^n</script> for some fundamental unit <script type="math/tex">\eps</script>. In order to call this equation solved, we only need to find a fundamental unit. I’ll handle the case that <script type="math/tex">d\equiv2,3\pmod4</script>. The other case can be done analogously, and figuring out its details is left as an exercise.</p>
<p>Assume <script type="math/tex">d\equiv2,3\pmod4</script> and <script type="math/tex">\eps</script> is a fundamental unit of <script type="math/tex">K:=\Q(\sqrt d)</script>. Then, <script type="math/tex">-\eps,\eps^{-1}</script>, and <script type="math/tex">-\eps^{-1}</script> are all fundamental units as well <sup id="fnref:34"><a href="#fn:34" class="footnote">37</a></sup>. Write <script type="math/tex">\eps=a_1+b_1\sqrt d</script> with <script type="math/tex">a_1,b_1\in\Z^+</script>. We can always get positive coefficients by appropriately choosing one of the four fundamental units. Now let <script type="math/tex">\eps^k:=a_k+b_k\sqrt d</script> be the positive powers of <script type="math/tex">\eps</script> and note that <script type="math/tex">b_k=a_1b_{k-1}+b_1a_{k-1}</script> so the sequence <script type="math/tex">\{b_k\}</script> is increasing. Thus, if you want to find a fundamental unit, just guess and check. Start with <script type="math/tex">b_1=1</script> and check to see if <script type="math/tex">db_1^2\pm1</script> is a perfect square. If not, move on to <script type="math/tex">b_1=2</script> and repeat. Once you’ve found a value that works, write <script type="math/tex">db_1^2\pm1=a_1^2</script> and your fundamental unit is <script type="math/tex">a_1+b_1\sqrt d</script>.</p>
<blockquote>
<p>Example<br />
Let <script type="math/tex">d=11</script>. If we take <script type="math/tex">b_1=1</script>, then <script type="math/tex">11b_1^2\pm1=\{10,12\}</script> so no good. If we take <script type="math/tex">b_1=2</script>, then <script type="math/tex">11b_1^2\pm1=\{45,43\}</script> so still no luck. Now we try <script type="math/tex">b_1=3</script> to get <script type="math/tex">11b_1^2\pm1=\{100,98\}</script> and we have a winner. Our fundamental unit is <script type="math/tex">10+3\sqrt{11}</script>. Indeed, <script type="math/tex">10^2-11*3^2=1</script> is a solution to Pell’s equation.</p>
</blockquote>
<blockquote>
<p>Example<br />
Now, take <script type="math/tex">d=2</script> instead. If we let <script type="math/tex">b_1=1</script>, then <script type="math/tex">2b_1^2\pm1=\{1,3\}</script> so our fundamental unit is <script type="math/tex">\eps=1+\sqrt 2</script>. However, this has norm <script type="math/tex">1-2=-1</script> so it’s not a solution to Pell’s equation. In cases like this, we instead focus our attention on <script type="math/tex">\eps^2=3+2\sqrt2</script> and use this to generate solutions.</p>
</blockquote>
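<p>The guess-and-check procedure above is easy to mechanize. Below is a minimal Python sketch (the function name is my own choice; it assumes <script type="math/tex">d</script> is square free with <script type="math/tex">d\equiv2,3\pmod4</script>, so that the fundamental unit lives in <script type="math/tex">\Z[\sqrt d]</script>):</p>

```python
import math

def fundamental_unit(d):
    # Try b = 1, 2, 3, ... until d*b^2 + 1 or d*b^2 - 1 is a perfect
    # square a^2; then a + b*sqrt(d) is our fundamental unit.
    b = 1
    while True:
        for candidate in (d * b * b + 1, d * b * b - 1):
            a = math.isqrt(candidate)
            if a * a == candidate:
                return a, b
        b += 1
```

<p>On the examples above, <code class="highlighter-rouge">fundamental_unit(11)</code> gives <code class="highlighter-rouge">(10, 3)</code> and <code class="highlighter-rouge">fundamental_unit(2)</code> gives <code class="highlighter-rouge">(1, 1)</code>.</p>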
<p>I’d like to say that’s everything, but I’ve left a few loose ends. These include what to do if <script type="math/tex">d</script> isn’t square free, and what to do in the case where <script type="math/tex">d\equiv1\pmod4</script>, so the fundamental unit can have non-integer coefficients. Honestly, I wanted to take care of them myself, but this post became much longer than I anticipated, so I’ll leave them to you. I will say that they have similar resolutions. The main issue in both cases is that <script type="math/tex">\Z[\sqrt d]</script> may not be all of <script type="math/tex">\ints K</script>. However, it can be shown that in general, <script type="math/tex">\Z[\sqrt d]</script> has finite index in <script type="math/tex">\ints K</script>. This means in particular that its unit group, mod <script type="math/tex">\pm1</script>, is still infinite cyclic (why?), and so we can still find a fundamental unit <script type="math/tex">\eps\in\Z[\sqrt d]^\times</script>. Then, solutions to Pell’s equation correspond either to powers of <script type="math/tex">\eps</script> or to even powers of <script type="math/tex">\eps</script>, depending on whether <script type="math/tex">\knorm(\eps)</script> is <script type="math/tex">1</script> or <script type="math/tex">-1</script>.</p>
<div class="footnotes">
<ol>
<li id="fn:21">
<p>Just finished day one of writing this, and it looks like this will end up being my longest post yet by a sizeable amount. I could be wrong about the sizeable amount part (hard to tell), but either way, this post is gonna dethrone <a href="../surreal-numbers">the king</a>… Having just now finished writing this thing, something tells me it will hold the title of longest post for quite some time. In case anyone’s curious, the previous record holder had a little under 26,650 characters. This one has around 45,000. <a href="#fnref:21" class="reversefootnote">↩</a></p>
</li>
<li id="fn:19">
<p>I usually try not to have prereqs for gaining understanding from my posts, but for this one, I feel like you should at least be comfortable with linear algebra (and in particular abstract vector spaces and determinants), or you’ll likely be lost at some key points. Once we start talking about embeddings, a little bit of abstract algebra will help too (in particular, knowing about group homomorphisms) <a href="#fnref:19" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>this statement can be made precise and proven. I might do that if ever I come up with a good excuse to introduce computability theory on this blog. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>another way to do it is hinted at at the beginning of the second half of that post. You just need to use the fact that the norm N(a+bi)=a^2+b^2 of a Gaussian integer is multiplicative. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:37">
<p>because I forgot it was possible for it not to be the case <a href="#fnref:37" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>I should mention that, as is not <a href="../fundamental-theorem">uncommon</a> in this blog, this post won’t necessarily present the simplest way to solve things, but instead opt for one that introduces interesting mathematics. Also, as always, minimal planning is done before I begin writing so almost certainly details will be missing or presented out of their usual order. It is up to the reader to reconstruct coherent arguments where this happens (it’s a good test of understanding) <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>from infinite to finite <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:38">
<p>some people will try to tell you that this is impossible, but do not be fooled. I believe in you, and I know you can let out a gasp if you so will it. <a href="#fnref:38" class="reversefootnote">↩</a></p>
</li>
<li id="fn:35">
<p>at this point, you might wonder why I didn’t just write the problem like this in the first place. That’s because the place I stole it from wrote it as y^2=x^3-2 originally as well. <a href="#fnref:35" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>if you don’t know what this is, it basically means that the fundamental theorem of arithmetic holds: every integer can be factored uniquely into primes <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>For example, let d^2=-5. Then, Z[d] is not a UFD since 2*3=(1+d)(1-d) and all four of these factors are (different) irreducible elements <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Not a technical term. What I mean is that if you have x,y in Z[-sqrt{2}], then x/y exists in Q(-sqrt{2}), and this is what I mean by their ambient quotient. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:36">
<p>If you write it as the product of two numbers, one of them is a unit <a href="#fnref:36" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>I’m gonna use some algebraic words like field, ring, etc. in this section. If you don’t know what they are, that’s fine; I don’t think knowing their definition is technically required to understand how we’re gonna solve Pell’s equations. I’ll also try to include watered-down versions of what they mean for some of them. For example, a ring is a set with addition and multiplication. A field has both of these plus division. Integers are a ring; fractions are a field. <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>A field extension K of Q (often written K/Q) is just a field K that contains Q (Ex. R is a field extension of Q). Every field extension has a degree d (written [K:Q]) which is a measure of how much bigger K is than Q (formally speaking, if K/Q is a field extension, then K is a Q-vector space (why?) and the degree of K/Q is the Q-dimension of K as a vector space). We say a field extension is finite if it has finite degree (ex. Q(sqrt{-2}) is a finite field extension of Q (of degree 2) and hence a number field. R is an infinite field extension of Q and hence not a number field) <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>Monic just means the leading coefficient is 1. The set Z[X] is just the set of polynomials with integer coefficients. An example of an algebraic integer is sqrt{-2} because it satisfies X^2 + 2 = 0 <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>That O is \mathscr and not \mathcal. This is important. <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
<li id="fn:13">
<p>Hence the name quadratic <a href="#fnref:13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:14">
<p>This is no coincidence, but we won’t get into the reason why here. <a href="#fnref:14" class="reversefootnote">↩</a></p>
</li>
<li id="fn:15">
<p>It’s true in general, but the general proof requires a property of the norm I won’t mention here since the norm isn’t always given as a nice product. We got lucky here that quadratic fields are in a sense “complete” (read: Galois) <a href="#fnref:15" class="reversefootnote">↩</a></p>
</li>
<li id="fn:22">
<p>In general, if L/K is an extension of number fields, then N_{L/K}(O_L) is a subset of O_K. In other words, the norm always maps algebraic integers into algebraic integers. <a href="#fnref:22" class="reversefootnote">↩</a></p>
</li>
<li id="fn:16">
<p>Although maybe you’ve already made the connection by now <a href="#fnref:16" class="reversefootnote">↩</a></p>
</li>
<li id="fn:17">
<p>If I’m wrong, we’ll introduce the other stuff as it pops up <a href="#fnref:17" class="reversefootnote">↩</a></p>
</li>
<li id="fn:18">
<p>Which reminds me, why don’t we consider the case d = 0 (mod 4) in the previous theorem? <a href="#fnref:18" class="reversefootnote">↩</a></p>
</li>
<li id="fn:20">
<p>Why not just call this dimension? Because lattices are free Z-modules, and free modules have rank instead of dimension. Modules are something I want to talk about on this blog at some point, but for now, just know that although lattices have geometric interpretations, modules (and even free modules) in general do not (unlike typical vector spaces), so we use the less geometric-sounding rank instead of dimension. <a href="#fnref:20" class="reversefootnote">↩</a></p>
</li>
<li id="fn:32">
<p>for our purposes, the plus in a circle is just another way of writing the direct product. It has the advantage of looking like addition which is good because the dimension of A x B is dim A + dim B instead of dim A * dim B. This notation helps hint at the idea that things should be thought of additively, so you might want to represent the pair (a,b) in A x B as a single value a+b instead (be warned. This is not always a legitimate alternate representation of pairs) <a href="#fnref:32" class="reversefootnote">↩</a></p>
</li>
<li id="fn:23">
<p>By numbers, I mean as in number theory. i.e. integers and their analogues <a href="#fnref:23" class="reversefootnote">↩</a></p>
</li>
<li id="fn:24">
<p>parallelepiped? <a href="#fnref:24" class="reversefootnote">↩</a></p>
</li>
<li id="fn:25">
<p>meaning it contains all the points on its boundary <a href="#fnref:25" class="reversefootnote">↩</a></p>
</li>
<li id="fn:26">
<p>i.e. a function f that preserves addition and multiplication in the sense that f(ab)=f(a)f(b) and f(a+b)=f(a)+f(b) <a href="#fnref:26" class="reversefootnote">↩</a></p>
</li>
<li id="fn:27">
<p>there’s some abuse of notation going on here. In a number field, sqrt(d) is really just an abstract number whose square happens to be d whereas in the setting of real numbers (where there are two numbers matching this description) it is specifically the positive sqrt(d). <a href="#fnref:27" class="reversefootnote">↩</a></p>
</li>
<li id="fn:28">
<p>as far as I know, there isn’t some deep reason why this expression in particular works. It’s just a happy coincidence. I could be wrong though; coincidences are rare in math. <a href="#fnref:28" class="reversefootnote">↩</a></p>
</li>
<li id="fn:29">
<p>this basically just means if you multiply two units you get another unit, but the same doesn’t necessarily hold for addition <a href="#fnref:29" class="reversefootnote">↩</a></p>
</li>
<li id="fn:30">
<p>and since log(1)=0 <a href="#fnref:30" class="reversefootnote">↩</a></p>
</li>
<li id="fn:31">
<p>plus possibly an argument involving splitting O_K^x into “positive” and “negative” parts and mentioning phrases like <a href="https://www.wikiwand.com/en/Semidirect_product#/Relation_to_direct_products">semidirect product</a> (the argument I have in my head does this, but I’m pretty sure it’s overkill)… A different argument is to notice that O_K^x is an extension of {+/-1} and l(O_K^x). Hence, we get this conclusion by noticing that, letting N: O_K^x -> {+/-1} be the norm, the map a |-> N(a)*sign(a) is a <a href="http://www.math.uconn.edu/~kconrad/blurbs/linmultialg/splittingmodules.pdf">splitting map</a>. <a href="#fnref:31" class="reversefootnote">↩</a></p>
</li>
<li id="fn:33">
<p>no way I’d ever read this much math without losing interest and moving on to something else <a href="#fnref:33" class="reversefootnote">↩</a></p>
</li>
<li id="fn:34">
<p>if you fix an embedding into R, then the unique fundamental unit > 1 is often called <strong>the</strong> fundamental unit <a href="#fnref:34" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>I think this is going to end up being a long one, and possibly not the easiest post to follow that I’ve made; mostly because I will likely end up introducing a decent number of topics I haven’t talked about here before. I guess we’ll see how things turn out.Demystifying Emulators2017-07-12T00:00:00+00:002017-07-12T00:00:00+00:00https://nivent.github.io/blog/demystifying-emulators<p>I don’t remember when I first discovered emulators, but I remember thinking they were absolutely amazing. All of a sudden, I could play games from old systems I knew I would never own, replay games I had owned but lost, etc. I didn’t think much of them past what games I could play until I started getting more serious about coding and realized that there must be some magic going on underneath the hood making these things work. I tried imagining how they might work, but to no avail; they remained a black box. Because of that, I made it a goal of mine to write my own emulator one day, and as it turns out, <a href="https://github.com/NivenT/RGB">dreams do come true</a>.</p>
<p>Here, I want to talk a little bit about how emulators work <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, and about how they’re not magic after all. The basic idea behind emulators is surprisingly straightforward. They’re very aptly named in that they do exactly what they say; they emulate. An emulator works by reproducing in software the same functionality provided by the hardware. Then, when you feed that software the same input as you would the hardware, it does the same things, and you have a working game system right on your computer.</p>
<p>So the basic idea is to convert hardware into software, but what does that actually mean? I don’t really know a good single answer to that question. Instead, it depends on the level of granularity you use. Suppose you wanted to emulate a simple four-function calculator.</p>
<center><img src="https://www.staples-3p.com/s7/is/image/Staples/m001843923_sc7?$splssku$" width="250" height="250" /><br />image source: <a href="https://www.staples.com/Impecca-Standard-Function-Calculator-Black-Ivy/product_1480120">Staples</a></center>
<p>One way to do this would be to take a higher level approach where you let users type the corresponding keys on their keyboard, and implement the logic with a single <code class="highlighter-rouge">switch</code> statement (or, since Python has no switch, a dictionary lookup). The workhorse of your code might look something like this<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def eval(lhs, rhs, op):
    # each branch is wrapped in a lambda so only the chosen operation runs;
    # without this, eval(1, 0, '+') would crash eagerly computing lhs/rhs
    return {
        '+': lambda: lhs+rhs,
        '-': lambda: lhs-rhs,
        '/': lambda: lhs/rhs,
        '*': lambda: lhs*rhs,
    }[op]()
</code></pre></div></div>
<p>At this point, this doesn’t feel much like an emulator, but it’s doing the same thing the calculator does so I’d call it one. If you wanted a lower level, more granular approach, you might end up writing something like this to simulate the underlying logic circuits used by the calculator</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">addBits</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">):</span>
<span class="n">total</span> <span class="o">=</span> <span class="n">lhs</span> <span class="o">^</span> <span class="n">rhs</span>
<span class="n">carry</span> <span class="o">=</span> <span class="n">lhs</span> <span class="o">&</span> <span class="n">rhs</span>
<span class="k">return</span> <span class="p">(</span><span class="n">total</span><span class="p">,</span> <span class="n">carry</span><span class="p">)</span>
</code></pre></div></div>
<p>In this example, thinking as low as this is overkill, but that’s not always the case. You will end up writing code that looks more like this than the <code class="highlighter-rouge">dictionary</code> lookup above, but you have to be careful not to think too granularly. I remember when I first heard this idea of emulators being software ports of hardware, I immediately started imagining how to go about making classes for things as low level as the system bus and logic gates, and how to piece these together to get a working emulator. Sure, you could do something like that and make things work, but you don’t have to stay that faithful to the hardware. Now that we have the general idea covered, let’s dive into a working example.</p>
<h1 id="curly-succotash">curly-succotash</h1>
<p>One of the simplest systems you can emulate that is still non-trivial is the <a href="https://www.wikiwand.com/en/CHIP-8">Chip-8</a>. Hence, for the purpose of this blog, I wrote a <a href="https://github.com/NivenT/curly-succotash">sample Chip-8 emulator</a> to use as a reference. Before getting into the details, I should preface things by saying that this emulator is flawed in many ways. Aside from not being very user friendly and possibly being slow, its main issue is that it was written in <a href="https://www.haskell.org/">Haskell</a>. I chose Haskell because I wanted to learn the language, and this seemed like a decently sized project for that, but given that Haskell is functional and so tries to avoid things like state and sequential logic, it’s not the ideal language for an emulator<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>. Also, Haskell isn’t super popular, and doesn’t have the most readable syntax to someone who doesn’t know it. That being said, let’s get started.</p>
<h1 id="starting-off">Starting Off</h1>
<p>With emulators, I like to think of the project as writing a CPU. Every other part of the machine is just a peripheral to help make sure the CPU is working properly. This mindset, while not always accurate, helps me focus my efforts and gives me a starting point from where to branch off other parts of the project. So we need to build a Chip-8 CPU. To start with that, we better make sure we have everything that the CPU needs to interact with so we can implement all its instructions. A quick Google search reveals that <a href="https://www.wikiwand.com/en/CHIP-8">Wikipedia has all the information we’ll need on one page</a>, so we can go there to see the various registers and whatnot that make up our system. I chose to emulate each component like so</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Chip8</span> <span class="o">=</span> <span class="kt">Chip8</span> <span class="p">{</span>
    <span class="n">mem</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Word8</span><span class="p">],</span> <span class="c1">-- 4096 1-byte addresses</span>
<span class="n">regs</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Word8</span><span class="p">],</span> <span class="c1">-- 16 registers</span>
<span class="n">stack</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Int</span><span class="p">],</span> <span class="c1">-- 16(?) levels of addresses</span>
<span class="n">ptr</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span> <span class="c1">-- I register (usually stores memory address)</span>
<span class="n">pc</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">sp</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">delay_timer</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">sound_timer</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">keys</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Bool</span><span class="p">],</span> <span class="c1">-- 16 keys</span>
<span class="n">screen</span> <span class="o">::</span> <span class="kt">Map</span><span class="o">.</span><span class="kt">Map</span> <span class="p">(</span><span class="kt">Int</span><span class="p">,</span> <span class="kt">Int</span><span class="p">)</span> <span class="kt">Bool</span> <span class="c1">-- 64x32 pixels</span>
<span class="p">}</span> <span class="kr">deriving</span> <span class="p">(</span><span class="kt">Show</span><span class="p">)</span>
</code></pre></div></div>
<ul>
<li>The memory is just an array of values, and addresses are represented implicitly by index.</li>
<li>A register is just some value; it’s completely determined by the number it stores.</li>
<li>The simplest representation of a key is just a Bool saying whether it’s pressed or not. It’s also possible to explicitly store the mapping between computer keys and Chip-8 keys here, but I generally prefer to keep that separate.</li>
<li>The screen is a map from pairs of ints (pixel coordinates) to bools (pixel values: on/off). I originally had this as a 2d list, but that made some of the code more complicated than need be, so I changed it to this. Normally, a 2d array would be fine, but Haskell<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>.</li>
</ul>
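<p>In case the Haskell record syntax above is hard to parse, here is the same state written as a Python dataclass. This is just a sketch of my own for illustration, not part of the actual emulator:</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Chip8:
    mem: List[int] = field(default_factory=lambda: [0] * 4096)      # 4096 1-byte addresses
    regs: List[int] = field(default_factory=lambda: [0] * 16)       # 16 registers
    stack: List[int] = field(default_factory=list)                  # levels of addresses
    ptr: int = 0       # I register (usually stores a memory address)
    pc: int = 0x200    # execution starts at address 0x200
    sp: int = 0
    delay_timer: int = 0
    sound_timer: int = 0
    keys: List[bool] = field(default_factory=lambda: [False] * 16)  # 16 keys
    screen: Dict[Tuple[int, int], bool] = field(default_factory=dict)  # 64x32 pixels
```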
<p>Once you have this framework for the pieces of the emulator in place, you can start making it do stuff <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>. In practice, this means implementing CPU instructions: things like adding registers together or loading a value from memory into a register. The majority of instructions are fairly simple. Unfortunately, there are usually a lot of them, which makes them easy to get wrong and boring to implement. Because of that, I like to only write a few at a time. I’ll implement a couple, run the code and have it throw an error whenever it comes across an instruction it doesn’t know; then I’ll implement that instruction, rinse, wash, and repeat. This cycle makes sure you’re always working towards a working product, and lets you catch other bugs early on.</p>
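<p>The implement-as-you-go cycle might look like the following sketch (in Python rather than Haskell for readability; the decoding here is illustrative, not code from the actual emulator). Only a few opcodes are handled, and anything else fails loudly:</p>

```python
def step(chip8):
    # fetch the 2-byte big-endian opcode at pc, then advance pc
    opcode = (chip8.mem[chip8.pc] << 8) | chip8.mem[chip8.pc + 1]
    chip8.pc += 2
    if opcode == 0x00E0:                      # 00E0: clear the screen
        chip8.screen = {pos: False for pos in chip8.screen}
    elif opcode & 0xF000 == 0x1000:           # 1NNN: jump to address NNN
        chip8.pc = opcode & 0x0FFF
    elif opcode & 0xF000 == 0x6000:           # 6XNN: set register X to NN
        chip8.regs[(opcode >> 8) & 0xF] = opcode & 0xFF
    else:                                     # not implemented yet: fail loudly
        raise NotImplementedError("unknown opcode %04X" % opcode)
```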
<p>One common bug is to mess up the initial state of the system. This will manifest itself in your emulator trying to execute the opcode for an instruction that doesn’t exist. I ran into that issue <a href="https://github.com/NivenT/curly-succotash/blob/d6ad6144acd739b8e7ab113cc38cbd24a4978161/emu.hs#L85">here</a> when I first tried implementing game loading. I had memory laid out like</p>
<script type="math/tex; mode=display">% <![CDATA[
\newcommand{\x}{\text{x}}
\begin{array}{| c | c | c |}
0\x0 & 0\x1 & 0\x2 & \dots & 0\x4F & 0\x50 & 0\x51 & 0\x52 & \dots & S & S+1 & S+2 & \dots\\\hline
f_0 & f_1 & f_2 & \dots & f_{79} & g_0 & g_1 & g_2 & \dots & 0 & 1 & 2 & \dots\\
\hline
\end{array} %]]></script>
<p>where <script type="math/tex">f_0,f_1,f_2,\dots,f_{79}</script> is the font data <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>, <script type="math/tex">g_0,g_1,g_2,\dots,g_N</script> is the game data, and <script type="math/tex">S=0\x50+N+1</script> is the beginning of memory after all game data.</p>
<p>This is wrong for two reasons. First of all, the Chip-8 begins executing instructions from memory address <script type="math/tex">0\x200</script> so that’s where game data should begin. Secondly, the unused locations (<script type="math/tex">S</script> and beyond) should be populated with all <script type="math/tex">0</script>’s. I caught this bug because earlier versions of the emulator would try to execute a non-existent instruction. If I had implemented all instructions before testing and only seen the issue then, I would have assumed the issue was with an incorrectly implemented instruction and not been able to fix things as quickly.</p>
<p>The <a href="https://github.com/NivenT/curly-succotash/blob/master/emu.hs#L110">correct memory layout</a> is</p>
<script type="math/tex; mode=display">% <![CDATA[
\newcommand{\x}{\text{x}}
\begin{array}{| c | c | c |}
0\x0 & 0\x1 & \dots & 0\x4F & 0\x50 & 0\x51 & \dots & 0\x200 & 0\x201 & \dots & T & T+1 & \dots\\\hline
f_0 & f_1 & \dots & f_{79} & 0 & 0 & \dots & g_0 & g_1 & \dots & 0 & 0 & \dots\\
\hline
\end{array} %]]></script>
<p>where <script type="math/tex">T=0\x200+N+1</script> takes the role of <script type="math/tex">S</script>.</p>
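<p>In code, getting this layout right just amounts to zero-filling all of memory, then copying the font and game data to the correct offsets. A quick sketch (in Python; the constant and function names here are mine):</p>

```python
FONT_START, GAME_START, MEM_SIZE = 0x0, 0x200, 4096

def load_memory(font, game):
    # every unused location holds 0; font data at 0x0, game data at 0x200
    mem = [0] * MEM_SIZE
    mem[FONT_START:FONT_START + len(font)] = font
    mem[GAME_START:GAME_START + len(game)] = game
    return mem
```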
<p>For actually implementing instructions, you need some kind of mapping from opcodes to the execution of the instruction itself. One way of doing this that works fairly well is to just use a <code class="highlighter-rouge">switch</code> statement like in the calculator emulator we started with. The nice thing about this is that, if done right <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>, it’s simple, efficient <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>, and <a href="https://github.com/NivenT/curly-succotash/blob/master/op.hs#L81">works</a>. However, it can also be a pain to maintain a giant switch statement, so you might want something more sophisticated. One thing you can do is use a <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/instructions.rs#L542">struct</a> to hold basic information about instructions, so you don’t have to re-look up all the details hidden in your switch statement <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/emulator.rs#L326">when things go wrong</a>.</p>
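<p>As a sketch of that idea (in Python, with hypothetical names rather than RGB’s actual ones), you can key a table on part of the opcode and store a mnemonic alongside the behavior, so error messages can name the offending instruction instead of forcing you to re-decipher the switch:</p>

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instruction:
    name: str          # mnemonic, handy for debug output
    size: int          # bytes this instruction occupies
    execute: Callable  # the behavior itself

def add_imm(chip8, opcode):
    # 7XNN: add NN to register X, wrapping around at 8 bits
    x = (opcode >> 8) & 0xF
    chip8.regs[x] = (chip8.regs[x] + (opcode & 0xFF)) & 0xFF

# table keyed on the opcode's high nibble; only one entry shown
TABLE = {0x7: Instruction("ADD Vx, byte", 2, add_imm)}
```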
<h1 id="aside-on-switch-statements">Aside on Switch Statements</h1>
<p>This section is pretty unrelated to the rest of the post, but is something interesting I wanted to talk about. I mentioned in a footnote that <code class="highlighter-rouge">switch</code> statements can be faster than a series of <code class="highlighter-rouge">if</code> and <code class="highlighter-rouge">else if</code>’s. This may seem counterintuitive because the two appear to be doing functionally equivalent things, and the obvious way to implement a switch statement is with if and else if. Take a look at this code</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">use_switch</span><span class="p">(</span><span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">switch</span><span class="p">(</span><span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span><span class="p">:</span> <span class="n">func0</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">func1</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="n">func2</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">3</span><span class="p">:</span> <span class="n">func3</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="nl">default:</span> <span class="n">func4</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">use_if</span><span class="p">(</span><span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func0</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func1</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func2</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func3</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">func4</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>A priori these two functions should be carried out the same way, but the difference is that using a switch statement means you are comparing <code class="highlighter-rouge">int</code>s to decide what to do whereas an if statement could rely on any condition. This may not seem like much, but in practice it means that <code class="highlighter-rouge">use_switch</code> can be implemented with an array of function pointers instead of a series of ifs. Perhaps more clearly, it could expand to something like this<sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup></p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">FunctionPtr</span><span class="p">)();</span>
<span class="kt">void</span> <span class="nf">switch_expanded</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="n">FunctionPtr</span> <span class="n">arr</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="n">func0</span><span class="p">,</span> <span class="n">func1</span><span class="p">,</span> <span class="n">func2</span><span class="p">,</span> <span class="n">func3</span><span class="p">,</span> <span class="n">func4</span><span class="p">};</span>
<span class="n">arr</span><span class="p">[</span><span class="n">val</span> <span class="o"><</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">val</span> <span class="o">:</span> <span class="mi">4</span><span class="p">]();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Notice that this only makes use of a single condition but is functionally equivalent to the switch statement. As the number of cases in the switch grows, you still only need to check a single condition, whereas you’d (potentially) need to check all of them with if statements. Hence, you could think of the improvement of <code class="highlighter-rouge">switch</code> over <code class="highlighter-rouge">if</code> as being <script type="math/tex">O(1)</script> versus <script type="math/tex">O(n)</script>, but things aren’t this simple in practice.</p>
<h1 id="seeing-stuff--more-than-a-cpu">Seeing Stuff + More than a CPU</h1>
<p>Back to emulators… Ultimately, you want your emulator to do more than execute instructions. Specifically, from most to least important, you want it to display things, get user input, and produce sound. Displaying things can be tricky. Representing the state of the screen internally in the emulator and showing it to a user are two very different things. The best way I’ve found to handle this is to use OpenGL for rendering. You can draw a single rectangle that covers your entire screen and then give it a texture made from data from your emulator. I do not do that in curly-succotash because I could not figure out how to do textures with Gloss, and it seemed like overkill for what was supposed to be a simple project, but <a href="https://github.com/NivenT/RGB/blob/master/src/rendering.rs#L57">here’s a place I do do that</a>. Getting to the point that your emulator displays anything intelligible is a major step; displaying things correctly can be one of the harder parts to get right. As an example, <a href="https://github.com/NivenT/curly-succotash/blob/master/op.hs#L81">compare the complexity of the draw_sprite function with that of every other Chip-8 instruction</a>. User input is usually much more tame, and I have not yet figured out a good way to do sound.</p>
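<p>Concretely, the rectangle+texture approach boils down to packing your emulator’s 64×32 monochrome screen into pixel data each frame and uploading it with something like <code class="highlighter-rouge">glTexImage2D</code>. A sketch of just the packing step (the <code class="highlighter-rouge">screen</code> representation here is invented for illustration):</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int W = 64, H = 32;

// Expand a 64x32 array of on/off pixels into RGBA bytes, ready to be
// handed to the GPU as a texture for a fullscreen rectangle.
std::vector<uint8_t> screen_to_rgba(const std::array<bool, W * H>& screen) {
    std::vector<uint8_t> rgba(W * H * 4);
    for (int i = 0; i < W * H; ++i) {
        uint8_t v = screen[i] ? 255 : 0;  // white pixel or black pixel
        rgba[4*i + 0] = v;    // R
        rgba[4*i + 1] = v;    // G
        rgba[4*i + 2] = v;    // B
        rgba[4*i + 3] = 255;  // fully opaque
    }
    return rgba;
}
```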
<p>With Chip-8, that’s basically all there is to it. Once you’ve implemented CPU instructions, the user can press buttons, and it beeps, you’re done. You can have a functioning emulator in only a few hundred lines of code, and no magic. If you tackle a larger system, then there’s more to do. You might have an actual GPU separate from the CPU, have more complicated timers or interrupts, different kinds of memory storage schemes, etc. However, in the end, it’s all the same thing: you’re just trying to get familiar enough with a system to be able to translate what it does to code.</p>
<h1 id="last-words">Last words</h1>
<p>Something you don’t want to do with an emulator is render the screen by manually drawing the individual pixels. This is needlessly slow and (potentially) pixelated when you could get better, faster results by just using a single rectangle + a texture. An example of what not to do would be</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">square</span> <span class="o">::</span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Picture</span>
<span class="n">square</span> <span class="n">r</span> <span class="n">c</span> <span class="o">=</span> <span class="kt">Polygon</span> <span class="o">$</span> <span class="n">map</span> <span class="n">f</span> <span class="p">[(</span><span class="n">r</span><span class="p">,</span><span class="n">c</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">c</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">c</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">c</span><span class="p">)]</span>
<span class="kr">where</span> <span class="n">f</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">800.0</span><span class="o">/</span><span class="mf">64.0</span> <span class="o">*</span> <span class="p">(</span><span class="n">fromIntegral</span> <span class="n">c</span><span class="p">)</span> <span class="o">-</span> <span class="mf">400.0</span><span class="p">,</span> <span class="mf">600.0</span><span class="o">/</span><span class="mf">32.0</span> <span class="o">*</span> <span class="p">(</span><span class="n">fromIntegral</span> <span class="n">r</span><span class="p">)</span> <span class="o">-</span> <span class="mf">300.0</span><span class="p">)</span>
<span class="n">render_emu</span> <span class="o">::</span> <span class="kt">Chip8</span> <span class="o">-></span> <span class="kt">Picture</span>
<span class="n">render_emu</span> <span class="n">emu</span> <span class="o">=</span> <span class="n">pictures</span> <span class="o">.</span> <span class="kt">Map</span><span class="o">.</span><span class="n">foldrWithKey</span> <span class="n">draw_pixel</span> <span class="kt">[]</span> <span class="o">$</span> <span class="n">screen</span> <span class="n">emu</span>
<span class="kr">where</span> <span class="n">draw_pixel</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="n">v</span> <span class="n">lst</span> <span class="o">=</span> <span class="kr">if</span> <span class="n">v</span> <span class="kr">then</span> <span class="p">(</span><span class="kt">Color</span> <span class="n">white</span> <span class="o">$</span> <span class="n">square</span> <span class="p">(</span><span class="mi">31</span><span class="o">-</span><span class="n">r</span><span class="p">)</span> <span class="n">c</span><span class="p">)</span><span class="o">:</span><span class="n">lst</span> <span class="kr">else</span> <span class="n">lst</span>
</code></pre></div></div>
<p>One thing that is helpful is comments. Naturally with an emulator, there will be things you have to do because of some specific requirement or implementation detail of the system you’re recreating. It’s really easy to forget these details later on, so it helps to remind yourself of them as you move forward.</p>
<p>Finally, remember not to take things too literally because that can complicate how you do stuff. When you see an instruction or component you need to implement, focus more on replicating its behavior than its hardware implementation or the wording used to describe it. An example of this is instruction <script type="math/tex">0\text{xF00A}</script> in the Chip-8, which waits until a key is pressed, then stores that key in register <script type="math/tex">\text{V0}</script>. It’s easy to think of doing this with a while loop or something like that, but that blocks all other parts of your code from running (something more complicated than Chip-8 could have <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/emulator.rs#L230">parts that run independently from the CPU</a>) and is needlessly complicated. Instead, you can implement this waiting behaviour with something a little more clever<sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup></p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- 0xFX0A Wait for a key press, then store key in VX (note: blocking operation)</span>
<span class="o">|</span> <span class="p">(</span><span class="o">.&.</span><span class="p">)</span> <span class="n">op</span> <span class="mh">0xf0ff</span> <span class="o">==</span> <span class="mh">0xf00a</span> <span class="o">=</span> <span class="kr">case</span> <span class="n">get_key</span> <span class="n">ks</span> <span class="kr">of</span>
<span class="kt">Just</span> <span class="n">i</span> <span class="o">-></span> <span class="kt">Left</span><span class="p">(</span><span class="n">rng</span><span class="p">,</span> <span class="n">emu</span><span class="p">{</span><span class="n">regs</span><span class="o">=</span><span class="n">rpl_nth</span> <span class="n">rs</span> <span class="n">x</span> <span class="o">$</span> <span class="n">fromIntegral</span> <span class="n">i</span><span class="p">})</span>
<span class="kt">Nothing</span> <span class="o">-></span> <span class="kt">Left</span><span class="p">(</span><span class="n">rng</span><span class="p">,</span> <span class="n">emu</span><span class="p">{</span><span class="n">pc</span><span class="o">=</span><span class="n">p</span><span class="o">-</span><span class="mi">2</span><span class="p">})</span>
</code></pre></div></div>
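<p>The same trick looks like this outside of Haskell (a sketch against a made-up <code class="highlighter-rouge">Emu</code> struct; assuming the program counter was already advanced past this instruction, backing it up by 2 bytes makes the instruction run again next timestep):</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Made-up minimal emulator state, just for illustration.
struct Emu {
    std::array<uint8_t, 16> regs{};
    uint16_t pc = 0x200;
};

// Execute 0xFX0A without blocking: if no key is down, rewind pc so
// this same instruction executes again next timestep.
void wait_for_key(Emu& emu, int x, std::optional<uint8_t> key) {
    if (key) {
        emu.regs[x] = *key;  // store the pressed key in VX and move on
    } else {
        emu.pc -= 2;         // instructions are 2 bytes wide
    }
}
```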
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I should probably mention that I’m no expert on emulators or really anything I’m gonna talk about in this post, so do with my advice what you will. Also, some of the advice I give is stuff I learned (i.e. stole) from other people. I’m too lazy to credit them everywhere I do it. Just know that if something I say seems like a well-thought out good idea, it probably wasn’t mine. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Ignore the fact that this calculator has a decimal point button, or a percent button, or whatever that thing in the top left is <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>And given that it’s not super popular, it’s not an ideal language for a blog post. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Probably a way to make it work w/o ugly code, but I’m still new <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>It’s common, for me at least, to be missing pieces the first time you do this. However, that’s fine. You’ll realize that you’re missing stuff when a CPU instruction requires it or later when trying to flesh out other parts of the emulator in the case of more complicated systems. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>The Chip-8 has 16 predefined sprites that are the hex digits <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>curly-succotash does not do this right. A better way to do it would be to switch on the first (hexadecimal) digit of the opcode, and then nest more switch statements if needed (at most 4 deep, but really only like 2 deep in practice) <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>It’s not uncommon for switch statements to be faster than sequences of if’s and else if’s <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>I made val unsigned to not have to deal with negative issues. It doesn’t really affect anything <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Just decrement the program counter so this same instruction gets executed next timestep <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>I don’t remember when I first discovered emulators, but I remember thinking they were absolutely amazing. All of a sudden, I could play games from old systems I knew I would never own, replay games I had owned but lost, etc. I didn’t think much of them past what games I could play until I started getting more serious about coding and realized that there must be some magic going on underneath the hood that’s making these things work. I tried imagining how they might work, but to no avail; they remained a black box. Because of that, I made it a goal of mine to write my own emulator one day, and as it turns out, dreams do come true.Fundamental Theorem of Algebra2017-06-18T18:12:00+00:002017-06-18T18:12:00+00:00https://nivent.github.io/blog/fundamental-theorem<p>One of the first “theorems” I heard about was The Fundamental Theorem of Algebra, and I remember being kind of drawn to it for a long time after first seeing it. I think this was less because of the statement of the theorem itself, and more because the word fundamental in its title made it seem really important and imposing <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Either way, I was convinced for a long time that it was somehow a mysterious theorem, that although easy to state, must have one of those impossible to understand, complicated proofs; the kind of thing that’s proved once via a lot of effort, and then is just applied afterwards without many people wanting to return to the proof because it’s just that out there. Despite this, my fascination with it made me determined to see and understand its proof once I became really good at/knowledgeable of math. Luckily for me, I was wrong. The proof of the theorem is not arcane. In fact, there are <a href="https://www.amazon.com/Fundamental-Theorem-Algebra-Undergraduate-Mathematics/dp/0387946578">many</a> proofs of it, some of which even I can understand.</p>
<p>Before getting into a proof, let’s quickly state the theorem and then move on</p>
<blockquote>
<p>The Fundamental Theorem of Algebra<br />
If <script type="math/tex">p(x)=a_nx^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> is any nonconstant polynomial (so <script type="math/tex">n\geq1</script> and <script type="math/tex">a_n\neq0</script>) with complex coefficients, then <script type="math/tex">p(x)</script> has a zero in <script type="math/tex">\mathbb C</script>.</p>
</blockquote>
<h1 id="intro-to-mathbb-c">Intro to <script type="math/tex">\mathbb C</script></h1>
<p>If the idea of a number <script type="math/tex">i</script> such that <script type="math/tex">i^2=-1</script> doesn’t frighten or anger you, then skip this section. Otherwise, I’m going to somewhat quickly try to convince you that this is ok.</p>
<p>One way to think of complex numbers is to view them as a way of doing geometry via arithmetic. Let’s say, for example, you are making a 2D game, and in this game you probably want to keep track of positions of different objects, so you represent positions as points in the plane. Each object has some position <script type="math/tex">(x,y)\in\mathbb R^2</script>. You probably want objects to move, so along with a position, every object needs some velocity <script type="math/tex">(dx,dy)\in\mathbb R^2</script>. Now, you can move objects by adding their velocity to their position, so after one timestep their position becomes <script type="math/tex">(x+dx,y+dy)</script>. Simple enough. Objects in your game also rotate around each other<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>. Originally, you might handle this by having some angle <script type="math/tex">\theta</script> of rotation for an object, and then updating its position via some complicated formula involving <script type="math/tex">\sin</script> and <script type="math/tex">\cos</script>. This is kinda messy, but then you remember how well representing things as points worked for moving things around before, and so you store rotations as a point <script type="math/tex">(\cos\theta,\sin\theta)</script> on the unit circle. You then need some operation <script type="math/tex">\cdot</script> such that <script type="math/tex">(x,y)\cdot(\cos\theta,\sin\theta)</script> gives the rotation of <script type="math/tex">(x,y)</script> (about the origin; to rotate about a different point, you just translate, rotate, then translate back). Once you do this, you’ll likely want to extend <script type="math/tex">\cdot</script> such that <script type="math/tex">(x,y)\cdot(a,b)</script> makes sense for all points in the plane, and not just ones where <script type="math/tex">(a,b)</script> is on the unit circle.
Motivated by the fact that <script type="math/tex">5*(x,y)</script> scales <script type="math/tex">(x,y)</script> by a factor of 5, and <script type="math/tex">0.5*(x,y)</script> scales it by a factor of <script type="math/tex">1/2</script>, you say that <script type="math/tex">(x,y)\cdot(a,b)</script> rotates <script type="math/tex">(x,y)</script> by the angle <script type="math/tex">(a,b)</script> makes with the <script type="math/tex">x</script>-axis, and then scales it by the distance of <script type="math/tex">(a,b)</script> from the origin.</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/mult.jpeg" width="400" height="100" /></center>
<p>This turns out to be pretty useful because it lets you combine two transformations into one, and this <script type="math/tex">\cdot</script> operation plays really nicely with adding points. In fact, if you do the math to work things out, you will see that <script type="math/tex">(x,y)\cdot(a,b)=(xa-yb,xb+ay)</script> which means that <script type="math/tex">(x,0)\cdot(y,0)=(xy,0)</script> so the <script type="math/tex">x</script>-axis is really just the real number line, and <script type="math/tex">(0,1)\cdot(0,1)=(-1,0)</script> so you have a number whose square is <script type="math/tex">-1</script>! By trying to create an arithmetic that allows us to do geometric transformations, we naturally find ourselves actually manipulating complex numbers where <script type="math/tex">a+bi\leftrightarrow(a,b)</script>. I probably should mention that complex numbers usually aren’t actually used for rotations and such in 2D games, but an extension of them called <a href="https://www.wikiwand.com/en/Quaternion">quaternions</a> is used for rotations in 3D games.</p>
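<p>This geometric description is easy to sanity-check numerically. Below is a sketch (C++ for concreteness) verifying that multiplying by a point on the unit circle preserves distance from the origin and shifts the angle:</p>

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// (x,y)·(a,b) = (xa - yb, xb + ya), the operation derived in the text.
std::pair<double, double> mult(std::pair<double, double> p,
                               std::pair<double, double> q) {
    return {p.first * q.first - p.second * q.second,
            p.first * q.second + p.second * q.first};
}
```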
<p>If that’s not convincing, then another perspective on complex numbers is that you are really just doing clock arithmetic when you work with them. When doing math with time, you wrap around every 12 (or 24) hours, so you are really just treating 12 as if it were 0, and then doing normal math (Ex. <script type="math/tex">4+10=14=12+2=2</script> so <script type="math/tex">10</script> hours past <script type="math/tex">4</script> is <script type="math/tex">2</script>). With complex numbers, you are doing something similar. You are doing normal math with polynomials (with real coefficients), except you treat the polynomial <script type="math/tex">x^2+1</script> as being zero. So, for example, when you say <script type="math/tex">(3+4i)(5-2i)=23+14i</script>, this is really because</p>
<script type="math/tex; mode=display">\begin{align*}
(3+4x)(5-2x) = 15 + 14x - 8x^2 = 15 + 14x - 8(x^2+1) + 8 = 23 + 14x
\end{align*}</script>
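<p>This reduction is mechanical enough to be code. A sketch (representing <script type="math/tex">a+bx</script> as a pair of coefficients, with the substitution <script type="math/tex">x^2=-1</script> spelled out):</p>

```cpp
#include <cassert>
#include <utility>

// Multiply (a + b*x) and (c + d*x) as polynomials, then reduce
// modulo x^2 + 1 by substituting x^2 = -1.
std::pair<int, int> mult_mod(std::pair<int, int> p, std::pair<int, int> q) {
    int a = p.first, b = p.second, c = q.first, d = q.second;
    // (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2, and x^2 = -1:
    return {a * c - b * d, a * d + b * c};
}
```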
<p>Symbolically, in case you’ve studied some abstract algebra but not seen this,</p>
<script type="math/tex; mode=display">\begin{align*}
\mathbb C\simeq\frac{\mathbb R[x]}{(x^2+1)}
\end{align*}</script>
<h1 id="definitions-and-junk">Definitions and Junk</h1>
<p>Now that we have that out of the way, before moving on to the proof itself, we need to setup some notation, definitions, and lemmas, so let’s get to that. In the below definitions, <script type="math/tex">X</script> is an arbitrary subset of <script type="math/tex">\mathbb C</script>.</p>
<blockquote>
<p>Definition<br />
A <strong>path</strong> is a continuous function <script type="math/tex">f:[0,1]\rightarrow X</script>. Furthermore, if we have that <script type="math/tex">f(0)=f(1)</script>, then we call <script type="math/tex">f</script> a <strong>loop</strong> based at <script type="math/tex">f(0)</script>.</p>
</blockquote>
<p>An important thing to know about paths is that you can compose them. If you have two paths <script type="math/tex">f,g:[0,1]\rightarrow X</script> where <script type="math/tex">f(1)=g(0)</script>, then you can form a new path <script type="math/tex">g\cdot f:[0,1]\rightarrow X</script> where you first do <script type="math/tex">f</script>, then do <script type="math/tex">g</script>. In order to keep the domain <script type="math/tex">[0,1]</script>, you have to traverse <script type="math/tex">f</script> and <script type="math/tex">g</script> at twice the normal speed, but that’s really just a technicality<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
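<p>The composition just described can be written down directly (a sketch, representing a path as a function into <script type="math/tex">\mathbb C</script>):</p>

```cpp
#include <cassert>
#include <complex>
#include <functional>

using Path = std::function<std::complex<double>(double)>;

// First do f, then do g, traversing each at double speed so the
// domain stays [0,1]. Assumes f(1) == g(0) so the result is a path.
Path compose(Path f, Path g) {
    return [f, g](double t) { return t <= 0.5 ? f(2 * t) : g(2 * t - 1); };
}
```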
<p>Note that for some reason I think in terms of paths more easily than I do in terms of loops, so although we’ll be dealing exclusively with loops here, I will often forget and say path instead.</p>
<blockquote>
<p>Notation<br />
Let <script type="math/tex">S^1=\{z\in\mathbb C:|z|=1\}</script> be the unit circle in the complex plane</p>
</blockquote>
<p>Quick remark: Notice that there is a 1-1 correspondence between loops and continuous functions <script type="math/tex">f:S^1\rightarrow X</script> since a circle is really just a line segment with its endpoints glued together. I may end up switching between these two perspectives during this post.</p>
<p>The proof of the fundamental theorem we’ll present is pretty dependent on loops. The basic idea is that if you have a polynomial without a zero, then you can find a “constant loop that circles the origin multiple times”. I use quotes because this is not exactly what we’ll show, but it’s basically it. In either case, if a loop is constant it doesn’t move, so there’s no way it could circle the origin even once, and so we get a contradiction. We need a mathematically precise way of defining what it means to “circle the origin multiple times”, and for that, we’ll use a little homotopy theory.</p>
<blockquote>
<p>Definition<br />
Given two paths <script type="math/tex">f:[0,1]\rightarrow X</script> and <script type="math/tex">g:[0,1]\rightarrow X</script> with the same basepoints (i.e. <script type="math/tex">f(0)=g(0)</script> and <script type="math/tex">f(1)=g(1)</script>), a <strong>homotopy</strong> <script type="math/tex">H:[0,1]\times[0,1]\rightarrow X</script> from <script type="math/tex">f</script> to <script type="math/tex">g</script> is a continuous<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> function such that <script type="math/tex">H(t,0)=f(t)</script> and <script type="math/tex">H(t,1)=g(t)</script> for all <script type="math/tex">t\in[0,1]</script>, and <script type="math/tex">H(0,s)=f(0)</script> and <script type="math/tex">H(1,s)=f(1)</script> for all <script type="math/tex">s\in[0,1]</script>. If there exists a homotopy <script type="math/tex">H</script> from <script type="math/tex">f</script> to <script type="math/tex">g</script>, then we say <script type="math/tex">f</script> and <script type="math/tex">g</script> are <strong>homotopy equivalent</strong>, and denote this <script type="math/tex">f\sim g</script>.</p>
</blockquote>
<p>You can think of a homotopy as a continuous deformation from one path into the other. Something like this</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/homotopy.gif" width="250" height="100" /></center>
<blockquote>
<p>Remark<br />
One important example of a homotopy is the one depicted above. This is the so-called <strong>straight line homotopy</strong>, and is the result of thinking of your paths as points and then drawing a line between them. For <script type="math/tex">f,g:[0,1]\rightarrow X</script> paths between the same points, you can define <script type="math/tex">H(t,s)=(1-s)f(t) + sg(t)</script>. This is almost always a homotopy.</p>
</blockquote>
<blockquote>
<p>Question<br />
When does the straight line homotopy fail to be a homotopy?</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Show that homotopy equivalence is an equivalence relation.</p>
</blockquote>
<p>In this upcoming section, we’ll apply homotopy to loops to see that every loop around a circle has a well-defined number of times it goes around. This will then lead us to the proof of the theorem.</p>
<h1 id="circles-and-degrees">Circles and Degrees</h1>
<p>Here, we will study loops <script type="math/tex">f:[0,1]\rightarrow S^1</script> around the unit circle. In general, these things can behave annoyingly by stopping in place, backtracking, etc. so to get a handle on them, we’ll homotope all our paths into nice loops. To that end, let <script type="math/tex">\omega_n:[0,1]\rightarrow S^1</script> be the path <script type="math/tex">\omega_n(t)=e^{2t\pi in}</script> that goes around the unit circle <script type="math/tex">n</script> times where we made use of Euler’s formula.</p>
<p>Our goal is to show that any loop <script type="math/tex">f:[0,1]\rightarrow S^1</script> is homotopic to exactly one “nice” loop <script type="math/tex">\omega_n</script>. We will then let the degree of <script type="math/tex">f</script> be <script type="math/tex">\deg f=n</script>, and this will be our characterization of the number of times <script type="math/tex">f</script> travels around the unit circle <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>. In order to do this, we’ll make use of a special function<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
p: &\mathbb R &\longrightarrow &S^1\\
&r &\longmapsto &\cos(2\pi r)+i\sin(2\pi r)
\end{matrix} %]]></script>
<p>What makes this function special is that it allows us to “lift” loops in <script type="math/tex">S^1</script> up to paths in <script type="math/tex">\mathbb R</script>. This function is far from injective, but it maps every unit interval in <script type="math/tex">\mathbb R</script> around the circle in a “nice” way. If we look at any (connected) neighborhood around a point on our circle, there are many disjoint copies of that neighborhood in <script type="math/tex">\mathbb R</script> that get mapped into it by <script type="math/tex">p</script>. This means that <script type="math/tex">p</script> in some sense has multiple local inverses over any such neighborhood in <script type="math/tex">S^1</script>. These local inverses are what allow us to lift loops up to <script type="math/tex">\mathbb R</script>. Specifically,<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup></p>
<blockquote>
<p>Lemma<br />
For any path <script type="math/tex">f:[0,1]\rightarrow S^1</script>, there exists a unique <strong>lift</strong> <script type="math/tex">\tilde f:[0,1]\rightarrow\mathbb R</script> such that <script type="math/tex">p\circ\tilde f=f</script> and <script type="math/tex">\tilde f(0)=0</script>.</p>
</blockquote>
<div class="proof2">
Pf: Let \(f:[0,1]\rightarrow S^1\) be a path. The remark I made above on local inverses can be said more formally as this: for any point \(x\in S^1\), there exists a neighborhood \(N\) of \(x\), called an elementary neighborhood, such that each path component of \(p^{-1}(N)\) is mapped homeomorphically onto \(N\). Let \(\{U_i\}_{i\in I}\) be a collection of elementary neighborhoods that cover \(S^1\), so \(\{f^{-1}(U_i)\}_{i\in I}\) is an open cover of the compact metric space \([0,1]\), which means it has some finite subcover \(\{V_j\}_{j=1}^n\subseteq\{f^{-1}(U_i)\}_{i\in I}\). Furthermore, it is a fact that I will not prove here that you can find a natural number \(m\in\mathbb N\) such that each of the images \(f([0,1/m]),f([1/m,2/m]),\dots,f([(m-1)/m,1])\) is completely contained in some elementary neighborhood \(W_j\). To simplify notation, let \(x_j=j/m\) and \(I_j=[x_j,x_{j+1}]\). Now, we can lift \(f\) by lifting it piece by piece. For each \(j\in\{0,1,\dots,m-1\}\), we can form a unique path \(g_j:I_j\rightarrow\mathbb R\) such that \(p\circ g_j=f\mid_{I_j}\) since \(f(I_j)\) is contained in an elementary neighborhood \(W_j\) which has exactly one "local inverse" \(V_j\) containing \(g_{j-1}(x_j)\) and so contains a unique path beginning at \(g_{j-1}(x_j)\) that lifts \(f\mid_{I_j}\) contained in \(V_j\). Thus, our unique lift of \(f\) is \(\tilde f=g_{m-1}g_{m-2}\dots g_2g_1g_0\). \(\square\)
</div>
<p>That may not have been put perfectly clearly because it’s a proof that is best digested with accompanying visuals, but I am not going through the trouble of making some. One thing I did not make explicit is that we take <script type="math/tex">g_{-1}(x_0)=0</script> in order to comply with <script type="math/tex">\tilde f(0)=0</script>. Another thing to keep in mind is that we form our lift by breaking the path up into small pieces, lifting those, then joining them together. If we get a piece <script type="math/tex">I_j</script> of our path small enough to be contained in an elementary neighborhood, then the fact that it has one local inverse containing the point our path left off at means there is a unique way to extend the path. This follows from the fact that each local inverse (i.e. path component) is mapped homeomorphically onto <script type="math/tex">W_j</script>, so there’s a unique lift for everything.</p>
<p>For the purposes of this section, let <script type="math/tex">1+0i</script> be a distinguished point in the sense that all loops around the circle begin and end there.</p>
<blockquote>
<p>Lemma<br />
Let <script type="math/tex">f:[0,1]\rightarrow S^1</script> be a loop based at <script type="math/tex">1</script>, and let <script type="math/tex">\tilde f</script> be its unique lift. Then, <script type="math/tex">\tilde f(1)</script> is an integer. We call this integer the degree of <script type="math/tex">f</script>.</p>
</blockquote>
<div class="proof2">
Pf: \(p(\tilde f(1))=f(1)=1\) so \(\tilde f(1)\in p^{-1}(1)=\mathbb Z\). \(\square\)
</div>
<p>If you notice, I just redefined degree, so we had better hope these definitions are equivalent. Clearly, <script type="math/tex">\deg\omega_n=n</script> since <script type="math/tex">\tilde\omega_n</script> is just a straight path from <script type="math/tex">0</script> to <script type="math/tex">n</script>, and we will show the two definitions agree via the following lemmas.</p>
<blockquote>
<p>Lemma<br />
The degree of a path is homotopy-invariant. That is, if <script type="math/tex">f\sim g</script>, then <script type="math/tex">\deg f=\deg g</script>.</p>
</blockquote>
<p>Before we get to the proof, let’s look at a picture of what’s going on here.</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/lift.gif" width="500" height="200" /></center>
<p>We have a path <script type="math/tex">f</script> going around the circle (here <script type="math/tex">f=\omega_2</script>), and by using local inverses of <script type="math/tex">p</script>, we lift this to a path in <script type="math/tex">\mathbb R</script> from <script type="math/tex">0</script> to <script type="math/tex">2</script>. This captures the fact that this circle loop makes two full revolutions around the circle. The idea behind the proof is similar to the proof of paths having unique lifts. You essentially show that you can also lift homotopies, so if <script type="math/tex">f\sim g</script>, then <script type="math/tex">\tilde f\sim\tilde g</script> which means they have the same endpoints.</p>
<div class="proof2">
Pf: Exercise for the reader.
</div>
<blockquote>
<p>Lemma<br />
The converse holds: If <script type="math/tex">\deg f=\deg g</script>, then <script type="math/tex">f\sim g</script>.</p>
</blockquote>
<div class="proof2">
Pf: Let \(f,g:[0,1]\rightarrow S^1\) be loops such that \(\deg f=\deg g\). Let \(\tilde f,\tilde g:[0,1]\rightarrow\mathbb R\) be their respective lifts and note that \(\tilde f(1)=\tilde g(1)\). Let \(\tilde H:[0,1]\times[0,1]\rightarrow\mathbb R\) be the straight line homotopy \(\tilde H(t,s)=(1-s)\tilde f(t)+s\tilde g(t)\), and define \(H:[0,1]\times[0,1]\rightarrow S^1\) by \(H(t,s)=p\circ\tilde H(t,s)\). Then, \(H\) is continuous since it is a composition of continuous functions. Furthermore, \(H(t,0)=p\circ\tilde f(t)=f(t)\), \(H(t,1)=p\circ\tilde g(t)=g(t)\), \(H(0,s)=p(0)=1=f(0)=g(0)\) and \(H(1,s)=p(\tilde f(1))=1=f(1)=g(1)\) for all \(t,s\in[0,1]\). Thus, \(H\) is a homotopy so \(f\sim g\). \(\square\)
</div>
<blockquote>
<p>Remark<br />
We’ve just shown that any loop around the circle is completely characterized (up to homotopy which is really all that matters) by a single integer, the number of times it goes around. Furthermore, it is easily shown that this integer is additive in the sense that <script type="math/tex">\deg(fg)=\deg f+\deg g</script> (It’s enough to show this for the case that <script type="math/tex">f=\omega_n</script> and <script type="math/tex">g=\omega_m</script> which is obvious), so the structure of loops around the circle is the additive structure of the integers! This is pretty amazing, and can be used to prove some interesting stuff<sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup></p>
</blockquote>
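<p>If you want to see degrees concretely, here is a small numerical aside of my own (not part of the original argument): sample a loop at closely spaced points, add up the small angle steps between consecutive samples, and round. This is a discrete version of lifting the loop to <script type="math/tex">\mathbb R</script>; the helper names <code>degree</code> and <code>omega</code> are made up for illustration.</p>

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Compute the degree of a sampled loop f:[0,1] -> S^1 by accumulating the
// signed angle step between consecutive samples. Each step is recovered via
// the "local inverse" nearest to where the previous piece ended, which here
// just means assuming consecutive samples are less than half a turn apart.
int degree(const std::vector<std::complex<double>>& loop) {
    const double PI = std::acos(-1.0);
    double turns = 0.0;  // total signed change of angle, in revolutions
    for (std::size_t i = 1; i < loop.size(); ++i) {
        // arg of the ratio is the lifted step, guaranteed in (-pi, pi]
        turns += std::arg(loop[i] / loop[i - 1]) / (2.0 * PI);
    }
    return static_cast<int>(std::lround(turns));
}

// Sample the loop w_n(t) = e^{2*pi*i*n*t} at k+1 evenly spaced points.
std::vector<std::complex<double>> omega(int n, int k) {
    const double PI = std::acos(-1.0);
    std::vector<std::complex<double>> out;
    for (int i = 0; i <= k; ++i)
        out.push_back(std::polar(1.0, 2.0 * PI * n * i / k));
    return out;
}
```

With enough samples, <code>degree(omega(2, 200))</code> comes out to 2, matching the picture above, and negative <code>n</code> gives negative degrees, reflecting the additive (integer) structure of loops.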
<h1 id="proof-at-last">Proof at Last</h1>
<p>At this point, we’ve developed everything we need. Before we get to the proof, let’s “strengthen” our assumptions a little bit. Let <script type="math/tex">f_0(x)=a_nx^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> be any polynomial. Note that we can divide through by <script type="math/tex">a_n</script> without changing the zeros of this polynomial, so we only need to investigate monic polynomials like <script type="math/tex">f_1(x)=x^n+b_{n-1}x^{n-1}+\dots+b_1x+b_0</script> where <script type="math/tex">b_i=a_i/a_n</script>. Furthermore, we can replace <script type="math/tex">x</script> with any invertible transformation, and although we change the zeros, we’re still able to recover all the ones we started with. Hence, we can pick <script type="math/tex">N\in\mathbb R</script> small enough that <script type="math/tex">% <![CDATA[
\mid Nb_{n-1}\mid+\dots+\mid N^{n-1}b_1\mid+\mid N^nb_0\mid<1 %]]></script> and then consider polynomials like <script type="math/tex">f_2(x)=N^nf_1(x/N)=x^n+c_{n-1}x^{n-1}+\dots+c_1x+c_0</script> where <script type="math/tex">c_i=N^{n-i}b_i</script>. This limits the type of polynomials enough that we can state the theorem as</p>
<blockquote>
<p>Fundamental Theorem of Algebra<br />
Let <script type="math/tex">f(x)=x^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> be any polynomial with complex coefficients (whose degree <script type="math/tex">n>0</script>) such that <script type="math/tex">% <![CDATA[
\mid a_{n-1}\mid+\dots+\mid a_1\mid+\mid a_0\mid<1 %]]></script>. Then, there exists some <script type="math/tex">x_0\in\mathbb C</script> with <script type="math/tex">f(x_0)=0</script>.</p>
</blockquote>
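<p>Before the proof, here is a quick numerical sketch of the rescaling trick from the previous paragraph (the helper <code>normalize</code> is my own hypothetical code, not anything from the post): divide through by the leading coefficient, then halve N until the coefficient bound in the theorem holds.</p>

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Input: coefficients a[0..n] of a_0 + a_1 x + ... + a_n x^n with a_n != 0.
// Output: coefficients of a monic polynomial with |c_0|+...+|c_{n-1}| < 1,
// obtained by dividing by a_n and substituting x -> x/N. The chosen N is
// returned through the reference; zeros of the result are N times the zeros
// of the input, so nothing is lost.
std::vector<cd> normalize(std::vector<cd> a, double& N) {
    const int n = static_cast<int>(a.size()) - 1;
    for (int i = 0; i < n; ++i) a[i] /= a[n];  // b_i = a_i / a_n (monic)
    a[n] = 1.0;
    N = 1.0;
    auto tail = [&] {  // |N^n b_0| + |N^{n-1} b_1| + ... + |N b_{n-1}|
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += std::abs(a[i]) * std::pow(N, n - i);
        return s;
    };
    while (tail() >= 1.0) N /= 2.0;  // shrink N until the bound holds
    for (int i = 0; i < n; ++i) a[i] *= std::pow(N, n - i);  // c_i = N^{n-i} b_i
    return a;
}
```

For example, the polynomial <script type="math/tex">x^2-2</script> gets rescaled with <script type="math/tex">N=1/2</script> into <script type="math/tex">x^2-1/2</script>, whose zeros are exactly <script type="math/tex">N(\pm\sqrt2)</script>.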
<div class="proof2">
Pf: Suppose that \(f(x)\) has no zero in \(\mathbb C\), so we can regard \(f\) as a function from \(\mathbb C\) to \(\mathbb C-\{0\}\). Now, define a function \(g:S^1\rightarrow S^1\) by \(g(x)=\frac{f(x)}{\mid f(x)\mid}\), and note that we can equivalently view \(g\) as a loop in \(S^1\), so \(g\) has a well-defined degree. Let \(D=\{z\in\mathbb C:|z|\le1\}\) be the unit disc, and note that, representing complex numbers in polar form, we can similarly define
$$\begin{matrix}
G: &D &\longrightarrow &S^1\\
&re^{2\pi i\theta} &\longmapsto &\frac{f(re^{2\pi i\theta})}{\mid f(re^{2\pi i\theta})\mid} & 0\le r \le 1 &0\le\theta\le 1
\end{matrix}$$
so we can think of \(G\) as a function from \([0,1]\times[0,1]\rightarrow S^1\) (the first argument is \(r\) and the second \(\theta\)). Thus, defining \(H:[0,1]\times[0,1]\rightarrow S^1\) by \(H(t,s)=G(s,t)\) makes \(H\) a homotopy! Clearly, \(H(t,1)=G(1,t)=g(t)\) (where we view \(g\) as a loop instead of as a circle function) and \(H(t,0)=G(0,t)=f(0)/\mid f(0)\mid\) for all \(t\in[0,1]\), so \(g\) is homotopic to a constant function and \(\deg g=0\). However, we can also define the following
$$\begin{matrix}
H': &[0,1]\times[0,1] &\longrightarrow &S^1\\
&(t,s) &\longmapsto &\frac{z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)}{\mid z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)\mid} & z=e^{2\pi it}
\end{matrix}$$
This function is continuous since it is the composition of a bunch of continuous functions, and it is well-defined since the denominator is never 0:
$$\begin{align*}
\mid z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)\mid
&\ge |z|^n - s|a_{n-1}z^{n-1}+\dots+a_1z+a_0|\\
&\ge |z|^n - s(|a_{n-1}||z^{n-1}|+\dots+|a_1||z|+|a_0|)\\
&= 1 - s(|a_{n-1}|+\dots+|a_1|+|a_0|)\\
&\ge 1 - (|a_{n-1}|+\dots+|a_1|+|a_0|)\\
&> 0
\end{align*}$$
Now, we just need to note that \(H'\) is a homotopy from \([t\mapsto z^n]=[t\mapsto e^{2\pi itn}]=\omega_n\) to \(g\), so \(\deg g=n\). Since \(n>0\), this is a contradiction, and hence our initial assumption that \(f\) has no zero must be wrong. \(\square\)
</div>
<blockquote>
<p>Corollary<br />
Let <script type="math/tex">f(x)</script> be a degree <script type="math/tex">n</script> polynomial with coefficients in <script type="math/tex">\mathbb C</script>. Then, <script type="math/tex">f</script> has exactly <script type="math/tex">n</script> (not necessarily distinct) zeros.</p>
</blockquote>
<div class="proof2">
Pf: By the theorem, \(f\) has some zero \(z_0\in\mathbb C\), so let's divide \(f\) by \(z-z_0\). Using long division, we get some polynomials \(q(z),r(z)\) such that \(f(z)=q(z)(z-z_0)+r(z)\) and \(\deg r(z)<\deg(z-z_0)=1\) or \(r(z)=0\) which means \(r(z)\) is a constant. Since \(0=f(z_0)=q(z_0)(z_0-z_0)+r(z_0)=r(z_0)\), we must have \(r(z)=0\) so \(f(z)=q(z)(z-z_0)\) and \(\deg q(z)=n-1\). Now just apply induction to get that \(q(z)\) has \(n-1\) zeros, so \(f(z)\) has \(n\) zeros. \(\square\)
</div>
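<p>The long division in this proof is mechanical enough to code up. Here is a sketch using Horner's scheme (the helper <code>divideByRoot</code> is my own, assuming coefficients are listed from the leading term down to the constant):</p>

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Divide f(z) by (z - z0): returns (q, r) with f(z) = q(z)(z - z0) + r.
// As in the proof, the remainder r is the constant f(z0), so it vanishes
// exactly when z0 is a zero of f.
std::pair<std::vector<cd>, cd> divideByRoot(const std::vector<cd>& f, cd z0) {
    std::vector<cd> q;
    cd carry = 0.0;
    for (std::size_t i = 0; i + 1 < f.size(); ++i) {
        carry = f[i] + z0 * carry;  // next quotient coefficient (Horner step)
        q.push_back(carry);
    }
    cd r = f.back() + z0 * carry;   // remainder = f(z0)
    return {q, r};
}
```

For instance, dividing <script type="math/tex">z^2-1</script> by <script type="math/tex">z-1</script> yields quotient <script type="math/tex">z+1</script> and remainder 0, and repeating this deflation is exactly the induction in the corollary.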
<p>Finally, an exercise.</p>
<blockquote>
<p>Exercise<br />
Where does the argument for the main theorem fail if <script type="math/tex">f</script> has a zero? Since, <script type="math/tex">f</script> has exactly <script type="math/tex">\deg f</script> zeros, you can always find a closed disc on which <script type="math/tex">f</script> has no zero, so why don’t we always get a contradiction?</p>
</blockquote>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I’ve seen many other fundamental theorems besides this one, and I am very confused by how a theorem gets to be called fundamental <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Maybe it’s a space game, or maybe you have enemies that circle a base to protect it, or maybe etc. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Once we introduce homotopy, we’ll have an equivalence relation on paths. This has the effect that the set of (equivalence classes of) loops based at a single point forms a group called the fundamental group of X. Secretly, this post is really just exploring the fundamental group of the circle. Without homotopy, composition of paths isn’t associative because of the whole doubling speed thing. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Throughout this post, I will avoid the issue of defining what a continuous function is, because doing so properly requires defining a topology on a set and that’s just too out of the way for this post. You can think of continuity intuitively as meaning nearby inputs get mapped to nearby outputs. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>Since w_n*w_m=w_{n+m}, this will also show that the fundamental group of the circle is Z, the set of integers <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Secretly, this is a covering function, and R is the universal covering space of S^1 <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>This proof may actually require some, but not much, background in topology. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>The fundamental theorem of algebra, of course, but also, for example, Brouwer’s Fixed Point Theorem. Brouwer’s theorem can then be used to show the existence of Nash equilibria in normal form games (think of the first form of the Prisoner’s Dilemma shown in my post on it), and from there you get like all of game theory. <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>One of the first “theorems” I heard about was The Fundamental Theorem of Algebra, and I remember being kind of drawn to it for a long time after first seeing it. I think this was less because of the statement of the theorem itself, and more because the word fundamental in its title made it seem really important and imposing 1. Either way, I was convinced for a long time that it was somehow a mysterious theorem, that although easy to state, must have one of those impossible to understand, complicated proofs; the kind of thing that’s proved once via a lot of effort, and then is just applied afterwards without many people wanting to return to the proof because it’s just that out there. Despite this, my fascination with it made me determined to see and understand its proof once I became really good at/knowledgeable of math. Luckily for me, I was wrong. The proof of the theorem is not arcane. In fact, there are many proofs of it, some of which even I can understand. I’ve seen many other fundamental theorems besides this one, and I am very confused by how a theorem gets to be called fundamental ↩Local Minima2017-06-02T22:48:00+00:002017-06-02T22:48:00+00:00https://nivent.github.io/blog/local-minima<p>My friend told me about an interesting algorithms problem he had come across a while ago and had not yet solved. We then worked on it together for some time, and eventually arrived at what we believe to be a solution. This post is my attempt to recall the problem and our solution<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<h1 id="warmup-problem">Warmup Problem</h1>
<p>The problem has to do with finding a certain element of a matrix. Before getting to it, I want to start with a simpler case of the problem just to get used to things. We first need to know what we are searching for.</p>
<blockquote>
<p>Definition<br />
<p>A <b>local minimum</b> of an array of numbers is an element that is (strictly) smaller than all of its neighbors<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
</blockquote>
<p>So, unsurprisingly, we are gonna be searching for local minima. One important thing, though, is that we assume no array has a repeated element.</p>
<blockquote>
<p>Problem<br />
Given any array of <script type="math/tex">n</script> unique numbers, how do you find a local minimum in time <script type="math/tex">O(\log n)</script>?</p>
</blockquote>
<p>The solution, as hinted by the <script type="math/tex">O(\log n)</script> time complexity, is to essentially do a binary search. Start by looking at the middle element. If it’s a local minimum, you’re done. If it’s bigger than the element to the right, then recursively search the right half of the array. Otherwise, it must be bigger than the element to the left, so recursively search the left half.</p>
<p>Below is an example of this algorithm<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> run on a list of unique numbers. The convention I use is that a number is large if it is part of the numbers currently being considered by the algorithm, and it is huge if it is the sole number the algorithm is looking at in the moment.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\Large{1} & \Large 4 & \Large 3 & \Large{12} & \Large{11} & \Large 0 & \Large{10} & \Large 9 & \Large 8 & {\Huge 7} & \Large 6 & \Large 2 & \Large{14} & \Large{13} & \Large{17} & \Large{15} & \Large{16} & \Large 5 \\
&&&&&&&&&\bigg\downarrow \\
{1} & \ 4 & \ 3 & {12} & {11} & \ 0 & {10} & \ 9 & \ 8 & { 7} & \Large 6 & \Large 2 & \Large{14} & \Large{13} & {\Huge 17} & \Large{15} & \Large{16} & \Large 5 \\
&&&&&&&&&\bigg\downarrow \\
{1} & \ 4 & \ 3 & {12} & {11} & \ 0 & {10} & \ 9 & \ 8 & { 7} & \Large 6 & \Large 2 & {\Huge 14} & \Large{13} & { 17} & {15} & {16} & 5 \\
&&&&&&&&&\bigg\downarrow \\
{1} & \ 4 & \ 3 & {12} & {11} & \ 0 & {10} & \ 9 & \ 8 & { 7} & \Large 6 & {\Huge 2} & { 14} & {13} & { 17} & {15} & {16} & 5
\end{matrix} %]]></script>
<p>I coded this up in C++ as below</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">bool</span> <span class="nf">isSmallerLeft</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">arr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">index</span> <span class="o"><=</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">arr</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o"><</span> <span class="n">arr</span><span class="p">[</span><span class="n">index</span><span class="o">-</span><span class="mi">1</span><span class="p">];</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">bool</span> <span class="nf">isSmallerRight</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">arr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">index</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">>=</span> <span class="n">arr</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="o">||</span> <span class="n">arr</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o"><</span> <span class="n">arr</span><span class="p">[</span><span class="n">index</span><span class="o">+</span><span class="mi">1</span><span class="p">];</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">bool</span> <span class="nf">isLocalMin</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">arr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">isSmallerLeft</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">index</span><span class="p">)</span> <span class="o">&&</span> <span class="n">isSmallerRight</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">index</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">findLocalMin</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">arr</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">lo</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">hi</span> <span class="o">=</span> <span class="n">arr</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">hi</span> <span class="o">>=</span> <span class="n">lo</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">mid</span> <span class="o">=</span> <span class="p">(</span><span class="n">lo</span> <span class="o">+</span> <span class="n">hi</span><span class="p">)</span> <span class="o">>></span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isLocalMin</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">mid</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">mid</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">isSmallerLeft</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">mid</span><span class="p">))</span> <span class="p">{</span>
<span class="n">lo</span> <span class="o">=</span> <span class="n">mid</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">hi</span> <span class="o">=</span> <span class="n">mid</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<blockquote>
<p>Theorem<br />
The above algorithm is correct and <script type="math/tex">O(\log n)</script></p>
</blockquote>
<div class="proof2">
Pf: Assume without loss of generality that you just checked an element \(x\) and found out it was bigger than the element to the right of it (this is the only interesting case). Then, you know there must be a local minimum to the right of \(x\) in the array. Why? Either the array keeps decreasing, in which case the last element is necessarily a local minimum, or the array increases at some point in this direction. In that case, you will find a local minimum where that increase happens. Thus, the algorithm is definitely correct. Analyzing its time complexity is left as an exercise to the reader who is currently unconvinced of it. \(\square\)
</div>
<h1 id="real-problem">Real Problem</h1>
<p>Now that we got that out of the way, let’s look at something a little bit harder. Now, instead of considering 1D arrays, we’ll look for a local minimum of a matrix. We still require every number in the matrix to be unique, and it is (probably) important to keep in mind that when we search for local minima, any entry in a matrix has at most 4 neighbors.</p>
<blockquote>
<p>Problem<br />
Given an <script type="math/tex">n\times n</script> matrix of unique numbers, how can you find a local minimum in time at worse <script type="math/tex">O(n)</script>?</p>
</blockquote>
<p>The first thing to notice is that moving from arrays to matrices causes our time complexity to go from <script type="math/tex">O(\log n)</script> to <script type="math/tex">O(n)</script>. I don’t know about you, but this seemed strange to me. I don’t know of a clear relation between <script type="math/tex">\log n</script> and <script type="math/tex">n</script> in the context of matrices. Furthermore, you would expect the matrix problem to be slower, but it actually has better complexity in relation to what it could be<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>. What I mean is that in the worst case we check every element, so we can look at ratios to get a sense of how much better than worst-possible we are doing. In the array case, we get <script type="math/tex">\frac{\log n}n</script>, but in the matrix case we get <script type="math/tex">\frac n{n^2}=\frac1n</script> which is even smaller.</p>
<p>At this point, I encourage you to stop reading and spend the next couple of hours thinking about this problem, trying to come up with a solution. I’ll wait<script type="math/tex">\dots</script></p>
<p>You’re back; let’s say more stuff <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>. As it turns out, this lack of obvious relation between time complexities is related to there being fundamentally different ideas going into how each problem is solved. The issue with trying to generalize the case of arrays is that with matrices there’s no nice ordering on the indices. You could “flatten” an <script type="math/tex">n\times n</script> matrix into a large array and run the previous algorithm on it, and this would get a solution in time <script type="math/tex">O(\log(n^2))=O(2\log n)=O(\log n)</script> which is even better than we want, but you are not actually guaranteed to get a local minimum by doing this! <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup> If this isn’t obvious, stop for a second and convince yourself of it. You probably want to read the footnotes below.</p>
<p>Instead, here’s the solution my friend and I came up with. Start with the middle row, and find its absolute (read: not local) minimum <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>. If it’s a local minimum, we win. If not, then either the value above or below it is smaller. Recursively find a local minimum in the half-matrix containing the smaller value <sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>. That’s it. Note that because of the nature of the algorithm, the only things we consider as possible local minima are the absolute minimal elements of rows (and, after rotating, columns).</p>
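<p>Here is one way this could be coded up in C++ (iteratively, with explicit row/column bounds instead of actual recursion; given how handwavy the correctness argument is, treat this as an illustrative sketch rather than a verified implementation):</p>

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Check whether M[r][c] is a local minimum of the FULL matrix.
bool isLocalMin2D(const std::vector<std::vector<int>>& M, int r, int c) {
    const int n = static_cast<int>(M.size()), m = static_cast<int>(M[0].size());
    if (r > 0 && M[r - 1][c] < M[r][c]) return false;
    if (r + 1 < n && M[r + 1][c] < M[r][c]) return false;
    if (c > 0 && M[r][c - 1] < M[r][c]) return false;
    if (c + 1 < m && M[r][c + 1] < M[r][c]) return false;
    return true;
}

// Alternate between scanning the middle row and the middle column of the
// current submatrix, always moving toward the half containing the smaller
// neighbor of the scanned line's minimum.
std::pair<int, int> findLocalMin2D(const std::vector<std::vector<int>>& M) {
    int r0 = 0, r1 = static_cast<int>(M.size()) - 1;
    int c0 = 0, c1 = static_cast<int>(M[0].size()) - 1;
    bool scanRow = true;
    while (true) {
        if (scanRow) {
            const int r = (r0 + r1) / 2;  // middle row of current submatrix
            int best = c0;
            for (int c = c0; c <= c1; ++c)
                if (M[r][c] < M[r][best]) best = c;
            if (isLocalMin2D(M, r, best)) return {r, best};
            // not a local min: the smaller neighbor is above or below
            if (r > 0 && M[r - 1][best] < M[r][best]) r1 = r - 1;
            else r0 = r + 1;
        } else {
            const int c = (c0 + c1) / 2;  // middle column
            int best = r0;
            for (int r = r0; r <= r1; ++r)
                if (M[r][c] < M[best][c]) best = r;
            if (isLocalMin2D(M, best, c)) return {best, c};
            // not a local min: the smaller neighbor is to the left or right
            if (c > 0 && M[best][c - 1] < M[best][c]) c1 = c - 1;
            else c0 = c + 1;
        }
        scanRow = !scanRow;  // "rotate" the matrix, as in footnote 9
    }
}
```

On the 5×5 example worked out below, this finds the local minimum 2.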
<p>Before looking at an example, let’s calculate its time complexity <script type="math/tex">T_n</script> on an <script type="math/tex">n\times n</script> matrix. It is</p>
<script type="math/tex; mode=display">\begin{align*}
T_n = n + \frac n2 + \frac n2 + \frac n4 + \frac n4 + \frac n8 + \frac n8 + \dots = 3n = O(n)
\end{align*}</script>
<p>where we consecutively look at matrices of size <script type="math/tex">n\times n,(n/2)\times n,(n/2)\times(n/2),(n/4)\times(n/2),\dots</script></p>
<p>The above calculation was informal, but that’s ok since it is correct. Below is an example run of the algorithm with the same conventions as before, with the addition of a star by the row or column whose minimum was found.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\begin{matrix}
\\
& \Large{22} & \Large{7} & \Large{21} & \Large{4} & \Large{12} \\
& \Large{6} & \Large{2} & \Large{8} & \Large{13} & \Large{10} \\
* & \Large{17} & \Huge{5} & \Large{15} & \Large{14} & \Large{9} \\
& \Large{16} & \Large{11} & \Large{18} & \Large{3} & \Large{20} \\
& \Large{19} & \Large{23} & \Large{1} & \Large{25} & \Large{24}
\end{matrix}
\longrightarrow
\begin{matrix}
& & * & &\\
\Large{22} & \Large{7} & \Large{21} & \Large{4} & \Large{12} \\
\Large{6} & \Large{2} & \Huge{8} & \Large{13} & \Large{10} \\
{17} & {5} & {15} & {14} & {9} \\
{16} & {11} & {18} & {3} & {20} \\
{19} & {23} & {1} & {25} & {24}
\end{matrix}
\longrightarrow
\begin{matrix}
\\
\Large{22} & \Large{7} & {21} & {4} & {12} & \\
\Large{6} & \Huge{2} & {8} & {13} & {10} & *\\
{17} & {5} & {15} & {14} & {9} & \\
{16} & {11} & {18} & {3} & {20} & \\
{19} & {23} & {1} & {25} & {24} &
\end{matrix}
\end{matrix} %]]></script>
<blockquote>
<p>Exercise<br />
Write code that performs this algorithm, and count the number of steps (comparisons) it takes to find a local minimum on a large sample of random <script type="math/tex">n\times n</script> matrices. See if this number is actually roughly <script type="math/tex">3n</script> as expected.</p>
</blockquote>
<p>Finally, a far too wordy proof…</p>
<blockquote>
<p>Theorem<br />
This algorithm is correct with time complexity <script type="math/tex">O(n)</script>.</p>
</blockquote>
<div class="proof2">
Pf idea: The claim about the time complexity was shown earlier, so we only argue for the algorithm's correctness here. If the first value checked (the minimum of the middle row) is not a local minimum, how do we know we will still find one? The first thing to notice is that every matrix (and hence every submatrix of the original \(n\times n\) matrix) has a local minimum. Hence, we could recursively search any submatrix and find a local minimum, except for the fact that a local minimum of a submatrix might not be a local minimum of the original matrix (e.g. take the extreme case of a one-element submatrix). It is not too hard to see that if a local minimum of a submatrix fails to be a local minimum of the original matrix, then it must lie on the edge of the submatrix. Thus, without loss of generality, assume the minimum element of the middle row is not a local minimum and the element directly above it is smaller than it. In this case, we would recursively search the top half of the matrix, and everything is fine unless the local minimum we find is on the row directly above the middle row (then things might or might not be fine), so assume this is the case. Note that some element of this row is smaller than the minimum element of the middle row (that's why this submatrix was chosen), and so the minimum element of this row is smaller than the minimum element (and by extension any element) of the middle row. Here, using the fact that our algorithm only searches minimal elements of rows, the minimum element of the row above the middle row must be the number we returned given the assumptions we've made. This number is smaller than the numbers directly above it and to its left and right (if it were not, then we wouldn't have chosen it as the local minimum). By the previous reasoning, it is also smaller than the number directly below it, so it is in fact a local minimum. Since this was the only case where things could go wrong, our algorithm must work in all cases. \(\square\)
</div>
<p>Take some time to make sure this proof makes sense to you and is legit. If after a while, you’re still not convinced and you know a case where things go wrong, then leave a comment telling me where I went wrong.</p>
<h1 id="bonus">Bonus</h1>
<p>This post lacked much motivation and insight into where these solutions came from, and I usually like to try to have those things. I won’t include them here, but I will say I feel I somewhat robbed you of the chance to solve these problems yourself, so here’s an (unrelated) bonus problem I found just before writing this post. It’s not quite as difficult as this problem, but (hopefully) not immediately obvious</p>
<blockquote>
<p>Problem<br />
Given a sorted <script type="math/tex">n</script>-element array where every element appears 2 times except for a single number appearing once, how do you find this number in time <script type="math/tex">O(\log n)</script>?</p>
</blockquote>
<p>As an example, given the array <script type="math/tex">\{1,1,2,2,3,3,5,5,7,8,8,9,9\}</script>, you would return <script type="math/tex">7</script> as your answer.</p>
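<p>As a quick aside (and definitely not the intended <script type="math/tex">O(\log n)</script> solution, which I'll leave to you), notice that you don't even need the array to be sorted to find this number in linear time: XOR-ing every element cancels the pairs. The helper name below is made up for illustration.</p>

```cpp
#include <cassert>
#include <vector>

// O(n) sanity check: x ^ x == 0 and XOR is commutative/associative, so
// XOR-ing the whole array cancels every pair and leaves the lone number.
int lonelyNumber(const std::vector<int>& a) {
    int x = 0;
    for (int v : a) x ^= v;
    return x;
}
```

Running this on the example array above returns 7, as expected; the fun of the problem is beating this with a binary search that exploits sortedness.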
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Warning: I’m not 100% sure our solution is correct. It’s fairly handwavy <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Warning: This post might end up being kinda dense because I’m not sure what to say other than “here’s a problem; here’s a solution” <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>When we move on to matrices, most elements will have 4 neighbors. Diagonal elements are not neighbors. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Modulo me being inconsistent about which element is the middle one <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>I’m not an Algorithms person, and this is not a standard complexity comparing method. This is just a way I think/thought of things <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>If you didn’t come up with a solution, you’re not alone. This problem originated in an Algorithms class here on a problem set, but was hard enough that the professor changed his mind after assigning it and decided to not make it mandatory. Instead, it became a bonus problem. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>There might be a way to fix this issue to end up with a working algorithm. I have not explored this option. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>If there are fewer columns than rows, start with the middle column <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>Every step, you end up rotating the matrix 90 degrees. For example, in the first step, you search a row of size n, but in the second you search a column of size n/2. Note that you don’t actually rotate the matrix, because that would waste time; you just switch between rows and columns. <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>A Little Bit of Number Theory2017-05-20T21:01:00+00:002017-05-20T21:01:00+00:00https://nivent.github.io/blog/number-theory<p>When I was a freshman in high school<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, my math teacher showed me <a href="https://brilliant.org/">this</a> site where you can find a seemingly endless supply of math problems; I loved it. I spent a decent amount of my time solving problems, and a significant amount of my time failing to solve problems, but I enjoyed it either way. On the site, there are different categories, and you have a level from 1-5 in each category, indicating how well you do at solving those problems. After a while on the site, I came to be level 5 in number theory<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>, and was pretty shocked. I had barely heard of this “number theory” field, and wasn’t sure what about it made me do well<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>, so even before I had an idea of what number theory was, it seemed like an interesting field.</p>
<p>In general, I feel like number theory is a relatively unknown field to most people, so this is my little part in changing that. This post is probably gonna be a long one. In it, I want to talk about two (hopefully somewhat motivated) problems that lead to some interesting mathematics. Because I want to do the mathematics justice, I will try my best to keep the post self-contained, proving any non-trivial claims I make<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>. If you are more interested in the results/overall argument than the details, you can skip these.</p>
<h1 id="pythagoras">Pythagoras</h1>
<p>The Greeks seem like as good a place as any to start, and in the context of mathematics, one of the most famous Greeks is Pythagoras, with his theorem.</p>
<blockquote>
<p>Pythagoras’ Theorem<br />
Given a right triangle with side lengths of <script type="math/tex">a,b</script>, and <script type="math/tex">c</script> where <script type="math/tex">c>a,b</script>, the following relation holds:<br />
<script type="math/tex">a^2 + b^2 = c^2</script></p>
</blockquote>
<p>I said I would try to prove any non-trivial claim I made, and this theorem has always struck me as non-trivial. However, it is also pretty well known so I’ll just leave you with a picture and a <a href="http://www.mathalino.com/reviewer/derivation-of-formulas/derivation-of-pythagorean-theorem">link</a> to the site I stole it from instead<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.</p>
<p><a href="http://www.mathalino.com/reviewer/derivation-of-formulas/derivation-of-pythagorean-theorem"><center><img src="https://nivent.github.io/images/blog/number-theory/pythagoras.jpg" width="250" height="100" /></center> </a></p>
<p>Now that we have this, we can see that we can have a right triangle with side lengths <script type="math/tex">(3,4,5)</script>, <script type="math/tex">(5,12,13)</script>, <script type="math/tex">(21,28,35)</script>, or <script type="math/tex">(7,24,25)</script> but not one with side lengths <script type="math/tex">(1,2,3)</script> or <script type="math/tex">(4,4,4)</script>. In fact, if we think about it, this says we can’t have any equilateral right triangle. If we did and the side length was <script type="math/tex">x</script>, then this would give <script type="math/tex">x^2+x^2=x^2\implies 2x^2=x^2\implies x=0</script>, and a triangle with 0 side length is no triangle at all. I think a natural question to ask at this point would be “what side lengths can we get?”. Now, we have to be careful about how we phrase this, because obviously given any <script type="math/tex">a</script> and <script type="math/tex">b</script>, we can find some <script type="math/tex">c</script> that gives a right triangle, but that <script type="math/tex">c</script> may not be nice. For example, if <script type="math/tex">a=2</script> and <script type="math/tex">b=4</script>, then <script type="math/tex">c=\sqrt{a^2+b^2}=\sqrt{20}=2\sqrt5</script> isn’t the nicest looking solution. What we really want is a triangle where all side lengths are whole numbers.</p>
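<p>To make these examples concrete, here is a quick sanity check in Python (a hypothetical helper, not part of the original post) that tests whether a candidate triple satisfies the theorem:</p>

```python
# Check whether (a, b, c) satisfies a^2 + b^2 = c^2.
def is_pythagorean(a, b, c):
    return a * a + b * b == c * c

# The triples mentioned above, plus one non-example.
triples = [(3, 4, 5), (5, 12, 13), (21, 28, 35), (7, 24, 25), (1, 2, 3)]
for a, b, c in triples:
    print((a, b, c), is_pythagorean(a, b, c))
# The first four print True; (1, 2, 3) prints False since 1 + 4 != 9.
```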
<blockquote>
<p>Question<br />
What are the triples of integers <script type="math/tex">(a,b,c)</script> where <script type="math/tex">a,b,c\in\mathbb Z</script><sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> such that <script type="math/tex">a^2+b^2=c^2</script>?</p>
</blockquote>
<p>One possible issue worth addressing is that this phrasing allows for negative integers. I said before that a “triangle with 0 side length is no triangle at all”; well, a triangle with negative side length seems like it should be even less qualified to answer the question. However, there are a few good reasons to allow negative (and even zero) side lengths as I’ve done with this question:</p>
<ul>
<li>
<p>Worst case scenario, they can be ignored. Squaring erases information about sign, so if you have a negative number in a solution, just negate it to get a solution with only positive numbers. If you have a zero, then you just have <script type="math/tex">a^2=c^2\implies a=\pm c</script>, so your solution can be considered “trivial” and just ignored. There’s no harm done here.</p>
</li>
<li>
<p>Working with all integers instead of just the positive integers adds symmetry to the problem. Symmetry and patterns and whatnot are usually good in math, so including them can make finding a solution easier.</p>
</li>
<li>
<p>They have geometric interpretations. If a solution includes <script type="math/tex">0</script>, then the right triangle you are describing is actually just a line segment. If it includes negative numbers, then the signs of the numbers describe the triangle’s orientation. You can imagine <script type="math/tex">(3,4,5)</script> as a triangle pointing to the right while <script type="math/tex">(-3,4,5)</script> points to the left.</p>
</li>
</ul>
<h2 id="progress">Progress</h2>
<p>Now that we have a question we’re happy with, how do we actually go about solving it? In my reasons for using all the integers instead of just the positive ones, I said that the whole integers were nicer than the positive integers. While this is true, they still aren’t super nice. When you think about it, you can only add, subtract, and multiply integers. You can’t really divide them, so we’ll have to be careful about what algebra we do. With that in mind, let’s see what progress we can make.</p>
<p>If you play with this equation for a while, trying to find solutions and seeing what comes up, you might notice a pattern. Looking at the four solutions I gave earlier, we see both <script type="math/tex">(3,4,5)</script> and <script type="math/tex">(21,28,35)</script>. The second of these is <script type="math/tex">7</script> times the first, and this leads into the following observation.</p>
<blockquote>
<p>Lemma<br />
If <script type="math/tex">(a,b,c)</script> is a Pythagorean triple and <script type="math/tex">n\in\mathbb Z</script>, then <script type="math/tex">(an,bn,cn)</script> is a Pythagorean triple as well</p>
</blockquote>
<div class="proof2">
Pf: $$\begin{align*}
a^2 + b^2 = c^2 &\implies n^2(a^2 + b^2) = n^2c^2\\
&\implies (na)^2 + (nb)^2 = (nc)^2
\end{align*}$$
\(\square\)
</div>
<p>What this means is that in our search for integer solutions, we really only need to worry about ones where <script type="math/tex">a</script>, <script type="math/tex">b</script>, and <script type="math/tex">c</script> have no common factors.</p>
<blockquote>
<p>Exercise<br />
If two of <script type="math/tex">a</script>, <script type="math/tex">b</script>, and <script type="math/tex">c</script> share a common factor <script type="math/tex">d</script>, then <script type="math/tex">d</script> divides the third one as well.</p>
</blockquote>
<p>This is a start. From now on, we’ll implicitly assume the numbers we’re working with are coprime<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>. Returning to our issue with integers, we can’t divide them, but we know that if we do we’ll get a rational number, and fractions are even nicer than integers. This leads to the following simple yet profound algebraic manipulation.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
&a^2 + b^2 = c^2 \implies \frac{a^2}{c^2} + \frac{b^2}{c^2} = 1 \implies \left(\frac ac\right)^2 + \left(\frac bc\right)^2 = 1\\
\text{Conversely, } &\left(\frac ac\right)^2 + \left(\frac bc\right)^2 = 1 \implies \frac{a^2}{c^2} + \frac{b^2}{c^2} = 1 \implies a^2 + b^2 = c^2
\end{align*} %]]></script>
<p>When we do this, <script type="math/tex">\frac ac</script> and <script type="math/tex">\frac bc</script> are fractions in simplest form. Since this goes both ways, finding (coprime) integer solutions to <script type="math/tex">a^2+b^2=c^2</script> is equivalent to finding rational solutions to <script type="math/tex">A^2+B^2=1</script>! It gets even better than this.</p>
<blockquote>
<p>Lemma<br />
The rational numbers <script type="math/tex">A,B</script> satisfying <script type="math/tex">A^2+B^2=1</script> are exactly the rational points <script type="math/tex">(A,B)</script> on the unit circle.
</blockquote>
<div class="proof2">
Pf: Pick any point \((x,y)\) on the unit circle. Draw a right triangle with vertices at \((0,0)\), \((x,0)\), and \((x,y)\). A moment's reflection will reveal that this triangle has the line segment from the origin to \((x,y)\) as its hypotenuse (which has length 1 by assumption), and that its width and height are \(x\) and \(y\) units long, respectively. Thus, by Pythagoras' Theorem, we have \(x^2+y^2=1\). Thus, the points on the unit circle are those satisfying \(x^2+y^2=1\), and from this we see that the claim holds. \(\square\)
</div>
<p><sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>Stop and think about this for a second. We have shown that the number-theoretic problem of finding all integer solutions to <script type="math/tex">a^2+b^2=c^2</script> is equivalent to the geometric problem of finding all rational points on the unit circle. This is a major shift in perspective and will directly lead into the insight that allows us to answer our question.</p>
<h2 id="solution">Solution</h2>
<p>When giving a mathematical argument on here, I’m always a little stumped on how to motivate the main idea. I would like to give the impression that there’s nothing magical or otherworldly going on here; that given enough time to think about it, you could have produced everything I do here. What follows is my best attempt to do that.</p>
<p>Now that we know that we’re finding points on a circle, let’s try to use some geometric intuition. If you look at the circumference of a circle, it’s really just a line that’s been wrapped around to connect to itself, and while circles can be complicated to work with, lines are pretty simple objects. Imagine if we could uncurl the circle back into a line. Then, ideally, all the rational points on the circle would end up lying exactly on top of a rational number on the line, and this would let us find the rational points of a circle by looking at the rational points of the number line (which are just the fractions). What we’ll be doing is essentially the following</p>
<center><img src="https://nivent.github.io/images/blog/number-theory/circ.gif" width="250" height="100" /></center>
<p>To describe this mathematically, we need to decide how we’ll uncurl the circle. We want to manipulate the circle until it lies along the number line, but there are many ways to draw a line in the plane, so we need to choose one. Any will work, but following the animation’s example, we’ll use the line <script type="math/tex">x=-1</script>. Now, we need a choice of reference point, where we’ll begin our uncurling. You can imagine that if we unzipped the circle from the point <script type="math/tex">(0,1)</script> then <script type="math/tex">\sim\frac34</script> of it would end up below the <script type="math/tex">x</script>-axis and only <script type="math/tex">\sim\frac14</script> would end up above the <script type="math/tex">x</script>-axis. Because we like to keep things simple and symmetric, we’ll unzip from the point <script type="math/tex">(1,0)</script>.</p>
<p>Now we’re ready for some algebra. What we’re really doing here to unzip the circle is drawing lines through it. Our goal is to end up on the line <script type="math/tex">x=-1</script>, so start with the point <script type="math/tex">(-1,t)</script> where <script type="math/tex">t\in\mathbb Q</script> is an arbitrary rational number. We want to relate this to a (rational) point on the circle. Since we chose <script type="math/tex">(1,0)</script> as our reference, we’ll do this by looking at the line <script type="math/tex">t(X-1)=-2Y</script> through both <script type="math/tex">(-1,t)</script> and <script type="math/tex">(1,0)</script>. This line will intersect the circle <script type="math/tex">X^2+Y^2=1</script> in one other place, and this will be the rational point associated to <script type="math/tex">t</script>.<sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup></p>
<center>
<img src="https://nivent.github.io/images/blog/number-theory/pt.jpg" width="250" height="100" />
<img src="https://nivent.github.io/images/blog/number-theory/circ2.gif" width="250" height="100" />
</center>
<p>At this point, things look good, but you have to believe me that our line will intersect the circle in a second point, and that this point will also be rational. Visually, I think it’s easy to see that the line will have a second point of intersection with the circle. If you want something a little more rigorous…</p>
<blockquote>
<p>Lemma<br />
The line <script type="math/tex">t(X-1)=-2Y</script> where <script type="math/tex">t</script> is rational intersects the circle <script type="math/tex">X^2+Y^2=1</script> in two rational points.</p>
</blockquote>
<div class="proof2">
Pf: Substituting \(Y=\frac{-t}2(X-1)\) into \(X^2+Y^2=1\) gives a quadratic equation in \(X\) which has two roots, so we know the line will have a second intersection point on the circle. We will now show that this second point is rational. Write \(aX^2+bX+c=0\) for the quadratic we get from this substitution. Using the quadratic formula, its two roots are \(\frac{b\pm\sqrt{b^2-4ac}}{2a}\). We know by construction that one of its roots is at \((1,0)\), which means that \(\sqrt{b^2-4ac}\) is rational. Since all the numbers (\(2,t,1,\dots\)) used to form \(a,b\), and \(c\) are rational, they are as well. This means that \(X=\frac{b\pm\sqrt{b^2-4ac}}{2a}\) is rational (technically, are rational) as any sum, difference, product, or quotient of rational numbers is rational. Using the equation for the line, if \(X\) is rational then \(Y=\frac{-t}2(X-1)\) is as well so our second point must be rational. \(\square\)
</div>
<p>All that’s left to do is actually make this substitution and solve for <script type="math/tex">X</script> and <script type="math/tex">Y</script> given <script type="math/tex">t</script>. We have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
t(X-1)=-2Y &&& X^2+Y^2=1\\
t^2(X-1)^2=4Y^2 &&& 4X^2+4Y^2=4\\
&& t^2(X^2-2X+1)=4-4X^2\\
&& (t^2+4)X^2-2t^2X+(t^2-4)=0
\end{align*} %]]></script>
<p>At this point, we could follow the proof and use the quadratic formula, but I’d rather introduce something new called <a href="https://www.wikiwand.com/en/Vieta%27s_formulas">Vieta’s formulas</a>.</p>
<blockquote>
<p>Theorem (Vieta)<br />
If the polynomial <script type="math/tex">f(X)=a_nX^n+a_{n-1}X^{n-1}+\dots+a_1X+a_0</script> has roots <script type="math/tex">\lambda_1,\lambda_2,\dots,\lambda_n</script>, then <script type="math/tex">\lambda_1+\lambda_2+\dots+\lambda_n=-a_{n-1}/a_n</script>.</p>
</blockquote>
<div class="proof2">
Pf: Since we know its roots, we can write \(f(X)=a_n(X-\lambda_1)(X-\lambda_2)\dots(X-\lambda_n)\). Using this representation, consider the coefficient \(a_{n-1}\) of the second highest term of \(f(X)\). For a term in this product to contribute to it, we must pick exactly \((n-1)\) \(X\)'s and one \(-\lambda_i\). This means that the terms of the product summing up to the \(a_{n-1}X^{n-1}\) term in the polynomial are all of the form \(-a_n\lambda_iX^{n-1}\) for some \(i\in\{1,2,\dots,n\}\). This means that \(a_{n-1}=-a_n\lambda_1-a_n\lambda_2-\dots-a_n\lambda_n\) so divide both sides by \(-a_n\) to get the desired result. \(\square\)
</div>
<p>This theorem is useful because we know that <script type="math/tex">X=1</script> is a root of the complicated polynomial we have (corresponding to our reference point <script type="math/tex">(1,0)</script>), so whatever the second root <script type="math/tex">X</script> is, we know we must have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
X+1=\frac{2t^2}{t^2+4} &\implies X=\frac{t^2-4}{t^2+4}\\
&\implies Y=\frac{-t}2(X-1)=\frac{4t}{t^2+4}
\end{align*} %]]></script>
<p>This seems like another good place for a quick pause. What we have just done is find a way to turn a rational number <script type="math/tex">t</script> into a pair of rational numbers <script type="math/tex">(X,Y)</script> on the unit circle, which can be turned into 3 coprime integers <script type="math/tex">(a,b,c)</script> that satisfy the Pythagorean theorem. In essence, we have found a 1-1 correspondence between fractions and rational points on the circle <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup> which solves our problem by our earlier observation that rational points on the circle are in 1-1 correspondence with coprime Pythagorean triples.</p>
<p>To finish up, we’ll note that a rational number is really just two integers, as we can write <script type="math/tex">t=\frac mn</script> where <script type="math/tex">m,n\in\mathbb Z</script> are coprime. Thus, to form any Pythagorean triple, pick any two integers <script type="math/tex">m,n</script> and calculate the following: <script type="math/tex">\left(\frac{t^2-4}{t^2+4},\frac{4t}{t^2+4}\right)=\left(\frac{(m/n)^2-4}{(m/n)^2+4},\frac{4m/n}{(m/n)^2+4}\right)=\left(\frac{m^2-4n^2}{m^2+4n^2},\frac{4mn}{m^2+4n^2}\right)</script>
which corresponds to the solution<sup id="fnref:11"><a href="#fn:11" class="footnote">11</a></sup></p>
<script type="math/tex; mode=display">\begin{align*}
(a,b,c)=(m^2-4n^2,4mn,m^2+4n^2)
\end{align*}</script>
<p>All Pythagorean triples can be generated this way with the caveat that you may have to swap <script type="math/tex">a</script> and <script type="math/tex">b</script>, negate one (or both) of them, or scale both of them up by a constant factor.</p>
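<p>As a sketch of how this parametrization plays out in practice (the helper name <code>triple</code> is my own invention), we can generate triples from pairs <code>(m, n)</code> and verify them:</p>

```python
# The parametrization derived above: from integers (m, n),
# (a, b, c) = (m^2 - 4n^2, 4mn, m^2 + 4n^2) is a Pythagorean triple.
def triple(m, n):
    return (m * m - 4 * n * n, 4 * m * n, m * m + 4 * n * n)

for m, n in [(3, 1), (5, 1), (3, 2)]:
    a, b, c = triple(m, n)
    assert a * a + b * b == c * c
    print((a, b, c))
# (3, 1) gives (5, 12, 13); (3, 2) gives (-7, 24, 25), a negative entry
# matching the caveat that you may need to negate a component.
```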
<h1 id="gauss">Gauss</h1>
<p>There is a nice connection between the second thing I wanted to talk about and the first, but unfortunately, I cannot remember it, so I don’t have a good segue into this topic like I wanted to. I will still mention that one of my favorite identities from algebra was always difference of squares: <script type="math/tex">a^2-b^2=(a-b)(a+b)</script>, and in fact, if you look at the Pythagorean problem, it was essentially a fancy difference of squares problem. We had <script type="math/tex">a^2+b^2=c^2</script> which just means that <script type="math/tex">a^2-(-b^2)=c^2</script>. The issue with factoring this like before is having to take a square root of <script type="math/tex">(-b^2)</script>. This isn’t a big issue though. You can extend the real numbers to a bigger set of numbers called complex numbers that includes a number <script type="math/tex">i</script> such that <script type="math/tex">i^2=-1</script>. Allowing for use of <script type="math/tex">i</script>, we can write <script type="math/tex">a^2+b^2=c^2</script> as <script type="math/tex">(a+bi)(a-bi)=c^2</script>. In fact<sup id="fnref:12"><a href="#fn:12" class="footnote">12</a></sup>, you can use this to derive a formula for all Pythagorean triples instead of taking the approach we did above. If nothing else, that should be an indication that this <script type="math/tex">i</script> is something worth studying.</p>
<p>Since we’re doing number theory, instead of looking at all complex numbers, we’ll focus on a subset called the Gaussian integers; these are the numbers <script type="math/tex">a+bi</script> where <script type="math/tex">a,b\in\mathbb Z</script>. As their name suggests, these numbers play an analogous role to the normal integers as a subset of the real (or just rational) numbers. One of the most important aspects of the normal integers is the behavior of prime numbers, so we’ll investigate the analogous behavior for Gaussian primes. In particular, we’ll ask the following question</p>
<blockquote>
<p>Question<br />
Which primes <script type="math/tex">p\in\mathbb Z</script> remain prime when viewed as the Gaussian integer <script type="math/tex">p=p+0i\in\mathbb Z[i]</script>?</p>
</blockquote>
<p>To get a feel for this question, let’s try to factor the first few primes and see what happens. After some trial and error, you might end up with a table like</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
2=(1+i)(1-i) & 3=3 & 5=(2+i)(2-i)\\
7=7 & 11=11 & 13=(3+2i)(3-2i)\\
17 = (4+i)(4-i) & 19=19 & 23=23\\
& \vdots
\end{matrix} %]]></script>
<p>The primes that can be factored seem pretty random, but the way in which they are factored has a pattern to it.</p>
<blockquote>
<p>Aside<br />
When I introduced <script type="math/tex">i</script>, I said that <script type="math/tex">i^2=-1</script> which means that <script type="math/tex">i</script> is a square root of <script type="math/tex">-1</script>. However, every number has two square roots, and indeed <script type="math/tex">(-i)^2=(-1)^2(i)^2=1(-1)=-1</script>, so <script type="math/tex">-1</script> is no exception. However, when I defined <script type="math/tex">i</script>, I just said it was a square root of <script type="math/tex">-1</script> without specifying which. You can imagine swapping <script type="math/tex">i</script> with <script type="math/tex">(-i)</script> everywhere you use it, and this shouldn’t change whether what you have is true or not since <script type="math/tex">i</script>’s defining feature (having <script type="math/tex">-1</script> as a square) doesn’t define it uniquely. This could make you think that there might be ways of combining <script type="math/tex">i</script> and <script type="math/tex">-i</script> to gain information about numbers.</p>
</blockquote>
<p>Noticing this pattern, and the aside above, we make the following definition (whenever I write something like <script type="math/tex">a+bi</script>, just assume that <script type="math/tex">a</script> and <script type="math/tex">b</script> are normal integers. I won’t bother always specifying.)</p>
<blockquote>
<p>Definition<br />
Given a number <script type="math/tex">a+bi\in\mathbb Z[i]</script>, we define its <b>norm</b> to be <script type="math/tex">N(a+bi):=(a+bi)(a-bi)=a^2+b^2</script>. Note that the norm of a number is always a non-negative integer.</p>
</blockquote>
<p>This function has the nice property of being multiplicative</p>
<blockquote>
<p>Lemma<br />
<script type="math/tex">N((a+bi)(c+di))=N(a+bi)N(c+di)</script></p>
</blockquote>
<div class="proof2">
Pf: Left as an exercise \(\square\)
</div>
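<p>The multiplicativity of the norm is easy to spot-check numerically before doing the exercise (this is a quick illustration using Python’s built-in complex numbers, not a proof; the helper <code>norm</code> is my own):</p>

```python
# The norm of a Gaussian integer a+bi is N(a+bi) = a^2 + b^2.
def norm(z):
    return int(z.real) ** 2 + int(z.imag) ** 2

# Spot-check the claimed identity N(xy) = N(x) N(y).
x, y = 3 + 2j, 1 - 4j
assert norm(x * y) == norm(x) * norm(y)
print(norm(x), norm(y), norm(x * y))  # prints 13 17 221
```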
<p>With this definition under our belt, we see that the primes that are no longer prime factor as the norm of a number, and so can be written as the sum of two squares. Does this hold in general? Before answering this, a quick remark. When we factor numbers in the normal integers, there is always the issue of <script type="math/tex">-1</script>. If we write <script type="math/tex">15=3(5)</script>, then we could alternatively write <script type="math/tex">15=(-3)(-5)</script>. Similarly, if we write <script type="math/tex">7=7(1)</script>, we could also write <script type="math/tex">7=7(-1)</script>. Despite this, we still like to say <script type="math/tex">15</script> has a unique factorization and <script type="math/tex">7</script> has no factors other than 1 and itself. What’s going on here is that <script type="math/tex">-1</script> divides <script type="math/tex">1</script> so no matter how we factor something, we can always just absorb extra <script type="math/tex">-1</script>’s into the factorization. This shouldn’t count as really being a different factorization, so we characterize annoying numbers like this as follows</p>
<blockquote>
<p>Definition<br />
A number <script type="math/tex">u</script> that divides <script type="math/tex">1</script> is called a <b>unit</b>. Equivalently, if there exists a number <script type="math/tex">v</script> such that <script type="math/tex">uv=1</script>, then <script type="math/tex">u</script> is a unit.</p>
</blockquote>
<p>When we talk about factorization, we don’t care about units. Furthermore,</p>
<blockquote>
<p>Theorem<br />
In <script type="math/tex">\mathbb Z[i]</script> the only units are <script type="math/tex">\pm1,\pm i</script>, and a number <script type="math/tex">x</script> is a unit if and only if <script type="math/tex">N(x)=1</script></p>
</blockquote>
<div class="proof2">
Pf: \(uv=1\implies N(uv)=N(u)N(v)=N(1)=1\implies N(u),N(v)=1\). Conversely, the statement \(N(u)=1\) itself says that \(u\) is a unit (why?).<br />
For the first part of the theorem, assume \(u\) is a unit and write \(u=a+bi\) so \(a^2+b^2=1\). Clearly, either \((a,b)=(\pm1,0)\) or \((a,b)=(0,\pm1)\), so the claim holds. \(\square\)
</div>
<blockquote>
<p>Theorem<br />
(Normal) prime <script type="math/tex">p</script> factors in <script type="math/tex">\mathbb Z[i]</script> if and only if <script type="math/tex">p</script> can be written as the sum of two squares.</p>
</blockquote>
<div class="proof2">
Pf: \((\rightarrow)\) Assume \(p=\alpha\beta\) factors in \(\mathbb Z[i]\) with \(\alpha,\beta\in\mathbb Z[i]\) both non-units. Then, \(p^2=N(p)=N(\alpha\beta)=N(\alpha)N(\beta)\). Since \(N(\alpha),N(\beta)\neq1\) by assumption, this means \(N(\alpha)=p\). Write \(\alpha=a+bi\) to get the result.<br />
(\(\leftarrow\)) If \(a^2+b^2=p\), then \(p=(a+bi)(a-bi)\).\(\square\)
</div>
<p>This means we can classify the normal primes that are Gaussian primes by figuring out which can be written as the sum of two squares. This turns out to have a surprising answer using some modular arithmetic.</p>
<blockquote>
<p>Theorem<br />
An odd prime <script type="math/tex">p</script> can be written as the sum of two squares only if <script type="math/tex">p\equiv1\pmod4</script>.</p>
</blockquote>
<div class="proof2">
Pf: Assume \(p=a^2+b^2\). Note that the only squares\({}\bmod4\) are \(0\) and \(1\) which can be seen by squaring all four numbers. Hence, trying all possibilities, we have \(p\equiv0,1,2\pmod4\). Since \(p\) is odd, this means \(p\equiv1\pmod4\). \(\square\)
</div>
<p>We would like to prove the other direction: that if <script type="math/tex">p\equiv1\pmod4</script>, then it’s the sum of two squares. While this turns out to be true, it doesn’t have as simple of a proof <sup id="fnref:13"><a href="#fn:13" class="footnote">13</a></sup>. First, note that if we have a number <script type="math/tex">d</script> s.t. <script type="math/tex">d^2\equiv-s^2\pmod p</script> for some <script type="math/tex">s</script>, then we know that <script type="math/tex">p</script> divides <script type="math/tex">d^2+s^2</script>, which looks like a step in the right direction. This motivates the following lemma.</p>
<blockquote>
<p>Lemma<br />
If prime <script type="math/tex">p\equiv1\pmod4</script>, then <script type="math/tex">-1\equiv\square\pmod p</script>.</p>
</blockquote>
<div class="proof2">
Pf: Assume prime \(p\equiv1\pmod4\). Consider the group \(\mathbb F_p^\times\) of non-zero integers modulo \(p\) under multiplication. Writing \(p=4k+1\), this group is cyclic of order \(4k\), so there exists some \(g\in\mathbb F_p^\times\) with order \(4k\). Now \(g^{2k}\) squares to \(g^{4k}=1\), so \(g^{2k}=\pm1\); since \(g\) has order \(4k\), we cannot have \(g^{2k}=1\), so \(g^{2k}=-1\implies (g^k)^2=-1\) and hence \(-1\) is a square modulo \(p\) as claimed. \(\square\)<br /><br />
Alternative Pf: Use <a href="https://www.wikiwand.com/en/Wilson%27s_theorem">Wilson's Theorem</a> to show that \((2k)!(2k)!\equiv-1\pmod p\) if \(p=4k+1\). Details left to reader. \(\square\)
</div>
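<p>The alternative proof suggests a concrete computation: for <script type="math/tex">p=4k+1</script>, the number <script type="math/tex">(2k)!\bmod p</script> should square to <script type="math/tex">-1</script>. A small sketch (the function name is my own invention):</p>

```python
# For a prime p = 4k+1, Wilson's theorem gives ((2k)!)^2 = -1 (mod p),
# so (2k)! mod p is an explicit square root of -1.
def sqrt_of_minus_one(p):
    assert p % 4 == 1
    x = 1
    for i in range(2, (p - 1) // 2 + 1):  # compute ((p-1)/2)! mod p
        x = x * i % p
    return x

for p in [5, 13, 17, 29]:
    x = sqrt_of_minus_one(p)
    assert x * x % p == p - 1  # x^2 = -1 (mod p)
    print(p, x)
```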
<blockquote>
<p>Exercise<br />
Show that the converse of this lemma holds as well. If <script type="math/tex">-1\equiv\square\pmod p</script> for odd prime <script type="math/tex">p</script>, then <script type="math/tex">p\equiv1\pmod4</script>.</p>
</blockquote>
<p>Now that we’ve taken that step, we can finally prove the other direction of our previous theorem <sup id="fnref:16"><a href="#fn:16" class="footnote">14</a></sup>.</p>
<blockquote>
<p>Theorem<br />
An odd prime <script type="math/tex">p</script> can be written as the sum of two squares if <script type="math/tex">p\equiv1\pmod4</script>.</p>
</blockquote>
<div class="proof2">
Pf: Assume \(p\equiv1\pmod4\) and, using the lemma, pick some integer \(n\) such that \(p\mid(n^2+1)\). Working in \(\mathbb Z[i]\), we can write \(n^2+1=(n-i)(n+i)\). Assume that \(p\) remains prime in the Gaussian integers. This would mean that \(p\mid(n-i)\) or \(p\mid(n+i)\). However, both of these are nonsense because \(\frac{n\pm i}p\not\in\mathbb Z[i]\) since its coefficients are not integers. Thus, \(p\) must not be prime, so it factors and hence is the sum of two squares. \(\square\)
</div>
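<p>We can spot-check this theorem by brute force: for each prime <script type="math/tex">p\equiv1\pmod4</script>, search for a decomposition <script type="math/tex">p=a^2+b^2</script> (a hypothetical helper written for illustration, not part of the proof):</p>

```python
from math import isqrt

# Brute-force search for a, b with a^2 + b^2 = p; the theorem says
# this should always succeed when p is a prime congruent to 1 mod 4.
def two_square_decomposition(p):
    for a in range(isqrt(p) + 1):
        b2 = p - a * a
        b = isqrt(b2)
        if b * b == b2:
            return a, b
    return None

for p in [5, 13, 17, 29, 37, 41]:
    print(p, two_square_decomposition(p))  # e.g. 5 -> (1, 2), 13 -> (2, 3)
```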
<p>This means that the normal primes that are also Gaussian primes are exactly those that are congruent to 3 modulo 4. However, are there any Gaussian primes that are not normal primes? Short answer, yes. In fact, if <script type="math/tex">p</script> is a Gaussian prime, then so is <script type="math/tex">ip</script>. Furthermore, it can be shown <sup id="fnref:14"><a href="#fn:14" class="footnote">15</a></sup> that <script type="math/tex">a+bi</script> with <script type="math/tex">a,b\neq0</script> is a Gaussian prime if and only if <script type="math/tex">N(a+bi)</script> is a normal prime. Finally, because why stop there, I have one last exercise</p>
<blockquote>
<p>Exercise<br />
Extend the work done here to show that an integer <script type="math/tex">N\in\mathbb Z</script> can be written as the sum of two squares if and only if every prime <script type="math/tex">p\equiv3\pmod4</script> that divides it does so evenly many times.</p>
</blockquote>
<p>As an example, we can write <script type="math/tex">180=2^2*3^2*5=6^2+12^2</script>, but we cannot write <script type="math/tex">105=3*5*7</script> as the sum of two squares no matter how hard we try.</p>
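<p>The claim in the exercise, and the two examples above, can be spot-checked with a brute-force search (again a hypothetical helper, not a proof of the exercise):</p>

```python
from math import isqrt

# Brute-force test of whether n is expressible as a sum of two squares.
def is_sum_of_two_squares(n):
    return any(isqrt(n - a * a) ** 2 == n - a * a
               for a in range(isqrt(n) + 1))

print(is_sum_of_two_squares(180))  # True: 180 = 6^2 + 12^2
print(is_sum_of_two_squares(105))  # False: 3 and 7 each divide 105 once
```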
<h1 id="thoughts">Thoughts</h1>
<p>The second half of this post did not turn out as well as I had hoped it would. I probably should have thought through the order in which I would cover the steps of the proof, and decided on some minimum level of mathematical knowledge to assume before writing it, in order to avoid it sounding like a poorly motivated mess<sup id="fnref:15"><a href="#fn:15" class="footnote">16</a></sup>. But oh well, too late now.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I wonder what percentage of my posts start with me thinking back on an earlier time in my life, and if this percentage will remain relatively constant as time passes. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>This blog is really just a place for me to brag <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>I suspect I was drawn to the consistent combination of brute force, and mathematical sleight of hand I used in answering its questions. I remember starting problems with equations in ~4 variables and then employing some trick to eventually find an answer in all the mess. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Except the ones I leave for the reader <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>If you don’t see how this picture is relevant, and don’t want to go to the link, my advice is to find the area of that square two different ways <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>If this fancy Z scares you, don’t let it. It’s just notation for the set of integers, so this says a, b, c are integers. The Z is short for some German word, I think, but I’d have to look it up to be sure. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Have no common factors <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>Technically, we also need to show all (x,y) satisfying x^2+y^2=1 are on the unit circle, but this is easy and essentially the same argument. <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>I spent some time trying to adjust animation speeds so everything was smooth, but then quickly gave up and decided this was good enough <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Our correspondence isn’t exactly 1-1. No rational number gets matched up with the reference point (1,0) on the circle. This is fine, and in case you’re curious, it’s a consequence of the fact that our correspondence is continuous and the circle is fundamentally different from the line (e.g. it has a hole while a line does not), whereas a punctured circle (a circle missing one point) isn’t (you can uncurl it into a line like we’ve done). <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>If the last footnote makes you think we still missed (1,0) <—> (1,0,0): this solution comes from m=1 and n=0. This doesn’t contradict the previous footnote, because this choice of m and n gives t=1/0, which is very much not rational. <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>I don’t know the exact details but I saw them once long ago. I’m not going to try to reproduce them here. <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
<li id="fn:13">
<p>or at least not one that I know <a href="#fnref:13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:16">
<p>Here, I ignore the issue of Z[i] being a UFD, which means every number factors uniquely into primes, just as in the integers. This is not a trivial/obvious property for a set of numbers to have, but I didn’t want to get into the details of proving it. <a href="#fnref:16" class="reversefootnote">↩</a></p>
</li>
<li id="fn:14">
<p>and it is up to you to show. One direction is easy. The other I’m not 100% sure is true, but I’m pretty sure it is. <a href="#fnref:14" class="reversefootnote">↩</a></p>
</li>
<li id="fn:15">
<p>and it being almost circular in the end <a href="#fnref:15" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>