Thoughts of a Programmer — Niven Achenjang’s Personal Website (<a href="https://nivent.github.io/">nivent.github.io</a>)
<h1>Orbit-Stabilizer for Finite Group Representations</h1>
<p>2018-06-06 · <a href="https://nivent.github.io/blog/orbit-stab-rep">nivent.github.io/blog/orbit-stab-rep</a></p>
<p>One of my professors covered the main result of this post during a class that I missed a while ago. Using some notes from a friend who attended that class, I want to try to reconstruct the theorem <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Experience with representation theory will be useful for this post, but I’ll try to cover enough of the basics so that previous exposure isn’t strictly required.</p>
<h1 id="orbit-stabalizer">Orbit-Stabilizer</h1>
<p>We will begin by continuing our discussion of group actions from <a href="../geo-group">last post</a>. Recall the definition</p>
<div class="definition">
Let $G$ be a group and let $X$ be a set. A <b>(left) group action</b> of $G$ on $X$ is a map $\phi:G\times X\rightarrow X$ satisfying
<ul>
<li> $1\cdot x=x$ for all $x\in X$ where $1\in G$ is the identity</li>
<li> $g\cdot(h\cdot x)=(gh)\cdot x$ for all $x\in X$ and $g,h\in G$</li>
</ul>
where $g\cdot x$ denotes $\phi(g,x)$.
</div>
<p>We sometimes write $G\curvearrowright X$ to denote that $G$ acts on $X$. If $X$ has additional structure (e.g. if $X$ is a vector space), then we require our group action to respect $X$’s structure. In general, a group action $G\curvearrowright X$ is a map $G\rightarrow\Aut(X)$ where the automorphisms of $X$ depend on the context <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<p>Now, in order to (state and) prove Orbit-Stabilizer, we’ll need to know what those words mean.</p>
<div class="definition">
Let $G$ be a group acting on a set $X$. Given some $x\in X$, its <b>orbit</b> is
$$G\cdot x=\{g\cdot x\mid g\in G\}$$
Furthermore, its <b>stabilizer</b> is
$$\Stab(x)=G_x=\{g\in G\mid g\cdot x=x\}$$
</div>
<p>Note that the stabilizer of $x\in X$ is a subgroup of $G$ since $g,h\in G_x\implies(g\inv h)\cdot x=g\cdot(\inv h\cdot x)=g\cdot x=x$. Furthermore, if $G\cdot x=X$ for some $x\in X$, then we say $G$ acts <b>transitively</b> on $X$.</p>
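These definitions are easy to play with computationally. Here's a quick Python sketch (brute force, not from the original class notes) computing the orbit and stabilizer for the natural action of $S_3$ on $\{0,1,2\}$, with a permutation $g$ encoded as a tuple so that $g[i]=g\cdot i$:

```python
from itertools import permutations

def orbit(G, x, act):
    """The orbit G·x = {g·x : g in G}."""
    return {act(g, x) for g in G}

def stabilizer(G, x, act):
    """The stabilizer G_x = {g in G : g·x = x}."""
    return [g for g in G if act(g, x) == x]

# S_3 acting on {0,1,2}; a permutation g is a tuple with g[i] = g·i.
S3 = list(permutations(range(3)))
act = lambda g, x: g[x]

assert orbit(S3, 0, act) == {0, 1, 2}    # the action is transitive
assert len(stabilizer(S3, 0, act)) == 2  # Stab(0) is a copy of S_2
```

As the assertions record, this action is transitive and each point's stabilizer is a copy of $S_2$.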
<p>Finally, if $G\curvearrowright X$, then we call $X$ a <b>$G$-set</b>. Naturally, these spaces have their own notion of homomorphisms.</p>
<div class="definition">
Let $X,Y$ be two $G$-sets. A <b>$G$-map</b> (or <b>$G$-equivariant map</b> or <b>$G$-morphism</b>) is a map $f:X\rightarrow Y$ s.t. $f(g\cdot x)=g\cdot f(x)$ for all $g\in G$ and $x\in X$. We say $f$ is a <b>$G$-isomorphism</b> if it is bijective.
</div>
<div class="exercise">
Show that if $f$ is a $G$-isomorphism, then $\inv f$ is $G$-equivariant.
</div>
<p>With our definitions set up, we come to</p>
<div class="theorem" name="Orbit-Stabilizer">
Let $X$ be a $G$-set. Fix any $Y\subseteq X$ s.t. $g\cdot Y\cap Y\in\{Y,\emptyset\}$ for all $g\in G$ and $G\cdot Y=\{g\cdot y\mid g\in G,y\in Y\}=X$. Finally, let $H=\Stab(Y)=\{g\in G\mid\forall y\in Y:g\cdot y\in Y\}$. Then,
$$X\simeq\bigsqcup_{\sigma_i\in G/H}\sigma_iY$$
as $G$-sets where the union is taken over coset representatives of $G/H$ and $\sigma_iY=\{\sigma_i y\mid y\in Y\}$ (note: $\sigma_iy$ is just a formal symbol) and $G$ acts on it via $g\cdot(\sigma_iy)=\sigma_j(h\cdot y)$ for the unique $\sigma_j,h$ s.t. $g\sigma_i=h\sigma_j$.
</div>
<div class="proof4">
Let $f:\bigsqcup_{\sigma_i\in G/H}\sigma_iY\to X$ be the map $f(\sigma_iy)=\sigma_i\cdot y$. This map is $G$-equivariant since
$$f(g\cdot\sigma_iy)=f(\sigma_j(h\cdot y))=\sigma_j\cdot(h\cdot y)=(\sigma_jh)\cdot y=(g\sigma_i)\cdot y=g\cdot(\sigma_i\cdot y)=g\cdot f(\sigma_iy)$$
where $g\sigma_i=\sigma_jh$. For injectivity, if $\sigma_i\cdot y=\sigma_j\cdot y'$, then
$$(\inv\sigma_j\sigma_i)\cdot y=y'\implies\inv\sigma_j\sigma_i\in H\implies\sigma_iH=\sigma_jH\implies\sigma_i=\sigma_j\implies y=y'$$
where the second-to-last implication comes from the fact that we fixed our coset representatives ahead of time. Finally, for surjectivity, fix any $x\in X$. Since $G\cdot Y=X$, there exists $g\in G$ and $y\in Y$ s.t. $g\cdot y=x$. Thus, writing $g=\sigma_jh$, we have that $f(\sigma_j(h\cdot y))=x$.
</div>
<div class="cor">
Let $X$ be a $G$-set, and fix any $x\in X$. Then, $|G\cdot x|=|G:G_x|=|G|/|G_x|$.
</div>
<div class="proof4">
Apply the above theorem to the $G$-set $G\cdot x$ where $Y=\{x\}$.
</div>
<p>It’s worth noting that Orbit-Stabilizer usually only refers to the corollary above, but this stronger version is closer to our main theorem.</p>
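As a sanity check of the counting statement (again a brute-force Python sketch, nothing more), here is $S_4$ acting on $2$-element subsets of $\{0,1,2,3\}$, where the orbit has size $6=24/4$:

```python
from itertools import permutations

# S_4 acting on 2-element subsets of {0,1,2,3} via g·{i,j} = {g(i), g(j)}.
S4 = list(permutations(range(4)))
act = lambda g, s: frozenset(g[i] for i in s)

x = frozenset({0, 1})
orbit = {act(g, x) for g in S4}
stab  = [g for g in S4 if act(g, x) == x]

# Orbit-Stabilizer: |G·x| = |G| / |G_x| = 24 / 4 = 6.
assert len(orbit) * len(stab) == len(S4)
assert len(orbit) == 6 and len(stab) == 4
```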
<h1 id="a-quick-intro-to-representations-of-finite-groups">A Quick Intro to Representations of Finite Groups</h1>
<p>Now that we’ve seen Orbit-Stabilizer, we’ll need to introduce some definitions from representation theory.</p>
<div class="definition">
Fix a group $G$ and a vector space $V$ over a field $\F$. A <b>(linear) representation of $G$</b> is a group homomorphism $\rho:G\rightarrow\GL_{\F}(V)$. Given such a map, we call $V$ a <b>$G$-rep</b>, and morphisms of $G$-reps are $G$-equivariant linear maps. Finally, $\theta:G\to\GL_{\F}(U)$ is a <b>subrepresentation</b> if $U\subseteq V$ is a subspace and $\theta(g)=\rho(g)\mid_U$ for all $g\in G$.
</div>
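Concretely, a representation hands each group element an invertible matrix, compatibly with multiplication. A minimal numerical sketch (using NumPy, and permutation matrices for $S_3$, so this is really the permutation representation defined below) checks the homomorphism property $\rho(gh)=\rho(g)\rho(h)$:

```python
import numpy as np
from itertools import permutations

def perm_matrix(g):
    """Permutation matrix of g, chosen so that M @ e_i = e_{g(i)}."""
    n = len(g)
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        M[g[i], i] = 1
    return M

def compose(g, h):
    """The product gh in S_n: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(g)))

# Check rho(gh) = rho(g) rho(h) over all of S_3.
S3 = list(permutations(range(3)))
for g in S3:
    for h in S3:
        assert np.array_equal(perm_matrix(compose(g, h)),
                              perm_matrix(g) @ perm_matrix(h))
```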
<p>When studying linear representations of groups, there are two main perspectives one can take. Everything can be done in terms of an explicit representation (i.e. the map $\rho$ above) or in terms of modules over the group ring. Since I haven’t talked about modules on this blog before <sup id="fnref:4"><a href="#fn:4" class="footnote">3</a></sup>, I’ll stick to the explicit representation approach and leave exercises to translate things into statements about modules for the interested reader.</p>
<p><span class="exercise">
Prove that a linear representation of $G$ is the same thing as an $\F[G]$-module. <sup id="fnref:5"><a href="#fn:5" class="footnote">4</a></sup>
</span></p>
<p>Thankfully, we don’t need a lot of representation theory for the main result of this post. We only need to know a few different types of linear representations. Also, in case I ever forget to mention this, for the rest of this post, assume all vector spaces are finite-dimensional and assume that all groups are finite.</p>
<div class="definition">
A <b>permutation representation</b> of $G$ on a finite-dimensional $\F$-vector space $V$ is a linear representation $\rho:G\rightarrow\GL(V)$ in which the elements of $G$ act by permuting some basis $B=\{b_1,\dots,b_n\}$ for $V$.
</div>
<div class="example">
Consider the symmetric group $S_n$ acting on $\C^n=\bigoplus_{i=1}^n\C e_i$ via $\sigma\cdot e_i=e_{\sigma(i)}$.
</div>
<div class="example">
Let $G$ be any finite group, and consider $\C[G]\simeq\bigoplus_{g\in G}\C g$ as vector spaces. This is the <b>regular representation</b> when $G$ acts via $h\cdot g=hg$ on the basis.
</div>
<p>Finally, we need the notion of induced representations. This lets you take a representation of a group $H$ and canonically construct a representation of a larger group $G\supseteq H$. The construction is very reminiscent of the Orbit-Stabilizer theorem.</p>
<div class="construction">
Let $H\le G$ be a subgroup of $G$, and let $V$ be an $H$-rep. Fix a complete set of coset representatives $\sigma_1=e,\dots,\sigma_n\in G$ s.t. $G/H=\{\sigma_iH:1\le i\le n\}$ and $n=|G/H|$. Then, as a vector space, the <b>induced representation</b> from $H$ to $G$ is
$$\Ind_H^GV=\bigoplus_{i=1}^n\sigma_iV$$
where $\sigma_iV=\{\sigma_iv\mid v\in V\}$ is a space of formal symbols. This is given a $G$-action as follows: given some $\sigma_iv\in\Ind_H^GV$, there's a unique $\sigma_j$ and $h\in H$ s.t. $g\sigma_i=\sigma_jh$. We define $g\cdot\sigma_iv=\sigma_j(h\cdot v)$.
</div>
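The construction is mechanical enough to code up. The sketch below (illustrative Python, with $G=S_3$, $H\simeq S_2$, and $V$ the trivial one-dimensional representation, so a formal symbol $\sigma_iv$ is tracked just by its index $i$) finds coset representatives, decomposes $g\sigma_i=\sigma_jh$, and checks that the recipe really defines a group action:

```python
from itertools import permutations

def compose(g, h):
    """The product gh in S_3: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(g)))

S3 = list(permutations(range(3)))
H  = [(0, 1, 2), (1, 0, 2)]              # a copy of S_2 inside S_3

# Pick one representative per coset sigma_i H.
reps, seen = [], set()
for g in S3:
    coset = frozenset(compose(g, h) for h in H)
    if coset not in seen:
        seen.add(coset)
        reps.append(g)
assert len(reps) == len(S3) // len(H)    # [G : H] = 3

def decompose(g):
    """Write g = sigma_j h for the unique representative sigma_j and h in H."""
    for j, s in enumerate(reps):
        for h in H:
            if compose(s, h) == g:
                return j, h
    raise ValueError("cosets should partition G")

# Induced action on formal symbols: g·(sigma_i v) = sigma_j (h·v), where
# g sigma_i = sigma_j h.  With V trivial, only the index j matters.
def induced_act(g, i):
    j, _ = decompose(compose(g, reps[i]))
    return j

# This really is a group action: g·(g'·i) = (gg')·i for all g, g', i.
for g in S3:
    for g2 in S3:
        for i in range(len(reps)):
            assert induced_act(g, induced_act(g2, i)) == induced_act(compose(g, g2), i)
```

Unsurprisingly, with $V$ trivial the induced representation is just the permutation action of $G$ on the cosets $G/H$.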
<div class="exercise">
Prove that, as $\F[G]$-modules, we have
$$\Ind_H^GV\simeq\F[G]\otimes_{\F[H]}V$$
so induction is really just extension of scalars.
</div>
<div class="example">
The regular representation is $\Ind_1^G\F$ where $1$ denotes the trivial group and $G$ acts trivially (i.e. by the identity) on $\F$.
</div>
<h1 id="orbit-stabilizer-v2">Orbit-Stabilizer v2</h1>
<p>This is where we’ll prove the main result, which roughly says that permutation representations are induced representations.</p>
<div class="theorem" name="Orbit-Stabilizer Variation">
Let $V$ be a $G$-rep with a decomposition $V\simeq\bigoplus_{i=0}^nV_i$ as a vector space s.t. for all $i,j\in\{0,\dots,n\}$, there exists a $g\in G$ s.t. $g\cdot V_i=V_j$, and let $H=\Stab(V_0)$. Then,
$$V\simeq\Ind_H^GV_0$$
</div>
<div class="proof4">
We will show this by constructing an explicit isomorphism. Let $f:\Ind_H^GV_0\rightarrow V$ be the map given by
$$f(\sigma_iv_0)=\sigma_i\cdot v_0$$
This is easily seen to be $G$-equivariant, and it is linear by construction. For surjectivity, it suffices to find preimages for elements of the form $v_i\in V_i$. Given such an element, there exists some $g_i\in G$ and $w_i\in V_0$ s.t. $g_i\cdot w_i=v_i$. Now, we can write $g_i=\ith\sigma_jh_i$ for a unique $h_i\in H$ and coset representative $\ith\sigma_j$. Doing so gives us that $f(\ith\sigma_j(h_i\cdot w_i))=(\ith\sigma_jh_i)\cdot w_i=v_i$ so $f$ is surjective as claimed. Finally, we need to show that $f$ is injective, so fix some $w=\sum_{\sigma_i\in G/H}\sigma_i\ith v_0\in\ker f$. This means that $\sum_{\sigma_i\in G/H}\sigma_i\cdot\ith v_0=0$, but we claim that $\sigma_i\cdot\ith v_0$ and $\sigma_j\cdot\Ith vj_0$ belong to different summands (i.e. different $V_k$'s) whenever $i\neq j$, which forces $\sigma_i\cdot\ith v_0=0\implies\ith v_0=0$ for all $i$ and hence $w=0$. To prove the claim, suppose that $\sigma_i\cdot\ith v_0,\sigma_j\cdot\Ith vj_0\in V_k$ for some $k$. Then,
$$\inv\sigma_j\sigma_i\cdot\ith v_0\in V_0\implies\inv\sigma_j\sigma_i\in H\implies\sigma_j=\sigma_i$$
and we win.
</div>
<p>This wasn’t the proof I had in mind. I imagined (and still do) that it was possible to directly apply the original orbit-stabilizer by letting $X$ be a (well-chosen) basis for $V$ and $Y$ be a (well-chosen) basis for $V_0$. However, in trying to make this work, I ran into issues getting a well-defined action of $G$ on the basis $B$ of $V$. Basically, $H=\Stab(V_0)$ can act nontrivially on $V_0$, so it’s possible that $h\cdot B_0\not\subseteq B$ where $B_0\subseteq B$ is the chosen basis of $V_0$, which is troublesome. I still hold out hope that this idea can be salvaged in general<sup id="fnref:6"><a href="#fn:6" class="footnote">5</a></sup>, so</p>
<div class="exercise">
See if you can come up with a proof of the above that applies the original Orbit-Stabilizer theorem (e.g. apply it to a basis of $V$ and then extend linearly). If you can, let me know.
</div>
<p>Even though the proof is a little unsatisfying, we have proven what we set out to prove, so let’s end with a couple examples. $\newcommand{\trv}{\underline{\text{Trv}}}\newcommand{\alt}{\underline{\text{Alt}}}$</p>
<div class="example">
Consider $S_n\curvearrowright\Sym^2\C^n$ where $\C^n=\bigoplus\C e_i$ and $S_n$ acts by permuting the $e_i$. Restricting this action to the basis $B=\{e_ie_j:i,j\in\{1,\dots,n\}\}$, we see there are two $S_n$-orbits
$$\begin{matrix}
B_0 &=& \brackets{e_ie_j:i\neq j} && \Stab(\C e_1e_2) &=& S_2\times S_{n-2}\\
B_1 &=& \brackets{e_1^2,\dots,e_n^2} && \Stab(\C e_1^2) &=& S_{n-1}
\end{matrix}$$
Thus we can write $\Sym^2\C^n=V\oplus W$ where $V=\C B_0=\bigoplus_{i&lt;j}\C e_ie_j$ and $W=\C B_1=\bigoplus_{i=1}^n\C e_i^2$. Furthermore, $S_n$ permutes the summands in each of these decompositions of $V,W$ transitively, so applying our theorem (with $V_0=\C e_1e_2$ and $W_0=\C e_1^2$) yields
$$\Sym^2\C^n\simeq\parens{\Ind_{S_2\times S_{n-2}}^{S_n}\trv\otimes\trv}\oplus\parens{\Ind_{S_{n-1}}^{S_n}\trv}$$
where $\trv$ is the trivial 1-dimensional $S_k$ representation sending each element to the number 1.
</div>
<div class="example">
This time, let's look at $S_n\curvearrowright\parens{\Wedge^2\C^n}\otimes\C^n$ where $\C^n=\bigoplus e_i$ and $S_n$ again acts by permuting the $e_i$. We have a basis $B=\{(e_i\wedge e_j)\otimes e_k:i,j,k\in\{1,\dots,n\},i< j\}$ but it's not fixed by $S_n$ (e.g. $(12)\cdot(e_1\wedge e_2)\otimes e_3=(e_2\wedge e_1)\otimes e_3\not\in B$), so we'll look instead at the spanning set $B'=\{(e_i\wedge e_j)\otimes e_k:i,j,k\in\{1,\dots,n\},i\neq j\}$ which is fixed by $S_n$. This has the following orbits
$$\begin{matrix}
B_0 &=& \brackets{(e_i\wedge e_j)\otimes e_k:i,j,k\text{ distinct}} && \Stab(\C (e_1\wedge e_2\otimes e_3)) &=& S_2\times S_{n-3}\\
B_1 &=& \brackets{(e_i\wedge e_j)\otimes e_k:i\neq j,k\in\{i,j\}} && \Stab(\C(e_1\wedge e_2\otimes e_1)) &=& S_{n-2}
\end{matrix}$$
It's worth noting that $(12)\cdot(e_1\wedge e_2)\otimes e_1=-(e_1\wedge e_2)\otimes e_2$ so we can switch whether $k=i$ or $k=j$ in $B_1$ above. Applying our theorem to (the span of) each orbit and summing them up, we get that
$$\Wedge^2\C^n\otimes\C^n\simeq\parens{\Ind_{S_2\times S_{n-3}}^{S_n}\alt\otimes\trv}\oplus\parens{\Ind_{S_{n-2}}^{S_n}\trv}$$
where $\alt$ is the alternating 1-dimensional $S_k$ representation sending each element to its sign.
</div>
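As a quick consistency check on both examples, the dimensions on each side must agree: $\dim\Ind_H^GV=[G:H]\dim V$, and both $V_0$'s here are one-dimensional. A short Python sketch of the arithmetic:

```python
from math import factorial as fact

# dim Ind_H^G V = [G : H] * dim V, with both inducing reps 1-dimensional.
for n in range(3, 10):
    # Sym^2 C^n: n(n+1)/2 = [S_n : S_2 x S_{n-2}] + [S_n : S_{n-1}]
    assert n * (n + 1) // 2 == fact(n) // (2 * fact(n - 2)) + fact(n) // fact(n - 1)
    # (Wedge^2 C^n) (x) C^n: n(n-1)/2 * n = [S_n : S_2 x S_{n-3}] + [S_n : S_{n-2}]
    assert n * (n - 1) // 2 * n == fact(n) // (2 * fact(n - 3)) + fact(n) // fact(n - 2)
```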
<div class="footnotes">
<ol>
<li id="fn:1">
<p>which, unsurprisingly, is a version of Orbit-stabilizer for representations of finite groups <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>for X a set, they are (self) bijections <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>but really should at some point <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>This includes proving that $\F[G]$-linear maps are $G$-equivariant and that submodules correspond to subrepresentations <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>It certainly can be in the case that $H$ does indeed act trivially (or at least stabilizes the basis)… Question: is there always a basis $B_0$ s.t. $\Stab(V_0)$ is contained in $\Stab(B_0)$? <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<h1>Groups Aren’t Abstract Nonsense</h1>
<p>2018-03-26 · <a href="https://nivent.github.io/blog/geo-group">nivent.github.io/blog/geo-group</a></p>
<p>I’ve recently been skimming through this book called <a href="https://press.princeton.edu/titles/11042.html">“Office Hours with a Geometric Group Theorist”</a> which, perhaps unsurprisingly, is about using geometric objects to study groups. It mostly focuses on how group actions on graphs and metric spaces can reveal information about the group<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, and contains some pretty nice results. Unfortunately, I have too many in mind for one post, but I would still like to introduce the basic notions of the subject and a few results I enjoyed. I imagine this will be a lengthy post with a mix of introducing theory and neat applications<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<h1 id="groups-actions-">Group Actions <sup id="fnref:9"><a href="#fn:9" class="footnote">3</a></sup></h1>
<p>While I didn’t emphasize this much then, towards the beginning of my <a href="../group-intro">intro post</a> on group theory, I mentioned that groups often “perform some action on an object.” Here is where we’ll get to see what I meant and why this is useful.</p>
<div class="definition">
Let $G$ be a group and $X$ a set. A <b>(left) group action</b> of $G$ on $X$ (sometimes denoted $G\curvearrowright X$, pronounced "$G$ acts on $X$") is a function $G\times X\rightarrow X$ where the image of $(g,x)$ is denoted by $g\cdot x$ satisfying
<ul>
<li> $1\cdot x=x$ for all $x\in X$ where $1\in G$ is the identity</li>
<li> $g\cdot(h\cdot x)=(gh)\cdot x$ for all $x\in X$ and $g,h\in G$</li>
</ul>
</div>
<p>In essence, a group action is an action of $G$ on $X$ (i.e. a function $G\times X\rightarrow X$) that respects the group structure of $G$. Above, $X$ is just a set so this is all we require. When we later look at actions on graphs or metric spaces, we’ll further require that $G$’s action preserves $X$’s structure<sup id="fnref:3"><a href="#fn:3" class="footnote">4</a></sup>.</p>
<p>One of the most basic examples of a group action comes from symmetric groups.
<span class="definition">
Given a set $X$, its <b>symmetric group</b> $S_X$ is $S_X=\{f:X\rightarrow X\mid f\text{ bijective}\}$ with composition as the group operation. This has a natural action on $X$; namely $f\cdot x=f(x)$.
</span>
If you want, you can verify that this gives a group action, but it is pretty tautological. When $X$ is just a set, there’s no additional structure to preserve so we think of its symmetries as just being permutations; hence the name of the above group. When you think about it, for any group action $G\curvearrowright X$, each $g\in G$ induces a function $X\rightarrow X$ and this function turns out to be a permutation. Thus, we have the following
<span class="exercise">
Prove that a group action $G\curvearrowright X$ is the same thing as a group homomorphism $G\rightarrow S_X$. If this homomorphism is injective, then we say that the action is <b>faithful</b>.
</span>
As a corollary to this exercise, we get the following
<span class="theorem" name="Cayley">
Every group is isomorphic to a subgroup of a symmetric group.
</span>
<span class="proof4">
Let $G$ be a group. By the above exercise it suffices to show that $G$ acts faithfully on some set $X$. We can simply take $X=G$ with action given by left multiplication (i.e. $g\cdot h=gh$). This action is faithful since every $g\in G$ acts differently on the identity, so we win.
</span>
When studying general group actions, there are two basic concepts of extreme importance.
<span class="definition">
Let $G$ be a group acting on a set $X$. Given some $x\in X$, its <b>orbit</b> is
<script type="math/tex">G\cdot x=\{g\cdot x\mid g\in G\}</script>
Furthermore, its <b>stabilizer (or stabilizer subgroup)</b> is
<script type="math/tex">G_x=\{g\in G\mid g\cdot x=x\}</script>
</span>
Personally, I like to think of orbits and stabilizers in terms of graphs. Given an action $G\curvearrowright X$, you can form a graph with vertex set $X$, such that for each $(x,g)\in X\times G$ you get an edge from $x$ to $g\cdot x$. In this language, orbits are connected components of this graph and (elements of) stabilizers correspond to self-loops. <sup id="fnref:4"><a href="#fn:4" class="footnote">5</a></sup></p>
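Cayley's theorem is easy to watch in action. Here's a small Python sketch (with $\Z/4$ under addition, chosen purely for brevity) turning each group element into a permutation of the underlying set via left multiplication, and checking that the resulting map is an injective homomorphism:

```python
# Z/4 = {0,1,2,3} under addition mod 4; left multiplication g·h = g+h mod 4
# sends each group element to a permutation of the underlying set.
Z4 = range(4)
to_perm = lambda g: tuple((g + h) % 4 for h in Z4)
perms = {g: to_perm(g) for g in Z4}

# Faithful: distinct elements give distinct permutations.
assert len(set(perms.values())) == 4
# Homomorphism: the permutation of g+g' is the composite of the permutations.
for g in Z4:
    for g2 in Z4:
        composite = tuple(perms[g][perms[g2][h]] for h in Z4)
        assert composite == perms[(g + g2) % 4]
```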
<p>The following two theorems are good to know for general group action knowledge, but I do not believe either will be used in this post, so I won’t bother proving them here.
<span class="theorem" name="orbit-stabilizer">
Let $G$ be a group acting on $X$. Fix some $x\in X$. Then,
<script type="math/tex">|G\cdot x|=|G:G_x|=\frac{|G|}{|\Stab(x)|}</script>
</span>
<span class="lemma" name="Burnside">
Let $G$ be a finite group acting on a set $X$. For $g\in G$, let $X^g=\{x\in X\mid g\cdot x=x\}$ be the elements fixed by $g$, and let $X/G$ denote the (set-theoretic) quotient of $X$ by the equivalence relation $x\sim y\iff x\in G\cdot y$. Then,
<script type="math/tex">|X/G|=\frac1{|G|}\sum_{g\in G}|X^g|</script>
</span>
To finish this section, I’ll mention a few definitions that should come up in this post. In the first exercise above, I defined what a faithful action is; two more types of importance are transitive and free actions.
<span class="definition">
A group action $G\actson X$ is called <b>transitive</b> if $G\cdot x=X$ for some (equivalently, all) $x\in X$. The action is <b>free (or $G$ acts freely on $X$)</b> if $\Stab(x)$ is trivial for all $x\in X$ (i.e. no non-trivial element of $G$ fixes any element of $X$)
</span>
We’ll see that free actions in particular reveal algebraic information about the group involved.</p>
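Even though we won't use them later, both theorems above are easy to verify on small examples. Here's a Python sketch checking Burnside's lemma for $\Z/4$ acting on $2$-colourings of a $4$-bead necklace by rotation (the orbit count is the number of distinct necklaces, which is $6$):

```python
from itertools import product

# Z/4 acting on 2-colourings of a 4-bead necklace by rotation.
G = range(4)
X = list(product([0, 1], repeat=4))
rotate = lambda g, x: tuple(x[(i + g) % 4] for i in range(4))

# Direct count of orbits.
orbits = set()
for x in X:
    orbits.add(frozenset(rotate(g, x) for g in G))

# Burnside: |X/G| = (1/|G|) * sum over g of |X^g|.
burnside = sum(sum(1 for x in X if rotate(g, x) == x) for g in G) // len(G)

assert len(orbits) == burnside == 6
```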
<h1 id="graphs">Graphs</h1>
<p>If we’re gonna study groups through their actions, then we’re gonna need objects for them to act on. Sets are all well and good, but are too unrestrictive. The more structure we require on the objects we act on, the more we will have at our disposal to gain information from.</p>
<p>To begin, we will remind the reader of what a graph is, and then introduce how groups act on them.
<span class="definition">
A <b>graph</b> $G=(V,E)$ is a pair consisting of a set $V=V(G)$ of <b>vertices</b> and a set $E=E(G)\subseteq V\times V$ of <b>edges</b>. If $(u,v)\in E\iff(v,u)\in E$ for all $u,v\in V$, then we say $G$ is <b>undirected</b>. If $(v,v)\not\in E$ for all $v\in V$, then we say $G$ is <b>simple</b>.
</span></p>
<p><span class="exercise">
If you don’t already know them, look up definitions for paths, connected graphs, and trees. <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>
</span></p>
<p>Graphs are hardly ever written down in terms of explicit vertex and edge sets. More commonly, they are given as some drawing.</p>
<p><span class="example">
Below is an example of an undirected (but not simple) graph
<center><img src="https://nivent.github.io/images/blog/geo-group/graph.png" width="600" height="100" /></center>
</span>
To define a group action on a graph, we need a notion of an (invertible) structure-preserving map. This will give us a notion of when two graphs are the same, and the different ways in which we can view a graph as being the same as itself<sup id="fnref:5"><a href="#fn:5" class="footnote">7</a></sup> will be what we call its symmetries.
<span class="definition">
Given two graphs $G,H$, a <b>graph isomorphism</b> is a bijection $f:V(G)\rightarrow V(H)$ s.t. <script type="math/tex">(u,v)\in E(G)\iff(f(u),f(v))\in E(H)\text{ for all }u,v\in V(G)</script>
A graph isomorphism $f:V(G)\rightarrow V(G)$ between a graph and itself is called an <b>automorphism</b>.
</span>
<span class="exercise">
Given a graph $G$, let $\Aut(G)$ denote the set of automorphisms of $G$. Prove that this set forms a group under composition.
</span>
<span class="example">
Let $K_n$ denote the complete graph on $n$ vertices (i.e. every vertex is connected to every other vertex). Then, every vertex is interchangeable, so any rearrangement of vertices gives an isomorphism. Hence $\Aut(K_n)\simeq S_n$.
</span>
<span class="exercise">
$K_{n,m}$ is the graph with vertex set $V=V_1\sqcup V_2$ s.t. $|V_1|=n$, $|V_2|=m$, and $E=(V_1\times V_2)\cup(V_2\times V_1)$. e.g. $K_{3,2}$ is pictured below
<center><img src="https://nivent.github.io/images/blog/geo-group/k32.png" width="300" height="100" /></center>
Calculate $\Aut(K_{n,m})$ when $n\neq m$, and calculate $\Aut(K_{n,n})$ for all $n$.
</span>
Now that we’re comfortable with graphs and their isomorphisms, we finally define
<span class="definition">
A <b>group action on a graph</b> is a group homomorphism $G\rightarrow\Aut(\Gamma)$ where $G$ is a group and $\Gamma$ is graph.
</span>
<span class="exercise">
Let $G\actson\Gamma:G\rightarrow\Aut(\Gamma)$ be a group action with $\Gamma$ a graph. Show that this is the same thing as a group action $G\actson V(\Gamma)$ s.t. $(v,u)\in E(\Gamma)\iff(g\cdot v,g\cdot u)\in E(\Gamma)$.
</span>
If the above exercise seems obvious, then that’s a good sign that we’ve made good definitions. Admittedly, groups acting on graphs don’t really make an appearance in any of the applications I want to talk about today (although they will appear in the “further reading” section at the end), but this is still something one should know. As our only real use of this, we’ll prove a slightly strong version of Cayley’s theorem.
<span class="definition">
Let $G$ be a group with a generating set $S$. Its <b>Cayley Graph</b> (with respect to $S$) is the graph $\Gamma(G,S)=(V,E)$ where $V=G$ and $E=\{(g,gs)\mid g\in G,\ s\in S\}$.
</span>
<span class="example">
The Cayley Graph for $S_3$ with generating set $S=\{(12),(23)\}$ is pictured below (blue edges are $(12)$ and red edges are $(23)$)
<center><img src="https://nivent.github.io/images/blog/geo-group/s3.png" width="300" height="100" /></center>
</span>
<span class="theorem" name="Cayley">
Every group is isomorphic to the automorphism group of some graph.
</span></p>
<div class="proof4">
Fix a group $G$ with a generating set $S$, and let $\Gamma=\Gamma(G,S)$ be its Cayley graph. We will show in particular that $G\simeq\Aut(\Gamma(G,S))$. Now, we clearly have a homomorphism $G\rightarrow\Aut(\Gamma)$ given by $g\cdot v_h=v_{gh}$ where to avoid confusion we denote the vertex set of $\Gamma$ by the symbols $\{v_g\mid g\in G\}$. This action is faithful because, letting $e\in G$ denote the identity, $g\cdot v_e\neq h\cdot v_e$ for $g\neq h$. Hence, we only need to show all automorphisms arise in this way.<br />
To make this part of the proof simple, we'll need to impose one further restriction on our automorphisms: they must preserve edge labels (e.g. $(g,gs)\in E\iff(\phi(g),\phi(g)s)\in E$ for $\phi\in\Aut(\Gamma)$ where $s\in S$ is regarded as the label of that edge). Now, consider an arbitrary $\phi\in\Aut(\Gamma)$ and write $\phi(v_e)=v_g$ where $e$ is the identity. Fix some vertex $v_h\in V(\Gamma)$. We will show that $\phi(v_h)=\phi_g(v_h):=v_{gh}$ by inducting on the length of the shortest (not necessarily directed) path from $v_e$ to $v_h$. This obviously holds when the path has length 0, so suppose the path has length $n&gt;0$, so we can write $h=ws$ where there's a path of length $n-1$ from $v_e$ to $v_w$ and $s\in S$. By our inductive hypothesis, $\phi(v_w)=\phi_g(v_w)=v_{gw}$, so $\phi$ carries the edge $(v_w,v_h)$ to the edge $(v_{gw},v_{gws})$, giving $\phi(v_h)=v_{gws}=\phi_g(v_h)$, and hence $\phi=\phi_g$, proving the claim.
</div>
<p><span class="exercise">
When inducting in the above proof, we claim that $s\in S$, but in actuality, it’s also possible that $\inv s\in S$ instead. Finish the proof by handling this case.
</span></p>
<p><span class="exercise">
In the above proof, we assume that automorphisms preserve edge labels. Does the theorem still hold without this extra assumption (hint: <sup id="fnref:7"><a href="#fn:7" class="footnote">8</a></sup>)?
</span></p>
<p>This realizes every group as the symmetries of some graph. Hence, even the most abstract groups have some kind of concrete realization.</p>
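For concreteness, a short Python sketch (with the generators hard-coded as tuples) building the Cayley graph of $S_3$ with respect to $S=\{(12),(23)\}$ and checking that it is connected, as it must be since $S$ generates:

```python
from itertools import permutations

def compose(g, h):
    """The product gh in S_3: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(g)))

# Cayley graph of S_3 w.r.t. S = {(12), (23)}: vertices are the group
# elements, and each g gets an edge to gs for every generator s in S.
S3 = list(permutations(range(3)))
S  = [(1, 0, 2), (0, 2, 1)]            # the transpositions (12) and (23)
edges = {(g, compose(g, s)) for g in S3 for s in S}
assert len(edges) == 12                # 6 vertices, 2 outgoing edges each

# Since S generates S_3, the graph is connected: a graph search from the
# identity vertex reaches every group element.
adj = {g: [compose(g, s) for s in S] for g in S3}
seen, todo = {(0, 1, 2)}, [(0, 1, 2)]
while todo:
    g = todo.pop()
    for h in adj[g]:
        if h not in seen:
            seen.add(h)
            todo.append(h)
assert len(seen) == 6
```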
<h1 id="metric-spaces">Metric Spaces</h1>
<p>We’re almost to the first nice result; I promise. Before we can state it though, I need to introduce the concept of a Metric space and how they’re acted on by groups. Intuitively, a metric space is anywhere you have some notion of distance.</p>
<div class="definition">
A <b>metric space</b> $(X,d)$ is a set $X$ together with a function $d:X\times X\rightarrow\R_{\ge0}$ satisfying the following:
<ul>
<li> positive definiteness: $d(x,y)\ge0$ for all $x,y\in X$ with equality iff $x=y$</li>
<li> symmetry: $d(x,y)=d(y,x)$ for all $x,y\in X$</li>
<li> triangle inequality: $d(x,y)\le d(x,z) + d(z,y)$ for all $x,y,z\in X$</li>
</ul>
</div>
<div class="example">
There are plenty of examples of metric spaces already familiar to you
<ul>
<li> Euclidean space $\R^n$ with the Euclidean metric
$$d((x_1,\dots,x_n),(y_1,\dots,y_n))=\sqrt{\sum_{i=1}^n|x_i-y_i|^2}$$
</li>
<li> Euclidean space with the taxicab metric
$$d((x_1,\dots,x_n),(y_1,\dots,y_n))=\sum_{i=1}^n|x_i-y_i|$$
</li>
<li> Any graph $\Gamma$ (really just its vertex set $V(\Gamma)$) with the path metric, where the distance between any two vertices is the length of the shortest path between them.
</li>
</ul>
</div>
<p>The next example is of particular importance, and it also a bit surprising at first.</p>
<div class="example">
Let $G$ be a group with generating set $S$ and Cayley graph $\Gamma=\Gamma(G,S)$. Then, we can view $G$ also as a metric space with metric given by the path metric on $\Gamma$ (technically, $V(\Gamma)$)!
</div>
<p>There’s actually an alternative way to think of this metric <sup id="fnref:8"><a href="#fn:8" class="footnote">9</a></sup>. Let $\inv S=\{\inv s\mid s\in S\}$ and define the <strong>word length</strong> of an arbitrary $g\in G$ to be the length of the shortest word in $S\cup\inv S$ that is equal to $g$. Then, we can turn $G$ into a metric space where the distance between $g,h\in G$ is the word length of $\inv gh$.</p>
<div class="exercise">
Show that the word length of $g\in G$ with respect to $S$ is the length of the shortest path from $v_e$ to $v_g$ in $\Gamma(G,S)$. Furthermore, show that the word length construction and the Cayley graph construction give $G$ the same metric (in the sense that $d_\text{word length}(g,h)=d_\text{Cayley}(g,h)$).
</div>
<div class="exercise">
Show that the word metric on $\Z^2$ is the same as the taxicab metric.
</div>
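The second exercise can at least be spot-checked by computer. A Python sketch (BFS on the Cayley graph of $\Z^2$ with the standard generators and their inverses, truncated to a finite window) comparing the word metric to the taxicab metric:

```python
from collections import deque

def word_length(target, radius=10):
    """Word metric on Z^2 w.r.t. S = {(1,0),(0,1)} and inverses, via BFS
    on the Cayley graph from the identity (0,0), truncated to a box."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for dx, dy in steps:
            w = (v[0] + dx, v[1] + dy)
            if w not in dist and abs(w[0]) <= radius and abs(w[1]) <= radius:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

# Agrees with the taxicab metric |x| + |y| on a small window.
for x in range(-3, 4):
    for y in range(-3, 4):
        assert word_length((x, y)) == abs(x) + abs(y)
```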
<p>Naturally, we have a notion of a symmetry of a metric space coming from ways of viewing a metric space as equivalent to itself.</p>
<div class="definition">
Let $(X,d_X), (Y,d_Y)$ be two metric spaces. An <b>isometry</b> $f:X\rightarrow Y$ is bijective function s.t.
$$d_X(x,y)=d_Y(f(x),f(y))$$
for all $x,y\in X$. Furthermore, the set of isometries from $X\rightarrow X$ forms a group denoted $\DeclareMathOperator{\Isom}{Isom}\Isom(X)$.
</div>
<div class="definition">
A <b>group action on a metric space</b> is a group homomorphism $G\rightarrow\Isom(X)$ where $G$ is a group and $X$ a metric space
</div>
<div class="exercise">
Give an equivalent formulation of groups acting on metric spaces
</div>
<div class="example">
<ul>
<li> $\zmod n$ acts on $\R^2$ with the Euclidean metric via rotations </li>
<li> $\Z^n$ acts on $\R^n$ with the Euclidean or taxicab metrics by translations </li>
<li> $S_4$ acts on a tetrahedron by permuting its vertices </li>
<li> Every group acts on itself with the word metric by left multiplication </li>
</ul>
</div>
<p>Finally, let’s see how actions can be used to reveal algebraic information about groups.</p>
<h1 id="first-neat-application">First Neat Application</h1>
<p>Before I introduce our first real application of these ideas, I need to introduce one more definition.</p>
<div class="definition">
Let $G$ be a group. An element $g\in G$ is said to be <b>torsion</b> if it has finite order. If $G$ has no (non-trivial) torsion elements, then we say that $G$ is <b>torsion-free</b>
</div>
<p>Now, our goal of this section will be to prove the following theorem</p>
<div class="theorem">
If $G$ acts freely on $\R^n$ with the Euclidean metric, then $G$ is torsion-free
</div>
<p>Take a moment to let that sink in; from an almost purely geometric situation (a group acting on a metric space), we can make a non-trivial algebraic conclusion. This is a central theme of Geometric Group Theory, and more examples of this will be seen in this post<sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. For this particular theorem, the proof relies on the following lemma</p>
<p><span class="definition">
Let $S\subseteq\R^n$ be some collection of points. A <b>centroid</b><sup id="fnref:11"><a href="#fn:11" class="footnote">11</a></sup> of $S$ is a point $w\in\R^n$ minimizing
<script type="math/tex">\sum_{s\in S}d(w, s)^2</script>
Note that $w$ may not exist if $|S|=\infty$
</span></p>
<div class="lemma">
Any finite set in $\R^n$ has a unique centroid
</div>
<div class="proof4">
Let $S=\{\Ith s1,\dots,\Ith sm\}$ be a finite subset of $\R^n$, and let $f:\R^n\rightarrow\R$ be defined by $f(x)=\sum_{i=1}^md(x,\ith s)^2$. Then, write $x=(x_1,\dots,x_n)$ and $\ith s=(\ith s_1,\dots,\ith s_n)$, so
$$\pderiv f{x_j}=\sum_{i=1}^m2(x_j-\ith s_j)\implies\grad f=2\sum_{i=1}^m(x-\ith s)$$
From this we see that the unique point where the gradient vanishes is $x=\frac1m\sum_{i=1}^m\ith s$, so this is a critical point. Proving that this is a (global) minimum is left as an exercise. The lemma follows.
</div>
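If calculus makes you nervous, the minimization can also be spot-checked numerically. A Python sketch (random points and random perturbations, nothing rigorous) confirming that the mean beats nearby competitors:

```python
import random

# The centroid of a finite S in R^2 is the mean; check numerically that
# random perturbations of the mean never achieve a smaller cost.
random.seed(0)
S = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(7)]
m = len(S)
centroid = tuple(sum(s[k] for s in S) / m for k in range(2))

cost = lambda w: sum((w[0] - s[0])**2 + (w[1] - s[1])**2 for s in S)

best = cost(centroid)
for _ in range(1000):
    w = (centroid[0] + random.uniform(-1, 1), centroid[1] + random.uniform(-1, 1))
    assert cost(w) >= best - 1e-9   # tolerance for floating-point rounding
```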
<p>With that Lemma, our theorem will actually have a fairly short proof</p>
<div class="theorem">
Let $G$ be a group acting freely on $\R^n$ with the Euclidean metric. Then, $G$ is torsion-free.
</div>
<div class="proof4">
Let $g\in G$ be any torsion element, and fix some point $x\in\R^n$. Let $\mc O=\{x,g\cdot x,g^2\cdot x,\dots,g^m\cdot x\}$ be the orbit of $x$ under $\gen g$. Since $\mc O$ is finite, we can apply the lemma to find a centroid $y\in\R^n$. Since $g$ acts by an isometry, $g\cdot\mc O$ must have $g\cdot y$ as a centroid, but $g\cdot\mc O=\mc O$! Hence, $g\cdot y=y$ so $g\in\Stab(y)$. Since we assumed that $G$ acts freely, $g$ must be the identity so $G$ is torsion-free.
</div>
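To see the mechanism of this proof in coordinates, here is a small illustrative computation (the particular rotation and points are my own arbitrary choices, not from the text): an order-5 rotation of $\R^2$ fixes the centroid of any of its orbits, so it can never act freely.

```python
import math

def rot(theta, center, v):
    """Rotate v about `center` by angle theta -- an isometry of R^2."""
    x, y = v[0] - center[0], v[1] - center[1]
    c, s = math.cos(theta), math.sin(theta)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)

def g(v):
    return rot(2 * math.pi / 5, (1.0, 2.0), v)  # a torsion isometry: g has order 5

# The orbit of a point under <g>, and its centroid.
orbit = [(4.0, -1.0)]
for _ in range(4):
    orbit.append(g(orbit[-1]))
cx = sum(p[0] for p in orbit) / len(orbit)
cy = sum(p[1] for p in orbit) / len(orbit)

# g fixes the centroid of its orbit (here it equals the rotation center),
# so g stabilizes a point and the action cannot be free.
fx, fy = g((cx, cy))
assert abs(fx - cx) < 1e-9 and abs(fy - cy) < 1e-9
assert abs(cx - 1.0) < 1e-9 and abs(cy - 2.0) < 1e-9
```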
<h1 id="free-groups-and-presentations">Free Groups and Presentations</h1>
<p>For the next application I want to cover, we need a little more group theory. Specifically, we need to define free groups, and since it would be a shame to introduce free groups without introducing presentations, we include them here too. The idea behind free groups is that they’re essentially the bare minimum of what’s needed to call something a group; they don’t satisfy any non-trivial relations. We will begin with the definition of a free group, and then give a “categorical” (or “universal”) characterization of them.</p>
<div class="definition">
Let $S$ be an arbitrary set, and let $\inv S=\{\inv s\mid s\in S\}$ (note: $S\cap\inv S=\emptyset$). The <b>free group on $S$</b> is the set $F(S)$ of (reduced) words in $S\cup\inv S$ with group operation given by reduced concatenation (e.g. write $ab\inv ba$ as $a^2$). The <b>rank</b> of $F(S)$ is $|S|$ and $|S|=|T|\implies F(S)\simeq F(T)$. The free group on an $n$-element set is denoted $F_n$.
</div>
<div class="exercise">
Verify that the above actually defines a group.
</div>
<div class="example">
$F_1\simeq F(\{a\})$ is the set of words in $a$. Any such word can be written as $a^n$ for some $n\in\Z$, so the map $a^n\mapsto n$ gives an isomorphism $F_1\simeq\Z$
</div>
<div class="example">
$F_2\simeq F(\{a,b\})$ is the group of words in $\{a,b\}$ so a typical element might look like $a^2\inv bab^{-3}a^3b$. Note that $ab\neq ba$ so $F_2$ is non-abelian. We do have a surjective homomorphism $F_2\rightarrow\Z^2$ given by $a\mapsto(1,0)$ and $b\mapsto(0,1)$
</div>
<p>Free groups can take a while to wrap your head around. I remember I used to be enamored with group theory because from so few axioms (only like 3 or 4), you were guaranteed to get an object that was really well-behaved with so much structure, but that all ended when I learned about free groups<sup id="fnref:14"><a href="#fn:14" class="footnote">12</a></sup>. Free groups, and by extension general (non-abelian) groups <sup id="fnref:12"><a href="#fn:12" class="footnote">13</a></sup>, are trash; they can have too much freedom and/or unintuitive structure. It’s not all bad though; this makes results involving them feel extra interesting.</p>
<div class="exercise">
Construct an explicit embedding $F_3\hookrightarrow F_2$. If that's too easy then construct an embedding $F_4\hookrightarrow F_2$.
</div>
<div class="exercise">
The previous exercise shows that you can construct an injective homomorphism $F_2\hookrightarrow F_2$ that is not surjective. Is the opposite possible? Construct an example of a surjective homomorphism $F_2\twoheadrightarrow F_2$ with non-trivial kernel, or prove that none exists
</div>
<p>This construction of free groups is nice (and necessary), but we could alternatively choose to characterize free groups in terms of a so-called universal property. This has the advantage of including only the defining properties of a free group without tying the definition to any particular construction.</p>
<div class="definition">
Given a set $S$, we say $F(S)$ is a <b>free group</b> on $S$, if there exists an embedding $S\hookrightarrow F(S)$, and given any group $G$ and set map $S\rightarrow G$, there exists a unique group homomorphism $\phi:F(S)\rightarrow G$ s.t. the following diagram commutes
<center><img src="https://nivent.github.io/images/blog/geo-group/diag.png" width="150" height="100" /></center>
where the dotted line signifies that this is the map we are claiming existence of.
</div>
<div class="exercise">
Show that our previous construction satisfies this criterion
</div>
<div class="theorem">
The above characterises the free group on $S$ uniquely up to unique isomorphism
</div>
<div class="proof4">
Let $G,H$ be two groups with embeddings $S\hookrightarrow G,H$ satisfying the above criterion. Then, because $G$ is a free group on $S$ and we have a map $S\hookrightarrow H$, this extends to a unique homomorphism $\phi:G\rightarrow H$. Similarly, we get a unique homomorphism $\psi:H\rightarrow G$ s.t. the left diagram below commutes.
<center><img src="https://nivent.github.io/images/blog/geo-group/diag2.png" width="300" height="100" /></center>
By commutativity of the left diagram, we get that $\psi\circ\phi:G\rightarrow G$ extends the embedding $S\hookrightarrow G$, but the identity function $1_G:G\rightarrow G$ does this as well. Since such a homomorphism is unique, we must have $\psi\circ\phi=1_G$. Similar reasoning shows that $\phi\circ\psi=1_H$ so $\phi,\psi$ are isomorphisms.
</div>
<p>This universal property of free groups gives a natural segue into our next topic. Intuitively, a presentation of a group is a compact way of writing it down. Instead of specifying every single element and how to multiply them, you only write down some set of generators and relations (e.g. words equivalent to the identity). The notation looks like</p>
<script type="math/tex; mode=display">G\simeq\pres{\text{generators}}{\text{relations}}</script>
<p>so for example, we have $F_1\simeq\gen{a}\simeq\Z$ and $\zmod n\simeq\pres{a}{a^n}$. In order to formalize this, we make use of the following fact.</p>
<div class="exercise">
Prove that every group is the quotient of a free group
</div>
<p>Now, given a group $G$, in order to write down a presentation for it, we first find some free group $F(S)$ and normal subgroup $K\le F(S)$ s.t. $G\simeq F(S)/K$. Then, letting $R\subset K$ be a generating set for $K$, our presentation is</p>
<script type="math/tex; mode=display">G\simeq\pres SR</script>
<p>giving a formal definition of the notation<sup id="fnref:13"><a href="#fn:13" class="footnote">14</a></sup></p>
<div class="example">
The dihedral group $D_{2n}$ (symmetries of a regular $n$-gon) has presentation $D_{2n}\simeq\pres{r,f}{r^n,frfr}$ where $r$ is rotation by an angle of $2\pi/n$ and $f$ is a reflection across an axis of symmetry.
</div>
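One can sanity-check this presentation by realizing $r$ and $f$ as permutations of the vertices of an $n$-gon. The sketch below (with the arbitrary choice $n=5$) verifies the relations and that words in $r,f$ reach exactly $2n$ distinct symmetries:

```python
n = 5  # symmetries of a regular pentagon
e = tuple(range(n))
r = tuple((i + 1) % n for i in range(n))   # rotation: vertex i -> i + 1
f = tuple((n - i) % n for i in range(n))   # reflection: vertex i -> -i

def compose(p, q):
    """(p*q)(i) = p(q(i)): apply q first, then p."""
    return tuple(p[q[i]] for i in range(n))

def power(p, k):
    out = e
    for _ in range(k):
        out = compose(p, out)
    return out

# The relations of the presentation hold: r^n = 1 and frfr = 1.
assert power(r, n) == e
assert power(compose(f, r), 2) == e

# Products of the generators reach exactly 2n symmetries.
group, frontier = {e}, {e}
while frontier:
    frontier = {compose(g, s) for g in frontier for s in (r, f)} - group
    group |= frontier
assert len(group) == 2 * n
```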
<div class="exercise">
Group presentations are not unique. Show that $\pres{x,y}{xyx=yxy}\simeq\pres{a,b}{a^2=b^3}$
</div>
<div class="exercise">
$\zmod 2\simeq\pres{a}{a^2}$ is a 2-element group. What is the cardinality of $G\simeq\pres{a,b}{a^2,b^2}$? Can you find a familiar group that $G$ contains as a subgroup?
</div>
<h1 id="second-neat-application">Second Neat Application</h1>
<p>Now that we’re a bit more acquainted with how horrible general groups can be, let’s focus on something a bit more familiar</p>
<script type="math/tex; mode=display">\DeclareMathOperator{\SL}{SL}\SL_2(\Z)=\brackets{\mat abcd\mid ad-bc=1}</script>
<p>the group of $2\times2$ matrices with integer entries and determinant $1$. Linear algebra is a particularly nice subject, and this is a very linear-algebraic group, so it must certainly be really nice, right?</p>
<div class="theorem">
$\SL_2(\Z)$ contains $F_2$ as a subgroup
</div>
<p>Like I said, general groups are trash. This application will be similar to the last; we’ll prove a lemma that will give us a direct route to this theorem.</p>
<div class="lemma" name="Ping-Pong">
Let $G$ be a group generated by two elements $a,b$, and suppose that $G$ acts on a set $X$. If $X$ has disjoint nonempty subsets $X_a$ and $X_b$ s.t. $a^k\cdot X_b\subseteq X_a$ and $b^k\cdot X_a\subseteq X_b$ for all nonzero $k$, then $G\simeq F_2$.
</div>
<div class="proof4">
First, convince yourself that every non-empty reduced word in $a,b$ is conjugate to one of the form $g=a^*b^*\dots b^*a^*$ where the stars are arbitrary nonzero exponents. Now, $g$ is not the identity: picking any $x\in X_b$, we have $g\cdot x\in X_a$, and $X_a\cap X_b=\emptyset$ forces $g\cdot x\neq x$. Hence no non-empty reduced word in $a,b$ represents the identity, so $G$ has no non-trivial relations and the conclusion follows.
</div>
<div class="exercise">
Show that every non-empty word in $a,b$ is conjugate to one of the form used in the proof.
</div>
<p>The above proof is a gem in and of itself because it’s so clean. In case some clarity is lost in its brevity, the idea is that as you apply each syllable (i.e. $a^*$ or $b^*$) of $g$ to $X_b$, you keep bouncing back and forth between $X_b$ and $X_a$ (like a game of ping-pong), landing away from where you started. This simple lemma will let us prove this section’s main theorem in a rather concrete way.</p>
<div class="theorem">
Fix some integer $m\ge2$. Let
$$A =\mat 1m01\text{ and }B=\mat10m1$$
Then, $A,B$ generate a free subgroup of rank 2 in $\SL_2(\Z)$.
</div>
<div class="proof4">
It is easily verified that $A,B\in\SL_2(\Z)$, and an induction argument shows that
$$A^k=\mat1{km}01\text{ and }B^k=\mat10{km}1$$
Now, note that $A,B$ have a natural action on $\R^2$ via multiplication. Consider the sets
$$X_A=\brackets{\vvec xy\in\R^2\mid|x|>|y|}\text{ and }X_B=\brackets{\vvec xy\in\R^2\mid|x|<|y|}$$
Given $v=\hvec xy^T\in X_A$ and $v'=\hvec{x'}{y'}^T\in X_B$, we have $B^kv\in X_B$ since
$$|y+kmx|\ge|kmx|-|y|=m|k||x|-|y|\ge2|x|-|y|>|x|$$
for $k\in\Z\sm\{0\}$. Similarly, $A^kv'\in X_A$ so the ping-pong lemma applies.
</div>
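The containments used in this proof are easy to test numerically. The sketch below (taking $m=2$ and a handful of sample vectors and exponents, all arbitrary choices) checks the closed form for $A^k$ and the ping-pong conditions $A^k\cdot X_B\subseteq X_A$ and $B^k\cdot X_A\subseteq X_B$:

```python
m = 2  # any m >= 2 works

def mul2(M, N):
    """2x2 integer matrix product."""
    return [[M[0][0]*N[0][0] + M[0][1]*N[1][0], M[0][0]*N[0][1] + M[0][1]*N[1][1]],
            [M[1][0]*N[0][0] + M[1][1]*N[1][0], M[1][0]*N[0][1] + M[1][1]*N[1][1]]]

A = [[1, m], [0, 1]]
B = [[1, 0], [m, 1]]

# Verify the closed form A^k = [[1, km], [0, 1]] for small k >= 1.
Ak = [[1, 0], [0, 1]]
for k in range(1, 6):
    Ak = mul2(A, Ak)
    assert Ak == [[1, k * m], [0, 1]]

def apply(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

def in_XA(v): return abs(v[0]) > abs(v[1])  # X_A: |x| > |y|
def in_XB(v): return abs(v[0]) < abs(v[1])  # X_B: |x| < |y|

# Ping-pong: A^k maps X_B into X_A, and B^k maps X_A into X_B, for k != 0.
for k in (-5, -2, -1, 1, 2, 5):
    Akm = [[1, k * m], [0, 1]]  # the closed form holds for all k in Z
    Bkm = [[1, 0], [k * m, 1]]
    for v in [(1, 3), (-2, 7), (5, -9)]:    # sample points of X_B
        assert in_XA(apply(Akm, v))
    for v in [(3, 1), (7, -2), (-9, 5)]:    # sample points of X_A
        assert in_XB(apply(Bkm, v))
```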
<h1 id="results-for-future-posts">Results for Future Posts</h1>
<p>I mentioned in the beginning that I wouldn’t be able to cover all the results I would like. In this final section, I want to mention some of the things I left out.</p>
<div class="lemma">
If a group acts freely on a tree, then it is a free group.
</div>
<p>An immediate application of this lemma<sup id="fnref:15"><a href="#fn:15" class="footnote">15</a></sup> is the following</p>
<div class="theorem" name="Nielsen-Schreier">
Any subgroup of a free group is free
</div>
<p>While this fact may seem obvious or benign, it is definitely non-trivial. As an attempt to appreciate this tree lemma, I challenge you to prove this theorem completely algebraically (Spoiler: <sup id="fnref:16"><a href="#fn:16" class="footnote">16</a></sup>).</p>
<p>Keeping in touch with the theme of free groups, another surprising result is that $\SL_2(\Z)$ actually contains many more free groups than the one I pointed out in the last section.</p>
<div class="theorem">
For all $m\ge3$, the group
$$ \SL_2(\Z)[m]:=\ker(\SL_2(\Z)\rightarrow\SL_2(\zmod m)) $$
is free.
</div>
<p>This theorem is actually also proved by exhibiting a free action of this group on a tree.</p>
<p>Moving away from free groups, there’s a notion similar to an isometry but weaker called a quasi-isometry.</p>
<div class="definition">
Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. A function $f:X\rightarrow Y$ is called a <b>quasi-isometry</b> if there are constants $K\ge1$ and $C\ge0$ s.t.
$$\frac1Kd_X(x_1,x_2)-C\le d_Y(f(x_1),f(x_2))\le Kd_X(x_1,x_2)+C$$
for all $x_1,x_2\in X$ and there's a constant $D>0$ so that for every point $y\in Y$, there's some $x\in X$ s.t. $d_Y(f(x),y)\le D$.
</div>
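As a toy example of this definition (my own, not from any reference), the map $n\mapsto 2n+(n\bmod 2)$ from $\Z$ to itself is a quasi-isometry with $K=2$, $C=1$, and coarse-density constant $D=1$. The sketch below verifies the inequalities on a finite window:

```python
def f(n):
    """n -> 2n + (n mod 2): distorts distances, but only up to bounded error."""
    return 2 * n + (n % 2)

K, C, D = 2, 1, 1
dom = range(-50, 51)
img = {f(n) for n in dom}

# The two-sided quasi-isometry bound with K = 2, C = 1.
for n1 in dom:
    for n2 in dom:
        d, dY = abs(n1 - n2), abs(f(n1) - f(n2))
        assert d / K - C <= dY <= K * d + C

# Coarse density: every point of a matching window is within D of the image.
for y in range(f(-50), f(50) + 1):
    assert min(abs(y - z) for z in img) <= D
```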
<div class="theorem">
If $G$ is quasi-isometric to $\Z^n$ (both with the word metric), then $G$ contains $\Z^n$ as a finite index subgroup.
</div>
<p>This theorem is particularly interesting because it is highly geometric (or at least, its premise is). It is unclear how to even formulate this theorem algebraically, if such a thing can be done.</p>
<p>As the name of this section suggests, I may return to give actual proofs of some of these results in a future post, but for now, I think this post has gone on long enough.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I haven’t looked into the book too deeply yet, so maybe this changes towards the end <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Plus many exercises (don’t feel pressured to do them all) <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>This section was originally titled “Groups aren’t abstract nonsense” with the title of the post being “Geometric Group Theory”. While writing it, I decided that the concreteness of groups has a recurring-enough theme to be reflected in the title <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>This won’t come up here, but it’s worth mentioning that much of representation theory is simply the study of group actions, and specifically, groups acting on vector spaces <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>I want to make and insert an explicit example image of this, but was too lazy to do so. Hence, I encourage you to do this yourself with the group D_8 (symmetries of a square) acting on a regular octagon <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Honestly, half of graph theory is just setting up definitions. Also, shameless plug, I defined 2/3 of these in a <a href="../probabilstic-method">previous post</a> <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>Recall two sets are the same when there’s a bijection between them, and the symmetries of a set are the bijections to itself <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>I don’t actually know the answer to this, but I suspect it doesn’t; at the very least, when trying to prove it without this assumption, I ran into issues showing that edge labels can’t change. If you decide to look for a (small) counterexample, I believe (but do not know for sure) that one should exist among finite groups with two or three generators <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>Worth mentioning that we’ve just turned any group into a geometric object: a metric space <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Mostly referenced without proof at the very end <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>The book (pg. 40) seems to define a centroid as the minimizer of the sum of distances (not squared distances) <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:14">
<p>Interestingly enough, I had the opposite experience with linear algebra. At first I thought abstract vector spaces were cool because they were so abstract and potentially wild. When I first saw that every vector space had a basis, it ruined linear algebra for me because all these strange, crazy things I had been dealing with were essentially just R^n in disguise <a href="#fnref:14" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>i.e. groups you may not see in your first exposure to group theory. <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
<li id="fn:13">
<p>To get a group from its presentation, just take the free group on its generators and quotient out by the smallest normal subgroup containing all the relations <a href="#fnref:13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:15">
<p>Maybe plus the fact that free groups have (infinite) trees for Cayley graphs <a href="#fnref:15" class="reversefootnote">↩</a></p>
</li>
<li id="fn:16">
<p>There are other ways to prove this, but as far as I know, all are geometric <a href="#fnref:16" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><em>(Post summary: I’ve recently been skimming through this book called “Office Hours with a Geometric Group Theorist” which, perhaps unsurprisingly, is about using geometric objects to study groups. It mostly focuses on how group actions on graphs and metric spaces can reveal information about the group, and contains some pretty nice results. Unfortunately, I have too many in mind for one post, but I would still like to introduce the basic notions of the subject and a few results I enjoyed. I imagine this will be a lengthy post with a mix of introducing theory and neat applications.)</em></p>
<hr />
<p><b>Difference of squares</b> (2017-12-17) — <a href="https://nivent.github.io/blog/difference-squares">https://nivent.github.io/blog/difference-squares</a></p>
<p>Two new posts in one day? It must be Christmas. I think this post will be relatively short. I want to talk about a problem that popped in my head while I was working on the last post, and then mention some thoughts that this problem sparked which I hope are worth writing down before I forget.</p>
<h1 id="which-numbers-can-be-written-as-the-difference-of-two-squares">Which numbers can be written as the difference of two squares?</h1>
<p>Let’s just jump right into things. One natural place to start tackling this question is with the primes. With that said, let $p$ be a prime number and suppose we can write $p=x^2-y^2$ for some $x,y\in\Z_{\ge0}$. This gives<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
<script type="math/tex; mode=display">p=x^2-y^2=(x-y)(x+y)\implies p=x+y\text{ and }x-y=1</script>
<p>which means we require that $p$ is the sum of two consecutive numbers! Now, this took me longer than I’d like to admit to realize while I was working on this, but this is equivalent to saying that $p$ is odd. In other words, all primes $p\neq2$ can be written as the difference of two squares, namely $p=\lceil p/2\rceil^2-\lfloor p/2\rfloor^2$.</p>
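This explicit formula is easy to verify by brute force:

```python
from math import isqrt

def is_prime(n):
    """Trial division, good enough for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

# Every odd prime p equals ceil(p/2)^2 - floor(p/2)^2, a difference of
# consecutive squares.
for p in (q for q in range(3, 500) if is_prime(q)):
    x, y = (p + 1) // 2, (p - 1) // 2
    assert x - y == 1 and x * x - y * y == p
```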
<p>Since we’ve completely characterized which primes are differences of squares, we really hope that the product of differences of squares is also a difference of squares. I claim that a natural way of reaching this conclusion is to make use of the ring $\Z[\eps]\simeq\Z[x]/(x^2-1)$ where $\eps^2=1$. Using this ring lets us factor $x^2-y^2=(x+\eps y)(x-\eps y)$, and while we could factor things before, this factorization is more useful since we can easily calculate</p>
<script type="math/tex; mode=display">(a+b\eps)(c+d\eps)=(ac+bd)+\eps(ad+bc)</script>
<p>which is enough to see that $(a^2-b^2)(c^2-d^2)=(ac+bd)^2-(ad+bc)^2$, so being a difference of squares is preserved by multiplication. Since all odd primes are differences of squares, this gets us that all odd numbers are differences of squares<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<p>Now, let $z=n^2m$ where $m=x^2-y^2$. Then, $z=n^2(x^2-y^2)=(nx)^2-(ny)^2$ is also a difference of squares. Hence, any number that $2$ divides an even number of times is a difference of squares. At this point, I was tempted to think that I was done, but then I realized that $8=3^2-1^2$ is also a difference of squares. Since $4=2^2-0^2$, we know that $2^2$ and $2^3$ are differences of squares, so $2^{2a+3b}$ is also a difference of squares where $a,b\in\Z_{\ge0}$. It’s not hard to see that every non-negative integer except $1$ can be written as $2a+3b$ <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>, so $2^1=2$ is the only power of two that cannot be written as a difference of squares.</p>
<p>Since every odd prime is a difference of squares, and all but one power of $2$ is also a difference of squares, we’ve shown that the only numbers that might not be differences of squares are those that $2$ divides exactly once. Another way of characterizing these “bad” numbers is that they are the $n\in\Z$ for which $n\equiv2\pmod4$. Now, the equation $x^2-y^2\equiv2\pmod4$ has no solutions, as one can verify by checking all 16 possible assignments of $x,y$. Thus, we’ve completely characterized the numbers that can be written as differences of squares; they are exactly the integers that are not $2\pmod4$! This is a surprisingly simple and nice outcome if you ask me.</p>
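The full characterization can likewise be checked by brute force over a small range:

```python
def is_diff_of_squares(n):
    # If n = x^2 - y^2 = (x - y)(x + y) with 0 <= y < x, then x <= (n + 1) // 2,
    # so this search is exhaustive.
    return any(x * x - y * y == n
               for x in range((n + 1) // 2 + 1) for y in range(x + 1))

# The characterization: n is a difference of squares iff n is not 2 (mod 4).
for n in range(1, 200):
    assert is_diff_of_squares(n) == (n % 4 != 2)
```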
<h1 id="thoughts">Thoughts<sup id="fnref:5"><a href="#fn:5" class="footnote">4</a></sup></h1>
<p>With that question resolved, I wanna mention some thoughts it motivated. In the process of answering the question, it was helpful to consider the ring $\Z[\eps]$ which morally felt like the ring of integers of the number field $K=\Q(\eps)$ <sup id="fnref:4"><a href="#fn:4" class="footnote">5</a></sup>. However, this is technically wrong since $K$ isn’t a number field; it’s not a field at all (or even just a domain since $(1-\eps)(1+\eps)=0$). Despite this, we still have a natural notion of a “relative norm” on $K$ over $\Q$ given by $\knorm(x+y\eps)=x^2-y^2$, which made me wonder how much of algebraic number theory can be recovered if we study ring extensions of $\Q$ like this one <sup id="fnref:7"><a href="#fn:7" class="footnote">6</a></sup>.</p>
<p>Taking a step back to a slightly more general setting, my curiosity shifted away from number theory specifically to wonder what happens if you do Galois theory in a more general setting like this. The “ring extension” $K/\Q$ morally feels like a degree-2 Galois extension with non-trivial automorphism given by $\sigma(x+y\eps)=x-y\eps$. After having this in the back of my mind all day, this is what I’ve discovered so far as the possible beginnings of a formalism…</p>
<p>Fix some field $\F$. We want to study (commutative) rings <sup id="fnref:6"><a href="#fn:6" class="footnote">7</a></sup> containing this field, so let $R$ be such a ring. $R$ is still an $\F$-vector space, so we can still define the degree of the extension $R/\F$ as $[R:\F]:=\dim_{\F}R$. However, thinking about this more, it might make more sense to think of $R$ less as some sort of extension ring and more as an $\F$-algebra. I don’t know how beneficial this algebra viewpoint is compared to thinking in terms of ring extensions, but it does at least suggest that the true object of interest of this hypothetical generalized Galois theory should be $R$-algebras where we require that $R$ is a $k$-algebra for some field $k$ (or equivalently, that $R$ is a vector space).</p>
<p>The first major issue I see with recovering Galois theory in this setting is the behavior of towers of extensions. Classically, if we have $L/K/F$ a tower of field extensions, then we get that $[L:F]=[L:K][K:F]$ and this allows one to perform induction arguments <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>. This fact basically follows from the niceness of vector spaces, but since generally for a ring $R$, non-free $R$-modules exist, we face some issues with studying towers of ring extensions. It’s possible that $R$ being a $k$-algebra (for $k$ a field) is a strong enough restriction to force all $R$-algebras to be free $R$-modules, but I don’t know enough algebra to think of a proof or counterexample to that claim off the top of my head (although it’s almost certainly false), so this issue remains unresolved. Despite this, if one could find a way to get around this issue of towers of extensions, then<sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup> I think you can manage to recover at least a few gems from Galois theory. I’m a hopeful enough person to think that this might be possible in some nice settings, so</p>
<blockquote>
<p>Conjecture<br />
Let $\F$ be a field, and fix some $f(x)\in\F[x]$. Let $R$ be the “splitting ring” of $f(x)$ <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. Then, the number of automorphisms $\sigma:R\rightarrow R$ fixing $\F$ is at most $\dim_{\F}R$.</p>
</blockquote>
<p>I obviously don’t know if this conjecture is true, but I feel that something like it should be true. I suspect I won’t do a lot of thinking about this anytime soon, so I leave this here to one day return to it and continue my thoughts.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>It’s technically also possible that x-y=-1, but without loss of generality, assume x>y <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>This didn’t hit me until after I’d done all the work that went into this post, but the difference between the nth square and the (n+1)th square is the nth odd number (i.e. (n+1)^2-n^2=2n+1) so this conclusion is trivial <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>If n=2k is even, then take a=k and b=0. If n=2k+1 is odd, then take a=k-1 and b=1 (this fails only if k=0, i.e. if n=1) <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>of a Programmer <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>I’m pretty sure this notation technically makes no sense, but by it, I mean the ring Q(e)={a+be:a,b are fractions} where e^2=1 <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>One immediate question I haven’t thought about the answer to is “does the traditional definition of the ring of integers still work?” In this example, one would expect that the integer ring of Q(e) should be Z[e], but I haven’t varified that this is the integral closure of Z <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>I’m hesistant to require commutativty because the quaternions also feel morally like an extension that should be considered in this more general theory, albeit not a Galois one <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>If you’re studying an extenion K/F, you might start by picking a in K-F so you get the tower K/F(a)/F. Prove your statement for F(a)/F, get the same thing for K/F(a) by induction of degrees and then use these to together to conclude something about K/F <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>Don’t quote me on this; I haven’t thought about it too deeply <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>i.e. the smallest ring (containing F) in which f splits <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<hr />
<p><b>An interesting equation</b> (2017-12-17) — <a href="https://nivent.github.io/blog/interesting-equation">https://nivent.github.io/blog/interesting-equation</a></p>
<p>One day I will return to writing posts that are not always very algebraic in nature, but this is not that day. I want to talk today about an example of a peculiar equation, but first a little background… In my mind, number theory (at least on the algebraic side) is ultimately about solving diophantine equations and not much more. This is what originally got me interested in the subject because trying to solve these equations can often feel like some sort of puzzle or exploratory game; there’s a common set of tricks one can apply, but not much of a single path or algorithm that always gets you the solution. Among the most basic/fundamental of tricks is to use congruences. If you seek (integer) solutions to $x^2+y^2=3$, then a natural thing to do is consider this equation (mod $4$) and note that there are no solutions to $x^2+y^2\equiv3\pmod4$ <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, so there are no integer solutions to the original equation; easy. In fact, you can state this principle in general.</p>
<blockquote>
<p>Fact<br />
Let $p(x,y)$ be some polynomial with integer coefficients. If $p(x,y)\equiv0\pmod m$ has no solutions for some $m$, then $p(x,y)$ has no integer solutions.</p>
</blockquote>
<p>It’s not far-fetched to imagine that this is the only thing preventing polynomials from having integer solutions. That is, it’s natural to ask whether any polynomial that can be solved $\pmod m$ for all $m\in\Z_{>0}$ must be solvable over the integers. However, it turns out that this is not the case, and the subject of this post is a single counterexample</p>
<script type="math/tex; mode=display">\begin{align*}
x^2-82y^2=2
\end{align*}</script>
<p>For convenience, let $q(x,y)=x^2-82y^2-2$. I haven’t thought too much about this, but for understanding this post, probably <a href="../number-theory">these</a> <a href="../solving-pell">two</a> posts on number theory, and some <a href="../ring-intro">ring theory</a>, should be sufficient; if you don’t feel like reading those, then just stick with this post, and if anything doesn’t make sense, you can refer back to those posts to figure it out or leave a comment asking a question.</p>
<h1 id="solutions-in-q-and-zmod-m">Solutions in $\Q$ and $\zmod m$</h1>
<p>We’ll first establish that this equation has solutions both in $\Q$ and in $\zmod m$ for all $m$. For $\Q$, we’ll actually do one better and show that it has infinitely many solutions. We will first search for a single rational solution, so let $x=a/b$ and $y=c/d$ where $a,b,c,d\in\Z$. In order to keep things simple, we’ll assume $b=d$ so we can rewrite our equation as</p>
<script type="math/tex; mode=display">\begin{align*}
x^2-82y^2=2\iff(a/b)^2-82(c/d)^2=2\iff a^2-82c^2=2b^2\iff a^2=2b^2+82c^2
\end{align*}</script>
<p>This suggests one way of finding a solution. We just need to search for integers $b,c$ such that $2b^2+82c^2$ is a perfect square. If you were to try some examples by hand or write a computer program to search, you’d eventually come across $2(3)^2+82(1)^2=10^2$ which gives $(x,y)=(10/3,1/3)$ as a solution to $q(x,y)\in\Q[x,y]$. This is one solution, but far from the only one.</p>
<blockquote>
<p>Exercise<br />
Show that $q(x,y)\in\Q[x,y]$ has infinitely many rational solutions <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
</blockquote>
<p>To show that this equation has solutions in $\zmod m$, it’ll be sufficient to show that it can be solved for $m$ a prime power. This is a consequence of the <a href="https://www.wikiwand.com/en/Chinese_remainder_theorem">Chinese Remainder Theorem</a>.</p>
<blockquote>
<p>Theorem<br />
Let $p\neq3$ be a prime. Then, $q(x,y)$ has a solution viewed as a polynomial $q(x,y)\in\zmod{p^r}[x,y]$ for all $r$. That is, there exists integers $a,b$ s.t. <center>$q(a,b)=a^2-82b^2\equiv2\pmod{p^r}$</center></p>
</blockquote>
<div class="proof2">
Pf: Fix any positive integer $r$, and note that $\gcd(3,p^r)=\gcd(3,p)=1$, which means $3$ is a unit in $\zmod{p^r}$. Fix some $b$ s.t. $3b\equiv1\pmod{p^r}$ and note that $(x,y)=(10b,b)$ is a solution to $q(x,y)\in\zmod{p^r}[x,y]$ as <center>$$100b^2-82b^2=18b^2\equiv2(3b)(3b)\equiv2\pmod{p^r}$$</center>$\square$
</div>
<p>This just leaves the case of powers of $3$. This isn’t actually a special case, and it can be handled in much the same way as all the others. We begin by noting that $(66/13,-7/13)$ is a rational solution to $q(x,y)$. Since $3$ does not divide $13$, they are coprime, and so $13$ is a unit in $\zmod{3^r}$ for all $r$. Hence, $(66/13,-7/13)$ still makes sense as a solution in $\zmod{3^r}$. This gives away one solution to the exercise, but see the footnote in case you’re curious where this point came from <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>Thus, we’ve shown $q(x,y)$ has solutions (mod $p^r$) for all prime powers $p^r$, and hence it has solutions (mod $m$) for all integers $m$.</p>
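Both explicit solutions from this section can be checked directly, using Python's three-argument `pow` for modular inverses (the listed moduli are arbitrary sample prime powers):

```python
# The solution (10b, b) with 3b = 1 (mod p^r), for prime powers coprime to 3.
for pr in (5, 7, 25, 49, 121, 128):
    b = pow(3, -1, pr)               # modular inverse (Python 3.8+)
    assert (100 * b * b - 82 * b * b) % pr == 2 % pr

# The powers-of-3 case via the rational point (66/13, -7/13).
for pr in (3, 9, 27, 81):
    t = pow(13, -1, pr)
    x, y = (66 * t) % pr, (-7 * t) % pr
    assert (x * x - 82 * y * y) % pr == 2 % pr
```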
<h1 id="chinese-remainder-theorem">Chinese Remainder Theorem</h1>
<p>This section can be skipped, but I wanted to give a statement and proof of CRT for completeness.</p>
<blockquote>
<p>Chinese Remainder Theorem<br />
Let $R$ be a ring, and $I_1,\dots,I_n$ be a collection of pairwise coprime two-sided ideals (i.e. $I_i+I_j=R$ for all $i\neq j$). Then, we have a ring isomorphism</p>
</blockquote>
<center>$$\Large\begin{matrix}
\frac R{I_1\cap I_2\cap\dots\cap I_n} &\longrightarrow& \frac R{I_1}\oplus\frac R{I_2}\oplus\dots\oplus\frac R{I_n}\\
r+I_1\cap I_2\cap\dots\cap I_n &\longmapsto& \left(r+I_1,r+I_2,\dots,r+I_n\right)
\end{matrix}$$</center>
<div class="proof2">
Pf: We will prove this by induction on $n$, starting with the case of two ideals and the map $\phi:(r+I_1\cap I_2)\mapsto(r+I_1,r+I_2)$. We first need to confirm that this map is well-defined. Pick some $r,s\in R$ in the same coset so $r-s\in I_1\cap I_2$. Practically by definition, this means that $r+I_1=s+I_1$ and $r+I_2=s+I_2$ so $\phi$ is well-defined. From the behavior of cosets, it's clear that $\phi$ is a homomorphism so we only need to verify injectivity and surjectivity. Now, pick some $r+I_1\cap I_2\in\ker\phi$ so $r\in I_1\cap I_2$. Then, $\phi(r)=(r+I_1,r+I_2)=(I_1,I_2)$ is the identity so $\phi$ has trivial kernel which only leaves surjectivity. Fix some $x\in I_1$ and $y\in I_2$ such that $x+y=1$ which we get from coprimality. Then, $(r+I_1,s+I_2)$ has $sx+ry$ as a preimage, so $\phi$ is surjective. Now that we've established the case of two ideals, the general case will follow if we can show that $I_1$ and $I_2\cap\dots\cap I_n$ are coprime. To this end, for each $2\le j\le n$ pick $x_j\in I_1$ and $y_j\in I_j$ s.t. $x_j+y_j=1$. Then, <center>$$1=(x_2+y_2)(x_3+y_3)\dots(x_n+y_n)=\sum_{S\subseteq\{2,\dots,n\}}\left(\prod_{i\in S}x_i\right)\left(\prod_{j\not\in S}y_j\right)=X+\left(\prod_{j=2}^ny_j\right)$$</center>
where $X\in I_1$ is some linear combination of terms that contain $x_i$ for some $i\in\{2,\dots,n\}$ and $\prod y_j\in I_2\cap\dots\cap I_n$. Thus, $1\in I_1+(I_2\cap\dots\cap I_n)$ so $I_1+(I_2\cap\dots\cap I_n)=R$ which means these ideals are coprime as claimed. $\square$
</div>
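<p>The surjectivity step in the proof is constructive: with $x+y=1$, the element $sx+ry$ is a preimage of $(r+I_1,s+I_2)$. For $R=\Z$, where $x$ and $y$ come from the extended Euclidean algorithm, a minimal Python sketch (the helper names are mine) computes that preimage:</p>

```python
def egcd(a, b):
    # extended Euclid: returns (g, u, v) with u*a + v*b == g == gcd(a, b)
    if b == 0:
        return (a, 1, 0)
    g, u, v = egcd(b, a % b)
    return (g, v, u - (a // b) * v)

def crt_pair(r, m, s, n):
    # solve z ≡ r (mod m) and z ≡ s (mod n) for coprime m, n using the
    # proof's preimage s*x + r*y, where x ∈ (m), y ∈ (n), and x + y = 1
    g, u, v = egcd(m, n)
    assert g == 1, "moduli must be coprime"
    x, y = u * m, v * n                   # x + y == 1
    return (s * x + r * y) % (m * n)

z = crt_pair(2, 7, 3, 10)
assert z % 7 == 2 and z % 10 == 3
```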
<blockquote>
<p>Corollary<br />
Fix an integer $m$ and factor it as $m=p_1^{r_1}p_2^{r_2}\dots p_n^{r_n}$ where each $p_i$ is a different prime. Then, <center>$\zmod m\simeq\zmod{p_1^{r_1}}\oplus\zmod{p_2^{r_2}}\oplus\dots\oplus\zmod{p_n^{r_n}}$</center></p>
</blockquote>
<div class="proof2">
Pf: Exercise to reader
</div>
<p>For our purposes, we only need the corollary and not the full CRT. We want to confirm that $x^2-82y^2=2$ has solutions (mod $m$) for all $m$. Well, given any $m$, we factor it into prime powers to see that $\zmod m\simeq\zmod{p_1^{r_1}}\oplus\zmod{p_2^{r_2}}\oplus\dots\oplus\zmod{p_n^{r_n}}$. In the previous section, we found solutions in each of these factors so let $(x_j,y_j)$ satisfy $x_j^2-82y_j^2\equiv2\pmod{p_j^{r_j}}$. Then, CRT guarantees the existence of some <script type="math/tex">x^*,y^*\in\zmod m</script> such that <script type="math/tex">x^*\equiv x_j\pmod{p_j^{r_j}}</script> and <script type="math/tex">y^*\equiv y_j\pmod{p_j^{r_j}}</script> for all $j$. Thus, <script type="math/tex">(x^*)^2-82(y^*)^2\equiv2\pmod{p_j^{r_j}}</script> for all $j$. Since <script type="math/tex">(x^*,y^*)</script> satisfy $q(x,y)$ in each factor of $\zmod m$ (i.e. in each $\zmod{p_j^{r_j}}$), they must satisfy it in $\zmod m$ itself, so $q(x,y)$ does indeed have solutions modulo any integer.</p>
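<p>As a sanity check on this conclusion (a quick brute-force Python sketch, not part of the actual argument), one can verify that $q(x,y)$ does hit $2$ modulo every small $m$:</p>

```python
def has_solution_mod(m):
    # brute-force search for x^2 - 82 y^2 ≡ 2 (mod m)
    return any((x * x - 82 * y * y) % m == 2 % m
               for x in range(m) for y in range(m))

# the prime-power solutions plus CRT predict a solution for every modulus
assert all(has_solution_mod(m) for m in range(2, 60))
```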
<h1 id="no-solutions-in-z">No Solutions in $\Z$</h1>
<p>To finish things off, we’ll show that there are no integer solutions to $x^2-82y^2=2$. This section will use some of the ideas previously touched upon in my <a href="../solving-pell">pell’s post</a>. Our first observation is that the right setting to analyze this equation is in $\zadjs{82}$, which, using terminology from that pell’s post, is the ring of integers for $K=\qadjs{82}$. We see that solutions to this equation correspond exactly to elements of $\zadjs{82}$ with norm $2$. As it turns out, understanding which numbers have norm $2$ is related to understanding how $2$ factors in $\ints K=\zadjs{82}$. More specifically, we wish to factor $(2)$ into prime ideals:</p>
<script type="math/tex; mode=display">(2)=(2,\sqrt{82})^2</script>
<p>This equality is easily verified as $(2,\sqrt{82})^2=(4,2\sqrt{82},82)=(2)(2,\sqrt{82},41)=(2)$ since $(2,\sqrt{82},41)=(1)$ as $41-2(20)=1$. Now, suppose $z=x+y\sqrt{82}$ ($x,y\in\Z$) has norm $2$; i.e. assume that $x^2-82y^2=2$. It is a fact that I will not prove that this is possible only if $(x+y\sqrt{82})=(2,\sqrt{82})$. Given this, we see that $(z^2)=(z)^2=(2)$ so $z^2=2u$ for some unit $u\in\ints K^\times$. Taking norms of both sides gives</p>
<script type="math/tex; mode=display">4\knorm(u)=\knorm(2)\knorm(u)=\knorm(2u)=\knorm(z^2)=\knorm(z)^2=4\implies\knorm(u)=1</script>
<p>Now, note that $\zadjs{82}$ has fundamental unit <script type="math/tex">\eps=9+\sqrt{82}</script> and that <script type="math/tex">\knorm(\eps)=-1</script>. Since every unit is $\pm$ a power of $\eps$, this means we can write $u=\pm\eps^{2k}$ for some $k$. Thus, we can rewrite $z^2=2u$ as $(\eps^{-k}z)^2=\pm2$. To finish things off, we will show that neither of $\pm2$ is a square in $\ints K$, giving a contradiction. This is easily seen by observing that given any $a,b\in\Z$, we have</p>
<script type="math/tex; mode=display">(a+b\sqrt{82})^2=(a^2+82b^2)+2ab\sqrt{82}</script>
<p>This can’t be $-2$ because the non-$\sqrt{82}$ part is always positive, and it can’t be $+2$ since that would require $b=0$ and $2$ is not a square in the normal integers. Thus, $\pm2$ are not squares in $\ints K$ so there’s no element of norm $2$ which means that $x^2-82y^2=2$ has no integer solutions.</p>
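<p>Consistent with this theorem, a finite search (which of course is no substitute for the proof) turns up no integer points — a quick Python sketch:</p>

```python
from math import isqrt

# if x^2 - 82 y^2 = 2 had an integer solution, 82 y^2 + 2 would be a
# perfect square for some y; it never is in this range
for y in range(10_000):
    s = 82 * y * y + 2
    assert isqrt(s) ** 2 != s
```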
<h1 id="further-work">Further Work</h1>
<p>So we’ve shown that $x^2-82y^2=2$ has infinitely many rational solutions, and solutions in $\zmod m$ for all $m$, but no integer solutions. This means congruential obstructions are not the only things that can prevent a polynomial from being solved in the integers. We might still be interested in asking questions to better understand congruential obstructions, though. For example, in our analysis of this equation, the fact that we have solutions in $\zmod m$ for all $m$ was closely related to the fact that we had (infinitely) many rational solutions, which raises the question</p>
<blockquote>
<p>Conjecture<br />
Let $p$ be a polynomial with integer coefficients. Then, $p$ has solutions in $\zmod m$ for all $m\iff p$ has infinitely many rational solutions.</p>
</blockquote>
<p>It actually turns out that this conjecture is false, and one counterexample is the polynomial</p>
<script type="math/tex; mode=display">p(x) = (x^2-2)(x^2-17)(x^2-34)</script>
<p>This has solutions (mod $m$) for all $m$ <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>, but there are visibly no rational solutions. This breaks one direction of the iff above, but it’s still possible that the other direction holds, and I encourage you to investigate this.</p>
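<p>If you'd like to convince yourself of the counterexample numerically before chasing down the quadratic reciprocity argument, a short brute-force sketch (Python, illustrative only) confirms roots modulo every small $m$:</p>

```python
def p(x):
    return (x * x - 2) * (x * x - 17) * (x * x - 34)

def has_root_mod(m):
    # does p have a root modulo m?
    return any(p(x) % m == 0 for x in range(m))

# roots modulo every small m, even though p has no rational roots at all
assert all(has_root_mod(m) for m in range(2, 200))
```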
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Just try all 16 possible pairs of values for x,y <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="../number-theory">hint</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>I used the (10/3,1/3) solution to project the line y=0 onto the curve defined by x^2-82y^2=2. Under this projection/correspondence, the point (4,0) on this line gets mapped to (66/13,-7/13) on this curve <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>The easiest way to convince yourself of this claim is via <a href="https://www.wikiwand.com/en/Quadratic_reciprocity">quadratic reciprocity</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Algebra Part II2017-11-21T03:00:00+00:002017-11-21T03:00:00+00:00https://nivent.github.io/blog/ring-intro<p>This is the second post in a series that serves as an introduction to abstract algebra. In the <a href="../group-intro">last one</a>, we defined groups as certain sets endowed with a well-behaved operation. Here, we’ll look at rings, which are what you get when your set has two operations defined on it, and we’ll see that much of the theory for groups has a natural analogue in ring theory.</p>
<blockquote>
<p>Bit of a Disclaimer<br />
I can’t possibly mention everything on a particular subject in one post, and I am not a particular fan of writing insanely long posts, so some things have to be cut. In particular, I aim to introduce most of the important topics in each subject without necessarily doing a deep dive, and while I will try to mention specific examples of things, I won’t spend too much time looking at them closely. It will be up to you to take the time to make sure the example makes sense. Because of this, I’ll try to include exercises that should be good checks of understanding. Finally, as always, things are presented according to my tastes and according to whatever order they happen to pop into my head; hence, they are not necessarily done the usual way.</p>
</blockquote>
<h2 id="rings-and-better-rings">Rings and Better Rings</h2>
<p>If you’ve heard of groups, I doubt I need to motivate rings much. Things like the integers, real numbers, matrices, etc. all form groups, but when considering them as such, you have to make a choice about whether you want to consider their additive structure or their multiplicative structure; why not look at both?</p>
<blockquote>
<p>Definition<br />
A <strong>ring</strong> $(R, +, \cdot)$ is a set $R$ together with two operations $+:R\times R\rightarrow R$ and $\cdot:R\times R\rightarrow R$ satisfying the following for all $a,b,c\in R$<br />
<script type="math/tex">% <![CDATA[
\begin{align*}
&\bullet (R, +)\text{ is an abelian group with additive identity }0\\
&\bullet a\cdot(b\cdot c) = (a\cdot b)\cdot c\\
&\bullet a\cdot(b+c) = a\cdot b+a\cdot c\text{ and }(a+b)\cdot c=a\cdot c+b\cdot c
\end{align*} %]]></script><br />
If additionally, $a\cdot b=b\cdot a$ always, then we call this a <strong>commutative ring</strong></p>
</blockquote>
<p>There are a few things worth noticing about the definition of a ring. First of all, it’s kinda short; at least, it was shorter than I expected the first time I saw it. There are like four different properties you need to satisfy to be an abelian group; to be a ring, you just need associative multiplication and (both left and right) distributivity. You don’t need inverses, and you don’t even need to have a multiplicative identity. This means you can get some weird stuff happening in general rings.</p>
<ul>
<li>In $2\Z$, you have $ab\neq a$ for every nonzero $a\in2\Z$ and every $b\in2\Z$ (so there's no unity)</li>
<li>In $\zmod8$, you get $4*2=0$ even though $4,2\neq0$</li>
<li>Also in $\zmod8$, you get $7^2=1^2=3^2=5^2=1$ so you have 4 different square roots of $1$</li>
</ul>
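<p>All three oddities are easy to verify directly — a quick Python sketch (illustrative only):</p>

```python
m = 8

# 4 * 2 ≡ 0 (mod 8): zero divisors, even though neither factor is zero
assert (4 * 2) % m == 0 and 4 % m != 0 and 2 % m != 0

# four distinct square roots of 1 in Z/8
roots = sorted(x for x in range(m) if (x * x) % m == 1)
assert roots == [1, 3, 5, 7]

# in 2Z nothing acts as unity on 2: 2b = 2 would force b = 1, which is odd
assert all(2 * b != 2 for b in range(-10, 12, 2))
```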
<p>Because of this, we will see a number of different types of rings with increasingly more conditions on them, guaranteeing nice behavior. Also, in case you were wondering why we require an abelian group under addition and not just a group, it’s because general groups and noncommutative rings are ugly enough separately; having one object that doesn’t commute under addition or multiplication just sounds awful <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Now, let’s see some additional conditions we may want to place on rings</p>
<blockquote>
<p>Definition<br />
A ring $R$ is said to <strong>have unity</strong> (alternatively, $R$ is <strong>unital</strong>) if there exists some $1\in R$ such that $a\cdot1=1\cdot a=a$ for all $a\in R$.</p>
</blockquote>
<p>This gets rid of the first problematic ring I mentioned above, but not the other two. The third one may stick around for a while, but the second we really don’t like.</p>
<blockquote>
<p>Definition<br />
A nonzero element $a\in R$ is called a <strong>zero divisor</strong> if there exists some nonzero $b\in R$ such that $ab=0$</p>
</blockquote>
<blockquote>
<p>Definition<br />
An <strong>integral domain</strong> $D$ is a commutative ring with unity and no zero divisors</p>
</blockquote>
<blockquote>
<p>Question<br />
Is the zero ring an integral domain?</p>
</blockquote>
<p>Now that’s something we can work with. In practice, almost all rings you work with will have unity, the majority will be commutative <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>, and plenty of them will be integral domains. The fact that integral domains don’t have zero divisors means that we can “cancel” multiplication. Normally when we have some equation like $ab=ac$, we cancel out the $a$’s and conclude that $b=c$. However, as the $4\cdot2=0$ example from $\zmod8$ shows, this isn’t always legitimate. In high-school algebra, we justify this cancellation by saying we multiply both sides by $a^{-1}$, but $a^{-1}$ won’t exist most of the time in general! Luckily, even if it doesn’t, we can still justify cancellation most of the time.</p>
<blockquote>
<p>Theorem<br />
Let $D$ be a ring. Then left (respectively, right) cancellation holds iff $D$ has no zero divisors</p>
</blockquote>
<div class="proof2">
Pf: $(\rightarrow)$ Assume left cancellation holds. Pick nonzero $a\in D$ and any $b\in D$ such that $ab=0=a\cdot0$. By left cancellation, this means $b=0$ so there are no zero divisors.<br />
($\leftarrow$) Assume $D$ has no zero divisors. Pick $a,b,c\in D$ with $a$ nonzero such that $ab=ac$. Then, we can subtract to get $0=ac-ab=a(c-b)$. Since there are no zero divisors, either $a=0$ or $c-b=0$, but $a\neq0$ by assumption so $c-b=0$ and $c=b$. $\square$
</div>
<p>Above you’ll notice that we used $a0=0=0a$ for all $a$ in a ring. This is nonobvious, but also not all that profound. You can prove basic properties of rings like this <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> without much effort, so I’ll omit them. Furthermore, as with groups, we can define <strong>ring homomorphisms (or ring maps)</strong> as maps $f:R\rightarrow S$ such that $f(a+b)=f(a)+f(b)$ and $f(ab)=f(a)f(b)$; additionally, if $R,S$ both have unity we require $f(1_R)=1_S$. We can also define the <strong>kernel</strong> of a ring map $f:R\rightarrow S$ to be the subset of $R$ mapping to $0$. There’s also a notion of subring that’s exactly what you think it is.</p>
<p>For the rest of this post, I think we’ll be looking (almost) exclusively at commutative rings with unity, so unless otherwise specified, assume that’s the case. Now, before moving on to more definitions and whatnot, I want to make note of one of the most important classes of rings: polynomial rings.</p>
<blockquote>
<p>Definition<br />
Given a ring<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> $R$, the <strong>polynomial ring</strong> $R[x]$ is the ring of (formal) polynomials in $x$ with coefficients in $R$.</p>
</blockquote>
<p>The above definition isn’t all that formal, but the idea is that you have things like $3x^2+2x-7\in\Z[x]$, $\pi x^4-e\in\R[x]$, $3x-1/2\in\Q[x]$, etc. One thing to be careful about is that two polynomials are equal iff they have identical coefficients; any polynomial $p(x)\in R[x]$ gives rise to a function <script type="math/tex">R\rightarrow R</script> via evaluation, but the mapping $p\mapsto[r\mapsto p(r)]$ <sup id="fnref:6"><a href="#fn:6" class="footnote">5</a></sup> is not necessarily injective! That is, you can have distinct polynomials that determine the same function such as $p(x)=x^2$ and $q(x)=x$ in $\zmod2[x]$; $p(x)\neq q(x)$ as polynomials even though $\forall n\in\zmod2:p(n)=q(n)$.</p>
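<p>A quick way to see this failure of injectivity concretely (a Python sketch, with coefficient lists standing in for formal polynomials):</p>

```python
# polynomials as coefficient lists, lowest degree first:
# p(x) = x^2 -> [0, 0, 1] and q(x) = x -> [0, 1]
p, q = [0, 0, 1], [0, 1]

def evaluate(coeffs, x, m):
    # evaluate a polynomial over Z/m at x
    return sum(c * x**k for k, c in enumerate(coeffs)) % m

# distinct polynomials...
assert p != q
# ...that induce the same function on Z/2
assert all(evaluate(p, n, 2) == evaluate(q, n, 2) for n in range(2))
```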
<blockquote>
<p>Aside<br />
If you want to carefully define the polynomial ring, you can define it as the subset of the ring $R^{\N}$ of functions from $\N$ to $R$ consisting of elements that evaluate to 0 on all but finitely many $n\in\N$. You also have to specify what multiplication looks like because it’s not the usual componentwise product.</p>
</blockquote>
<h1 id="domains">Domains</h1>
<p>Unfortunately, unlike the previous post on groups, there isn’t some major result like Lagrange’s Theorem or the First Isomorphism Theorem that we’re working towards here; this post is more a goalless overview of some basics in ring theory.</p>
<blockquote>
<p>Proposition<br />
If $D$ is an integral domain, then so is $D[x]$</p>
</blockquote>
<p>Before proving this, we’ll define the degree of a polynomial which is a notion we’ll see more than once.</p>
<blockquote>
<p>Definition<br />
Given some polynomial $p(x)=\sum_{k=0}^na_kx^k\in R[x]$, its <strong>degree</strong> $\deg p(x)$ is the largest $k$ such that $a_k\neq0$. Note that $\deg0$ is undefined whereas $\deg r=0$ for any nonzero $r\in R$.</p>
</blockquote>
<blockquote>
<p>Remark<br />
Given nonzero $p,q\in D[x]$ where $D$ is an integral domain, we have $\deg(pq)=\deg p+\deg q$ which is simple but important.</p>
</blockquote>
<p>The above remark is actually strong enough to imply the proposition, so we omit a formal proof of that. In the remark, we require $D$ to be an integral domain so that the leading coefficient <sup id="fnref:5"><a href="#fn:5" class="footnote">6</a></sup> of $pq$ is guaranteed to be nonzero since it’s the product of two nonzero things (i.e. the leading coefficients of $p$ and $q$). I don’t have a good transition here, but another important thing related to integral domains is…</p>
<blockquote>
<p>Definition<br />
Given a (possibly non-commutative) ring $R$ with unity, there is a unique ring map $\Z\rightarrow R$ (why?). If $D$ is a domain, then we define the <strong>characteristic</strong> $\Char D$ of $D$ to be the least positive $n\in\Z$ mapping to 0 under this unique ring map. If no positive integer maps to 0, then we say $\Char D=0$.</p>
</blockquote>
<p>There are a few ways to think of characteristic. In a while when we define ideals, it’ll become clear that the characteristic of $D$ is the generator of $\ker(\Z\rightarrow D)$; we can also say that $\Char D$ is the (least) number of times you can add 1 to itself in a ring before getting $0$ <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>; alternatively, remembering that rings are abelian groups, $\Char D$ is the additive order of $1$. Good examples to keep in mind here are $\Char\Z=0$ and $\Char\zmod p=p$. Vaguely put, characteristic is a good indicator for the behavior of a ring; weird things can happen in rings of low characteristic.</p>
<blockquote>
<p>Theorem<br />
Given an integral domain $D$ of nonzero characteristic, $\Char D$ is prime</p>
</blockquote>
<div class="proof2">
Pf: Let $D$ be an integral domain and assume $\Char D=n\neq0$. Now, write $n=ab$ ($a,b$ both positive) and let $f:\Z\rightarrow D$ be the ring map. We have $0=f(n)=f(ab)=f(a)f(b)$ but $D$ has no zero divisors so $f(a)=0$ or $f(b)=0$. Assume WLOG that $f(a)=0$. Since $a\le n$ and $n$ is the minimal integer with $f(n)=0$, we conclude that $a=n$ which means $b=1$. Thus, the only divisors of $n$ are $1,n$ so $n$ is prime. $\square$
</div>
<blockquote>
<p>Corollary<br />
$\zmod n$ is an integral domain implies that $n$ is prime</p>
</blockquote>
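<p>Both directions of this equivalence can be checked numerically for small $n$ — a Python sketch with hypothetical helper names:</p>

```python
def is_prime(n):
    # trial-division primality check
    return n > 1 and all(n % d != 0 for d in range(2, n))

def zmod_is_domain(n):
    # Z/n is an integral domain iff no two nonzero residues multiply to 0
    return n > 1 and not any((a * b) % n == 0
                             for a in range(1, n) for b in range(1, n))

# the corollary and its converse, verified for small n
assert all(zmod_is_domain(n) == is_prime(n) for n in range(2, 60))
```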
<p>The converse of that corollary is true as well, and proving both directions is left as an exercise. Let’s shift gears a little, and instead of talking about properties of rings, we’ll look at some specific types of elements.</p>
<blockquote>
<p>Definitions<br />
Let $R$ be a ring. An element $r\in R$ is called a <strong>unit</strong> if it divides 1. That is, there exists some $s\in R$ such that $rs=1$. We say a non-zero non-unit $r\in R$ is <strong>irreducible</strong> if whenever we write $r=ab$, it must be the case that either $a$ or $b$ is a unit. Finally, a non-zero non-unit $r\in R$ is <strong>prime</strong> if $r\mid ab$ implies that $r\mid a$ or $r\mid b$.</p>
</blockquote>
<p>In the integers $\Z$, the only units are $\pm1$, and prime and irreducible mean the same thing. In general, prime implies irreducible.</p>
<blockquote>
<p>Theorem<br />
Let $D$ be an integral domain. Then, every prime element is irreducible</p>
</blockquote>
<div class="proof2">
Pf: Pick some prime $p\in D$ and write $p=ab$. Then, $p\mid ab$ so $p\mid a$ or $p\mid b$. Assume WLOG that $p\mid a$ and write $a=pc$. Substituting into the first equation, we have $p=pcb$ which means $1=cb$ and so $b$ was a unit. $\square$
</div>
<p>The converse of this theorem does not hold in general though. For a counterexample, we can consider the ring $\Z[\sqrt{-5}]=\{a+b\sqrt{-5}:a,b\in\Z\}$. Here, $2$ is irreducible as can easily be proven using the norm map <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>. However, $2$ divides $6=(1+\sqrt{-5})(1-\sqrt{-5})$ but divides neither of $1\pm\sqrt{-5}$ since, for example, any multiple of $2$ will have both components even. Since the two definitions coincided for integers, we’d like to study other rings where they are the same as well. To that end, we define the following</p>
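<p>The norm map $N(a+b\sqrt{-5})=a^2+5b^2$ makes this counterexample easy to check by machine (a Python sketch; the helper name is mine):</p>

```python
def norm(a, b):
    # N(a + b√-5) = a^2 + 5b^2; the norm is multiplicative
    return a * a + 5 * b * b

# no element has norm 2 (only |a| <= 1, b = 0 could be small enough),
# which is the key step in showing 2 is irreducible
assert not any(norm(a, b) == 2 for a in range(-2, 3) for b in range(-2, 3))

# yet 6 = 2 * 3 = (1 + √-5)(1 - √-5); on norms: 4 * 9 = 6 * 6 = 36
assert norm(2, 0) * norm(3, 0) == 36 == norm(1, 1) * norm(1, -1)
```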
<blockquote>
<p>Definition<br />
A <strong>unique factorization domain</strong> (or UFD) $U$ is an integral domain where every non-zero $x\in U$ can be written as a product $x=up_1p_2\dots p_n$ of a unit $u$ with irreducibles $p_i$. Furthermore, this representation is unique in the sense that given $x=wq_1q_2\dots q_m$ as well, we must have $m=n$ and (after rearrangement) $q_i=v_ip_i$ for units $v_i$.</p>
</blockquote>
<p>That definition is a bit of a mess, but the basic idea is that we have some analogue of the fundamental theorem of arithmetic. A good example of a UFD to keep in mind is $\Z[x]$, and in fact more generally</p>
<blockquote>
<p>Theorem<br />
If $U$ is a UFD, then so is $U[x]$</p>
</blockquote>
<div class="proof2">
Pf: Omitted. At this point, I don't think we've quite developed enough theory to prove this nicely, so look up a proof after finishing this post.
</div>
<p>Because I don’t want to wait too long before saying this, and because it’s related to the previously omitted proof, let’s look at just about the nicest rings known to algebra.</p>
<blockquote>
<p>Definition<br />
A <strong>field</strong> $k$ is a ring such that $(k-\{0\},\cdot)$ is an abelian group. i.e. all non-zero elements of $k$ are units.</p>
</blockquote>
<p>Examples of fields include $\Q, \R, \C,$ and $\qadjs2=\{a+b\sqrt2:a,b\in\Q\}$. Because multiplicative inverses exist, cancellation automatically holds in fields and so all fields are domains. The 4 examples I just gave all have characteristic 0, but fields with prime characteristic exist as well.</p>
<blockquote>
<p>Proposition<br />
$\zmod p$ is a field. More generally, any finite domain is a field.</p>
</blockquote>
<div class="proof2">
Pf: Let $D$ be a finite domain, and fix some non-zero $d\in D$. Consider the map $m_d:D\rightarrow D$ given by $m_d(a)=da$. We claim this map is injective. Pick some $a\in\ker m_d$. Then, $m_d(a)=da=0\implies a=0$ since $d\neq0$ by assumption. Thus, $m_d$ has trivial kernel and hence is injective. Now, any injective map between finite sets is automatically surjective, so $\image m_d=D$. In particular, there exists some $c\in D$ such that $m_d(c)=dc=1$ so $d$ has an inverse. $\square$
</div>
<p>When thinking of $\zmod p$ as a field, we usually denote it $\F_p$ and call it the (finite) field with $p$ elements. This is not the only connection between domains and fields. It’s clear that every subring of a field is a domain, but it turns out the converse also holds.</p>
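<p>The brute-force search for inverses from the proof can be run directly (a Python sketch; the helper name is mine):</p>

```python
def units(n):
    # nonzero elements of Z/n that have a multiplicative inverse
    return {d for d in range(1, n)
            if any((d * c) % n == 1 for c in range(1, n))}

# Z/7 is a finite domain, hence a field: every nonzero element is a unit
assert units(7) == {1, 2, 3, 4, 5, 6}

# Z/6 is not a domain (2 * 3 ≡ 0), and sure enough 2, 3, 4 are not units
assert units(6) == {1, 5}
```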
<blockquote>
<p>Definition<br />
Given a domain $D$, its <strong>field of fractions</strong> $\Frac(D)$ is the field whose elements are formal symbols $\frac ab$ ($a,b\in D$, $b\neq0$) modded by the relation $\sim$ with addition and multiplication given by<br /><center>
$$\begin{align*}
\frac ab+\frac cd=\frac{ad+bc}{bd} && \frac ab\frac cd=\frac{ac}{bd} && \frac ab\sim\frac cd\iff ad=bc
\end{align*}$$</center>
Note that we can embed $D$ in its field of fractions via the map $d\mapsto\frac d1$</p>
</blockquote>
<p>It’s up to you to verify that this construction gives an actual field. After that, forgetting fields for a moment, we look again at the definition of a UFD and realize that units can be annoying, so we’ll turn our attention to something a little more unit-agnostic.</p>
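<p>For $D=\Z$ this construction produces $\Q$, and Python’s standard-library <code>fractions.Fraction</code> behaves exactly like it (internally it stores a normalized representative of each $\sim$-class):</p>

```python
from fractions import Fraction

# a/b ~ c/d iff ad = bc: 2/4 and 1/2 are the same element
assert Fraction(2, 4) == Fraction(1, 2)

# addition and multiplication match the formulas above
assert Fraction(1, 2) + Fraction(1, 3) == Fraction(1 * 3 + 2 * 1, 2 * 3)
assert Fraction(1, 2) * Fraction(2, 3) == Fraction(1 * 2, 2 * 3)

# the embedding d -> d/1
assert Fraction(5, 1) == 5
```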
<h1 id="ideals">Ideals</h1>
<p>The big payoffs of the group theory post were both related to the idea of quotient groups. In this section, we’ll see how to define the analogous idea of quotient rings.</p>
<blockquote>
<p>Definition<br />
Given a ring $R$, an <strong>ideal</strong> $I\subseteq R$ is an additive subgroup such that $ar\in I$ for all $a\in I$ and $r\in R$.</p>
</blockquote>
<blockquote>
<p>Remark<br />
Depending on the author, ideals may or may not be rings. They are if you don’t require rings to have unity, but aren’t otherwise.</p>
</blockquote>
<p>We could have followed the footsteps of the group theory post and defined ideals as kernels of ring maps, but ideals have more of a life of their own than normal subgroups, so this definition is the one always used.</p>
<blockquote>
<p>Proposition<br />
Given a ring map $f:R\rightarrow S$, its kernel $\ker f\subseteq R$ is an ideal.</p>
</blockquote>
<div class="proof2">
Pf: Exercise to reader
</div>
<p>Now, recall that in abelian groups, all subgroups are normal. Luckily for us, all rings are additive abelian groups, so ideals are automatically normal subgroups. This means we can form the quotient group $R/I$ as before; however, the additional condition that $I$ “absorbs” $R$ ensures that we can endow this quotient with a ring structure.</p>
<blockquote>
<p>Definition<br />
Given a ring $R$ and ideal $I\subseteq R$, the <strong>quotient ring</strong> $R/I$ is the quotient group endowed with the following multiplication of cosets<br /><center>$$(a+I)(b+I)=ab+I$$</center></p>
</blockquote>
<blockquote>
<p>Exercise<br />
Verify that this definition gives a well-defined ring.</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove the first isomorphism theorem for rings: Given a surjective ring map $f:R\rightarrow S$, we have $R/\ker f\simeq S$</p>
</blockquote>
<p>As there are different types of rings, we have different types of ideals depending on how nice the associated quotient ring is. It’s worth convincing yourself that any ideal containing a unit must be all of $R$, and the only ideals of a field are the trivial ones <sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>.</p>
<blockquote>
<p>Definition<br />
Let $R$ be a ring and $I\subseteq R$ an ideal of $R$. We say that $I$ is <strong>prime</strong> if $R/I$ is an integral domain, and $I$ is <strong>maximal</strong> if $R/I$ is a field.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let $R$ be a ring with ideal $I$. Then, $I$ is prime iff $ab\in I\implies a\in I$ or $b\in I$, and $I$ is maximal iff $I\neq R$ and given any ideal $J$ with $I\subseteq J\subseteq R$, either $J=I$ or $J=R$.</p>
</blockquote>
<div class="proof2">
Pf: The statement about prime ideals is left as an exercise, but we'll prove the one about maximal ideals here. $(\rightarrow)$ Assume $I$ is maximal, and pick some ideal $J$ with $I\subseteq J\subseteq R$. Let $f:R\rightarrow R/I$ be the quotient map. It's easily verified that $f(J)$ is an ideal so $f(J)=\{0\}$ or $f(J)=R/I$. In the first case, we have $J\subseteq\ker f=I\implies J=I$. In the second, any preimage of $1\in R/I$ is necessarily a unit, so $J=R$ as it contains a unit. ($\leftarrow$) Conversely, assume that $I\subseteq J\subseteq R\implies J=I$ or $J=R$. Let $f:R\rightarrow R/I$ be the quotient map again, and consider an ideal $\tilde J\subseteq R/I$. It's again easily verified that $f^{-1}(\tilde J)$ is an ideal. Since $0\in\tilde J$, we must have $I\subseteq f^{-1}(\tilde J)\subseteq R$ so $\tilde J=\{0\}$ or $R/I$. This implies that $R/I$ is a field. Indeed, given any nonzero $x\in R/I$, the ideal it generates $(x)=R/I$ so there must be some $y\in R/I$ such that $xy=1$. $\square$
</div>
<p>The above proof makes use of the <strong>ideal generated by x</strong> which is given by $(x)=Rx=\{rx:r\in R\}$. We can generalize this notion to any collection of elements</p>
<blockquote>
<p>Definition<br />
Given a (not necessarily finite) subset $S\subseteq R$, the <strong>ideal generated by S</strong> is the ideal<br /><center>$$\left\{\sum_{s\in S}a_s\cdot s:a_s\in R,\text{all but finitely many }a_s\text{ are zero}\right\}$$</center>
When $S=\{x_1,\dots,x_n\}$ is finite, this is commonly denoted<br /><center>$$(x_1,\dots,x_n)=\sum_{i=1}^nRx_i=\{r_1x_1+\dots+r_nx_n:r_i\in R\}$$</center></p>
</blockquote>
<p>With this, we can define that last special type of ideal in this post.</p>
<blockquote>
<p>Definition<br />
We call an ideal $I\subseteq R$ <strong>principal</strong> (or say it’s <strong>principally generated</strong>) if it is generated by a single element.</p>
</blockquote>
<p>Principal ideals are some of the nicest ideals, and behave very similar to numbers (i.e. elements of $R$). However, they have the added benefit that if you multiply one by a unit, nothing changes. Hence, we arrive at our next kind of ring</p>
<blockquote>
<p>Definition<br />
If $R$ is a domain where every ideal is principal, then we call $R$ a <strong>principal ideal domain</strong>, or <strong>PID</strong>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Every PID is a UFD</p>
</blockquote>
<div class="proof2">
Pf: One of my goals this post is to avoid writing any proofs involving UFDs, so omitted.
</div>
<p>Examples of PIDs include $\Z$, and as we’ll see in a moment, $k[x]$ for $k$ a field. One thing that is true in general is that $(p)$ is a prime ideal if $p$ is a prime element. Given the following theorem, this means that in a PID, every nonzero prime ideal is maximal.</p>
<blockquote>
<p>Theorem<br />
In a PID, an ideal is maximal iff it’s generated by an irreducible</p>
</blockquote>
<div class="proof2">
Pf: $(\leftarrow)$ Let $I=(r)$ where $r\in R$ is irreducible and $R$ is a PID. Consider some ideal $J=(a)$ with $I\subseteq J\subseteq R$. Since $r\in(a)$, there must exist some $b\in R$ with $r=ab$. However, because $r$ is irreducible, either $a$ is a unit or $b$ is. If $a$ is a unit, then $J=R$. If $b$ is a unit, then $J=I$ since unit multiples generate the same ideal. Thus, $I$ is maximal. ($\rightarrow$) Run the same argument in reverse: $I=(r)$ is maximal and $r=ab$ implies $(r)\subseteq(a)\subseteq R$ so fill in the blank. $\square$
</div>
<p>One application of the above theorem is that it lets us generate fields of varying sizes.</p>
<blockquote>
<p>Exercise<br />
Show that if $f(x)\in\F_p[x]$ is an irreducible polynomial, then $\F_p[x]/(f(x))$ is a finite field of size $p^{\deg f}$</p>
</blockquote>
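<p>As a worked instance of this exercise (a sketch, not a proof), here is $\F_2[x]/(x^2+x+1)$, the field with $2^2=4$ elements, in Python with pairs $(a,b)$ standing for $a+bx$:</p>

```python
# elements of F_2[x]/(x^2 + x + 1) as pairs (a, b) standing for a + b*x
elements = [(a, b) for a in range(2) for b in range(2)]
assert len(elements) == 2 ** 2            # p^(deg f) = 4 elements

def mul(u, v):
    # multiply using the relation x^2 = x + 1 (coefficients mod 2):
    # (a + bx)(c + dx) = (ac + bd) + (ad + bc + bd)x
    (a, b), (c, d) = u, v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

# every nonzero element is invertible, so the quotient is a field
one = (1, 0)
for u in elements:
    if u != (0, 0):
        assert any(mul(u, v) == one for v in elements)
```

For instance, $x\cdot(1+x)=x+x^2=x+(x+1)=1$, so $x$ and $1+x$ are inverse to each other.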
<p>Showing something is a PID directly can be difficult, so it’s sometimes helpful to instead show the stronger condition that your ring has a Euclidean algorithm on it.</p>
<blockquote>
<p>Definition<br />
A <strong>Euclidean domain</strong> $E$ is an integral domain with a function $f:E-\{0\}\rightarrow\Z_{\ge0}$ such that for any $a,b\in E$ with $b\neq0$, there exist $q,r\in E$ where $a=bq+r$ and $r=0$ or $f(r)<f(b)$.</p>
</blockquote>
<p>In essence, you can perform division in $E$, and there’s a sense in which the remainder is smaller than what you started with. Examples include $\Z$ where $f(n)=|n|$ and any field $k$ with $f(x)=1$. A more interesting example is the Gaussian integers $\Z[i]$ with $f(a+bi)=a^2+b^2$. If you’ve been paying attention, you’ll notice that there was no $R\text{ PID}\implies R[x]\text{ PID}$ theorem; this is because the statement is false (for a counterexample, consider $R=\Z$: the ideal $(2,x)\subset\Z[x]$ is not principal). However, with stronger assumptions, you can get something almost like this.</p>
<blockquote>
<p>Theorem<br />
If $k$ is a field, then $k[x]$ is a Euclidean domain</p>
</blockquote>
<div class="proof2">
Pf sketch: For your function $f:k[x]-\{0\}\rightarrow\Z_{\ge0}$, you just use $f(p)=\deg p$. With this choice, polynomial long division gets you what you need. Since we're working over a field, you can always scale the leading coefficient of the divisor to cancel out all higher order terms of the dividend so that the remainder has strictly smaller degree. $\square$
</div>
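<p>Polynomial long division over $\F_p$ is simple enough to sketch directly (Python; <code>polydiv</code> is a hypothetical helper, assuming $p$ prime and a nonzero divisor with no trailing zero coefficients):</p>

```python
def polydiv(a, b, p):
    # divide a by b in F_p[x] (coefficient lists are low-degree-first);
    # returns (q, r) with a = b*q + r and r = 0 or deg r < deg b
    a = [c % p for c in a]
    q = [0] * max(len(a) - len(b) + 1, 1)
    lead_inv = pow(b[-1], p - 2, p)        # Fermat inverse of leading coeff
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        factor = (a[-1] * lead_inv) % p
        q[shift] = factor
        for i, c in enumerate(b):          # subtract factor * x^shift * b
            a[i + shift] = (a[i + shift] - factor * c) % p
        while len(a) > 1 and a[-1] == 0:   # strip cancelled leading terms
            a.pop()
    return q, a

# (x^3 + 2x + 1) = (x^2 + 1) * x + (x + 1) over F_5
q, r = polydiv([1, 2, 0, 1], [1, 0, 1], 5)
assert q == [0, 1] and r == [1, 1]
```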
<blockquote>
<p>Theorem<br />
Every ED is a PID</p>
</blockquote>
<div class="proof2">
Pf: Let $E$ be a Euclidean domain with ideal $I$. Pick non-zero $x\in I$ so that $f(x)$ is minimal among elements of $I$. Now, consider any $a\in I$ and divide to get $a=xq+r$ where $r=0$ or $f(r)< f(x)$. We claim that $a\in(x)$. Note that $r=a-xq\in I$ so $f(r)\ge f(x)$ (if $r\neq0$) by minimality of $x$. This means that $r=0$ so $a=xq\in(x)$ as desired and so $I=(x)$ is principal. $\square$
</div>
<p>Hence, the polynomial ring over any field is a PID.</p>
<h1 id="a-glimpse-of-field-theory">A Glimpse of Field Theory</h1>
<p>Hopefully there’s nothing major I forgot to say <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. With this last bit, I want to mention one neat result about fields. For this, I’m going to need to assume you know a little linear algebra: specifically, the definition of a vector space over a field, and the fact that every vector space has a basis. We’ll use this to show that the sizes of fields are pretty constrained.</p>
<blockquote>
<p>Definition<br />
Let $F,E$ be fields and assume that $F\subseteq E$. We call $E$ an <strong>extension field</strong> of $F$ and denote this $E/F$</p>
</blockquote>
<p>One of the most important things about extension fields is that if $E/F$ is a field extension, then $E$ is an $F$-vector space! Although it’s not difficult to see, you should verify this claim. It basically boils down to the fact that multiplication is linear.</p>
<blockquote>
<p>Definition<br />
The <strong>degree</strong> of a field extension $E/F$ is $[E:F]=\dim_FE$, the dimension of $E$ as an $F$-vector space</p>
</blockquote>
<p>With that, our last result</p>
<blockquote>
<p>Theorem<br />
Let $E$ be a finite field. Then, $|E|=p^n$ for some prime $p$ and integer $n$</p>
</blockquote>
<div class="proof2">
Pf: Let $p=\Char E$, which must be prime since it's nonzero. Let $F$ be $E$'s so-called $\textit{prime subfield}$, which is the image of the map $\Z\rightarrow E$. Finally, let $n=[E:F]$ and let $e_1,\dots,e_n\in E$ be an $F$-basis for $E$. Then, every element of $E$ can be written uniquely in the form $$a_1e_1+\dots+a_ne_n$$ where $a_i\in F$. Since $|F|=\Char E=p$ and there are $n$ coefficients to choose, there are $p^n$ expressions of this form, and correspondingly $|E|=p^n$. $\square$
</div>
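<p>As a sanity check of the theorem, one can build a field of order $2^2$ explicitly as $\mathbf{F}_2[x]/(x^2+x+1)$, count its elements, and verify every nonzero element is invertible. The Python sketch below uses my own conventions (coefficient tuples, constant term first, with a monic modulus); it is an illustration, not part of the proof above.</p>

```python
from itertools import product

def poly_mulmod(a, b, modulus, p):
    """Multiply coefficient tuples (constant term first) in F_p[x]/(modulus)."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    n = len(modulus) - 1                      # modulus is monic of degree n
    for i in range(len(res) - 1, n - 1, -1):  # eliminate terms of degree >= n
        c = res[i]
        if c:
            for j in range(n + 1):
                res[i - n + j] = (res[i - n + j] - c * modulus[j]) % p
    return tuple(res[:n])

# GF(4) = F_2[x]/(x^2 + x + 1): exactly 2^2 elements, as the theorem predicts
modulus = [1, 1, 1]
elems = list(product(range(2), repeat=2))
assert len(elems) == 2**2
# every nonzero element has a multiplicative inverse, so this really is a field
one = (1, 0)
assert all(any(poly_mulmod(a, b, modulus, 2) == one for b in elems)
           for a in elems if a != (0, 0))
```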
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Also, such objects don’t show up in practice that often (read: ever) <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Matrices are the standard example of non-commutative rings <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>(multiplicative) inverses are unique when they exist; if it’s a ring with unity, then (-1)a = -a (i.e. -1 times a is the additive inverse of a), etc. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>While typing this, I realized I don’t know if people ever work in polynomials over non-commutative or non-unital rings <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>This is shorthand for a map that, given a polynomial p, returns a function that takes some element r and returns p(r), the result of evaluating p at r <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>coefficient of $x^d$ where $d=\deg p$ <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>This is how I’ve always seen characteristic defined, but it leads to confusing notation like n1 := 1+1+…+1 (n times), where you write things like n1=(ab)1=(a1)(b1) and it gets annoying to keep track of which 1’s matter and which you can drop because of identity. I much prefer making explicit mention of a ring map <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>previously defined <a href="../solving-pell">here</a> <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>the zero ideal and the field itself <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Worst-case scenario, I just edit this post later on to add in anything missing <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>

This is the second post in a series that serves as an introduction to abstract algebra. In the last one, we defined groups as certain sets endowed with a well-behaved operation. Here, we’ll look at rings, which are what you get when your set has two operations defined on it, and we’ll see that much of the theory for groups has a natural analogue in ring theory.

Addition Done Right (2017-09-20): https://nivent.github.io/blog/addition

<p>This post will go over the material already nicely covered by <a href="https://pdfs.semanticscholar.org/b44b/eb7ff396be62e548e4a6dc39df0bdf65e593.pdf">this document</a>, so if you want, you could just read that instead. The main purpose of reproducing things here is for me to think more actively about the ideas presented there, and to see what I’ll want to do differently. This post will assume as much group theory as I covered in my <a href="../group-intro">last post</a>.</p>
<div id="latex-commands" class="latex-commands">
$\DeclareMathOperator{\T}{\mathcal T}
\DeclareMathOperator{\O}{\mathcal O}
\DeclareMathOperator{\H}{\mathcal H}
\DeclareMathOperator{\B}{\mathcal B}
\DeclareMathOperator{\z}{\mathcal Z}
\newcommand{\rep}[2]{\left[#1\mid#2\right]}$
</div>
<p>We will begin by looking at how we add 2-digit numbers, and from there, we’ll consider addition rules different from the normal one we learn in grade school, and this will lead us into some of the ideas that arise in the study of group cohomology. I don’t myself actually know a lot about cohomology, so forgive any mistakes.</p>
<h1 id="the-setup">The setup</h1>
<p>The setup for the meat of this post might seem needlessly complicated, but why we present things this way will become more clear as we move into later sections. As I mentioned before, we begin with grade school addition of two-digit numbers <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Since we only care about two-digit numbers, we conduct our math here in $\Z_{100}\simeq\Z/100\Z$ <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>. $\Z_{100}$ contains $\Z_{10}$ as a subgroup realized as the multiples of $10$; we will denote this subgroup $\T$ for tens. Because getting only one copy of <script type="math/tex">\Z_{10}</script> from <script type="math/tex">\Z_{100}</script> would be boring, we note that $\Z_{100}/\T\simeq\Z_{10}$ as well and call this quotient group $\O$ for ones. For a sneak preview of what’s to come, note that we have the following short exact sequence.</p>
<script type="math/tex; mode=display">\begin{CD}
0 @>>> \T @>>> \Z_{100} @>>> \O @>>> 0
\end{CD}</script>
<p>Above, $0$ denotes the trivial group, and the maps into/out of $\Z_{100}$ are the inclusion and quotient maps, respectively.</p>
<p>Before we can start adding members of $\Z_{100}$, we need to agree on a way to represent its members. To this end, every member of $\Z_{100}$ will be represented as $\rep ab$ where $a\in\T$ and $b\in\O$. In this notation we have, for example, $43=\rep43$, $8=\rep08$, and $90=\rep90$. Now, if we want to write the usual addition law, we cannot simply say that $\rep ab+\rep cd=\rep{a+c}{b+d}$ since $\rep78+\rep03=\rep81\neq\rep71$. In particular, the addition law here comes equipped with a notion of carry so that in general</p>
<script type="math/tex; mode=display">\rep{a_1}{b_1}+\rep{a_2}{b_2}=\rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}</script>
<p>where $z:\O\times\O\rightarrow\T$ is the function defined by $z(b_1,b_2)=1$ when $b_1+b_2\ge10$ and $z(b_1,b_2)=0$ otherwise <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
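<p>For concreteness, here is this addition law in Python. Representing $\rep ab$ as a pair of digits (tens, ones) is my own encoding choice, not notation from the post:</p>

```python
def z(b1, b2):
    """The grade-school carrying function O x O -> T (with the tens recorded
    as a digit 0-9 rather than a multiple of 10)."""
    return 1 if b1 + b2 >= 10 else 0

def add(x, y):
    """Add elements [a|b] of Z_100, stored as (tens digit, ones digit) pairs."""
    a1, b1 = x
    a2, b2 = y
    return ((a1 + a2 + z(b1, b2)) % 10, (b1 + b2) % 10)

assert add((7, 8), (0, 3)) == (8, 1)   # [7|8] + [0|3] = [8|1], i.e. 78 + 3 = 81
assert add((9, 9), (0, 1)) == (0, 0)   # 99 + 1 wraps around to 0 in Z_100
```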
<blockquote>
<p>Aside<br />
The above definition for $z$ is technically nonsense since $b_1,b_2\in\O$ and there’s no notion of ordering in $\O$ (even if there were, $10=0$ in $\O$ so it wouldn’t be helpful here). To make the definition rigorous, we would want to introduce a specific mapping from $\O$ to $\Z$ and then use the order on $\Z$ to be able to compactly describe $z$. However, this is just boilerplate so I didn’t bother.</p>
</blockquote>
<p>This set of symbols ($\rep ab$ for $a\in\T$ and $b\in\O$) along with this addition law completely characterizes $\Z_{100}$. We will soon see what happens when we define different addition laws, but first we will observe an interesting property of $z$. Recall that addition is associative so</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(\rep{a_1}{b_1} + \rep{a_2}{b_2}) + \rep{a_3}{b_3} &= \rep{a_1}{b_1} + (\rep{a_2}{b_2} + \rep{a_3}{b_3})\\
\rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2} + \rep{a_3}{b_3} &= \rep{a_1}{b_1} + \rep{a_2+a_3+z(b_2,b_3)}{b_2+b_3}\\
\rep{a_1+a_2+a_3+z(b_1,b_2)+z(b_1+b_2,b_3)}{b_1+b_2+b_3} &= \rep{a_1+a_2+a_3+z(b_2,b_3)+z(b_1,b_2+b_3)}{b_1+b_2+b_3}\\
z(b_1,b_2) - z(b_1,b_2+b_3) &+ z(b_1+b_2,b_3) - z(b_2,b_3) = 0
\end{align*} %]]></script>
<p>The equation we got at the end we call the <strong>cocycle condition</strong>. We also observe that $z$ satisfies the so-called <strong>normalization condition</strong> which says that $z(0,b)=0=z(b,0)$ for all $b\in\O$. Inspired by this, we make the following definition.</p>
<blockquote>
<p>Definition<br />
A function $z:\O\times\O\rightarrow\T$ is called a <strong>cocycle</strong> if it satisfies the cocycle and normalization conditions.</p>
</blockquote>
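<p>Both conditions range over finitely many digits, so they can be checked by brute force. A small Python sketch (the helper is mine; equalities are taken in $\T\simeq\Z_{10}$):</p>

```python
def is_cocycle(z, n=10):
    """Check the normalization and cocycle conditions, with values in T = Z_n."""
    normalized = all(z(0, b) == 0 == z(b, 0) for b in range(n))
    cocycle = all((z(b1, b2) - z(b1, (b2 + b3) % n)
                   + z((b1 + b2) % n, b3) - z(b2, b3)) % n == 0
                  for b1 in range(n) for b2 in range(n) for b3 in range(n))
    return normalized and cocycle

carry = lambda b1, b2: 1 if b1 + b2 >= 10 else 0
assert is_cocycle(carry)                    # the carrying function of Z_100
assert is_cocycle(lambda b1, b2: 0)         # the trivial cocycle
assert not is_cocycle(lambda b1, b2: 1)     # fails normalization
```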
<p>So this is interesting. By looking at addition of 2-digit numbers, we arrived at a way of completely characterizing $\Z_{100}$ in terms of symbols $\rep ab$ formed from members of two groups - $\T$ and $\O$ - and a so-called cocycle. A natural question is whether or not we can get other groups from different choices of a cocycle.</p>
<p>The answer of course is yes. We will see this in more generality in a second, but first a quick example. Consider the trivial cocycle given by $z(b_1,b_2)=0$ for all $b_1,b_2\in\O$. With this choice of cocycle, addition is given by $\rep ab+\rep cd=\rep{a+c}{b+d}$ so that for example</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\rep27 &+& \rep41 &=& \rep68\\
\rep99 &+& \rep05 &=& \rep94\\
\rep43 &+& \rep27 &=& \rep60
\end{matrix} %]]></script>
<p>It is not too difficult to see that this cocycle gives rise to the group $\Z_{10}\times\Z_{10}$ with the identification given by $\rep ab\leftrightarrow(a,b)$.</p>
<h1 id="extensions-are-cocycles">Extensions are cocycles</h1>
<p>In the previous section, we saw how the standard carrying function lets us construct $\Z_{100}$ from $\T$ and $\O$. We also saw that replacing this function by the $0$ function instead lets us construct $\Z_{10}\times\Z_{10}$. In general, any cocycle $z:\O\times\O\rightarrow\T$ gives rise to some (abelian) group of order 100 constructed from $\T$ and $\O$. One simple source of cocycles is multiples of the standard carrying function, and in fact it can be shown that these get you all abelian groups of order 100. The question still remains, “which other choices of $z$ let us construct abelian groups?”. $z$ cannot be any function because it has to be a cocycle, but which functions are cocycles?</p>
<blockquote>
<p>Exercise<br />
Show that any cocycle gives rise to an abelian group. That is, if $z:\O\times\O\rightarrow\T$ is a cocycle, then we can form an abelian group on the symbols $\rep ab$ for $a\in\T$ and $b\in\O$ with addition given by<center>
$$\rep{a_1}{b_1} + \rep{a_2}{b_2} = \rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}$$</center></p>
</blockquote>
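<p>If you want to experiment with this exercise rather than prove it, the construction is easy to brute-force. The Python sketch below uses a base-5 analogue of the carrying cocycle (my own choice, so the full check over all triples stays fast); all names are mine:</p>

```python
def extension_add(z, n):
    """Addition law on symbols [a|b] with a, b in Z_n, built from a cocycle z."""
    def add(x, y):
        a1, b1 = x
        a2, b2 = y
        return ((a1 + a2 + z(b1, b2)) % n, (b1 + b2) % n)
    return add

n = 5                                    # base-5 analogue: this rebuilds Z_25
carry = lambda b1, b2: 1 if b1 + b2 >= n else 0
add = extension_add(carry, n)
G = [(a, b) for a in range(n) for b in range(n)]

assert all(add(x, (0, 0)) == x for x in G)                       # identity
assert all(add(x, y) == add(y, x) for x in G for y in G)         # abelian
assert all(add(add(x, y), w) == add(x, add(y, w))
           for x in G for y in G for w in G)                     # associative
assert all(any(add(x, y) == (0, 0) for y in G) for x in G)       # inverses
```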
<p>Recall that an extension $E$ of $\T$ by $\O$ is an abelian group s.t. a short exact sequence of the following form exists</p>
<script type="math/tex; mode=display">\begin{CD}
0 @>>> \T @>>> E @>>> \O @>>> 0
\end{CD}</script>
<p>Furthermore, existence of such a sequence implies that $\T\le E$ and $E/\T\simeq\O$. Now, let $E$ be an arbitrary extension of $\T$ by $\O$ with quotient map $p:E\rightarrow\O$. We want to describe the possible group structures of $E$. First, since $\T\le E$, for any $a\in\T$ write its corresponding element in $E$ as $\rep a0$. Now, for each $b\in\O$, pick some element $x_b\in E$ with $p(x_b)=b$, and write this element as $x_b=\rep0b\in E$. Finally, for any $a\in\T$ and $b\in\O$, let $\rep ab$ denote the element $\rep a0+\rep0b\in E$.</p>
<blockquote>
<p>Theorem<br />
Every element of $E$ can be written uniquely in the form $\rep ab$.</p>
</blockquote>
<div class="proof3">
Pf: Pick any $x\in E$, and let $b=p(x)$. Note that $p(x-\rep0b)=p(x)-p(\rep0b)=0$. Hence, $x-\rep0b\in\ker p=\T$ so choose an $a\in\T$ such that $\rep a0=x-\rep0b$. Then, $x=\rep ab$. For uniqueness, note that there are 100 elements of $E$ of the form $\rep ab$. Furthermore, since $E/\T\simeq\O$, Lagrange implies that $|E|/|\T|=|\O|$ so $E$ only has 100 elements. The theorem follows. $\square$
</div>
<p>The above theorem lets us safely assume elements of $E$ are written in the form $\rep ab$. We now wish to understand addition on $E$. From viewing $\T$ as a subgroup of $E$, we can see that</p>
<script type="math/tex; mode=display">\rep{a_1}0+\rep{a_2}0=\rep{a_1+a_2}0</script>
<p>We can ask what happens when we instead add $\rep0{b_1}+\rep0{b_2}$. This must be $\rep ab$ for some $a\in\T$ and $b\in\O$. By applying $p$ to the sum, we see that $b=b_1+b_2$. However, we do not know for sure what $a$ is. It will be $0$ in the case that $E\simeq\T\times\O$ but not always.</p>
<blockquote>
<p>Definition<br />
Given an extension $E$ of $\T$ by $\O$, the <strong>associated cocycle</strong> of $E$ is the function $z:\O\times\O\rightarrow\T$ given by the formula<br /><center>
$$\rep0{b_1}+\rep0{b_2}=\rep{z(b_1,b_2)}{b_1+b_2}$$</center></p>
</blockquote>
<p>By making use of the associated cocycle, we see that addition on $E$ in general is given by</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\rep{a_1}{b_1} + \rep{a_2}{b_2} &= \rep{a_1}0 + \rep{a_2}0 + \rep0{b_1} + \rep0{b_2} \\
&= \rep{a_1+a_2}0 + \rep{z(b_1,b_2)}{b_1+b_2} \\
&= \rep{a_1+a_2}0 + \rep{z(b_1,b_2)}0 + \rep0{b_1+b_2} \\
&= \rep{a_1+a_2+z(b_1,b_2)}0 + \rep0{b_1+b_2} \\
&= \rep{a_1+a_2+z(b_1,b_2)}{b_1+b_2}
\end{align*} %]]></script>
<blockquote>
<p>Theorem<br />
Let $E$ be an extension of $\T$ by $\O$. Then, the associated cocycle $z$ of $E$ actually is a cocycle.</p>
</blockquote>
<div class="proof3">
Pf: The cocycle condition follows from associativity of addition on $E$. Normalization follows from $\rep ab+\rep00=\rep ab=\rep00+\rep ab$. $$\square$$
</div>
<p>Thus, every extension gives rise to a cocycle, and every cocycle gives rise to an extension <sup id="fnref:3:1"><a href="#fn:3" class="footnote">3</a></sup> !</p>
<h1 id="coboundaries">Coboundaries</h1>
<p>We’ve just shown that the problem of understanding cocycles and which functions they can be is connected to the problem of understanding extensions of $\T$ by $\O$, and so we wonder if this connection is 1-1; that is, does every choice of cocycle give rise to a different extension?</p>
<blockquote>
<p>Definition<br />
A group homomorphism $\phi:E\rightarrow E’$ of extensions of $\T$ by $\O$ is called an <strong>isomorphism of extensions</strong> if $\phi$ restricts to the identity on $\T$ and the induced map on quotients $\bar\phi:\O\rightarrow\O$ is the identity on $\O$.</p>
</blockquote>
<p>Consider some isomorphism of extensions $\phi:E\rightarrow E’$. The condition that $\phi$ is the identity on $\T$ says that $\phi(\rep a0)=\rep a0$ for any $a\in\T$. The fact that the induced map on quotients is the identity says that $\phi(\rep ab)=\rep{a’}b$ for some $a’\in\T$ depending on $a$ and $b$.</p>
<p>To study this dependence further, let $h:\O\rightarrow\T$ be the function defined by</p>
<script type="math/tex; mode=display">\phi(\rep0b)=\rep{h(b)}b</script>
<p>Then, letting $z,z’$ denote the associated cocycles of $E,E’$ respectively, we can perform the following manipulations to determine a condition linking the associated cocycles of $E$ and $E’$</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\phi(\rep0{b_1} + \rep0{b_2}) &=& \rep{h(b_1)}{b_1} + \rep{h(b_2)}{b_2} &=& \rep{h(b_1)+h(b_2)+z'(b_1,b_2)}{b_1+b_2}\\
\phi(\rep0{b_1} + \rep0{b_2}) &=& \phi(\rep{z(b_1,b_2)}{b_1+b_2}) &=& \rep{z(b_1,b_2)+h(b_1+b_2)}{b_1+b_2}
\end{matrix} %]]></script>
<p>This lets us see that</p>
<script type="math/tex; mode=display">\begin{align*}
h(b_1) + h(b_2) + z'(b_1,b_2) = z(b_1,b_2) + h(b_1+b_2) \implies z(b_1,b_2)-z'(b_1,b_2) = h(b_1) - h(b_1+b_2) + h(b_2)
\end{align*}</script>
<p>so for any two isomorphic extensions, the difference of their cocycles must be in this form. The converse of this claim holds true as well.</p>
<blockquote>
<p>Exercise<br />
Suppose that $E,E’$ are extensions with associated cocycles $z,z’:\O\times\O\rightarrow\T$ such that there exists some function $h:\O\rightarrow\T$ for which <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> <center>
$$z(b_1,b_2)-z'(b_1,b_2) = h(b_1) - h(b_1+b_2) + h(b_2)$$</center>
Then, $E$ and $E’$ are isomorphic extensions.</p>
</blockquote>
<p>This is reason enough for functions of this form to be given a definition, prompting</p>
<blockquote>
<p>Definition<br />
Given a function $h:\O\rightarrow\T$ such that $h(0)=0$, its <strong>coboundary</strong> is the function $\delta h:\O\times\O\rightarrow\T$ given by <center>$$ \delta h(b_1,b_2)=h(b_1)-h(b_1+b_2)+h(b_2)$$</center></p>
</blockquote>
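<p>One can also check computationally that every coboundary really is a cocycle (the content of a proposition below). The following Python sketch, with my own helper names, does this for a sample $h$ with $h(0)=0$:</p>

```python
def coboundary(h, n=10):
    """delta h(b1, b2) = h(b1) - h(b1 + b2) + h(b2), with values in T = Z_n."""
    return lambda b1, b2: (h(b1) - h((b1 + b2) % n) + h(b2)) % n

def is_cocycle(z, n=10):
    """Brute-force check of the normalization and cocycle conditions."""
    normalized = all(z(0, b) == 0 == z(b, 0) for b in range(n))
    cocycle = all((z(b1, b2) - z(b1, (b2 + b3) % n)
                   + z((b1 + b2) % n, b3) - z(b2, b3)) % n == 0
                  for b1 in range(n) for b2 in range(n) for b3 in range(n))
    return normalized and cocycle

h = lambda b: (b * b) % 10                   # any h with h(0) = 0 will do
assert is_cocycle(coboundary(h))             # every coboundary is a cocycle
```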
<p>Given the above investigation into isomorphic extensions and the exercise, we can state, and have already proven, the following theorem.</p>
<blockquote>
<p>Theorem<br />
Two extensions are isomorphic if and only if their associated cocycles differ by a coboundary.</p>
</blockquote>
<p>One consequence of this is that the correspondence between cocycles and extensions observed in the last section is not 1-1.</p>
<h1 id="cohomology">Cohomology</h1>
<p>In this section, we’ll prove what was for me the most surprising result of the paper. To begin, let $\z(\T;\O)$ and $\B(\T;\O)$ denote the sets of cocycles and coboundaries, respectively.</p>
<blockquote>
<p>Proposition<br />
$\z(\T;\O)$ and $\B(\T;\O)$ are abelian groups.</p>
</blockquote>
<div class="proof3">
Pf: Left as an exercise
</div>
<blockquote>
<p>Proposition<br />
$\B(\T;\O)\le\z(\T;\O)$</p>
</blockquote>
<div class="proof3">
Pf: Pick some coboundary $\delta h\in\B(\T;\O)$. We need to show that it satisfies the cocycle and normalization conditions. The normalization condition follows from the requirement that $h(0)=0$ so <center>$\delta h(0,b) = h(0) - h(b) + h(b) = 0 = h(b) - h(b) + h(0) = \delta h(b,0)$</center>
For the cocycle condition, you do algebra and expand definitions and it works out. $\square$
</div>
<p>Now, we do the following.</p>
<blockquote>
<p>Definition<br />
The <strong>cohomology group</strong> $\H(\T;\O)$ is the quotient group $\frac{\z(\T;\O)}{\B(\T;\O)}$</p>
</blockquote>
<p>This is amazing because it says exactly that the cohomology group is what you get when you take cocycles, and then consider two of them equal whenever they differ by a coboundary! In other words, the last theorem of the previous section can be restated as</p>
<blockquote>
<p>Theorem<br />
The cohomology group $\H(\T;\O)$ is isomorphic to the set of (isomorphism classes of) extensions of $\T$ by $\O$.</p>
</blockquote>
<p>So we have two abelian groups $\T$ and $\O$. From these we can form several different extensions, some of which are equivalent. If we look at these extensions of abelian groups under our notion of equivalence, they themselves form an abelian group. As such, there must be some notion of addition of extensions recoverable from this identification with $\H(\T;\O)$. We can take 2 extensions (2 abelian groups), and add them in a well-behaved way in order to produce a new abelian group that is also an extension. That really caught me off guard when I was reading the paper. The last thing we do will be to describe this notion of addition.</p>
<p>Let $E,E’$ be two extensions of $\T$ by $\O$ with associated cocycles $z,z’$. The isomorphism referenced in the above theorem works by sending $E\mapsto z + \B(\T;\O)$ so whatever $E+E’$ is, we should have $E+E’\mapsto z+z’$. Thus, $E+E’$ is (up to isomorphism of extensions) simply just the extension with associated cocycle $z+z’$. This is perhaps unsurprising in hindsight, but is still a notion one might not think to consider before coming across this cohomology group. If you would like a more group-theoretic description of $E+E’$ instead of the one I gave in terms of cocycles, then check out page 802 of <a href="https://pdfs.semanticscholar.org/b44b/eb7ff396be62e548e4a6dc39df0bdf65e593.pdf">the paper</a> <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.</p>
<h1 id="final-words">Final Words</h1>
<p>The ideas appearing in this post <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> belong to the field of group cohomology, which looks to be pretty interesting. It’s my understanding that there are many types of cohomologies out there in use in different fields of mathematics, and I believe cohomology grew out of topology. There, you are interested in characterizing the holes of some space by looking at which loops can be contracted to a point <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>. It’s from considerations here that the terms cocycle and coboundary got their names.</p>
<p>Finally, to satisfy any last curiosities, it is known that $\H(\T;\O)\simeq\Z_{10}$ with an isomorphism $f:\Z_{10}\rightarrow\H(\T;\O)$ determined by $f(1)=z+\B(\T;\O)$ where $z$ is the carrying cocycle (associated to $\Z_{100}$) we started this post off with. This means that every extension of $\T$ by $\O$ arises as some multiple of $\Z_{100}$ in the sense of (repeated) extension addition.</p>
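<p>This multiple-of-$\Z_{100}$ picture can be probed computationally: in the extension built from $k$ times the carrying cocycle, the order of the element $\rep01$ distinguishes the groups we get. A hedged Python sketch, with representation and names of my own:</p>

```python
def order_of_unit(k, n=10):
    """Order of [0|1] in the extension built from k times the carrying cocycle
    (symbols stored as (tens, ones) pairs mod n)."""
    z = lambda b1, b2: (k if b1 + b2 >= n else 0) % n
    x = (0, 0)
    for steps in range(1, n * n + 1):
        a, b = x
        x = ((a + z(b, 1)) % n, (b + 1) % n)   # add [0|1] once
        if x == (0, 0):
            return steps

assert order_of_unit(1) == 100   # the carrying cocycle rebuilds Z_100
assert order_of_unit(0) == 10    # the trivial cocycle gives Z_10 x Z_10
assert order_of_unit(5) == 20    # 5 * carry yields an element of order 20
```

Each wrap of the ones digit deposits $k$ into the tens, so the order of $\rep01$ is $10$ times the order of $k$ in $\Z_{10}$, matching the $\H(\T;\O)\simeq\Z_{10}$ picture.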
<div class="footnotes">
<ol>
<li id="fn:1">
<p>With a twist, the result can’t have a third digit. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>I’ll stick to the former notation because it’s more compact, but wanted both here because the latter is also common. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Previous exercise showed that cocycles give rise to abelian groups. It’s not hard to see that these groups are extensions of T by O <a href="#fnref:3" class="reversefootnote">↩</a> <a href="#fnref:3:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:4">
<p>the pdf also includes the condition that h(0)=0, but I’m pretty sure that is redundant <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>although it doesn’t say much in how they arrived at such a construction <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>cocycles and coboundaries and their quotient groups and whatnot <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>in particular, if a loop can’t be contracted to a point then this indicates the existence of some hole in the way of the would-be contraction. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>

Algebra Part I (2017-09-12): https://nivent.github.io/blog/group-intro

<p>I have ideas for <a href="../addition">a couple</a> posts I want to write, but unfortunately, they both will require some level of familiarity with abstract algebra, and I don’t want to just assume the reader has the necessary prereq and then go on writing them. Instead, I’ve given myself the ambitious <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> goal of introducing most of the relevant algebra (spoiler: <sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>) in a series of blog posts beginning with this one on group theory.</p>
<blockquote>
<p>Bit of a Disclaimer<br />
I can’t possibly mention everything on a particular subject in one post, and I am not a particular fan of writing insanely long posts, so some things have to be cut. In particular, I aim to introduce most of the important topics in each subject without necessarily doing a deep dive, and while I will try to mention specific examples of things, I won’t spend too much time looking at them closely. It will be up to you to take the time to make sure the example makes sense. Because of this, I’ll try to include exercises that should be good checks of understanding. Finally, as always, things are presented according to my tastes and according to whatever order they happen to pop into my head; hence, they are not necessarily done the usual way.</p>
</blockquote>
<h1 id="whats-a-group">What’s a group?</h1>
<p>The natural place to start is with the definition of a group. Broadly speaking, groups follow two important themes in mathematics. These are the idea that, in math, we like to study collections of objects that possess some kind of structure, and the idea that symmetry is often beneficial to doing mathematics. With that said, a group is intuitively a collection of symmetries <sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> where you can think of a symmetry as some action <sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> on an object that preserves its shape. We’ll see this more explicitly with our first example of a group.</p>
<p>Before we get to a formal definition, let’s look at <script type="math/tex">D_8</script>, the group of symmetries of a square. These are all the actions you can perform on a square that leave it visually unchanged. To help make sense of what they are, we’ll visualize them using a square with labelled vertices.</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/d8.jpeg" width="200" height="200" />
</center>
<p>Above is our square starting out. The simplest symmetry we can apply to it is the “do nothing” symmetry. However, that’s not a particularly exciting thing to do, so we spend a little more time thinking about how we can reposition the square, and decide to try rotating it 90 degrees (counterclockwise). After thinking about it some more, we realize we could also flip the square about its main diagonal (from 1 to 3)</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/rot.jpeg" width="200" height="200" />
<img src="https://nivent.github.io/images/blog/group-intro/diag.jpeg" width="200" height="200" />
</center>
<p>Now the interesting thing happens when we try to compose these. What if we rotate and then flip (left image), or flip and then rotate (right image)?</p>
<center>
<img src="https://nivent.github.io/images/blog/group-intro/fr.jpeg" width="200" height="200" />
<img src="https://nivent.github.io/images/blog/group-intro/rf.jpeg" width="200" height="200" />
</center>
<p>The first thing we notice is that these images are different, so the order of symmetries matters. The second thing we may notice is that the image on the left is the previous right image (the flip) rotated 270 degrees, while the image on the right is the previous left image (the rotation) flipped across the other diagonal. Letting <script type="math/tex">R</script> denote a <script type="math/tex">90\deg</script> rotation, <script type="math/tex">F</script> denote a flip across the main diagonal, and <script type="math/tex">F'</script> denote a flip across the other diagonal, symbolically, we have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
FR = R^3F && RF = F'R
\end{align*} %]]></script>
<p>where <script type="math/tex">AB</script> denotes the result of first applying <script type="math/tex">B</script> and then applying <script type="math/tex">A</script>. Now, there is more that can be said about this group, but I’ll leave the exploring to you.</p>
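<p>These relations are easy to verify mechanically by modeling each symmetry as a permutation of the vertices. In the Python sketch below I relabel the vertices $0$–$3$ in counterclockwise order (the post’s pictures use $1$–$4$), so the exact permutations are my own choice of convention:</p>

```python
def compose(p, q):
    """(p after q)(i) = p(q(i)): apply q first, then p, matching the AB convention."""
    return tuple(p[q[i]] for i in range(4))

R  = (1, 2, 3, 0)        # rotate 90 degrees counterclockwise: vertex i -> i + 1
F  = (0, 3, 2, 1)        # flip across the main diagonal (fixes vertices 0 and 2)
F2 = (2, 1, 0, 3)        # flip across the other diagonal (fixes vertices 1 and 3)

R3 = compose(R, compose(R, R))
assert compose(F, R) == compose(R3, F)    # FR = R^3 F
assert compose(R, F) == compose(F2, R)    # RF = F' R
```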
<p>From the above example, we can see some properties we might want in order to call something a collection of symmetries (i.e. a group). First, there should be a “do nothing” symmetry that just leaves things unchanged. Second, symmetries don’t necessarily have to commute (that is, <script type="math/tex">AB\neq BA</script> in general), but should always be composable. I didn’t highlight these last two, but we should also expect that applying symmetries one after the other always gets you the same thing as long as you apply them in the same order (e.g. <script type="math/tex">A(BC)</script> and <script type="math/tex">(AB)C</script> should be the same), and we want to be able to undo a symmetry. Putting all these together, we get the following definition.</p>
<blockquote>
<p>Definition<br />
A <strong>group</strong> <script type="math/tex">(G,\star)</script> is a set <script type="math/tex">G</script> together with an operation <script type="math/tex">\star:G\times G\rightarrow G</script> satisfying the following for all <script type="math/tex">a,b,c\in G</script><br />
<script type="math/tex">% <![CDATA[
\begin{align*}
&\bullet a\star(b\star c)=(a\star b)\star c && \text{Associativity}\\
&\bullet \text{there exists an }e\in G\text{ such that }g\star e=g=e\star g\text{ for all }g\in G && \text{Identity}\\
&\bullet \text{there exists a }g\in G\text{ such that }g\star a=e=a\star g && \text{Inverses}
\end{align*} %]]></script><br />
If it turns out that additionally, <script type="math/tex">a\star b=b\star a</script> for all <script type="math/tex">a,b\in G</script>, then <script type="math/tex">G</script> is called an <strong>abelian group</strong> <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></p>
</blockquote>
<p>That’s all there is to it. 3 simple conditions, and we’ll see that groups exhibit some well-behaved properties <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>. Most of the time when talking about a specific group, the underlying operation will be understood, and so I’ll just refer to the group as <script type="math/tex">G</script> instead of <script type="math/tex">(G,\star)</script>. Furthermore, the group operation is usually denoted <script type="math/tex">ab</script> instead of <script type="math/tex">a\star b</script> cause mathematicians are lazy. If the group is abelian, then <script type="math/tex">a+b</script> is also common. Now that we have a definition, try to answer as many of these as you can.</p>
<blockquote>
<p>Question<br />
Are the following groups, abelian groups, or neither?</p>
<ul>
<li>$D_8$</li>
<li>$(\Z,+)$</li>
<li>$(\Z,*)$</li>
<li>$(\R,+)$</li>
<li>$(\R,*)$</li>
<li>$(\R-\{0\},*)$</li>
<li>$(M_2(\R),+)$ the set of <script type="math/tex">2\times2</script> matrices under addition</li>
<li>$(\Q-\{0\},*)$</li>
<li>$D_{2n}$ the set of symmetries of a regular <script type="math/tex">n</script>-gon under composition</li>
<li>$(2\Z, +)$ the even numbers under addition</li>
<li>$(\Z_{12}, +)$ the integers mod 12 under addition</li>
<li>$(\Z_7-\{0\}, *)$ the integers mod 7 (except 0) under multiplication</li>
<li>the empty set under any operation you like</li>
<li>{e} the singleton consisting of only the identity</li>
</ul>
</blockquote>
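<p>For the finite candidates on this list, the axioms can be checked by brute force. Here is a small Python check of two entries (the encoding of residue classes as integers is mine):</p>

```python
# (Z_7 - {0}, *): closed, commutative, and every element has an inverse
Z7 = set(range(1, 7))
assert all((a * b) % 7 in Z7 for a in Z7 for b in Z7)             # closure
assert all(any((a * b) % 7 == 1 for b in Z7) for a in Z7)         # inverses
assert all((a * b) % 7 == (b * a) % 7 for a in Z7 for b in Z7)    # abelian

# (Z_12, +): closed under addition mod 12, and 12 - a inverts a
Z12 = set(range(12))
assert all((a + b) % 12 in Z12 for a in Z12 for b in Z12)
assert all((a + (12 - a) % 12) % 12 == 0 for a in Z12)
```

Note the deleted zero in the first example: $0$ has no multiplicative inverse, so including it would break the group axioms.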
<h1 id="basic-properties-of-groups">Basic Properties of Groups</h1>
<p>Alright, now that we know what a group is, let’s see some of the benefits of studying them. The most obvious one is generality. If we show something is true for an arbitrary group <script type="math/tex">G</script>, then we automatically know it’s true for the integers or the reals or matrices or what have you. So to start off, let’s prove some basic facts about groups.</p>
<blockquote>
<p>Theorem<br />
The identity element of a group is unique</p>
</blockquote>
<div class="proof3">
Pf: Let $$G$$ be a group, and suppose that both $$e$$ and $$f$$ are identity elements. That is, for any $$a\in G$$, we have $$ae=a=ea$$ and $$af=a=fa$$. Hence, the theorem follows from<br />
$$\begin{align*}
e=ef=f
\end{align*}$$
where we used the fact that $$f$$ is the identity on the left equality, and the fact that $$e$$ is the identity on the right equality. $$\square$$
</div>
<p>The above theorem maybe isn’t too surprising. It basically says that there’s only one way to do nothing. The next theorem maybe also isn’t surprising either.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">a\in G</script> be an element of a group. Then, <script type="math/tex">a</script> has a unique inverse.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<p>Now that we know that inverses are unique, we’ll denote the inverse of <script type="math/tex">a\in G</script> by <script type="math/tex">a^{-1}</script> (or <script type="math/tex">-a</script> if <script type="math/tex">G</script> is abelian). We’ll see a couple more proofs, and then we’ll get a look at something maybe a little less abstract.</p>
<blockquote>
<p>Theorem (Socks and Shoes <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>)<br />
You put socks on before wearing shoes, but you have to remove your shoes before you can remove your socks. Symbolically, in a group <script type="math/tex">G</script>, for any <script type="math/tex">a,b\in G</script>, we have <script type="math/tex">(ab)^{-1}=b^{-1}a^{-1}</script></p>
</blockquote>
<div class="proof3">
Pf: <br />
$$\begin{align*}
(ab)(b^{-1}a^{-1}) &= a(bb^{-1})a^{-1}\\ &= aea^{-1}\\ &= aa^{-1} \\ &= e
\end{align*}$$<br />
Since inverses are unique, we must have $$(ab)^{-1}=b^{-1}a^{-1}$$.$$\square$$
</div>
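<p>If you want to watch socks-and-shoes hold in a concrete nonabelian setting, here is a quick computational sketch (my own addition, not part of the original post) checking it for every pair of permutations of three symbols:</p>

```python
# Verify (ab)^{-1} = b^{-1}a^{-1} in the symmetric group S_3.
# A permutation is a tuple p where p[i] is the image of i.
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
for a in S3:
    for b in S3:
        ab = compose(a, b)
        assert compose(ab, compose(inverse(b), inverse(a))) == identity
        assert inverse(ab) == compose(inverse(b), inverse(a))
```

<p>Note that S_3 is nonabelian, so the order b^{-1}a^{-1} genuinely matters here.</p>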
<blockquote>
<p>Theorem<br />
If <script type="math/tex">(G,\star_G)</script> and <script type="math/tex">(H,\star_H)</script> are groups, then we can form the group <script type="math/tex">G\times H</script> of pairs of elements of <script type="math/tex">G</script> and <script type="math/tex">H</script> with product <script type="math/tex">(g_1,h_1)(g_2,h_2)=(g_1\star_Gg_2,h_1\star_Hh_2)</script></p>
</blockquote>
<div class="proof3">
Pf: Exercise to the reader. Verify the group properties.
</div>
<p>Note that the above group is called the <strong>direct product</strong> of <script type="math/tex">G</script> and <script type="math/tex">H</script>. It is sometimes also denoted <script type="math/tex">G\oplus H</script>, and called their <strong>direct sum</strong>. These notions are the same here, but differ if you start talking about the direct product/sum of infinite collections of groups; however, we won’t get into that here.</p>
<h1 id="structure-of-groups">Structure of Groups</h1>
<p>Let’s return to our <script type="math/tex">D_8</script> example. Recall that <script type="math/tex">R</script> denotes rotation by 90 degrees and <script type="math/tex">F</script> denotes a flip about the main diagonal. A fact that I will not prove here, but that you can spend some time convincing yourself of, is that <script type="math/tex">D_8</script> is generated by <script type="math/tex">F</script> and <script type="math/tex">R</script> in the following sense.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">G</script> be a group and <script type="math/tex">A\subset G</script> be a subset of <script type="math/tex">G</script>. Then, we say <script type="math/tex">G</script> is <strong>generated by</strong> <script type="math/tex">A</script> if every element of <script type="math/tex">G</script> is some (finite) product of elements of <script type="math/tex">A</script>. Furthermore, if <script type="math/tex">A</script> is finite <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>, then we say that <script type="math/tex">G</script> is <strong>finitely generated</strong>.</p>
</blockquote>
<p>The remark preceding the definition amounts to saying that any symmetry of a square is really just the result of a bunch of flips and rotations. I mention this just out of curiosity. I had planned on using it as justification for giving a more explicit description of <script type="math/tex">D_8</script>, but unfortunately, the description would be longer than I’m willing to type out, so let’s look at a different group instead.</p>
<p>Our new star group is <script type="math/tex">\Z_4</script>, which is the integers under addition (mod 4). This group is special for a few reasons, but most of these are a result of it being generated by a single element, which motivates the following definition.</p>
<blockquote>
<p>Definition<br />
A group <script type="math/tex">G</script> is called <strong>cyclic</strong> if it is generated by a single element. That is, it is cyclic if there exists some <script type="math/tex">g\in G</script> s.t. every <script type="math/tex">a\in G</script> can be written in the form <script type="math/tex">g^n</script> for some integer <script type="math/tex">n</script>.</p>
</blockquote>
<p>In order to better understand <script type="math/tex">\Z_4</script>’s structure, we will look at its “multiplication” table.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{c | c | c | c | c}
\Z_4 & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 2 & 3 \\ \hline
1 & 1 & 2 & 3 & 0 \\ \hline
2 & 2 & 3 & 0 & 1 \\ \hline
3 & 3 & 0 & 1 & 2 \\
\end{array} %]]></script>
<p>Now, let’s consider the group <script type="math/tex">\Z_{10}^\times</script>. This is the integers (mod 10) that are coprime to 10, under multiplication. Hence, its multiplication table looks like</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{c | c | c | c | c}
\Z_{10}^\times & 1 & 3 & 7 & 9 \\ \hline
1 & 1 & 3 & 7 & 9 \\ \hline
3 & 3 & 9 & 1 & 7 \\ \hline
7 & 7 & 1 & 9 & 3 \\ \hline
9 & 9 & 7 & 3 & 1 \\
\end{array} %]]></script>
<p>Now, you might look at this and ask “what’s the point?” since we’re just looking at some random tables. However, an important observation is that things like <script type="math/tex">\{0,1,2,3\}</script> and <script type="math/tex">\{1,3,7,9\}</script> are just symbols. It doesn’t really matter what we call the elements of the group; all that’s important is how they relate to each other. As an exercise in this way of thinking, let’s relabel these tables using the following mappings</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
&\Z_4 && \Z_{10}^\times\\
&0\mapsto a && 1\mapsto a\\
&1\mapsto b && 3\mapsto b\\
&2\mapsto c && 9\mapsto c\\
&3\mapsto d && 7\mapsto d
\end{align*} %]]></script>
<p>If we do that and remake the tables we will get the following for <script type="math/tex">\Z_4</script></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{ c | c | c | c | c}
\Z_4 & a & b & c & d \\ \hline
a & a & b & c & d \\ \hline
b & b & c & d & a \\ \hline
c & c & d & a & b \\ \hline
d & d & a & b & c \\
\end{array} %]]></script>
<p>Repeat this process for <script type="math/tex">\Z_{10}^\times</script> and look at the table you get. If everything went as planned, you will see you ended up with exactly the same table. Since we said that the important thing was not the symbols but how they interacted with each other, this gives us complete justification in saying that, in some sense, <script type="math/tex">\Z_4</script> and <script type="math/tex">\Z_{10}^\times</script> are the same group. This is an idea we will now make rigorous via the notion of structure-preserving maps.</p>
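<p>If you’d rather let a computer do the relabeling, the following sketch (my addition, using plain Python dictionaries as Cayley tables) performs the check:</p>

```python
# Build both Cayley tables, relabel them with the dictionaries from the post,
# and check that the relabeled tables are literally identical.
z4 = [0, 1, 2, 3]                   # Z_4 under addition mod 4
z10x = [1, 3, 7, 9]                 # Z_10^x under multiplication mod 10

table_z4 = {(a, b): (a + b) % 4 for a in z4 for b in z4}
table_z10x = {(a, b): (a * b) % 10 for a in z10x for b in z10x}

relabel_z4 = {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
relabel_z10x = {1: 'a', 3: 'b', 9: 'c', 7: 'd'}

abstract_z4 = {(relabel_z4[x], relabel_z4[y]): relabel_z4[v]
               for (x, y), v in table_z4.items()}
abstract_z10x = {(relabel_z10x[x], relabel_z10x[y]): relabel_z10x[v]
                 for (x, y), v in table_z10x.items()}

assert abstract_z4 == abstract_z10x   # same table, so "the same group"
```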
<blockquote>
<p>Definition<br />
Let <script type="math/tex">(G,\star_G)</script> and <script type="math/tex">(H,\star_H)</script> be two groups. A <strong>homomorphism</strong> or <strong>group map</strong> <script type="math/tex">f:G\rightarrow H</script> is a function with the property that for any <script type="math/tex">a,b\in G</script>, we have <script type="math/tex">f(a\star_Gb)=f(a)\star_Hf(b)</script>. If, furthermore, <script type="math/tex">f</script> is injective, then we call it an <strong>embedding</strong> of <script type="math/tex">G</script> into <script type="math/tex">H</script>, and if <script type="math/tex">f</script> is bijective, then it is called an <strong>isomorphism</strong> and we say that <script type="math/tex">G</script> and <script type="math/tex">H</script> are <strong>isomorphic</strong> groups and denote this <script type="math/tex">G\simeq H</script>.</p>
</blockquote>
<p>In essence, homomorphisms let us relate the structures of two groups by saying that they are doing something similar. If the homomorphism is injective, then it is essentially saying that a copy of <script type="math/tex">G</script> lives inside of <script type="math/tex">H</script>. An example of this is the following</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
f:&\Z_3 &\longrightarrow& \Z_{15}\\
&a &\longmapsto& 5a
\end{matrix} %]]></script>
<p>It is not hard to verify that this is an injective homomorphism, so it lets us realize <script type="math/tex">\Z_3</script> as sitting inside of <script type="math/tex">\Z_{15}</script> in the form <script type="math/tex">f(\Z_3)=\{0,5,10\}</script>. By looking at their multiplication tables, we showed earlier that <script type="math/tex">\Z_4\simeq\Z_{10}^\times</script>. To get a better handle on homomorphisms and see why they are the natural type of function to consider for groups, we’ll prove some quick theorems.</p>
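<p>Verifications like this are mechanical enough to be worth seeing once in code. The sketch below (my addition) checks that a ↦ 5a really is an injective homomorphism from Z_3 into Z_15:</p>

```python
# f : Z_3 -> Z_15, f(a) = 5a. Check it's a homomorphism and injective.
f = lambda a: (5 * a) % 15
Z3 = range(3)

# homomorphism: f(a + b mod 3) = f(a) + f(b) mod 15
assert all(f((a + b) % 3) == (f(a) + f(b)) % 15 for a in Z3 for b in Z3)

# injective, with image {0, 5, 10} as claimed
image = {f(a) for a in Z3}
assert len(image) == 3 and image == {0, 5, 10}
```

<p>The reason the homomorphism property works out is that 5 times 3 is 15, which is 0 mod 15, so reducing mod 3 before multiplying by 5 changes nothing mod 15.</p>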
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Let <script type="math/tex">e_G,e_H</script> be the identities of <script type="math/tex">G</script> and <script type="math/tex">H</script>, respectively. Then, <script type="math/tex">f(e_G)=e_H</script></p>
</blockquote>
<div class="proof3">
Pf: Fix any $$g\in G$$ so $$f(g)=f(ge_G)=f(g)f(e_G)$$. If we multiply both sides (on the left) by $$f(g)^{-1}$$, then this equation becomes $$e_H=f(g)^{-1}f(g)=f(g)^{-1}f(g)f(e_G)=f(e_G)$$ $$\square$$
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Then, for any <script type="math/tex">a\in G</script> and <script type="math/tex">n\in\mathbb Z</script>, we have <script type="math/tex">f(a^n)=f(a)^n</script>.</p>
</blockquote>
<div class="proof3">
Pf: We will show that $$f(a^{-1})=f(a)^{-1}$$ and the theorem will follow from an easy induction argument I won't bother doing. This case is an immediate consequence of $$f(a)f(a^{-1})=f(aa^{-1})=f(e_G)=e_H$$$$\square$$
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">G</script> be a group and fix some <script type="math/tex">g\in G</script>. The function <script type="math/tex">f:G\rightarrow G</script> defined by <script type="math/tex">f(x)=gx</script> is a bijection, but not in general a homomorphism.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be an isomorphism. Then, <script type="math/tex">f^{-1}:H\rightarrow G</script> is also an isomorphism.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\simeq</script> is an equivalence relation on the class of groups.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<h1 id="subgroups">Subgroups</h1>
<p>When introducing embeddings, I mentioned that they give us a way of viewing one group inside of another. This idea is formalized via the notion of subgroups, which are pretty much exactly what they sound like.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\subseteq G</script> be a subset of a group <script type="math/tex">G</script>. We say <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script>, denoted <script type="math/tex">H\le G</script>, if <script type="math/tex">H</script> is itself a group under the operation of <script type="math/tex">G</script>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism. Then, <script type="math/tex">f(G)\subseteq H</script> is a subgroup of <script type="math/tex">H</script>. Furthermore, if <script type="math/tex">f</script> is injective, then <script type="math/tex">f(G)\simeq G</script>.</p>
</blockquote>
<div class="proof3">
Pf: Let $$I:=f(G)$$. To show that $$I$$ is a group, we need to show it contains the identity, has inverses, and its multiplication is associative. Well, $$f(e_G)=e_H$$ and $$f(e_G)\in I$$ by definition, so we're good there. Furthermore, any element of $$I$$ can be written in the form $$f(g)$$ for some $$g\in G$$. Hence, $$f(g^{-1})=f(g)^{-1}$$ is in $$I$$ as well, so we have inverses. Associativity follows from the fact that $$I$$'s multiplication is $$H$$'s multiplication, which is associative. Finally, $$f\mid_I$$ is surjective by definition and so an isomorphism when $$f$$ is injective. $$\square$$
</div>
<p>One thing worth mentioning is that while the definition of a subgroup requires that it first be a subset, we usually ignore this part and call any <script type="math/tex">H</script> embeddable into <script type="math/tex">G</script> a subgroup of <script type="math/tex">G</script>. Now that we have this notion of a subgroup, let’s see what we can do with it.</p>
<blockquote>
<p>Theorem (2-step subgroup test)<br />
Let <script type="math/tex">H\subseteq G</script> be a non-empty subset of a group. Then, <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script> if it is closed under multiplication, and contains inverses for each element.</p>
</blockquote>
<div class="proof3">
Pf: Multiplication on $$H$$ inherits associativity from multiplication on $$G$$, and it has inverses by assumption. The group operation of $$H$$ is well-defined since it is closed, so the only thing to verify is that $$H$$ contains the identity. $$H$$ is non-empty so pick some $$h\in H$$. By assumption $$h^{-1}\in H$$ as well. Since $$H$$ is closed under multiplication, $$hh^{-1}\in H$$ so $$H$$ contains the identity. $$\square$$
</div>
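<p>The two-step test translates directly into code. Here’s a small sketch (my addition, with the ambient group handed in as an operation plus an inverse function) applied to subsets of Z_10^x:</p>

```python
def is_subgroup(H, op, inv):
    # Two-step test: non-empty, closed under the operation,
    # and closed under taking inverses.
    H = set(H)
    return (bool(H)
            and all(op(a, b) in H for a in H for b in H)
            and all(inv(a) in H for a in H))

# Z_10^x = {1, 3, 7, 9} under multiplication mod 10
mul10 = lambda a, b: (a * b) % 10
inv10 = lambda a: next(b for b in (1, 3, 7, 9) if (a * b) % 10 == 1)

assert is_subgroup({1, 9}, mul10, inv10)        # {1, 9} is a subgroup
assert not is_subgroup({1, 3}, mul10, inv10)    # 3*3 = 9 escapes {1, 3}
```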
<blockquote>
<p>Theorem (1-step subgroup test)<br />
Let <script type="math/tex">H\subseteq G</script> be a non-empty subset of <script type="math/tex">G</script>. Then, <script type="math/tex">H</script> is a subgroup of <script type="math/tex">G</script> if for all <script type="math/tex">a,b\in H</script>, we have <script type="math/tex">ab^{-1}\in H</script> as well.</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">G</script> be a group and fix some element <script type="math/tex">a\in G</script>. We say <script type="math/tex">\langle a\rangle=\{a^n:n\in\Z\}</script> is the <strong>cyclic group generated by <script type="math/tex">a</script></strong>.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\langle a\rangle\le G</script></p>
</blockquote>
<div class="proof3">
Pf: Pick any two elements, $a^n,a^m\in\gen a$. Then, $(a^n)(a^m)^{-1}=(a^n)(a^{-m})=a^{n-m}\in\gen a$. Furthermore, $\gen a$ is visibly non-empty, so it's a subgroup by the 1-step test. $\square$
</div>
<p>So as an example, in <script type="math/tex">\Z</script>, <script type="math/tex">\langle3\rangle=3\Z</script> the multiples of <script type="math/tex">3</script>. This brings up a good source of confusion. When considering <script type="math/tex">\Z</script> as a group under addition, <script type="math/tex">3^n</script> is not <script type="math/tex">3</script> raised to the <script type="math/tex">n</script>th power, but instead <script type="math/tex">3</script> times <script type="math/tex">n</script>. Luckily, <script type="math/tex">\Z</script> is abelian so this is normally written as <script type="math/tex">3n</script>, but I just wanted to clarify.</p>
<p>One important notion in group theory, one that unfortunately plays a minor role in this post, is that of the order of an element. For a group <script type="math/tex">G</script>, the order of the group is simply its size. For an element <script type="math/tex">a\in G</script>, its order, denoted $|a|$, is the smallest positive exponent $n$ s.t. <script type="math/tex">a^n=e</script>. If no such <script type="math/tex">n</script> exists, then <script type="math/tex">a</script> is said to have infinite order. For a finite group, every element has some finite order (why?). Calling this the order of <script type="math/tex">a</script> is justified by the following.</p>
<blockquote>
<p>Exercise<br />
<script type="math/tex">|a|=|\gen a|</script></p>
</blockquote>
<p>Note that a (finite) cyclic group is one where the order of some element is the order of the group. Furthermore, since I didn’t make this an exercise before, show that any cyclic group is abelian.</p>
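<p>The exercise above is also easy to spot-check by machine. This sketch (my addition) computes |a| and |⟨a⟩| for every element of Z_7^x and confirms they agree:</p>

```python
def order(a, op, e):
    # smallest n >= 1 with a^n = e
    x, n = a, 1
    while x != e:
        x, n = op(x, a), n + 1
    return n

def generated(a, op, e):
    # the cyclic subgroup <a> = {a, a^2, a^3, ...}
    seen, x = set(), a
    while x not in seen:
        seen.add(x)
        x = op(x, a)
    return seen

mul7 = lambda x, y: (x * y) % 7      # Z_7^x = {1,...,6} under * mod 7
for a in range(1, 7):
    assert order(a, mul7, 1) == len(generated(a, mul7, 1))   # |a| = |<a>|

orders = {a: order(a, mul7, 1) for a in range(1, 7)}
assert orders == {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```

<p>Since 3 and 5 have order 6, which is the order of the whole group, each generates Z_7^x, so Z_7^x is cyclic.</p>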
<blockquote>
<p>Definition<br />
Let $f:G\rightarrow H$ be a group homomorphism. We define the <strong>kernel</strong> of $f$ as <script type="math/tex">\ker f=\{g\in G:f(g)=e\}</script> the set of elements mapped to the identity.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a group homomorphism. Then, $\ker f\le G$</p>
</blockquote>
<div class="proof3">
Pf: Exercise for the reader
</div>
<p>Since we now know some stuff about homomorphisms and subgroups, the majority of what follows will focus on proving two main theorems: Lagrange’s Theorem and the First Isomorphism Theorem. After that, I will mention something that will be useful for one of the things I wanna talk about in a future post.</p>
<h1 id="cosets">Cosets</h1>
<p>The goal for this section is to find a way to generalize modular arithmetic to arbitrary groups. In modular arithmetic we can say things like $7\equiv4\pmod3$ because $3\mid(7-4)$, but if you generalize division to groups in the obvious way, you don’t get anything useful; you’d end up with any two elements being equivalent as long as you modded out by something non-zero (why?). Because of this, instead of building off of division, we will follow the idea that modding out by $3$ is a way of treating $3$ as being $0$; this choice will manifest itself in the importance placed on kernels.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\le G</script> be a subgroup. We say that <script type="math/tex">H</script> is <strong>normal</strong> if it is the kernel of some homomorphism. We denote this <script type="math/tex">H\trianglelefteq G</script>.</p>
</blockquote>
<p>Returning to our <script type="math/tex">7\equiv4\pmod3</script> example, from the perspective of $3$ being $0$ (i.e. $3\Z$ being normal), this equivalence is really expressing that $7=4+3\equiv4+0=4$. We can take this a step further by writing <script type="math/tex">7=1+2*3</script> and $4=1+1*3$ which makes it apparent that they are equivalent because they are both $1$ more than a multiple of $3$. In the context of general groups, if we are going to treat some subgroup as being $0$, then any elements that are a fixed amount more than members of the subgroup should similarly be considered equivalent.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">H\le G</script> be a subgroup, and fix some element <script type="math/tex">a\in G</script>. A <strong>(left) coset</strong> of <script type="math/tex">H</script> is a set of the form <script type="math/tex">aH=\{ah:h\in H\}</script></p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove or disprove. For any subgroup <script type="math/tex">H\le G</script> and element <script type="math/tex">a\in G</script>, we have that <script type="math/tex">aH=Ha</script> where <script type="math/tex">Ha:=\{ha:h\in H\}</script> is a right coset.</p>
</blockquote>
<p>Hopefully you did the exercise. It turns out to be false in general, but miraculously, it is true for normal subgroups.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script> be a normal subgroup. Then, every left coset of <script type="math/tex">H</script> is also a right coset (and vice versa).</p>
</blockquote>
<div class="proof3">
Pf: Let $f:G\rightarrow K$ be a homomorphism with $\ker f=H$, and fix any $a\in G$. Then, an arbitrary element of $aH$ has the form $ah$ where $h\in H$. Note that $ah=aha^{-1}a$. To complete the proof, we see that $f(aha^{-1})=f(a)f(h)f(a)^{-1}=f(a)ef(a)^{-1}=e$ so $aha^{-1}\in\ker f=H$ and so $ah=(aha^{-1})a\in Ha$. This shows that $aH\subseteq Ha$. The other direction can be shown analogously, so $aH=Ha$ for all $a$, which is more than sufficient to prove the claim. $$\square$$
</div>
<p>The above theorem goes to show that normal subgroups are indeed special, and as it turns out, the converse of the theorem above is true as well <sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>, so this gives another way of characterizing normal subgroups. Now, recall that we want to come up with some notion of “modding out by a subgroup”, and so we want a way of saying when two elements of the big group are equivalent. We defined cosets with the idea that all of their members should be equivalent, and so the following shouldn’t be surprising.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\le G</script> be a (not necessarily normal) subgroup of <script type="math/tex">G</script>. Then, the relation <script type="math/tex">\sim</script> defined by <script type="math/tex">x\sim y</script> iff <script type="math/tex">xH=yH</script> is an equivalence relation.</p>
</blockquote>
<div class="proof3">
Pf: We need to show that $\sim$ is reflexive, symmetric, and transitive. It's obviously reflexive and symmetric, so we'll focus on transitivity. Suppose $x\sim y$ and $y\sim z$. Then, $xH=yH=zH$ so we have $x\sim z$, and we're done. $$\square$$
</div>
<blockquote>
<p>Corollary<br />
The cosets of <script type="math/tex">H</script> partition <script type="math/tex">G</script></p>
</blockquote>
<p>Now, we can prove our first major result of the post.</p>
<blockquote>
<p>Theorem (Lagrange)<br />
Let <script type="math/tex">H\le G</script> be a subgroup. Then, <script type="math/tex">|H|</script> divides <script type="math/tex">|G|</script></p>
</blockquote>
<div class="proof3">
Pf: Pick two cosets $aH$ and $bH$ of $H$ in $G$. Then, the map $ah\mapsto (ba^{-1})ah$ is injective (a consequence of cancellation, since every element has an inverse), and you get a similar injective map from $bH\rightarrow aH$. Thus, $|aH|=|bH|$ so all cosets have the same order (size). Since the cosets of $H$ partition $G$, assuming there are $k$ such cosets, we have $|G|=k|H|$. $$\square$$
</div>
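<p>The counting argument in this proof is concrete enough to watch happen. Here’s a sketch (my addition) listing the cosets of ⟨4⟩ = {0, 4, 8} inside Z_12:</p>

```python
G = list(range(12))              # Z_12 under addition mod 12
H = [0, 4, 8]                    # the subgroup <4>

# each left coset a + H, stored as a sorted tuple so duplicates collapse
cosets = {tuple(sorted((a + h) % 12 for h in H)) for a in G}

assert all(len(c) == len(H) for c in cosets)        # all cosets same size
assert sorted(x for c in cosets for x in c) == G    # they partition G
assert len(G) == len(cosets) * len(H)               # |G| = [G:H] * |H|
```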
<blockquote>
<p>Corollary<br />
The order of an element of a group divides the order of the group.</p>
</blockquote>
<blockquote>
<p>Corollary<br />
Every group of prime order is cyclic</p>
</blockquote>
<p>Most everything after the definition of a coset wasn’t strictly needed for this, but is still good to know. We finally say what it means to mod out by a subgroup.</p>
<blockquote>
<p>Definition<br />
The <strong>index</strong> of <script type="math/tex">H</script> in <script type="math/tex">G</script> is the number of (left) cosets of <script type="math/tex">H</script> in <script type="math/tex">G</script> and is denoted <script type="math/tex">[G:H]</script> or <script type="math/tex">|G:H|</script>.</p>
</blockquote>
<blockquote>
<p>Definition<br />
Let $H\trianglelefteq G$ be a normal subgroup. We define the <strong>quotient group</strong> $G/H$ to be the set of left cosets of $H$ together with the multiplication operation $(aH)(bH)=(ab)H$.</p>
</blockquote>
<p>When we mod out by a subgroup, we treat elements of the same coset as being equivalent, so instead of operating on individual elements, we operate on cosets instead. In practice, we can usually find a nice group that a quotient is isomorphic to, and so work with it instead of the quotient directly. This way, we can have quotients but still deal with elements instead of cosets. As a few examples</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\frac{\Z\times\Z}{\Z\times\{e\}}\simeq\Z && \frac{\Z}{n\Z}\simeq\Z_n
\end{matrix} %]]></script>
<p>Now that we have this defined, we need to show that this is a good definition.</p>
<blockquote>
<p>Theorem<br />
Multiplication of the quotient group is well-defined. That is, if <script type="math/tex">xH=x'H</script> and <script type="math/tex">yH=y'H</script>, then <script type="math/tex">(xy)H=(x'y')H</script>.</p>
</blockquote>
<div class="proof3">
Pf: Suppose $$H\trianglelefteq G$$ and pick $$x,x',y,y'\in G$$ s.t. $$xH=x'H$$ and $$yH=y'H$$. Then, $$(xy)H=x(yH)=x(Hy)=(xH)y=(x'H)y=x'(Hy)=x'(Hy')=x'(y'H)=x'y'H$$. $\square$
</div>
<p>The proof was short, but notice that we could only switch between left and right cosets so freely because we assumed that <script type="math/tex">H</script> was normal. If <script type="math/tex">H</script> is not normal, then this theorem is false. Now that we have well-definedness, the really crucial thing to show is left up to you.</p>
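<p>Since well-definedness is exactly the kind of statement that’s easy to get wrong, here’s a brute-force check of it (my addition) for the subgroup ⟨3⟩ inside Z_12:</p>

```python
n = 12
H = {0, 3, 6, 9}                                   # the subgroup <3> in Z_12
coset = lambda a: frozenset((a + h) % n for h in H)

# whenever xH = x'H and yH = y'H, we need (x+y)H = (x'+y')H
for x in range(n):
    for xp in range(n):
        if coset(x) != coset(xp):
            continue
        for y in range(n):
            for yp in range(n):
                if coset(y) == coset(yp):
                    assert coset((x + y) % n) == coset((xp + yp) % n)
```

<p>Of course Z_12 is abelian, so every subgroup is normal and this check was guaranteed to pass; the interesting failures happen for non-normal subgroups of nonabelian groups.</p>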
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script>. Then, <script type="math/tex">G/H</script> is a group, and the <strong>quotient map</strong> <script type="math/tex">f:G\rightarrow G/H</script> defined by <script type="math/tex">f(x)=xH</script> is a surjective homomorphism with $\ker f=H$.</p>
</blockquote>
<div class="proof3">
Pf: (important) exercise to the reader.
</div>
<blockquote>
<p>Exercise<br />
Prove that a subgroup <script type="math/tex">H\le G</script> is normal iff <script type="math/tex">xH=Hx</script> for all <script type="math/tex">x\in G</script>. After this, prove that this is the case iff <script type="math/tex">x^{-1}Hx\subseteq H</script> for all <script type="math/tex">x\in G</script>.</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Prove that for an abelian group, every subgroup is normal.</p>
</blockquote>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">H\trianglelefteq G</script> be a normal subgroup of finite index, and fix any <script type="math/tex">a\in G</script>. Then, <script type="math/tex">a^{[G:H]}\in H</script>.</p>
</blockquote>
<div class="proof3">
Pf: Let $$f:G\rightarrow G/H$$ be the quotient map, and note that $$|G/H|=[G:H]$$. Thus, $$f(a)=aH$$ has order dividing $$[G:H]$$ by Lagrange, so $$f(a^{[G:H]})=(aH)^{[G:H]}=H$$ so $$a^{[G:H]}\in\ker f=H$$. $$\square$$
</div>
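<p>As a tiny illustration of this last theorem (my addition), take G = Z_10^x and H = {1, 9}, which has index 2:</p>

```python
G = [1, 3, 7, 9]                 # Z_10^x under multiplication mod 10
H = {1, 9}                       # a subgroup; normal since G is abelian
index = len(G) // len(H)         # [G:H] = 2, using Lagrange

# a^[G:H] lands in H for every a in G
assert all(pow(a, index, 10) in H for a in G)
```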
<h1 id="diagrams-et-al">Diagrams et al.</h1>
<p>In this section, I’m gonna include some pictures, but they’ll be different from the type of images I usually include. Here, I’ll use so-called commutative diagrams. A diagram is a collection of objects (i.e. groups) with directed arrows (i.e. homomorphisms) drawn between them; it makes things easier to discuss when you have several functions going between different groups. We say such a diagram commutes when any path along arrows from one group to another gives the same result <sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup>. This will make more sense when we see some.</p>
<p>I mentioned before that we would prove the first isomorphism theorem. Instead of proving it directly, we will derive it as a corollary of what is, in my opinion, a better theorem.</p>
<blockquote>
<p>Factor Theorem<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a homomorphism, and let <script type="math/tex">K\trianglelefteq G</script> be a normal subgroup with <script type="math/tex">K\subseteq\ker f</script>. Then, there exists a unique homomorphism <script type="math/tex">h:G/K\rightarrow H</script> such that <script type="math/tex">f</script> factors through <script type="math/tex">h</script> in the sense that the following diagram commutes<center>
<img src="https://nivent.github.io/images/blog/group-intro/factor.png" width="200" height="200" /></center>
That is, <script type="math/tex">f</script> is the composition of <script type="math/tex">h</script> and the quotient map. Furthermore, <script type="math/tex">h</script> is injective iff <script type="math/tex">K=\ker f</script>, and <script type="math/tex">h</script> is surjective iff <script type="math/tex">f</script> is surjective.</p>
</blockquote>
<div class="proof3">
Pf: Let $$f,G,H,K$$ be as in the statement of the theorem. We first want to show that there is a unique $$h:G/K\rightarrow H$$ such that $$h(xK)=f(x)$$ for all $$x\in G$$. Well, define $$h$$ based solely off of this equation. Every element of $$G/K$$ is of the form $$xK$$ and $$f(x)$$ is unique given a choice of $$x$$, so this gives a unique such $$h$$. We now need to make sure that $$h$$ is well-defined (the fact that it's a homomorphism follows from $$f$$ being one), so pick $$g,g'\in G$$ s.t. $$gK=g'K$$. Then, $$g^{-1}g'\in K$$ so $$f(g')=f(gg^{-1}g')=f(g)f(g^{-1}g')=f(g)\implies h(gK)=h(g'K)$$ so we're good. For the statements about injectivity and surjectivity, convince yourself that a homomorphism is injective iff its kernel is trivial, and that $$\image(h)=\image(f)$$. $$\square$$
</div>
<blockquote>
<p>Corollary (First isomorphism theorem)<br />
Let <script type="math/tex">f:G\rightarrow H</script> be a surjective homomorphism. Then, <script type="math/tex">G/\ker f\simeq H</script>.</p>
</blockquote>
<div class="proof3">
Pf: By the above theorem, $$f$$ must factor through some map $$g:G/\ker f\rightarrow H$$. Furthermore, this map must be injective and surjective since we're factoring through the full kernel and $$f$$ was surjective. Thus, we have an isomorphism. $$\square$$
</div>
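<p>To see the first isomorphism theorem in miniature (my addition), take the surjection from Z_12 onto Z_4 given by reduction mod 4, and check that the induced map on cosets of the kernel is a well-defined bijective homomorphism:</p>

```python
n, m = 12, 4
f = lambda x: x % m                              # surjective hom Z_12 -> Z_4
ker = {x for x in range(n) if f(x) == 0}         # kernel = {0, 4, 8}
coset = lambda a: frozenset((a + k) % n for k in ker)

# induced map h(aK) = f(a): well-defined means no coset gets two values
h = {}
for a in range(n):
    assert h.setdefault(coset(a), f(a)) == f(a)

assert len(h) == m and set(h.values()) == set(range(m))     # bijective
assert all(h[coset((a + b) % n)] == (h[coset(a)] + h[coset(b)]) % m
           for a in range(n) for b in range(n))             # homomorphism
```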
<p>Lagrange’s Theorem and the first isomorphism theorem are two of the big, foundational theorems for group theory, and we’ve now proven both of them. Normally, this would be a good place to stop, but there’s one last thing I want to quickly introduce <sup id="fnref:11"><a href="#fn:11" class="footnote">11</a></sup>.</p>
<blockquote>
<p>Definition<br />
Consider a sequence of groups and homomorphisms between them<center>
$$\begin{CD}
G_1 @>f_1>> G_2 @>f_2>> G_3 @>f_3>> G_4 @>f_4>> \dots @>f_{n-1}>> G_n
\end{CD}$$</center>
We say such a sequence is <strong>exact</strong> if <script type="math/tex">\image(f_k)=\ker(f_{k+1})</script> for all <script type="math/tex">% <![CDATA[
1\le k<n %]]></script>. In particular, a <strong>short exact sequence</strong> is an exact sequence of the form<center>
$$\begin{CD}
\{e\} @>>> N @>f>> G @>g>> H @>>> \{e\}
\end{CD}$$</center>
where <script type="math/tex">\{e\}</script> is the trivial group.</p>
</blockquote>
<p>The first time I saw exact sequences, all I could think was, “Why? Who cares?” At first glance, they seem pretty artificial, but they actually give a compact way of codifying some information about how groups are related to each other. Let’s look at the short exact sequence appearing in the above definition for example. Exactness at <script type="math/tex">N</script> says that the image of the incoming map (which must send the trivial element to the identity in <script type="math/tex">N</script>) is the kernel of <script type="math/tex">f</script>. This is just the statement that <script type="math/tex">\ker f=\{e\}</script> or equivalently that <script type="math/tex">f</script> is injective! Similarly, exactness at <script type="math/tex">H</script> says that the image of <script type="math/tex">g</script> is the kernel of the map sending all of <script type="math/tex">H</script> to the identity, so <script type="math/tex">\image g=H</script> and <script type="math/tex">g</script> is surjective! Finally, exactness at <script type="math/tex">G</script> says that <script type="math/tex">\image f=\ker g</script>. Since we know <script type="math/tex">f</script> is injective, this means we can embed <script type="math/tex">N</script> in <script type="math/tex">G</script> as a normal subgroup. Furthermore, since <script type="math/tex">g</script> is surjective, the first isomorphism theorem tells us that <script type="math/tex">G/N\simeq G/\image f\simeq G/\ker g\simeq H</script>, so we get the sense that <script type="math/tex">G</script> is somehow made up from <script type="math/tex">N</script> and <script type="math/tex">H</script> (the simplest example is <script type="math/tex">G\simeq N\times H</script>, and you can easily pick $f,g$ to form a short exact sequence in this case, but other choices of $G$ may work too). Because of this observation, we make our final definition.</p>
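<p>Taking G = N × H literally, here is that simplest short exact sequence made concrete (my addition), with N = Z_2 and H = Z_3:</p>

```python
N, H = range(2), range(3)
G = [(a, b) for a in N for b in H]     # Z_2 x Z_3, componentwise operations

f = lambda a: (a, 0)                   # inclusion N -> G
g = lambda p: p[1]                     # projection G -> H

image_f = {f(a) for a in N}
ker_g = {p for p in G if g(p) == 0}

assert len(image_f) == len(N)          # exactness at N: f is injective
assert image_f == ker_g                # exactness at G: image f = ker g
assert {g(p) for p in G} == set(H)     # exactness at H: g is surjective
```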
<blockquote>
<p>Definition<br />
We say <script type="math/tex">G</script> is an <strong>extension</strong> of <script type="math/tex">N</script> by <script type="math/tex">H</script> <sup id="fnref:12"><a href="#fn:12" class="footnote">12</a></sup> if there exists a short exact sequence<br /><center>
$$\begin{CD}
\{e\} @>>> N @>f>> G @>g>> H @>>> \{e\}
\end{CD}$$</center></p>
</blockquote>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>ambitious because <a href="../modular-arithmetic">historically speaking</a> I’m bad at sticking with these kinds of things <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>I imagine one post of group theory, one on rings/fields, and then one on noetherian rings and dedekind domains <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>whatever that means <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>you won’t see this much here, but it’s important to keep in mind that groups often do perform some action on an object, and studying these group actions can lead to good math. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>not in this algebra sequence, but at some point I hope to give reason to why this isn’t the most appropriate name, and these things should really be called Z-modules instead <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>not in this algebra sequence, but at some point I hope to give reason to why actually groups in general are really ugly and do some pathological things <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Socks and Sandals if you prefer <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>This is not quite the definition of finitely generated, but works in almost all (actually, maybe all. I don’t know of a counterexample) cases <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>This might be more apparent after we define quotient groups <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Basically every diagram you see in the wild will be commutative <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>Before I forget to mention this, exercise: look up the like 3 other isomorphism theorems. Also, there’s a decent amount of group theory you’d usually learn leading up to some of the stuff I’ve mentioned here that I didn’t bring up at all. <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>Some people say G is an extension of H by N instead. Doesn’t really matter <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Solving Pell’s Equations2017-08-05T15:50:00+00:00https://nivent.github.io/blog/solving-pell<p>I think this is going to end up being a long one <sup id="fnref:21"><a href="#fn:21" class="footnote">1</a></sup>, and possibly not the easiest post to follow that I’ve made; mostly because I will likely end up introducing a decent number of topics I haven’t talked about here before. I guess we’ll see how things turn out <sup id="fnref:19"><a href="#fn:19" class="footnote">2</a></sup>.</p>
<p>Historically, one topic of interest to number theorists has been diophantine equations. These are equations where you are looking for integer solutions. One famous example is the Pythagorean equation <script type="math/tex">a^2+b^2=c^2</script>. In general, there’s no overarching method to solve an arbitrary diophantine equation <sup id="fnref:2"><a href="#fn:2" class="footnote">3</a></sup>, and so individual equations may be solved using seemingly ad hoc methods. For example, the Pythagorean equation can be solved <a href="../number-theory">by projecting points from the unit circle onto a line</a> <sup id="fnref:3"><a href="#fn:3" class="footnote">4</a></sup>. Another (class of) well-known example(s) is due to Fermat: <script type="math/tex">a^n+b^n=c^n, n>2</script>, but we’ll put off solving this one until a later post.</p>
<p>This post is all about solving Pell’s equation (here, of course, <script type="math/tex">x,y,</script> and <script type="math/tex">d</script> are integers)</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
x^2-dy^2 = 1 && d>1
\end{align*} %]]></script>
<blockquote>
<p>Question<br />
Why do we require <script type="math/tex">d>1</script>? What happens if <script type="math/tex">d\le1</script>?</p>
</blockquote>
<blockquote>
<p>Edit<br />
I never mentioned this in the original post <sup id="fnref:37"><a href="#fn:37" class="footnote">5</a></sup>, but we also want to assume that <script type="math/tex">d</script> is not a square number. If <script type="math/tex">d=k^2</script>, then the equation becomes <script type="math/tex">(x-ky)(x+ky)=1</script> which means <script type="math/tex">x+ky=x-ky=\pm1</script> so <script type="math/tex">ky=-ky\implies y=0</script> and <script type="math/tex">(x,y)=(\pm1,0)</script> are the only solutions.</p>
</blockquote>
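<p>Before moving on, here is a quick computational sanity check (my sketch, not part of the original post; the search bound is an arbitrary choice): brute-forcing small <script type="math/tex">y</script> finds the least positive solution for nonsquare <script type="math/tex">d</script> and, as argued in the edit above, nothing nontrivial when <script type="math/tex">d</script> is a square.</p>

```python
import math

# Search y = 1, 2, ..., bound - 1 for the least positive solution
# (x, y) of x^2 - d*y^2 = 1.  Purely illustrative; the bound is arbitrary.
def smallest_pell_solution(d, bound=1000):
    for y in range(1, bound):
        x2 = 1 + d * y * y
        x = math.isqrt(x2)  # exact integer square root
        if x * x == x2:
            return (x, y)
    return None

print(smallest_pell_solution(2))  # (3, 2):  3^2 - 2*2^2 = 1
print(smallest_pell_solution(3))  # (2, 1):  2^2 - 3*1^2 = 1
print(smallest_pell_solution(4))  # None: d = 4 is a square
```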
<h1 id="a-warm-up-problem-y2x3-2">A Warm-up Problem: <script type="math/tex">y^2=x^3-2</script></h1>
<p>Before solving Pell’s equations, we’ll start with a simpler task (although it may not be immediately obvious that this equation is any easier to solve). At this point, if it seems like things here will be really novel to you, then I recommend that you check out my <a href="../number-theory">previous post on number theory</a>. It’s not required to understand this post, and won’t necessarily add a bunch to your knowledge of the ideas used here, but I think it could serve as good motivation for seeing that both geometric reasoning and working in number systems larger than <script type="math/tex">\Z</script> can be helpful in tackling number theoretic problems <sup id="fnref:8"><a href="#fn:8" class="footnote">6</a></sup>.</p>
<p>If you were to decide to stop reading, leave this post, and go start finding and solving diophantine equations, one thing you will notice is that multiplication makes things so much easier.</p>
<blockquote>
<p>Mini warmup<br />
<script type="math/tex; mode=display">x^2+6xy+5y^2=10\implies(x+5y)(x+y)=10</script><br />
Since everything is an integer, <script type="math/tex">x+y</script> and <script type="math/tex">x+5y</script> must form a factor pair of <script type="math/tex">10</script>: one of <script type="math/tex">(\pm1,\pm10),(\pm2,\pm5),(\pm5,\pm2),(\pm10,\pm1)</script>. Subtracting the factors gives <script type="math/tex">(x+5y)-(x+y)=4y</script>, but the differences of these pairs are <script type="math/tex">\pm3</script> and <script type="math/tex">\pm9</script>, none of which is divisible by <script type="math/tex">4</script>.<br />
Hence, this particular diophantine equation has no solutions.</p>
</blockquote>
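<p>As a numerical double check (mine, not the post’s), a brute-force search over a window that provably contains any solution confirms this:</p>

```python
# If (x + 5y)(x + y) = 10, both factors divide 10, so |x + y| <= 10
# and |x + 5y| <= 10.  Then |4y| <= 20 and |x| <= 15, so this window
# provably contains every integer solution.
solutions = [(x, y)
             for x in range(-15, 16)
             for y in range(-5, 6)
             if x * x + 6 * x * y + 5 * y * y == 10]
print(solutions)  # []
```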
<p>If you can set your problem up as one thing times another thing equals a third thing, then since everything is an integer, the things on the left hand side must be factors of the right hand side! This vastly reduces the number of potential solutions <sup id="fnref:4"><a href="#fn:4" class="footnote">7</a></sup>, and often can lead directly to an actual solution (or show that none exist).</p>
<p>That being said, the key insight to solving our warmup problem is that we can rewrite it as <script type="math/tex">y^2+2=x^3</script>. I’ll take a second to pause so you can let out a gasp <sup id="fnref:38"><a href="#fn:38" class="footnote">8</a></sup> of amazement. Once things are in this form, we can see that the left hand side is almost a difference of squares. The only problem is that it’s not a difference and <script type="math/tex">2</script>’s not a square, but motivated by the possibility of factoring the left hand side, we ignore these constraints, stop restricting ourselves to <script type="math/tex">\Z</script>, and from here on out, do our work in <script type="math/tex">\zadjns2=\{a+b\sqrt{-2}\mid a,b\in\Z\}</script> instead <sup id="fnref:35"><a href="#fn:35" class="footnote">9</a></sup>. I don’t know if this feels illegitimate, but it shouldn’t because it’s not, so I’m gonna move on.</p>
<p>We can now write our equation as <script type="math/tex">(y+\sqrt{-2})(y-\sqrt{-2})=x^3</script>. At this point, we really hope that <script type="math/tex">y\pm\sqrt{-2}</script> are coprime so that they must both be perfect cubes; this would be a fairly restrictive condition. However, hoping this would be getting ahead of ourselves. This line of thinking would work in <script type="math/tex">\Z</script>, but the reason it works (and the reason we can have a sensible definition of coprime in the first place) is because <script type="math/tex">\Z</script> is a unique factorization domain <sup id="fnref:5"><a href="#fn:5" class="footnote">10</a></sup>, but we’re working with <script type="math/tex">\zadjns2</script> instead of just <script type="math/tex">\Z</script>. Luckily, it turns out that this is a UFD as well, but this is a non-trivial claim that could have failed if we had added a different square root instead <sup id="fnref:6"><a href="#fn:6" class="footnote">11</a></sup>.</p>
<blockquote>
<p>Exercise<br />
Show that <script type="math/tex">\zadjns2</script> is a UFD.<br />
Hint: It suffices to show that it’s a Euclidean domain, and you can do this by considering points closest to the “ambient quotient” <sup id="fnref:7"><a href="#fn:7" class="footnote">12</a></sup>. Also, you might want to read ahead a little before tackling this exercise.</p>
</blockquote>
<p>To show that <script type="math/tex">y\pm\sqrt{-2}</script> are coprime, we’ll introduce a norm.</p>
<blockquote>
<p>Definition<br />
Given <script type="math/tex">x+y\sqrt{-2}\in\zadjns2</script> with <script type="math/tex">x,y\in\Z</script>, its <strong>norm</strong> is <script type="math/tex">N(x+y\sqrt{-2}):=(x+y\sqrt{-2})(x-y\sqrt{-2})=x^2+2y^2</script></p>
</blockquote>
<p>This definition should look familiar to anyone who read my previous post, and so it should not come as a surprise that this norm is multiplicative. That is, for any <script type="math/tex">x,y\in\Z[\sqrt{-2}]</script>, we have <script type="math/tex">N(xy)=N(x)N(y)</script>. Let <script type="math/tex">p\in\Z[\sqrt{-2}]</script> be a common factor of <script type="math/tex">y\pm\sqrt{-2}</script>; this means that <script type="math/tex">p\mid(y+\sqrt{-2})-(y-\sqrt{-2})</script> so <script type="math/tex">p\mid2\sqrt{-2}=-(\sqrt{-2})^3</script>.</p>
<p>Before proceeding, a quick note. When considering factoring and related concepts (like primality), we don’t care about units (numbers dividing 1) because units are annoying and change nothing. Furthermore, a number <script type="math/tex">x\in\zadjns2</script> is a unit iff <script type="math/tex">N(x)=1</script>. Proving this is left as an exercise to the reader.</p>
<p>Now, back to our problem. The following proposition implies that <script type="math/tex">p=u\sqrt{-2}^e</script> for some unit <script type="math/tex">u\in\Z[\sqrt{-2}]</script> and integer <script type="math/tex">0\le e\le 3</script>.</p>
<blockquote>
<p>Proposition<br />
In <script type="math/tex">\zadjns2</script>, <script type="math/tex">\sqrt{-2}</script> is prime <sup id="fnref:36"><a href="#fn:36" class="footnote">13</a></sup>.</p>
</blockquote>
<div class="proof2">
Pf: It suffices to note that \(N(\sqrt{-2})=2\) is prime (why?). \(\square\)
</div>
<p>While we’re on the subject</p>
<blockquote>
<p>Exercise<br />
Show that the only units in <script type="math/tex">\Z[\sqrt{-2}]</script> are <script type="math/tex">\pm1</script>.</p>
</blockquote>
<p>Returning to showing that those two numbers are coprime, we can now safely conclude that <script type="math/tex">p=u\sqrt{-2}^e</script> with <script type="math/tex">u,e</script> as described above. Hence, either <script type="math/tex">p</script> is a unit (in which case we win) or <script type="math/tex">\sqrt{-2}\mid p</script>, so let’s assume the latter. This then means that <script type="math/tex">\sqrt{-2}\mid y+\sqrt{-2}</script>. So, for appropriately chosen integers <script type="math/tex">u,v</script>, we have</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
y+\sqrt{-2} = \sqrt{-2}(u+v\sqrt{-2}) &= -2v + u\sqrt{-2} \\
&\implies y = -2v
\end{align*} %]]></script>
<p>Finally, the following proposition shows that this is impossible so <script type="math/tex">p</script> must be a unit, and hence <script type="math/tex">y\pm\sqrt{-2}</script> are coprime.</p>
<blockquote>
<p>Proposition<br />
If <script type="math/tex">(x,y)\in\Z^2</script> satisfies <script type="math/tex">y^2=x^3-2</script>, then <script type="math/tex">y</script> is odd.</p>
</blockquote>
<div class="proof2">
Pf: Assume \(y^2=x^3-2\) for integers \(x,y\). If \(y\) is even, then so is \(x\), so \(x^3\) is divisible by \(8\) but \(y^2+2=4k+2\) is not. \(\square\)
</div>
<p>Now we’re almost there. We know that <script type="math/tex">y+\sqrt{-2}</script> and <script type="math/tex">y-\sqrt{-2}</script> are coprime, and that their product is <script type="math/tex">x^3</script>. It follows from unique factorization that <script type="math/tex">y+\sqrt{-2}</script> must be the product of a unit and a cube. However, you showed that the only units are <script type="math/tex">\pm1</script>, so <script type="math/tex">y+\sqrt{-2}=(a+b\sqrt{-2})^3</script> for some integers <script type="math/tex">a,b</script>. We can expand things to get</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
& y+\sqrt{-2} &=& (a^3-6ab^2)+\sqrt{-2}(3a^2b-2b^3)\\
\implies & 1 &=& 3a^2b-2b^3 &=& b(3a^2-2b^2)\\
\implies & \pm1 &=& b\\
\implies & \pm1 &=& 3a^2 - 2\\
\implies & 2\pm1 &=& 3a^2\\
\implies & \pm1 &=& a\\
\implies & 1 &=& b
\end{matrix} %]]></script>
<p>I didn’t feel like explaining all those implications, but the moral of the story is that we have two solutions given by <script type="math/tex">(a,b)=(\pm1, 1)</script> which correspond to <script type="math/tex">y+\sqrt{-2}=(1+\sqrt{-2})^3=-5+\sqrt{-2}</script> and <script type="math/tex">y+\sqrt{-2}=(-1+\sqrt{-2})^3=5+\sqrt{-2}</script>. Both of these values of <script type="math/tex">y</script> correspond to <script type="math/tex">x=3</script>, so our original equation has two solutions: <script type="math/tex">(x,y)=(3,\pm5)</script>.</p>
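<p>As a numerical double check (mine, not part of the original post; the search range is an arbitrary choice), a brute-force scan over small <script type="math/tex">y</script> finds exactly these two points:</p>

```python
# Scan |y| <= 10000 for integer points on y^2 = x^3 - 2.  For each y,
# test whether y^2 + 2 is a perfect cube by rounding the real cube root
# and verifying nearby candidates exactly (to dodge float rounding).
found = []
for y in range(-10_000, 10_001):
    x3 = y * y + 2
    x = round(x3 ** (1 / 3))
    for cand in (x - 1, x, x + 1):
        if cand ** 3 == x3:
            found.append((cand, y))
print(sorted(found))  # [(3, -5), (3, 5)]
```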
<h1 id="introduction-to-algebraic-integers">Introduction to Algebraic Integers</h1>
<p>Now that the warmup is done, we can start building all the theory we’ll need to solve Pell’s equations. The first thing we’ll need is the so-called ring of integers. As the warmup demonstrated, it’s often helpful to work in “large” systems of numbers instead of the relatively small <script type="math/tex">\Z</script>. However, it may not always be clear what group of numbers is the right one for a problem. To answer this question, we focus our attention on algebraic integers. The idea is that regular integers are a particular, nice subset of the field <script type="math/tex">\Q</script> of rational numbers <sup id="fnref:9"><a href="#fn:9" class="footnote">14</a></sup>, and so the right systems of numbers to work with are analogous nice subsets of fields bigger than <script type="math/tex">\Q</script>. Because we’re doing algebra, and algebraists love little more than polynomials, the characterization of nice will be done in terms of a polynomial condition inspired by the <a href="https://www.wikiwand.com/en/Rational_root_theorem">rational root theorem</a>. With all of that said, let’s see some definitions</p>
<blockquote>
<p>Definition<br />
A <strong>number field</strong> <script type="math/tex">K</script> is a finite field extension <sup id="fnref:10"><a href="#fn:10" class="footnote">15</a></sup> of <script type="math/tex">\Q</script>. Furthermore, if <script type="math/tex">K</script> is of the form <script type="math/tex">K=\Q(\sqrt d)=\{a+b\sqrt d\mid a,b\in\Q\}</script> for squarefree <script type="math/tex">d\in\Z</script>, then <script type="math/tex">K</script> is called a <strong>(real or imaginary) quadratic number field</strong>.</p>
</blockquote>
<p>Number fields play the role of <script type="math/tex">\Q</script> in the big picture. From these, we extract nice subsets of so-called algebraic integers.</p>
<blockquote>
<p>Definition<br />
An <strong>algebraic integer</strong> <script type="math/tex">x</script> is a root of a monic polynomial <sup id="fnref:11"><a href="#fn:11" class="footnote">16</a></sup> <script type="math/tex">f\in\Z[X]</script> with integer coefficients. Given a number field <script type="math/tex">K</script>, its <strong>ring of integers</strong> <script type="math/tex">\ints K</script> <sup id="fnref:12"><a href="#fn:12" class="footnote">17</a></sup> is the set of algebraic integers in <script type="math/tex">K</script>.</p>
</blockquote>
<p>If this definition seems weird, then maybe the next exercise will help you see why it’s actually reasonable. If that doesn’t work, then try to come up with another definition that’s well-defined for any number field and makes sense in the case of <script type="math/tex">\Q</script>.</p>
<blockquote>
<p>Exercise<br />
Show that <script type="math/tex">\ints\Q=\Z</script>.</p>
</blockquote>
<p>Because of this exercise, mathematicians sometimes refer to <script type="math/tex">\Z</script> as the ring of “rational” integers in order to distinguish it from other rings of integers. Also, I keep calling these things rings, but it is in no way obvious that they actually do form rings (go ahead and try to prove that <script type="math/tex">ab</script> and <script type="math/tex">a+b</script> are algebraic integers if <script type="math/tex">a,b</script> are. I won’t cover the proof here, but the secret is Cramer’s formula).</p>
<p>For the purposes of this post, we’ll only need to study quadratic number fields, but it’s worth noting that number fields in general – and even arbitrary finite field extensions – have a norm.</p>
<blockquote>
<p>Definition<br />
Given a finite field extension <script type="math/tex">L/K</script>, let <script type="math/tex">\alpha\in L</script> be an arbitrary element. Note that <script type="math/tex">\alpha</script> induces a map <script type="math/tex">m_\alpha:L\rightarrow L</script> given by multiplication <script type="math/tex">m_\alpha(\beta)=\alpha\beta</script>, and that <script type="math/tex">m_\alpha</script> is <script type="math/tex">K</script>-linear. We define the <strong>norm</strong> of <script type="math/tex">\alpha</script> to be the determinant of this map. That is, the norm is a map<br /></p>
</blockquote>
<center>$$\begin{matrix}
\norm_{L/K}: &L &\longrightarrow& K\\
&\alpha &\longmapsto& \det(m_\alpha)
\end{matrix}$$</center>
<p>This definition is quite a bit to digest, but we’ll unpack it in the case of quadratic number fields. One thing of note we can quickly glean from this definition is that it makes the statement that the norm is multiplicative almost trivial (why?).</p>
<p>Now, let’s see what this definition gives in the quadratic case. Fix some squarefree <script type="math/tex">d\in\Z-\{1\}</script>, and let <script type="math/tex">K=\Q(\sqrt d)</script> so <script type="math/tex">K/\Q</script> is a degree 2 <sup id="fnref:13"><a href="#fn:13" class="footnote">18</a></sup> field extension, and one <script type="math/tex">\Q</script>-basis of <script type="math/tex">K</script> is <script type="math/tex">\{1,\sqrt d\}</script>. Fix any element <script type="math/tex">\alpha=a+b\sqrt d</script> of <script type="math/tex">K</script> with <script type="math/tex">a,b\in\Q</script>. We are interested in the determinant of its multiplication map, so we’ll first find the matrix for this map. To do this we only need to compute <script type="math/tex">m_\alpha(1)=\alpha=a+b\sqrt d</script> and <script type="math/tex">m_\alpha(\sqrt d)=\alpha\sqrt d=db+a\sqrt d</script>. Hence, the <script type="math/tex">m_\alpha</script> is given by this matrix (assuming we use the basis <script type="math/tex">\{1,\sqrt d\}</script>):</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{pmatrix}
a & db\\
b & a
\end{pmatrix} %]]></script>
<p>Thus, <script type="math/tex">\knorm(a+b\sqrt d)=a^2-db^2</script> which turns out to also be <script type="math/tex">(a+b\sqrt d)(a-b\sqrt d)</script> <sup id="fnref:14"><a href="#fn:14" class="footnote">19</a></sup>. The fact that this norm takes this multiplicative form means the following theorem is really easy to prove in the quadratic case <sup id="fnref:15"><a href="#fn:15" class="footnote">20</a></sup>.</p>
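<p>For concreteness, here is a small numeric check (my sketch, not part of the post) that the determinant formula matches the product form and that the norm is multiplicative, with elements of <script type="math/tex">\Q(\sqrt d)</script> represented as coefficient pairs <script type="math/tex">(a,b)</script>:</p>

```python
# Represent a + b*sqrt(d) as the pair (a, b).  The matrix of m_alpha in
# the basis {1, sqrt(d)} is [[a, d*b], [b, a]], whose determinant is
# a^2 - d*b^2 = (a + b*sqrt(d)) * (a - b*sqrt(d)).
def norm(a, b, d):
    return a * a - d * b * b

def mul(u, v, d):
    """(a + b*sqrt(d)) * (c + e*sqrt(d)) as a coefficient pair."""
    (a, b), (c, e) = u, v
    return (a * c + d * b * e, a * e + b * c)

d = 2
alpha, beta = (3, 2), (1, -1)
prod = mul(alpha, beta, d)
# Multiplicativity: N(alpha * beta) = N(alpha) * N(beta)
assert norm(*prod, d) == norm(*alpha, d) * norm(*beta, d)
print(norm(3, 2, 2))  # 1, so 3 + 2*sqrt(2) is a unit in Z[sqrt(2)]
```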
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K</script> be a quadratic number field. Then, <script type="math/tex">\alpha\in\ints K</script> is a unit (in <script type="math/tex">\ints K</script>, not <script type="math/tex">K</script>. <script type="math/tex">K</script> is a field so basically everything is a unit) if and only if <script type="math/tex">\knorm(\alpha)=\pm1</script></p>
</blockquote>
<div class="proof2">
Pf: \((\rightarrow)\) Assume \(\alpha\in\ints K\) is a unit, and write \(\alpha\beta=1\) Then, \(1=\knorm(1)=\knorm(\alpha\beta)=\knorm(\alpha)\knorm(\beta)\implies\knorm(\alpha)=\pm1\)<br />
(\(\leftarrow\)) Conversely, assume \(\knorm(\alpha)=\pm1\). As the above discussion noted, \(\knorm(\alpha)=\alpha\conj\alpha\) with \(\conj\alpha\in\ints K\), so \(\alpha\cdot(\pm\conj\alpha)=1\) and \(\alpha\) is a unit. \(\square\)
</div>
<p>The diligent reader will be somewhat bothered by the above proof. That’s because it implicitly relies upon something I forgot to prove first, which is the following (where does the above proof rely on this theorem?).</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K</script> be a quadratic number field. Then, <script type="math/tex">\knorm(\ints K)\subseteq\ints\Q</script> which is to say that the norm of an algebraic integer is a rational integer <sup id="fnref:22"><a href="#fn:22" class="footnote">21</a></sup>.</p>
</blockquote>
<div class="proof3">
Pf: First, let $$K=\qadjs d$$ and $$\alpha=a+b\sqrt d$$ with $$a,b\in\Q$$. Then, $$\alpha$$ is an algebraic integer if and only if $$\conj\alpha:=a-b\sqrt d$$ is an algebraic integer. This is because the map $${}^-:\ints K\rightarrow\ints K$$ fixes $$\Z$$ and preserves both addition and multiplication, so a polynomial $$f\in\Z[X]$$ satisfied by $$\alpha$$ is also satisfied by $$\conj\alpha$$. In particular, $$\alpha$$ satisfies a monic polynomial $$f\in\Z[X]$$ if and only if $$\conj\alpha$$ does, so they share integrality statuses. Now, the product of two algebraic integers is an algebraic integer (mentioned before, not proved), so $$\knorm(\alpha)$$ is an algebraic integer whenever $$\alpha$$ is. Since it's also a rational number, this means $$\knorm$$ maps algebraic integers into rational integers as claimed. $$\square$$
</div>
<p>Now, one last thing, and then we’ll say how all of this discussion on integers and norms relates to Pell’s equations <sup id="fnref:16"><a href="#fn:16" class="footnote">22</a></sup>. I’ve tried to be careful so far to be conscious of the fact that a priori, an element of <script type="math/tex">\ints K</script> could look like a general member of <script type="math/tex">K</script> in the sense that its coefficients could be general rational numbers. From here on out, we’ll be a little more concrete because we’re going to actually compute <script type="math/tex">\ints K</script> in the quadratic case.</p>
<blockquote>
<p>Theorem<br />
Let <script type="math/tex">K=\Q(\sqrt d)</script> be a quadratic number field with <script type="math/tex">d\in\Z-\{1\}</script> a product of distinct primes (i.e. square free). Then,<br /></p>
</blockquote>
<center>$$\ints K=\begin{cases}
\zadjs d & \text{if } d\equiv2,3\pmod 4 \\
\zadj{\frac{1+\sqrt d}2} & \text{if }d\equiv 1\pmod 4
\end{cases}$$</center>
<div class="proof3">
Pf: Assume $$K,d$$ are as above, pick any $$\alpha\in\ints K$$, and write $$\alpha=x+y\sqrt d$$ with $$x,y\in\Q$$. Then, $$\conj\alpha=x-y\sqrt d\in\ints K$$ as well, and so is $$\alpha+\conj\alpha=2x$$. This means that $$2x\in\Q\cap\ints K=\Z$$ so $$x$$ is either an integer or half an integer.<br /><br />
Case 1: $$x\in\Z$$<br />
We also know that $$\knorm(\alpha)=\alpha\conj\alpha=x^2-dy^2\in\Z$$ so by taking a difference, this means that $$dy^2\in\Z$$. This means that $$y\in\Z$$. If it were not, then the denominator of $$y^2$$ would be divisible by some prime $$p$$ more than once. However, $$d$$ is divisible by $$p$$ at most once, so the product $$dy^2$$ would contain a $$p$$ in the denominator and hence not be an integer. Thus, $$\alpha\in\Z[\sqrt d]$$.<br /><br />
Case 2: $$x=\frac n2$$ for some odd $$n\in\Z$$<br />
Once again, $$\knorm(\alpha)=x^2-dy^2=n^2/4-dy^2\in\Z$$. Note that $$y\not\in\Z$$ since otherwise we would have $$n^2/4\in\Z$$. Counting prime factors again shows us that we must have $$y=m/2$$ for some odd $$m\in\Z$$ which means that $$n^2-dm^2\equiv0\pmod4$$ with $$n^2,m^2\equiv1\pmod4$$ so $$d\equiv1\pmod4$$. Hence, $$\alpha=n/2+m\sqrt d/2=(1+\sqrt d)/2+(n-1)/2+\sqrt d(m-1)/2\in\Z[(1+\sqrt d)/2]$$ since $$\sqrt d=2\cdot\frac{1+\sqrt d}2-1\in\Z[(1+\sqrt d)/2]$$<br />
<div align="right">\(\square\)</div>
</div>
<blockquote>
<p>Corollary<br />
<script type="math/tex">\ints{\Q(\sqrt{-2})}=\Z[\sqrt{-2}]</script> so this was the right setting for the warmup problem.</p>
</blockquote>
<p>At this point, I think we know everything about rings of integers that we’ll need <sup id="fnref:17"><a href="#fn:17" class="footnote">23</a></sup>. In case you have forgotten, our goal is to find all integer solutions to Pell’s equations which are <script type="math/tex">x^2-dy^2=1</script> for integers <script type="math/tex">x,y\in\Z</script> and positive integer <script type="math/tex">d\in\Z_{>1}</script>. As this discussion hinted at, for the time being, we’ll further restrict <script type="math/tex">d</script> to be square free. This has the advantage that Pell’s equation can then be written as <script type="math/tex">(x-y\sqrt d)(x+y\sqrt d)=1</script>, which means that we’re really just looking for units of <script type="math/tex">\Z[\sqrt d]</script>, which is convenient because <script type="math/tex">\Z[\sqrt d]=\ints{\Q(\sqrt d)}</script> for square free <script type="math/tex">d</script> (or at least it is 2 times out of 3) <sup id="fnref:18"><a href="#fn:18" class="footnote">24</a></sup>.</p>
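<p>To make the unit/solution connection concrete (a sketch of mine, not the post’s): for <script type="math/tex">d=2</script>, the unit <script type="math/tex">3+2\sqrt2</script> has norm <script type="math/tex">1</script>, and its powers give infinitely many solutions of <script type="math/tex">x^2-2y^2=1</script>.</p>

```python
def mul(u, v, d=2):
    """Multiply a + b*sqrt(d) by c + e*sqrt(d) as coefficient pairs."""
    (a, b), (c, e) = u, v
    return (a * c + d * b * e, a * e + b * c)

# Powers of the unit 3 + 2*sqrt(2) in Z[sqrt(2)].
sols, cur = [], (3, 2)
for _ in range(4):
    sols.append(cur)
    cur = mul(cur, (3, 2))

print(sols)  # [(3, 2), (17, 12), (99, 70), (577, 408)]
assert all(x * x - 2 * y * y == 1 for x, y in sols)
```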
<h1 id="geometry-of-numbers">Geometry of Numbers</h1>
<p>I debated whether I should talk about what comes next in one section or two. I ultimately decided on two because I didn’t want to introduce too much stuff all at once. A priori, the material of this section isn’t relevant to the larger discussion at hand, but in the next section, we’ll see it play a crucial role. This is the point in the post where we open up to the possibility of me throwing in some pictures.</p>
<blockquote>
<p>Definition<br />
A <strong>lattice</strong> of a real vector space is the <script type="math/tex">\Z</script>-span of some <script type="math/tex">\R</script>-basis. If <script type="math/tex">L</script> is a lattice of a real vector space <script type="math/tex">V</script>, then we say the <strong>rank</strong> of <script type="math/tex">L</script> is the dimension of <script type="math/tex">V</script> <sup id="fnref:20"><a href="#fn:20" class="footnote">25</a></sup>.</p>
</blockquote>
<p>The only (finite-dimensional) real vector spaces, up to isomorphism, are <script type="math/tex">\R^n</script> for various choices of <script type="math/tex">n</script>, so a lattice is really just a set of the form <script type="math/tex">\{a_1b_1+a_2b_2+\dots+a_nb_n:a_1,\dots,a_n\in\Z, b_1,\dots,b_n\in\R^n\}</script> where the <script type="math/tex">b_i</script>’s are linearly independent over <script type="math/tex">\R</script>. We might write such a lattice using the notation <script type="math/tex">L=\Z b_1\oplus\Z b_2\oplus\dots\oplus\Z b_n</script> <sup id="fnref:32"><a href="#fn:32" class="footnote">26</a></sup>. The canonical example (and in some sense the only example) of a lattice is <script type="math/tex">\Z^n</script>. Some lattices of <script type="math/tex">\R^2</script> are pictured below</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/lattices.jpg" width="650" height="200" />
</center>
<p>One property of lattices that will come up is that they are discrete.</p>
<blockquote>
<p>Definition<br />
A subset <script type="math/tex">D\subset\R^n</script> is called <strong>discrete</strong> if only finitely many of its points are contained in any bounded region. That is, it is discrete if <script type="math/tex">D\cap B_r</script> is finite for all <script type="math/tex">r\in\R</script> where <script type="math/tex">B_r</script> is the (solid) ball of radius <script type="math/tex">r</script> centered at the origin.</p>
</blockquote>
<p>I won’t give a full, formal proof that all lattices are discrete, but I will sketch one direction you could take. The idea is that in a lattice, there exists some <script type="math/tex">\eps>0</script> such that any two lattice points are a distance greater than <script type="math/tex">\eps</script> apart. So, if you have some bounded set <script type="math/tex">B</script>, you can split it up into finitely many balls <script type="math/tex">B_{\frac{\eps}2}</script> of radius <script type="math/tex">\eps/2</script>. Each of these contains at most 1 lattice point, so <script type="math/tex">B</script> contains a finite number of lattice points.</p>
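<p>A quick numeric illustration of discreteness (mine, not the post’s): counting points of <script type="math/tex">\Z^2</script> inside closed balls of increasing radius always gives a finite number, roughly <script type="math/tex">\pi r^2</script>.</p>

```python
import math

# Count points of Z^2 in the closed ball of radius r about the origin.
def points_in_ball(r):
    n = math.floor(r)
    return sum(1
               for x in range(-n, n + 1)
               for y in range(-n, n + 1)
               if x * x + y * y <= r * r)

print(points_in_ball(1))   # 5
print(points_in_ball(10))  # 317 (close to pi * 100 ~ 314)
```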
<p>Now, lattices are discrete and look a lot like <script type="math/tex">\Z^n</script>, so it’s natural to think that they have some connection with numbers<sup id="fnref:23"><a href="#fn:23" class="footnote">27</a></sup>. Because of this, and because they have applications in number theory, some results on or relating to lattices form the so-called <a href="https://www.wikiwand.com/en/Geometry_of_numbers">geometry of numbers</a>. Here, we’ll prove and use one such theorem, but before that, we need to describe the volume of a lattice.</p>
<blockquote>
<p>Definition<br />
Let <script type="math/tex">\Gamma</script> be a lattice in <script type="math/tex">\R^n</script>. Then, we say the <strong>volume</strong> <script type="math/tex">\vol_{\Gamma}</script> of <script type="math/tex">\Gamma</script> is the volume of a parallelogram <sup id="fnref:24"><a href="#fn:24" class="footnote">28</a></sup> spanned by a <script type="math/tex">\Z</script>-basis of <script type="math/tex">\Gamma</script>.</p>
</blockquote>
<blockquote>
<p>Examples<br />
The standard lattice <script type="math/tex">\Gamma_1=\Z^2=\Z(1,0)\oplus\Z(0,1)</script> has volume <script type="math/tex">\vol_{\Gamma_1}=1</script> since the basis <script type="math/tex">\{(0,1),(1,0)\}</script> spans a unit square.<br /><br />
The lattice <script type="math/tex">\Gamma_2 = \Z(1,\sqrt2)\oplus\Z(0,-2\sqrt2)</script> has volume <script type="math/tex">\vol_{\Gamma_2}=2\sqrt2</script></p>
</blockquote>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/volumes.jpg" width="650" height="200" />
</center>
<p>I know what you’re thinking. What if I choose a different basis for my lattice? Instead of writing <script type="math/tex">\Z^2=\Z(1,0)\oplus\Z(0,1)</script>, I might want to write <script type="math/tex">\Z^2=\Z(2,-1)\oplus\Z(-3,1)</script>. Well, doesn’t matter.</p>
<blockquote>
<p>Theorem<br />
The volume of a lattice is well-defined.</p>
</blockquote>
<div class="proof3">
Pf: Let $$\Gamma$$ be a lattice in $$\R^n$$, and let $$B_1,B_2$$ be two bases for $$\Gamma$$. Then, these bases can be represented by matrices and we have $$\vol_{\Gamma}=|\det B_1|$$ or $$\vol_{\Gamma}=|\det B_2|$$, so we only need to show that $$|\det B_1|=|\det B_2|$$. Note that, since they're bases for the same lattice, there must exist a change of basis matrix $$T$$ with $$B_1=TB_2$$. Furthermore, $$T$$ and its inverse must both have integer entries, so $$\det T\in\Z^\times=\{\pm1\}$$, and hence $$|\det B_1|=|\det T\det B_2|=|\det B_2|$$. $$\square$$
</div>
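<p>As a quick sanity check, the volume computations above are easy to reproduce with a few lines of Python (the function name is my own choice, not anything from the post):</p>

```python
import math

def lattice_volume(basis):
    """Volume of the 2D lattice spanned by basis = [(a, b), (c, d)]: |ad - bc|."""
    (a, b), (c, d) = basis
    return abs(a * d - b * c)

# Two different Z-bases of the standard lattice Z^2 give the same volume
assert lattice_volume([(1, 0), (0, 1)]) == 1
assert lattice_volume([(2, -1), (-3, 1)]) == 1

# The lattice Z(1, sqrt 2) + Z(0, -2 sqrt 2) has volume 2*sqrt(2)
vol = lattice_volume([(1, math.sqrt(2)), (0, -2 * math.sqrt(2))])
assert math.isclose(vol, 2 * math.sqrt(2))
```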
<p>Now, a couple definitions just to make sure everyone is on the same page, and then the main theorem.</p>
<blockquote>
<p>Definitions<br />
Fix some subset <script type="math/tex">D\subseteq\R^n</script>.<br />
We say <script type="math/tex">D</script> is <b>compact</b> if it is closed <sup id="fnref:25"><a href="#fn:25" class="footnote">29</a></sup> and bounded.<br /><br />
We say <script type="math/tex">D</script> is <strong>convex</strong> if any line segment between points in <script type="math/tex">D</script> is contained in <script type="math/tex">D</script>. That is, for any <script type="math/tex">a,b\in D</script> and <script type="math/tex">t\in[0,1]</script>, the point <script type="math/tex">at+b(1-t)\in D</script>.<br /><br />
Fixing some point <script type="math/tex">o\in\R^n</script>, we say <script type="math/tex">D</script> is <strong>symmetric about <script type="math/tex">o</script></strong> if for all <script type="math/tex">p\in D</script>, we also have <script type="math/tex">o-p\in D</script>.</p>
</blockquote>
<ul>
<li>A ball is compact, convex, and symmetric about its center</li>
<li>A sphere is compact and symmetric about its center, but not convex</li>
<li>A triangle is compact and convex, but not symmetric about any point</li>
<li>The inside of a ball is convex and symmetric about its center, but not compact</li>
<li>A five-pointed star is compact, but neither convex nor symmetric about any point</li>
<li>A line segment is compact, convex, and symmetric about its center</li>
<li>An infinite line is convex and symmetric about many points, but not compact</li>
</ul>
<blockquote>
<p>Minkowski Convex Body Theorem<br />
Let <script type="math/tex">\Gamma</script> be a lattice in <script type="math/tex">\R^n</script>, and <script type="math/tex">D\subseteq\R^n</script> be compact, convex, and symmetric about the origin. Furthermore, assume <script type="math/tex">\vol(D)\ge2^n\vol_\Gamma</script>. Then, <script type="math/tex">\Gamma\cap D\neq\{0\}</script>.</p>
</blockquote>
<p>The idea behind the proof is that <script type="math/tex">D</script> is just too big to miss all of <script type="math/tex">\Gamma</script>. You essentially take a big parallelogram spanned by a basis of <script type="math/tex">\Gamma</script> and tile <script type="math/tex">\R^n</script> with it. After that, you move all the pieces touching <script type="math/tex">D</script> back to the original piece about the origin, and if the volume of <script type="math/tex">D</script> is greater than the volume of the original piece, then two points of <script type="math/tex">D</script> must end up at the same point of the parallelogram. This means that their difference is twice a lattice point, so half their difference is a nonzero lattice point, and it lies in <script type="math/tex">D</script> by convexity and symmetry.</p>
<p>I wasn’t sure what the best way to visualize this without it being a mess was, so here’s a picture of the parallelogram to keep in mind. You can add in <script type="math/tex">D</script> and whatnot using your imagination.</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/translates.jpg" width="300" height="150" />
</center>
<p>If you follow the sketch above, in the end, it relies on <script type="math/tex">D</script> having strictly greater volume, but the theorem doesn’t. This is reconciled by the following.</p>
<blockquote>
<p>Lemma<br />
If Minkowski’s theorem holds for all <script type="math/tex">D</script> with <script type="math/tex">\vol(D)>2^n\vol_\Gamma</script>, then it holds for all <script type="math/tex">D</script> with <script type="math/tex">\vol(D)=2^n\vol_\Gamma</script> as well.</p>
</blockquote>
<div class="proof3">
Pf: Assume $$D,\Gamma$$ satisfy all the conditions of Minkowski's theorem, and that $$\vol(D)=2^n\vol_\Gamma$$. For all $$\eps>0$$, let $$D_{\eps}=(1+\eps)D$$, so $$D_{\eps}$$ is compact, convex, symmetric about the origin, and $$\vol(D_\eps)=(1+\eps)^n\vol(D)>2^n\vol_\Gamma$$. Hence, by assumption, each $$D_\eps$$ contains a nonzero lattice point. Since $$\Gamma$$ is discrete and $$D_\eps$$ is compact, $$D_{\eps}\cap\Gamma$$ is finite for all $$\eps>0$$. Now, note that for $$\eps<\eps'$$ we have $$D_{\eps}\cap\Gamma\subseteq D_{\eps'}\cap\Gamma$$, so as $$\eps$$ shrinks, these finite sets form a decreasing chain, and we can only lose points for so long. Hence, for sufficiently small $$\eps$$, the sets $$D_{\eps}\cap\Gamma$$ are all the same, so there exists some nonzero $$\ell\in\Gamma\cap\bigcap_{\eps>0}D_{\eps}$$. Since $$D$$ is closed, $$\bigcap_{\eps>0}D_{\eps}=D$$, so the lemma holds. $$\square$$
</div>
<p>Awesome, now we handle the main theorem with the further assumption that <script type="math/tex">D</script> has strictly greater volume.</p>
<div class="proof3">
Pf: Assume $$D,\Gamma$$ satisfy all the conditions of Minkowski's theorem, and that $$\vol(D)>2^n\vol_\Gamma$$. Pick some $$\Z$$-basis $$\mathbb e=\{e_i\}_{i=1}^n$$ of $$\Gamma$$, and let $$P:=\left\{\sum_{i=1}^nt_ie_i:-1\le t_i< 1\right\}$$. Note that $$\vol(P)=2^n\vol_\Gamma$$, and that $$\R^n=\bigsqcup_{\ell\in\Gamma}(P+2\ell)$$. Because $$D$$ is bounded, $$D_\ell:=D\cap(P+2\ell)$$ is nonempty for only finitely many $$\ell\in\Gamma$$. Now, consider the translates $$D_\ell':=D_\ell-2\ell\subseteq P$$ and note that<br />
<center>
$$\begin{align*}
\sum_{\ell\in\Gamma}\vol(D_\ell')
= \sum_{\ell\in\Gamma}\vol(D_\ell)
= \vol(D)
> 2^n\vol_\Gamma
= \vol(P)
\end{align*}$$
</center>
Thus, there must exist distinct $$\ell_1,\ell_2\in\Gamma$$ such that $$D_{\ell_1}'\cap D_{\ell_2}'\neq\emptyset$$, so pick some $$x\in D_{\ell_1}$$ and $$y\in D_{\ell_2}$$ s.t. $$x-2\ell_1=y-2\ell_2$$. Note that $$(x-y)/2$$ is nonzero since $$\ell_1\neq\ell_2$$, and it lies in $$D$$ by convexity and symmetry about the origin. At the same time,<br />
<center>
$$\begin{align*}
\frac{x-y}2 = \frac{2\ell_1-2\ell_2}2 = \ell_1-\ell_2 \in \Gamma
\end{align*}$$
</center>
Thus, the theorem holds. $$\square$$
</div>
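<p>To make the theorem concrete, here’s a small Python sketch (the function name is my own) that brute-forces the nonzero points of <script type="math/tex">\Z^2</script> in a centered box. A box of area exactly <script type="math/tex">2^2\vol_{\Z^2}=4</script> catches a nonzero lattice point, while a slightly smaller one can miss entirely:</p>

```python
import itertools

def nonzero_lattice_points_in_box(hw, hh, bound=3):
    """Nonzero points of the lattice Z^2 inside the centered box [-hw, hw] x [-hh, hh]."""
    pts = itertools.product(range(-bound, bound + 1), repeat=2)
    return [(x, y) for (x, y) in pts
            if (x, y) != (0, 0) and abs(x) <= hw and abs(y) <= hh]

# The box [-1,1]^2 has area 4 = 2^2 * vol(Z^2), so Minkowski guarantees a nonzero point:
assert (1, 0) in nonzero_lattice_points_in_box(1.0, 1.0)
# Shrink below the bound (area 3.24 < 4) and the guarantee no longer applies:
assert nonzero_lattice_points_in_box(0.9, 0.9) == []
```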
<h1 id="real-embeddings-of-quadratic-number-fields">Real Embeddings of Quadratic Number Fields</h1>
<p>Now we know some things about about quadratic number fields, and we know a little about the geometry of numbers, so let’s put the two together. The bridge between abstract fields and more concrete geometric ideas will be embeddings.</p>
<blockquote>
<p>Definition<br />
A <strong>real embedding of a number field</strong> <script type="math/tex">K</script> is a ring homomorphism <sup id="fnref:26"><a href="#fn:26" class="footnote">30</a></sup> <script type="math/tex">K\hookrightarrow\R</script>.</p>
</blockquote>
<p>If you’ll recall, in Pell’s equations, we have some <script type="math/tex">d>1</script>, so consider such a <script type="math/tex">d</script> and the number field <script type="math/tex">K=\Q(\sqrt d)</script>. This field has 2 real embeddings. An embedding is determined by where it sends <script type="math/tex">1</script> and <script type="math/tex">\sqrt d</script> (why?), but <script type="math/tex">1</script> must map to <script type="math/tex">1</script>. However, <script type="math/tex">\sqrt d</script> has two possible images, corresponding to the two solutions to <script type="math/tex">x^2-d=0</script>: if <script type="math/tex">\sigma:K\hookrightarrow\R</script> is an embedding, then it preserves sums and products, so <script type="math/tex">\sigma(\sqrt d)^2-d=\sigma((\sqrt d)^2-d)=\sigma(0)=0</script>. This is all to say that a real quadratic field has two real embeddings. Because these are equivalent as far as algebra is concerned, instead of choosing one arbitrarily, we’ll make use of both. Define the function <sup id="fnref:27"><a href="#fn:27" class="footnote">31</a></sup></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\iota: &K &\longrightarrow& \R^2\\
&\alpha &\longmapsto& (\alpha, \conj\alpha)\\
&a+b\sqrt d &\longmapsto& (a+b\sqrt d, a-b\sqrt d) && a,b\in\Q
\end{matrix} %]]></script>
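<p>For concreteness, here’s a tiny Python sketch of <script type="math/tex">\iota</script>, representing <script type="math/tex">a+b\sqrt d</script> by the pair <script type="math/tex">(a,b)</script> (the representation and function name are my own choices):</p>

```python
import math

def iota(a, b, d):
    """The two real embeddings of a + b*sqrt(d), as a point of R^2."""
    return (a + b * math.sqrt(d), a - b * math.sqrt(d))

# The product of the two coordinates is the norm, e.g. in Q(sqrt 2):
x, y = iota(1, 1, 2)               # image of 1 + sqrt(2)
assert math.isclose(x * y, -1.0)   # N(1 + sqrt(2)) = 1 - 2 = -1
```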
<blockquote>
<p>Theorem<br />
For a real quadratic number field <script type="math/tex">K</script>, <script type="math/tex">\iota(\ints K)</script> is a lattice.</p>
</blockquote>
<div class="proof3">
Pf: Proof left as an exercise for the reader. $$\square$$
</div>
<p>A natural question to ask is: what is the volume of <script type="math/tex">\iota(\ints K)</script>? Once again, fix some square free <script type="math/tex">d>1</script> and let <script type="math/tex">K=\Q(\sqrt d)</script>. Then,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\vol_{\iota(\ints K)} &=&
\begin{vmatrix}
\iota(1) \\
\iota(\sqrt d)
\end{vmatrix} &=&
\begin{vmatrix}
1 & 1\\
\sqrt d & -\sqrt d
\end{vmatrix} &=&
|-2\sqrt d| &=&
2\sqrt d
&& \text{if }d\equiv2,3\pmod4\\
\vol_{\iota(\ints K)} &=&
\begin{vmatrix}
\iota(1) \\
\iota\left(\frac{1+\sqrt d}2\right)
\end{vmatrix} &=&
\begin{vmatrix}
1 & 1\\
\frac{1+\sqrt d}2 & \frac{1-\sqrt d}2
\end{vmatrix} &=&
|-\sqrt d| &=&
\sqrt d
&& \text{if }d\equiv1\pmod4
\end{matrix} %]]></script>
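<p>These determinant computations can be double-checked numerically; here’s a sketch (function name mine) that recovers <script type="math/tex">2\sqrt d</script> and <script type="math/tex">\sqrt d</script> in the two cases:</p>

```python
import math

def vol_OK(d):
    """Covolume of iota(O_K) for K = Q(sqrt d), via the 2x2 determinants above."""
    if d % 4 == 1:
        # Z-basis {1, (1 + sqrt d)/2}
        r1, r2 = (1 + math.sqrt(d)) / 2, (1 - math.sqrt(d)) / 2
    else:
        # Z-basis {1, sqrt d}
        r1, r2 = math.sqrt(d), -math.sqrt(d)
    return abs(1 * r2 - 1 * r1)    # |det [[1, 1], [r1, r2]]|

assert math.isclose(vol_OK(2), 2 * math.sqrt(2))   # d = 2, 3 (mod 4): 2*sqrt(d)
assert math.isclose(vol_OK(5), math.sqrt(5))       # d = 1 (mod 4): sqrt(d)
```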
<p>Motivated by this calculation, we make the following definition</p>
<blockquote>
<p>Definition<br />
The <strong>discriminant</strong> of a real quadratic number field <script type="math/tex">K=\Q(\sqrt d)</script> is <script type="math/tex">4d</script> if <script type="math/tex">d\equiv2,3\pmod4</script> and <script type="math/tex">d</script> if <script type="math/tex">d\equiv1\pmod4</script>. Depending on how we feel, we may denote this <script type="math/tex">\disc(K), \zdisc(\ints K),</script> or <script type="math/tex">D_K</script>.</p>
</blockquote>
<p>Note that <script type="math/tex">\vol_{\ints K}:=\vol_{\iota(\ints K)}=\sqrt{\zdisc(\ints K)}</script>. Furthermore, the following is a neat result <sup id="fnref:28"><a href="#fn:28" class="footnote">32</a></sup></p>
<blockquote>
<p>Theorem<br />
If <script type="math/tex">K=\Q(\sqrt d)</script> is a real quadratic number field, then<br /></p>
</blockquote>
<center>$$\begin{align*}
\ints K \simeq\frac{\Z[X]}{(X^2-D_KX+(D_K^2-D_K)/4)}\simeq\zadj{\frac{D_K+\sqrt{D_K}}2}
\end{align*}$$</center>
<div class="proof3">
Pf: Exercise to the reader. Just check both cases separately. $$\square$$
</div>
<p>Recall that solutions to Pell’s equations correspond to units of <script type="math/tex">\ints K</script>, so the goal of this section is to understand the structure of these units. These units form a multiplicative group <script type="math/tex">\ints K^\times</script> <sup id="fnref:29"><a href="#fn:29" class="footnote">33</a></sup> so we’ll “embed” these in <script type="math/tex">\R^2</script> in a way that makes use of this multiplicative structure. Define</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
\ell: &\ints K^\times &\longrightarrow& \R^2\\
&u &\longmapsto& (\log|u|, \log|\conj u|)\\
\\
h: &(\R^\times)^2 &\longrightarrow& \R^2\\
&(x,y) &\longmapsto& (\log|x|, \log|y|)
\end{matrix} %]]></script>
<p>Note that <script type="math/tex">\ell=h\circ\iota\mid_{\ints K^\times}</script>. Furthermore, since <script type="math/tex">N(u)=u\conj u=\pm1</script> for any unit <script type="math/tex">u</script>, the set <script type="math/tex">\iota(\ints K^\times)</script> lies on the hyperbolas <script type="math/tex">xy=\pm1</script>. Since <script type="math/tex">\log</script>’s turn multiplication into addition <sup id="fnref:30"><a href="#fn:30" class="footnote">34</a></sup>, this means that <script type="math/tex">\ell(\ints K^\times)</script> lies on the line <script type="math/tex">x+y=0</script>. The picture looks something like</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/embed.jpg" width="700" height="100" />
</center>
<p>Now, stare at <script type="math/tex">\ell</script> long enough to convince yourself that <script type="math/tex">\ker(\ell)=\{\pm1\}</script>. By the first isomorphism theorem, this means that</p>
<script type="math/tex; mode=display">\begin{align*}
\ell(\ints K^\times) \simeq \frac{\ints K^\times}{\{\pm1\}}
\end{align*}</script>
<p>Luckily for us, <script type="math/tex">\ints K^\times</script> is abelian <sup id="fnref:31"><a href="#fn:31" class="footnote">35</a></sup> so we can write</p>
<script type="math/tex; mode=display">\begin{align*}
\zmod2\oplus\ell(\ints K^\times) \simeq \{\pm1\}\oplus\ell(\ints K^\times) \simeq \ints K^\times
\end{align*}</script>
<p>Thus, once we figure out what <script type="math/tex">\ell(\ints K^\times)</script> is, we will perfectly understand the structure of units in real quadratic fields. As it turns out, this group will be infinite cyclic, but we’ll first show it’s discrete. I’ve already been kind of loose with this, but just keep in mind that <script type="math/tex">K</script> is of the form <script type="math/tex">\Q(\sqrt d)</script> for some square free <script type="math/tex">d\in\Z_{>1}</script>; I won’t always specify this.</p>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\ell(\ints K^\times)</script> is discrete.</p>
</blockquote>
<div class="proof3">
Pf: Fix any $$r\in\R_{\ge0}$$. We will show that $$\ell^{-1}([-r,r]^2)\subseteq\ints K^\times$$ is finite. Consider an arbitrary $$u\in\ints K^\times$$ s.t. $$\ell(u)\in[-r,r]^2$$. This means that $$(\log|u|,\log|\conj u|)\in[-r,r]^2$$ so $$|u|,|\conj u|\in[e^{-r},e^r]$$ which means that $$|u+\conj u|\le2e^r$$ and $$|u\conj u|\le e^{2r}$$. Now, notice that $$u$$ must satisfy the following equation<br />
<center>
$$\begin{align*}
X^2 - aX + b = 0 && a=u+\conj u\in\Z, b=u\conj u\in\Z
\end{align*}$$
</center>
There are only finitely many choices for $$a,b$$, and each choice corresponds to at most 2 such $$u$$, so there are only finitely many such $$u$$. $$\square$$
</div>
<p>Now, remember earlier when I said that <script type="math/tex">\ell(\ints K^\times)</script> lies on the line <script type="math/tex">x+y=0</script>? Well, the previous theorem says that it is a discrete subgroup of this line. This line is a 1-dimensional real vector space, and there aren’t many discrete subgroups of such a space. In fact, up to isomorphism, there are two: <script type="math/tex">\{0\}</script> and <script type="math/tex">\Z</script>. Thus, if we can show that <script type="math/tex">\ell(\ints K^\times)</script> has more than one element, then we will show that it must be <script type="math/tex">\Z</script>, which will in turn show that Pell’s equations have infinitely many solutions. Even more than this, this will show that there exists an <script type="math/tex">\eps\in\ints K^\times</script> such that any unit has the form <script type="math/tex">\pm\eps^n</script> for some <script type="math/tex">n\in\Z</script>, so all solutions to Pell’s equations are generated from a single solution! Getting ahead of myself because this is all still conjecture at this point, we will call such an <script type="math/tex">\eps</script> a <strong>fundamental unit</strong> of <script type="math/tex">\ints K</script>.</p>
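<p>For a concrete taste of this picture, take <script type="math/tex">d=2</script> and the unit <script type="math/tex">1+\sqrt2\in\Z[\sqrt2]^\times</script> (it has norm <script type="math/tex">-1</script>, so it really is a unit). The points <script type="math/tex">\ell\left((1+\sqrt2)^n\right)</script> land on the line <script type="math/tex">x+y=0</script>, evenly spaced, exactly as an infinite cyclic image should be:</p>

```python
import math

eps = 1 + math.sqrt(2)                       # a unit of Z[sqrt 2]: N(eps) = -1
for n in range(1, 5):
    u = eps ** n
    conj = (1 - math.sqrt(2)) ** n           # conjugate of eps^n
    x, y = math.log(abs(u)), math.log(abs(conj))
    assert math.isclose(x + y, 0, abs_tol=1e-9)   # l(eps^n) lies on x + y = 0
    assert math.isclose(x, n * math.log(eps))     # evenly spaced multiples of log(eps)
```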
<p>The idea is to find two elements of <script type="math/tex">\ints K</script> that are equal in norm but not in absolute value. Then, they must differ by a unit <script type="math/tex">u</script>, and <script type="math/tex">\ell(u)\neq0</script> (why?), so there’s some nonzero element, which means the group must be <script type="math/tex">\Z</script>, and we win.</p>
<p>When writing the below proof, I got kinda lost in the details. To help me remember what everything is, and what’s going on, I quickly put together the following image. It’s not labelled or anything, but it illustrates the <script type="math/tex">\alpha_\lambda</script> (the x-coordinate of the green point) we are going to find, and why we can bound its absolute value both above and below.</p>
<center>
<img src="https://nivent.github.io/images/blog/pell-equations/proof.jpg" width="400" height="100" />
</center>
<blockquote>
<p>Theorem<br />
<script type="math/tex">\ell(\ints K^\times)\simeq\Z</script></p>
</blockquote>
<div class="proof3">
Pf: Fix a real number $$\lambda>0$$ and consider the box $$B=[-\lambda,\lambda]\times[-\sqrt{D_K}/\lambda,\sqrt{D_K}/\lambda]$$. It is symmetric about the origin, compact, convex, and has area $$4\sqrt{D_K}=2^2\vol_{\ints K}$$. Our time looking at the geometry of numbers tells us that this means there's a nonzero $$\alpha_\lambda\in\ints K$$ with $$\iota(\alpha_\lambda)\in B$$. Note that this means $$|\knorm(\alpha_\lambda)|\le\sqrt{D_K}$$ as the norm of $$\alpha_\lambda$$ is the product of the coordinates of $$\iota(\alpha_\lambda)$$. Furthermore, $$|\knorm(\alpha_\lambda)|\ge1$$ since the norm is a nonzero (rational) integer, so $$\iota(\alpha_\lambda)$$ lies on or outside the hyperbolas $$|xy|=1$$. These hyperbolas intersect $$B$$ at the top and bottom when $$|x|=\lambda/\sqrt{D_K}$$ so we've shown that<br />
<center>
$$\begin{align*}
\lambda/\sqrt{D_K} \le |\alpha_\lambda| \le \lambda
\end{align*}$$
</center>
as $$\alpha_\lambda$$ is the $$x$$-coordinate of $$\iota(\alpha_\lambda)\in B$$. Now pick another $$\lambda'>\lambda\sqrt{D_K}$$ and we can find some $$\alpha_{\lambda'}$$ with <br />
<center>
$$\begin{align*}
|\alpha_\lambda| \le \lambda < \lambda'/\sqrt{D_K} \le |\alpha_{\lambda'}|
\end{align*}$$
</center>
Now, we just keep on doing this, producing a sequence $$\{\alpha_{\lambda_n}\}_{n=1}^\infty\subset\ints K$$ with the property that $$|\alpha_{\lambda_i}|<|\alpha_{\lambda_j}|$$ whenever $$i < j$$, and $$|\knorm(\alpha_{\lambda_i})|\le\sqrt{D_K}$$ for all $$i$$. Now, all these norms are bounded integers, so there's only finitely many possible distinct values among them, but there are infinitely many $$\alpha$$'s in our sequence. Thus, there must exist some $$i\neq j$$ such that $$\knorm(\alpha_{\lambda_i})=\knorm(\alpha_{\lambda_j})$$ but $$|\alpha_{\lambda_i}|\neq|\alpha_{\lambda_j}|$$, so $$\alpha_{\lambda_i}=u\alpha_{\lambda_j}$$ for some unit $$u\in\ints K^\times$$ not equal to $$\pm1$$. Thus, $$\ell(u)\neq0$$, and the theorem follows. $$\square$$
</div>
<p>I should mention that, by appealing to absolute values, the above proof implicitly fixes a choice of an embedding <script type="math/tex">K\hookrightarrow\R</script>. It doesn’t really matter which one is used, but it’s worth noting what’s going on behind the scenes.</p>
<h1 id="pell-at-last">Pell at Last</h1>
<p>Well, we’ve gone over a lot, and if you’re still here, kudos to you <sup id="fnref:33"><a href="#fn:33" class="footnote">36</a></sup>, but we’re finally ready to actually solve Pell’s equations. Fix any square free <script type="math/tex">d\in\Z_{>1}</script>. Integer solutions to the equation <script type="math/tex">x^2-dy^2=1</script> are units of <script type="math/tex">\ints{\Q(\sqrt d)}</script>, and these units are all of the form <script type="math/tex">\pm\eps^n</script> for some fundamental unit <script type="math/tex">\eps</script>. In order to call this equation solved, we only need to find a fundamental unit. I’ll handle the case that <script type="math/tex">d\equiv2,3\pmod4</script>. The other case can be done analogously, and figuring out its details is left as an exercise.</p>
<p>Assume <script type="math/tex">d\equiv2,3\pmod4</script> and <script type="math/tex">\eps</script> is a fundamental unit of <script type="math/tex">K:=\Q(\sqrt d)</script>. Then, <script type="math/tex">-\eps,\eps^{-1}</script>, and <script type="math/tex">-\eps^{-1}</script> are all fundamental units as well <sup id="fnref:34"><a href="#fn:34" class="footnote">37</a></sup>. Write <script type="math/tex">\eps=a_1+b_1\sqrt d</script> with <script type="math/tex">a_1,b_1\in\Z^+</script>; we can always get positive coefficients by appropriately choosing one of the four fundamental units. Now let <script type="math/tex">\eps^k:=a_k+b_k\sqrt d</script> be the positive powers of <script type="math/tex">\eps</script> and note that <script type="math/tex">b_k=a_1b_{k-1}+b_1a_{k-1}</script>, so the sequence <script type="math/tex">\{b_k\}</script> is increasing. Thus, if you want to find a fundamental unit, just guess and check. Start with <script type="math/tex">b_1=1</script> and check to see if <script type="math/tex">db_1^2\pm1</script> is a perfect square. If not, move on to <script type="math/tex">b_1=2</script> and repeat. Once you’ve found a value that works, write <script type="math/tex">db_1^2\pm1=a_1^2</script> and your fundamental unit is <script type="math/tex">a_1+b_1\sqrt d</script>.</p>
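<p>This guess-and-check procedure is only a few lines of Python (the function name is mine; it assumes a squarefree <script type="math/tex">d\equiv2,3\pmod4</script>):</p>

```python
import math

def fundamental_unit(d):
    """Guess-and-check search from the text: smallest b > 0 with d*b^2 +/- 1 a square.
    Returns (a, b) with a + b*sqrt(d) a fundamental unit of Z[sqrt(d)]."""
    b = 1
    while True:
        for s in (-1, 1):          # try d*b^2 - 1 and d*b^2 + 1
            a2 = d * b * b + s
            a = math.isqrt(a2)
            if a * a == a2:
                return a, b
        b += 1

assert fundamental_unit(11) == (10, 3)    # 10^2 - 11*3^2 = 1
assert fundamental_unit(2) == (1, 1)      # 1^2 - 2*1^2 = -1 (a norm -1 unit)
```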
<blockquote>
<p>Example<br />
Let <script type="math/tex">d=11</script>. If we take <script type="math/tex">b_1=1</script>, then <script type="math/tex">11b_1^2\pm1=\{10,12\}</script> so no good. If we take <script type="math/tex">b_1=2</script>, then <script type="math/tex">11b_1^2\pm1=\{45,43\}</script> so still no luck. Now we try <script type="math/tex">b_1=3</script> to get <script type="math/tex">11b_1^2\pm1=\{100,98\}</script> and we have a winner. Our fundamental unit is <script type="math/tex">10+3\sqrt{11}</script>. Indeed, <script type="math/tex">10^2-11\cdot3^2=1</script> is a solution to Pell’s equation.</p>
</blockquote>
<blockquote>
<p>Example<br />
Now, take <script type="math/tex">d=2</script> instead. If we let <script type="math/tex">b_1=1</script>, then <script type="math/tex">2b_1^2\pm1=\{1,3\}</script> so our fundamental unit is <script type="math/tex">\eps=1+\sqrt 2</script>. However, this has norm <script type="math/tex">1-2=-1</script> so it’s not a solution to Pell’s equation. In cases like this, we instead focus our attention on <script type="math/tex">\eps^2=3+2\sqrt2</script> and use this to generate solutions.</p>
</blockquote>
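<p>Once a fundamental unit (or its square) is in hand, multiplying out its powers generates solutions. For <script type="math/tex">d=2</script>, repeatedly multiplying by <script type="math/tex">\eps^2=3+2\sqrt2</script> produces infinitely many solutions to <script type="math/tex">x^2-2y^2=1</script>:</p>

```python
# Multiply (x + y*sqrt(2)) by 3 + 2*sqrt(2): new pair is (3x + 4y, 2x + 3y).
x, y = 1, 0
solutions = []
for _ in range(4):
    x, y = 3 * x + 4 * y, 2 * x + 3 * y
    solutions.append((x, y))
    assert x * x - 2 * y * y == 1    # each power is again a solution

assert solutions == [(3, 2), (17, 12), (99, 70), (577, 408)]
```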
<p>I’d like to say that’s everything, but I’ve left a few loose ends. These include what to do if <script type="math/tex">d</script> isn’t square free, and what to do in the case <script type="math/tex">d\equiv1\pmod4</script>, where the fundamental unit can have non-integer coefficients. Honestly, I wanted to take care of them myself, but this post became much longer than I anticipated, so I’ll leave them to you. I will say that they have similar resolutions. The main issue in both cases is that <script type="math/tex">\Z[\sqrt d]</script> may not be all of <script type="math/tex">\ints K</script>. However, it can be shown that in general, <script type="math/tex">\Z[\sqrt d]</script> has finite index in <script type="math/tex">\ints K</script>. This means in particular that its unit group is still infinite cyclic up to sign (why?), and so we can still find a fundamental unit <script type="math/tex">\eps\in\Z[\sqrt d]^\times</script>. Then, solutions to Pell’s equation correspond to either all powers of <script type="math/tex">\eps</script> or just the even powers of <script type="math/tex">\eps</script>, depending on whether <script type="math/tex">\knorm(\eps)=1</script> or <script type="math/tex">-1</script>.</p>
<div class="footnotes">
<ol>
<li id="fn:21">
<p>Just finished day one of writing this, and it looks like this will end up being my longest post yet by a sizeable amount. I could be wrong about the sizeable amount part (hard to tell), but either way, this post is gonna dethrone <a href="../surreal-numbers">the king</a>… Having just now finished writing this thing, something tells me it will hold the title of longest post for quite some time. In case anyone’s curious, the previous record holder had a little under 26,650 characters. This one has around 45,000. <a href="#fnref:21" class="reversefootnote">↩</a></p>
</li>
<li id="fn:19">
<p>I usually try not to have prereqs for gaining understanding from my posts, but for this one, I feel like you should at least be comfortable with linear algebra (and in particular abstract vector spaces and determinants), or you’ll likely be lost at some key points. Once we start talking about embeddings, a little bit of abstract algebra will help too (in particular, knowing about group homomorphisms) <a href="#fnref:19" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>this statement can be made precise and proven. I might do that if ever I come up with a good excuse to introduce computability theory on this blog. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>another way to do it is hinted at at the beginning of the second half of that post. You just need to use the fact that the norm N(a+bi)=a^2+b^2 of a Gaussian integer is multiplicative. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:37">
<p>because I forgot it was possible for it not to be the case <a href="#fnref:37" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>I should mention that, as is not <a href="../fundamental-theorem">uncommon</a> in this blog, this post won’t necessarily present the simplest way to solve things, but instead opt for one that introduces interesting mathematics. Also, as always, minimal planning is done before I begin writing so almost certainly details will be missing or presented out of their usual order. It is up to the reader to reconstruct coherent arguments where this happens (it’s a good test of understanding) <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>from infinite to finite <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:38">
<p>some people will try to tell you that this is impossible, but do not be fooled. I believe in you, and I know you can let out a gasp if you so will it. <a href="#fnref:38" class="reversefootnote">↩</a></p>
</li>
<li id="fn:35">
<p>at this point, you might wonder why I didn’t just write the problem like this in the first place. That’s because the place I stole it from wrote it as y^2=x^3-2 originally as well. <a href="#fnref:35" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>if you don’t know what this is, it basically means that the fundamental theorem of arithmetic holds: every integer can be factored uniquely into primes <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>For example, let d^2=-5. Then, Z[d] is not a UFD since 2*3=(1+d)(1-d) and all of these are (different) primes <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Not a technical term. What I mean is that if you have x,y in Z[-sqrt{2}], then x/y exists in Q(-sqrt{2}), and this is what I mean by their ambient quotient. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:36">
<p>If you write it as the product of two numbers, one of them is a unit <a href="#fnref:36" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>I’m gonna use some algebraic words like field, ring, etc. in this section. If you don’t know what they are, that’s fine; I don’t think knowing their definition is technically required to understand how we’re gonna solve Pell’s equations. I’ll also try to include watered-down versions of what they mean for some of them. For example, a ring is a set with addition and multiplication. A field has both of these plus division. Integers are a ring; fractions are a field. <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>A field extension K of Q (often written K/Q) is just a field K that contains Q (Ex. R is a field extension of Q). Every field extension has a degree d (written [K:Q]) which is a measure of how much bigger K is than Q (formally speaking, if K/Q is a field extension, then K is a Q-vector space (why?) and the degree of K/Q is the Q-dimension of K as a vector space). We say a field extension is finite if it has finite degree (ex. Q(sqrt{-2}) is a finite field extension of Q (of degree 2) and hence a number field. R is an infinite field extension of Q and hence not a number field) <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>Monic just means the leading coefficient is 1. The set Z[X] is just the set of polynomials with integer coefficients. An example of an algebraic integer is sqrt{-2} because it satisfies X^2 + 2 = 0 <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>That O is \mathscr and not \mathcal. This is important. <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
<li id="fn:13">
<p>Hence the name quadratic <a href="#fnref:13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:14">
<p>This is no coincidence, but we won’t get into the reason why here. <a href="#fnref:14" class="reversefootnote">↩</a></p>
</li>
<li id="fn:15">
<p>It’s true in general, but the general proof requires a property of the norm I won’t mention here since the norm isn’t always given as a nice product. We got lucky here that quadratic fields are in a sense “complete” (read: Galois) <a href="#fnref:15" class="reversefootnote">↩</a></p>
</li>
<li id="fn:22">
<p>In general, if L/K is an extension of number fields, then N_{L/K}(O_L) is a subset of O_K. In other words, the norm always maps algebraic integers into algebraic integers. <a href="#fnref:22" class="reversefootnote">↩</a></p>
</li>
<li id="fn:16">
<p>Although maybe you’ve already made the connection by now <a href="#fnref:16" class="reversefootnote">↩</a></p>
</li>
<li id="fn:17">
<p>If I’m wrong, we’ll introduce the other stuff as it pops up <a href="#fnref:17" class="reversefootnote">↩</a></p>
</li>
<li id="fn:18">
<p>Which reminds me, why don’t we consider the case d = 0 (mod 4) in the previous theorem? <a href="#fnref:18" class="reversefootnote">↩</a></p>
</li>
<li id="fn:20">
<p>Why not just call this dimension? Because lattices are free Z-modules, and free modules have rank instead of dimension. Modules are something I want to talk about on this blog at some point, but for now, just know that although lattices have geometric interpretations, modules (and even free modules) in general do not (unlike typical vector spaces), so we use the less geometric-sounding rank instead of dimension. <a href="#fnref:20" class="reversefootnote">↩</a></p>
</li>
<li id="fn:32">
<p>for our purposes, the plus in a circle is just another way of writing the direct product. It has the advantage of looking like addition which is good because the dimension of A x B is dim A + dim B instead of dim A * dim B. This notation helps hint at the idea that things should be thought of additively, so you might want to represent the pair (a,b) in A x B as a single value a+b instead (be warned. This is not always a legitimate alternate representation of pairs) <a href="#fnref:32" class="reversefootnote">↩</a></p>
</li>
<li id="fn:23">
<p>By numbers, I mean as in number theory. i.e. integers and their analogues <a href="#fnref:23" class="reversefootnote">↩</a></p>
</li>
<li id="fn:24">
<p>parallelepiped? <a href="#fnref:24" class="reversefootnote">↩</a></p>
</li>
<li id="fn:25">
<p>meaning it contains all the points on its boundary <a href="#fnref:25" class="reversefootnote">↩</a></p>
</li>
<li id="fn:26">
<p>i.e. a function f that preserves addition and multiplication in the sense that f(ab)=f(a)f(b) and f(a+b)=f(a)+f(b) <a href="#fnref:26" class="reversefootnote">↩</a></p>
</li>
<li id="fn:27">
<p>there’s some abuse of notation going on here. In a number field, sqrt(d) is really just an abstract number whose square happens to be d whereas in the setting of real numbers (where there are two numbers matching this description) it is specifically the positive sqrt(d). <a href="#fnref:27" class="reversefootnote">↩</a></p>
</li>
<li id="fn:28">
<p>as far as I know, there isn’t some deep reason why this expression in particular works. It’s just a happy coincidence. I could be wrong though; coincidences are rare in math. <a href="#fnref:28" class="reversefootnote">↩</a></p>
</li>
<li id="fn:29">
<p>this basically just means if you multiply two units you get another unit, but the same doesn’t necessarily hold for addition <a href="#fnref:29" class="reversefootnote">↩</a></p>
</li>
<li id="fn:30">
<p>and since log(1)=0 <a href="#fnref:30" class="reversefootnote">↩</a></p>
</li>
<li id="fn:31">
<p>plus possibly an argument involving splitting O_K^x into “positive” and “negative” parts and mentioning phrases like <a href="https://www.wikiwand.com/en/Semidirect_product#/Relation_to_direct_products">semidirect product</a> (the argument I have in my head does this, but I’m pretty sure it’s overkill)… A different argument is to notice that O_K^x is an extension of {+/-1} and l(O_K^x). Hence, we get this conclusion by noticing that, letting N: O_K^x -> {+/-1} be the norm, the map a |-> N(a)*sign(a) is a <a href="http://www.math.uconn.edu/~kconrad/blurbs/linmultialg/splittingmodules.pdf">splitting map</a>. <a href="#fnref:31" class="reversefootnote">↩</a></p>
</li>
<li id="fn:33">
<p>no way I’d ever read this much math without losing interest and moving on to something else <a href="#fnref:33" class="reversefootnote">↩</a></p>
</li>
<li id="fn:34">
<p>if you fix an embedding into R, then the unique fundamental unit > 1 is often called <strong>the</strong> fundamental unit <a href="#fnref:34" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Demystifying Emulators2017-07-12T00:00:00+00:002017-07-12T00:00:00+00:00https://nivent.github.io/blog/demystifying-emulators<p>I don’t remember when I first discovered emulators, but I remember thinking they were absolutely amazing. All of a sudden, I could play games from old systems I knew I would never own, replay games I had owned but lost, etc. I didn’t think much of them past what games I could play until I started getting more serious about coding and realized that there must be some magic going on underneath the hood making these things work. I tried imagining how they might work, but to no avail; they remained a black box. Because of that, I made it a goal of mine to write my own emulator one day, and as it turns out, <a href="https://github.com/NivenT/RGB">dreams do come true</a>.</p>
<p>Here, I want to talk a little bit about how emulators work <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, and about how they’re not magic after all. The basic idea behind emulators is surprisingly straightforward. They’re very aptly named in that they do exactly what they say; they emulate. An emulator works by reproducing in software the same functionality provided by the hardware. Then, when you feed that software the same input as you would the hardware, it does the same things, and you have a working game system right on your computer.</p>
<p>So the basic idea is to convert hardware into software, but what does that actually mean? I don’t really know a good single answer to that question. Instead, it depends on the level of granularity you use. Imagine, for example, that you wanted to emulate a simple four-function calculator.</p>
<center><img src="https://www.staples-3p.com/s7/is/image/Staples/m001843923_sc7?$splssku$" width="250" height="250" /><br />image source: <a href="https://www.staples.com/Impecca-Standard-Function-Calculator-Black-Ivy/product_1480120">Staples</a></center>
<p>One way to do this would be to take a higher level approach where you let users type the corresponding keys on their keyboard, and implement the logic with a single <code class="highlighter-rouge">switch</code> statement. The workhorse of your code might look something like this<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">eval</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">,</span> <span class="n">op</span><span class="p">):</span>
    <span class="c1"># wrap each result in a lambda so only the chosen branch runs</span>
    <span class="c1"># (otherwise '/' would raise ZeroDivisionError whenever rhs is 0)</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s">'+'</span><span class="p">:</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">lhs</span><span class="o">+</span><span class="n">rhs</span><span class="p">,</span>
        <span class="s">'-'</span><span class="p">:</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">lhs</span><span class="o">-</span><span class="n">rhs</span><span class="p">,</span>
        <span class="s">'/'</span><span class="p">:</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">lhs</span><span class="o">/</span><span class="n">rhs</span><span class="p">,</span>
        <span class="s">'*'</span><span class="p">:</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">lhs</span><span class="o">*</span><span class="n">rhs</span><span class="p">,</span>
    <span class="p">}[</span><span class="n">op</span><span class="p">]()</span>
</code></pre></div></div>
<p>At this point, this doesn’t feel much like an emulator, but it’s doing the same thing the calculator does, so I’d call it one. If you wanted a lower level, more granular approach, you might end up writing something like this to simulate the underlying logic circuits used by the calculator</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">addBits</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">):</span>
    <span class="c1"># half adder: XOR gives the sum bit, AND gives the carry bit</span>
    <span class="n">total</span> <span class="o">=</span> <span class="n">lhs</span> <span class="o">^</span> <span class="n">rhs</span>
    <span class="n">carry</span> <span class="o">=</span> <span class="n">lhs</span> <span class="o">&</span> <span class="n">rhs</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">total</span><span class="p">,</span> <span class="n">carry</span><span class="p">)</span>
</code></pre></div></div>
<p>In this example, thinking as low as this is overkill, but that’s not always the case. Sometimes you will end up writing code that looks more like this than like the <code class="highlighter-rouge">dictionary</code> lookup above, but you have to be careful not to think too granularly. I remember when I first heard this idea of emulators being software ports of hardware, I immediately started imagining how to go about making classes for things as low level as the system bus and logic gates, and how to piece these together to get a working emulator. Sure, you could do something like that and make things work, but you don’t have to stay that faithful to the hardware. Now that we have the general idea covered, let’s dive into a working example.</p>
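<p>To make the granularity idea concrete, here is a quick sketch (my own illustration, not code from any real calculator) of how half-adder logic like the above chains into a ripple-carry adder for whole words:</p>

```python
def full_add(lhs, rhs, carry_in=0):
    # Full adder for single bits: XOR gives the sum, the rest computes carry out
    total = lhs ^ rhs ^ carry_in
    carry_out = (lhs & rhs) | (carry_in & (lhs ^ rhs))
    return total, carry_out

def ripple_add(a, b, width=8):
    # Chain full adders bit by bit, the way the hardware circuit would
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_add((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result  # overflow wraps modulo 2**width, like fixed-width hardware
```

Dropping the final carry is exactly what makes <code>ripple_add(255, 1)</code> wrap around to 0 on an 8-bit "machine".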
<h1 id="curly-succotash">curly-succotash</h1>
<p>One of the simplest systems you can emulate that is still non-trivial is the <a href="https://www.wikiwand.com/en/CHIP-8">Chip-8</a>. Hence, for the purpose of this blog, I wrote a <a href="https://github.com/NivenT/curly-succotash">sample Chip-8 emulator</a> to use as a reference. Before getting into the details, I should preface things by saying that this emulator is flawed in many ways. Aside from not being very user friendly, and possibly being slow, its main issue is that it was written in <a href="https://www.haskell.org/">Haskell</a>. I chose Haskell because I wanted to learn the language, and this seemed like a decently sized project for that, but given that Haskell is functional and so tries to avoid things like state and sequential logic, it’s not the ideal language for an emulator<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>. Also, Haskell isn’t super popular, and doesn’t have the most readable syntax to someone who doesn’t know it. That being said, let’s get started.</p>
<h1 id="starting-off">Starting Off</h1>
<p>With emulators, I like to think of the project as writing a CPU. Every other part of the machine is just a peripheral to help make sure the CPU is working properly. This mindset, while not always accurate, helps me focus my efforts and gives me a starting point from where to branch off other parts of the project. So we need to build a Chip-8 CPU. To start with that, we better make sure we have everything that the CPU needs to interact with so we can implement all its instructions. A quick Google search reveals that <a href="https://www.wikiwand.com/en/CHIP-8">Wikipedia has all the information we’ll need on one page</a>, so we can go there to see the various registers and whatnot that make up our system. I chose to emulate each component like so</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Chip8</span> <span class="o">=</span> <span class="kt">Chip8</span> <span class="p">{</span>
<span class="n">mem</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Word8</span><span class="p">],</span> <span class="c1">-- 4096 1-byte addresses</span>
<span class="n">regs</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Word8</span><span class="p">],</span> <span class="c1">-- 16 registers</span>
<span class="n">stack</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Int</span><span class="p">],</span> <span class="c1">-- 16(?) levels of addresses</span>
<span class="n">ptr</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span> <span class="c1">-- I register (usually stores memory address)</span>
<span class="n">pc</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">sp</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">delay_timer</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">sound_timer</span> <span class="o">::</span> <span class="kt">Int</span><span class="p">,</span>
<span class="n">keys</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Bool</span><span class="p">],</span> <span class="c1">-- 16 keys</span>
<span class="n">screen</span> <span class="o">::</span> <span class="kt">Map</span><span class="o">.</span><span class="kt">Map</span> <span class="p">(</span><span class="kt">Int</span><span class="p">,</span> <span class="kt">Int</span><span class="p">)</span> <span class="kt">Bool</span> <span class="c1">-- 64x32 pixels</span>
<span class="p">}</span> <span class="kr">deriving</span> <span class="p">(</span><span class="kt">Show</span><span class="p">)</span>
</code></pre></div></div>
<ul>
<li>The memory is just an array of values, and addresses are represented implicitly by index.</li>
<li>A register is just some value; it’s completely determined by the number it stores.</li>
<li>The simplest representation of a key is just a Bool saying whether it’s pressed or not. It’s also possible to explicitly store the mapping between computer keys and Chip-8 keys here, but I generally prefer to keep that separate.</li>
<li>The screen is a map from pairs of ints (pixel coordinates) to bools (pixel values: on/off). I originally had this as a 2d list, but that made some of the code more complicated than need be, so I changed it to this. Normally, a 2d array would be fine, but Haskell<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>.</li>
</ul>
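<p>For readers who don’t know Haskell, a rough Python equivalent of this state record (my own translation; the field names mirror the Haskell above, and the defaults are assumptions) might look like:</p>

```python
from dataclasses import dataclass, field

@dataclass
class Chip8:
    mem: list = field(default_factory=lambda: [0] * 4096)   # 4096 1-byte addresses
    regs: list = field(default_factory=lambda: [0] * 16)    # 16 registers (V0-VF)
    stack: list = field(default_factory=list)               # return addresses
    ptr: int = 0        # the I register (usually stores a memory address)
    pc: int = 0x200     # execution starts at 0x200 (see the layout discussion below)
    sp: int = 0
    delay_timer: int = 0
    sound_timer: int = 0
    keys: list = field(default_factory=lambda: [False] * 16)
    screen: dict = field(default_factory=dict)  # (row, col) -> on/off, 64x32 pixels
```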
<p>Once you have this framework for the pieces of the emulator in place, you can start making it do stuff <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>. In practice, this means implementing CPU instructions. Things like adding registers together or loading a value from memory into a register and stuff like that. The majority of instructions are fairly simple. Unfortunately, there are usually a lot of them, which makes them easy to get wrong and boring to implement. Because of that, I like to only write a few at a time. I’ll implement a couple, run the code and have it throw an error whenever it comes across an instruction it doesn’t know; then I’ll implement that instruction, rinse, wash, and repeat. This cycle makes sure you’re always working towards a working product, and lets you catch other bugs early on.</p>
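<p>That implement-a-few-at-a-time cycle can be sketched like so (hypothetical names; only two opcodes are shown, with everything unimplemented failing loudly):</p>

```python
def step(emu):
    # Fetch: Chip-8 opcodes are two big-endian bytes
    op = (emu.mem[emu.pc] << 8) | emu.mem[emu.pc + 1]
    emu.pc += 2
    # Execute only what's implemented so far; crash on anything else
    if op & 0xF000 == 0x1000:    # 1NNN: jump to address NNN
        emu.pc = op & 0x0FFF
    elif op == 0x00E0:           # 00E0: clear the screen
        emu.screen.clear()
    else:
        raise NotImplementedError(f"unknown opcode {op:#06x}")
```

Each time the error fires, you implement that one opcode and run again; any other bug you hit along the way is caught while the emulator is still small.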
<p>One common bug is to mess up the initial state of the system. This will manifest itself in your emulator trying to execute the opcode for an instruction that doesn’t exist. I ran into that issue <a href="https://github.com/NivenT/curly-succotash/blob/d6ad6144acd739b8e7ab113cc38cbd24a4978161/emu.hs#L85">here</a> when I first tried implementing game loading. I had memory laid out like</p>
<script type="math/tex; mode=display">% <![CDATA[
\newcommand{\x}{\text{x}}
\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}
0\x0 & 0\x1 & 0\x2 & \dots & 0\x4F & 0\x50 & 0\x51 & 0\x52 & \dots & S & S+1 & S+2 & \dots\\\hline
f_0 & f_1 & f_2 & \dots & f_{79} & g_0 & g_1 & g_2 & \dots & 0 & 1 & 2 & \dots\\
\hline
\end{array} %]]></script>
<p>where <script type="math/tex">f_0,f_1,f_2,\dots,f_{79}</script> is the font data <sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>, <script type="math/tex">g_0,g_1,g_2,\dots,g_N</script> is the game data, and <script type="math/tex">S=0\x50+N+1</script> is the beginning of memory after all game data.</p>
<p>This is wrong for two reasons. First of all, the Chip-8 begins executing instructions from memory address <script type="math/tex">0\x200</script> so that’s where game data should begin. Secondly, the unused locations (<script type="math/tex">S</script> and beyond) should be populated with all <script type="math/tex">0</script>’s. I caught this bug because earlier versions of the emulator would try to execute a non-existent instruction. If I had implemented all instructions before testing and only seen the issue then, I would have assumed the issue was with an incorrectly implemented instruction and not been able to fix things as quickly.</p>
<p>The <a href="https://github.com/NivenT/curly-succotash/blob/master/emu.hs#L110">correct memory layout</a> is</p>
<script type="math/tex; mode=display">% <![CDATA[
\newcommand{\x}{\text{x}}
\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}
0\x0 & 0\x1 & \dots & 0\x4F & 0\x50 & 0\x51 & \dots & 0\x200 & 0\x201 & \dots & T & T+1 & \dots\\\hline
f_0 & f_1 & \dots & f_{79} & 0 & 0 & \dots & g_0 & g_1 & \dots & 0 & 0 & \dots\\
\hline
\end{array} %]]></script>
<p>where <script type="math/tex">T=0\x200+N+1</script> takes the role of <script type="math/tex">S</script>.</p>
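<p>In code, the corrected layout amounts to something like this sketch (Python for illustration; <code>FONT_DATA</code> is a placeholder standing in for the 80 bytes of font sprites):</p>

```python
FONT_DATA = [0] * 80  # placeholder: the real table holds the 16 hex-digit sprites

def load_game(game: bytes) -> list:
    mem = [0] * 4096                           # unused locations stay zeroed
    mem[0x000:0x050] = FONT_DATA               # font sprites live below 0x200
    mem[0x200:0x200 + len(game)] = list(game)  # execution begins at 0x200
    return mem
```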
<p>For actually implementing instructions, you need some kind of mapping from opcodes to the execution of the instruction itself. One way of doing this that works fairly well is to just use a <code class="highlighter-rouge">switch</code> statement like in the calculator emulator we started with. The nice thing about this is that, if done right <sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>, it’s simple, efficient <sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>, and <a href="https://github.com/NivenT/curly-succotash/blob/master/op.hs#L81">works</a>. However, it can also be a pain to maintain a giant switch statement, so you might want something more sophisticated. One thing you can do is use a <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/instructions.rs#L542">struct</a> to hold basic information about each instruction, which saves you from having to re-look up all the details hidden in your switch statement <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/emulator.rs#L326">when things go wrong</a>.</p>
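<p>One lightweight version of that idea, sketched here in Python with hypothetical names, is a table of (mask, pattern, mnemonic, handler) entries, so dispatch and debugging output share a single source of truth:</p>

```python
def ret(emu, op):
    # 00EE: return from subroutine
    emu.pc = emu.stack.pop()

def jump(emu, op):
    # 1NNN: jump to address NNN
    emu.pc = op & 0x0FFF

# An opcode matches an entry when op & mask == pattern
INSTRUCTIONS = [
    (0xFFFF, 0x00EE, "RET", ret),
    (0xF000, 0x1000, "JP addr", jump),
]

def decode(op):
    for mask, pattern, name, handler in INSTRUCTIONS:
        if op & mask == pattern:
            return name, handler
    return None  # unknown opcode: a good place to log or raise
```

When something goes wrong, the same table that drives execution also gives you a readable mnemonic to print instead of a raw opcode.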
<h1 id="aside-on-switch-statements">Aside on Switch Statements</h1>
<p>This section is pretty unrelated to the rest of the post, but is something interesting I wanted to talk about. I mentioned in a footnote that <code class="highlighter-rouge">switch</code> statements can be faster than a series of <code class="highlighter-rouge">if</code> and <code class="highlighter-rouge">else if</code>’s. This may seem counterintuitive because the two appear to be doing functionally equivalent things, and the obvious way to implement a switch statement is with if and else if. Take a look at this code</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">use_switch</span><span class="p">(</span><span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">switch</span><span class="p">(</span><span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span><span class="p">:</span> <span class="n">func0</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">func1</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="n">func2</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">3</span><span class="p">:</span> <span class="n">func3</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="nl">default:</span> <span class="n">func4</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">use_if</span><span class="p">(</span><span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func0</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func1</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func2</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">==</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func3</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">func4</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>A priori these two functions should be carried out the same way, but the difference is that using a switch statement means you are comparing <code class="highlighter-rouge">int</code>s to decide what to do whereas an if statement could rely on any condition. This may not seem like much, but in practice it means that <code class="highlighter-rouge">use_switch</code> can be implemented with an array of function pointers instead of a series of ifs. Perhaps more clearly, it could expand to something like this<sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup></p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">FunctionPtr</span><span class="p">)();</span>
<span class="kt">void</span> <span class="nf">switch_expanded</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
<span class="n">FunctionPtr</span> <span class="n">arr</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="n">func0</span><span class="p">,</span> <span class="n">func1</span><span class="p">,</span> <span class="n">func2</span><span class="p">,</span> <span class="n">func3</span><span class="p">,</span> <span class="n">func4</span><span class="p">};</span>
<span class="n">arr</span><span class="p">[</span><span class="n">val</span> <span class="o"><</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">val</span> <span class="o">:</span> <span class="mi">4</span><span class="p">]();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Notice that this only makes use of a single condition but is functionally equivalent to the switch statement. As the number of cases in the switch grows, you still only need to check a single condition, whereas you’d (potentially) need to check all of them with if statements. Hence, you could think of the improvement of <code class="highlighter-rouge">switch</code> over <code class="highlighter-rouge">if</code> as being <script type="math/tex">O(1)</script> versus <script type="math/tex">O(n)</script>, but things aren’t this simple in practice.</p>
<h1 id="seeing-stuff--more-than-a-cpu">Seeing Stuff + More than a CPU</h1>
<p>Back to emulators… Ultimately, you want your emulator to do more than execute instructions. Specifically, from most to least important, you want it to display things, get user input, and produce sound. Displaying things can be tricky. Representing the state of the screen internally in the emulator and showing it to a user are two very different things. The best way I’ve found to handle this is to use OpenGL for rendering. You can draw a single rectangle that covers your entire screen and then give it a texture made from data from your emulator. I do not do that in curly-succotash because I could not figure out how to do textures with Gloss, and it seemed like overkill for what was supposed to be a simple project, but <a href="https://github.com/NivenT/RGB/blob/master/src/rendering.rs#L57">here’s a place I do do that</a>. Getting to the point that your emulator displays anything intelligible is a major step; displaying things correctly can be one of the harder parts to get right. As an example, <a href="https://github.com/NivenT/curly-succotash/blob/master/op.hs#L81">look at the relative complexity of the draw_sprite function compared to every other instruction in Chip-8</a>. User input is usually much more tame, and I have not yet figured out a good way to do sound.</p>
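<p>The screen-to-texture step boils down to flattening the pixel map into a byte buffer that can be uploaded as a texture on that one rectangle. A Python sketch (not the actual Gloss or OpenGL code) of the conversion:</p>

```python
def screen_to_rgb(screen, width=64, height=32):
    # Flatten the (row, col) -> bool pixel map into a row-major RGB byte
    # buffer, ready to upload as a width x height texture on one quad
    buf = bytearray(width * height * 3)  # starts all black
    for (r, c), on in screen.items():
        if on:
            i = (r * width + c) * 3
            buf[i:i + 3] = b"\xff\xff\xff"  # lit pixels are white
    return bytes(buf)
```

From there, one textured rectangle replaces thousands of per-pixel draw calls.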
<p>With Chip-8, that’s basically all there is to it. Once you’ve implemented CPU instructions, the user can press buttons, and it beeps, you’re done. You can have a functioning emulator in only a few hundred lines of code, and no magic. If you tackle a larger system, then there’s more to do. You might have an actual GPU separate from the CPU, have more complicated timers or interrupts, different kinds of memory storage schemes, etc. However, in the end, it’s all the same thing: you’re just trying to get familiar enough with a system to be able to translate what it does to code.</p>
<h1 id="last-words">Last words</h1>
<p>Something you don’t want to do with an emulator is render the screen by manually drawing the individual pixels. This is needlessly slow and (potentially) pixelated when you could get better, faster results by just using a single rectangle + a texture. An example of what not to do would be</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">square</span> <span class="o">::</span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Picture</span>
<span class="n">square</span> <span class="n">r</span> <span class="n">c</span> <span class="o">=</span> <span class="kt">Polygon</span> <span class="o">$</span> <span class="n">map</span> <span class="n">f</span> <span class="p">[(</span><span class="n">r</span><span class="p">,</span><span class="n">c</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">c</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">c</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">c</span><span class="p">)]</span>
<span class="kr">where</span> <span class="n">f</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">800.0</span><span class="o">/</span><span class="mf">64.0</span> <span class="o">*</span> <span class="p">(</span><span class="n">fromIntegral</span> <span class="n">c</span><span class="p">)</span> <span class="o">-</span> <span class="mf">400.0</span><span class="p">,</span> <span class="mf">600.0</span><span class="o">/</span><span class="mf">32.0</span> <span class="o">*</span> <span class="p">(</span><span class="n">fromIntegral</span> <span class="n">r</span><span class="p">)</span> <span class="o">-</span> <span class="mf">300.0</span><span class="p">)</span>
<span class="n">render_emu</span> <span class="o">::</span> <span class="kt">Chip8</span> <span class="o">-></span> <span class="kt">Picture</span>
<span class="n">render_emu</span> <span class="n">emu</span> <span class="o">=</span> <span class="n">pictures</span> <span class="o">.</span> <span class="kt">Map</span><span class="o">.</span><span class="n">foldrWithKey</span> <span class="n">draw_pixel</span> <span class="kt">[]</span> <span class="o">$</span> <span class="n">screen</span> <span class="n">emu</span>
<span class="kr">where</span> <span class="n">draw_pixel</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="n">v</span> <span class="n">lst</span> <span class="o">=</span> <span class="kr">if</span> <span class="n">v</span> <span class="kr">then</span> <span class="p">(</span><span class="kt">Color</span> <span class="n">white</span> <span class="o">$</span> <span class="n">square</span> <span class="p">(</span><span class="mi">31</span><span class="o">-</span><span class="n">r</span><span class="p">)</span> <span class="n">c</span><span class="p">)</span><span class="o">:</span><span class="n">lst</span> <span class="kr">else</span> <span class="n">lst</span>
</code></pre></div></div>
<p>One thing that is helpful is comments. Naturally with an emulator, there will be things you have to do because of some specific requirement or implementation detail of the system you’re recreating. It’s really easy to forget these details later on, so it helps to remind yourself of them as you move forward.</p>
<p>Finally, remember not to take things too literally because that can complicate how you do stuff. When you see an instruction or component you need to implement, focus more on replicating its behavior than its hardware implementation or the wording used to describe it. An example of this is the Chip-8 instruction <script type="math/tex">0\text{xF00A}</script>, which waits until a key is pressed, then stores that key in register <script type="math/tex">\text{V0}</script>. It’s easy to think of doing this with a while loop or something like that, but that blocks all other parts of your code from running (something more complicated than Chip-8 could have <a href="https://github.com/NivenT/RGB/blob/master/src/emulator/emulator.rs#L230">parts that run independently from the CPU</a>) and is needlessly complicated. Instead, you can implement this waiting behaviour with something a little more clever<sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup></p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- 0xFX0A Wait for a key press, then store key in VX (note: blocking operation)</span>
<span class="o">|</span> <span class="p">(</span><span class="o">.&.</span><span class="p">)</span> <span class="n">op</span> <span class="mh">0xf0ff</span> <span class="o">==</span> <span class="mh">0xf00a</span> <span class="o">=</span> <span class="kr">case</span> <span class="n">get_key</span> <span class="n">ks</span> <span class="kr">of</span>
<span class="kt">Just</span> <span class="n">i</span> <span class="o">-></span> <span class="kt">Left</span><span class="p">(</span><span class="n">rng</span><span class="p">,</span> <span class="n">emu</span><span class="p">{</span><span class="n">regs</span><span class="o">=</span><span class="n">rpl_nth</span> <span class="n">rs</span> <span class="n">x</span> <span class="o">$</span> <span class="n">fromIntegral</span> <span class="n">i</span><span class="p">})</span>
<span class="kt">Nothing</span> <span class="o">-></span> <span class="kt">Left</span><span class="p">(</span><span class="n">rng</span><span class="p">,</span> <span class="n">emu</span><span class="p">{</span><span class="n">pc</span><span class="o">=</span><span class="n">p</span><span class="o">-</span><span class="mi">2</span><span class="p">})</span>
</code></pre></div></div>
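<p>For comparison, the same trick in an imperative style (a Python sketch with assumed field names) reads:</p>

```python
def op_fx0a(emu, x):
    # 0xFX0A: wait for a key press, then store the key in VX.
    # Rather than spinning in a blocking loop, retry the instruction:
    pressed = [i for i, down in enumerate(emu.keys) if down]
    if pressed:
        emu.regs[x] = pressed[0]
    else:
        emu.pc -= 2  # re-execute this same opcode on the next cycle
```

The emulator keeps ticking normally; from the game’s point of view, the CPU simply hasn’t moved past this instruction yet.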
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I should probably mention that I’m no expert on emulators or really anything I’m gonna talk about in this post, so do with my advice what you will. Also, some of the advice I give is stuff I learned (i.e. stole) from other people. I’m too lazy to credit them everywhere I do it. Just know that if something I say seems like a well-thought out good idea, it probably wasn’t mine. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Ignore the fact that this calculator has a decimal point button, or a percent button, or whatever that thing in the top left is <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>And given that it’s not super popular, it’s not an ideal language for a blog post. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Probably a way to make it work w/o ugly code, but I’m still new <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>It’s common, for me at least, to be missing pieces the first time you do this. However, that’s fine. You’ll realize that you’re missing stuff when a CPU instruction requires it or later when trying to flesh out other parts of the emulator in the case of more complicated systems. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>The Chip-8 has 16 predefined sprites that are the hex digits <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>curly-succotash does not do this right. A better way to do it would be to switch on the first (hexadecimal) digit of the opcode, and then nest more switch statements if needed (at most 4 deep, but really only like 2 deep in practice) <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>It’s not uncommon for switch statements to be faster than sequences of if’s and else if’s <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>I made val unsigned to not have to deal with negative issues. It doesn’t really affect anything <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>Just decrement the program counter so this same instruction gets executed next timestep <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Fundamental Theorem of Algebra2017-06-18T18:12:00+00:002017-06-18T18:12:00+00:00https://nivent.github.io/blog/fundamental-theorem<p>One of the first “theorems” I heard about was The Fundamental Theorem of Algebra, and I remember being kind of drawn to it for a long time after first seeing it. I think this was less because of the statement of the theorem itself, and more because the word fundamental in its title made it seem really important and imposing <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Either way, I was convinced for a long time that it was somehow a mysterious theorem that, although easy to state, must have one of those impossible-to-understand, complicated proofs; the kind of thing that’s proved once via a lot of effort and then just applied afterwards, without many people wanting to return to the proof because it’s just that out there. Despite this, my fascination with it made me determined to see and understand its proof once I became really good at/knowledgeable of math. Luckily for me, I was wrong. The proof of the theorem is not arcane. In fact, there are <a href="https://www.amazon.com/Fundamental-Theorem-Algebra-Undergraduate-Mathematics/dp/0387946578">many</a> proofs of it, some of which even I can understand.</p>
<p>Before getting into a proof, let’s quickly state the theorem and then move on</p>
<blockquote>
<p>The Fundamental Theorem of Algebra<br />
If <script type="math/tex">p(x)=a_nx^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> is any nonconstant polynomial with complex coefficients, then <script type="math/tex">p(x)</script> has a zero in <script type="math/tex">\mathbb C</script>.</p>
</blockquote>
<h1 id="intro-to-mathbb-c">Intro to <script type="math/tex">\mathbb C</script></h1>
<p>If the idea of a number <script type="math/tex">i</script> such that <script type="math/tex">i^2=-1</script> doesn’t frighten or anger you, then skip this section. If it does, I’m going to somewhat quickly try to convince you that this is ok.</p>
<p>One way to think of complex numbers is to view them as a way of doing geometry via arithmetic. Let’s say, for example, you are making a 2D game, and in this game you probably want to keep track of positions of different objects, so you represent positions as points in the plane. Each object has some position <script type="math/tex">(x,y)\in\mathbb R^2</script>. You probably want objects to move, so along with a position, every object needs some velocity <script type="math/tex">(dx,dy)\in\mathbb R^2</script>. Now, you can move objects by adding their velocity to their position, so after one timestep their position becomes <script type="math/tex">(x+dx,y+dy)</script>. Simple enough. Objects in your game also rotate around each other<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>. Originally, you might handle this by having some angle <script type="math/tex">\theta</script> of rotation for an object, and then updating its position via some complicated formula involving <script type="math/tex">\sin</script> and <script type="math/tex">\cos</script>. This is kinda messy, but then you remember how well representing things as points worked for moving things around before, and so you store rotations as a point <script type="math/tex">(\cos\theta,\sin\theta)</script> on the unit circle. You then need some operation <script type="math/tex">\cdot</script> such that <script type="math/tex">(x,y)\cdot(\cos\theta,\sin\theta)</script> gives the rotation of <script type="math/tex">(x,y)</script> (about the origin; to rotate about a different point, you just translate, rotate, then translate back). Once you do this, you’ll likely want to extend <script type="math/tex">\cdot</script> such that <script type="math/tex">(x,y)\cdot(a,b)</script> makes sense for all points in the plane, and not just ones where <script type="math/tex">(a,b)</script> is on the unit circle. 
Motivated by the fact that <script type="math/tex">5*(x,y)</script> scales <script type="math/tex">(x,y)</script> by a factor of 5, and <script type="math/tex">0.5*(x,y)</script> scales it by a factor of <script type="math/tex">1/2</script>, you say that <script type="math/tex">(x,y)\cdot(a,b)</script> rotates <script type="math/tex">(x,y)</script> by the angle <script type="math/tex">(a,b)</script> makes with the <script type="math/tex">x</script>-axis, and then scales it by the distance of <script type="math/tex">(a,b)</script> from the origin.</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/mult.jpeg" width="400" height="100" /></center>
<p>This turns out to be pretty useful because it lets you combine two transformations into one, and this <script type="math/tex">\cdot</script> operation plays really nicely with adding points. In fact, if you do the math to work things out, you will see that <script type="math/tex">(x,y)\cdot(a,b)=(xa-yb,xb+ay)</script> which means that <script type="math/tex">(x,0)\cdot(y,0)=(xy,0)</script> so the <script type="math/tex">x</script>-axis is really just the real number line, and <script type="math/tex">(0,1)\cdot(0,1)=(-1,0)</script> so you have a number whose square is <script type="math/tex">-1</script>! By trying to create an arithmetic that allows us to do geometric transformations, we naturally find ourselves actually manipulating complex numbers where <script type="math/tex">a+bi\leftrightarrow(a,b)</script>. I probably should mention that complex numbers actually usually aren’t used for rotations and such in 2D games, but an extension of them called <a href="https://www.wikiwand.com/en/Quaternion">quaternions</a> are used for rotations in 3D games.</p>
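<p>If you want to play with this, here’s a quick sketch (in Python; not part of the original post, and <code>mult</code> is just a made-up name) checking the coordinate formula above against Python’s built-in complex multiplication:</p>

```python
def mult(p, q):
    # (x, y) . (a, b) = (xa - yb, xb + ay): rotate p by the angle q
    # makes with the x-axis, then scale by q's distance from the origin
    (x, y), (a, b) = p, q
    return (x * a - y * b, x * b + a * y)

# rotating by 90 degrees is multiplying by (0, 1):
quarter_turn = mult((1, 0), (0, 1))    # (0, 1)
# (0, 1) . (0, 1) = (-1, 0): a point whose "square" is -1
minus_one = mult((0, 1), (0, 1))       # (-1, 0)
# and it agrees with Python's complex numbers, (x + yi)(a + bi):
z = complex(3, 4) * complex(5, -2)     # (23 + 14i)
```

<p>The last line is the same product <script type="math/tex">(3+4i)(5-2i)=23+14i</script> worked out by hand in the next section.</p>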
<p>If that’s not convincing, then another perspective on complex numbers is that you are really just doing clock arithmetic when you work with them. When doing math with time, you wrap around every 12 (or 24) hours, so you are really just treating 12 as if it were 0, and then doing normal math (Ex. <script type="math/tex">4+10=14=12+2=2</script> so <script type="math/tex">10</script> hours past <script type="math/tex">4</script> is <script type="math/tex">2</script>). With complex numbers, you are doing something similar. You are doing normal math with polynomials (with real coefficients), except you treat the polynomial <script type="math/tex">x^2+1</script> as being zero. So, for example, when you say <script type="math/tex">(3+4i)(5-2i)=23+14i</script>, this is really because</p>
<script type="math/tex; mode=display">\begin{align*}
(3+4x)(5-2x) = 15 + 14x - 8x^2 = 15 + 14x - 8(x^2+1) + 8 = 23 + 14x
\end{align*}</script>
<p>Symbolically, in case you’ve studied some abstract algebra but not seen this,</p>
<script type="math/tex; mode=display">\begin{align*}
\mathbb C\simeq\frac{\mathbb R[x]}{(x^2+1)}
\end{align*}</script>
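<p>The “clock arithmetic” picture can be checked mechanically: multiply two real polynomials and then repeatedly replace <script type="math/tex">x^2</script> by <script type="math/tex">-1</script>. A small Python sketch (not from the original post; <code>mult_mod</code> is a hypothetical helper):</p>

```python
def mult_mod(p, q):
    # p, q are coefficient lists, index = power of x
    prod = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            prod[i + j] += pi * qj
    # reduce mod x^2 + 1: replace x^k by -x^(k-2) until the degree is < 2
    for k in range(len(prod) - 1, 1, -1):
        prod[k - 2] -= prod[k]
        prod[k] = 0
    return prod[:2]  # [constant term, coefficient of x]

# (3 + 4x)(5 - 2x) = 23 + 14x mod x^2 + 1, matching (3 + 4i)(5 - 2i)
result = mult_mod([3, 4], [5, -2])
```

<p>Working in coefficient lists like this is exactly working in <script type="math/tex">\mathbb R[x]/(x^2+1)</script>, with the second entry playing the role of the coefficient of <script type="math/tex">i</script>.</p>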
<h1 id="definitions-and-junk">Definitions and Junk</h1>
<p>Now that we have that out of the way, before moving on to the proof itself, we need to setup some notation, definitions, and lemmas, so let’s get to that. In the below definitions, <script type="math/tex">X</script> is an arbitrary subset of <script type="math/tex">\mathbb C</script>.</p>
<blockquote>
<p>Definition<br />
A <strong>path</strong> is a continuous function <script type="math/tex">f:[0,1]\rightarrow X</script>. Furthermore, if we have that <script type="math/tex">f(0)=f(1)</script>, then we call <script type="math/tex">f</script> a <strong>loop</strong> based at <script type="math/tex">f(0)</script>.</p>
</blockquote>
<p>An important thing to know about paths is that you can compose them. If you have two paths <script type="math/tex">f,g:[0,1]\rightarrow X</script> where <script type="math/tex">f(1)=g(0)</script>, then you can form a new path <script type="math/tex">g\cdot f:[0,1]\rightarrow X</script> where you first do <script type="math/tex">f</script>, then do <script type="math/tex">g</script>. In order to keep the domain <script type="math/tex">[0,1]</script>, you have to traverse <script type="math/tex">f</script> and <script type="math/tex">g</script> at twice the normal speed, but that’s really just a technicality<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
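<p>Concretely, the “twice the normal speed” composition can be written out as the piecewise formula</p>
<script type="math/tex; mode=display">% <![CDATA[
(g\cdot f)(t)=\begin{cases}f(2t) & 0\le t\le\frac{1}{2}\\ g(2t-1) & \frac{1}{2}\le t\le 1\end{cases} %]]></script>
<p>which is well-defined and continuous precisely because <script type="math/tex">f(1)=g(0)</script>.</p>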
<p>Note that for some reason I think in terms of paths more easily than I do in terms of loops, so although we’ll be dealing exclusively with loops here, I will often forget and say path instead.</p>
<blockquote>
<p>Notation<br />
Let <script type="math/tex">S^1=\{z\in\mathbb C:|z|=1\}</script> be the unit circle in the complex plane</p>
</blockquote>
<p>Quick remark: Notice that there is a 1-1 correspondence between loops and continuous circle functions <script type="math/tex">f:S^1\rightarrow X</script> since a circle is really just a line segment with its endpoints glued together. I may end up switching between these two perspectives during this post.</p>
<p>The proof of the fundamental theorem we’ll present is pretty dependent on loops. The basic idea is that if you have a polynomial without a zero, then you can find a “constant loop that circles the origin multiple times”. I use quotes because this is not exactly what we’ll show, but it’s the basic idea. In any case, if a loop is constant it doesn’t move, so there’s no way it could circle the origin even once, and so we get a contradiction. We need a mathematically precise way of defining what it means to “circle the origin multiple times”, and for that, we’ll use a little homotopy theory.</p>
<blockquote>
<p>Definition<br />
Given two paths <script type="math/tex">f:[0,1]\rightarrow X</script> and <script type="math/tex">g:[0,1]\rightarrow X</script> with the same basepoints (i.e. <script type="math/tex">f(0)=g(0)</script> and <script type="math/tex">f(1)=g(1)</script>), a <strong>homotopy</strong> <script type="math/tex">H:[0,1]\times[0,1]\rightarrow X</script> from <script type="math/tex">f</script> to <script type="math/tex">g</script> is a continuous<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> function such that <script type="math/tex">H(t,0)=f(t)</script> and <script type="math/tex">H(t,1)=g(t)</script> for all <script type="math/tex">t\in[0,1]</script>, and <script type="math/tex">H(0,s)=f(0)</script> and <script type="math/tex">H(1,s)=f(1)</script> for all <script type="math/tex">s\in[0,1]</script>. If there exists a homotopy <script type="math/tex">H</script> from <script type="math/tex">f</script> to <script type="math/tex">g</script>, then we say <script type="math/tex">f</script> and <script type="math/tex">g</script> are <strong>homotopy equivalent</strong>, and denote this <script type="math/tex">f\sim g</script>.</p>
</blockquote>
<p>You can think of a homotopy as a continuous deformation from one path into the other. Something like this</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/homotopy.gif" width="250" height="100" /></center>
<blockquote>
<p>Remark<br />
One important example of a homotopy is the one depicted above. This is the so-called <strong>straight line homotopy</strong>, and is the result of thinking of your paths as points and then drawing a line between them. For <script type="math/tex">f,g:[0,1]\rightarrow X</script> paths between the same points, you can define <script type="math/tex">H(t,s)=(1-s)f(t) + sg(t)</script>. This is almost always continuous.</p>
</blockquote>
<blockquote>
<p>Question<br />
When does the straight line homotopy fail to be a homotopy?</p>
</blockquote>
<blockquote>
<p>Exercise<br />
Show that homotopy equivalence is an equivalence relation.</p>
</blockquote>
<p>In the upcoming section, we’ll apply homotopy to loops to see that every loop around a circle has a well-defined number of times it goes around. This will then lead us to the proof of the theorem.</p>
<h1 id="circles-and-degrees">Circles and Degrees</h1>
<p>Here, we will study loops <script type="math/tex">f:[0,1]\rightarrow S^1</script> around the unit circle. In general, these things can behave annoyingly by stopping in place, backtracking, etc. so to get a handle on them, we’ll homotope all our paths into nice loops. To that end, let <script type="math/tex">\omega_n:[0,1]\rightarrow S^1</script> be the path <script type="math/tex">\omega_n(t)=e^{2t\pi in}</script> that goes around the unit circle <script type="math/tex">n</script> times where we made use of Euler’s formula.</p>
<p>Our goal is to show that any loop <script type="math/tex">f:[0,1]\rightarrow S^1</script> is homotopic to exactly one “nice” loop <script type="math/tex">\omega_n</script>. We will then let the degree of <script type="math/tex">f</script> be <script type="math/tex">\deg f=n</script>, and this will be our characterization of the number of times <script type="math/tex">f</script> travels around the unit circle <sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>. In order to do this, we’ll make use of a special function<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{matrix}
p: &\mathbb R &\longrightarrow &S^1\\
&r &\longmapsto &\cos(2\pi r)+i\sin(2\pi r)
\end{matrix} %]]></script>
<p>What makes this function special is that it allows us to “lift” loops in <script type="math/tex">S^1</script> up to paths in <script type="math/tex">\mathbb R</script>. This function is far from injective, but it maps every unit interval in <script type="math/tex">\mathbb R</script> around the circle in a “nice” way. If we look at any (connected) neighborhood around a point on our circle, there are many disjoint copies of that neighborhood in <script type="math/tex">\mathbb R</script> that get mapped into it by <script type="math/tex">p</script>. This means that <script type="math/tex">p</script> in some sense has multiple local inverses over any such neighborhood in <script type="math/tex">S^1</script>. These local inverses are what allow us to lift loops up to <script type="math/tex">\mathbb R</script>. Specifically,<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup></p>
<blockquote>
<p>Lemma<br />
For any path <script type="math/tex">f:[0,1]\rightarrow S^1</script>, there exists a unique <strong>lift</strong> <script type="math/tex">\tilde f:[0,1]\rightarrow\mathbb R</script> such that <script type="math/tex">p\circ\tilde f=f</script> and <script type="math/tex">\tilde f(0)=0</script>.</p>
</blockquote>
<div class="proof2">
Pf: Let \(f:[0,1]\rightarrow S^1\) be a path. The remark I made above on local inverses can be said more formally as this: for any point \(x\in S^1\), there exists a neighborhood \(N\) of \(x\), called an elementary neighborhood, such that each path component of \(p^{-1}(N)\) is mapped homeomorphically onto \(N\). Let \(\{U_i\}_{i\in I}\) be a collection of elementary neighborhoods that cover \(S^1\), so \(\{f^{-1}(U_i)\}_{i\in I}\) is an open cover of the compact metric space \([0,1]\), which means it has a finite subcover. Furthermore, it is a fact that I will not prove here (essentially the Lebesgue number lemma) that you can then find an \(m\in\mathbb N\) such that each of the images \(f([0,1/m]),f([1/m,2/m]),\dots,f([(m-1)/m,1])\) is completely contained in some elementary neighborhood \(W_j\). To simplify notation, let \(x_j=j/m\) and \(I_j=[x_j,x_{j+1}]\). Now, we can lift \(f\) by lifting it piece by piece. For each \(j\in\{0,1,\dots,m-1\}\), we can form a unique path \(g_j:I_j\rightarrow\mathbb R\) such that \(p\circ g_j=f\mid_{I_j}\): since \(f(I_j)\) is contained in an elementary neighborhood \(W_j\), there is exactly one "local inverse" (path component) \(V_j\subseteq p^{-1}(W_j)\) containing \(g_{j-1}(x_j)\), and so \(V_j\) contains a unique path beginning at \(g_{j-1}(x_j)\) that lifts \(f\mid_{I_j}\). Thus, our unique lift of \(f\) is \(\tilde f=g_{m-1}g_{m-2}\dots g_2g_1g_0\). \(\square\)
</div>
<p>That may not have been put perfectly clearly because it’s a proof that is best digested with accompanying visuals, but I am not going through the trouble of making some. One thing I did not make explicit is that we take <script type="math/tex">g_{-1}(x_0)=0</script> in order to comply with <script type="math/tex">\tilde f(0)=0</script>. Another thing to keep in mind is that we form our lift by breaking the path up into small pieces, lifting those, then joining them together. If we get a piece <script type="math/tex">I_j</script> of our path small enough to be contained in an elementary neighborhood, then the fact that it has one local inverse containing the point our path left off at means there is a unique way to extend the path. This follows from the fact that each local inverse (i.e. path component) is mapped homeomorphically onto <script type="math/tex">W_j</script>, so there’s a unique lift for everything.</p>
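<p>The piece-by-piece construction is easy to mimic numerically. Here’s a sketch (in Python; not part of the original post, and <code>lift</code> is a made-up name): chop <script type="math/tex">[0,1]</script> into pieces small enough that the loop doesn’t move far on each one, and on each piece apply a local inverse of <script type="math/tex">p</script>, which just measures a small change in angle:</p>

```python
import cmath

def lift(f, steps=1000):
    # approximate the unique lift of a loop f:[0,1] -> S^1 with lift(0) = 0
    tilde = [0.0]
    for k in range(steps):
        z0, z1 = f(k / steps), f((k + 1) / steps)
        # local inverse of p: z1/z0 stays near 1, so its principal
        # argument is the small angle traversed on this piece
        tilde.append(tilde[-1] + cmath.phase(z1 / z0) / (2 * cmath.pi))
    return tilde

omega_3 = lambda t: cmath.exp(2j * cmath.pi * 3 * t)
endpoint = lift(omega_3)[-1]   # approximately 3, the number of revolutions
```

<p>For any loop, the endpoint of the lift lands (up to floating point) on an integer, which is exactly the degree discussed in this section.</p>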
<p>For the purposes of this section, let <script type="math/tex">1+0i</script> be a distinguished point in the sense that all loops around the circle begin and end there.</p>
<blockquote>
<p>Lemma<br />
Let <script type="math/tex">f:[0,1]\rightarrow S^1</script> be a loop based at <script type="math/tex">1</script>, and let <script type="math/tex">\tilde f</script> be its unique lift. Then, <script type="math/tex">\tilde f(1)</script> is an integer. We call this integer the degree of <script type="math/tex">f</script>.</p>
</blockquote>
<div class="proof2">
Pf: \(p(\tilde f(1))=f(1)=1\) so \(\tilde f(1)\in p^{-1}(1)=\mathbb Z\). \(\square\)
</div>
<p>If you notice, I just redefined degree, so we’d better hope these definitions are equivalent. Clearly, <script type="math/tex">\deg\omega_n=n</script> since <script type="math/tex">\tilde\omega_n</script> is just a straight path from <script type="math/tex">0</script> to <script type="math/tex">n</script>, so we will show these definitions are equivalent via the following lemmas.</p>
<blockquote>
<p>Lemma<br />
The degree of a path is homotopy-invariant. That is, if <script type="math/tex">f\sim g</script>, then <script type="math/tex">\deg f=\deg g</script>.</p>
</blockquote>
<p>Before we get to the proof, let’s look at a picture of what’s going on here.</p>
<center><img src="https://nivent.github.io/images/blog/fund-theorem/lift.gif" width="500" height="200" /></center>
<p>We have a path <script type="math/tex">f</script> going around the circle (here <script type="math/tex">f=\omega_2</script>), and by using local inverses of <script type="math/tex">p</script>, we lift this to a path in <script type="math/tex">\mathbb R</script> from <script type="math/tex">0</script> to <script type="math/tex">2</script>. This captures the fact that this circle loop makes two full revolutions around the circle. The idea behind the proof is similar to the proof of paths having unique lifts. You essentially show that you can also lift homotopies, so if <script type="math/tex">f\sim g</script>, then <script type="math/tex">\tilde f\sim\tilde g</script>, which means they have the same endpoints.</p>
<div class="proof2">
Pf: Exercise for the reader.
</div>
<blockquote>
<p>Lemma<br />
The converse holds: If <script type="math/tex">\deg f=\deg g</script>, then <script type="math/tex">f\sim g</script>.</p>
</blockquote>
<div class="proof2">
Pf: Let \(f,g:[0,1]\rightarrow S^1\) be loops such that \(\deg f=\deg g\). Let \(\tilde f,\tilde g:[0,1]\rightarrow\mathbb R\) be their respective lifts and note that \(\tilde f(1)=\tilde g(1)\). Let \(\tilde H:[0,1]\times[0,1]\rightarrow\mathbb R\) be the straight line homotopy \(\tilde H(t,s)=(1-s)\tilde f(t)+s\tilde g(t)\), and define \(H:[0,1]\times[0,1]\rightarrow S^1\) by \(H(t,s)=p\circ\tilde H(t,s)\). Then, \(H\) is continuous since it is a composition of continuous functions. Furthermore, \(H(t,0)=p\circ\tilde f(t)=f(t)\), \(H(t,1)=p\circ\tilde g(t)=g(t)\), \(H(0,s)=p(0)=1=f(0)=g(0)\) and \(H(1,s)=p(\tilde f(1))=1=f(1)=g(1)\) for all \(t,s\in[0,1]\). Thus, \(H\) is a homotopy so \(f\sim g\). \(\square\)
</div>
<blockquote>
<p>Remark<br />
We’ve just shown that any loop around the circle is completely characterized (up to homotopy which is really all that matters) by a single integer, the number of times it goes around. Furthermore, it is easily shown that this integer is additive in the sense that <script type="math/tex">\deg(fg)=\deg f+\deg g</script> (It’s enough to show this for the case that <script type="math/tex">f=\omega_n</script> and <script type="math/tex">g=\omega_m</script> which is obvious), so the structure of loops around the circle is the additive structure of the integers! This is pretty amazing, and can be used to prove some interesting stuff<sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup></p>
</blockquote>
<h1 id="proof-at-last">Proof at Last</h1>
<p>At this point, we’ve developed everything we need. Before we get to the proof, let’s “strengthen” our assumptions a little bit. Let <script type="math/tex">f_0(x)=a_nx^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> be any polynomial. Note that we can divide through by <script type="math/tex">a_n</script> without changing the zeros of this polynomial, so we only need to investigate monic polynomials like <script type="math/tex">f_1(x)=x^n+b_{n-1}x^{n-1}+\dots+b_1x+b_0</script> where <script type="math/tex">b_i=a_i/a_n</script>. Furthermore, we can replace <script type="math/tex">x</script> with any invertible transformation, and although we change the zeros, we’re still able to recover all the ones we started with. Hence, we can pick <script type="math/tex">N\in\mathbb R</script> small enough that <script type="math/tex">% <![CDATA[
\mid Nb_{n-1}\mid+\dots+\mid N^{n-1}b_1\mid+\mid N^nb_0\mid<1 %]]></script> and then consider polynomials like <script type="math/tex">f_2(x)=N^nf_1(x/N)=x^n+c_{n-1}x^{n-1}+\dots+c_1x+c_0</script> where <script type="math/tex">c_i=N^{n-i}b_i</script>. This limits the type of polynomials enough that we can state the theorem as</p>
<blockquote>
<p>Fundamental Theorem of Algebra<br />
Let <script type="math/tex">f(x)=x^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0</script> be any polynomial with complex coefficients (whose degree <script type="math/tex">n>0</script>) such that <script type="math/tex">% <![CDATA[
\mid a_{n-1}\mid+\dots+\mid a_1\mid+\mid a_0\mid<1 %]]></script>. Then, there exists some <script type="math/tex">x_0\in\mathbb C</script> with <script type="math/tex">f(x_0)=0</script>.</p>
</blockquote>
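<p>Before the proof, here’s a quick numerical sanity check of the reduction we just did (Python; not part of the original post, and <code>normalize</code>/<code>horner</code> are made-up helpers). It divides through by <script type="math/tex">a_n</script>, picks a concrete <script type="math/tex">N</script>, and rescales so the coefficient bound holds:</p>

```python
def normalize(a):
    # a = [a_n, ..., a_1, a_0] with a_n != 0 and len(a) > 1.
    # Returns (c, N) where c = [1, c_{n-1}, ..., c_0] is monic with
    # |c_{n-1}| + ... + |c_0| < 1, and x0 is a zero of a exactly when
    # N*x0 is a zero of c (c is the coefficient list of N^n f_1(x/N)).
    b = [ai / a[0] for ai in a]                  # divide through by a_n
    S = sum(abs(bi) for bi in b[1:])
    N = 1 / (2 * (1 + S))                        # N <= 1/2, so N**(n-i) <= N
    c = [bi * N**i for i, bi in enumerate(b)]    # c_i = N^(n-i) * b_i
    return c, N

def horner(c, x):
    # evaluate a polynomial, coefficients from highest power down
    v = 0
    for ci in c:
        v = v * x + ci
    return v

c, N = normalize([2, 0, -2])    # f(x) = 2x^2 - 2, zeros at x = 1 and -1
```

<p>Here the tail coefficients of <code>c</code> sum to less than 1, and <code>horner(c, N)</code> and <code>horner(c, -N)</code> vanish because <script type="math/tex">\pm1</script> were zeros of the original polynomial.</p>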
<div class="proof2">
Pf: Suppose that \(f(x)\) has no zero in \(\mathbb C\), so we can regard \(f\) as a function from \(\mathbb C\) to \(\mathbb C-\{0\}\). Now, define a function \(g:S^1\rightarrow S^1\) by \(g(x)=\frac{f(x)}{\mid f(x)\mid}\), and note that we can equivalently view \(g\) as a loop in \(S^1\), so \(g\) has a well-defined degree. Let \(D=\{z\in\mathbb C:|z|\le1\}\) be the unit disc, and note that, representing complex numbers in polar form, we can similarly define
$$\begin{matrix}
G: &D &\longrightarrow &S^1\\
&re^{2\pi i\theta} &\longmapsto &\frac{f(re^{2\pi i\theta})}{\mid f(re^{2\pi i\theta})\mid} & 0\le r \le 1 &0\le\theta\le 1
\end{matrix}$$
so we can think of \(G\) as a function from \([0,1]\times[0,1]\rightarrow S^1\) (the first argument is \(r\) and the second \(\theta\)). Thus, defining \(H:[0,1]\times[0,1]\rightarrow S^1\) by \(H(t,s)=G(s,t)\) makes \(H\) a homotopy! Clearly, \(H(t,1)=G(1,t)=g(t)\) (where we view \(g\) as a loop instead of as a circle function) and \(H(t,0)=G(0,t)=f(0)/\mid f(0)\mid\) for all \(t\in[0,1]\), so \(g\) is homotopic to a constant function and \(\deg g=0\). However, we can also define the following
$$\begin{matrix}
H': &[0,1]\times[0,1] &\longrightarrow &S^1\\
&(t,s) &\longmapsto &\frac{z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)}{\mid z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)\mid} & z=e^{2\pi it}
\end{matrix}$$
This function is continuous since it is the composition of a bunch of continuous functions, and it is well-defined since the denominator is never 0:
$$\begin{align*}
\mid z^n + s(a_{n-1}z^{n-1}+\dots+a_1z+a_0)\mid
&\ge |z|^n - s|a_{n-1}z^{n-1}+\dots+a_1z+a_0|\\
&\ge |z|^n - s(|a_{n-1}||z^{n-1}|+\dots+|a_1||z|+|a_0|)\\
&= 1 - s(|a_{n-1}|+\dots+|a_1|+|a_0|)\\
&\ge 1 - (|a_{n-1}|+\dots+|a_1|+|a_0|)\\
&> 0
\end{align*}$$
Now, we just need to note that \(H'\) is a homotopy from \([t\mapsto z^n]=[t\mapsto e^{2\pi itn}]=\omega_n\) to \(g\), so \(\deg g=n\). But this is a contradiction, and hence our initial assumption that \(f\) has no zero must be wrong. \(\square\)
</div>
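<p>You can watch this degree argument happen numerically. The sketch below (Python; not from the original post, with the hypothetical name <code>degree_on_circle</code>) computes the degree of <script type="math/tex">t\mapsto f(re^{2\pi it})/\mid f(re^{2\pi it})\mid</script> by summing small changes in angle. For a polynomial satisfying our coefficient bound, the degree on the unit circle is <script type="math/tex">n</script>; if <script type="math/tex">f</script> really had no zero, shrinking <script type="math/tex">r</script> to <script type="math/tex">0</script> would force the degree to <script type="math/tex">0</script> instead. Of course no polynomial obliges: the example below has zeros of modulus about <script type="math/tex">0.45</script>, and the degree jumps as the circle crosses them.</p>

```python
import cmath

def degree_on_circle(f, r, steps=2000):
    # degree of the loop t -> f(r e^{2 pi i t}) / |f(r e^{2 pi i t})|,
    # computed by accumulating small changes in angle (as in lifting)
    total = 0.0
    for k in range(steps):
        z0 = f(r * cmath.exp(2j * cmath.pi * k / steps))
        z1 = f(r * cmath.exp(2j * cmath.pi * (k + 1) / steps))
        total += cmath.phase(z1 / z0)
    return round(total / (2 * cmath.pi))

f = lambda z: z**2 + 0.3 * z + 0.2   # |0.3| + |0.2| < 1, so the bound holds
small = degree_on_circle(f, 0.1)     # 0: the tiny loop doesn't wind at all
large = degree_on_circle(f, 1.0)     # 2 = n: the unit-circle loop is omega_2
```

<p>The jump from <script type="math/tex">0</script> to <script type="math/tex">2</script> happens precisely at the radius where the circle passes through a zero of <script type="math/tex">f</script>, which is where <script type="math/tex">g</script> stops being defined.</p>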
<blockquote>
<p>Corollary<br />
Let <script type="math/tex">f(x)</script> be a degree <script type="math/tex">n</script> polynomial with coefficients in <script type="math/tex">\mathbb C</script>. Then, <script type="math/tex">f</script> has exactly <script type="math/tex">n</script> (not necessarily distinct) zeros.</p>
</blockquote>
<div class="proof2">
Pf: By the theorem, \(f\) has some zero \(z_0\in\mathbb C\), so let's divide \(f\) by \(z-z_0\). Using long division, we get some polynomials \(q(z),r(z)\) such that \(f(z)=q(z)(z-z_0)+r(z)\) and \(\deg r(z)<\deg(z-z_0)=1\) or \(r(z)=0\) which means \(r(z)\) is a constant. Since \(0=f(z_0)=q(z_0)(z_0-z_0)+r(z_0)=r(z_0)\), we must have \(r(z)=0\) so \(f(z)=q(z)(z-z_0)\) and \(\deg q(z)=n-1\). Now just apply induction to get that \(q(z)\) has \(n-1\) zeros, so \(f(z)\) has \(n\) zeros. \(\square\)
</div>
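<p>The long division in the corollary is just synthetic division (Horner’s scheme). A quick sketch in Python (not part of the original post; <code>deflate</code> is a made-up name):</p>

```python
def deflate(f, z0):
    # divide f (coefficients [a_n, ..., a_0]) by (z - z0):
    # f(z) = q(z)(z - z0) + r, where the remainder r equals f(z0)
    q = [f[0]]
    for a in f[1:]:
        q.append(a + z0 * q[-1])
    return q[:-1], q[-1]

# f(z) = z^2 - 1 has the zero z0 = 1; dividing gives q(z) = z + 1 and r = 0
q, r = deflate([1, 0, -1], 1)
```

<p>When <code>r</code> is 0, repeat on the quotient; after <script type="math/tex">n</script> steps you’ve collected the <script type="math/tex">n</script> zeros, exactly as in the induction.</p>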
<p>Finally, an exercise.</p>
<blockquote>
<p>Exercise<br />
Where does the argument for the main theorem fail if <script type="math/tex">f</script> has a zero? Since, <script type="math/tex">f</script> has exactly <script type="math/tex">\deg f</script> zeros, you can always find a closed disc on which <script type="math/tex">f</script> has no zero, so why don’t we always get a contradiction?</p>
</blockquote>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I’ve seen many other fundamental theorems besides this one, and I am very confused by how a theorem gets to be called fundamental <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Maybe it’s a space game, or maybe you have enemies that circle a base to protect it, or maybe etc. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Once we introduce homotopy, we’ll have an equivalence relation on paths. This has the effect that the set of (equivalence classes of) loops based at a single point forms a group called the fundamental group of X. Secretly, this post is really just exploring the fundamental group of the circle. Without homotopy, composition of paths isn’t associative because of the whole doubling speed thing. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Throughout this post, I will avoid the issue of defining what a continuous function is, because doing so properly requires defining a topology on a set and that’s just too out of the way for this post. You can think of continuity intuitively as meaning nearby inputs get mapped to nearby outputs <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>Since w_n*w_m=w_{n+m}, this will also show that the fundamental group of the circle is Z, the set of integers <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Secretly, this is a covering function, and R is the universal covering space of S^1 <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>This proof may actually require some, but not much, background in topology. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>the fundamental theorem of algebra of course, but also, for example, Brouwer’s Fixed Point Theorem. Brouwer’s theorem then can be used to show the existence of Nash Equilibria in normal form games (think first form of Prisoner’s Dilemma shown in my post on it), and from there you get like all of game theory. <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>