\documentclass[%
version=last,%
a5paper,
10pt,%
headings=small,%
bibliography=totoc,%
twoside,%
reqno,%
cleardoublepage=empty,%
%parskip=half,%
draft=true,%
%DIV=classic,%
DIV=12,%
headinclude=false,%
pagesize]
{scrartcl}

%\usepackage[notcite,notref]{showkeys}

\usepackage[greek,english]{babel}

\usepackage[neverdecrease]{paralist}
\usepackage{hfoldsty,url,verbatim}

\usepackage{relsize} % Here \smaller scales by 1/1.2; \relscale{X} scales by X
\renewenvironment{quote}{\begin{list}{}
{\relscale{.90}\setlength{\leftmargin}{0.05\textwidth}
\setlength{\rightmargin}{\leftmargin}}
\item[]}
{\end{list}}

\usepackage{pstricks,pst-node,pst-tree}
\usepackage[all]{xy}
%\usepackage{auto-pst-pdf} % This is so that pdflatex works; but then one must run pdflatex -shell-escape (or pdflatex --enable-write18 in MikTeX).  See  http://tug.org/PSTricks/main.cgi?file=pdf/pdfoutput .  I get an error message when I run latex on the file without "-shell-escape".
\newcommand{\Treebox}[1]{\Tr{\psframebox{#1}}}

\usepackage{amsmath,amssymb,amsthm,upgreek}

\newcommand{\stnd}[1]{\mathbb#1}
\newcommand{\R}{\stnd R}
\newcommand{\N}{\stnd N}
\newcommand{\Z}{\stnd Z}
%\newcommand{\lto}{\Rightarrow}
\newcommand{\lto}{\rightarrow}
\newcommand{\liff}{\leftrightarrow}
\usepackage{mathrsfs}
\newcommand{\sig}{\mathscr S}
\newcommand{\str}[1]{\mathfrak#1}
\newcommand{\Forall}[1]{\forall#1\;}
\newcommand{\Exists}[1]{\exists#1\;}
\renewcommand{\leq}{\leqslant}
\newcommand{\included}{\subseteq}
\newcommand{\pincluded}{\subset}
\newcommand{\nincluded}{\nsubseteq}
\usepackage{bm}
%\newcommand{\class}[1]{\bm#1}
\newcommand{\gn}[1]{\ulcorner#1\urcorner}
\renewcommand{\phi}{\varphi}
\renewcommand{\theta}{\vartheta}

\newtheorem{theorem}{Theorem}
\newtheorem*{axiom}{Axiom}
\newtheorem*{lemma}{Lemma}
\newtheorem*{axsch}{Axiom Scheme}
\newtheorem*{corollary}{Corollary}

\renewcommand{\theequation}{\fnsymbol{equation}}

\begin{document}
\title{Induction and Recursion}
\author{David Pierce}
\date{April 5, 2013}
\maketitle
\tableofcontents

\section{Introduction}

In mathematics we use repeated activity in several ways:
\begin{compactenum}[1)]
\item
to define sets;
\item
to prove that all elements of those sets have certain properties;
\item
to define functions on those sets.
\end{compactenum}
These three techniques are often confused, but they should not be.
Clarity here can prevent mathematical mistakes; it can also
highlight important concepts and results such as Fermat's (Little)
Theorem, freeness in a category, and G\"odel's Incompleteness Theorem.
The main purpose of the present article is to show this.   

In the `Preface for the Teacher' of his
\emph{Foundations of Analysis} of 1929 \cite{MR12:397m},
Landau discusses the confusion just mentioned, but without full
attention to the logic of the situation.  The present article
may be considered as a sketch of how Landau's book might be updated.
It is indeed a sketch, and that only.  I make a number of historical
references, when I have been able to consult the cited articles; but
the article is not a thorough-going review of the history of the
mathematical ideas discussed.



\section{Number theory}

For an example of the three techniques, suppose we are given the field
$\R$ of real numbers. 
\begin{asparaenum}
\item
We may define the subset $\N$ of natural numbers by requiring that it
contain $1$ and that it contain $x+1$ for each of its elements $x$.   Thus if a real number cannot be shown to be in $\N$ by application of these rules, repeated as needed, then that number is \emph{not} in $\N$, by definition.
\item
We may show that $\N$ is closed under addition, since for each element $x$ of $\N$, we have by definition $x+1\in\N$, and moreover if
$y\in\N$ and $x+y\in\N$, then
\begin{equation}\label{eqn:+}
x+(y+1)=(x+y)+1, 
\end{equation}
which again is in
$\N$ by definition.  
Thus if $A$ is the set of elements $y$ of $\N$ such that, for all $x$ in $\N$, the sum $x+y$ is in $\N$, then $A$ contains $1$ and is closed under adding $1$.  Therefore $A=\N$, by definition of the latter.
Similarly, $\N$ is closed under multiplication,
since for all $x$ in $\N$ we have $x\cdot1=x$, and if $x\cdot y\in\N$, then
\begin{equation}\label{eqn:mult}
x\cdot(y+1)=x\cdot
y+x,
\end{equation}
which we now know to be in $\N$. 
\item
Similarly, if we have the Gamma function
\begin{equation*}
x\mapsto\int_0^{\infty}\frac{t^{x-1}}{\mathrm e^{t}}\;\operatorname dt
\end{equation*}
on $(0,\infty)$,
then we can show that $\N$ is closed under it, using integration by parts.  Alternatively, we can just define this operation on $\N$ by requiring $\Gamma(1)=1$ and $\Gamma(x+1)=\Gamma(x)\cdot x$, so that in general $\Gamma(x)$ is what is usually denoted by $(x-1)!$.
\end{asparaenum}
The definition of $\N$ here is \textbf{inductive}; the proof that $\N$
is closed under addition and multiplication is \textbf{inductive}; the second
definition of $\Gamma$ on $\N$ is \textbf{recursive.}
The definition of $\N$ can instead be called recursive, and the second definition of $\Gamma$ inductive; but I shall argue that either of these alternatives is misleading.
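In passing, the recursive definition of $\Gamma$ on $\N$ is easy to transcribe as a program; the following sketch (the name \verb|gamma_nat| is mine, not standard) simply restates the two defining equations $\Gamma(1)=1$ and $\Gamma(x+1)=\Gamma(x)\cdot x$.

```python
def gamma_nat(n):
    """Gamma on the natural numbers, by the recursive definition:
    Gamma(1) = 1 and Gamma(x + 1) = Gamma(x) * x."""
    if n == 1:
        return 1
    return gamma_nat(n - 1) * (n - 1)

# Gamma(n) agrees with (n - 1)! for natural n:
print([gamma_nat(n) for n in range(1, 7)])  # [1, 1, 2, 6, 24, 120]
```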

Our inductive definition of $\N$ may be considered as informal.
Formally, we may define $\N$ as the \emph{intersection} of the set of all
subsets of $\R$ that contain $1$ and are closed under adding $1$.
Alternatively, $\N$ is the \emph{union} of the set of subsets $A$ of $[1,\infty)$ such that $x-1\in A$ for every $x$ in $A\smallsetminus\{1\}$.
In any case $\N$ is the \emph{smallest} subset of $\R$ that contains $1$ and
is closed under adding $1$.  That is to say, $\N$ admits \textbf{proof
  by induction:} every subset $B$ of $\N$ is equal to $\N$, provided
we can show that $B$ contains $1$ and contains $x+1$ whenever it
contains $x$.  The whole point of the inductive definition of $\N$ is
to ensure that $\N$ admits inductive proofs. 

To avoid taking $\R$ for granted,
we may try a direct axiomatic approach to $\N$.  
 (This is the point of Landau's book.)
We can just declare
that $\N$ is a set that 
\begin{compactenum}[1)]
\item
has an element $1$,
\item
is closed under an operation $x\mapsto x+1$, and
\item
admits inductive proofs.
\end{compactenum}
Then we obtain the operations of addition and multiplication that we obtained before.
Indeed, we already know how to add $1$ to an element $x$ of $\N$: the result is simply $x+1$.  If we know what $x+y$ is, then we can use \eqref{eqn:+} as a definition of $x+(y+1)$.  So we have addition on $\N$.  We go on to define $x\cdot 1$ as $x$, and if we know what $x\cdot y$ is, we define $x\cdot(y+1)$ as in \eqref{eqn:mult}.

This is all true; and yet, in saying it this way, we have cheated.  For, it would be \emph{false} to say by analogy that we can make the definition
\begin{equation}\label{eqn:x^1}
x^1=x,
\end{equation}
and if we know what $x^y$ is, then we can make the definition
\begin{equation}\label{eqn:exp+1}
 x^{y+1}=x^y\cdot x.
\end{equation}
How can this be?
Though the reader may not yet be fully in the know, they may have observed that our axiomatic treatment of $\N$ has omitted two of Peano's axioms of 1889 \cite{Peano}: 
\begin{compactenum}[1)]\setcounter{enumi}{3}
\item
the operation $x\mapsto x+1$ is injective, but 
\item
its range does not contain $1$.
\end{compactenum}
These axioms turn out not to be needed for the definitions of addition and multiplication on $\N$; but they, or at least \emph{something} more, are needed for exponentiation on $\N$.

Again, how can this be?  Let us first observe that it \emph{is} so, by
noting that the Induction Axiom is available in modular arithmetic, although exponentiation as a binary operation is \emph{not} generally definable there.
Indeed, in the \emph{Disquisitiones Arithmeticae} of 1801
\cite[\P50]{Gauss}, 
which is apparently the origin of our notion of modular arithmetic,
Gauss reports that Euler's first proof of Fermat's
Theorem was as follows.  Let $p$ be a prime modulus.  Trivially
$1^p\equiv 1$ (with respect to $p$ or indeed any modulus).  If $a^p\equiv a$ (\emph{modulo} $p$) for some $a$, then, since
$(a+1)^p\equiv a^p+1$, we conclude $(a+1)^p\equiv a+1$.  This can be
understood as a perfectly valid proof by induction in the ring
with $p$ elements that we denote by $\Z/p\Z$: we have then proved
$a^p=a$ for all $a$ in this ring. 

However, Dyer-Bennet showed in 1940 \cite{MR0001234} that, with
respect to a modulus $n$, all congruences $a\equiv b$ and $c\equiv d$
entail the congruence $a^c\equiv b^d$ if and only if $n$ is one\footnote{Dyer-Bennet names G. Birkhoff as having suggested the problem of finding these $n$ and as having found them independently.  I found Dyer-Bennet's article through \emph{The on-line encyclopedia of integer sequences.}} of
$1$, $2$, $6$, $42$, and $1806$.  We conclude: 

\begin{theorem}\label{thm:mod}
For all $n$ in $\N$,
the finite ring $\Z/n\Z$ has a binary operation $(x,y)\mapsto
x^y$ satisfying the identities \eqref{eqn:x^1} and \eqref{eqn:exp+1} 
if and only if $n\in\{1,2,6,42,1806\}$.
\end{theorem}
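Theorem~\ref{thm:mod} can be confirmed by machine.  Since $a^{c+n}=a^c\cdot a^n$, the condition that exponents may be read \emph{modulo} $n$ reduces to the single identity $a^{n+1}\equiv a\pmod n$ for all $a$; granting this reduction (my own shortcut, not Dyer-Bennet's argument), a brute-force search recovers the five moduli:

```python
def exponentiation_consistent(n):
    """True if x^1 = x and x^(y+1) = x^y * x define a genuine binary
    operation on Z/nZ; with exponents also taken modulo n, this
    reduces to the single identity a^(n+1) = a (mod n) for all a."""
    return all(pow(a, n + 1, n) == a % n for a in range(n))

# Search well past 1806 to see that the list is complete in this range.
moduli = [n for n in range(1, 2000) if exponentiation_consistent(n)]
print(moduli)  # [1, 2, 6, 42, 1806]
```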

Let us observe in passing that the sequence of moduli here arises from
what Mazur \cite{Mazur-Th} calls the \emph{self-proving}
formulation of Euclid's Proposition IX.20 in the \emph{Elements}
\cite{MR1932864}: give me some primes, and I'll give you another one
by multiplying yours together, adding $1$, and finding a prime divisor
of the result.  Indeed, 
the product of \emph{no} primes should be considered as $1$, and then:
\begin{align*}
&\phantom{{}={}}1+1=2,\text{ prime;}\\
&\phantom{{}={}}2+1=3,\text{ prime;}\\
2\cdot 3+1&=6+1=7,\text{ prime;}\\
2\cdot 3\cdot 7+1&=42+1=43,\text{ prime;}\\
2\cdot 3\cdot 7\cdot43+1&=1806+1=1807.
\end{align*}
Since $1807=13\cdot 139$ is composite, the sequence of moduli in the theorem stops,
although of course the set of primes continues to grow. 
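The little computation above can be replayed mechanically.  The following sketch (with a hand-written trial-division test; the names are mine) keeps multiplying the product of primes found so far by the new prime, so long as the product plus $1$ is itself prime:

```python
def is_prime(m):
    """Trial-division primality test, adequate at this scale."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))

# Start from the empty product of primes, namely 1; while the current
# product plus 1 is itself prime, multiply it in and continue.
products = [1]
while is_prime(products[-1] + 1):
    products.append(products[-1] * (products[-1] + 1))

print(products)                  # [1, 2, 6, 42, 1806]
print(is_prime(1807), 13 * 139)  # False 1807
```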

Little discoveries like Theorem \ref{thm:mod} are a reason to begin the
natural numbers with $1$ rather than $0$.  When Henkin in 1960
\cite{MR0120156} made some of the observations of the present
article, he started the natural numbers with $0$ and noted in effect that
on $\Z/2\Z$, if \eqref{eqn:exp+1} (namely $ x^{y+1}=x^y\cdot x$) holds, then $0^y=0$ for all $y$, since $y=z+1$ for some $z$; in
particular $0^0\neq1$, so the equation $x^0=1$ fails.  

Of course Henkin's argument works in $\Z/n\Z$ for every $n$ that exceeds $1$.
Still, $\Z/n\Z$ always has an addition given by the identity
\eqref{eqn:+} above (namely $x+(y+1)=(x+y)+1$).
At the beginning of the \emph{Disquisitiones,} Gauss notes that
addition of integers respects congruence; but apparently he does not feel the need to prove it.
However, we may establish the identity \eqref{eqn:+} on $\Z/n\Z$ as
follows.  

We assume that we are given the operation $x\mapsto x+1$.
As an inductive hypothesis, we assume too that we `know' $x+b$ for some $b$; that is, we assume there is an operation $x\mapsto x+b$.  But this is not just any operation; it satisfies the identities 
\begin{align}\label{eqn:ih+}
1+y&=y+1,&(x+1)+y&=(x+y)+1
\end{align}
when $y=b$.
Note that these equations are trivially true when $y=1$.
If we now use \eqref{eqn:+} when $y=b$ to define $x\mapsto x+(b+1)$,
then as a special case we have
\begin{equation*}
1+(b+1)=(1+b)+1,
\end{equation*}
so by the inductive hypothesis \eqref{eqn:ih+} we have $1+(b+1)=(b+1)+1$.  Similarly
\begin{align*}
(x+1)+(b+1)
&=((x+1)+b)+1&&\text{[by \eqref{eqn:+} when $y=b$]}\\
&=((x+b)+1)+1&&\text{[by \eqref{eqn:ih+} when $y=b$]}\\
&=(x+(b+1))+1.&&\text{[by \eqref{eqn:+} when $y=b$]}
\end{align*}
Thus \eqref{eqn:ih+} holds when $y=b+1$.
Therefore on $\Z/n\Z$, as on $\N$, for every $y$, there is at least one operation $x\mapsto x+y$ satisfying \eqref{eqn:ih+}.
All we have used to establish this is induction (along with the element $1$ and the operation $x\mapsto x+1$; but the Induction Axiom assumes that these exist).  

By induction also, each of the operations $x\mapsto x+y$ satisfying \eqref{eqn:ih+} is unique.  Indeed, suppose when $y=b$ there is one such function, but $f$ is another, that is, $f(1)=b+1$ and $f(x+1)=f(x)+1$.  Then by \eqref{eqn:ih+} when $y=b$ we have $1+b=f(1)$, and if $f(a)=a+b$, then $(a+1)+b=(a+b)+1=f(a)+1=f(a+1)$.  Thus $x+b=f(x)$ for all $x$, that is, $f$ is the function $x\mapsto x+b$.  

Now we have a unique operation $(x,y)\mapsto x+y$ satisfying \eqref{eqn:ih+}.  By looking back at the proof, we conclude that \eqref{eqn:+} is an identity.  Indeed, we used this equation to define an operation $x\mapsto x+(b+1)$ from $x\mapsto x+b$, and since these operations are now known to exist uniquely, \eqref{eqn:+} must hold.
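The inductive construction just described can be carried out mechanically for $\Z/n\Z$: starting from the successor operation alone, each column $x\mapsto x+(b+1)$ of the addition table is obtained from the column $x\mapsto x+b$ by the rule \eqref{eqn:+}.  A sketch (the names are mine):

```python
def addition_from_successor(succ):
    """Build the whole addition table on a finite set from the successor
    operation alone, column by column, using x + (b+1) = (x+b) + 1.
    `succ` maps each element to its successor; 1 must be an element."""
    elements = list(succ)
    add = {x: {1: succ[x]} for x in elements}   # the column x + 1
    b = 1
    for _ in range(len(elements) - 1):
        b1 = succ[b]                            # next column index, b + 1
        for x in elements:
            add[x][b1] = succ[add[x][b]]        # x + (b+1) = (x+b) + 1
        b = b1
    return add

# The wrapping successor on {1, ..., 5}, i.e. Z/5Z with 5 in the role of 0:
succ5 = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
add5 = addition_from_successor(succ5)
print(add5[3][4])  # 2, since 3 + 4 = 7 and 7 is congruent to 2 modulo 5
```

One can check on the resulting table that the addition so obtained is commutative and that $5$ is neutral, as the general argument predicts.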
However, Peano \cite[p.\ 95]{Peano} uses this equation by itself as a definition of addition, writing:\footnote{Peano has \eqref{eqn:+} in the form $a+(b+1)=(a+b)+1$.}
\begin{quote}
This definition has to be read as follows:  if $a$ and $b$ are numbers, and if $(a+b)+1$ has a meaning (that is, if $a+b$ is a number) but $a+(b+1)$ has not yet been defined, then $a+(b+1)$ means the number that follows $a+b$.
\end{quote}
Is Peano correct?  Can we take \eqref{eqn:+} as a definition in his sense?  If so, then we should be able to take the equations \eqref{eqn:x^1} and \eqref{eqn:exp+1} as a definition of exponentiation on, say, $\Z/3\Z$.  When Peano makes his remark, he has stated all of his axioms, and $\Z/3\Z$ does not satisfy all of them; still, it satisfies the Induction Axiom, and Peano does not appeal to any other axioms, or a lemma derived from them, to justify his remark.
Following Peano's procedure in $\Z/3\Z$ then, we get the successive rows of the following table:
\begin{equation*}
\begin{array}{l|ccc}
x&1&2&3\\\hline
x^1&1&2&3\\
x^2&1&1&3\\
x^3&1&2&3
\end{array}
\end{equation*}
We make no new row for $x^4$, since $4=1$ in $\Z/3\Z$, so $x^4$ has already been defined.  If we did try to make a row for $x^4$, using \eqref{eqn:exp+1}, then it would not agree with the row for $x^1$.  Thus, although we can use equations \eqref{eqn:x^1} and \eqref{eqn:exp+1} to give a definition, in Peano's sense, of exponentiation in $\Z/3\Z$, those equations are not identities under the definition.
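Peano's procedure can be transcribed literally for $\Z/3\Z$, with elements $\{1,2,3\}$ and $3$ playing the role of $0$; the sketch below rebuilds the table above and then applies the rule \eqref{eqn:exp+1} once more, exhibiting the failure:

```python
MOD = 3
def times(x, y):
    """Multiplication on the set {1, 2, 3} representing Z/3Z,
    with 3 standing in for 0."""
    r = (x * y) % MOD
    return r if r else MOD

# Rows x^1, x^2, x^3, built by x^1 = x and x^(y+1) = x^y * x:
rows = {1: {x: x for x in (1, 2, 3)}}
for y in (1, 2):
    rows[y + 1] = {x: times(rows[y][x], x) for x in (1, 2, 3)}
print(rows[2], rows[3])  # {1: 1, 2: 1, 3: 3} {1: 1, 2: 2, 3: 3}

# Applying the rule once more disagrees with the row for x^1, even
# though 4 = 1 in Z/3Z; so the rule is not an identity here.
row4 = {x: times(rows[3][x], x) for x in (1, 2, 3)}
print(row4 == rows[1])  # False
```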

Logically then, although we can use the rule \eqref{eqn:+} by itself to build up an addition table for $\N$ or $\Z/n\Z$, it does not follow that \eqref{eqn:+} is an identity.  This needs an additional argument.

Somewhat modernized, Peano's thinking seems to be this.  Let $A$ be the set of all $y$ such that an operation $x\mapsto x+y$ is defined.  Then $1\in A$.  Moreover if $b\in A$ then, since we can define $x+(b+1)$ by \eqref{eqn:+} when $y=b$, it follows that $b+1\in A$.  By induction, $A$ contains all $y$.  But this gives us no unique operation of addition.  Indeed, assuming $b\in A$, we can show $b+1\in A$ by defining $x+(b+1)$ as $x+b$ or even $1$.  What we must do is something like what we did: let $A$ be the set of all $y$ such that an operation $x\mapsto x+y$ is defined \emph{so as to satisfy \eqref{eqn:ih+}.}

Now I claim to have shown what I said at the beginning, that the definition of addition by means of \eqref{eqn:+} should not be called inductive, because such definitions are not generally justified by induction alone.  The underlying observation here is not original; again, Henkin makes it, and before him,
Landau.  (Landau in turn credits Kalm\'ar with the special proof that addition can indeed be established by induction alone, though Landau does not himself point out that only induction is used; nor does he give an example like exponentiation, where induction is definitely not enough.)  Using $y'$ for $y+1$, Landau writes in his `Preface for the Teacher':
\begin{quote}
On the basis of his five axioms, Peano defines $x+y$ for fixed $x$ and all $y$ as follows:
\begin{gather*}
	x+1=x'\\
	x+y'=(x+y)',
\end{gather*}
and he and his successors then think that $x+y$ is defined generally; for, the set of $y$'s for which it is defined contains $1$, and contains $y'$ if it contains $y$.

But $x+y$ has \emph{not} been defined.
\end{quote}
Landau once shared the confusion of Peano and his successors; the fact of this earlier confusion is a reason for publishing his book.
Nevertheless, despite the warnings of Landau, Henkin, and
others,\footnote{Dedekind was perhaps the first to give such a warning; see below.} confusion about these basic matters persists.   

I suggest that Landau himself is a bit confused about what an
axiom is; at least, he fails to make a distinction that we find it worthwhile to make today.  Peano himself gives \emph{nine} axioms for $\N$, but three of them are the
reflexive, symmetric, and transitive properties of equality of
numbers, and another is that something equal to a number is a number.  Landau rightly sets these aside as being purely logical properties,\footnote{Perhaps Peano himself, or one of the followers mentioned by Landau, had already done this setting aside.} not specific to elements of $\N$.
Peano's remaining five axioms are those mentioned by Landau and also given
earlier in the present article.  However, two of those, namely that $\N$
contains $1$ and
is closed under adding $1$,
are simply features of the structure\footnote{That is, they are formally entailed by considering $\N$ as a structure in a signature with a constant for $1$ and a singulary operation-symbol for $x\mapsto x+1$.  See \S\ref{sect:alg} below.} of $\N$, features whose properties
are fixed by the remaining three axioms. 

Burris gives these three axioms at the head of `The Dedekind--Peano
Number System', an appendix to his \emph{Logic for Mathematics and
  Computer Science} \cite{Burris}, an excellent book from which I have
learned a lot.  After stating the axioms, Burris defines addition as
Peano does.  As we have seen, the definition \emph{is} justified; but
it is not \emph{obviously} justified.  The student may come away from
that appendix with the wrong impression. 

A similarly wrong impression may be got from Mac Lane and Birkhoff's
\emph{Algebra} \cite[p.\ 15]{MR941522}, where right after the Peano axioms are
given, it is said that the natural numbers can serve to index iterates
of singulary (`unary') operations.  If this is so, then one might expect elements of $\Z/3\Z$ to serve as indices of iterated products---that is, as exponents of powers---in $\Z/3\Z$ (or in any $\Z/n\Z$, or $\Z$ itself); but as we have seen, this is not possible.

Also in his `Preface for the Teacher', Landau warns,
\begin{quote}
My book is
written, as befits such easy material, in merciless telegram
style (\textbf{`Axiom', `Definition', `Theorem', `Proof',} occasionally \textbf{`Preliminary Remark',} rarely words which do not belong to one of these five categories).
\end{quote}
But the material is \emph{not} easy.  Perhaps it is the
assumption that the material \emph{is} easy that led Landau and
others to be confused about it in the first place.  Such confusion
could have real mathematical consequences: it might lead one to
replace Fermat's Theorem with a false belief that $a^{p+1}\equiv a$ is an
identity \emph{modulo} $p$. 

Landau is not concerned with noting what can be proved by induction
alone; the point of his book is just to construct the fields of real
and complex numbers, so the analyst can use them in good conscience. 
Nonetheless, it seems worthwhile to note that, after defining addition on $\N$
as we have done, 
we can go on to establish, again by induction alone, that addition is
commutative and associative.  Using also that the operation $x\mapsto x+1$ is injective, we obtain that addition is cancellative (in the sense that
$x+z=y+z$ implies $x=y$).  Again by induction alone, there is a unique
operation of multiplication, which is commutative and associative, and
which distributes over addition, although it need not be
cancellative.  Thus we have $\N$ as a commutative semi-ring; but then we also have the set $\{1,\dots,n\}$ as a commutative semi-ring when we define $x\mapsto x+1$ on this set as in the following table, so that this operation is injective, and proof by induction is available.\footnote{It may be said that we do not know what $\{1,\dots,n\}$ means unless we have defined the ordering of $\N$, so that $\{1,\dots,n\}=\{x\in\N\colon 1\leq x\leq n\}$.  The theorem that $\N$ can indeed be linearly ordered in the usual way does require all of the Peano axioms.  Without proving this theorem though, we can still define $\{1,2\}$, then $\{1,2,3\}$, and so on, as far as we like.}
\begin{equation*}
\begin{array}{c|cccc}
x  &1&\dots&n-1&n\\\hline
x+1&2&\dots&n  &1
\end{array}
\end{equation*}
Here $n+1=1$, and then by induction $n+x=x$, so $n$ is neutral for addition; also then, every element has an additive inverse, so the set $\{1,\dots,n\}$ becomes the ring $\Z/n\Z$.
Of course, once one has the ring-structure of $\Z$, derived perhaps from the semi-ring structure of $\N$, then it is easy to show that congruence \emph{modulo} $n$ respects this structure, so that $\Z/n\Z$ is a ring.  Still, it seems worthwhile to note that most of this \emph{has already been shown} in establishing the semi-ring structure of $\N$, because the very same arguments work for $\Z/n\Z$.
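The claims about the cyclic successor can be verified directly; in the sketch below, addition on $\{1,\dots,n\}$ is realized by ordinary addition reduced to a representative in that set (the names are mine):

```python
def cyclic_add(x, y, n):
    """Addition on {1, ..., n} induced by the wrapping successor:
    ordinary addition, reduced to a representative in {1, ..., n}."""
    r = (x + y) % n
    return r if r else n

n = 6
# n is neutral for addition: n + x = x, as proved by induction above.
neutral = all(cyclic_add(x, n, n) == x for x in range(1, n + 1))
# Every element has an additive inverse, so the set is the ring Z/nZ:
invertible = all(any(cyclic_add(x, y, n) == n for y in range(1, n + 1))
                 for x in range(1, n + 1))
print(neutral, invertible)  # True True
```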

Again, the attempt to define exponentiation by induction alone breaks
down almost completely.  For every $n$ in $\N$, for every element $x$ of
$\Z/n\Z$, there is of course a function $y\mapsto x^y$ from $\N$ to
$\Z/n\Z$ satisfying the identities~\eqref{eqn:x^1} and \eqref{eqn:exp+1}; but this needs
more than induction.  The full result is the following.

\begin{theorem}\label{thm:rec}
For every set $A$ that has an element $1$ and is closed under an
operation $x\mapsto x+1$, the following are equivalent conditions. 
\begin{compactenum}
\item
The operation $x\mapsto x+1$ is injective, 
but its range does not contain $1$, and
no proper subset of $A$ contains $1$ and is closed under the operation.
\item
For every set $B$ that has an element $c$ and is closed under an
operation $x\mapsto f(x)$, there is a unique function $g$ from $A$ to
$B$ satisfying the identities 
\begin{align*}
g(1)&=c,&g(x+1)&=f(g(x)).
\end{align*} 
\end{compactenum}
\end{theorem}

Dedekind's work on the natural numbers predates Peano's, and his
mathematical understanding seems to be more profound.  He gives the
forward direction of Theorem~\ref{thm:rec} in his \emph{Nature and
  Meaning of Numbers} of 1887 \cite[II(126), p.\ 85]{MR0159773}.  It
is an accident of history that the Peano axioms are usually so called.
Peano does give us some notation, which has perhaps helped solidify
the ideas behind it.  Russell and Whitehead may have helped spread the
notation through the \emph{Principia Mathematica}: their sign
$\supset$ for implication is derived from Peano's reversed C, and they
use Peano's sign $\in$ for membership of an individual in a class
(originally the sign is an epsilon, for the Greek
\foreignlanguage{greek}{>est'i} `is' \cite[pp.\ 25--26]{PM}).
Dedekind\label{Ded-con} himself does not distinguish between this
membership relation and the subset relation: he uses the same sign for
both, looking something like a $3$ or perhaps a \emph{reversed} epsilon
(\cite[II(3), p.\ 46]{MR0159773} or \cite[p.\ 3]{Dedekind-German}). 
  
  Henkin
\cite[p.\ 337]{MR0120156} proves the reverse direction of Theorem~\ref{thm:rec}, but does not explicitly mention any earlier proof.  If the theorem is difficult, the difficulty lies in recognizing that such a theorem \emph{might} be true; once one can recognize this possibility, proving the theorem is not that hard.\footnote{\label{note:rec}One can obtain the function $g$ as the union of the set of all sets $\{(1,c),(2,f(c)),(3,f^2(c)),\dots,(n,f^{n-1}(c))\}$; this is Dedekind's approach.  Alternatively, $g$ is the intersection of the set of all relations $R$ from $A$ to $B$ such that $1\mathrel Rc$ and $(x+1)\mathrel R(f(y))$ whenever $x\mathrel Ry$.  In the words of Enderton \cite[p.\ 35]{MR1801397}, these are the bottom-up and top-down approaches, respectively.}  Similarly, it is not so hard to prove the independence of Euclid's Parallel Postulate from his others; but it seems to have taken more than two thousand years to recognize the possibility of such a proof.
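The bottom-up approach of the last footnote is easy to realize in a program, at least on an initial segment of $\N$: the function $g$ is assembled as the union of Dedekind's finite approximations $\{(1,c),(2,f(c)),\dots\}$.  A sketch (the names, and the choice of target algebra, are mine):

```python
def recursion(c, f, n):
    """The unique g with g(1) = c and g(x+1) = f(g(x)), computed
    bottom-up on the initial segment {1, ..., n} as the union of the
    finite approximations {(1, c), (2, f(c)), ...}."""
    g = {1: c}
    for x in range(1, n):
        g[x + 1] = f(g[x])
    return g

# Example target algebra: Z/7Z with c = 1 and f = doubling, so that
# g(x) is the power 2^(x-1) reduced modulo 7.
g = recursion(1, lambda v: (2 * v) % 7, 10)
print([g[x] for x in range(1, 11)])  # [1, 2, 4, 1, 2, 4, 1, 2, 4, 1]
```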

\section{Algebra}\label{sect:alg}

Theorem~\ref{thm:rec} can be understood as being about \emph{algebras} in a \emph{signature} with one constant and one singulary operation-symbol.  In the most general sense, an \textbf{algebra} is a set with some distinguished operations and elements; those operations and elements are given symbols, which compose the \textbf{signature} of the algebra.
Theorem~\ref{thm:rec} is about an algebra $(A,1,x\mapsto x+1)$, or more simply $\str A$.  In a word, the second condition in the theorem is that $\str A$ admits \textbf{recursion}; more elaborately, the algebra admits \textbf{recursive definition of homomorphisms.}  Another way to say this is that the algebra is \textbf{absolutely free:} that is, it is a free object in the category of \emph{all} algebras of its signature.
In the first condition, the part about not having certain proper subsets is that $\str A$ has no proper \emph{subalgebras}; in a word, $\str A$ is \textbf{minimal.}

Again, such minimality is not enough to establish recursion.  Dedekind
\cite[\P130, p.\ 89]{MR0159773} notes this, observing in effect that
there is no homomorphism from $\Z/2\Z$ to $\Z/3\Z$, even though the
former admits induction.  (Nonetheless, Dedekind does refer to definition by
recursion as definition by induction.) 

If we understand an element of a set as a nullary operation on the set, then Theorem \ref{thm:rec} can be understood as a special case of the following.

\begin{theorem}\label{thm:gen}
An arbitrary algebra is absolutely free if and only if:
\begin{compactenum}[1)]
\item
the algebra is minimal,
\item
each of its operations is injective,
\item
the ranges of any two distinct operations among them are disjoint.
\end{compactenum}
\end{theorem}

To establish the sufficiency of the three conditions,\footnote{See note \ref{note:rec} above.  Enderton \cite[p.~39]{MR1801397} establishes sufficiency in case the signature has one binary, one singulary, and arbitrarily many nullary operation-symbols.  We shall look at a similar case in \S\ref{sect:logic} below.} one proceeds as one would for the corresponding implication in Theorem~\ref{thm:rec}: given an algebra $\str A$ meeting those conditions and an arbitrary algebra $\str B$ of the same signature, one obtains a unique homomorphism from $\str A$ to $\str B$, either as the intersection of the set of all \emph{relations} from $A$ to $B$ with the appropriate properties, or as the union of the set of certain \emph{partial} functions from $A$ to $B$.  For the necessity of the three conditions, one observes that all absolutely free algebras of a given signature are isomorphic to one another; then it is enough to observe that one example meets the conditions.  In the situation of Theorem~\ref{thm:rec}, one already has such an example, or rather, one \emph{assumes} that one has an example, namely $(\N,1,x\mapsto x+1)$.  In the case of a signature with no constants, not only are all absolutely free algebras isomorphic to one another, but they are identical with one another: they are empty.

In case there \emph{are} constants, the natural example of a free algebra would seem to be the \emph{term algebra,}\label{term-alg} as described for example by Hodges \cite[\S1.3, p.~14]{MR94e:03002}.  I want to work out one case, by way of arguing that it is indeed mathematically worthwhile to be aware of Theorem~\ref{thm:gen} in its generality, and not only Theorem~\ref{thm:rec}.

\section{Propositional logic}\label{sect:logic}

In his `automathography' \cite[p.~206]{MR961440}, Halmos writes:
\begin{quote}
An exposition of what logicians call the propositional calculus can annoy and mystify mathematicians.  It looks like a lot of fuss about the use of parentheses, it puts much emphasis on the alphabet, and it gives detailed consideration to `variables' (which do not in any sense vary).  Despite (or because of?) all the pedantic machinery, it is hard to see what genuine content the subject has.  Insofar as it talks about implications between propositions, everything it says looks obvious.  Does it really have any mathematical value?

Yes it does\dots Question: what is the propositional calculus?  Answer: the theory of free Boolean algebras with a countably infinite set of generators.
\end{quote}
It is good that Halmos found a way to understand logic that satisfied him; but he seems to have missed the point.  For one thing, propositional logic is not just about free \emph{Boolean} algebras: it is about certain \emph{absolutely free} algebras, in the sense above, from which Boolean algebras are obtained as \emph{quotients.}
This is how the fuss and pedantry arises that Halmos decries; but I think it is inevitable, and I hope to make it a little more interesting below.

Meanwhile, logic should be understood as \emph{foundational} for mathematics.  One \emph{can,} generally \emph{does,} and probably \emph{must} learn mathematics before symbolic logic; but if one wants to formalize one's work, at least by way of checking for mistakes, then one will reduce one's mathematics to logic, and not the other way around as Halmos does.

I shall describe here a version of the propositional calculus
that Church \cite[ch.\ I]{MR18:631a} develops from that of \L
ukasiewicz.  We first choose a set $V$ of propositional `variables'.  In the algebraic sense above, these variables will indeed be constants.  One does generally want $V$ to be countably infinite, but this will make little difference for us.  Actually, it is perhaps philosophically best to consider $V$ as finite, as long as it can be made as large as necessary to cover any particular situation.  

We define a set of propositional formulas by three rules:   
\begin{compactenum}[1)]
\item
every variable is a formula,
\item
$0$ is a formula, and
\item
if  $F$ and $G$ are formulas, then so is the \emph{implication} $(F\lto G)$.
\end{compactenum}
Thus we obtain a \emph{term algebra}\label{term-alg2} in the signature $V\cup\{0,\lto\}$.
We may understand this definition to pick out the formulas from the set of all \emph{strings} of our symbols, just as our original definition of $\N$ picked out its elements from $\R$.  This set of strings might be formalized as the set of functions from sets $\{1,\dots,n\}$ or $\{0,\dots,n-1\}$ into our set of symbols.  Again though, this way of thinking is backwards or anachronistic, if we think of symbolic logic as being developed for the sake of formalizing the notions of sets and functions and numbers in the first place.  
One should understand a string of symbols as something that can be written down, on paper, left to right in the usual way.
From the given definition of formula, it is easy enough to show that any particular written string is a formula; and that is all we need.

We want to know that the algebra of formulas is \emph{absolutely free.}  That is, we want formulas to have \textbf{unique readability.}  If we accept Theorem~\ref{thm:gen}, then since by definition the algebra of formulas is minimal, we need only show that
\begin{inparaenum}[(1)]
\item
the operation $(F,G)\mapsto(F\lto G)$ is injective and 
\item
its range contains neither $0$ nor a variable.  
\end{inparaenum}
These facts correspond to the two Peano axioms other than Induction; but in the present case they are theorems.  One of the theorems is trivial, since all implications have more than one symbol.  The other theorem is that if $(F\lto G)$ and $(H\lto K)$ are the same formula, then $F$ and $H$ are the same formula (and hence also $G$ and $K$ are the same), assuming $F$, $G$, $H$, and $K$ are formulas in the first place.  This follows from the lemma that no proper initial segment of a formula is a formula.  One can prove this by induction on the \emph{lengths} of formulas: that is, one can prove it as a theorem about	 the algebra $\N$.  
Again this might be anachronistic, if one is developing logic in order to formalize $\N$.
Alternatively, one can prove simultaneously by induction in the algebra of formulas that every formula
neither
\begin{inparaenum}[(1)]
\item
has a formula as a proper initial segment, nor
\item
is itself a proper initial segment of another formula.
\end{inparaenum}

It may be legitimate to consider unique readability of formulas as
being obvious.  From school arithmetic, we have the sense that we can
always put in enough brackets to make a given expression unambiguous.
In our present system, \emph{every} implication is surrounded by
brackets; is it not obvious that this is enough to ensure unique
readability?  Church seems to think so.  He first notes in passing
\cite[p.\ 38, n.\ 91]{MR18:631a} that brackets can be dispensed with
entirely by using the prefix notation of \L ukasiewicz, but does not dwell on the issue. Later \cite[\S10, pp.\ 70--71]{MR18:631a} he gives an algorithm for determining whether a given string is a formula.  Given a string beginning and ending with brackets, the algorithm aims to detect (by counting brackets) an arrow of implication in the string such that, if the string \emph{is} a formula, then the two strings between the arrow and the outer brackets must be formulas.  Then the algorithm is applied in turn to those two strings, and so on.  Church calls the arrow found by the algorithm the \emph{principal implication sign} of the formula.  Now, by definition an implication must have \emph{a} principal implication sign; but Church does not exactly \emph{prove} that his algorithm finds such a sign in every well-formed implication.  Even if this is granted, there remains the question of whether the sign is unique.\footnote{Burris proves
unique readability for the \L ukasiewicz notation only, but seems to take unique
readability of bracketed infix notation as obvious \cite[pp.\ 39, 142,
  197]{Burris}.  Enderton notes in effect \cite[p.\ 30]{MR1801397} that unique readability is immediate if the formula $(F\lto G)$ is understood to be the ordered triple $(F,\lto,G)$.  Again this is anachronistic: it begs the question of whether we \emph{can} write down formulas in such a way that an arbitrary implication can be unambiguously analyzed as an ordered triple $(F,\lto,G)$.  If writing the formula as a \emph{string} $(F,\lto,G)$ is enough, then it is enough to write it without the commas, as $(F\lto G)$.  But the brackets \emph{are} needed; why is that?  Here one may start to feel the need to \emph{prove} that the brackets are enough; and Enderton does in fact supply a proof.}
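At the risk of the very anachronism discussed above, Church's bracket-counting procedure can be sketched in modern terms.  The following is a hypothetical rendering in Python, not found in Church's text, writing \texttt{>} for the arrow of implication and treating formulas as strings of single-character symbols:

```python
def principal_sign(s):
    """Find the index of the principal implication sign of a bracketed
    string (F>G): by counting brackets, the principal arrow is the one
    standing at bracket depth 1."""
    depth = 0
    for i, c in enumerate(s):
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        elif c == '>' and depth == 1:
            return i
    return None  # no such sign: s is not an implication

def parse(s):
    """Apply the algorithm in turn to the two substrings between the
    arrow and the outer brackets, reporting the analysis as a nested
    tuple."""
    i = principal_sign(s)
    if i is None:
        return s  # '0' or a variable, if s is a formula at all
    return (parse(s[1:i]), '>', parse(s[i + 1:-1]))
```

For example, `parse('((P>Q)>(P>R))')` yields `(('P','>','Q'), '>', ('P','>','R'))`.  That the arrow so found is unique, and that the two substrings are formulas whenever the whole string is, are precisely the points that want proof.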

Whether one proves it or not, unique
readability should be stated clearly, in order to be able to emphasize, as we
shall do below, what might be called the `non-unique readability' of
\emph{theorems.} 

Meanwhile, we need unique readability of formulas to define
\textbf{interpretations,} namely functions $h$ from the algebra of formulas into
$\{0,1\}$ such that $h(0)=0$ and 
\begin{equation*}
h((F\lto G))=1\text{ if and only if }h(F)=0\text{ or }h(G)=1.
\end{equation*}
(So one thinks of $0$ as \emph{falsehood,} and $1$ as \emph{truth.})
An interpretation then is determined by its restriction to the set $V$ of variables.
In the spirit of Halmos, we may note that an interpretation is a homomorphism into the two-element field, if we there understand $x\lto y$ to be $x\cdot y+x+1$.  Moreover, if two formulas are understood to be \textbf{equivalent} if their images are equal under every interpretation, then the set of equivalence-classes of formulas can indeed be understood as a Boolean algebra, at least when we make the usual definitions:  the \emph{negation} $\lnot F$ is $(F\lto0)$, the \emph{disjunction} $(F\lor G)$ is $(\lnot F\lto G)$, and the \emph{conjunction} $(F\land G)$ is $\lnot(\lnot F\lor\lnot G)$.  But again, the point of logic is to show how such a Boolean algebra can be obtained in the first place.  
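The definition of an interpretation, and Halmos's observation about the two-element field, admit a sketch in Python (a hypothetical illustration, with formulas as nested tuples $(F,\texttt{'>'},G)$, the constant as \texttt{0}, and variables as strings):

```python
def interpret(formula, h):
    """Extend an assignment h of 0 or 1 to the variables to the whole
    algebra of formulas, using x -> y = x*y + x + 1 in the field of
    two elements."""
    if formula == 0:
        return 0
    if isinstance(formula, str):  # a variable
        return h[formula]
    f, _, g = formula
    x, y = interpret(f, h), interpret(g, h)
    return (x * y + x + 1) % 2

def neg(f):
    """The negation of F, defined in the text as (F -> 0)."""
    return (f, '>', 0)

# h((F -> G)) = 1 if and only if h(F) = 0 or h(G) = 1:
assert interpret(('P', '>', 'Q'), {'P': 1, 'Q': 0}) == 0
assert interpret(neg('P'), {'P': 0}) == 1
```

That the nested-tuple representation is available at all is exactly what unique readability guarantees.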
The Boolean algebra of equivalence-classes of propositional formulas in two variables can be depicted in the Hasse diagram in Figure~\ref{fig:B}; 
\begin{figure}[ht]
\begin{equation*}
\xymatrix@!0@C=10mm@R=12mm{
&&&&&
(P\lor\lnot P)\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[dr]\ar@{-}[drrr]&&&&\\
&
(P\lto Q)\ar@{-}[dl]\ar@{-}[dr]\ar@{-}[drrr]&&&
(P\lor Q)\ar@{-}[dllll]\ar@{-}[dr]\ar@{-}[drrr]&&                                          (Q\lto P)\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[drrr]&&
\lnot(P\land Q)\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[dr]&\\
Q\ar@{-}[dr]\ar@{-}[drrr]&&
(P\liff Q)\ar@{-}[dl]\ar@{-}[drrr]&&
\lnot P\ar@{-}[dl]\ar@{-}[dr]&
P\ar@{-}[dllll]\ar@{-}[drrr]&&
\lnot(P\liff Q)\ar@{-}[dllll]\ar@{-}[dr]&&
\lnot Q\ar@{-}[dllll]\ar@{-}[dl]\\
&
(P\land Q)\ar@{-}[drrr]&&
\lnot(Q\lto P)\ar@{-}[dr]&&
\lnot(P\lor Q)\ar@{-}[dl]&&&
\lnot(P\lto Q)\ar@{-}[dllll]&\\
&&&&P\land\lnot P&&&&&
}
\end{equation*}
\caption{Hasse diagram for an algebra of equivalence-classes of formulas}\label{fig:B}
\end{figure}
but a question that logic must take up is how a node in the diagram can be named so that its position in the diagram can be inferred, even if the name is seen in isolation from other names and the diagram itself.  A naming scheme as in Figure~\ref{fig:no} 
\begin{figure}[ht]
\begin{equation*}
\xymatrix@!0@C=10mm@R=12mm{
&&&&&
F_1\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[dr]\ar@{-}[drrr]&&&&\\
&
F_2\ar@{-}[dl]\ar@{-}[dr]\ar@{-}[drrr]&&&
F_3\ar@{-}[dllll]\ar@{-}[dr]\ar@{-}[drrr]&&                                          F_4\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[drrr]&&
F_5\ar@{-}[dllll]\ar@{-}[dl]\ar@{-}[dr]&\\
F_6\ar@{-}[dr]\ar@{-}[drrr]&&
F_7\ar@{-}[dl]\ar@{-}[drrr]&&
F_8\ar@{-}[dl]\ar@{-}[dr]&
F_9\ar@{-}[dllll]\ar@{-}[drrr]&&
F_{10}\ar@{-}[dllll]\ar@{-}[dr]&&
F_{11}\ar@{-}[dllll]\ar@{-}[dl]\\
&
F_{12}\ar@{-}[drrr]&&
F_{13}\ar@{-}[dr]&&
F_{14}\ar@{-}[dl]&&&
F_{15}\ar@{-}[dllll]&\\
&&&&F_{16}&&&&&
}
\end{equation*}
\caption{Bad names for equivalence-classes of formulas}\label{fig:no}
\end{figure}
would not be very useful.

In the foreword of his \emph{Algebra,} Lang writes \cite[p.\ v]{Lang-alg}:
\begin{quote}
Unfortunately, a book must be projected in a totally ordered way on the page axis, but that's not the way mathematics `is', so readers have to make choices how to reset certain topics in parallel for themselves, rather than in succession.
\end{quote}
Logic might be said to investigate how this projecting can be done so that readers are indeed able to recover what they want.

Again, to define an interpretation of formulas, we use the unique readability of formulas.
We use this also to
define a certain binary operation on the set of formulas.  Let us use the symbol $*$ for this operation, so that we can define it by 
\begin{equation*}
F*G=\begin{cases}
	H,&\text{ if $G$ has the form $(F\lto H)$;}\\
	G,&\text{ otherwise.}
\end{cases}
\end{equation*}
Such an operation is a \textbf{rule of inference} (this one being called \emph{Modus Ponens} or Detachment).
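In the same anachronistic spirit as before, the operation $*$ can be sketched in Python (hypothetically, with formulas as nested tuples $(F,\texttt{'>'},G)$):

```python
def star(f, g):
    """Modus Ponens (Detachment) as a binary operation on formulas:
    F * (F > H) = H, and F * G = G when G does not have the form
    (F > H)."""
    if isinstance(g, tuple) and g[0] == f:
        return g[2]  # g has the form (f > h); return h
    return g

P, Q = 'P', 'Q'
assert star(P, (P, '>', Q)) == Q             # detachment applies
assert star(Q, (P, '>', Q)) == (P, '>', Q)   # otherwise G is returned
```

Unique readability is what makes the case distinction in the definition well defined.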
We do not actually need unique readability in order to define, as \textbf{logical axioms,}
all formulas of one of the forms 
\begin{gather*}
(F\lto(G\lto F)),\\
	((F\lto(G\lto H))\lto((F\lto G)\lto(F\lto H))),\\
	(((F\lto0)\lto0)\lto F).
\end{gather*}
We obtain the set of \textbf{theorems} by taking the set of logical axioms and
closing under $*$ within the set of formulas.  The set of theorems can then be understood as a
minimal algebra in a signature with $*$ and with a constant
for each logical axiom.  However, unlike the algebra of formulas, the algebra
of theorems is not free.  

Because of the freeness of the algebra of
formulas, each formula has a unique \emph{parsing tree}; for example,
one of the logical axioms has the parsing tree in Figure~\ref{fig:parse}, where $P$, $Q$, and $R$ are propositional variables.
\begin{figure}[ht]
\begin{center}
\pstree[levelsep=30pt,treemode=U]
{\Treebox{$((P\lto(Q\lto R))\lto((P\lto Q)\lto(P\lto R)))$}}
         {\pstree{\Treebox{$(P\lto(Q\lto R))$}}
                 {\Treebox{$P$}
                 {\pstree{\Treebox{$(Q\lto R)$}}
                         {\Treebox{$Q$}
                          \Treebox{$R$}}}}
          \pstree{\Treebox{$((P\lto Q)\lto(P\lto R))$}}
                 {\pstree{\Treebox{$(P\lto Q)$}}
                         {\Treebox{$P$}
                          \Treebox{$Q$}}
                  \pstree{\Treebox{$(P\lto R)$}}
                                 {\Treebox{$P$}
                                  \Treebox{$R$}}}}
\end{center}
\caption{a parsing tree}\label{fig:parse}
\end{figure}
Strictly the tree is an \emph{ordered} tree, in the sense that left branches must be distinguished from right branches.

The minimality of the algebra of theorems means that each
theorem has a \emph{proof,} which also can be understood as a tree; but this proof
is not unique.  For example,
the theorem $(P\lto P)$ has the proof shown in Figure~\ref{fig:proof},
\begin{figure}[ht]
\begin{center}
\pstree[levelsep=30pt,treemode=U]
{\Treebox{$(P\lto P)$}}
{\Treebox{$(P\lto F)$}
 \pstree{\Treebox{$((P\lto F)\lto(P\lto P))$}}
        {\Treebox{$(P\lto(F\lto P))$}
         \Treebox{$((P\lto(F\lto P))\lto((P\lto F)\lto(P\lto P)))$}}}
\end{center}
\caption{a proof, where $F$ is $(G\lto P)$}\label{fig:proof}
\end{figure}
 where $F$ stands for a formula $(G\lto P)$, where $G$ can be \emph{any} formula.
We may debate whether a proof is `really' a tree, as opposed to a linearly ordered refinement of the tree, such as
\begin{gather*}
	((P\lto((G\lto P)\lto P))\lto((P\lto (G\lto P))\lto(P\lto P))),\\
	(P\lto((G\lto P)\lto P)),\\
	((P\lto (G\lto P))\lto(P\lto P)),\\
	(P\lto (G\lto P)),\\
	(P\lto P).
\end{gather*}
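One can check mechanically that this sequence is a proof: each entry is a logical axiom, or is obtained from two earlier entries by Detachment.  A sketch in Python (hypothetical, with formulas as nested tuples, and $G$ an arbitrary fixed formula):

```python
def imp(f, g):
    """The implication (f > g)."""
    return (f, '>', g)

def star(f, g):
    """Detachment, as in the text."""
    return g[2] if isinstance(g, tuple) and g[0] == f else g

P, G = 'P', 'G'
F = imp(G, P)  # F stands for (G > P)

line1 = imp(imp(P, imp(F, P)), imp(imp(P, F), imp(P, P)))
line2 = imp(P, imp(F, P))              # first axiom scheme, F := P, G := F
line3 = imp(imp(P, F), imp(P, P))
line4 = imp(P, F)                      # first axiom scheme, F := P, G := G
line5 = imp(P, P)

# line1 is an instance of the second axiom scheme; lines 3 and 5
# follow by Detachment from the earlier lines:
assert star(line2, line1) == line3
assert star(line4, line3) == line5
```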
Normally a proof is something that can be \emph{read,} and reading is done linearly;
in the language of Lang, a proof is a totally ordered projection on the page axis.
But to \emph{understand} a proof is to understand the relation of each of its steps to other steps; and this relation places those steps as nodes of a tree.

As there is an algorithm to determine whether an arbitrary string of symbols is a formula, so there is an algorithm to determine whether an arbitrary formula is a theorem in the present sense.
Indeed, by
induction in the algebra of theorems, if $F$ is a theorem, then $1$ is its image under every interpretation, that is, the formula is \textbf{logically true.}  The converse of this implication is also true, although the proof is not obvious.\footnote{One approach is the following.  For every formula $F$, if $h$ is an interpretation, define $F^h$ as $F$ itself if $h(F)=1$, and otherwise let $F^h$ be $(F\lto0)$.  If the variables of $F$ are among $P_0$, \dots, $P_{n-1}$, then one shows, by induction in the algebra of formulas, that the formula $(P_0{}^h\lto\cdots\lto (P_{n-1}{}^h\lto F^h)\cdots)$ is a theorem.  In particular, if $F$ is logically true, then $(P_0{}^h\lto\cdots\lto (P_{n-1}{}^h\lto F)\cdots)$ is always a theorem.  Then one can in turn eliminate each $P_i{}^h$, using the theorem $((P\lto G)\lto((\lnot P\lto G)\lto G))$.}  And there is an algorithm to determine whether a formula is logically true: just compute its image under all of the (finitely numerous) interpretations of its variables: that is, check its \emph{truth table.}  If the formula is indeed logically true, then we can find a proof of it.
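The truth-table algorithm itself can be sketched as follows (hypothetical Python, with formulas as nested tuples as before):

```python
from itertools import product

def interpret(formula, h):
    """h((F > G)) = 1 if and only if h(F) = 0 or h(G) = 1."""
    if formula == 0:
        return 0
    if isinstance(formula, str):  # a variable
        return h[formula]
    f, _, g = formula
    return 1 if interpret(f, h) == 0 or interpret(g, h) == 1 else 0

def variables(formula):
    if formula == 0:
        return set()
    if isinstance(formula, str):
        return {formula}
    return variables(formula[0]) | variables(formula[2])

def logically_true(formula):
    """Check the truth table: the image of the formula must be 1 under
    every interpretation of its variables."""
    vs = sorted(variables(formula))
    return all(interpret(formula, dict(zip(vs, row))) == 1
               for row in product((0, 1), repeat=len(vs)))

assert logically_true(('P', '>', 'P'))
assert not logically_true(('P', '>', 'Q'))
```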

But this is a special feature of propositional logic; it fails in \emph{first-order logic.}\footnote{More precisely first-order \emph{predicate} logic, or perhaps first-order \emph{quantifier} logic.}  I am going to try to describe this logic as tersely as possible for present purposes; but the logic is less neat than propositional logic.  Still, we should appreciate that it is one solution found, after decades if not millennia of labor, to the problem of how mathematics can be expressed.

\section{First-order logic}

In first-order logic, the role of propositional variables is taken by \textbf{atomic formulas,} which express equality or other relations between \emph{individuals}; these individuals are denoted by \textbf{terms,} that is, members of the appropriate \emph{term algebra.}\footnote{See pages \pageref{term-alg} and \pageref{term-alg2}.}  
Again this algebra has a signature, comprising $n$-ary operation-symbols for various $n$, including $0$; but there are also \textbf{individual variables,} which---as far as constructing terms is concerned---play the role of constants, that is, nullary operation-symbols.  Terms can be understood as \emph{polynomials.}

First-order logic also introduces new operations on formulas: $F\mapsto\Exists xF$ and $F\mapsto\Forall xF$, where $x$ is an individual variable.  
(One can understand $\Forall x\phi$ as an abbreviation of $\lnot\Exists x\lnot\phi$.)
Every occurrence of $x$ in $\Exists xF$ or $\Forall xF$ is \emph{bound.}  A formula in which all occurrences of variables are bound is a \textbf{sentence.}  The set of sentences is not defined inductively; rather, the function that assigns to each formula its set of \emph{free} variables is defined recursively, and then the set of sentences is the inverse image of the empty set of variables under this function.  Thus the definition of sentences does require unique readability of formulas.
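The recursive definition of the free variables of a formula can be sketched in Python (a hypothetical representation: an atomic formula is a tuple beginning with \texttt{'atom'}, implication is \texttt{('->', F, G)}, and $\Exists xF$ is \texttt{('E', x, F)}):

```python
def free_variables(phi):
    """Recursively assign to each formula its set of free variables."""
    tag = phi[0]
    if tag == 'atom':               # ('atom', v1, ..., vn)
        return set(phi[1:])
    if tag == '->':                 # implication (F -> G)
        return free_variables(phi[1]) | free_variables(phi[2])
    if tag == 'E':                  # quantification: every occurrence
        return free_variables(phi[2]) - {phi[1]}  # of x is bound

def is_sentence(phi):
    # the sentences are the inverse image of the empty set of variables
    return free_variables(phi) == set()

# "there is x with x in y" has the one free variable y;
# prefixing "there is y" yields a sentence:
assert free_variables(('E', 'x', ('atom', 'x', 'y'))) == {'y'}
assert is_sentence(('E', 'y', ('E', 'x', ('atom', 'x', 'y'))))
```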

An \textbf{interpretation} is still a function on the set of formulas.  Its codomain is no longer $\{0,1\}$, but is instead the family of subsets of finite Cartesian powers of some set $M$.  The interpretation may then be denoted by $\str M$, or more precisely $\phi\mapsto\phi^{\str M}$.  Here $\phi^{\str M}$ can be understood as the \emph{solution set} of the formula $\phi$ in $\str M$, so that if $\phi$ has $n$ free variables, then $\phi^{\str M}$ is a subset of $M^n$.  The interpretation is \emph{not} determined by its restriction to the set of variables; indeed, variables are no longer formulas.  The interpretation is determined by its restriction to the set of atomic formulas.  But now further analysis is possible.  Each atomic formula is a combination of a relation-symbol (possibly the equals sign) and the appropriate number of terms.  Each term $t$ then has an interpretation $t^{\str M}$, which is an operation on $M$.  In particular, if $t$ has $n$ variables, then $t^{\str M}$ is a function from $M^n$ to $M$.  The map $t\mapsto t^{\str M}$ is defined recursively from certain operations $S^{\str M}$ on $M$ that are assigned to the operation-symbols $S$ in the logic.  Then the interpretation of an atomic formula is determined by the relations $S^{\str M}$ on $M$ assigned to the relation-symbols $S$ in the logic.  The whole interpretation $\str M$ is now determined by the set $M$ and the operations and relations $S^{\str M}$ on $M$ for the various symbols $S$.  The interpretation is a \textbf{structure} on $M$.

If $\sigma$ is a sentence, then by the foregoing account $\sigma^{\str M}$ should be a subset of $M^0$.  But $M^n$ can be understood as the set of functions from $\{0,\dots,n-1\}$ to $M$; in particular, $M^0$ is the set of functions from the empty set to $M$.  There is only one such function, the empty set, which can be denoted by $0$; and then $M^0=\{0\}$, which can be denoted by $1$.  Thus $\sigma^{\str M}$ is either $0$ or $1$: it is $0$, if $\sigma$ is \textbf{false} in $\str M$, and $1$ if $\sigma$ is \textbf{true.}  More generally a formula $\phi$ with $n$ free variables can be considered as true in $\str M$ if $\phi^{\str M}=M^n$; but then a formula that is not true is not necessarily false.

A structure in which every formula in a given set is true is a
\textbf{model} of that set.  That set then \textbf{entails} every
formula that is true in every model of the set.  A formula entailed by
the empty set is \textbf{logically true.}  A set of formulas that is
closed under entailment is a \textbf{theory.}  By this definition, a
theory has no obvious algebraic structure whereby the theory is
\emph{minimal} in the sense discussed earlier.
Nonetheless, a theory can be given such an algebraic structure: this
is the import of \textbf{G\"odel's Completeness Theorem} of 1930
\cite{Goedel-compl}.

An algebraic structure on theories is a \textbf{proof system.}  A proof system then is a set of \emph{logical axioms}---certain formulas---and \emph{rules of inference}---certain operations on formulas; the logical axioms can be considered as nullary operations.  We require the logical axioms to be logically true, and the rules of inference to yield only formulas that are entailed by their arguments.
G\"odel's theorem is that there is a proof system\footnote{As logical axioms of this proof system, we can take those of propositional logic, along with the formulas
$((\phi\lto\psi)\lto(\phi\lto\Forall x\psi))$
where $x$ does not occur freely in $\phi$.
The remaining axioms are
$(\Forall x\theta\lto\theta^x_t)$,
where $t$ is a constant or variable, and $\theta^x_t$ is the result of replacing all free occurrences of $x$ in $\theta$  with $t$; if $t$ is a variable, it must not occur freely in a subformula of $\theta$ where $x$ is bound.  As rules of inference, we can take $*$ as before, along with the operations
$\phi\mapsto\Forall x\phi$.}
 that is
\textbf{complete} in the sense that every algebra of formulas with
respect to this system is already a
theory.\footnote{G\"odel's proof requires introduction of new relation-symbols; Henkin's improvement \cite{MR0033781} uses new constants.  Church \cite{MR18:631a} presents both proofs.} 

The set $T$ of formulas entailed by a set $\Gamma$ of formulas is also the smallest algebra of formulas (with respect to a complete proof system) that includes $\Gamma$.  Then $\Gamma$ is a set of \textbf{axioms} for $T$.  This set $T$ is a theory, but it need not be \textbf{complete} as a theory; that is, there may be a
sentence such that neither itself nor its negation is entailed by $\Gamma$.
(For a formula with
free variables, neither it nor its negation need belong to a given
complete theory.)  Presburger showed in 1930 that we can write down a set $\Gamma$ of axioms such that 
\begin{inparaenum}[(1)]
\item
the semigroup $(\N,1,+)$ is a model of $\Gamma$, and
\item
the theory entailed by $\Gamma$ is complete.
\end{inparaenum}
More precisely, although $\Gamma$ is infinite, we can write down as much of it as we need.  Indeed, $\Gamma$ contains, for each $n$ in $\N$, the axiom
\begin{equation*}
\bigvee_{k=1}^n(x=k\lor\Exists yx=ny+k),
\end{equation*}
where $k$ as a term stands for $1+\dotsb+1$ (with $k$ occurrences of $1$) and $ny$ stands for $y+\dotsb+y$ (with $n$ occurrences of $y$); and $\Gamma$ contains finitely many other axioms.\footnote{The other axioms are
$x+y=y+x$,
$x+(y+z)=(x+y)+z$,
$x\neq y\lto\Exists z(x+z=y\lor y+z=x)$,
$x+y\neq x$, and
$x+y\neq 1$.
  See Hodges \cite[pp.\ 85 \&\ 128]{MR94e:03002}; I have not consulted Presburger's original paper.  One way to prove the theorem is to introduce a relation-symbol $<$ so that the atomic formula $x<y$ means $\Exists zx+z=y$; and also, for each $n$ in $\N$, to allow $x\equiv y\pmod n$ as an atomic formula.  In the enlarged signature, every formula is equivalent to a formula with the same free variables, but without quantifiers; also, for every quantifier-free sentence, itself or its negation is entailed by the axioms, since itself or its negation is true in $(\N,1,+)$, and this structure embeds in every model of the axioms.}  
However, by \emph{G\"odel's Incompleteness Theorem} of 1931 \cite{Goedel-incompl}, the same cannot be done for the semi-ring $(\N,1,+,{}\cdot{})$: while the set of formulas that are true in this structure is complete in the present sense, it is not entailed by a set of formulas determined by a rule.\footnote{\label{n:code}G\"odel assigns to each formula $\phi$ a \emph{code,} which is in $\N$.  We can treat operations on $\N$ as composing a kind of algebra: the constants of this algebra are $x\mapsto x+1$, the projections $(x_0,\dots,x_{n-1})\mapsto x_i$, and the constant functions $(x_0,\dots,x_{n-1})\mapsto c$; and there is one binary operation, converting an $n$-ary operation $f$ and an $n+2$-ary operation $g$ into the $n+1$-ary operation $h$ given recursively by $h(\vec x,1)=f(\vec x)$ and $h(\vec x,y+1)=g(\vec x,y,h(\vec x,y))$.  The operations in the \emph{minimal} algebra are called \emph{recursive,} and a subset of $\N$ is called \emph{recursive} if it is $f^{-1}(1)$ for some singulary recursive operation $f$.  Then G\"odel's theorem is that there is no complete theory entailed by formulas that are true in $(\N,1,+,{}\cdot{})$ and whose codes compose a recursive set.  It follows that such a set of codes cannot even be \emph{recursively enumerable,} that is, be the range of a recursive operation.}
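The recursion scheme of the last footnote, with numbers beginning at $1$ as in the present text, can be sketched in Python (hypothetical names; tuples stand for the $\vec x$ of arguments):

```python
def primrec(f, g):
    """Convert an n-ary operation f and an (n+2)-ary operation g into
    the (n+1)-ary operation h given recursively by h(xs, 1) = f(xs)
    and h(xs, y+1) = g(xs, y, h(xs, y))."""
    def h(xs, y):
        value = f(xs)
        for k in range(1, y):
            value = g(xs, k, value)
        return value
    return h

# addition: h(x, 1) = x + 1, and h(x, y+1) = h(x, y) + 1
add = primrec(lambda xs: xs[0] + 1, lambda xs, y, value: value + 1)
assert add((2,), 3) == 5

# multiplication, built on the same scheme: h(x, 1) = x,
# and h(x, y+1) = h(x, y) + x
mult = primrec(lambda xs: xs[0], lambda xs, y, value: value + xs[0])
assert mult((2,), 3) == 6
```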


It is tedious to work through the number-theoretic details of G\"odel's original argument, but a corresponding incompleteness result is more readily established for set theory.  Moreover, if one agrees with Landau that the analyst ought not just to \emph{assume} the real numbers, but should \emph{construct} them from the natural numbers, then perhaps one ought to construct the natural numbers too, and not just assume them (as Landau does); and this construction can be done in set theory.  Finally, set theory is a context for considering the \emph{inductive} definitions and the minimal algebras that we have mentioned.


\section{Set theory}

From ordinary language, we have the notion of a \emph{collective noun.}  Singular in form, a collective noun refers to many things as one thing.  The Russell Paradox of 1902 \cite{Russell-letter} is that there is no most general collective noun; for if there were, it could be the word \emph{set,} but then there would be a set of all sets that did not contain themselves, and this set would both contain itself and not.

Nonetheless, there can be a most general collective noun for our purposes; as this noun, let us use the word \emph{collection.}  For mathematical study, we distinguish certain collections as \textbf{sets.}  We determine the properties of sets axiomatically.  Our language for doing this is first-order logic with no constants, no operation-symbols, and only one relation-symbol: Peano's $\in$ as mentioned above.  For us this is a binary symbol that takes, as its right argument, a set, and as its left argument, a possible element of the set.  It is then simplest to take \emph{both} arguments of $\in$ as sets.  Our variables then will range over sets alone.

We need not have an official symbol for equality, but can approach the matter as follows.  A given singulary formula $\phi(x)$ will define a collection, namely the collection of all sets $a$ such that $\phi(a)$ is true.  Such a collection will be called a \textbf{class.}  We consider two classes to be \textbf{equal} if they have the same members.

In particular, if $a$ is a set, then we have the formula $x\in a$ (with the parameter $a$).  We consider the class defined by this formula to be $a$ itself.  That is, every set is a class, namely the class of its members.  In particular, two \emph{sets} are equal if they have the same members.  We can now use the expression $x=y$ as an abbreviation of the formula
\begin{equation*}
\Forall z(z\in x\liff z\in y).
\end{equation*}
Now we can state the following.

\begin{axiom}[Equality]
Equal sets are members of the same sets:
\begin{equation*}
\Forall x\Forall y(x=y\lto\Forall z(x\in z\liff y\in z)).
\end{equation*}
\end{axiom}

Alternatively, if the sign of equality were an official relation-symbol, then the sentence
\begin{equation*}
\Forall x\Forall y(x=y\liff\Forall z(z\in x\liff z\in y))
\end{equation*}
would probably be taken
as an axiom, the Axiom of Extensionality, as in Zermelo's original treatment of 1908 \cite[p.\ 201]{Zermelo-invest}.  Then the above Equality Axiom
would be taken as logically true.  Indeed, more would be logically true, namely
\begin{equation*}
\Forall x\Forall y(x=y\lto(\phi(x)\liff\phi(y)))
\end{equation*}
for all singulary formulas $\phi$ (possibly with parameters).  However, for us the truth of all of these sentences is a \emph{theorem,} which can be established by induction in the algebra of formulas.

For our purposes, we introduce two more axioms:

\begin{axiom}[Empty Set]
The empty class is a set.
\end{axiom}

\begin{axiom}[Adjunction]
  For all sets $a$ and $b$, there is a set whose members are $b$ and the members of $a$. \end{axiom}

The empty set is denoted by $0$; the set whose members are $b$ and the members of $a$ is $a\cup\{b\}$.
The Empty-Set and Adjunction axioms together can be understood as saying simply that every \emph{finite} collection of sets is a set.
In particular,
\begin{align*}
&0,& 
&\{0\},& 
&\{0,\{0\}\},&
&\{0,\{0\},\{0,\{0\}\}\}, 
\end{align*}
% $0$, $\{0\}$, $\{0,\{0\}\}$, $\{0,\{0\},\{0,\{0\}\}\}$, 
and so forth are sets, each being the collection of all previously known sets.  This sequence is $0$, $0'$, $0''$, $0'''$, and so forth, where in general $x'=x\cup\{x\}$.  We may call $x'$ the \textbf{successor} of $x$.  We now define $1$ as $0'$, and we define $\N$ as the smallest collection of sets that contains $1$ and is closed under succession.  However, we should be able to \emph{prove} that the three Peano Axioms are satisfied.
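The construction of these first few sets can be sketched in Python, with \texttt{frozenset} standing in for hereditarily finite sets (a hypothetical illustration, not itself part of the set theory being developed):

```python
zero = frozenset()

def succ(x):
    """The successor x' = x union {x}."""
    return x | frozenset({x})

one, two, three = succ(zero), succ(succ(zero)), succ(succ(succ(zero)))

# each is the collection of all previously obtained sets:
assert three == frozenset({zero, one, two})

# each is transitive: every member is also a subset
assert all(m <= n for n in (one, two, three) for m in n)
```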
The Axiom of Induction is satisfied by definition of $\N$.  Then we have the following lemma, perhaps the simplest application of this axiom.

\begin{lemma}
Every element of $\N$ is the successor of some set.
\end{lemma}

\begin{proof}
$1$ is the successor of $0$, and every successor of a successor is in particular a successor of some set.
\end{proof}

\begin{theorem}
$1$ is not the successor of any element of $\N$.
\end{theorem}

\begin{proof}
If $1=a'$, this means $\{0\}=a\cup\{a\}$, so in particular $a=0$.  But $0$ is not in $\N$: by the lemma, every element of $\N$ is the successor of some set, while $0$, being empty, is not.
\end{proof}

It is not so easy to prove that succession is injective on $\N$.  
%We want to show that every element of $\N$ is $a'$ for some \emph{unique} set $a$ that is either $0$ or an element of $\N$.  By the theorem, this is true when $a=1$.  
Suppose
%it is true when $a=b$, but now 
$b'=c'$, that is, $b\cup\{b\}=c\cup\{c\}$.  If $b\neq c$, then we must have $b\in c$ and $c\in b$.  This peculiar possibility can be ruled out by another axiom; but there is a better way.

A class $\bm C$ defined by a formula $\phi$ is a \textbf{subclass} of a class $\bm D$ defined by a formula $\psi$ if $\Forall x(\phi(x)\lto\psi(x))$; we then use $\bm C\included\bm D$ as an abbreviation of
this sentence.
Of course a \textbf{subset} of a class is a subclass that is a set.
We noted (on page \pageref{Ded-con}) that Dedekind confused the membership and subset relations; but now we must carefully distinguish.  
A class \emph{contains} its elements, but \emph{includes} its subclasses.  In particular, the successor of a set $a$ both contains $a$ and includes $a$, and it is the smallest set to do so.
A class $\bm C$ is \textbf{transitive} if it includes all of its elements, that is,
  \begin{equation*}
\Forall x(x\in\bm C\lto x\included\bm C)
\end{equation*}
or $\Forall x\Forall y(y\in x\land x\in\bm C\lto y\in\bm C)$.
A subclass $\bm C$ of a class $\bm D$ is \textbf{proper} if it is not the whole class $\bm D$; in that case we write $\bm C\pincluded\bm D$.  We can now state and prove the following.

\begin{theorem}
The class of transitive sets contains $0$ and is closed under succession, and succession is injective on this class.
\end{theorem}

\begin{proof}
Trivially $0$ is transitive.
Suppose $a$ is transitive and $b\in a\cup\{a\}$.  Then either $b\in a$ or $b=a$, and in each case $b\included a$, so $b\included a'$.  Thus $a'$ is transitive.

Finally suppose $a$ and $b$ are distinct transitive sets, and $a'\included b'$.  Then $a\in b'$, so $a\in b$ (since $a\neq b$), hence $a\included b$, and then $a\pincluded b$.  Therefore $b$ is not a subset of $a$ (since otherwise $a$ would be a proper subset of itself), so $b\notin a$.  Thus $b\notin a'$, so $b'\neq a'$.
\end{proof}

\begin{corollary}
Succession is injective on $\N$.
\end{corollary}

\begin{proof}
Containing $0$ and being closed under succession, the class of transitive sets also contains $1$; but by definition $\N$ is included in every collection, and in particular every class, that contains $1$ and is closed under succession.  Thus $\N$ is included in the class of transitive sets, on which, by the theorem, succession is injective.
\end{proof}

Thus, instead of obtaining $\N$ as a subset of $\R$, or just assuming it exists so as to satisfy the Peano axioms, we can construct $\N$, satisfying the Peano axioms, on the basis of three simple axioms about sets.

This construction does not give us $\N$ as a set.  We may promulgate the Axiom of Infinity, whereby $\N$ or rather $\{0\}\cup\N$ is a set by fiat.  But what does this mean?  We may state as an axiom that \emph{some} set contains $0$ and is closed under succession; we may even state that there is a smallest such set; but we cannot even conclude that there is a smallest such \emph{class} without something like the following axiom or rather scheme of axioms.\footnote{The theory with the Equality, Empty-Set, Adjunction, and Separation axioms is called General Set Theory by
  Boolos~\cite[p.~196]{MR1675856}, but is called  
$\mathrm{STZ}$ by Burgess~\cite[p.~223]{MR2157847},
for \textbf Szmielew and \textbf Tarski with \textbf Zermelo's Axiom
of Separation.}


\begin{axsch}[Separation]
Every subclass of a set is a set.
\end{axsch}  

If we assume now there is \emph{one} set that contains $0$ and is closed under succession, then the intersection of the class of all such sets is a set, called $\upomega$; and no proper subclass of this has the same closure properties.  But even without the assumption, we can obtain $\upomega$ as a class.  One way to do this is as follows.
A class $\bm C$ is \textbf{well-founded} if each of its nonempty sub\emph{sets} has an element that is disjoint from that subset, that is,
  \begin{equation*}
\Forall x(x\included\bm C\land\Exists yy\in x\lto\Exists y(y\in x\land\Forall z(z\in y\lto z\notin x))).
\end{equation*}
A nonempty \emph{set} whose every member is the successor of a member is not well-founded.  
Now we define $\upomega$ as the class of all \emph{transitive, well-founded} sets $a$ such that
\begin{compactenum}[1)]
\item
every nonempty member of $a$ is the successor of a member of $a$, and
\item
if $a$ is not empty, it has a member whose successor is \emph{not} in $a$.
\end{compactenum}
Then $0\in\bigcup\upomega$, and with a bit of work, $\bigcup\upomega$ is closed under succession.  By the Separation axioms, $\bigcup\upomega$ is the \emph{smallest} class that contains $0$ and is closed under succession.  Also $\bigcup\upomega=\upomega$.

Note why the members of $\upomega$ must be required to be well-founded.
Having $\N$ informally as above, we may imagine a set $\{a_k\colon k\in\N\}$, where $a_{k+1}{}'=a_k$ in each case, and also the function $k\mapsto a_k$ is injective.  This set meets the two numbered criteria for being in $\upomega$, but is not well-founded.

However, possibly $\{0\}\cup\N\cup\{a_k\colon k\in\N\}$ is a set, but its every infinite subset contains an element of $\{0\}\cup\N$.  Then the set is well-founded.  It is even transitive, if $a_k=\{0\}\cup\N\cup\{a_n\colon k<n\}$ in each case.  We have no formal way to prevent such a set from belonging to $\upomega$.
This is what troubled Skolem \cite{Skolem-some-remarks}: set theory is not categorical, but can have a non-standard model, in which the class defined by the formula for $\upomega$ is actually larger than intended.

We may assume that there is a standard model of set theory in which $\upomega$ really is $\{0\}\cup\N$; but a formalization of this would be a literally infinite statement, and we can quote such a statement only in part: 
\begin{quote}\centering
$\upomega$ contains $0$ and $0'$ and $0''$ and\dots, and nothing else.
\end{quote}

Suppose we nonetheless believe that there is still a standard model of set theory, a model in which $\upomega$ really is $\N\cup\{0\}$.  In that case, the formal sentences that are true in this model compose a collection.  Then, using the idea of G\"odel \cite{Goedel-incompl},\footnote{See note \ref{n:code} on page~\pageref{n:code}.} we can assign to each such formula a \emph{code,} which is a certain set.
Although we could avoid doing so, it will be convenient now to introduce $0$ as an official constant, and ${}'$ as an official function-symbol.
We number the symbols used in our formulas, as for example in the following scheme:
\begin{equation*}
\begin{array}{cccccccccccc}
\lto&\lnot&\Exists&(&)&\in&0&{}'&x&y&z&\dots\\\hline
0&1&2&3&4&5&6&7&8&9&10&\dots
\end{array}
\end{equation*}
Then the \textbf{code} of a formula $\phi$ is a finite sequence $\gn{\phi}$ of elements of $\upomega$.  For example, $\gn{\Exists x\lnot\;0'\in x}$ is $(2,8,1,6,7,5,8)$.
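The numbering scheme above is entirely mechanical, and it may help to see it carried out.  The following is a minimal sketch, not part of the formal development; the token spellings (\texttt{"exists"}, \texttt{"not"}, and so on) are my own choices for the symbols in the table.

```python
# Sketch of the symbol-numbering scheme in the table above.
# The token names are ad hoc stand-ins for the logical symbols.
SYMBOLS = ["->", "not", "exists", "(", ")", "in", "0", "'", "x", "y", "z"]
CODE = {s: n for n, s in enumerate(SYMBOLS)}  # "->" is 0, "not" is 1, ...

def code(formula):
    """Return the code of a formula, as a tuple of elements of omega."""
    return tuple(CODE[s] for s in formula)

# The example from the text: the formula "exists x, not 0' in x".
example = ["exists", "x", "not", "0", "'", "in", "x"]
print(code(example))  # (2, 8, 1, 6, 7, 5, 8)
```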
Such a sequence can be understood as a function from some element of $\upomega$ to $\upomega$; in particular, it is a certain finite \emph{set} of ordered pairs.  This is not a formal theorem of set theory; it is just the observation that, by the Empty-Set and Adjunction axioms, for every ordered pair $(a,b)$ of sets, there is a \emph{set} $\{\{a\},\{a,b\}\}$, which (as Kuratowski showed in 1921 \cite{Kuratowski}) can be identified with the ordered pair; and then every finite collection of ordered pairs can be built up one by one into a set.  
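For example, the code $(2,8,1,6,7,5,8)$ found above is the function whose domain is the element $7$ of $\upomega$, namely the set of ordered pairs
\begin{equation*}
\{(0,2),\;(1,8),\;(2,1),\;(3,6),\;(4,7),\;(5,5),\;(6,8)\},
\end{equation*}
each pair $(a,b)$ here standing for the Kuratowski set $\{\{a\},\{a,b\}\}$.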

If a formula $\phi$ has length $n$, then $\gn{\phi}$ is an element of the class denoted by $\upomega^n$.  Thus the codes of all formulas belong to the class that can be denoted by $\bigcup_{n\in\upomega}\upomega^n$ or $\upomega^{<\upomega}$.
If the collection of codes of all true sentences of set theory were itself a class, then there would be a singulary formula $\theta$ defining the class of codes of all singulary formulas $\phi$ such that $\phi(\gn{\phi})$ is false.  In this case,
\begin{equation*}
\theta(\gn{\theta})\liff\lnot\theta(\gn{\theta}),
\end{equation*}
which is absurd.  This is a variant of the Russell Paradox; it is in effect Tarski's Theorem on the Undefinability of Truth.  In stating his version of the theorem, Tarski \cite[p.\ 247]{MR736686} notes his debt to G\"odel.  The point now is that, if $\N$ is a set or even just a class, then the collection of codes of true sentences of set theory is not a class.  In short, some mathematical collections are definitely not classes.
%We may then wish not to assume that $\N$ is a class.

In the present context, a form of G\"odel's Incompleteness Theorem can be established as follows.
The collection of codes of formally provable sentences is a class, at least on the assumption that $\N$ is a class.  Then there is a class consisting of the codes of all singulary formulas $\phi$ such that $\lnot\phi(\gn{\phi})$ has a formal proof.  This last class is defined by a formula $\psi$.  Then $\psi(\gn{\psi})$ is true if and only if the code of its negation is in the class of codes of provable sentences.  If this class never contains the codes of both a sentence and its negation, our set theory is \textbf{consistent}; but in this case neither $\psi(\gn{\psi})$ nor its negation is provable, although $\lnot\psi(\gn{\psi})$ is true.

If we do not want to assume that $\N$ is a class, we can still argue as follows.  There is a smallest class $\bm C$ comprising the codes of provable sentences.  Let $\psi$ define the class of codes $\gn{\phi}$ such that the code of $\lnot\phi(\gn{\phi})$ is in $\bm C$.  If $\psi(\gn{\psi})$ had a formal proof, then it would be true, and so the codes of both $\psi(\gn{\psi})$ and $\lnot\psi(\gn{\psi})$ would be in $\bm C$.  If $\lnot\psi(\gn{\psi})$ had a formal proof, then both it and $\psi(\gn{\psi})$ would be true, which is absurd.  Thus, if our set theory is consistent, neither $\psi(\gn{\psi})$ nor its negation has a formal proof.

\small

%\bibliographystyle{amsplain}
%\bibliography{../references}
%\bibliography{references}

\def\cprime{$'$}
\providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
\providecommand{\MR}{\relax\ifhmode\unskip\space\fi MR }
% \MRhref is called by the amsart/book/proc definition of \MR.
\providecommand{\MRhref}[2]{%
  \href{http://www.ams.org/mathscinet-getitem?mr=#1}{#2}
}
\providecommand{\href}[2]{#2}
\begin{thebibliography}{10}

\bibitem{MR1675856}
George Boolos, \emph{Logic, logic, and logic}, Harvard University Press,
  Cambridge, MA, 1998, With introductions and an afterword by John P. Burgess,
  With a preface by Burgess and Richard Jeffrey, Edited by Jeffrey. \MR{1675856
  (2000b:03005)}

\bibitem{MR2157847}
John~P. Burgess, \emph{Fixing {F}rege}, Princeton Monographs in Philosophy,
  Princeton University Press, Princeton, NJ, 2005. \MR{MR2157847 (2006e:03006)}

\bibitem{Burris}
Stanley~N. Burris, \emph{Logic for mathematics and computer science}, Prentice
  Hall, Upper Saddle River, New Jersey, USA, 1998.

\bibitem{MR18:631a}
Alonzo Church, \emph{Introduction to mathematical logic. {V}ol. {I}}, Princeton
  University Press, Princeton, N.~J., 1956. \MR{18,631a}

\bibitem{Dedekind-German}
Richard Dedekind, \emph{Was sind und was sollen die {Z}ahlen?}, Friedrich
  Vieweg, Braunschweig, 1893.

\bibitem{MR0159773}
Richard Dedekind, \emph{Essays on the theory of numbers. {I}: {C}ontinuity and
  irrational numbers. {II}: {T}he nature and meaning of numbers}, authorized
  translation by Wooster Woodruff Beman, Dover Publications Inc., New York,
  1963. \MR{MR0159773 (28 \#2989)}

\bibitem{MR0001234}
John Dyer-Bennet, \emph{A theorem on partitions of the set of positive
  integers}, Amer. Math. Monthly \textbf{47} (1940), 152--154. \MR{MR0001234
  (1,201b)}

\bibitem{MR1801397}
Herbert~B. Enderton, \emph{A mathematical introduction to logic}, second ed.,
  Harcourt/Academic Press, Burlington, MA, 2001. \MR{1801397 (2001h:03001)}

\bibitem{MR1932864}
Euclid, \emph{Euclid's {E}lements}, Green Lion Press, Santa Fe, NM, 2002, All
  thirteen books complete in one volume, the Thomas L. Heath translation,
  edited by Dana Densmore. \MR{MR1932864 (2003j:01044)}

\bibitem{Gauss}
Carl~Friedrich Gauss, \emph{Disquisitiones arithmeticae}, Springer-Verlag, New
  York, 1986, Translated into English by Arthur A. Clarke, revised by William
  C. Waterhouse.

\bibitem{Goedel-compl}
Kurt G{\"o}del, \emph{The completeness of the axioms of the functional calculus
  of logic (1930a)}, From Frege to G{\"o}del (Jean van Heijenoort, ed.),
  Harvard University Press, 1976, pp.~582--91.

\bibitem{Goedel-incompl}
\bysame, \emph{On formally undecidable propositions of {\emph{{p}rincipia
  mathematica}} and related systems {I} (1931)}, From Frege to G{\"o}del (Jean
  van Heijenoort, ed.), Harvard University Press, 1976, pp.~596--616.

\bibitem{MR961440}
Paul~R. Halmos, \emph{I want to be a mathematician}, MAA Spectrum, Mathematical
  Association of America, Washington, DC, 1985, An automathography in three
  parts. \MR{961440 (89e:01044)}

\bibitem{MR0033781}
Leon Henkin, \emph{The completeness of the first-order functional calculus}, J.
  Symbolic Logic \textbf{14} (1949), 159--166. \MR{MR0033781 (11,487d)}

\bibitem{MR0120156}
\bysame, \emph{On mathematical induction}, Amer. Math. Monthly \textbf{67}
  (1960), 323--338. \MR{MR0120156 (22 \#10913)}

\bibitem{MR94e:03002}
Wilfrid Hodges, \emph{Model theory}, Encyclopedia of Mathematics and its
  Applications, vol.~42, Cambridge University Press, Cambridge, 1993.
  \MR{94e:03002}

\bibitem{Kuratowski}
Casimir Kuratowski, \emph{Sur la notion d'ordre dans la th{\'e}orie des
  ensembles}, Fundamenta Mathematicae (1921), 161--71.

\bibitem{MR12:397m}
Edmund Landau, \emph{Foundations of analysis. {T}he arithmetic of whole,
  rational, irrational and complex numbers}, third ed., Chelsea Publishing
  Company, New York, N.Y., 1966, translated by F. Steinhardt; first edition
  1951; first German publication, 1929. \MR{12,397m}

\bibitem{Lang-alg}
Serge Lang, \emph{Algebra}, third ed., Addison-Wesley, Reading, Massachusetts,
  1993, reprinted with corrections, 1997.

\bibitem{MR941522}
Saunders Mac~Lane and Garrett Birkhoff, \emph{Algebra}, third ed., Chelsea
  Publishing Co., New York, 1988. \MR{MR941522 (89i:00009)}

\bibitem{Mazur-Th}
Barry Mazur, \emph{How did {T}heaetetus prove his theorem?},
  \url{http://www.math.harvard.edu/~mazur/preprints/Eva.pdf}, accessed
  Sept.~24, 2008.

\bibitem{Peano}
Giuseppe Peano, \emph{The principles of arithmetic, presented by a new method
  (1889)}, From {F}rege to {G}{\"o}del (Jean van Heijenoort, ed.), Harvard
  University Press, 1976, pp.~83--97.

\bibitem{Russell-letter}
Bertrand Russell, \emph{Letter to {F}rege (1902)}, From Frege to G{\"o}del
  (Jean van Heijenoort, ed.), Harvard University Press, 1976, pp.~124--5.

\bibitem{Skolem-some-remarks}
Thoralf Skolem, \emph{Some remarks on axiomatized set theory (1922)}, From
  Frege to G{\"o}del (Jean van Heijenoort, ed.), Harvard University Press,
  1976, pp.~290--301.

\bibitem{MR736686}
Alfred Tarski, \emph{Logic, semantics, metamathematics}, second ed., Hackett
  Publishing Co., Indianapolis, IN, 1983, Papers from 1923 to 1938, Translated
  by J. H. Woodger, Edited and with an introduction by John Corcoran.
  \MR{736686 (85e:01065)}

\bibitem{PM}
Alfred~North Whitehead and Bertrand Russell, \emph{Principia mathematica},
  vol.~I, University Press, Cambridge, 1910.

\bibitem{Zermelo-invest}
Ernst Zermelo, \emph{Investigations in the foundations of set theory {I}
  (1908a)}, From Frege to G{\"o}del (Jean van Heijenoort, ed.), Harvard
  University Press, 1976, pp.~199--215.

\end{thebibliography}


\end{document}
