The vector space R^n, together with the dot-product, is an example of an inner-product space. In particular, R itself (as a vector space), with multiplication, is an inner-product space.
Definitions
Suppose V is a vector space. A function f that converts ordered pairs of vectors in V into scalars is a bilinear form if it satisfies the rules:
- f(u + v, w) = f(u, w) + f(v, w);
- f(ku, v) = k·f(u, v);
- f(u, v + w) = f(u, v) + f(u, w);
- f(u, kv) = k·f(u, v).
A bilinear form f is symmetric if f(u, v) = f(v, u) for all u and v, and it is positive-definite if f(u, u) > 0 whenever u is not 0.
An inner product is a positive-definite symmetric bilinear form.
An inner-product space is a vector space with an inner product; usually the inner product is denoted by angle-brackets, so that <u, v> is the scalar that results from applying the inner product to the pair (u, v) of vectors. By positive-definiteness, the norm |v| of a vector v in an inner-product space can be defined to be the nonnegative square root of <v, v>.
Examples on R^n
There are bilinear forms that are not symmetric, and symmetric bilinear forms that are not positive-definite. Indeed, suppose f is a bilinear form on R^n. Then f(u, v) is the sum of the terms
a_ij u_j v_i ,
where a_ij = f(e_j, e_i) (the e_k being the standard basis vectors), and where i and j range from 1 to n. Therefore
f(u, v) = v^T A u ,
where A is the matrix (a_ij). The bilinear form f is hence symmetric if and only if the matrix A is symmetric.
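A quick computational check in Python (using numpy; the matrix A and the vectors below are arbitrary choices made for illustration): evaluating f(u, v) = v^T A u directly and comparing f(u, v) with f(v, u) shows that f is symmetric exactly when A is.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 0.0, 1.0]])  # not symmetric

    def f(u, v, A):
        # The bilinear form f(u, v) = v^T A u determined by the matrix A.
        return v @ A @ u

    u = np.array([1.0, 2.0, -1.0])
    v = np.array([0.5, -1.0, 3.0])

    print(f(u, v, A), f(v, u, A))    # the two values differ, since A is not symmetric
    S = (A + A.T) / 2                # a symmetric matrix built from A
    print(f(u, v, S), f(v, u, S))    # the two values agree, since S is symmetric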
Suppose that the bilinear form f is symmetric, and that the matrix A just named is diagonalizable; say
A = P D P^(-1) ,
where D is a diagonal matrix with diagonal entries d_i; these entries are the eigenvalues of A. Suppose further that P^(-1) = P^T (that is, P is an orthogonal matrix). Then
f(u, v) = v^T P D P^T u = (P^T v)^T D (P^T u) .
Hence f(Pu, Pv) = v^T D u, and therefore
f(Pu, Pv) = d_1 u_1 v_1 + d_2 u_2 v_2 + ... + d_n u_n v_n .
In particular, f is positive-definite if and only if each eigenvalue of A is positive.
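A sketch of this diagonalization in Python (the symmetric matrix A below is an arbitrary illustrative choice): numpy's eigh routine returns the eigenvalues d_i and an orthogonal matrix P of eigenvectors, so positive-definiteness can be read off from the signs of the eigenvalues.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])                 # symmetric

    d, P = np.linalg.eigh(A)                        # A = P @ diag(d) @ P^T, with P orthogonal
    print(np.allclose(P @ np.diag(d) @ P.T, A))     # True
    print(np.allclose(P.T @ P, np.eye(3)))          # True: P^(-1) = P^T

    u = np.array([1.0, -2.0, 0.5])
    v = np.array([2.0, 1.0, 1.0])
    lhs = (P @ v) @ A @ (P @ u)                     # f(Pu, Pv), where f(u, v) = v^T A u
    rhs = np.sum(d * u * v)                         # d_1 u_1 v_1 + ... + d_n u_n v_n
    print(np.isclose(lhs, rhs))                     # True

    print(np.all(d > 0))                            # True: every eigenvalue is positive,
                                                    # so this particular f is positive-definite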
It is in fact the case (although it is a challenge to prove) that every symmetric matrix is diagonalizable by an orthogonal matrix in the way just described. So, every n×n symmetric matrix A with positive eigenvalues determines on R^n the inner product given by the formula
<u, v> = v^T A u .
A special case is when A is diagonal; then the inner product is a so-called weighted-Euclidean inner product. If A is the identity matrix, then the inner product is just the Euclidean inner product, which is the dot-product.
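For instance (a small Python sketch; the weights and vectors are arbitrary illustrative choices), a diagonal A with positive entries gives a weighted-Euclidean inner product, and taking A to be the identity recovers the dot-product.

    import numpy as np

    def inner(u, v, A):
        # <u, v> = v^T A u, for a symmetric A with positive eigenvalues.
        return v @ A @ u

    w = np.array([2.0, 1.0, 0.5])            # positive weights
    A = np.diag(w)                           # a weighted-Euclidean inner product
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 0.0, -1.0])

    print(inner(u, v, A))                    # 2*1*4 + 1*2*0 + 0.5*3*(-1) = 6.5
    print(np.sqrt(inner(u, u, A)))           # the norm |u| for this inner product
    print(inner(u, v, np.eye(3)), u @ v)     # with A = I, just the dot product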
Examples not on R^n
- In the vector space P_n, comprising the polynomials in one variable of degree no greater than n, any n+1 distinct numbers, say x_0, x_1, ..., x_n, determine an inner product, namely the operation < , > given by
  <p, q> = p(x_0)q(x_0) + p(x_1)q(x_1) + ... + p(x_n)q(x_n) .
- On the space of continuous functions on some interval [a, b], there is an inner product such that
  <f, g> = ∫_a^b fg ,
  the definite integral from a to b of the product fg. (A numerical sketch of both inner products follows this list.)
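Here is that sketch in Python (the sample points x_i, the polynomials, and the interval [a, b] are arbitrary illustrative choices, and the second inner product is only approximated, by the trapezoid rule):

    import numpy as np

    # Inner product on P_2 determined by the distinct numbers x_0, x_1, x_2.
    xs = np.array([-1.0, 0.0, 1.0])

    def poly_inner(p, q):
        # <p, q> = p(x_0)q(x_0) + p(x_1)q(x_1) + p(x_2)q(x_2);
        # p and q are coefficient lists, highest power first, as np.polyval expects.
        return float(np.sum(np.polyval(p, xs) * np.polyval(q, xs)))

    p = [1.0, 0.0, -1.0]      # p(x) = x^2 - 1
    q = [0.0, 2.0, 1.0]       # q(x) = 2x + 1
    print(poly_inner(p, q))   # 0*(-1) + (-1)*1 + 0*3 = -1

    # Inner product on continuous functions on [a, b], approximated numerically.
    a, b = 0.0, 1.0
    t = np.linspace(a, b, 10001)

    def func_inner(f, g):
        # <f, g> = the integral from a to b of f*g, here by the trapezoid rule.
        vals = f(t) * g(t)
        return float(np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(t)))

    print(func_inner(np.sin, np.cos))   # the exact value is (sin 1)^2 / 2, about 0.354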
Orthogonality
In an inner-product space, two vectors are orthogonal if their inner product is zero. If V is an inner-product space, and W is a subspace of V, then the orthogonal complement of W in V is the subspace of V comprising every vector that is orthogonal to every vector of W. (The orthogonal complement of W is often denoted W⊥, that is, W with a superscript ⊥, the "perp" symbol.)
Note that the orthogonal complement of W in V is a subset of V, by definition; that it is a subspace requires proof.
We can restate a fact noted earlier, thus: if the rank of a matrix A is equal to the number of columns of A, then the matrix A^T A is invertible. From this, you can show the following Theorem: for any vector b, the system
A^T A x = A^T b
is consistent, and there is a unique vector v in the column space of A such that v = Ax for some solution x of this system.
The unique vector v in the Theorem is such that v - b is orthogonal to the column space of A. It can therefore be called the (orthogonal) projection of b into the column space of A; it is the vector in the column space whose distance from b is minimal.
The system ATAx = ATb is called the normal system associated with Ax = b; a solution to the normal system is a least-squares solution to the system Ax = b itself (even though this system may be inconsistent).
If the matrix A has a single column, namely the vector a, then the vector v of the Theorem is just the vector
((b·a)/(a·a))a,
where ( · ) is the dot-product.
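A Python sketch of the Theorem (the matrix A and vector b are arbitrary illustrative choices): solving the normal system gives the projection v = Ax of b into the column space, v - b is orthogonal to every column, and the single-column case reduces to the dot-product formula just given.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])             # rank 2, equal to the number of columns
    b = np.array([1.0, 0.0, 2.0])

    x = np.linalg.solve(A.T @ A, A.T @ b)  # solve the normal system A^T A x = A^T b
    v = A @ x                              # projection of b into the column space of A
    print(np.allclose(A.T @ (v - b), 0.0)) # True: v - b is orthogonal to every column of A

    # Single-column case: the projection of b onto a is ((b.a)/(a.a)) a.
    a = np.array([1.0, 2.0, 2.0])
    print((b @ a) / (a @ a) * a)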
If u and v are vectors in an arbitrary inner-product space V, the vector
(<u, v>/<v, v>) v
is the (orthogonal) projection of u onto v, and is denoted by proj_v(u). In terms of the operation of projection, we can describe the Gram-Schmidt Process as follows:
Suppose {u_1, u_2, ..., u_r} is a finite, linearly independent set, spanning a subspace W of an inner-product space V. We can find a set {v_1, v_2, ..., v_r} such that:
- v_1 is a non-zero scalar multiple of u_1;
- v_(s+1) is a non-zero scalar multiple of
  u_(s+1) - proj_(v_1)(u_(s+1)) - proj_(v_2)(u_(s+1)) - ... - proj_(v_s)(u_(s+1))
  whenever 1 ≤ s < r;
- span{v_1, v_2, ..., v_(s+1)} = span{u_1, u_2, ..., u_(s+1)} whenever s < r;
- v_i is orthogonal to v_j whenever i is different from j.
If u is any vector of V, then the sum
proj_(v_1)(u) + proj_(v_2)(u) + ... + proj_(v_r)(u)
is the (orthogonal) projection of u into W; denoted by proj_W(u), it is the unique vector v in W such that u - v is in the orthogonal complement of W.
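A sketch of the process in Python (using numpy and the ordinary dot product as the inner product; the input vectors are an arbitrary independent set chosen for illustration, and the helper names inner, proj, and gram_schmidt are mine, not standard):

    import numpy as np

    def inner(u, v):
        # The dot product; any other inner product could be substituted here.
        return u @ v

    def proj(u, v):
        # proj_v(u) = (<u, v>/<v, v>) v.
        return inner(u, v) / inner(v, v) * v

    def gram_schmidt(us):
        # Turn a linearly independent list into an orthogonal list with the same spans.
        vs = []
        for u in us:
            v = u - sum((proj(u, w) for w in vs), np.zeros_like(u))
            vs.append(v)
        return vs

    us = [np.array([1.0, 1.0, 0.0]),
          np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    vs = gram_schmidt(us)
    print([round(float(inner(vs[i], vs[j])), 10) for i in range(3) for j in range(3) if i < j])  # all 0.0

    # Projection into the subspace spanned by v_1 and v_2: the sum of the single-vector projections.
    w = np.array([0.0, 0.0, 1.0])
    p = proj(w, vs[0]) + proj(w, vs[1])
    print(np.isclose(inner(w - p, vs[0]), 0.0), np.isclose(inner(w - p, vs[1]), 0.0))  # True True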
Applications
Fitting polynomial functions to points
Suppose that (x_0, y_0), (x_1, y_1), ..., (x_n, y_n) are n + 1 points in the Cartesian plane with distinct x-coordinates. Suppose m is a number no greater than n. We can form an (n+1)×(m+1) matrix A whose row i is
(1  x_i  x_i^2  ...  x_i^m) .
Let y be the vector (y_0 y_1 y_2 ... y_n)^T in R^(n+1). There is a vector a in R^(m+1) (in fact a unique vector) such that Aa is the projection of y into the column space of A. (That is, a satisfies A^T A a = A^T y.) Let q_m be the polynomial
a_0 + a_1 x + a_2 x^2 + ... + a_m x^m ;
then q_m is the polynomial in P_m best fitted to the given points, in the sense that the sum of the squares (q_m(x_i) - y_i)^2 is as small as possible. In particular, if m = n, the polynomial meets the points exactly: we have
q_n(x_i) = y_i
for each i from 0 to n. In general, q_m is the projection of q_n into P_m with respect to the inner product on P_n described above.
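A Python sketch of this fit (the data points and the degree m are arbitrary illustrative choices): build the matrix A, solve the normal system for the coefficient vector a, and check that with m = n the fitted polynomial passes through the points exactly.

    import numpy as np

    xs = np.array([0.0, 1.0, 2.0, 3.0])       # distinct x-coordinates (n + 1 = 4 points)
    ys = np.array([1.0, 2.0, 0.0, 5.0])
    m = 2                                     # fit a polynomial of degree at most m <= n

    A = np.vander(xs, m + 1, increasing=True) # row i is (1, x_i, x_i^2, ..., x_i^m)
    a = np.linalg.solve(A.T @ A, A.T @ ys)    # coefficients a_0, ..., a_m of q_m

    def q(x):
        return sum(a[k] * x**k for k in range(m + 1))

    print(sum((q(xs[i]) - ys[i])**2 for i in range(len(xs))))  # the minimized sum of squares

    # With m = n, the normal system yields the interpolating polynomial q_n:
    B = np.vander(xs, len(xs), increasing=True)
    c = np.linalg.solve(B.T @ B, B.T @ ys)
    print(np.allclose(B @ c, ys))             # True: q_n(x_i) = y_i for every i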
Approximating continuous functions by trigonometric functions
On the space of continuous functions on the interval [0, 2π], we have an inner product as described above. With respect to this inner product, every function on the following list is orthogonal to every other function:
1; cos x, cos 2x, cos 3x, ...; sin x, sin 2x, sin 3x, ....
Except for the constant function 1, they all have the same norm, namely √π; but the norm of 1 is √(2π). If f is a continuous function defined on the interval [0, 2π], then for any positive integer n, we can project f into the span of the set {1, cos x, cos 2x, ..., cos nx, sin x, sin 2x, ..., sin nx}; the projection of f is the function
a_0/2 + (a_1 cos x + b_1 sin x) + (a_2 cos 2x + b_2 sin 2x) + ... + (a_n cos nx + b_n sin nx) ,
where:
- a_k = (1/π) ∫_0^(2π) f(x) cos kx dx, the integral from 0 to 2π of f times the cosine of kx, divided by π;
- b_k = (1/π) ∫_0^(2π) f(x) sin kx dx, the integral from 0 to 2π of f times the sine of kx, divided by π.
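A numerical sketch in Python (the function f below is an arbitrary continuous function chosen for illustration, and the integrals are approximated by the trapezoid rule rather than computed exactly):

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 20001)

    def integrate(values):
        # Trapezoid-rule approximation of the integral over [0, 2*pi].
        return float(np.sum((values[:-1] + values[1:]) / 2 * np.diff(t)))

    def f(x):
        # An arbitrary continuous function on [0, 2*pi].
        return x * (2.0 * np.pi - x)

    n = 3
    a = [integrate(f(t) * np.cos(k * t)) / np.pi for k in range(n + 1)]
    b = [integrate(f(t) * np.sin(k * t)) / np.pi for k in range(1, n + 1)]

    def projection(x):
        # a_0/2 + sum over k of (a_k cos kx + b_k sin kx), for k = 1, ..., n.
        s = a[0] / 2
        for k in range(1, n + 1):
            s += a[k] * np.cos(k * x) + b[k - 1] * np.sin(k * x)
        return s

    print(a[0] / 2)                        # the average value of f, about 2*pi^2/3
    print(projection(np.pi), f(np.pi))     # the projection is already close to f here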