A Model of Associative Memory

How memories are encoded in neural matter is still an open question. The name for these supposed neural correlates of memory is “engram”, and papers about engrams tend to have titles like Finding the engram, Catching the engram, In search of the engram, Continuing the search for the engram, which, though I’m not an expert, makes me feel like the problem isn’t well understood.

(Also, Rite of passage of the engram and Manipulating a cocaine engram, making the practice of neuroscience sometimes sound like a fraternity hazing. Possibly related, while researching this post I learned that a typical experiment will usually involve things like shocking the feet of a fruit fly, sewing shut one eye of a barn owl, and shaving half the whiskers off a mouse.)

A popular theory is that memories are encoded as patterns of synaptic connections. Perception creates neural activity. Neural activity leaves an impression upon the brain as a pattern of modified synaptic connections (perhaps by dendritic spines, which become larger and more numerous to make the connection stronger). A later perception might partly activate this pattern, but this partial activation is often enough to activate the rest of the pattern, too. This is supposed to be a neural model of associative memory. (The tradition is to cite Proust at this point; evidently, a sponge cake was sufficient to activate in him the neural substrate of a 1,267,069 word novel. It’s remarkably illustrative, at least.)

Artificial neural networks are often used to model the networks of the brain. Feedforward networks have typically been used to model the visual system, while recurrent networks have more often been used to model memory. When an input is applied to certain of these recurrent networks, the neural activity will always converge to a stable steady state. This stable pattern of activity is supposed to be a memory, stored within the connections of the network.

Some of the most studied networks are those that are symmetrically connected, like the Hopfield network. A network is symmetrically connected if the weight of the connection from neuron \(i\) to neuron \(j\) equals the weight of the connection from \(j\) back to \(i\). A symmetrically connected network with a linear activation function can, for a given set of connection weights, settle into only a single stable steady state (whose values depend upon the input to the network). The drawback of these networks, then, is that activity at future states is independent of activity at past states. Past recall cannot influence future recall.

Hahnloser and Seung present a model of associative memory in symmetrically connected networks using instead a threshold-linear activation function (or rectified linear function).

Graph of a rectified linear activation function.

They show that, due to some nice properties of the rectifier, such networks can in general represent multiple patterns of stable activation even for a single input. Which pattern the network falls into upon a new input depends upon which pattern it was in before. Memories linger.

Their main contribution in this paper is the classification of sets of neurons into what they call “permitted” and “forbidden” sets, which describe which sets of neurons may be activated together at a stable steady state. They describe a method for determining what patterns of stable activity the network can achieve.

The existence of permitted and forbidden sets suggests a new way of thinking about memory in neural networks. When an input is applied, the network must select a set of active neurons, and this selection is constrained to be one of the permitted sets. Therefore the permitted sets can be regarded as memories stored in the synaptic connections.

Symmetric Threshold-Linear Networks (a.k.a. Symmetric Rectifier Networks)

A threshold-linear network has the form \[ \dot{x} = -x + \bigg[W x + b \bigg]_+ \tag 1 \label 1 \] where \(x\) is a vector with \(n\) components representing neural activation, \(W\) is an \(n \times n\) matrix representing the connection weights between neurons, \(b\) is a vector representing (constant) external input, and \([\cdot]_+ = \operatorname{max}\{0, \cdot\}\), the rectifier function. Hahnloser and Seung assume the weight matrix \(W\) is symmetric (meaning, neurons are connected symmetrically).

For a single neuron we can write \[ \dot{x}_i = -x_i + \bigg[\sum_{j=1}^n w_{ij} x_j + b_i\bigg]_+ \tag 2 \label 2 \] Whenever \(\sum_{j=1}^n w_{ij} x_j + b_i \leq 0\) the input to the neuron is 0. Its dynamics become \(\dot x_i = -x_i\) and its activation will decay exponentially to 0; it is “off”. What this means is that generally only a subset of neurons will be active at any time, and which neurons are active may change as the system evolves.
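To make this concrete, here is a minimal simulation sketch (mine, not from the paper) that integrates equation \(\ref 1\) with forward Euler steps. The weight matrix is the one from the two-neuron example below; the input \(b\) and the initial state are made-up values.

relu(x) = max.(x, 0)

# Forward-Euler integration of equation (1): dx/dt = -x + [Wx + b]_+
function simulate(W, b, x0; dt=0.01, steps=5_000)
    x = copy(x0)
    trajectory = [copy(x)]
    for _ in 1:steps
        x .+= dt .* (-x .+ relu(W * x .+ b))
        push!(trajectory, copy(x))
    end
    return trajectory
end

W = [0.0 -0.5; -0.5 0.0]     # the two-neuron weights used in the example below
b = [1.0, 1.0]               # made-up constant input
traj = simulate(W, b, [0.2, 1.5])
@show traj[end]              # the approximate steady state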

It helps to think about neurons as being active within “chambers” of the activation space. (These chambers can be found by considering when the expression inside \([\cdot]_+\) is equal to 0.) In each chamber, some of the neurons will be active and some will be inactive. Within each chamber, the network will evolve according to that chamber’s linear equations: \[ \dot{x} = -x + W_\sigma x + b_\sigma \tag 3 \label 3 \] (Here, \(\sigma\) means the set of neurons that are currently active, and \(W_\sigma\) and \(b_\sigma\) have entries set to 0 for those neurons not in \(\sigma\).) Whenever the system enters a new chamber, some neurons will switch on and some will switch off, and a new set of linear equations takes over. Each chamber has its own set of eigenvectors, those of \(W_\sigma\), and along these eigenvectors the flow within that chamber is a straight line.
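For example, here is a one-line helper (mine) that reads off which chamber a state \(x\) is in, that is, the set \(\sigma\) of neurons currently above threshold:

# The active set σ: the neurons whose input Wx + b is currently positive.
activeset(W, b, x) = findall(>(0), W * x .+ b)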

Let’s take a look at the dynamics of a two neuron system with weight matrix \(\begin{bmatrix}0 & -\frac12 \\ -\frac12 & 0\end{bmatrix}\).

First, the rectified version. The activation space is divided into four chambers; the labels indicate which neurons are active in that chamber. Each curve represents different initialization values for the neurons; the input vector \(b\) is always the same. On the right is a plot for one initialization. In this example, the network always converges to a single steady state, though in other networks there may be more than one.

graphs showing dynamics of a rectifier network

Notice how the dynamics change when the system enters the innermost chamber \(\{1,2\}\). Compare this to the same system lacking the rectifier \([\cdot]_+\); it is a linear system.

graphs showing dynamics of a network with linear activation

Three Theorems

The authors prove three theorems. The first gives the conditions under which a network will have a set of global, stable steady states (a.k.a. globally asymptotically stable fixed points, or equilibrium points), depending on connection weights and input. These steady states, when they exist, are fixed points of activation to which the network will always converge.

Assuming these conditions, in the second and third theorems the authors give two possibilities for this set of steady states. The first possibility is that the network contains forbidden sets of neurons, neurons that may not be activated together at a steady state; in this case the network will be multistable: for a given input, it may converge to one of several steady states depending on initial activations. The second possibility is that there are no forbidden sets; in this case, for a given input, the network will always converge to the same steady state; as far as stable points go, it is just like a linear system, without the rectifier.

Theorem 1 - Steady States

Again, this theorem gives the conditions under which a network may have a set of stable steady states.

The authors present their results in terms of the matrix \(I-W\). We can rewrite the linear system \(\ref 3\) as \[ \dot x = (-I + W)x + b \tag 4 \] The stability of the system can be determined from the eigenvalues of the matrix \(-I + W\); specifically, the system is globally asymptotically stable if the real parts of its eigenvalues are all negative. Since \(-I + W\) is symmetric and real, its eigenvalues will all be real; so, we are looking for negative eigenvalues. It is, however, usually more convenient to work with positive numbers, so instead we can look for positive eigenvalues of \(I - W\) (or even eigenvalues of \(W\) that are less than 1).
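As a quick numerical restatement (my own, reusing the made-up two-neuron weights from the earlier example), here are the three equivalent ways of phrasing that check:

using LinearAlgebra

W = [0.0 -0.5; -0.5 0.0]

@show all(real.(eigvals(-I + W)) .< 0)     # eigenvalues of -I + W are negative
@show all(eigvals(Symmetric(I - W)) .> 0)  # eigenvalues of I - W are positive
@show all(eigvals(Symmetric(W)) .< 1)      # eigenvalues of W are less than 1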

Theorem 1

If \(W\) is symmetric, then the following conditions are equivalent:

  1. All nonnegative eigenvectors of all principal submatrices of \(I - W\) have positive eigenvalues.
  2. The matrix \(I - W\) is copositive. That is, \(x^\top (I - W)x \gt 0\) for all nonnegative \(x\), except \(x = 0\).
  3. For all \(b\), the network has a nonempty set of steady states that are globally asymptotically stable.
plot of the Lagrange function for non-negative v on the unit circle
\(R(v)\) for \(\left\lVert v \right\rVert = 1\)

One of the things I liked about this paper was that they proved their results using methods from both Lyapunov functions and quadratic programming. They prove that \((1)\) implies \((2)\), for instance, by minimizing \(v^\top (I - W) v\) (a quadratic function) over nonnegative vectors \(v\) on the unit sphere (that is, \(\left\lVert v \right\rVert = 1\)). The quantity \(R(v) = v^\top (I - W) v\) is equivalent to the Rayleigh quotient, and its critical points on the sphere are eigenvectors of the matrix \(I - W\). Because of the rectifier, neural activations (provided they start above 0) can never fall below 0. Any steady state therefore will occur along a nonnegative eigenvector. This, I think, is one of the most important insights about the effect of the rectification.
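Here is a crude numerical version of that idea (my own probe, not the authors’ method): sample nonnegative unit vectors and look at the minimum of \(R(v)\). A negative minimum rules out copositivity; a positive minimum over many samples only suggests it.

using LinearAlgebra

W = [0.0 -0.5; -0.5 0.0]               # made-up symmetric weights
M = I - W

R(v) = v' * M * v                      # the quadratic form from the proof

vs = [normalize(abs.(randn(2))) for _ in 1:10_000]   # nonnegative unit vectors
@show minimum(R.(vs))                  # stays positive here, suggesting copositivity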

Here are the authors again:

The meaning of these stability conditions is best appreciated by comparing with the analogous conditions for the purely linear network obtained by dropping the rectification from (1). In a linear network, all eigenvalues of W would have to be smaller than unity to ensure asymptotic stability. Here only nonnegative eigenvectors are able to grow without bound, due to the rectification, so that only their eigenvalues must be less than unity. All principal submatrices of W must be considered because different sets of feedback connections are active, depending on the set of neurons that are above threshold. In a linear network, \(I - W\) would have to be positive definite to ensure asymptotic stability, but because of the rectification, here this condition is replaced by the weaker condition of copositivity.

So, the tradeoff for the rectification is that we get stability for more general sets of weight matrices, but we have to analyze all \(2^n\) principal submatrices to find out whether we get it.

Theorems 2 and 3 - Permitted and Forbidden Sets

These two theorems classify the permitted and forbidden sets of a network.

The first theorem tells us that if a network has a set of global, stable steady states, then all of the nonnegative eigenvectors of all principal submatrices of \(I-W\) will have positive eigenvalues. When the system begins with positive activations, the activation will flow along time-varying superpositions of the (nonnegative) eigenvectors toward some fixed point. We might think that every subsystem has to have a fixed point, then. But this is not so. It could turn out that what would be the fixed point for the subsystem lies outside of its chamber, and then the dynamics will have changed before the system ever reaches it. In this case the system has a forbidden set, because the neurons in that subsystem cannot be coactivated together at a stable steady state.

Theorem 2

If the matrix \(I - W\) is copositive, then the following statements are equivalent:

  1. The matrix \(I - W\) is not positive definite.
  2. There exists a forbidden set.
  3. The network is conditionally multistable. That is, there exists an input \(b\) such that there is more than one stable steady state.
Plots of a three neuron system with two stable points.
A three neuron system with two steady states.

They prove that (2) implies (3) by examining a Lyapunov function \(V(x) = \frac12 x^\top (I - W) x - b^\top x\). They argue as follows: a forbidden set implies the existence of a negative eigenvalue of \(I - W\) in the corresponding active submatrix. The function \(V\) therefore forms a saddle. The system can be initially activated on either side of the saddle, and will descend to a different minimum on each side. These are two different stable steady states.

3D plot of Lyapunov function and a contour plot with line given by a positive eigenvector
The Lyapunov function for a two neuron system with connection weights equal to 2. On the right, a line in the direction of an eigenvector with positive eigenvalue is in red.
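Here is a small numerical sketch (mine) of that saddle, for the two-neuron system with connection weights equal to 2 shown in the figure. The input \(b\) is a made-up value; it shifts the saddle around but does not affect the curvature. Along the eigenvector of \(I - W\) with negative eigenvalue, \(V\) curves downward:

using LinearAlgebra

W = [0.0 2.0; 2.0 0.0]                   # connection weights equal to 2
b = [1.0, 1.0]                           # made-up input

V(x) = 0.5 * x' * (I - W) * x - b' * x   # the Lyapunov function from the paper

λ, U = eigen(Symmetric(I - W))
u = U[:, argmin(λ)]                      # direction of the negative eigenvalue
x0 = [1.0, 1.0]
h = 0.1

# The second difference of V along u is negative: a downward (saddle) direction.
@show V(x0 + h*u) - 2 * V(x0) + V(x0 - h*u)   # ≈ minimum(λ) * h^2 < 0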

Theorem 3

If \(W\) is symmetric, then the following conditions are equivalent:

  1. The matrix \(I - W\) is positive definite.
  2. All sets are permitted.
  3. For all \(b\) there is a unique steady state, and it is stable.

A linear system, like \(\ref 3\), will have a global steady state if \(I-W\) is positive definite (all eigenvalues are positive). So, in a rectified system if all the neurons may be activated together at a stable steady state, the system behaves much like a linear system in regard to its steady states. Rectified systems are more interesting when they have some forbidden sets.

If I am understanding the paper correctly, we could characterize permitted and forbidden sets like this:

A permitted set corresponds to: a principal submatrix of \(I-W\) with only positive eigenvalues; neurons that can be coactivated at a stable steady state; positive eigenvectors and positive eigenvalues.

A forbidden set corresponds to: a principal submatrix of \(I-W\) with a negative eigenvalue; neurons that cannot be coactivated at a stable steady state; eigenvectors with negative components that give negative eigenvalues.

Finally, in a fourth theorem, they use the eigenvalue interlacing theorem to show that the sets of neurons that may be coactivated at stable states are consistent in some sense throughout the network: the eigenvalues of a principal submatrix must lie between the smallest and largest eigenvalues of the parent matrix.

Theorem 4

Any subset of a permitted set is permitted. Any superset of a forbidden set is forbidden.

Here for instance are the permitted sets for a network of ten neurons with randomly generated weights.

Diagram of permitted sets for a ten neuron network.
Permitted sets for a ten neuron network.

(This only shows “maximal” permitted sets; that is, those permitted sets not contained in any other permitted set.)

And this shows the steady state of the topmost permitted set with each neuron receiving an input of 1.

Left: Neural activations. Right: Steady states.

And here is a (different) network transitioning through stable states as inputs and activations vary.

Conclusion

If a connection pattern in a network is a memory, then multistability allows the brain to store memories much more efficiently. Patterns of activation can overlap within a network. One neuron can partake of several memories, much like a single gene can be implicated in the expression of a multitude of traits or behaviors. I imagine that whatever process the brain uses for memory storage, it must make a tradeoff between robustness and efficiency. It wants to minimize the cost of storing memories and so should use as few neurons as possible to do so, yet the death of a single neuron shouldn’t disrupt the system as a whole. The model of overlapping patterns seems to me like a plausible solution.

(I decided to read this paper after becoming interested in Carina Curto’s work on combinatorial threshold-linear networks. She and her collaborators have extended the ideas presented here to more general threshold networks that can display various kinds of dynamic behavior. I hope I can review some of her work in the future.)

Appendix - Computing Permitted Sets in Julia

using Combinatorics
using LinearAlgebra

"""Determine whether the list `l1` is a numerical translation of the
list `l2`. The function will return `true` when `l1 == k+.l2` for some `k` 
modulo `n+1`."""
function istranslation(l1, l2, n::Int)
    any([l1 == map(x -> mod(x+i, n+1), l2) for i in 1:n])
end

"""Returns a maximal set of lists from `lists` that are unique up to translation."""
function removetranslations(lists, n::Int)
    ls = []
    for l in lists
        if !any(map(x->istranslation(l, x, n), ls))
            push!(ls, l)
        end
    end
    return ls
end

"""Returns a set of lists from `lists` that are not properly contained in 
any other list."""
function removesubsets(lists)
    isproper(a, b) = issubset(a, b) && a != b
    ls = []
    for a in lists
        if !any(map(b -> isproper(a, b), lists))
            push!(ls, a)
        end
    end
    return ls
end

"""Determines whether a matrix `A` represents a permitted set of neurons. `A` 
should be of the form `I-W`, where `W` is the weight matrix."""
function ispermitted(A)
    all(map(x -> x>0, eigvals(A)))
end

"""Returns a matrix `P` of all permitted sets represented by a matrix
`A` of the form `I-W`. If neuron `j` is contained in permitted set
`i`, then `P[i,j] == 1`; otherwise, `P[i,j] == 0`. Each permitted set
is unique up to translation, and is not contained in any other
permitted set in `P`."""
function permittedparents(A)
    ps = []
    n = size(A, 1)
    idxs = removetranslations(powerset(1:n), n)
    filter!(!isempty, idxs)
    for idx in idxs
        submatrix = A[idx, idx]
        if ispermitted(submatrix)
            push!(ps, idx) 
        end
    end
    ps = removesubsets(ps)
    P = zeros(length(ps), n)
    for (i, pp) in enumerate(ps)
        for j in pp
            P[i, j] = 1
        end
    end
    return P
end
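And a hypothetical usage example (assuming the definitions above have been loaded): random symmetric weights for a small network, then the maximal permitted sets of \(I - W\).

n = 6
W = let M = 0.4 .* randn(n, n)
    M = (M .+ M') ./ 2      # symmetrize
    M[diagind(M)] .= 0      # no self-connections
    M
end

P = permittedparents(I - W)   # one row per maximal permitted set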

We share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.

Irving Kaplansky

Often, the first step in analyzing a problem is to transform it into something more amenable to our analysis. We would like the representation of our problem to reflect as naturally as possible whatever features of it we are most interested in. We might normalize data through a scaling transform, for instance, to eliminate spurious differences among like quantities. Or we might rotate data to align some of its salient dimensions with the coordinate axes, simplifying computations. Many matrix decompositions take the form \(M = BNA\). When \(B\) and \(A\) are non-singular, we can think of \(N\) as being a simpler representation of \(M\) under coordinate transforms \(B\) and \(A\). The spectral decomposition and the singular value decomposition are of this form.

All of these kinds of coordinate transformations are linear transformations. Linear coordinate transformations come about from operations on basis vectors that leave any vectors represented by them unchanged. They are, in other words, a change of basis.

This post came about from my frustration at not finding simple formulas for these transformations with simple explanations to go along with them. So here, I tried to give a simple exposition of coordinate transformations for vectors in vector spaces along with transformations of their cousins, the covectors in the dual space. I’ll get into matrices and some applications in a future post.

Vectors

Example - Part 1

Let’s go through an example to see how it works. (I’ll assume the field is \(\mathbb{R}\) throughout.)

Let \(e\) be the standard basis in \(\mathbb{R}^2\) and let \(e'\) be another basis where \[ \begin{array}{cc} e'_1 = \frac12 e_1, & e'_2 = -e_1 + 2 e_2 \end{array} \] So we have written the basis \(e'\) in terms of the standard basis \(e\). As vectors they look like this:

Drawing of the basis vectors e_1, e_2, and e'_1, e'_2

And each will induce its own coordinate system, indicating the angle and orientation of each axis, and each axis’ unit of measure.

Drawing of basis vectors and the coordinates they induce.

We can write any vector in \(\mathbb{R}^2\) as a linear combination of these basis elements.

\[\begin{array}{cc} v = e_1 + 2 e_2, & w' = 5 e'_1 + \frac12 e'_2 \end{array}\]

Drawing of vectors in two coordinate systems.

We call the coefficients on the basis elements the coordinates of the vector in that basis. So, in basis \(e\) the vector \(v\) has coordinates \((1, 2)\), and in basis \(e'\) the vector \(w'\) has coordinates \((5, \frac12)\).

Formulas

The choice of basis is just a choice of representation. The vector itself should stay the same. So the question is – how can we rewrite a vector in a different basis without changing the vector itself?

Let’s establish some notation. First, whenever we are talking about a vector in the abstract, let’s write \(\mathbf{v}\), and whenever we are talking about a vector represented in some basis let’s write \(v\). So the same vector \(\mathbf{v}\) might have two different basis representations \(v\) and \(v'\), which nevertheless all stand for the same vector: \(\mathbf{v} = v = v'\). However, when we write \(e\) for a basis, we mean a list of vectors \(e_i\) that form a basis in some vector space \(V\). So, \(v = v'\) always, but in general \(e \neq e'\).

Let’s index basis elements with subscripts (like \(e_1\)) and coordinates with superscripts (like \(v^1\)). This will help us keep track of which one we’re working with. Also, let’s write basis elements as row vectors, and coordinates as column vectors. This way we can write a vector as a matrix product of the basis elements and the coordinates:

\[v = \begin{bmatrix} e_1 & e_2 \end{bmatrix}\begin{bmatrix}v^1 \\ v^2\end{bmatrix} = v^1 e_1 + v^2 e_2 \]

Now we can also write the transformation given above of \(e\) into \(e'\) using matrix multiplication:

\[e' = \begin{bmatrix}e'_1& e'_2\end{bmatrix} = \begin{bmatrix}e_1& e_2\end{bmatrix}\begin{bmatrix} \frac12 & -1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix}\frac12 e_1 & -e_1 + 2 e_2\end{bmatrix}\]

The \(2 \times 2\) matrix used in that transformation is called the transformation matrix from the basis \(e\) to the basis \(e'\).

The general formula is

\[\formbox{e' = e A}\]

where \(A\) is the transformation matrix. We can use this same matrix to transform coordinate vectors, but we shouldn’t necessarily expect that we can use the same formula. The bases and the coordinates are playing different roles here: the basis elements are vectors that describe the coordinate system, while the coordinates are scalars that describe a vector’s position in that system.

Let’s think about how this should work. Generally we write \(v = v^1 e_1 + v^2 e_2 + \cdots + v^n e_n\). If we make some new basis \(e'\) by multiplying all the \(e_i\)’s by 2, say, and also multiply all the \(v^i\)’s by 2, then we end up with a vector four times the size of the original. Instead, we should multiply all the \(v^i\)’s by \(\frac12\), the inverse of 2, and then we would have \(v' = v\), as needed. The vector must be the same in either basis.
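Written out for that doubling example, with \(e'_i = 2 e_i\):

\[ \mathbf{v} = v^1 e_1 + \cdots + v^n e_n = \left(\tfrac12 v^1\right)(2 e_1) + \cdots + \left(\tfrac12 v^n\right)(2 e_n) = \tfrac12 v^1 e'_1 + \cdots + \tfrac12 v^n e'_n \]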

So, if we change the \(e\)’s by some factor, then the \(v\)’s need to change in the inverse manner to maintain equality. This suggests that to change \(v\) into a representation \(v'\) in basis \(e'\) we should instead use

\[\formbox{v' = A^{-1} v}\]

(We’ll prove it a little bit later.)

The fact that basis elements change in one way (\(e' = e A\)) while coordinates change in the inverse way (\(v' = A^{-1} v\)), is why we sometimes call the basis elements covariant and the vector coordinates contravariant, and distinguish them with the position of their indices.

Example - Part 2

Let’s go back to our example. Using our formula, we get

\[ v' = \begin{bmatrix}2 & 1 \\ 0 & \frac12 \end{bmatrix} \begin{bmatrix}1 \\ 2\end{bmatrix} = \begin{bmatrix}4 \\ 1\end{bmatrix} \]

But what about \(w'\)? Well, since its representation is in \(e'\), to convert in the opposite direction, to \(e\), we need to use the transformation that’s the inverse of \(A^{-1}\), namely, \(A\).

\[ w = \begin{bmatrix}\frac12 & -1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix}5 \\ \frac12\end{bmatrix} = \begin{bmatrix}2 \\ 1\end{bmatrix} \]

And now we have:

Drawing of vectors v, v', w, and w'.

Each vector is unchanged after a change of basis.
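Here is a quick numerical check of the example (my own sketch), with the basis vectors stacked as the columns of a matrix:

E  = [1.0 0.0; 0.0 1.0]      # basis e (the standard basis), columns e_1, e_2
A  = [0.5 -1.0; 0.0 2.0]     # the transformation matrix
Ep = E * A                   # basis e', columns e'_1, e'_2

v  = [1.0, 2.0]              # coordinates of v in e
vp = inv(A) * v              # coordinates of v in e'  ->  [4, 1]
@show E * v ≈ Ep * vp        # the underlying vector is unchanged

wp = [5.0, 0.5]              # coordinates of w' in e'
w  = A * wp                  # coordinates of w in e   ->  [2, 1]
@show E * w ≈ Ep * wp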

Covectors

Recall the inner product on a vector space.

We might ask, given some vector \(v\) how does an inner product vary as we range over vectors \(w\)? In this case, we could think of \(\langle v, \cdot\rangle\) as a function of vectors in \(V\) whose outputs are scalars. In fact, these sorts of functions themselves form a vector space, called the dual space of \(V\), which we write \(V^*\). The members of \(V^*\) are called linear functionals or covectors. The covector given by \(\langle v, \cdot\rangle\) we denote \(v^*\).

We’ve been working with vectors in \(\mathbb{R}^n\), and in \(\mathbb{R}^n\) the (canonical) inner product is the dot product. This means that if we denote the covectors in \(V^*\) as rows and the vectors in \(V\) as columns (as usual), then we can write

\[ v^*(w) = \begin{bmatrix} v_1 & \cdots & v_n\end{bmatrix}\begin{bmatrix}w^1 \\ \vdots\\ w^n\end{bmatrix} = v_1 w^1 + \cdots + v_n w^n \]

So, the covectors are functions \(\mathbb{R}^n \to \mathbb{R}\), but we can do computations with them just like we do with vectors, using matrix multiplication. We still write the indices of the row vectors as subscripts and the indices of the column vectors as superscripts.

If we can think about vectors in \(\mathbb{R}^n\) as arrows, how should we think about covectors? To simplify things, let’s restrict our attention to the two-dimensional case. Now, consider the action of a covector \(v^*\) on some unknown vector \(w = \begin{bmatrix}x& y\end{bmatrix}^\top\) in \(\mathbb{R}^2\):

\[ v^*(w) = v_1 x + v_2 y \]

Now if we look at the level sets of this function, \(v_1 x + v_2 y = k\), it should start to look familiar…

It’s a family of lines!

And to find out the value of \(v^*(w)\) we just count how many lines of \(v^*\) the vector \(w\) passes through (including maybe “fractional” valued lines – \(k\) doesn’t have to just be an integer). More generally, the covectors of \(\mathbb{R}^n\) can be thought of as hyperplanes, and the value of \(v^*(w)\) can be determined by how many hyperplanes of \(v^*\) the vector \(w\) passes through. And furthermore, the vector \(v\) will be the normal vector to the hyperplanes given by \(v^*\), that is, they are perpendicular.

Example - Part 3

In the standard basis, let \(v^*\) be given by \(\begin{bmatrix}1 & 2\end{bmatrix}\). Its family of lines will then be \(x + 2 y = k\). Now let \(w\) be given by \(\begin{bmatrix}2 & 1\end{bmatrix}^\top\), and count how many lines \(w\) crosses through:

Left: Drawing of v and v^*. Right: Drawing of v^* and w.

It’s exactly the same as \(v^*(w) = 2 + 2(1) = 4\)! I think that’s pretty cool.

The Dual Basis

Okay, so what about bases in \(V^*\)? We’d like to have a basis for \(V^*\) that is the “best fit” for whatever basis we have in \(V\). This turns out to be the basis given by: \[ e^i(e_j) = \begin{cases} 1 & \text{if } i = j\\ 0 & \text{if } i \ne j \end{cases}\]

where \((e_j)\) is a basis in \(V\). Or sometimes people write instead \(e^i(e_j) = \delta^i_j\), where \(\delta^i_j\) is the Kronecker delta. We call this basis \((e^i)\) the dual basis of \((e_j)\). You can see that a basis and its dual have a kind of “bi-orthogonality” property that turns out to be very convenient.
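One way to compute a dual basis numerically (my own sketch): stack the basis vectors \(e'_i\) as the columns of a matrix; the dual basis covectors are then the rows of its inverse, which is just the bi-orthogonality condition written in matrix form.

Ep      = [0.5 -1.0; 0.0 2.0]   # columns are e'_1, e'_2 (in standard coordinates)
Ep_dual = inv(Ep)               # rows are the dual covectors e'^1, e'^2

@show Ep_dual * Ep ≈ [1.0 0.0; 0.0 1.0]   # e'^i(e'_j) = δ^i_j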

Let’s look at formulas for changing bases now. If we have a vector \(v\) in \(V\) written as a column, how can we find the corresponding covector \(v^*\) in \(V^*\)? The obvious thing to do would be to take the transpose of \(v\). But this will not always work. Recall the definitions of \(v, v', w\), and \(w'\) from the earlier examples, and consider:

\[v^\top v = \begin{bmatrix} 1 & 2\end{bmatrix}\begin{bmatrix}1 \\ 2\end{bmatrix} = 1 + 4 = 5\]

\[v'^\top v' = \begin{bmatrix} 4 & 1\end{bmatrix}\begin{bmatrix}4 \\ 1\end{bmatrix} = 16 + 1 = 17\]

This is no good. We get two different values for \(\mathbf{v}^*(\mathbf{v})\) depending on which basis we use, but the values of a function on a vector space shouldn’t depend on the basis. The trouble is that the dual basis of \((e'_i)\) isn’t given by the transposes of those basis vectors (they don’t satisfy the bi-orthogonality property), so the duals of vectors represented in that basis won’t be the transposes of those vectors either.

This will be true for orthonormal bases, however. The standard basis \((e_i)\) is orthonormal, and the duals of the vectors represented in it will in fact be those transposes.

\[ \formbox{v^* = v^\top \text{for any vector } v \text{ written in an orthonormal basis.}} \]

Formulas

The next question is, if we perform a change of basis in \(V\), what is the corresponding change in \(V^*\)? Let’s use the same reasoning that we did before. For a vector \(w\) in \(\mathbb{R}^n\) and a covector \(v^*\), we have

\[ v^*(w) = v_1 w^1 + \cdots + v_n w^n \]

And so, like before, if we change the values of the \(w^j\)’s, the values of the \(v_i\)’s should change in the inverse manner to preserve equality. But \(w\) changes as \(w' = A^{-1} w\), so \(v^*\) must change as \(v'^* = v^* A\). And its basis (the dual basis) must change in the inverse way: \(e'^* = A^{-1} e^*\).

\[\formbox{\begin{align} e'^* &= A^{-1} e^*\\ v'^* &= v^* A \end{align}}\]

Notice that this time the basis vectors are playing the contravariant part, while the coordinates are playing the covariant part with respect to the original vector space.

Example - Part 4

Let’s continue our example. Since \(e\) is the standard basis, it is orthonormal, and we can therefore find the duals of \(v\) and \(w\) by taking transposes. We can then apply our formula to find the duals \(v'^*\) and \(w'^*\).

\[\begin{align} v'^*(x, y) &= \begin{bmatrix}1 & 2\end{bmatrix}\begin{bmatrix}\frac12 & -1\\ 0 & 2\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}\\ &= \begin{bmatrix}\frac12 & 3\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}\\ &= \frac12 x + 3y\\ \\ w'^*(x, y) &= \begin{bmatrix}2 & 1\end{bmatrix}\begin{bmatrix}\frac12 & -1\\ 0 & 2\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}\\ &= \begin{bmatrix}1 & 0\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}\\ &= x \end{align} \]

The duals too are unchanged after a change of basis.

Left: Drawing of \(v^*\) and \(w^*\). Right: Drawing of \(v'^*\) and \(w'^*\).
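And a numerical check (my own) that the pairing is unchanged, continuing the example:

A     = [0.5 -1.0; 0.0 2.0]
vstar = [1.0 2.0]                # v* as a row vector (standard dual basis)
w     = [2.0, 1.0]               # w as a column vector (standard basis)

vstar_p = vstar * A              # coordinates of v* in the new dual basis  ->  [1/2  3]
w_p     = inv(A) * w             # coordinates of w in the new basis        ->  [5, 1/2]
@show vstar * w ≈ vstar_p * w_p  # both give 4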

Summary of Formulas

\[\formbox{\begin{array}{llr} e' &= e A & &(1)\\ v' &= A^{-1} v & &(2)\\ e'^* &= A^{-1} e^* & &(3)\\ v'^* &= v^* A & &(4) \end{array} }\]

Suppose \((1)\), that \(e' = e A\), where \(A\) is a non-singular matrix.

Proof of (2): We know \(e v = e' v'\). Now

\[ e' v' = e v = e A A^{-1} v = e' (A^{-1} v) \]

But then it must be that \(v' = A^{-1} v\) since basis representations are unique.

Proof of (4): We also know \(v'^* w' = v^* w\) for all vectors \(w\). But then

\[ v'^* w' = v^* w = v^* A A^{-1} w = (v^* A) w' \]

for all \(w'\). So, \(v'^* = v^* A\).

Proof of (3): Lastly,

\[ v'^* e'^* = v^* e^* = v^* A A^{-1} e^* = v'^* A^{-1} e^* \]

for all \(v'^*\). So, \(e'^* = A^{-1}e^*\). QED