Morphisms of context-free grammars

Let me begin with a little-known comment by Noam Chomsky (see [GE82] p.15 or [GE04] p.42), made in response to a question on the significance of automata theory for linguistics and mathematics:

This seems to me what one would expect from applied mathematics, to see if you can find systems that capture some of the properties of the complex system that you are working with, and to ask whether those systems have any intrinsic mathematical interest, and whether they are worth studying in abstraction. And that has happened exactly at one level, the level of context-free grammar. At any other level it has not happened. The systems that capture other properties of language, for example, of transformational grammar, hold no interest for mathematics. But I do not think that is a necessary truth. It could turn out that there would be richer and more appropriate mathematical ideas that would capture other, maybe deeper properties of language than context free grammars do. In that case you have another branch of applied mathematics which might have linguistic consequences. That would be exciting.

Let me hasten to say that I do not wish to argue with Chomsky’s assessment. It would be hard to do, at any rate, since he leaves room for both possibilities: that there is no linguistic theory beyond context-free grammars that is of interest to mathematics; or perhaps there is. But I was particularly struck by the sentence ‘The systems that capture other properties of language, for example, of transformational grammar, hold no interest for mathematics.’ From the 1970s on, transformational grammar has been responsible, directly or indirectly, for much research on generalizations of automata that, instead of transforming strings to strings, transform trees to trees. Rational transducers, for example, gave rise to a variety of tree transducers (deterministic, nondeterministic, top-down, bottom-up), in no small part motivated by the desire to find a compact mathematical formalism underlying transformational grammar. In fact, transformations of parse trees, called translations in the computer science literature, are central to the contemporary theory of compilers. There has been a subtle change of perspective, though. Transformational grammar, motivated by examples such as the English passive, seeks to understand operations on tree-like structures within one given language. Compilers translate from source code to object code: from one (typically context-free) language to another.

One can appeal to an algebraic analogy at this point. If context-free languages are like algebras, then context-free grammars are like presentations of algebras via generators and relations. One can map one set of generators into another in a way that preserves relations; such a mapping induces a homomorphism of algebras. So there ought to be such a thing as mapping one context-free grammar into another in a ‘structure-preserving’ way, and this should induce a homomorphism between languages.

The goal of this note is to give one possible definition of morphism of context-free grammars. This notion will organize context-free grammars into a category [CWM] in such a way that the effects of morphisms on parse trees — these are, more or less, the ‘translations’ of computer science — become functorial. The appearance of these category-theoretic concepts is somewhat auxiliary, however, to the main enterprise, which is to understand what it means to map one grammar into another ‘in a grammatical way’.

We will be guided by four examples of grammatical operations. Keeping in mind Chomsky’s dictum, each of them arises naturally within some body of formalized mathematics — algebra or logic. Going through the motivating examples, the reader is invited to play with the following questions: Which levels of the Chomsky hierarchy do the source and target languages belong to? Which family of transformations (translations? transductions?) does the operation belong to? Each of the motivating examples is given by an explicit formal recipe. Isn’t that recipe an outright ‘morphism’?

Notation. We will consider alphabets Σ and context-free grammars G with productions written x → s, where x ∈ Σ and s is a string in Σ^∗. Neither Σ nor G is assumed finite. An element x of Σ is non-terminal if it occurs on the left-hand side of some production, and is terminal otherwise. N and T will denote the set of non-terminal and terminal symbols, respectively; so Σ = N ⊔ T. For u, v ∈ Σ^∗, write u ⇒ v if v is immediately derivable from u; let ⇒⁺ denote the transitive and ⇒^∗ the reflexive-transitive closure of the relation ⇒.

We will find it convenient to consider each non-terminal as a possible start symbol, and to consider strings both in the full alphabet Σ and in the set of terminals T. For x ∈ N, define

L̂_G(x) = {u ∈ Σ^∗ | x ⇒^∗ u} and L_G(x) = {u ∈ T^∗ | x ⇒^∗ u}.

Thus, for non-terminal x, L̂_G(x) is the set of sentential forms that can be generated from x (considered as a start symbol), and L_G(x) is the usual language generated from x.

Let us recall the notion of unambiguous grammar in the form that will be most useful to us:

Definition 1.1. The context-free grammar G is unambiguous if for every non-terminal x and u ∈ Σ^∗ with x ⇒⁺ u there exists exactly one pair of k-tuples

s₁, s₂, s₃, …, s_k; u₁, u₂, u₃, …, u_k

where s_i ∈ Σ and u_i ∈ Σ^∗, such that

x → s₁s₂s₃…s_k is a production,
u = u₁u₂…u_k, and
s_i ⇒^∗ u_i for each 1 ≤ i ≤ k.

This is equivalent to the requirement that the parse tree of every sentential form u ∈ L̂_G(x) be unique; or, equivalently, that there exist a unique leftmost derivation, starting from x, for each u ∈ L̂_G(x). If every non-terminal is productive, that is, L_G(x) is non-empty for all non-terminals x, then Def. 1.1 is equivalent to the unambiguity of the L_G(x) in the classical sense. However, Def. 1.1 makes sense even if some or all of the L_G(x) are empty.

Definition 1.2. For x ∈ N, let tree_G(x) denote the set of parse trees of sentential forms from L̂_G(x), with root x. (One could just as well consider the set of leftmost or rightmost derivations, or other representatives of equivalence classes of derivations, but the formalism of trees is the handiest.) The depth of a tree is the number of nodes on the longest path from root to any leaf, minus 1. Thus, for T ∈ tree_G(x), depth(T) = 0 if and only if T consists solely of the root (which is also a leaf) x. Note that depth(T) = 1 if and only if T equals some production x → s ∈ G. Let nt(T) denote the set of leaves of T labeled by non-terminal symbols; for a node t of T, let label(t) denote the label (i.e. element of the alphabet Σ) at t.

Let T₁ ∈ tree_G(x) and let T₂ be a tree with a leaf t such that label(t) = x. We will skip the definition of the horticultural maneuver of grafting T₁ onto T₂ at the location t. It is the same as the composition of (chains of) productions, as the illustrations below will make clear.
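The operations of Definition 1.2 and the grafting maneuver can be sketched in a few lines of Python. The encoding is an assumption made for illustration only: a tree is a pair (label, children), a leaf has children None, a location is the path of child indices from the root, and the helper names leaf, node, depth, nt, graft and frontier are hypothetical.

```python
def leaf(label):
    """A leaf: a symbol to which no production has been applied."""
    return (label, None)

def node(label, *children):
    """An internal node: `label` rewritten to the labels of `children`."""
    return (label, tuple(children))

def depth(t):
    """Nodes on the longest root-to-leaf path, minus 1 (0 for a bare root)."""
    _, children = t
    if children is None:
        return 0
    # default=0 lets a production with empty right-hand side count as depth 1
    return 1 + max((depth(c) for c in children), default=0)

def nt(t, nonterminals, path=()):
    """Locations (paths of child indices) of leaves labeled by non-terminals."""
    label, children = t
    if children is None:
        return [path] if label in nonterminals else []
    return [loc for i, c in enumerate(children)
            for loc in nt(c, nonterminals, path + (i,))]

def graft(t2, path, t1):
    """Graft t1 onto t2 at the leaf located by `path`; labels must agree."""
    label, children = t2
    if not path:
        if children is not None or label != t1[0]:
            raise ValueError("graft needs a leaf whose label is the root of t1")
        return t1
    i = path[0]
    grafted = graft(children[i], path[1:], t1)
    return (label, children[:i] + (grafted,) + children[i + 1:])

def frontier(t):
    """The string generated by the tree, read off the leaves left to right."""
    label, children = t
    if children is None:
        return [label]
    return [s for c in children for s in frontier(c)]

# The production expr -> [var, var] as a depth-1 tree, then one graft.
N = {"var", "expr"}
p = node("expr", leaf("["), leaf("var"), leaf(","), leaf("var"), leaf("]"))
q = graft(p, (1,), node("var", leaf("x1")))
print(frontier(q), depth(q), nt(q, N))
```

Grafting the production var → x1 at the first var leaf is the same as composing the two productions, which is the sense in which grafting is "composition of chains of productions".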

2. Morphisms of grammars

Let G₀ and G₁ be context-free grammars in the alphabets Σ₀ and Σ₁, with terminals T₀, T₁ and non-terminals N₀, N₁ respectively.

Definition 2.1. A morphism from G₀ to G₁ consists of the following data:

a mapping α : N₀ → N₁;
a mapping β that assigns to each production x → s ∈ G₀ an element of tree_G₁(α(x));
for each production p ∈ G₀, a function γ(p,−) from nt(β(p)) to nt(p), with the property that for all t ∈ nt(β(p)),

α(label(γ(p,t))) = label(t).

More plainly, α gives the translation of lexical categories. β specifies, for each production p : x → s in the source grammar, a parse tree in the target grammar, with root α(x). Productions of the form x → s will be translated to trees of the form β(x → s). The re-indexing map γ(p,−) associates to the location of each non-terminal symbol r occurring as a leaf in β(x → s) the location of a non-terminal symbol r′ in s such that α translates r′ to r. This permits translation of the input parse tree by either top-down or bottom-up recursion.
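The compatibility condition of Def. 2.1 is easy to test mechanically for a single production. The following Python sketch assumes an encoding that is not part of the text: trees are (label, children) pairs, locations are 1-based positions in the string read off the leaves, and γ(p,−) is a dictionary. The function name check_production is hypothetical, ASCII * and - stand in for the two operation symbols of the commutator example formalized in Example 2.2, and the check does not verify that β(p) is built from productions of G₁.

```python
def leaf(label):
    return (label, None)

def node(label, *children):
    return (label, tuple(children))

def frontier(t):
    label, children = t
    if children is None:
        return [label]
    return [s for c in children for s in frontier(c)]

def check_production(p, beta_p, gamma_p, alpha, N1):
    """Check the requirements of Def. 2.1 for one production p = (x, s)."""
    x, s = p
    leaves = frontier(beta_p)
    if beta_p[0] != alpha[x]:
        return False                      # the root of beta(p) must be alpha(x)
    nt_locations = {t for t, sym in enumerate(leaves, 1) if sym in N1}
    if set(gamma_p) != nt_locations:
        return False                      # gamma(p, -) is defined exactly on nt(beta(p))
    # alpha(label(gamma(p, t))) = label(t) for every location t
    return all(s[src - 1] in alpha and alpha[s[src - 1]] == leaves[t - 1]
               for t, src in gamma_p.items())

alpha = {"var": "var", "expr": "expr"}
N1 = {"var", "expr"}
p = ("expr", ("[", "var", ",", "var", "]"))
vv = node("expr", leaf("var"), leaf("*"), leaf("var"))
beta_p = node("expr", leaf("("), vv, leaf(")"), leaf("-"), leaf("("), vv, leaf(")"))

good = {2: 2, 4: 4, 8: 4, 10: 2}
missing = {2: 2, 4: 4, 8: 4}              # location 10 has no image
wrong_label = {2: 2, 4: 4, 8: 4, 10: 3}   # location 3 of s is the comma
print(check_production(p, beta_p, good, alpha, N1))
```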

Let us make this more concrete by a formalization of our motivating example (a). For the sake of readability, we will depart from the BNF convention of enclosing names of non-terminals in angle brackets; strings typeset in sans serif font, such as var and expr, should be considered as stand-alone symbols. Also, we will drop commas separating elements of a set being listed. Dots ‘…’ indicate a (potentially infinite) set indexed by the natural numbers.

Example 2.2. Consider the source alphabet

N₀	= {var expr}
T₀	= {[ , ] x₁ x₂ … x_i …}

Let the grammar G₀ consist of the productions

var	→ x₁ | x₂ | … | x_i | …
expr	→ [var, var]
expr	→ [var, expr]
expr	→ [expr, var]
expr	→ [expr, expr]

Now consider the target alphabet

N₁	= {var expr}
T₁	= {( ) − ⋅ x₁ x₂ … x_i …}

Let the grammar G₁ consist of the productions

var	→ x₁ | x₂ | … | x_i | …
expr	→ var − var | var ⋅ var
expr	→ var − (expr) | var ⋅ (expr)
expr	→ (expr) − var | (expr) ⋅ var
expr	→ (expr) − (expr) | (expr) ⋅ (expr)

There is a morphism from G₀ to G₁ with components α,β,γ defined by

α(expr) = expr and α(var) = var
β(var → x) = x for any variable x; note that γ(var → x,−) has empty domain

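β(expr → [var, var]) is the tree

           expr
            |
  ( expr ) − ( expr )
     |          |
 var ⋅ var  var ⋅ var

whose top production is expr → (expr) − (expr), with each of the two expr leaves expanded by expr → var ⋅ var,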

generating the string (var ⋅ var) − (var ⋅ var). Let us refer to the leaves of the above tree via their location in ‘(var ⋅ var) − (var ⋅ var)’; so the leaves labeled with non-terminals occur at {2, 4, 8, 10}. Similarly, let us refer to the leaves in nt(expr → [var, var]) through their location in the string ‘[var, var]’, i.e. {2, 4}. Then define

γ(expr → [var, var], 2)	= 2
γ(expr → [var, var], 4)	= 4
γ(expr → [var, var], 8)	= 4
γ(expr → [var, var], 10)	= 2

Visually, the re-indexing map γ(expr → [var, var],−) is indicated by the dotted and broken arrows

[Figure: the production expr → [var, var] beside the tree β(expr → [var, var]); arrows lead from each var leaf of the tree to the var in the production that γ assigns to it.]

Continuing with the next production, define

β(expr → [var,expr]) = (var ⋅ (expr)) − ((expr) ⋅ var)

(Since G₁ is unambiguous, we will identify sentential forms with their parse trees.) Using the same coding of locations as above, define

γ(expr → [var, expr], 2)	= 2
γ(expr → [var, expr], 5)	= 4
γ(expr → [var, expr], 11)	= 4
γ(expr → [var, expr], 14)	= 2

The treatment of the other two productions, and re-indexing of non-terminals therein, is analogous.

How does translation from L̂_G₀(expr) to L̂_G₁(expr) actually work? Consider a sentential form generated by G₀ from expr, say,

Since G₁ is unambiguous, the process is easiest to describe by bottom-up induction. Starting from the leaves, associate to each non-terminal symbol t in the input tree a string τ(t) from L̂_G₁(α(x)):

To see what is going on, let us affix subscripts to the non-terminals of the above parse tree:

Above, G₀ was an unambiguous grammar, hence one could talk of the translation of a string or of a parse tree interchangeably. The next proposition defines the effect of a morphism of grammars in general. We retain the notation of Def. 2.1.

Proposition 2.3. A morphism of grammars from G₀ to G₁ induces, for each x ∈ N₀, a mapping

τ : tree_G₀(x) → tree_G₁(α(x)).

Indeed, for T ∈ tree _G₀(x), define τ(T) ∈ tree _G₁(α(x)) by induction on the depth of T :

∙ If depth(T) = 0, then T must be x itself, and τ(T) is defined to be α(x).
∙ If depth(T) > 0, let x → s ∈ G₀ be the top production in T . Write p for x → s for brevity. Note that nt(p) can be identified with a subset of s, namely, the locations of the non-terminal symbols in s. Since G₀ is context-free, each s ∈ nt(p) induces a subtree T_s of T with s as root. For each t ∈ nt(β(p)), graft the tree τ(T_γ(p,t)) on β(p) with t as root. τ(T) is defined to be the resulting tree.

The definition makes sense: since depth(T_s) < depth(T) for any s ∈ nt(p), τ(T_s) is defined by the induction hypothesis. Note that τ(T_s) belongs to tree_G₁(α(label(s))) by the induction assumption, and α(label(γ(p,t))) = label(t) by Def. 2.1. That is, the non-terminal symbol at the root of τ(T_γ(p,t)) coincides with the non-terminal symbol at the location t. Since G₁ is a context-free grammar, the graft is well-defined, and τ(T) will belong to tree_G₁(α(x)) as desired. □

Obviously, one can rewrite the above recursive definition into an algorithm to compute τ(T) by bottom-up induction on T , from leaves toward the root. Note that if depth(T) = 1, that is, T is a production x → s in G₀, then τ(T) ends up being the same as β(T).
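The recursion defining τ can be sketched in Python and run on the commutator grammar of Example 2.2. Everything about the encoding is an assumption for illustration: trees are (label, children) pairs, a production is keyed by its left-hand side and the tuple of symbols on its right, locations are 1-based positions in the string read off the leaves (parentheses and operators included), only the variables x1, x2, x3 are listed, and ASCII * and - replace the two operation symbols. Under this counting the non-terminal leaves of (var ⋅ (expr)) − ((expr) ⋅ var) sit at locations 2, 5, 11 and 14.

```python
def leaf(label):
    return (label, None)

def node(label, *children):
    return (label, tuple(children))

def frontier(t):
    label, children = t
    if children is None:
        return [label]
    return [s for c in children for s in frontier(c)]

def tau(T, alpha, beta, gamma):
    label, children = T
    if children is None:                        # depth 0: a bare non-terminal
        return (alpha[label], None)
    p = (label, tuple(c[0] for c in children))  # top production x -> s
    count = [0]                                 # running leaf location in beta(p)

    def fill(b):
        b_label, b_children = b
        if b_children is None:
            count[0] += 1
            src = gamma[p].get(count[0])        # location in s, if t is in nt(beta(p))
            if src is None:
                return b                        # terminal leaf: keep as is
            return tau(children[src - 1], alpha, beta, gamma)  # graft tau(T_s)
        return (b_label, tuple(fill(c) for c in b_children))

    return fill(beta[p])

ALPHA = {"var": "var", "expr": "expr"}
BETA, GAMMA = {}, {}
for v in ("x1", "x2", "x3"):
    BETA[("var", (v,))] = node("var", leaf(v))
    GAMMA[("var", (v,))] = {}

P_VV = ("expr", ("[", "var", ",", "var", "]"))
VV = node("expr", leaf("var"), leaf("*"), leaf("var"))
BETA[P_VV] = node("expr", leaf("("), VV, leaf(")"), leaf("-"), leaf("("), VV, leaf(")"))
GAMMA[P_VV] = {2: 2, 4: 4, 8: 4, 10: 2}

P_VE = ("expr", ("[", "var", ",", "expr", "]"))
A = node("expr", leaf("var"), leaf("*"), leaf("("), leaf("expr"), leaf(")"))
B = node("expr", leaf("("), leaf("expr"), leaf(")"), leaf("*"), leaf("var"))
BETA[P_VE] = node("expr", leaf("("), A, leaf(")"), leaf("-"), leaf("("), B, leaf(")"))
GAMMA[P_VE] = {2: 2, 5: 4, 11: 4, 14: 2}

def var(v):
    return node("var", leaf(v))

def bracket(a, b):
    return node("expr", leaf("["), a, leaf(","), b, leaf("]"))

def translate(T):
    return " ".join(frontier(tau(T, ALPHA, BETA, GAMMA)))

print(translate(bracket(var("x1"), var("x2"))))
print(translate(bracket(var("x1"), bracket(var("x2"), var("x3")))))
```

The sketch recurses top-down and relies on Python evaluating the children left to right; a bottom-up version would compute the same trees. Because the subtree at a source location may be grafted at several target locations, the output can grow much faster than the input, as the nested commutator shows.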

Indeed, for each u ∈ L̂_G₀(x), there is a value for each parse tree T of u, namely, the string in L̂_G₁(α(x)) generated by τ(T).

Proposition 2.4. For any x ∈ N₀ and u ∈ L_G₀(x) with parse tree T, τ(T) generates a string in L_G₁(α(x)).

Proof. By induction on the depth of T. The case depth(T) = 0 is impossible, since x is assumed non-terminal and u is a string of terminals. If depth(T) = 1, then T consists of the single production x → u ∈ G₀. The leaves of τ(x → u) = β(x → u) must consist of terminals. Indeed, if there were a leaf labeled with a non-terminal, then γ(x → u,−) would need to map its location to the location of some non-terminal in u, but u does not contain any non-terminals. So τ(T) = β(x → u) generates a string in L_G₁(α(x)).

If depth(T) > 1, then τ(T) is, by the definition, the result of grafting trees of the form τ(T_s), for subtrees T_s of T, onto those leaves of β(x → s) that contain non-terminals. Using the induction hypothesis, all leaves of τ(T_s) are labeled with terminal symbols; hence τ(T) generates an element of L_G₁(α(x)) as well. □

Corollary 2.5. A morphism (α,β,γ) of context-free grammars from G₀ to G₁ induces a function, for each x ∈ N₀, from tree_G₀(x) to tree_G₁(α(x)). This induces, in turn, a multi-valued function from L̂_G₀(x) to L̂_G₁(α(x)), which restricts to a multi-valued function from L_G₀(x) to L_G₁(α(x)). If G₀ is an unambiguous grammar, then the latter two maps are single-valued.

Example 2.6. Returning to our motivating example (b), consider the source alphabet

N₀	= {expr}
T₀	= {⊛ ⊠ x₁ x₂ … x_i …}

Let the grammar G₀ consist of the productions

expr	→ x₁ | x₂ | … | x_i | …
expr	→ expr ⊛ expr
expr	→ expr ⊠ expr

Now consider the target grammar G₁ with identical alphabet N₁ = N₀, T₁ = T₀ but productions

expr	→ x₁ | x₂ | … | x_i | …
expr	→ ⊛ expr expr
expr	→ ⊠ expr expr

There is a morphism from G₀ to G₁ with components α,β,γ defined by

α(expr) = expr
β(expr → x) = x for any variable x; note that γ(expr → x,−) has empty domain
β(expr → expr ⊛ expr) = ⊛ expr expr with γ(2) = 1 and γ(3) = 3
β(expr → expr ⊠ expr) = ⊠ expr expr with γ(2) = 1 and γ(3) = 3

(Since G₁ is unambiguous, there is no loss in writing the values of β as strings, as opposed to parse trees. The first argument of γ is suppressed for the sake of readability; numbers refer to locations of non-terminal symbols, as before.) For any u ∈ L_G₀(expr), the values of τ(u) will be the prefix forms of the parses of u.
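Since the infix grammar G₀ of Example 2.6 is ambiguous, the induced map on strings is genuinely multi-valued, as Cor. 2.5 allows. The Python sketch below makes this visible; it assumes a (label, children) encoding of trees, 1-based leaf locations, a finite sample of variables, and ASCII + and * as stand-ins for ⊛ and ⊠. The helper names are hypothetical.

```python
def leaf(label):
    return (label, None)

def node(label, *children):
    return (label, tuple(children))

def frontier(t):
    label, children = t
    if children is None:
        return [label]
    return [s for c in children for s in frontier(c)]

def tau(T, alpha, beta, gamma):
    """Translation of a parse tree, by recursion on its top production."""
    label, children = T
    if children is None:
        return (alpha[label], None)
    p = (label, tuple(c[0] for c in children))
    count = [0]

    def fill(b):
        if b[1] is None:
            count[0] += 1
            src = gamma[p].get(count[0])
            if src is None:
                return b
            return tau(children[src - 1], alpha, beta, gamma)
        return (b[0], tuple(fill(c) for c in b[1]))

    return fill(beta[p])

ALPHA = {"expr": "expr"}
BETA, GAMMA = {}, {}
for v in ("x1", "x2", "x3"):
    BETA[("expr", (v,))] = node("expr", leaf(v))
    GAMMA[("expr", (v,))] = {}
for op in ("+", "*"):
    p = ("expr", ("expr", op, "expr"))
    BETA[p] = node("expr", leaf(op), leaf("expr"), leaf("expr"))
    GAMMA[p] = {2: 1, 3: 3}     # the re-indexing stated in Example 2.6

def var(v):
    return node("expr", leaf(v))

def infix(a, op, b):
    return node("expr", a, leaf(op), b)

def prefix(T):
    return " ".join(frontier(tau(T, ALPHA, BETA, GAMMA)))

# Two parse trees of the same string x1 + x2 * x3
left = infix(infix(var("x1"), "+", var("x2")), "*", var("x3"))
right = infix(var("x1"), "+", infix(var("x2"), "*", var("x3")))
print(prefix(left))
print(prefix(right))
```

Both input trees generate the same string, yet their translations differ: each prefix form records one of the two parses.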

Before moving on to compositions of morphisms and the rest of our motivating examples, let us make a series of remarks.

∙ The definition of morphism of grammars, as given above, appears out of the blue, and in somewhat austere generality. Admittedly, the definition, like most in the realm of algebra, is ‘experimental’, and driven by several, not easily formalizable criteria. It should cover enough cases of interest, seemingly not otherwise connected; it should possess good structural properties; and it should bear a family, or conceptual, resemblance to other notions that have proved useful. As for the instances of morphisms of grammars in mathematical syntax, I am hopeful this article provides quite a few. The desired structure theory is phrased in the language of categories; see below. As for family resemblances, there exist significant overlaps between the formalisms of tree transducers, term rewrite systems and context-free language transformations, discussion of which would take us far afield. Suffice it to say that the notion of morphism of grammars is most similar to (and in fact, properly contains) synchronous context-free grammars (SCFG); see e.g. Chapter 23 of [AS10]. SCFG are themselves notational variants of the syntax-directed translation schemata of Aho and Ullman [AU72]. The differences are quite significant:

Thus, because of the presence of repeated variables, our motivating example (a) could not be handled by a SCFG. Nonetheless, it is fair to think of morphisms of grammars as syntax-directed translation schemes, boosted to their ‘natural level of generality’.

∙ Recall that our grammars do not contain preferred start symbols; a morphism of grammars induces a multi-valued map

L̂_G₀(x) → L̂_G₁(α(x))

for each non-terminal x in the alphabet Σ₀ of G₀. It may well happen that for some u ∈ Σ₀^∗, there exist distinct x₀, x₁ ∈ N₀ such that u ∈ L̂_G₀(x₀) and u ∈ L̂_G₀(x₁), and the translation(s) into L̂_G₁ differ when u is considered as a descendant of x₀ from when it is considered a descendant of x₁.

∙ The language of iterated commutators, cf. Example 2.2, could be more succinctly defined with the help of a single non-terminal symbol expr and productions

∙ What seems to be conspicuously missing from the definition of morphism is how the terminal symbols get translated. Indeed, the function α that is part of the morphism data goes from non-terminal symbols to non-terminal symbols. Of course, the function β is responsible for the translation of terminals, since terminals occurring in the language can be reached from the source non-terminal via productions. In fact, the reader may enjoy working the following out. Let T₀, T₁ be alphabets. Recall that any map h : T₀ → T₁^∗ induces a semigroup homomorphism h : T₀^∗ → T₁^∗. (The reuse of the letter ‘h’ should cause no confusion.) For a language L ⊆ T₀^∗, h restricts to a map h : L → T₁^∗. Maps of this type are called literal homomorphisms.

Exercise. Let G₀ be a context-free grammar in the alphabet N₀ ⊔ T₀ and T₁ another set of terminals. Let h : T₀ → T₁^∗ be a map, inducing a literal homomorphism h : L_G₀(x) → T₁^∗ for each x ∈ N₀. Show that there exists a context-free grammar G₁ in the alphabet N₀ ⊔ T₁ and a morphism of grammars G₀ → G₁ whose associated translation τ : L_G₀(x) → L_G₁(x) is single-valued and satisfies τ(u) = h(u) for all u ∈ L_G₀(x), any x ∈ N₀. (Hint: extend h to a semigroup homomorphism (N₀ ⊔ T₀)^∗ → (N₀ ⊔ T₁)^∗ by setting h(x) = x for x ∈ N₀. α is the identity. Now let β(x → s) = h(s).)

That is, any literal homomorphism can be induced by a morphism of grammars. Similarly, any rational transducer (thought of as a multi-valued mapping from its domain to its range, both being rational languages) can be encoded via a morphism of grammars. The details of this encoding are straightforward, but will be skipped here. It is unlikely that the notion of morphism of grammars will have anything to add to the very fine-tuned theory of rational transducers.

The next proposition is a simultaneous extension of Prop. 2.4 and of the defining property of the re-indexing map γ from the definition of morphism.

Proposition 2.7. Let (α,β,γ) be a morphism of context-free grammars from G₀ to G₁, x ∈ N₀ and T ∈ tree _G₀(x). There is a natural map γ(T,−) from nt(τ(T)) to nt(T) such that for any t ∈ nt(τ(T)),

α( label(γ(T, t))) = label(t).

Proof. By induction on the depth of T . If depth(T) = 0 then T consists of just the root x ∈ N₀, and τ(T) is the tree containing only the root α(x) ∈ N₁. So nt(T) = {x} and nt(τ(T)) = {α(x)}; γ(T,−) is uniquely determined.

If depth(T) > 0, recall how τ(T) is defined. Let p ∈ G₀ be the top production in T . As before, this induces subtrees T_s of T with roots s ∈ nt(p). For each t ∈ nt(β(p)), graft the tree τ(T_γ(p,t)) on β(p) with t as root. τ(T) is defined to be the resulting tree.

Consider any t ∈ nt(β(p)) and let s = γ(p,t). Since depth(T_s) < depth(T), by the induction hypothesis there is a map γ(T_s,−) from nt(τ(T_s)) to nt(T_s), with α as left inverse to the action of γ(T_s,−) on labels. When grafting τ(T_s) to β(p), the domain of γ(T_s,−) can be shifted with it, to become a subset of nt(τ(T)).

However, nt(τ(T)) is the disjoint union of the various nt(τ(T_s)) grafted to β(p), with s = γ(p,t), as t ranges over nt(β(p)). γ(T,−) can thus be defined as the disjoint union of the (appropriately shifted) maps γ(T_s,−). □

Note that if T is a production x → s ∈ G₀ then γ(T,−), as constructed above, coincides with γ(x → s,−) that is part of the morphism data; there is thus no conflict of notation.

Observe also that when T is a parse tree of some string u containing only terminal symbols then the leaves of τ(T) cannot contain non-terminals either (since no map γ(T,−) with the properties above could exist); so we indeed have an extension of Prop. 2.4.

Our choice of terminology insinuates that morphisms can be composed, and, with context-free grammars as objects, form a category. We will treat this next.

Definition 2.8. Let G₀,G₁,G₂ be context-free grammars, and let (α₀₁,β₀₁,γ₀₁) be a morphism from G₀ to G₁, and (α₁₂,β₁₂,γ₁₂) a morphism from G₁ to G₂. Define their composite

(α₀₂,β₀₂,γ₀₂) = (α₀₁,β₀₁,γ₀₁) ⋆ (α₁₂,β₁₂,γ₁₂)

a morphism from G₀ to G₂, as follows:

α₀₂ is the composite α₁₂ ∘ α₀₁ : N₀ → N₁ → N₂.

Let x → s (abbreviated as p) be a production in G₀. Set β₀₂(p) = τ₁₂(β₀₁(p)), where τ₁₂ is the induced translation from tree_G₁(y) to tree_G₂(α₁₂(y)), for y ∈ N₁. Note that β₀₂(x → s) is an element of tree_G₂(α₁₂(α₀₁(x))), i.e. of tree_G₂(α₀₂(x)), as required.

γ₀₂(p,−) is to be a map from nt(β₀₂(p)) to nt(p). It is defined as the composite

nt(τ₁₂(β₀₁(p))) → nt(β₀₁(p)) → nt(p),

where the first map is γ₁₂(β₀₁(p),−) and the second is γ₀₁(p,−).

More plainly, a production p ∈ G₀ is translated by β₀₁ into a parse tree T₁ formed with G₁, which τ₁₂ translates into a parse tree T₂ formed with G₂. The re-indexing map γ₁₂(β₀₁(p),−) goes from leaves of T₂ labeled with non-terminal symbols to leaves of T₁ labeled with non-terminal symbols, followed by the re-indexing map γ₀₁(p,−) from leaves of T₁ labeled with non-terminal symbols, to leaves (i.e. letters on the right-hand side) of the production p that are non-terminal symbols.
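Composition can be sketched in Python as follows. The encoding is an assumption: trees are (label, children) pairs, locations are 1-based leaf positions, and a morphism is a triple of dictionaries. The function tau returns, next to the translated tree, a list of origins that plays the role of the extended re-indexing map γ(T,−); compose then follows the recipe of Def. 2.8. The example morphisms (infix to prefix, prefix to postfix) and the ASCII operators are illustrative choices.

```python
def leaf(a):
    return (a, None)

def node(a, *cs):
    return (a, tuple(cs))

def frontier(t):
    return [t[0]] if t[1] is None else [s for c in t[1] for s in frontier(c)]

def tau(T, m):
    """Return (tau(T), origins). origins[i] is the 1-based leaf location in T
    that leaf i+1 of tau(T) is re-indexed to, or None for a terminal leaf."""
    alpha, beta, gamma = m
    label, children = T
    if children is None:
        return (alpha[label], None), [1]
    p = (label, tuple(c[0] for c in children))
    offsets, total = [], 0
    for c in children:                  # where each subtree starts in frontier(T)
        offsets.append(total)
        total += len(frontier(c))
    origins, count = [], [0]

    def fill(b):
        if b[1] is None:
            count[0] += 1
            src = gamma[p].get(count[0])
            if src is None:
                origins.append(None)
                return b
            sub, sub_origins = tau(children[src - 1], m)
            origins.extend(None if o is None else o + offsets[src - 1]
                           for o in sub_origins)
            return sub
        return (b[0], tuple(fill(c) for c in b[1]))

    return fill(beta[p]), origins

def compose(m01, m12):
    a01, b01, g01 = m01
    a12, _, _ = m12
    alpha = {x: a12[a01[x]] for x in a01}
    beta, gamma = {}, {}
    for p, tree in b01.items():
        beta[p], origins = tau(tree, m12)          # beta02(p) = tau12(beta01(p))
        gamma[p] = {t: g01[p][o]                   # gamma12(beta01(p), -), then gamma01(p, -)
                    for t, o in enumerate(origins, 1) if o is not None}
    return alpha, beta, gamma

def morphism(source_shape, target_shape, gamma_p):
    """Build (alpha, beta, gamma) for one binary-operator pattern per operator."""
    beta, gamma = {}, {}
    for v in ("x1", "x2"):
        beta[("expr", (v,))] = node("expr", leaf(v))
        gamma[("expr", (v,))] = {}
    for op in ("+", "*"):
        p = ("expr", tuple(op if s == "op" else s for s in source_shape))
        beta[p] = node("expr", *[leaf(op if s == "op" else s) for s in target_shape])
        gamma[p] = dict(gamma_p)
    return {"expr": "expr"}, beta, gamma

INFIX, PREFIX, POSTFIX = ("expr", "op", "expr"), ("op", "expr", "expr"), ("expr", "expr", "op")
m01 = morphism(INFIX, PREFIX, {2: 1, 3: 3})
m12 = morphism(PREFIX, POSTFIX, {1: 2, 2: 3})
m02 = compose(m01, m12)

x1, x2 = node("expr", leaf("x1")), node("expr", leaf("x2"))
T = node("expr", node("expr", x1, leaf("+"), x2), leaf("*"), x1)
step = tau(tau(T, m01)[0], m12)[0]
direct = tau(T, m02)[0]
print(" ".join(frontier(direct)))
```

On this sample tree the two routes agree, which is the content of Prop. 2.10 below; the sketch checks one instance and proves nothing.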

As a continuation of Example 2.6, it is instructive at this point to construct grammars G₀, G₁, G₂ for prefix resp. postfix resp. fully parenthesized infix terms of binary function symbols ⊛ and ⊠, and morphisms G_i → G_j (i,j ∈{0, 1, 2}) that form a commutative diagram of isomorphisms. Of course, one expects more: a commutative diagram of (iso)morphisms of grammars should induce a commutative diagram of (bijective) mappings between the associated languages. That is indeed so. To prove it, we need a key structural property of τ. By definition, the translation τ(T) of a parse tree T can be generated by attaching to the translation of the top production in T the translations of the sub-trees of the top production — appropriately re-indexed. The next lemma states that the same recipe applies if one separates any top segment, not necessarily just the top production, of the input tree. The necessary re-indexing is supplied by Prop. 2.7.

Lemma 2.9. Let x ∈ N₀ and T ∈ tree _G₀(x). For each s ∈ nt(T), suppose given U_s ∈ tree _G₀(label(s)). Let T_U ∈ tree _G₀(x) be the result of grafting each U_s to s as root. Now, for each t ∈ nt(τ(T)), graft τ(U_γ(T,t)) to τ(T) with t as root. Let τ(T)_τ(U) ∈ tree _G₁(α(x)) be the resulting tree. Then τ(T_U) = τ(T)_τ(U).

Proof. By induction on depth(T). When depth(T) = 0, the lemma is a tautology. When depth(T) = 1, it is the inductive step in the definition of τ (applied to the tree T_U, whose top production is T ).

If depth(T) > 1, let p ∈ G₀ be the top production in T . As before, this induces subtrees T_r of T with roots r ∈ nt(p). The set of leaves of T with non-terminal labels, nt(T), is the disjoint union of nt(T_r) as r ranges over nt(p). For each r ∈ nt(p), let T_r,U be the tree that results from grafting U_s to s for each s ∈ nt(T_r). T_r,U is thus the same as the subtree of T_U with r as root.

τ(T_U) (by the inductive step in the definition of τ) is the result of grafting τ(T_γ(p,v),U) to v, for v ranging over nt(β(p)). Pick such a v ∈ nt(β(p)) and let r = γ(p,v). Since depth(T_r,U) < depth(T), by the induction hypothesis τ(T_r,U) is the same as the result of grafting, for each t ∈ nt(τ(T_r)), τ(U_{γ(T_r,t)}) to t as root. As v ranges over nt(β(p)), this assembles to the same tree as τ(T) with τ(U_γ(T,t)) grafted to t for each t ∈ nt(τ(T)). But that is the same as τ(T)_τ(U) by definition, completing the induction step. □

Proposition 2.10. Let G₀, G₁, G₂ be context-free grammars and let (α₀₁,β₀₁,γ₀₁) : G₀ → G₁ resp. (α₁₂,β₁₂,γ₁₂) : G₁ → G₂ be morphisms of grammars, with composite (α₀₂,β₀₂,γ₀₂) : G₀ → G₂ and associated translation functions τ₀₁, τ₁₂ and τ₀₂. Then for all x ∈ N₀ and T ∈ tree_G₀(x),

τ₁₂(τ₀₁(T)) = τ₀₂(T) in tree_G₂(α₀₂(x)).

Proof. When depth(T) = 0, this reduces to α₁₂(α₀₁(x)) = α₀₂(x). When depth(T) > 0, let p be the top production in T, inducing subtrees T_s with roots s ∈ nt(p) as before. τ₀₁(T), by definition, is the result of grafting τ₀₁(T_{γ₀₁(p,t)}) to t for each t ∈ nt(β₀₁(p)). τ₁₂ of that composite tree, by Lemma 2.9, is the result of grafting τ₁₂(τ₀₁(T_{γ₀₁(p,t)})), with t = γ₁₂(β₀₁(p),r), to r ∈ nt(τ₁₂(β₀₁(p))). But that is the same as the translation of T under τ₀₂, by definition of the composite of two morphisms. □

Proposition 2.11. The composition of morphisms of context-free grammars is associative. That is, if G_i (i = 0, 1, 2, 3) are context-free grammars, and μ_{i,i+1} = (α_{i,i+1}, β_{i,i+1}, γ_{i,i+1}) morphisms from G_i to G_{i+1} (here i = 0, 1, 2) then

μ₀₁ ⋆ (μ₁₂ ⋆ μ₂₃) = (μ₀₁ ⋆ μ₁₂) ⋆ μ₂₃.

Proof. The component α₀₃ of G₀ → G₃ is the composite

N₀ → N₁ → N₂ → N₃

of α₀₁, α₁₂ and α₂₃, in that order.

As regards β₀₃: given p ∈ G₀, μ₀₁ ⋆ (μ₁₂ ⋆ μ₂₃) associates to it τ₁₃(β₀₁(p)), while (μ₀₁ ⋆ μ₁₂) ⋆ μ₂₃ sends it to τ₂₃(β₀₂(p)). But both of those equal τ₂₃(τ₁₂(β₀₁(p))), by Prop. 2.10.

Finally, γ₀₃(p,−), computed either way, is the composite

nt(τ₂₃(τ₁₂(β₀₁(p)))) → nt(τ₁₂(β₀₁(p))) → nt(β₀₁(p)) → nt(p)

of γ₂₃(τ₁₂(β₀₁(p)),−), γ₁₂(β₀₁(p),−) and γ₀₁(p,−), in that order.

□

Definition 2.12. Let cfg be the category whose objects are context-free grammars, with morphisms defined by Def. 2.1 and composition defined by Def. 2.8. The identity morphism on G is given by (id_N, id_G, id_nt(p)), i.e. identity maps.

We are now ready to assemble Prop. 2.3, Cor. 2.5, Prop. 2.10 and Prop. 2.11 into the main theorem of this paper. Intuitively, it says that tree is a functor from cfg to the category of sets. However, since we did not include a preferred start symbol in the data for context-free grammars (and much less did we assume that any such symbol would be preserved by morphisms), the target category is slightly more complicated. Let Mor(Set) be the category of maps of sets. An object of Mor(Set) is thus a function f : X → Y between arbitrary sets; a morphism from f₁ : X₁ → Y ₁ to f₂ : X₂ → Y ₂ consists of maps u : X₁ → X₂ and v : Y ₁ → Y ₂ such that

v ∘ f₁ = f₂ ∘ u,

that is, the square with sides f₁, f₂, u and v commutes. Morphisms are composed ‘horizontally’. Mor(Set) is an example of a diagram category (see e.g. MacLane [CWM]), but an alternative way to think of it is as the category of sets fibered over a base: f : X → Y can be thought of as the family of sets f⁻¹(y) with y ∈ Y. Morphisms are then fiberwise maps.

Theorem 2.13.

tree is a functor cfg → Mor(Set). It associates to a context-free grammar G the family of sets {tree_G(x) | x ∈ N}. To a morphism of grammars G₀ → G₁ it associates the map of families α : N₀ → N₁ and τ : tree_G₀(x) → tree_G₁(α(x)), where x ∈ N₀.
Let ucfg be the full subcategory of cfg whose objects are the unambiguous context-free grammars. L̂ is a functor ucfg → Mor(Set). It associates to a context-free grammar G the family of sets {L̂_G(x) | x ∈ N}. To a morphism of grammars G₀ → G₁ it associates the map of families α : N₀ → N₁ and f : L̂_G₀(x) → L̂_G₁(α(x)) with x ∈ N₀, that sends u ∈ L̂_G₀(x) to the sentential form generated by τ(T(u)), where T(u) is the (unique) parse of u.
L, sending G to the family {L_G(x) | x ∈ N}, is a subfunctor of L̂.

There exists a well-understood interplay between rational languages, finite state automata, and monoid objects in categories; the canonical reference is Arbib [AA69]. Category-theoretic properties of cfg (for example, the existence of pullbacks, filtered colimits or coproducts) as well as the roles that morphisms, functors, natural transformations etc. may play in formal language theory at higher levels of the Chomsky hierarchy, are much less explored.

3. Looking ahead

We have only dealt with two of the motivating examples. Neither of the other two can be described by a morphism G → G where G is any of the usual unambiguous context-free grammars for first order logic, or, I suspect, any context-free grammar for it. It should come as no surprise that there are limitations to the ‘word processing power’ of morphisms, as defined above. One expects that there exists a hierarchy of mappings between context-free grammars, just as there are hierarchies of languages, complexity classes, and so on. The goal of this final — much more speculative — section is to sketch further levels of this hierarchy. But first, here is one expression of the structural limitations of morphisms.

Proposition 3.1. Suppose (α,β,γ) : G₀ → G₁ is a morphism of grammars with the property that for some constant K,

depth(β(p)) ≤ K

for all p ∈ G₀. Then for all x ∈ N₀ and T ∈ tree_G₀(x),

depth(τ(T)) ≤ K ⋅ depth(T).

The proof is by induction on depth(T). Note that such a bound K always exists if G₀ is finite; however, our grammars (and alphabets) were not assumed to be so by default.

Example 3.2. Let L be the language of function terms for an associative binary operation (denoted by juxtaposition), fully parenthesized, with infinitely many variables available. The alphabet is

N	= {expr}
T	= {( ) x₁ x₂ … x_i …}

with unambiguous grammar

expr	→ x₁ | x₂ | … | x_i | …
expr	→ (expr expr)

Let τ : L → L be the mapping that sends an expression to its leftmost-parenthesized equivalent. For example,

((x₅x₃)((x₁x₃)x₂))

is to be sent to

((((x₅x₃)x₁)x₃)x₂)

If there were a morphism of grammars (α,β,γ) : G → G inducing τ, it would have to satisfy

β(expr → x_i) = x_i

for all i = 1, 2, …. Since there is only one other production in the grammar, namely,

expr → (expr expr)

Prop. 3.1 would apply. However, for any positive integer d, let T be the term in variables x₁, x₂, …, x_{2^d} whose parse tree (ignoring parentheses) is the complete binary tree of depth d; e.g. for d = 3:

(((x₁x₂)(x₃x₄))((x₅x₆)(x₇x₈)))

τ(T) is a left-branching tree, with depth 2^d. The set of ratios

{ depth(τ(T)) / depth(T) | T ∈ tree_G(expr) }

is thus unbounded, and the mapping τ cannot correspond to any morphism of grammars.
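The growth of the depth ratio can be checked numerically. The Python sketch below assumes a simplified encoding that is not in the text: a term is either a variable name or a pair (left, right), and depth counts the production expr → x_i as one level, so that it agrees with the depth of the parse tree. The function names are hypothetical.

```python
def complete(d, first=1):
    """Complete binary term of depth d on variables x_first, x_(first+1), ..."""
    if d == 0:
        return f"x{first}"
    half = 2 ** (d - 1)
    return (complete(d - 1, first), complete(d - 1, first + half))

def leaves(t):
    return [t] if isinstance(t, str) else leaves(t[0]) + leaves(t[1])

def left_normalize(t):
    """The leftmost-parenthesized term with the same sequence of variables."""
    variables = leaves(t)
    out = variables[0]
    for v in variables[1:]:
        out = (out, v)
    return out

def depth(t):
    """Parse-tree depth: a variable costs one production expr -> x_i."""
    return 1 if isinstance(t, str) else 1 + max(depth(t[0]), depth(t[1]))

def show(t):
    return t if isinstance(t, str) else "(" + show(t[0]) + show(t[1]) + ")"

for d in range(1, 6):
    T = complete(d)
    print(d, depth(T), depth(left_normalize(T)))
```

For the complete tree the parse-tree depth is d + 1 while the left-normalized tree has depth 2^d, so no constant K can bound the ratio.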

This argument does not apply to our motivating example (c), replacement of free occurrences of a variable x in the input formula ϕ by some term t, since

always. (We have silently fixed an unambiguous context-free grammar G for first order logic.) However, no morphism G → G induces τ_x→t(ϕ). The recursive rules

showing that replacement descends the parse tree along boolean connectives and quantification with respect to variables other than x, conform perfectly to the combinatorial possibilities of a self-morphism of G. However, one has

τ_x→t(∀xϕ) = ∀xϕ (∗)

since all free occurrences of x in ϕ become bound in ∀xϕ. τ_x→t(∀xϕ) is thus not a function of τ_x→t(ϕ), since ϕ cannot in general be reconstructed from τ_x→t(ϕ). So τ_x→t cannot be computed by bottom-up induction, whereas translations induced by morphisms can always be.

Intuitively, a morphism of grammars applies the same functional transformation (itself!), iteratively, to subtrees of the input tree, whereas (∗) calls on a different transformation (namely, the identity) when the input has the form ∀xϕ. Recall that two functions f,g : ℕ → ℕ are defined by simultaneous recursion if f(0) and g(0) are given, and there exist functions F and G such that for n > 0,

Definition 3.3. Let G₀ and G₁ be context-free grammars in the alphabets N₀,T₀, N₁,T₁ as usual, and k a positive integer. A k-morphism from G₀ to G₁ defined by simultaneous recursion consists of the following data:

mappings α_i : N₀ → N₁ for i = 1, 2,…,k
mappings β_i, for i = 1, 2,…,k, assigning to each production x → s ∈ G₀ a parse tree from tree_G₁(α_i(x))
for each i = 1, 2,…,k and each production p ∈ G₀, a function γ_i(p,−) from nt(β_i(p)) to nt(p) and a function δ_i(p,−) from nt(β_i(p)) to {1, 2,…,k}, with the property that for all i = 1, 2,…,k and all t ∈ nt(β_i(p)), writing j = δ_i(p,t), α_j(label(γ_i(p,t))) = label(t).

A k-morphism is, roughly, a k-tuple of grammatical transformations that are intertwined via the function δ: the i-th transformation can call on the j-th transformation to act on a subtree of the input tree. The maps α_i provide the initial values. There is no circular dependency, since each recursive call applies to a lower-level subtree of the input tree. More precisely,

Proposition 3.4. A k-morphism of grammars from G₀ to G₁ induces, for each i = 1, 2,…,k and x ∈ N₀, a mapping

τ_i : tree_G₀(x) → tree_G₁(α_i(x)).

Proof. For T ∈ tree_G₀(x), define the τ_i(T) ∈ tree_G₁(α_i(x)) simultaneously by induction on the depth of T:

∙ If depth(T) = 0, then T must be x itself, and τ_i(T) is defined to be α_i(x).
∙ If depth(T) > 0, let x → s ∈ G₀ be the top production in T . Write p for x → s for brevity. As usual, nt(p) can be identified with a subset of s, the locations of the non-terminal symbols in s. Since G₀ is context-free, each s ∈ nt(p) induces a subtree T_s of T with s as root. For each i = 1, 2,…,k and t ∈ nt(β_i(p)), writing j = δ_i(p,t), graft the tree τ_j(T_{γ_i(p,t)}) on β_i(p) with t as root. τ_i(T) is defined to be the resulting tree.

Since depth(T_s) < depth(T) for all s ∈ nt(p), τ_j(T_s) is defined by the induction hypothesis. Note that τ_j(T_s) belongs to tree_G₁(α_j(label(s))) by the induction assumption, and α_j(label(γ_i(p,t))) = label(t) by Def. 3.3. That is, the non-terminal symbol at the root of τ_j(T_{γ_i(p,t)}) coincides with the non-terminal symbol at the location t. Since G₁ is a context-free grammar, the graft is well-defined, and τ_i(T) will belong to tree_G₁(α_i(x)) as desired. □
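The grafting construction of the proof can be sketched in a few lines. This is an illustrative encoding under stated assumptions, not a definitive implementation. Parse trees are taken to be complete, so the depth-0 case and the maps α_i are omitted (the toy grammar has a single non-terminal). A tree is a pair (production name, tuple of subtrees), with one subtree per non-terminal position. A hypothetical Hole(pos, j) leaf in the template β_i(p) records γ_i(p,t) = pos and δ_i(p,t) = j. The example is a 2-morphism on a toy fragment of first order logic: index 1 replaces free x by y, index 2 is the identity. Variable capture is ignored.

```python
from typing import NamedTuple

class Hole(NamedTuple):
    pos: int  # gamma_i(p, t): which non-terminal position of p to recurse into
    j: int    # delta_i(p, t): which of the k transformations to call there

def tau(i, tree, beta):
    """Compute tau_i(tree) by recursion from the root, grafting as in the proof."""
    p, subtrees = tree

    def fill(template):
        if isinstance(template, Hole):
            return tau(template.j, subtrees[template.pos], beta)
        label, children = template
        return (label, tuple(fill(c) for c in children))

    return fill(beta[i][p])

def same(p, arity, j):
    """Template that rebuilds production p and calls tau_j on every child."""
    return (p, tuple(Hole(n, j) for n in range(arity)))

# Hypothetical production names for a toy formula grammar, with their arities.
ARITY = {"Px": 0, "Py": 0, "not": 1, "and": 2, "all_x": 1, "all_y": 1}
SUBST, ID = 1, 2

beta = {
    ID: {p: same(p, a, ID) for p, a in ARITY.items()},
    SUBST: {p: same(p, a, SUBST) for p, a in ARITY.items()},
}
beta[SUBST]["Px"] = ("Py", ())               # a free x becomes y
beta[SUBST]["all_x"] = same("all_x", 1, ID)  # below a binder for x, switch to id

# P(x) and (forall x) P(x): only the first occurrence of x is free.
phi = ("and", (("Px", ()), ("all_x", (("Px", ()),))))
print(tau(SUBST, phi, beta))
```

Each recursive call is made on a proper subtree of the input, which is why the recursion terminates regardless of how δ is chosen.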

When finding τ_i(T) by recursion from root to leaves on T , one can restrict to computing τ_j(T_s) only for those subtrees T_s of T and values j ∈{1, 2,…,k} that are called for by the indexing function δ. When using bottom-up induction, the entire k-tuple of values

τ₁(−),τ₂(−),…,τ_k(−)

needs to be computed for all subtrees of T .

Mutatis mutandis, the results of the previous section, from Prop. 2.3 to Prop. 3.1, remain valid for morphisms defined by simultaneous recursion. The composition of a k-morphism from G₀ to G₁ and an n-morphism from G₁ to G₂ will be a (k ⋅ n)-morphism from G₀ to G₂. Composition is associative, and tree_G becomes a functor from cfg to tuples of functions of sets. The details, while not conceptually complicated, are quite tedious (largely for notational reasons) and will not be needed here.

The reader is invited to define the pair of transformations (τ_x→t, id) by simultaneous recursion on the syntax of first order logic. τ_x→t calls itself and id, while the identity transformation calls itself only. The fact that the treatment of descendant nodes is inherited from their parent nodes is reminiscent of attribute grammar.

Note that ϕ_x→t, replacing all free occurrences of the variable x in the formula ϕ by the term t, is the least complicated of the multitude of operations involving variable replacement and binding. If a free variable in t is captured by a quantifier in ϕ, then ϕ will no longer imply its instance ϕ_x→t; to preserve the intended logical meaning, the dummy variable appearing in the capturing quantifier in ϕ should be renamed first, to a variable not occurring in ϕ or t. However, the function that returns a variable not occurring in a given formula does not have a canonical value, and is not easily describable in terms of language operations. A related, and much researched, issue is the formalization of explicit substitution in lambda calculi [ES90]: under explicit substitution, the operation x → t does not belong to the meta-language, but is part of the language itself. On the other hand, there seem to exist few studies, from the viewpoint of mathematical linguistics, of the syntax of substitutions through de Bruijn indices or Bourbaki’s variable-free notation [TL99].

Prop. 3.1 does not apply either to our fourth (and last) motivating example, transforming first order formulas ϕ to negation normal form nnf(ϕ), since depth(nnf(ϕ)) ≤ depth(ϕ) always. But nnf cannot be induced by a morphism, or in fact by a k-morphism. The standard context-free grammars of first order logic contain the production

expr → ⌝expr

But β(expr →⌝expr) cannot contain any terminal symbols; any non-terminal other than ‘expr’; or more than one copy of ‘expr’: each of those possibilities would be inconsistent with the fact that nnf(⌝⌝ϕ) = nnf(ϕ). So β(expr →⌝expr) is forced to be ‘expr’, which of course is incompatible with the negation normal form of ⌝ϕ for atomic ϕ.
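The two facts used in this argument can be checked on a direct implementation. The sketch below uses a hypothetical tuple encoding of formulas and applies the usual rules for pushing negations inward. It verifies nnf(⌝⌝ϕ) = nnf(ϕ), and that nnf(⌝ϕ) differs from nnf(ϕ) when ϕ is atomic.

```python
# Formulas as nested tuples (hypothetical encoding):
#   ("atom", name), ("not", f), ("and", f, g), ("or", f, g),
#   ("forall", v, f), ("exists", v, f)

DUAL = {"and": "or", "or": "and", "forall": "exists", "exists": "forall"}

def nnf(f):
    """Negation normal form: negations are pushed down to atomic formulas."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    if tag in ("forall", "exists"):
        return (tag, f[1], nnf(f[2]))
    # tag == "not": the result depends on the production one level below.
    g = f[1]
    if g[0] == "atom":
        return f
    if g[0] == "not":
        return nnf(g[1])
    if g[0] in ("and", "or"):
        return (DUAL[g[0]], nnf(("not", g[1])), nnf(("not", g[2])))
    return (DUAL[g[0]], g[1], nnf(("not", g[2])))

p, q = ("atom", "p"), ("atom", "q")
print(nnf(("not", ("not", p))) == nnf(p))  # double negation disappears
print(nnf(("not", p)) != nnf(p))           # but a single negation of an atom stays
print(nnf(("not", ("forall", "x", ("and", p, ("not", q))))))
```

Note that the branch for negation inspects the connective beneath the top production, whereas the component β of a morphism depends on the top production alone.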

Recall that a term rewrite system (TRS) is an unordered set of rewrite rules acting on function terms in some fixed signature. A TRS is called convergent if it is both noetherian and confluent [TR98]. All four motivating examples, and Example 3.2 as well, belong to the family of convergent TRS, adapted from the unambiguous grammar of function terms to the general setting of context-free grammars. The fact that the effect of morphisms on parse trees can be computed by both bottom-up and top-down recursion, as well as Lemma 2.9, can be seen as corollaries of confluence.

It is quite challenging, however, to fashion a category out of convergent TRS. To begin with, neither the confluence nor the noetherianness of TRS is, in general, decidable (though, curiously, the confluence of noetherian TRS is decidable). Secondly, a famous example due to Toyama shows that the disjoint union of two convergent TRS need not be convergent. Thus the composite of two TRS cannot, in general, be defined as the disjoint union of their underlying rules. There exist, however, sufficient conditions for the modularity of convergence for TRS. Alternatively, one can experiment with ordered (prioritized) rewriting rules.

In a different direction, the notion of morphism of context-free grammars could be broadened to allow for non-determinism: several right-hand sides of the component β. Finally, the focus on parse trees is, to some extent, restrictive: the domain of these transformations could be any set of node-labeled rooted trees closed under taking subtrees.

I hope to elaborate some of these ideas in later publications. In closing, let me return to the quote from Chomsky that opened this article. Suppose that the only gift linguistics ever gave mathematics was, indeed, the notion of context-free grammar. Let’s play with this present: expand the focus from context-free grammars to maps of context-free grammars (from objects to morphisms) and I think we will agree that linguistics has given mathematics a gift that keeps on giving.

References

[AA69] M. Arbib: Theories of abstract automata. Prentice–Hall, 1969

[AS10] Algorithms and Theory of Computation Handbook. Vol 2: Special Topics and Techniques. Ed. by M. Atallah and M. Blanton. 2nd ed., Chapman & Hall, 2010

[AU72] A. Aho and J. Ullman: The Theory of Parsing, Translation, and Compiling. Vol 1: Parsing. Prentice–Hall, 1972

[CWM] S. MacLane: Categories for the Working Mathematician. 2nd ed., Springer–Verlag, 1998

[ES90] M. Abadi, L. Cardelli, P.-L. Curien, J.-J. Lévy: Explicit substitutions. Digital Systems Research Center Technical Report SRC-RR-54, 1990

[GE82] Noam Chomsky on The Generative Enterprise: A Discussion with Riny Huybregts and Henk van Riemsdijk. Foris Publications, Dordrecht–Holland, 1982

[GE04] The Generative Enterprise Revisited: Discussions with Riny Huybregts, Henk Van Riemsdijk, Naoki Fukui, and Mihoko Zushi. With a New Foreword by Noam Chomsky. de Gruyter, 2004

[M10] E. Mendelson: Introduction to Mathematical Logic. 5th edition, Chapman & Hall, 2010

[TL99] A.R.D. Mathias: A term of length 4,523,659,424,929. Preprint, 1999
Available at www.dpmms.cam.ac.uk/~ardm/inefff.pdf

[TR98] F. Baader and T. Nipkow: Term Rewriting and All That. Cambridge University Press, 1998

f(n) = F(n, f(n − 1), g(n − 1))
g(n) = G(n, f(n − 1), g(n − 1)).

⌝⌝ϕ ⇒ ϕ
⌝(ϕ ∧ ψ) ⇒ ⌝ϕ ∨ ⌝ψ
⌝(ϕ ∨ ψ) ⇒ ⌝ϕ ∧ ⌝ψ
⌝∀xϕ ⇒ ∃x⌝ϕ
⌝∃xϕ ⇒ ∀x⌝ϕ