## 1 Introduction

Though interest in artificial intelligence and machine learning has always been high, the public’s exposure to successful applications has markedly increased in recent years. From consumer-oriented applications like recommendation engines, speech and face recognition, and text prediction to prominent examples of superhuman performance (DeepMind’s AlphaGo, IBM’s Watson), the list of impressive machine learning results continues to grow.

Though the understandable excitement around the expanding catalog of
successful applications lends a kind of mystique, neural networks and
the algorithms which train them are, at their core, a special kind of
computer program. One perspective on programs which is relevant in this
domain is that of so-called *state-and-effect triangles*, which emphasize
the dual nature of programs as both state and predicate transformers.
This framework originated in quantum computing, but has a wide variety
of applications, including deterministic and probabilistic
computations [Jacobs17b].

The common two-pass training scheme in neural networks makes their dual role particularly evident. Operating in the “forward direction”, neural networks are like a function: given an input signal they behave like (a mathematical model of) a brain to produce an output signal. This is a form of state transformation. In the “backwards direction”, however, the derivative of a loss function with respect to the output of the network is *backpropagated* [Rumelhart86] to the derivative of the loss function with respect to the inputs to the network. This is a kind of predicate transformation, taking a real-valued predicate about the loss at the output and producing a real-valued predicate about the source of loss at the input. The main novel perspective offered by this paper uses such state-and-effect ‘triangles’ for neural networks. We expect that such more formal approaches to neural networks can be of use in the trend towards *explainable* AI, where the goal is to extend automated decisions/classifications with human-understandable explanations.
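To make this duality concrete, here is a minimal sketch in our own notation (not a construction from the paper) of a single layer read in both directions: the forward pass transforms an input state into an output state, while the backward pass transforms a loss gradient (a real-valued “predicate”) at the output into one at the input via a vector-Jacobian product.

```python
import numpy as np

# A minimal sketch (our notation, not the paper's): one fully-connected
# layer y = sigma(W x + b), read in both directions.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.5, -1.0],
              [2.0,  0.0]])
b = np.array([0.1, -0.2])

def forward(x):
    """State transformation: push an input signal forward to an output."""
    return sigmoid(W @ x + b)

def backward(x, dL_dy):
    """Predicate transformation: pull the loss gradient at the output back
    to a loss gradient at the input (a vector-Jacobian product)."""
    y = forward(x)
    dL_dz = y * (1.0 - y) * dL_dy   # chain rule through the sigmoid
    return W.T @ dL_dz              # chain rule through the affine map

x = np.array([1.0, 2.0])
print(forward(x))                         # the state at the output
print(backward(x, np.array([1.0, 0.0])))  # the 'predicate' at the input
```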

In recent years, it has become apparent that the architecture of a
neural network is very important for its accuracy and trainability in
particular problem domains [Goodfellow16]. This has resulted in a
proliferation of specialized architectures, each adapted to its
application. Our goal here is not to express the wide variety of
special neural networks in a single framework, but rather to describe
neural networks generally as an instance of this duality between state
and predicate transformers. Therefore, we shall work with a simple,
suitably generic neural network type called the *multilayer
perceptron* (MLP).

We see this paper as one of several recent steps towards the application of modern semantical and logical techniques to neural networks, following for instance [FongST17, GhicaMCDR18].

*Outline.* In this paper, we begin by describing MLPs, the layers
they are composed of, and their forward semantics as a state
transformation (Section 2). In Section 3, we give the corresponding
backwards transformation on loss functions and use that to formulate
backpropagation in Section 4. Finally, in Section 5, we discuss the
compositional nature of backpropagation by casting it as a functor,
and compare our work in particular to [FongST17].

## 2 Forward state transformation

Much like ordinary programs, neural networks are often subdivided into
functional units which can then be composed both in sequence and in
parallel. These subnetworks are usually called *layers*, and the
sequential composition of several layers is by definition a “deep”
network.[^1] There are a number of common layer types, and a
neural network can often be described by naming the layer types and
the way these layers are composed.

[^1]: In contrast, the “width” of a layer typically refers to the number of input and output units, which can be thought of as the repeated parallel composition of yet another architecture.
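As a rough illustration (a sketch in our own terms, not a construction from this paper), one can model layers as functions on vectors: sequential composition adds depth, while a simple form of parallel composition, running layers side by side on the same input and concatenating their outputs, adds width.

```python
import numpy as np

def sequential(*layers):
    """Sequential composition: feed each layer's output to the next ("depth")."""
    def composed(x):
        for layer in layers:
            x = layer(x)
        return x
    return composed

def parallel(*layers):
    """One simple parallel composition: run the layers side by side on the
    same input and concatenate their outputs ("width")."""
    return lambda x: np.concatenate([layer(x) for layer in layers])

double = lambda x: 2 * x
shift = lambda x: x + 1
print(sequential(double, shift)(np.ones(2)))  # [3. 3.]
print(parallel(double, shift)(np.ones(2)))    # [2. 2. 2. 2.]
```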

*Feedforward networks* are an important class of neural networks
where the composition structure of layers forms a directed acyclic
graph—the layers can be put in an order so that no layer is used as
the input to an earlier layer. A *multilayer perceptron* is a
particular kind of feedforward network where all layers have the same
general architecture, called a fully-connected layer, and are composed
strictly in sequence. As mentioned in the introduction, the MLP is
perhaps the prototypical neural network architecture, so we treat this
network type as a representative example. In the sequel, we will use
the phrase “neural network” to denote this particular network
architecture.
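Concretely, a fully-connected layer sends an input $x \in \mathbb{R}^n$ to $\sigma(Wx + b) \in \mathbb{R}^m$, for a weight matrix $W$, a bias vector $b$, and an activation function $\sigma$ applied coordinatewise. The sketch below (our illustration; the dimensions and the sigmoid activation are arbitrary choices) chains two such layers in strict sequence:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def dense(W, b, sigma=sigmoid):
    """A fully-connected layer: x |-> sigma(W x + b)."""
    return lambda x: sigma(W @ x + b)

rng = np.random.default_rng(0)
layer1 = dense(rng.normal(size=(4, 3)), np.zeros(4))  # R^3 -> R^4
layer2 = dense(rng.normal(size=(2, 4)), np.zeros(2))  # R^4 -> R^2

mlp = lambda x: layer2(layer1(x))  # a two-layer perceptron
print(mlp(np.ones(3)))             # forward state transformation
```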

More concretely, a layer consists of two lists of nodes with directed edges between them. For instance, a neural network with two layers may be depicted as follows.
