Features of Connectionist Networks
Exploit parallel processing
Can be used to model the simultaneous satisfaction of multiple soft constraints
Do not feature explicit (content-specific) rules
Exhibit graceful degradation
Intended as models of information-processing at the algorithmic level
Capable of learning
Chapter 8: Section 8.1
Explain the basic structure and functioning of an artificial neuron.
Compare it with the structure and functioning of a real neuron.
Artificial neurons are much simpler
Do they capture the most important features?
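A minimal sketch of an artificial neuron in Python (the function and variable names are illustrative, not from the text): it sums its weighted inputs and passes the total through a binary threshold function.

    # An artificial neuron: weighted sum of inputs passed
    # through a binary threshold activation.
    def artificial_neuron(inputs, weights, threshold):
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total > threshold else 0

    # With equal weights and a threshold between 1 and 2,
    # the unit computes logical AND of two binary inputs.
    print(artificial_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
    print(artificial_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0

A real neuron integrates thousands of synaptic inputs with rich temporal dynamics; the sketch keeps only the weighted-sum-plus-threshold core.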
Are artificial neural networks a good enough approximation of real neural
networks to be useful to cognitive scientists?
In many areas we need models to work with.
Ethical and practical limitations on experimenting with real brains may be too constraining
The critical question: Does the model function in ways that would be misleading?
Learning and Neural Networks
Neural networks are especially important for modeling learning
Physical symbol systems were not much concerned with learning
The first question to answer is one of competence:
Can a network (or any other system) learn what humans are capable of learning?
Networks and Layers
Basic distinction
Single-unit networks [single layer networks]
Multilayer networks [contain hidden layers]
Different learning rules for the two types
Single-unit networks: the key parameters:
Weights assigned to inputs
Threshold function for output
Chapter 8: Section 8.2
Explain what Hebbian learning is.
Basic principle: Neurons that fire together, wire together
Two general types of learning:
Supervised [requires feedback]
Unsupervised [requires no feedback]
Hebbian is unsupervised
This is important because much human learning is unsupervised
Hebbian Learning
Typically used in pattern association networks
Very good at generalizing patterns
Hebbian principles also feature in more complex learning rules (e.g. competitive learning, discussed later)
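A sketch of the basic Hebbian weight update in Python (a standard rate-based formulation; the names are illustrative): each weight grows in proportion to the product of pre- and post-synaptic activity, with no feedback signal.

    # Hebbian rule: delta_w = eta * pre * post.
    # No target output is needed, so the learning is unsupervised.
    def hebbian_update(weights, pre, post, eta=0.1):
        return [w + eta * x * post for w, x in zip(weights, pre)]

    weights = [0.0, 0.0]
    # Repeatedly pairing an active first input with an active output
    # strengthens the first weight only: units that fire together
    # wire together.
    for _ in range(5):
        weights = hebbian_update(weights, pre=[1, 0], post=1)
    print(weights)  # -> approximately [0.5, 0.0]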
What is a major difference between Hebbian learning and learning via the
perceptron convergence rule?
The perceptron convergence rule differs from Hebbian learning in that training depends on the discrepancy (the error δ) between actual output and intended output
The delta rule gives an algorithm for changing the threshold and weights as a function of δ and a learning-rate constant
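A sketch of the delta rule for a single threshold unit in Python (a standard formulation; the names and the AND training set are illustrative): each weight changes by the learning rate times δ times the input.

    def step(total, threshold):
        return 1 if total > threshold else 0

    # One pass through the training set, adjusting weights and
    # threshold by eta * delta, where delta = target - output.
    def delta_rule_epoch(weights, threshold, samples, eta=0.1):
        for inputs, target in samples:
            total = sum(x * w for x, w in zip(inputs, weights))
            delta = target - step(total, threshold)
            weights = [w + eta * delta * x for w, x in zip(weights, inputs)]
            threshold -= eta * delta  # raising output means lowering threshold
        return weights, threshold

    # AND is linearly separable, so training converges.
    samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, t = [0.0, 0.0], 0.0
    for _ in range(20):
        w, t = delta_rule_epoch(w, t, samples)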
An early example: recognition of patterns based on Selfridge's Pandemonium model
Successive levels of processing
Note: Pandemonium is still a physical symbol system, but it suggested
the network idea
Explain the concept of linear separability of functions. How does it pertain
to perceptrons?
The perceptron convergence rule will converge on a solution in every
case where a solution is possible. But which functions are these? Those
functions that are linearly separable.
The classic non-separable function is XOR (exclusive-OR): TRUE if A or B, but not both
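A short worked argument (standard, not from the text) that no assignment of weights and threshold to a single unit computes XOR:

    Suppose a unit with weights $w_1, w_2$ and threshold $\theta$
    outputs 1 exactly when $w_1 a + w_2 b > \theta$. For XOR we need:
    \begin{align*}
      0 &\le \theta          && (a,b)=(0,0)\mapsto 0\\
      w_1 &> \theta          && (a,b)=(1,0)\mapsto 1\\
      w_2 &> \theta          && (a,b)=(0,1)\mapsto 1\\
      w_1 + w_2 &\le \theta  && (a,b)=(1,1)\mapsto 0
    \end{align*}
    The middle two lines give $w_1 + w_2 > 2\theta \ge \theta$
    (using $\theta \ge 0$ from the first line), contradicting the last.

Geometrically: no straight line in the plane separates the points {(0,1), (1,0)} from {(0,0), (1,1)}.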
The difficulty is in learning interactions
An example of an interaction: The concept of appropriate behavior
A person may laugh when she sees something funny, or cry when she sees
something sad.
It is considered inappropriate to laugh when you see something sad, or
cry when you see something funny.
A perceptron cannot learn this concept
Why is training such an integral part of neural network modeling?
It is one of the key features that distinguish connectionist networks
from physical symbol systems
PSS: Knowledge is built in by the modeler
This is a serious limitation
Connectionist networks: Knowledge has to be acquired
Where Are We Now?
Single-unit networks can be trained, but can only compute linearly separable functions
To compute all Turing-computable functions, we need multilayer networks.
Multilayer networks cannot be trained using the perceptron convergence
rule
A new training rule is needed
Describe the basic structure and functioning of a multi-layer network.
Input and output layers
One or more hidden layers
Three steps in the operation of each unit:
Integrate input from previous layer (if any)
Transform input to output activity, using an activation function
Send output on to next layer
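A sketch in Python of the three steps for one layer (the sigmoid activation and the names are assumptions; any smooth activation function would illustrate the same point):

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def layer_forward(inputs, weights, biases):
        # weights[j][i] connects input unit i to this layer's unit j.
        outputs = []
        for w_row, bias in zip(weights, biases):
            net = sum(x * w for x, w in zip(inputs, w_row)) + bias  # 1. integrate
            outputs.append(sigmoid(net))                            # 2. activate
        return outputs                                              # 3. pass on

    # Input -> hidden -> output: each layer feeds the next.
    hidden = layer_forward([0.5, 1.0], [[0.4, -0.2], [0.3, 0.8]], [0.0, 0.1])
    output = layer_forward(hidden, [[1.0, -1.0]], [0.0])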
Explain how the backpropagation algorithm works.
The algorithm needs to find a way of calculating error in hidden units
that do not have target activation levels
It does this by calculating for each hidden unit its degree of responsibility
for error at the output units
This error value is used to adjust the weights of the hidden units
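A minimal numeric sketch of backpropagation for one hidden layer (standard textbook form; the architecture, names, and XOR task are illustrative). The output unit's δ comes from the known target; each hidden unit's δ is its share of responsibility, obtained by sending the output δ back through the connecting weight; all weights then move by gradient descent.

    import math, random

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def train_step(x, target, W1, b1, W2, b2, eta=0.5):
        # Forward pass: hidden layer, then a single output unit.
        h = [sigmoid(sum(xi * w for xi, w in zip(x, row)) + b)
             for row, b in zip(W1, b1)]
        o = sigmoid(sum(hj * w for hj, w in zip(h, W2)) + b2)
        # Output delta: error scaled by the slope of the activation.
        d_o = (target - o) * o * (1 - o)
        # Hidden deltas: responsibility for the output error,
        # propagated back through each outgoing weight.
        d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
        # Gradient-descent updates.
        W2 = [w + eta * d_o * h[j] for j, w in enumerate(W2)]
        b2 += eta * d_o
        W1 = [[w + eta * d_h[j] * xi for xi, w in zip(x, row)]
              for j, row in enumerate(W1)]
        b1 = [b + eta * d_h[j] for j, b in enumerate(b1)]
        return W1, b1, W2, b2

    # XOR is learnable by a multilayer network, though depending on
    # the random start, training can stall in a local minimum (see below).
    random.seed(0)
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b1 = [0.0, 0.0]
    W2 = [random.uniform(-1, 1) for _ in range(2)]
    b2 = 0.0
    for _ in range(5000):
        for x, t in [([0,0],0), ([0,1],1), ([1,0],1), ([1,1],0)]:
            W1, b1, W2, b2 = train_step(x, t, W1, b1, W2, b2)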
The Competence of Backpropagation
Multilayer networks can compute any Turing-computable function
But backpropagation will not always converge on a solution
(Unlike perceptron convergence rule, which is guaranteed to find a solution
where there is one)
Backpropagation searches for an optimal solution by gradient descent: it can get stuck in a local minimum of the error surface
Chapter 8: Section 8.3
Describe the worries about the biological plausibility of artificial neural
networks.
No evidence that backpropagation takes place in the brain
Setting the number of hidden units is crucial - how would the brain determine this?
But the real issue may be whether the brain performs gradient descent learning at all - not the specific learning algorithm achieving it
What is a local learning algorithm? Does it represent supervised or unsupervised
learning?
An alternative to backpropagation
Backpropagation applies changes globally, drawing on error information from distant output units
Local rules change each weight using only information available at the units it connects
Example: Hebbian learning involves only local changes, and is unsupervised
Explain what a competitive network is. How does it differ from a standard
artificial neural network?
Real neurons can inhibit other neurons, as well as excite them
Competitive networks include inhibitory connections between units
The units can then compete with each other
The winning unit - the one most strongly activated by the input - is rewarded by being strengthened
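A sketch of a winner-take-all competitive update in Python (illustrative names; the inhibitory connections are abstracted into an explicit arg-max step): the most active unit wins, and only its weights are strengthened, moving them toward the current input.

    # Each row of weights belongs to one competing unit.
    def competitive_step(rows, x, eta=0.2):
        activations = [sum(xi * w for xi, w in zip(x, row)) for row in rows]
        winner = max(range(len(rows)), key=lambda j: activations[j])
        # Reward the winner: move its weights toward the input.
        rows[winner] = [w + eta * (xi - w) for xi, w in zip(x, rows[winner])]
        return rows

    # Two units come to specialize on two recurring input patterns.
    rows = [[0.6, 0.4], [0.4, 0.6]]
    for x in [[1, 0], [0, 1]] * 10:
        rows = competitive_step(rows, x)
    print(rows)  # unit 0 drifts toward [1, 0], unit 1 toward [0, 1]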
Explain the key features of information processing in artificial neural
networks.
Three key features:
Distributed rather than localized activity
No clear difference between rules and data
Learning through experience
Explain how artificial neural networks differ from physical symbol systems.
Compare them in three important ways:
Algorithms
Representations
The nature of knowledge and intentional realism
Algorithms
Neural networks are algorithmic in a limited sense
Algorithms for updating activation levels
Learning rules are algorithmic
But not algorithmic in the same way as PSS
Algorithms are general, not task-specific
They operate at the level of individual units
Representations
Representations in a neural network need not be located in distinct physical
locations
The network's knowledge lies in its pattern of weights and thresholds
The power of distributed (as opposed to localist) networks comes from the fact that the network doesn't need a separate unit to code every feature to which it is sensitive
Knowledge
Once a network has been trained, all its knowledge is encoded in a single
set of weights
This makes it difficult to think about the network's knowledge as composed of discrete items (e.g. particular beliefs)
Where does this leave intentional realism?
How Do We Explain Behavior?
Intentional realism has been a powerful framework for explaining behavior
Ordinary psychological explanation depends upon propositional attitudes
(beliefs) that display propositional modularity
I.e., propositional attitudes are functionally discrete, causally effective, and semantically evaluable
What is it in the network that corresponds to a particular belief in this
sense?
Functional Discreteness
Once a network is trained up, there is a single set of weights that encodes
all propositions
Given those weights, the network will produce the appropriate outputs
for each input
But, there is no sense in which the representation of a particular proposition
is responsible for the output
No independent beliefs
Causal Effectiveness
Although a number of beliefs might cause a particular behavior, in any
one situation only one of them actually does
But, in a network all propositions are jointly represented, and hence
either all are active or all are inactive
No identifiable single cause
Semantic Evaluation
We evaluate propositional attitudes in terms of their meaning; differences in meaning should correspond to differences in how they are represented
This permits meaningful comparisons
But, the way a proposition is encoded depends upon how the other propositions
are encoded
Training the network with a new proposition can completely change the weights
So, how do we compare different networks with different patterns of weights?
Next Steps
Possible solution: find a new level of description at which individual
representations will emerge
Some investigators now take the networks themselves as the units to be
studied
Are there any reasons to be skeptical that artificial neural networks represent a new way to think about information processing?
There were numerous problems with the PSS approach to information
processing
Networks may solve these problems, at the cost of introducing new ones