Features of Connectionist Networks
Exploit parallel processing
Can be used to model the simultaneous satisfaction of multiple soft constraints
Do not feature explicit (content-specific) rules
Exhibit graceful degradation
Intended as models of information-processing at the algorithmic level
Capable of learning
Chapter 8: Section 8.1
Explain the basic structure and functioning of an artificial neuron.
Compare it with the structure and functioning of a real neuron.
Artificial neurons are much simpler
Do they capture the most important features?
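A minimal sketch of an artificial neuron in Python (the function and variable names are illustrative, not from the text): it sums its weighted inputs and passes the total through a binary threshold function.

    # An artificial neuron: weighted sum of inputs passed
    # through a binary threshold activation.
    def artificial_neuron(inputs, weights, threshold):
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total > threshold else 0

    # With equal weights and a threshold between 1 and 2,
    # the unit computes logical AND of two binary inputs.
    print(artificial_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
    print(artificial_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0

A real neuron integrates thousands of synaptic inputs with rich temporal dynamics; the sketch keeps only the weighted-sum-plus-threshold core.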
Are artificial neural networks a good enough approximation of real neural
networks to be useful to cognitive scientists?
In many areas we need models to work with.
Ethical and practical limitations on experimenting with real brains may be too constraining
The critical question: Does the model function in ways that would be misleading?
Learning and Neural Networks
Neural networks are especially important for modeling learning
Physical symbol systems were not much concerned with learning
The first question to answer is one of competence:
Can a network (or any other system) learn what humans are capable of learning?
Networks and Layers
Basic distinction
Single-unit networks [single layer networks]
Multilayer networks [contain hidden layers]
Different learning rules for the two types
Single-unit networks: the key parameters:
Weights assigned to inputs
Threshold function for output
Chapter 8: Section 8.2
Explain what Hebbian learning is.
Basic principle: Neurons that fire together, wire together
Two general types of learning:
Supervised [requires feedback]
Unsupervised [requires no feedback]
Hebbian is unsupervised
This is important because much human learning is unsupervised
Hebbian Learning
Typically used in pattern association networks
Very good at generalizing patterns
Hebbian principles also feature in more complex learning rules (e.g. competitive learning, discussed later)
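A sketch of the basic Hebbian weight update in Python (a standard rate-based formulation; the names are illustrative): each weight grows in proportion to the product of pre- and post-synaptic activity, with no feedback signal.

    # Hebbian rule: delta_w = eta * pre * post.
    # No target output is needed, so the learning is unsupervised.
    def hebbian_update(weights, pre, post, eta=0.1):
        return [w + eta * x * post for w, x in zip(weights, pre)]

    weights = [0.0, 0.0]
    # Repeatedly pairing an active first input with an active output
    # strengthens the first weight only: units that fire together
    # wire together.
    for _ in range(5):
        weights = hebbian_update(weights, pre=[1, 0], post=1)
    print(weights)  # -> approximately [0.5, 0.0]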
What is a major difference between Hebbian learning and learning via the
perceptron convergence rule?
The perceptron convergence rule differs from Hebbian learning in that training depends on the discrepancy (the error δ) between actual output and intended output
The delta rule gives an algorithm for changing the threshold and weights as a function of δ and a learning-rate constant
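A sketch of the delta rule for a single threshold unit in Python (a standard formulation; the names and the AND training set are illustrative): each weight changes by the learning rate times δ times the input.

    def step(total, threshold):
        return 1 if total > threshold else 0

    # One pass through the training set, adjusting weights and
    # threshold by eta * delta, where delta = target - output.
    def delta_rule_epoch(weights, threshold, samples, eta=0.1):
        for inputs, target in samples:
            total = sum(x * w for x, w in zip(inputs, weights))
            delta = target - step(total, threshold)
            weights = [w + eta * delta * x for w, x in zip(weights, inputs)]
            threshold -= eta * delta  # raising output means lowering threshold
        return weights, threshold

    # AND is linearly separable, so training converges.
    samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, t = [0.0, 0.0], 0.0
    for _ in range(20):
        w, t = delta_rule_epoch(w, t, samples)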
An early example: recognition of patterns based on Selfridge's Pandemonium model
Successive levels of processing
Note: Pandemonium is still a physical symbol system, but it suggested
the network idea
Explain the concept of linear separability of functions. How does it pertain
to perceptrons?
The perceptron convergence rule will converge on a solution in every
case where a solution is possible. But which functions are these? Those
functions that are linearly separable.
The classic non-separable function is XOR (exclusive-OR): TRUE if A or B, but not both
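A short worked argument (standard, not from the text) that no assignment of weights and threshold to a single unit computes XOR:

    Suppose a unit with weights $w_1, w_2$ and threshold $\theta$
    outputs 1 exactly when $w_1 a + w_2 b > \theta$. For XOR we need:
    \begin{align*}
      0 &\le \theta          && (a,b)=(0,0)\mapsto 0\\
      w_1 &> \theta          && (a,b)=(1,0)\mapsto 1\\
      w_2 &> \theta          && (a,b)=(0,1)\mapsto 1\\
      w_1 + w_2 &\le \theta  && (a,b)=(1,1)\mapsto 0
    \end{align*}
    The middle two lines give $w_1 + w_2 > 2\theta \ge \theta$
    (using $\theta \ge 0$ from the first line), contradicting the last.

Geometrically: no straight line in the plane separates the points {(0,1), (1,0)} from {(0,0), (1,1)}.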
The difficulty is in learning interactions
An example of an interaction: The concept of appropriate behavior
A person may laugh when she sees something funny, or cry when she sees
something sad.
It is considered inappropriate to laugh when you see something sad, or
cry when you see something funny.
A perceptron cannot learn this concept
Why is training such an integral part of neural network modeling?
It is one of the key features that distinguish connectionist networks
from physical symbol systems
PSS: Knowledge is built in by the modeler
This is a serious limitation
Connectionist networks: Knowledge has to be acquired
Where Are We Now?
Single-unit networks can be trained, but can only compute linearly separable functions
To compute all Turing-computable functions, we need multilayer networks.
Multilayer networks cannot be trained using the perceptron convergence
rule
A new training rule is needed
Describe the basic structure and functioning of a multi-layer network.
Input and output layers
One or more hidden layers
Three steps in the operation of each unit:
Integrate input from previous layer (if any)
Transform input to output activity, using an activation function
Send output on to next layer
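A sketch in Python of the three steps for one layer (the sigmoid activation and the names are assumptions; any smooth activation function would illustrate the same point):

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def layer_forward(inputs, weights, biases):
        # weights[j][i] connects input unit i to this layer's unit j.
        outputs = []
        for w_row, bias in zip(weights, biases):
            net = sum(x * w for x, w in zip(inputs, w_row)) + bias  # 1. integrate
            outputs.append(sigmoid(net))                            # 2. activate
        return outputs                                              # 3. pass on

    # Input -> hidden -> output: each layer feeds the next.
    hidden = layer_forward([0.5, 1.0], [[0.4, -0.2], [0.3, 0.8]], [0.0, 0.1])
    output = layer_forward(hidden, [[1.0, -1.0]], [0.0])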
Explain how the backpropagation algorithm works.
The algorithm needs to find a way of calculating error in hidden units
that do not have target activation levels
It does this by calculating for each hidden unit its degree of responsibility
for error at the output units
This error value is used to adjust the weights of the hidden units
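A minimal numeric sketch of backpropagation for one hidden layer (standard textbook form; the architecture, names, and XOR task are illustrative). The output unit's δ comes from the known target; each hidden unit's δ is its share of responsibility, obtained by sending the output δ back through the connecting weight; all weights then move by gradient descent.

    import math, random

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def train_step(x, target, W1, b1, W2, b2, eta=0.5):
        # Forward pass: hidden layer, then a single output unit.
        h = [sigmoid(sum(xi * w for xi, w in zip(x, row)) + b)
             for row, b in zip(W1, b1)]
        o = sigmoid(sum(hj * w for hj, w in zip(h, W2)) + b2)
        # Output delta: error scaled by the slope of the activation.
        d_o = (target - o) * o * (1 - o)
        # Hidden deltas: responsibility for the output error,
        # propagated back through each outgoing weight.
        d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
        # Gradient-descent updates.
        W2 = [w + eta * d_o * h[j] for j, w in enumerate(W2)]
        b2 += eta * d_o
        W1 = [[w + eta * d_h[j] * xi for xi, w in zip(x, row)]
              for j, row in enumerate(W1)]
        b1 = [b + eta * d_h[j] for j, b in enumerate(b1)]
        return W1, b1, W2, b2

    # XOR is learnable by a multilayer network, though depending on
    # the random start, training can stall in a local minimum (see below).
    random.seed(0)
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b1 = [0.0, 0.0]
    W2 = [random.uniform(-1, 1) for _ in range(2)]
    b2 = 0.0
    for _ in range(5000):
        for x, t in [([0,0],0), ([0,1],1), ([1,0],1), ([1,1],0)]:
            W1, b1, W2, b2 = train_step(x, t, W1, b1, W2, b2)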
The Competence of Backpropagation
Multilayer networks can compute any Turing-computable function
But backpropagation will not always converge on a solution
(Unlike perceptron convergence rule, which is guaranteed to find a solution
where there is one)
Backpropagation searches for an optimal solution by gradient descent: it can get stuck in a local minimum of the error surface
Chapter 8: Section 8.3
Describe the worries about the biological plausibility of artificial neural
networks.
No evidence that backpropagation takes place in the brain
Setting the number of hidden units is crucial - how would the brain determine this?
But the real issue may be whether the brain performs gradient descent learning at all - not the specific learning algorithm achieving it
What is a local learning algorithm? Does it represent supervised or unsupervised
learning?
An alternative to backpropagation
Backpropagation applies changes globally, drawing on error information from distant output units
Local rules change each weight using only information available at the units it connects
Example: Hebbian learning involves only local changes, and is unsupervised
Explain what a competitive network is. How does it differ from a standard
artificial neural network?
Real neurons can inhibit other neurons, as well as excite them
Competitive networks include inhibitory connections between units
The units can then compete with each other
The winning unit - the one most strongly activated by the input - is rewarded by being strengthened
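A sketch of a winner-take-all competitive update in Python (illustrative names; the inhibitory connections are abstracted into an explicit arg-max step): the most active unit wins, and only its weights are strengthened, moving them toward the current input.

    # Each row of weights belongs to one competing unit.
    def competitive_step(rows, x, eta=0.2):
        activations = [sum(xi * w for xi, w in zip(x, row)) for row in rows]
        winner = max(range(len(rows)), key=lambda j: activations[j])
        # Reward the winner: move its weights toward the input.
        rows[winner] = [w + eta * (xi - w) for xi, w in zip(x, rows[winner])]
        return rows

    # Two units come to specialize on two recurring input patterns.
    rows = [[0.6, 0.4], [0.4, 0.6]]
    for x in [[1, 0], [0, 1]] * 10:
        rows = competitive_step(rows, x)
    print(rows)  # unit 0 drifts toward [1, 0], unit 1 toward [0, 1]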
Explain the key features of information processing in artificial neural
networks.
Three key features:
Distributed rather than localized activity
No clear difference between rules and data
Learning through experience
Explain how artificial neural networks differ from physical symbol systems.
Compare them in three important ways:
Algorithms
Representations
The nature of knowledge and intentional realism
Algorithms
Neural networks are algorithmic in a limited sense
Algorithms for updating activation levels
Learning rules are algorithmic
But not algorithmic in the same way as PSS
Algorithms are general, not task-specific
They operate at the level of individual units
Representations
Representations in a neural network need not be located in distinct physical
locations
The network's knowledge lies in its pattern of weights and thresholds
The power of distributed (as opposed to localist) networks comes from the fact that the network doesn't need a separate unit to code every feature to which it is sensitive
Knowledge
Once a network has been trained, all its knowledge is encoded in a single
set of weights
This makes it difficult to think about the network's knowledge as composed of discrete items (e.g. particular beliefs)
Where does this leave intentional realism?
How Do We Explain Behavior?
Intentional realism has been a powerful framework for explaining behavior
Ordinary psychological explanation depends upon propositional attitudes
(beliefs) that display propositional modularity
I.e., propositional attitudes are functionally discrete, causally effective, and semantically evaluable
What is it in the network that corresponds to a particular belief in this
sense?
Functional Discreteness
Once a network is trained up, there is a single set of weights that encodes
all propositions
Given those weights, the network will produce the appropriate outputs
for each input
But, there is no sense in which the representation of a particular proposition
is responsible for the output
No independent beliefs
Causal Effectiveness
Although a number of beliefs might cause a particular behavior, in any
one situation only one of them actually does
But, in a network all propositions are jointly represented, and hence
either all are active or all are inactive
No identifiable single cause
Semantic Evaluation
We evaluate propositional attitudes in terms of their meaning; differences in meaning should correspond to differences in how they are represented
This permits meaningful comparisons
But, the way a proposition is encoded depends upon how the other propositions
are encoded
Training the network with a new proposition can completely change the weights
So, how do we compare different networks with different patterns of weights?
Next Steps
Possible solution: find a new level of description at which individual
representations will emerge
Some investigators now take the networks themselves as the units to be
studied
Are there any reasons to be skeptical that artificial neural networks represent a new way to think about information processing?
There were numerous problems with the PSS approach to information
processing
Networks may solve these problems, at the cost of introducing new ones