PDA

See PORTABLE COMPUTERS.

PERCEPTRON

In 1957 the psychologist Frank Rosenblatt proposed "The Perceptron: a perceiving and recognizing automaton" as a class of artificial nerve nets, embodying aspects of the brain and receptors of biological systems. Fig. 1 shows the network of the Mark 1 Perceptron. Later, Rosenblatt protested that the term perceptron, originally intended as a generic name for a variety of theoretical nerve nets, had become associated with a very specific piece of hardware (Rosenblatt, 1962).

The basic building block of a perceptron is an element that accepts a number of inputs x_i, i = 1, ..., N, and computes a weighted sum of these inputs where, for each input, its fixed weight w_i can be only +1 or -1. The sum is then compared with a threshold θ, and an output y is produced that is either 0 or 1, depending on whether or not the sum exceeds the threshold. In other words,

    y = 1 if w_1 x_1 + w_2 x_2 + ... + w_N x_N > θ, and y = 0 otherwise.

A perceptron is a signal transmission network consisting of sensory units (S units), association units (A units), and output or response units (R units). The receptor of the perceptron is analogous to the retina of the eye and is made of an array of sensory elements (photocells). Depending on whether or not an S unit is excited, it produces a binary output. A randomly selected set of retinal cells is connected to the next level of the network, the A units. Each A unit behaves like the basic building block described above, where the +1, -1 weights for the inputs to each A unit are randomly assigned. The threshold for all A units is the same.

The binary output y_k of the kth A unit (k = 1, ..., m) is multiplied by a weight a_k, and a sum of all m weighted outputs is formed in a summation unit that is the same as the basic building block with all weights equal to 1. Each weight a_k is allowed to be positive, zero, or negative, and may change independently of the other weights.
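The basic building block just described can be sketched as a small function; this is an illustrative reading of the definition, not Rosenblatt's hardware realization:

```python
# A minimal sketch of the perceptron building block: a fixed-weight
# threshold unit whose weights are restricted to +1 or -1, as in the
# A units described above.

def threshold_unit(inputs, weights, theta):
    """Return 1 if the weighted sum of the inputs exceeds theta, else 0."""
    assert all(w in (+1, -1) for w in weights)  # fixed +1/-1 weights only
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > theta else 0
```

For example, with weights (+1, +1, -1) and threshold 1, the inputs (1, 1, 0) give a sum of 2 and an output of 1, while (0, 1, 0) gives a sum of 1 and an output of 0.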
The output of the perceptron is again binary, depending on a threshold that is normally set at 0. The binary values of the output are used to distinguish two classes of patterns that may be presented to the retina of a perceptron. The design of a perceptron to distinguish between two given sets of patterns involves adjusting the weights a_k, k = 1, ..., m, and the threshold θ.

Rosenblatt (1962) proposed a number of variations of the following procedure for "training" perceptrons. The set of given patterns of known classification is presented sequentially to the retina, with the complete set being repeated as often as needed.

Figure 1. Mark 1 Perceptron structure.

The output of the perceptron is monitored to determine whether a pattern is correctly classified. If not, the weights are adjusted according to the following "error correction" procedure: if the nth pattern was misclassified, the new value a_k(n+1) for the kth weight is calculated as

    a_k(n+1) = a_k(n) + y_k(n) δ(n),

where δ(n) is +1 if the nth pattern is from class 1 and -1 if the nth pattern is from class 2. No adjustment to the weights is made if a pattern is correctly classified. If there exists a set of weights such that all patterns can be correctly classified, the pattern classes are said to be linearly separable. Rosenblatt conjectured that, when the pattern classes are linearly separable, the error-correction "learning" procedure will converge to a set of weights that correctly classifies all the patterns. Many proofs of this perceptron convergence theorem were subsequently derived, the shortest by A. J. Novikoff. Subsequent contributions related the simple perceptron to statistical linear discriminant functions, and related the error-correction learning algorithm to gradient-descent procedures and to stochastic approximation methods that were originally developed for finding the zeros and extremes of unknown regression functions (see, e.g.,
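The error-correction procedure can be sketched as follows, working directly on the A-unit output vectors y = (y_1, ..., y_m) with class labels δ of +1 (class 1) and -1 (class 2) and the output threshold fixed at 0; the example patterns are hypothetical, and the random S-to-A wiring of a real perceptron is omitted:

```python
# A sketch of Rosenblatt's error-correction training procedure for the
# adaptive weights a_1, ..., a_m, as described above.

def classify(a, y):
    """Class 1 (+1) if the weighted sum exceeds the threshold 0, else class 2 (-1)."""
    return +1 if sum(ak * yk for ak, yk in zip(a, y)) > 0 else -1

def train(patterns, m, max_epochs=100):
    """patterns: list of (y, delta) pairs; returns the weights a_1, ..., a_m."""
    a = [0.0] * m
    for _ in range(max_epochs):
        errors = 0
        for y, delta in patterns:
            if classify(a, y) != delta:        # misclassified pattern:
                for k in range(m):             # a_k(n+1) = a_k(n) + y_k(n) * delta(n)
                    a[k] += y[k] * delta
                errors += 1
        if errors == 0:                        # every pattern correct: converged
            break
    return a
```

On a linearly separable set such as [([0,0], -1), ([0,1], +1), ([1,0], +1), ([1,1], +1)], the procedure converges in a few passes, in line with the convergence theorem.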
Kanal, 1962). The simple perceptron described is a series-coupled perceptron, with feed-forward connections only from S units to A units and from A units to the single R unit. The weights a_k, the only adaptive elements in this network, are evaluated directly in terms of the output error. This is sometimes referred to as a single-layer perceptron. There is no layer of "hidden" elements, i.e. elements for which the adjustment is only indirectly related to the output error. A perceptron with one or more layers of hidden elements is termed a multilayer perceptron. Rosenblatt investigated cross-coupled perceptrons, in which connections join units of the same type, and also investigated multilayer back-coupled perceptrons, which have feedback paths from units located near the output. For series-coupled perceptrons with multiple R units, Rosenblatt proposed a "back-propagating error correction" procedure that used error from the R units to propagate corrections back toward the sensory end. But neither he nor others were able to demonstrate a convergent procedure for training multilayer perceptrons. Minsky and Papert (1969) proved various theorems about simple perceptrons, some of which indicated their limited pattern-classification and function-approximation capabilities. For example, they proved that the single-layer perceptron could not implement the Exclusive-OR logical function (see BOOLEAN ALGEBRA) and several other such predicates. Later, many who wrote on Artificial Neural Networks (ANNs) would blame this book by Minsky and Papert for greatly dampening interest and leading to a demise of funding for research on ANNs. The section on "Alternate Realities" in Kanal (1992) details why the blame is misplaced. As noted there, by 1962 many researchers had moved on from perceptron-type learning machines to statistical and syntactic procedures for pattern recognition.
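The Exclusive-OR limitation is easy to see: a single threshold unit computing w_1 x_1 + w_2 x_2 > θ would need θ ≥ 0 (so that XOR(0,0) = 0), w_1 > θ and w_2 > θ (so that XOR(1,0) = XOR(0,1) = 1), and yet w_1 + w_2 ≤ θ (so that XOR(1,1) = 0), which is impossible. The exhaustive grid check below is a finite spot check of that argument, not Minsky and Papert's proof technique:

```python
# No single threshold unit realizes XOR: spot-check a grid of weights
# and thresholds. The algebraic argument above rules out all real
# values, so in particular every grid point must fail.

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def realizes_xor(w1, w2, theta):
    """True if the unit w1*x1 + w2*x2 > theta reproduces XOR on all four inputs."""
    return all((w1 * x1 + w2 * x2 > theta) == bool(t)
               for (x1, x2), t in XOR.items())

grid = [i / 4 for i in range(-20, 21)]   # weights and thresholds in [-5, 5]
found = any(realizes_xor(w1, w2, th)
            for w1 in grid for w2 in grid for th in grid)
```

Here `found` remains False for every choice on the grid.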
The demise of funding for perceptron-type networks should instead be blamed on the inadequate technology and training algorithms then available for multilayer perceptrons, and on the premature, overblown results promised to the funding agencies. Minsky and Papert's results did not apply to multilayer perceptrons. Research on ANNs, biologically motivated automata, and adaptive systems continued in the 1970s in Europe, Japan, the Soviet Union, and the USA, but without the frenzied excitement of previous years. In a 1974 Harvard University dissertation, Paul Werbos presented a general convergence procedure for adaptively adjusting the weights of a differentiable nonlinear system so as to learn a functional relationship between the inputs and outputs of the system. The procedure calculates the derivatives of some function of the outputs, with respect to all inputs and weights or parameters of the system, working backwards from outputs to inputs. This work by Werbos went essentially unnoticed until a few years after Rumelhart, Hinton, and Williams independently popularized a special case of the general method for adaptively adjusting the weights of a multilayer feedforward perceptron in pattern-classification applications when learning samples are available. This algorithm, which adapts the weights using gradient descent, is known as error backpropagation or just backpropagation. It propagates derivatives from the output layer back through each intermediate layer of the multilayer perceptron network. The resurgence of work on multilayer perceptrons and their applications in the 1980s is directly attributable to this convergent backpropagation algorithm. It has been shown that multilayer feedforward networks with a sufficient number of intermediate or "hidden" units between the input and output units have a "universal approximation" property: they can approximate virtually any function to any desired degree of accuracy.
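The core of backpropagation, computing derivatives of the output error with respect to all weights by working backwards from the output layer, can be sketched for a one-hidden-layer sigmoid network trained on squared error. The network sizes, initialization, and class layout here are illustrative choices, not part of the original formulations:

```python
# A minimal sketch of error backpropagation for a one-hidden-layer
# perceptron with sigmoid units and squared-error loss 0.5*(y - t)^2.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        # Each row of W1 holds one hidden unit's input weights plus a bias weight.
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.W2 = [rng.uniform(-1, 1) for _ in range(n_hid + 1)]  # output weights + bias

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in self.W1]
        y = sigmoid(sum(w * hi for w, hi in zip(self.W2, h + [1.0])))
        return h, y

    def grads(self, x, target):
        """Backpropagation: d(0.5*(y - target)^2)/dw for every weight."""
        h, y = self.forward(x)
        dy = (y - target) * y * (1.0 - y)            # delta at the output unit
        gW2 = [dy * hi for hi in h + [1.0]]
        gW1 = []
        for j in range(len(self.W1)):                # propagate delta back to hidden units
            dh = dy * self.W2[j] * h[j] * (1.0 - h[j])
            gW1.append([dh * xi for xi in x + [1.0]])
        return gW1, gW2
```

A standard sanity check on such a sketch is to compare the backpropagated derivatives with finite-difference estimates of the same loss; gradient descent then subtracts a small multiple of each gradient from the corresponding weight.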
It has also been shown by White that backpropagation is essentially a special case of stochastic approximation, and once again neural network learning procedures are being shown to be intimately related to known statistical techniques (Bishop, 1995). More on recent developments in backpropagation algorithms and multilayer perceptrons may be found in Werbos (1994), Chauvin and Rumelhart (1995), and Mehrotra et al. (1997).

Bibliography

1962. Rosenblatt, F. Principles of Neurodynamics. New York: Spartan Books.
1962. Kanal, L. "Evaluation of a Class of Pattern-Recognition Networks," in Biological Prototypes and Synthetic Systems (eds. E. E. Bernard and M. R. Kare), 261-269. New York: Plenum Press.
1969. Minsky, M., and Papert, S. Perceptrons. Cambridge, MA: MIT Press.
1992. Kanal, L. N. "On Pattern, Categories, and Alternate Realities," 1992 K. S. Fu award talk at IAPR, The Hague, in Pattern Recognition Letters, 14, 241-255.
1994. Werbos, P. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: John Wiley.
1995. Chauvin, Y., and Rumelhart, D. E. (eds.) Backpropagation: Theory, Architectures, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
1995. Bishop, C. M. Neural Networks for Pattern Recognition. Oxford: Oxford University Press.
1997. Mehrotra, K., Mohan, C. K., and Ranka, S. Elements of Artificial Neural Networks. Cambridge, MA: MIT Press.

Laveen N. Kanal

PERFORMANCE MEASUREMENT AND EVALUATION

The main purposes of the measurement and evaluation of computer systems are to:

1. Aid in the design of hardware and software.
2. Aid in the selection of a computer system.
3. Improve the performance of an existing system.

The first of these must use some type of model of the system being designed. The latter two may use actual measurements or models or some combination of the two.

Figure 1. A computer system and its subsystems.
Measurement and evaluation of computer system performance is difficult because of the complexity of the internal structure of computer systems and because of the difficulty of describing and predicting the workload. As shown in Fig. 1, a computer system is composed of subsystems, each of which can be viewed as a system with its own workload and performance. Total system performance is related to the performance of the subsystems, although the relationship can be complex. Computer system and subsystem performance measures fall into three categories: responsiveness, throughput, and cost. The response time for interactive commands or the turnaround time for batch jobs are typical measures of responsiveness. Throughput is a measure of the computational work accomplished by the system per unit time. There is, however, no generally accepted definition of a unit of computational work. Measures such as jobs per unit time or transactions per unit time become meaningful only when the resource requirements of these tasks are described; this is one aspect of the workload characterization problem. The cost of a computer system is the monetary amount required to buy or lease the system. Response and throughput characteristics have to be evaluated in terms of the cost of the system.

It is necessary to characterize the load on a system in order to make meaningful statements about its performance. One aspect of this problem is determining which characteristics of the load largely determine the performance measures of interest. Another is determining the values of the workload model parameters for a particular performance study; this is particularly difficult if the system is not yet operational. But even with an operational system, the workload may vary with time, and the workload characteristics measured will depend on the measurement period chosen.

Subsystem Primary performance measures
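The two measures just defined can be illustrated on a hypothetical job log; the job times and the measurement period below are invented for the example, and, as noted above, the numbers are only meaningful relative to the chosen period and the resource demands of the jobs:

```python
# Mean response time (a responsiveness measure) and jobs per unit time
# (a throughput measure) computed from a hypothetical batch-job log.

jobs = [  # (arrival_time, completion_time) in seconds; hypothetical data
    (0.0, 2.0), (1.0, 4.5), (2.0, 5.0), (6.0, 7.5),
]

period = 10.0                                    # chosen measurement period, seconds
mean_response = sum(c - a for a, c in jobs) / len(jobs)
throughput = len(jobs) / period                  # jobs completed per second
```

For this log, the mean response time is 2.5 s and the throughput is 0.4 jobs/s; a different measurement period over the same system would generally yield different values.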