1 Introduction

APZ 212 40
LZT 123 7917 R3A
Copyright © 2004 by Ericsson AB

1 INTRODUCTION
2 APZ 212 40 HARDWARE STRUCTURE
3 SOFTWARE STRUCTURE
4 APZ 212 40 HANDLING
5 FAULT FINDING
6 APZ 212 40 - RPS

OBJECTIVES

After completion of this chapter, the participants will have knowledge of:

• The evolution of the APZ 212 series
• The capacity of different APZ versions
• The APZ 212 40 architecture
• The APZ subsystems and functions

Figure 1-1 Objectives

INTRODUCTION

The AXE 10 system is hierarchically structured into a number of functional levels. At the highest level, the AXE system is divided into the following parts:

RMP, XSS and AM (APT) make up the application part of the AXE, handling the payload through connections and the functions related to them.

APZ is the control part, containing the hardware and software required to control the operation of the AXE. The APZ consists of a duplicated Central Processor and several Regional Processors, as shown in the figure below.

When AXE 10 was introduced in 1976, it supported the major telecommunication application, the Public Switched Telephone Network (PSTN), and was based on a model in which all functionality (switching, subscriber and network access, operation and maintenance, traffic control, signalling and charging control) was handled by each node in the network.
Figure 1-2 Central and regional processors in AXE

Since then, AXE 10 has been continuously developed and today it supports a wide range of applications besides PSTN: STP, business communication, PSTN/ISDN, Internet, Intelligent Network, operator exchanges, and MSC/BSC/VLR/HLR.

Figure 1-3 APZ Applications

The AXE 10 can, for example, be deployed as local, transit and international exchanges that handle PSTN and Integrated Services Digital Network (ISDN) traffic. It can be set up as operator exchanges, as nodes that provide business communication, and as Signalling Transfer Points (STP) for signalling systems. It is also used for Intelligent Network (IN) nodes for service switching and service control, providing IN services such as Freephone and Virtual Private Networks (VPN).

Based on the services mentioned above, it is easy to see that the demands on the APZ, the AXE control part, depend on the application being run in the exchange. Along with the general technical evolution in areas such as processor and memory capacity, this has led to a wide variety of APZs being developed.

APZ EVOLUTION

The first APZ was the APZ 210, which evolved from APZ 210 03 to APZ 210 04, and finally to APZ 210 06 in the 1970s.

Demands for increased capacity for applications such as transit and international exchanges led to the development of the APZ 212, a more powerful processor than the APZ 210. However, not all exchanges required more capacity than the APZ 210 could provide. Therefore the APZ 211 was developed. This APZ did not offer any major capacity increase compared to the APZ 210, but it was cheaper to manufacture.

The APZ 212 has continued to develop. Both processor and memory capacity have increased significantly, making it the most powerful processor today.
Basic APZ concepts

The processors in the APZ 212 series constitute the majority of the installed base among Ericsson's customers worldwide. Although the APZ hardware and software are continuously updated, and many functions thereby readapted, some basic concepts remain the same.

The Central Processor in the APZ 212 series up to APZ 212 33 (referred to as the classic APZ in this courseware) consists of two physically separated processing units, the Instruction Processing Unit (IPU) and the Signalling Processor Unit (SPU). Both processors run an operating system consisting mainly of microprogram, the MIP.

The SPU distributes the jobs generated by the applications, based on their priority, and the IPU executes them accordingly. While executing jobs, the IPU uses the application code and data stored in the program store and data store, respectively.

The AXE hardware and software devices and units are defined in the central software. The central software of the classic APZ 212 series is coded in PLEX and compiled to ASA before being loaded onto the APZ. The applications in the APZ are divided into several subsystems. The CP in the APZ 212 40 retains the same subsystem structure, and the PLEX language is still used.

The hardware is accessed by the APZ through the RPs distributed in the AXE node. The RPs are interconnected with the CP via the RP bus, a cable connection provided physically by boards located in the RPH magazine. Since the SPU handles the incoming jobs on their way to execution, the RPH magazine (RPHM) is connected to the SPU. The Regional Processor Handlers (RPH) are located in a magazine of their own in the APZ 212 30, and this design is largely reused in the APZ 212 40.

CP duplication, as found in the classic APZs, is still a feature of the APZ 212 40. In addition, other typical features of the classic APZ, for example the APZ 212 30, are still employed by the APZ 212 40, and the APZ operating system of the APZ 212 30 is retained in the APZ 212 40.
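The priority-based job handling described above (the SPU sorting jobs, the IPU executing them) can be sketched as follows. This is an illustrative model only: the priority levels, buffer layout and signal fields are invented, not the real APZ structures.

```python
from collections import deque

# Hypothetical priority levels, highest first; the real APZ uses its own
# set of job buffer levels.
PRIORITIES = ("traffic_high", "traffic_low", "base")

class JobBuffers:
    """Sketch of the SPU/IPU division of labour: the SPU side sorts
    incoming signals into priority-divided job buffers, and the IPU
    side fetches jobs in strict priority order."""
    def __init__(self):
        self.buffers = {p: deque() for p in PRIORITIES}

    def distribute(self, signal):
        # SPU role: place the signal in the buffer for its priority.
        self.buffers[signal["priority"]].append(signal)

    def fetch(self):
        # IPU role: take the oldest job from the highest non-empty buffer.
        for p in PRIORITIES:
            if self.buffers[p]:
                return self.buffers[p].popleft()
        return None

jobs = JobBuffers()
jobs.distribute({"priority": "base", "sender": "RP-7", "reason": "scan"})
jobs.distribute({"priority": "traffic_high", "sender": "RP-2", "reason": "call setup"})
print(jobs.fetch()["reason"])  # prints "call setup"
```

Even though the base-priority job arrived first, the high-priority job is fetched and executed first, which is the essence of the priority-divided job buffers.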
In the application part most subsystems are reused without changes. Only CPS, MAS, and RPS have been modified from the APZ 212 30.

Capacity

The capacity of different APZ versions is of interest, but the term must be used carefully since capacity is application system-dependent. One way of expressing capacity is to compare two APZ versions, which gives the relative capacity between the two. If we use this method and always refer to APZ 212 25, we obtain the following:

APZ version(s)   ...is approximately...   ...times faster than...
APZ 212 20       2.5                      APZ 212 25
APZ 212 25       1                        APZ 212 25
APZ 212 30       8                        APZ 212 25
APZ 212 33       16                       APZ 212 25
APZ 212 40       28                       APZ 212 25

Figure 1-4 Capacity of different APZ versions

The table shows the large increase in capacity with the APZ 212 40.

KEY FEATURES OF APZ 212 40

The APZ 212 40 is compared with the APZ 212 33, APZ 212 30 and APZ 212 20 with respect to power consumption, floor space, memory size and real-time capacity.

Figure 1-5 APZ-CP characteristics comparison

The APZ 212 40 is the first central processor of a new generation based on industry-standard microprocessors. It will be deployed as a high-capacity CP for all types of applications. It is based on a commercial processor and a commercial operating system. The estimated processing capacity is shown in the figure above.

APZ 212 40/1 and /2 provide RPB-S and IPN for connecting the RPs and the APG40, respectively. The APZ 212 40/3 variant adds an Ethernet RP bus (Ethernet RPB) connecting RPs of type GARPE (Generic Application Resource Processor, Ethernet), as well as the APG40E.

APZ 212 ARCHITECTURE

The processors in the APZ 212 series constitute the majority of the installed base among Ericsson's customers worldwide.
Although the APZ hardware and software are continuously updated, and many functions thereby readapted, some basic concepts remain the same.

The APZ constitutes the control part of a node (AXE) and is responsible for the establishment, supervision and release of payload and signalling connections. The control logic is implemented in software, and when application hardware is involved, the APZ communicates with it through well-defined interfaces (RPB and IPN in the figure below). The AXE hardware and software devices and units are defined in the central software.

Service requests coming in through the interfaces, or generated in the CP software (periodic requests), are transported in the form of signals and treated as jobs in the CP. A job corresponds to a portion of code and data in the CP. A request (signal) always carries the address of the requester, the priority of the signal and the reason for it.

Signals coming in through the RP bus are stored in buffers on the RPH interface boards (RPHI) located in the RPH magazine. The Signalling Processor Unit (SPU) scans the RPHI buffers, fetches the signals and sorts them into priority-divided buffers called job buffers. The Instruction Processing Unit (IPU) is the processor that executes the jobs. It fetches a signal from the SPU and, using the signal's data, addresses the proper code in the Program Store (PS) and data in the Data Store (DS). The IPU uses information stored in the Reference Store (RS) to find the code and data for a particular job.

The APZ 212 40 architecture is presented (simplified) in the figure below.

Figure 1-6 APZ 212 40 Architecture

The APZ 212 40 is built on a modern commercial microprocessor platform. There is no microcode in the sense used in the classic APZ, i.e. to support the processing in the IPU and SPU.
Instead, the PLEX Engine executes the ASA code; the PLEX Engine is thus the replacement for the microprogram (MIP).

The Central Processor Unit (CPU) sides do not run in parallel synchronous (lock-step) mode. Instead, a warm standby principle is employed. Whenever a side switch is planned in advance (for testing purposes, maintenance, or when adding or replacing hardware), a command updates the SB side to become "hot". This means that the data becomes identical in both CP sides before a side switch is initiated.

In the CPU hardware of the APZ 212 40 there are two CP sides, each with two processors. The memory layout is new: it is record-oriented rather than variable-oriented, with greater focus on cache usage.

The Regional Processor Handler (RPH) design from the APZ 212 33 is reused.

APG40 is the I/O system used for operation with the APZ 212 40. It is prepared for the Inter Platform Network (IPN), which provides the AXE with a fast Ethernet connection in the CP for connection to the I/O.

The following figure illustrates the key differences in structure between the classic APZ and the APZ 212 40.

Figure 1-7 Classic APZ and APZ 212 40 structure

The PLEX Engine, and thereby its components HAL, the OS API and the ASA compiler, is described in chapter 3.

APZ 212 40 SUBSYSTEMS

APZ 212 40 is functionally structured into subsystems and function blocks.

Figure 1-8 APZ 212 Subsystems

The control system includes the following subsystems:

• CPS, Central Processor Subsystem. The subsystem includes the executive program with functions for administration.

• CPHW, CP Hardware Subsystem. The subsystem contains the CP hardware platform.

• PES, PLEX Engine Subsystem.
The subsystem contains the PLEX Engine functions.

• MAS, Maintenance Subsystem. The subsystem contains maintenance functions for the central processors.

• DBS, Database Management Subsystem. The subsystem contains database handling functions for AXE applications.

• RPS, Regional Processor Subsystem. RPS is divided into the subsystems RPS-B, RPS-2 and RPS-M. These comprise the RPs with stores, executive programs, the signal terminals with control programs, and maintenance functions.

• RIS, Regional Internetworking Subsystem. The subsystem contains regional software for protocol stacks and drivers.

• ACS, Adjunct Computer Subsystem. The subsystem supplements the APZ in market solutions that require industry-standard communication and/or high-capacity operation.

• OCS, Open Communication Subsystem. The subsystem provides communication with other systems using industry-standard protocols.

• AES, Adjunct Computer External Access Subsystem. The subsystem contains functions that interact with other platforms using standard protocols.

• MCS, Man-machine Communication Subsystem. The subsystem comprises man-machine communication functions.

• FMS, File Management Subsystem. The subsystem contains file management functions.

• MPS. The subsystem provides functionality for element management: a family of PC-based tools that can run on PC/Unix-based workstations.

The new, or opened and re-designed, subsystems in APZ 212 40 are shown in the figure below.
Affected APZ subsystems

The status of the APZ subsystems in the APZ 212 40 development is as follows:

• CPS, major re-design
• MAS, major re-design
• DBS, not affected
• RPS, affected by the FCTYPE 1 ambition in the first release, OPI changes
• RIS, not affected
• DCS, not affected
• FMS, not affected
• MCS, one block affected, IOH
• SPS, not affected until IOG20 is supported
• ACS, block APMA affected
• OCS, block OCITS affected, and also affected by the FCTYPE 1 ambition level
• MPS, not affected
• CPHW, new subsystem that contains the hardware parts from CPS
• PES, new subsystem containing the C++ parts of the system

Figure 1-9 New and opened subsystems in APZ 212 40

SUBSYSTEM CPS

The subsystem CPS implements the principles of program control and data handling in the central processor that are used by the application system and the rest of the APZ. It includes software for the central operating system functions.

CPS interacts with the RPS hardware through the RP bus (RPB). Communication with the MAS hardware takes place through the maintenance bus and the test bus CTB.

CPS is duplicated for reasons of reliability. In the normal state, one CP side is executing while the other side is standby and can take over execution quickly in case of a fault. The bus UPB is used for updating between the CP sides.

The CPS software interacts with the other subsystems through normal software signals and other well-defined interfaces.

The software in CPS is mainly written in PLEX, with some parts in ASA210C. CPS software runs in the CP and the AP. It executes by using a standardized interface in the PLEX Engine Subsystem (PES). The resulting compiled code then executes on the hardware in the CP Hardware Subsystem (CPHW). The hardware in CPHW is based on a commercial processor.

Some CPS function blocks have a part that is allocated in PES. This software is written in C++, and these parts are called Clayton parts.

The CPS in APZ 212 40 is divided into the following sets of parts.
Set of Parts in CPS

AFCP   Audit                              AFBLA, AFCO, AFIO, AFMC, AFUS
BUCP   Back Up                            BUCG, BUCL, BUMS, BUO, BURH, BUS, BUSRV, LOGB
       License Mgt in CP                  LMA
FCCP   Function Change                    FCA, FCBC, FCC, FCCC, FCD, FCI, FCL, FCT
LOCP   Loading                            BUC, LAAT, LACCALL, LACI, LACO, LADS, LAFI, LAL, LALI, LALT, LAP, LAPS, LARS, LASYMB, LINKCP
PARCP  Administration of AXE parameters   PARA, PARTAB1, PARTAB2
PCCP   Program Correction                 PCA, PCI, PCS, PCT
SACP   Size Alteration                    SAFCI, SAFCO, SAFH, SAFTAB1
TECP   Test System                        TED, TEI, TEM, TEO, TET, TEV

Figure 1-10 CPS Set of Parts

These sets of parts, and the function blocks they contain, implement the following functions.

Program execution and data handling in the central processor

CPS is used for storage of the central software units, which are products in the hierarchical product structure of the AXE system. The central software units are handled as independent products, both in "physical" handling (e.g. loading, deletion, replacement) and in "logical" handling (e.g. program interaction, data accessing).

The central software units are stored in different stores. The program code is stored in the program store (PS), and variables are stored in the data store (DS). In order to implement the applicable principles of program interwork, and to facilitate new allocations and re-allocations of programs and variables, use is made of the reference store (RS). RS stores start addresses and other information required for the addressing of programs and data.

The execution of the central software units mainly takes place in the central processor (CP), which is divided into a number of function blocks. Special functions have also been implemented in PES: functions that are time-critical, functions that are difficult to solve in programs, and functions that increase the capacity of the system considerably compared with program solutions.
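The role of the reference store can be sketched as follows. The block names, addresses and table fields are invented for illustration; the real RS tables hold considerably more information than start addresses.

```python
# Illustrative model of PS/DS addressing via the reference store (RS).
program_store = {0x1000: ["LOAD", "ADD", "STORE"], 0x2000: ["JUMP"]}
data_store = {0x8000: {"counter": 0}, 0x9000: {"state": "idle"}}

# RS maps each software unit (block) to the start addresses of its code
# in the program store (PS) and its variables in the data store (DS).
reference_store = {
    "BLOCKA": {"ps_start": 0x1000, "ds_start": 0x8000},
    "BLOCKB": {"ps_start": 0x2000, "ds_start": 0x9000},
}

def locate(block):
    """Resolve a block's code and data through RS, so that the block can
    be re-allocated simply by updating its RS entry."""
    entry = reference_store[block]
    return program_store[entry["ps_start"]], data_store[entry["ds_start"]]

# Re-allocating BLOCKB's code only requires an RS update; its callers,
# which go through locate(), are unaffected.
program_store[0x3000] = ["JUMP"]
reference_store["BLOCKB"]["ps_start"] = 0x3000
code, data = locate("BLOCKB")
```

The point of the indirection is exactly the one made in the text: programs and variables can be allocated and re-allocated without changing the software that refers to them.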
Some of the program execution functions are implemented in software, for example timing and certain supervision functions. These functions are implemented in function block JOB, the job monitor. JOB also contains certain variables that are used by PES during program execution, for example the storage of delayed signals and tables for regular requests from the user programs.

CPS software kernel

Some functions in CPS are service routines that are called by software inside and outside CPS. These routines are implemented in the function blocks JOB, LARI, LAD, KEED, KECC, KELR, RTS and BIN, and may be grouped as follows:

• Time measurement functions (job table and time queues) (JOB).
• Real-time clock and calendar functions (JOB).
• Arithmetic functions (JOB).
• Interface to central system tables (LARI).
• Allocation of dynamic and communication buffers (LAD).
• Handling of system states and system events (KEED).
• Changing of code module for a software unit (KECC).
• Load regulation (KELR).
• Run-time support for HLPlex applications (RTS, BIN).

Function change

The following methods of function change have operational instructions that belong to CPS:

• Addition and start of central software units.
• Removal of central software units.
• Replacement of central software units, block change method.
• Replacement of central software units, side switch method.

These methods of function change make use of the fact that the central processor with its stores, as well as the regional processors, are duplicated. The central processor is non-synchronously duplicated, and the regional processor pairs work under load sharing. Both the central processor and the regional processors can work in single-machine operation.

The function change functions are implemented in the set of parts Function Change of CP (FCCP) in CPS. The loading system in CPS and the maintenance subsystem (MAS) are used in many ways during function change.
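One of the kernel services listed above, the time queues for delayed signals in block JOB, can be sketched as follows. The structure is illustrative only, not the real layout of JOB.

```python
import heapq
import itertools

class TimeQueue:
    """Sketch of a job-monitor time queue: delayed signals are held with
    a release time and handed over for execution once that time has
    expired."""
    def __init__(self):
        self._heap = []                 # (due_time, seq, signal), min-heap
        self._seq = itertools.count()   # tie-breaker for equal due times

    def delay_signal(self, signal, due_time):
        # Store the signal until its release time is reached.
        heapq.heappush(self._heap, (due_time, next(self._seq), signal))

    def release_due(self, now):
        """Return all signals whose due time has passed, oldest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

tq = TimeQueue()
tq.delay_signal("supervision timeout", due_time=50)
tq.delay_signal("periodic scan", due_time=10)
```

At time 20, only the periodic scan would be released; the supervision timeout stays queued until time 50.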
Handling of the system backup copy

A system backup copy is provided as security for use in case of a serious system error. The system backup copy is a copy of the contents of the central processor's stores. When the maintenance subsystem (MAS) considers that the existing software is no longer functional, MAS orders the bootstrap loader to reload the system with the system backup copy.

The system backup copy is located on a CP file, or on a dedicated area in the main store that acts as a fast cache for the external medium. The backup function is treated extensively in chapter 4.

Loading and removal of software units, and store administration

CPS contains loading functions for:

• Initial loading at system start.
• Reloading of the system backup copy in case of a serious system fault.
• Loading during function change.

At system start, initial loading can be made using a system backup copy from another exchange or a system test plant. Initial loading of the system backup copy is made by a PES module called LAB.

Reloading of the system backup copy in case of a serious system fault is made by LAB on order from the maintenance subsystem (MAS). At reloading, the youngest available system backup copy online is normally selected, but the system can also be configured to automatically use an older, more proven system backup copy if the reloading of a previous backup copy is not successful. After a successful reloading, the command log associated with the system backup copy is executed, automatically or manually.

Relocatable software units are loaded in the case of function change. Both additional loading and loading to replace software units can be made. There are also functions to load tables that are used for variable conversions during a function change.
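The reload fallback policy described above (youngest backup copy first, older and more proven copies if a reload fails) can be sketched as follows. The copy names and the success predicate are invented for illustration.

```python
# Sketch of backup copy selection at reload: try the youngest copy first,
# fall back to older copies if a reload attempt is not successful.
def select_backup_copy(copies_youngest_first, reload_succeeds):
    """Return the first backup copy that reloads successfully,
    or None if no copy is usable."""
    for copy in copies_youngest_first:
        if reload_succeeds(copy):
            return copy
    return None  # no usable backup copy: manual intervention needed

# Example: the youngest copy is unusable, so the older copy is selected.
chosen = select_backup_copy(
    ["backup_monday", "backup_sunday"],
    lambda name: name != "backup_monday",
)
```

After a successful reload, the command log associated with the chosen copy would then be executed, as described in the text.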
The loading functions also include functions for deleting software units and for store administration. The functions are implemented in the set of parts Loading Functions in CP (LOCP). The loading information is stored on a CP file.

Copying of software units to an external medium

When central software units are output, they are received in a form that agrees with an output made with the programming system APS. The output made in the APZ, however, also contains the program corrections and the variable data that the software unit had at the time of the output. The software unit that has been output can then be loaded again in the usual manner. The output function is implemented in function block BUC.

Size alteration of data files

When extending an exchange, the data files of certain central software units must usually also be extended. CPS contains functions to perform size extension and size reduction of data files without disturbing operation.

A data file can contain a number of variables, where each variable has an equal number of records (= individuals). All the data files in the system are grouped into a number of numbered size alteration cases. Global size alteration cases concern data files belonging to several software units, whereas local size alteration cases concern only one software unit.

A size alteration is initiated by command or by an application program. The size alteration takes place in interaction with the file-owning software unit, in such a manner that the file-owning software unit answers questions, for example whether the size alteration is permitted, and is informed when the new file size can be used. The physical size alteration in the store takes place by means of the store-administering functions of the loading system.

The size alteration function is implemented in the set of parts Size Alteration of Data Files (SACP).
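The interaction in a size alteration case can be sketched as follows: every variable of the data file keeps an equal number of records, and the file-owning software unit is consulted before the alteration is carried out. All names here are invented for illustration.

```python
# Illustrative sketch of a size alteration of a data file.
def alter_size(data_file, new_record_count, owner_permits):
    """Extend or reduce every variable of the file to new_record_count
    records, provided the file-owning unit permits the alteration."""
    if not owner_permits(new_record_count):
        return False  # the file-owning unit rejected the alteration
    for variable in data_file.values():
        if new_record_count >= len(variable):
            # Extension: append fresh, empty records.
            variable.extend({} for _ in range(new_record_count - len(variable)))
        else:
            # Reduction: drop the records at the end.
            del variable[new_record_count:]
    return True

subscriber_file = {"STATE": [{}, {}], "COUNTERS": [{}, {}]}
alter_size(subscriber_file, 4, lambda n: n <= 1000)  # now 4 records each
```

Because every variable is altered to the same record count, the invariant that all variables of a file have an equal number of records is preserved.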
Program correction

In case of an error in central software, the normal action is to replace the faulty software unit with a corrected version through a function change. If the fault is serious enough to require immediate correction, a program correction can be introduced instead. CPS contains a program correction system for this purpose. It is implemented in the set of parts Program Correction in CP (PCCP). The corrections are introduced by command, in the assembler language. The main use of the program correction system is in system test plants.

Tools for program test

The program test system is used for tests and fault finding in system test plants and in operating exchanges. The program test system uses PES and hardware for its functions. It is fully integrated with program testing in the RP and EMRP.

The functions are based on the ability to supervise and trace a certain event. When the supervised or traced event occurs, predefined tasks are executed. Such a task can, for example, be the storage of important data for a later printout. The printout is received automatically.

Supervision and tracing are obtained by setting trace bits in the program and reference stores. PES scans these trace bits continuously, checking whether a trace bit has been set.

Supervision and tracing, as well as tasks, are set by command. A number of supervision types can then be selected. For each supervision type, one or several tasks are selected from a task library.

When the traced software units are forlopp adapted, the starting point for the tracings or supervisions is indicated, and the test system then automatically traces the forlopp. If the traced software units are not forlopp adapted, trace or supervision commands are used for each software unit of interest.

The program test system is implemented in the set of parts Test System in CP (TECP) and in function block TETM.
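The trace-bit mechanism described above can be sketched as follows: setting a trace bit on an address makes the engine run the selected tasks whenever execution reaches that address. The addresses and the task library here are invented for illustration.

```python
# Illustrative sketch of trace-bit supervision in the program test system.
trace_bits = set()   # addresses on which a trace bit has been set
trace_log = []       # data collected for the automatic printout

task_library = {
    "store_data": lambda addr: trace_log.append(("data", addr)),
    "log_event":  lambda addr: trace_log.append(("event", addr)),
}
selected_tasks = ["store_data"]   # tasks chosen by command

def execute_address(address):
    """The engine checks the trace bit for every executed address and,
    if it is set, runs the predefined tasks."""
    if address in trace_bits:
        for task in selected_tasks:
            task_library[task](address)

trace_bits.add(0x4F00)
execute_address(0x1000)   # no trace bit set: nothing happens
execute_address(0x4F00)   # trace bit set: the selected task runs
```

In the real system the scan is done by PES on trace bits in the program and reference stores; here the set membership test plays that role.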
Measurement of the processor load

In order to verify that the system fulfils the capacity requirements, and that the system acts correctly in extreme load situations, CPS contains aids for various load measurements on central and regional processors. Examples of measurements are the extent to which the processors are loaded and the extent to which various buffers are filled. The time consumed by a specified job can also be measured. The functions are initiated by commands and are implemented in function blocks MEI, MEM and MEO. The figure below shows a printout of the processor load.

<plldp;
PROCESSOR LOAD DATA

INT  PLOAD  CALIM  OFFDO  OFFDI  FTCHDO  FTCHDI  OFFMPH  OFFMPL  FTCHMPH  FTCHMPL
1    60     25000  0      5523   0       5523    0       0       0        0
2    60     25000  0      5524   0       5524    0       0       0        0
3    60     25000  0      5527   0       5527    0       0       0        0
4    60     25000  0      5540   0       5540    0       0       0        0
5    60     25000  0      5526   0       5526    0       0       0        0
6    60     25000  0      5524   0       5524    0       0       0        0
7    60     25000  0      5544   0       5544    0       0       0        0
8    60     25000  0      5521   0       5521    0       0       0        0
9    60     25000  0      5527   0       5527    0       0       0        0
10   60     25000  0      5531   0       5531    0       0       0        0
11   60     25000  0      5535   0       5535    0       0       0        0
12   60     25000  0      5531   0       5531    0       0       0        0

INT  OFFTCAP  FTDTCAP
1    0        0
2    0        0
3    0        0
4    0        0
5    0        0
6    0        0
7    0        0
8    0        0
9    0        0
10   0        0
11   0        0
12   0        0
END

Figure 1-11 APZ 212 40 Processor Load

Collection of maintenance statistics

The Maintenance Statistics function collects status information and information about events that have occurred in the system. The collected data can be used to evaluate performance indexes.

The function is controlled by the Statistics Subsystem (STS) in APT. The contents of variables can be transferred to STS, where the data is processed. The variables to be transferred are described in DIDs, Data Interface Descriptions. Examples of variables transferred from CPS are counters of system events and system state changes, as reported to block KEED.
Examples of maintenance statistics collected in CPS are:

• Number of restarts.
• Number of bit faults.
• Accumulated system stop time.
• Accumulated time for blocked CP.
• Memory sizes.

The Maintenance Statistics function is implemented in blocks LAVS and MEMS, and uses KEED functions.

Product administration

The Product Administration function checks that the identities of software units and corrections are correct. It is implemented in block PACR.

Audit functions

Audit functions detect errors and system states that require manual intervention, and inform operators about them.

The audit function group of blocks consists of two categories of blocks. The first category contains blocks that are common to all audit functions, such as I/O handling, and supervision and administration of the audit function handlers. The second category contains the handlers, which detect different types of errors in the CP.

When an audit function detects an error, or another state that requires manual intervention, an alarm may be raised. The alarm ceases after an operator has taken the appropriate actions, after correction of an error, or after a reloading of the system.

Three audit functions exist today:

• A function to detect uncontrolled writing in the program store.
• A function to supervise the utilization of data files in a size alteration case, and the utilization of the different stores.
• A function to provide control of the exchange build level.

The audit functions are implemented in the set of parts Audit Functions in CP (AFCP).

Signal linking and symbol translation

This function is used when software units are loaded into and output from the system. Before the software units are loaded into the system, not all references are resolved. References to signals are given as symbolic names instead of absolute numbers.
At loading, the symbolic references are resolved, and each signal is given a global signal number that is later used in all references. This function is also used when software units are output from the system in relocatable format; in this case, the signal numbers are translated back into symbolic names.

The function that administers the global signal numbers and translates between symbols and global signal numbers is implemented in the set of parts LOCP. The linking function, a function to list information associated with the linking of central software units, and a function to perform a consistency check of the signal symbol table and the Global Signal Distribution Tables, are also implemented in the set of parts LOCP.

Administration of AXE parameters

This function provides a standard mechanism for examining and changing the values of AXE parameters that define the properties of an exchange, for example an optional feature in the mobile telephony system. An updated parameter value is checked against predefined limits, or a list of permissible values, before being accepted and distributed to the central software units that use the parameter. The function is intended for maintenance staff and customers' technicians.

The Administration of AXE Parameters function is implemented in the set of parts Administration of AXE parameters (PARCP).

SUBSYSTEM CPHW

The main tasks of the CPHW subsystem are to:

• Provide a platform on which the PLEX Engine Subsystem (PES) executes the AXE application, including the APZ operating system.
• Provide support functions for the PLEX Engine Subsystem (PES) and the Maintenance Subsystem (MAS), so as to create a central processor that fulfils the telecom requirements.
• Provide physical interfaces towards the IPN and the RPB.

The CPHW is described in chapter 2.
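Returning to the Administration of AXE parameters function (PARCP) described earlier, the update check can be sketched as follows. The parameter names and rules are invented for illustration; the real parameters and their limits are defined per exchange.

```python
# Sketch of the AXE parameter check: an updated value is validated against
# predefined limits, or a list of permissible values, before being
# accepted and distributed to the software units that use it.
parameter_rules = {
    "MAXCALLS": {"min": 1, "max": 60000},   # range-checked parameter
    "FEATURESTATE": {"allowed": {0, 1}},    # enumerated parameter
}

def set_parameter(name, value, distribute):
    """Accept and distribute the new value only if it passes the check."""
    rule = parameter_rules[name]
    if "allowed" in rule:
        accepted = value in rule["allowed"]
    else:
        accepted = rule["min"] <= value <= rule["max"]
    if accepted:
        distribute(name, value)   # push to the units using the parameter
    return accepted

updates = []
set_parameter("MAXCALLS", 30000, lambda n, v: updates.append((n, v)))
```

A rejected value is never distributed, which is the point of checking against the predefined limits first.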
SUBSYSTEM PES

PES provides a platform for program execution and data storage for the ASA210C applications on the Central Processor (CP). The ASA210C assembler code is compiled into native code using the just-in-time principle. The central log handler administers error and event messages sent from processes on the CP to a log on the Adjunct Processor (AP). PES is described in chapter 3.

SUBSYSTEM MAS

The maintenance system in APZ 212 has been designed to meet very high operational reliability requirements. The Maintenance Subsystem (MAS) handles hardware faults and software faults in the Central Processor (CP). Hardware and software faults are handled with as little disturbance to traffic handling as possible. MAS also contains functions for Central Processor Test (CPT), forlopp management and support to other subsystems.

In APZ 212 40, the Maintenance Unit (MAU) is implemented partly in hardware (HW) and partly in PLEX software (SW).

In order to meet the very high demands for reliability, the CP is duplicated using the principle of "warm standby, hot on demand". Warm standby means that the standby side is ready to take over as the new executive side in case of a serious fault in the current executive side. Hot on demand refers to the Soft Side Switch (SSS).

The duplication means that at any given moment one of the CP sides is executive (EX) and the other CP side is standby (SB). The operational state of the duplicated CP is supervised by the MAU.

To protect AXE 10 from data corruption due to SW faults, and at the same time minimize the influence on the application, fault detection and recovery functions are introduced in PES and in MAS SW. After a fault has been detected, the least disturbing actions possible are used to resume program execution. MAS functions are described in chapter 5.

SUBSYSTEM DBS

DBS is a relational database management system for AXE applications.
DBS is based on the existing theory of relational database management systems. DBS consists of central software only. The DBS users contain compiler-generated code that interfaces with DBS, and DBS and its users interwork closely. The functionality that DBS provides is implemented purely in DBS, partly in DBS and partly in the compiler, or purely in the compiler. Extensions to the standard are made to support easy incorporation in AXE, and curtailments are made for functions that are of no use for real-time applications.

SUBSYSTEM RPS

The regional processor system RPS includes RPs with executive programs. RPS also includes signal terminals with executive programs and programs for signaling on links supporting several signaling systems. The subsystem also provides specific functions for loading, function change and maintenance. APZ 212 40/2 does not support RPB-P. Several new RP products are introduced together with APZ 212 40; the RPS is therefore described in chapter 6.

SUBSYSTEM RIS

The task of the Regional Internetworking Subsystem (RIS) is to handle data communication in AXE 10 when connecting a remote host. The remote host can be another AXE 10 or a system that implements the same protocols. RIS is part of the APZ and contains regional SW for protocol stacks and drivers. The protocol stacks can be used by any application running on the RP platform. Subsystem RIS can be coarsely divided into the following functions:

• Transmission Control Protocol / User Datagram Protocol / Internet Protocol.
This function implements the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP), and the Internet Protocol (IP). TCP and UDP are two standard implementations of the transport layer, that is protocol layer 4, in terms of the Open Systems Interconnection (OSI) standard model for data communication protocol stacks. TCP works in connection-oriented mode and UDP in connectionless mode. Correspondingly, IP is a standard implementation of the network layer, that is protocol layer 3 in OSI terminology. TCP and UDP work on top of IP.

• Frame Relay. This function implements Frame Relay (FR), which is a High Level Data Link Control (HDLC) based standard implementation of the data link layer, that is protocol layer 2, in terms of the OSI model. With FR, data transmission becomes connection oriented (at data link level) and packet switched.

• File System Interface. This function implements the File System Interface (FSI). FSI provides a general framework in which various POSIX compliant file systems can be installed and accessed from within the Regional Processor (RP) environment.

• Point-to-Point Protocol. This function implements the Point-to-Point Protocol (PPP). Basically, PPP is an HDLC based standard implementation of the data link layer, that is protocol layer 2, in terms of the OSI model. However, the present subsystem function PPP in subsystem RIS goes beyond this basic PPP concept.

• Trivial File Transfer Protocol. This function implements the Trivial File Transfer Protocol (TFTP), which is an application specific protocol for file transfer with minimal capacity requirements and reduced overhead. TFTP works on top of UDP/IP at the topmost level, that is protocol layer 7, in terms of the OSI model.

• Bootstrap Protocol.
This function implements the BOOTstrap Protocol (BOOTP), which carries out the tasks of IP address initialization, RP configuration file loading, and OS initialization. BOOTP also provides a configuration file read interface to other processes.

• Ethernet and GSI Drivers. The Ethernet driver is a means to initialize the RP hardware (HW) for Ethernet operation and to uphold bi-directional data transmission between the Ethernet and the interconnected RPs. This transmission passes through an Application Program Interface (API), which is also initialized by the Ethernet driver. The Group Switch Interface (GSI) driver initializes, configures and opens up channels on the group switch interface.

SUBSYSTEM ACS

The ACS subsystem offers application programs:

• A Microsoft Windows NT4 operating system environment with added functionality for communication with the AXE CP

• External communication protocols

• Support for handling errors and some features for integrating the AP into the APZ

• Tolerance to hardware faults and automatic recovery from software faults.

SUBSYSTEM OCS

The Open Communications Subsystem (OCS) is used to give CP and RPG applications in an AXE the ability to communicate with other hosts connected to a TCP/IP based network. OCS is also used for internal communication between the CP, AP and AXD 301. The Internet Protocol (IP) is used to connect computers, while the Transmission Control Protocol (TCP) uses IP to provide a reliable, stream oriented connection.

SUBSYSTEM AES

The Adjunct Computer External Access Subsystem (AES) contains functions that interact with other platforms using standard protocols. AES provides:

1) Generic Output Handler (GOH). GOH is a common output handler for files, directories and blocks for application programs.

2) File Transfer Protocol (FTP) configuration, which provides commands to create, remove and list FTP sites or virtual directories in a given site.
3) Secure Shell (SSH) for secure remote connection to the APG40 over a TCP/IP connection. The secure communication can be used both for command sessions and file transfers.

AES is a part of the Adjunct Processor Module (APM) and is always required.

SUBSYSTEM MCS

MCS provides the man-machine interface for operation and maintenance functions in the AXE 10. MCS handles alphanumeric information, alarms and authorization checks.

• Alphanumeric information, commands and printouts, is mostly (but not always) transferred to or from Alphanumeric Devices (AD-n) in an APG.

• By "alarms" is meant the administration of all alarms in the AXE 10 system.

• Authority checks of operators at logon from RSS-connected typewriters, and an authority check of Man Machine Language (MML) commands at execution, are provided. The Man Machine Language in command mode is a subset of the CCITT MML specification.

SUBSYSTEM FMS

The subsystem FMS provides the persistent file storage for AXE. The main functions are normal file operations such as create, delete and copy. FMS also contains an infinite file function. The FMS CP File System (CPF) is mainly concerned with managing the different file types (regular or infinite) supported by the FMS in APIO. The transfer of files to remote systems is managed by the Generic Output Handler.

A function in CPF implements infinite files. An infinite file is a composite file whose sub-files may be created automatically by the CP file system, hiding the fact that there is more than one file from the user.

SUBSYSTEM MPS

The Management Platform Subsystem (MPS) provides functionality for element management of exchanges via communication through an IOG or APG. The MPS software runs on a workstation running Windows or Unix. Several communication protocols are supported.
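The infinite file function described under FMS above lends itself to a small illustration. The sketch below is a conceptual model only, not the real CPF implementation; the class name, method names and the size limit are all invented for this example:

```python
class InfiniteFile:
    """Conceptual model of an FMS infinite file: a composite file whose
    sub-files are created automatically by the file system once a size
    limit is reached, while the user keeps writing to what looks like a
    single file. Names and the size limit are illustrative only."""

    def __init__(self, name, subfile_limit=3):
        self.name = name
        self.subfile_limit = subfile_limit   # records per sub-file
        self.subfiles = [[]]                 # list of sub-files (record lists)

    def write(self, record):
        # The file system, not the user, decides when a new sub-file starts
        if len(self.subfiles[-1]) >= self.subfile_limit:
            self.subfiles.append([])
        self.subfiles[-1].append(record)

    def read_all(self):
        # To the user it still looks like one continuous file
        return [r for sub in self.subfiles for r in sub]

f = InfiniteFile("TTFILE", subfile_limit=2)
for i in range(5):
    f.write(f"record-{i}")
assert len(f.subfiles) == 3          # sub-files were created automatically
assert len(f.read_all()) == 5        # but the content reads as one file
```

The point of the design is exactly what the asserts show: sub-file boundaries exist internally but are hidden from the reader of the file.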
The current release of MPS includes Win Fiol, CommandForm, AlarmTool, File Tool, ForloppTool, IOGTool, Hardware Inventory Tool, OPTool and the Fast Recovery Module, together with the FMS products ODManNT and SPManNT and the Element Management Resource Kit.

CHAPTER SUMMARY

• The APZ 212 40 is the next generation high capacity CP. It has 3 times the capacity of the APZ 212 30 and is based on a commercial microprocessor and operating system.

• PlexEngine replaces the microprograms that execute ASA code.

• A new ‘warm stand-by, hot on demand’ concept is employed in the CP. Duplicated CPs do not run in parallel as in classic APZs.

• The RPH design from APZ 212 30 is mainly reused.

• The hardware of the APZ 212 40 CPUM is completely different from that of previous APZs.

• APZ 212 40 supports IPN (Inter Platform Network).

• The APZ 212 subsystems are further divided into sets of parts and blocks implementing all APZ related functions.

2 APZ 212 40 Hardware Structure

OBJECTIVES

After completing this chapter, the students should have knowledge of:

• The layout of the APZ 212 40 Cabinet

• The hardware structure of the CP Magazine in the classic APZ and the APZ 212 40

• The hardware structure of the RP Handler Magazine in APZ 212 40

• The Inter Platform Network in the APZ 212 40

• The functions of the power boards and fans in the APZ 212 40

• Changes in the CDU panel

Figure 2-12 Objectives

INTRODUCTION

The APZ 212 40 hardware resides in the CP and the RPs. The RPs are treated under RPS in chapter 1. The focus in this chapter is on the CP hardware and the hardware control software implemented in the CPHW subsystem.
The following blocks and products in CPHW are implemented in hardware, firmware and/or software:

• Cabinet

• CPU Magazine, containing the
  o CPBB
  o UPBB
  o RPHMI
  o CPHW-FW

• External and Internal Interfaces

• CDU

• FAN

• RPHM, containing the
  o RPIO
  o RBPIB
  o IPNX

The main tasks of the CPHW subsystem are:

• Provide a platform on which the PLEX Engine subsystem (PES) executes the AXE application, including the APZ operating system. The program execution interface is based on a well-defined set of instructions defined by the processor provider.

• Provide support functions for the PLEX Engine subsystem (PES) and the maintenance subsystem (MAS) to create a central processor that fulfils the telecom requirements.

• Provide physical interfaces towards IPN and the regional hardware part of RPS.

APZ 212 40 CABINET LAYOUT

The APZ 212 40 is BYB 501 compatible like the APZ 212 30/33, but its magazines are housed differently from those of the APZ 212 30/33. The CPU hardware occupies one shelf in the cabinet, which holds both CP magazines, one for CP side A and one for CP side B. An external RPH magazine is used. There are two RPH magazines (RPHM), again one for RPHM-A and one for RPHM-B. In the APZ 212 40 the RPHMs are stacked above the CPU, since the CPU becomes hot and pre-heating of the CPU cooling air must be avoided. There is also a Central Display Unit (CDU), which indicates the states of both CPs. The figure below displays the APZ 212 40 cabinet layout.
• Equipment practice BYB 501

• 2 CP sides/shelf

• External RPH magazine must be used (same RP bus interface boards as in APZ 212 30/33)

• Footprint: 600 x 400 mm

• Height: 1800 mm

Figure 2-13 APZ 212 40 Cabinet layout

THE CPU MAGAZINE

As mentioned in chapter 1, many functions and concepts in the APZ 212 40 are ported from the classic APZ. The CPU is the major area of change in the APZ 212 40. This is best described by first taking a look at the CPU as it exists in the older APZs.

THE CPU MAGAZINE IN THE CLASSIC APZS

In the APZ 212 20, the CP is implemented in a number of magazines housed across two cabinets. The Signaling Processor functionality is implemented on one SPU board and the IPU functionality is implemented on two IPU boards, IPU 1 and IPU 2. The Program Store is physically implemented on one printed circuit board called Storage Unit Program (STUP), and the Data and Reference Stores are found together on Storage Unit Data (STUD) boards. The MAU, which is actually part of the Maintenance subsystem and supervises the CPs, is found in CPUM-B. The SPU and IPU are connected to the MAU via the Manual Intervention Allowed (MIA) interface. The MAU's hardware is implemented in the Maintenance Unit Magazine (MAUM).

When the CPU is in normal operation, CP-A and CP-B are identical in software in the APZ 212 20. For updating and matching purposes, the SPU/IPU is interconnected to the corresponding SPU/IPU in the second CP side using the Updating and Matching Bus (UMB).

In the APZ 212 30, the two CP sides are again completely identical in software during normal operation. There is a UMB with the same functionality as in the APZ 212 20. The SPU and IPU are found on the CPU, but the SPU is sub-divided into two processors, an SPU master and an SPU slave. The layout of the boards in one CPU magazine can be seen below; the IPU and SPU now have one board each.
The MIA is also featured on each board to interface to the MAU. The Program and Reference Stores have now been allocated together on the IPU board. The Data Store is stored by itself on DRAM and SRAM boards: DRAM has a high storage capacity, while SRAM allows faster access to the data. This is a feature that did not exist previously.

The two CP sides have no cables connected to the front, which means that all buses are connected at the back of the CP sides. For instance, the Central Processor Test Bus (CTB) is connected to the backplane. This is a fundamental difference in comparison with the APZ 212 20 design.

Figure 2-14 APZ 212 30 CPU boards and MAU board

The MAU board in APZ 212 30 combines the functionality of the former MAU Magazine (MAUM) of APZ 212 20. The MAU board is again located in CP-B. There is also a micro instruction trace functionality implemented on a BRU board. However, the BRU board is normally not inserted in the CP magazine during normal operation.

In the APZ 212 33, the CPUM again contains the IPU, SPU and POWC boards, including MAI. The memory is laid out the same as in the APZ 212 30, with PS and RS found on the IPU board and the DS distributed over a maximum of 8 SRAM and DRAM boards called Data Store Units (DSUs). The front view of the CPU boards in the CPUM is exactly the same as that in APZ 212 30, as seen in the previous figure. The main difference between the APZ 212 30 and the APZ 212 33 is capacity. The IPU is based on the same principles as in the APZ 212 30, but with some support for faster instruction execution.
The SPU still has a master and a slave unit. The maintenance philosophy is retained from the APZ 212 30: the MAU is still located in the CP magazine on the B-side (CP-B), with MAI still the interface between the MAU and the CP. The UMB also matches the data between the two CP sides.

THE CPU IN APZ 212 40

In the APZ 212 40 we are now dealing with a commercial processor, not one designed exclusively by Ericsson. This has brought with it a lot of changes in the CP hardware. One CPUM still houses one CP side; however, each CP side consists of two commercial processors. One operates as the IPU and the other as the SPU, which look after program execution. Each CPU magazine comprises the following boards:

• One CPU board, which has the IPU, the SPU and the stores (PS, RS and DS).

• One Base I/O board, which includes an access port to obtain low level system error information from the CPU board. This board does not provide any application related function.

• An RPHMI (Regional Processor Handler Magazine Interface) board, which has cable connections to the RPHMs and to the RPHMI in the other CP side.

• A UPBB (Update Processor Bus Board), which has an Ethernet optical fibre connection to the other CP side for fast updating ("hotting up"). This board also has connections to the I/O system, to the CDU (Central Display Unit) panel at the top of the cabinet and to the CPU Magazine fan for monitoring and controlling the fans.

• A DC Power Module. The power module converts the externally distributed -48 volts to the power levels used within the CPBB, including the RPHMI and the UPBB board. The power distribution is done via two cables, and both of them have to be connected for the system to work properly.

Figure 2-15 The CPU Modules.
The figure above shows how the CP subrack is laid out. In the CPUM above, Ericsson has developed both the UPBB and the RPHMI; a third party company has developed the other CPU boards. The RPHMI acts as an interface to the RPHM. It also contains the old MAU (Maintenance Unit) functionality, which is now implemented in software and not on a dedicated board as in older APZs. Control of the CP states is now implemented on the RPHMI board.

A high speed updating bus (1 Gbit Ethernet) connects the UPBBs in both CP sides. This bus is responsible for carrying out the 'hotting up' of CP-B when a side switch is to be performed. Because the CPs do not run in parallel synchronous mode, the data in the B side must be made identical to that in the A side if it is to take over. The UPBB also allows for communication with the Adjunct Processor (AP) via an Ethernet switch. In the following figure, it can be seen that a cross connection exists between the CP and the IO system; this connection goes from the UPBBs in both CPUs to the APG 40.

Figure 2-16 Communication Links from CPs to APs and RPHMs

In the following figure, the layout of both the UPBB and the RPHMI can be seen. The figure shows the cables and buses connecting each board to other parts of the APZ. The MIA LED is a lamp that illuminates to indicate whether manual intervention is allowed in case of a fault.
Figure 2-17 Front view of UPBB and RPHMI

The BASE IO board provides ports for low level tests and access for authorized users: a 10/100 Mbit/s Ethernet port, the MIA and yellow LEDs, a BSC port for Manageability Subsystem access, the Com 1 port for SRM console access and the Com 2 port for kernel debugger access.

Figure 2-18 Base IO board

CP Memory

The memory in APZ 212 40 is pre-configured from the factory. The maximum configuration is a total of 8 GW16. Two configurations are available, with the following sizes of total memory:

• Medium: 4 GW16

• Large: 8 GW16

It is not possible to extend memory on the CPU board. Memory extensions are made by a complete replacement of the CPU magazine. The Program Store (PS), Reference Store (RS) and Data Store (DS) memory are allocated on the CPU board. No special memory board is required for the RS, PS and DS memory; the respective sizes are set via a configuration file.

Figure 2-19 CPU

APZ 212 40 INTERFACES

Several interfaces are implemented in APZ 212 40 for external or internal communication. These are of two different types:

• Physical, i.e.
interfaces between physical units. These interfaces are normally defined from level 0 in the Open Systems Interconnection model.

• Application Program Interface, API

PHYSICAL EXTERNAL INTERFACES

Serial Regional Processor Bus, RPB-S

The RPB-S implements the interface between the CPHW subsystem and the regional parts (RPs) of the regional processor subsystem, RPS. One RPB-S is active at a time (carrying RP signals); the second is used for redundancy. The RPH, together with the RPHMI, handles the protocol on the RPB-S. The RP is the secondary station on the RPB-S and is allowed to transmit in response to a poll command from the RPH.

Figure 2-20 Serial Regional Processor Bus Interface

IPN signaling interface

The CPHW subsystem provides support for IO signaling and CPT signaling over TCP/IP. Signaling over the IOB (IPN signaling) is terminated within the commercial operating system on the CPBB, while signaling over the PTB (CPT signaling) is terminated in the onboard firmware on the UPBB board.

Figure 2-21 IPN signaling interface

The Ethernet based Inter Platform Network (IPN) brings an industry standard, high capacity interface into the AXE, and the APZ 212 40 is prepared for it. IPN provides faster connections in the CP and faster connections to the I/O system using high speed Ethernet. The Ethernet switch (also referred to as the IPNX switch) is housed in the RPHM. The communication towards the AP system goes over a 100 Mbit Ethernet cross-connection via this Ethernet switch. As with the later APZs, in APZ 212 40 this fast connection gives better performance for dump, reload and I/O operations. As an example, the reload time from hard disk is decreased by at least 90%. IPN allows for two APG 40 configurations. The Ethernet switch connections in APZ 212 40 are shown in the figure below.
Figure 2-22 Ethernet switch in APZ 212 40 environment

Each CPU magazine is connected to both switches (one in each RPHM) for redundancy reasons. The APG 40 is also connected to both switches. The following diagrams illustrate the IPN Ethernet connection links in the APZ 212 40, going from the UPBB board in the CP to the APG.

Figure 2-23 Ethernet Connection Links

In the APZ 212 33, IPN replaced the 10 Mbit STOC (Signaling Terminal Open Communication), which had been used as an internal Ethernet for the communication between the AP and the CP. However, in order to use IPN, an APG 40 is always required. It is possible to have an IPN connection from each CP (via the IPNX) to two or more APGs. Both IPNX switches have connections to both CPs. The fast 1 Gbit Ethernet connection between the CPs goes between the UPBBs on each side. The IPNX uses a slot in the RPHM but has its own power connection, which improves IPN availability. The figure below shows a summary of the IPN and RPHMI connections in the CP.

Figure 2-24 IPN and RPHMI connections

MAJOR PHYSICAL INTERNAL INTERFACES

The internal interfaces are mostly implemented as communication channels sharing the capacity of a physical bus. Below, several such communication channels (logical buses) are presented.
Processor Test Bus, PTB

The PTB, used for CPT communication, is physically implemented as a virtual channel within the IO bus. This channel is used for maintenance communication with the CP core functionality. The communication is handled by a TCP/IP protocol over a 10 Mbit per second Ethernet channel.

I/O communication bus, IOB

The IOB is implemented as an Ethernet bus with a bit rate of 100 Mbit per second. There are up to five physical individual ports on each CP side. Every port has its own IP and MAC address.

Compact PCI, cPCI

The cPCI buses are used to interconnect units housed in the magazine with the cPCI backplane. Among others, the system contains the following cPCI buses:

- Between PIM and RPHMI.
- Between PIM and UPBB.

The PCI Interface Module (PIM) connects the CPU backplane and the cPCI backplane. The figure below shows the cPCI backplane.

Figure 2-25 cPCI backplane. Top View

Regional processor handler bus, RPHB

The RPHB is the communication channel between the RPHMI and the RPH on the own CP side (the own RPHM) and/or the twin CP side (the twin RPHM). It is used to transfer signals between the RPHMI on one side and the RPH on the other.

Test Access Bus, TAB

The TAB is a connection between the RPHMI board and the RPHM, own and/or twin. The TAB is implemented as a JTAG chain. It is used to read, write or clear (if possible) out-of-band registers on the boards in the RPHM:

• Error registers

• ASIC product identification and revisions

• Fault injection registers

• RP bus cross connect register

RPH Maintenance bus, RMB

The RMB is a connection between an RPHMI board and its own RPHM. The RMB is used both to control the MIA LEDs on all boards in the RPHM and to read the board identification EEPROMs of the boards in the RPHM.
The RMB also has functions for controlling and reporting the power status of the RPHM. The RMB is implemented in the same physical cable as the TAB bus to the own RPHM side.

Working state bus, WSB

The Working State Bus (WSB) is used to synchronize working state information on both CP sides and to transfer side locating error signals. The WSB cable is used to define the CP side, A or B.

UPDATE CHANNEL

The update channel is implemented as a fibre based Ethernet bus with a bit rate of 1 Gbit per second. The update channel is used to transfer data from the CP EX side to the CP SB side, for example at backup and soft side switch. The communication is handled by an Ericsson developed device driver, which takes care of the transfer of raw data.

CDU communication channel

The CDU communication channel is physically an I2C bus that is used to update the CDU panel with, for example, current working state, RPH state and manual intervention allowed.

FAN communication channel

This interface is used by the UPBB and the maintenance subsystem to observe the behavior of the FAN packages.

FANS IN APZ 212 40

The fan subracks are used to force an airflow through the system to protect it from overheating. The fans are reused from the APZ 212 30. The RPHMs are stacked above the CPU to improve the cooling of the CPU by avoiding preheating of the CPU cooling air. The figure below shows the FAN positions in the cabinet.

Figure 2-26 FAN and CDU position

The FAN is implemented as one subrack containing three plug-in fan units. Each plug-in fan unit consists of two fans and a control circuit. The control relies on the CPU and is performed through the FAN communication channel.
The FANs are physically connected in a loop, as shown in the figure below.

Figure 2-27 FAN connections

THE CDU PANEL

On top of the cabinet there is still a display unit that indicates the state of the processor. It indicates the state of both CPUs, but also the state of the RPHMs. Normal CP working state is indicated on the panel. The abbreviations used on the panel are:

CDU   Central Display Unit
CP    Central Processor
CPUM  Central Processor Unit Magazine
RPH   Regional Processor Handler
RPHM  RPH Magazine
EX    Executive CP side
NRM   Normal state, stand-by CP side
SE    Separated state, stand-by CP side
HA    Halted state, stand-by CP side
UP    Updating state, stand-by CP side

Figure 2-28 The CDU panel in APZ 212 40

The CDU provides information about the CP states, MIA and RPH control information. In addition, the three digit display shows what the CP is doing by displaying the code of the current action. The explanation of the codes can be found in the document "CP Display Unit, Handling" (ALEX). A sample of the codes, shown during loading of the CP:

CDU Value   Activity in progress
300         Memory activities
310         Check of Program Store (PS) checksum
311         Check of Reference Store (RS) checksum
320         Loading of small Data Store (DS) dump, from Main Store (MS)

CPHW FUNCTIONS

In addition to the program execution platform for the PLEX Engine subsystem, the CPHW provides:

• Service functions for the Maintenance Subsystem (MAS)

• Application Interfaces

• IO signal handling.

CPHW SERVICE FUNCTIONS FOR MAS

Program handling check, PHC (watchdog)

The PHC, Program Handling Circuit, is the system watchdog. It is used to supervise the execution of APT and APZ software, and it provides the means to recover the system from faults in their execution.
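The PHC itself is realized in hardware and firmware, but the watchdog principle it implements is generic: the supervised software must periodically "trigger" the watchdog, and a missed deadline is treated as a stalled execution. A minimal software sketch of the idea (illustrative only; the class, its methods and the timeout value are invented, not the real PHC interface):

```python
import time

class Watchdog:
    """Minimal software watchdog: the supervised program must call kick()
    within `timeout` seconds, otherwise expired() reports a stall so that
    a recovery action can be triggered."""

    def __init__(self, timeout, now=time.monotonic):
        self.timeout = timeout
        self._now = now              # injectable clock, handy for testing
        self._last_kick = now()

    def kick(self):
        # Called periodically by the supervised software
        self._last_kick = self._now()

    def expired(self):
        # Checked by the supervisor: True means execution appears stalled
        return self._now() - self._last_kick > self.timeout

# Simulated run with a fake clock instead of real time:
t = [0.0]
wd = Watchdog(timeout=1.0, now=lambda: t[0])
t[0] = 0.5; wd.kick()        # program kicks on time
t[0] = 1.2
assert not wd.expired()      # only 0.7 s since the last kick
t[0] = 2.0
assert wd.expired()          # 1.5 s without a kick: recovery needed
```

The injected clock is only a testing convenience; a real supervisor would use the hardware timer directly.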
APZ and the APZ VM are responsible for triggering the PHC.

CP working state handling

The CP working state function is used to determine the state of the two CP sides within the CP and to ensure that only one CP side is EX at any time. The working state logic is implemented in the RPHMI on both CP sides; however, it is only active on the CP B side. The CP side is determined by the Working State Bus (WSB) cable, which has different encoding of the CP side pins on the two sides. The working state logic on the two CP sides is synchronized over the WSB.

Central Processor Test, CPT

The CPHW provides the CPT interface and the underlying functions used by the CPT. Central Processor Test (CPT) commands may be used during normal operation as well as when it is not possible to reach the CP. The CPT is reached by entering the command PTCOI. It is also used when the CP is down and the following message is displayed when trying to establish an MML session:

CP not obtainable
You may now enter: APLOC, PTCOI, PTCOE, or EXIT command.

Hardware fault handling

The system has a number of mechanisms to detect suspect behavior of the hardware. Some faults lead to restarts with a switch of CP side, while others result only in an alarm on the alarm console.

Soft side switch support functions

The RPHMI provides support for SSS: all RP table copies, signal buffers and internal data on the RPHMI must be updated before RP signaling can be resumed on the new EX side at the point where it was stopped on the old EX side.

APPLICATION INTERFACES, API

Application interfaces (API) are used to provide the PLEX Engine with a standardized interface to the underlying implementations.
The APIs define a layer consisting of two major groups, each functionally divided into subgroups depending on the functionality represented:

• Hardware Abstraction Layer, HAL
• Operating System Interface, OSI

These two layers are intended to make the APZ less dependent on a particular microprocessor architecture or operating system. The OS API isolates the user from the commercial operating system, and the HAL isolates the user from the hardware. The reason for this is that if the operating system or hardware platform is replaced, the virtual machine and the layers above it can remain more or less intact. The figure below shows the relationship between the HAL and OSI layers and the APZ VM.

Figure 2-29 HAL and OSI Overview

IO SIGNAL HANDLING

The CP hardware provides a number of IO interfaces towards the regional processors and the IO systems.

• Open communication signaling. Signaling towards the IPN and the APG40 is handled in the CPHW subsystem with a standard TCP/IP protocol stack implemented in the Unix operating system. The PES subsystem and OCS handle all levels above TCP/IP.

• RP signal handling. The CPHW subsystem supports up to 32 serial regional processor buses, RPB-S. The parallel RP bus is not supported. All functions for maintenance of the RP buses are the same as for APZ 212 33. RP signal protocol handling is done by the RPHMI and the RPH.

• CPT signaling. Signaling towards the APG40 and the CPT software (belonging to the MAS subsystem) executing on the APG40 platform is handled in the CPHW subsystem by the firmware executing on the UPBB board. The communication protocol between the APG40 and the CP is based on CPT signals packed in TCP/IP frames.
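The isolation provided by the HAL and OS API can be sketched in a few lines of Python. This is a hypothetical illustration (the names OSInterface, Tru64OSI and vm_timestamp are invented; the real interfaces are far richer): the virtual machine depends only on an abstract interface, so the OS binding underneath can be replaced without touching the VM code.

```python
from abc import ABC, abstractmethod
import time

class OSInterface(ABC):
    """Hypothetical OS API layer: the VM sees only this interface."""
    @abstractmethod
    def current_time_ms(self) -> int: ...

class Tru64OSI(OSInterface):
    """Binding to the 'real' operating system."""
    def current_time_ms(self) -> int:
        return int(time.time() * 1000)

class FakeOSI(OSInterface):
    """A replacement OS binding; the VM code below is untouched."""
    def current_time_ms(self) -> int:
        return 42

def vm_timestamp(osi: OSInterface) -> int:
    # The virtual machine depends only on the abstract interface,
    # never on a concrete OS or hardware implementation.
    return osi.current_time_ms()

assert vm_timestamp(FakeOSI()) == 42
assert vm_timestamp(Tru64OSI()) > 0
```

Swapping Tru64OSI for another binding changes nothing above the interface, which is exactly the portability argument made in the text.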
REGIONAL PROCESSOR HANDLER MAGAZINE

In APZ 212 40, the RPHs are located in a magazine of their own and handle signal transfer between the CPS and RPS subsystems. The main functions of the RPHs are to receive RP signals from the CPU (SPU), and to reformat and transmit each signal to the addressed RP according to the RP bus protocol. The RPH also scans and polls the connected RPs to receive a signal when a scanned or polled RP has a signal to send, and forwards this signal to the CPU (SPU).

Regional Processor Handler in classic APZ 212

In the classic APZ 212, both parallel and serial RP buses are used for communication between the CP and the RPs. When parallel buses are used, the bus interwork between CP and RP is totally controlled by the two CP sides. As can be seen in the figure below, duplicated buses run from each RP to the CPs. Serial RPs use a different signalling algorithm than the parallel ones. During normal operation, CP signals to the RPs are sent on one bus branch called the active branch. The CP side connected to this branch then distributes the RP signals to its twin side over a cross-connection, as seen in the following figure.

Figure 2-30 Classic APZ 212 CP-RP Communication (Serial RP Bus)

There is also a power board for power supply and a Regional Processor Input Output (RPIO) board, which connects the RP buses to the SPU in the CPU.

The Regional Processor Magazine in APZ 212 40

In the APZ 212 40, the same RPHM design is used, with some small modifications to allow for the new features in the APZ 212 40 structure; support for the parallel bus topology has been omitted.

Figure 2-31 RPHM in APZ 212 40

An RPHM comprises:

• An RPH Input and Output board (RPIO2). This board is also referred to as the RPH Interface board.
• Up to eight Serial RP Bus Interface boards (RPBI-S), with four serial RPBs connected to each RPBI-S.

• One Inter Platform Network Ethernet switch board (IPNX) for connecting to the APG 40 and future I/O systems.

• A power board (POU-R).

In APZ 212 40, serial and Ethernet RP bus interfaces are used. There can be 32 RPB bus branches in total. The RPIO2 board is still the first board on the left-hand side of the magazine.

The IPNX board, which provides the Inter Platform Network function, is located in the same slot as in the APZ 212 33 (next to the power board).

As in previous models of APZ, an RP Handler Bus (RPHB) runs between the SPU in the CPU and the RPHM. Contrary to previously released APZs, where one RPHB connected the SPU in CP-A to RPHM-A and another RPHB connected CP-B to RPHM-B, in APZ 212 40 RPHBs run from both RPHMs to both CP sides. This means that CP side A and CP side B can control either RPHM; the two RPHMs are therefore cross-connected in the RPHMI board. The RPIO2 board has been modified from that used in previous APZs in order to support the cross-connected RPHMs. The following figure shows this cross-connection.

Figure 2-32 Cross connection between CPU sides and RPHMs

The cross-connected RPHMs are introduced to increase redundancy. The executive CP side controls both RPHMs; one of them is "active" and captures the jobs from the RPs. If something happens to the active RPHM (the one currently carrying the traffic), the other RPHM takes over.
The RPHMs can also be separated and allocated to their own CP sides. The states of the RPHMs are displayed on the CDU panel in order to avoid accidents, e.g. disconnecting the active RP bus. Replacement of hardware in the RPHMs is done at board level.

The RPB control logic is implemented in the RPBI-S, and the allocated bandwidth per RPB is configurable, as shown in the figure below.

<DPRBP;
RP BUS BRANCH INDEX
LRPBI PRPBI ADDRESS SUIT    RESERVED BW  ALLOCATED BW  TIME SLOTS
 0     0    RP-0&&-31       0             1280         1
 1     1    RP-32&&-63      0            10240         8
 2     2    RP-64&&-95      0            10240         8
 3     3    RP-96&&-127     0            10240         8
 4     4    RP-128&&-159    0            10240         8
 5     5    RP-160&&-191    0            10240         8
 6     6    RP-192&&-223    0            10240         8
 7     7    RP-224&&-255    0            10240         8
 8     8    RP-256&&-287    0            10240         8
 9     9    RP-288&&-319    0            10240         8
10    10    RP-320&&-351    0            10240         8
11    11    RP-352&&-383    0            10240         8
12    12    RP-384&&-415    0            10240         8
13    13    RP-416&&-447    0            10240         8
14    14    RP-448&&-479    0             1280         1
15    15    RP-480&&-511    0             1280         1
16    16    RP-512&&-543    0             1280         1
17    17    RP-544&&-575    0             1280         1
18    18    RP-576&&-607    0             1280         1
19    19    RP-608&&-639    0             1280         1
20    20    RP-640&&-671    0             1280         1
21    21    RP-672&&-703    0             1280         1
22    22    RP-704&&-735    0             1280         1
23    23    RP-736&&-767    0             1280         1
24    24    RP-768&&-799    0             1280         1
25    25    RP-800&&-831    0             1280         1
26    26    RP-832&&-863    0             1280         1
27    27    RP-864&&-895    0             1280         1
28    28    RP-896&&-927    0             1280         1
29    29    RP-928&&-959    0             1280         1
30    30    RP-960&&-991    0             1280         1
31    31    RP-992&&-1023   0             1280         1
END

Figure 2-33 Example of RPB Indexes

HARDWARE IDENTITIES IN CP

The CP hardware unit identities are accessible by both CPT and application commands, as shown in the figure below. The figure shows a part of the real printout.
cpt<PTHIP;
CPT MESSAGE
HARDWARE IDENTITY
POS MAG     PCB       PRODUCT NUMBER    REV    SERIAL NUMBER
  4 CPU     UPBB-A    ROJ 226 0005/3    R1A    TF10860478
  8 CPU     RPHMI-A   ROJ 226 0006/3    R1A/A  TF10878814
    CPU     MDC-A     54-30408-02.C05          AY13405003
    CPU     PIM-A     54-30406-01.B03          AY12215017
  4 CPU     POWER-A   30-56487-01       E03    C215100018
 12 CPU     BASEIO-A  54-30396-03.D03          AY12901700
 20 CPU     CPUB-A    54-30520-01.B3           SW13400436
    CPU     CDU-A     ‰                        SW11900004
    CPU     MRC-A-3   54-30460-02.B02
    CPU     DIMM-A-0
 47 CPU     UPBB-B    ROJ 226 0005/3    R1A    TF10860454
 51 CPU     RPHMI-B   ROJ 226 0006/3    R1A/A  TF10878818
    CPU     MDC-B     54-30408-02.C05          AY13404988
    CPU     PIM-B     54-30406-01.B03          AY12402882
 47 CPU     POWER-B   30-50898-01.E01   E01    C311400401
 55 CPU     BASEIO-B  54-30396-03.D03          AY12901663
 64 CPU     CPUB-B    54-30520-01.B3           SW14800016
    CPU     CDU-B     ‰                        SW11900011
    CPU     MRC-B-0   54-30460-02.B02          MR20160024
    CPU     DIMM-B-0  20-00FBA-09
  2 RPH-A   RPIO      ROJ 207 129/1     P1D    TF10649945
 10 RPH-A   RPIRS-0   ROJ 207 030/2     R1B    T22AET9656
 14 RPH-A   RPIRS-1   ROJ 207 030/2     R1B    T22AEH4340
 18 RPH-A   RPIRS-2   ROJ 207 030/2     R1B    T22ACZ4259
 74 RPH-A   IPNX      ROJ 207 503/1     R1A    TF10624536
 78 RPH-A   POUR      ROJ 207 032/1     R2D    S971201559
  2 RPH-B   RPIO      ROJ 207 129/1     P1D    TF10649942
 10 RPH-B   RPIRS-0   ROJ 207 030/2     R1B    T22AET9930
 14 RPH-B   RPIRS-1   ROJ 207 030/2     R1B    T22AEH4633
 18 RPH-B   RPIRS-2   ROJ 207 030/2     R1B    T22ACZ3657
 74 RPH-B   IPNX      ROJ 207 503/1     R1A    TF10624542
 78 RPH-B   POUR      ROJ 207 032/1     R2D    S971202613
  1 FANC-UP FAN-0     ‰
 29 FANC-UP FAN-1     ‰
 57 FANC-UP FAN-2     ‰
  1 FANC-LO FAN-0     ‰
 29 FANC-LO FAN-1     ‰
 57 FANC-LO FAN-2     ‰
END

Figure 2-34 CPU Hardware Identities, CPT Print command

Positions without a number in the figure above denote PCBs that are not plug-in units; they might be ASICs on the existing boards. The figure below shows similar data printed with DPHIP.
<DPHIP;
CP HARDWARE IDENTITY
POS MAG     PCB       PRODUCT NUMBER    REV    SERIAL NUMBER
  0 CPU     CDU-A
  0 CPU     CDU-B
  2 RPH-A   RPIO      ROJ 207 129/1     P1D    TF10649945
 10 RPH-A   RPIRS-0   ROJ 207 030/2     R1B    T22AET9656
 14 RPH-A   RPIRS-1   ROJ 207 030/2     R1B    T22AEH4340
 18 RPH-A   RPIRS-2   ROJ 207 030/2     R1B    T22ACZ4259
 74 RPH-A   IPNX      ROJ 207 503/1     R1A    TF10624536
 78 RPH-A   POUR      ROJ 207 032/1     R2D    S971201559
  2 RPH-B   RPIO      ROJ 207 129/1     P1D    TF10649942
 10 RPH-B   RPIRS-0   ROJ 207 030/2     R1B    T22AET9930
 74 RPH-B   IPNX      ROJ 207 503/1     R1A    TF10624542
 78 RPH-B   POUR      ROJ 207 032/1     R2D    S971202613
  1 FANC-UP FAN-0
 29 FANC-UP FAN-1
 57 FANC-UP FAN-2
  4 CPU     UPBB-A    ROJ 226 0005/3    R1A    TF10860478
 47 CPU     UPBB-B    ROJ 226 0005/3    R1A    TF10860454
  8 CPU     RPHMI-A   ROJ 226 0006/3    R1A/A  TF10878814
 51 CPU     RPHMI-B   ROJ 226 0006/3    R1A/A  TF10878818
 20 CPU     CPUB-A    54-30520-01.B3           SW13400436
 64 CPU     CPUB-B    54-30520-01.B3           SW14800016
 12 CPU     BASEIO-A  54-30396-03.D03          AY12901700
 55 CPU     BASEIO-B  54-30396-03.D03          AY12901663
  4 CPU     POWER-A   30-56487-01       E03    C215100018
 47 CPU     POWER-B   30-50898-01.E01   E01    C311400401
  1 FANC-LO FAN-0
 29 FANC-LO FAN-1
 57 FANC-LO FAN-2
END

Figure 2-35 CPU Hardware Identities, Application Sw Print command

CHAPTER SUMMARY

• The APZ 212 40 is BYB 501 compatible. It contains two CP magazines and two RP magazines.

• The APZ 212 40 introduces a new CPUM with processors of both Ericsson and third-party design.

• The RPHM design is re-used from APZ 212 30.

• One CPUM consists of a CPU board, a Base I/O board, an RPHMI board and a UPBB board. Each CPU also has a DC power board.

• Each RPHM comprises an RPH I/O board, up to eight serial RP bus interface boards, one IPNX switch board and a power board.
• The APZ 212 40 uses an Ethernet switch (IPNX), which allows for faster connections from the CP to the I/O system.

• In the APZ 212 40, the CDU panel has changed to allow for the new CP states that have been introduced.

3 Software Structure

Objectives

After completing this chapter, students will be able to describe:

• The key differences in the software structure of APZ 212 40 and older APZs
• How the ASA compiler operates within the PLEX Engine
• The functionality and components of the APZ VM and the commercial operating system featured in the APZ 212 40
• The functions of the HAL and the OS API
• The record-oriented memory architecture
• Dumping and loading in APZ 212 40

Figure 3-36 Chapter Objectives

INTRODUCTION

The APZ 212 40 is built on a modern commercial microprocessor platform. The applications running on classical APZs are supported by adapting the application software to the new processor structure. For this, a "middleware platform" has been developed that implements a virtual APZ machine. Through this solution the application software is ported with almost no changes. The APZ virtual machine uses the software structure as it was made for the classical APZ and compiles it to native code for the commercial microprocessor.

Figure 3-37 Layers in APZ 212 40

The ASA code is not pre-compiled to native code; this happens at run time. The APZ VM, the compiler and the APIs (HAL and OSI) form the PLEX Engine, defined in the subsystem PES. Compared to the classical APZs, the old MIP is replaced by the PLEX Engine, which consists of the following parts:

• An APZ Virtual Machine (APZ VM) that is responsible for the execution of ASA code.
• An online ASA Compiler that works in two compilation modes, basic and optimized.

• A Hardware Abstraction Layer (HAL) and an Operating System Application Interface (OS API); these can be seen as two thin layers between the OS and the APZ VM.

• A commercial operating system.

Figure 3-38 New "MIP" replacement in APZ 212 40

This chapter describes the structure of the application software and the program control residing in the PLEX Engine and the commercial microprocessor.

APPLICATION SOFTWARE STRUCTURE

The logical structure of APZ 212 is designed with the aim of upholding functional modularity also in the implementation of the function blocks, in order to simplify the installation, operation and maintenance of exchanges. A function block is normally implemented as a combination of hardware, central software and regional software. The hardware in a typical function block consists of a number of identical units, e.g. telephony trunks. The direct control of these units is executed by the Regional Software Units (RSU), while the more complex administrative functions in the block are implemented in the Central Software Unit (CSU). The figure below shows the realization of the software units.

Figure 3-39 Realization unit in function block

The software in a function block can be divided into program logic and data. As a result of the design used in the system structure, the program logic in a function block can only access data belonging to its own block. All interwork between different function blocks is carried out by means of transmitting strictly defined messages, i.e. signals. This rule naturally also applies to the interwork between the central and the regional software units within a block.
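The encapsulation rule described above can be sketched in Python. This is a hypothetical toy model (FunctionBlock, send and the signal names are invented, not actual AXE code): each block keeps its data private, and all interwork happens through strictly defined signals.

```python
class FunctionBlock:
    """Toy model of a function block: private data, interwork only via signals."""
    def __init__(self, name):
        self.name = name
        self._data = {}  # block-local data: no other block may access it directly

    def receive(self, signal, payload):
        # Each signal has a strictly defined meaning for this block.
        if signal == "SET":
            key, value = payload
            self._data[key] = value
        elif signal == "GET":
            return self._data.get(payload)
        else:
            raise ValueError(f"undefined signal {signal!r}")

def send(receiver, signal, payload):
    """All inter-block interwork goes through signal transmission."""
    return receiver.receive(signal, payload)

block_a = FunctionBlock("A")
block_b = FunctionBlock("B")
send(block_b, "SET", ("state", "busy"))
assert send(block_b, "GET", "state") == "busy"
```

Block A never touches block B's `_data` directly; it can only ask via a defined signal, which is the modularity property the text describes.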
Data in a function block can be either temporary, e.g. state information for hardware units, or of a more permanent type, e.g. information on the surrounding switching network. Since the maintenance of permanent data requires extensive and often complex programs, only temporary data can be stored in an RSU.

Interwork between blocks entails connections. To minimize the number of connections, all interwork between blocks is normally placed in the CSUs. Interwork between regional units can also occur within an RP; in addition, application data can be sent between RPs. The internal structure of a CP software unit is shown in the figure below.

Figure 3-40 Software unit structure

Data belonging to a software unit can either be common to all controlled hardware units or individual to each unit, i.e. the data are stored in a series of data records, each corresponding to a certain hardware unit. The program executes its function in the same manner for each hardware unit and addresses the corresponding data record by means of a pointer. This structure also applies to function blocks that do not contain any hardware. The division of data into common data and a series of identical data records is a solution that is well suited to most of the functions used in telecommunication systems. Several different data record series can also be defined in a software unit.

A corresponding RSU is stored in each RP to which hardware units of a certain type are to be connected. Individual data must, of course, be stored in such a manner that only data for the actually connected hardware units are stored in a certain RP.
This means that, during communication between the central and the regional software units within a block, the pointer in the CSU must be translated into the corresponding RP address and an internal address series within the RP in question. This translation takes place in the central processor. The central processor supports the modular system structure as shown in the figure below.

Figure 3-41 CP software structure

Only one function block at a time can execute in the processor, i.e. the block whose block number is temporarily stored in the block number register in the CPU. With the aid of the block number, the reference table in RS can be addressed and the start addresses of the CSU's programs and base address table can be read.

The programs of a program unit are stored in the program store and divided into signal tables and program code. A signal distribution table has an entry label for each signal that can be received. The signal number for signal transmission is stored in the signal sending table. The address of the instruction at which the execution for this signal shall start is found by means of the signal number.

The base address table of the software unit is stored in the reference store and contains a base address for each common variable and each series of individual variables that can be handled by the programs. The base address contains the address of the respective storage position in the data store and information on the variable length, variable type, number of data records, etc. When individual data items are addressed, the pointer value in question is stored in the pointer register in the CPU.
Addresses to program labels and data are always specified in the machine instructions relative to the start addresses PSA and BSA. As a result, all information in the stores, with the exception of the reference table, is relocatable, i.e. can be stored at any free position in each store. The design of the addressing mechanisms also means that only data belonging to the block whose number is stored in the block number register can be accessed at a certain point in time, i.e. only the block's own data. Over-addressing is inhibited by automatically checking the value of the pointer against the base address information on the number of data records.

RECORD ORIENTED MEMORY ARCHITECTURE

In the APZ 212 40, a record-oriented Data Store architecture has been introduced. The idea behind this is that after one piece of data has been read in a computer system, it is more likely that another variable in the same record (the same pointer value) will be read than that the same variable in the next record will be read. Contrary to the traditional APZ, modern processors rely heavily on caching, which means that the time for reading data is much shorter if nearby data has just been read. When data is read, some data that follows immediately after is also read and put in the cache memory. A record-oriented data allocation is therefore more efficient.

Figure 3-42 Comparison of Data Store layout for APZ 212 40 and the classic APZ

The classic APZs and their operating system use a variable-oriented DS architecture.
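The two layouts can be contrasted with a small Python sketch, using the variable names from Figure 3-42 (admstate, state, alarm). This is an illustration of the allocation order only, not of the actual Data Store implementation.

```python
variables = ["admstate", "state", "alarm"]
num_records = 2

# Classic APZ: variable-oriented layout.
# All records of one variable are stored together.
variable_oriented = [(v, r) for v in variables for r in range(num_records)]
assert variable_oriented[:3] == [
    ("admstate", 0), ("admstate", 1), ("state", 0)]

# APZ 212 40: record-oriented layout.
# All variables of one record are stored together.
record_oriented = [(v, r) for r in range(num_records) for v in variables]
assert record_oriented[:3] == [
    ("admstate", 0), ("state", 0), ("alarm", 0)]
```

In the record-oriented layout, all variables of record 0 are adjacent in memory, so a single cache-line fill after reading one of them also brings in the others, which is the locality argument made above.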
In APZ 212 40, when the IPU finds a statement to read a variable from the DS, the whole record containing the variable is fetched and stored in the L2 cache memory.

Signaling Interwork

The software units in APZ 212 are activated and ordered to execute desired functions by means of signals. As a result, mechanisms for transmitting signals between software units are required in the processors. The real-time requirements must then be taken into consideration. This means that facilities for storing and priority-marking signals must also be available. Signals are stored in various buffers. The data entered in a buffer must define the signal unambiguously. A stored signal message must thus include address information and information on the signal format, i.e. the number of data items that can be transmitted by the signal according to the format in question.

PROGRAM CONTROL

An interrupt system is required to ensure sufficient accuracy for the time measurements and also to ensure that high-priority signals are dealt with rapidly. By means of the interrupt system, the work in the processors can be placed on different priority levels in such a way that a signal with a high priority interrupts any work in progress on a lower priority level. Four different priority levels are used for program execution in the CP. The highest level is used by the program test system. On the three remaining levels, the program execution is controlled by using four job buffers for storing signal messages with different priorities, and a so-called job table for sending signals to time measurement programs. All time measurements in the central processor are based on the 10 ms primary interval for clock interrupts. The program control in the central processor is mainly executed by the APZ VM.
The APZ VM emulates an interrupt system with five priority levels (in the IPU) and a number of job buffers for storing signal messages, as shown in the figure below.

Figure 3-43 Job Scheduling

The malfunction level (MFL) has the highest priority. Interrupts to MFL are generated by the APZ VM at execution errors.

The trace level (TRL) is used for program tracing. Interrupts are automatically generated in conjunction with checks of activated trace bits, i.e. in signal distribution tables and base addresses, or at software execution errors detected by APZ VM checks.

Normal program execution is carried out on the traffic handling level (THL), C level (CL) and D level (DL). Job buffers A, B, C and D are used for controlling these levels.

By means of the interrupt system, work in progress can be interrupted when a signal with a higher priority requires the central processor for more important tasks. This high-priority signal can originate e.g. from an RP when a change of state has been detected in a device. It can be a clock interrupt signal when it is time to go through a time measurement buffer. It can also be a fault signal that requires immediate action. Each such interrupt signal indicates that work shall be done on a certain priority level. If the job being executed has a lower priority, an interrupt can be carried out immediately. In the opposite case, when the job in progress has a higher priority than the interrupt signal, the interrupt signal must wait until work on the higher level has been finished.
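The level-based scheduling described above can be sketched as per-level FIFO job buffers that are always drained highest-priority-first. This is a toy model only, not the real APZ VM scheduler, and the job names are invented:

```python
from collections import deque

LEVELS = ["MFL", "TRL", "THL", "CL", "DL"]  # highest to lowest priority

class Scheduler:
    """Toy model of level-based job scheduling with per-level FIFO buffers."""
    def __init__(self):
        self.buffers = {lvl: deque() for lvl in LEVELS}

    def insert(self, level, job):
        # Signals on the same level are served first-in, first-out.
        self.buffers[level].append(job)

    def next_job(self):
        # Always drain the highest non-empty level first.
        for lvl in LEVELS:
            if self.buffers[lvl]:
                return lvl, self.buffers[lvl].popleft()
        return None

sched = Scheduler()
sched.insert("DL", "statistics-job")
sched.insert("THL", "call-setup")
sched.insert("CL", "io-job")
assert sched.next_job() == ("THL", "call-setup")
assert sched.next_job() == ("CL", "io-job")
assert sched.next_job() == ("DL", "statistics-job")
assert sched.next_job() is None
```

The call-setup job on THL is served before the C- and D-level jobs even though it was inserted later, mirroring how a higher-priority signal takes precedence over lower-level work.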
Interrupts to the THL level always interrupt work in progress on the C or D level; similarly, interrupts to the C level always interrupt work in progress on the D level. Interrupts to the D level wait until work in progress on the C and THL levels has been finished, and interrupts to the C level wait until work in progress on the THL level has been finished. In addition, an interrupt to the THL level occurs every 10 ms for the transmission of signals to time measurement programs, according to information in the job table.

Job scheduling and program control in general are performed by the APZ VM. In the following, the virtual machine and the other parts of the PLEX Engine are described.

PLEX ENGINE IN APZ 212 40

The PLEX Engine Subsystem (PES) mainly handles program execution and data storage for the ASA210C applications on the Central Processor (CP). There is also a logging function, which handles error and event messages sent from the CP to the Adjunct Processor (AP). The subsystem resides on the CP, with a small part on the AP. It interacts with a few other subsystems within the APZ: CPHW, CPS, MAS and RPS. Program execution is done by first compiling the ASA210C assembler into native assembler code, using the "just in time" principle, and then performing the actual functions. The PLEX Engine software is located on drive V, as shown in the figure below:

Figure 3-44 APZ VM software in APG 40

The APZ VM software identities can be printed as shown below:

<LAMIP;
MIDDLEWARE UNIT IDENTITY
RPHMI MICRO PROGRAM IN PROM   CAA 141 142 R4A02
RPHMI FLASH LOAD MODULES      CXC1060124 R3A03, CXC1060125 R6A01
SYSTEM BOOT IMAGE             CXC1060151 R2A
PLEX ENGINE DUMP              CXC 161 0005 R6A02
END

APZ Virtual Machine

The APZ virtual machine is an entity that makes it possible to run programs, compiled into the ASA210C assembler language, on a platform using a commercial OS and commercial processors.
It consists of middleware loaded into the APZ 212 40 from the APG 40. ASA210C was previously interpreted by the Micro Instruction Program (MIP) and executed by the APZ processor. The APZ VM basically replaces the MIP: the APZ VM interprets the ASA210C assembler and compiles it into native code for the current processor type. The APZ Virtual Machine consists of the following major functions:

• Kernel functions, such as buffering and scheduling PLEX signals.

• Communication functions, such as communication with the OCS and RPs.

• ASA Compiler, compiling the ASA210C assembler into native code for the general-purpose processor type.

• Memory management, handling memory allocations both for the ASA210C environment and for the memory allocations used by the APZ VM internal functions.

• RTD (Real Time Debugger), providing an interface between the APZ VM core functions and an external device used for debugging purposes.

Modules

In the figure below, the modules forming the VM are shown in simplified form.

Figure 3-45 VM's modules

Note that RTD is a part of the Toolbox module. Two different module types are named in the figure above: core modules and application modules.

Core modules

Application Blocks

This module contains application blocks (also known as Clayton blocks), where a part of e.g. a PLEX block is actually implemented in the APZ VM. There are two main ideas behind this concept:

• It gives the possibility to access APZ VM core functions, which are not available using the normal Operating System Assembler Instructions defined by ASA210C.

• These blocks shall mainly be executed within the same context as any other ASA210C application.
The following application blocks are located in the PLEX Engine Subsystem, but the application block concept can be used by any subsystem, e.g. the PLEX block LAB located in the Central Processor Subsystem. The following PLEX blocks are within the PLEX Engine Subsystem:

• FileTransfer. The purpose of the FileTransfer block, whose PLEX counterpart is VMFTRHU, is to provide a file transfer service based on FTP. It is accessible to any APZ VM module and any ASA210C application.

• VMSRVC. The purpose of the application block VMSRVC is to serve as a signalling interface for internal services. VMSRVC is special in that it needs to function before any dump is loaded. The PLEX counterpart is VMSRVCU, which is just an empty shell to occupy the block number, e.g. at function change of an ASA210C application.

• VMXFRH. The purpose of the application block VMXFRH, whose PLEX counterpart is VMXFRHU, is to provide functions for data copying between the CP-A and CP-B sides (or rather from the EX to the standby side).

ASA Service Functions

The module acts basically as an interface between the ASAC and the core functions. It handles ASA210C Operating System Assembler Instructions that operate on the core functionality of the APZ VM and therefore cannot be executed by the ASAC alone. Some of these instructions are executed internally within the module, while others need actions by other modules.

Communication

The module mainly handles external communication, such as RP and OCS signalling, but it also handles the internal communication between threads. The Communication module is divided into three sub-modules: Transporter, ProtocolHandler and Media. The Transporter defines the general interface that all communication entities used for inbound and outbound signalling must adhere to. The ProtocolHandler interface provides a general interface that any Transporter may use.
Each ProtocolHandler uses a specific Media, which is the part that provides an interface towards some specific hardware, mapped DMA areas and/or the socket layer. For external communication, combinations of Transporters, ProtocolHandlers and Medias are used, while the internal communication uses the JobBuffer and a specific Media, ISHM.

Error Handler

The module mainly provides an interface for ASA210C application errors detected by the ASAC, for internal errors within the APZ VM, and for errors detected by the underlying operating system. Error handling is in principle similar to that in the classic APZ, with faults being reported by the software signal PROGERROR, which is sent from the APZ VM to MAS.

The Error Handler also provides an interface for events that can occur within the APZ VM. An event describes an occurrence of something that does not happen very often but is still expected to occur from time to time. All errors and events are logged and stored on the AP. The Error Handler also accommodates some service functions, such as different restart actions, load regulation and supervision of the PHC thread.

Software faults can originate in the APZ PLEX blocks, the APZ VM (including the kernel), Alpha code, the operating system and other support functions. Examples of software faults are given below.

Fault origin (From…)   APZ VM action
APZ PLEX block         PROGERROR
VM kernel              PROGERROR
Compiled code          PROGERROR
OS                     PROGERROR
External               PROGERROR

Example faults: execution instruction fault, pointer too large, out of memory, RP table over-addressing, non-existing block, illegal instruction.

Figure 3-46 Examples of Software Faults that generate PROGERROR

Executive

The module handles the overall job scheduling within both the IP thread and the SP thread, and is therefore considered to be the kernel of the APZ VM.
It is divided into two parts:

• one that executes on the SP thread and
• one that executes on the IP thread.

Apart from job scheduling, on the SP side it also handles primary interval detection and supervision of the IP thread. On the IP side, it provides support for signalling between ASA210C applications (signal sending instructions), test system support and load measurement.

An Executive component, the Scheduler, decides in which order jobs will execute. It fetches jobs from the job buffers and initiates the Dispatcher (another Executive component) to order execution of jobs. The order in which the jobs get executed is decided by a priority system inherited from the old APZs.

In the classic APZs, a job of high priority will sit on a corresponding high priority level and will immediately interrupt an executing lower level job. In the APZ 212 40, whether a lower level job is interrupted by one on a higher level depends on how long the lower level job takes to execute: short low priority jobs are not interrupted, but longer jobs are, with some delay. The priority levels used are those described above (Malfunction (MFL), Trace (TRL) and Traffic handling (THL) levels, and the C and D levels).

To execute a job, the Dispatcher calls either the optimised or basic compiled block, or the ASA compiler to compile a not yet compiled block. The Dispatcher does a table lookup and gets a pointer to the signal entry in an optimised or basic compiled block. If the block is not already compiled, the compiler is called to create the compiled code and then execute it.

The Dispatcher needs to include a set of parameters in the function call to the compiled code. These are data structures that are needed by the compiled code when executing the signal, for example the Data Store, JAM and Register Memory.
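The lookup-then-compile behaviour of the Dispatcher can be illustrated with a minimal sketch. This is not the actual APZ VM code; the structures and names (Block, Dispatcher, compile) are assumptions made purely to show the just-in-time idea: a signal entry is looked up per block, and the ASA compiler stand-in runs only on first use.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical sketch of the dispatch path described above. A block is
// compiled lazily the first time a signal is dispatched to it; afterwards
// the stored entry point is called directly.
struct Block {
    bool compiled = false;          // has the ASA code been compiled yet?
    std::function<void(int)> entry; // compiled signal entry point
    int invocations = 0;            // how many signals this block has executed
};

struct Dispatcher {
    std::unordered_map<int, Block> blocks; // block number -> block
    int compile_count = 0;                 // counts compiler invocations

    // Stand-in for the ASA compiler: produce executable code for the block.
    void compile(Block& b) {
        ++compile_count;
        b.entry = [&b](int /*signal*/) { ++b.invocations; };
        b.compiled = true;
    }

    // Table lookup + dispatch: compile on demand, then execute the signal.
    void dispatch(int block_no, int signal) {
        Block& b = blocks[block_no];
        if (!b.compiled) compile(b); // JIT: compile only when first needed
        b.entry(signal);             // run the compiled signal entry
    }
};
```

Dispatching two signals to the same block invokes the compiler only once; subsequent signals go straight to the compiled entry, which is the point of the "Just In Time" strategy.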
Job Buffer

The module is responsible for managing buffered signals between ASA210C applications, but also for internal functions within APZ VM. The job buffer handles eleven priorities, of which seven can be used by ASA210C applications. Internal functions within APZ VM can use all eleven priorities but mainly use the four dedicated ones. Each priority buffer handles signals on a first in, first out (FIFO) basis.

Signals can be inserted into the job buffer from both the IP thread and the SP thread, while retrieval of signals is only done on the IP thread. Operations on the job buffer from both threads can be done concurrently.

Job Table

The module is used to manage timing intervals for specific jobs that execute periodically, such as timers. Each of the periodical jobs has an entry in the job table consisting of a time counter, to determine when the job should be executed, and an ASA210C identifier. The identifier consists of a block number and a local signal number.

This functionality is now run in the SP thread, which handles the scanning of the job table signal file. The job table is still managed by block Job, with Job deciding which jobs are to be inserted into the job table and when the table should be scanned. Periodic signals are put into JTJB from the job table by the JobTable block every 10 ms, where they wait their turn for execution. The contents of JTJB are then executed in the IP thread.

Memory Layout

The module is mainly an APZ VM internal support module. There are four main areas:

• Providing support for working with the APZ stores: Reference Store (RS), Program Store (PS) and Data Store (DS).
• Providing support for some data structures used for fault location: JAM, MALData and "Preservation Areas" (areas for saving data during restarts, for example contents of job buffers).
• Providing support functions for allocating internal memory areas, such as heap memory and exempt memory.
• Providing support for compact storage of variables that can be used by the ASA210C FESR instructions.

Register Memory

Register Memory (RM) is mainly used for storing intermediate results during program execution in the classic APZ. In the APZ 212 40, the registers are emulated in software as data structures. The Register Memory contains the implementation of virtual HW registers that directly or indirectly map to the registers found in the classic APZ. A limited set of new APZ 212 40 "registers" is also located here.

In APZ VM there is a pool of "Register Memories" available for the ASA compiler to use. Each job will have its own private copies of the process registers it needs for executing. When the registers are no longer needed by the job, they are returned to the pool.

Support Modules

Configuration

The module handles preparation and setting up of all other modules and the threads used by APZ VM. First it initializes itself, then continues with all the other modules. If the initialization was successful, the threads are started.

The configuration parameters for the corresponding module are either stored as default values or configurable as command line arguments.

General

The module contains support and services generally available to all modules, including the ASAC, OSI and HAL. It provides common services that are used by many functions within APZ VM, such as thread handling and the definition of the different signal types utilised for internal and external communication. It also handles services for interworking between platforms.

Toolbox

The module contains locally developed or third-party services and functions. These services or functions do not generally depend on any APZ VM specific functionality. The module also contains the functionality for the RTD.
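The register-memory pool described above (each job borrows a private register set and returns it when done) follows a classic object-pool pattern. The sketch below is illustrative only; the type and member names are assumptions, not the actual APZ VM interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative object pool for emulated register sets, mirroring the
// "pool of Register Memories" described in the text. Names are assumed.
struct RegisterMemory {
    std::array<long, 16> regs{}; // emulated process registers
};

class RegisterPool {
    std::vector<RegisterMemory*> free_; // register sets not in use
public:
    explicit RegisterPool(int n) {
        for (int i = 0; i < n; ++i) free_.push_back(new RegisterMemory{});
    }
    ~RegisterPool() { for (auto* rm : free_) delete rm; }

    // A job takes its own private register set for the duration of execution.
    RegisterMemory* acquire() {
        if (free_.empty()) return nullptr;   // pool exhausted
        RegisterMemory* rm = free_.back();
        free_.pop_back();
        *rm = RegisterMemory{};              // hand out a clean register set
        return rm;
    }
    // When the job no longer needs the registers, they go back to the pool.
    void release(RegisterMemory* rm) { free_.push_back(rm); }

    std::size_t available() const { return free_.size(); }
};
```

Pooling avoids repeated allocation on the hot execution path: acquire and release are constant-time vector operations.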
THREADS

To divide the workload and to give priority to different kinds of functions, threads are used. They also make it possible to utilise multiple processors, since a specific thread can be bound to a specific processor. There are four threads explicitly created by APZ VM and one thread created by the OS at invocation of APZ VM.

Thread Operation

The table below summarizes each thread's status during normal operation.

Name         Responsible Module  Operation  Bound CPU  Comments
Main Thread  Configuration       Idle       -          Waiting for the other threads to terminate.
IP Thread    Executive           Busy       1          This thread never sleeps, thus having a dedicated CPU.
SP Thread    Executive           Busy       2          Permits OS interrupts.
PHC Thread   Error Handler       Idle       2          Blocked on an I/O operation towards the PHC device.
RTD Thread   Toolbox             Idle       2          Used during debugging.

Figure 3-47 Thread status

Thread Description

Main Thread

Created by the OS. It handles the preparatory activities before the other threads are running and APZ VM is operational. The main tasks are saving IP/SP thread data and initialization of the APZ VM.

IP Thread

Executes the operations for all the ASA210C applications. The main tasks are scheduling between the eleven job buffer priorities, signal distribution and execution. The ASAC also executes on this thread as a result of the "Just In Time" compilation strategy.

SP Thread

Handles external communication and internal support functions. The main tasks are scheduling the external/internal communication channels, supervision of the IP thread and functions for e.g. restart, soft side switch and the job table.

PHC Thread

The thread is tied to the PHC (Program Handling Circuit, sometimes called Watchdog) device. The main task is to act if the PHC circuit "fires" because the PHC has not been triggered by the software.
The SP thread is then notified, and that is where the main actions are performed. The ASA210C applications have the main responsibility to periodically trigger the PHC.

RTD Thread

This thread is used for real time debugging and is normally not used at all.

Execution Dynamics

Conceptually, an executing APZ VM is two large loops: one loop in the IP ("Instruction Processor") thread and one loop in the SP ("Signalling Processor") thread. A number of components are involved in the execution of APZ VM.

Both threads have a scheduler to select different jobs within the thread. On the IP thread the jobs originate from the ASA210C applications; on the SP thread they come from a number of protocol handlers, which handle external and internal communication. On the IP thread, job selection is defined by a number of priorities, where a higher priority takes precedence over a lower one. On the SP thread, job selection is done round robin, i.e. in sequential order.

Communication (and synchronization) between the two threads takes place via a set of queues. The main queues are:

• for signals from SP to IP
• for signals from IP to SP
• for signals from IP to IP
• for X instruction "replies" from SP to IP

Signals received by the IP thread are mainly buffered signals, which are queued in the job buffer, but for some special cases an internal shared media (ISHM) is used instead. Signals received by the SP thread always use an ISHM.

SP Components

The code executing in the SP thread is responsible for communicating with the external world, and for providing a few other support functions, such as sending periodical signals (job table signals).

The SP thread main loop is the SPThreadDispatcher. On each iteration it checks the queue with requests from the IP thread, and dispatches them to the correct handler code for processing.
It also calls all handlers that can generate inbound signals (receive data from the outside) to give them a chance to process inbound data and put it into the queues to the IP thread. Most of the handler code called by the SP thread is in the various communication modules (transporters, protocol handlers, media) used to communicate with the external world. The SPThreadDispatcher also does primary interval detection, and calls the JobTable when needed.

IP Components

All the application level code (ASA code) in the APZ executes in the context of the IP thread.

The scheduler is the "driver" of the IP thread. It implements a priority driven scheduling policy. The policy is not exactly preemptive, but it is fairly close; it can best be described as "controlled preemptive". Preemption can only occur at certain predetermined points (like sending a buffered signal) or where the code generated by the ASA compiler calls the scheduler to check for higher priority jobs. The compiler inserts such checks into all sequences that are long in execution time. The VM code and the ASA compiler check for pending higher priority jobs in a number of places by calling the yield function.

There is one loop for every priority level and an idle loop. At a high level, all the priority loops work in the same way:

1 Retrieve one buffered signal from the JobBuffer for the priority.
2 Call the dispatcher to execute the code for the signal in the ASA210C target block.
3 If no pending jobs at a higher priority exist, and something is left at the current priority, go back to step 1.
4 Return to the caller.

The Dispatcher is responsible for starting the execution of a (PLEX/ASA) block. It sets up the proper context for the block and calls it. On return from the block it checks if the block sent a direct signal. If so, it transfers control to that block. The ASA210C block executes the application logic.
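The four-step priority loop and the "controlled preemptive" yield check can be sketched as follows. This is a simplified model under assumed names (PriorityBuffers, run_level, higher_pending), not the real scheduler: each priority level drains its own FIFO until a higher-priority job appears or the buffer is empty.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Sketch of the IP-thread priority loops: one FIFO job buffer per priority
// level (0 = highest). A level keeps executing its own jobs until a
// higher-priority job becomes pending ("controlled preemption").
struct PriorityBuffers {
    std::vector<std::deque<int>> q; // one FIFO of signals per priority
    explicit PriorityBuffers(int levels) : q(levels) {}

    // The "yield" check: is any job pending at a higher priority?
    bool higher_pending(int level) const {
        for (int p = 0; p < level; ++p)
            if (!q[p].empty()) return true;
        return false;
    }

    // Run one priority loop; returns the signals executed, in order.
    std::vector<int> run_level(int level) {
        std::vector<int> executed;
        while (!q[level].empty()) {
            int sig = q[level].front();       // 1. retrieve one buffered signal
            q[level].pop_front();
            executed.push_back(sig);          // 2. "dispatch" to the target block
            if (higher_pending(level)) break; // 3. higher job pending: stop
        }
        return executed;                      // 4. return to the caller
    }
};
```

With an empty system, a low-priority level drains all of its jobs; if a job is queued at a higher level, the loop stops after the current job rather than mid-job, which is the essence of preemption only at predetermined points.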
The ASA code can call service functions in ASF (X instructions) or signal sending instructions (in the Signal Distributor).

The Signal Distributor handles most of the signal sending. It uses the GSD tables to determine which block and LSN is the target for the signal (and that the signal is legal for that block). Depending on the signal type, the distributor will do different things:

• Buffered signals are inserted in the job buffer queue.
• For combined (linked) signals, the dispatcher is called for immediate invocation of the target block.
• For direct signals, a JobBufferEntry (this is what is in the job buffer queues) is just prepared (filled in). Control will be transferred to the target block by the dispatcher when the ASA block returns to the dispatcher. This happens immediately after the signal distributor returns to the block, as direct signal send instructions imply that an "EP" follows.

Functions

IP- and SP-Thread Synchronization

Communication between the different threads takes place by use of different internal signalling and/or use of the JobBuffer. The communication takes place without the use of time-consuming operations such as mutexes and semaphores. Instead, credit based flow control for the queues guarantees thread safety. In addition, all queues use a single inserter and a single retriever. The same thread may do both, as is the case for the IP to IP queues. Having exactly one thread manipulating each end of the queues greatly simplifies synchronization.

IP thread to IP thread uses the job buffer as a communication channel. SP thread to IP thread uses either the job buffer or an ISHM as a communication channel. IP thread to SP thread uses an ISHM as a communication channel. PHC thread to SP thread communication is done using different flags that are read by the SP thread. RTD thread to SP thread uses a dedicated ISHM as a communication channel.

THE COMMERCIAL OPERATING SYSTEM
Behind the APZ VM there must be an operating system that supports it. Since the CP of the APZ 212 40 is built on a commercial CPU, it is feasible to use a commercial operating system. The OS used in the APZ 212 40 is Tru64 UNIX.

Tru64 UNIX is the first true 64-bit operating system in wide use. The operating system complies with all of the major standards for a UNIX operating system. With this operating system it is possible to customise compact, special-purpose versions of the OS for special industrial applications, such as telecommunication applications in the case of the APZ 212 40.

HAL AND OS API

As mentioned in chapter 2, the virtual machine communicates with the CPU platform via two additional thin layers: an Operating System Application Interface (OS API) and the Hardware Abstraction Layer (HAL). These two layers are intended to make the APZ less dependent on a particular microprocessor architecture or operating system. The OS API isolates the user from the commercial operating system and HAL isolates the user from the hardware.

The main purpose of HAL is portability; this is why these layers isolate the application software from the hardware. HAL facilitates isolation of the hardware by creating well defined software interfaces.

Figure 3-48 HAL and OSI overview

These interfaces must be used for all hardware accesses, so any time the application software wishes to reach the hardware it must go through HAL. HAL is divided into a number of device drivers, as seen in the figure above; each device driver contains functionality that is associated with a particular piece of hardware. The HAL Driver Interface (HDI) acts as an interface between HAL and the APZ VM.

OSI is also focused on portability. It will help minimise the work to be done if a different commercial OS is introduced in the APZ 212 40.
The applications can only reach the OS (Tru64) via calls in this interface. No direct calls to the OS are allowed; OSI shall be used instead. The system console seen in the diagram above is used by OSI when manual intervention is required during start-up of the commercial OS and when pinpointing errors. OSI also contains an administration process which supervises the APZ VM process, including starting APZ VM at boot time and restarting it if unrecoverable errors are detected. OSI is coded in C.

SOFTWARE DIFFERENCES TO CLASSIC APZS

The C and C++ parts of the system are treated separately from the rest of the system. These separate files on the adjunct processor are loaded using the Bootstrap Protocol (BOOTP) and the File Transfer Protocol (FTP). They are not part of the system backup.

To ensure a short reload time, the function Backup in Main Store is always used. When a backup of the system data is first made, it is stored in the backup area of primary memory. As part of the backup function, the system is then transferred to the external medium, the APG40. The automatic backup function is routed via the standby side to the adjunct processor. In the standby side, an automatic reload is initiated to keep the standby side "warm".

When the APZ VM, ASAC, HAL, OS API, or OS needs to be updated, the new version is loaded as a normal file onto the adjunct processor. A boot is then initiated in the CP standby side, and the new versions are loaded. The CP sides (roles) are then switched and the sequence is repeated.

CHAPTER SUMMARY

• In the APZ 212 40, the old microprogram system of previous APZs is replaced by an APZ VM, an ASA compiler, the hardware abstraction layers HAL and OS API, and a commercial operating system.
• PlexEngine is the collective name for the ASA compiler and the APZ VM. It is responsible for executing the ASA code.
• The ASA Compiler operates in two modes, Basic and Optimised.
• The APZ VM is a compiled module of C++ code. It can be described as middleware interfacing between the application software and the hardware platform.
• The commercial operating system that runs the VM is Tru64 UNIX.
• HAL and OS API are additional layers that make the APZ less dependent on a particular microprocessor or operating system.
• In the APZ 212 40, a record oriented architecture is used in the DS, as opposed to the variable oriented one in the classic APZ.

4 APZ 212 40 Handling

OBJECTIVES

After completing this chapter, participants will be able to describe changes in the handling of:

• hardware
• backup
• reload
• maintenance

Figure 4-49 Objectives

NORMAL SYSTEM OPERATION

The two CP sides are designated A and B from a physical point of view, and EX and SB from a functional point of view. At any given moment, one CP side is EX and the other one SB. In normal operation in the APZ 212 40, the EX side runs the application while the SB side is ready to take over. The two CP sides are not in parallel operation as in the classic APZs. On error, the EX side attempts to restart the application.

The CP working state is stored in a Working State Register (WSR). At normal operation, data in the WSR indicates which side is EX and which is SB.

Figure 4-50 WSB connection
The CP working state function is used to determine the state of the two CP sides within the CP and to ensure that only one CP side is EX at any time. The working state function logic is implemented in the RPHMI on both CP sides; however, it is only active on the CP-B side. Whether a CP side is A or B is determined by the Working State Bus (WSB) cable, which has different encoding of the CP side pins on the two sides. The working state logic on the two CP sides is synchronized over the Working State Bus shown in the figure above.

The MAU function is located on the RPHMI board in both CP sides. When the system is in the normal state, that is when CP-A is EX and CP-B is SB, the MAU state is NRM; otherwise the MAU state is AAM. The printout below shows the MAU state AAM although there is no fault.

<dpwsp;
CP STATE
MAU  SB  SBSTATE  RPH-A   RPH-B   BUA STATE
AAM  A   WO       SB/PWO  EX/PWO  2
END

When CP-A becomes executive, the MAU is normalized.

<dpswi;
DPSWI;
<;
ORDERED

<
CP STATE
MAU  SB  SBSTATE  RPH-A   RPH-B   BUA STATE
NRM  B   WO       EX/PWO  SB/PWO  2
END

STATES FOR SB SIDE

The SB side is intended to take over when the executive side cannot run the applications, or to load new functions with a function change procedure. Depending on the situation, the following substates are possible, indicating that the SB side is:

• ready to take over the application at a fatal fault in EX: NRM (WO)
• used for preparation of a function change: (SE)
• suspected as faulty: (HA-FM)
• under test: (SE-FMMAN or SE-FM, FMMAN)

These substates are described below.

NRM (WO)

Normal state. The standby side is warm, which means that it is ready to take over as new EX in the case of a serious fault in the current EX. The Program Store (PS) and Reference Store (RS) in the SB side contain the same information as PS and RS in the EX side. The SB side will leave state NRM if the contents of PS or RS in the EX side are intentionally changed (for example at function change or loading of a program correction).
The same happens when a size alteration is performed.

For Data Store (DS) variables included in the small or large data dump, the SB side contains consistent information that has been copied from the EX side. These variables were copied from EX to SB when the state for the SB side was changed to NRM. They will also be updated from the backup information at automatic output of backup information. Thus, they are regularly updated from EX to SB if automatic output of backup information is active.

For DS variables with the property TRANSIENT, the SB side contains the same information as EX, but with a small (negligible) delay. At every write to a TRANSIENT variable in EX, a message is sent to SB, causing a similar write in the SB side.

The SB side cannot send or receive RP signals (the SB side is disconnected from RPH). The SB side cannot send or receive application signals through the IP based communication path.

The APZ Operating System (APZ-OS) is not running with full functionality in the SB side. The execution of APZ programs is limited to a few job table signals and a few jobs running through the job buffers. The running jobs have to maintain the program clock, trigger the PHC, supervise error signals in EX, test their own hardware and support the backup function. PES support makes it possible for the backup function to output a new backup from the SB side to an AP resident IO file. PES support is also used for the updating of TRANSIENT variables in the SB side.

If a supervising program in SB detects a fatal error signal in EX, it will order a switch of EX and, when executing in EX state, initiate a system restart with reload. This reload does not take the time needed for a forced CP reload, because the software is already loaded. The event is shown in "SYRIP: Survey;". The software in the SB side is authorized to order a switch of EX only if there is an error signal in EX.
SE-FMMAN

Separated and manually fault marked. The SB side is acting as backup after a function change or other modification (size alteration of a data file, loading of a correction, etc.) in the EX side.

The difference between this state and state NRM is that here the backup function is no longer able to update the SB side, and writes to TRANSIENT variables in EX are not performed in the SB side. This side is ready to take over as new EX just as in NRM.

SE-FM, FMMAN

Separated, fault marked and manually fault marked. Used when a CP side shall be prepared for function change (after command FCSEI) or at manual testing in a separated CP side (after command DPSES). This CP side may have contact with its own RPH magazine. The printout below shows the SB state.

<dpwsp;
CP STATE
MAU  SB  SBSTATE      RPH-A  RPH-B      BUA STATE
AAM  A   SE-FM-FMMAN  DISC   EX/NOTPWO  2
END

SE-FM

Separated and fault marked. Separated at EX controlled test of SB/SE during repair check or during retry after a CP error.

UP-FM

Updating and fault marked. Used during the updating procedure. The SB side is not able to execute any ASA instructions. With support from PES, it is possible to copy PS, RS and DS from EX to SB (completely or partly).

UP-FM, FMMAN

Updating, fault marked and manually fault marked. Used during the updating procedure. The SB side is not able to execute any ASA instructions. With support from PES, it is possible to copy PS, RS and DS from EX to SB (completely or partly).

HA-FM

Halted and fault marked. This side has been automatically taken out of service due to a permanent fault or due to a high frequency of temporary errors.

HA-FM, FMMAN

Halted, fault marked and manually fault marked. The SB side has been manually halted by DPHAS or REMCI. If MIA has been activated (from the other side), power may be switched off/on.
RPH STATES

In each CP side there is also a Regional Processor Handler (RPH) state, indicating how that CP side is in contact with RPH. In the normal CP state, CP-A controls both RPHs.

<dpwsp;
CP STATE
MAU  SB  SBSTATE  RPH-A   RPH-B   BUA STATE
NRM  B   WO       EX/PWO  SB/PWO  2
END

This is indicated in the printout above and on the CDU, as shown in the figure below.

CDU   Central Display Unit
CP    Central Processor
CPUM  Central Processor Unit Magazine
EX    Executive CP side
HA    Halted state, stand-by CP side
NRM   Normal state, stand-by CP side
RPH   Regional Processor Handler
RPHM  RPH Magazine
SE    Separated state, stand-by CP side
UP    Updating state, stand-by CP side

Figure 4-51 CDU states

The executive CP side controls both RPHMs, where one RPHM is "active" and captures the jobs. If something happens to the "active" RPHM (the one currently carrying traffic), the other one takes over. The RPHM states are displayed on the CDU panel in order to avoid accidents such as disconnecting the "active" RP bus. When a side switch occurs, CP-B starts by controlling its own RPHM-B and then RPHM-A. The procedure can be followed on the CDU LEDs.

HANDLING CHANGES COMPARED TO CLASSIC APZ

HW BOARD REPLACEMENT / EXTENSION

The smallest replaceable unit in the CP is called a Field Replacement Unit (FRU) and is a whole CPUM (i.e. a whole CP side including the RPHMI and BASE IO). For the RPHM it is as in the classic APZs, on board level. The special support for RPH extension is removed. If an extension is necessary, the function change procedure must be used.

MEMORY EXTENSION

When main memory needs to be extended, the FRU (CPBB, UPBB, RPHMI and BASE IO) must be replaced with a new FRU with more memory. Only extension of memory is supported. Function change is used during the update. There is no support for the command SASTS. Limits for RS, PS and DS are set via a configuration file within the APZ VM.
In the configuration file, the memory for RS and PS is set to the maximum value according to the system limit (16 MW32 and 256 MW16). The automatic allocation process first allocates memory for RS and PS, then the remaining part of the memory to DS.

WARM STANDBY - HOT ON DEMAND

When a fatal HW fault occurs (including an OS fault), there is a side switch to a warm SB side. In this situation there is a large restart of rank reload. The warm SB side is normally pre-loaded with SW, so the effect is a reload with a reload time of less than 0.5 seconds. When traffic is up and executing in the new EX side, the command log is added manually.

A Soft Side Switch (for planned events) only results in a minimal hinder time (ms), not affecting established calls, at the end of the "heating-up" sequence before the side switch. During this time the CP freezes time, i.e. the CP is stopped. This time is estimated to be < 500 ms. When the processor load is high, the SSS (DPSWI) is not performed, because the data transfer during the heating up cannot be completed within the time interval. A time-out occurs, generating a related fault code.

At function change, the stop time depends very much on how many blocks are changed, on exchange data such as file structure, and so on. Therefore no guidance can be given as to how long the stoppage will be. On one hand, the memory architecture results in additional copying and conversion of data during the stop time. On the other hand, there is a faster processor and there are improvements in the function change functionality (method 2 only). The end result is that the stop time at function change will probably be in the same range as in the classic APZ, typically 0.2-5 minutes.

RPH faults are in general handled without traffic disturbance by switching RPH while keeping the same CP side as executive.

So why has Ericsson chosen this warm standby concept?
The introduction of this new system concept has meant that new system states have had to be implemented, or that the meanings of the old ones had to change. The normal system state is that one side is Executive and the other is Standby Normal, as opposed to Standby Working in previous APZs.

• Minimal job table, with as few signals as possible in the job buffers.
• APT loaded with block state ACTIVE, but not executing.
• Restart with RANK=RELOAD required to start APT execution, i.e. the APT blocks in the SB side cannot affect the transient data in the SB side.
• All external sources for signals to APT will be blocked.
• SB shall receive (be able to receive) updates of transient data from EX, but does not have to be completely up to date.

Figure 4-52 Definition of the state NRM

• One of the requirements/objectives for the project was to base the new APZ on a commercial CPU. Only a few commercial CPUs support lock-step. The choice of CPU is based on a number of criteria, where capacity is number one. We should also be able to move from one CPU platform to another with only limited design effort.

• The other reason is ISP. Today we have very few HW faults resulting in ISP disturbances, relative to SW faults. Without this fact the decision would not have been taken.

What are the benefits and drawbacks of a warm standby?

• The benefit is that we can choose, purchase and follow the best (capacity wise) CPU family on the market and thereby shorten the lead time between upgrades and minimise maintenance in the future.

• The drawback is that we are not transparent to HW faults. When a fatal HW fault occurs (including an OS fault) we get a side switch to a warm SB. In this situation we get a large restart of rank reload. The warm SB is however pre-loaded with SW, so the effect is a reload with a reload time of more or less 0 seconds. When traffic is up and executing in the new EX side, the command log is added.
• As mentioned above, these types of faults are not expected to be very frequent.

• In today's systems, a serious SW fault that results in a reload could affect both sides simultaneously. This type of fault will not have the same effect in this machine.

How does warm standby affect the ISP for the node?

• Planned events only result in a minimal hinder time (ms), not affecting established calls, at the end of the "heating-up" sequence before the side switch. During this time the CP freezes time, i.e. the CP is stopped. This time is estimated to be < 500 ms.

• The application is warned before this sequence is started via three new system events that are introduced. Subscribing to these events and acting accordingly, e.g. on "TEMPORARY CP STOP MAY OCCUR", makes it possible to make the side switch transparent to the applications.

• The RP signals are buffered out in the RPs during this time, and consequently nothing is lost. Adaptations have also been made in the OCS protocol to avoid problems in the OCS traffic. The lost time is corrected over a 24-hour period.

BACKUP

A system backup copy is provided as security for use in the case of a serious system error. The system backup copy is a copy of the contents of the central processor's stores. When the maintenance subsystem MAS considers that the existing software is no longer functional (several system restarts take place consecutively in a brief period of time), MAS sends an order to the bootstrap loader to reload the system with the system backup copy.

The system backup copy is located on a CP file and in a dedicated area in the main store, which acts as a fast cache for the external medium. The use of backup in Main Store (BUA) is mandatory in the APZ 212 40. An output of a new system backup copy must always be done after major system changes such as function changes and size alterations.
The following contradictory requirements apply to the system backup:

• The system must start up safely.
• The information loss must be kept to a minimum.

In order to meet the first requirement, reloading should take place with the best proven, in most cases the oldest, system backup copy possible. The second requirement means that the system backup copy should be as young as possible. These two incompatible requirements have resulted in a compromise. Output data has been subdivided into two types:

1. data which can control the program path, and
2. data of a purely recording type (for example, charging data).

Both these data types are stored in two versions in the external system backup copy (one version only in the primary storage backup) and are automatically output at times determined for the exchange. Programs and reference information are output only after major system changes such as function changes and size alterations.

The system backup copy function is implemented in the set of parts Backup of CP (BUCP) and the function block LAB. LAB is not a normal PLEX block. It performs reloading of backup information at system restart with reload, or when ordered by command, from external medium or main store. LAB also performs initial loading ordered by command and update of the SB side.

When the system is backed up using the command SYBUP, the system is first stored in the backup area of the primary memory. As part of the backup function the dump is then transferred from the backup area to the APG40. The automatic dumping goes via the SB side to the APG40.

BUINFO  information about the size of the system backup copy
SDD     small data dump
LDD1    most recent large data dump
LDD2    second latest large data dump
PS      program store dump
RS      reference store dump

Figure 4-53 Sub-file structure of the dump file

The dump format has changed. The old format, supporting magnetic tape stores, has been removed.
The old sub-files called R0-R5 are replaced with the following sub-files:

• "BUINFO", where for instance information about the size of the system backup copy is stored. It also contains information about the other subfiles. During automatic output of a small or large data dump, subfile BUINFO is updated.
• "SDD", where the small data dump is stored. It contains certain specially selected variables. These variables are so important that extra outputs must be made in the interval between the outputs of the RELOAD variables. Variables for which output is desired so frequently are, for example, those that contain charging data for subscribers. These variables must not be marked RELOAD (output is ordered by using a special block type and a macro). For reasons of safety, the variables must furthermore not be program-controlling. This means that the variables to be output in this manner shall be of the counter type, whose value only is output or listed.
• "LDD1", where the most recent large data dump is stored.
• "LDD2", for the second latest large data dump. Subfiles LDD1 and LDD2 contain the RELOAD-marked variables of the system. Output of a large data dump implies that subfiles SDD and LDD1 are output; in other words, at a large data dump a small data dump is always output first. The existing LDD1 is moved to LDD2.
• "PS", where the program store is dumped.
• "RS", where the reference store is backed up.
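The rotation rule above (a large data dump always outputs a small data dump first, and the existing LDD1 is moved to LDD2) can be sketched as follows. This is an illustration of the rule only; the subfile names are from the text, but the dump contents are placeholders.

```python
# Illustrative sketch (not Ericsson code) of the dump subfile rotation.

def small_dump(store, sdd_data):
    """Output a small data dump: overwrite subfile SDD."""
    store["SDD"] = sdd_data

def large_dump(store, sdd_data, ldd_data):
    """A large data dump outputs a small data dump first, then moves the
    existing LDD1 to LDD2 and writes the new LDD1."""
    small_dump(store, sdd_data)
    if "LDD1" in store:
        store["LDD2"] = store["LDD1"]
    store["LDD1"] = ldd_data
```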
The RELFSWx subfiles are shown in the printout below:

C:\>cpFls -ls relfsw0
CPF FILE TABLE

FILE     TYPE  RLENGTH  REL  VOLUME
RELFSW0  reg   2048     yes  RELVOLUMSW

SUBFILES         SIZE    USERS
RELFSW0-BUINFO   1       0 [ 0R 0W]
RELFSW0-LDD1     200128  0 [ 0R 0W]
RELFSW0-PS       44234   0 [ 0R 0W]
RELFSW0-RS       6649    0 [ 0R 0W]
RELFSW0-SDD      1068    0 [ 0R 0W]

With this new dump format, the extra conversion previously made after dumping and before loading is no longer needed (the CP already uses the new format internally), and load and dump times are thereby improved substantially.

The figure below shows the steps taken by the CP during a command-ordered backup:

1. Backup is ordered (SYBUP).
2. The system copy is copied to the Backup Area (BUA) of the primary memory.
3. The backup copy is transferred to the SB side.
4. The backup copy is received.
5. The SB side is reloaded, and DS, RS and PS are updated with the new system copy.
6. The backup is transferred to external media.
7. The automatic backup function can be activated (SYBUI).
X. Alternative way for backup to external media.

Figure 4-54 Creating a Backup Copy. The system must be normalized with DPPAI or RECCI (step 3)

The flow of creating a backup copy is shown in the printout below. The comments clarify the steps taken by the CP:

<sybup:file=relfsw8;
SYBUP:FILE=RELFSW8;
ORDERED

BACKUP INFORMATION OUTPUT PROGRESS
TEST AND RELOCATION OF STORES IN PROGRESS

BACKUP INFORMATION OUTPUT PROGRESS

The CP starts the backup in BUA. The SB is separated.
BACKUP TO MAIN STORE STARTED

BACKUP INFORMATION OUTPUT PROGRESS
BACKUP TO MAIN STORE COMPLETED

After the backup in BUA, the CP writes out all subfiles to RELFSW8, as given by the command.

BACKUP INFORMATION OUTPUT PROGRESS
FILE     SUBFILE  PROGRESS
RELFSW8  BUINFO   0
RELFSW8  BUINFO   100
END

BACKUP INFORMATION OUTPUT PROGRESS
FILE     SUBFILE  PROGRESS
RELFSW8  SDD      0
RELFSW8  SDD      100
END

BACKUP INFORMATION OUTPUT PROGRESS
FILE     SUBFILE  PROGRESS
RELFSW8  LDD1     0
RELFSW8  LDD1     18
RELFSW8  LDD1     36
RELFSW8  LDD1     54
RELFSW8  LDD1     73
RELFSW8  LDD1     91
RELFSW8  LDD1     100
END

BACKUP INFORMATION OUTPUT PROGRESS
FILE     SUBFILE  PROGRESS
RELFSW8  PS       0
RELFSW8  PS       81
RELFSW8  PS       100
END

BACKUP INFORMATION OUTPUT PROGRESS
FILE     SUBFILE  PROGRESS
RELFSW8  RS       0
RELFSW8  RS       100

BACKUP INFORMATION OUTPUT EXECUTED
END

The command SYBUP:FILE; will output to the last RELFSWx file in the first file range.

BUA STATE

The system backup copy in main store (BMS) is placed in the backup area (BUA). BUA is located at the end of the Data Store (DS) and consists of the following four memory areas:

• The Small Data Dump (SDD) area
• The Large Data Dump (LDD) area
• The Program Store (PS) area
• The Reference Store (RS) area

Certain data, such as the address of the backup area, are stored in the Operating System Area (OSA).

Several operations in the CP, such as reload, size alteration and function change, affect the validity of the dump in BUA. Therefore, the system invalidates the BUA and changes its state in order to prohibit or allow loading from BUA during a restart. The different BUA states are numbered from 0 to 5:

0  No backup area
1  Can be copied to SB, valid for backup
2  Can be copied to SB, not valid for backup
3  No copy to SB, valid for backup
4  No copy to SB, not valid for backup
5  Backup area temporarily unavailable

"Can be copied to SB" means that an order for normalization of the CP-SB state is accepted. "Valid for backup" means that if a reload is made, it will be a reload from BUA and not from file.
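The BUA states above can be modelled as follows. The state numbers and meanings are from the text; the enum member names and the two helper predicates are my own naming, added for illustration.

```python
# Illustrative model of the BUA states (state numbers from the document;
# names and predicates are invented for this sketch).
from enum import IntEnum

class BuaState(IntEnum):
    NO_BACKUP_AREA = 0
    COPY_TO_SB_VALID = 1          # can be copied to SB, valid for backup
    COPY_TO_SB_NOT_VALID = 2      # can be copied to SB, not valid for backup
    NO_COPY_TO_SB_VALID = 3       # no copy to SB, valid for backup
    NO_COPY_TO_SB_NOT_VALID = 4   # no copy to SB, not valid for backup
    TEMPORARILY_UNAVAILABLE = 5   # backup area temporarily unavailable

def can_copy_to_sb(state):
    """Normalization of the CP-SB state is accepted in these states."""
    return state in (BuaState.COPY_TO_SB_VALID, BuaState.COPY_TO_SB_NOT_VALID)

def valid_for_reload(state):
    """A reload will be taken from BUA rather than from file."""
    return state in (BuaState.COPY_TO_SB_VALID, BuaState.NO_COPY_TO_SB_VALID)
```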
A reload will set the BUA STATE to 2 for a command-specified time, as shown in the printout below. This is because another reload within that time period should be done from file. If there has not been another reload for more than the specified time interval, the system trusts the BUA and will accept reloads from BUA again.

<DPWSP;
CP STATE

MAU        NRM
SB         B
SBSTATE    WO-FM
RPH-A      EX/PWO
RPH-B      SB/PWO
BUA STATE  2
END

In normal system operation the BUA STATE value is 1.

HANDLING OF MIDDLEWARE AND FIRMWARE

The C++ parts and the C parts of the system (PES) are treated separately from the rest of the system, similar to replacing hardware in the earlier APZs. They are separate files on the AP that are loaded using BOOTP and TFTP. These parts are not part of the system backup copy, and so they require some additional handling. They are stored in the APG 40 on the V drive, structured as shown in the following figure:

V:\APZ\DATA
  temp
  boot
    image
    APZ_VM
    FW
      CPBB_upgrade
      UPBB_upgrade
      RPHMI_upgrade
    CP-<side>
      control
      active
      fw_upgrade
  CP-<side>
    PES
      error
      event
    CPHW
      binlog
      syslog
      crash
      core

Figure 4-55 APZ file structure in APG 40

The directory structure on the AP for the APZ 212 40 system files is vital for the operation of the APZ. The AP has two kinds of disks. On a system disk, software that executes on the AP is stored. The AP supports function change of system disk files. On a data disk, all other files are stored. The V drive is not an AP system drive.

On a system disk, new applications are stored. These applications are:

• generation handling of load images (boot image, APZ VM)
• name translation services
• CP supervision (if any)

On a data disk, data belonging to the applications is stored.
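As described above, a reload holds BUA STATE at 2 for a command-specified guard time, after which the system trusts the BUA again. A minimal sketch of that rule, assuming invented time values and a simplified two-state view:

```python
# Illustrative sketch (assumed logic, not MAS code): BUA STATE is 2 for a
# command-specified guard time after a reload, then returns to 1.
def bua_state_after_reload(seconds_since_reload, guard_time):
    """Return the BUA STATE value according to the rule described above."""
    return 2 if seconds_since_reload < guard_time else 1
```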
The files required for the booting of APZ 212 40 are stored on a data disk. The directory structure is set up by the installation software when the AP is installed. Separate directories are used for the files concerned. If several versions of a file exist, all versions are stored in the same directory. The files used by CPS, e.g. for loading and dumping, are not included.

APZ\DATA\temp
The main directory for intermediate storage of files while they are processed. The contents and file names are internal to APZ.

APZ\DATA\boot\image
The main directory for the boot image and the files used by the parent process when starting APZ VM. The purpose of the parent process is to initiate the APZ VM process, supervise it, supply APZ VM with configuration data (i.e. memory sizes), and transfer the core dump to the AP. This directory stores the CXC106<running_number> boot image containing Tru64 and the parent process.

APZ\DATA\boot\APZ_VM
The main directory for the APZ VM and JITC load module. The following files are contained:

• CXC161<running_number> - the APZ VM and JITC load module
• CDA102<running_number>.ini - the configuration file, specifying APZ memory sizes

As time elapses, new versions of APZ VM will be released.

APZ\DATA\boot\FW\CPBB_upgrade
The main directory for the CPBB load modules. It contains the CXC106<running_number> file, the load module containing CPBB firmware. The file is distributed via FTP or on portable medium.

APZ\DATA\boot\FW\RPHMI_upgrade
The main directory for the RPHMI load modules. It contains the CXC106<running_number> file, the load module containing RPHMI firmware. The files are distributed via FTP or on portable medium.

APZ\DATA\boot\FW\UPBB_upgrade
The directory for the UPBB load modules. It contains the CXC106<running_number> file, the load module containing UPBB firmware. The files are distributed via FTP or on portable medium.
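Since several versions of a load module may sit in the same directory, as noted above, a tool selecting the newest one must pick the highest running number. A minimal sketch, assuming file names of the form CXC161<running_number> and an invented file list:

```python
# Illustrative sketch: pick the load module with the highest running number.
# The file names below are invented examples.
import re

def latest_load_module(names, prefix="CXC161"):
    versioned = []
    for name in names:
        m = re.fullmatch(re.escape(prefix) + r"(\d+)", name)
        if m:
            versioned.append((int(m.group(1)), name))
    return max(versioned)[1] if versioned else None
```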
APZ\DATA\boot\<CP-side>\control
The directory for the boot-controlling files. The following files are contained:

• configdb.ini - a file containing the reference to the actual config file and the actual APZ_VM load module
• fwcomp.ini - a file containing the reference to the actual firmware compatibility control file
• bootdb.ini - a file containing the reference to the actual boot image

APZ\DATA\boot\<CP-side>\active
The directory for the active boot image file. The file stored in this directory is boot_image, a copy of the boot image file. The file is created on site.

APZ\DATA\boot\<CP-side>\fw_upgrade
The directory for the firmware compatibility controlling files. The <cxc_xxx>.ini file, containing the compatibility information for the actual firmware load module, is stored in this directory. The file is distributed via FTP or on portable medium.

APZ\DATA\<CP-side>\PES\error and APZ\DATA\<CP-side>\PES\event
The directories for the error and event logs that are produced by APZ VM in a specific CP side.

APZ\DATA\<side>\CPHW\binlog, syslog and crash
CPHW is the directory for the binary, system and crash logs that are produced by Tru64 in a specific CP side.

APZ\DATA\<side>\CPHW\core
The directory for the core dumps that are produced by Tru64 in a specific CP side.

The files contained in all the above directories are generated by the CP; the file names are internal to the system.

BOOTING OF THE CP

The boot part in the APG40 contains the software presented below. As mentioned above, BOOTP is used to boot up the CP. BOOTP is a standard protocol that allows a diskless client to discover its own IP address, the address of the server host, and the name of the boot file. The protocol sends a broadcast to all potential load servers. The servers have to be configured to reply. The BOOTP implementation in the AP has been replaced by DHCP, i.e. DHCP implements the BOOTP protocol.
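The boot-controlling files described above (configdb.ini, fwcomp.ini, bootdb.ini) are INI-style references. A minimal sketch of reading such a file with Python's configparser; note that the section and key names used here (boot, apz_vm, config_file) are hypothetical, since the real key names are not given in the text:

```python
# Illustrative sketch: reading a boot-controlling file such as configdb.ini.
# Section and key names are invented; only the file format (INI) is assumed.
import configparser

SAMPLE = """
[boot]
apz_vm = CXC1610005
config_file = CDA1020001.ini
"""

def read_boot_refs(text):
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return cp["boot"]["apz_vm"], cp["boot"]["config_file"]
```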
As shown in the figure above, the boot software consists of:

• Boot image, stored in: V:\APZ\DATA\boot\image\boot_image
• APZ VM, stored in: V:\APZ\DATA\boot\APZ_VM\CXC(CDA)
• Firmware (FW), structured in upgrade folders:

V:\APZ\data\boot\cp_a\fw_upgrade\urcf
V:\APZ\data\boot\cp_a\fw_upgrade\urlf
V:\APZ\data\boot\cp_a\fw_upgrade\CXC1060117_R3A05_UPBB.upg
V:\APZ\data\boot\cpb\fw_upgrade\urcf
V:\APZ\data\boot\cpb\fw_upgrade\urlf
V:\APZ\data\boot\cpb\fw_upgrade\CXC1060117_R3A05_UPBB.upg
V:\APZ\data\boot\fw\UPBB_upgrade\urcf
V:\APZ\data\boot\fw\UPBB_upgrade\urlf
V:\APZ\data\boot\fw\UPBB_upgrade\CXC1060117_R3A05_UPBB.upg

The APZ-CP system is delivered in three LZY products, i.e. three software packages:

1. The PLEX-OS system
2. APZ VM, the ASA compiler, HAL and OSI
3. The commercial operating system Tru64, the APZ VM parent process, some drivers, and firmware for UPBB and RPHMI

The version information associated with the system middleware can be printed with the following command:

<LAMIP;
MIDDLEWARE UNIT IDENTITY
EXECUTING RPHMI MICRO PROGRAM
CAA 141 142 R3A03
RPHMI MICRO PROGRAM IN PROM
CAA 141 142 R3A03
MAINTENANCE BUS CONTROLLER FIRMWARE
CXC1060124 R3A03, CXC1060125 R3A02
SYSTEM BOOT IMAGE
CXC1060145 R2A
APZ VIRTUAL MACHINE
CXC 161 0005 R2A03
END

IP ADDRESS RESOLUTION

The BOOTP client (the CP) issues a query on the network to get data about itself. The only data known to the client is its MAC address. The servers are configured to respond to all requests from the MAC addresses they recognize. If the client receives multiple answers, it chooses the first one as the source for loading. The APGs contain a database that translates MAC addresses to IP addresses.
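The MAC-to-IP translation database described above can be sketched as a simple lookup that answers only for recognized clients, as a BOOTP server does. The MAC addresses below are invented placeholders; the IP addresses are the hardcoded CP-IPN addresses used in this system:

```python
# Illustrative sketch of the APG address translation: reply only for
# recognized MAC addresses. MAC values are invented; IPs are from the text.
BOOTP_DB = {
    "00:00:00:00:00:0a": "192.168.169.128",  # CP-A port no 1
    "00:00:00:00:00:0b": "192.168.170.128",  # CP-A port no 2
    "00:00:00:00:00:0c": "192.168.170.129",  # CP-B port no 1
    "00:00:00:00:00:0d": "192.168.169.129",  # CP-B port no 2
}

def resolve(mac):
    """Return the IP address for a recognized MAC; None means no reply."""
    return BOOTP_DB.get(mac)
```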
The following IP addresses are used for CP-IPN communication in the AP:

192.168.169.1  AP-(1)A port no 1
192.168.169.2  AP-(1)B port no 1
192.168.170.1  AP-(1)A port no 2
192.168.170.2  AP-(1)B port no 2

Figure 4-56 The CP-AP IPN network

The following IP addresses are used for CP-IPN communication in the CP:

192.168.169.128  CP-A port no 1
192.168.170.128  CP-A port no 2
192.168.170.129  CP-B port no 1
192.168.169.129  CP-B port no 2

The following IP addresses are used for the CPT in the CP:

192.168.169.127  CP-A
192.168.170.127  CP-B

The following IP address is used by the RTD:

192.168.200.1  CP-A and CP-B

All addresses above are hardcoded. At replacement of an FRU, the corresponding CP MAC addresses must be entered in the AP. The procedure for reading and changing the MAC addresses is described in the ALEX document "START UP AND INITIAL NE TEST OF APZ 212 40".

CONFIGURATION DATA

The middleware configuration data can be divided into three categories:

• Hardware dependent. Hardware configuration data is connected to the site and not to any boot image. An example is the APZ memory sizes. These sizes must be known when APZ VM creates the objects that implement the memories.

• APZ VM dependent. APZ VM configuration data is the same at all installations of APZ VM. The configuration data file is loaded and overridden by APZ VM. Based on the configuration data file, APZ VM structures and allocates the memory. The memory remaining when APZ VM, Tru64 etc. have got their parts is allocated to APZ. PS and RS are set to fixed sizes, and the remainder is allocated to DS. APZ VM has no data of its own that requires configuration.
The configuration data reflects the characteristics of APZ and is traditionally either set by APZ at system restart (job buffer sizes) or by command (memory sizes). For data that is set during the restart, APZ VM will allocate a maximum size. APZ will then inform APZ VM of how much APZ wants to use. APZ memory sizes are specified in APZ\DATA\boot\image\CDA102<running_number>.ini. The file is, however, overridden.

• APZ system dependent. APZ system configuration data is data that is set by APZ but required by APZ VM. Examples of this type of data are job buffer sizes and load limits. This data is set by APZ in the restart. It is thus sufficient for APZ VM to set an initial value or allocate a maximum size that can later be changed by APZ.

All configuration data is handled by the common configuration file.

BOOTING SEQUENCE

The booting sequence is:

• boot of the boot image, followed by
• boot of APZ VM including LAB (1), followed by
• boot of the CP system backup copy

(1) LAB, Loading Administration, Bootstrap Loader, performs reloading of backup information at system restart with reload, or when ordered by command, from external medium or main store. It is the APZ dump loader that loads the CP system backup copy. LAB is included in the APZ VM load module, but belongs to CPS.

FUNCTION CHANGE OF MIDDLEWARE

The upgrade of middleware is performed through a function change. Function change of middleware software can be seen as an extension of the side-switch method. The same mechanism is used for changing APZ VM, the boot image or the configuration file. The function change of middleware procedure is described in the corresponding OPIs.
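The maximum-size scheme described under CONFIGURATION DATA above (APZ VM allocates a maximum size at restart, and APZ later reports how much it actually wants to use) can be sketched as follows. The class and method names are invented for illustration; only the allocate-maximum-then-report-usage idea is taken from the text.

```python
# Illustrative sketch (invented API) of maximum-size allocation at restart.
class RestartSizedArea:
    def __init__(self, max_size):
        self.max_size = max_size  # allocated by APZ VM at restart
        self.used = 0

    def set_used(self, n):
        """APZ informs APZ VM how much of the allocated maximum it uses."""
        if not 0 <= n <= self.max_size:
            raise ValueError("requested size exceeds allocated maximum")
        self.used = n
```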
The main steps described here do not replace the OPIs; they are intended to highlight the procedure:

1. Make a new CP system backup copy (SYBUP).
2. Install the new versions of the boot and firmware compatibility files (AP actions).
3. Update the configuration database for the SB side (AP actions). Activate the new PES and CPHW dump with the command cfeted. This command is used when performing a system upgrade or backing. The command takes as input the versions of the upgradable packages (HW dump and PES dump) from the configuration files containing the required information. This is done when performing a normal (aut) upgrade or backing:

   cfeted -t aut -s <CP-SB> -n pe

   In the example above, the command is typed with the aut option followed by -n pe, which implies an upgrade to the latest delivered PE package. No interaction with the operator is needed. The command is parsed as it is and the system upgrade proceeds accordingly.

4. Set the SB to SE (DPSES).
5. Boot the SB side with the boot image and APZ VM binary (command PTCPR).
6. Load the SB side with the CP system backup copy created in step 1 (command PTCPL).
7. Bring over updated data from the EX side (FCDAT).
8. Activate the application (SYATI:RESTART).
9. Switch SB to EX (FCSWI or DPPAI).
10. Make a new CP system backup copy (SYBUP).
11. Update the configuration database for the SB side (AP actions).
12. Normalize the system (DPPAI). The normalization will boot the SB side.

Step 6 will generate an inconsistency alarm. The alarm should be tolerated. Step 9 can be performed as a soft side switch if there are no dependencies on the system backup copy. If there are dependencies, a restart will be required. Steps 10 and 12 may be reversed depending on how the normalization procedure is implemented. Steps 3 and 11 require updates to the configuration database.
At function change, the contents of the configdb and/or fwcomp files have to be updated. This is a critical action that must not fail. The updates can be performed according to two main approaches:

• The new files are distributed together with the binaries and installed in the file tree controlling the B side. When the system is made parallel, the system updates the tree controlling the A side before the SB side is rebooted.
• The files are edited by the authorized user, first for the B side and later for the A side. This means technician intervention in the process flow (first risk of fault introduction). The technician has to edit the files without introducing faults (second risk of fault introduction).

The second approach is normally used.

Fall back during function change

If a function change fails, a fall back is made to the old system that is active in the SB side. After the switch-over, the configuration database for the original SB side will refer to files that have been proven not to work. The only action after a failed function change is to update the configuration database to refer to the original versions of APZ_VM and Tru64, and normalize the system (DPPAI).

OPIs

The following OPIs handle the central middleware:

• Central Middleware, File, Copy
• Function Change, Central Middleware, Side Switch Method, Change
• Central Middleware, Restore, Initiate
• Function Change, Central Middleware, Soft Side Switch Method, Change

FUNCTION CHANGE OF FIRMWARE

Function change of firmware can be seen as a variation on function change of middleware. Firmware updates are sometimes connected to updates of hardware, limiting the possibilities for automatic rollback in case of failure. The upgrade procedure for the firmware is the same as for Tru64 and the VM.

Successful function change

A detailed procedure for the function change of firmware is described in the related OPIs.
The main steps of the procedure for function change of the firmware and related hardware are presented here. Some steps in the procedure to upgrade the firmware might take time. Wait until the system responds. An interruption before the operation is completed might put the APZ out of operation for several days.

1. Make a new CP system backup copy (SYBUP).
2. Install the new versions of the boot and firmware compatibility files (AP actions).
3. Update the configuration database for the SB side (AP actions).
4. Set the SB to SE (DPSES).
5. Install the new hardware in the SB side.
6. Boot the SB side with the boot image and APZ VM binary (command PTCPR).
7. Load the SB side with the CP system backup copy created in step 1 (command PTCPL).
8. Bring over updated data from the EX side (FCDAT).
9. Activate the application (SYATI:RESTART).
10. Switch SB to EX (FCSWI or DPPAI).
11. Make a new CP system backup copy (SYBUP).
12. Update the configuration database for the SB side (AP actions).
13. Install the new hardware in the SB side.
14. Normalize the system (DPPAI). The normalization will boot the SB side.

Step 7 will generate an inconsistency alarm. The alarm should be tolerated. Step 10 can be performed as a soft side switch, as there are no dependencies on the system backup copy. A function change of firmware may be made together with a function change of the middleware. Dependencies may then be introduced (see the previous chapter) that require a restart after the data transfer. Steps 11 and 14 may be reversed depending on how the normalization procedure is implemented. Steps 3 and 12 require updates to the configuration database.

Fall back during function change

The firmware and associated hardware are assumed to be backwards compatible almost forever. The control file contains a list of compatible firmware versions.
The oldest version defines the point of no return, which must not be passed without manual intervention. The manual intervention is to restore the hardware to its original status and update the references in the fwcomp file to refer to the corresponding compatibility control file.

CONFIGURATION UPDATES

The only configuration update that is possible is to alter the APZ memory definitions. The supported definitions are defined in the config files, one file for each supported configuration. The configuration update is similar to a function change of middleware and proceeds as follows:

1. Make a new CP system backup copy (SYBUP).
2. Install the new versions of the configuration files (AP actions).
3. Update the configuration database for the SB side (AP actions).
4. Set the SB to SE (DPSES).
5. Boot the SB side with the boot image and APZ VM binary (command PTCPR).
6. Load the SB side with the CP system backup copy created in step 1 (command PTCPL).
7. Bring over updated data from the EX side (FCDAT).
8. Activate the application (SYATI:RESTART).
9. Switch SB to EX (FCSWI or DPPAI).
10. Make a new CP system backup copy (SYBUP).
11. Update the configuration database for the SB side (AP or system-initiated actions).
12. Normalize the system (DPPAI). The normalization will boot the SB side.

Step 6 will generate an inconsistency alarm. The alarm should be tolerated. Step 9 can be performed as a soft side switch if there are no dependencies on the system backup copy. If there are dependencies, a restart will be required. Steps 10 and 12 may be reversed depending on how the normalization procedure is implemented. Steps 3 and 11 require updates to the configuration database.

The memory sizes specified in the configuration file will be overridden by APZ VM.
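The fall-back rule described above (the compatibility control file lists compatible firmware versions; the oldest entry defines the point of no return) can be sketched as a simple membership check. The version strings and the oldest-first ordering are assumptions for illustration only:

```python
# Illustrative sketch (assumed logic): falling back is allowed only to a
# version listed in the compatibility control file; anything older than the
# oldest entry requires manual intervention. Version strings are invented.
COMPATIBLE = ["R2A01", "R2A05", "R3A05"]  # assumed ordered oldest first
POINT_OF_NO_RETURN = COMPATIBLE[0]

def fallback_allowed(target, compatible=COMPATIBLE):
    return target in compatible
```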
When PES and CPHW have made their initializations, the remaining memory is available to APZ. PS and RS are set to predefined sizes, calculated from the system dimensions, and the remaining part is allocated to DS.

SYSTEM LOAD AND RELOAD

LOADING

The CP contains loading functions for:

• Initial loading at system start.
• Reloading of the system backup copy in case of a serious system fault.
• Loading during function change.

Initial loading at system start can be made from the system backup copy of another exchange or system test plant. Initial loading of the system backup copy is made by the PES module LAB.

Reloading of the system backup copy in case of a serious system fault is made by LAB upon order from the maintenance subsystem MAS. At reloading, the youngest available system backup copy on line is normally selected, but the system can also be configured to automatically use an older, more proven system backup copy if the reloading of a previous backup copy is not successful. After a successful reloading, the command log associated with the system backup copy must be executed manually.

INITIAL START

The behavior at initial start depends on the PHC state. The following two state machines show the system initial start cases:

Figure 4-57 System start, case 1. CP-A is powered before CP-B and the PHCI button is pressed

The system start in case 1 above requires loading using the CPT command PTCPL. The APZ VM is in a stop loop until the command is entered:

<PTCOI;
EXECUTED

cpt<PTCPL:CS=A,FILE=RELFSW8;
ORDERED

CPT MESSAGE
INITIAL LOADING (ABSOLUTE) STARTED
END

CPT MESSAGE
INITIAL LOADING (ABSOLUTE) IN PROGRESS

CPT MESSAGE
INITIAL LOADING (ABSOLUTE) FINISHED
END

Figure 4-58 System start, case 2.
CP-A is powered before CP-B and the PHCI button is not pressed

If the PHCI button is not pressed, the APZ VM fetches RELFSW0 and restarts the system.

SYSTEM RELOAD

This part highlights the actions caused by a reload of the CP system backup copy initiated by the system. The CP system backup copy contains the identities of the middleware files that were used when the CP system backup copy was created, and the RS generation. Before the CP system backup copy is loaded, the identities of the executing middleware are checked against the identities kept in the CP system backup copy. The result of the check is that the identities are either all equal or not. The normal case is that they are equal, and no corrective actions are required.

In the case of incompatible middleware, the system detects the incompatibility and makes an attempt to launch the system anyway. An alarm is raised and a registration is made in the APZ VM error log. The technician has to decide how the incompatibility should be handled: either accept it and clear the alarm, or act upon it by correcting the configuration database and then initiating a boot.

If reloading takes place, the following information is loaded to the CP stores:

• Recording type of data.
• The youngest version of the data which controls the program path, from the backup copy in main store (the oldest version when taken from the external system backup copy).
• Program information.
• Reference information.

When the system is reloaded, the latest versions of APZ VM and Tru64 are always used. The reason is that they change very seldom and should always be backward compatible. If there is a need to use an older version, this must be initiated manually. In APZ 212 40 there is more loadable software than in classic APZ: Tru64 and APZ VM.
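The identity check described above can be sketched as a comparison between the middleware identities recorded in the backup copy and those currently executing. The identity strings are invented examples, and the real handling raises an alarm and writes to the APZ VM error log rather than returning a list:

```python
# Illustrative sketch of the middleware identity check before reload.
# Product identities below are invented example values.
def identity_mismatches(in_backup, executing):
    """Return the names of middleware units whose executing identity
    differs from the identity recorded in the backup copy."""
    return sorted(name for name in in_backup
                  if executing.get(name) != in_backup[name])
```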
When a backup copy is made, a certain combination of Tru64 and APZ VM is used together with the PLEX dump. Product information about the other software is stored in the PLEX dump. When a reload of the PLEX dump is performed, the latest versions of Tru64 and APZ VM are always used. There is a requirement on Tru64 and APZ VM to always be backward compatible. If there is a need to use an older version of Tru64 and APZ VM, some configuration files must be changed, and this must be done manually. There are commands to print and edit this information.

NEW COMMANDS IN APZ 212 40

LACBP: Loading Administration, Compiled Block Data, Print
LACMS: Loading Administration, Compiler Mode, Set
LAEMC: Loading Administration, Execution Mode, Change
LAEMP: Loading Administration, Execution Mode, Print
LAMIP: Loading Administration, Middleware Identity, Print
LMFRI: License Management File Release, Initiate
LMLIC: License Management Logging Interval, Change
LMLIP: License Management Logging Interval, Print
PACRP: AXE Parameter, Cross-Reference, Print
PADBI: AXE Parameter, Database Build, Initiate
PAFTI: AXE Parameter, Fault Test, Initiate
SYCLP: System Start and Restart Functions, Command Log Status, Print

5 Fault Finding

OBJECTIVES

After completion of this chapter, participants will have knowledge of handling:

• Hardware faults in APZ 212 40
• Plex Engine faults
• Software faults in APZ 212 40
• System recovery for the APZ 212 40

Figure 5-59 Objectives

INTRODUCTION

The CP in APZ 212 40 is duplicated using the principle of warm standby, hot on demand. Warm standby means that the standby side is ready to take over as the new EX in the case of a serious fault in the current EX. Hot on demand refers to the Soft Side Switch (SSS), command DPSWI.
The duplication means that at any given moment one of the CP-sides is executive (EX) and the other CP-side is standby (SB). The operation state of the duplicated CP is supervised by the MAU.

To protect AXE from data corruption due to SW faults, and at the same time minimize the influence on the application, fault detection and recovery functions are introduced in PES and in MAS SW. After a fault has been detected, the least disturbing actions possible are used to resume program execution. This chapter describes fault finding and error handling for the APZ 212 40.

HANDLING OF HARDWARE FAULTS

General

APZ is a fault-tolerant system which is well adapted to the practical realities of how different types of faults occur. Historical data indicates that about 40% of component faults occur as sudden solid (permanent) faults. The remainder occur as transient (temporary) faults, recurring at intervals that can vary between a few seconds and several weeks.

The system has been designed to be tolerant of technician errors and of a reasonable number of design faults in the SW and HW. A fault or a disturbance can cause a malfunction event, meaning that an incorrect result (error) is produced in a unit. Fault detection can then be performed by means of check circuits (supervisory circuits) or by checks in PES and programs. The task of the recovery function is to eliminate the effects of a fault and to inhibit the propagation of the incorrect result. The recovery of CP HW is controlled by the MAU. The process for the normalization of the CP is described as an independent function of its own, since it is used as a subfunction by the recovery function as well as by other functions. The diagnostics function localizes faults down to a Field Replaceable Unit (FRU).
This is accomplished by analyzing the information recorded about fault symptoms in the units that are either working or subjected to special testing. When a repair is needed, an alarm is issued. The operator then orders a diagnosis of the system, replaces the FRU and checks the result.

Faults in the CPUM result in the CP FAULT alarm, and the diagnosis function cannot point out a specific board. Therefore board replacement cannot take place in the CPUM; in case of a hardware fault in the CPU, UPBB, power module or RPHMI, the whole magazine must be replaced.

The figure below shows (simplified) how the different blocks interact at a hardware fault.

Figure 5-60 Handling of hardware faults (test programs, fault detection, diagnostics, recovery, repair and expert support, with the operator interacting with the diagnostics and repair functions)

The modules shown in the figure above are described in the following subsections.

FAULT DETECTION, HARDWARE FAULTS

The function detects faults in the central processor. Fault detection is made by hardware or by programmed tests. Normally a hardware error of type "Fault in CP-side" is detected. The tests are executed continuously in spare time or at regular intervals. The faults can be divided into the following categories:

• "No disturbing HW fault", for example a correctable bit fault in a store, a fan fault, or a fault that reduces the ability to detect other faults.
• "HW fault in CP-side SB".
• "HW fault in CP-side EX".

The fault detection in hardware is implemented by:

• Matching between the parallel working RPH-sides.
• Use of error detection/correction codes and parity supervision of buses inside the CP-sides.
• Use of a "watchdog timer" to check program execution.

Programmed tests are used to:

• Exercise the hardware in order to get error detection by hardware.
• Test the data processing part of the hardware.
• Test the supervision circuits that are used to detect errors.

RECOVERY, HARDWARE FAULTS

The function isolates the faulty unit and prevents faulty data from being used. Essential parts of recovery are:

• Fault isolation.
• Error data collection.
• Start attempt.

When an error is detected, a Side Location Error (SLE) indication bit is set in the WSR. This enables the other CP-side to start a recovery sequence. Error interrupt information is collected for later diagnostic use. A normalization attempt is made in order to determine if the fault was temporary, and a Fault Mark (FM) is set for the SB-side. A bit fault in a store does not affect the ability to recover from other faults. For other types of faults, the following fault situations can be handled:

• Faults that have stopped CP-SB.
• Faults that have stopped CP-EX.

Recovery ends by ordering REPAIR, HARDWARE FAULTS to, if necessary, update the status of the alarm CP FAULT.

Recovery after "No disturbing HW fault"

In the case of a bit fault in a store, the fault is isolated by use of the error correcting code. Information about where the fault occurred is saved, and an attempt to erase the bit fault is made by writing back the correct contents to the store. In the case of a fault in the circuits used to detect other faults, no actual fault isolation is done, but information about the fault is stored.

Recovery after "Fault in CP-side SB"

A fault in CP-side SB is periodically detected by CP-EX, which immediately halts CP-SB. While CP-SB is not in its normal state, error information is collected and a retry (normalization) is performed to re-establish the normal working state. If the fault is permanent or occurs frequently, the alarm CP FAULT is issued.

Recovery after "Fault in CP-side EX"
At a fault in CP-EX, CP-SB takes over control and makes itself the new EX, halts the old CP-EX and reloads the application. While CP-SB is not in its normal state, error information is collected and a retry (normalization) is performed to re-establish the normal working state. If the fault is permanent or occurs frequently, the alarm CP FAULT is issued.

REPAIR, HARDWARE FAULTS

The function consists of commands used by the operator to repair the system. When a repair is needed, an alarm is issued. The operator then orders a diagnosis of the system, replaces the faulty HW and makes a repair check of the new HW.

EXPERT SUPPORT, HARDWARE FAULTS

The function contains aids for manual troubleshooting and for start after system stoppage. An error log, CP EVENT RECORD, can be printed on demand. This log shows the information on which the built-in diagnostics has based its result. Error interrupt data is printed at each error interrupt in the CP if this printing function has been activated beforehand (with command DIEFC).

In addition, related information can be extracted from the logs handled by the Central Log Handler (CLH) function. A list of FRUs can be printed (command DPHIP). This list shows product number and serial number for units in the CP magazine, as well as the board name used in diagnostics printouts and the position within the magazine. Firmware identities are included in the printout ERROR INTERRUPT INFORMATION (printed with command DIECP). They can also be printed by means of command PTSRP.
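Both recovery flows above follow the same pattern: halt the faulty side, collect error information, attempt normalization, and raise CP FAULT only if the fault is permanent or recurs frequently. A minimal sketch of that decision; the names and the frequency threshold are invented for illustration, not actual MAU code.

```python
def recover_cp_side(fault_is_permanent: bool, recent_fault_count: int,
                    frequency_limit: int = 3) -> list:
    """Illustrative recovery sequence for a faulty CP side (SB or EX)."""
    steps = ["HALT_FAULTY_SIDE", "COLLECT_ERROR_INFO", "RETRY_NORMALIZATION"]
    # Permanent or frequently recurring faults end with the CP FAULT alarm.
    if fault_is_permanent or recent_fault_count >= frequency_limit:
        steps.append("ISSUE_CP_FAULT_ALARM")
    return steps

# A temporary, first-time fault is normalized without an alarm.
assert recover_cp_side(False, 1) == [
    "HALT_FAULTY_SIDE", "COLLECT_ERROR_INFO", "RETRY_NORMALIZATION"]
```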
<PTCOI;
EXECUTED

<PTSRP:REG=FANREG;
CPT MESSAGE ERROR INFORMATION REGISTER
REGISTER  FANREG
DATA      H'07
END

OPERATOR PROCEDURES

When the alarm pointing to a hardware fault is issued,

<ALLIP:ALCAT=PROC;
ALARM LIST
A2/PROC "WT_R10_CNG0_TB1" 009 040422
CP FAULT
1038

the operator makes a diagnosis of the system with the command REPCI:

<REPCI;
ORDERED
<
CP DIAGNOSIS
TEST RESULT FAULT
FAULTTYPE PERMANENT
MAG      PCB    REPLACED REASON
FANC-UP  FAN-1
FANC-UP  FAN-2
FANC-UP  FAN-0

When the PCB and magazine information is known, the manual intervention is initiated with the command REMCI:

<REMCI:MAG=FANC-UP,PCB=FAN-0;
ORDERED
CP MANUAL INTERVENTION
INTERVENTION PREPARATION SUCCESSFUL
ACTION   MAG      PCB    NOTE
LOCATE   FANC-UP  FAN-0  LOCATE PCB IN POS 01 IN MAG
REPLACE  FANC-UP  FAN-0
END

After the replacement, the new MAC addresses must be entered in the DHCP server (AP), and a repair test must be performed with the command RECCI:

<RECCI;
ORDERED
<
CP REPAIR SUCCESSFUL
END

Both CP sides are then updated and the system runs in the normal operation state.

SOFTWARE FAULTS

FAULT DETECTION

When a fault detecting function detects an error, an analysis is performed to determine the source of the fault by means of fault codes. Each fault code is unique, and an exact picture of the corresponding action can only be obtained from a matrix implemented in the block which decides the recovery action. In the following, fault detection in the various system units is described.

Detection of Plex software faults

The supervision of program execution and the run-time checks are performed by the ASA compiler. Faults can also be detected by supervisory programs in Plex-OS. Forlopp adapted blocks can use an audit time supervision function to detect hanging situations.
The function "Automatic release of hanging software records" is also available for Forlopp adapted blocks. A check of the Forlopp identity at signal reception (Forlopp Execution Control Function, ECF) can be used to get early fault detection of software errors in a call chain.

Detection of Plex Engine faults

Faults in the compiled code produced by the ASA compiler are normally interpreted as a Plex fault and recovered as a normal Plex fault. Faults in APZ-VM can be detected by the VM itself, by the OS or by the Program Handling Circuit (PHC).

RECOVERY

The recovery actions are related to the level at which the fault was detected. Generally, the recovery requires actions in middleware or in APZ applications.

APZ VM Recovery

The recovery process of APZ VM and APZ can be divided into a number of scenarios:

• re-initialization of APZ VM, conditionally followed by reloading of APZ
• loading of APZ VM including LAB, followed by reloading of the CP system backup copy. This scenario's strategy would be to reload APZ VM from the AP and start it, but in such a case it is an initial loading and not a recovery action.
• booting of Tru64 and all higher layers
• reloading of APZ.

Several cases are implemented depending on the grade of the fault and the recovery functions that are activated in the CP. The faults requiring a system restart are described further on in this chapter.
Figure 5-61 APZ VM Error handling

As shown in the figure above, one fault situation is when APZ VM itself has detected an internal error. A possible recovery action then is to reinitiate the APZ VM without reload and then restart APZ. An extension to this recovery action is to reload the CP system backup copy.

This recovery action is triggered when APZ VM detects an internal error and terminates. For example, the Program Handling Circuit detects a hanging and the value of the System Restart Counter is 1 (it is stepped by the PHC). A new APZ-VM process must then be started. The operating system (Tru64) informs the parent process that APZ VM has been terminated. The parent process reinitiates the APZ VM: a new instance of APZ VM is created from the RAM disk. The new instance configures itself exactly as the deceased instance. When the reinitialization has been concluded, APZ is ordered to reload the CP system backup copy.

APZ Software recovery

Almost all software faults in Plex programs can be recovered either by Forlopp release or by use of selective restart. Software faults detected in a Forlopp adapted process are normally recovered by Forlopp release. The flow of the recovery functions is shown in the figure below.

Figure 5-62 Software recovery (errors related to a regional processor lead to testing and possible blocking of the RP; otherwise Forlopp release or selective restart is attempted, escalating to a system restart according to rank when the error intensity counter is too high)

Faults in a process that is not Forlopp adapted are normally handled by the selective restart function (if active).
Different actions are taken depending on the fault type and error intensity, and on the block category of the block where the error was detected. The block category is assigned on the basis of the block's importance for the traffic process and for the central system information. When a system restart is performed, all blocks in the system are restarted.

When a new APZ-VM process is started, the normal recovery action is performed: Forlopp release, selective restart or system restart. Forlopp release or selective restart is possible if it can be used for all interrupted levels (THL, BAL1 and BAL2). If the System Restart Counter is > 1, a system restart according to the restart rank is performed. The recovery process is described here in detail.

Forlopp Release

At a Forlopp release, only the Forlopp adapted process is affected. Other functions in the system are not affected. To enable an analysis of the reason for the fault, the error information is preserved. An alarm is also initiated.

Figure 5-63 Forlopp release (if the delay time for the next Forlopp release exceeds the current "Forlopp Release Delay Limit", a system restart according to rank is performed instead; if the calculated delay time exceeds half of the current limit, the alarm "Long Delay for Forlopp Releases" is initiated)

System Restart

The purpose of a system restart is to start the system from a well-defined position. System restarts of different ranks can be performed:

• Small system restart
• Large system restart without reloading
• Large system restart with reloading

At a small system restart, the application system normally saves all the call set-ups which are in speech position. At a large system restart, all call set-ups are released. A large system restart with reloading implies that the central processor memory is reloaded with programs and data from an external medium or from the primary store.
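The escalation between these ranks can be modelled in a few lines, assuming the default settings (the rank decays back to small after 4 minutes, 1 small restart before escalation to large, 2 large restarts before escalation to a reload). This is a toy sketch with invented names, not actual restart-handling code.

```python
# Illustrative model of restart-rank escalation with the default limits.

class RestartRank:
    def __init__(self, decay_min=4, small_before_large=1, large_before_reload=2):
        self.decay = decay_min
        self.small_limit = small_before_large
        self.large_limit = large_before_reload
        self.small_count = 0
        self.large_count = 0
        self.last_restart_min = None

    def restart(self, now_min: int) -> str:
        # The rank decays back to SMALL once the previous restart is old enough.
        if (self.last_restart_min is not None
                and now_min - self.last_restart_min > self.decay):
            self.small_count = self.large_count = 0
        self.last_restart_min = now_min
        if self.large_count >= self.large_limit:
            return "RELOAD"
        if self.small_count >= self.small_limit:
            self.large_count += 1
            return "LARGE"
        self.small_count += 1
        return "SMALL"

r = RestartRank()
# Four restarts in quick succession escalate through the ranks:
print([r.restart(t) for t in (0, 1, 2, 3)])  # ['SMALL', 'LARGE', 'LARGE', 'RELOAD']
print(r.restart(10))                         # decayed after >4 min: 'SMALL'
```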
Error information is preserved for an analysis of the reason for the error. An alarm is also initiated.

After a system restart the rank is increased. If a new system restart occurs within 4 minutes (default) after a small system restart is finished, the next system restart will be large if the last one was small, and a reload if the last one was a large system restart. The rank is decreased again after 4 minutes (default). The delay after a small or large system restart before the restart rank is set back to small is normally 4 minutes, but is changeable between 1 and 10 minutes. The number of small restarts before escalation to a large restart is normally 1, and the number of large restarts before escalation to a reload is normally 2; both are changeable between 1 and 5. If a manual restart is initiated, the next automatic restart will normally be of the same rank as the manually initiated one.

When a large system restart with reload takes place, a side switch is also performed. The SB-side is preloaded; therefore the loading time can be omitted.

Boot Load

Boot load means that the Plex Engine and the vendor OS will be loaded. After a boot load, a new VM process is started and the Plex programs are loaded by a system restart with reload. Normally, when a boot load takes place, a side switch is performed. The SB-side is preloaded; therefore the loading time can be omitted.

As mentioned in chapter 4, BOOTP is used to boot up the CP. BOOTP and TFTP are used to load the boot image from the APG 40. The boot image is the executable that is loaded at system boot. It consists of Tru64, proprietary drivers and the process that should be started, i.e. the parent process. The boot image should be independent of node, i.e. all nodes that are running the same version of APZ VM should use the same image. Tru64 in the APZ 212 40 operates diskless.
This implies that booting has to take place over a network supported by NFS, or that the boot image is loaded into a RAM disk. APZ 212 40 uses a RAM disk.

The booting can be divided into two separate phases. In the first phase, the diskless machine discovers its own IP address, the address of the server host and the name of the boot file by using BOOTP. In the second phase, the boot file is loaded into memory and executed, using TFTP. BOOTP and TFTP run over UDP; IP is used as carrier protocol between the CP and the AP. The IP addresses used by the CP sides and APs are configured in the AP. The same addresses are used in every exchange. The IP addresses used by the APs when communicating with the surrounding world are dynamic. The IP addresses at the CP end are connected to the physical side.

The AP provides the boot server function. A database is set up in the boot server (AP) containing the path to the boot image, i.e. V:\APZ\DATA\boot\image\. When the boot server has been identified, it is possible to choose the correct version of the boot file. Many versions of Tru64 and APZ VM may exist. Once the boot process has started, it reads the configdb file to find which APZ VM version to load. The process supplies APZ_VM with parameters, i.e. APZ configuration data, to use. At boot load, the versions of the boot image and APZ VM specified by the configuration database are loaded.

Which version of the CP system backup copy to load is determined by CPS. Before the CP system backup copy is loaded, it is checked whether the running versions of the boot image and APZ VM are the preferred ones. If there is a mismatch, the loading continues, but an alarm is raised and a registration is made in the APZ VM error log.
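The boot phases and the version check just described can be summarized in a small sketch. All names here (configdb keys, step labels) are invented for illustration; the real logic lives in the boot server and APZ VM.

```python
# Hypothetical sketch of the diskless boot: BOOTP discovery, TFTP load of the
# boot image into the RAM disk, then the preferred-version check.

def boot_sequence(configdb: dict, preferred: dict) -> list:
    steps = [
        "BOOTP_DISCOVER",                       # phase 1: own IP, server, boot-file name
        "TFTP_LOAD " + configdb["boot_image"],  # phase 2: load and execute boot image
        "START_APZ_VM " + configdb["apz_vm"],
    ]
    # A mismatch against the preferred versions does not stop loading;
    # it only raises an alarm and logs the event.
    if (configdb["boot_image"], configdb["apz_vm"]) != \
       (preferred["boot_image"], preferred["apz_vm"]):
        steps += ["RAISE_ALARM", "LOG_APZ_VM_ERROR"]
    steps.append("LOAD_CP_SYSTEM_BACKUP")
    return steps

cfg = {"boot_image": "IMG_R2", "apz_vm": "VM_R5"}
# Matching preferred versions: no alarm, loading proceeds directly.
assert "RAISE_ALARM" not in boot_sequence(cfg, dict(cfg))
```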
BOOTING SEQUENCE

As mentioned in chapter 4, the booting sequence is:

• boot of the boot image, followed by
• boot of APZ VM including LAB, followed by
• boot of the CP system backup copy.

When a boot is finished, the end result should be a consistent system, i.e. boot image, APZ VM and APZ should fit together. The CP system backup copy generation to load is unknown when the booting is initiated. The default versions of the boot image and APZ VM, equal to those that were loaded before the boot, are loaded. These versions are indicated in the configuration database. If an inconsistency is detected later on, the booting procedure is concluded, but the technician is notified. APZ VM must be able to handle the RS generation indicated in the system backup copy. If not, the booting sequence is interrupted.

A file service within APZ VM is used for accessing remote files. The same service is used by other functions in APZ that do not use the regular PLEX I/O interface, e.g. backup.

Selective Restart

A selective restart implies that, if a software error is detected in a block which is of minor importance for the traffic process, the system restart can be either suppressed or delayed until a low-traffic time.

To determine the recovery action, the fault type and block category are used. At each recovery which does not lead to a system restart, an error intensity counter is incremented. When the maximum limit is reached, a system restart is initiated. The error information is preserved and an alarm is initiated at each recovery. If a delayed system restart is initiated, another alarm is received. In this case, the operator shall verify that the traffic process is working and set the time when the delayed system restart should be executed. The selective restart flow is shown in the figure below:
Figure 5-64 Selective restart (the block category, 0-3, and the selective restart type - immediate, delayed small system restart, delayed system restart with reload (default), or ignore - determine whether the restart is performed immediately, at a time specified per OPI, or cancelled)

Unconditional System Restart

Independent of selective restart and/or Forlopp execution, an unconditional system restart is done in the following situations:

• HW fault.
• No reply to the system_restart_signal at system restart.
• Some VM faults detected by APZ-VM or the vendor OS.
• When the CP has lost contact with several RPs.
• No contact with the RPH at system restart.
• Too long delay for Forlopp releases.
• Buffer congestion in JBA-JBD.
• SW fault at size alteration in a block with block category 1, 2 or 3.
• Errors found by DBS.
• Errors found by Function Change.

Decision on Action Depending on Fault Type

An error intensity counter is incremented for most recovery actions which do not lead to an immediate system restart. When the maximum limit is reached, a system restart is executed. The maximum limit can be changed by command. Error information is also preserved for an analysis of the reason for the error. When error information is preserved, an alarm is also initiated.
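The counter behaviour just described - increment on each non-restart recovery, system restart when the changeable limit is reached - can be sketched as follows. Names and the default limit are illustrative, not actual APZ values.

```python
class ErrorIntensityCounter:
    """Illustrative error intensity counter; the limit is changeable,
    as the real limit is by command."""

    def __init__(self, limit: int = 5):
        self.limit = limit
        self.count = 0

    def record_recovery(self) -> str:
        self.count += 1
        if self.count >= self.limit:
            self.count = 0               # the restart clears the counter
            return "SYSTEM_RESTART"
        return "RECOVERY_ONLY"

c = ErrorIntensityCounter(limit=3)
print([c.record_recovery() for _ in range(3)])
# ['RECOVERY_ONLY', 'RECOVERY_ONLY', 'SYSTEM_RESTART']
```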
Figure 5-65 Software recovery decision (the decided recovery action - Forlopp release, selective restart or system restart - either kills the current job, saves software error information and orders a Forlopp release or delayed system restart with the alarm "Software Error", or stops all program execution, saves restart information, restarts all blocks and initiates the alarm "System Restart")

At Forlopp release, the current Forlopp release delay limit is checked. If the delay time for Forlopp release, due to a mass Forlopp release situation, is longer than the current limit, a system restart is initiated. At a delayed system restart (with or without reloading) and at Forlopp release, any job in progress is terminated and the next job is started. At a delayed system restart (with or without reloading), an alarm is initiated.

SOFTWARE ERROR SITUATION INFORMATION

When an error situation is analysed, the dumped error information can be printed by means of a command. Error information for several different error situations can have been preserved at the same time. Two different types of error information exist: restart information and software error information. Both types of information can be preserved from four different error occasions.
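The delay-limit check at Forlopp release (Figure 5-63) reduces to two comparisons: a system restart when the delay exceeds the limit, and an additional alarm when it exceeds half the limit. A sketch with invented names:

```python
def forlopp_release_action(delay_s: float, limit_s: float) -> list:
    """Decide the action for the next Forlopp release given its expected delay."""
    if delay_s > limit_s:
        # Mass Forlopp release: fall back to a system restart according to rank.
        return ["SYSTEM_RESTART_ACC_TO_RANK"]
    actions = ["EXECUTE_FORLOPP_RELEASE"]
    if delay_s > limit_s / 2:
        actions.append('ALARM "Long Delay for Forlopp Releases"')
    return actions

# A short delay releases the Forlopp without any alarm.
assert forlopp_release_action(1.0, 10.0) == ["EXECUTE_FORLOPP_RELEASE"]
```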
The printout below shows a survey of the last four restarts:

<SYRIP:SURVEY;
ORDERED
<
SOFTWARE RECOVERY SURVEY

EVENT  TYPE    EXPLANATION
37     RELOAD  POINTER TOO LARGE
24     SMALL   POINTER TOO LARGE
31     SMALL   POINTER TOO LARGE
23     SMALL   POINTER TOO LARGE

EVENT  CODE    INF1    INF2    INF3    INF4    SIDE  STATE   DATE    TIME  EVENTCNT  FRDEL  ACTIVE
37     H'0008  H'0001  H'AB29  H'0000  H'0000  A-EX  NORMAL  040428  1730                   YES
24     H'0008  H'0001  H'AB29  H'0000  H'0000  A     SINGLE  040428  0000                   YES
31     H'0008  H'0001  H'AB29  H'0000  H'0000  A-EX  NORMAL  040428  0000                   YES
23     H'0008  H'0001  H'AB29  H'0000  H'0000  A     SINGLE  040428  0000                   YES
END

Information from one error situation where an alarm has been initiated is saved until it has been printed, or until one week has elapsed since the error occasion. If a new error occurs and no idle records are available for preservation, the latest record is overwritten. If the error information is of type software error information, and the fault code and the information words are equal to those of an already stored dump (a different number of information words is used for different fault codes), the preservation is ignored but an error counter is stepped for the event. If a Forlopp execution is in progress, information is preserved for the individuals linked to the Forlopp and for the individuals' variable information.

If the information is not from a system restart, other variables (i.e. common stored) are also preserved for the faulty/initiating block. When a block is running in "Optimized mode", the information from the last program jumps (JAM) and the contents of the registers in the central processor are reduced.

The restart information contains the following:

• Explanation of the fault
• Forlopp information (connected individuals)
• Software and correction identities
• Error signals (hardware faults)
• Previous working state of the central processor
• CP load and number of call attempts for the last minute before the system restart
• Last program jumps (JAM)
• Last sent and received signals
• Contents of the registers in the central processor
• Contents of preserved variables
• Signals in job buffers

The software error information contains the following:

• Explanation of the fault
• Forlopp information (connected individuals)
• Software and correction identities
• CP load and number of call attempts for the last minute before the current event
• Last program jumps (JAM)
• Last sent and received signals
• Contents of the registers in the central processor
• Contents of preserved variables for the faulty/initiating block
• Contents of variables belonging to the connected individuals in the Forlopp
• Signals in job buffers

The printout below shows an extraction of data for a restart event (dumped error information). It focuses on the JAM layout of blocks running in Basic and Optimized mode, respectively.

<SYRIP:SURVEY;
ORDERED
SOFTWARE RECOVERY SURVEY

EVENT  TYPE   EXPLANATION
26     LARGE  PES INTERNAL ERROR, GROUP 5

EVENT  CODE    INF1    INF2    INF3    INF4    SIDE  STATE   DATE    TIME  EVENTCNT  FRDEL  ACTIVE
26     H'8F05  H'000A  H'323F  H'0002  H'0000  A     SINGLE  040823  1859                   NO
END

Event 26 points to a PES internal error. The restart information differs from that in the classic APZs in how the JAM is presented. The printout below highlights this point.
<SYRIP:EVENT=26;
ORDERED
<
RESTART INFORMATION
RANK    LARGE
REASON  PROGRAM ERROR
EVENT   TYPE
26      LARGE
EXPLANATION
PES INTERNAL ERROR, GROUP 5

JAM CONTENTS

(The column layout of this printout was lost in extraction. For blocks in Basic Compiler Mode, e.g. MISSRD and MSWREC on the TRL level, the JAM is listed as FROM/TO address pairs such as H'36C6-H'36C0 and H'3B28-H'3B25. For the THL-level job blocks, e.g. MHOMH, C7CO, C7DR2, C7ST2C, MRRMH, MMM, MFM and MSCCO, single address pairs such as H'00BA-H'00BA are shown. For blocks in Optimized Compiler Mode, e.g. MHO, the JAM is presented with reduced information, as pairs such as H'16E5-H'16D0 and H'5273-H'5270.)

CENTRAL LOG HANDLER, CLH

INTRODUCTION

The APZ 212 40 system outputs certain logs if a fault or other irregularity occurs. The APZ contains the Central Log Handler (CLH) function to store all logging and core/crash dump information. This function is implemented in the Plex Engine Subsystem (PES). The logs are stored on hard disk in the AP.

The core and crash logs are large files. Because of that, they cannot be included in an ordinary Trouble Report as an attachment. Instead, the logs are handled separately from the Trouble Report. This chapter aims to provide the knowledge to handle the logs rather than to analyse them. The normal procedure is that the logs are either:

• transferred to an FTP server that can be accessed by an Ericsson support centre, or
• transferred directly to an Ericsson support centre, or
• put on a removable medium (DAT tape) and sent to an Ericsson support centre.

The transfer of the logs is described in the Operational Instruction Central Log Handler, CP Log, Transfer.
CLH LOG AND DUMP CHANNELS

The logs and dumps contain system information. Writing is enabled when the vendor OS is operational. Several APZ and operating system functions store information in certain logs/dumps by opening a channel. The following logs and dumps are defined in the APZ 212 40.

SYSLOG

A syslog event consists of one line of text in printable ASCII format. Any CP application is allowed to post messages to the syslog; e.g. Tru64 uses the syslog channel. An extract of the syslog file is shown below:

V:\APZ\Data\CPA\CPHW\syslog>type Syslog_20040804_142822.log
20040812_195920_000000 syslog: cpbberd: Could not specificly identify CPU correctable binlog event, mCheckCode 0x86
19700101_000505_000000 syslog: Early APZ-VM message 6. Data Store is requested for recreation Actual size: 5636620288 Result: 2
19700101_000505_000000 syslog: Early APZ-VM message 4. Reference Store is requested for recreation Actual size: 67108864 Result: 2
19700101_000505_000000 (the default Unix date is used until the real one is entered)

BINLOG

The binlog channel receives information about the hardware status in the CP. Possible event posters are the Alpha HW, PAL code and also the Tru64 kernel. This log is in binary format.

ERRORLOG

APZ VM internal errors are posted to this log. The entries are in printable ASCII format, as shown below:

V:\APZ\Data\CPA\PES\error>type ErrorLog_20040823_152309.log
20040901_132309_835130 Message 0000001130
Internal error: H'8610, Description: Error H'8611 detected in Xmpi200.cxx at line 443 with information words Inf1: 4, Inf2: 49154, Inf3: 7103, Inf4: 0 has been suppressed 1 times
File: ErrorHandler.cxx, Line: 4137

EVENTLOG

APZ VM status messages are posted to this log; no error messages are allowed. The entries are in printable ASCII format, as shown below:
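All CLH text logs share the timestamp prefix YYYYMMDD_HHMMSS_microseconds seen in the extracts above. A small helper to split such a line into its parts; this is a sketch for illustration, not an Ericsson tool.

```python
from datetime import datetime

def parse_clh_line(line: str):
    """Split a 'YYYYMMDD_HHMMSS_ffffff message' line into (datetime, message)."""
    stamp, _, text = line.partition(" ")
    return datetime.strptime(stamp, "%Y%m%d_%H%M%S_%f"), text.strip()

ts, msg = parse_clh_line(
    "20040812_195920_000000 syslog: cpbberd: Could not specificly identify "
    "CPU correctable binlog event, mCheckCode 0x86")
# ts -> datetime(2004, 8, 12, 19, 59, 20); msg starts with 'syslog:'
```

Lines carrying the 1970 epoch date (as in the syslog extract) parse the same way; they simply indicate that the real date had not yet been set.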
V:\APZ\Data\CPA\PES\event>type EventLog_20040827_094934.log
20040901_132306_588070
20040901_132253_703241 Message 0000001109, EventPoint 0000642
CPT device CPUB successfully synchronized

APZ VM CORE DUMP

The core dump of APZ VM is a CLH dump channel.

TRU64 CRASH DUMP

The crash dump of the Tru64 operating system is a CLH dump channel.

LOGGING PROCEDURE AND PATH

The logs are stored in the AP. The command clhls (Central Log Handler log list) is used to list CP log events (syslog, errlog, eventlog). The data is presented in reverse chronological order, so the most recent event is at the top of the printout. The ideal time window for taking logs is one hour before and one hour after the fault happened. The V drive on the AP is used to store the logs. The detailed path is presented in chapter 4.

Example of the clhls command:

V:\APZ\Data\CPA\PES\error>clhls
20040812_195920_000000 syslog: cpbberd: Could not specificly identify CPU correctable binlog event, mCheckCode 0x86
19700101_000505_000000 syslog: Early APZ-VM message 6. Data Store is requested for recreation Actual size: 5636620288 Result: 2

To output the log files, use OPI "Central Log Handler, CP Log, Handling", 1/154 31 - ANZ 225 01/2 Uen PA2.

In the clhls command, a time range with start date, start time, stop date and stop time can be used to limit the number of log events to transfer, as the example below shows.
V:\APZ\Data\CPA\PES\error>clhls -d -e 20040802 -f 20040831
SYS:
CPA: Syslog_20040804_142822.log - Syslog_20040804_142822.log
CPB: Syslog_20040805_182350.log - Syslog_20040805_182350.log
EVENT:
CPA: EventLog_20040805_112044.log - EventLog_20040827_094934.log
CPB: EventLog_20040805_112928.log - EventLog_20040816_115431.log
ERROR:
CPA: ErrorLog_20040804_094029.log - ErrorLog_20040823_152309.log
CPB: ErrorLog_20040805_134318.log - ErrorLog_20040809_113418.log
BIN:
CPA: Binlog_20040812_195919.log - Binlog_20040812_195919.log
CPB: -
CORE:
CPA: -
CPB: -
CRASH:
CPA: -
CPB: -

The analysis of the logs is not a part of this course.

In the event of a core or crash dump, the dump information should be placed on an ftp server where the support personnel can read it. The information that should be placed on the ftp server is summarized below:

CPHW logs:
- Tru64 crash dump
- APZ VM core
- Binlog
- Syslog

In order to provide sufficient information to support personnel, list the CPHW logs and add this listing to the case/TR:

dir v:\apz\data\cpa\cphw\crash\
dir v:\apz\data\cpa\cphw\core\
dir v:\apz\data\cpa\cphw\binlog\
dir v:\apz\data\cpa\cphw\syslog\
dir v:\apz\data\cpb\cphw\crash\
dir v:\apz\data\cpb\cphw\core\
dir v:\apz\data\cpb\cphw\binlog\
dir v:\apz\data\cpb\cphw\syslog\

If a log exists that is associated with the fault, it can be compressed and saved with the pkzip25 command.
Examples:

Crash log:
pkzip25 -add -rec -path=relative g:\ftpvol\cpa_crash_YYMMDD.zip v:\apz\data\cpa\cphw\crash\Tru64_Crash_20040301_100044\*

Core log:
pkzip25 -add -rec -path=relative g:\ftpvol\cpa_core_YYMMDD.zip v:\apz\data\cpa\cphw\core\Apz_vmCore_20031130_015402.log\*

Binlog:
pkzip25 -add -rec -path=relative g:\ftpvol\cpa_binlog_YYMMDD.zip v:\apz\data\cpa\cphw\binlog\*

Syslog:
pkzip25 -add -rec -path=relative g:\ftpvol\cpa_syslog_YYMMDD.zip v:\apz\data\cpa\cphw\syslog\*

CLH COMPONENTS

Figure 5-66 Logical view of CLH components (CPBBERD, the APZ VM Run Time Log, the APZ VM Core Dump Handler and savecore on the Central Processor each use an ftp client to reach the ftp server, Log Manager and Log Reporting Tool on the Adjunct Processor)

CPBBERD

CPBBERD is an Ericsson-developed application running on the Central Processor (CP). One of its purposes is to transfer syslog and binlog events to the Adjunct Processor (AP). CPBBERD is independent of APZ VM.

APZ VM AND THE RUNTIMELOG

Internally in the APZ VM there is a component called the RunTimeLog. The RunTimeLog is responsible for transferring the ErrorLog and EventLog to the AP; hence the RunTimeLog is a CLH component. The RunTimeLog is independent of all other CLH components in the CP. The RunTimeLog uses an APZ VM component called FileTransfer to transfer the ErrorLog and EventLog events to the AP.

APZ VM CORE DUMP HANDLER

The APZ VM Core Dump Handler is invoked when a core dump of APZ VM has been written to the CP file system. The Core Dump Handler is responsible for transferring the core dump to the AP. The Core Dump Handler is independent of all other CLH components in the CP.

SAVECORE

Savecore is a part of the Tru64 operating system and is automatically invoked during startup of Tru64. If a Tru64 crash dump has been made, the crash dump is transferred to the AP.
Issue "man savecore" in a Tru64 terminal for more information about savecore.

LOG MANAGER

The Log Manager is responsible for maintaining the logs and dumps transferred by CP applications, i.e. generation handling of the logs and dumps.

LOG REPORTING TOOL

The Log Reporting Tool is responsible for presenting the logs and dumps maintained by the Log Manager when a trouble shooter gives such commands. The presenting medium is a computer terminal or DAT, which is a removable medium. It is also possible to transfer output from the Log Reporting Tool to a remote machine.

CLH INTERFACE BETWEEN CP AND AP

The only way that CLH components on the CP and the AP can communicate is through ftp. It is a one-way communication, since only CLH components located on the CP can send files to the AP, not the other way around.

TROUBLESHOOTING

The new abstraction layers and principles in APZ 212 40 imply that a new set of tools and logs/dumps is required to support troubleshooting. Most tools that are in use for classic APZ today will be re-implemented with the current functionality preserved as far as possible, though with a few differences. Besides these, new tools will be introduced to the APZ environment covering the new middleware and HW/OS layers. A few tools will no longer be applicable.

Traditional logs and dumps will also exist in the APZ 212 40 environment. The JAM and MAL data will look somewhat different and perhaps not be as comprehensive as before. In the first releases MAL data will be presented in binary format, while in later products decoding to text format is implemented.

In basic mode the ASA compiler will generate APZ register coherency and indicate, in the restart information, the registers that might not contain relevant data, i.e. similar information as today. Basic mode will be used to troubleshoot reproducible faults or faults that occur regularly.
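All the CLH log entries shown in the earlier examples begin with a YYYYMMDD_HHMMSS_microseconds stamp, which is what makes the reverse-chronological clhls listing possible. As a hedged illustration (the function and variable names below are our own, not part of any Ericsson tool), sorting entries newest-first only needs the textual prefix:

```python
def newest_first(events):
    """Sort CLH-style log lines most recent on top, as the clhls
    printout presents them. Each line is assumed to start with a
    YYYYMMDD_HHMMSS_microseconds stamp, so the stamps compare
    correctly as plain strings and no date parsing is needed."""
    return sorted(events, key=lambda line: line.split(" ", 1)[0], reverse=True)

log = [
    "19700101_000505_000000 syslog: Early APZ-VM message 6.",
    "20040812_195920_000000 syslog: cpbberd: binlog event",
    "20040901_132306_588070 eventlog: CPT device CPUB synchronized",
]
print(newest_first(log)[0][:8])  # 20040901
```

Note how the default UNIX date (19700101) sorts to the bottom, just as in the printouts above.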
The Test System will be fully supported in the basic mode environment, while only very limited support is given for blocks executed in optimised mode. CPT will, from an operator point of view, look more or less the same.

The figure below shows the troubleshooting tools available for the APZ 212 40; the logs and dumps generated by the system are also shown. As you can see there are many new tools and logs. All of them are to be used for troubleshooting in the platform area, i.e. in the operating system, APZ-VM, ASA compiler or in the HW. This also implies that they, at least initially, will be handled/analysed by UAB.

Figure 5-67 Troubleshooting possibilities in APZ 212 40 (tools, logs and dumps per layer: application SW, APZ-CP OS, ASA compiler and HW (Alpha µ-processor); tools include the Test System, CPT, the LADEBUG and DBX debuggers, the real-time debugger, logic analyzers and the RP-bus analyzer; logs/dumps include the binary event log, crash dump, CP event record, error interrupts, Tru64 run-time logs, APZ-VM core dump, restart information, SW error information and SW recovery log; the legend marks each item as old, new or replaced)

APZ-VM will have two different logs (run-time logs) as well as generating a core dump in a crash situation. The underlying operating system will generate a crash dump in case of a crash. The OS (Tru64) also generates the binary error log and the syslog.

By analysing the binary error log we can determine where a HW fault is located. This analysis is something that the repair or logistic centres can do in the future. Together with that analysis there is another tool available, a post-processing tool called Compaq Analyze.

• WEBES (Web-Based Enterprise Service) is a collection of core services, components, and code that are shared by a suite of WEBES-compliant service applications.

• These applications include Compaq Analyze (CA), Compaq Crash Analysis Tool (CCAT) and Revision and Configuration Management Tool (RCM).
• Compaq Analyze analyses logged events for any indication of a hardware fault and generates problem reports describing the actions needed to resolve the problem.

• CA has two interfaces by which it can be used, a Command Line Interface and a Web interface.

• CCAT performs analysis of crash footprints extracted from crash dumps to determine the cause of system crashes. It can email the crash analysis results to everyone on its notification mailing list.

• CCAT uses a web browser style interface.

• RCM collects system configuration and revision information at scheduled intervals and transports the collected data to Compaq for processing.

For the UPBB and RPHMI boards, daughter cards (UHINT and RHINT) with trace, debug and recording facilities will be developed, similar to a logic analyser. In fault situations these boards can be installed via CPUM replacement. These daughter cards can, together with a middleware debugger (the real-time debugger), be seen as a replacement for the old MIT cards available in APZ 212 30.

LADEBUG and DBX are debuggers only used during development.

The files are collected from the site and sent to Ericsson, see figure below. A central log handler is released in the APZ 212 40 product. This log handler will have a CP part and an AP part. In short, it will collect information from the CP and place it in the correct archives/files on the AP. The following slide shows the transfer of troubleshooting information from an exchange to Ericsson OMC.
Figure 5-68 Transfer of troubleshooting information (the central log handler receives data from the CP, CPU-A (EX) and CPU-B (SB), over the 1 Gb Ethernet link and stores it in the correct archive/file in the AP; troubleshooting information, e.g. the binary error log and the VM error and event logs, is collected via FTP from e.g. an OMC and then sent to Ericsson as a trouble report; for large amounts of data, e.g. a Tru64 crash dump, an option is to use the DAT tape drive in the AP)

TEST SYSTEM

The Test System is an integrated system for software maintenance in the CP in previous APZ models and is a powerful aid for locating software faults in Plex and ASA code. The system is capable of tracing software signals between CP blocks, signals sent between RPs and the CP, and also variables and other data in Plex code. It is particularly useful because it can be used during normal traffic.

The main principles for the Test System in APZ 212 40 will be the same as for earlier APZ 212 processors. A number of trace bits located in System Tables will be used to mark events that are to be traced. These flags will be set by the Test System and checked by the Kernel before a new job is started. Some of the trace bits kept in a function block's entry in the Reference Table indicate the trace types that are ordered for the block, and there will be a number of changes in this table. Key modifications to the Test System functionality are:

5 The MELODY functionality is implemented using the ON EP trace principle in the test system. With the new concept it is possible to trace a certain process, i.e. a specific call. This was also possible before, but now a time measurement is added and it does not matter whether the block is FORLOPP adapted or not. With this change no specific block or corrections are required for MELODY functionality; it will always be there. A new printout in the "MELODY" format is introduced.
Due to the extensive usage of caches, the accuracy of a MELODY measurement is worse in an APZ 212 40 system compared to earlier systems. One measurement can differ quite a bit from another; therefore it is recommended to perform a number of measurements. Measurements can be done on several FORLOPPs.

6 The possibility to "Trace on every instruction address" is removed due to the CPU load this trace costs.

7 Minor change regarding tracing on IPN. The way to specify the trace is changed to ON OUT IPN.

8 When a trace is initiated, the Alpha compiled code is regenerated by the ASA compiler. This is needed for the following traces: DB, VAR, IA, JUMP and FLVAR. The code is not regenerated for the following traces: INSIG, OUTSIG, EP, FLIN, FLOUT, FLEXIT and FLCON. When the code is regenerated it will also run in basic mode.

9 The possibility to trace on global EP is introduced. This will enable MELODY measurements with the old method as well, but through a new command interface, i.e. the test system.

10 The trace measure DYN TRACE ON JUMPS is replaced with ENABLE JUMPS and DISABLE JUMPS.

The correction system (corrections entered under the test system) will be fully supported, but as always the usage of corrections should be carefully planned. Corrections have an impact on compilation time and can also affect performance if there are many jumps and branches in the code.

Jump Address Memory (JAM), which is used when tracing on jumps, also features in the APZ 212 40. In the classic APZ, there is a JAM which is updated for each jump taken when executing the code. In the APZ 212 40, the optimised code will only generate JAM entries when receiving software signals, but the basic compiled code will generate the same JAM as in previous APZs.

CHAPTER SUMMARY

• Most troubleshooting tools that are used today will be implemented with their current functionality preserved as far as possible.
Along with these there will be new tools introduced.

• The main principles of the Test System will be the same as for earlier APZs, but with some modifications.

• Redundancy is still a key feature in the APZ 212 40. It means that all essential hardware magazines are duplicated, so that if a hardware fault occurs the second side can take over operations.

• The CPs do not run in parallel synchronous mode; in the case of a side switch, data is copied from the EX side to the SB side to make the standby "hot" (identical in data to EX).

• Reasons for operating the CP in warm standby mode instead of parallel include the use of a commercial CP that does not support parallel mode and the fact that very few hardware faults occur.

6 APZ 212 40 - RPS

OBJECTIVES

After completing this chapter, participants will be able to:

• Describe the RPB-E
• List the functions related to Ethernet connected RPs
• Describe the hardware and functions of SCB-RP
• Describe the functions of GESB
• Describe the functions of the RPI and GARP

Figure 6-69 Objectives

REGIONAL PROCESSOR SUBSYSTEM INTRODUCTION

The regional processor system RPS is divided into the following subsystems:

• RPS-1
• RPS-B
• RPS-2
• RPS-M

The RPS-B subsystem includes the support and maintenance software functions that all RPs and EMRPs require for their operation. RPS-2 includes the hardware functions, micro programs (where applicable) and the OS functions for members of the RP groups RP2, RPD, RPP, GEMRP and RPI. The subsystem includes software for restart dump (a maintenance function) plus software for program test and program correction (support functions). This chapter will focus on the GEMRP and the RPs that are connected to the CP via the Ethernet bus.
RPS-M includes the hardware functions, operation and maintenance functions and OS functions required for the operation of the Extension Module Group (EMG) control system. The entire subsystem is optional and is not required for the general operation of an AXE node.

RPS interfaces with the following AXE subsystems as illustrated in the figure below.

Figure 6-70 RPS including its components RPS-B, RPS-1, RPS-2 and RPS-M

RPS provides services to all AXE applications such as POTS (ordinary telephony), ISDN, Voice over IP, ADSL, GSM and many more. In general, RPS may be seen as the means by which a subscriber to a particular telecom application is connected to the AXE system that supports the application in question.

Figure 6-71 The various telecommunication applications that interface with the RPS platform via its API

RP EVOLUTION

This chapter will not cover all subsystems one by one. It will focus on the new RP functionality.

First-generation AXE systems contained RPs that were introduced in the late 70s. Active marketing of such equipment ceased around 1986. Second-generation AXE systems contained RPs that were constructed in accordance with the BYB 202 equipment practice, which was introduced around 1986. Active marketing of this equipment practice came to an end in 2002. Third-generation AXE systems contain RPs that are constructed in accordance with the BYB 501 equipment practice, which was introduced in 1998 and remains the current equipment practice.

The BYB 501 equipment practice includes an 1800 mm cabinet that can house either a combination of full and half-height GDM magazines (GDM-F and GDM-H) or a combination of the GEM magazine and only half-height GDM magazines.
Figure 6-72 RP evolution (timeline 1975-2000 of EMRP and RP generations, from EMRP1/RP1 through EMRP2, EMRP3, EMRP4, RP2, RPD, RPP, RPG, RPG2, RPG3 and RP4 to GARP and RPI, together with the bus types RPB-P, RPB-S and RPB-E)

Fourth-generation AXE systems contain a new CP concept based on third-party hardware with a UNIX operating system. An Ericsson-designed APZ layer (APZ-VM) ensures compatibility with older systems. To utilize the higher capacity of the CP, a new RP bus type was introduced based on Ethernet technology.

THE RP BUS

The RPs are connected to a bus that serves as the communications link between the RP and the CP. Three different types of RPBs exist: a parallel bus referred to as the RPB-P, a serial bus referred to as the RPB-S and an Ethernet-based bus referred to as the RPB-E. The RPB-P is not supported by APZ 212 40 and is therefore not described here.

RPB-S

Two RPB-S paths are used, providing redundant CP-RP interconnection and ensuring fallback in case of a fault. In order to ensure that both Path A and Path B function correctly, the RPB-S paths are switched every 15 seconds.

Assume, initially, that CP-A is running in executive mode and CP-B is running in standby/working mode. Assume also that paths A and B contain 32 branches each. CP-A initially sends signals to the RPs along Path-A (that is, to all branches via Path-A). The transmission of signals to and from the RPs will then change at intervals of approximately 15 seconds as follows:

Figure 6-73 RPB branch side switch

There is no CP side switching as a result of the above; CP-A is still executive. The switching mechanism can only be maintained while the two CP sides are not separated. In this way we are assured that both paths are operational and that a fault requiring an RPB-S path change will not be catastrophic.
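The regular path alternation described above can be modelled with a toy function (our own sketch; the real per-branch switching scheme is more elaborate than a single global toggle):

```python
def active_path(elapsed_seconds, period=15):
    """Return which RPB-S path ('A' or 'B') carries the traffic at a
    given time, assuming a simple alternation every `period` seconds
    starting on Path-A. This only illustrates the idea that both paths
    are regularly exercised, so a latent path fault cannot stay hidden
    until the moment the path is actually needed."""
    return "A" if (elapsed_seconds // period) % 2 == 0 else "B"

print(active_path(5), active_path(20), active_path(35))  # A B A
```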
ETHERNET RP BUS, RPB-E

The regional processor bus of Ethernet type (RPB-E) is intended to replace the serial RP bus for applications that require high bandwidth between CP and RP. RPB-E provides more bandwidth than RPB-S to each RP, which means more compact systems can be built. The RPB-E is implemented on the APZ 212 40. The maximum bandwidth on APZ 212 40 is currently limited to 100 Mb/s.

RPB-E offers a service to the application up to layer 6 in the OSI model, where the two lowest layers consist of Ethernet (DIX version). On top of Ethernet an Ericsson-developed protocol, TIPC, is used. TIPC takes care of layers 3 to 5 in OSI as shown below:

• RPB Conversion layer: AXE addressing (signal distribution), stop sending signals, broadcast signals, bundling management
• TIPC layer: reliable transport, establishing and terminating connections, maintaining connections, message fragmentation, redundancy management, name service
• MAC/physical layer: frame type handling, address filtering, auto-negotiation

Figure 6-74 Illustration of the RPB-E protocol structure

The RPB Conversion layer provides mechanisms to make the bus fit into the AXE system. That layer corresponds to the presentation layer, layer 6, in the OSI model. The TIPC protocol handles the network, transport and session layers for the RPB-E. This ensures good real-time capacity as well as robustness features such as fail-over mechanisms. TIPC has been chosen because of its mechanisms to handle redundant links and the mechanisms included in the protocol to address the executive part of the CP without distribution of physical addresses, so-called location transparency. The figure above illustrates the structure of the RPB-E protocol.
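The location transparency mentioned above can be pictured as a tiny name service (a sketch with invented names, not the actual TIPC API): senders address the executive CP part by a functional name, and only the binding changes at a side switch, never the senders.

```python
class NameService:
    """Toy model of TIPC-style location transparency: signals are
    addressed to a functional name rather than a physical address."""

    def __init__(self):
        self.bindings = {}

    def bind(self, name, endpoint):
        # Rebinding happens at e.g. a CP side switch; senders are unaware.
        self.bindings[name] = endpoint

    def deliver(self, name, signal):
        # Resolve the name to whichever endpoint currently holds it.
        return (self.bindings[name], signal)

ns = NameService()
ns.bind("cp-executive", "CP-A")
print(ns.deliver("cp-executive", "RP signal"))  # ('CP-A', 'RP signal')
ns.bind("cp-executive", "CP-B")  # side switch: only the binding moves
print(ns.deliver("cp-executive", "RP signal"))  # ('CP-B', 'RP signal')
```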
The RPB conversion layer on top of TIPC guarantees that the bus is compatible with the previous buses from the user's point of view and has all expected capabilities.

To utilize the available bandwidth on Ethernet, signals are bundled during intensive signaling cases. The bundling mechanism is implemented in the RPB conversion layer. The signals are delayed for a short period (less than 300 µs) to fill up the frame, whose size is specified at system design.

The AXE principles of redundancy to ensure robustness and reliability are kept, as shown in the figure below:

Figure 6-75 The principle of RPB-E: double separate Ethernet nets as a means of creating secure, reliable CP-RP-CP communication without any single point of failure for the system

The RPB-E consists of two independent nets that are continuously supervised. In case of a permanent failure in one of the communication nets between the processors in the system, an alarm is issued and an automatic switch of the signaling is performed to the redundant net. After repair of the faulty unit (cable or switch), the signaling will automatically return to the original net.

RP bus structure based on Ethernet

The RPs are connected to the RPB-E via a network of a Support and Connection Board with RP (SCB-RP) pair and one or several Gigabit Ethernet Switch Board (GESB) pairs. The SCB-RP transfers the bus into the magazine backplane, while the GESB distributes the bus to different magazines and cabinets. The switches always work in pairs, which constitutes two redundant nets.

Figure 6-76 Ethernet RPB (RPB-E) network configuration

The GESB has eight Ethernet ports placed on the front. One port is connected to the previous GESB, four ports are used to connect the magazine switches (SCB-RP) placed in the same cabinet, and the remaining three ports are used to connect GESBs in other cabinets.
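The frame bundling described above (signals held back briefly so that one Ethernet frame carries several of them) can be sketched roughly as follows. The byte sizes and the function name are illustrative, and the sub-300 µs delay timer is left out for brevity:

```python
def bundle_signals(signal_sizes, frame_payload):
    """Group signal sizes (in bytes) into frames of at most
    frame_payload bytes, modelling the bundling done in the RPB
    conversion layer. In the real layer a partially filled frame is
    also sent once the delay budget (less than 300 microseconds) is
    used up, so a lone signal never waits long."""
    frames, current, used = [], [], 0
    for size in signal_sizes:
        if current and used + size > frame_payload:
            frames.append(current)      # frame full: send it off
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        frames.append(current)          # flush the last partial frame
    return frames

print(bundle_signals([400, 400, 400, 1000], 1200))  # [[400, 400, 400], [1000]]
```

Four signals thus go out in two frames instead of four, which is where the bandwidth saving during high-intensity signaling comes from.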
In some cases the existing IPN Switch (IPNX) can be used as the first switch seen from the CP's perspective instead of a GESB.

THE MAGAZINES IN BYB 501

The equipment practice BYB 501 is used for both AXE 10 (GDMs) and AXE 810 (GEMs and GDMs). Several GDM-based application hardware units that are not adapted to GEM must be used in order to provide certain solutions. Therefore, an AXE node might contain both GEMs and GDMs in order to provide full application support. The figure below shows such a structure including all the RP types.

Figure 6-77 An illustration of the use of the different Regional Processor types. Traditional (RPB-P and RPB-S) bus types are shown

GDM

The Generic Device Magazine, GDM, houses serial RPs and third-generation device boards.

Figure 6-78 The GDM magazine containing serial RPs and device boards

Every GDM magazine contains two RP4s (one at each end of the magazine), and each RP4 has a number of functions. One of these functions is to feed one of the RPB paths to the RPs located within the magazine; another is to pass that RPB path on to other magazines containing RPs connected to the same RP branch.

GEM

The Generic Ericsson Magazine, GEM, is introduced with AXE 810. It houses a 16k GS and several application hardware units. GEMs no longer provide a backplane EMB connection as the GDM does. All boards mounted in a GEM are application boards which contain both an on-board RP and the EMs required to satisfy the requirements of the particular application. The SCB-RP provides backplane support for the RP bus, much the same as the RP4 does in the case of the GDM.

The magazine may contain a total of 26 boards, which means that a fully configured GEM contains 26 RPs including the SCB-RPs. GEMs always require fans for forced-air cooling.
GEM boards have about 25% more surface area than GDM full-height boards. Unlike the situation in the case of the GDM and the RP4, all GEM application boards now include their own RPs, and the SCB-RPs do not take part in the operation of the applications. The SCB-RP simply supports the operation of the magazine and the transport of the RPB to the various application boards.

Figure 6-79 Example of a GEM magazine connected to RPB-S

Figure 6-80 Example of a GEM magazine connected to RPB-E

The GESBs are equipped with an Ethernet switch that can distribute the signals from the CP to the next-level switch for further distribution. The SCB-RP in the GEM magazine provides 100 Mb/s bandwidth to all GARPs through the backplane. The connection from the SCB-RP to the next-level switch (for example a GESB) provides a bandwidth of 1000 Mb/s. How high a bandwidth can be used depends on the speed of the Ethernet ports in the CP.

In the first implementation of RPB-E, only the high-intensity signaling GARPs will have the capability to use RPB-E. The RP part of the SCB-RPs, which performs magazine supervision, will be connected to RPB-S.

RP SUPPORT AND MAINTENANCE FUNCTIONS

The RP functions below are mainly parts of the RPS-B subsystem.
They are:

• Loading: administration of RSU in CP and loading of EMRP, RP and DP; APZ supported loading of PDSPL boards
• Program output: output of regional programs, backup copy function
• Function change: function change, change of EM in EMG and loadable RP
• Function change of regional firmware: change of FLASH programs for GEM-RP, RPP and RPG1, RPG2, RPG3
• Program correction: RP program correction
• Program test: debugger for RP and EMRP with REOS interface
• Flow control: CP-RP signaling flow control

Maintenance functions

• Equipment and administration: RP administrative function
• Start and restart: RP start function
• Fault detection: RP error detecting function
• Recovery: RP recovery function
• Diagnosis: RP diagnostics function
• Repair: RP repair function
• Alarm: RP alarm function
• Board repair indication: board repair indication function in BYB 501
• Hardware inventory: hardware inventory function in BYB 501
• Magazine address supervision
• Magazine power supervision
• Supervision of RP intercommunication: RP intercommunication supervision function
• Fan supervision: includes the monitoring of alarms and temperature; RPS reports to OMS
• RPB-E supervision: RPB-E network maintenance function
• GESB and SCB-RP Ethernet switch function: Ethernet switching in GEM

Figure 6-81 RPS-B support and maintenance functions

As shown above, the RPB-E related functions belong to the maintenance part.

RPB-E NETWORK MAINTENANCE FUNCTION

The principles for bus supervision differ between the bus types. The classic CPs (up to APZ 212 3x) consist of two processors executing in parallel. Normally, one is working/executive and the other is working/standby. That means that both parts of the CP must receive the same information from the RPs.
For the parallel bus this is solved by connecting one bus path to the CP WO/EX and the other path to CP WO/SB, and by the RPs always sending on both bus paths. As both CP parts work in parallel, both parts send signals to the RPs. The RPs always ignore the signals from the WO/SB part of the CP. By matching the information from the two CP parts, any malfunctions of the bus can be detected.

The principle for the serial RP bus is slightly changed from the one for the parallel bus. The CP parts still work in parallel, but the RPH only transfers the signals from the WO/EX CP part to the RPs. To verify correct function of the bus, the traffic is regularly switched between the bus paths.

The principle for the Ethernet-based RP bus (RPB-E) is completely changed from the two others. The bus is only implemented on the APZ 212 40, which is a non-parallel working CP. Both bus paths are used at the same time. Between each RP and both CP parts, two separate logical links are established: one active and one redundant logical link per RP. The active and the redundant logical link never run on the same bus net.

A link switch to the redundant link is automatically performed when the active link fails. An alarm is always raised when a link failure occurs, regardless of whether the interruption has affected the active link or its redundant link.

The task of the RPB-E Network Maintenance Function (RPNM) is to raise an alarm and to identify faulty RPB-E network units in a situation where the CP cannot communicate with several RPs on RPB-E. RPNM interacts with the protocol function (TIPC) on the CP side where it executes. The RPNM function receives information from TIPC regarding changes in the communication status on RPB-E. If one or several links become 'Not Working', TIPC informs the RPNM function about it. The RPNM function then raises the alarm "RPB-E NETWORK FAULT".
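The per-RP link handling described above can be summarized in a minimal model (the class and net names are ours, not those of the real RENCSI/RENFD implementation): every failure raises the alarm, and a failure of the active link additionally triggers the automatic switch-over.

```python
class RpLinkPair:
    """One active and one redundant logical link per RP, never running
    on the same bus net. Any link failure raises the alarm; a failure
    of the active link additionally triggers an automatic switch-over
    to the redundant link."""

    def __init__(self):
        self.active, self.redundant = "net-A", "net-B"
        self.alarms = []

    def link_failed(self, net):
        # The alarm is raised whether the failure hit the active
        # link or the redundant one.
        self.alarms.append("RPB-E NETWORK FAULT")
        if net == self.active:
            # Automatic switch to the redundant link.
            self.active, self.redundant = self.redundant, self.active

pair = RpLinkPair()
pair.link_failed("net-B")             # redundant link fails: alarm only
pair.link_failed("net-A")             # active link fails: alarm + switch
print(pair.active, len(pair.alarms))  # net-B 2
```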
In case of a link failure, TIPC will automatically switch over to the standby link and continue the communication.

In a fault situation where one or several links between the CP and the RPB-E RPs are "Not Working", the command EXRNP, belonging to the RPNM function, shall be used. With this command it is possible to print the status of all links between the CP and the RPB-E RPs and get a picture of the faulty situation. The information printed by EXRNP is fetched from TIPC.

With the aid of the OPI "RPB-E NETWORK FAULT" the operator is guided to the most probable faulty unit, given a particular fault picture displayed by the command EXRNP. The units that may cause an RPB-E network fault are the Ethernet switches that belong to the RPB-E (GESB and SCB-RP), the connection points towards the RPB-E in the RPs, and the cables in between.

The RPNM function consists of the blocks RENCSI and RENFD. Block RENCSI interacts with TIPC, and block RENFD performs the analysis of the information received from TIPC and handles the command EXRNP.

HARDWARE UNITS AND FUNCTION BLOCKS

GIGABIT ETHERNET SWITCH BOARD GESB

The APZ platform utilizes GEM magazines and SCB-RP boards together with processor boards and, when applicable, an Ethernet switch to connect different magazines. The GESB is an 8 x 1 Gbps unmanaged Ethernet LAN switch that is normally used for interconnection of several GEM magazines (SCB-RP interconnects boards in a GEM; GESB interconnects GEMs).

The GESB serves a number of main functions:

• GESB is an 8-port Gigabit Ethernet switch, able to handle communication between GEMs as well as communication to GEM-external stations.
• GESB houses an RPI.

The main features of the Gigabit Ethernet switch are the following:

• Provides eight 1000Base-T ports towards the front.
• All front connectors are the 8-pin RPV 431 1911 (20 mm pitch).
• All Ethernet ports support auto-negotiation.
• All Ethernet ports support half or full duplex communication.
• The switch provides automatic MAC address learning.
• The Ethernet switch supports operation in managed and unmanaged mode.
• Flow control complying with IEEE 802.3x can be enabled.

The figure below shows GESB interconnecting several GARPs.

Figure 6-82 GARP and GESB connected to the RPB-E (GARPs in GEMs are reached via SCB-RP and GESB pairs on the two RPB-E nets, serial RPs in a GDM are reached via RP4s on RPB-S, and the CP sides, CP-A and CP-B, connect to the RPB-E through UPBB/RPHMI in RPHM-A and RPHM-B towards the APG)

The GESB layout is shown in the figure below.

Figure 6-83 Front connector allocation (eight Gigabit Ethernet ports, port 0 to port 7, and the MIA LED)

Function Block: Gigabit Ethernet Switch Board Hardware, GESBHW

The GESBHW supplies the applications with communication channels on Ethernet. The HW unit is designed for use in GEM and equivalent subracks in BYB 501. Firmware (FW, the software loaded on the board from factory) for boot and restart at power up and a default OS for deblocking are stored in flash. FW for switch configuration is also stored in flash.

GESB contains an Ethernet switch providing eight 1000Base-T interfaces on the front panel. It is implemented in full height (265 mm), 225 mm depth and 30 mm width, in equipment practice BYB 501 with fan cooling. It uses a redundant -48 V power supply from the backplane for internal use.

GESBHW, Block Function

The block GESBHW consists of a hardware unit, ROJ 208 410/2 GESB, and three load modules: CXC 106 0046 PBOOT_RPI_860, CXC 106 0134
The processing function in GESB is used for execution of RP programs written in the C language. Auto-negotiation is not supported on the RP part's own Ethernet interface; only half duplex 10 Mbit/s is possible there.

The Ethernet switch can run in either unmanaged or managed mode, controlled by the internal processor. The switch runs in store and forward mode, which means that it can transfer packets between ports which have different speed and duplex modes. Auto-negotiation of speed and half/full duplex is supported individually on all ports.

The hardware unit handles RP-CP communication by providing an RPB-S interface, and Ethernet communication by providing an Ethernet switch. Firmware for boot and restart at power-up and a default OS for deblocking are stored in flash memory. The firmware for switch configuration is stored in EEPROM, loaded at the factory.

SCB-RP BOARD

The SCB-RP board is designed to be used in the GEM. For redundancy reasons, each GEM is equipped with two SCB-RP boards. The SCB-RP plug-in unit is composed of a mother board and a daughter board. The SCB-RP board serves a number of main functions:

• Delivering power (-48 V) to all boards in GEM.
• Distributing RPB-S to all boards in GEM.
• SCB-RP is an Ethernet switch, able to handle communication between boards in GEM as well as communication to GEM-external stations.
• SCB-RP houses an RPI.
• SCB-RP is a handler of external alarms.
• SCB-RP is the master on the Maintenance bus.

Figure 6-84 The SCB-RP and the GEM structure. V24 is used as alarm and maintenance interface for the fans

Support and Connection Board Hardware is the function block realizing the SCB-RP. It supplies the AXE with a HW maintenance platform and communication channels on RPB-S, and for two of the HW units also on Ethernet. The HW units are designed for use in GEM and equivalent subracks in BYB 501. The SCB-RP front layout is shown below.
Figure 6-85 SCB-RP Front Connector Allocation

SCB-RP Interworking

RPB-S

For communication with the CP, the SCB-RP provides an RPB-S interface. RPB-S is connected to the SCB-RP via a front connector and distributed to the backplane. The front connection is of differential type, while the connections towards the backplane are single ended.

Each RPI in the magazine is connected to two RPB-S/M branches, since there are two SCB-RP boards in a GEM, one handling path A (RPB-S/M-A) and the other path B (RPB-S/M-B). The transmitters and receivers for the A- and B-paths are all in separate packages to enhance the fault tolerance of the complete system, should one of them break. In order to reduce the load on the backplane bus, the RPB/M distribution is split into four branches, each handling up to seven boards. The figure below shows the principle of RPB-S distribution (the split is not shown).

Figure 6-86 RPB-S distribution in GEM (RPB/M: M stands for Magazine).

The SCB-RP provides the maintenance bus interface. The board acts as the master on this bus, but also has a slave interface like all other boards in the GEM magazine. The maintenance bus also makes it possible to point out specific boards in a magazine by controlling the MIA-LED.

The M-bus components on each board in GEM are powered and supervised by SCB-RP. Each SCB-RP board controls the M-bus slaves of the opposite half of the subrack, including the opposite SCB-RP. The M-bus is not implemented with redundancy.

Ethernet Switch

The SCB-RP contains an Ethernet switch to support the Ethernet connections in GEM. The Ethernet switch is an extensible, scalable architecture for the switching of packetized data. All switching devices act as distributed intelligent agents within the switching system, making switching decisions independently of other devices in the system.
• SCB-RP provides accessibility to all boards within the GEM via 10/100Base-T Ethernet.
• SCB-RP provides a 1000Base-T port and two 10/100Base-T ports towards the front.
• All Ethernet ports support auto-negotiation, even the Gigabit port.
• All Ethernet ports support half or full duplex communication.
• The ports can be configured to support backpressure for half duplex.
• The switch provides automatic MAC address learning.
• The Ethernet switch supports operation in managed and unmanaged mode.
• Flow control compliant with IEEE 802.3x can be enabled.

Function Block: Support and Connection Board Hardware, SCBHW

In ROJ 208 323/1 and ROJ 208 323/2, the repeater part in SCB-RP distributes one path of an RPB-S branch from a preceding subrack or RPH through the repeater to the next subrack in a chain of subracks. These boards have two front connectors: RPB-S/I for the incoming bus and RPB-S/O for the outgoing bus, as seen from the RPH.

ROJ 208 323/3 is designed to be a single board that is cable connected to a path on an RPB-S branch. The RPB-S/O port is therefore missing. Instead, ROJ 208 323/3 has four front connectors for incoming RPB-S bus branches, RPB-S1 to RPB-S4, for distribution to the backplane.

The repeater part also supplies RP boards in the same subrack with RPB-S connections through the backplane, RPB-S/M. ROJ 208 323/1 and ROJ 208 323/2 support connections to the whole backplane from a path on one RPB-S branch. ROJ 208 323/3 supports from one up to four branches of RPB-S connections to the backplane.

The RP part in SCB-RP uses a redundant RPB-S/M (Magazine) interface, path A and path B, through the backplane. The RP also uses physical RP addresses (subrack and slot address) from the backplane.

The Ethernet switch (on ROJ 208 323/2 and ROJ 208 323/3) provides 26 100Base-TX interfaces through the backplane to boards in the same subrack.
The switch also provides one 1000Base-T and two 100Base-TX interfaces on the front panel. One of the 100Base-TX interfaces on the front panel is for test purposes only and is equipped with a standard RJ45 connector.

The RP part of SCB-RP has two standard (IEEE 802.3/Ethernet CSMA/CD) 10Base-T half duplex interfaces to the backplane. Each port has its own individual MAC address.

Apart from the repeater and interconnection parts, the SCB-RP provides a distribution module used to distribute several buses in the GEM backplane. The distribution module in SCB-RP uses two -48 V power supplies through the front panel connector, each capable of distributing up to 500 W. The distribution module then supplies all boards in the same subrack with a filtered -48 V power connection through the backplane. SCB-RP is capable of supervising four power buses in the backplane: partly the two power buses it distributes itself, and partly two redundant power buses from another SCB-RP.

SCBHW Function Block

The processing function in SCB-RP is designed for execution of RP programs written in the C language.

RPB-S Repeater part

In ROJ 208 323/1 and ROJ 208 323/2, the RPB-S repeater function distributes one path on one RPB-S bus branch to all RPs in the same subrack and, through an expansion port, also to another subrack. In ROJ 208 323/3, the RPB-S repeater function distributes one path of one up to four RPB-S bus branches to the RPs in the same subrack. Both SCB-RP boards in a subrack sense their position in the backplane, to be able to automatically configure their RPB-S front ports to the same group of RPs.

If four branches are used in the GEM, each branch handles about one quarter of the slots in the subrack: RPB-S1 handles slots 0–5, RPB-S2 handles slots 7–12, RPB-S3 handles slots 14–19 and RPB-S4 handles slots 20–25. Observe that slots 6 and 13 cannot be used!
The SCB-RP senses whether it has an RPH connected at the other end of a bus cable. If a front port is not connected (and has not been earlier), the corresponding slots are automatically configured to the front port with the nearest lower number that is (or has been) connected. This means, for example, that with only RPB-S1 connected, this bus branch handles all slots in the subrack. In this special case the SCB-RP ROJ 208 323/3 also handles slots 6 and 13, just like ROJ 208 323/1 and ROJ 208 323/2.

Observe that it is not possible to change a configuration by just removing one RPB-S cable. The SCB-RP remembers that this cable has been connected and keeps the corresponding configuration. The only way to reset the configuration is to remove all RPB-S cables and then reconnect cables only to the desired ports. When a failure occurs, or during installation or repair, the configuration can differ between the two paths.

Because the RPs in the subrack only use a part of their full logical RP address, this might lead to an addressing conflict on RPB-S. Therefore, the logical RP addresses must be chosen so that the lowest four bytes are different for all RPs in the whole subrack. The easiest way to do this is to let this part of the address equal the slot address. To get a redundant RPB-S connection in a subrack, two SCB-RPs are needed, one for each path on the RPB-S branches.

Ethernet switch HW part (on ROJ 208 323/2 and ROJ 208 323/3)

The Ethernet switch can run in either unmanaged or managed mode, controlled by the RP part. The switch runs in store and forward mode, which means that it can transfer packets between ports which have different speed and duplex modes. Auto-negotiation of speed and half/full duplex is supported on all ports.

THE DCP PLATFORM

The Data Communication Platform (DCP) supplies AXE applications with a high capacity regional processor platform.
The platform includes DL2 connections to the Group Switch and an Ethernet packet switch allowing Ethernet communication between the RP boards and the additional units in the GDDM-H magazine. The Ethernet switch also includes circuits for communication with other Ethernet switches or external Ethernet connections.

Target application areas are those in which telecommunications and data communications overlap. Typical application areas are access functions for Internet or other data networks over PSTN, ISDN, PLMN, etc. The DCP platform is made up of the RPP and one of the two packet switch versions, the EPSB or the EPS.

RPP

The purpose of the RPP is to access and monitor DL2 data streams from the Group Switch. From a general AXE architectural perspective, the RPP product is considered a traditional RP. Datacom applications, on the other hand, view the RPP as far more than a traditional RP, due to its versatility in the support of data communications applications. In addition, the RPP and the EPSB/EPS combine to form a redundant LAN switching system for RP-RP communication.

EPSB/EPS

The EPSB/EPS is an Ethernet LAN switch. It provides a redundant switched LAN for high-speed intercommunication between RPPs, which in effect constitutes a local backbone for packet switched applications employing RPPs for protocol processing. For redundancy purposes, an application may be subdivided so that it is supported by several RPPs, interconnected via a duplicated Ethernet link. The EPSB is a product to be phased out, giving way to the EPS, which has better performance.

RPPETA1

The RPPETA1 is one of three boards included in the ALI-ETA155 Plug-In Unit, which serves as an ALI (ATM Link Interface) and integrates the AXE10 into the UMTS (Universal Mobile Telecommunication System).
The unit can be seen as an Inter Working Unit (IWU), operating between the ATM (Asynchronous Transfer Mode) and STM (Synchronous Transfer Mode) environments. Since the protocol for voice transport over UMTS is AAL2, the ALI unit must support the AAL2 protocol (for ATM voice transmission) as well as AAL5, which has been selected as the signaling protocol.

RPI

The RPI is a low cost, highly integrated, general-purpose AXE10 regional processor platform. The RPI is intended to be used as a hardware block on different application boards. It is implemented as a hardware module and is placed on boards that require RP functionality; in other words, the RPI functions as an 'onboard processor'.

GENERIC APPLICATION RESOURCE PROCESSOR (GARP)

The GARP is designed for the GEM magazine. Its objective is to utilize as much as possible of the new capabilities of the GEM magazine and to offer a general, future-proof platform for applications with real-time requirements. It complements the RPI-based GEM boards, which have application-specific HW, FW and SW design.

The GARP has an Ethernet port on the front panel that makes it possible to connect directly to an external network. Together with its high capacity, this makes GARP an ideal choice for signaling applications, for example Sigtran.

All CP-RP communication with GARP can be performed either on RPB-E or on RPB-S, but the two are never mixed at the same time. Which bus type is to be used is predetermined when the GARP is defined (command EXRPI).

RPB-E CONNECTED GEM

The figure below shows an Ethernet connected GEM.

Figure 6-87 An example of a GEM magazine containing GARPs (SCB-RP, GESB and GARP boards in 24 slots).
RPB-S CONNECTED GEM

Figure 6-88 An example of a GEM magazine containing GARPs and two SCB-RPs with four ports (24 slots)

GARP is a general processor for protocol and real-time applications. Its improvements compared to the RPP are:

• Increased CPU performance: CPU upgrade from 333 MHz to at least 533 MHz.
• DL34 interface: timeslot increase from 64 to 2688, in steps of 128.
• Increased amount of SDRAM, with ECC support.
• A one-board solution instead of three boards.
• A 10/100 Mbps Ethernet port on the front.
• Improved system debug features.
• A smaller, easier to verify basic system SW.

GARP is a powerful regional processor. The same hardware can be used as GARP and GARPE. GARP is connected to the CP through the RPB-S, and GARPE through an Ethernet bus. The E version is adapted to the APZ 212 40E.

Some applications on GARP are:

• SCTP-ST (also known as SIGTRAN-ST; ST = Signaling Terminal). The IP signaling terminal provides the IP-based signaling interfaces of the MSC Server. Two GARP boards form a redundant SCTP-ST pair and are housed together in a GEM magazine. The connection to the CP is via the existing serial RPB or RPB-E; the application decides which is used. One RP branch per GEM is used. Each GARP board has a 100Base-T Ethernet port for external communication towards MGW/SGW. The two GARP boards form a redundant pair, providing fault-tolerant hardware and load sharing. The total signaling capacity is 6 Mbit/s signaling payload, restricted by the RPB. One pair of SCTP-STs (one GEM) is expected to handle all signaling for the MSC Server.
• GCP encoding/decoding