ZXWN MGW Media Gateway Troubleshooting

Version 3.07

ZTE CORPORATION
ZTE Plaza, Keji Road South, Hi-Tech Industrial Park, Nanshan District, Shenzhen, P. R. China
518057
Tel: (86) 755 26771900
Fax: (86) 755 26770801
URL: http://ensupport.zte.com.cn
E-mail: support@zte.com.cn

LEGAL INFORMATION
Copyright © 2006 ZTE CORPORATION.

The contents of this document are protected by copyright laws and international treaties. Any reproduction or distribution of this document or any portion of this document, in any form by any means, without the prior written consent of ZTE CORPORATION is prohibited. Additionally, the contents of this document are protected by contractual confidentiality obligations.

All company, brand and product names are trade or service marks, or registered trade or service marks, of ZTE CORPORATION or of their respective owners.

This document is provided "as is", and all express, implied, or statutory warranties, representations or conditions are disclaimed, including without limitation any implied warranty of merchantability, fitness for a particular purpose, title or non-infringement. ZTE CORPORATION and its licensors shall not be liable for damages resulting from the use of or reliance on the information contained herein.

ZTE CORPORATION or its licensors may have current or pending intellectual property rights or applications covering the subject matter of this document. Except as expressly provided in any written license between ZTE CORPORATION and its licensee, the user of this document shall not acquire any license to the subject matter herein.

ZTE CORPORATION reserves the right to upgrade or make technical change to this product without further notice.

Users may visit ZTE technical support website http://ensupport.zte.com.cn to inquire related information.

The ultimate right to interpret this product resides in ZTE CORPORATION.
Revision History
Product Version    Revision          Revision Reason
V3.07.40           20090420–R1.0     First edition

Serial Number: sjzl20092207

Contents
About This Manual
Declaration of RoHS Compliance
Troubleshooting
    Basic Requirements for the Maintenance Personnel
    Troubleshooting Flow
    Troubleshooting Principles
    Troubleshooting Methods
Hardware Faults
    Faults in Board Configuration
        Handling Fault Occurring during Board Configuration
        An Instance for Handling Board Version Loading Failure
    Faults in Board Running Process
        Fault Handling Procedure
        Handling UIM Board Hardware Faults
        Handling OMP Board Hardware Faults
        Handling SMP Board Hardware Faults
        Handling SIPI Board Hardware Fault
        Handling SPB Board Hardware Faults
        Handling CLKG Board Hardware Faults
        Handling IPI Board Fault
        Handling DTB/DTEC Board Fault
        Handling APBE Board Fault
        Handling MRB Board Fault
        Handling VTCD Board Fault
    Handling Changeover Exceptions
    Handling CPU Overload
Clock Faults
    Handling System Clock Exception
    Handling the Clock Lock Failure
    Handling the Clock Networking Fault
    Handling the Inconsistent Lock Status of Active/Standby CLKG Boards
    Handling the Clock Reference Loss
    Handling the Output Clock Loss
    Handling the Slip Code
Interface Faults
    Handling MGW-MSCS Interface Fault
    Handling MGW-MGW Interface Fault
    Handling MGW-RNC Interface Fault
    Handling MGW-PSTN Interface Fault
    Handling MSCS-BSC Interface Fault
    Handling Service Faults
OMM System Faults
    Handling OMM Abrupt Abnormality
    Handling Virus/Security Events
    Analyzing Instance 1
    Analyzing Instance 2
    Analyzing Instance 3
    Analyzing Instance 4
    Analyzing Instance 5
    Analyzing Instance 6
    Analyzing Instance 7
    Analyzing Instance 8
Interconnection Faults in IP-Bearer Network
    Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
    Handling Call Loss Generated by Soft-Switch after CE Restarts
    Handling Soft-Switch Failing to Ping through CE
Voice Faults
    Common Voice Faults
    Troubleshooting Ideas and Common Methods
        Troubleshooting Ideas
        Common Methods for Locating Faulty NE
        Methods of Locating Internal Fault Points on CN
    Echo Fault Handling
        Echo Principles
        Working Principles of Echo Canceller
            Principle Overview
            Echo Directions
        Echo-Suppression Implementation
            Configuring Echo Cancellation
            Configuring Echo Cancellation by Adopting Resource Pool
            System Implementation
        Fault Processing
    Monolog Fault Handling
    Both-Way Silence Fault Handling
    Noise Fault Handling
    Cross-Talking Fault Handling
    Instance Analysis
        Analyzing Instance 1
        Analyzing Instance 2
        Analyzing Instance 3
        Analyzing Instance 4
Figure
Table
Index
About This Manual

Purpose
Thank you for choosing the ZXWN wireless core network system of ZTE Corporation.

ZXWN is a 3G mobile communication system based on UMTS technology. It offers powerful service processing capability in both the CS domain and the PS domain and provides richer service content. Compared with GSM, ZXWN provides a wider range of telecommunication services and can transmit voice, data, graphics and other multimedia services. In addition, ZXWN offers higher speed and higher resource utilization. The ZXWN wireless core network system supports both 2G and 3G subscriber access and provides various services related to the 3G core network.

The ZXWN MGW Media Gateway provides media control and media stream control functions, as well as transmission resources. ZXWN MGW is fully compatible with 3GPP R4 (June 2003) and backward compatible with 3GPP R99 (June 2002). ZXWN MGW supports not only the 3GPP R4 networking mode, in which bearer is independent of control, but also the 3GPP R99 networking mode, in which it is combined with ZXWN MSCS to form an MSC.

Besides meeting the requirements for building a common mobile switching network, ZXWN MSCS can also meet the requirements for building a No.7 signaling network and a mobile intelligent network, and can adapt to various complex networking modes of the mobile switching network. This improves the capability of the network for continuous evolution.

This manual provides procedures and guidelines that support the maintenance of ZXWN MGW.

Intended Audience
This document is intended for engineers and technicians who perform maintenance activities on ZXWN MGW.

Prerequisite Skill and Knowledge
To use this document effectively, users should have a general understanding of wireless telecommunications technology. Familiarity with the following is helpful:
• ZXWN MGW system and its various components
• User interfaces on the ZXWN MGW
• ZXWN MGW operating procedures

What Is in This Manual
This manual contains the following chapters:
Chapter 1, Troubleshooting: Introduces the troubleshooting background, troubleshooting sequence, troubleshooting methods and fault types.
Chapter 2, Hardware Faults: Introduces hardware faults.
Chapter 3, Clock Faults: Introduces clock faults.
Chapter 4, Interface Faults: Introduces interface and service faults.
Chapter 5, OMM System Faults: Introduces OMM system faults.
Chapter 6, Interconnection Faults in IP-Bearer Network: Introduces interconnection faults in the IP bearer network.
Chapter 7, Voice Faults: Introduces the methods of handling voice faults occurring in the CN.

Conventions
ZTE documents employ the following typographical conventions.
Italics: References to other manuals and documents.
"Quotes": Links on screens.
Bold: Menus, menu options, function names, input fields, radio button names, check boxes, drop-down lists, dialog box names, window names.
CAPS: Keys on the keyboard, buttons on screens, and company names.
Note: Provides additional information about a certain topic.

Mouse operation conventions are listed as follows:
Click: Refers to clicking the primary mouse button (usually the left mouse button) once.
Double-click: Refers to quickly clicking the primary mouse button (usually the left mouse button) twice.
Right-click: Refers to clicking the secondary mouse button (usually the right mouse button) once.

Confidential and Proprietary Information of ZTE CORPORATION
Declaration of RoHS Compliance
To minimize our environmental impact and take greater responsibility for the earth we live on, this document serves as a formal declaration that the ZXWN MGW manufactured by ZTE CORPORATION complies with Directive 2002/95/EC of the European Parliament - RoHS (Restriction of Hazardous Substances) with respect to the following substances:
• Lead (Pb)
• Mercury (Hg)
• Cadmium (Cd)
• Hexavalent Chromium (Cr (VI))
• PolyBrominated Biphenyls (PBB's)
• PolyBrominated Diphenyl Ethers (PBDE's)
…
The ZXWN MGW manufactured by ZTE CORPORATION meets the requirements of EU 2002/95/EC. However, some assemblies are customized to client specifications. The addition of specialized, customer-specified materials or processes that do not meet the requirements of EU 2002/95/EC may negate the RoHS compliance of the assembly. To guarantee compliance of the assembly, the need for a compliant product must be communicated to ZTE CORPORATION in writing.

This declaration is issued based on our current level of knowledge. Since conditions of use are outside our control, ZTE CORPORATION makes no warranties, express or implied, and assumes no liability in connection with the use of this information.

Chapter 1 Troubleshooting
Table of Contents
Basic Requirements for the Maintenance Personnel
Troubleshooting Flow
Troubleshooting Principles
Troubleshooting Methods
Basic Requirements for the Maintenance Personnel

Knowledge
• Be familiar with communication knowledge, such as mobile communication principles, ATM principles and soft-switching principles.
• Be familiar with signaling protocols related to , BICC, No.7 signaling and H.248 signaling.
• Be familiar with related international technical regulations.
• Understand billing principles and flows.
• Understand the basics of computer networks, including Ethernet, TCP/IP, client/server architecture and the Oracle database.
• Be familiar with the product knowledge of the ZXWN MGW system, covering functional structure, call flow and service flow.

Networking and Running Environment
• Know the hardware architecture and performance of the ZXWN MGW system very well.
• Know the inter-module routing, and the routing between modules and offices, in the ZXWN MGW system very well.
• Know the signaling and protocols of the ZXWN MGW system and the networking equipment very well.
• Be familiar with the network architecture and channel allocation of the relevant transmission equipment.

Operations
• Master the daily operations of the ZXWN MGW system.
• Know well which operations interrupt some or all services.
• Know well which operations can damage the equipment.
• Know well which operations seriously affect billing.
• Know well which operations can cause subscriber complaints.
• Know the emergency and backup measures well.

Instruments and Meters
The maintenance personnel of the ZXWN MGW system must be familiar with using instruments and meters to locate faults. Common instruments and meters include the multimeter and the SS7 analyzer.

Troubleshooting Flow
The troubleshooting flow of the ZXWN MGW system is shown in Figure 1.
Figure 1 Troubleshooting Flow

The above flow describes how to locate and handle non-emergency faults. For an emergency fault, inform the local ZTE office or call the ZTE Global Customer Support Center as early as possible, and operate the equipment under the instruction of ZTE technical support personnel.

Troubleshooting Principles
When troubleshooting, follow the troubleshooting flow and the basic principles: check, ask, think and act.

Check
First check the fault phenomena: which part of the equipment is faulty, which alarms are generated, and how severe the fault and its impact are. For checking the phenomena, the system provides various tools, such as performance statistics, signaling tracing, alarm query, log query and failure observation.

Ask
After checking the phenomena, the maintenance personnel must ask the on-site personnel involved at each phase about possible causes of the fault, for example, whether data has been modified, a file has been deleted, or a circuit board has been replaced, and whether a power cut, lightning strike or incorrect operation has occurred.

Think
Combine the phenomena observed on site and the results obtained with communication knowledge to analyze the possible causes of the fault, and then make a correct judgment.

Act
Based on the above three steps, find the fault point, and then eliminate the fault by modifying data, replacing the circuit board, and so on. When operating the equipment, the maintenance personnel must consider whether it is the busy hour and the potential consequences of operating during the busy hour. For uncertain problems, consult ZTE technical support personnel.
Troubleshooting Methods

Analyzing Fault Information
Fault information analysis is mainly used in the initial phase of fault handling to judge the range and category of the fault. It provides the evidence for narrowing the fault range and initially locating the fault. Personnel with rich maintenance experience can sometimes locate the fault directly.

Collecting and analyzing fault information plays a vital role in handling various kinds of faults, especially trunk faults, because trunks connect with the transmission system and involve signaling coordination. The fault information includes whether the transmission system runs normally, and whether the opposite-end office has changed any data or signaling parameter definitions.

Analyzing Alarm Information
Alarm information is the information that the ZXWN MGW alarm system outputs as sound, light or screen output. Alarm information is concise yet comprehensive, covering the hardware, links, trunks, billing, CPU load and other parts of the ZXWN MGW. It is one of the most important bases for fault analysis and location.

Alarm information analysis is mainly used to find the specific location or cause of a fault. Because the alarm information output by the ZXWN MGW Fault Management System is comprehensive, it is usually used to locate the fault cause directly, or together with other methods. It is one of the main methods of fault analysis. For example, the alarms generated by ZXWN MGW locate faults accurately and can test and locate each circuit of the signaling system.

When the alarm station reports multiple alarms, first handle the fault alarms, starting with the highest alarm level, and then handle the event alarms.
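The alarm handling order described above can be expressed as a single sort key. The following minimal Python sketch is an illustration only and is not a ZXWN MGW or OMM interface. The `Alarm` record, its field names, the "fault"/"event" labels and the convention that level 1 is the most severe are all assumptions. The text says only that event alarms are handled after fault alarms, so also ordering the event alarms by level is an additional assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical alarm record; field names are illustrative, not an OMM API.
    alarm_id: str
    kind: str   # assumed labels: "fault" or "event"
    level: int  # assumption: 1 is the most severe level


def handling_order(alarms):
    """Return fault alarms first (most severe level first), then event
    alarms (also ordered by level, which is an assumption)."""
    # In the key, False sorts before True, so fault alarms come first;
    # within each group, a lower level number (more severe) comes first.
    return sorted(alarms, key=lambda a: (a.kind != "fault", a.level))


if __name__ == "__main__":
    pending = [
        Alarm("E-17", "event", 1),
        Alarm("F-02", "fault", 3),
        Alarm("F-09", "fault", 1),
    ]
    for alarm in handling_order(pending):
        print(alarm.alarm_id, alarm.kind, alarm.level)
```

With the sample data, the sketch lists F-09, then F-02, then E-17. The event alarm comes last even though its level is high, because all fault alarms are handled first.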
Indicator Status
Every board of the ZXWN MGW has running and status indicators, and some boards also have function or feature indicators. Board indicators reflect the working status of the board. Most of them also reflect the status of links, optical channels, nodes, channels, active/standby servers and so on, which makes them an important basis for fault analysis and location.

Indicator status analysis is used to quickly find the fault position or cause in preparation for further handling. Because indicators provide relatively limited information, this method is usually used together with alarm analysis.

Signaling Tracing
Signaling tracing plays an important role in analyzing the causes of subscriber call connection failures and inter-office signaling coordination problems. The cause of a call failure can be obtained from the signaling tracing results, which helps the subsequent analysis. ZXWN MGW provides various signaling tracing methods.

Log Querying
Because the data configuration of ZXWN MGW is complex, incomplete configuration often causes faults. To locate such faults quickly, query the data configuration operations performed by the maintenance personnel. ZXWN MGW provides a log querying function that records operators' operations. When a problem is related to the opposite-end office, query the log information of the local office and also ask about any data modifications made by the opposite-end office.

Test and Self-Loop
• Test
With the aid of instruments or testing software, test the relevant technical parameters of the subscriber lines, transmission channels and trunk equipment that may be faulty. Then judge from the test results whether the equipment is faulty or about to fail. In addition, ZXWN MGW supports testing the CPU of a board immediately, as well as scheduling tasks to test CPUs periodically or in batches.
• Self-loop
Self-loop means testing the transmission equipment or transmission channel with the self-transmitting and self-receiving method, in hardware or software mode. The purpose is to judge whether the transmission equipment, transmission channel, service status and signaling coordination are normal. From these results, confirm whether the corresponding hardware and the software parameter settings are normal. This method is mostly used to locate transmission problems and to verify trunk parameter settings. When locating transmission-related faults, tests are often used together with self-loops.

Self-loops can be divided into hardware self-loops and software self-loops. Software self-loops are simple and flexible to use, but they are less reliable than hardware self-loops. In addition, during office commissioning and trunk expansion, the ZXWN MGW trunk self-loop is often used to verify the trunk parameter settings of the local office, the outgoing routing data configuration and similar items.

Caution: Cancel every software self-loop once troubleshooting is complete. To avoid leaving a self-loop in place, the maintenance personnel should make a habit of recording every self-loop they set.

Unplugging/Plugging
When a circuit board is faulty, faults such as poor contact or a processor exception can be eliminated by unplugging and reinserting the circuit board and the external interface connectors. When unplugging or plugging a board, strictly follow the board plugging/unplugging procedures; otherwise, the board and other components may be damaged.
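The Caution under Test and Self-Loop recommends recording every software self-loop so that none is left in place. Below is a minimal Python sketch of such a record, assuming free-form channel identifiers. `SelfLoopRecord`, its methods and labels such as "E1-3" are hypothetical and are not part of any ZXWN MGW tool. The point is simply that every loop that is set must be matched by a cancellation before the task is closed.

```python
class SelfLoopRecord:
    """Hypothetical record of software self-loops set during troubleshooting."""

    def __init__(self):
        # Maps a channel identifier (free-form string, assumed) to a note
        # describing why the self-loop was set.
        self._active = {}

    def set_loop(self, channel, note=""):
        """Record that a software self-loop has been set on a channel."""
        self._active[channel] = note

    def cancel_loop(self, channel):
        """Record that the self-loop on a channel has been cancelled.

        Raises KeyError if no loop was recorded for the channel, which
        points to a gap in the records.
        """
        del self._active[channel]

    def outstanding(self):
        """Return the channels whose self-loops have not been cancelled."""
        return sorted(self._active)


if __name__ == "__main__":
    record = SelfLoopRecord()
    record.set_loop("E1-3", "verify trunk parameters")
    record.set_loop("E1-4", "verify outgoing routing data")
    record.cancel_loop("E1-3")
    # Before closing the troubleshooting task, this list should be empty.
    print(record.outstanding())
```

In the sample run, E1-4 is still listed. Its self-loop must therefore be cancelled before troubleshooting is considered complete.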
Comparison and Interchange
• Comparison
Comparison means comparing the faulty component or fault phenomenon with a normal one and analyzing the differences to locate the fault. This method is applicable when there is a single fault.
• Interchange
Interchange means swapping a suspect component (such as a board or a fiber) with a normal one. It is used when the fault range or the faulty component still cannot be determined after components have been replaced with spare parts. Then compare the new running status with the previous status to judge the fault range or location.

Caution: Take all precautionary measures when interchanging boards, because board interchange is highly risky and can easily introduce new problems. Apply interchange only to components such as optical fibers and E1 cables. Perform board replacement or board interchange when traffic is low, for example, from 0:00 to 6:00.

Configuration Modification
The configuration items that can be modified include the timeslot, board position, board parameters, number selector, number properties, trunk properties and protocol type. This method is applicable to eliminating faults caused by configuration errors after the fault has been located to a single site.

For example, suppose that the signaling link to an office direction is abnormal. This affects the signaling transmitted over the link and the services related to the link. The following configuration data can be modified:
1. Change the signaling route so that the new route does not use the signaling link to this office direction. If the fault disappears, the fault lies in the original route.
2. Change the signaling link group so that the new signaling link group does not use the signaling link to this office direction.
If the fault disappears, the fault lies in the original link group.
3. Change the SPB channel used by the signaling so that the signaling no longer uses the channel on the original SPB. If the fault disappears, the fault is related to the original SPB.

During a version upgrade or capacity expansion, if the new configuration is suspected to contain errors, deploy the original configuration again to locate the fault.

If the fault cannot be located precisely to a board by modifying the timeslot configuration, use the replacement method to locate it further. When no spare board is available, this method is therefore useful for preliminarily locating the fault type and temporarily restoring services through other service channels or board positions.

Performance Statistics
Performance statistics are mainly used to locate system resource faults. Performance measurement analysis uses the ZXWN MGW performance statistics function to create performance measurement tasks and analyze the possible cause and range of the fault. The results are usually important for fault handling. Used together with the signaling tracing tool, the performance statistics tool plays an important role in finding abnormal inter-office signaling coordination and trunk parameter setting errors. The maintenance personnel should therefore master it.

Configuring Data
The configured data determines how the system works and how it coordinates with other equipment. Generally, data modification is not allowed. However, special conditions, such as an abrupt change in the external environment or an incorrect operation, may damage or modify the configuration data of the equipment and interrupt services.
If the fault has been located to the local system, query and analyze the current configuration data. Then identify any incorrect network management operations by checking the user operation logs of the network management system. Only experienced maintenance personnel who know the equipment well should analyze the configuration data.

Experience Processing
Special conditions such as an instantaneous power supply anomaly, low voltage or strong external electromagnetic interference can cause a board to work abnormally. Services may be interrupted, with or without a corresponding alarm, while the configuration data of each board remains normal. Experience shows that such faults can be handled effectively by resetting the board, restarting the equipment after power-off, resynchronizing the data or switching over the MP.

Do not use this method frequently, because it does not reveal the root cause of the fault. Except in emergencies, first use the methods described above, or request technical support through the normal channels. This locates the fault and eliminates hidden external or internal problems in the equipment.

Chapter 2 Hardware Faults
Table of Contents
Faults in Board Configuration
Faults in Board Running Process
Handling Changeover Exceptions
Handling CPU Overload

Faults in Board Configuration
This section describes how to handle the faults occurring during board configuration.

Handling Fault Occurring during Board Configuration
Background
ZXWN MGW adopts an integrated hardware platform. In terms of version loading, boards fall into two types: the OMP board and non-OMP boards.
When the OMP board starts, it requests the profile from the OMM server through the IP address allocated during serial port debugging. If the profile can be obtained, the OMP board compares it with the local file. If they are consistent, the OMP board starts with the version on its local hard disk. Otherwise, it obtains the version file from the OMM server, starts with that file, and updates the local profile at the same time. If the profile is unavailable, the OMP board starts according to the local profile.

A non-OMP board requests its version file from the active OMP, not from the OMM server. If the non-OMP board obtains the version file from the OMP successfully, it starts with that file. Otherwise, it starts with the version file stored in its own FLASH.

Common Phenomena
• Board indicators fail to illuminate normally.
• The OMM alarm system reports that the board is offline or unstable.

Common Causes
During board configuration, the common fault causes are as follows.
1. Board fault occurs in the BOOT phase.
Hardware problems in the FLASH, RAM, hard disk or some sub-cards prevent many boards from starting. Poor contact between the backplane and a board may also cause such problems. In these cases, the faulty device can be located through serial port printing. To locate the faulty device, replace the related devices with spare parts one by one to exclude them. Generally, all devices complete self-tests in the production phase, but faults may occur in transit. Some faults can be handled manually, for example, by reinstalling the hard disk or memory module. In most cases, however, the board must be sent back for repair.
2. Slot or backplane error occurs in the BOOT phase.
Some boards fail to start because of slot or backplane problems. To locate such a problem, insert a normal board into the suspect slot.
If the board still fails to start, the slot or backplane is faulty. On site, check whether the configuration conforms to the platform specifications for the slots, and whether the backplane at this slot is physically damaged. If the fault results from a physical fault of the slot or backplane, stop using this slot to avoid the fault on site.
3. An error occurs while downloading a version.
Check whether the shelf control-plane connection line corresponding to this slot is normal. Usually, if the connection line is faulty, none of the boards in the shelf can start. Faulty chips on the UIM board may also prevent some slots in the shelf from starting.
4. BSP self-test error
Generally, if the equipment hardware has problems, view the alarms on the background to find the specific cause. In some exceptional cases, the LED indicator of the board is physically faulty but the board works normally. To handle a fault occurring in this phase, send the board back for repair.
5. Errors occur in other subsequent phases.
Usually the hardware works normally in these phases, and the fault is caused by a software configuration error. When handling the fault, first check the version management alarms to confirm that the software version and related chip versions are correct. Then check whether the slot restrictions, HW distribution and port configuration are reasonable. Finally, check whether the active/standby configuration or load-sharing configuration is correct.

Flow Diagram
Figure 2 shows the flow of handling the fault occurring during board configuration.
Figure 2 Handling the Fault Occurring during Board Configuration

Fault Handling Procedure
1. In the Professional Maintenance window of the NetNumen M30 window, check on the Version Management tab whether version loading is finished for all the boards.
   Check whether the OMPCFGx.INI profile (where x is the office number) exists in the zxwomcs\ums-svr folder of the installation directory on the OMM server, and whether the loaded version file exists in the zxwomcs\ums-svr\cnvmVerFile directory. If the OMPCFGx.INI file is missing, create the OMP boot file in the Version Management window. If the loaded version file is missing, load the version again.
2. In the File Management window of the Local Maintenance Tool, check whether the version file exists on the hard disk (named /DOC0) and in the memory (named /IDE0) of the OMP board. If the version file is missing from the specified directories (/DOC0/VER and /IDE0/RELEVER) on the OMP board, load the version again.
3. In the File Management interface of the Local Maintenance Tool, check whether the version files on the other boards are correct. Version files are generally stored in the /FLASH0/VER directory. If you fail to query the version file with the file management function of the Local Maintenance Tool, check the running status of the board and follow the board troubleshooting flow. If the queried version file is wrong, load the correct version file for this board in the Version Management tab.
4. Check the working status of the UIM and CHUB boards. If they are faulty, communication between the OMP and the other boards is interrupted, so the other boards cannot load version files from the OMP. If the UIM or CHUB boards work abnormally, check their hardware configuration and version loading to make sure they run normally.

An Instance for Handling Board Version Loading Failure
Topic
Handling board version loading failure
Symptom
When version files are loaded in batch, ZXWN MGW uploads the version files to the server by default, but cannot complete the foreground switching. No prompt is displayed during this process.
Solution
1. Check whether the communication between the background server and the foreground OMP is normal. The background can ping the foreground. Check CPU1 on the OMP board: the OMP board runs normally, and the RUN2 indicator on its panel is normal.
2. To confirm whether the OMP obtains the version files to be loaded, check whether three files (*.BIN, *.RBF, and *.ini) exist in the \ZXWN-OMCS\zxwomcs\ums-svr directory on the OMM server, and then restart the OMP. Check whether the OMP configuration is correct and whether the OMP obtains the version from the server. Both meet the requirements. After restarting the OMP, log in to the OMP through HyperTerminal and check the working status of the board with the SCSSHowMcmInfo command. The board is normal, so the board hardware is not faulty.
3. Check the OMP board again: the RPU fails to start. Check whether the office data capacity is configured on the background, and restart the OMP. The problem still remains.
4. The problem probably lies in the FTP settings. On the Professional Maintenance > Version Management tab of NetNumen M30, click the Get OMP’s OMC FTP Address button on the sub-tool. The server IP address for version downloading is 127.0.0.1, which is the loopback address, so the OMP cannot reach the server through it. Change it to the intranet IP address of the server. After that, version loading is normal.
Summary
The FTP address of the OMM server must be configured correctly for the OMP board to obtain version files during version loading.

Faults in Board Running Process
This section describes how to handle the faults that occur while boards are running.

Fault Handling Procedure
Common Fault Phenomena
• The indicators on the board do not light normally.
• The OMM fault management system reports that a board is offline or unstable.
• All board-related services are interrupted.

Common Causes of Faults
• Hardware fault of the malfunctioning board itself
• Fault of the upper-level board of the board that reports the alarm
• Poor contact between the board and its slot
• Backplane fault
• Faulty port connection of the board
• UIM board fault
• CHUB board fault

Flow
Figure 3 shows the flow of handling the common faults that occur while boards are running.

FIGURE 3 HANDLING COMMON FAULTS OF BOARDS

Precautions
Board changeover, reset, and replacement affect the system. Perform these operations only under the guidance of ZTE technical support engineers. In addition, board replacement must strictly follow the operating specifications. Refer to ZXWN MGW Board Replace for details.

Handling UIM Board Hardware Faults
Background
The UIM board manages the resource shelf and implements Ethernet Level-2 switching and circuit-domain timeslot multiplexing/switching within the resource shelf. It also provides the external interfaces of the resource shelf: the packet data interfaces (GE optical interfaces) connecting to the core switching unit, the circuit-domain interfaces (optical interfaces) connecting to the circuit switching units, and the control-plane Ethernet interfaces (4 FEs) of the distributed processing platform. The UIM board provides the control plane, the media plane, and the HW resources for its resource shelf. There are several types of UIM board, as described in Table 1.
TABLE 1 UIM TYPES
Board Name   Function
UIMC         Universal Interface Module of Control: has the GCX sub-card, without the T network and the media plane; provides only control-plane resources.
UIMP         Universal Interface Module of Packet: has the GXS sub-card, without T network resources; provides control-plane and media-plane resources.
UIMT         Universal Interface Module of TSNB: has the TDM optical interface, which introduces the TSNB 8K timeslot resources into this resource shelf; provides big T network resources and control-plane resources.
UIMU         Universal Interface Module of BUSN: has the small T network and the GXS sub-card; provides control-plane, media-plane, and inner-shelf 4K timeslot switching resources.

The UIM cables are of the following types:
• Clock connection line: a ZTE dedicated clock cable connecting the RCLKG to the UIM. It carries the CLKG clock signal to the UIM, which then provides the clock signal to each board in the shelf.
• Control-plane connection line: in an office with multiple shelves, a ZTE dedicated cable connecting the RUIM (UIM rear board) to the RCHB (CHUB rear board); in an office with two shelves, a straight-through Ethernet cable connecting the RUIM of the main control shelf to that of the non-main control shelf.
• TDM connection line: used in an office with the big T network; a fiber connecting the TFI of the circuit-domain core switching shelf to the UIMT of the resource shelf.
• Media-plane connection line: used in an office with a Level-1 switching shelf; a fiber connecting the GLIQV board of the Level-1 switching shelf to the UIM.
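Table 1 can also be read as a lookup: given the resources a resource shelf needs, which UIM types can supply them? The Python sketch below encodes the table for that purpose. It is an illustration only: the resource labels (control_plane, media_plane, small_t_network, big_t_network) and the function name are hypothetical and are not part of any ZTE tool; the "small T network" label stands for the UIMU's inner-shelf 4K timeslot switching.

```python
# Resources each UIM type provides, transcribed from Table 1 (UIM TYPES).
# The resource labels are hypothetical names chosen for this sketch.
UIM_RESOURCES: dict[str, frozenset[str]] = {
    "UIMC": frozenset({"control_plane"}),
    "UIMP": frozenset({"control_plane", "media_plane"}),
    "UIMT": frozenset({"control_plane", "big_t_network"}),
    "UIMU": frozenset({"control_plane", "media_plane", "small_t_network"}),
}


def uim_types_providing(*required: str) -> list[str]:
    """Return the UIM types whose resources cover every required label."""
    known = set().union(*UIM_RESOURCES.values())
    unknown = set(required) - known
    if unknown:
        raise ValueError(f"unknown resource label(s): {sorted(unknown)}")
    need = frozenset(required)
    # A type qualifies when the required set is a subset of what it provides.
    return sorted(name for name, res in UIM_RESOURCES.items() if need <= res)


if __name__ == "__main__":
    print(uim_types_providing("media_plane"))
    print(uim_types_providing("control_plane", "big_t_network"))
```

For example, asking for the media plane returns UIMP and UIMU, while a shelf that needs the big T network can only use UIMT. Raising an error for unknown labels keeps a typo from silently returning an empty list.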
Fault Phenomenon
• The network management alarm system reports OSS interruption on several or all boards, and the control-plane network interface is blocked.
• The ALM indicator on the UIM board raises an alarm, and the network management system reports loss of the 8K and 16M clock inputs.
• Plugging/unplugging any board other than the UIM triggers alarms that other online boards are abnormal.
• Whole-shelf HW loopback detection alarms occur in the resource shelf.

Flow
The troubleshooting flow of the UIM fault is shown in Figure 4.

FIGURE 4 TROUBLESHOOTING FLOW OF UIM FAULT

Solution
1. When a fault occurs on the UIM board, checking the board indicators is a relatively direct method, and sometimes the only one. Table 2 lists the meanings of the indicators on the UIM board. In the normal running status, the RUN indicator flashes at 1 Hz, and both the ALM and ENUM indicators are constantly off.
   • If the RUN, ALM, and ENUM indicators are all on, the board has usually broken down.
   • If the ENUM indicator is on, the board is not properly inserted. Check whether the board is inserted properly and whether the extractor is closed.
   • If the RUN indicator stays off, a hardware fault has occurred on the board. Replace the board.
   • If the RUN and ALM indicators are constantly on, there is a hardware logic problem in the board. The onboard chips probably need to be updated; on site, this is solved by replacing the board.
   • If the RUN indicator flashes quickly, the board is loading the version. If the board cannot work normally, check whether the hardware configuration and loaded version are correct.

TABLE 2 INDICATORS ON UIM
Indicator   Description
RUN         Run indicator. Flashing at 1 Hz: the board runs normally.
ALM         Alarm indicator
ACT         Active/standby indicator. On: active; off: standby.
ENUM        Board plugging/unplugging indicator. On: the board is not properly inserted.
ACT-P       Active indicator of the board packet domain. On: the board packet domain is active.
ACT-T       Active indicator of the board circuit domain. On: the board circuit domain is active.
ACT1        Optical interface 1 is active (with the GXS sub-card installed).
ACT2        Optical interface 2 is active (with the GXS sub-card installed).
LINK1       On: the FE-C1 port on the rear board is linked up.
LINK2       On: the FE-C2 port on the rear board is linked up.
LINK3       On: the FE-C3 port on the rear board is linked up.
LINK4       On: the FE-C4 port on the rear board is linked up.
SD1         Signal present on optical interface 1 (with the GXS sub-card installed).
SD2         Signal present on optical interface 2 (with the GXS sub-card installed).
SD3         Signal present on the first TDM optical interface (when the TDM optical interface is used).
SD4         Signal present on the second TDM optical interface (when the TDM optical interface is used).

2. Perform an active/standby changeover of the UIM boards and check whether the fault disappears. Do not plug/unplug the suspected faulty board, because the fault may then not appear again. Plug and unplug the faulty board only if the changeover fails. If the fault persists, replace the faulty UIM board.
3. If whole-shelf HW loopback detection alarms occur in the resource shelf, the T-network-related part is abnormal. Check the small T network configuration of the UIMU board, whether the physical connection between the UIMT and the TFI is consistent with the configured UIM-COMM-TFI port, and the physical connection and connection indicators.

Handling OMP Board Hardware Faults
Background
The OMP board is responsible for the operation, maintenance, and management of the entire NE on the foreground.
It receives instructions from the OMM and reports alarm information, traffic statistics, and the tracing messages specified by subscribers to the OMM. It is the operation and maintenance center of the system. The OMP also manages and distributes the software versions of all boards on the foreground; each board loads its required version file from the OMP to start. Externally, the OMP connects to the IP network through the OMC2 interface on the RMPB rear board to communicate with the OMM server.

Fault Phenomenon
• The RUN indicator on the OMP board panel does not flash normally at 1 Hz, but keeps alternating between flashing quickly and turning off.
• The OMM alarm system reports that communication between the OMP board and the foreground has failed, and no data can be transmitted to the foreground.
• The foreground NE maintenance tools, such as signaling tracing and failure observation, do not work normally.
• The OMP board is in a deadlock.

Flow
The troubleshooting flow of the OMP fault is shown in Figure 5.

FIGURE 5 TROUBLESHOOTING FLOW OF OMP FAULT

Solution
1. Observe the RUN2 indicator on the OMP board to judge whether the board works normally. There are two CPUs on the OMP board: CPU1 belongs to the RPU module, and CPU2 belongs to the OMP module. In the normal working status, the RUN2 indicator of the OMP module flashes at 1 Hz; otherwise, the OMP module is faulty.
   • If the ENUM2 indicator is on, the board is not properly inserted. Check whether the board is inserted properly and whether the extractor is closed.
   • If the RUN2 indicator stays off, a hardware fault has occurred on the board. Replace the board.
   • If both the RUN2 and ALM2 indicators stay on, there is a hardware logic problem in the board.
     The onboard chips probably need to be updated; on site, this is solved by replacing the board.
   • If the RUN2 indicator flashes quickly, the board is loading the version. If the board cannot work properly, check whether the hardware configuration and loaded version are correct.
   • If the HD2 indicator stays on, the board is reading the hard disk continuously, which probably indicates a hard disk fault. Replace the board.
2. On the OMM client, perform an active/standby changeover of the OMP board. If the changeover fails, press and hold the EXCH key on the board for a while to perform a hardware changeover.
3. If the fault persists after the changeover, reset the OMP board from the OMM client. If the reset fails, press the Reset key on the board with a tool to force a reset. If the fault still persists after the reset, replace the OMP board to confirm whether it is a hardware fault.
4. If the fault still persists, check the backplane pins to rule out a contact problem, and check whether the backplane DIP switches are set correctly. The shelf housing the OMP is always shelf No. 2 in rack No. 1.

Handling SMP Board Hardware Faults
Background
The SMP board processes MTP3 and its upper-layer protocols, including CC, MM, SCCP, BSSAP, BSSAP+, RANAP, and H.248. It is the center of all services of the control system. By function, the SMP board is divided into the signaling SMP and the service SMP.
• The signaling SMP processes signaling, such as the SIGTRAN protocol.
• The service SMP handles upper-level services, such as call control and mobility management.
In addition, the SMP generates the original CDRs, which the USI forwards to the billing server through the internal bus. Because it has no external interfaces, the SMP needs no rear board or connection lines.

Fault Phenomenon
• OSS communication interruption alarms appear in the network management system.
• The RUN indicator of the SMP board does not flash at 1 Hz, but keeps alternating between flashing quickly and turning off.
• Some signaling links are unavailable.
• The services of some subscribers are interrupted.

Solution
Both the SMP and the OMP use the MPx86 physical board. For the fault handling procedure, refer to "Solution" in Handling OMP Board Hardware Faults.

Handling SIPI Board Hardware Fault
Background
The SIPI board provides the bottom-layer IP interface for SIGTRAN, as well as the external IP interface. The RMNIC rear board provides the external interface of the SIPI, connecting to the IP network through a network cable and interconnecting with the adjacent office interface.

Fault Phenomenon
• The fault management system on the OMM client reports that an association is interrupted.
• The RUN indicator on the SIPI board flashes abnormally.
• The office direction of some associations is unreachable.
• Subscriber services are interrupted.

Flow
Figure 6 shows the flow of troubleshooting the SIPI board fault.

FIGURE 6 HANDLING THE SIPI BOARD FAULT

Solution
1. When the MNIC serves as the SIPI board, it does not need the FE ports; set the number of FE ports to zero.
2. The SIPI board connects to the external network through the FE1 network interface and to the internal control plane through the FE3 network interface. Check the network cables to make sure the network interfaces are connected normally.
3. Use the replacement method to locate the fault and determine whether it results from the SIPI board itself or from other boards.
   Example: When the OMP is faulty, the SIPI cannot load the version file while starting, so the SIPI board fails to activate. When the UIM is faulty, the SIPI cannot communicate normally.

Handling SPB Board Hardware Faults
Background
The SPB processes the narrowband Signaling System No.7 (SS7) MTP-2 protocol and its lower-layer protocols. Each board provides 16 E1s externally for access and extracts the 8K line clock reference to the CLKG. In addition, the SPB can be set to the 75 Ω or 120 Ω transmission mode according to the impedance characteristics of the transmission line. The SPB connects to other offices through the interfaces on the RSPB rear board, using the ZTE dedicated transmission cable as the connection line.

Fault Phenomenon
• The board cannot start, or deadlocks while running. The alarm indicator on the board flashes.
• The fault management system reports bit errors in the link layer or CRC check failures.
• Some or all links on the SPB board are interrupted.

Flow
The troubleshooting flow of the SPB fault is shown in Figure 7.

FIGURE 7 TROUBLESHOOTING FLOW OF SPB BOARD

Solution
1. Check whether CPU1 is normal. There are up to four CPUs on the SPB board. CPU1, the primary CPU, manages the board resources, including turning on each indicator; the other three CPUs are secondary. Therefore, when the board cannot start or deadlocks while running, check the status of CPU1 first. Check the CPUs of the SPB board in the Rackchart Management tab of the Daily Maintenance window of NetNumen M30. The system displays the status of the four CPUs; check whether CPU1 is normal.
2. The normal running status is indicated as follows:
   • The RUN indicator flashes at 1 Hz.
   • The ACT indicator is constantly on.
   • The ALM and ENUM indicators are constantly off.
   If all these indicators are on, CPU1 may not be able to start. Possible reasons:
   • If the problem appears when the board is powered on for the first time, the BOOT program was usually burned incorrectly. Burn the software of the BOOT chip again.
   • The SPB mother board is faulty. Replace the board to eliminate the fault.
3. The board runs normally, but there are bit errors on the E1 links, especially under heavy traffic.
   • Check whether the clocks of the two interconnected environments are synchronous.
   • Check whether the jumper and board impedance DIP switch settings are consistent with the cable used.
   • Check whether the impedance against ground of each environment meets the requirements.
   To check the impedance matching of the SPB:
   1. Find the four DIP switches (S3-S6) on the board; each corresponds to four E1 channels (E1 channels No. 1-16, from top to bottom). The "ON" position represents 75 Ω, and the "OFF" position represents 120 Ω.
   2. Find the four switches of DIP switch S2, which indicate the E1 status. From top to bottom, each switch corresponds to one group of four E1s, 16 E1s in total. The "ON" position represents 75 Ω, and the "OFF" position represents 120 Ω.
   3. Set the jumpers on the rear board. There are five groups of jumpers (X11-X15) from top to bottom. X11 and X15 each correspond to two E1 channels, and the other groups each correspond to four E1 channels. Each E1 channel has jumpers for the sending and receiving directions, with the sending jumper above the receiving one. For 75 Ω impedance matching, place a jumper on the sending end only, with no jumper on the receiving end.
      For 120 Ω impedance matching, place no jumper on either the sending or the receiving end.

Handling CLKG Board Hardware Faults
Background
As the clock generation board, the CLKG provides the clock for the system, traces the external reference line clock, and keeps the NE clock synchronous with the network clock. In the ZXWN MGW, the clock helps establish stable links with the traditional SS7 network. Both the SPB and UIM boards use the clock.
The CLKG board does not download its version from the OMP during activation; it starts directly from the program in its onboard chip. Therefore, version loading problems can be ignored when troubleshooting the CLKG board.
The CLKG board provides internal clock interfaces and external clock tracing interfaces through the RCLKG rear board. The internal clock interface connects to the UIM in the expansion frame through the ZTE dedicated clock cable, and the UIM then sends the clock to each board that needs it. The external clock tracing interface connects, through the ZTE dedicated clock tracing cable, to the clock tracing signal output interfaces of external interface boards such as the SPB, APBE, and SDTB.

Fault Phenomenon
• The RUN indicator on the CLKG board flashes abnormally.
• The red ALM indicators on the UIM, APBE, and SPB boards are on.
• The fault management system on the OMM client reports that the clock signal is lost.
• Faults such as intermittent disconnection, bit errors, and interruption occur on the SS7 links.

Flow
Figure 8 shows the flow of troubleshooting the CLKG board fault.

FIGURE 8 TROUBLESHOOTING FLOW OF THE CLKG BOARD

Solution
1. Check the indicators on the CLKG board to confirm its running status.
2. Check whether the external lines of the RCLKG rear board of the CLKG are connected correctly.
3. Perform an active/standby changeover to check the working status of the CLKG board.
4. Use the replacement method to check whether the fault lies in the CLKG board. If the CLKG board is faulty, replace it.

Handling IPI Board Fault
Background
The IPI board provides the external IP interface of the media plane. The RMNIC rear board provides the IP interface, connecting to the IP network through a network cable and interconnecting with the adjacent office interface.

Fault Phenomenon
• The fault management system of the OMCS client reports an RTP trunk alarm.
• The RUN indicator on the IPI board flashes abnormally.
• Subscriber services are interrupted.

Flow
Figure 9 shows the flow of troubleshooting the IPI board fault.

FIGURE 9 HANDLING THE IPI BOARD FAULT

Solution
1. The IPI board uses the MNIC board. When the FE port number is set to one or two, the MNIC board actually occupies two media-plane network interfaces, including the media-plane network interfaces of the adjacent slots.
2. All four network interfaces of the IPI board can be used for external connections. Check the network cables to confirm that the network interfaces are connected normally.
3. Use the replacement method to locate the fault and determine whether it results from the IPI board itself or from other boards.
   Example: When the OMP is faulty, the IPI cannot load the version file while starting, so the IPI board fails to activate. When the UIM is faulty, the IPI cannot communicate normally.

Handling DTB/DTEC Board Fault
Background
As the digital trunk interface board, the DTB/DTEC board provides access to E1/T1 links. It provides 32 E1/T1 interfaces externally through the RDTB rear board and extracts the 8K line clock reference to the CLKG. In addition, it can be set to the 75 Ω or 120 Ω transmission mode according to the impedance characteristics of the transmission line.

Fault Phenomenon
• The fault management system of the OMCS client reports an E1 trunk alarm.
• The RUN indicator on the DTB/DTEC panel flashes abnormally, and the ALM indicator shows an alarm.
• Subscriber services are interrupted.

Flow
Figure 10 shows the flow of handling the DTB/DTEC board fault.

FIGURE 10 HANDLING THE DTB/DTEC BOARD FAULT

Solution
1. There are 12 four-position DIP switches on the DTB/DTEC board:
   • Eight (S1-S6, S9, and S12) set the matching impedance of each E1 channel to 75 Ω or 120 Ω.
   • S7 and S8 indicate the receiving matching impedance of the corresponding E1 chip to the CPU.
   • S10 and S11 indicate the long/short wire status of each E1 chip to the CPU.
2. The DTB/DTEC board provides 32 E1s externally. If a particular E1 has a problem, create an E1 self-loop to determine whether the problem is at the local end or the opposite end. If it is at the local end, check the status of the related sub-unit and the trunk data.
3. If all E1s of the board are faulty, first reset the board and check the data. If the fault persists, replace the board with a new one.

Handling APBE Board Fault
Background
As the Iu-CS interface board, the APBE board provides two optical interfaces to connect to the interface on the RNC side.
It handles ATM adaptation and the broadband No.7 bottom-layer signaling, such as AAL5-SAR, SSCOP, and SSCF, and forwards the MTP3B signaling packets to the SMP through the FE interface for processing.

Fault Phenomenon
• A serious alarm appears in the fault management system, and communication interruption occurs on the APBE unit.
• All RNC services connected through the APBE are interrupted.
• The fault management system reports that the office direction to the RNC is unreachable.

Flow
Figure 11 shows the flow of troubleshooting the APBE fault.

FIGURE 11 HANDLING THE APBE BOARD FAULT

Solution
1. There are two pairs of optical interfaces on the APBE board panel, each with a sending and a receiving direction. If the sending and receiving directions are reversed, the board cannot receive the optical signal.
2. The APBE board must be configured with the correct ATM address; otherwise, the RNC office direction is unreachable.
3. Check whether the signaling PVC, voice channel PVC, VCI, VPI, and PATH ID settings are correct.

Handling MRB Board Fault
Background
The MRB is the media resource board in the MGW. It provides various media resources for the system, including signal tones, voice tones, DTMF, MFC, and multi-party talking resources. The MRB board is configured in the BUSN shelf, without backup.
The MRB implements different functions according to the attributes configured for its sub-units in the OMM system. For example, a sub-unit configured as TONE is a tone sub-unit, one configured as DTMF is a dual-tone multi-frequency sub-unit, and one configured as MFC is a multi-frequency compelled sub-unit.
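The attribute-based configuration described above, together with the four DSPs per MRB board stated in the Solution steps of this section, suggests a simple model for deciding which DSP to reset when one type of resource fails. The Python sketch below is a hypothetical illustration under two assumptions: that each DSP's media resources correspond to the sub-unit attributes configured on it, and that the board layout is available as a dictionary. The class, constant, and function names are invented for this sketch; in practice the configuration is read in the OMM system.

```python
from enum import Enum


class SubUnitAttr(Enum):
    """Sub-unit attributes named in the MRB background description."""
    TONE = "tone sub-unit"
    DTMF = "dual-tone multi-frequency sub-unit"
    MFC = "multi-frequency compelled sub-unit"


MRB_DSP_COUNT = 4  # each MRB board carries four DSPs


def dsps_with_attr(board: dict[int, set[SubUnitAttr]], attr: SubUnitAttr) -> list[int]:
    """Return the DSP numbers whose sub-units are configured with `attr`.

    `board` is a hypothetical snapshot of one MRB board: it maps each DSP
    number (1-4) to the set of attributes configured for its sub-units.
    """
    bad = [dsp for dsp in board if not 1 <= dsp <= MRB_DSP_COUNT]
    if bad:
        raise ValueError(f"DSP numbers out of range 1-{MRB_DSP_COUNT}: {bad}")
    return sorted(dsp for dsp, attrs in board.items() if attr in attrs)


if __name__ == "__main__":
    example_board = {
        1: {SubUnitAttr.TONE},
        2: {SubUnitAttr.DTMF, SubUnitAttr.MFC},
        3: {SubUnitAttr.TONE, SubUnitAttr.DTMF},
        4: set(),
    }
    print(dsps_with_attr(example_board, SubUnitAttr.DTMF))
```

With the example layout, a DTMF problem points to DSPs 2 and 3 as reset candidates. The range check rejects DSP numbers that cannot exist on a four-DSP board.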
Fault Phenomenon
• For call services, the local office provides no system tones or wrong tones.
• For the second-call service, subscribers cannot dial a second call, or an error occurs in number identification after the second call.
• Three-party call and conference call services cannot be implemented.
• A serious alarm appears in the fault management system, and communication interruption occurs on the MRB unit.
• The RUN indicator on the MRB board panel is abnormal.

Flow
Figure 12 shows the flow of troubleshooting the MRB fault.

FIGURE 12 HANDLING THE MRB BOARD FAULT

Solution
1. Each MRB board has four DSPs, and each DSP can be configured with different media resources. Based on the fault phenomena, identify the corresponding DSP and reset it.
2. The MRB must be configured under the corresponding SMP according to the planning. If the configuration is wrong, the subscribers under that SMP cannot use the resources of the MRB.
3. Reset the whole board. If the fault persists, replace the faulty board with a new one.

Handling VTCD Board Fault
Background
The VTCD board is the voice codec board in the MGW. It encodes and decodes Iu-UP and AMR voice and processes the Nb-UP. The VTCD board is configured in the BUSN shelf, without backup.

Fault Phenomenon
• A serious alarm appears in the fault management system, and communication interruption occurs on the VTCD unit.
• Faults such as noise or no voice occur during calls.
• The indicator on the VTCD is abnormal.

Flow
Figure 13 shows the flow of troubleshooting the VTCD fault.

FIGURE 13 HANDLING VTCD BOARD FAULT

Solution
1. Each MRB board has four DSPs, and each DSP can be configured with different media resources. Based on the fault phenomena, identify the corresponding DSP and reset it.
2. The MRB must be configured under the corresponding SMP according to the planning. If the configuration is wrong, the subscribers under that SMP cannot use the resources of the MRB.
3. Reset the whole board. If the fault persists, replace the faulty board with a new one.

Handling Changeover Exceptions
Background
Board changeover is a maintenance means frequently used for board replacement, version updates, and other routine maintenance. After a changeover, maintenance personnel should observe the working status of the system. If the system works normally, the changeover has succeeded. If it does not, immediately power off the board on which the changeover was performed and bring the standby board into active status.

Common Causes
The common causes of changeover exceptions are as follows:
• Operations do not conform to the specifications.
• The changeover is prohibited by the system running status.
• The standby board is in an abnormal status.

Handling
1. Check whether the standby board is normal. If the standby board is unavailable because it is not inserted, is faulty, or is running abnormally, the system refuses to perform the active/standby changeover. In the Daily Maintenance window of NetNumen M30, find the NE to be maintained. Open the Rackchart Management tab, select the corresponding module and rack, and find the board on which the active/standby changeover is to be performed. Check the board information. If the board is in the hybrid, unknown, or another abnormal status, the active/standby changeover cannot be performed for it.
2. Check whether the board version is loaded normally. On the Professional Maintenance > File Management tab of NetNumen M30, check whether the version files on the board are correct.
If the version files cannot be queried this way, check the running status of the board and follow the troubleshooting flow for that board. If the version files can be queried but are wrong, load the correct ones for this board in Version Management.
3. Check other prohibited conditions.
To ensure the safe operation of the switch, the system also refuses changeover under large traffic, high CPU occupancy, scheduled tasks, data backup, and other special conditions. If a changeover is forced, serious consequences may arise, such as missing CDRs, dropped calls, and the resetting of all active/standby boards.

Caution:
• Changeover is a relatively high-risk operation. Back up the system data in advance.
• It is recommended to perform changeover on the OMP and other important boards between 0:00 and 6:00, and to keep an interval between two changeover operations.

Handling CPU Overload

Background
CPU overload is a major fault in ZXWN MGW. Excessive CPU usage increases call loss and decreases the call completion ratio. In serious cases, it can cause ZXWN MGW to break down.

Common Causes
• Excessive traffic
• Interface congestion
• Too-short cycle of performance statistics tasks
• Irrational location area settings
• Non-standard maintenance activities
• Incorrect data settings

Common Fault Phenomena
• The "Load of the CMP board is excessive" alarm appears in the Fault Management system.
• View the CPU occupancies of the CMP (service processing module) and SMP (signaling processing module) in the Performance Statistics and Load Statistics windows of NetNumen M30.

Flow
1. Check the traffic.
On ZXWN MSCS, query recent performance statistics reports to learn the traffic over a period of time. Generally, check whether the CPU overload is caused by large traffic.
To control traffic, refer to the Emergency Handling for Large Traffic section in ZXWN MSCS MSC Server Emergency Fault Handling and System Recovery.
2. Check maintenance operations.
Many maintenance tasks consume a lot of CPU resources. Therefore, avoid the following operations when traffic is heavy: bulk modification with commands, displaying excessive command execution results, dynamic tracing on too many links, tracing too much signaling, and similar operations.
3. Check the performance statistics cycle.
Most traffic statistics tasks in routine maintenance are closely associated with calls, so a too-short statistics cycle increases the CPU load of the system. At present, a 1-hour cycle is adequate.
4. Check whether the data configuration is correct.
For MSCS, data configuration errors cause CPU overload in the following three ways.
• Unbalanced load-sharing configuration on signaling links and trunks results in some signaling links carrying excessive load, which overloads the board responsible for processing those services. In this case, adjust the data-link configuration.
• Unbalanced trunk distribution results in some modules carrying relatively heavy load. In this case, distribute the circuits across all modules.
• Incorrect MAP configuration also causes excessive CPU load.
5. Check whether the location area settings are rational.
Split the location areas whose settings are irrational, or adjust them through the BSC/RNC.
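The changeover preconditions from Handling Changeover Exceptions can be collected into one pre-check that an operator runs before starting a changeover. These preconditions are the standby board state, the version files, the prohibited running conditions (one of which is the high CPU occupancy discussed under Handling CPU Overload), and the advance data backup from the Caution note. The following Python sketch is a hypothetical helper, not a NetNumen M30 function. The ChangeoverContext fields, the status strings, and the function names are assumptions. The values would still have to be read manually from the Rackchart Management and File Management tabs.

```python
from dataclasses import dataclass
from datetime import time
from typing import List

@dataclass
class ChangeoverContext:
    """Observations gathered before a changeover (hypothetical field names)."""
    standby_state: str            # status shown in Rackchart Management, e.g. "normal", "hybrid", "unknown"
    version_files_ok: bool        # result of the File Management check
    large_traffic: bool
    high_cpu: bool
    scheduled_task_running: bool
    data_backup_running: bool
    system_backup_done: bool      # data backed up in advance, per the Caution note

def changeover_blockers(ctx: ChangeoverContext) -> List[str]:
    """Return the reasons a changeover should not be started; empty means no blocker found."""
    reasons: List[str] = []
    # Step 1: any standby status other than normal (absent, faulty, hybrid, unknown, ...) blocks it.
    if ctx.standby_state != "normal":
        reasons.append(f"standby board state is {ctx.standby_state}")
    # Step 2: the board version files must be present and correct.
    if not ctx.version_files_ok:
        reasons.append("board version files missing or incorrect")
    # Step 3: special running conditions under which the system refuses changeover.
    prohibited = {
        "large traffic": ctx.large_traffic,
        "high CPU occupancy": ctx.high_cpu,
        "scheduled task in progress": ctx.scheduled_task_running,
        "data backup in progress": ctx.data_backup_running,
    }
    reasons.extend(name for name, active in prohibited.items() if active)
    # Caution note: back up system data before any changeover.
    if not ctx.system_backup_done:
        reasons.append("system data has not been backed up")
    return reasons

def in_recommended_window(t: time) -> bool:
    """Advisory only: 0:00-6:00, treated here as a half-open interval (assumption)."""
    return time(0, 0) <= t < time(6, 0)
```

A non-empty result means the changeover should be postponed. `in_recommended_window` only reflects the timing recommendation and is deliberately kept out of the blocking list.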
Chapter 3 Clock Faults

Table of Contents
Handling System Clock Exception
Handling the Clock Lock Failure
Handling the Clock Networking Fault
Handling the Inconsistent Lock Status of Active/Standby CLKG Boards
Handling the Clock Reference Loss
Handling the Output Clock Loss
Handling the Slip Code

Handling System Clock Exception

Background
The clock plays a vital role in the CS domain. Without a clock, the CS domain cannot work normally, so the clock system must be kept working normally. In ZXWN MGW, the CLKG is inserted in the main control shelf where the OMP is located and provides the clock signal to the UIM boards in all shelves. The CLKG traces the upper-level office clock signal from the DTB, SDTB, APBE, and SPB boards through the clock tracing cable, keeping the local clock synchronous with the upper-level office clock.

Common Causes
• In the Fault Management window, Clock Loss alarms are reported by resource boards such as the DTEC, VTCD, MRB, and SPB.
• The ALM indicator of the board is on.
• The narrowband link is interrupted, or there are alarms on the narrowband link.

Flow Diagram
Figure 14 shows the flow of troubleshooting the clock abnormality.

Figure 14 Handling the Clock Abnormality

Solution
1. Check whether the CLKG board runs normally. For the method, see Handling CLKG Board Hardware Faults.
2. If there is a fault in the CLKG board, handle it first, and then check whether the system clock exception disappears.
If the CLKG works normally, check whether the connection between the UIM board in the BUSN shelf and the CLKG is correct.
Note: The CLKG board is connected to the UIM through the RCLKG rear board.
3. If there is a fault in the connection line, replace the connector of the clock cable to the UIM board, or replace the clock cable, to eliminate the fault.
4. If the clock is still lost although both the CLKG board and the connection between the CLKG board and the UIM board are correct, check whether the UIM board works normally. For the method of troubleshooting the UIM fault, see .

Handling the Clock Lock Failure

Background
While running, ZXWN MGW extracts the upper-level office line clock, the BITS clock, or the GPS clock, according to the configuration, to stay synchronous with the clock of the entire network. When tracing the upper-level office line clock, it extracts the upper-level clock reference signal through the interface boards SPB, DTB, APBE, and SDTB, and sends the signal to the RCLKG rear board of the CLKG through the dedicated clock tracing cable. The CLKG traces this clock reference automatically. When tracing the BITS clock or GPS clock, the dedicated clock tracing cable transmits the external clock reference signal to the RCLKG rear board of the CLKG, and the CLKG traces this clock reference automatically.
A clock lock failure means that the local office cannot synchronously trace the standard clock reference of the upper-level office, such as the BITS clock or the line clock. When the clock lock fails, the CLKG board in ZXWN MGW shows either of the following working statuses.
• The CATCH indicator is constantly on.
• The CATCH and TRACE indicators have been flashing synchronously for more than 30 minutes.
If either condition holds, the CLKG board is judged to have difficulty locking, or to be unable to lock, the upper-level clock.
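The two lock-failure conditions above can be expressed as a small decision function, together with the three-hour preheating wait described in step 1 of the Solution below. This Python sketch is illustrative only. The `Led` enum, the parameter names, and the returned strings are assumptions. The inputs are what a maintainer would read off the CLKG panel and note down.

```python
from enum import Enum

class Led(Enum):
    """Observed state of a CLKG panel indicator (hypothetical encoding)."""
    OFF = "off"
    ON = "on"
    FLASHING = "flashing"

PREHEAT_HOURS = 3          # minimum warm-up before judging (Solution, step 1)
SYNC_FLASH_LIMIT_MIN = 30  # CATCH+TRACE flashing together longer than this means lock failure

def clkg_lock_verdict(catch: Led, trace: Led, flashing_in_sync: bool,
                      sync_flash_minutes: float, hours_since_power_on: float) -> str:
    """Classify the CLKG lock state from its CATCH and TRACE indicators."""
    if hours_since_power_on < PREHEAT_HOURS:
        # The crystal oscillator may not be warm yet; observe again later.
        return "preheating"
    if catch is Led.ON:
        # Condition 1: CATCH constantly on.
        return "lock failure"
    if (catch is Led.FLASHING and trace is Led.FLASHING and flashing_in_sync
            and sync_flash_minutes > SYNC_FLASH_LIMIT_MIN):
        # Condition 2: CATCH and TRACE flashing synchronously for more than 30 minutes.
        return "lock failure"
    return "no lock failure detected"
```

The preheating check comes first so that a freshly replaced board is not misjudged as faulty.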
Fault Phenomenon
• Link-layer bit errors occur on the outgoing narrowband link of the E1 (connecting with the HLR) or DT (connecting with the PSTN) of the SPB board.
• The outgoing E1 indicator of the SPB/DTEC board flashes abnormally, and the alarm indicator is on.
• The signaling link is disconnected or intermittent.
• The CLKG board cannot remain in a stable tracing status.
• The indicator on the CLKG board is abnormal.
• The signaling link is unstable, with intermittent disconnections or even interruption.
• The fault management system reports slip alarms.
• Faults such as noise and one-way audio may occur during calls.

Flow
The troubleshooting flow of the clock locking failure is shown in Figure 15.

Figure 15 Troubleshooting Flow of Clock Locking Failure

Solution
1. The CLKG board is insufficiently preheated.
The CLKG board uses a high-stability, high-precision crystal oscillator to generate the local clock. Because of the working characteristics of the crystal oscillator, the CLKG board cannot enter normal tracking status until it has been preheated for a period of time. If the system fails to trace the clock signal when the CLKG board has not run long enough or has just been replaced, wait at least three hours for the CLKG board to be preheated sufficiently, and then observe the clock tracing status again.
2. Check the settings of the interface board (SPB).
• If the clock fault occurs on the backplane of the SPB, troubleshoot it by replacing the CLKG board, the UIM board, and the clock output cable.
• Check whether the 75/120 Ω matching impedance is set correctly. The impedance of the E1 cable must match that of the board. To set the E1 cable impedance, set the DIP switches S3~S6 on the SPB board.
The ON position indicates that the matching impedance is 120 Ω, and the OFF position indicates that it is 75 Ω. Table 3 lists the correspondence between the DIP switches and the E1 lines.

Table 3 Impedance DIP Switches of E1 on the SPB Board
DIP Switch          Corresponding E1
Bits 1~4 of S3      E1 channels 1~4 of the SPB board
Bits 1~4 of S4      E1 channels 5~8 of the SPB board
Bits 1~4 of S5      E1 channels 9~12 of the SPB board
Bits 1~4 of S6      E1 channels 13~16 of the SPB board

• Check the reliability of the cable connected to the E1. A cable problem can be solved by replacing the E1 cable.
• When the E1 line is longer than 300 meters, use the long-line mode.
For the DTEC board, find the DIP switches S10 and S11 that set the E1 line mode, with eight switches in total. From top to bottom, each switch corresponds to a group of four E1s, giving eight groups of E1s in total. Setting a switch to ON selects the short-line mode, and setting it to OFF selects the long-line mode.
For the SPB board, find the E1 mode setting DIP switch S1, with four switches in total. From top to bottom, each switch corresponds to a group of four E1s, giving four groups of E1s in total. Setting a switch to ON selects the short-line mode, and setting it to OFF selects the long-line mode.
3. Handle the clock networking fault.
A clock networking fault results from a self-loop relationship between the local exchange and the upper-level clock office, which causes the two clocks to track each other. In this case, clock lock failures often occur. For the handling method, see Handling the Clock Networking Fault.
4. Handle the upper-level clock source fault.
The CLKG board works normally and the clock networking relationship is correct, but the clock still cannot lock the upper-level office.
In this case, contact the office that hosts the upper-level clock source to check whether there is a clock fault in the upper-level office. If the clock source accuracy of the equipment being traced does not reach level 2, negotiate with that office for a clock source with higher accuracy.
5. Handle the inconsistent lock status of the active/standby CLKG boards.
If one of the active/standby CLKG boards of the local office can trace and lock the upper-level clock, the upper-level clock source and the clock networking are fine, and the fault must lie in this office. For how to handle this kind of fault, see Handling the Inconsistent Lock Status of Active/Standby CLKG Boards.

Handling the Clock Networking Fault

Background
In a communication network, all node clocks must stay synchronous so that the data received and transmitted on the links can be identified correctly. In clock networking, the synchronization relationship between nodes is one of the following three types:
• Quasi-synchronization
This mode is generally used for international communication, because the PRCs used by different countries have a high accuracy of 1×10⁻¹¹, so a slip occurs only once in about 70 days.
• Mutual synchronization
This relationship is relatively complex and requires clocks of a higher level.
• Master-slave synchronization
A reference clock in the network is distributed hierarchically. China adopts master-slave synchronization with a 3-level architecture. The level-1 clock, the PRC, serves as the master clock and complies with ITU-T Recommendation G.811. Composed of cesium clocks, the PRC mainly works in free-running status. There is also a local master clock, mainly composed of atomic clocks and GPS, which can be synchronized to the PRC.
The level-2/level-3 slave clocks are mainly used on the level-2/level-3 nodes of the network. The node clock stability complies with G.812 and YD/T 1012. Here, level-2/level-3 corresponds to the original enhanced level-2/level-3.

Overview
A clock networking fault is a clock synchronization fault caused by an incorrectly configured synchronization relationship with another office. The clock self-loop is the most typical one: two offices with a master-slave relationship each track and synchronize to the other's clock when synchronizing the clock reference, so a closed clock loop is formed. For example, when ZXWN MSCS is interconnected with the PSTN, the MSCS tracks the E1 line clock on the PSTN line and compares it with the 8K line clock from the PSTN, and the PSTN does the same with the MSCS. This is a clock self-loop, as shown in Figure 16.

Figure 16 Example for Clock Self-Loop

In practical networking, using the 8K line clock reference often causes clock self-loop faults. Such faults are much less common when the BITS clock reference is used.

Fault Phenomenon
When a clock self-loop fault occurs, the common fault phenomena are as follows:
• The system finds it hard to track the clock again when clock lock loss happens frequently.
• The system clock cannot be locked.

Solution
The handling method of the clock networking fault is as follows:
1. Check the actual clock networking to determine whether a clock self-loop fault has occurred.
2. If it has, change the reference clock source of the local office or of the interconnected office, based on the actual situation, to eliminate the fault. For the example in Figure 16, adopt the PSTN clock reference to eliminate the clock self-loop.
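Step 1 of the Solution asks whether the actual clock networking contains a self-loop. If the tracking relationships are written down as "office X traces office Y", that check becomes a cycle search in a graph where every office has at most one outgoing edge. The Python sketch below assumes that representation. The office names and the function name are hypothetical. `None` marks an office that traces an independent reference such as BITS, GPS, or the PRC.

```python
from typing import Dict, List, Optional

def find_clock_loops(tracks: Dict[str, Optional[str]]) -> List[List[str]]:
    """Return every closed clock-tracking loop (clock self-loop) in the network.

    `tracks` maps each office to the office whose clock it traces, or to None
    when the office traces an independent reference (BITS, GPS, PRC).
    """
    loops: List[List[str]] = []
    state: Dict[str, str] = {}  # office -> "visiting" (on the current walk) or "done"
    for start in tracks:
        path: List[str] = []
        node: Optional[str] = start
        # Follow the single "traces" pointer until we reach an independent
        # reference or an office that has already been seen.
        while node is not None and node not in state:
            state[node] = "visiting"
            path.append(node)
            node = tracks.get(node)
        # Reaching an office that is still on the current walk closes a loop.
        if node is not None and state[node] == "visiting":
            loops.append(path[path.index(node):])
        for office in path:
            state[office] = "done"
    return loops
```

For the Figure 16 situation, `{"MSCS": "PSTN", "PSTN": "MSCS"}` yields `[["MSCS", "PSTN"]]`. After the MSCS adopts the PSTN reference and the PSTN traces its own upstream source, `{"MSCS": "PSTN", "PSTN": None}` yields no loops.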
Handling the Inconsistent Lock Status of Active/Standby CLKG Boards

Overview
When the clock tracing statuses of two CLKG boards configured in active/standby mode are inconsistent, a hidden trouble exists, even though the system may work normally. In this case, check the system and remove the hidden trouble so that both the active and standby CLKG boards work normally. Clock networking faults and clock source faults can generally be excluded in this situation.

Fault Phenomenon
With the CLKG boards configured in active/standby mode, one CLKG board cannot lock (the CATCH indicator is constantly on, or the CATCH and TRACE indicators flash synchronously), while the other CLKG board can lock.

Solution
The method of handling the inconsistent lock status of the active/standby CLKG boards is as follows.

Table 4 Handling Method of Inconsistent Locking Status of Active/Standby CLKG Boards
Phenomenon                                          Method
The BITS clock serves as the clock reference        Step 1
The 8K line clock serves as the clock reference     Step 2

1. The BITS clock serves as the clock reference.
When the BITS clock serves as the clock reference, the fault location can basically be restricted to the slots and the CLKG board itself.
i. Switch the CLKG board that cannot lock the clock to standby status. Then unplug and reinsert this board to rule out a contact problem.
ii. After unplugging this board, check whether pins 24/25 of the J5 connector of the corresponding slot are bent. Bent pins may also cause this fault.
iii. If permitted, change the clock reference of the suspect CLKG board and check whether this board can lock. This determines whether the CLKG board itself is faulty.
2. The 8K line clock serves as the clock reference.
The 8K reference clocks of the active and standby CLKG boards are transferred independently, so there is no necessary association between the 8K reference clock of the active CLKG board and that of the standby CLKG board. Therefore, when an 8K clock source is the current reference clock of the CLKG, an inconsistent lock status of the active/standby CLKG boards does not directly mean that the CLKG board is faulty, and careful troubleshooting is necessary. Although the 8K clock cables are separate on the backplane, they have the same source, so the fault generally does not relate to the clock networking or the clock hierarchy.
i. Check that the RJ45 connector that feeds the clock reference into the RCKG1 rear board is firmly seated, to rule out a contact problem. If the fault persists, perform the following steps.
ii. If another 8K reference clock exists, switch the current tracking reference of the CLKG to the standby reference, either from the background or manually. Then observe the lock status of the CLKG. If the CLKG locks, the fault lies in the slot or the rear board; otherwise, the CLKG board is most likely faulty.
iii. If there is only one 8K reference clock, pull the clock reference cable out of the RCKG1 rear board, replace it with a known-good clock reference cable, and extract the clock from another interface board. If the phenomenon disappears, the fault lies in the clock cable, the interface board that extracts the clock, or the rear board of that interface board. If the phenomenon persists, the fault lies in the CLKG, the slot holding the CLKG, or the RCKG1. Then use the replacement method to find the faulty component and replace it.
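The branching in steps ii and iii of case 2 can be summarized as a lookup from the test performed, and its result, to the components that remain suspect. The Python sketch below is a hypothetical aid, not a ZTE tool. The function name and the component labels are assumptions. The "rear board" in step ii is taken to mean the RCKG1, which the text does not state explicitly.

```python
from typing import List

def suspects_8k_inconsistent_lock(second_8k_reference: bool, fault_cleared: bool) -> List[str]:
    """Components still suspected after the step ii / step iii test.

    second_8k_reference: True if another 8K reference exists, so step ii applies
        (switch the CLKG to that reference); False means step iii applies
        (known-good reference cable, clock extracted from another interface board).
    fault_cleared: True if the CLKG locked / the phenomenon disappeared afterwards.
    """
    if second_8k_reference:
        if fault_cleared:
            return ["CLKG slot", "rear board (assumed RCKG1)"]
        return ["CLKG board"]
    if fault_cleared:
        return ["8K clock cable", "clock-extracting interface board",
                "rear board of that interface board"]
    return ["CLKG board", "CLKG slot", "RCKG1 rear board"]
```

The returned components are then checked one at a time with the replacement method.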
Handling the Clock Reference Loss

Overview
Clock reference loss means that the upper-level clock reference signal used by the system is lost, which makes the system asynchronous with the whole network. When the CLKG board raises the clock loss alarm, check and handle the fault.

Fault Phenomenon
When the clock reference is lost, either of the following two conditions may appear:
• Both the active and standby CLKG boards lose the clock reference.
• The active and standby CLKG boards differ in clock reference loss status.

Solution
The handling method of clock reference loss is shown in Table 5.

Table 5 Handling Method of Inconsistent Clock Reference Status of Active/Standby CLKG Boards
Phenomenon                                          Method
The BITS clock serves as the clock reference        Step 1
The 8K line clock serves as the clock reference     Step 2

1. The BITS clock serves as the clock reference.
i. Both the active and standby CLKG boards lose the clock reference.
Check the indicators on the active/standby CLKG boards. The KEEP indicator (working mode indicator) and the corresponding BITS clock reference indicator (such as the 2MBPS1 indicator) are constantly on, which means that the CLKG was in tracking status before losing the current clock reference. In this case, the CLKG board itself is normal. Check whether the 9-pin connector on the RCKG1 rear board has dropped off or is loose, and troubleshoot the corresponding output or transmission equipment of the BITS.
ii. The active and standby CLKG boards differ in clock reference loss status.
If this fault occurs while the system is running, check whether the CLKG board is firmly inserted, because a loose board can cause this fault. Otherwise, the CLKG board is probably faulty and should be replaced promptly. If this fault appears during office commissioning, check whether the slot is normal.
Also check whether pins 24/25 of the J5 connector at the corresponding backplane slot are normal.
2. The 8K line clock serves as the clock reference.
i. Both the active and standby CLKG boards lose the clock reference.
Check whether the RCKG1 rear board is loose, whether the RJ45 connector of the reference input interface is loose, and whether the interface board works normally. If the reference is lost during office commissioning, check the connecting lines. For example, if the 8K clock reference is extracted through the DTB, check components such as the 8K clock cable and the RCKG1 rear board.
ii. The active and standby CLKG boards differ in clock reference loss status.
One CLKG board can detect the clock reference signal, but the other cannot. Therefore, concentrate on the CLKG board and its slot, the RCKG, the 8K clock reference cable, the clock-extracting interface board, and the rear board of that interface board. For how to handle this kind of fault, refer to Handling the Inconsistent Lock Status of Active/Standby CLKG Boards.

Handling the Output Clock Loss

Overview
Output clock loss means that the output clock signal transmitted from the CLKG board to the UIM or other boards in the system is abnormal or lost. In this case, boards in the system report the clock loss alarm.

Fault Phenomenon
When the output clock is lost, the following phenomena may appear:
• A board in the system reports the clock loss alarm.
• The signaling link is intermittently disconnected or even interrupted.
• The ALM indicator on the board is on.

Solution
The handling method of output clock loss is listed in Table 6.
Table 6 Handling Method of Output Clock Loss
Phenomenon                                     Method
All output clocks were lost momentarily        Step 1
All clocks have been lost for a long time      Step 2
Several clocks are lost                        Step 3

1. All output clocks were lost momentarily.
While the system is running, the clock signal is lost suddenly and then recovers automatically about 10 seconds later. Query the history notifications and history alarms to see the working mode transitions of the CLKG. If the CLKG lock loss happened suddenly and the lock loss status lasted for about 120 minutes, the cause can basically be judged as clock deterioration due to network fluctuation: the CLKG board retracks the clock signal, so the clock is temporarily lost during this process. Find the cause of the network fluctuation to prevent this fault from recurring.
2. All clocks have been lost for a long time.
The clock signal has been lost for more than 10 seconds while the system is running. If neither the 44-pin connector on the CLKG rear board nor the 9-pin connector on the RUIM is loose, the cause is that both the active and standby CLKGs are running in standby mode because of an abnormality. In this case, check whether the ACT indicator on the CLKG board is off. If it is, unplug and reinsert the CLKG board to clear the fault.
3. Several clocks are lost.
Generally, the loss of one or several clocks results from hardware faults. Check the clock cable, the socket connector, and the backplane first, and then check the board. The following example describes the handling procedure.
i. The CLKG board is connected to three shelves, labeled shelf 1, shelf 2, and shelf 3. The UIM of shelf 1 reports the clock loss alarm, but neither shelf 2 nor shelf 3 does.
Swap the 9-pin clock cable connector of a shelf that does not report the alarm (such as shelf 2) with that of the UIM in shelf 1, which does report it.
ii. If shelf 1 still reports the clock loss alarm but neither shelf 2 nor shelf 3 does, the fault lies in the RUIM, the UIM, or the slot holding the UIM. Replace the RUIM first. If the clock loss disappears after the RUIM is replaced, the RUIM rear board is faulty. Otherwise, replace the UIM board. If the phenomenon disappears after the UIM is replaced, the UIM is most likely faulty. Otherwise, the backplane is faulty.
iii. If, after the swap, the UIM in shelf 1 no longer reports clock loss but shelf 2 starts to report this alarm, troubleshoot the system clock cable, RCKG1, RCKG2, CLKG, and the slot holding the CLKG. Replace the system clock cable. If the alarm disappears, the system clock cable is faulty. If the alarm persists after the clock cable is replaced, replace the RCKG1 and RCKG2 rear boards. If the alarm disappears, the rear boards are faulty. Otherwise, replace the CLKG board. If the alarm disappears, the CLKG board is faulty. Otherwise, the slot holding the CLKG board is faulty.

Handling the Slip Code

Overview
A slip code is a misjudgment of the data exchanged between offices that appears after they have run for a period of time, caused by a minor difference in clock frequency. Slip codes are usually related to the clock system, but they do not always result from the CLKG board. Engineering problems or transmission problems often cause this fault. The national standard GB-12048 defines the slip code alarm as follows: a common alarm is reported when slips occur four times within 24 hours, and an important alarm when they occur 255 times.
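Two numbers in this section can be checked with a short calculation. A controlled slip on an E1 occurs when the accumulated phase difference between the sending and receiving clocks reaches one frame, which lasts 125 µs (8000 frames per second). The interval between slips is therefore about 125 µs divided by the fractional frequency offset between the two clocks. Consider two PRCs, each accurate to 1×10⁻¹¹ and deviating in opposite directions, for a relative offset of 2×10⁻¹¹. This gives about 72 days between slips, consistent with the "once in about 70 days" figure under Handling the Clock Networking Fault. The sketch also applies the GB-12048 thresholds quoted above. The function names are assumptions. So are the sliding 24-hour window and the reading of "255 times" as "255 or more within 24 hours".

```python
from datetime import datetime, timedelta
from typing import List

FRAME_PERIOD_S = 125e-6  # one E1 frame: 8000 frames per second

def seconds_between_slips(relative_offset: float) -> float:
    """Time for two clocks with this fractional frequency offset to drift one frame apart."""
    return FRAME_PERIOD_S / relative_offset

def slip_alarm_level(slip_times: List[datetime], now: datetime) -> str:
    """Classify slips in the 24 hours up to `now` with the thresholds quoted from GB-12048."""
    window_start = now - timedelta(hours=24)
    count = sum(1 for t in slip_times if window_start < t <= now)
    if count >= 255:
        return "important"
    if count >= 4:
        return "common"
    return "none"
```

For example, `seconds_between_slips(2e-11) / 86400` evaluates to roughly 72.3 days.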
When the CLKG board is locked, its frequency offset is within 1×10⁻¹⁰, fully complying with the related specifications.

Fault Phenomenon
• The fault management system reports the slip code alarm.
• Some signaling links report the link intermittent disconnection alarm.

Solution
The handling method of slip code faults is listed in Table 7.

Table 7 Handling Method of Slip Code
Phenomenon                                                       Method
The CLKG board works abnormally                                  Step 1
The E1 configuration is incorrect                                Step 2
The inter-office clock synchronization relation is incorrect    Step 3
Transmission fault                                               Step 4
Opposite-end office fault                                        Step 5

1. The slip code is caused by the CLKG.
When slip codes occur, check the CLKG working status first. If the slip interval is not less than 40 minutes and the CLKG is in trace status at the same time, the CLKG board can basically be excluded. If the CATCH indicator on the CLKG is constantly on, or the CATCH and TRACE indicators flash synchronously, and the slip interval is only several minutes, the CLKG board is probably faulty. Check the CLKG board by methods such as active/standby changeover, resetting, and replacement.
2. The slip code is caused by unused E1s.
If some unused E1s have been connected and configured, they will certainly cause slip codes, and serious ones. Disconnect all unused E1s and delete their configuration data from the system. The system will then no longer report alarms from those unused E1s.
3. The offices interconnected through E1 have neither a direct nor an indirect clock synchronization relationship.
In the whole network, all offices interconnected through E1 must have a direct or indirect clock synchronization relationship. If two offices interconnected through E1 have no such relationship, slip codes will probably occur on the E1 system between them.
In this case, check the clock networking relationships among the offices concerned and establish a direct or indirect clock synchronization relationship between them.
4. The slip code is caused by the transmission.
If the clock networking relationship is correct and the CLKG works normally but slip codes still occur, check whether the transmission is normal. A normally working CLKG only confirms that the clock is sent at the correct rate, whereas a slip code is caused by a deviation of the sending rate from the receiving rate. This is especially likely when the clock reference of the CLKG is neither extracted from the line nor transmitted from the opposite-end office where the slip code occurs.
5. The slip code occurs because the opposite-end office is faulty.
Slip codes also occur when the clock of the opposite-end office is faulty. In this case, handle the fault of the opposite-end office promptly to eliminate the slip codes.

Chapter 4 Interface Faults

Table of Contents
Handling MGW-MSCS Interface Fault
Handling MGW-MGW Interface Fault
Handling MGW-RNC Interface Fault
Handling MGW-PSTN Interface Fault
Handling MSCS-BSC Interface Fault
Handling Service Faults

Handling MGW-MSCS Interface Fault

Background
The Mc interface is the interface between the MSCS and the MGW. Figure 17 shows its protocol stack structure.
Figure 17 Mc Interface Protocol Stack

The MGW interacts with the MSCS through the Mc interface, which adopts the standard H.248 protocol and supports both the binary and text encoding formats. During call connection, the MGW invokes and manages various service resources under the control of the MSC Server. An Mc interface fault leaves the MGW unable to provide services.
Generally, the Mc interface uses IP bearer. Because the current network is not all-IP, the Mc interface protocol stack usually takes the H.248/M3UA/SCTP/IP form. The IP physical interconnection between the ZXWN MSC Server and the MGW is implemented by connecting the FE1 interfaces on the rear boards of the SIPI boards of the two NEs. The corresponding SIPI unit is configured with the IP address and the SCTP association. The upper layer is configured with the AS data and the SIO location of the AS.
The signaling path of the Mc interface is as follows:
SMP → UIM → SIPI → IP Network → MSCS

Fault Phenomenon
• All calls related to subscribers in the local office fail.
• The office direction between the MSC Server and the MGW is disconnected.
• The association cannot be deactivated/activated normally in Dynamic Management.
• The association status is normal, but the AS and ASP statuses are abnormal (not activated). An MGW registration failure message appears in the platform signaling trace.
• The AS status is normal, but the gateway fails to register with the MGC.

Solution
1. Check the physical connection to see whether a change of the connection lines caused the Mc interface abnormality. If the SIPI and UIM are both configured in active/standby mode, pay special attention to the lines between the SIPI and the UIM.
2. Check the data configuration of the MGW, including the following data.
• Adjacent office configuration
• IP protocol stack configuration
• SIGTRAN data configuration
3. Check whether the SMP that manages the association module and the SIPI are in normal status, and whether there is any alarm. If onsite conditions permit, restart the OMP, SMP and SIPI to see whether the fault is eliminated.
4. Use the dynamic management tool provided by the OMM to deactivate and reactivate the association, and see whether the fault is eliminated.
5. Through the OMM system, check the port address and service status of each board, and the routing information on the RPU.
6. In the Dynamic Management interface, check the association status and whether the M3UA route is reachable.
7. If the actual/virtual address of the opposite end can be pinged successfully with the platform tool but the SCTP connection cannot be established, a broadcast storm may have occurred on the switch connecting the SIPI boards of the MSCS and the MGW, disrupting Mc interface communication. To restore service as soon as possible, temporarily connect the SIPI boards of the ZXWN MSC Server and the MGW directly with a crossover cable.
8. If the fault persists after all the methods above have been tried, reset the RPU and SMP, transmit all tables again, and finally restart the MP.

Instance Analysis
Topic: An Mc interface fault caused by disconnection of the signaling link to the MSC Server.
1. Symptom
No call service can be processed.
2. Source
Subscribers or the carrier report that call service cannot be processed. The background reports a link interruption alarm.
3. Related Components
Broadband signaling processing board or MNIC board
4. Fault Analysis and Location
The link to the MSC Server is interrupted.
5. Solution
Determine whether the fault lies in the local office or in the opposite-end office.
If it lies in the opposite-end office, ask the opposite-end office to handle the fault. If it lies in the local office, determine whether it is a hardware or software problem. For a hardware problem, check the board or optical fiber. For a software problem, record the symptom and analyze the cause; reset the board if necessary.
6. Summary
i. During fault location, first determine whether the fault lies in the local office or the opposite-end office, and then whether it is a hardware or software problem.
ii. Before resetting a board, save the relevant fault information for later fault location.

Handling MGW-MGW Interface Fault

Background
The Nb interface is the interface between two MGWs and transfers media plane information. The protocol specifies ATM, TDM and IP as bearer modes for the Nb interface. In current practical networking, the Nb interface mainly uses the TDM bearer and the IP bearer.
• With an IP bearer at the bottom layer, the signaling processing flow of the Nb interface is as follows:
VTCD (MRB) → UIM → IPI → IP Network → Opposite-end MGW interface board
• With a TDM bearer at the bottom layer, the signaling processing flow of the Nb interface is as follows:
VTCD (MRB) → UIM → DTB (SDTB) → TDM Network → Opposite-end MGW interface board

Fault Phenomenon
• The call signaling flow is correct, but the call has no two-way speech.
• Faults such as noise or call interruption occur during the call.
• The call fails to be established because MGW resources cannot be obtained.

Solution
1. Check whether the data configuration is correct. Common data configuration errors include:
• CODEC mode setting error
• Interface information configuration error
• VTCD configuration error
2.
Enable Failure Observation to trace the call and find the cause of the fault.

Instance Analysis
1. Symptom
One MSCS is configured with two MGWs. After the Nb interface configuration is completed, inter-office calls between the two MGWs fail. No tones are played, and signaling tracing shows that the call is released immediately after it is confirmed, so the inter-office call cannot be established.
2. Fault analysis and location
i. The inter-office signaling tracing shows that the call is released immediately after it is confirmed, and the call fails.
ii. A test shows that the communication between the two MGWs is normal, which means that the connection from the MGW to the BSC is normal and correctly configured.
iii. The analysis indicates that the configuration of the VTCD that processes the CODEC is wrong, which causes the communication failure between the two MGWs. Checking the configuration shows that the VTCD unit is configured under the OMP module instead of under the SMP module that processes the call. As a result, the inter-office call fails.
3. Solution
Delete the VTCD unit from the OMP module, configure it under the SMP module, and then restart the VTCD. The Nb interface signaling becomes normal and the fault is eliminated.

Handling MGW-RNC Interface Fault

Background
The Iu-CS interface is the interface between the CN and the RNC in the CS domain. In the R4 phase, the MSCS processes the control plane of the ATM-borne Iu-CS interface: the signaling is adapted over AAL5 and transmitted through the SCCP. The MGW processes the user plane and the bearer control plane of the Iu-CS interface. The ALCAP controls the establishment and release of user plane connections, and user data is adapted over AAL2 and transmitted through AAL2 connections. In current practical networking, the signaling between the MSCS and the RNC is relayed by the SG built into the MGW.
The networking mode and interface protocol stack structure are shown in Figure 18.
FIGURE 18 NETWORKING MODE & INTERFACE PROTOCOL STACK STRUCTURE
Physically, the MGW and the RNC are connected directly through optical fiber, with ATM as the bottom-layer bearer. The signaling trail between the MGW and the RNC is as follows:
SMP → UIM → APBE → ATM Network → RNC

Fault Phenomenon
• The RNC office is unreachable.
• All calls of subscribers under the RNC fail.
• Subscriber location update fails.

Solution
1. First check whether the APBE board works normally: whether the RUN indicator on the APBE is normal and whether the ALARM indicator shows an alarm.
2. Check whether the indicator of the fiber-connected optical interface is on. If it is off, check whether the transmit and receive fibers are swapped and whether the connection is correct.
3. After connecting the fiber correctly, check whether the ATM interface data of the local office and the RNC is configured correctly, focusing on the PVC configuration.
4. If the office is still unreachable when both the fiber connection and the PVC configuration are correct, locate the fault with the self-loop method. If the link cannot be activated after the local office is looped back, check the configuration data of the signaling link to the RNC, including the signaling link, signaling link group, signaling office and signaling route. If the links can be activated normally when the RNC and the MGW are each looped back but become abnormal after interconnection, check whether the data at the two interconnected ends is consistent, including the PVC, SLC, DPC and OPC.
5. If the office is reachable and the circuits are normal, but all subscriber calls under the RNC still fail, further check the RNC-related data configuration.

Instance Analysis
1. Topic
The signaling link of the Iu interface cannot enter the Service status.
2.
Symptom
The signaling link from the MGW to the RNC stays in the Initial Position status and cannot enter the Service status.
3. Fault analysis and location
Generally, such a fault has two possible causes: a hardware fault or an ATM configuration problem.
i. The board running indicator is normal and the board gives no alarm. The information printed from the foreground is also normal. If there were a hardware problem, the following information would be printed:
TIMER TCC OUT, SSCOP Send AA_RELEASE_Ind/Conf…
or
TIMER NORESPONSE OUT, SSCOP Send AA_RELEASE_Ind In…
Such output indicates that the bottom layer is disconnected or connected in one direction only. Replacing the board with a known-good board of the same type does not eliminate the fault, so a hardware fault can be excluded.
ii. The ATM configuration on the MGW meets the requirements. Comparing the interconnected ATM configuration with the RNC side shows that the VPI configured at the RNC side is inconsistent with the VPI configured at the CN side.
4. Solution
Modify the VPI configuration so that the VPI at the CN side matches that at the RNC side. After the modification, the link enters the Service status.

Handling MGW-PSTN Interface Fault

Background
In R4 networking, all service interactions between the MGW and the PSTN switch use the TDM bearer connection mode. Because the call control part is separated from the bearer part, configuring the MGW as a bearer device for interconnection with other devices mainly means configuring bearer resources. Therefore, during commissioning it is only necessary to ensure that the TDM interconnection data at the two ends is consistent.
In actual networking, there is no physical connection between the MSC Server and the PSTN switch. Signaling between them is forwarded by the SG built into the MGW.
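The consistency requirement above (and the OPC/DPC/SLC comparison in step 4 of the MGW-RNC procedure) can be expressed as a simple check. The sketch below assumes a hypothetical per-end record with illustrative field names; it is not an OMM data model. It relies on standard MTP rules: one end's OPC is the other end's DPC, the SLC of a link is identical at both ends, and both ends must use the same signaling timeslot on the shared E1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkEnd:
    """Hypothetical record of one end's No.7 link data (field names are illustrative)."""
    opc: int       # own signaling point code
    dpc: int       # adjacent office's signaling point code
    slc: int       # signaling link code
    timeslot: int  # signaling timeslot on the shared E1


def interconnect_mismatches(local: LinkEnd, peer: LinkEnd) -> List[str]:
    """Return a description of every field that disagrees between the two ends."""
    problems = []
    # Point codes must mirror each other: local OPC == peer DPC and vice versa.
    if local.opc != peer.dpc:
        problems.append(f"local OPC {local.opc} != peer DPC {peer.dpc}")
    if local.dpc != peer.opc:
        problems.append(f"local DPC {local.dpc} != peer OPC {peer.opc}")
    # The SLC and the signaling timeslot must be identical at both ends.
    if local.slc != peer.slc:
        problems.append(f"SLC differs: local {local.slc}, peer {peer.slc}")
    if local.timeslot != peer.timeslot:
        problems.append(f"signaling timeslot differs: local {local.timeslot}, peer {peer.timeslot}")
    return problems
```

A common commissioning mistake, copying the local record to the peer unchanged, is reported as two point-code mismatches rather than passing silently.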
The ZXWN MGW supports the built-in SG function. Figure 19 shows the networking structure between the ZXWN MSCS and the PSTN switch for ISUP message forwarding.
FIGURE 19 NETWORKING AND PROTOCOL ARCHITECTURE (BUILT-IN SG SUPPORTED)
The MGW and the PSTN switch are connected directly through E1 lines, with TDM as the bottom-layer bearer. The signaling route between the MGW and the PSTN is as follows:
SMP → DTEC → TDM network → PSTN

Fault Phenomenon
• The PSTN office direction is unreachable.
• Some or all circuits are abnormal.
• Calls from the fixed network are crossed (crosstalk).

Solution
1. Check whether the DTEC board works normally: whether the RUN indicator on the DTEC board is normal and whether the ALARM indicator shows an alarm.
2. Crosstalk is usually caused by crossed E1 lines. Correct the E1 line connections to eliminate the fault.
3. In the Dynamic Management interface, check the status of the PCM and CIC, and check the status of the corresponding circuit in the R_CIC table with a probe. Locate the fault according to the status.
4. If the fault is not caused by hardware, check the circuit configuration and trunk management configuration on the MSCS to further locate the fault.

Instance Analysis
1. Topic
A CIC circuit to the PSTN is unavailable.
2. Symptom
• Calls using this circuit fail.
• The status value of the R_CIC is not 0.
• Reset/block operations on the circuit through Dynamic Management fail.
• Signaling tracing on the platform shows that no message is received from the opposite end.
3. Fault analysis and location
i. Collect the printing information of the service MP about activating/deactivating the association, for the association management module of the MGW and the MSCS office.
ii. Check whether the trunk status is normal and whether the sub-unit is unblocked.
iii.
Check the status value of the CICID in the R_CIC foreground table.
iv. A CICID status value of 512 indicates that the local end is blocked. Make sure that the PCM system number and signaling timeslot are correct and the trunk board is in normal status. Use the LEDs on the DDF to confirm that the receiving and sending channels are connected.
v. Unplug the trunk line from the trunk board and plug it back in 30 seconds later. The fault is eliminated.

Handling MSCS-BSC Interface Fault

Background
The MGW provides the bearer for A interface traffic in TDM connection mode. In the R4 phase, the ZXWN MSCS is integrated with the ZXWN MGW to serve as the MSC of R99 (called the MGW built-in mode), so the BSC side sees only one MSC (one NE).
The A interface is the interface between the MSC Server and the BSC; Figure 20 shows its protocol stack. The MSC Server processes all control messages of the A interface. This interface implements the functions unrelated to the bearer part, including subscriber mobility management, BSS access, and control plane processing of call and SMS services.
FIGURE 20 A-INTERFACE PROTOCOL STACK STRUCTURE
When the MGW built-in networking mode is adopted, the ZXWN MGW and the ZXWN MSCS use the same SP. In this case, there is no physical connection between the ZXWN MSCS and the BSC, and the BSSAP messages between them are transferred through the ZXWN MGW. Figure 21 shows the networking mode.
In this networking mode, if the MTP3 upper-layer user is configured in the MGW, the SCCP cannot be configured there, because the MSCS and the MGW use the same SP. The MGW must forward all messages to the MSCS, and the BSSAP runs above the SCCP; if the SCCP part were configured on the MGW, BSSAP messages could not be forwarded to the MSCS correctly. Therefore, the ZXWN MSCS and the ZXWN MGW are combined to form one MSC NE.
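The two configuration rules of the MGW built-in mode described above (a shared SP, and no SCCP as an MTP3 user on the MGW) can be captured as a small data check. This is a sketch only: `NeNo7Data` and its fields are a hypothetical summary of each NE's No.7 data, not the OMM data model, and the SP code is assumed to be a plain integer.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NeNo7Data:
    """Hypothetical summary of one NE's No.7 data (illustrative only)."""
    name: str
    signaling_point: int                                 # SP code, numeric form assumed
    mtp3_users: Set[str] = field(default_factory=set)   # e.g. {"SCCP"}


def check_builtin_mode(mscs: NeNo7Data, mgw: NeNo7Data) -> List[str]:
    """Apply the built-in mode rules and return any violations found."""
    problems = []
    # Rule 1: in built-in mode the MSCS and the MGW use the same SP.
    if mscs.signaling_point != mgw.signaling_point:
        problems.append("MSCS and MGW must use the same SP in built-in mode")
    # Rule 2: BSSAP sits above SCCP, so SCCP on the MGW would stop BSSAP
    # messages from being forwarded to the MSCS.
    if "SCCP" in mgw.mtp3_users:
        problems.append("SCCP must not be configured on the MGW in built-in mode")
    return problems
```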
FIGURE 21 MGW BUILT-IN NETWORKING MODE STRUCTURE AND ITS PROTOCOL
On the ZXWN MGW, the boards that process the narrow-band No.7 signaling protocol are shown in Figure 22.
FIGURE 22 BOARDS RELATED TO NARROW-BAND NO.7 SIGNALING PROTOCOL

Fault Phenomenon
• The BSC office direction is unreachable.
• Some or all circuits are abnormal.
• Calls are crossed (crosstalk).

Solution
1. Check whether the DTB board works normally: whether the RUN indicator on the DTB board is normal and whether the ALARM indicator shows an alarm.
2. Crosstalk is usually caused by crossed E1 lines. Correct the E1 line connections to eliminate the fault.
3. In the Dynamic Management window, check the status of the PCM and CIC, and check the status of the corresponding circuit in the R_CIC table with a probe. Locate the fault according to the status.
4. If the fault is not caused by hardware, check the circuit configuration and trunk management configuration on the MSCS to further locate the fault.

Instance Analysis
1. Topic
The call completion ratio to a certain BSC is low.
2. Symptom
i. Calls are hard to connect; only a few go through.
ii. Gather the fault information:
iii. Gather the networking conditions to check whether the A interface messages of the BSC are forwarded to the MSC Server by the MGW.
iv. Gather the OMM performance statistics from before and after the fault occurs, such as M3UA, MTP3 and signaling link statistics.
v. Save the stored alarm information from before and after the fault occurs.
vi. Gather the service and signaling call loss records, including MM, VLRMAP, MSCMAP and BSSAP.
vii. If there is printing information, save the information printed before and after the fault occurs.
viii. If there are many signaling tracing records from before and after the fault occurs, save as many of them as possible.
ix. Gather and save the traffic statistics from before and after the fault occurs.
x. Gather the records of operations on the foreground from before and after the fault occurs.
3. Analysis
i. Signaling tracing shows that, for the calling party, the network returns the Call Proceeding message after receiving the Setup message, and a moment later sends the Disconnect message. For the called party, however, the paging response message can hardly be traced.
ii. Statistics on the messages sent and received on the A interface signaling links, collected through the OMM system, show that the send/receive ratio of each signaling link is seriously imbalanced.
iii. The call loss records show that most losses are due to paging without response or abnormal release by the calling party.
iv. It can be inferred that the problem lies in the messages forwarded from the M3UA to the MTP3 on the MGW side, many of which are lost. Reset the standby board of the SMP.

Handling Service Faults

Background
The MGW provides the call-independent service bearer function and, under the control of the MSC Server, implements service bearer conversion and service stream format processing.
A basic service fault of the MGW means that the equipment can neither provide the corresponding bearer function nor implement bearer service conversion and service stream processing. This section describes the handling of service faults through specific examples.

Instance 1
1. Topic
The circuit is in abnormal status: the CIC is not idle.
2. Symptom
Query the local-end status of the CIC circuit on the MSCS. The status is "ERRORREQ", indicating that the request message is illegal.
3.
Fault Analysis and Location
The CIC circuit is related to the voice channel. In R4 networking, the signaling is separated from the voice channel: the MSC controls the signaling part, while the MGW carries the voice channel. Therefore, focus on checking the MSCS data configuration related to the MGW.
4. Solution
i. Check the office directions and links from the MSC to the MGW, the AS and the ASP. All of them are in normal status.
ii. Querying the static data of the MGW shows that it has not been configured. Configuring the data again fails with the prompt "Language description template number does not exist. Configure the language description first." Therefore, check the tone-related configuration. A tone type of the language description exists in "Batch create MSCS tone", but it has not been configured in the data configuration.
iii. Select ALL for the tone types in "Batch create MSCS tone" and add the MGW static data again. The command is executed successfully.
iv. Synchronize the data and check the status of the CIC circuit. It is now IDLE, which is normal.
5. Summary
i. When MSCS tones are created in batch on the MSCS, the default tone type is TONEID; other types such as LANGSTR are not selected by default. In practical configuration, configuring only the TONEID type is not enough, so select the other tone types as the actual situation requires. The command can then be executed successfully.
ii. In addition, the OMM system provides batch commands to speed up data configuration. However, after each batch is executed, carefully check whether every command succeeded. If a command fails, find the cause and then execute that command separately.
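The advice in the summary above, to check the result of every command after a batch and re-run failed commands separately once the cause is fixed, can be sketched as follows. `execute` is a hypothetical callable standing in for whatever submits one command to the OMM and reports success; the command strings in the usage part are placeholders, not real ZXWN commands.

```python
from typing import Callable, List, Tuple

# Hypothetical executor signature: takes one command string, returns True on success.
Executor = Callable[[str], bool]


def run_batch(commands: List[str], execute: Executor) -> List[Tuple[int, str]]:
    """Execute every command in order and record (position, command) for each failure."""
    failures = []
    for position, command in enumerate(commands, start=1):
        if not execute(command):
            failures.append((position, command))
    return failures


def rerun_individually(failures: List[Tuple[int, str]], execute: Executor) -> List[Tuple[int, str]]:
    """After the cause is fixed, run each failed command on its own; return those still failing."""
    return [(position, command) for position, command in failures if not execute(command)]


if __name__ == "__main__":
    # Simulated OMM: "add-static-data" fails until the missing tone types are configured.
    state = {"tone_types_complete": False}

    def fake_execute(command: str) -> bool:
        if command == "add-static-data":
            return state["tone_types_complete"]
        return True

    failed = run_batch(["create-tones", "add-static-data", "sync-data"], fake_execute)
    print(failed)                                     # [(2, 'add-static-data')]
    state["tone_types_complete"] = True               # operator fixes the tone configuration
    print(rerun_individually(failed, fake_execute))   # []
```

Keeping the position of each failed command makes it easy to find the matching line in the batch file when analyzing the cause.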
Instance 2
• Topic
A CIC timeslot configuration error causes a low call completion ratio of the wireless system.
• Symptom
This fault occurred in a softswitch project. On the OMM system, the performance index "call completion ratio of wireless system" under the softswitch is about 4% to 6% lower than that under other MSCs and does not meet the specified standard. The onsite networking mode is MSCS-MGW-BSC.
• Fault Analysis and Location
Analysis of the formula the operator uses to calculate the call completion ratio of the wireless system shows that the "success rate of service channel allocation (changeover excluded)" parameter is lower than the normal standard. This parameter is calculated as:
Success rate of service channel allocation = (Times of successful service channel allocation / Times of requests for service channel allocation) × 100%
The operator counts each "AssigReq" assignment request message received by the BSC from the MSC as a request for service channel allocation, and each "AssignCmp" assignment completion message sent by the BSC to the MSC as a successful service channel allocation. Evidently, the lower success rate results from more assignment failures.
Analysis of the BSC-provided CICs that failed to be allocated shows that all of them correspond to timeslot 16 of the A interface E1s. Further inquiry reveals that the BSC engineer deleted all CICs corresponding to timeslot 16 of the E1s when unblocking the A interface circuits.
• Solution
At the MSC side, delete the BSC-provided CICs corresponding to these timeslot 16 channels. The performance index recovers.
• Summary
During signaling and circuit commissioning, negotiate the interconnection data with the adjacent office, including the signaling point code, SLC, signaling timeslot, PCM start number and other data.
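The mismatch in this instance (CICs still defined at the MSC side after the BSC deleted them) can be found by comparing the CIC sets of the two ends. The sketch below assumes the common numbering rule CIC = PCM system number × 32 + timeslot; the actual rule and PCM start number must be confirmed with the adjacent office. Timeslot 0 of an E1 carries frame alignment and is excluded at both ends.

```python
from typing import Iterable, List, Set, Tuple


def cic_of(pcm: int, timeslot: int) -> int:
    # Assumption: CIC = PCM system number * 32 + timeslot.
    return pcm * 32 + timeslot


def cic_set(pcms: Iterable[int], excluded_timeslots: Set[int]) -> Set[int]:
    """All CICs on the given E1 PCMs, skipping the excluded timeslots."""
    return {cic_of(pcm, ts) for pcm in pcms for ts in range(32)
            if ts not in excluded_timeslots}


def compare_cics(local: Set[int], peer: Set[int]) -> Tuple[List[int], List[int]]:
    """Return (CICs only at the local end, CICs only at the peer end), sorted."""
    return sorted(local - peer), sorted(peer - local)
```

For example, with PCMs 1 and 2, an MSC that excludes only timeslot 0 and a BSC that also removed timeslot 16 give `compare_cics` a result of `([48, 80], [])`: exactly the timeslot 16 CICs that must be deleted at the MSC side.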
Chapter 5 OMM System Faults

Table of Contents
Handling OMM Abrupt Abnormality
Handling Virus/Security Events
Analyzing Instance 1
Analyzing Instance 2
Analyzing Instance 3
Analyzing Instance 4
Analyzing Instance 5
Analyzing Instance 6
Analyzing Instance 7
Analyzing Instance 8

Handling OMM Abrupt Abnormality

Fault Phenomenon
• No client can log in to the system.
• The OMM cannot connect to the primary NEs.
• The OMM cannot execute man-machine commands.

Flow Diagram
Figure 23 shows the flow of handling an abrupt abnormality of the OMM system.
FIGURE 23 HANDLING THE ABRUPT ABNORMALITY OF THE OMM SYSTEM

Solution
1. Upgrade fault of the OMM system
The OMM system fails to start after being upgraded.
• Check whether the software package is correct, including the software name, version ID and package size in bytes.
• Check whether the abnormal start of the OMM system is caused by an incorrect modification of ums-svr\deploy\deploy*.xml or ums-clnt\deploy\deploy*.xml. If so, replace deploy*.xml with the correct file.
• In the running OMM software version, check whether a key file or directory has been deleted. Start the OMM software, read the command output carefully, and identify the missing file from the prompts.
Copy the corresponding file from the version backup location to the current running location, and then restart the OMM system.
2. Network fault
• Errors in the OMM IP address, subnet mask or gateway configuration cause a network fault that prevents the NEs from establishing communication. Check these configurations carefully, correct any errors, and then use the ping command to check whether the network fault has been eliminated.
• Run the netstat -an command to check the following ports of the OMM server: 21, 23, 1521, 5000~5006, 5057 and 21099~21114. Check whether these ports are blocked by the firewall.
• Search for "port" in the file ums-svr\deploy\deploy-default.properties to find the list of all ports needed by the current version, and then check whether any of the listed ports are blocked in the firewall configuration table. Unblock any blocked ports.
• If the network port, network card or network cable of the OMM server, or the carrier's network equipment, is faulty, the OMM server cannot ping the external network. Use methods such as ping, the trace command, and replacement of network cables or equipment to find the cause, and replace the faulty equipment.
3. Database fault
• Check the listener configuration of the ORACLE server. Run the lsnrctl status command in a DOS window on the server, and check whether the listener service configuration is correct and the database instance runs normally. If the listener service is abnormal, modify the configuration with the ORACLE Net Manager tool and restart the listener service.
• Use the ORACLE sqlplus tool to test whether the ORACLE instance used by the OMM software works normally.
If the instance is abnormal, use ORACLE Enterprise Manager to restart the database instance, correct the faulty operation, and recover the database instance.
• Check whether the IP address of each database server is correctly configured in the ums-svr\deploy\deploy-default.properties file. Modify the IP address if it is inconsistent with that of the ORACLE database.
4. Other faults
The free space of the disk partition where the OMM system software is installed is less than 300 MB. Clean up this partition to keep its free space above 500 MB, mainly by backing up and clearing system logs and statistical information: back up the old system logs and statistical information to another hard disk or to external media, and then delete them. Junk files in the operating system should also be cleared.

Handling Virus/Security Events

Fault Phenomenon
• The OMM system finds that an anonymous IP user has logged in to the system and modified data, or the host of the OMM system suffers login, intrusion or attack by an illegal user.
• A virus appears in the OMM system or in the network where the OMM system is located.

Flow Diagram
Figure 24 shows the flow of handling virus/security events.
FIGURE 24 HANDLING VIRUS/SECURITY EVENTS

Solution
1. Check whether a virus or security bug exists in the OMM system.
• Scan the OMM system and remove viruses with one or more antivirus programs, such as Norton, Symantec, McAfee, KV3000 or Rising.
• Analyze whether security bugs exist in the premises network with network security analysis tools such as Network Analyzer, Sniffer and Agilent. Eliminate such hidden security risks based on the actual network condition.
2.
Check whether there are security problems in the OMM system.
• If the login logs of the OMM system show that a user with an illegal IP address has logged in to the system, delete the illegal user name and password from the OMM system.
• If the system configuration information has been modified illegally (for example, the NE access configuration has been changed, and not by internal personnel), check and recover the modified data configuration in the system.
• If a check of the user names and passwords used to log in to the NE system reveals a new user with administrator authority, or a user reports that the original login user name and password no longer work for an unknown reason, change the passwords of the root or Admin accounts. The password must be reasonably strong, for example at least eight characters long, containing both letters and digits, and case sensitive.
3. Check whether security problems exist in the database.
Open Enterprise Manager Console to check whether any illegal user exists in the administrator group. If so, delete it in Enterprise Manager Console.
4. Check whether security problems exist on the OMM server computer.
• On the Users tab of Task Manager (Windows 2003), check whether an anonymous user is logged in to the system at the same time. If so, right-click the user and select Disconnect.
• When necessary, unplug the network cable to disconnect the network temporarily, and then perform the following operations: Click My Computer > Manage > System Tools > Local Users and Groups > Users to check whether there is any new administrator user, and delete it if there is. Right-click Administrator and select Set Password to change the user password.
The password must be reasonably strong, for example at least eight characters long, containing both letters and digits, and case sensitive.
5. Check whether security problems exist in the OMM network firewall.
• Log in to the firewall with administrator authority and check whether there are login records from illegal IP addresses. If so, delete the login information of the corresponding illegal users, and then change the current administrator's login password. The password must be reasonably strong, for example at least eight characters long, containing both letters and digits, and case sensitive. If the administrator cannot log in to the firewall, restart and reset the firewall, and then log in again.
• Check the firewall, including the access control list, static route configuration and port settings, and restore the original firewall configuration items.

Analyzing Instance 1
Topic
The client cannot access the server because the system log has filled the disk.
Symptom
Sometimes the client cannot connect to the server after a restart. After logging in to the client, the topology tree is not displayed and the directory tree on the left is empty, although the connection between the client and the server is normal.
Fault Analysis and Location
• A check of the server shows that its processes are not started, and none of them can be started when the server is restarted.
• Further checking shows that partition C has no free space left and contains 4 GB of log files. The disk-full error causes the OMM fault.
Fault Handling
After the old logs are deleted, the server restarts normally. It is recommended to set up a reasonable log-clearing mechanism on the OMM server and clients that automatically clears old logs.

Analyzing Instance 2
Topic
The client fails to connect to the server.
Symptom
After the OMM server and client are installed (both can be installed on the same computer), the system prompts "Failed to connect the server" when the IP address of the server is entered on the client.
Fault Analysis and Location
There are several possible causes.
• The server IP address entered is incorrect.
Handling: Check the IP address of the server carefully.
• The physical connection between the client and the server is broken.
Handling: Use the ping command to check whether the path between the client and the server is clear, and check the physical connection between them to make sure it is normal.
• The port of the FTP service on the server conflicts with that of the IIS service.
Handling: Stop the IIS service in Services and set its startup type to Disabled. The OMM system of ZXWN MGW does not need the IIS service, and running it often causes port conflicts that prevent the client from logging in.
• Other causes
Handling: The OMM system requires a relatively high-performance server; poor server performance may cause client login failures.
Summary
1. Manually deleting a version file in File Management may make the foreground version database table inconsistent with the files. It is recommended to delete version files through Version Management instead of File Management; otherwise, the foreground system may fail to obtain the version when restarting.
2. Incorrect parameters in the physical configuration of a board cause version loading failure. For example, when configuring UIM board parameters, be clear about whether the T network, CPU and GXS sub-card exist.
3. The address of the OMM FTP server must be reallocated after the OMM server is upgraded.
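Besides running netstat -an on the server, it can help to test from the client PC whether the OMM server ports are actually reachable, since this also reveals ports blocked by a firewall in between. The sketch below uses only the Python standard library; it assumes all the ports listed in this manual for the OMM server are TCP ports, and the port list should be replaced with the one found in deploy-default.properties for the installed version.

```python
import socket
from typing import Iterable, List

# Ports this manual lists for the OMM server (ranges inclusive).
# Assumption: all of them are TCP ports.
OMM_PORTS: List[int] = [21, 23, 1521, *range(5000, 5007), 5057, *range(21099, 21115)]


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or unresolvable host
        return False


def closed_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> List[int]:
    """List the ports that could not be reached from this machine."""
    return [port for port in ports if not is_port_open(host, port, timeout)]
```

For example, `closed_ports("192.0.2.10", OMM_PORTS)` run on the client PC (the address is a placeholder for the OMM server IP) lists the ports to look for in the firewall configuration table. A port that is open on the server according to netstat but closed from the client points to a firewall or routing problem rather than to the OMM software.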
Analyzing Instance 3
Topic
An incorrectly configured scheduled task causes an OMM fault.
Symptom
While an MSC Server is creating data for the MGW it manages, the OMM client responds so slowly that data cannot be modified or added. Each episode lasts about 10 minutes.
Fault Analysis and Location
1. The problem has occurred frequently since a certain date (April 4), so data modified on that date is suspected. The operation logs for April 4 are searched, but they contain too many operations, so another approach is taken.
2. The CPU load is observed when the fault occurs. CPU usage is very high, which shows that the fault is caused by a process that runs periodically or aperiodically.
3. The disk usage of the OMM server is checked. There are five files with the .DMP extension. Four of them, each larger than 1500 MB, were created by an unknown process in the morning. When the fault recurs at about 14:20, the fifth file grows rapidly and continuously to nearly 500 MB. This indicates that the fault occurs periodically.
4. In the System Management of the OMM system, only the performance statistics data have the data backup, export, and automatic deletion functions configured. In Policy Management, however, the Data Backup policy is enabled for all policies. This policy was modified on April 4 with a cycle of 27 hours. The Policy Management entries in the log files show that an operator set this policy on that day, which confirms that the policy causes the problem.
5. The data backup policy is suspended and the system is observed for two days. The fault does not recur.
Summary
The data backup policy in Policy Management is a useful tool for backing up the data configuration database of the OMM system. However, the backup operation consumes a large share of the OMM server's CPU.
Therefore, schedule such tasks for the early morning or another period with little operational activity.

Analyzing Instance 4
Topic
The EMS fails to get the performance measurement files because of incorrect OMM FTP parameter settings.
Symptom
The EMS fails to get the performance files reported by the CS. The soft-switch notifies the EMS that the performance files are ready. However, the IP address contained in the performance files is wrong: it is set to 127.0.0.1. As a result, the EMS cannot get the performance files of the CS.
Fault Analysis and Location
To configure the IP address with the tool in the tools\config\ directory on the OMM server, perform the following steps.
1. Run run.bat in the tools\config\ directory. The Config the Environment of Software dialog box appears, as shown in Figure 25.
FIGURE 25 SETTING CORBA FTP
2. On the CORBA FTP tab, enter the big network address (that is, the IP address that the OMM uses to communicate with the EMS).
3. Delete the TEMP files.
Result: The fault is solved after the OMM system is restarted.
Summary
If CORBA is selected during the installation of the OMM system, the CORBA FTP address must be set at a later stage of the installation. The default address is 127.0.0.1. To avoid this error, change it to the address used for communicating with the EMS.

Analyzing Instance 5
Topic
The OMM server fails to start.
Symptom
The OMM server fails to start. The log file server-start.log in the X:\ZXWN-OMCS\zxwomcs\ums-svr\log directory on the OMM server contains the following information.
Starting failed
RuntimeErrorException: Error in MBean operation 'start()'
Cause: java.lang.Error: test 3 times ftp server fail
Fault Analysis
The FTP server fails to start during OMM server startup because port 21 is occupied. There are two possible cases:
1. Another FTP server is running, probably the IIS service provided with the operating system.
2. An FTP process was not stopped normally before the server was started. The engineer checks whether a JAVA process is present in the Task Manager, and stops the Oracle HTTP Server in Services.
Solution
1. Click Start > Run and type cmd. The Command Prompt window appears.
2. Type the netstat -an|findstr 21 command to check whether port 21 is occupied.
3. Select Control Panel > Administrative Tools > Services to open the Services dialog box.
4. Stop the IIS service provided with the Windows operating system.
5. Restart the OMM server.
Summary
- Run run.bat (\ZXWN-OMCS\zxwomcs\ums-svr\bin\run.bat) directly to start the OMM server and view more detailed startup information.
- Stop any other FTP server before startup. Run the netstat -an|findstr 21 command to check whether port 21 is occupied.

Analyzing Instance 6
Topic
The OMM client fails to connect to the OMM server, and logging in to the database with the sqlplus uep/uepuep@omc command also fails (omc is the database instance).
Symptom
The OMM client fails to connect to the OMM server.
Fault analysis and location
1. Collect the following files:
{ORACLE_HOME}\admin\{ORACLE_SID}\bdump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\udump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\cdump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\pfile\*.*
admin\{ORACLE_SID}\bdump\alert_{ORACLE_SID}.log
Note: ORACLE_HOME refers to the Oracle installation path, which on the OMM server is usually the D:/ORACLE directory.
The collected alert_{ORACLE_SID}.log contains the following records:
Wed Sep 12 00:39:07 2007
KCF: write/open error block=0x1055 online=1
file=2 D:\ORACLE\ORADATA\OMC\UNDOTBS01.DBF
error=27072 txt: 'OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Quota insufficient, unable to complete the required service'
Wed Sep 12 00:39:07 2007
Errors in file d:\oracle\admin\omc\bdump\omc_ckpt_2120.trc:
ORA-00202: controlfile: 'D:\ORACLE\ORADATA\OMC\CONTROL03.CTL'
ORA-27091: skgfqio: unable to queue I/O
ORA-27070: skgfdisp: async read/write failed
OSD-04006: ReadFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:07 2007
Errors in file d:\oracle\admin\omc\bdump\omc_dbw0_1612.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01114: IO error writing block to file 2 (block # 4181)
ORA-01110: data file 2: 'D:\ORACLE\ORADATA\OMC\UNDOTBS01.DBF'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
DBW0: terminating instance due to error 1242
Wed Sep 12 00:39:09 2007
Errors in file d:\oracle\admin\omc\bdump\omc_lgwr_2084.trc:
ORA-00345: redo log write error block 45901 count 2
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\OMC\REDO01.LOG'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:31 2007
Errors in file d:\oracle\admin\omc\bdump\omc_ckpt_2120.trc:
ORA-00204: error in reading (block 1, # blocks 1) of controlfile
ORA-00202: controlfile: 'D:\ORACLE\ORADATA\OMC\CONTROL03.CTL'
ORA-27091: skgfqio: unable to queue I/O
ORA-27070: skgfdisp: async read/write failed
OSD-04006: ReadFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:31 2007
Errors in file d:\oracle\admin\omc\bdump\omc_lgwr_2084.trc:
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 45901 count 2
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\OMC\REDO01.LOG'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
The preliminary judgment is that the fault is caused by insufficient tablespace.
2. In a DOS window, run the sqlplus "sys/omc@omc as sysdba" command. It is found that the 9205 patch is not installed and the Oracle version is still 9201.
Note: Adjusting the database parameters and installing the 9205 patch are recommended.
3. Adjust the database parameters as follows.
sqlplus "sys/omc@omc as sysdba"
alter system set processes=300 scope=spfile;
alter system set timed_statistics=FALSE scope=spfile;
alter system set aq_tm_processes=0 scope=spfile;
alter system set shared_pool_size=167772160 scope=spfile;
alter system set java_pool_size=167772160 scope=spfile;
alter system set large_pool_size=67108864 scope=spfile;
alter system set db_cache_size=209715200 scope=spfile;
alter system set pga_aggregate_target=167772160 scope=spfile;
alter system set undo_retention=1800 scope=spfile;
alter system set log_buffer=1048576 scope=spfile;
show parameters pfile;
4. Shut down the database instance.
shutdown immediate
5. Restart the database instance.
startup
Summary
Oracle has not been upgraded to version 9205, which may introduce uncertain factors. Therefore, confirm that the patches have been installed for both the OMM database and the user database, and that the OMM database has been upsized successfully.
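The byte values in step 3 are round megabyte figures (160 MB, 64 MB, 200 MB and 1 MB). The following Python sketch is a convenience for reviewing or adjusting those sizes, not part of the OMM tooling: it regenerates the same ten statements from megabyte values so they can be pasted into the SQL*Plus session opened as SYSDBA. The dictionary names are illustrative. Because every statement uses scope=spfile, the new values take effect only after the restart in steps 4 and 5.

```python
MB = 1024 * 1024

# Memory-related targets from step 3, expressed in megabytes.
# 160 MB = 167772160 bytes, 64 MB = 67108864, 200 MB = 209715200, 1 MB = 1048576.
MEMORY_PARAMS_MB = {
    "shared_pool_size": 160,
    "java_pool_size": 160,
    "large_pool_size": 64,
    "db_cache_size": 200,
    "pga_aggregate_target": 160,
    "log_buffer": 1,
}

# Non-memory parameters from the same step, written verbatim.
OTHER_PARAMS = {
    "processes": "300",
    "timed_statistics": "FALSE",
    "aq_tm_processes": "0",
    "undo_retention": "1800",
}


def alter_statements() -> list[str]:
    """Build ALTER SYSTEM ... SCOPE=SPFILE statements (applied at next restart)."""
    stmts = [f"alter system set {name}={value} scope=spfile;"
             for name, value in OTHER_PARAMS.items()]
    stmts += [f"alter system set {name}={mb * MB} scope=spfile;"
              for name, mb in MEMORY_PARAMS_MB.items()]
    return stmts


if __name__ == "__main__":
    print("\n".join(alter_statements()))
```

The statements come out in a different order from step 3; the order does not matter, because nothing changes until the instance is restarted.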
Analyzing Instance 7
Topic
The OMP fails to communicate with the OMM server.
Symptom
During MSC Server commissioning, the OMP cannot communicate with the background server.
Fault analysis and location
1. The engineer cannot ping the IP address of the foreground OMP from the background server, although the OMP has started and is running its version normally.
2. The engineer checks the IP configuration of the OMP: it is in the same network segment as the background server, with the same subnet mask. Under normal circumstances, the OMP IP address should respond to ping.
3. The engineer checks the FE interface of the switch and finds that both the status indicators and the network cables are normal.
4. The engineer checks the switch settings and finds that the switch has been divided into several VLANs, and that the network interface of the background server is not in the same VLAN as that of the OMP.
Solution
The engineer moves the OMP network cable so that the OMP is in the same VLAN as the OMM server. Communication then returns to normal.
Summary
An IP communication network requires appropriate IP address planning and appropriate VLAN partitioning on the switches.

Analyzing Instance 8
Topic
Opening two OMM clients causes performance statistics queries to fail.
Symptom
After starting two clients, the engineer fails to query performance statistics results, and an unknown error is reported.
Fault analysis and location
1. The engineer deletes the temporary files in the temp folder, after which the performance query succeeds. However, the fault soon reappears.
2. The problem always occurs when two OMM clients are running. In this project, the MSCS and the MGW are located at different sites, and each has its own server.
Fault Cause
When an OMM client starts, it automatically writes some temporary files to the TEMP folder, and these files are kept open in memory.
When another client is started to perform a performance query, it also writes temporary files to the TEMP folder. Because some files with the same names are in use by the client started earlier, the system write-protects them. The write fails, and the query terminates abnormally.
Solution
Close one of the OMM clients. Performance queries then work normally.

Chapter 6 Interconnection Faults in IP-Bearer Network
Table of Contents
Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
Handling Call Loss Generated by Soft-Switch after CE Restarts
Handling Soft-Switch Failing to Ping through CE

Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
Fault Description
When the receiving fiber at the soft-switch side is broken, a high call loss rate appears. At the same time, there are only about 300 stable online calls, whereas there should normally be about 600.
The networking structure is shown in Figure 26. The media plane interfaces work in load-sharing mode and run BFD fast detection with the CE, with BFD bound to the static routes. When the receiving fiber at the MGW side is interrupted, BFD allows the outgoing routes through that optical interface to be disabled. In addition, the ZTE soft-switch system brings down the external port of the board that receives no optical signal, so that the outgoing route through that optical interface is deleted quickly.
FIGURE 26 INTERCONNECTION BETWEEN MEDIA PLANE NETWORK AND BEARER
Analyzing and Processing
1. After a fault occurs on a media plane interface of the MGW, the route through the faulty interface must be disabled immediately. To verify this, run the SHOW IP ROUTE ALL command on the OMM client to check all IPv4 route entries. The static route through the MGW's faulty interface is already down, which indicates that the MGW handles the fault correctly.
2. Check whether the CE correctly handles the static route after BFD goes down. The CE engineer logs in to the router and checks the routing table, and finds that the static route through the faulty port is still Active. The engineer confirms that BFD is not bound to the static route.
Fault Cause
When an interface link fails, both the soft-switch equipment and the CE must disable the route through the faulty interface by using the relevant detection mechanism. Otherwise, traffic is still sent through the faulty interface and lost, which affects services.
Solution
The fault is resolved after the CE engineer modifies the data configuration.

Handling Call Loss Generated by Soft-Switch after CE Restarts
Fault Description
The soft-switch equipment is interconnected with another vendor's CEs in load-sharing mode, as shown in Figure 27. After CE1 is powered off, all outgoing traffic of the soft-switch equipment passes through CE2. About 800 call loss records are generated while CE1 is powering on.
FIGURE 27 INTERCONNECTION BETWEEN SOFT-SWITCH AND CE
Analyzing and Processing
1. While call loss is occurring during CE power-on, ping the CE gateway from the ZTE soft-switch network management platform. The CE gateway responds to ping, but the opposite soft-switch cannot be pinged through the bearer network.
2. Trace the route to the opposite soft-switch equipment across the IP bearer network with the Trace tool. The first hop reaches the CE, but the subsequent hops beyond the CE fail.
Fault Cause
When the CE powers on, the port between the CE and the soft-switch probably comes up first, followed by BFD, before the dynamic routes of the IP bearer network are restored. As a result, the soft-switch sends its traffic to the CE, causing packet loss and seriously affecting services.
Solution
The CE engineer adjusts the power-on sequence of the CE ports so that the internal routes of the IP bearer network are restored first after the CE powers on.

Handling Soft-Switch Failing to Ping through CE
Fault Description
The ZTE soft-switch equipment is interconnected with another vendor's CE. While calls are in progress, after the electrical interface between the MSCS and the CE is physically disconnected and then restored, the soft-switch cannot ping the CE address.
Analyzing and Processing
1. With a laptop connected directly to the signaling interface board of the soft-switch equipment, the laptop can ping the SIPI interface address.
2. With the laptop connected directly to the CE, the laptop cannot ping the CE interface address. Capturing packets with Wireshark shows only outgoing ARP requests and no response from the CE.
This analysis shows that the fault is not on the soft-switch side. The CE engineer checks the data and finds that the CE's electrical interface cannot be configured in Enforced mode, although during data negotiation the CE engineer had stated that it was in enforced mode.
Fault Cause
Inconsistent electrical interface settings at the two ends prevent the soft-switch from pinging the CE.
Solution
On the soft-switch network management platform, the engineer changes the soft-switch's electrical interface to auto-negotiation mode, and the fault is solved.

Chapter 7 Voice Faults
Table of Contents
Common Voice Faults
Troubleshooting Ideas and Common Methods
Echo Fault Handling
Monolog Fault Handling
Both-Way Silence Fault Handling
Noise Fault Handling
Cross-Talking Fault Handling
Instance Analysis

Common Voice Faults
Overview
Voice Fault Types
In general, voice faults result from bearer problems. In special cases, other problems, such as signaling incompatibility or abnormal parameter processing by the control plane, also cause voice faults, as can a handover or other service that occurs after the subscriber has entered the stable call state. In these cases, analyze the signaling to find the cause.
The common voice fault phenomena are as follows.
- Echo
A speaker hears his/her own voice as well as the other party's voice during a call.
- Monolog
During a call, the local party can hear the other party, but the other party cannot hear the local party for a period of time. Monolog is divided into long-term monolog and instantaneous monolog. Long-term monolog lasts a long time and does not recover. Instantaneous monolog lasts only a short time.
Usually, the call returns to normal after two to five seconds. Monolog differs from voice intermittence, which usually results from poor radio quality on the wireless side: the received sound is discontinuous, but the gaps are very short.
- Both-way silence
Neither the calling party nor the called party can hear the other, although each may sometimes hear his/her own echo.
- Noise (garbled call)
Voice quality during the call is poor, sometimes with metallic sounds, clanging, jangling, or other noises. These abnormal sounds are discontinuous, abrupt, and short.
- Cross-talking
The calling or called party hears other people's voices, ring-back tones, or other sounds during a call.
Voice Fault Analysis
Voice faults are generally caused by CN NEs or by the wireless network (BSC/BTS), and are sometimes related to mobile phones. The common fault points in the CN are as follows.
- Abnormal signaling processing caused by incompatible signaling
- A/IU interface trunks or inter-office trunks
- UIMT/TFI/TSNB board faults
- The fiber or fiber module between the UIMT and the TFI
- A faulty VTCD board or DSP during IP calls
In addition, the EDRT board of the BSC, the BTS, mobile phones, and the radio environment may also cause voice faults.

Troubleshooting Ideas and Common Methods
This section describes troubleshooting ideas and common methods.
Troubleshooting Ideas
The most crucial step in troubleshooting a voice fault is first determining which NE in the network causes it. If the fault cannot be located quickly, analyze and handle it according to the following ideas. If you are not sure which NE causes the fault, coordinate with the wireless-side engineers and with service personnel from other exchanges to troubleshoot it together.
- Understand the subscriber's complaint
Before troubleshooting, obtain detailed information about the complaint.
Focus on the following information:
  - Calling and called numbers
  - Calling and called locations (serving exchange and BSC)
  - Probability of occurrence (whether the fault occurs on every call or only occasionally)
  - Time at which the fault occurs (during the ring-back phase, just after the call is connected, or during the conversation)
- Understand the operations performed before and after the fault
Find out the operating conditions of the local NE, the wireless equipment, and other NEs in the network before and after the fault occurred, for example:
  - Capacity expansion was performed on the A-interface or inter-office trunks.
  - Engineering personnel changed trunk jumpers in the equipment room.
  - The EFR function was enabled on the wireless side.
- Reproduce the fault
When service personnel cannot locate the fault immediately from the complaint, make the fault occur again through repeated call quality tests, in order to find what the faulty calls have in common and thus the fault source. Pay attention to the following when reproducing a fault:
  - Record the signaling both when the voice quality problem occurs and when the voice is normal.
  - Record the probability of the problem occurring.
  - Record the call model and the NEs involved when the problem occurs.
- Analyze the fault
Combine the information collected in the three aspects above, and analyze the fault.
Common Methods for Locating Faulty NE
The following methods are frequently used to locate the NE(s) causing a fault. You may use one or several of them.
Exclusion Method
Voice quality problems are complex and in most cases result from the combined influence of several NEs. To narrow the troubleshooting range, preliminarily locate the faulty NE through dialing tests or loopback tests. There are two methods of excluding NEs:
- Dialing test
Perform dialing tests on specified services, offices, and resources (CIC or TC) to locate the faulty NE. The general principles are as follows.
  - If calls within the local BSC are normal, BSC processing, transmission between the MGW and the BSC, and some MGW functions are normal.
  - If calls cannot be connected on some of the BSCs under the same MGW, suspect abnormal call processing on these BSCs or abnormal A-interface trunks.
  - If local office calls are normal but inter-office calls are abnormal, the fault probably lies in inter-office trunk processing or in the opposite office.
  - If the local office plays the ring-back tone abnormally but the call is normal once connected, the MRB board or a T-network-related board of the local office is probably abnormal.
- Loopback between NEs
Loop back the bearer between NEs, and then perform a call quality test (CQT) to locate the faulty NE. For example, in a V3 end office, monolog occurs when a mobile subscriber calls a PSTN subscriber.
  - Loop back one E1 of the A-interface trunk, and specify that calls from the mobile subscriber to the PSTN subscriber use the looped trunk circuit. If the mobile subscriber can hear his/her own voice after the call is connected but the PSTN subscriber cannot, the BSC can be excluded.
  - Loop back one E1 of the inter-office trunk, and specify that calls from the mobile subscriber to the PSTN subscriber use the looped trunk circuit. If the mobile subscriber can hear his/her own voice after the call is connected but the PSTN subscriber cannot, both the BSC and the CN can be excluded.
Caution: Loopback tests affect live network services. Release the loopback as soon as possible after the test is completed. In addition, this method has limitations for BSC and CN NEs with different implementation modes.
Locating Faults with Tools
Use a 2M signaling analyzer or similar instruments to locate system-related problems. Capture signaling section by section to determine the problem range, as shown in Figure 28.
FIGURE 28 INTERCEPTING CALLS SECTION BY SECTION
During section-by-section capture, pay special attention to the endpoint of each section (that is, the access point of the capture equipment), because the problem may lie at exactly these connection points.
This method can be combined with a call quality test on a specified trunk. For example, specify the third timeslot of the first E1 of the A-interface trunk for the CQT. After the call is connected, connect the receive line of the 2M signaling analyzer to the independent output interface of the tee connector for that E1 on the DDF rack.
Methods of Locating Internal Fault Points on CN
Overview
These methods are usually used when a voice fault is confirmed to be caused by a CN node or by the incoming trunk circuit. They can also be used as a CN self-check when the faulty NE is difficult to locate. The following scenarios are provided only as references for onsite fault location; onsite engineers may select suitable methods according to actual conditions.
Signaling Comparison
- Information collection
Collect signaling messages both when voice problems occur and when the voice is normal.
- Applicable scenarios
This method applies when voice problems occur with a specific call model or office. Example: a CN is interconnected with several PSTN offices. Voice quality is normal when a mobile subscriber calls PSTN office A, but poor on every call when a mobile subscriber calls a subscriber in PSTN office B.
- Method
Compare the faulty signaling with the normal signaling to see whether their signaling code streams differ.
If the code streams differ in length, or important parameters in the signaling messages are set differently, send the signaling to headquarters to confirm whether it is normal.
Trunk Continuity Test
- Prerequisites
The opposite equipment supports continuity testing.
- Applicable scenarios
Inter-office trunk tests
- Method
Perform the continuity test on the trunks in Dynamic Management on the OMM client. This method cannot distinguish a self-looped trunk from a normal one.
Dialing Test on Specified Trunk (TDM Bearer)
- Prerequisites
SIM cards and mobile phones for the calling and called parties are ready.
- Applicable scenarios
Voice problems are always concentrated on one or several A-interface trunks or inter-office trunks (TDM-type trunk circuits).
- Method
The current CN system provides a function for dialing tests on specified trunks.
Dialing Test on Specified RTP/TC Resources (IP Bearer)
- Prerequisites
SIM cards and mobile phones for the calling and called parties are ready.
- Applicable scenarios
Voice problems are always concentrated on one or several A-interface trunks or inter-office trunks (IP-type trunk circuits).
- Method
The current CN system provides a function for dialing tests on specified trunks.
Board Changeover/Resetting
If you suspect that the voice fault lies on a functional board or interface configured in active/standby mode, change it over or reset it manually or by software. If the board or interface shares load with others, change over or reset the spare board or interface to check whether a particular board or interface is faulty.
Caution:
- Changeover is a relatively high-risk operation. Back up the system data in advance.
- Perform changeover on the OMP and other important boards between 0:00 and 6:00 am, and leave an interval between two changeovers.
Echo Fault Handling
This section describes the procedure for handling echo faults.
Echo Principles
Electrical Echo
The trunk (relay) network uses four-wire transmission, while the subscriber line uses two-wire full-duplex transmission. The conversion between them is performed by a two-wire/four-wire hybrid in the local exchange. Because the impedance of a real hybrid coil is never perfectly matched, the transmitter and the receiver cannot be completely isolated; the hybrid separates them only to a certain extent. As a result, not all of the signal received on the four-wire side is passed to the two-wire side: part of it leaks into the four-wire transmit path and generates an echo, as shown in Figure 29. Electrical echo is the main source of echo and is eliminated with a common echo canceller.
FIGURE 29 ELECTRICAL ECHO
Acoustic Echo
In some telephones, the speaker and the microphone are not well isolated. Acoustic echo is generated when the sound from the speaker reaches the microphone after several reflections in the surrounding space. This typically happens when a hands-free telephone is used in a room or a car.
Working Principles of Echo Canceller
Principle Overview
An echo canceller estimates the echo signal from the voice signal on the four-wire receive path and subtracts this estimate from the four-wire transmit path to remove the interference. In short, the canceller monitors the far-end voice on the receive path, estimates the resulting echo, and subtracts it from the transmit path, so that only the near-end voice is sent to the far end.
An echo canceller has four ports: two on the drop side and two on the line side, as shown in Figure 30.
FIGURE 30 WORKING PRINCIPLES OF ECHO CANCELLER
Because the echo to be cancelled is generated in the end path, the delay of the long-haul line does not affect the echo canceller. However, the total circuit delay (end path plus long-haul line) determines whether an echo canceller is needed: when it exceeds 30 ms, an echo canceller should be used.
Note: An echo canceller installed at the near end benefits the remote subscribers; one installed at the far end benefits the near-end subscribers.
Echo Directions
Incoming EC and outgoing EC are concepts defined relative to the gateway exchange. Figure 31 shows their meaning. Whether a call is incoming or outgoing is indicated by signaling, which may be the IAM/IAI message or the ACM message.
FIGURE 31 INCOMING/OUTGOING EC
- Incoming EC
The direction from which an IAM or ACM/CPG/ANM message is received. This is the direction of the echo; the voice travels in the opposite direction.
- Outgoing EC
The direction in which an IAM or ACM/CPG/ANM message is sent. This is the direction of the echo; the voice travels in the opposite direction.
Echo is directional, and therefore so is the echo suppressor. The main concern is the setting of the EC direction parameter. EC directions include the GMSC-PSTN direction (E1 direction) and the GMSC-MSC direction (HW direction). Suppose that the network is PSTN-GMSC-MSC, the PSTN generates echoes, and the GMSC provides the echo canceller. In this case, the echo canceller in the E1 direction suppresses the echo generated when the PSTN calls an MS.
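The canceller behaviour described in Principle Overview (estimate the echo from the receive path, subtract it from the transmit path) is usually realized as an adaptive FIR filter. The Python sketch below uses the normalized LMS (NLMS) update, a common textbook choice; this manual does not state which algorithm the DTEC or VTCD hardware uses, and the echo path, the 32-tap length and the step size are illustrative assumptions. Because the filter only has to model the short end path behind the hybrid, its length does not depend on the long-haul delay, which is why that delay does not affect the canceller.

```python
import random


def nlms_echo_cancel(far_end, send_in, taps=32, mu=0.5, eps=1e-6):
    """Cancel echo with a normalized LMS (NLMS) adaptive FIR filter.

    far_end: receive-path samples (far-end voice heading to the hybrid).
    send_in: transmit-path samples before cancellation (near-end voice + echo).
    Returns the transmit-path samples after the echo estimate is subtracted.
    """
    w = [0.0] * taps     # running estimate of the end-path (hybrid) response
    hist = [0.0] * taps  # last `taps` far-end samples, newest first
    out = []
    for x, d in zip(far_end, send_in):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_est  # residual actually sent to the far end
        step = mu * e / (eps + sum(h * h for h in hist))
        w = [wi + step * hi for wi, hi in zip(w, hist)]  # adapt toward the echo path
        out.append(e)
    return out


def simulate_echo(far_end, echo_path):
    """Echo from a hypothetical hybrid: far_end convolved with echo_path."""
    return [sum(c * far_end[n - k] for k, c in enumerate(echo_path) if n - k >= 0)
            for n in range(len(far_end))]


if __name__ == "__main__":
    random.seed(1)
    far = [random.gauss(0.0, 1.0) for _ in range(4000)]
    echo = simulate_echo(far, [0.0, 0.0, 0.5, 0.3, -0.1])  # illustrative echo path
    residual = nlms_echo_cancel(far, echo)
    before = sum(v * v for v in echo[-500:])
    after = sum(v * v for v in residual[-500:])
    print(f"echo energy (last 500 samples): {before:.3f}, after cancelling: {after:.6f}")
```

In the demo the near-end talker is silent, so once the filter converges the residual is a small fraction of the original echo. Real cancellers also need double-talk detection to freeze adaptation while the near end is speaking, which this sketch omits.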
Echo-Suppression Implementation
Whether ZXWN MSCS enables echo suppression depends on two things:
• the echo suppression information carried by signaling;
• the hardware configuration of the MGW, that is, whether a DTEC board is installed.
Currently, ZXWN MGW supports an embedded echo suppression function. An independent echo suppression function (EC Pool) is under research.

Configuring Echo Cancellation
Prerequisites
Before the operation, confirm that:
• The OMM system runs normally.
• The OMM client has logged in to the OMM server normally.
Context
Perform this procedure to configure echo cancellation.
Steps
1. On the MML Terminal interface of the OMM client, select MSCS in the root tree.
2. In the MML Commands tree, double-click Trunk Configuration > Trunk Group > Modify Information Flag of Trunk Group.
3. Type the Trunk Group Number. Click the box next to Enable Flag to open the Enable Flag dialog box, as shown in Figure 32.
FIGURE 32 ENABLE FLAG
4. Select the Echo (Include Echo Killer) check box.
5. Click OK to return to the MML Terminal interface.
6. Click Execute to run the command.
7. On the OMM client, select Views > Professional Maintenance > MSCS > Variables Control > IGW Service Control Parameter to open the IGW Service Control Parameter tab.
8. Select the ISUP Spare row, as shown in Figure 33.
FIGURE 33 ISUP SPARE
9. Click the Modify icon on the sub-toolbar to enter editing mode, and type 1. The default value is 0.
10. Click Save on the sub-toolbar.
END OF STEPS

Configuring Echo Cancellation by Adopting Resource Pool
Prerequisites
Before the operation, confirm that:
• The OMM system runs normally.
• The OMM client has logged in to the OMM server normally.
Context
When ECPOOL is configured, do not configure the EC sub-card on DTB or SDTB units from which only E1 lines are led out. The SDTB unit does not support ECPOOL.
Steps
1. Specify the MGW exchange to be configured on the MML Terminal interface of the OMM client. Use one of the following methods:
• Execute the SET command in the command input area.
• Select the required NE in the root tree.
For example, the following command selects the MGW exchange whose office ID is 31:
SET:NEID=31;
2. Add an EC POOL resource board with the ADD UNIT command. Table 8 describes its parameters.

TABLE 8 PARAMETERS IN THE ADD UNIT COMMAND
Parameter | Explanation | Instruction
LOC | Unit location | Format: RACK-SHELF-SLOT
MODULE | Module No. | Number of the module that the unit belongs to. Usually 1 (the OMP module) is selected.
UNIT | Unit No. | Number of the unit that corresponds to the board, ranging from 1 to 2000. Type an unused unit number.
TYPE | Unit type | Select according to the actual conditions.
BKMODE | Backup mode | Select NO.

For example, the following command adds an EC POOL resource board in DTEC sub-card plus EC mode:
ADD UNIT:LOC="1"-"2"-"3",MODULE=1,UNIT=3,TYPE=DTB2_ECPOOL_Z2EC1,BKMODE=NO,BKUNIT=65535,CLK=255;
3. Configure the management mode of EC resources with the SET ECRSC command. Table 9 describes its parameters.

TABLE 9 PARAMETERS IN THE SET ECRSC COMMAND
Parameter | Explanation | Instruction
EC | Management mode of EC resources: CONNECTED (direct-connected mode) or POOL (pool mode) | Type POOL.

For example, the following command sets the management mode of EC resources to POOL:
SET ECRSC:EC=POOL;
END OF STEPS
Postrequisite
Transfer the data tables.
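When the same EC pool configuration is repeated for several offices, the command text can be generated and checked against Tables 8 and 9 before it is pasted into the MML Terminal. This is a minimal sketch. The helper functions are hypothetical and only format strings; they do not connect to the OMM. The `bkunit` and `clk` defaults are copied from the example above, and this section does not document what they mean.

```python
def set_neid_cmd(neid: int) -> str:
    """SET command that selects the NE (office ID) to configure."""
    return f"SET:NEID={neid};"


def add_unit_cmd(rack: int, shelf: int, slot: int, module: int, unit: int,
                 unit_type: str, bkmode: str = "NO", bkunit: int = 65535,
                 clk: int = 255) -> str:
    """Format an ADD UNIT command in the same shape as the manual's example."""
    if not 1 <= unit <= 2000:
        raise ValueError("UNIT must be in the range 1..2000 (Table 8)")
    loc = f'"{rack}"-"{shelf}"-"{slot}"'  # RACK-SHELF-SLOT
    return (f"ADD UNIT:LOC={loc},MODULE={module},UNIT={unit},TYPE={unit_type},"
            f"BKMODE={bkmode},BKUNIT={bkunit},CLK={clk};")


def set_ecrsc_cmd(mode: str) -> str:
    """Format SET ECRSC; mode must be CONNECTED or POOL (Table 9)."""
    if mode not in ("CONNECTED", "POOL"):
        raise ValueError("EC must be CONNECTED or POOL")
    return f"SET ECRSC:EC={mode};"


script = [
    set_neid_cmd(31),
    add_unit_cmd(1, 2, 3, module=1, unit=3, unit_type="DTB2_ECPOOL_Z2EC1"),
    set_ecrsc_cmd("POOL"),
]
print("\n".join(script))
```

The printed lines match the three example commands in the steps above. Validation is deliberately limited to the ranges and values stated in Tables 8 and 9.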
System Implementation
• Incoming processing for tandem/end office
For incoming calls, the trunk flag and the echoInd field carried by the IAM message determine whether outgoing EC is enabled.
  - If the echoInd field in the incoming IAM message is 1, the preceding office has enabled EC. By default, the local office disables EC. If the trunk flag is ECHO and the ISUP Spare variable is set to 1, the local office forcibly enables EC.
  - If the echoInd field in the incoming IAM message is 0, the preceding office has disabled EC, and the local office decides whether to enable it. If the trunk flag is ECHO, the local office enables EC.
• Processing at the outgoing side of a tandem office
Currently, incoming EC can be forcibly enabled for outgoing calls. The trunk flag and the echoInd field carried by the incoming ACM message determine this.
  - If the echoInd field in the incoming ACM message is 1, the opposite office has enabled EC. By default, the local office does not enable it. If the trunk flag is ECHO and the ISUP Spare variable is set to 1, the local office forcibly enables EC. The echoInd field in the outgoing ACM message is then 1.
  - If the echoInd field in the incoming ACM message is 0, the opposite office has disabled EC, and the local office decides according to its configuration. If the trunk flag is ECHO, the local office enables EC. The echoInd field in the outgoing ACM message is then 1.
• Processing an originating call of end office
When an end office originates a call and requests an A-interface circuit, EC is enabled by default. During voice service, CC sends a message to inter-office signaling.
The EC flag in this message is forcibly set to 1, which indicates that EC is enabled. However, the mobile terminal has its own EC function, so the end office does not need to enable EC. In subsequent versions, EC is not required when an A-interface circuit is requested. The EC flag carried by the message sent to inter-office signaling is still forcibly set to 1.
• Special Notes
If the ECHO (Include Echo Killer) flag is not selected for the outgoing trunk group and the ISUP Spare variable is set to 1, the echoInd field carried by the outgoing IAM message is 1. An example of a command that clears the flag is SET TGFLG:TG=1,DISABLE="Echo";.
During an originating call of the local office, the echoInd field carried by the outgoing IAM message is 1 by default.
If the local office serves as a transit exchange, the echoInd field carried by the incoming IAM message is passed through transparently.

Fault Processing
Echo usually occurs when a mobile subscriber who belongs to a gateway exchange calls a fixed subscriber. In this case, check two things:
• whether a DTEC board is installed for the trunk group between the mobile exchange and the fixed exchange;
• whether the EC function is enabled in the corresponding data configuration.
A problem with the mobile phone itself can also cause echo. To locate such a fault, replace the phone with another test terminal.

Monolog Fault Handling
Overview
Monolog (one-way audio) faults occur often and involve many NEs. Because many causes can produce this fault, onsite engineers should identify the points that the fault reports have in common, and use them to locate and handle the fault.
Fault Analysis
The monolog fault is relatively complicated. The following causes are likely.
• BSC/RNC or other NE problem
EDRT of the BSC, carrier frequency interference of radio signals, uplink/downlink imbalance, and other conditions may cause monolog.
• Mismatched trunk wires
The receive or transmit line of an E1 on the trunk circuit is connected to the wrong E1, or cables are badly soldered. For a monolog fault, the trunk problem affects only one direction (receive or transmit). If both directions are affected, the fault is a both-way silence fault.
• Signaling compatibility
Signaling incompatibility between equipment from different manufacturers interconnected in the CN can cause the monolog fault.
• TSNB/TFI/UIMT/VTCD board is faulty.
The monolog fault may be caused by a circuit-connection error in the switching network board, an optical module fault on the TFI board, and other conditions.
Fault Handling
Sometimes the ring-back tone or color ring-back tone is heard normally, but monolog suddenly appears during the call. In that case, check whether a handover or radio signal interference has occurred, and whether the radio signal strength is normal.
Service personnel must follow the Troubleshooting Ideas as the overall guiding principle. Handle monolog faults with the locating methods above, according to onsite conditions. Voice streams currently cannot be saved, so service personnel can judge voice faults (both-way silence, monolog, and others) only by listening. Analyze the CDR files to identify the circuit seized by the call, which narrows the range of listening. Loopback is the best method for determining where the fault occurs. Use it to narrow the troubleshooting range until the fault is located.

Both-Way Silence Fault Handling
Overview
Like the monolog fault, the both-way silence fault occurs often, and the two faults are easily confused. There is, however, an essential distinction between them. When a fault occurs, service personnel should distinguish them by asking what the calling party and the called party each heard.
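Asking both parties what they heard separates the two fault types. The same information also tells you which direction to suspect in a monolog case. The following minimal sketch records that decision. The function name and return strings are hypothetical. An intermittent fault may need several calls before the pattern is clear.

```python
def classify_voice_fault(calling_hears_called: bool,
                         called_hears_calling: bool) -> str:
    """Classify a call from what each party reports hearing."""
    if calling_hears_called and called_hears_calling:
        return "normal"
    if calling_hears_called:
        # Only the calling-to-called direction is broken: one-way audio.
        return "monolog: called party cannot hear the calling party"
    if called_hears_calling:
        # Only the called-to-calling direction is broken: one-way audio.
        return "monolog: calling party cannot hear the called party"
    # Neither direction carries voice.
    return "both-way silence"


print(classify_voice_fault(True, False))
print(classify_voice_fault(False, False))
```

A monolog points to a single transmission direction, for example one receive or transmit line. Both-way silence points to something that breaks both directions.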
Both-way silence is mostly caused by trunk circuit interconnection.
Fault Analysis
• Analyzing faults for TDM bearer
For a TDM bearer, both-way silence is generally caused by the following conditions.
  - BSC or other NE problem
  - Trunk wires are mismatched or self-looped.
  Mismatched wires cause both-way silence in slightly different ways from monolog. The following conditions may result in both-way silence:
    ◦ both the receive and transmit lines of an E1 on the trunk circuit are connected to the wrong E1;
    ◦ the receive and transmit lines are swapped during interconnection;
    ◦ cables are badly soldered.
  A trunk self-loop also causes this fault. In that case, the calling and called parties each hear their own voice, which makes the problem easy to recognize.
  - PCM system IDs at the two sides of an interconnected trunk are inconsistent.
  For example, suppose the inter-office trunk between MSC A and MSC B consists of N fully interconnected E1s. MSC A numbers the PCM systems 0 to N-1, while MSC B numbers them 1 to N. When MSC A allocates the second timeslot in the E1 whose PCM system ID is 1, the physical circuit is the second timeslot of the second E1. MSC B, however, uses the second timeslot of the first E1 for the same circuit.
  - TSNB/TFI/UIMT board fault
  Both-way silence may be caused by a circuit-connection error in the switching network board, an optical module fault on the TFI board, and other conditions.
• Analyzing faults for IP bearer
For an IP bearer, both-way silence is generally caused by the following conditions.
  - RNC or other NE is faulty.
  - GLI board in MGW is faulty. For example, faulty memory on the GLI board corrupts its forwarding table, which results in both-way silence.
  - UIM board in MGW is faulty.
  For example, the forwarding chip on the UIM board is faulty. The current version adds auto-sensing and auto-changeover functions to avoid this fault, so this kind of problem generally does not appear in the field.
  - VTCD board in MGW is faulty. For example, the VTCD chip has a hardware fault or other problems.
Fault Handling
To handle both-way silence, first perform a CQT test to determine the fault range. Check whether neither the ring-back tone nor the voice can be heard, or whether only the voice is abnormal during the call.
• If both-way silence occurs in all types of calls (intra-office and outgoing), it is usually related to a core switching board, such as TSNB or PSN.
• If the fault occurs only in calls to a particular office, it is usually related to the trunk circuit or the opposite device(s) of that office.
• Perform a board changeover to handle both-way silence that may be caused by a malfunctioning TSNB/PSN/UIMT/TFI board.
• To troubleshoot faults caused by trunk circuits, perform a CQT test on a designated trunk for TDM circuits, or on designated RTP/TC resources for IP circuits.
• To troubleshoot faults related to the IP bearer network, you can only capture packets at each level, because CS versions 3.06/3.07.11/3.07.20 do not yet provide the loopback function. Generally, first capture the IP messages between NEs to determine which NE causes the fault. Then capture the messages inside that NE to identify the faulty module.

Noise Fault Handling
Overview
Noise occasionally appears in a call. It usually occurs together with monolog or both-way silence.
Fault Analysis
This fault may be due to one or more of the following causes.
• The DDF frame where the trunk circuit is located is not well grounded, or bit errors occur on the transmission equipment.
• The radio link signal is poor, there is radio frequency interference, or the radio antenna and feeder system has a problem.
• There is a clock fault in the CN, the BSC, or other NEs.
• A TSNB/TFI/UIMT board in the CN is faulty.
• A TDM/IP termination is deadlocked because of an MGW version error.
Fault Handling
To troubleshoot a noise fault, perform a CQT test to determine the fault range.
• If noise occurs in all types of calls (intra-office and outgoing), it is usually related to a core switching board, such as TSNB or PSN.
• If noise occurs only in calls to a particular office, it is usually related to the trunk circuit or the opposite equipment of that office.
• Perform a board changeover to handle noise that may be caused by a malfunctioning TSNB/PSN/UIMT/TFI board working in active/standby mode.
• For noise caused by a TDM trunk circuit, perform a CQT test on a specified trunk to locate it. For noise caused by an IP trunk circuit, capture IP messages to locate it.

Cross-Talking Fault Handling
Overview
Cross-talk occasionally appears in a call. It usually occurs together with monolog, both-way silence, or noise.
Fault Analysis
Cross-talk is seldom caused by a CN problem. The common causes are as follows.
• BSC fault, radio frequency interference, or other radio problems
• Connection errors on TSNB, PSN, and other switching boards in the CN

Instance Analysis
This section describes several voice fault instances.

Analyzing Instance 1
Topic
Calls of an end office suffer monolog, both-way silence, and cross-talk faults.
Networking Diagram
Figure 34 shows the networking diagram of this instance.
FIGURE 34 NETWORKING DIAGRAM
Symptom
During a maintenance period of an end office, monolog, both-way silence, cross-talk, and other faults appear in some calls.
Analyzing and Processing
1. Reproduce the fault through call quality tests.
• A subscriber makes 50 intra-office calls under MGW 1.
No monolog, cross-talk, or other faults appear.
• A subscriber makes 50 intra-office calls under MGW 2. No monolog, cross-talk, or other faults appear.
• A subscriber under MGW 1 calls a NOKIA subscriber 50 times over the route MGW1-T1-NOKIA. One monolog fault and two cross-talk and both-way silence faults appear.
• A subscriber under MGW 1 calls a NOKIA subscriber 50 times over the route MGW1-MGW2-T2-NOKIA. One monolog fault and two cross-talk faults appear.
• A subscriber under MGW 2 calls a NOKIA subscriber 50 times over the route MGW2-T2-NOKIA. No monolog, cross-talk, or other faults appear.
• A subscriber under MGW 2 calls a NOKIA subscriber 50 times over the route MGW2-MGW1-T1-NOKIA. Two monolog faults and one both-way silence fault appear.
2. Analyze the call quality test results.
The local-office test results show that the voice faults are not caused by the BSC, the A-interface trunk circuits, or local-office boards. A comparison of the results over different calling routes shows that two parts of the network have problems that cause the voice faults: the MGW1-T1-NOKIA inter-office trunk, and some E1 lines on the circuits between MGW 1 and MGW 2.
3. Perform a designated-trunk call quality test.
Perform designated-trunk call quality tests on the MGW1-T1-NOKIA trunks. Voice faults always occur in calls that pass through certain E1 lines. A check of the jumpers on these E1 lines shows that some lines are connected inversely or crosswise.
4. Check the physical circuits.
The NetNumen M30 (V3.06) installed in the current network does not support call quality tests on inter-MGW circuits. Onsite service personnel therefore check the physical connections of these circuits directly, and find that some E1 lines are connected inversely or crosswise.
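The route comparison in step 2 is an elimination argument. Any segment used by a fault-free route is provisionally cleared. Segments that appear only in faulty routes remain suspect. The following minimal sketch applies that heuristic to the four inter-office test routes above. The function name is hypothetical. Intermittent faults on a few E1 lines can escape a 50-call test, which is why designated-trunk tests and physical checks (steps 3 and 4) still follow.

```python
def suspect_segments(routes):
    """Return segments used by faulty routes but never by a clean route.

    routes: iterable of (hops, faulty) pairs, where hops lists the NEs a test
    call passes through and faulty says whether any voice fault was observed.
    """
    def segments(hops):
        # A segment is undirected: MGW1-MGW2 and MGW2-MGW1 are the same circuits.
        return {frozenset(pair) for pair in zip(hops, hops[1:])}

    suspects, cleared = set(), set()
    for hops, faulty in routes:
        if faulty:
            suspects |= segments(hops)
        else:
            cleared |= segments(hops)
    return sorted("-".join(sorted(seg)) for seg in suspects - cleared)


# Inter-office routes and outcomes from the 50-call tests in Instance 1.
tests = [
    (["MGW1", "T1", "NOKIA"], True),
    (["MGW1", "MGW2", "T2", "NOKIA"], True),
    (["MGW2", "T2", "NOKIA"], False),
    (["MGW2", "MGW1", "T1", "NOKIA"], True),
]
print(suspect_segments(tests))
# ['MGW1-MGW2', 'MGW1-T1', 'NOKIA-T1']
```

The result points to the same parts of the network named in step 2: the MGW1-T1-NOKIA trunk and the circuits between MGW 1 and MGW 2.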
Solution
Rectify the trunk circuits that have connection errors. The fault is eliminated.

Analyzing Instance 2
Topic
Echo and monolog faults occur in an end office.
Symptom
The calling subscriber belongs to a ZTE end office, and the called subscriber belongs to an end office supplied by manufacturer S. The calling subscriber hears the called subscriber's voice without any problem. The called subscriber cannot hear the calling subscriber and hears only an echo of his or her own voice. The two end offices are connected by a direct trunk circuit.
Analyzing and Processing
1. Review the operating conditions before and after the fault occurred.
Before the fault occurred, seven A-interface circuits (PCM 16 to 22) were added. Four of them are abnormal.
2. Perform an interception test.
Connect a 2M online BER tester to the A interface of the ZTE end office to intercept timeslots. The called subscriber's voice is heard on both the uplink and the downlink.
3. Analyze the fault.
The called subscriber's voice is audible on both the uplink and the downlink of the A interface, so the uplink is abnormal. The fault point may be the BSC, the A-interface trunk, or the Abis circuit.
4. Handle the fault.
i. At the CS side, swap the abnormal PCM 19 port with the normal PCM 20 port, and swap the data between PCM 19 and PCM 20. Leave the BSC side unchanged. PCM 19 is still abnormal and PCM 20 is still normal, which indicates that the MGW ports are normal.
ii. At the CS side, change the trunk group that corresponds to PCM 16 to 19. These four circuits are still faulty.
iii. Perform a changeover on the EDRT board at the BSC side. The fault remains.
iv. On the TIC at the BSC side, swap the transmission ports of PCM 19 and PCM 20, together with the corresponding data. Leave the MGW side unchanged.
PCM 19 is still abnormal and PCM 20 is normal, which indicates that the ports at the BSC side are normal.
v. Perform a software loopback on the transmission. It reproduces the same fault symptoms, so the fault is caused by the transmission.
Solution
The A-interface transmission personnel handle this fault. It is eliminated after the abnormal circuit is replaced.

Analyzing Instance 3
Topic
Background noise is audible in the calls of an end office.
Symptom
Background noise that sounds like firecrackers appears in some calls of this end office. Either party, or both, can hear it.
Analyzing and Processing
Background noise appears after several intra-office call quality tests. An intra-office call does not involve another MSC. It is connected only through the local MSC and the BSC, so inter-office trunks and opposite devices cannot affect it. It is still necessary to determine whether the noise comes from the radio side or the CN side.
After contacting headquarters, perform a voice-channel self-loop on the trunk board of the MSC or BSC to locate the fault. Log in to the SDTB/DTB board occupied by the foreground, and set a voice-channel self-loop with the related commands. When background noise appears during the call test:
• Self-loop the CN A-interface timeslot occupied by the calling and called parties toward the BSC side. No noise is heard.
• Self-loop the same timeslot toward the UIMT side. Both parties hear noise.
This largely confirms that the fault is on the CN side.
After several more calls, the pattern becomes clear: whenever noise appears, either the calling or the called party has seized a trunk on one of the two SDTB boards in shelf 2 of rack 2 of the MSC. The voice stream of an intra-office call passes through the ETSN, TFI, UIM, and SDTB boards in the MSC.
• The SDTB board is a trunk board.
• The UIM board performs layer-2 Ethernet switching inside the shelf. It connects to the TFI board (timeslot switching interface board) through optical fiber.
• The ETSN board is a timeslot switching board.
An uplink originating call from the A-interface trunk follows this path:
1. It passes through the SDTB board, then the UIMT and TFI boards, and reaches the ETSN.
2. After timeslot switching in the ETSN, it returns to the SDTB through the TFI and UIMT boards.
3. It reaches the called party through the A-interface trunk and the BSC.
The uplink route of the terminating call is the same as that of the originating call. Figure 35 shows the intra-MSC connection flow of an intra-office call.
FIGURE 35 CALL QUALITY TEST FLOW
A check of the TFI and UIMT boards shows that the corresponding optical interface of the TFI board performs active/standby changeover automatically and continuously. This TFI board is connected to the UIM board of this shelf. The other TFI optical interfaces and the UIMT board are normal. The changeover notification also appears in the Fault Management window of NetNumen M30.
A board fault or an incorrect connection between the UIMT and TFI boards may cause noise, so change over the ETSN board of the MSC. The optical-interface changeover exception persists, and the noise is not eliminated.
Solution
After the optical module of the TFI board is replaced, the optical-interface auto-changeover exception is eliminated. Repeat CQT tests several times on the trunks of the two affected SDTB boards. The noise does not reappear.

Analyzing Instance 4
Topic
Noise results from a transmission system fault.
Symptom
After an office is interconnected with its opposite office, the signaling flow is normal. However, some calling subscribers complain that they hear noise when they call an intra-office called party.
Fault Analysis and Location
1. Analyze the fault symptoms. The noise is basically the result of an inter-office fault, which probably lies in the trunk circuit to a particular office.
2. Check the traffic flow diagram and the network topology diagram of the local office. Two T-offices forward most of the incoming and outgoing calls of the local office.
3. Perform trunk dial-up tests to reproduce the fault. The noise always appears irregularly on the circuits to one of the T-offices.
4. Check the transmission type between the MGW and that T-office. It uses SDH transmission. A check of the SDH-related alarms on the OMM system shows a "Trunk receiving-end alarms" alarm on the trunk circuit to this office. The alarm indicates obvious bit errors on this transmission, which probably cause the noise.
5. Work with the transmission service personnel to eliminate the fault. Loop back segment by segment until the fault point is located.
Fault Handling
After the transmission service personnel handle the fault, the alarm in the fault management system disappears. A dial-up test with the designated mobile phone shows that the noise has disappeared.
Summary
Noise faults occur frequently in routine maintenance. To eliminate them, first determine where the noise originates from the symptoms. The common patterns are as follows.
• If noise always appears in certain cells or location areas of the local office, the fault is related to the BSC/RNC. Check the trunk circuits between the local office and the BSC/RNC first, then the radio side. Contact the radio-side service personnel to locate the fault together.
• If noise occurs on inter-office circuits while intra-office calls are noise-free, the fault is related to an adjacent office. Check the outgoing trunk circuits of the local office.
  Perform dial-up tests on all trunk circuits, and trace the inter-office signaling to find the faulty board or transmission system.
• If noise appears irregularly and randomly both inside the MGW office and between offices, check whether the fault lies in the local office. Candidates include the TSNB, TFI, UIM, and SMP boards, and the MSCS and MGW data configuration.
• If the transmission system is faulty, contact its service personnel immediately. Work with them to find the fault point with segment-by-segment loopback, and provide the information and help they need to eliminate the fault quickly.

Figure
Figure 1 Troubleshooting Flow .... 3
Figure 2 Handling the Fault Occurring During Board Configuration .... 11
Figure 3 Handling Common Faults of Boards .... 13
Figure 4 Troubleshooting Flow of UIM Fault .... 15
Figure 5 Troubleshooting Flow of OMP Fault .... 18
Figure 6 Handling the SIPI Board Fault .... 20
Figure 7 Troubleshooting Flow of SPB Board .... 21
Figure 8 Troubleshooting Flow of the CLKG Board .... 24
Figure 9 Handling the IPI Board Fault .... 25
Figure 10 Handling the DTB/DTEC Board Fault .... 26
Figure 11 Handling the APBE Board Fault .... 27
Figure 12 Handling the MRB Board Fault .... 28
Figure 13 Handling VTCD Board Fault .... 29
Figure 14 Handling the Clock Abnormality .... 34
Figure 15 Troubleshooting Flow of Clock Locking Failure .... 36
Figure 16 Example for Clock Self-Loop .... 39
Figure 17 Mc Interface Protocol Stack .... 47
Figure 18 Networking Mode & Interface Protocol Stack Structure .... 51
Figure 19 Networking and Protocol Architecture (Built-In SG Supported) .... 53
Figure 20 A-Interface Protocol Stack Structure .... 54
Figure 21 MGW Built-In Networking Mode and Its Protocol Structure .... 55
Figure 22 Boards Related to Narrow-Band No.7 Signaling Protocol .... 55
Figure 23 Handling the Abrupt Abnormality of the OMM System .... 60
Figure 24 Handling Virus/Security Events .... 62
Figure 25 Setting CORBA FTP .... 66
Figure 26 Interconnection Between Media Plane and Bearer Network .... 74
Figure 27 Interconnection Between Soft-Switch and CE .... 75
Figure 28 Intercepting Calls Section by Section .... 80
Figure 29 Electrical Echo .... 83
Figure 30 Working Principles of Echo Canceller .... 84
Figure 31 Incoming/Outgoing EC .... 84
Figure 32 Enable Flag .... 86
Figure 33 ISUP SPARE .... 86
Figure 34 Networking Diagram .... 93
Figure 35 Call Quality Test Flow .... 96

Table
Table 1 UIM Types .... 14
Table 2 Indicators on UIM .... 16
Table 3 Impedance DIP Switches of E1 on the SPB Board .... 37
Table 4 Handling Method of Inconsistent Locking Status of Active/Standby CLKG Boards .... 40
Table 5 Handling Method of Inconsistent Clock Reference Status of Active/Standby CLKG Boards .... 41
Table 6 Handling Method of Output Clock Loss .... 43
Table 7 Handling Method of Slip Code .... 44
Table 8 Parameters in the ADD UNIT Command .... 87
Table 9 Parameters in the SET ECRSC Command .... 88

Index
A
Active status .... 30
Adjacent office .... 48
Alarm analysis .... 5
Alarm level .... 5
Alarm query .... 4
B
Backplane .... 10, 19, 36, 40, 42–43
C
Changeover .... 19
Client/server .... 1
Conference call .... 28
Control plane .... 20, 50, 77
D
Data configuration .... 5–6, 32, 48, 51, 57, 63, 65, 74
DIP switch .... 26
DIP switches .... 22
F
Failure observation .... 4, 17, 50
File management .... 65
FTP .... 66
H
Handover .... 77, 90
History alarm .... 43
History notification .... 43
I
IP address .... 9, 48, 60, 63–64, 66, 70–71
J
Jumper .... 22
L
Link .... 5, 7, 22, 25, 39, 49, 51, 74, 92
Local office .... 5–6, 28, 39, 48, 51, 97
M
Man-machine command .... 59
Mask .... 60
Matching impedance .... 26, 36
N
Network cable .... 63
Networking mode .... 50, 54, 58
No.7 signaling .... 1
O
OMM server .... 9, 64–66, 68, 70–71
Outgoing route .... 73
P
Performance statistic .... 71
Performance statistics .... 4, 31, 65
Power on .... 75
Power-off .... 8
Probe .... 55
R
Registration .... 48
S
Signaling link .... 48
Signaling link .... 7, 42, 44, 51–52, 56
Signaling links .... 32
Signaling route .... 51
Signaling route .... 7, 53
Signaling tracing .... 4–5, 7, 17, 48, 50
T
Trunk management .... 55
V
Version management .... 10, 30, 65
