ZXWN MGW Media Gateway Troubleshooting

ZXWN MGW
Media Gateway
Troubleshooting
Version 3.07
ZTE CORPORATION
ZTE Plaza, Keji Road South,
Hi-Tech Industrial Park,
Nanshan District, Shenzhen,
P. R. China
518057
Tel: (86) 755 26771900
Fax: (86) 755 26770801
URL: http://ensupport.zte.com.cn
E-mail: support@zte.com.cn
LEGAL INFORMATION
Copyright © 2006 ZTE CORPORATION.
The contents of this document are protected by copyright laws and international treaties. Any reproduction or distribution of
this document or any portion of this document, in any form by any means, without the prior written consent of ZTE CORPORATION is prohibited. Additionally, the contents of this document are protected by contractual confidentiality obligations.
All company, brand and product names are trade or service marks, or registered trade or service marks, of ZTE CORPORATION
or of their respective owners.
This document is provided “as is”, and all express, implied, or statutory warranties, representations or conditions are disclaimed, including without limitation any implied warranty of merchantability, fitness for a particular purpose, title or non-infringement. ZTE CORPORATION and its licensors shall not be liable for damages resulting from the use of or reliance on the
information contained herein.
ZTE CORPORATION or its licensors may have current or pending intellectual property rights or applications covering the subject
matter of this document. Except as expressly provided in any written license between ZTE CORPORATION and its licensee,
the user of this document shall not acquire any license to the subject matter herein.
ZTE CORPORATION reserves the right to upgrade or make technical change to this product without further notice.
Users may visit ZTE technical support website http://ensupport.zte.com.cn to inquire related information.
The ultimate right to interpret this product resides in ZTE CORPORATION.
Revision History
Product Version    Revision          Revision Reason
V3.07.40           20090420–R1.0     First edition
Serial Number: sjzl20092207
Contents
About This Manual
Declaration of RoHS Compliance
Troubleshooting
Basic Requirements for the Maintenance Personnel
Troubleshooting Flow
Troubleshooting Principles
Troubleshooting Methods
Hardware Faults
Faults in Board Configuration
Handling Fault Occurring during Board Configuration
An Instance for Handling Board Version Loading Failure
Faults in Board Running Process
Fault Handling Procedure
Handling UIM Board Hardware Faults
Handling OMP Board Hardware Faults
Handling SMP Board Hardware Faults
Handling SIPI Board Hardware Fault
Handling SPB Board Hardware Faults
Handling CLKG Board Hardware Faults
Handling IPI Board Fault
Handling DTB/DTEC Board Fault
Handling APBE Board Fault
Handling MRB Board Fault
Handling VTCD Board Fault
Handling Changeover Exceptions
Handling CPU Overload
Clock Faults
Handling System Clock Exception
Handling the Clock Lock Failure
Handling the Clock Networking Fault
Handling the Inconsistent Lock Status of Active/Standby CLKG Boards
Handling the Clock Reference Loss
Handling the Output Clock Loss
Handling the Slip Code
Interface Faults
Handling MGW-MSCS Interface Fault
Handling MGW-MGW Interface Fault
Handling MGW-RNC Interface Fault
Handling MGW-PSTN Interface Fault
Handling MSCS-BSC Interface Fault
Handling Service Faults
OMM System Faults
Handling OMM Abrupt Abnormality
Handling Virus/Security Events
Analyzing Instance 1
Analyzing Instance 2
Analyzing Instance 3
Analyzing Instance 4
Analyzing Instance 5
Analyzing Instance 6
Analyzing Instance 7
Analyzing Instance 8
Interconnection Faults in IP-Bearer Network
Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
Handling Call Loss Generated by Soft-Switch after CE Restarts
Handling Soft-Switch Failing to Ping through CE
Voice Faults
Common Voice Faults
Troubleshooting Ideas and Common Methods
Troubleshooting Ideas
Common Methods for Locating Faulty NE
Methods of Locating Internal Fault Points on CN
Echo Fault Handling
Echo Principles
Working Principles of Echo Canceller
Principle Overview
Echo Directions
Echo-Suppression Implementation
Configuring Echo Cancellation
Configuring Echo Cancellation by Adopting Resource Pool
System Implementation
Fault Processing
Monolog Fault Handling
Both-Way Silence Fault Handling
Noise Fault Handling
Cross-Talking Fault Handling
Instance Analysis
Analyzing Instance 1
Analyzing Instance 2
Analyzing Instance 3
Analyzing Instance 4
Figure
Table
Index
About This Manual
Purpose
First of all, thank you for choosing the ZXWN wireless core network system of ZTE Corporation!
The ZXWN system is a 3G mobile communication system developed on the basis of UMTS technology. It provides powerful service processing capability in both the CS domain and the PS domain and offers richer service content. Compared with GSM, ZXWN provides a wider range of telecommunication services and can transmit voice, data, graphics and other multimedia services. In addition, ZXWN offers higher speed and a higher resource utilization rate. The ZXWN wireless core network system supports both 2G and 3G subscriber access, and provides various services related to the 3G core network.
ZXWN MGW Media Gateway provides media control and media flow control functions, and provides transmission resources. ZXWN MGW is fully compatible with 3GPP R4 of June 2003, and is backward compatible with 3GPP R99 of June 2002. ZXWN MGW not only supports the 3GPP R4 networking mode in which the bearer is independent of control, but can also be bound together with ZXWN MSCS as an MSC, supporting the networking mode of 3GPP R99. Besides satisfying the requirements for constructing a common mobile switching network, ZXWN MSCS can also satisfy the requirements for constructing the No.7 signaling network and the mobile intelligent network, and can be adapted to various complicated networking modes of the mobile switching network. This improves the capability of the network to develop continuously.
The purpose of this manual is to provide procedures and guidelines that support the maintenance of ZXWN MGW.
Intended Audience
This document is intended for engineers and technicians who perform maintenance activities on ZXWN MGW.
Prerequisite Skill and Knowledge
To use this document effectively, users should have a general understanding of wireless telecommunications technology. Familiarity with the following is helpful.
• ZXWN MGW system and its various components
• User interfaces on the ZXWN MGW
• ZXWN MGW operating procedures.
What Is in This Manual
This manual contains the following chapters:
Chapter                                                  Summary
Chapter 1, Troubleshooting                               Introduces the troubleshooting background, troubleshooting sequence, troubleshooting methods and types of faults.
Chapter 2, Hardware Faults                               Introduces the hardware faults.
Chapter 3, Clock Faults                                  Introduces the clock faults.
Chapter 4, Interface Faults                              Introduces the interface and service faults.
Chapter 5, OMM System Faults                             Introduces the OMM system faults.
Chapter 6, Interconnection Faults in IP-Bearer Network   Introduces the interconnection faults in the IP bearer network.
Chapter 7, Voice Faults                                  Introduces the methods of handling voice faults occurring in the CN.
Conventions
ZTE documents employ the following typographical conventions.
Typeface       Meaning
Italics        References to other Manuals and documents.
“Quotes”       Links on screens.
Bold           Menus, menu options, function names, input fields, radio button names, check boxes, drop-down lists, dialog box names, window names.
CAPS           Keys on the keyboard and buttons on screens and company name.
Note           Provides additional information about a certain topic.
Mouse operation conventions are listed as follows:
Typeface       Meaning
Click          Refers to clicking the primary mouse button (usually the left mouse button) once.
Double-click   Refers to quickly clicking the primary mouse button (usually the left mouse button) twice.
Right-click    Refers to clicking the secondary mouse button (usually the right mouse button) once.
Confidential and Proprietary Information of ZTE CORPORATION
Declaration of RoHS Compliance
To minimize the environmental impact and take more responsibility for the earth we live on, this document shall serve as formal declaration that ZXWN MGW manufactured by ZTE CORPORATION is in compliance with the Directive 2002/95/EC of the European Parliament - RoHS (Restriction of Hazardous Substances) with respect to the following substances:
• Lead (Pb)
• Mercury (Hg)
• Cadmium (Cd)
• Hexavalent Chromium (Cr (VI))
• PolyBrominated Biphenyls (PBB’s)
• PolyBrominated Diphenyl Ethers (PBDE’s)
…
The ZXWN MGW manufactured by ZTE CORPORATION meets the requirements of EU 2002/95/EC; however, some assemblies are customized to client specifications. Addition of specialized, customer-specified materials or processes which do not meet the requirements of EU 2002/95/EC may negate RoHS compliance of the assembly. To guarantee compliance of the assembly, the need for compliant product must be communicated to ZTE CORPORATION in written form. This declaration is issued based on our current level of knowledge. Since conditions of use are outside our control, ZTE CORPORATION makes no warranties, express or implied, and assumes no liability in connection with the use of this information.
Chapter 1 Troubleshooting
Table of Contents
Basic Requirements for the Maintenance Personnel
Troubleshooting Flow
Troubleshooting Principles
Troubleshooting Methods
Basic Requirements for the Maintenance Personnel
Knowledge
• Be familiar with communication fundamentals, such as mobile communication principles, ATM principles and soft-switching principles.
• Be familiar with the related signaling protocols, such as BICC No.7 signaling and H.248 signaling.
• Be familiar with the related international technical regulations.
• Understand billing principles and flows.
• Understand the basics of computer networks, including Ethernet, TCP/IP, Client/Server architecture and Oracle database.
• Be familiar with the ZXWN MGW system as a product, including its functional structure, call flow, and service flow.
Networking and Running Environment
• Know the hardware architecture and performance of the ZXWN MGW system very well.
• Know the inter-module routing and the routing between modules and offices in the ZXWN MGW system very well.
• Know the signaling and protocols of the ZXWN MGW system and the networking equipment very well.
• Be familiar with the network architecture and channel allocation of the relevant transmission equipment.
Operations
• Master daily operations of the ZXWN MGW system.
• Know which operations interrupt part of or all services.
• Know which operations can damage the equipment.
• Know which operations significantly affect billing.
• Know which operations lead to subscriber complaints.
• Know the emergency and backup measures.
Instruments and Meters
The maintenance personnel of the ZXWN MGW system must be familiar with how to use instruments and meters to locate a fault. The common instruments and meters include the multimeter and the SS7 analyzer.
Troubleshooting Flow
The troubleshooting flow of the ZXWN MGW system is shown in Figure 1.
FIGURE 1 TROUBLESHOOTING FLOW
The flow above describes how to locate and process a non-emergency fault. For an emergency fault, inform the local ZTE office or call the ZTE Global Customer Support Center as early as possible, and operate the equipment under the instruction of ZTE technical support personnel.
Troubleshooting Principles
When performing the troubleshooting, follow the troubleshooting
flow and the basic principles: check, ask, think and act.
Check
Check the phenomena of the fault first, that is, check which part of
the equipment is faulty and which alarm is generated, and check
the severity degree and the harm caused.
For the phenomena checking, the system provides various tools,
such as the performance statistics, the signaling tracing, the alarm
query, the log query and failure observation.
Ask
After checking the phenomena, the maintenance personnel must ask the on-site personnel involved in each phase about possible causes of the fault, for example, whether data has been modified, a file has been deleted, a circuit board has been replaced, or a power cut, lightning strike or incorrect operation has occurred.
Think
Combine the phenomena checked on site and the answers obtained with communication knowledge, then think through and analyze the possible reasons for the fault and make the correct judgment.
Act
Find out the fault point based on the above three steps, solve and
eliminate the fault by modifying data, replacing the circuit board,
and so on.
When operating the equipment, the maintenance personnel must
consider whether it is the busy hour, and the potential consequence
of operating at the busy hour. As for uncertain problems, the personnel must consult ZTE technical support personnel.
Troubleshooting Methods
Analyzing Fault Information
Fault information analysis is mainly used to judge the range and category of a fault. In the initial phase of fault processing, it provides the evidence for narrowing the fault range and preliminarily locating the fault. Personnel with rich maintenance experience can even locate the fault directly.
Collecting and analyzing fault information also plays a vital role in processing other kinds of faults, especially trunk faults, because a trunk connects to the transmission system and involves signaling coordination. Such fault information includes whether the transmission system runs normally, and whether the opposite-end office has changed the data or the definitions of some signaling parameters.
Analyzing Alarm Information
Alarm information is the information that the ZXWN MGW alarm system outputs as sound, light or screen output. It is concise yet comprehensive, covering the hardware, links, trunks, billing, CPU load and other parts of the ZXWN MGW, and is one of the important pieces of evidence for fault analysis and location.
Alarm information analysis is mainly used to find the specific location or cause of a fault. Because the alarm information output by the Fault Management System of ZXWN MGW is comprehensive, it is usually used to locate the fault cause directly, or together with other methods. It is one of the main methods of fault analysis.
The alarms generated by ZXWN MGW have high location accuracy; for example, they can test and locate each circuit of the signaling system. When the alarm station generates several alarm messages, process the fault alarms first, in order of alarm level starting from the highest, and then process the event alarms.
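The processing order described above (fault alarms first, by level, then event alarms) can be sketched as follows. This is a minimal Python illustration and not part of the ZXWN MGW software: the Alarm record, its field names, the "fault"/"event" labels, and the convention that level 1 is the highest level are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    code: int
    kind: str    # "fault" or "event" (assumed labels)
    level: int   # assumed convention: 1 is the highest (most severe) level
    text: str


def processing_order(alarms):
    """Fault alarms first, highest level first; event alarms afterwards."""
    return sorted(alarms, key=lambda a: (a.kind != "fault", a.level))


# Hypothetical alarm messages as they might be read from an alarm station.
alarms = [
    Alarm(3, "event", 4, "operator login"),
    Alarm(1, "fault", 2, "signaling link unavailable"),
    Alarm(2, "fault", 1, "board offline"),
]

ordered = processing_order(alarms)
for alarm in ordered:
    print(alarm.code, alarm.kind, alarm.level, alarm.text)
```

With the sample data, the board-offline fault is handled first, then the link fault, and the event alarm last.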
Indicators Status
Every board of the ZXWN MGW is equipped with running and status indicators, and some boards are also equipped with function or feature indicators. Board indicators reflect the working status of the board, and most of them can also reflect the status of the link, optical channel, node, channel, active/standby servers and so on, which makes them one of the important bases for fault analysis and location.
Indicator status analysis is used to quickly find the fault position or cause in preparation for further processing. Because indicators provide relatively little information, this method is usually used together with alarm analysis.
Signaling Tracing
The signaling tracing plays an important role in analyzing the failure cause of the subscriber call connection and inter-office signaling coordination. The cause for the call failure can be obtained
from results of the signaling tracing, which is helpful for the subsequent analysis. ZXWN MGW provides a lot of signaling tracing
methods.
Log Querying
Because the data configuration of ZXWN MGW is complex, incomplete configuration often causes faults. To locate such faults quickly, query the data configuration operations performed by the maintenance personnel. ZXWN MGW provides the log querying function, which records the operators' operations. When a problem is related to the opposite-end office, also inquire about the data modifications made by that office, in addition to querying the log information of the local office.
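As an illustration of how queried log records can be narrowed down, the following Python sketch keeps only the operations performed shortly before the fault was first observed. It assumes the operation log has been exported into a list of records; the field names (time, operator, operation), the sample entries and the 24-hour window are hypothetical and do not describe the actual ZXWN MGW log format.

```python
from datetime import datetime, timedelta


def operations_before_fault(log_entries, fault_time, window_hours=24):
    """Return entries logged within window_hours before fault_time, newest first."""
    start = fault_time - timedelta(hours=window_hours)
    recent = [e for e in log_entries if start <= e["time"] <= fault_time]
    return sorted(recent, key=lambda e: e["time"], reverse=True)


# Hypothetical exported log records; the field names are assumptions.
log_entries = [
    {"time": datetime(2009, 4, 18, 9, 0), "operator": "admin", "operation": "modify trunk group"},
    {"time": datetime(2009, 4, 20, 1, 30), "operator": "ops1", "operation": "modify signaling link"},
    {"time": datetime(2009, 4, 20, 3, 10), "operator": "ops2", "operation": "delete route"},
]
fault_time = datetime(2009, 4, 20, 4, 0)

suspects = operations_before_fault(log_entries, fault_time)
for entry in suspects:
    print(entry["time"], entry["operator"], entry["operation"])
```

Listing the newest operation first reflects the usual working assumption that the most recent data change is the first one to review.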
Test and Self-Loop
• Test
With the aid of instruments or testing software, test the relevant technical parameters of the subscriber lines, transmission channels and trunk equipment that are suspected to be faulty, and judge from the test results whether the equipment is faulty or about to become faulty.
In addition, ZXWN MGW supports testing a CPU on a board instantly, as well as scheduling tasks to test CPUs periodically or in batches.
• Self-loop
Self-loop refers to testing the transmission equipment or transmission channel by self-transmitting and self-receiving, in hardware or software mode, to judge whether the transmission equipment, the transmission channel, the service status and the signaling coordination are normal. From these results, confirm whether the corresponding hardware and the software parameter settings are normal. This method is mostly used to locate transmission problems and to judge the correctness of the trunk parameter settings.
When locating a transmission-related fault, test is often used together with self-loop. Self-loop can be divided into hardware self-loop and software self-loop. Software self-loop is simple to operate and flexible to use, but it is less reliable than hardware self-loop. In addition, during office commissioning and trunk expansion, the trunk self-loop of ZXWN MGW is often used to judge the correctness of the trunk parameter settings of the local office, the outgoing routing data configuration, and so on.
Caution:
Cancel the software self-loop once the troubleshooting is completed. Maintenance personnel are advised to form the habit of recording every self-loop they set, so that none is left in place by mistake.
Unplugging/Plugging
When the circuit board is faulty, eliminate such faults as poor
contact or processor exception by plugging/unplugging the circuit
board and external interface connector.
During the course of plugging/unplugging the board, the operation regulations of plugging/unplugging the board must be strictly
complied with; otherwise, the board and other components may
be damaged.
Comparison and Interchange
• Comparison
Comparison refers to comparing the faulty component or fault phenomenon with a normal one and analyzing the differences to locate the fault. This method is applicable when there is a single fault.
• Interchange
Interchange refers to swapping a suspected faulty component with a normal one (such as a board or a fiber) when the fault range or the faulty component still cannot be confirmed after replacing components with spare parts. Then compare the new running status with the previous status to judge the fault range or location.
Caution:
Take all precautionary measures when interchanging boards, because board interchange is risky and can easily introduce new problems. Interchange should normally be applied only to components such as optical fibers and E1 lines. Perform board replacement or board interchange when the traffic is low, for example, 0:00 ~ 6:00.
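The comparison method can also be applied to configuration data, for example by comparing the settings of a faulty trunk with those of a trunk that works normally. The following Python sketch shows the idea only; the parameter names and values are hypothetical and are not taken from the ZXWN MGW data model.

```python
def diff_settings(normal, faulty):
    """Return {parameter: (normal_value, faulty_value)} for every parameter that differs."""
    differences = {}
    for name in sorted(set(normal) | set(faulty)):
        if normal.get(name) != faulty.get(name):
            differences[name] = (normal.get(name), faulty.get(name))
    return differences


# Hypothetical settings of a working trunk and of a faulty trunk.
normal_trunk = {"protocol": "ISUP", "timeslot": 16, "cic_start": 1}
faulty_trunk = {"protocol": "ISUP", "timeslot": 1, "cic_start": 1}

result = diff_settings(normal_trunk, faulty_trunk)
print(result)
```

A parameter that is missing on one side is reported with None for that side, so an omitted setting shows up as a difference as well.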
Configuration Modification
The configuration items that can be modified include the timeslot, the board position, the board parameters, the number selector, the number properties, the trunk properties and the protocol type. This method is therefore applicable to eliminating a fault caused by a configuration error after the fault has been located to a single site.
For example, suppose that the signaling link to an office direction is abnormal, which affects the signaling transmitted through the link and the services related to the link. The following configuration data can be modified:
1. Change the signaling route so that the new route does not use the signaling link to this office direction. If the fault disappears, the fault lies in the original route.
2. Change the signaling link group so that the new signaling link group does not use the signaling link to this office direction. If the fault disappears, the fault lies in the original link group.
3. Change the SPB channel used by the signaling so that the signaling does not use the channel on the original SPB. If the fault disappears, the fault is related to the original SPB.
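The reasoning behind the three modifications above is elimination: bypass one element at a time and note which bypass makes the fault disappear. A minimal Python sketch of that reasoning is given below. The function and variable names are hypothetical, and the "observations" stand in for what the maintenance personnel see on site after each modification; the sketch does not change any configuration itself.

```python
def locate_by_substitution(candidates, fault_clears_without):
    """Return the first candidate whose bypass makes the fault disappear, else None.

    candidates: element names, in the order in which they are bypassed.
    fault_clears_without: callable(name) -> bool, True if the fault disappears
    when that element is bypassed (here, an operator observation).
    """
    for name in candidates:
        if fault_clears_without(name):
            return name
    return None


ORDER = ["signaling route", "signaling link group", "SPB channel"]

# Hypothetical observations recorded after each configuration change.
observed = {
    "signaling route": False,
    "signaling link group": True,
    "SPB channel": False,
}

suspect = locate_by_substitution(ORDER, lambda name: observed[name])
print(suspect)
```

If no bypass clears the fault, the function returns None, which corresponds to the case where another method is needed to locate the fault further.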
During version upgrading or capacity expansion, if errors are suspected in the new configuration, the original configuration can be deployed again to locate the fault.
When the fault cannot be located exactly to the board by modifying the timeslot configuration, the replacement method is required to locate the fault further. Therefore, when no spare board is available, this method can be used to preliminarily determine the fault type and to recover the services temporarily by using other service channels or board positions.
Performance Statistics
Performance statistics are mainly used to locate system resource faults.
Performance measurement analysis means creating a performance measurement task through the ZXWN MGW performance statistics function and analyzing the results to determine the possible cause and range of the fault. Its results are usually important for fault handling.
Used together with the signaling tracing tool, the performance statistics tool plays an important role in finding abnormal inter-office signaling coordination and trunk parameter setting errors. Therefore, the maintenance personnel should master it.
Configuring Data
The configured data determines the working and coordinating mode of the system. Generally, data modification is not allowed. However, special conditions, such as an abrupt change of the external environment or an incorrect operation, may damage or modify the configuration data of the equipment and interrupt services.
If the fault has been located to the local system, query and analyze the current configuration data, and identify any incorrect operation performed through the network management system by checking its user operation logs.
Only maintenance personnel who have rich experience and know the equipment well should analyze the configuration data.
Experience Processing
Special conditions such as an instantaneous power supply abnormality, low voltage or strong external electromagnetic interference can put a board into an abnormal working status. The service may be interrupted, with or without a corresponding alarm, while the configuration data of each board may still be normal.
Experience shows that such faults can be handled effectively by resetting the board, restarting the equipment after power-off, synchronizing the data again or switching the MP.
Do not use this method frequently, because it cannot fully reveal the cause of the fault. Except in an emergency, first use the methods described earlier or ask for technical support through the normal channel to locate the fault, so that hidden external or internal troubles of the equipment are eliminated.
Chapter 2 Hardware Faults
Table of Contents
Faults in Board Configuration
Faults in Board Running Process
Handling Changeover Exceptions
Handling CPU Overload
Faults in Board Configuration
This section describes how to handle the faults occurring during
board configuration.
Handling Fault Occurring during Board Configuration
Background
ZXWN MGW adopts an integrated hardware platform. For version loading, boards fall into two types: OMP boards and non-OMP boards.
When the OMP board starts, it requests the profile from the OMM server through the IP address that is allocated during serial port debugging. If the profile can be obtained, the OMP board compares it with the local file. If they are consistent, the OMP board invokes the version from the local hard disk for starting. Otherwise, it obtains the version file from the OMM server for starting, and modifies the local profile at the same time. If the profile is unavailable, the OMP board starts according to the local profile.
A non-OMP board requests the version file from the active OMP instead of from the OMM server for starting. If the non-OMP board obtains the version file from the OMP successfully, it uses the obtained version file for starting. Otherwise, it invokes the version file stored on its own FLASH.
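The start-up decisions described above can be summarized in a short Python sketch. It only restates the decision logic of this section and is not the board software: the function names, the use of None for "could not be obtained", and the returned description strings are assumptions made for illustration.

```python
from typing import Optional


def omp_boot_source(omm_profile: Optional[str], local_profile: str) -> str:
    """Decide where an OMP board takes its version from when it starts."""
    if omm_profile is None:
        # The profile could not be obtained from the OMM server.
        return "start according to the local profile"
    if omm_profile == local_profile:
        return "version on the local hard disk"
    # Profiles differ: fetch the version and update the local profile.
    return "version file from the OMM server, local profile updated"


def non_omp_boot_source(version_from_omp: Optional[bytes]) -> str:
    """Decide where a non-OMP board takes its version from when it starts."""
    if version_from_omp is not None:
        return "version file obtained from the active OMP"
    return "version file stored on the board's own FLASH"


print(omp_boot_source("V3.07.40", "V3.07.40"))
print(omp_boot_source("V3.07.40", "V3.07.30"))
print(omp_boot_source(None, "V3.07.30"))
print(non_omp_boot_source(None))
```

Reading the logic this way helps when a board starts with an unexpected version: an OMP board that cannot reach the OMM server, or a non-OMP board that cannot reach the active OMP, falls back to whatever is stored locally.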
Common Phenomena
• Board indicators fail to illuminate normally.
• The OMM alarm system prompts that the board is offline or unstable.
Common Causes
During board configuration, the common fault causes are as follows.
1. Board fault occurs in the BOOT phase.
Many board start failures result from problems in hardware such as the FLASH, RAM, hard disk, and some sub-cards. Poor contact between the backplane and boards may also cause such problems. In these cases, the faulty device can be located through the serial port printing. To locate the device fault, replace the related devices with spare parts to exclude the devices one by one.
Generally, all devices complete self-test in the production phase. However, faults may occur in transit. Some faults can be handled manually, for example, by installing the hard disk or memory module again. But in most cases, the board needs to be sent back for repair.
2. Slot or backplane error occurs in the BOOT phase.
Some boards fail to start due to a slot or backplane problem. To locate it, plug a normal board into the faulty slot. If the board still fails to start, there is a problem in the slot or backplane. On site, check whether the configuration conforms to the platform specifications for the slots, and whether the backplane of this slot is physically damaged.
If the fault results from a physical fault of the slot or backplane, do not use this slot, to avoid the fault on site.
3. An error occurs while downloading a version.
Check whether the shelf control-plane connection line corresponding to this slot is normal. Usually, none of the boards in the shelf can start if there is a problem in the connection line. Some faulty chips on the UIM board may also cause some slots in the shelf to fail to start.
4. BSP self-test error occurs.
Generally, if there is a problem in the equipment hardware, view the alarms on the background to find out the specific cause. In some exceptional cases, the LED indicator of the board is physically faulty, but the board works normally.
To handle a fault occurring in this phase, send the board back for repair.
5. Errors occur in other subsequent phases.
Usually the hardware works normally in these phases, and the fault is due to a software configuration error. While handling the fault, first check through the version management alarms whether the software version and the related chip version are correct. Then check whether the slot restriction, HW distribution, and port configuration are reasonable. Finally check whether the active/standby configuration or load-sharing configuration is correct.
Flow Diagram
Figure 2 shows the flow of handling the fault occurring during board
configuration.
FIGURE 2 HANDLING THE FAULT OCCURRING DURING BOARD CONFIGURATION
Fault Handling Procedure
1. Check whether version loading is finished for all the boards
in the Version Management tab of the Professional Maintenance window of the NetNumen M30 window. Check
whether the OMPCFGx.INI profile (x is the office number) exists in the folder zxwomcs\ums-svr of the installation directory
on the OMM server. Check whether the loaded version file
exists in the zxwomcs\ums-svr\cnvmVerFile directory.
If there is no OMPCFGx.INI file in the specified directory, create
the OMP boot file in the Version Management window. If
there is no loaded version file in the specified directory, load
the version again.
2. On the File Management window of Local Maintenance Tool,
check whether the version file exists on the hard disk (named
as /DOC0) and memory (named as /IDE0) of OMP board.
If the version file does not exist in the specified directory
(/DOC0/VER and /IDE0/RELEVER) of hard disk on the OMP
board, load the version again.
3. On the File Management interface of Local Maintenance Tool,
check whether version files on other boards are correct. In
general, version files exist in the /FLASH0/VER directory.
If you fail to query the version file with the file management function of Local Maintenance Tool, check the running status of the board and follow the board troubleshooting flow. If the queried version file is wrong, load the correct version file for this board in the Version Management tab.
4. Check the working status of UIM and CHUB boards. If they are
faulty, the communication between OMP and other boards will
be interrupted. As a result, other boards cannot load version
file from OMP.
If UIM and CHUB boards work abnormally, check their hardware configuration and version loading to make sure that they
run normally.
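The file checks in step 1 could be scripted on the OMM server. The following is a minimal sketch in Python, assuming the operator supplies the installation directory and the office number. The function name and the keys of the returned dictionary are hypothetical and are not part of NetNumen M30 or any ZTE tool.

```python
from pathlib import Path


def check_omm_version_files(install_dir, office_number):
    """Report whether the files named in step 1 are present on the OMM server.

    install_dir and office_number are supplied by the operator. This helper
    is an illustration only; it is not a ZTE utility.
    """
    svr_dir = Path(install_dir) / "zxwomcs" / "ums-svr"
    boot_file = svr_dir / f"OMPCFG{office_number}.INI"
    version_dir = svr_dir / "cnvmVerFile"

    # Assumption: any regular file in cnvmVerFile counts as a loaded version file.
    has_version = version_dir.is_dir() and any(
        p.is_file() for p in version_dir.iterdir()
    )
    return {
        "omp_boot_file": boot_file.is_file(),
        "loaded_version_file": has_version,
    }
```

A False value for omp_boot_file corresponds to creating the OMP boot file in the Version Management window, and a False value for loaded_version_file corresponds to loading the version again.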
An Instance for Handling Board
Version Loading Failure
Topic
Handling board version loading failure
Symptom
By default, the ZXWN MGW can upload version files to the server when version files are loaded in batch, but it cannot complete foreground switching. No prompt is displayed during this process.
Solution
1. Check whether the communication between the background server and the foreground OMP is normal. The background can ping the foreground. Check CPU1 on the OMP board. The OMP board runs normally, and the RUN2 indicator on its panel is in normal status.
2. To confirm that the OMP gets the version files to be loaded, check whether the three files (*.BIN, *.RBF and *.ini) exist in the \ZXWN-OMCS\zxwomcs\ums-svr directory on the OMM server, and then restart the OMP. Check whether the OMP configuration is correct and whether the OMP gets the version from the server. The results meet the requirements. After restarting the OMP, log in to the OMP from the hyper-terminal, and check the working status of the board with the SCSSHowMcmInfo command. The board is normal. Therefore, the board hardware is not faulty.
3. Check the OMP board again, and find that the RPU fails to start. Check whether the background is configured with the capacity of office data. Restart the OMP. The problem still remains.
4. The problem probably lies in the FTP settings. On the Professional Maintenance > Version Management tab of the NetNumen M30 window, click the Get OMP’s OMC FTP Address button on the sub-tool. The server IP address for version downloading is 127.0.0.1. Modify it to the server’s intranet IP address. After that, version loading is normal.
Summary
The FTP address of OMM server must be configured correctly for
OMP board to get version files during the version-loading process.
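The misconfiguration found in this instance (a loopback address configured as the FTP server address) can be detected with a simple check. The sketch below uses Python's standard ipaddress module. The function name is hypothetical, and the check covers only the address itself, not whether the FTP service is reachable.

```python
import ipaddress


def ftp_address_problem(address):
    """Return a description of the problem with the address, or None if none is found."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return "not a valid IP address"
    if ip.is_loopback:
        # 127.0.0.1 refers to the OMP itself when used from the OMP side.
        return "loopback address; the OMP cannot reach the OMM server through it"
    if ip.is_unspecified:
        return "unspecified address"
    return None
```

For example, ftp_address_problem("127.0.0.1") reports the loopback problem described above, while an intranet address such as 192.168.1.10 (an illustrative value) returns None.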
Faults in Board Running
Process
This section describes how to handle the faults occurring during
the running process of boards.
Fault Handling Procedure
Common Fault Phenomena
• Indicators on board cannot be illuminated normally.
• OMM fault management system prompts that a board is offline or unstable.
• All board-related services are interrupted.
Common Causes of Faults
• Malfunctioning board’s own hardware fault
• Upper-level board of the board that reports an alarm is faulty.
• Poor contact between board and its slot
• Backplane is faulty.
• Port connection of the board is faulty.
• UIM board fault
• CHUB board fault
Flow
Figure 3 shows the flow of handling the common faults occurring during the running process of boards.
FIGURE 3 HANDLING COMMON FAULTS OF BOARDS
Precautions
Board changeover, reset, and replacement affect the system. These operations must be performed under the guidance of ZTE technical support personnel. Furthermore, the board replacement procedure must strictly conform to the operating specifications. Refer to ZXWN MGW Board Replace for details.
Handling UIM Board Hardware Faults
Background
The UIM board manages the resource shelf and implements the Ethernet Level-2 switching and the circuit domain timeslot multiplexing/exchanging in the resource shelf. The UIM board also provides the external interfaces of the resource shelf, including the packet data interfaces (GE optical interfaces) connecting with the core switching unit, the circuit domain interfaces (optical interfaces) connecting with the circuit switching units, and the control plane data Ethernet interfaces (4 FEs) of the distributed processing platform. The UIM board provides the control plane, the media plane and the HW resources for this resource shelf. The UIM board can be divided into several types, as described in Table 1.
TABLE 1 UIM TYPES
Board | Name | Function
UIMC | Universal Interface Module of Control | Having the GCX sub-card, without the T network and the media plane, and only providing the control plane resources
UIMP | Universal Interface Module of Packet | Having the GXS sub-card, without the T network resources, and providing the control plane resources and the media plane resources
UIMT | Universal Interface Module of TSNB | Having the TDM optical interface, introducing the TSNB 8K timeslot resources to this resource shelf, and providing the big T network resources and the control plane resources
UIMU | Universal Interface Module of BUSN | Having the small T network and the GXS sub-card, and providing the control plane, media plane and inner-shelf 4K timeslot switching resources
The UIM cable has the following types:
• Clock connection line: adopts the ZTE dedicated clock cable, connecting the RCLKG to the UIM. It sends the CLKG clock signal to the UIM, and then the UIM provides the clock signal to each board in the shelf.
• Control plane connection line: adopts the ZTE dedicated cable in the case of an office with multiple shelves, connecting the RUIM (UIM rear board) to the RCHB (CHUB rear board); adopts the straight Ethernet cable in the case of an office with two shelves, connecting from the RUIM of the main control shelf to that of non-main control shelf.
• TDM connection line: used in the case of an office with the big T network, and adopts the fiber, connecting from the TFI of the core switching shelf in the circuit domain to the UIMT of the resource shelf.
• Media plane connection line: used in the case of an office with Level-1 switching shelf, and adopts the fiber, connecting from the GLIQV board of the Level-1 switching shelf to the UIM.
Fault Phenomenon
• The network management alarm system reports that the OSS interruption occurs to several or all boards, and the network interface of the control plane is blocked.
• The ALM indicator on the UIM board gives alarms. And the network management system reports the 8K and 16M clock input loss.
• Plugging/unplugging the board except the UIM causes alarms that other online boards are abnormal.
• There are whole-shelf HW loopback detection alarms in resource shelf.
Flow
The troubleshooting flow of the UIM fault is shown in Figure 4.
FIGURE 4 TROUBLESHOOTING FLOW OF UIM FAULT
Solution
1. When a fault occurs on the UIM board, checking the indicators on the board is a relatively direct method of locating it. Sometimes, it is the only method. Table 2 lists the meanings of the indicators on the UIM board.
The normal running status is that the RUN indicator flashes at 1 Hz, and both the ALM indicator and the ENUM indicator are constantly off.
• If the RUN, ALM, and ENUM indicators are on, usually the board has broken down.
• If the ENUM indicator is on, the board is not plugged well. Check whether the board is plugged well, and whether the extractor is closed.
• If the RUN indicator stays off, a hardware fault has occurred on the board. Replace the board.
• If the RUN indicator and ALM indicator are on constantly, there is a hardware logic problem in the board. The chips on the board probably need to be updated, which can be solved by replacing the board on site.
• If the RUN indicator flashes quickly, the board is loading the version information. If the board cannot work normally, check whether the hardware configuration and loaded version are correct.
TABLE 2 INDICATORS ON UIM
Indicator | Description
RUN | Run indicator. The board runs normally: flashing at 1 Hz.
ALM | Alarm indicator
ACT | Active/standby indicator. On: active; off: standby
ENUM | Board plugging/unplugging indicator. On: the board is not well plugged
ACT-P | Active indicator of the board packet domain. On: the board packet domain is active
ACT-T | Active indicator of the board circuit domain. On: the board circuit domain is active
ACT1 | Optical interface 1 is active when plugging the GXS sub-card.
ACT2 | Optical interface 2 is active when plugging the GXS sub-card.
LINK1 | On: the FE-C1 port on the rear board is link up
LINK2 | On: the FE-C2 port on the rear board is link up
LINK3 | On: the FE-C3 port on the rear board is link up
LINK4 | On: the FE-C4 port on the rear board is link up
SD1 | There is signal in the optical interface 1 when plugging the GXS sub-card.
SD2 | There is signal in the optical interface 2 when plugging the GXS sub-card.
SD3 | There is signal in the first TDM optical interface when using the TDM optical interface.
SD4 | There is signal in the second TDM optical interface when using the TDM optical interface.
2. Switch over the active/standby UIM to check whether the fault disappears. It is recommended not to plug or unplug the possibly faulty board at this stage, because the fault may not appear again afterwards. The faulty board can be plugged and unplugged if the changeover fails. If the fault still exists, replace the faulty UIM board.
3. If there are whole-shelf HW loopback detection alarms in the resource shelf, the T network related part is abnormal. Check such items as the small T network configuration of the UIMU board, whether the physical connection between the UIMT and the TFI is consistent with the UIM-COMM-TFI configuration port, the physical connection, and the connection indicator.
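The indicator rules in step 1 can be summarized as a small lookup. The following Python sketch is illustrative only: the state names ("flash_1hz", "flash_fast", "on", "off"), the function name, and the order in which the rules are evaluated are assumptions of this sketch, not something the manual defines.

```python
def diagnose_uim(run, alm, enum):
    """Map UIM indicator states to the guidance given in step 1.

    run is one of "flash_1hz", "flash_fast", "on", "off";
    alm and enum are "on" or "off". These state names are illustrative.
    """
    if run == "on" and alm == "on" and enum == "on":
        return "board has probably broken down"
    if enum == "on":
        return "board not plugged in properly; check seating and extractor"
    if run == "off":
        return "hardware fault; replace the board"
    if run == "on" and alm == "on":
        return "hardware logic problem; replace the board on site"
    if run == "flash_fast":
        return ("loading version; if the board does not recover, "
                "check hardware configuration and loaded version")
    if run == "flash_1hz" and alm == "off":
        return "normal"
    # Combinations that step 1 does not describe are left to steps 2 and 3.
    return "not covered by the indicator rules"
```

For example, diagnose_uim("flash_1hz", "off", "off") returns "normal", which matches the normal running status described in step 1.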
Handling OMP Board Hardware
Faults
Background
The OMP board is responsible for the operation, maintenance and management of all NEs on the foreground. It receives instructions from the OMM, and reports alarm information, traffic statistics, and the tracing messages specified by the subscribers to the OMM. It is the operation and maintenance center of the system. The OMP is also responsible for managing and distributing the software versions of all boards on the foreground. Each board needs to load the required version file from the OMP in order to start.
The OMP connects to the IP network externally through the OMC2 interface on the RMPB rear board, implementing the communication with the OMM server.
Fault Phenomenon
• The RUN indicator on the panel of the OMP board cannot flash at 1 Hz normally, continuously alternating between flashing quickly and turning off.
• The OMM alarm system reports that the communication between the OMP board and the foreground fails, unable to transmit any data to the foreground.
• The NE maintenance tools of the foreground, such as the signaling tracing and failure observation, cannot work normally.
• The OMP board is in a deadlock.
Flow
The troubleshooting flow of the OMP fault is shown in Figure 5.
FIGURE 5 TROUBLESHOOTING FLOW OF OMP FAULT
Solution
1. Observe the status of the RUN2 indicator on the OMP board to judge whether the board works normally. There are two CPUs on the OMP board. CPU1 belongs to the RPU module, while CPU2 belongs to the OMP module. In the normal working status, the RUN2 indicator of the OMP module flashes at 1 Hz. Otherwise, there is a fault in the OMP module.
• If the ENUM2 indicator is on, the board is not plugged well. Check whether the board is plugged well, and whether the extractor is closed.
• If the RUN2 indicator stays off, a hardware fault has occurred in the board. Replace the board.
• If both the RUN2 indicator and ALM2 indicator stay on, there is a hardware logic problem in the board. The onboard chips probably need to be updated, which can be solved by replacing the board on site.
• If the RUN2 indicator flashes quickly, the board is loading the version information. If the board cannot work properly, check whether the hardware configuration and loaded version are correct.
• If the HD2 indicator stays on, the board is reading the hard disk all the time. There is probably a fault in the hard disk of the board. Replace the board.
2. On the OMM client, perform the active/standby changeover for the OMP board. If the changeover fails, press and hold the EXCH key on the board for a while to implement hardware changeover.
3. If the fault still remains after changeover, reset the OMP board. Reset the board on the OMM client. If the reset is unsuccessful, press the Reset key on the board with a tool to reset the board forcefully. If the fault still remains after resetting, replace the OMP board to determine whether it is a hardware fault.
4. If the fault still remains, check the pins on the backplane to eliminate the contact problem. Check whether the DIP switches on the backplane are correct. The DIP switches of the shelf where the OMP is located are fixed to shelf No. 2 in rack No. 1.
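Steps 2 to 4 form an escalation sequence in which the next action depends on whether the previous operation itself failed or the fault simply remained. A minimal Python sketch of that ordering is shown below. The action and outcome names are hypothetical labels chosen for this illustration, and the sketch only documents the order of actions; it does not perform any operation on the equipment.

```python
# (action just attempted, outcome) -> next action suggested by steps 2 to 4.
# outcome is "op_failed" (the operation itself did not succeed) or
# "fault_remains" (the operation succeeded but the fault is still present).
NEXT_ACTION = {
    ("client_changeover", "op_failed"): "exch_key_changeover",
    ("client_changeover", "fault_remains"): "client_reset",
    ("exch_key_changeover", "fault_remains"): "client_reset",
    ("client_reset", "op_failed"): "reset_key",
    ("client_reset", "fault_remains"): "replace_board",
    ("reset_key", "fault_remains"): "replace_board",
    ("replace_board", "fault_remains"): "check_backplane_pins_and_dip",
}


def next_omp_action(action, outcome):
    """Return the next action, or None when the steps give no further action."""
    return NEXT_ACTION.get((action, outcome))
```

For example, next_omp_action("client_changeover", "op_failed") returns "exch_key_changeover", which corresponds to pressing the EXCH key when the changeover from the OMM client fails.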
Handling SMP Board Hardware
Faults
Background
The SMP board processes the MTP3 and its upper-layer protocols,
including the CC, MM, SCCP, BSSAP, BSSAP+, RANAP and H.248
protocol. It is the center of all services of the control system.
According to different functions, the SMP board is divided into the signaling SMP and service SMP.
• The signaling SMP is responsible for processing signaling, such as the SIGTRAN protocol.
• The service SMP is responsible for various upper-level services, such as the call control and the mobility management.
In addition, the SMP generates the original CDR that is forwarded
by the USI to the billing server through the internal bus.
Without external interfaces, the SMP does not need the rear board
and the connection lines.
Fault Phenomenon
• There are OSS communication interruption alarms in the network management.
• The RUN indicator of the SMP board cannot flash at 1 Hz, continuously alternating between flashing quickly and turning off.
• Some signaling links are unavailable.
• The services of some subscribers are interrupted.
Solution
Both the SMP and the OMP adopt the MPx86 physical board. For the fault handling procedures, refer to “Solution” in Handling OMP Board Hardware Faults.
Handling SIPI Board Hardware Fault
Background
The SIPI board is responsible for providing the bottom-layer IP
interface for SIGTRAN, and the external IP interface as well.
The RMNIC rear board provides the external interface of the SIPI,
connecting to the IP network with the network cable, and interconnecting with the adjacent office interface.
Fault Phenomenon
• The fault management system on the OMM client prompts that the association is interrupted.
• The RUN indicator on the SIPI board flashes abnormally.
• The office direction of some associations is unreachable.
• Subscriber services are interrupted.
Flow
Figure 6 shows the flow of troubleshooting the SIPI board fault.
FIGURE 6 HANDLING THE SIPI BOARD FAULT
Solution
1. The MNIC does not need the FE port when serving as the SIPI
board. At this time, set the number of FE ports as zero.
2. The SIPI board connects the external network with the FE1
network interface, and the internal control plane with the FE3
network interface. Check the network cable to make sure that
the connection of network interfaces is normal.
3. Use the replacement method to locate the fault and, if necessary, replace the SIPI board. Determine whether the fault results from the SIPI board itself or from other boards.
Example: When the OMP is faulty, the SIPI cannot load the
version file while starting, resulting in the activation failure of
the SIPI board. When the UIM is faulty, the SIPI cannot communicate normally.
Handling SPB Board Hardware
Faults
Background
The SPB is responsible for processing the narrowband Signaling
System No.7 (SS7) MTP-2 protocol and its lower-layer protocols.
Each board provides 16 E1s externally for accessing, and extracts
the 8K line clock reference to the CLKG. In addition, the SPB can
be set with the 75 Ω or 120 Ω transmission mode according to
different impedance characteristics of the transmission line.
The SPB connects with other offices through the interfaces on the
RSPB rear board, adopting the ZTE dedicated transmission cable
as the connection line.
Fault Phenomenon
• The board cannot be started, or in a deadlock while running. The alarm indicator on the board flashes.
• The fault management system prompts that the bit errors exist in the link layer, or that the CRC check fails.
• Some or all links on the SPB board are interrupted.
Flow
The troubleshooting flow of the SPB fault is shown in Figure 7.
FIGURE 7 TROUBLESHOOTING FLOW OF SPB BOARD
Solution
1. Check whether the CPU1 is normal.
There are up to four CPUs on the SPB board. CPU1, as the primary CPU, manages the board resources, including turning on each indicator. The remaining three CPUs are secondary CPUs. Therefore, check the status of CPU1 first when the board cannot be started or is in a deadlock while running.
Check the CPUs of the SPB board in the Rackchart Management tab of the Daily Maintenance window of the NetNumen M30 window. The system displays the status of the four CPUs. Check whether CPU1 is normal.
2. The normal running status is indicated as:
• The RUN indicator flashes at 1 Hz.
• The ACT indicator is on constantly.
• The ALM and the ENUM indicators are off constantly.
When all these indicators are on, CPU1 possibly cannot be started. The reasons are as follows:
• If this problem appears when the board is powered on for the first time, generally, the reason is that the boot is burned incorrectly. Burn the software of the BOOT chip again.
• The mother board of the SPB is faulty. Replace the board to eliminate the fault.
3. The board runs normally, but there are bit errors in the E1 link, especially with large traffic.
• Check whether the clocks of the two interconnected environments are synchronous.
• Check whether the configuration of the jumpers and board impedance DIPs is consistent with the cable adopted.
• Check whether the impedance against ground of each environment conforms to the requirements.
For the SPB, the specific method of checking the impedance matching is as follows.
1. Find the four DIP switches (S3-S6) on the board, with each one corresponding to four channels of E1s (No.1~16 channels of E1s from the upper to the lower). The "ON" position represents 75 Ω, while the “OFF” position represents 120 Ω.
2. Find the four DIP switches S2 that indicate the E1 status. From the upper to the lower, one switch corresponds to one group of E1s (having 4 E1s), that is, 16 E1s in total. The "ON" position represents 75 Ω, while the “OFF” position represents 120 Ω.
3. Set the jumpers on the rear board. There are five groups of jumpers (X11-X15) from the upper to the lower. X11 and X15 each correspond to 2 channels of E1s, while the other jumper groups each correspond to 4 channels of E1s. Each channel of E1 is configured with jumpers in the sending and receiving directions. The sending jumper is above the receiving one. When the impedance matching is set to 75 Ω, place the jumper on the sending end only, and place no jumper on the receiving end. When the impedance matching is set to 120 Ω, place no jumper on either the receiving or the sending end.
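The DIP switch and jumper rules above can be expressed as a lookup from E1 channel number to the settings to verify. The Python sketch below makes two assumptions that the text does not state explicitly: that S3 is the uppermost of the S3-S6 switches (so S3 covers channels 1-4), and that the jumper groups are assigned to channels in order from the top (X11 = 1-2, X12 = 3-6, X13 = 7-10, X14 = 11-14, X15 = 15-16). The function name and dictionary keys are hypothetical.

```python
def spb_e1_settings(channel, impedance_ohm):
    """Return the DIP switch and jumper settings for one E1 channel (1-16)."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be in the range 1..16")
    if impedance_ohm not in (75, 120):
        raise ValueError("impedance must be 75 or 120")

    # Assumption: S3 covers channels 1-4, S4 5-8, S5 9-12, S6 13-16.
    dip_switch = f"S{3 + (channel - 1) // 4}"

    # Assumption: X11 = 1-2, X12 = 3-6, X13 = 7-10, X14 = 11-14, X15 = 15-16.
    if channel <= 2:
        jumper_group = "X11"
    elif channel >= 15:
        jumper_group = "X15"
    else:
        jumper_group = f"X{12 + (channel - 3) // 4}"

    is_75 = impedance_ohm == 75
    return {
        "dip_switch": dip_switch,
        "dip_position": "ON" if is_75 else "OFF",  # ON = 75 ohm, OFF = 120 ohm
        "jumper_group": jumper_group,
        "send_jumper": is_75,     # 75 ohm: jumper on the sending end only
        "receive_jumper": False,  # no receiving-end jumper in either mode
    }
```

Comparing the returned settings with the physical board and rear board for each channel in use is one way to confirm that the impedance configuration is consistent with the cable adopted.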
Handling CLKG Board Hardware
Faults
Background
As the clock generation board, the CLKG is responsible for providing the clock for the system, tracing the external reference line
clock, and keeping the NE clock synchronous with the network
clock. In the ZXWN MGW, the clock is used to help establish stable link with the traditional SS7 network. Both the SPB board and
UIM board use the clock.
The CLKG board does not need to download the version from the OMP during activation. It activates directly through the program in the onboard chip. Therefore, version loading problems can be ruled out while troubleshooting the CLKG board.
The CLKG board provides the internal clock interfaces and external
clock tracing interfaces through the RCLKG rear board. The internal clock interface connects to the UIM on the expansion frame
with the ZTE dedicated clock cable, and then UIM sends the clock
to each board that needs the clock. The external clock tracing
interface connects to the clock tracing signal output interfaces of
the external interface boards such as SPB, APBE and SDTB with
the ZTE dedicated clock tracing cable.
Fault Phenomenon
• The RUN indicator on the CLKG board flashes abnormally.
• The red ALM indicators on the UIM, APBE and SPB boards are on.
• The fault management system on the OMM client prompts that the clock signal is lost.
• Faults occur in the SS7 link, such as intermittent disconnection, bit error and interruption.
Flow
Figure 8 shows the flow of troubleshooting the CLKG board fault.
FIGURE 8 TROUBLESHOOTING FLOW OF THE CLKG BOARD
Solution
1. Check the indicators on the CLKG board to confirm the running
status of the CLKG.
2. Check whether the external line of the rear board RCLKG of the
CLKG is connected correctly.
3. Perform active/standby changeover to check the working status of the CLKG board.
4. Check whether the fault lies in the CLKG board by using the
replacement method. If the CLKG board is faulty, replace it.
Handling IPI Board Fault
Background
The IPI board is responsible for providing the IP interface of the media plane externally. The RMNIC rear board provides the IP interface, connecting to the IP network with the network cable, and interconnecting with the adjacent office interface.
Fault Phenomenon
• The fault management system of the OMCS client prompts that there is an RTP trunk alarm.
• The RUN indicator on the IPI board flashes abnormally.
• Subscriber services are interrupted.
Flow
Figure 9 shows the flow of troubleshooting the IPI board fault.
FIGURE 9 HANDLING THE IPI BOARD FAULT
Solution
1. The IPI board adopts the MNIC board. When the FE port number is set as one or two, the MNIC board actually occupies two media plane network interfaces. It will occupy the media plane network interfaces on the adjacent slots.
2. All four network interfaces of the IPI board can be used for external connections. Check the network cable to confirm that the connection of the network interface is normal.
3. Use the replacement method to locate the fault and, if necessary, replace the IPI board. Determine whether the fault results from the IPI board itself or from other boards.
Example: When the OMP is faulty, the IPI cannot load the version file while starting, resulting in the activation failure of the IPI board. When the UIM is faulty, the IPI cannot communicate normally.
Handling DTB/DTEC Board Fault
Background
As the digital trunk interface board, the DTB/DTEC board is used to access the E1/T1 link. It provides 32-channel E1/T1 interfaces externally through the RDTB rear board, and extracts 8k line clock reference to the CLKG. In addition, it can respectively set the 75 Ω and 120 Ω transmission mode based on the impedance characteristics of the transmission line.
Fault Phenomenon
• The fault management system of the OMCS client prompts that there is an E1 trunk alarm.
• The RUN indicator on the DTB/DTEC panel flashes abnormally, and the ALM indicator shows that there is an alarm.
• Subscriber services are interrupted.
Flow
Figure 10 shows the flow of handling the DTB/DTEC board fault.
FIGURE 10 HANDLING THE DTB/DTEC BOARD FAULT
Solution
1. There are 12 4-position DIP switches in total on the DTB/DTEC board. Eight of them (S1-S6, S9, and S12) set the matching impedance of each E1 channel to 75 Ω or 120 Ω. The S7 and S8 4-position DIP switches indicate the receiving matching impedance of the corresponding E1 chip to the CPU. The S10 and S11 4-position DIP switches indicate the long/short wire status of each E1 chip to the CPU.
2. The DTB/DTEC board provides 32 E1s to the outside world. If
there is a problem in a certain E1, locate the fault by creating
the E1 self loop to make sure whether the problem occurs on
local end or opposite end. If it is at local end, check the status
of related sub-unit and trunk data.
3. If all of the E1s of the board are faulty, first try to reset the
board, and check data. If the fault still exists, replace the
board with a new one to eliminate the fault.
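The DIP switch roles listed in step 1 can be kept as a small reference table for on-site checks. The Python sketch below encodes only what step 1 states; the constant and variable names are hypothetical, and the mapping of individual switch positions to individual E1 channels is not given in the text and is therefore not modeled.

```python
IMPEDANCE = "sets the matching impedance of E1 channels (75 or 120 ohm)"
RX_IMPEDANCE_FLAG = "indicates the receiving matching impedance of the E1 chip to the CPU"
LINE_LENGTH_FLAG = "indicates the long/short wire status of the E1 chip to the CPU"

# Role of each 4-position DIP switch on the DTB/DTEC board, as listed in step 1.
DTB_DIP_ROLES = {}
for n in (1, 2, 3, 4, 5, 6, 9, 12):
    DTB_DIP_ROLES[f"S{n}"] = IMPEDANCE
for n in (7, 8):
    DTB_DIP_ROLES[f"S{n}"] = RX_IMPEDANCE_FLAG
for n in (10, 11):
    DTB_DIP_ROLES[f"S{n}"] = LINE_LENGTH_FLAG
```

Looking up DTB_DIP_ROLES["S9"], for example, shows that S9 belongs to the impedance-setting group rather than to the status-indication switches.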
Handling APBE Board Fault
Background
As the Iu-CS interface board, the APBE board provides two optical interfaces to connect the interface at the RNC side. It is responsible for processing the ATM adaptation and the broadband No.7 bottom-layer signaling such as the AAL5-SAR, SSCOP, and SSCF, and forwarding the MTP3B signaling packet to the SMP through the FE interface for processing.
Fault Phenomenon
• A serious alarm appears in the fault management system, and the communication interruption occurs on the APBE unit.
• All of the RNC services connected by the APBE are interrupted.
• The fault management system prompts that the office direction to the RNC is unreachable.
Flow
Figure 11 shows the flow of troubleshooting the APBE fault.
FIGURE 11 HANDLING THE APBE BOARD FAULT
Solution
1. There are two pairs of optical interfaces on the APBE board
panel. Each pair of optical interfaces is configured with sending
and receiving directions. If the sending/receiving directions
are reversed, the board cannot receive the optical signal.
2. The APBE board must be configured with correct ATM address;
otherwise, the RNC office direction is unreachable.
3. Check whether the signaling PVC, voice channel PVC, VCI, VPI, and PATH ID settings are correct.
Handling MRB Board Fault
Background
The MRB is a media resource board in the MGW. It is responsible for providing various media resources for the system, including various signal tones, voice tones, DTMF, MFC, and multi-party talking resources. The MRB board is configured in the BUSN shelf, without backup. The MRB implements different functions by configuring different attributes for the sub-units in the OMM system. For example, the sub-unit represents the tone sub-unit when being configured as TONE, the dual tone multiple-frequency sub-unit when being configured as the DTMF, and the multi-frequency compelled sub-unit when being configured as the MFC.
Fault Phenomenon
• For the call service, the local office cannot provide system tones or provide wrong tones.
• For the second call service, subscriber cannot dial a second call, or an error occurs on the number identification after the second call.
• Three-party call and conference call service cannot be implemented.
• Serious alarm appears in the fault management system, and the communication interruption occurs on the MRB unit.
• The RUN indicator on the panel of the MRB board is abnormal.
Flow
Figure 12 shows the flow of troubleshooting the MRB fault.
FIGURE 12 HANDLING THE MRB BOARD FAULT
Solution
1. There are four DSPs in total on each MRB board. A DSP can be configured with different media resources. According to the fault phenomena, find the corresponding DSP and reset it.
2. The MRB must be configured under the corresponding SMP according to the planning. If the configuration is wrong, the subscribers under the corresponding SMP cannot use the resources in the MRB.
3. Reset the whole board. If the fault still remains, replace the faulty board with a new one to eliminate the fault.
Handling VTCD Board Fault
Background
The VTCD board is the voice codec board in the MGW. It is responsible for coding/decoding voice Iu-UP and AMR, and processing the Nb-UP. The VTCD board is configured in the BUSN shelf, without backup.
Fault Phenomenon
• Serious alarm appears in the fault management system, and the communication interruption occurs on the VTCD unit.
• Such faults as the noise or no voice occur during the call.
• The indicator on the VTCD is abnormal.
Flow
Figure 13 shows the flow of troubleshooting the VTCD fault.
FIGURE 13 HANDLING VTCD BOARD FAULT
Solution
1. There are four DSPs in total on each MRB board. A DSP can be configured with different media resources. According to the fault phenomena, find the corresponding DSP and reset it.
2. The MRB must be configured under the corresponding SMP according to the planning. If the configuration is wrong, the subscribers under the corresponding SMP cannot use the resources
in the MRB.
3. Reset the whole board. If the fault still remains, replace the
faulty board with a new one to eliminate the fault.
Handling Changeover Exceptions
Background
Board changeover is a maintenance operation frequently used for board replacement, version update, and other routine maintenance activities. After a changeover, service personnel should observe the working status of the system. If it works normally, the changeover is successful. If the system cannot work normally, immediately power down the board on which the changeover was performed, and bring the standby board into active status.
Common Causes
The common causes for changeover exceptions are as follows.
• Operations do not conform to the standard procedures.
• The changeover is prohibited by the system running status.
• The standby board is in abnormal status.
Handling
1. Check whether the standby board is normal.
When the standby board is unavailable because it is not inserted, is faulty, or is running abnormally, the system refuses to perform the active/standby changeover.
On the Daily Maintenance window of the NetNumen M30 window, find the NE to be maintained. Open the Rackchart Management tab, select the corresponding module and rack, and find the board on which the active/standby changeover is to be performed. Check the board’s information. If the board is in hybrid, unknown or other statuses, the active/standby changeover cannot be performed for it.
2. Check whether the board version is loaded normally.
On the Professional Maintenance > File Management tab of the NetNumen M30 window, check whether the board version files on the board are correct.
If you fail to query the version files with this method, check the running status of the board, and follow the troubleshooting flow for the board.
If you can query the version files but they are wrong, load the correct ones for this board in Version Management.
3. Check other prohibited conditions.
To ensure the safe operation of switches, the system also denies the changeover operation under special conditions such as large traffic, high CPU occupancy, running scheduled tasks, and data backup. If you force the changeover, serious consequences may arise, such as missing CDRs, call disconnection, and all active/standby boards being reset.
Caution:
• Changeover is an operation with relatively high risks. System data backup must be done in advance.
• It is recommended to perform changeover on the OMP and other important boards between 0:00 and 6:00 am, and to keep a certain interval between two changeover activities.
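The checks described in this section can be collected into a simple pre-check before a changeover is attempted. The following is a minimal sketch only: the status names, the `ChangeoverContext` fields, and the 70% CPU limit are hypothetical illustrations, not values or interfaces of the NetNumen M30 system.

```python
from dataclasses import dataclass

# Status names and the CPU limit are illustrative assumptions, not NetNumen M30 values.
BLOCKING_STANDBY_STATES = {"absent", "faulty", "hybrid", "unknown"}


@dataclass
class ChangeoverContext:
    standby_state: str             # status of the standby board as read by the operator
    version_files_ok: bool         # board version files queried and found correct
    cpu_occupancy_percent: float
    heavy_traffic: bool
    scheduled_task_running: bool
    data_backup_running: bool
    data_backed_up: bool           # the Caution requires a backup in advance


def changeover_blockers(ctx, cpu_limit_percent=70.0):
    """Return the reasons not to start a changeover; an empty list means none were found."""
    reasons = []
    if ctx.standby_state in BLOCKING_STANDBY_STATES:
        reasons.append(f"standby board status is {ctx.standby_state}")
    if not ctx.version_files_ok:
        reasons.append("board version files are missing or wrong")
    if ctx.heavy_traffic:
        reasons.append("traffic is large")
    if ctx.cpu_occupancy_percent > cpu_limit_percent:
        reasons.append("CPU occupancy is high")
    if ctx.scheduled_task_running:
        reasons.append("a scheduled task is running")
    if ctx.data_backup_running:
        reasons.append("a data backup is running")
    if not ctx.data_backed_up:
        reasons.append("system data has not been backed up")
    return reasons
```

An empty result only means that none of the listed conditions was detected; the operator still has to observe the system after the changeover.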
Handling CPU Overload
Background
CPU overload is a major fault in ZXWN MGW. Excessively high CPU usage increases call loss and decreases the call completion ratio. In more serious cases it causes a ZXWN MGW breakdown.
Common Causes
• Excessive traffic
• Interface congestion
• Too-short cycle of performance statistics tasks
• Irrational location area settings
• Non-standard maintenance activities
• Incorrect data settings
Common Fault Phenomena
• The “Load of the CMP board is excessive” alarm appears in the Fault Management system.
• The CPU occupancies of the CMP (service processing module) and SMP (signaling processing module), viewed in the Performance Statistics and Load Statistics windows of the NetNumen M30 window, are too high.
Flow
1. Check traffic.
On ZXWN MSCS, query recent performance statistics reports to learn the traffic within a period of time. Then observe whether the CPU overload is caused by large traffic. To control traffic, refer to the Emergency Handling for Large Traffic section in ZXWN MSCS MSC Server Emergency Fault Handling and System Recovery.
2. Check maintenance operations.
Many maintenance tasks consume considerable CPU resources. Therefore, some operations should be avoided when traffic is large, such as performing bulk modification with commands, displaying excessive command execution results, performing dynamic tracing on too many links, and tracing too much signaling.
3. Check the performance statistics cycle.
During routine maintenance, most traffic statistics tasks are closely associated with calls. Therefore, a too-short statistics task cycle aggravates the CPU load of the system. Currently, a 1-hour cycle is relatively adequate.
4. Check whether the data configuration is correct.
For MSCS, data configuration errors cause CPU overload in the following three ways.
• Unbalanced load-sharing configuration on signaling links and trunks results in some signaling links carrying too large a load. The board responsible for processing this part of the services is then overloaded. In this case, the data-link configuration should be adjusted.
• Unbalanced trunk distribution results in some modules carrying a relatively heavy load. In this case, circuits should be distributed across the modules.
• Incorrect MAP configuration also causes excessive CPU load.
5. Check whether the location area settings are rational.
Split the location areas with irrational settings, or adjust them through BSC/RNC.
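The load-sharing check in step 4 amounts to comparing each signaling link (or module) against the average load. The sketch below assumes that per-link load figures have already been exported from the performance statistics; the function name, the input format, and the 25% tolerance are hypothetical.

```python
def overloaded_links(link_loads, tolerance=0.25):
    """Return the names of links whose load exceeds the mean by more than `tolerance`.

    `link_loads` maps a link name to its measured load (any consistent unit).
    `tolerance` is a fraction of the mean; 0.25 is an arbitrary example value.
    """
    if not link_loads:
        return []
    mean = sum(link_loads.values()) / len(link_loads)
    limit = mean * (1.0 + tolerance)
    return sorted(name for name, load in link_loads.items() if load > limit)
```

For example, with loads of 0.2, 0.2 and 0.8 on three links, only the third link exceeds the limit of 0.5 and is reported as a candidate for redistribution.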
Chapter 3 Clock Faults
Table of Contents
Handling System Clock Exception
Handling the Clock Lock Failure
Handling the Clock Networking Fault
Handling the Inconsistent Lock Status of Active/Standby CLKG Boards
Handling the Clock Reference Loss
Handling the Output Clock Loss
Handling the Slip Code
Handling System Clock Exception
Background
The clock plays a vital role in the CS domain. Without the clock, the CS domain cannot work normally. Therefore, it is necessary to ensure that the clock system works normally.
In ZXWN MGW, the CLKG is plugged into the main control shelf where the OMP is located, providing the clock signal to the UIM boards in all shelves.
The CLKG traces the upper-level office clock signal from the DTB, SDTB, APBE and SPB boards through the clock tracing cable to keep the local clock synchronous with the upper-level office clock.
Common Causes
• On the Fault Management window, there are Clock Loss alarms reported by resource boards, such as DTEC, VTCD, MRB and SPB.
• The ALM indicator of the board is on.
• The narrowband link is interrupted or there are alarms in the narrowband link.
Flow Diagram
Figure 14 shows the flow of troubleshooting the clock abnormality.
FIGURE 14 HANDLING THE CLOCK ABNORMALITY
Solution
1. Check whether the CLKG board runs normally. For the method,
see Handling CLKG Board Hardware Faults.
2. If there is a fault in the CLKG board, handle it first, and then
check whether the system clock exception disappears. If the
CLKG works normally, check whether the connection between
the UIM board in the BUSN shelf and the CLKG is correct.
Note:
The CLKG board is connected to the UIM through the RCLKG
rear board.
3. If there is a fault in the connection line, replace the connector of the clock cable to the UIM board, or replace the clock cable itself, to eliminate the fault.
4. If the clock is still lost although both the CLKG board and the connection between the CLKG board and the UIM board are normal, check whether the UIM board works normally. For the
method of troubleshooting the UIM fault, see .
Handling the Clock Lock Failure
Background
During running, the ZXWN MGW extracts the upper-level office line clock, BITS clock or GPS clock according to the configuration, to keep synchronous with the entire network clock.
When tracing the upper-level office line clock, it extracts the upper-level clock reference signal through the interface boards SPB, DTB, APBE, and SDTB, and sends the signal to the RCLKG rear board of the CLKG through the dedicated clock tracing cable. The CLKG traces this clock reference automatically.
When tracing the BITS clock or GPS clock, the dedicated clock tracing cable transmits the external clock reference signal to the RCLKG rear board of the CLKG. The CLKG automatically traces this clock reference.
Clock lock failure means that the local office cannot synchronously trace the standard clock reference of the upper-level office, such as the BITS clock or line clock. When the clock lock fails, the working status of the CLKG board in the ZXWN MGW is as follows.
• The CATCH indicator is constantly on.
• Both the CATCH indicator and the TRACE indicator have flashed synchronously for more than 30 minutes.
If either condition is present, the CLKG board has difficulty locking, or cannot lock, the upper-level clock.
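The two indicator conditions above can be written as a small decision function. This is an illustrative sketch: the `Led` enumeration and the function are hypothetical, and the sketch assumes that when both indicators are reported as flashing they flash synchronously.

```python
from enum import Enum


class Led(Enum):
    OFF = "off"
    ON = "on"
    FLASHING = "flashing"


def clock_lock_failed(catch, trace, minutes_in_state, flash_limit_min=30):
    """Apply the two lock-failure conditions observed on the CLKG front panel.

    catch, trace      -- observed states of the CATCH and TRACE indicators
    minutes_in_state  -- how long the indicators have been in this state
    """
    if catch is Led.ON:
        # CATCH constantly on: the board is not locking the reference.
        return True
    both_flashing = catch is Led.FLASHING and trace is Led.FLASHING
    # Synchronous flashing only counts as a failure after more than 30 minutes.
    return both_flashing and minutes_in_state > flash_limit_min
```

Synchronous flashing for a shorter time is not treated as a failure here, since the board may still be acquiring the reference.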
Fault Phenomenon
• Link-layer bit errors occur on the outgoing narrowband link of the E1 (connecting with the HLR) or DT (connecting with the PSTN) of the SPB board.
• The outgoing E1 indicator of the SPB/DTEC board flashes abnormally, and the alarm indicator is on.
• The signaling link is disconnected or intermittent.
• The CLKG board cannot remain in a stable tracing status.
• The indicator on the CLKG board is abnormal.
• The signaling link is unstable, with intermittent disconnection and even interruption.
• The fault management system reports the slip alarm.
• Faults such as noise and one-way audio may occur when subscribers are talking.
Flow
The troubleshooting flow of the clock locking failure is shown in Figure 15.
FIGURE 15 TROUBLESHOOTING FLOW OF CLOCK LOCKING FAILURE
Solution
1. The CLKG board is preheated insufficiently.
The CLKG board adopts a high-stability and high-precision crystal oscillator to generate the local clock. Due to the working characteristics of the crystal oscillator, the CLKG board cannot enter the normal tracking status until it has been preheated for a period of time. If the system fails to trace the clock signal when the CLKG board has not run for enough time or has just been replaced, wait at least three hours so that the CLKG board is sufficiently preheated, and then observe the clock tracing status again.
2. Check the setting of the interface board (SPB).
• If the clock fault occurs on the backplane of the SPB, troubleshoot the fault by replacing the CLKG board, UIM board, and clock output cable.
• Check whether the setting of the 75/120 Ω matching impedance is correct. The impedance of the E1 cable must match that of the board.
To set the E1 cable impedance, set the DIP switches S3~S6 on the SPB board. The ON position indicates that the matching impedance is 120 Ω, while the OFF position indicates that the matching impedance is 75 Ω. Table 3 lists the correspondence between the DIP switches and E1 lines.
TABLE 3 IMPEDANCE DIP SWITCHES OF E1 ON THE SPB BOARD
DIP Switch                   Corresponding E1
1st bit ~ 4th bit of S3      Channel 1~4 E1 of the SPB board
1st bit ~ 4th bit of S4      Channel 5~8 E1 of the SPB board
1st bit ~ 4th bit of S5      Channel 9~12 E1 of the SPB board
1st bit ~ 4th bit of S6      Channel 13~16 E1 of the SPB board
• Check the reliability of the cable connecting to the E1. A faulty cable can be ruled out by replacing the E1 cable.
• When the E1 line length is more than 300 meters, adopt the long-line mode.
For the DTEC board, find the E1 status indicating DIP switches S10 and S11, with eight switches in total. From top to bottom, each switch corresponds to a group of E1s (4 E1s), that is, eight groups of E1s in total. The short-line mode is set by placing the switch to ON, and the long-line mode is set by placing it to OFF.
For the SPB board, find the E1 mode setting DIP switch S1, with four switches in total. From top to bottom, each switch corresponds to a group of E1s (4 E1s), that is, four groups of E1s in total. The short-line mode is set by placing the switch to ON, and the long-line mode is set by placing it to OFF.
3. Handle the clock networking fault.
A clock networking fault arises when a self-loop relationship exists between the local exchange and the upper-level clock office, which causes the clocks to track each other. In this case, the clock lock failure often occurs.
For the handling method, see Handling the Clock Networking Fault.
4. Handle the upper-level clock source fault.
If the CLKG board works normally and the clock networking relationship is correct, but the clock still cannot lock to the upper-level office, contact the office that hosts the upper-level clock source to confirm whether there is a clock fault in the upper-level office. If the clock-source accuracy of the equipment to be traced does not reach that of a level-2 clock, negotiate with that office for a clock source with higher accuracy.
5. Handle the inconsistent lock status of the active/standby CLKG boards.
If one of the active/standby CLKG boards of the local office can trace and lock the upper-level clock, there is no problem with the upper-level clock source or the clock networking. The fault must exist in this office. For how to handle this kind of fault, see Handling the Inconsistent Lock Status of Active/Standby CLKG Boards.
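The DIP switch rules for the SPB board in step 2 can be expressed as lookups. The sketch below follows Table 3 and the ON/OFF rules stated above; the function names are hypothetical, and the assumption that the bits of a switch map to the channels of its group in ascending order (1st bit to the lowest channel) is not stated in the table.

```python
def spb_impedance_switch(e1_channel):
    """Map an SPB E1 channel (1-16) to (switch name, bit number) per Table 3.

    Assumes bits map to channels in ascending order within each group of four.
    """
    if not 1 <= e1_channel <= 16:
        raise ValueError("SPB E1 channel must be in 1..16")
    group, offset = divmod(e1_channel - 1, 4)
    return f"S{3 + group}", offset + 1


def impedance_switch_position(impedance_ohm):
    """ON selects 120 ohm matching impedance, OFF selects 75 ohm."""
    positions = {120: "ON", 75: "OFF"}
    if impedance_ohm not in positions:
        raise ValueError("matching impedance must be 75 or 120 ohm")
    return positions[impedance_ohm]


def e1_line_mode(length_m):
    """Long-line mode applies when the E1 line is longer than 300 meters."""
    return "long-line" if length_m > 300 else "short-line"
```

For instance, channel 8 falls in the second group, so its impedance is set with the 4th bit of S4.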
Handling the Clock Networking Fault
Background
In the communication network, all node clocks must keep synchronous so that data sent and received on the links can be identified correctly. In clock networking, the clock synchronization relation between nodes is divided into the following three types:
• Quasi-synchronization
This synchronization mode is generally applicable to international communication, because the PRC used by different countries has a high accuracy that reaches 1×10⁻¹¹, so that a slip occurs only once within 70 days.
• Mutual synchronization
This synchronization relation is relatively complex and requires clocks of a higher level.
• Master-slave synchronization
With a reference clock in the network, the clock is distributed based on the hierarchy. Master-slave synchronization with a 3-level architecture is adopted in China.
The level-1 clock, the PRC, serves as the master clock and complies with ITU-T recommendation G.811. Composed of cesium clocks, the PRC mainly works in the free-running status. There is also a local master clock, mainly composed of an atomic clock and GPS, which can be synchronized by the PRC.
The level-2/level-3 slave clock is mainly used on the level-2/level-3 nodes of the network. The node clock stability complies with G.812/YD-T1012. Here, level-2/level-3 corresponds to the original enhanced level-2/level-3.
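The slip interval quoted for quasi-synchronization can be checked with a short calculation. The sketch assumes that one slip corresponds to one 125 µs frame (the 8 kHz frame period of an E1) and that two independent clocks drift in opposite directions at their accuracy limit; the function name is hypothetical.

```python
FRAME_PERIOD_S = 125e-6      # duration of one 8 kHz PCM frame
SECONDS_PER_DAY = 86400


def days_between_slips(accuracy_a, accuracy_b):
    """Worst-case number of days between frame slips for two independent clocks."""
    # Worst case: the clocks drift in opposite directions, so the offsets add up.
    worst_case_offset = accuracy_a + accuracy_b
    return FRAME_PERIOD_S / worst_case_offset / SECONDS_PER_DAY
```

With two clocks of accuracy 1×10⁻¹¹, `days_between_slips(1e-11, 1e-11)` gives roughly 72 days, which is consistent with the figure of one slip within about 70 days.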
Overview
A clock networking fault is a clock synchronization fault caused by wrongly configuring the synchronization relation with another office. Clock self-loop is the most typical one.
Two offices in a master-slave relation track and synchronize to each other's clock when they are synchronizing the clock reference. A closed loop of clocks is therefore formed. This is the clock self-loop.
For example, when the ZXWN MSCS is interconnected with the PSTN, the MSCS tracks the E1 line clock on the PSTN line and compares its clock with the 8K line clock from the PSTN, and the PSTN does the same with the MSCS. This is the clock self-loop, as shown in Figure 16.
FIGURE 16 EXAMPLE FOR CLOCK SELF-LOOP
In practical networking, connecting the 8K link clock reference often causes the clock self-loop fault. Such faults are less common when the BITS clock reference is adopted.
Fault Phenomenon
When the clock self-loop fault occurs, the common fault phenomena are as follows:
• Clock lock loss happens frequently, and the system has difficulty tracking the clock again.
• The system clock cannot be locked.
Solution
The handling method of the clock networking fault is as follows:
1. Check the actual clock networking to determine whether a clock self-loop exists.
2. If it does, change the reference clock source of the local office or the interconnected office, based on the actual condition, to eliminate this fault.
For the example shown in Figure 16, adopt the PSTN clock reference to eliminate the clock self-loop fault.
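Checking the clock networking for a self-loop is equivalent to looking for a cycle in the "which office tracks which" relation. The following sketch assumes that the tracking relation has been collected by hand into a dictionary with one current reference per office; the function and office names are hypothetical.

```python
def find_clock_loop(tracks):
    """Return one clock self-loop as a list of offices, or None if there is none.

    `tracks` maps an office to the office whose clock it currently traces.
    Offices that do not appear as keys (for example a BITS source) trace nothing.
    """
    for start in tracks:
        path = []
        seen = set()
        node = start
        # Follow the tracking chain until it ends or revisits an office.
        while node in tracks and node not in seen:
            seen.add(node)
            path.append(node)
            node = tracks[node]
        if node in seen:
            # The chain came back to an office already on the path: a loop.
            return path[path.index(node):]
    return None
```

For the situation of Figure 16, `find_clock_loop({"MSCS": "PSTN", "PSTN": "MSCS"})` reports the loop between the two offices, while a chain that ends at an external reference returns `None`.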
Handling the Inconsistent Lock Status of Active/Standby CLKG Boards
Overview
When the clock tracing statuses of two CLKG boards configured in active/standby mode are inconsistent, a hidden fault exists even though the system may work normally. It is then necessary to check the system and clear the hidden fault so that both the active and standby CLKG boards work normally.
In this case, the clock networking fault and the clock source fault can generally be excluded.
Fault Phenomenon
When the CLKG boards are configured in active/standby mode, one CLKG board cannot be locked (the CATCH indicator is constantly on, or both the CATCH indicator and TRACE indicator flash synchronously), but the other CLKG board can be locked.
Solution
The method of handling inconsistent lock status of the active/standby CLKG boards is as follows.
TABLE 4 HANDLING METHOD OF INCONSISTENT LOCKING STATUS OF ACTIVE/STANDBY CLKG BOARDS
Phenomenon                                           Method
The BITS clock serves as the clock reference         Step 1
The 8K line clock serves as the clock reference      Step 2
1. The BITS clock serves as the clock reference
When the BITS clock serves as the clock reference, the fault can basically be narrowed down to the slot and the CLKG board itself.
i. Switch the CLKG board that cannot lock the clock to standby status. Then unplug and replug this board to exclude a contact problem.
ii. After unplugging this board, check whether pins 24/25 of the J5 connector of the corresponding slot are bent. Pin distortion may also result in this fault.
iii. If it is permitted, replace the clock reference of the suspect CLKG board to see whether this board can be locked. In this way, determine whether the CLKG board is faulty.
2. The 8K line clock serves as the clock reference
The 8K reference clocks of the active and standby CLKG boards are transferred independently. Therefore, there is no necessary association between the 8K reference clock of the active CLKG board and that of the standby CLKG board. As a result, when the 8K clock source is the current reference clock of the CLKG, inconsistent lock status of the active/standby CLKG boards does not directly prove that a CLKG board is faulty. Careful troubleshooting is necessary.
Although the 8K clock cables are separated on the backplane, they have the same source. Therefore, the fault generally does not relate to the clock networking or the clock hierarchy.
i. Press the RJ45 connector that feeds in the clock reference on the RCKG1 rear board firmly into place to eliminate a contact problem. If the fault still exists, handle it by performing the following steps.
ii. If another 8K reference clock exists, switch the current tracking reference of the CLKG to the standby reference through the background or in manual mode, and then observe the lock status of the CLKG. If the CLKG is locked, narrow the fault down to the slot or the rear board; otherwise, the CLKG board is most likely faulty.
iii. If there is only one 8K reference clock, pull out the clock reference cable from the RCKG1 rear board, replace it with a known-good clock reference cable, and extract the clock from another interface board. If the phenomenon disappears, the fault is located in the clock cable, the interface board that extracts the clock, or the rear board of this interface board. If the phenomenon still exists, narrow the fault down to the CLKG, the slot holding the CLKG, and the RCKG1. Then use the replacement method to find the faulty component and replace it.
Handling the Clock Reference Loss
Overview
Clock reference loss means that the upper-level clock reference signal used by the system is lost, which makes the system asynchronous with the whole network. When the CLKG board raises the clock loss alarm, it is necessary to check and handle the fault.
Fault Phenomenon
When the BITS clock reference is lost, one of the following two conditions probably appears:
• Both the active CLKG board and the standby CLKG board lose the clock reference.
• The clock reference loss status of the active CLKG board differs from that of the standby CLKG board.
Solution
The handling method of clock reference loss is shown in Table 5.
TABLE 5 HANDLING METHOD OF INCONSISTENT CLOCK REFERENCE STATUS OF ACTIVE/STANDBY CLKG BOARDS
Phenomenon                                           Method
The BITS clock serves as the clock reference         Step 1
The 8K line clock serves as the clock reference      Step 2
1. The BITS clock serves as the clock reference.
i. Both the active CLKG board and the standby CLKG board lose the clock reference.
Check the indicators on the active/standby CLKG boards. If the KEEP indicator (working mode indicator) and the corresponding BITS clock reference indicator (such as the 2MBPS1 indicator) are constantly on, the CLKG was in tracking status before losing the current clock reference. In this case, the CLKG board itself is normal. Check whether the 9-pin connector on the RCKG1 rear board has dropped out or is loose.
Troubleshoot the corresponding output or transmission equipment of the BITS.
ii. The clock reference loss status of the active/standby CLKG boards is different.
If this fault occurs during system running, check whether the CLKG board is firmly plugged in, because a poorly seated board can become loose. Otherwise, the CLKG board is probably faulty and should be replaced in time.
If this fault appears during office commissioning, check whether the slot is normal, and check whether pins 24/25 of the J5 connector, which is located at the corresponding backplane slot, are normal.
2. The 8K line clock serves as the clock reference.
i. Both the active and standby CLKG boards lose the clock reference.
Check whether the RCKG1 rear board is loose, whether the RJ45 connector of the reference input interface is loose, and whether the interface board works normally.
If the reference is lost during office commissioning, check the connecting line. For example, if the 8K clock reference is extracted through the DTB, check components such as the 8K clock cable and the RCKG1 rear board.
ii. The clock reference loss status of the active/standby CLKG boards is different.
One CLKG board can detect the clock reference signal, but the other CLKG board cannot. Therefore, concentrate on the CLKG board and its corresponding slot, the RCKG, the 8K clock reference cable, the clock extracting interface board, and the rear board of the interface board. For how to handle this kind of fault, refer to Handling the Inconsistent Lock Status of Active/Standby CLKG Boards.
Handling the Output Clock Loss
Overview
Output clock loss means that the clock signal transmitted from the CLKG board to the UIM or other boards in the system is abnormal or lost. The boards in the system then report the clock loss alarm.
Fault Phenomenon
When the output clock is lost, the following conditions may appear:
• The board in the system reports the clock loss alarm.
• The signaling link suffers intermittent disconnection and even interruption.
• The ALM indicator on the board is on.
Solution
The handling method of output clock loss is listed in Table 6.
TABLE 6 HANDLING METHOD OF OUTPUT CLOCK LOSS
Phenomenon                                           Method
All the output clocks have been lost for a moment    Step 1
All the clocks have been lost for a long time        Step 2
Several clocks are lost                              Step 3
1. All the output clocks have been lost for a moment.
During system running, the clock signal is lost suddenly, and then recovers automatically about 10 seconds later. Query the history notifications and history alarm information to see the working mode conversion of the CLKG. If the CLKG lock loss happens suddenly and the lock loss status lasts for about 120 minutes, the fault can basically be attributed to clock deterioration caused by network fluctuation. The CLKG board tracks the clock signal again, so the clock is temporarily lost during this process. It is necessary to find the cause of the network fluctuation to prevent this fault from appearing again.
2. All the clocks have been lost for a long time.
The clock signal has been lost for more than 10 seconds during system running. If neither the 44-pin connector on the CLKG rear board nor the 9-pin connector on the RUIM is loose, the cause is that both the active and standby CLKGs are running in standby mode due to an abnormality. In this case, check whether the ACT indicator on the CLKG board is off. If it is, unplug and replug the CLKG board to handle this fault.
3. Several clocks are lost.
Generally, the loss of one or several clocks results from hardware faults. Check the clock cable, the socket connector and the backplane first, and then check the board. The following example describes the handling procedure.
i. The CLKG board is connected to three shelves, labeled shelf 1, shelf 2, and shelf 3. The UIM of shelf 1 reports the clock loss alarm, but neither shelf 2 nor shelf 3 does. Swap the 9-pin clock cable connector of a shelf that does not report this alarm (such as shelf 2) with that of the UIM of shelf 1, which reports the alarm.
ii. If shelf 1 still reports the clock loss alarm, but neither shelf 2 nor shelf 3 does, narrow the fault down to the RUIM, the UIM, and the slot holding the UIM. Then replace the RUIM to rule it out.
If the clock loss phenomenon disappears after the RUIM is replaced, the RUIM rear board is faulty. Otherwise, replace the UIM board. If the phenomenon disappears after the UIM is replaced, the UIM is most likely faulty. Otherwise, the backplane is faulty.
iii. After the UIM board is replaced, the UIM in shelf 1 no longer reports the clock loss, but shelf 2 starts to report this alarm. In this case, troubleshoot the system clock cable, RCKG1, RCKG2, CLKG, and the slot holding the CLKG.
Replace the system clock cable. If the alarm from shelf 1 disappears, the system clock cable is faulty. If shelf 1 still reports the alarm after the clock cable is replaced, replace the RCKG1 and RCKG2 rear boards. If the alarm disappears, there is a problem in the rear boards. Otherwise, replace the CLKG board. If the alarm disappears, there is a problem in the CLKG board. Otherwise, the slot holding the CLKG board is faulty.
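The choice among the three handling steps of Table 6 can be summarized as a small classifier. This is a sketch under stated assumptions: the function and its parameters are hypothetical, and the 10-second boundary is taken from the descriptions of steps 1 and 2 above.

```python
def output_clock_loss_step(lost_outputs, total_outputs, outage_seconds, recovered):
    """Return the Table 6 step number (1, 2 or 3) that applies to an observation.

    lost_outputs   -- number of CLKG output clocks reported lost
    total_outputs  -- number of CLKG output clocks in use
    outage_seconds -- how long the clocks have been (or were) lost
    recovered      -- True if the clocks came back without intervention
    """
    if not 0 < lost_outputs <= total_outputs:
        raise ValueError("lost_outputs must be between 1 and total_outputs")
    if lost_outputs < total_outputs:
        return 3   # only some clocks lost: usually a hardware fault on one path
    if recovered and outage_seconds <= 10:
        return 1   # all clocks lost briefly, then recovered automatically
    return 2       # all clocks lost for a long time
```

For example, three of three outputs lost for 8 seconds with automatic recovery maps to step 1, while one of three outputs lost maps to step 3 regardless of duration.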
Handling the Slip Code
Overview
A slip code is a data error between offices that appears after a period of operation because of a minor difference in clock frequency. The slip code is usually related to the clock system, but it does not always result from the CLKG board. Engineering problems or transmission problems often cause this fault.
The national standard (GB-12048) defines the slip code alarm as follows: report the common alarm when the slip occurs four times within 24 hours, and the important alarm when it occurs 255 times.
When the CLKG board is locked, its frequency accuracy is better than 1×10⁻¹⁰, fully complying with the related specifications.
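The alarm thresholds quoted above translate directly into a classification rule. The sketch below is illustrative; the function name and the returned labels are hypothetical, and it assumes that the 255-slip threshold applies to the same 24-hour window as the four-slip threshold.

```python
def slip_alarm_level(slips_in_24h):
    """Classify a 24-hour slip count using the thresholds quoted above."""
    if slips_in_24h < 0:
        raise ValueError("slip count cannot be negative")
    if slips_in_24h >= 255:
        return "important"
    if slips_in_24h >= 4:
        return "common"
    return None   # below the alarm threshold
```

A count of 3 slips raises no alarm, 4 slips raise the common alarm, and 255 slips raise the important alarm.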
Fault Phenomenon
• The fault management system reports the slip code alarm.
• Some signaling links report the link intermittent disconnection alarm.
Solution
The handling method of the slip code fault is listed in Table 7.
TABLE 7 HANDLING METHOD OF SLIP CODE
Phenomenon                                                 Method
The CLKG board works abnormally                            Step 1
The E1 configuration is wrong                              Step 2
The inter-office clock synchronous relation is wrong       Step 3
Transmission fault                                         Step 4
Opposite-end office fault                                  Step 5
1. The slip code is caused by the CLKG.
When the slip code occurs, check the CLKG working status first. If it is in the trace status, the slip interval is not less than 40 minutes. That is to say, if the CLKG is in the trace status when the slips occur, the CLKG board can basically be excluded.
If the CATCH indicator on the CLKG is constantly on, or the CATCH indicator and TRACE indicator flash synchronously, and the slip interval is about several minutes, the CLKG board is probably faulty. The CLKG board should be checked by using methods such as active/standby changeover, resetting and replacement.
2. The slip code is caused by unused E1s.
If there are unused E1s that have been connected and configured, these E1s will certainly cause slip codes, and serious ones. Clear all unused E1s and delete the configuration data of the unused E1s in the system. In this way, the system will not report alarms from those unused E1s.
3. The offices interconnected through the E1 have neither direct nor indirect clock synchronous relation.
In the whole network, all offices interconnected through the E1 must have a direct or indirect clock synchronous relation. If two offices interconnected through the E1 have no such relation, the slip code probably occurs in the E1 system between these two offices.
In this case, it is necessary to check the clock networking relation among the corresponding offices and establish a direct or indirect clock synchronous relation between those offices.
4. The slip code is caused by the transmission.
If the clock networking relation is correct and the CLKG works normally, but the slip code still occurs, check whether the transmission is normal. A normally working CLKG only confirms that the clock is sent at the correct rate, whereas the slip code is caused by a deviation between the sending rate and the receiving rate, especially when the clock reference of the CLKG is not extracted from the line, or not transmitted from the opposite-end office where the slip code occurs.
5. The slip code occurs when the opposite-end office is faulty.
The slip code also occurs when the opposite-end office clock is faulty. In this case, it is necessary to handle the fault of the opposite-end office in time to eliminate the slip code.
Chapter 4 Interface Faults
Table of Contents
Handling MGW-MSCS Interface Fault
Handling MGW-MGW Interface Fault
Handling MGW-RNC Interface Fault
Handling MGW-PSTN Interface Fault
Handling MSCS-BSC Interface Fault
Handling Service Faults
Handling MGW-MSCS Interface Fault
Background
The Mc interface is the interface between the MSCS and the MGW. Figure 17 shows its protocol-stack structure.
FIGURE 17 MC INTERFACE PROTOCOL STACK
The MGW equipment interacts with the MSCS through the Mc interface, which adopts the standard H.248 protocol and supports the binary and text protocol CODEC formats.
During the service connecting process, the MGW invokes and manages various service resources under the control of the MSC Server. An Mc interface fault makes the MGW equipment incapable of providing the service.
Generally, the Mc interface adopts the IP bearer. Because the current network is not an all-IP network, the Mc interface protocol stack usually adopts the H.248/M3UA/SCTP/IP mode. The IP physical interconnection between the ZXWN MSC Server and the MGW is implemented by connecting the FE1 interfaces on the rear boards of the SIPI boards of the two NEs. The corresponding SIPI unit is configured with the IP address and the SCTP association. The upper layer is configured with the AS data, and the AS data of SIO location.
The signal trail of the Mc interface is shown below:
SMP → UIM → SIPI → IP Network → MSCS
Fault Phenomenon
• All calls involving subscribers of the local office fail.
• The office direction between the MSC Server and the MGW is disconnected.
• The association cannot be deactivated or activated normally in Dynamic Management.
• The association status is normal, but the AS and ASP are in an abnormal, non-activated status. An MGW registration failure message appears in the platform signaling tracing.
• The AS status is normal, but the gateway fails to register on the MGC.
Solution
1. Check the physical connection to see whether a cabling change has caused the Mc interface abnormality. If the SIPI and the UIM both work in active/standby mode, pay special attention to the lines between the SIPI and the UIM.
2. Check the data configuration of the MGW, including the following data.
• Adjacent office configuration
• IP protocol stack configuration
• SIGTRAN data configuration
3. Check whether the SMP module that manages the association and the SIPI are in normal status, and whether any alarm is present. If onsite conditions permit, restart the OMP, SMP and SIPI to see whether the fault can be eliminated.
4. Use the dynamic management tool provided by the OMM to deactivate and activate the association to see whether the fault can be eliminated.
5. Through the OMM system, check the port address and service status of each board, and the routing information on the RPU.
6. In the Dynamic Management interface, check the association status and whether the M3UA route is reachable.
7. If the actual or virtual address of the opposite end can be pinged successfully with the platform tool but the SCTP connection cannot be established, a broadcast storm may have occurred on the switch that connects the SIPI boards of the MSCS and the MGW, making the Mc interface communication abnormal. To restore the service as soon as possible, temporarily connect the SIPI boards of the ZXWN MSC Server and the MGW directly with a crossover cable.
8. If the fault still cannot be eliminated after all available methods have been tried, reset the RPU and the SMP, transmit all tables again, and finally restart the MP.
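The fault phenomena listed above map roughly onto the layers of the H.248/M3UA/SCTP/IP stack, which suggests where to start checking. The following is a minimal sketch of that bottom-up reasoning. The McStatus fields and the returned hints are hypothetical names chosen for illustration; they are not an OMM interface, and the values would be read manually from Dynamic Management and the platform ping tool.

```python
from dataclasses import dataclass


@dataclass
class McStatus:
    # Hypothetical snapshot of what an operator observes on site.
    peer_pingable: bool        # opposite-end address answers ping
    association_up: bool       # SCTP association established
    as_active: bool            # M3UA AS/ASP in activated status
    gateway_registered: bool   # H.248 registration on the MGC succeeded


def first_layer_to_check(s: McStatus) -> str:
    """Return the lowest layer that looks faulty, checking bottom-up."""
    if not s.peer_pingable:
        return "physical/IP: cabling, SIPI-UIM lines, IP stack configuration"
    if not s.association_up:
        return "SCTP: association data, or a broadcast storm on the switch"
    if not s.as_active:
        return "M3UA: AS/ASP and other SIGTRAN data configuration"
    if not s.gateway_registered:
        return "H.248: gateway registration towards the MGC"
    return "Mc interface looks normal"


# Association is normal, but the AS/ASP is not activated.
print(first_layer_to_check(McStatus(True, True, False, False)))
```

Checking from the bottom layer upward avoids spending time on SIGTRAN data when the real problem is a cable or an IP address.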
Instance Analysis
Topic: the Mc interface fault results from the disconnection of the signaling link to the MSC Server.
1. Symptom
No call services can be processed.
2. Source
The subscriber or carrier reports that call services cannot be processed. The background reports a link interruption alarm.
3. Related Components
Broadband signaling processing board or the MNIC board
4. Fault Analysis and Location
The link to the MSC Server is interrupted.
5. Solution
Determine whether the fault is caused by the local office or by the opposite-end office. If it is caused by the opposite-end office, ask that office to handle the fault. If it is caused by the local office, determine whether it is a software or a hardware problem. For a hardware problem, check the board or the optical fiber. For a software problem, record the symptom and analyze the reason. If necessary, reset the board.
6. Summary
i. During fault location, first determine whether the fault comes from the local office or the opposite-end office, and then determine whether it is a hardware problem or a software problem.
ii. When resetting a board, save the relevant fault information for future fault location.
Handling MGW-MGW Interface Fault
Background
The Nb interface is the interface between two MGWs. It transfers the media plane information. According to the protocol, the Nb interface can be borne over ATM, TDM, or IP. In current practical networking, the Nb interface mainly adopts the TDM bearer and the IP bearer.
• With the IP bearer at the bottom layer, the signaling processing flow of the Nb interface is as follows.
VTCD (MRB) → UIM → IPI → IP Network → Opposite-end MGW interface board
• With the TDM bearer at the bottom layer, the signaling processing flow of the Nb interface is as follows.
VTCD (MRB) → UIM → DTB (SDTB) → TDM Network → Opposite-end MGW interface board
Fault Phenomenon
• The call signaling flow is correct, but the call does not have two-way speech.
• Faults such as noise or call interruption occur during the call.
• The call fails to be established because MGW resources cannot be obtained.
Solution
1. Check whether the data configuration is correct. Common data configuration errors are as follows.
• CODEC mode setting error
• Interface information configuration error
• VTCD configuration error
2. Open the Failure Observation to trace the call and find out the fault reason.
Instance Analysis
1. Symptom
One MSCS is configured with two MGWs. After the Nb interface is configured, the inter-office communication between the two MGWs fails. No tone is played. Signaling tracing shows that the call is released immediately after it is confirmed, so the inter-office call cannot be established.
2. Fault analysis and location
i. The inter-office signaling tracing shows that the call is released immediately after it is confirmed, so the call fails.
ii. A test shows that the communication between two MGWs is normal, which means that the connection from the MGW to the BSC is normal and configured correctly.
iii. The analysis suggests that the configuration of the VTCD that processes the CODEC is wrong, which causes the communication failure between the two MGWs. A configuration check shows that the VTCD unit is configured under the OMP module instead of the SMP module that processes the call. As a result, the inter-office call fails.
3. Solution
Delete the VTCD unit from the OMP module, configure it under the SMP module, and then restart the VTCD. The Nb interface signaling becomes normal, and the fault is eliminated.
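The root cause in this instance, a VTCD unit attached to the wrong module, can be caught by a simple consistency check over exported configuration data. The sketch below assumes the unit configuration is available as a list of records; the field names (name, board, module_type) and the sample records are hypothetical and do not reflect the actual OMM export format.

```python
def misplaced_vtcd_units(units):
    """Return the names of VTCD units that are not configured under an SMP module."""
    return [u["name"] for u in units
            if u["board"] == "VTCD" and u["module_type"] != "SMP"]


# Hypothetical sample data for illustration only.
units = [
    {"name": "unit-1", "board": "VTCD", "module_type": "OMP"},
    {"name": "unit-2", "board": "VTCD", "module_type": "SMP"},
    {"name": "unit-3", "board": "DTB", "module_type": "OMP"},
]
print(misplaced_vtcd_units(units))
```

Only VTCD units are examined; other boards may legitimately belong to the OMP module.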
Handling MGW-RNC Interface Fault
Background
The Iu-CS interface is the interface between the CN and the RNC in the CS domain. In the R4 phase, the MSCS processes the control plane of the ATM-borne Iu-CS interface. The signaling is adapted based on AAL5 and transmitted through the SCCP. The MGW processes the user plane and the bearer control plane of the Iu-CS interface. The ALCAP controls the establishment and release of user plane connections. The user data is adapted based on AAL2 and transmitted through AAL2 connections.
In current practical networking, the signaling interaction between the MSCS and the RNC is transferred by the SG built into the MGW. Figure 18 shows the networking mode and the interface protocol stack structure.
FIGURE 18 NETWORKING MODE & INTERFACE PROTOCOL STACK STRUCTURE
The MGW and the RNC are physically connected directly through optical fiber, with ATM as the bottom-layer bearer.
The signaling trail between the MGW and the RNC is as follows.
SMP → UIM → APBE → ATM Network → RNC
Fault Phenomenon
• The RNC office is unreachable.
• All calls of subscribers under the RNC fail.
• The subscriber location update service fails.
Solution
1. First check whether the APBE board works normally. Check whether the RUN indicator on the APBE is normal and whether the ALARM indicator gives alarms.
2. Check whether the indicator of the optical interface to which the fiber is connected is on. If it is off, check whether the sending and receiving fibers are reversed and whether the connection is correct.
3. After the fiber is correctly connected, check whether the ATM interface data configurations of the local office and the RNC are correct. Focus on the PVC configuration.
4. If the office is still unreachable although the fiber connection and the PVC configuration are both correct, locate the fault with the self-loop method. If the link cannot be activated after the local office is looped back, check the configuration data of the signaling link to the RNC, including the signaling link, signaling link group, signaling office and signaling route. If the links can be activated normally when the RNC and the MGW are each looped back but become abnormal after interconnection, check whether the data of the two interconnected ends is consistent, including PVC, SLC, DPC, and OPC.
5. If the office is reachable and the circuit is normal, but all calls of subscribers under the RNC still fail, further check the RNC-related data configuration.
Instance Analysis
1. Topic
The signaling link of the Iu interface cannot enter the service status.
2. Symptom
The signaling link from the MGW to the RNC stays in the Initial Position Status and cannot enter the Service Status.
3. Fault analysis and location
Generally, such a fault has one of two causes: a hardware fault or an ATM configuration problem.
i. The RUN indicator of the board is normal, and the board gives no alarm. The information printed from the foreground is normal. If the hardware were faulty, information such as the following would be printed.
TIMER TCC OUT, SSCOP Send AA_RELEASE_Ind/Conf… or TIMER NORESPONSE OUT, SSCOP Send AA_RELEASE_Ind In….
Such information indicates that the bottom layer is disconnected or connected in one direction only. The board is replaced with a normal board of the same type, but the fault persists. Therefore, a hardware fault can be excluded.
ii. The ATM configuration on the MGW is checked and satisfies the requirement. The interconnection ATM configuration is then checked against the RNC side, and the VPI configured at the RNC side is found to be inconsistent with the one configured at the CN side.
4. Solution
Modify the VPI configuration data so that the VPI at the CN side is consistent with that at the RNC side. The link enters the service status after the modification.
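A mismatch like the VPI inconsistency above is easy to miss when the two ends are compared by eye. The sketch below shows one way to compare the interconnection data of both ends, assuming each side's data has been written down as a small record. The keys and sample values are hypothetical. The comparison reflects two general points: PVC identifiers (VPI/VCI) and the SLC must be identical at both ends, while the point codes are mirrored, so the local OPC must equal the remote DPC and vice versa.

```python
def interconnection_mismatches(local, remote):
    """Compare ATM and signaling-link data of two ends and list what differs."""
    problems = []
    for key in ("vpi", "vci", "slc"):        # must be identical at both ends
        if local[key] != remote[key]:
            problems.append(f"{key}: local={local[key]} remote={remote[key]}")
    # Point codes are mirrored: the local OPC is the remote end's DPC.
    if local["opc"] != remote["dpc"]:
        problems.append("local OPC does not match remote DPC")
    if local["dpc"] != remote["opc"]:
        problems.append("local DPC does not match remote OPC")
    return problems


# Hypothetical values for illustration only.
cn_side = {"vpi": 1, "vci": 32, "slc": 0, "opc": 0x101, "dpc": 0x202}
rnc_side = {"vpi": 2, "vci": 32, "slc": 0, "opc": 0x202, "dpc": 0x101}
print(interconnection_mismatches(cn_side, rnc_side))
```

An empty result means the compared fields agree; it does not prove that the link will come up, since other data can still differ.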
Handling MGW-PSTN Interface Fault
Background
In the R4 networking application, all service interactions between the MGW and the PSTN switch are based on the TDM bearer connection mode. Since the bearer part is separated from the control part, configuring the MGW as a bearer device for interconnection with other devices mainly means configuring bearer resources. Therefore, debugging only needs to ensure that the TDM interconnection of the two ends is consistent.
In actual networking, there is no physical connection between the MSC Server and the PSTN switch. The signaling interaction between them is implemented by the SG built into the MGW, which forwards the signaling messages. The ZXWN MGW supports the built-in SG function. Figure 19 shows the networking structure between the ZXWN MSCS and the PSTN switch for ISUP message forwarding.
FIGURE 19 NETWORKING AND PROTOCOL ARCHITECTURE (BUILT-IN SG SUPPORTED)
The MGW and the PSTN are physically connected directly through E1 links, with TDM as the bottom-layer bearer. The signaling route between the MGW and the PSTN is as follows.
SMP → DTEC → TDM network → PSTN
Fault Phenomenon
• The PSTN office direction is unreachable.
• Part or all of the circuits are abnormal.
• Calls from the fixed network suffer crosstalk.
Solution
1. Check whether the DTEC board works normally. Check whether the RUN indicator on the DTEC board is normal and whether the ALARM indicator shows an alarm.
2. Crosstalk is usually caused by crossed E1 lines. Adjust the E1 line connections to eliminate the fault.
3. In the Dynamic Management interface, check the status of the PCM and the CIC, and use a probe to check the status of the corresponding circuit in the R_CIC table. Locate the fault according to the status.
4. If the fault is not caused by hardware problems, check the circuit configuration and the trunk management configuration on the MSCS to further locate the fault.
Instance Analysis
1. Topic
A CIC circuit to the PSTN is unavailable.
2. Symptom
• Calls using this link fail.
• The status value of the R_CIC is not 0.
• The reset/block operation on the circuit through the dynamic management fails.
• When the link signaling is traced, the platform receives no message from the opposite end.
3. Fault analysis and location
i. Gather the printed information of the service MP about activating/deactivating the association for the module that manages the association between the MGW and the MSCS office.
ii. Check whether the trunk status is normal and whether the sub-unit is unblocked.
iii. Check the status value of the CICID in the R_CIC foreground table.
iv. If the status value of the CICID is 512, the local end is blocked. Make sure that the PCM system number and the signaling timeslot are correct and that the trunk board is in normal status. Use the light-emitting diode on the DDF to make sure that there are circuits on the receiving and sending channels.
v. Unplug the trunk line from the trunk board, and plug it in again after 30 seconds. The fault is eliminated.
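When many circuits are checked, it helps to translate the R_CIC status values into text. The sketch below covers only the two values this instance mentions: 512 for a blocked local end, and 0, which is taken to mean a normal circuit because a non-zero value is listed as a symptom. That reading of 0 is an inference, and any other value is deliberately reported as unknown rather than guessed.

```python
# Only the values named in this instance; the meaning of 0 is inferred.
KNOWN_CIC_STATUS = {
    0: "normal",
    512: "local end blocked",
}


def describe_cic_status(value: int) -> str:
    """Translate an R_CIC status value, without guessing undocumented ones."""
    return KNOWN_CIC_STATUS.get(value, f"unknown status {value}")


for status in (0, 512, 1024):
    print(status, "->", describe_cic_status(status))
```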
Handling MSCS-BSC Interface Fault
Background
The MGW provides the bearer for the A interface traffic in the TDM connection mode.
In the R4 phase, the ZXWN MSCS integrates with the ZXWN MGW to serve as the MSC in R99 (this mode is called the MGW built-in mode). The BSC side sees only one MSC (one NE). The A interface is the interface between the MSC Server and the BSC. Figure 20 shows the adopted protocol stack. The MSC Server processes all control messages of the A interface. This interface implements the functions unrelated to the bearer part, including subscriber mobility management, BSS access, and control plane processing of the call service and the SMS service.
FIGURE 20 A-INTERFACE PROTOCOL STACK STRUCTURE
When the MGW built-in networking mode is adopted, the ZXWN MGW and the ZXWN MSCS use the same SP. In this case, there is no physical connection between the ZXWN MSCS and the BSC, and the BSSAP protocol between them is transferred through the ZXWN MGW. Figure 21 shows the networking mode. In this networking mode, when the MTP3 upper-layer users are configured in the MGW, the SCCP must not be configured, because the MSCS and the MGW use the same SP. The MGW needs to forward all of the messages to the MSCS, and the BSSAP sits above the SCCP. If the SCCP part were configured on the MGW, the BSSAP messages could not be forwarded to the MSCS correctly. Therefore, the ZXWN MSCS and the ZXWN MGW are combined to form one MSC NE.
FIGURE 21 MGW BUILT-IN NETWORKING MODE AND ITS PROTOCOL STRUCTURE
For the ZXWN MGW, the boards processing the narrow-band No.7 signaling protocol are shown in Figure 22.
FIGURE 22 BOARDS RELATED TO NARROW-BAND NO.7 SIGNALING PROTOCOL
Fault Phenomenon
• The BSC office direction is unreachable.
• Part or all of the circuits are abnormal.
• Calls suffer crosstalk.
Solution
1. Check whether the DTB board works normally. Check whether the RUN indicator on the DTB board is normal and whether the ALARM indicator shows an alarm.
2. Crosstalk is usually caused by crossed E1 lines. Adjust the E1 line connections to eliminate the fault.
3. In the Dynamic Management window, check the status of the PCM and the CIC, and use a probe to check the status of the corresponding circuit in the R_CIC table. Locate the fault according to the status.
4. If the fault is not caused by hardware problems, check the circuit configuration and the trunk management configuration on the MSCS to further locate the fault.
Instance Analysis
1. Topic
The call completion ratio to a certain BSC is low.
2. Symptom
i. Calls are hard to connect, and only a few of them can be put through.
ii. Gather the fault information.
iii. Gather the networking condition to check whether the A interface of the BSC can be forwarded to the MSC Server by the MGW.
iv. Gather the OMM performance statistics from before and after the fault occurred, such as the m3ua, mtp3, and signaling link statistics.
v. Save the alarm information stored before and after the fault occurred.
vi. Gather the call loss records of the service and signaling, including MM, VLRMAP, MSCMAP, and BSSAP.
vii. If there is printed information, save the information printed before and after the fault occurred.
viii. If there are many signaling tracing records from before and after the fault occurred, save as many of them as possible.
ix. Gather and save the traffic statistics from before and after the fault occurred.
x. Gather the records of the operations performed on the foreground before and after the fault occurred.
3. Analysis
i. Signaling tracing shows that, for the calling party, the network returns the call proceeding message after receiving the setup message and, a moment later, sends the disconnect message. For the called party, however, the page rsp message can hardly be traced.
ii. Use the OMM system to collect statistics on the messages sent and received on the A interface signaling links. On each signaling link, the numbers of sent and received messages are seriously imbalanced.
iii. Observe the call loss records. Most of them are caused by no response to paging or by abnormal call release by the calling party.
iv. It can be inferred that the problem lies in the forwarding of messages from the M3UA to the MTP3 at the MGW side, where a lot of messages are lost. Reset the standby board of the SMP.
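The imbalance check in the analysis above can be made repeatable once the per-link message counters have been exported from the OMM statistics. The following sketch flags links whose sent and received counts differ by more than a chosen factor. The input layout (link ID mapped to a sent/received pair), the sample counters and the threshold of 2.0 are assumptions for illustration; the manual does not define what ratio counts as seriously imbalanced.

```python
def imbalanced_links(counters, max_ratio=2.0):
    """Return link IDs whose sent/received counts differ by more than max_ratio."""
    flagged = []
    for link_id, (sent, received) in sorted(counters.items()):
        low, high = sorted((sent, received))
        if high == 0:
            continue                  # idle link, nothing to compare
        if low == 0 or high / low > max_ratio:
            flagged.append(link_id)
    return flagged


# Hypothetical counters: link ID -> (messages sent, messages received).
sample = {0: (1000, 980), 1: (1200, 150), 2: (0, 0), 3: (500, 0)}
print(imbalanced_links(sample))
```

A link that sends traffic but receives nothing is always flagged, because that pattern points to messages being lost in one direction.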
Handling Service Faults
Background
The MGW provides the call-independent service bearing function and implements the service bearing conversion and service flow format processing under the control of the MSC Server.
A basic service fault of the MGW means that the equipment cannot provide the corresponding bearing function or cannot implement the bearing service conversion and service flow processing.
This section describes the handling of service faults through specific examples.
Instance 1
1. Topic
The circuit is in abnormal status, and the CIC is not in idle status.
2. Symptom
Query the local end status of the CIC circuit on the MSCS. The status is “ERRORREQ”, indicating that the request message is illegal.
3. Fault Analysis and Location
The CIC circuit is related to the voice channel. In R4 networking, the signaling is separated from the voice channel, that is, the MSC controls the signaling part, while the MGW carries the voice channel. Therefore, focus on checking the MSCS data configuration related to the MGW.
4. Solution
i. Check the office directions and links from the MSC to the MGW, the AS, and the ASP. All of them are in normal status.
ii. A query of the MGW static data shows that the data is not configured. An attempt to configure the data again fails with the prompt “Language description template number does not exist. Configure the language description first.” Therefore, check the tone-related configuration. A language description tone type exists in “Batch create MSCS tone”, but it is not configured in the data configuration.
iii. Select ALL for the tone types in “Batch create MSCS tone”. Add the MGW static data again. The command is executed successfully.
iv. Synchronize the data. Check the status of the CIC circuit, and find that it is in the normal IDLE status.
5. Summary
i. When MSCS tones are created in batch on the MSCS, the default tone type is TONEID; other types such as LANGSTR are not selected by default. In practice, configuring the data with only the TONEID type is not enough, so other tone types must be selected according to the actual conditions. The command can then be executed successfully.
ii. In addition, the OMM system provides batch commands to speed up data configuration. After each batch run, however, carefully check whether every command was executed successfully. If a command failed, find out where the problem lies, and then execute this command separately.
Instance 2
• Topic
A CIC timeslot configuration error causes a low call completion ratio of the wireless system.
• Symptom
This fault occurs in a soft switch project. On the OMM system, the performance index “call completion ratio of wireless system” under the soft switch is about 4% to 6% lower than that under other MSCs and does not meet the specified standards.
The onsite networking mode is MSCS-MGW-BSC.
• Fault Analysis and Location
An analysis of the formula that the operator uses to calculate the call completion ratio of the wireless system shows that the “success rate of service channel allocation (changeover excluded)” parameter is lower than the normal standard.
The calculation formula of this parameter is: “Times of successful service channel allocation”/“Times of requests for service channel allocation”×100%.
The operator counts the “AssigReq” assignment request messages that the BSC receives from the MSC as the requests for service channel allocation, and the “AssignCmp” assignment completion messages that the BSC sends to the MSC as the successful service channel allocations. Obviously, the low success rate of service channel allocation results from an increased number of assignment failures.
An analysis of the failed CICs reported by the BSC shows that all of them correspond to timeslot 16 of the A-interface E1 links. Further inquiry reveals that the BSC engineer deleted all CICs corresponding to timeslot 16 of the E1 links when unblocking the A-interface circuits.
• Solution
At the MSC side, delete the CICs reported by the BSC that correspond to timeslot 16. The performance index recovers.
• Summary
During signaling and circuit commissioning, negotiate the interconnection data with the adjacent office, including the signaling point code, SLC, signaling timeslot, PCM start number, and other data.
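The two calculations behind this instance can be sketched as follows: the success rate formula quoted above, and a comparison of the CIC sets configured at the two ends, which would have exposed the deleted timeslot-16 circuits directly. The message counts are made-up sample values, and the CIC numbering (one E1 with CIC numbers equal to timeslot numbers 1 to 31) is a simplifying assumption for illustration.

```python
def allocation_success_rate(assign_cmp: int, assign_req: int) -> float:
    """Successful allocations / allocation requests x 100%."""
    if assign_req == 0:
        raise ValueError("no assignment requests in the measurement period")
    return assign_cmp / assign_req * 100.0


def cics_missing_at_bsc(msc_cics, bsc_cics):
    """CICs still configured at the MSC side that the BSC no longer has."""
    return sorted(set(msc_cics) - set(bsc_cics))


# Made-up counts: 9400 completions out of 10000 requests.
print(f"{allocation_success_rate(9400, 10000):.1f}%")

# Hypothetical single E1 where the CIC number equals the timeslot number.
msc = range(1, 32)
bsc = [c for c in range(1, 32) if c != 16]   # BSC deleted the CIC on timeslot 16
print(cics_missing_at_bsc(msc, bsc))
```

Any CIC returned by the comparison can be selected by the MSC for a call but cannot be assigned by the BSC, which lowers the success rate exactly as described in this instance.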
Chapter 5
OMM System Faults
Table of Contents
Handling OMM Abrupt Abnormality
Handling Virus/Security Events
Analyzing Instance 1
Analyzing Instance 2
Analyzing Instance 3
Analyzing Instance 4
Analyzing Instance 5
Analyzing Instance 6
Analyzing Instance 7
Analyzing Instance 8
Handling OMM Abrupt Abnormality
Fault Phenomenon
• No clients can log in to the system.
• The OMM cannot connect to the primary NEs.
• The OMM cannot execute various man-machine commands.
Flow Diagram
Figure 23 shows the flow of handling the abrupt abnormality of the OMM system.
FIGURE 23 HANDLING THE ABRUPT ABNORMALITY OF THE OMM SYSTEM
Solution
1. Upgrade fault of the OMM system
The OMM system fails to start after being upgraded.
• Check whether the software package is correct, including the software name, version ID and package size in bytes.
• Check whether the abnormal start of the OMM system is caused by an improper modification of ums-svr\deploy\deploy*.xml or ums-clnt\deploy\deploy*.xml. If so, replace deploy*.xml with the correct file.
• Check whether a key file or directory has been deleted from the running version of the OMM software. Start the OMM software and carefully read the output of the command to identify the missing file from the prompt. Copy the corresponding file from the version backup location to the current running location, and then restart the OMM system.
2. Network fault
• Errors in the configuration of the OMM IP address, subnet mask or network gateway cause a network fault and prevent the NEs from establishing communication.
• Check the above-mentioned configurations carefully and correct any error. Then use the Ping command to check whether the network fault has been eliminated.
• Enter the netstat -an command to check the following ports of the OMM server: 21, 23, 1521, 5000~5006, 5057 and 21099~21114. Check whether these ports are blocked by the firewall.
• Search for “port” in the file ums-svr\deploy\deploy-default.properties to find the list of all ports needed by the current version. Then check whether any listed port is blocked in the firewall configuration table. Unblock the ports that are blocked.
• The network port, network card, network cable or network equipment of the carrier is faulty, and the OMM server cannot successfully ping the external network.
Use methods such as the Ping command, the Trace command, and replacement of the network cable or equipment to find the fault reason, and replace the faulty equipment.
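As a complement to running netstat -an on the server, the port list in the network fault item above can be probed from a client machine, which also shows whether a firewall on the path blocks any of them. The sketch below attempts a TCP connection to each port. The port numbers come from the text; the function name, the timeout and the way the result is reported are assumptions for illustration.

```python
import socket

# Ports listed in the text: 21, 23, 1521, 5000~5006, 5057 and 21099~21114.
OMM_PORTS = [21, 23, 1521, *range(5000, 5007), 5057, *range(21099, 21115)]


def unreachable_ports(host, ports, timeout=1.0):
    """Return the ports on host that refuse or time out on a TCP connect."""
    failed = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass                  # connection succeeded, port is reachable
        except OSError:
            failed.append(port)
    return failed
```

For example, calling unreachable_ports with the IP address of the OMM server and OMM_PORTS returns the ports that could not be reached. A port in the result is either not listening or blocked on the way, so the netstat output on the server is still needed to tell the two cases apart.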
3. Database fault
• Check the listening configuration of the ORACLE server. Run the lsnrctl status command in the DOS window of the server, and then check whether the listening service configuration is correct and whether the database instance runs normally. If the listening service is abnormal, modify the configuration with the NET MANAGER tool of ORACLE, and restart the listening service.
• Use the sqlplus tool of ORACLE to test whether the ORACLE instance used by the OMM software works normally. If the instance is abnormal, use the ORACLE Enterprise Manager to restart the database instance, modify the operation, and recover the database instance.
• Check whether the IP address of each database server is correctly configured in the ums-svr\deploy\deploy-default.properties file. Modify the IP address if it is inconsistent with the one in the ORACLE database.
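To review the database server addresses in the properties file mentioned above without reading it line by line, the entries that contain an IPv4 address can be listed automatically. The sketch below is a simplified reader for key=value lines; it does not handle every feature of the Java properties format. The sample keys and addresses are hypothetical and do not show the real contents of deploy-default.properties.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ip_entries(properties_text):
    """Map each property key to the IPv4 addresses found in its value."""
    found = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        addresses = IPV4.findall(value)
        if addresses:
            found[key.strip()] = addresses
    return found


# Hypothetical excerpt, not the real file contents.
sample = """
# database settings
db.server.ip = 192.0.2.20
db.url = jdbc:oracle:thin:@192.0.2.21:1521:omm
ftp.port = 21
"""
print(ip_entries(sample))
```

The listed addresses can then be compared with the address on which the ORACLE listener actually runs.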
4. Other faults
The free space of the disk partition where the OMM system software is installed is found to be less than 300 MB. Clear this disk partition to keep its free space above 500 MB, mainly by backing up and clearing the system logs and the statistical information. Back up the old system logs and statistical information to another hard disk or to an external medium, and then delete them. The trash files of the operating system should also be cleared.
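The two thresholds above (a fault below 300 MB, a target above 500 MB) lend themselves to a periodic check. The following is a minimal sketch using the Python standard library; the state names and the choice to treat the range between the two thresholds as a warning level are assumptions for illustration.

```python
import shutil

ALARM_FREE_MB = 300   # level at which the text reports the fault
MIN_FREE_MB = 500     # free space the text recommends keeping


def free_space_state(free_mb: float) -> str:
    """Classify free space against the two thresholds from the text."""
    if free_mb < ALARM_FREE_MB:
        return "critical"
    if free_mb < MIN_FREE_MB:
        return "low"
    return "ok"


def check_partition(path: str) -> str:
    """Classify the partition that contains path."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_space_state(free_mb)


# Checks the partition of the current directory; pass the OMM install path instead.
print(check_partition("."))
```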
Handling Virus/Security Events
Fault Phenomenon
• The OMM system finds that a user from an unknown IP address has logged in to the system and modified the data, or the host where the OMM system is located suffers login, intrusion or attack by an illegal user.
• A virus appears in the OMM system or in the network where the OMM system is located.
Flow Diagram
Figure 24 shows the flow of handling the virus/security events.
FIGURE 24 HANDLING VIRUS/SECURITY EVENTS
Solution
1. Check whether a virus or security vulnerability exists in the OMM system.
• Scan the OMM system and remove any virus. One or more antivirus products can be used, such as Norton, Symantec, McAfee, KV3000 or Rising.
• Use network security analysis tools, such as Network Analyzer, Sniffer and Agilent, to check whether security vulnerabilities exist in the premises network. Eliminate such hidden security risks based on the actual network condition.
2. Check whether there are security problems in the OMM system.
• If the login logs of the OMM system show that a user with an illegal IP address once logged in to the system, delete the illegal user name and password from the OMM system.
• If the system configuration information has been modified illegally (for example, the NE access configuration has been changed, and not by internal personnel), check and recover the modified data configuration in the system.
• If a check of the user names and passwords for logging in to the NE system reveals a new user with administrator authority, or if a user reports that the original login user name and password no longer work for unknown reasons, modify the password of the user with the root authority or the Admin authority. Note that the password must be sufficiently strong, for example, at least eight characters long, containing letters and numbers, and case-sensitive.
3. Check whether security problems exist in the database.
Open Enterprise Manager Console to check whether any illegal user exists in the administrator group. If so, delete it in the Enterprise Manager Console.
4. Check whether security problems exist in the OMM server computer.
• On the Users tab of the Task Manager (for Windows 2003), check whether an unknown user is logged in to the system at the same time. If such a user is logged in, right-click the user and select Break. When necessary, pull out the network cable to temporarily disconnect the network, and then perform the following operations:
• Click My Computer > Manage > System Tools > Local Users and Groups > Users to check whether there is any new administrator user. If there is, delete it. Select Administrator, right-click and select Set Password to modify the user password. Note that the password must be sufficiently strong, for example, at least eight characters long, containing letters and numbers, and case-sensitive.
5. Check whether security problems exist in the OMM network firewall.
• Log in to the firewall with administrator authority to check whether there are login records from illegal IP addresses. If such records exist, delete the login information of the corresponding illegal user, and then modify the login password of the current administrator. Note that the password must be sufficiently strong, for example, at least eight characters long, containing letters and numbers, and case-sensitive. If the administrator cannot log in to the firewall, restart and reset the firewall, and then log in again.
• Check the firewall, including the access control list, static route configuration and port settings. Recover the original configuration items of the firewall.
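The password requirement repeated in the steps above can be expressed as a simple check. The sketch below reads the requirement as: at least eight characters, at least one digit, and both lowercase and uppercase letters. Treating the case-sensitivity remark as a requirement for mixed case is an interpretation, not something the manual states explicitly, and a site security policy may demand more.

```python
def password_is_strong(password: str) -> bool:
    """Check length, digits, and mixed-case letters (one reading of the rule)."""
    return (len(password) >= 8
            and any(c.isdigit() for c in password)
            and any(c.islower() for c in password)
            and any(c.isupper() for c in password))


# Sample strings for illustration only; never reuse them as real passwords.
for candidate in ("Abcdef12", "abcdef12", "Ab1"):
    print(candidate, password_is_strong(candidate))
```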
Analyzing Instance 1
Topic
The client cannot access the server because the system log is full.
Symptom
Sometimes the client cannot connect to the server after being restarted. After login, the topology tree cannot be displayed on the client, and the directory tree on the left is empty, although the connection between the client and the server is normal.
Fault Analysis and Location
• A check of the server shows that the process is not started. When the server is restarted, none of the processes can be started.
• A further check shows that the free space of partition C is 0 and that there are 4 GB of log files. The disk-full error causes the OMM fault.
Fault Handling
After the old logs are deleted, the server restarts normally. It is recommended to set up a reasonable log-clearing mechanism on the OMM server and clients that clears old logs automatically.
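A log-clearing mechanism of the kind recommended above can be as simple as a scheduled script that removes log files older than a retention period. The following is a minimal sketch, not a ZTE-provided tool. The .log suffix, the 30-day retention and the function names are assumptions, and the script defaults to a dry run so that the list of files can be reviewed and backed up before anything is deleted.

```python
import os
import time


def stale_logs(entries, now, max_age_days=30):
    """Return paths of .log files whose modification time is older than the limit."""
    cutoff = now - max_age_days * 86400
    return sorted(path for path, mtime in entries
                  if path.endswith(".log") and mtime < cutoff)


def scan_log_dir(log_dir):
    """Collect (path, mtime) pairs for the regular files in log_dir."""
    return [(e.path, e.stat().st_mtime) for e in os.scandir(log_dir) if e.is_file()]


def purge_old_logs(log_dir, max_age_days=30, dry_run=True):
    """List stale logs in log_dir and delete them unless dry_run is set."""
    victims = stale_logs(scan_log_dir(log_dir), time.time(), max_age_days)
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

Running purge_old_logs on the log directory with dry_run left at True only reports what would be removed; back up the listed files first if they may be needed for later fault location.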
Analyzing Instance 2
Topic
The client fails to connect to the server.
Symptom
After the OMM server and client are installed (both can be installed
on the same computer), the system prompts "Failed to connect the
server" after the IP address of the server is entered on the client.
Fault Analysis and Location
There are several possible reasons.
• The server IP address entered is incorrect.
Handling: Check the IP address of the server carefully.
• The physical connection between the client and the server is
faulty.
Handling: Use the ping command to check whether the channel between
the client and the server is clear. Check the physical connection
between the server and the client to make sure that it is normal.
• The port of the FTP service at the server side conflicts with that
of the IIS service.
Handling: Stop the IIS service in Services, and set the IIS service
startup type to Disabled. The OMM system of ZXWN MGW does not need
the IIS service. An active IIS service often results in a port
conflict and affects the normal login of the client.
• Other reasons
Handling: The OMM system requires a server with relatively high
performance. Poor server performance may cause the login failure
of the client.
Summary
1. Manually deleting a version file in File Management may make the
foreground version database table inconsistent with the files. It
is recommended to delete version files through Version Management
instead of File Management. Otherwise, the foreground system may
fail to obtain the version during restart.
2. Incorrect parameters in the physical configuration of a board
cause version loading to fail. For example, when configuring the
UIM board parameters, it must be clear whether the T network, CPU
and GXS sub-card exist.
3. The address of the OMM FTP server must be re-allocated after the
OMM server is upgraded.
Analyzing Instance 3
Topic
The OMM fault results from the incorrect settings of the timing
task.
Symptom
When data is being created on an MSC Server for the MGW that it
manages, the OMM client responds too slowly to modify or add data.
The slow response lasts for about 10 minutes.
Fault Analysis and
Location
1. The problem has occurred frequently since a certain date (April
4), so it is suspected that data was modified on that date. The
operation logs of April 4 contain too many operations to search,
so another method is used to locate the fault.
2. Observe the CPU load when the fault occurs. The CPU occupancy is
very high, which indicates that the fault results from a process
that runs periodically or aperiodically.
3. Check the disk space occupied on the OMM server. There are five
files with the extension DMP. Four of them are larger than 1500 MB
and were created by an unknown process in the morning. The fault
occurs again at about 14:20, and the size of the fifth file
increases constantly and dramatically, to nearly 500 MB. From this
it is inferred that the fault occurs periodically.
4. Check the system management in the OMM system. Only the
performance statistics data is set with the functions of data
backup, export and automatic deletion. Among the policies in Policy
Management, the Data Backup policy is enabled. This policy was
modified on April 4 and given a cycle of 27 hours. The policy
management entries in the log files show that an operator set the
policy on that day. It is confirmed that this policy causes the
problem.
5. Suspend the data backup policy, and observe the system for two
days. The fault does not occur again.
Summary
As a useful data backup tool, the data backup policy in policy
management backs up the configuration database of the OMM system.
However, the data backup operation places a high CPU load on the OMM
server. Therefore, schedule this kind of task for the early morning
or another period without frequent operations.
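The effect of the 27-hour cycle found in this instance can be worked through with a short calculation. This is a minimal sketch: the first start time (02:00 on April 4) and the year are assumptions chosen for illustration, since the source gives only the date and the cycle length.

```python
from datetime import datetime, timedelta


def backup_start_times(first_start, cycle_hours, runs):
    """Return the start times of the first `runs` executions of a task
    that repeats every `cycle_hours` hours."""
    return [first_start + timedelta(hours=cycle_hours * i) for i in range(runs)]


# Hypothetical first run: 02:00 on April 4 (the year is arbitrary).
starts = backup_start_times(datetime(2007, 4, 4, 2, 0), 27, 6)
for start in starts:
    print(start.strftime("%b %d %H:%M"))
```

Because 27 hours is 3 hours more than a day, each run starts 3 hours later in the day than the previous one, so a task first scheduled in the early morning drifts into working hours within a few days. A 24-hour cycle keeps the start time fixed.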
Analyzing Instance 4
Topic
The EMS fails to get the performance measurement file because of OMM
FTP parameter setting errors.
Symptom
The EMS fails to get the performance file reported by the CS.
Fault Analysis and Location
The soft-switch reports a notification to the EMS, informing it that
the performance files are ready. However, the IP address contained in
the performance files is wrong: it is set to 127.0.0.1. As a result,
the EMS fails to get the performance files of the CS.
To configure the IP address through the tool in the tools\config\
directory on the OMM server, perform the following steps.
1. Run the run.bat in the tools\config\ directory, to pop up the
Config the Environment of Software dialog box, as shown
in Figure 25.
FIGURE 25 SETTING CORBA FTP
2. On the CORBA FTP tab, input the big network address (that
is, the IP address for the OMM communicating with the EMS).
3. Delete TEMP files.
Result: The fault is solved after the OMM system is restarted.
Summary
If CORBA is selected during the installation of the OMM system, the
FTP address of CORBA must be set at a later stage of the
installation. The default address is 127.0.0.1. To avoid this error,
change it to the address used for communicating with the EMS.
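A configured address can be checked for this mistake before the OMM system is restarted. This is a minimal sketch using Python's standard ipaddress module; the function name is hypothetical, and the check only rejects loopback and unspecified addresses. It does not verify that the EMS can actually reach the address.

```python
import ipaddress


def is_usable_ftp_address(address: str) -> bool:
    """Return False for addresses a remote EMS can never reach:
    loopback (such as 127.0.0.1) or unspecified (0.0.0.0)."""
    ip = ipaddress.ip_address(address)
    return not (ip.is_loopback or ip.is_unspecified)
```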
Analyzing Instance 5
Topic
The OMM server fails to start.
Symptom
The OMM server fails to start. Check the log file server-start.log
on the OMM server in the X:\ZXWN-OMCS\zxwomcs\ums-svr\log
directory.
The following information is shown in the file.
Starting failed
RuntimeErrorException: Error in MBean operation 'start()'
Cause: java.lang.Error: test 3 times ftp server fail
Fault Analysis
The FTP server fails to start during the startup of the OMM server
because port 21 is occupied. There are two possible cases.
1. Another FTP server has been started, probably the IIS carried by
the operating system.
2. The FTP process was not stopped normally when the server was
started. Check whether there is a JAVA process in the Task Manager,
and close the Oracle HTTP server in Services.
Solution
1. Click Start > Run, and type the CMD command. The Command Prompt
window appears.
2. Type the netstat -an|findstr 21 command to view whether port 21
is occupied.
3. Select Control Panel > Administrative Tools > Services to open
the Services dialog box.
4. Stop the IIS service carried by the Windows operating system.
5. Restart the OMM server.
Summary
• Directly execute run.bat (\ZXWN-OMCS\zxwomcs\ums-svr\bin\run.bat)
to start the OMM server and view more startup information.
• Close the FTP server before startup. Execute the netstat
-an|findstr 21 command to check whether port 21 is occupied.
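Note that findstr 21 matches every line that contains the characters 21, including ports such as 2121 or 1521. The following is a minimal sketch of a stricter check that parses saved netstat -an output and looks for a listener on exactly port 21. The function name and the sample output are hypothetical, and the parser assumes the usual English Windows column layout (Proto, Local Address, Foreign Address, State).

```python
def port_in_use(netstat_output: str, port: int) -> bool:
    """Return True if a TCP line in LISTENING state has exactly this
    local port. Assumes the English Windows `netstat -an` layout."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 4
            and fields[0].upper() == "TCP"
            and fields[3].upper() == "LISTENING"
        ):
            local_port = fields[1].rsplit(":", 1)[-1]
            if local_port == str(port):
                return True
    return False


# Hypothetical sample output for illustration only.
sample = """\
  TCP    0.0.0.0:21             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:2121           0.0.0.0:0              LISTENING
  TCP    10.0.0.5:1521          10.0.0.9:3321          ESTABLISHED
"""
print(port_in_use(sample, 21))
```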
Analyzing Instance 6
Topic
The OMM client fails to connect to the OMM server, and logging in to
the database with the sqlplus uep/uepuep@omc command (where omc is a
database instance) also fails.
Symptom
The OMM client fails to connect to the OMM server.
Fault analysis and location
1. Collect files.
{ORACLE_HOME}\admin\{ORACLE_SID}\bdump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\udump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\cdump\*.trc
{ORACLE_HOME}\admin\{ORACLE_SID}\pfile\*.*
admin\{ORACLE_SID}\bdump\alert_{ORACLE_SID}.log
Note:
ORACLE_HOME refers to the installation path of Oracle, which is
usually the D:/ORACLE directory on the OMM server.
The gathered files show the following records in
alert_{ORACLE_SID}.log.
Wed Sep 12 00:39:07 2007
KCF: write/open error block=0x1055 online=1
file=2 D:\ORACLE\ORADATA\OMC\UNDOTBS01.DBF
error=27072 txt: 'OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Quota insufficient, unable to complete the required service'
Wed Sep 12 00:39:07 2007
Errors in file d:\oracle\admin\omc\bdump\omc_ckpt_2120.trc:
ORA-00202: controlfile: 'D:\ORACLE\ORADATA\OMC\CONTROL03.CTL'
ORA-27091: skgfqio: unable to queue I/O
ORA-27070: skgfdisp: async read/write failed
OSD-04006: ReadFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:07 2007
Errors in file d:\oracle\admin\omc\bdump\omc_dbw0_1612.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01114: IO error writing block to file 2 (block # 4181)
ORA-01110: data file 2: 'D:\ORACLE\ORADATA\OMC\UNDOTBS01.DBF'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
DBW0: terminating instance due to error 1242
Wed Sep 12 00:39:09 2007
Errors in file d:\oracle\admin\omc\bdump\omc_lgwr_2084.trc:
ORA-00345: redo log write error block 45901 count 2
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\OMC\REDO01.LOG'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:31 2007
Errors in file d:\oracle\admin\omc\bdump\omc_ckpt_2120.trc:
ORA-00204: error in reading (block 1, # blocks 1) of controlfile
ORA-00202: controlfile: 'D:\ORACLE\ORADATA\OMC\CONTROL03.CTL'
ORA-27091: skgfqio: unable to queue I/O
ORA-27070: skgfdisp: async read/write failed
OSD-04006: ReadFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
Wed Sep 12 00:39:31 2007
Errors in file d:\oracle\admin\omc\bdump\omc_lgwr_2084.trc:
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 45901 count 2
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\OMC\REDO01.LOG'
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
The preliminary judgment is that the fault results from insufficient
table space.
2. In DOS, type the sqlplus "sys/omc@omc as sysdba" command. It is
found that the 9205 patch is not installed, and the Oracle version is
still 9201.
Note:
Adjusting database parameters and installing the 9205 patch are
recommended.
3. The adjusted database parameters are as follows.
sqlplus "sys/omc@omc as sysdba"
alter system set processes=300 scope=spfile;
alter system set timed_statistics=FALSE scope=spfile;
alter system set aq_tm_processes=0 scope=spfile;
alter system set shared_pool_size=167772160 scope=spfile;
alter system set java_pool_size=167772160 scope=spfile;
alter system set large_pool_size=67108864 scope=spfile;
alter system set db_cache_size=209715200 scope=spfile;
alter system set pga_aggregate_target=167772160 scope=spfile;
alter system set undo_retention=1800 scope=spfile;
alter system set log_buffer=1048576 scope=spfile;
show parameters pfile;
4. Close the database instance.
shutdown immediate
5. Start the database instance again.
startup
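The memory parameters in step 3 are given in bytes, which makes them hard to read. The following minimal sketch converts the byte values copied from the alter system statements in step 3 into mebibytes; the helper name is hypothetical, and only the parameters that are memory sizes are included.

```python
MIB = 1024 * 1024

# Byte values copied from the alter system statements in step 3.
PARAMS = {
    "shared_pool_size": 167772160,
    "java_pool_size": 167772160,
    "large_pool_size": 67108864,
    "db_cache_size": 209715200,
    "pga_aggregate_target": 167772160,
    "log_buffer": 1048576,
}


def to_mib(value_bytes: int) -> int:
    """Convert a byte count to whole mebibytes, rejecting partial values."""
    if value_bytes % MIB:
        raise ValueError(f"{value_bytes} is not a whole number of MiB")
    return value_bytes // MIB


for name, value in PARAMS.items():
    print(f"{name}: {to_mib(value)} MiB")
```

All six values are whole multiples of 1 MiB (160, 160, 64, 200, 160 and 1 MiB respectively), which is a quick way to spot a mistyped digit before the instance is restarted.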
Summary
Oracle has not been updated to version 9205, which may introduce
uncertain factors. Therefore, confirm that the patches are already
installed for the OMM database and user database, and that the OMM
database has been upsized successfully.
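When an alert log is as long as the one collected in step 1, counting the error codes gives a quick overview before each entry is read in detail. This is a minimal sketch: the function name is hypothetical, the regular expression covers only the ORA-, OSD- and "OS nnnn" forms that appear in this log, and the sample text is a short excerpt of it.

```python
import re
from collections import Counter

# Matches codes such as ORA-27072, OSD-04008 and "OS 1453".
CODE_RE = re.compile(r"\b(ORA-\d{5}|OSD-\d{5}|OS \d+)\b")


def count_error_codes(alert_text: str) -> Counter:
    """Count each Oracle/OS error code found in the alert log text."""
    return Counter(CODE_RE.findall(alert_text))


sample = """\
ORA-27072: skgfdisp: I/O error
OSD-04008: WriteFile() Failed, unable to write files to it
O/S-Error: (OS 1453) Insufficient quota exists to complete the required service
ORA-27072: skgfdisp: I/O error
"""
for code, count in count_error_codes(sample).most_common():
    print(code, count)
```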
Analyzing Instance 7
Topic
OMP fails to communicate with the OMM server.
Symptom
OMP cannot communicate with the background server during MSC
Server debugging.
Fault analysis and
location
1. The engineer cannot ping the IP address of the foreground OMP
from the background server. However, the OMP has started and runs
the version normally.
2. The engineer checks the IP configuration of the OMP. It is in the
same network segment as the background server, and both have the
same mask. Under normal circumstances, it should be possible to
ping the IP address of the OMP.
3. The engineer checks the FE interface of the switch, and finds
that both the status indicators and the network cables are normal.
4. The engineer checks the switch settings, and finds that the
switch has been divided into several VLANs, and that the network
interface of the background server is not in the same VLAN as that
of the OMP.
Solution
The engineer moves the network cable of the OMP so that it uses the
same VLAN as the OMM server. Communication then becomes normal.
Summary
An IP communication network requires appropriate planning of the IP
addresses and of the VLAN division on the switch.
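The check in step 2 of the fault analysis (same network segment, same mask) can be sketched with Python's standard ipaddress module. The function name and the addresses in the usage note are hypothetical. As this instance shows, passing this check is necessary but not sufficient: two hosts in the same subnet still cannot communicate if their switch ports are in different VLANs.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same network
    when the given subnet mask is applied."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("192.168.1.10", "192.168.1.200", "255.255.255.0") is true, while replacing the second address with 192.168.2.10 makes it false.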
Analyzing Instance 8
Topic
Opening two OMM clients causes the failure of querying performance statistics results.
Symptom
After starting two clients, an engineer fails to query the performance statistics results because of an unknown error.
Fault analysis and
location
1. The engineer deletes the temporary files in the temp folder.
After that, the performance query is successful. But the fault
appears again very soon.
2. The problem always occurs when two OMM clients are started.
In this project, MSCS and MGW are located at different places.
Each has its own server side.
Fault Cause
When one OMM client is started, some temporary files will be automatically written in the TEMP folder. Meanwhile, these files will
also be resident in the memory. When another client is started for
performing performance query, the system will also write some
temporary files in the TEMP folder. Since some temporary files
with the same name are used by the OMM client that is started
previously, the system implements write-protect for these files.
This results in write failure, which causes the query to be terminated abnormally.
Solution
Close one of the OMM clients. The performance query can then be performed normally.
Chapter 6
Interconnection Faults in IP-Bearer Network
Table of Contents
Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
Handling Call Loss Generated by Soft-Switch after CE Restarts
Handling Soft-Switch Failing to Ping through CE
Handling Continuous Call Loss Generated for Broken Receiving Fiber of Soft-Switch
Fault Description
When the receiving fiber at the soft-switch side is broken, high call
loss appears. At the same time, about 300 online calls remain in
stable status, whereas there should normally be about 600 online
calls.
The networking structure is shown in Figure 26. The media plane
interfaces adopt the load-sharing mode, and enable BFD fast
detection with the CE. BFD is bound to the static route.
When the receiving fiber is interrupted at the MGW side, the
corresponding outgoing routes through this optical interface can be
disabled by using the BFD function. In addition, the ZTE soft-switch
system can bring the external port down on the board that receives no
optical signal, to quickly delete the outgoing route through this
optical interface.
FIGURE 26 INTERCONNECTION BETWEEN MEDIA PLANE AND BEARER NETWORK
Analyzing and
Processing
1. After a fault occurs in the media plane interface of the MGW,
the MGW should immediately disable the route through the faulty
interface. To check this, input the SHOW IP ROUTE ALL command on
the OMM client to view all the IPv4 route entries. The static route
through the MGW's faulty interface is already down, indicating
that the MGW processing is correct.
2. Check whether the CE side correctly handles the static route
after the BFD is down. The CE engineer logs in the router,
and checks the route table. He/she finds that the static route
through the faulty port is still in Active status. He/she confirms
that the BFD is not bound with the static route.
Fault Cause
When the interface link is faulty, both the soft-switch equipment
and the CE must disable the route through the faulty interface by
means of a related detection mechanism. Otherwise, data traffic is
still sent through the faulty interface, which affects the services.
Solution
The fault is handled after the CE engineer modifies the data configuration.
Handling Call Loss Generated by Soft-Switch after CE Restarts
Fault Description
The soft-switch equipment interconnects to another vendor's CE
in the load-sharing mode, as shown in Figure 27. After CE1 is
powered off, all the outgoing traffic of the soft-switch equipment
passes through CE2. About 800 call loss records are generated while
CE1 is powering on.
FIGURE 27 INTERCONNECTION BETWEEN SOFT-SWITCH AND CE
Analyzing and
Processing
1. When call loss is generated during the power-on of the CE, ping
the CE gateway from the ZTE soft-switch network management
platform. The CE gateway can be pinged, but the opposite soft-switch
cannot be pinged through the bearer network.
2. Trace the route to the opposite soft-switch equipment across the
IP bearer network with the Trace tool. The first hop reaches the CE,
but the subsequent hops cannot be reached.
Fault Cause
When the CE is powered on, the port between the CE and the
soft-switch is probably restored first, followed by the BFD, while
the dynamic route of the IP bearer network is not yet restored. As a
result, the data traffic of the soft-switch is sent to the CE before
the CE can forward it, which causes packet loss and seriously affects
the services.
Solution
The CE engineer adjusts the power-on sequence of the CE port to
make sure that the internal routes of the IP bearer network restore
first after the CE is powered on.
Handling Soft-Switch Failing to Ping through CE
Fault Description
The ZTE soft-switch equipment is interconnected to another vendor's
CE. During a call, the electric interface between the MSCS and the CE
is physically disconnected and then restored. After that, the
soft-switch fails to ping the CE address.
Analyzing and
Processing
1. After the laptop is directly connected with the signaling interface board of the soft-switch equipment, the laptop can ping
through the SIPI interface address.
2. After the laptop is directly connected with the CE, the laptop
cannot ping the interface address of the CE. By capturing packets
with Wireshark, the engineer finds only the ARP request sent out,
and no response from the CE.
The above analysis shows that the fault is not located at the
soft-switch side. The CE engineer checks the data, and finds that
the CE's electric interface cannot be configured as Enforced.
However, the CE engineer had said that the electric interface of the
CE was in enforced mode when the data was negotiated.
Fault Cause
Inconsistent electric interface configuration causes the soft-switch to fail to ping the CE.
Solution
On the soft-switch network management platform, the engineer
modifies the soft-switch’s electric interface to auto-negotiation
mode, and the fault is solved.
Chapter 7
Voice Faults
Table of Contents
Common Voice Faults
Troubleshooting Ideas and Common Methods
Echo Fault Handling
Monolog Fault Handling
Both-Way Silence Fault Handling
Noise Fault Handling
Cross-Talking Fault Handling
Instance Analysis
Common Voice Faults
Overview
In general, voice faults result from bearer problems. In special
cases, other problems, such as signaling compatibility and abnormal
parameter processing by the control plane, also cause voice faults.
Handover or other services that happen after the subscriber enters
the stable status can also result in voice faults. For these cases,
analyze the signaling to find the corresponding cause.
Voice Fault Types
The common voice fault phenomena are as follows.
• Echo
The speaker hears his/her own voice and the opposite party's voice
at the same time on the telephone.
• Monolog
The local party can hear the opposite party's voice for a period of
time, but the opposite party cannot hear the local party's voice
during a call.
Monolog is divided into long-term monolog and instantaneous
monolog. Long-term monolog lasts for a long time and cannot
recover. Instantaneous monolog lasts for a short time; usually, the
call becomes normal after two to five seconds. Monolog is different
from voice intermittence. Voice intermittence usually results from
poor quality at the wireless side, which makes the received sound
discontinuous, but with very short intervals.
• Both-way silence
The calling and called parties cannot hear each other's voice, but
can sometimes hear their own echo.
• Noise (gabbling call)
The voice quality is poor during a call, sometimes with a metallic
sound, a sound like iron being forged, jangling, or other noises.
These abnormal sounds are discontinuous, abrupt, and short.
• Cross-talking
The calling or called party may hear voices, ring-back tone, and
other sounds from other people during a call.
Voice Fault Analysis
Generally, voice faults are probably caused by a CN NE or the
wireless network (BSC/BTS). Sometimes, they are associated with
mobile phones.
In the CN, the common fault points are as follows.
• Abnormal signaling processing for incompatible signaling
• A/IU interface trunk or inter-office trunk
• UIMT/TFI/TSNB board fault
• Fiber or fiber module between UIMT and TFI
• VTCD board or some DSP is faulty during IP calls.
In addition, the EDRT board of the BSC, the BTS, the mobile phone,
and the wireless environment may also cause tone faults.
Troubleshooting Ideas and Common Methods
This section describes troubleshooting ideas and common methods.
Troubleshooting Ideas
The most crucial point in troubleshooting voice faults is to first
determine which NE in the network causes the fault. If it is
difficult to locate the fault quickly, analyze and process it by
referring to the following ideas.
If you are not sure which NE causes the fault, coordinate with
engineers at the wireless side and service personnel from other
exchanges to troubleshoot the fault together.
• Know the subscriber's complaints.
Before troubleshooting the fault, you must clearly know the
detailed information about the complaints. Focus on the following
information.
  - Calling and called numbers
  - Calling and called locations (located exchange and BSC)
  - Occurrence probability of the fault (whether it occurs during
    every call or occasionally)
  - Time when the fault occurs (whether it occurs in the ring-back
    phase, when the call has just been connected, or during a call)
• Know the operations before and after the fault occurs.
Know the operational conditions of the local NE, wireless
equipment, and other NEs in the whole network before and after the
fault, such as:
  - Capacity expansion is implemented for the A-interface or
    inter-office trunk.
  - Engineering personnel changed trunk jumpers in the equipment
    room.
  - The EFR function is enabled at the wireless side.
• Fault reproduction
Fault reproduction means making a fault occur again through
repeated call quality tests when service personnel cannot locate
the fault immediately after knowing the complaints. The purpose is
to find the common points and thereby the fault source.
Pay attention to the following aspects when reproducing a fault.
  - Record the signaling both when the voice-quality problem occurs
    and when the voice is normal.
  - Record the probability of this problem occurring.
  - Record the call model and the information of the involved NEs
    when this problem occurs.
• Fault Analysis
Synthesize the information collected from the three aspects
mentioned above, and analyze the fault.
Common Methods for Locating Faulty NE
The following methods are frequently used for locating the NE(s)
that cause the fault. You may troubleshoot the fault with one or
several of these methods.
Exclusion Method
The voice quality problem is complex. In most cases, it results
from the combined influence of several NEs. To narrow the
troubleshooting range, you may preliminarily locate the faulty NE
through a dialing test or loopback test.
There are two methods of excluding NEs.
Perform dialing tests for specified services, offices, and resources
(CIC or TC) to locate the faulty NE. The general principles are as
follows.
• Calls are normal in the local BSC. This proves that BSC
processing, transmission from MGW to BSC, and some functions of the
MGW are normal.
• Calls cannot be connected on some BSCs under the same MGW. It is
suspected that these BSCs process calls abnormally, or that the
A-interface trunk works abnormally.
• Local office calls are normal, but inter-office calls are
abnormal. The fault probably results from a processing problem of
the inter-office trunk or the opposite-end office.
• The local office plays the ring-back tone abnormally, but the
voice becomes normal after the call is connected. Probably, the MRB
board of the local office or a T-network-related board is abnormal.
Loopback between NEs
Perform loopback on the bearer between NEs, and then implement
CQT to locate the faulty NE.
For example, in a V3 end office, monolog occurs when a mobile
subscriber calls a PSTN subscriber. Implement a loopback test on
one E1 of the A-interface trunk. After loopback, specify that the
call from the mobile subscriber to the PSTN subscriber uses the
looped-back trunk circuit. If the mobile subscriber can hear his/her
own voice after the call is completely connected, but the PSTN
subscriber cannot, the BSC can be excluded.
Implement a loopback test on one E1 of the inter-office trunk. After
loopback, specify that the call from the mobile subscriber to the
PSTN subscriber uses the looped-back trunk circuit. If the mobile
subscriber can hear his/her own voice after the call is completely
connected, but the PSTN subscriber cannot, the BSC and CN can be
excluded.
Caution:
Loopback tests affect current network services. Release the
loopback as early as possible after the test is completed. In
addition, this method has some limitations for BSC and CN NEs
with different implementation modes.
Locating Faults with Tools
Use a 2M signaling analyzer and other similar meters to locate
system-related problems. Intercept signaling section by section
to determine the problem range, as shown in Figure 28.
FIGURE 28 INTERCEPTING CALLS SECTION BY SECTION
During section-by-section interception, focus on the endpoint of
each section (that is, the access point of the interception
equipment). The problem may lie exactly at these connection points.
This method may be combined with the call quality test on a
specified trunk. For example, specify the third timeslot of the
first E1 of the A-interface trunk for the CQT test. After the call
is connected, connect the receiving line of the 2M signaling
analyzer to the independent output interface of the tee connector
corresponding to that E1 on the DDF rack.
Methods of Locating Internal Fault Points on CN
Overview
These methods are usually used for troubleshooting when voice
faults are known to be caused by a node on the CN or by an incoming
trunk circuit. They can also be used for self-testing the CN when
the faulty NE is difficult to locate.
The following scenarios are provided only as a reference for
locating faults onsite. The onsite engineers may also select
suitable methods according to practical conditions.
Signaling Comparison
• Information collection
Collect signaling messages both when voice problems occur and when
the voice is normal.
• Applicable Scenarios
This method is applicable when the voice problem occurs for a
certain call model or office.
Example: a CN is interconnected with several PSTN offices. The
voice quality is normal when a mobile subscriber calls PSTN office
A. However, when a mobile subscriber calls a subscriber in PSTN
office B, the voice quality is poor in every call.
• Method
Compare the faulty signaling with the normal signaling to see
whether their signaling code streams are different. If the code
streams have different lengths, or the settings of important
parameters in the signaling messages are different, feed this
signaling back to headquarters to confirm whether the signaling
is normal.
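The comparison in the Method item can be sketched as follows. This is a minimal sketch that assumes the two code streams have been exported as hexadecimal strings; the function name and the sample bytes in the usage note are hypothetical. It reports only stream lengths and differing byte offsets. Interpreting which parameter sits at an offset still requires decoding the message according to the relevant signaling specification.

```python
def compare_signaling(normal_hex: str, faulty_hex: str) -> dict:
    """Compare two signaling code streams given as hex strings.
    Report both lengths and the offsets where the common part differs."""
    normal = bytes.fromhex(normal_hex)
    faulty = bytes.fromhex(faulty_hex)
    differing = [
        offset
        for offset, (a, b) in enumerate(zip(normal, faulty))
        if a != b
    ]
    return {
        "length_normal": len(normal),
        "length_faulty": len(faulty),
        "differing_offsets": differing,
    }
```

For example, compare_signaling("01 0a ff", "01 0b ff 00") reports lengths 3 and 4 and a difference at offset 1.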
Trunk Continuity Test
• Prerequisites
The opposite equipment supports the continuity test.
• Applicable Scenarios
Inter-office trunk test
• Method
Perform the continuity test on the trunks in Dynamic Management on
the OMM client. This method cannot distinguish a self-looped trunk.
Dialing Test on Specified Trunk (TDM Bearer)
• Prerequisites
SIM cards and mobile phones of the calling and called parties are
ready.
• Applicable Scenarios
Voice problems always focus on one or several A-interface trunks
or inter-office trunks (TDM-type trunk circuits).
• Method
The current CN system provides the function of performing dialing
tests on specified trunks.
Dialing Test on Specified RTP/TC Resources (IP Bearer)
• Prerequisites
SIM cards and mobile phones of the calling and called parties are
ready.
• Applicable Scenarios
Voice problems always focus on one or several A-interface trunks
or inter-office trunks (IP-type trunk circuits).
• Method
The current CN system provides the function of performing dialing
tests on specified trunks.
Board Changeover/Resetting
If you suspect that the voice fault is located on a functional
board, and this board/interface is configured in active/standby
mode, you may change it over or reset it manually or with software.
If this board/interface shares load with others, you may change
over or reset the spare board/interface to examine whether a
board/interface is faulty.
Caution:
• Changeover is an operation with relatively high risk. System
data backup must be done in advance.
• It is recommended to perform changeover on the OMP and other
important boards between 0:00 and 6:00 am, and to keep a certain
interval between two changeover activities.
Echo Fault Handling
This section describes the procedure for handling echo fault.
Echo Principles
Electrical Echo
The transmission system on the relay network uses four-wire for
transmission, while subscriber transmission line uses two-wire for
full-duplex transmission. This conversion is implemented through
two/four-wire hybrid in local switch.
However, the transmitter and receiver cannot be isolated completely, since the impedance of actual hybrid coil is mismatched.
Therefore, the two/four-wire converter can only separate the
transmitter from the receiver to a certain extent. The signals
received by four-wire are not converted completely to two-wire.
Some of the signal leaks into the transmitting part of the four-wire side. As a
result, an echo is generated, as shown in Figure 29. Electrical
echo is the main source of echo. A common echo canceller is used
to eliminate it.
FIGURE 29 ELECTRICAL ECHO
Acoustic Echo
In some telephones the speaker and microphone are not well isolated.
Acoustic echo is generated when the sound from the speaker travels
back to the microphone after several reflections in the surrounding space.
This happens, for example, when a hands-free telephone is used in a room
or a car.
Working Principles of Echo Canceller
Principle Overview
An echo canceller subtracts an estimate of the echo signal
from the four-wire transmitting path to eliminate the interference.
The echo estimate is derived from the voice signals on the
four-wire receiving path. Briefly, an echo canceller monitors the
far-end voice on the receiving path, estimates the resulting echo,
and subtracts this estimate from the transmitting path. In this way, the echo is eliminated and only the
near-end voice is transmitted to the far end.
An echo canceller has four ports. Two ports are located at the
drop-side, and the other two ports are located at the line-side, as
shown in Figure 30.
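The estimate-and-subtract principle described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm used on the DTEC board, which this manual does not describe. The sketch assumes a normalized LMS (NLMS) adaptive filter, a hypothetical five-tap echo path, and no near-end speech; all names and values below are assumptions.

```python
import random


def nlms_echo_cancel(receive_path, send_path, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive echo estimate from the send (transmitting) path."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    cleaned = []
    for far_sample, send_sample in zip(receive_path, send_path):
        history = [far_sample] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        residual = send_sample - echo_estimate   # what is sent to the far end
        energy = sum(h * h for h in history) + eps
        # NLMS update: move the weights towards the real echo path.
        weights = [w + mu * residual * h / energy
                   for w, h in zip(weights, history)]
        cleaned.append(residual)
    return cleaned


def simulate(samples=2000, seed=1):
    """Return (echo power, residual power) over the last 200 samples."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
    echo_path = [0.0, 0.0, 0.6, 0.3, -0.1]   # hypothetical hybrid leakage
    send_path = [
        sum(c * far_end[i - k] for k, c in enumerate(echo_path) if i - k >= 0)
        for i in range(samples)
    ]
    cleaned = nlms_echo_cancel(far_end, send_path)

    def power(signal):
        return sum(v * v for v in signal) / len(signal)

    return power(send_path[-200:]), power(cleaned[-200:])


if __name__ == "__main__":
    before, after = simulate()
    print("echo power before:", before, "after:", after)
```

In this simplified setting the residual power on the send path falls far below the original echo power once the filter has adapted. A real canceller also has to handle near-end speech (double talk), which the sketch ignores.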
FIGURE 30 WORKING PRINCIPLES OF ECHO CANCELLER
Because the echo to be eliminated is generated on the end path, the
delay on the long-haul line does not affect the echo canceller. However,
the total circuit delay (end path plus long-haul line) determines
whether an echo canceller is needed. When it exceeds 30 ms, an
echo canceller should be used.
Note:
When an echo canceller is installed at the near end, remote subscribers benefit. When it is installed at the far end, near-end
subscribers benefit.
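As a small worked example of the 30 ms rule above, the following Python sketch adds the two delay components and compares the total with the threshold. The function name and the sample delays are illustrative assumptions; only the 30 ms threshold comes from the text.

```python
EC_DELAY_THRESHOLD_MS = 30  # threshold stated in the text above


def echo_canceller_needed(end_path_delay_ms, long_haul_delay_ms):
    """Return True when the total circuit delay exceeds the threshold."""
    total_delay_ms = end_path_delay_ms + long_haul_delay_ms
    return total_delay_ms > EC_DELAY_THRESHOLD_MS


if __name__ == "__main__":
    # Hypothetical delays: 5 ms end path plus 40 ms long-haul line.
    print(echo_canceller_needed(5, 40))
    # Hypothetical delays: 5 ms end path plus 20 ms long-haul line.
    print(echo_canceller_needed(5, 20))
```

Note that the decision uses the total delay, even though only the end-path delay matters for the canceller's own operation.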
Echo Directions
Incoming EC and outgoing EC are two concepts that apply to a gateway exchange.
Figure 31 shows the meaning of incoming EC and outgoing EC. Whether a call is incoming or outgoing is indicated by
signaling, which may be IAM/IAI signaling or ACM signaling.
FIGURE 31 INCOMING/OUTGOING EC
• Incoming EC
It is the direction from which an IAM or ACM/CPG/ANM message is received. This is the direction of the echo. The audio
travels in the opposite direction to this echo.
• Outgoing EC
It is the direction in which an IAM or ACM/CPG/ANM message is sent. This is the direction of the echo. The audio travels in the
opposite direction to this echo.
Generally, an echo is directional, and so is the echo suppressor. This section
mainly focuses on the parameter settings for the EC direction.
The EC direction is either the GMSC-PSTN direction (or E1 direction)
or the GMSC-MSC direction (or HW direction). Suppose the networking structure is PSTN-GMSC-MSC, the PSTN generates echoes,
and the GMSC provides an echo canceller. The EC direction can then be understood as follows: the echo canceller in the E1 direction suppresses
the echo generated when a PSTN subscriber dials an MS.
Echo-Suppression Implementation
Whether echo suppression is enabled in ZXWN MSCS depends on both the echo suppression information carried by signaling
and the hardware configuration of the MGW (whether a DTEC board is
equipped).
Currently, ZXWN MGW supports an embedded echo suppression function. An independent echo suppression function (EC Pool) is under
research.
Configuring Echo Cancellation
Prerequisites
Before the operation, confirm the following:
• The OMM system runs normally.
• The OMM client has logged in to the OMM server normally.
Context
Perform this procedure to configure the echo cancellation function.
Steps
1. On the MML Terminal interface of the OMM client, select MSCS in
the root tree.
2. In the MML Commands tree, double-click Trunk Configuration > Trunk Group > Modify Information Flag of Trunk
Group.
3. Type the Trunk Group Number, and click the box following Enable Flag to open the Enable Flag dialog box, as shown in
Figure 32.
FIGURE 32 ENABLE FLAG
4. Select the Echo (Include Echo Killer) check box.
5. Click OK to return to the MML Terminal interface.
6. Click Execute to run the command.
7. On the OMM client, select Views > Professional Maintenance > MSCS > Variables Control > IGW Service Control
Parameter to enter the IGW Service Control Parameter
tab.
8. Select the ISUP Spare row, as shown in Figure 33.
FIGURE 33 ISUP SPARE
9. Click the Modify icon on the sub-toolbar to enter the editing status. Type 1.
The default is 0.
10. Click Save on the sub-toolbar.
END OF STEPS
Configuring Echo Cancellation by Adopting Resource Pool
Prerequisites
Before the operation, confirm the following:
• The OMM system runs normally.
• The OMM client has logged in to the OMM server normally.
Context
When ECPOOL is configured, the EC sub-card must not be configured on DTB or SDTB units from which only E1
lines are led out. The SDTB unit does not support ECPOOL.
Steps
1. Perform one of the following steps to specify the MGW exchange to be configured on the MML Terminal interface of
the OMM client.
• Execute the SET command in the command input area.
• Select the required NE in the root tree.
For example, to select the MGW exchange whose office ID is
31, the command is:
SET:NEID=31;
2. Add an EC POOL resource board. The command is ADD UNIT.
For the parameters in the ADD UNIT command, refer to Table
8.
TABLE 8 PARAMETERS IN THE ADD UNIT COMMAND
Parameters | Explanation | Instruction
LOC | Unit location | Format: RACK - SHELF - SLOT
MODULE | Module No. | Specifies the No. of the module that the unit belongs to. Usually 1 is selected for the OMP module that the unit belongs to.
UNIT | Unit No. | The No. of the unit corresponding to the board, ranging from 1 to 2000. Type an unused unit No.
TYPE | Unit type | Select it according to the actual conditions.
BKMODE | Backup mode | Select NO.
For example, add an EC POOL resource board in the mode of
DTEC sub-card plus EC. The specific command is as follows.
ADD UNIT:LOC="1"-"2"-"3",MODULE=1,UNIT=3,TYPE=DTB2_ECPOOL_Z2EC1,BKMODE=NO,BKUNIT=65535,CLK=255;
3. Configure the management mode of EC resources. The command is SET ECRSC.
For the parameters in the SET ECRSC command, refer to Table
9.
TABLE 9 PARAMETERS IN THE SET ECRSC COMMAND
Parameters | Explanation | Instruction
EC | The management modes of EC resources are as follows. CONNECTED: direct connected mode. POOL: pool mode. | Type POOL.
For example, to set the management mode of EC resources to
the POOL mode, the command is as follows.
SET ECRSC:EC=POOL;
END OF STEPS
Postrequisite
Transfer the data tables.
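When the three commands of this procedure are prepared for several exchanges, it can help to generate the command strings rather than type them. The following Python sketch only formats text in the COMMAND:KEY=VALUE,...; shape used by the examples in this section. The helper names mml and unit_location are hypothetical and are not part of the OMM software; the parameter values are the ones from the examples above.

```python
def mml(command, **params):
    """Format a command as 'COMMAND:KEY=VALUE,KEY=VALUE;'."""
    body = ",".join(f"{key}={value}" for key, value in params.items())
    return f"{command}:{body};"


def unit_location(rack, shelf, slot):
    """Format the LOC parameter as shown in the ADD UNIT example."""
    return f'"{rack}"-"{shelf}"-"{slot}"'


# Step 1: select the MGW exchange with office ID 31.
select_ne = mml("SET", NEID=31)

# Step 2: add the EC POOL resource board.
add_unit = mml(
    "ADD UNIT",
    LOC=unit_location(1, 2, 3),
    MODULE=1,
    UNIT=3,
    TYPE="DTB2_ECPOOL_Z2EC1",
    BKMODE="NO",
    BKUNIT=65535,
    CLK=255,
)

# Step 3: set the EC resource management mode to POOL.
set_pool = mml("SET ECRSC", EC="POOL")

if __name__ == "__main__":
    for line in (select_ne, add_unit, set_pool):
        print(line)
```

The generated strings still have to be executed on the MML Terminal in the order of the steps, and the data tables transferred afterwards.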
System Implementation
• Incoming processing for tandem/end office
Whether to enable outgoing EC for incoming calls is determined
by the trunk flag and the echoInd field carried by the IAM message.
  • If the echoInd field carried by the incoming IAM message is 1,
  the preceding office has enabled the EC function. By default, the local office disables this function. If
  the trunk flag is ECHO and the ISUP Spare variable is
  set to 1, the local office forcibly enables the EC
  function.
  • If the echoInd field carried by the incoming IAM message is 0,
  the preceding office has disabled the EC function. By default, the local office determines whether to enable this function. If the trunk flag is ECHO, the local office
  enables the EC function.
• Processing at outgoing side of tandem office
Currently, the incoming EC function can be enabled forcibly for
outgoing calls. This is determined by the trunk flag and the
echoInd field carried by the incoming ACM message.
  • If the echoInd field carried by the incoming ACM message is
  1, the preceding office has enabled the EC function. By default, the local office does not enable this function.
  If the trunk flag is ECHO and the ISUP Spare variable is
  set to 1, the local office forcibly enables the EC
  function, and the echoInd field carried by the outgoing ACM
  message is 1.
  • If the echoInd field carried by the incoming ACM message
  is 0, the preceding office has disabled the EC
  function. The local office determines whether to enable this
  function according to the configuration. If the trunk flag
  is ECHO, the local office enables the EC function, and the
  echoInd field carried by the outgoing ACM message is 1.
• Processing an originating call of end office
An end office originates a call. When it requests an A-interface
circuit, EC is enabled by default. During a voice service, CC sends a message to inter-office signaling. The EC flag contained in
this message is forcibly set to 1, indicating that EC is
enabled.
However, the mobile terminal has its own EC function, so the end
office does not need to enable it. In subsequent versions, EC is
not required when an A-interface circuit is requested. The EC
flag carried by the message sent to inter-office signaling is still
forcibly set to 1.
• Special Notes
If the ECHO (Include Echo Killer) flag is not selected for the outgoing trunk group (for example, SET TGFLG:TG=1,DISABLE="Echo";), and the ISUP Spare variable is set to 1, the
echoInd field carried by the outgoing IAM message is 1.
During an originating call of the local office, the echoInd field carried by outgoing IAM messages is 1 by default.
If the local office serves as a transit exchange, the echoInd field
carried by the incoming IAM message is passed on transparently.
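The enable rules for incoming IAM and ACM messages described above can be summarized as a small decision function. The Python sketch below is only a reading aid: the function and parameter names are hypothetical, and the value of the outgoing echoInd field when the local office does not enable EC is not stated in this manual, so the sketch returns None in that case.

```python
def local_ec_enabled(echo_ind, trunk_flag_echo, isup_spare):
    """Decide whether the local office enables EC for one call leg.

    echo_ind: echoInd field of the incoming IAM or ACM message (0 or 1).
    trunk_flag_echo: True if the ECHO flag is set on the trunk group.
    isup_spare: value of the ISUP Spare variable (default 0).
    """
    if echo_ind not in (0, 1):
        raise ValueError("echoInd must be 0 or 1")
    if echo_ind == 1:
        # The preceding office already enabled EC: the local office
        # enables it only when forced by ECHO flag plus ISUP Spare = 1.
        return bool(trunk_flag_echo) and isup_spare == 1
    # The preceding office did not enable EC: the trunk flag decides.
    return bool(trunk_flag_echo)


def outgoing_acm_echo_ind(echo_ind, trunk_flag_echo, isup_spare):
    """Return 1 when local EC is enabled, else None (not stated in the text)."""
    if local_ec_enabled(echo_ind, trunk_flag_echo, isup_spare):
        return 1
    return None


if __name__ == "__main__":
    for echo_ind in (0, 1):
        for flag in (False, True):
            for spare in (0, 1):
                print(echo_ind, flag, spare,
                      local_ec_enabled(echo_ind, flag, spare))
```

The table printed by the loop makes it easy to compare the configured trunk flag and ISUP Spare value of a trunk group with the behavior observed in a signaling trace.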
Fault Processing
Generally, an echo is generated when a mobile subscriber belonging to a gateway exchange calls a fixed subscriber. In this case,
check whether a DTEC board is installed for the trunk
group between the mobile exchange and the fixed exchange, and whether
the EC function is enabled in the corresponding data configuration.
In addition, a problem in the mobile phone itself can also cause an echo.
To check for this, simply repeat the test with another terminal.
Monolog Fault Handling
Overview
The monolog fault occurs often and involves many NEs. Because it has many possible causes, the onsite engineer should
look for what the affected calls have in common, based on the fault information, in order to locate and handle it.
Fault Analysis
The monolog fault is relatively complicated and is probably
caused by one of the following.
• BSC/RNC or other NE problem
The EDRT of the BSC, carrier frequency interference of radio signals,
uplink/downlink imbalance, and other conditions may cause
monolog.
• Mismatched wires of trunk
The receiving or transmitting line of an E1 on the trunk circuit
is connected to a wrong E1, or cables are badly soldered. Notice
that for monolog the trunk problem occurs in only one direction, receiving or
transmitting; if both directions are affected, it is a both-way silence fault.
• Signaling compatibility
Signaling incompatibility results from the interconnection between equipment produced by different manufacturers in the
CN. As a result, the monolog fault occurs.
• TSNB/TFI/UIMT/VTCD board fault
The monolog fault may be caused by a circuit-connection
error in the switching network board, an optical module fault on the
TFI board, and other conditions.
Fault Handling
• If the ring-back tone or color ring-back tone can be heard normally,
but monolog suddenly appears during a call,
check whether a handover or radio signal interference has occurred,
and whether the radio signal strength is normal.
Service personnel must regard Troubleshooting Ideas as the overall
guiding principle, and handle the monolog fault with the locating methods
mentioned above according to onsite conditions.
Since the voice stream currently cannot be saved, service personnel
can only listen to the voice manually to judge when a voice fault (both-way silence, monolog, or another fault) occurs. Analyze the CDR
files to determine the circuit seized by the call, so as to narrow the range
to be monitored. Looping is the best method for judging where the
fault occurs. Use it to narrow the troubleshooting range
until the fault is located.
Both-Way Silence Fault Handling
Overview
Like the monolog fault, the both-way silence fault occurs often. The
two faults are easily confused, but there is an essential
distinction between them. When a fault occurs, service personnel
should distinguish them by finding out what the calling party and the called
party each actually hear.
The both-way silence fault is mostly caused by the interconnection of trunk
circuits.
Fault Analysis
• Analyzing fault for TDM bearer
For a TDM bearer, the both-way silence fault is generally caused by the following conditions.
  • BSC or other NE problem
  • Trunk wires are mismatched or self-looped.
  The mismatched-wire condition in the both-way silence fault is
  somewhat different from that in the monolog fault. The following conditions can result in both-way silence.
  Both the receiving and transmitting lines of an E1 on the
  trunk circuit are connected to a wrong E1. The receiving and
  transmitting lines are connected inversely during interconnection. Cables are badly soldered. In addition, a trunk
  self-loop also causes this fault. In this case, each party
  hears his or her own voice, which makes the problem easy to
  identify.
  • PCM system IDs at both sides of a trunk for interconnection
  are inconsistent.
  MSC A and MSC B are interconnected by N E1s in total. The PCM system IDs of MSC A
  are "0~N-1", while those of MSC B are "1~N". When MSC A
  allocates the second timeslot in the E1 of the trunk circuit
  whose PCM system ID is 1, the corresponding physical circuit is the second timeslot of the second E1. However, the
  physical circuit actually used by MSC B is the second timeslot of the first E1.
  • TSNB/TFI/UIMT board fault
  The both-way silence fault may be caused by a circuit-connection error in the switching network board, an optical module fault on the TFI board, and other conditions.
• Analyzing fault for IP bearer
For an IP bearer, the both-way silence fault is generally caused by the following conditions.
  • RNC or other NE is faulty.
  • GLI board in MGW is faulty.
  For example, faulty memory on the GLI board corrupts
  the forwarding table of this board, which results in both-way silence.
  • UIM board in MGW is faulty.
  For example, the forwarding chip on the UIM board is faulty.
  The current version adds auto-sensing and auto-changeover functions to avoid this fault. Generally, this
  kind of problem does not appear in the field.
  • VTCD board in MGW is faulty.
  For example, the VTCD chip has a hardware fault or other problems.
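The PCM system ID mismatch described above for the TDM bearer can be checked with a short calculation. The Python sketch below maps a PCM system ID to a physical E1 for each office, given the first ID that the office uses. The function name and the choice N = 4 are illustrative assumptions; the numbering ranges and the example circuit come from the text.

```python
def physical_e1(pcm_id, first_pcm_id):
    """Return the 1-based physical E1 index for a PCM system ID."""
    return pcm_id - first_pcm_id + 1


N = 4                    # hypothetical number of interconnected E1s
MSC_A_FIRST_ID = 0       # MSC A numbers its PCM systems 0 to N-1
MSC_B_FIRST_ID = 1       # MSC B numbers its PCM systems 1 to N

pcm_id, timeslot = 1, 2  # circuit allocated by MSC A in the example

circuit_at_a = (physical_e1(pcm_id, MSC_A_FIRST_ID), timeslot)
circuit_at_b = (physical_e1(pcm_id, MSC_B_FIRST_ID), timeslot)

if __name__ == "__main__":
    print("MSC A uses E1/timeslot:", circuit_at_a)
    print("MSC B uses E1/timeslot:", circuit_at_b)
    print("Same physical circuit:", circuit_at_a == circuit_at_b)
```

Because the two offices resolve the same PCM system ID to different E1s, neither party receives the voice of the other, which matches the both-way silence symptom.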
Fault Handling
To handle both-way silence, perform the CQT test first to determine
the fault range: whether neither the ring-back tone nor voice can
be heard, or only the voice is abnormal during the whole call.
• If both-way silence occurs in all types of calls (intra-office and
outgoing calls), it is usually related to a core switching board,
such as TSNB and PSN.
• If this fault occurs in the calls to a certain office, it is usually related
to the trunk circuit or the opposite device(s) of this office.
• Perform board changeover to handle a both-way silence fault
that is possibly caused by a malfunctioning TSNB/PSN/UIMT/TFI
board.
• To troubleshoot a fault caused by trunk circuits, perform the CQT
test on a designated trunk for TDM circuits, or on designated
RTP/TC resources for IP circuits.
• To troubleshoot a fault related to the IP bearer network,
you can only capture packets at each level, because CS version
3.06/3.07.11/3.07.20 does not yet provide the loopback
function. Generally, capture the IP messages between NEs
first to determine which NE causes the fault, and then
capture the messages inside this NE to determine the faulty
module.
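The first two rules above, which narrow the fault range from the CQT results, can be written as a simple lookup. The Python sketch below is an illustration under assumptions: the function name, the office names, and the representation of the CQT results are hypothetical, and the returned strings only restate the guidance in this section.

```python
def both_way_silence_suspects(intra_office_faulty, faulty_offices, all_offices):
    """Map CQT observations to the area to examine first."""
    faulty = set(faulty_offices)
    if intra_office_faulty and faulty == set(all_offices):
        # Fault seen in all call types: intra-office and outgoing calls.
        return ["core switching boards (for example TSNB, PSN)"]
    # Fault limited to calls towards particular offices.
    return [
        f"trunk circuit or opposite device of office {office}"
        for office in sorted(faulty)
    ]


if __name__ == "__main__":
    offices = ["OFFICE_A", "OFFICE_B"]  # hypothetical adjacent offices
    print(both_way_silence_suspects(True, offices, offices))
    print(both_way_silence_suspects(False, ["OFFICE_B"], offices))
```

The result only indicates where to start. Board changeover, designated-trunk CQT, or packet capture is still needed to confirm the faulty part.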
Noise Fault Handling
Overview
A noise fault occasionally appears in a call, and is usually concurrent with monolog or both-way silence.
Fault Analysis
This fault is possibly due to one or more of the following causes.
• The DDF frame where the trunk circuit is located is not well
grounded, or bit errors occur on the transmission equipment.
• Poor radio link signals, radio frequency interference, or a system problem of the radio antenna and feeder
• Clock fault in the CN, or in the BSC and other NEs
• TSNB/TFI/UIMT board fault in the CN
• TDM/IP terminal deadlock due to an MGW version error
Fault Handling
To troubleshoot a noise fault, perform the CQT test to determine the
fault range. If a noise fault occurs in all types of calls (intra-office and outgoing calls), it is usually related to a core switching
board, such as TSNB and PSN. If this fault occurs in the calls to
a certain office, it is usually related to the trunk circuit or the opposite
equipment of this office.
Perform board changeover to handle a noise fault that is possibly caused by a malfunctioning TSNB/PSN/UIMT/TFI board working
in active/standby mode. For a noise fault caused by a TDM trunk
circuit, specify a trunk and perform the CQT test to locate it. For
a noise fault caused by an IP trunk circuit, capture IP messages to
locate it.
Cross-Talking Fault Handling
A cross-talking fault occasionally appears in a call, and is usually concurrent with monolog, both-way silence, or noise.
Fault Analysis
Generally, a cross-talking fault is seldom caused by a CN problem.
The common causes are as follows.
• BSC fault, radio frequency interference, or other radio problems
• A connection error occurs on TSNB, PSN, or other switching
boards in the CN.
Instance Analysis
This section describes some voice-fault instances.
Analyzing Instance 1
Topic
Monolog, both-way silence, and cross-talking faults occur in calls of
an end office.
Networking Diagram
Figure 34 shows the networking diagram of this instance.
FIGURE 34 NETWORKING DIAGRAM
Symptom
While an end office is in its maintenance period, monolog, both-way
silence, cross-talking, and other faults appear in some calls.
Analyzing and Processing
1. Reproduce the fault through call quality test.
• The subscriber dials intra-office calls under MGW 1 for 50
times. Monolog, cross-talking and other faults do not appear.
• The subscriber dials intra-office calls under MGW 2 for 50
times. Monolog, cross-talking and other faults do not appear.
• The subscriber under MGW1 dials a NOKIA subscriber for
50 times. The route is MGW1-T1-NOKIA. One time of
monolog, two times of cross-talking and both-way silence
faults appear.
• The subscriber under MGW1 dials a NOKIA subscriber for
50 times. The route is MGW1-MGW2-T2-NOKIA. One time
of monolog and two times of cross-talking faults appear.
• The subscriber under MGW2 dials a NOKIA subscriber for
50 times. The route is MGW2-T2-NOKIA. Monolog, cross-talking and other faults do not appear.
• The subscriber under MGW2 dials a NOKIA subscriber for
50 times. The route is MGW2-MGW1-T1-NOKIA. Two times
of monolog and one time of both-way silence faults appear.
2. Analyze the call quality results.
The call quality test results of the local office show
that the voice faults are not caused by the BSC, the A-interface trunk circuit, or the local-office boards. The results of the tests
over different calling routes show that the MGW1-T1-NOKIA
inter-office trunk and some E1 lines on the circuits between
MGW 1 and MGW 2 have problems, which cause the voice faults.
3. Perform the designated-trunk call quality test.
Perform designated-trunk call quality tests on the MGW1-T1-NOKIA trunks, and find that voice faults always occur in the
calls through certain E1 lines. Check the jumpers on these E1 lines,
and find that some lines are connected inversely or crosswise.
4. Check the physical circuits.
Since NetNumen M30 (V3.06) installed in the current network
does not support call quality tests on inter-MGW circuits, the onsite service personnel directly check the physical connection of this part of the circuits, and find that some E1 lines
are connected inversely or crosswise.
Solution
Rectify the trunk circuits having connection errors. The fault is
eliminated.
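The route comparison used in this instance can be sketched as follows. Each tested route is split into segments, the segments of fault-free routes are treated as cleared, and the remaining segments of faulty routes become suspects. The Python names below are hypothetical, and treating a route as cleared after 50 fault-free calls is a simplifying assumption, since intermittent faults can be missed.

```python
def segments(route):
    """Split 'A-B-C' into the undirected segments {A,B} and {B,C}."""
    hops = route.split("-")
    return {frozenset(pair) for pair in zip(hops, hops[1:])}


def suspect_segments(results):
    """Return segments that appear only on routes where faults were seen."""
    cleared = set()
    for route, faulty in results.items():
        if not faulty:
            cleared |= segments(route)
    suspects = set()
    for route, faulty in results.items():
        if faulty:
            suspects |= segments(route) - cleared
    return suspects


# Inter-office results of step 1: True means faults were observed.
cqt_results = {
    "MGW1-T1-NOKIA": True,
    "MGW1-MGW2-T2-NOKIA": True,
    "MGW2-T2-NOKIA": False,
    "MGW2-MGW1-T1-NOKIA": True,
}

suspect_names = sorted("-".join(sorted(s)) for s in suspect_segments(cqt_results))

if __name__ == "__main__":
    print(suspect_names)
```

With the results of this instance, the suspects are the MGW1-T1 and T1-NOKIA segments and the circuits between MGW1 and MGW2, which agrees with the conclusion reached in step 2.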
Analyzing Instance 2
Topic
Echo and monolog faults happen in some end office.
Symptom
The calling subscriber belongs to the ZTE end office, while the called
subscriber belongs to the S-manufacturer end office. The calling subscriber can hear the called subscriber's voice normally. However, the called subscriber cannot hear the calling subscriber's voice,
and hears only an echo of his or her own voice. There is a directly connected trunk
circuit between these two end offices.
Analyzing and Processing
1. Find out the operating conditions before and after the fault
happened.
Before the fault occurred, seven A-interface circuits (PCM 16~22)
were added, of which four circuits are abnormal.
2. Perform the interception test.
Connect a 2M online BER tester to the A-interface of the ZTE end
office to intercept timeslots. The called subscriber's voice
is heard on both the uplink and the downlink.
3. Analyze the fault.
Since the called subscriber's voice is audible on both the uplink and
the downlink of the A interface, the uplink is abnormal. The
faulty point is possibly located at the BSC, the A-interface trunk, or the Abis
circuit.
4. Handle the fault.
i. At the CS side, replace the abnormal PCM 19 port with the
normal PCM 20 port, and swap the data between PCM 19 and
PCM 20. The BSC side remains unchanged. PCM 19 is still
abnormal, while PCM 20 is normal, indicating that the
MGW ports are normal.
ii. At the CS side, change the trunk group corresponding to PCM
16~19. These four circuits are still faulty.
iii. Perform changeover on the EDRT board at the BSC side.
The fault still remains.
iv. On the TIC of the BSC side, exchange the transmission
ports of PCM 19 and PCM 20, together with the ports in the corresponding
data. The MGW side remains unchanged. However,
PCM 19 is still abnormal, while PCM 20 is normal,
indicating that the ports at the BSC side are normal.
v. Perform soft-loop on the transmission, and the same fault symptoms are reproduced. Therefore, the fault is caused by a transmission problem.
Solution
The transmission personnel responsible for the A-interface handle this fault. The fault is eliminated after the abnormal circuit
is replaced.
Analyzing Instance 3
Topic
Background noises are audible in the calls of an end office.
Symptom
Background noise similar to the sound of firecrackers appears in some
calls of this end office. Either party or both parties can hear it.
Analyzing and Processing
Background noise appears after several intra-office call
quality tests. Since an intra-office call does not involve any other MSC,
the call is connected only through the MSC of the local office and the BSC, and
the inter-office trunk or opposite device cannot affect it. However, you still
need to check whether the noise comes from the radio side or the
CN side.
Contact headquarters, and perform voice-channel self-loop on the
trunk board of the MSC or BSC to locate the fault. Log in to the SDTB/DTB
board occupied by the foreground, and operate the voice-channel
self-loop with the related commands.
When background noise appears during the call test, perform self-loop from the timeslot of the CN A-interface occupied by the calling
and called parties to the BSC side. The noise does not exist. Perform
self-loop to the UIMT side. Both parties can hear the noise. It is
therefore almost certain that the fault is located at the CN side. After dialing
several times, find that whenever noise appears, either the calling or the called party
seizes the two SDTB trunks in shelf 2 of rack 2 of the MSC.
The voice code stream of an intra-office call involves the ETSN, TFI, UIM,
and SDTB boards in the MSC.
• The SDTB board is a trunk board.
• The UIM board performs Ethernet layer-2 switching inside the shelf,
and connects to the TFI board (timeslot switching interface board)
through fiber.
• The ETSN board is a timeslot switching board.
An uplink originated call from the A-interface trunk passes
through the SDTB board, then through the UIMT and TFI boards, and
finally reaches the ETSN. After timeslot switching in the
ETSN, the call is sent to the SDTB through the TFI and UIMT boards.
Finally, it reaches the called party through the A-interface
trunk and the BSC. The uplink route of a terminated call is the same as that
of the originated call.
The intra-MSC connection flow of intra-office call is shown in Figure
35.
FIGURE 35 CALL QUALITY TEST FLOW
Check the TFI and UIMT boards, and find that the corresponding optical interface of the TFI performs active/standby changeover automatically and continuously. This TFI board is connected to the
UIM board of this shelf. The other TFI optical interfaces and the UIMT
board are normal.
The changeover notification is also shown in the Fault Management window of NetNumen M30.
Since a board fault or an incorrect connection between the UIMT and TFI
boards may cause noise, change over the ETSN board
of the MSC, and find that the optical-interface changeover exception
persists and the noise is not eliminated.
Solution
After the optical module of the TFI board is replaced, the optical-interface auto-changeover exception is eliminated. Specify
the trunks of the two SDTB boards on which the exceptions occurred, and perform the CQT test several times. The noise does not appear again.
Analyzing Instance 4
Topic
Noises result from the transmission system fault.
Symptom
After an office is interconnected with its opposite office, the signaling flow is normal. However, some calling subscribers complain that they can hear noises when they dial an intra-office called
party.
Fault Analysis and Location
1. Analyze the fault symptoms, and find that the noise basically
results from an inter-office fault, which probably lies in the trunk
circuit to an office.
2. Check the traffic flow diagram and the network topology
diagram of the local office. The incoming and outgoing calls of the local office are mainly forwarded by two T-offices.
3. Perform the trunk dial-up test to reproduce the fault. Find that the
noise appears irregularly, but always on calls through one of the T-offices.
4. Check the transmission-circuit type between the MGW and the
T-office, and find that it uses SDH transmission. Check
the alarms related to SDH on the OMM system, and find that a
"Trunk receiving-end alarms" alarm exists on the trunk circuit
to this office. It indicates that obvious bit errors exist on this
transmission, which probably cause the noise.
5. Contact the service personnel of the transmission system for cooperation. Work together, looping back
segment by segment, until the fault point is located.
Fault Handling
After the transmission service personnel handle the fault, the
alarm in the fault management system disappears. Perform the
dial-up test with the designated mobile phone. The noise no longer appears.
Summary
• The noise fault occurs frequently in routine maintenance. To
eliminate it, first determine the noise location according to its
symptoms. It has the following common features.
  • If noise always appears in some cells or location areas belonging to the local office, its location is related to the BSC/RNC.
  Primarily check the trunk circuits between the local office and
  the BSC/RNC, and the radio side. Contact the service personnel at
  the radio side to locate the fault together.
  • If noise occurs on the inter-office circuit, and intra-office
  calls have no noise, the noise location is related to the adjacent office. Check the outgoing trunk circuits of the local office.
  Perform dial-up tests on all the trunk circuits, and trace the
  inter-office signaling to find the faulty board or transmission system.
  • If noise appears irregularly and randomly both inside the MGW
  office and between offices, check whether it is located in
  the local office, for example, in the TSNB, TFI, UIM, and SMP
  boards, and in the data configuration of MSCS and MGW.
• If the transmission system is faulty, contact its service personnel immediately. Cooperate to find the fault point with the
segment-by-segment loop-back method, and provide the necessary
information and help for eliminating the fault quickly.
Figure
Figure 1 Troubleshooting Flow .............................................. 3
Figure 2 Handling The Fault Occurring During Board
Configuration ....................................................11
Figure 3 Handling Common Faults of Boards..........................13
Figure 4 Troubleshooting Flow of UIM Fault ...........................15
Figure 5 Troubleshooting Flow of OMP Fault...........................18
Figure 6 Handling the SIPI Board Fault .................................20
Figure 7 Troubleshooting Flow of SPB Board ..........................21
Figure 8 Troubleshooting Flow of the CLKG Board...................24
Figure 9 Handling the IPI Board Fault ...................................25
Figure 10 Handling the DTB/DTEC Board Fault .......................26
Figure 11 Handling the APBE Board Fault ..............................27
Figure 12 Handling the MRB Board Fault ...............................28
Figure 13 Handling VTCD Board Fault ...................................29
Figure 14 Handling the Clock Abnormality .............................34
Figure 15 Troubleshooting Flow of Clock Locking Failure ..........36
Figure 16 Example for Clock Self-Loop..................................39
Figure 17 Mc Interface Protocol Stack...................................47
Figure 18 Networking Mode & Interface Protocol Stack
Structure..........................................................51
Figure 19 Networking and Protocol Architecture (Built-In SG
Supported) .......................................................53
Figure 20 A-Interface Protocol Stack Structure ......................54
Figure 21 MGW Built-In Networking Mode and Its Protocol
Structure..........................................................55
Figure 22 Boards Related to Narrow-Band No.7 Signaling
Protocol............................................................55
Figure 23 Handling the Abrupt Abnormality of the OMM
system.............................................................60
Figure 24 Handling Virus/Security Events..............................62
Figure 25 Setting CORBA FTP ..............................................66
Figure 26 Interconnection Between Media Plane and Bearer
Network ...........................................................74
Figure 27 Interconnection Between Soft-Switch and CE ..........75
Confidential and Proprietary Information of ZTE CORPORATION
99
ZXWN MGW Troubleshooting
Figure 28 Intercepting Calls Section by Section......................80
Figure 29 Electrical Echo.....................................................83
Figure 30 Working Principles of Echo Canceller ......................84
Figure 31 Incoming/Outing EC.............................................84
Figure 32 Enable Flag.........................................................86
Figure 33 ISUP SPARE ........................................................86
Figure 34 Networking Diagram ............................................93
Figure 35 Call Quality Test Flow ...........................................96
Table
Table 1 UIM Types .............................................................14
Table 2 Indicators on UIM ...................................................16
Table 3 Impedance DIP Switches of E1 on the SPB Board ........37
Table 4 Handling Method of Inconsistent Locking Status of Active/Standby CLKG Boards ..........40
Table 5 Handling Method of Inconsistent Clock Reference Status of Active/Standby CLKG Boards ..........41
Table 6 Handling Method of Output Clock Loss .......................43
Table 7 Handling Method of Slip Code ...................................44
Table 8 Parameters in the ADD UNIT Command .....................87
Table 9 Parameters in the SET ECRSC Command....................88
Index
A
Active status .......... 30
Adjacent office .......... 48
Alarm analysis .......... 5
Alarm level .......... 5
Alarm query .......... 4
B
Backplane .......... 10, 19, 36, 40, 42–43
C
Changeover .......... 19
Client/server .......... 1
Conference call .......... 28
Control plane .......... 20, 50, 77
D
Data configuration .......... 5–6, 32, 48, 51, 57, 63, 65, 74
DIP switch .......... 26
DIP switches .......... 22
F
Failure observation .......... 4, 17, 50
File management .......... 65
FTP .......... 66
H
Handover .......... 77, 90
History alarm .......... 43
History notification .......... 43
I
IP address .......... 9, 48, 60, 63–64, 66, 70–71
J
Jumper .......... 22
L
Link .......... 5, 7, 22, 25, 39, 49, 51, 74, 92
Local office .......... 5–6, 28, 39, 48, 51, 97
M
Man-machine command .......... 59
Mask .......... 60
Matching impedance .......... 26, 36
N
Network cable .......... 63
Networking mode .......... 50, 54, 58
No.7 signaling .......... 1
O
OMM server .......... 9, 64–66, 68, 70–71
Outgoing route .......... 73
P
performance statistic .......... 71
Performance statistics .......... 4, 31, 65
Power on .......... 75
Power-off .......... 8
Probe .......... 55
R
Registration .......... 48
S
signaling link .......... 48
Signaling link .......... 7, 42, 44, 51–52, 56
Signaling links .......... 32
signaling route .......... 51
Signaling route .......... 7, 53
Signaling tracing .......... 4–5, 7, 17, 48, 50
T
Trunk management .......... 55
V
Version management .......... 10, 30, 65