Title: Do’s and Don’ts for a Large-Scale OV Deployment
Session #: 262
Speaker: Tom Santanello
Company: Castello Systems
Agenda
Background
AND Complete all deployments in 30 Months
- Piece of Cake – Right…….
Build Approach - Three Primary Builds
Architecture
OVPI
Data Flow
OVPI
Organizational Issues
AND …
Management continually asked …Where’s my Operations Center?
Agenda
Organizational Do’s and Don’ts
User Community Do’s and Don’ts
Deployment Do’s and Don’ts
Agenda
Specific Implementations
OV Internet Services SLA Polling
Every Area has:
OV Internet Services Probe
OV Problem Diagnosis Probe
OV NDAOM Probe
ICD Function
Probes Provide:
Service Availability and Response Time (HTTP, DNS, EMAIL, …)
Path Availability and Response Time
[Diagram: probes distributed across the three networks]
Network 1: 39 pollers
Network 2: 25 pollers
Network 3: 25 pollers
Currently 2000+ pairs
Oops!!
OV Internet Services SLA Response Time and Availability Measurements
OV Internet Services – Do’s and Don’ts
possible
Http probes up to date with valid pages
to see what pages are valid and are accessed by end-users
OV Internet Services – Custom Probes – iperf probe
Do – Use Custom Probes
LogMessage("OVISTarget == " + OVISTarget());
OVISAvailable = false; // treat the probe as unavailable until iperf reports back
Cmd = "iperf.exe -c " + OVISTarget + " -u"; // run iperf as a UDP client against the target
WshShell = new ActiveXObject("WScript.Shell");
OVISStart("ResponseTime");
Exec = WshShell.Exec(Cmd);
sO = Exec.StdOut; // read the iperf report from standard output
while (!sO.AtEndOfStream)
………….
LogMessage("Jitter size is " + Opt[7]);
OVISSetMetric("Jitter", Opt[7]);
………
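The elided parsing step has to pull the jitter value out of the iperf report before it can be handed to OVISSetMetric. Here is a minimal shell sketch of that extraction, assuming iperf 2 style UDP output in which the jitter is the number immediately before the `ms` unit. The helper name `jitter_from_report` and the sample report line are illustrative and do not come from the original probe.

```shell
# Hypothetical helper: print the jitter value (in ms) found in iperf UDP
# report text read from stdin. Assumes the jitter is the field right
# before the literal "ms" token. Returns non-zero when no such field exists.
jitter_from_report() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "ms") { print $(i - 1); found = 1; exit }
        }
    }
    END { if (!found) exit 1 }'
}

# Sample report line (illustrative values only).
sample='[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec  0.027 ms    0/  893 (0%)'
printf '%s\n' "$sample" | jitter_from_report
```

Keying on the unit rather than on a fixed column keeps the extraction working if the number of leading fields changes.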
OVIS Redundancy
OVIS Redundancy - cont
Step 1 – Create a virtual node name in DNS that will migrate between the two management servers
Step 2 – Add the virtual IP address to the “Advanced TCP/IP Settings” tab in the Local Area Connection Properties configuration box on the Primary server
OVIS Redundancy - cont
Step 3 – Change the OVIS Services entry on the Backup server to Manual and stop the services
Step 4 – Set up a backup plan on both servers
OVIS Redundancy - cont
In Enterprise Manager, drill down to SQL Server Agent -> Jobs, highlight “Reporter Backup-Full”, and then select “Start Job”
OV Service Desk – Configuration Management Database
OV Service Desk – Configuration Management
[Diagram: Configuration Management architecture. Labeled components: Network Devices; Desktops; UNIX Servers; WINTEL Servers; Security Devices; HPOU Operations; HPOW Operations; NNM / ET; Data Repository; Service Desk; Problem Diagnosis; NDAOM; Oracle; Op Access; Web Access; EA Portal; Remedy; CiscoWorks/IronView; SLA Reports; TMS; Marimba / Supportsoft. Servers are drawn as ProLiant 1850R systems.]
CM Sources
NNM - Discovery of all in-scope devices (Network/Servers/Workstations)
IronView/CiscoWorks – Configuration/Inventory
Extended Topology – Configuration/Inventory
Marimba – Configuration/Inventory
McAfee – Version
OVOU/OVOW – Service Data
Legacy Server Databases
Data Consolidation into Service Desk
Events generated from CM changes if they are unauthorized
CM changes mapped to Remedy ticket changes to identify true events
[Diagram: numbered data flow, steps 1–6, showing Events, the CM Database, and the mapping of SD CM data to the Remedy Asset Module]
OV Service Desk – Configuration Management Database
OV Service Desk – Operator Java Console Drill Down
JSP Interface
IronView
NNM
Marimba
OV Service Desk – Operator Java Console Drill Down
Foundry Switch Report
OV Service Desk Do’s and Don’ts
Scheduled Outage Integration – Use Case
Remedy
Service Desk
OVO
OVIS
Scheduled Outage Entered
Outage information pushed to both OVO and OVIS
Data Repository
Crystal Reports
Operator Console
Events/Polling Suppressed
WO Created
Mass Outages using OVOU
Mass Outages using OVOU - cont
Step 1 - Create Application Group “Outage”
Mass Outages using OVOU - cont
Step 2 - Create Node groups that contain the servers according to the time/platform that they belong to. For example:
Mass Outages using OVOU - cont
Step 3 - Create external process (mon_outage) on OVO
- This is called by the OVO application created in Step 1. The process does the following:
Mass Outages using OVOU - cont
Why did we do it this way?
OV Performance Manager - Problem
queue sizes built using OVPM, DSI, OVO opctranm, and Exchange SPI
*Developed by HPC&I
OV Performance Manager - Steps
three graphs in the same browser window.
OV Performance Manager
Step 1 - Exchange - Configure MTA Work Queue monitoring on OVOW
Need to turn on DSI collection
OV Performance Manager
Step 2 – Exchange
FAMILY: Exchange Mail Queues
GRAPH: Exchange MTA QUEUES
DESCRIPTION: Exchange MTA QUEUES
GRAPHTITLE: Exchange MTA QUEUES
YAXISTITLE: Messages
GRAPHBACKGROUND: None
JAVAGRAPHS: Yes
GRAPHMULTIPLEGRAPHS: Yes
GRAPHTYPE: bar
STACKED:
DATARANGE: 4 Hours
ENDDATE: now
GRAPHMETRICSPERGRAPH: 21
AUTOFRESH:
POINTSEVERY: auto
DSN: S2
DATASOURCE: CODA
SYSTEMNAME: exch_svr_1
CLASS: EA:Ex55MTAWorkQUEUE
METRIC: WorkQueueLength
LABEL: exch_svr1
COLOR: Orange
MARKER: marble
…………………..
This was created in previous step
OV Performance Manager
Step 3 - Sendmail - mailqchk.sh - Run on Managed Node
# Prints "<hostname> <timestamp> <queue length>" for the local sendmail queue.
# Needs ksh or bash because of the [[ ]] pattern match below.
SENDMAIL=/usr/lib/sendmail
# Ask sendmail (address test mode) to expand the $Z macro to get the version string.
SENDMAILVER=`echo \\$Z | ${SENDMAIL} -bt -d0 | tail -1 | cut -f 2 -d ' '`
# Determine the current sendmail mail queue length.
QueueHeader=`${SENDMAIL} -bp | head -1 | sed -e "s/(/ /"`
if [[ "${SENDMAILVER}" = 8.1?.* ]]; then # newer sendmail versions use a different output format
QueueLength=`echo $QueueHeader | awk ' /empty/ {print 0}; /request/ {print $2 }'`
else
QueueLength=`echo $QueueHeader | awk ' /empty/ {print 0}; /request/ {print $3 }'`
fi
# Report the queue length, or an error line if it could not be determined.
if [ -n "$QueueLength" ]
then
echo `hostname` `date +%m.%d-%H:%M:%S` ${QueueLength}
exit 0
else
echo Error: `hostname` `date +%m.%d-%H:%M:%S`
exit 1
fi
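The version check in mailqchk.sh exists because the queue count sits in a different column depending on the sendmail release. A sketch of a column-independent alternative follows; it takes the number directly in front of the word `request`. The function name and the sample header lines are assumptions for illustration, so real `sendmail -bp` output should be checked before relying on it.

```shell
# Hypothetical helper: derive the queue length from the first line
# of `sendmail -bp` output, passed in as the first argument.
queue_length_from_header() {
    header=$1
    case $header in
        *empty*)
            echo 0 ;;
        *request*)
            # Keep only the digits between "(" and " request".
            printf '%s\n' "$header" | sed -e 's/.*(\([0-9][0-9]*\) request.*/\1/' ;;
        *)
            return 1 ;;   # unrecognized header: let the caller report an error
    esac
}

# Sample headers (assumed formats, for illustration only).
queue_length_from_header '/var/spool/mqueue (3 requests)'
queue_length_from_header 'Mail Queue (12 requests)'
queue_length_from_header '/var/spool/mqueue is empty'
```

The non-zero return for an unrecognized header plays the same role as the `Error:` branch in the script above.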
OV Performance Manager
Step 4 - Sendmail – Create mailqsget.sh which collects and logs metrics
sub() {
……………
JOBFILE="/tmp/mailqsget.job"
MAILQCHK="/var/opt/OV/bin/OpC/cmds/mailqchk.sh"
………..
echo $fqnode > $JOBFILE
echo "!" $MAILQCHK >> $JOBFILE
ret=`$OPCTRANM -t $TIMEOUT $JOBFILE 2>&1` ### opctranm starts mailqchk.sh on the sendmail server
……….
Line=`echo $ret | grep "^$node"` ######## Grab output of mailqchk.sh
……….
if [ $? -eq 0 ] ; then
qlen=`echo $Line | cut -f 3 -d ' '` ######### Grab queue length (third field)
else
qlen="-1" ######### No matching output for this node
fi
………..
dsiinput="$dsiinput $qlen"
………
echo $dsiinput
……….
}
……….
sub | /opt/perf/bin/dsilog /var/opt/perf/datafiles/mailqs MAILQS >> $DSILOG 2>&1
######## The last line pipes stdout of sub into stdin of the dsilog program
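Each line that mailqchk.sh prints has three space-separated fields (hostname, timestamp, queue length), and the collector records -1 when a server returns nothing usable. Below is a small sketch of that per-node extraction in isolation. The function name `queue_len_for_node` and the sample output are hypothetical, and the sketch matches the hostname exactly rather than by prefix.

```shell
# Hypothetical helper: print the queue length reported for one node,
# or -1 when the collected output has no line for that node.
queue_len_for_node() {
    node=$1
    output=$2
    printf '%s\n' "$output" | awk -v n="$node" '
        $1 == n { print $3; found = 1; exit }
        END { if (!found) print -1 }'
}

# Sample collected output (illustrative values only).
sample_output='svr1 05.12-10:30:00 7
svr2 05.12-10:30:01 0'
queue_len_for_node svr1 "$sample_output"
queue_len_for_node svr9 "$sample_output"
```

Logging -1 instead of skipping the value keeps one column per server in every record, so a missing server shows up in the graph rather than shifting the other columns.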
OV Performance Manager
Step 5 - Sendmail - Create DSI measurement configuration file for Sendmail servers – mailqs.sp
CLASS MAILQS = 10001
INDEX BY DAY
MAX INDEXES 30
ROLL BY DAY
RECORDS PER HOUR 12;
METRICS
Sendmail_svr_1 = 101
PRECISION 1;
Sendmail_svr_2 = 102
PRECISION 1;
………………
OV Performance Manager
Step 6 - OVPM Server - Create multi-frame html file to display all three graphs
<html>
<frameset framespacing="2" rows="40%,20%,40%">
<frame name="F1" scrolling="no" marginwidth="0" marginheight="0" src='http://ovis.com/hpov_iops/cgi-bin/analyzer.exe?-GRAPHTEMPLATE:+ExchangeMailQueues+-GRAPH:+"Exchange+MTA+Queues"'>
<frame name="F2" scrolling="no" marginwidth="0" marginheight="0" src='http://ovis.com/hpov_iops/cgi-bin/analyzer.exe?-GRAPHTEMPLATE:+ExchangeMailQueues+-GRAPH:+"Exchange+Connector+MTA+Queues"'>
<frame name="F3" scrolling="no" marginwidth="0" marginheight="0" src='http://ovis.com/hpov_iops/cgi-bin/analyzer.exe?-GRAPHTEMPLATE:+MailQueues+-GRAPH:+"Sendmail+Queues"'>
<noframes>
<body>
<h1>No frames</h1>
</body>
</noframes>
</frameset>
</html>
OV Performance Manager – Mail Graph
OV Operations Unix or OV Operations Windows?
OV Operations Unix / OV Operations Windows
OV Operations Unix – OPC Internal Messages
Use OpC Number in Description
OVO Java Console Operator View
Because …
The operators are not going to use the tool or, in the best case, they will not get what they need out of it
OVO Java Console Operator View
Do – Think about how you want to tie multiple groups of users together
Operators view in OVO linked to OSPF Areas
Loc 1, Loc 2, …
Network Infrastructure + Collection + Topology + Events == Operators View
NE/DC/ISS/EM all have the same view
[Diagram: OSPF Area 0; OSPF Areas 1n++ (repeated); OSPF Area 0 + 1n; Pollers; Server Farms; NNM Collection Stations; Message Instructions; Protected Enclave Mgmt; ordered from least to most capability]
Agenda
Tying it all together - Or how do we keep ourselves sane?
Tying it all together – Do’s and Don’ts
Tying it all together – Do’s and Don’ts - cont – Exception Reports
2. Do – Build exception reports on everything you can
Tying it all together – Do’s and Don’ts - cont - Synching Up EM Systems
[Diagram: synchronizing the EM systems in eight numbered steps. Labeled components: OVO; Inventory report; Seed File; Reporting; ET; Filter; NNM; IV; Fdry Config Bkups; inFdryNotinNNM; CMDB; Events; HTTPget]
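The inFdryNotinNNM label in the diagram suggests an exception report that lists devices present in one inventory but missing from another. A minimal sketch of such a comparison follows, assuming each system can export a plain list of device names, one per line. The file names, device names and function name are hypothetical.

```shell
# Print every line of the first file that does not appear in the second file.
in_first_not_in_second() {
    # awk reads the second file first and remembers its lines,
    # then prints only the unseen lines from the first file.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}

# Illustrative inventories (made-up device names).
workdir=$(mktemp -d)
printf 'sw1\nsw2\nsw3\n' > "$workdir/foundry.txt"
printf 'sw1\nsw3\nrtr1\n' > "$workdir/nnm.txt"
in_first_not_in_second "$workdir/foundry.txt" "$workdir/nnm.txt"
rm -rf "$workdir"
```

Running the same comparison in both directions gives two exception lists, one for each system that is missing devices the other knows about.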
Tying it all together – Do’s and Don’ts - cont
2. Do – Take advantage of the EM systems to monitor themselves and other EM systems
OVR report with summarized OVOU message rate over last 96 hrs
Tying it all together – Do’s and Don’ts - cont
4,8,12,24,48 hour
1, 2, 3, 4, 7, 14, 21, 30 day
Tying it all together – Do’s and Don’ts - cont
3. Don’t – Forget to monitor the individual EM systems
Tying it all together – Do’s and Don’ts - cont
Problem – How to manage User Responsibilities
Solution – Build a repeatable process outside of OVO to manage responsibilities
Step 1 – Set up an Excel spreadsheet to track assignments and use an Excel macro to publish the spreadsheet to a web page
Step 2 – Decide on a clear distinction between the Tier levels (Sol-Adm-T2, Sol-Adm-T3)
Step 3 - Start out with higher level Message Groups and then break them down to smaller groups
Tying it all together – Do’s and Don’ts - cont
Step 4 - Get rid of all the Message Groups that don’t make sense and simplify
Step 5 - Track various states of Message Groups
Step 6 – Decide how you are going to classify your users
OVO Operator Assignment
Tier Profiles(Users)
Step 7 – Break down application profiles so that they match your classification of users
Step 8 – Now map your users and applications to each other
App Profiles
OVO Operator Assignment
Tier Profiles(Users)
Step 9 – Now do the same with your events. Develop classifications by:
Step 10 – Map your Event Profiles to Users the same way you did with Applications
Event Profiles
OVO Operator Assignment
Step 11 – Now map your User Profiles to User Templates
Step 12 – Make the actual assignments in the User Profile Bank
Step 13 – Create the User Templates in the User Bank
User Template Assignments
Tier Profiles(Users)
OVO Operator Assignment
Step 14 – Create the Users
A. Copy User Template
B. Change Data for User
User Profiles
App and Event Profiles
OVO Operator Assignment
Agenda
Correlation Composer – Problem/Solution
Correlation Composer – General Outline
Correlation Composer – Step 1
2. Determine input sources and data that you want to populate Custom Message Attributes (CMA) fields with
Correlation Composer – Step 2
Correlation Composer– Step 3
3. Modify $OV_BIN/CO.conf to set up CMAs
Correlation Composer– Step 4
4. Create $OV_CONF/ecs/CIB/OVONameSpace.conf
OVO_NNM.fs == NNM and Syslog
OVO_Security.fs == Firewall
OVO_OVIS.fs == OVIS
OVO_OVOU.fs == Solaris
OVO_OVOW.fs == Windows
OVO_Specific.fs == Node specific
OVO_OVOUAdm.fs == Mgmt Svr
OVO_Lookup.fs == Lookup rules
datastore or external data source
$OV_CONF/ecs/CIB/stores
Correlation Composer– Step 5
5. Create script that builds ecs_comp.fs – $OV_CONF/ecs/CIB/scripts/fact_commit
Correlation Composer– Step 6
Correlation Composer– Step 6 – cont
Customer datastore
entries
Correlation Composer– Step 7
Correlation Composer– Step 8
8. Lookup To Datastore
CMA field
name in the CMA_Cust1 list?
ADD DATA("CMA_Cust1List", ["svr1.com", "svr2.com", …])
Correlation Composer– Step 8 - Cont
the alarm and put the
in the CMA_Customer
CMA field
_CMA_CustAList
Defined in Step 3
Evaluates to “Cust1”
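The lookup described in Step 8 amounts to a membership test: if the node on the alarm is in a customer's server list, the customer name goes into the CMA_Customer field. The sketch below shows only that logic in shell. It is not Correlation Composer syntax, and the function name, the list contents, and the `Unknown` fallback are assumptions for illustration.

```shell
# Hypothetical stand-in for the CMA_Cust1List datastore entry.
CMA_Cust1List='svr1.com svr2.com'

# Print the customer name for a node, or "Unknown" when the list does not contain it.
customer_for_node() {
    node=$1
    for member in $CMA_Cust1List; do
        if [ "$member" = "$node" ]; then
            echo Cust1
            return 0
        fi
    done
    echo Unknown
}

customer_for_node svr2.com
customer_for_node other.com
```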
Correlation Composer– Step 9
Correlation Composer – Step 9 - Cont
Get data in index #
Call Perl function
Correlation Composer– Step 9 - Cont
Correlation Composer– Step 9 - Cont
Two keys in this case – one to match primary key and one to match secondary key
Correlation Composer – Step 10
Correlation Composer– Step 11
[Diagram: nodes Node_A through Node_I grouped by Location, Remedy Queue, Special Customer, Special Category, and Service]
Correlation Composer – Do’s and Don’ts
Correlation Composer – Do’s and Don’ts - cont
Questions?
Tools enable the process, they are not the process
A Fool With a Tool is Still a Fool