Troubleshooting Network Discovery in SCOM 2012

The following article applies to SCOM 2012 BETA and may or may not apply to the RC or RTM releases. I’ll try to repro the issue in the upcoming releases to see if the behavior has changed and provide updates if necessary.

I guess everyone is testing SCOM 2012 Beta right now and a lot of people are already blogging about their experience. I thought it’s time to do the same and share some experience I had with network discovery.

Content of this post:

1. Introduction
2. Discovery
3. Troubleshooting
3.1 Discovery configuration
3.2 Events
3.3 Diagnostic Tracing
3.4 Network Tracing
4. Solution
5. Conclusion

1. Introduction

My small lab environment doesn’t have many high-end network devices (in fact, the number of high-end network devices in my lab is ZERO!), but I thought, why not bring in my low-end, consumer-grade devices for testing? First, you should know that SCOM treats certain devices differently from others. Supported Cisco routers, for example, are monitored in much more detail than a consumer-grade Zyxel Zywall-2. SCOM doesn’t really know much about such devices, so they are handled as “generic” devices with a very limited set of monitoring rules (mostly availability and response times, but no monitoring of ports, CPU, bandwidth, etc.). By the time of RTM, Microsoft should release a complete list of supported network devices… (see Update 2 below)
As soon as I started the network device discovery I realized that my device couldn’t be discovered. Running the discovery manually showed that the task completed successfully:

The task window shows that the discovery job was successfully kicked off but this doesn’t really mean that the discovery is already done. The actual discovery process runs asynchronously but more on that later in the Discovery section. After a while the “Network Devices Pending Management” view shows my device:

No response to ping?! Needless to say, my management server is able to ping the device, and the device responds just fine to the ping requests.
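The manual ping check from the management server can also be scripted. Here is a minimal sketch in Python, assuming the system ping tool is on the PATH; the function names and the example address are made up for illustration and have nothing to do with how SCOM itself probes devices.

```python
import platform
import subprocess


def build_ping_command(host, count=2, system=None):
    """Build the ping command line for the given (or current) platform."""
    system = system or platform.system()
    # Windows ping uses -n for the echo count, most other platforms use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def device_answers_ping(host, count=2):
    """Run ping and report whether it exited with code 0.

    Treat the result as a hint only: on Windows, ping can exit with 0
    even when the replies are "Destination host unreachable".
    """
    result = subprocess.run(
        build_ping_command(host, count), capture_output=True, text=True
    )
    return result.returncode == 0
```

Calling `device_answers_ping("192.168.1.1")` (a placeholder address) from the management server gives a quick yes/no. Keep in mind that this runs ping.exe in your own session and not inside monitoringhost.exe, so a positive result here does not prove that the discovery itself can ping the device.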

2. Discovery

Let’s have a look at how discovery is set up and how it works: When you create a network discovery rule, you go through a wizard providing a lot of information about discovery methods, accounts, devices, schedule, etc. I will not go through all the options the wizard provides. Only the parts necessary for this case will be discussed.
Because I only want to discover one single device, I chose “Explicit discovery” as discovery method. My device configuration looks like this:

Note that I selected “ICMP and SNMP” as Access mode. To tell you upfront, selecting only SNMP will successfully add my device but in this case I want to know why ICMP and SNMP is not working – especially since my discovery management server can successfully ping the machine. Besides, I learned a lot about network discovery which might help in other scenarios…

The discovery schedule was set to manual. After you finish the wizard, start the discovery:

3. Troubleshooting

3.1 Configuration

After you finish the wizard, an unsealed management pack named “Microsoft.SystemCenter.NetworkDiscovery.Internal” is modified with your configuration. Once you export and open this management pack in a text editor, you’ll see the module configuration for the discovery:

The first thing worth checking is whether the configuration you’ve done with the wizard is reflected in the MP. Once you are sure the MP was configured correctly and was successfully deployed to the managing management server, examine the discovery events…
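If you don’t want to scroll through the exported MP by hand, a small script can look for your device in it. This is a minimal sketch in Python that searches any XML text for a value such as the device’s IP address. The element names in the sample are invented for illustration and are NOT the real schema of the management pack.

```python
import xml.etree.ElementTree as ET


def find_matches(xml_text, needle):
    """Return (location, value) pairs for element text or attributes containing needle."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if needle in text:
            hits.append((elem.tag, text))
        for name, value in elem.attrib.items():
            if needle in value:
                hits.append((f"{elem.tag}@{name}", value))
    return hits


# Hypothetical snippet; the real exported MP uses different element names.
SAMPLE = """<ManagementPack>
  <Discovery ID="HypotheticalDiscovery">
    <Device>
      <Address>192.168.1.1</Address>
      <AccessMode>ICMP_SNMP</AccessMode>
    </Device>
  </Discovery>
</ManagementPack>"""

for location, value in find_matches(SAMPLE, "192.168.1.1"):
    print(location, value)
```

For a real check, read the exported XML file into a string and pass it in place of `SAMPLE`. If the device address doesn’t show up at all, the wizard settings never made it into the MP.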

3.2 Events

Thankfully, Microsoft created a view (and a rule) to pick-up and show all the relevant events for the network discovery progress right in the console:

The view can be found in the Monitoring space under “Operations Manager\Network Discovery\Network Discovery Progress Events”. I highlighted the events in the view for one discovery run. Usually, event 12024 indicates that the discovery is done and the discovery data was created, but event 12008 is an interesting one: it shows a summary of the discovery. As you can see, one device (my only one) is in the pending list. Anyway, the events weren’t very helpful in finding a solution in my particular case…

3.3 Diagnostic Tracing

If you are familiar with tracing in SCOM 2007 (R2), you might not find anything different in SCOM 2012. In fact, I believe that the (hopefully) well known KB is still valid for SCOM 2012: http://support.microsoft.com/kb/942864
However, I learned that the TraceConfig tool now offers a lot more trace providers to do more granular tracing compared to SCOM 2007:

Btw, you still need the FormatTracing command to make the trace log readable.

Before we start with the tracing, we need to edit a config file to enable Network Discovery debug logging:

  • Navigate to the C:\Program Files\System Center Operations Manager 2012\Server\NetworkMonitoring\conf\discovery folder on the management server that is used for network discovery.
  • Edit discovery.conf in this folder as follows:
    • Remove the “#” before “DebugEnabled = TRUE”.
    • Add the line “LogDiscoveryProgress = TRUE” (without quotes) below the DebugEnabled = TRUE line.
  • Restart the Health service on the management server.
  • Execute the network discovery and wait for it to complete. Verify that the status is “idle” under Administration -> Network Management -> Discovery Rules, or look at the events as shown above.
  • Take a look at the <ManagementServer-FQDN.log> file in c:\temp on the management server.
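The two edits to discovery.conf can be sketched as a small text transformation. This is a minimal Python sketch that only assumes what is described above: a commented-out “DebugEnabled = TRUE” line exists and the new line goes right below it. Everything else about the file format is unknown to me, so back up the file before trying anything like this for real.

```python
import re

# Matches "DebugEnabled = TRUE" with or without a leading "#".
DEBUG_RE = re.compile(r"^\s*#?\s*(DebugEnabled\s*=\s*TRUE)\s*$", re.IGNORECASE)
# Matches an already present (uncommented) LogDiscoveryProgress setting.
PROGRESS_RE = re.compile(r"^\s*LogDiscoveryProgress\s*=", re.IGNORECASE)


def enable_debug_logging(conf_text):
    """Uncomment DebugEnabled and add LogDiscoveryProgress below it (once)."""
    lines = conf_text.splitlines()
    has_progress = any(PROGRESS_RE.match(line) for line in lines)
    out = []
    for line in lines:
        match = DEBUG_RE.match(line)
        if match:
            out.append(match.group(1))  # the setting without the "#"
            if not has_progress:
                out.append("LogDiscoveryProgress = TRUE")
                has_progress = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Read discovery.conf into a string, pass it through `enable_debug_logging`, and write the result back. Running it a second time changes nothing, and if the file has no DebugEnabled line at all, it is left as it is. The Health service restart is still a manual step.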

Unfortunately tracing didn’t show any indication of what is wrong with the discovery. Next stop, network tracing…

3.4 Network Tracing

To download and install the Microsoft Network Monitor (3.4) visit: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=4865
Once the network trace was captured, I realized that the ICMP requests and replies went out and came in as expected:

4. Solution

Thanks again to Yuvraj Attarde from Microsoft, who actually found the solution and helped me through the process. The issue was caused by THE FIREWALL !

Disabling the Windows Firewall on the management server solved the issue, but it’s not really a solution. You might want to keep the firewall on in production. To successfully discover using ICMP and SNMP, you need to allow ping.exe (from %WINDIR%\system32) and monitoringhost.exe (from the SCOM install directory) to communicate without restrictions.

But be careful, there are two ways to configure this:
1. Using the Windows Firewall\Allowed Programs control panel applet:

2. Firewall MMC Snap-In:

You can use either of those options, but if you use the MMC snap-in, be sure to create an Outbound and an Inbound rule!
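The same rules can also be created from an elevated command prompt with the built-in netsh advfirewall command. This is a minimal Python sketch that only generates the command lines, one inbound and one outbound rule per program. The rule names are made up, and the MonitoringHost.exe path is an assumption based on the install folder mentioned earlier, so adjust both to your environment.

```python
def firewall_rule_commands(programs):
    """Build netsh commands that allow each program inbound and outbound."""
    commands = []
    for label, path in programs.items():
        for direction in ("in", "out"):
            commands.append(
                "netsh advfirewall firewall add rule "
                f'name="SCOM discovery - {label} ({direction})" '
                f'dir={direction} action=allow program="{path}" enable=yes'
            )
    return commands


# Paths are assumptions; verify them on your management server.
PROGRAMS = {
    "ping": r"%WINDIR%\system32\ping.exe",
    "monitoringhost": (
        r"C:\Program Files\System Center Operations Manager 2012"
        r"\Server\MonitoringHost.exe"
    ),
}

for command in firewall_rule_commands(PROGRAMS):
    print(command)
```

The script prints four commands. Review them before running anything, since they open the firewall for those two executables in both directions.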
After that, discovery works just fine:

5. Conclusion

When troubleshooting network related issues, always try to disable the firewall (temporarily) to see if the issue originates from that area. I know, I could have done that sooner but then again, I may not have learned those lessons above.

I’m still not sure if this behavior is “as expected” and as Microsoft often nicely states “by design” or just simply a bug. What puzzles me is that SNMP works out of the box without any firewall exception and ICMP does not. It’s also confusing that a ping from command prompt works but in conjunction with the monitoringhost.exe it doesn’t. If you cannot count on ping as a diagnostic tool anymore, what’s the point of having it then?

I hope that the above is useful information for some of you in need of troubleshooting network discovery issues. If you have any comments or feedback, let me know: stefan.koell@code4ward.net

Special thanks to Yuvraj for teaching me!

 

UPDATE:

I got a response from MS about the issue. They plan to create the exception for monitoringhost.exe during install time by RTM. That’s great news and means that the issue I was having shouldn’t affect anyone else. Still, I guess the troubleshooting guide I’m offering here might help in other cases as well.

UPDATE 2:

MS is already offering a list of all network devices with extended monitoring capabilities: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26831

Comments 1

  1. Nikhil

    Hi , Could you please help in understanding the way to put the customized formula for fetching the Availability report in SCOM 2007 R2. Not sure which tables in OPERATIONSMANAGERDW database has saved those entries . Any help or pointers will be much appreciated here.
    Regards
