Tuesday, September 07, 2010
Author: Stefan Koell Created: Tuesday, September 09, 2008 2:24:59 PM
Everything about Microsoft Windows Operating Systems, Server Products, Programming and Technologies

I’m starting to think I will never get V2 of Royal TS out. There’s always something coming up that consumes time, and I’m not talking about family here. Things you expect to just work actually don’t, at least not always. At first I was very happy when I got my new server. I was looking forward to setting up all the stuff that should help me develop more efficiently. I bought a nice box where I could run Hyper-V, my test boxes and Team Foundation Server 2010 (which btw is great compared to Subversion!). But then reality caught up with me.

Problems with the network driver on my Hyper-V box (reminder to myself: never trust a Broadcom driver, only use Intel network adapters!). The NIC on the host worked perfectly, but none of the guests moved from another Hyper-V machine were able to get networking up and running, even when I used the legacy network device. It took days of research and testing before I gave up and tried a different NIC brand. After installing an Intel card, everything worked fine, on the host and on the guests. On a side note: this server is a Dell box and is officially “Hyper-V certified”.

My Exchange server died a horrible death because of this incident. Fortunately no data loss. Btw, the nice guys @ netmonic offered to host my Exchange mailbox for a reasonable price and let me say, I would never go back to hosting it myself. Perfect service, good value, no headaches anymore! (contact them for the latest rates, the prices on their site are a bit outdated)

Then, out of the blue(!), my new server started to blue screen whenever I copied approx. 1 GB to or from the server. Even when doing backups, it blue screened somewhere after the first GB. The server had run fine for weeks, even with the backups! The good news: there’s a fix for that (http://support.microsoft.com/kb/975530). The bad news: it wasn’t really easy to find, and I am starting to lose confidence in MS server products.

The past few months showed that things like WHQL, certified drivers and Hyper-V certification don’t really mean anything. If you are out of luck, you can still have a hard time. I hope everything is stable now and I can start to do actual work…

Since a lot of users are confused about targeting, I decided to give it another shot and explain from a more practical side why targeting is done the way it is. There are some fundamental technical reasons why you need to use overrides and cannot target a group directly (as many might expect at first), so let's take a closer look at the platform underneath. That should make it clearer why it has to be done this way.

Here's the line you read (or hear) when it comes to targeting a workflow (i.e. discovery, rule, monitor, etc.):

You cannot target a group directly, you always need to target a class like Windows Computer, disable the workflow, create an override afterwards to enable the workflow for a limited number of instances (Windows Computers in this case).

So, what are targets (classes) and groups all about?

A group (after you have created one) is basically a class, very much like the "Windows Computer" class. The main difference is that it's a singleton class (exactly one instance always exists) and that it is hosted on the RMS (which maintains the group membership). Because of this, you:

  • see all the groups in your system when you have to choose a target class
  • see every workflow targeted directly to a group's class run on the RMS (this is the host of the singleton-instance) because OpsMgr thinks that's the place you want to execute the workflow.

A group can have members (nothing more than instances of other classes) of different kinds. You can actually have one or more "Windows Computer" instances and one or more "IIS Web Site" instances mixed with some "Logical Disk" instances, all in the same group.
image 

Why do we need to specify a target which is not a group when we create a workflow?

The most obvious reason is the "Variable Replacement Mechanism" (I have no clue if there's an official name for that). When you target a workflow, OpsMgr knows all attributes (properties) of your target. This allows you to pass any attribute of the instance your workflow is currently running against into your workflow. For example, when you target a workflow (e.g. a script monitor) to the class "IIS Web Site", you can pass the attribute "LogFileDirectory" to your script.

image

After you select the target you can access all the attributes from that target in your workflow:

image

This is one of the most powerful features OpsMgr has to offer in all the workflow processing. Once you've wrapped your brain around that concept you will realize that this beast can do almost everything in a relatively simple way.

As a result, the following rules apply:

  • Once you have chosen a target class, you cannot change it after the workflow has been applied to the system. This is simply because of the implications related to the "Variable Replacement Mechanism". When you change the target, all the dynamic values you pull in might or might not work afterwards. I guess the engineers at MS could come up with a solution that lets you change the target and prompts for each and every dynamic value to adjust for the new target class, but as you can see for yourself, the outcome is unpredictable. Therefore they just disabled the option to change the target class. Even doing so in XML can be a mess, and you often just end up creating the workflow from scratch using your new target class.
  • Because groups can have members of different classes, there's no way for OpsMgr to find out which attributes can be used for the variable replacement. Consider our previously shown group containing instances of "IIS Web Site", "Logical Disk" and "Windows Computer". Each class has a different set of attributes, so targeting a group directly would prevent you from using the replacement feature. Many of the built-in workflows and vendor MPs depend on that feature!
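
To make the second point more tangible, here is a minimal sketch in Python (not anything OpsMgr actually runs). The class names are the ones from the example group; the attribute sets are illustrative assumptions, with only "LogFileDirectory" on "IIS Web Site" taken from the text. It shows that a placeholder can only be resolved safely when every targeted instance belongs to a class that defines it:

```python
# Illustrative attribute sets per class. Only LogFileDirectory on
# "IIS Web Site" comes from the post; the other names are assumptions.
CLASS_ATTRIBUTES = {
    "IIS Web Site": {"SiteName", "LogFileDirectory"},
    "Logical Disk": {"DeviceID", "VolumeName"},
    "Windows Computer": {"PrincipalName", "IPAddress"},
}


def common_attributes(member_classes):
    """Return the attributes that every class in the list defines."""
    sets = [CLASS_ATTRIBUTES[name] for name in member_classes]
    return set.intersection(*sets) if sets else set()


def can_resolve(attribute, member_classes):
    """A placeholder is only safe if all targeted classes define it."""
    return attribute in common_attributes(member_classes)


# Targeting a single class: the placeholder can always be resolved.
single_target = ["IIS Web Site"]
# Targeting a mixed group: no attribute is shared by all members.
mixed_group = ["IIS Web Site", "Logical Disk", "Windows Computer"]

print(can_resolve("LogFileDirectory", single_target))  # True
print(can_resolve("LogFileDirectory", mixed_group))    # False
```

With a single class as target the set of usable attributes is known up front; with a mixed group the intersection is empty, which is roughly the situation OpsMgr would be in if a group could be targeted directly.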

But why can I target a group in the Windows Service monitoring template wizard?

Since OpsMgr 2007 R2, the Windows Service monitoring template wizard has been “upgraded” and allows you to specify a group at which to target the monitor. I really wish they had designed that dialog differently, because every new user will get confused: the workflows are not actually targeted directly to the group you specify in this dialog!

image

In the end, the wizard is doing the exact same thing behind the scenes you would do when you create your own workflow:

It creates a disabled workflow (in this special case a discovery) targeted to "Windows Computer". 
image 

It then creates an override for this workflow, enabling it for the group you chose. 
image

Ok, I need to do overrides, any other benefits of using this technique?

Using an override to enable/disable a workflow for a group of members also has its advantages. See Jakub's post here: http://blogs.msdn.com/jakuboleksy/archive/2006/12/06/overrides-in-scom-2007.aspx

Jakub explains how overrides are actually applied and shows how powerful this mechanism really is. In short, for overrides it isn't really necessary to use the exact same target as your workflow is targeted to. The calculation algorithm allows you to be more flexible here. For example:

A workflow targeted to an "IIS Web Site" instance can be overridden using a group containing "Windows Computer" instances. OpsMgr works out which "IIS Web Site" instances the override applies to and includes all instances running on the specified "Windows Computer" instances.
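
A rough way to picture this, sketched in Python under simplifying assumptions: the `Instance` type, the `host` link and both functions are hypothetical and only model the "disabled by default, enabled by override for a group" pattern. The real calculation described in Jakub's post is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    name: str
    class_name: str
    host: Optional["Instance"] = None  # hosting parent, if any


def is_contained(instance, group_members):
    """True if the instance or any of its hosting ancestors is a member."""
    current = instance
    while current is not None:
        if current in group_members:
            return True
        current = current.host
    return False


def workflow_enabled(instance, default_enabled, override_group):
    """Apply an enable-override on top of the workflow's default state."""
    if is_contained(instance, override_group):
        return True
    return default_enabled


web01 = Instance("web01", "Windows Computer")
web02 = Instance("web02", "Windows Computer")
site_a = Instance("Default Web Site", "IIS Web Site", host=web01)
site_b = Instance("Default Web Site", "IIS Web Site", host=web02)

# The group only contains a computer, yet the override reaches the
# web site hosted on that computer.
group = {web01}
print(workflow_enabled(site_a, False, group))  # True
print(workflow_enabled(site_b, False, group))  # False
```

The workflow stays targeted to the "IIS Web Site" class, so its attributes remain available, while the group only decides where the workflow is switched on.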

Summary:

I completely understand anyone having a hard time with this concept. It is a bit strange at first and, like most of you, I had to get used to it as well. The stuff above is just a very condensed view and is far from complete. There are several other reasons, benefits and rules in the workflow engine I did not mention here. These are the facts that helped me understand the engine and platform better, and I hope I could illustrate some of that for you.

One thing MS can and should do better is making these concepts more accessible in the UI. It begins with the very confusing terminology in all the dialogs and wizards and surely ends in the documentation. The documentation is getting better and better while the UI is still confusing or getting more confusing (see Windows Service monitoring template wizard).


I spent the last couple of days installing and configuring TFS 2010. Getting TFS up and running is really easy and went smoothly. I’m really impressed by what MS did with the installation experience.

However, when you want to make TFS and TFS web access accessible over https it’s not that easy anymore. I also couldn’t find any detailed instructions so it was a bit of trial and error…

Lessons Learned:

  • When you use a self-signed certificate, make sure that the CN is the same FQDN you use in Visual Studio to connect to your TFS. Invalid certificates are accepted as long as you have installed the certificate in the Trusted Root Certification Authorities store. A name mismatch (the CN of the certificate doesn’t match the name of the host you are trying to access) is not accepted. Internet Explorer lets you decide whether you want to continue or not; Visual Studio does not, it just blocks the connection…
  • Import the self-signed certificate into your computer account’s Personal store.
  • Add another binding to the TFS site in IIS for https with the self-signed certificate.
  • Change the notification URL to https://FQDN/tfs
  • Open the web.config “C:\Program Files\Team Foundation Server 2010\Application Tier\Web Access\Web\web.config” and uncomment/adjust the remarked section “tfServers”:
    image 

PowerWF is a very cool visual PowerShell workflow designer with the ability to create OpsMgr management packs with just a click. Very impressive tool, see for yourself:

A bit expensive for my taste and when you look at Apple’s Automator it should be a product by MS included in Windows…

If you are a SCOM geek, want to get some attention and win some cool prizes, come over to the www.systemcentercentral.com site and enter the Management Pack Extension Contest! There are four separate contest categories:

  • Reporting pack extensions
  • Diagram or SLM pack extensions
  • Visio or Dashboard pack extensions
  • Tuning pack extensions

You can submit one extension for each category. The contest started a couple of days ago and ends on June 7, 2010. Click here for more details: http://blogs.technet.com/systemcenter/pages/system-center-management-pack-extension-contest.aspx

See you on the other side ;-)

I was asked recently to post an article on how we do web page monitoring. For a number of reasons we do not really use the built-in “Web Application” monitoring template. One of the reasons is that we are not really happy with the selection of the watcher nodes. We needed a way to monitor every web server in our farms without managing the watcher nodes manually all the time. We create host entries on our web servers pointing to themselves. So every time you browse to www.code4ward.net on one of the web servers, you do not go through the load balancer. Since the host entry for www.code4ward.net points to the web server itself, you will browse to the web site hosted on the server you are currently connected to.

So I created a small script which basically does web monitoring the way we wanted it to be. In this blog post I will talk about the implementation we started to use back in MOM 2005 and still use (slightly modified) in our SCOM 2007 environments. We have recently migrated all those scripts to PowerShell and created our own class definitions using the Authoring Console. For now, I will focus on the much simpler implementation using VBScript and the Operations Console without any work in the Authoring Console. Download the VBScript from the following link:

http://www.code4ward.net/c4w/files/Misc/code4ward.Sample.WebContentCheck.zip

Before you begin, you should create a group containing all the computers on which you want to monitor a web page. You can of course also use the script like the Web Application template, to monitor a web page through a load balancer or whatever using watcher nodes. In any case, create a computer group with your web servers/watcher nodes.

In your Operations Manager console, switch to the “Authoring” space, expand “Management Pack Objects”, right-click the “Monitors” node and select Create a Monitor -> Unit Monitor.

Now select “Scripting / Generic / Timed Script Two State Monitor”

Select a destination management pack.

Attention: The group I talked about earlier needs to be in the same management pack as the script monitor we are creating now. Only if the group is in a sealed management pack can you select a different destination management pack.

Click next.
   
Provide a name for your monitor and select a target like “Windows Server”.

Notice that we uncheck the checkbox “Monitor is enabled”. We will later create an override to enable the monitor for all the web servers/watcher nodes in the group we created earlier.

Configure a schedule. In general we schedule all our monitors (or rules) to run every 5 minutes (of course there are exceptions).

I strongly suggest providing a meaningful script file name on this page, as it will help you find the script on the agent when you have to troubleshoot something.

Set the timeout to 5 minutes.

Open the script attached to this blog post and copy everything from the code4ward.Sample.WebContentCheck.vbs into the script text field.

The script is very generic and needs 3 parameters to run successfully. 

As you can see from the script body,
Parameter 1: is the URL of the web page you want to monitor
Parameter 2: is the expected text in the content
Parameter 3: is the timeout in seconds (-1 means no timeout)

Before we click on next, click on the Parameters button to specify your parameters.
   
To be on the safe side, I always put the parameters in double quotes. The parameters line reads:
"http://www.code4ward.net" "code4ward" "30"

This configuration means: download the web page from www.code4ward.net every 5 minutes (the schedule we configured earlier), look for the string “code4ward” (without the quotes) in the content, and abort the request after 30 seconds if there’s no answer from the web server.
If “code4ward” is in the content and the web page was returned within 30 seconds, the monitor is healthy.
If “code4ward” is not in the content or the web page took longer than 30 seconds, the monitor is unhealthy.
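
The same decision logic can be sketched in a few lines of Python. This is not the downloadable VBScript, just an illustration of the three parameters; the function names and the message texts are assumptions, while the "Status" values OK/Error and the "Message" property are the ones the monitor is hooked up to. Note that the timeout here applies per socket operation, so it only approximates the 30-second budget.

```python
import urllib.request


def evaluate(content, expected_text):
    """Map page content to the Status/Message pair the monitor keys on."""
    if expected_text in content:
        return {"Status": "OK", "Message": f"Found '{expected_text}'."}
    return {"Status": "Error", "Message": f"'{expected_text}' not found."}


def check_web_content(url, expected_text, timeout_seconds):
    """Download url and look for expected_text; -1 means no timeout."""
    timeout = None if timeout_seconds < 0 else timeout_seconds
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content = response.read().decode("utf-8", errors="replace")
    except OSError as error:  # URLError and socket timeouts derive from OSError
        return {"Status": "Error", "Message": f"Request failed: {error}"}
    return evaluate(content, expected_text)


# Offline demonstration of the content check itself.
print(evaluate("<html>welcome to code4ward</html>", "code4ward")["Status"])  # OK
print(evaluate("<html>maintenance page</html>", "code4ward")["Status"])      # Error
```

Calling check_web_content("http://www.code4ward.net", "code4ward", 30) would correspond to the parameters line used in this post.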
   
Now we need to hook up the property bag status messages from the script with the health monitor’s unhealthy state:

Property[@Name='Status'] Equals Error

Now we need to hook up the property bag status messages from the script with the health monitor’s healthy state:

Property[@Name='Status'] Equals OK
   
Here you can decide whether you want the unhealthy state to be warning or critical.

The last page of the wizard lets you configure the alert properties for this monitor. In order to get all the nice output from the monitor in the alert description, you need to copy “$Data/Context/Property[@Name='Message']$” (without the quotes) into the alert description field.

Now click on “Create” and your monitor is ready to use.

All you need to do now is create an enable-override on the monitor for the group we created before.

As you can see, the monitor itself is pretty simple and does not have all the features you know from the Web Application template. But sometimes less is more, and this script monitor is used to monitor hundreds of sites without any problems.

If you have any questions or feedback, just comment or drop me an email.

cheers,
Stefan
http://www.code4ward.net

I just want to share this piece of information because it’s not really documented and someone might wonder about how SCOM behaves with discovered entities in the Operations Manager DB and Data Warehouse DB. We had a discussion with our Microsoft contacts and here’s the (somewhat surprising) outcome.

Let’s start with a short overview, why this information is so important for us (and maybe for you too):

As many of you do, we deploy our own authored management packs containing our own class definitions which are discovered and monitored. In our case, we created a sealed MP, a “library” MP containing our classes, rules, monitors, etc. We also create MPs programmatically which have a reference to this library MP. Sometimes we are forced to update this library MP to a new version and sometimes it happens that we break compatibility so that we cannot just “upgrade” our MP. We are forced to uninstall all referenced MPs, the library MP and reinstall the new library MP (my friend Tenchuu wrote an awesome script to automate this process: http://systemcentercentral.com/BlogDetails/tabid/143/IndexId/55738/Default.aspx).

First question came up: what happens to my discovered entities when I uninstall the MP containing the discovery and class definitions?

The second question was: what happens when I re-import the MPs, the discovery runs and the exact same entities with the same ID are discovered again?

The answer to the first question was – as expected:

When you uninstall the MP containing the class definitions and discovery, all discovered entities are deleted and all associated operational data (Events, Alerts, Performance Data, State Change Data, etc.) will be gone as well.

So everything is gone from the OperationsManager database.

In the data warehouse database the data will still be there.

The answer to the second question may be obvious as well:

When you reinstall your MPs and the discovery creates your entities with the same object IDs as before, the data warehouse continues to associate the data with the same objects you had before. Except for the data missing from the time your MP was deleted, you will not see a difference.

There’s one caveat and you should be very careful: since the data warehouse also keeps track of the MP version with each object ID, the data warehouse will only keep the data from the last three MP versions in the database. This is not configurable! In other words, when you keep your data warehouse data for 6 months and you uninstall and re-import your MPs more than 3 times in one month, you will lose more than 5 months of data.

UPDATE: The above (the rule of 3) only applies to unsealed Management Packs. When you use sealed Management Packs, the DW data will always be kept until standard grooming kicks it out, regardless of how often you update, delete or re-import the MPs.

One other small detail is that you cannot trick this mechanism by just not changing the MP version. The data warehouse creates an “internal” versioning of the MPs which changes upon update, even if the MP version is still the same!
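
As a toy model of the rule of 3 for unsealed MPs, here is a short Python sketch. It only mirrors the behaviour described above; the class and method names are made up for illustration and say nothing about how the data warehouse is actually implemented.

```python
from collections import deque


class UnsealedMpHistory:
    """Toy model: data survives only for the last `keep` internal MP versions."""

    def __init__(self, keep=3):
        self._versions = deque(maxlen=keep)
        self._internal_version = 0

    def reimport(self):
        """Every import bumps the internal version, even if the MP version is unchanged."""
        self._internal_version += 1
        self._versions.append(self._internal_version)
        return self._internal_version

    def has_data_for(self, internal_version):
        """True while the given internal version is among the retained ones."""
        return internal_version in self._versions


history = UnsealedMpHistory()
first = history.reimport()
for _ in range(3):
    history.reimport()

# After three further re-imports the first version has dropped out.
print(history.has_data_for(first))  # False
```

For sealed MPs this limit does not apply, so the equivalent model would simply never drop a version before standard grooming does.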

cheers,

Stefan
http://www.code4ward.net


 

Yesterday was my first day (of 30) in Seattle/Redmond. I will be in Washington for a month attending to some business, so there will be some latency in responding to emails and forum posts.

During the week we have a lot of work, but the weekends are reserved for fun, recreation, theaters, shopping and some hiking. So far the weather is just great here... hopefully it stays that way...


... but those who are may have received a conference guide like this:

image

How do you spell connect?

Thanks to David Allen for this.


ComponentFactory is offering their Krypton Suite (value: $299) for free to MVPs and .NET bloggers. Go check it out here: http://www.componentfactory.com/blog/?p=328

       
 

Copyright 2010 by code4ward - Stefan Koell