SCOM grey RMS and agents with 7022 events after deleting crypto folder

The other day I was called in to help troubleshoot another case where management servers and agents had turned grey. The cause was one I had not seen before, so I thought I would mention it here.

What happened was that for some reason the contents of the "C:\Documents and Settings\" folder got deleted from all machines. These were Windows 2003 machines. This happened right before the agents went grey, so it was very likely to be linked. Some other programs (IIS, SCCM agent) did not seem to like this either. At first sight, of course, this folder only contains a few user profiles, so what the heck, right? But there are also some other directories in there (some of them hidden) that actually do have a function.

In the end it turned out that the "C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto" folder had been deleted, and that folder contains the machine keys. Aha...

So we started out focusing on the management servers and the SQL box in order to get those up and running first. On agents and management servers you will see 7022 events saying, in effect, that the health service has downloaded secure configuration but does not have the certificate or private key to decrypt it, so it fails. Crypto folder deleted, decrypt errors: it looks like we have a link here. The management servers had this too, so we first had to get those talking again (and make sure their SQL was monitored as well, so we could see whether we had any other issues). What we want to see is an event 7023.

So the way to get this running was to:

  • Stop the System Center Management service
  • Clear the default value under the following registry key (making an export first might be a good idea!):
    "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\SecureStorageManager"
  • Delete the contents of the following folder, which also flushes the cache and makes the machine re-download its configuration and management packs:
    "C:\Program Files\System Center Operations Manager 2007\Health Service State\*.*"
  • Start the System Center Management service
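
If you have to repeat this on many machines, the steps above can be sketched as a small script. This is only a minimal Python sketch and not the script we actually used. It assumes the service can be stopped and started with "net stop" / "net start" under the name HealthService (taken from the registry path above), that "clearing" the default value can be done by deleting it with "reg delete ... /ve", and the backup file path is a made-up placeholder. Try it on one test machine first.

```python
import shutil
import subprocess
from pathlib import Path

# Service name as it appears in the registry path from the steps above.
SERVICE = "HealthService"
SSM_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\HealthService"
           r"\Parameters\SecureStorageManager")
STATE_DIR = Path(r"C:\Program Files\System Center Operations Manager 2007"
                 r"\Health Service State")


def flush_folder(folder):
    """Delete everything inside `folder` but keep the folder itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in list(Path(folder).iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def pre_flush_commands(backup_file):
    """Commands to run before the folder is emptied, in order."""
    return [
        ["net", "stop", SERVICE],
        # Export the key first so the old value can be restored if needed.
        ["reg", "export", SSM_KEY, str(backup_file), "/y"],
        # Assumption: deleting the default value (/ve) counts as clearing it.
        ["reg", "delete", SSM_KEY, "/ve", "/f"],
    ]


def repair_agent(backup_file=r"C:\Temp\SecureStorageManager.reg",
                 state_dir=STATE_DIR):
    """Run all four steps on the local Windows machine (needs admin rights)."""
    for cmd in pre_flush_commands(backup_file):
        # check=True aborts the run as soon as one command fails.
        subprocess.run(cmd, check=True)
    flush_folder(state_dir)
    subprocess.run(["net", "start", SERVICE], check=True)
```

Calling repair_agent() on an affected machine runs the steps in the same order as the list above. After that, watch the event log for the 7023 event.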

After a while we finally got the 7023 event back, and a minute later we saw the long list of 1201 events, which means it is downloading the management packs again. A few minutes later the monitoring of those machines came back, along with the performance counters and health state.
Next we pulled down a list of all agents from the OpsMgr Shell with get-agent and we used that as input to run the above as a script on all other agents.

An hour later everything was green/red again.

So if you ever delete that folder, or it gets corrupted, this is a way to try to get it fixed.

Good luck!
Bob Cornelissen

SCOM Trick 14 – Troubleshoot grey agents or management server

Very often you will see some SCOM agent turn grey. In some cases this can happen to your management server or RMS as well (I hope not!!). This can have several causes, ranging from an agent or a machine being down to more serious issues.
Normally your first step might be to ping the agent (using the tasks in the actions pane), followed by a check if the agent service is running on the remote computer (System Center Management service).

I have already talked about machines that turn grey in SCOM Trick 10 and SCOM Trick 11, for cases where you might be looking at stale data in your console or where a machine is actually no longer part of SCOM, for instance because somebody turned it off at the end of its life cycle. Those pages discuss getting rid of objects that you know are not monitored anymore and that you no longer want. For everything below this point I assume you actually do want these objects to be monitored and to have a nice green state (if possible).

How to find grey agents?
The SCOM console is one place, of course.

There is a management pack that will show you alerts for grey agents:
http://www.systemcentercentral.com/tabid/145/indexId/23800/Default.aspx

Combining it with Opalis to find grey agents by Marcus Oh:
http://marcusoh.blogspot.com/2010/09/opalis-monitoring-grey-gray-agents-in.html

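Find grey agents in PowerShell (run this in the Operations Manager Shell):
# Grey agents are Microsoft.SystemCenter.Agent objects whose IsAvailable property is false
$WCC = Get-MonitoringClass -Name "Microsoft.SystemCenter.Agent"
$MO = Get-MonitoringObject -MonitoringClass:$WCC | where {$_.IsAvailable -eq $false}
# Show only the display names of the grey agents
$MO | select DisplayName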

Find grey agents in SQL (run against the OperationsManager database):
SELECT ManagedEntityGenericView.DisplayName, ManagedEntityGenericView.AvailabilityLastModified
FROM ManagedEntityGenericView
INNER JOIN ManagedTypeView ON ManagedEntityGenericView.MonitoringClassId = ManagedTypeView.Id
WHERE (ManagedTypeView.Name = 'microsoft.systemCenter.agent') AND (ManagedEntityGenericView.IsAvailable = 0)
ORDER BY ManagedEntityGenericView.DisplayName

Troubleshooting grey agents:

Troubleshooting grey agents KB article KB2288515
Troubleshooting gray agent states in System Center Operations Manager 2007 and System Center Essentials
http://support.microsoft.com/kb/2288515
This is a very complete discussion on the subject.


SCOM Trick 13 – SPN

Service Principal Names (SPN) are used to uniquely identify an instance of a service. The SCOM services register SPNs as well, and mostly everything will be just fine, but in some cases you might have issues with this.

My friend Walter Eikenboom has written up a very good post about SCOM and SPN, how to check it, how to set it etc. Check it out here: http://systemcenterdynamics.wordpress.com/2009/08/26/scom-2007-r2-what-should-my-spn-registrations-look-like/

Also a post by Kevin Holman about the subject:
System Center Operations Manager SDK service failed to register an SPN
http://blogs.technet.com/b/kevinholman/archive/2007/12/13/system-center-operations-manager-sdk-service-failed-to-register-an-spn.aspx
Symptom: you see an event from the OpsMgr SDK Service with ID 26371.

And another writeup by Jonathan Almquist:
http://blogs.technet.com/b/jonathanalmquist/archive/2008/03/12/sdk-spn-not-registered.aspx

Another case that can be traced back to an SPN issue, by JC Hornbeck:
OpsMgr 2007: Agents stuck in Pending Management with Event ID 21016
http://blogs.technet.com/b/smsandmom/archive/2008/03/13/opsmgr-2007-agents-stuck-in-pending-management-with-event-id-21016.aspx

SCOM Trick 12 – Diagnostic logging

In some cases you might require a lot more info from SCOM about what is going on during troubleshooting. In that case you might want to have more diagnostic logging (more verbose). Here is how to use it.

How to use diagnostic tracing in System Center Operations Manager 2007 and in System Center Essentials
http://support.microsoft.com/kb/942864


SCOM Trick 11 – Seeing old or unmonitored entities

In some cases we open the SCOM console, go into a state view and see old entries there: devices that are actually already gone, or items that were discovered at some point and are not being monitored. Assuming we do not want these monitored, we would like to get rid of them in our state views.

First of all check out SCOM Trick 10 and see if it is not just the SCOM console playing tricks on you.

Next you can go into the Start menu – All programs – System Center Operations Manager 2007 R2 – Operations Manager Shell. When it is loaded you can type the following command:
remove-disabledmonitoringobject

That should get rid of a lot of the unmonitored entries.

SCOM Trick 10 – SCOM console renew

Many times it can happen that you are looking at stale data in the SCOM console. For instance you click on an alert and it gives you an error, saying that it has already been closed (in more difficult terms). Sometimes you see an entry in a list that should not be there anymore. This could have several reasons, which will be discussed later. But the first things to do when you fear you are looking at data that could have some clutter in it are the following (going one step further every time until you see what you expect to see):

  • Press F5 to refresh the screen
  • Close the SCOM console and start it using the /clearcache option
    Click Start → Run and put the following (in one line) as the command:
    "C:\Program Files\System Center Operations Manager 2007\Microsoft.MOM.UI.Console.exe" /clearcache
  • Next you can do the same after deleting the following registry key (close the console first):
    HKEY_CURRENT_USER\Software\Microsoft\Microsoft Operations Manager\3.0\Console
    This gets rid of most of the “ghost” entries of alerts and items in the view caused by the console itself.
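
The last two steps can also be sketched as a small script. This is a minimal Python sketch under a few assumptions: the console is installed in the default path shown above, "reg delete" is used to remove the per-user key, and the function names are made up for illustration. The console must be closed before you run it.

```python
import subprocess

CONSOLE_EXE = (r"C:\Program Files\System Center Operations Manager 2007"
               r"\Microsoft.MOM.UI.Console.exe")
CONSOLE_KEY = r"HKCU\Software\Microsoft\Microsoft Operations Manager\3.0\Console"


def console_reset_commands(delete_registry_key=False):
    """Return the commands for restarting the console with a cleared cache."""
    commands = []
    if delete_registry_key:
        # Third step: remove the per-user console key before starting.
        commands.append(["reg", "delete", CONSOLE_KEY, "/f"])
    # Second step: start the console with the /clearcache option.
    commands.append([CONSOLE_EXE, "/clearcache"])
    return commands


def reset_console(delete_registry_key=False):
    """Run the commands on a Windows machine that has the console installed."""
    *setup, launch = console_reset_commands(delete_registry_key)
    for cmd in setup:
        # check=False: "reg delete" fails if the key is already gone,
        # and that should not stop the console from starting.
        subprocess.run(cmd, check=False)
    subprocess.Popen(launch)  # do not wait for the console to exit
```

Try reset_console() first, and only go for reset_console(delete_registry_key=True) if the ghost entries are still there.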

SCOM console crashes when running a task

I saw a thread today on the TechNet forums about the SCOM console crashing when running a task: http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/c339c327-1e7d-412b-9d1a-5ae0b8a2e0f8/#9ae059f9-f881-4fa1-9d42-fceed4c4ef2d

We actually also have one issue exactly like that, so it was a nice opportunity to dive into it a bit more. So what happens is that from the SCOM Console you run a task against some agent and while running that task it crashes and takes the SCOM Console along with it. You can get an error like the following:

[Image: ie9scomconsolecrash]

Alexey Zhuravlev from opsmgr.ru recently found that Internet Explorer 9 seemed to cause the problem in a case he encountered and in this thread the solution was also to remove IE9 to get it working.
To quote him on that part:

Console calls this:
StackTrace:
at System.Windows.Forms.UnsafeNativeMethods.IHTMLLocation.GetHref()
at System.Windows.Forms.WebBrowser.get_Document()
at System.Windows.Forms.WebBrowser.get_DocumentStream()
at System.Windows.Forms.WebBrowser.get_DocumentText()
If I understand the process correct, it uses Mshtml.dll. IE9 replaces this dll (installs it's own version 9.0.*). And it looks like the new version causes an access violation...

A colleague of mine at a customer location had also been having these problems for a while, but only with certain tasks. It turns out to be the tasks that are run against the agent. So for instance a ping will work just fine because it runs locally, but a task like showing processes or starting a service will first give you a popup asking about the credentials to use, and after you run the task it crashes (and takes the SCOM console down with it). So when you do that you get one of these popups:

[Image: ie9scomconsolecrash]

We tested by uninstalling IE9 and confirming it was now IE8. Tasks ran fine. So we installed IE9 again, but this time from the Microsoft website and not some internal updating process. And yes, it crashed again at running a task.

So the current workaround is to uninstall IE9 to work with this. Hope that it will be fixed soon.

Update 9 June 2011:
Lincoln Atkinson gave an answer to a thread in the Technet Forums about this issue with a remark for a future fix:
The product team is aware of this issue and are looking into a fix. We will be fixing for vNext + if possible backporting the fix in a future cumulative update.

Bob Cornelissen

SCOM Trick 9 – Maintenance mode tooling

As I said in SCOM Trick 7, using maintenance mode is important. But people will not always use the normal SCOM interface (or web interface) to start maintenance mode right before they start working on a machine. To be honest, a lot of the time machines only get placed in maintenance mode once the alerts start flowing in during planned work. In any case, you can actually schedule maintenance mode, or include it in scripting. Here are some resources to get you started.

Maintenance mode history report
http://www.systemcentercentral.com/tabid/145/indexId/70867/Default.aspx

Remote maintenance mode mp
http://www.systemcentercentral.com/tabid/145/indexId/11577/Default.aspx
Works by running a script on the agent, which writes an event log entry that gets picked up.

MCS maintenance mode mp
http://www.systemcentercentral.com/tabid/145/indexId/11546/Default.aspx

Put a group into Maintenance Mode
http://blogs.technet.com/b/operationsmgr/archive/2009/11/17/putting-a-group-of-computers-into-maintenance-mode-via-powershell.aspx

Maintenance Mode powershell script
http://blogs.technet.com/b/mgoedtel/archive/2009/10/29/updated-powershell-script-maintenance-mode.aspx

Remote Maintenance Mode GUI tool
http://www.scom2k7.com/scom-remote-maintenance-mode-scheduler-20/

Cluster and maintenance mode
http://blogs.msdn.com/b/mariussutara/archive/2008/09/05/cluster-and-maintenance-mode.aspx

Stopping maintenance mode
http://blogs.msdn.com/b/boris_yanushpolsky/archive/2007/08/30/stoping-maintenance-mode.aspx

SCOM Maintenance Mode Tool
http://scommaintenancemode.codeplex.com/

Schedule a group of URLs (or one) into maintenance mode
http://www.scom2k7.com/schedule-a-group-of-urls-into-maintenance-mode/

SCOM Trick 8 – Who enabled maintenance mode

One of the questions that gets asked once people start using maintenance mode in SCOM, especially in bigger environments, is whether there is an overview of who put something into maintenance mode.

Somebody wrote a management pack for this!
Maintenance Mode History Report Management Pack
http://www.systemcentercentral.com/tabid/145/indexId/70867/Default.aspx

SCOM Trick 7 – Use Maintenance mode

One of the great features in SCOM is the ability to place a machine/device or part of it in maintenance mode whenever you are working on the machine and you do not want it to generate alerts while you are doing your stuff. For instance during a planned change. This also avoids unnecessary red and yellow health states which affect your SLA availability reports. One more thing is that it tends to not stress out helpdesks and ticketing systems if you try to avoid sending them unneeded alerts (sometimes they do not know you are playing with the machines).

In many cases a whole machine will be placed into maintenance mode. As of SCOM 2007 R2 setting maintenance mode for a machine only has to be done in one place and not in three places like before.

You can enable maintenance mode from any state view by clicking the desired machine/device/website/database and selecting Start Maintenance Mode in the actions pane or by right-clicking and selecting that option. This can also be done right from an alert view, but I always prefer to be clear on where I select it.

From there you can select if it is planned or not and what the reason is. You can start it, stop it or change the duration.

Something to NOT do is place management servers in maintenance mode. Unless you know what you are doing.

Sometimes a health explorer will not turn green and manually resetting it does not help, and sometimes it is only at the rollup stages that it will not turn back to green. In those cases you can try placing the machine in maintenance mode for 15 minutes; after that time the health state will be recalculated.

In an upcoming Trick I will list some of the tools you can use for maintenance mode: scripts, PowerShell, management packs and so on.

Just remember to use maintenance mode.

©2015 by Bob Cornelissen.