I was working with an old Windows Server 2008 R2 machine last night. It needed a "few" updates!
So first I will admit to several mistakes of my own. I did not give myself time to update this machine regularly enough in the past, and of course we always have to install Windows Updates on time. If you choose to wait an extra month, so that any problems introduced by one month's fixes are fixed the next, that is something we all understand. But this was many months' worth of updates. I went the lazy way, which bit me, as you will see below.
I was first interested in getting one specific update onto the machine. So I selected that update and a random few other smaller updates. That was a mistake! It installed the updates and wanted a reboot. OK. The next thing that happened was that the machine started up straight into a Blue Screen with STOP code 0x00000050 PAGE_FAULT_IN_NONPAGED_AREA, or 0x50 for short. There was no way around this, for instance into Safe Mode. The only thing that popped up was the System Recovery Options screen shown below:
By the way, before you get to this screen it asks you for the local Administrator password. Turns out even I did not remember it, but I got it in the end. Managing admin accounts, including local administrator accounts, is important. Watch Paula Januszkiewicz give you an example of why in one of the CQURE Academy sessions about pass-the-hash.
I felt a little panic coming up at that point, because data loss, or at least a lot of time spent fixing things, can follow from this. It did not look like I could do much from here either. I did have backups of the data, so in time I could have restored it.
Another reason for the panic was that I was doing two systems at the same time and in the same way... and, you guessed it... both with the same result!
Cue a lot of googling. There are plenty of videos explaining how to fix this FROM within Windows! The problem is that I was stuck in this System Recovery Options screen. The memory check did not show anything, by the way.
Well, somewhere hidden in a comment on one of the threads (which I cannot find again!) was the suggestion that an earlier hotfix might have hit one particular file, and that removing that file solved it for a few people.
In the picture above you can see a Command Prompt option. Open that.
Next you need to find out which drive letter holds your Windows installation. System Recovery takes a drive letter for itself and shifts the other drives to different letters. So I typed C:, pressed Enter, ran DIR and saw this was not the drive. Then I went to D: and ran DIR again. Nope. I continued until I found it.
The file you are looking for is FNTCACHE.DAT in the Windows\System32 folder of that drive.
This is the font cache file, and it is the one to delete. Do NOT touch the DLL file (FNTCACHE.DLL) there. The DAT file is only a cache and Windows rebuilds it after a restart.
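The recovery prompt has no Python, so this is only an illustration of the "switch drive, run DIR, try the next letter" hunt written as a small function, not something you would run from System Recovery Options. It assumes the cache sits at Windows\System32\FNTCACHE.DAT on the system drive, which is where I would expect it on Server 2008 R2; the function name is made up for this sketch.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_font_cache(roots: Iterable[str]) -> Optional[Path]:
    """Return the first <root>/Windows/System32/FNTCACHE.DAT that exists.

    Mirrors the manual loop at the recovery prompt: switch to a drive,
    look around, move on to the next letter until Windows shows up.
    Assumption: the font cache sits in Windows\\System32.
    """
    for root in roots:
        candidate = Path(root) / "Windows" / "System32" / "FNTCACHE.DAT"
        if candidate.is_file():
            return candidate
    return None
```

On a machine where Python is available you would call it with drive roots, for example `find_font_cache(["C:\\", "D:\\", "E:\\"])`. Note that it only looks for the DAT file and never touches the DLL next to it.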
Then I exited the command prompt and restarted the server. This time it started into Windows, just as I had hoped.
Next I still needed to select all of the remaining updates and install them all.
So remember to update regularly, and do not select half of the updates: go for all of them, because some of them fix issues created (or surfaced) by other fixes.
Now I can continue with actually replacing these servers, which was the plan to start with!
Today I was doing a quick installation of the Savision 8.2 Live Maps Unity Portal. I downloaded the self-extracting executable from the website and of course arranged a license key. While running the installer I selected the Express setup, which installs only the web portal on the machine and not the other components available in the Advanced installation option. The installation took 2 minutes on a slow machine, including extracting the files and running checks.
After installation the web page opens automatically, and I was greeted with the following error:
HTTP Error 500.19 - Internal Server Error
The error description mentions a configuration section being locked at a parent level.
Screenshot of the error:
What happened is that, at the server level, Windows Authentication is turned off and this configuration section is locked for the whole machine. The Live Maps Portal's own configuration file contains Windows Authentication settings, and because that section is locked at a higher level, IIS throws an error when it reads the file.
How to fix it:
Open IIS Manager
In the left menu select your server name
In the middle of the screen select Configuration Editor
Near the top of the Configuration Editor is a selection box for which section you want to see and edit.
Go to system.webServer/security/authentication/windowsAuthentication
In the right-hand menu you will find a link to Unlock Section. Click it to unlock this configuration section.
Now any lower level (Sites or Applications within a site) can have their own configuration for Windows Authentication.
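If you would rather script the same unlock, IIS ships appcmd.exe, and as far as I know its `unlock config -section:` command does the same thing as the Unlock Section link. Below is a minimal Python sketch that only builds that call and runs it on Windows; the `C:\Windows` fallback for when `%windir%` is not set is an assumption, and `unlock_command` is a made-up helper name.

```python
import os
import subprocess
import sys

SECTION = "system.webServer/security/authentication/windowsAuthentication"


def unlock_command(section: str = SECTION) -> list[str]:
    """Build the appcmd call that unlocks a config section at server level."""
    windir = os.environ.get("windir", r"C:\Windows")  # assumed fallback
    appcmd = windir + r"\system32\inetsrv\appcmd.exe"
    return [appcmd, "unlock", "config", f"-section:{section}"]


if __name__ == "__main__" and sys.platform == "win32":
    # Needs an elevated prompt on the IIS server itself.
    subprocess.run(unlock_command(), check=True)
```

Passing a list to `subprocess.run` lets Python handle the quoting, so the only thing to check is that you run it elevated on the right server.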
I refreshed the error page and the Live Maps Unity Portal came up fine!
Today I ran into a customer issue with a nice, clean SCOM 2012 R2 installation with Update Rollups (URs) applied. Certificates were arranged and MOMCertImport had been run. On the agent machines in the DMZ we had the agent installed, the UR applied, the root certificate imported and the computer certificate imported. MOMCertImport was run to get the correct certificate in use. Yet there was no communication at all between agent and server. This is what I found:
So first checks are:
- does the agent machine have a certificate issued to its own name (which in a workgroup can be the short name and in a DMZ domain a fully qualified name)? Yes
- does the agent machine trust the CA which issued the certificate? (in this case the customer's own CA, so the root chain certificate was imported). Yes
- can the agent resolve the SCOM server name you used while configuring the agent? Yes
- Is the management group name we used in configuring the agent correct (case sensitive!)? Yes
- Is there a firewall blocking TCP 5723 from agent to SCOM server? Yes! OK this was fixed quickly, and verified with telnet. Still no communication! Moving on.
- On the SCOM server did we import the CA root chain as trusted and did momcertimport run on the correct machine certificate with the correct FQDN for that server? Yes
- restart healthservice on both sides... Yes. No effect
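The telnet check from that list can also be scripted. This is a minimal sketch, assuming Python is available on the DMZ agent. `can_reach` is a made-up helper name, and 5723 is the agent-to-management-server port from the checklist.

```python
import socket

SCOM_PORT = 5723  # agent -> management server channel


def can_reach(host: str, port: int = SCOM_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Like telnet, this only proves the port is open; it says nothing
    about certificates or mutual authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name could not be resolved
        return False
```

For example, `can_reach("scomservername.domain.com")` run from an agent should return True once the firewall rule is in place.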
Man, usually it is name resolution, firewalls and routing, a certificate with the wrong name, no certificate at all, or an untrusted certificate. Pffff.
Something must be wrong with the SCOM server, I'm sure of it.
Next step: let's check whether all our SPNs are correct.
setspn -L scomservername
Hey, wait a second, I see an entry like this:
Now, this SCOM server was installed with the SDK service running under a domain account. So this SPN should not be registered to the server itself, but to the service account in the domain.
setspn -L domain\sdkserviceaccount
Sure enough, the MSOMSdkSvc entries for this server are not registered on the service account.
Alright, we cannot register the correct SPNs until we remove the wrong ones, so first we delete those.
setspn -d MSOMSdkSvc/scomservername scomservername
setspn -d MSOMSdkSvc/scomservername.domain.com scomservername
Next we enter the SPNs on the service account:
setspn -s MSOMSdkSvc/scomservername domain\sdkserviceaccount
setspn -s MSOMSdkSvc/scomservername.domain.com domain\sdkserviceaccount
And we check our results again with the setspn -L command.
Looks fine now.
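Here is the SPN clean-up above as a small Python sketch. It only prints the setspn commands in the required order: delete from the computer account first, then add to the service account, then list. `scomservername`, `domain.com` and `domain\sdkserviceaccount` are the placeholders from this post. You still run the printed commands yourself, with an account that is allowed to modify SPNs.

```python
def spn_fix_commands(server: str, domain: str, computer_account: str,
                     service_account: str) -> list[list[str]]:
    """Return setspn invocations that move the MSOMSdkSvc SPNs."""
    spns = [f"MSOMSdkSvc/{server}", f"MSOMSdkSvc/{server}.{domain}"]
    # Remove the wrong registrations first: setspn -s checks for duplicates
    # and refuses to add an SPN that is already registered elsewhere.
    commands = [["setspn", "-d", spn, computer_account] for spn in spns]
    commands += [["setspn", "-s", spn, service_account] for spn in spns]
    # Verify the result on the service account.
    commands.append(["setspn", "-L", service_account])
    return commands


for cmd in spn_fix_commands("scomservername", "domain.com",
                            "scomservername", r"domain\sdkserviceaccount"):
    print(" ".join(cmd))
```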
It must be the certificate somehow.
Open MMC Certificates, check the computer certificate. Is it valid, is it trusted, is it for the right purposes, does it have the correct name... Yes.
Run MOMCertImport again... only one certificate to choose from, and it is the same one. Restart the Microsoft Monitoring Agent service afterwards.
Wait a second. Let me check the registry for this certificate. What MOMCertImport does is not that difficult: it reads two properties of the certificate and writes them to two registry values for SCOM to use.
Aha! NO registry values!
In this key there should be two values relating to the certificate:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings
Alright, so I will create them manually!
What you do is open the properties of the certificate. You need the Thumbprint and the SerialNumber.
Create a New -> String Value
Name it: ChannelCertificateHash
Copy and paste the Thumbprint contents into it and remove the spaces in between
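Cleaning up the thumbprint is easy to get wrong by hand. A copy/paste from the certificate dialog can reportedly also carry an invisible character at the start, which you will not spot in regedit. This minimal sketch is a hypothetical helper, not part of MOMCertImport. It keeps only the hex digits and checks that the length fits a SHA-1 thumbprint.

```python
import re


def channel_certificate_hash(thumbprint: str) -> str:
    """Turn a thumbprint copied from the certificate dialog into the string
    for the ChannelCertificateHash value.

    Drops the spaces and any other non-hex characters (such as invisible
    marks picked up by copy/paste).
    """
    cleaned = re.sub(r"[^0-9A-Fa-f]", "", thumbprint)
    if len(cleaned) != 40:  # a SHA-1 thumbprint is 20 bytes = 40 hex digits
        raise ValueError(f"expected 40 hex digits, got {len(cleaned)}")
    return cleaned
```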
Create a New -> Binary Value
Name it: ChannelCertificateSerialNumber
Now go back to the properties of the certificate and click Serial Number. It is again a string of hexadecimal characters in pairs of two. You need to enter these pairs into the registry Binary value IN REVERSE order.
Original serial number in certificate = 68 00 AB CD 69 00 23
What you enter in Binary field = 23 00 69 CD AB 00 68
So the pair of 2 characters stays the same, but the order of the pairs in the total string is reversed.
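The pair reversal is the step where typos creep in, so here it is as a small sketch. The function names are made up for illustration. It reverses the byte order of the serial number and formats the result the way regedit writes a binary value in a .reg file.

```python
import re


def channel_certificate_serial(serial: str) -> bytes:
    """Reverse the byte order of a certificate serial number.

    '68 00 AB CD 69 00 23' becomes the bytes 23 00 69 CD AB 00 68, which is
    what goes into the ChannelCertificateSerialNumber binary value.
    """
    hex_only = re.sub(r"[^0-9A-Fa-f]", "", serial)
    return bytes.fromhex(hex_only)[::-1]


def reg_binary_line(name: str, data: bytes) -> str:
    """Format a REG_BINARY value in .reg file syntax."""
    return f'"{name}"=hex:' + ",".join(f"{b:02x}" for b in data)


print(reg_binary_line("ChannelCertificateSerialNumber",
                      channel_certificate_serial("68 00 AB CD 69 00 23")))
```

If you would rather import than type, you could put that line in a .reg file under the header `Windows Registry Editor Version 5.00` and the key `[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings]`.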
Next I restarted the SCOM services.
Within a minute it started reporting: A device which is not part of this management group has attempted to access this Health Service.
Those were the DMZ machines which just keep trying again and again!
So it was the certificate rather than the SPN record that messed things up, but at least I could show what I checked along the way. When the SPN issue came up I simply fixed it as well. In the end it WAS the certificate, even though I felt it was alright. When in doubt, and ALL untrusted agents refuse to talk to this machine while all trusted ones have no issue... triple-check the certificate and whether SCOM is actually using it!
Have fun monitoring!
I thought I would take a different approach to thinking about how to make a SCOM monitoring project a success. It is not about technical details or designs this time, but about a way to bring business and IT together into monitoring business related services and being in control of those processes. In a short blog post below I am touching upon some of those items.
Last week I installed a fresh WSUS server for a customer of mine, and because it needed to download lots of files after the approvals were done, we left it for a few days. Today I came in and opened the WSUS console, only to notice it refused to connect. I got an error like this one:
The WSUS administration console was unable to connect to the WSUS Server via the remote API.
Verify that the Update Services service, IIS and SQL are running on the server. If the problem persists, try restarting IIS, SQL, and the Update Services Service.
The WSUS administration console has encountered an unexpected error. This may be a transient error; try restarting the administration console. If this error persists, Try removing the persisted preferences for the console by deleting the wsus file under %appdata%\Microsoft\MMC\.
System.IO.IOException -- The handshake failed due to an unexpected packet format.
After checking that the required services were running, the investigation started. There are lots of blog and forum posts, from long ago to recent, all with different solutions.
I came across a post from about six weeks ago about update KB3148812, which causes this behavior and also causes an additional error where clients cannot scan against WSUS.
I could not find this KB installed on my system. However, the post mentioned manual steps to perform after applying the hotfix, and those manual steps did indeed solve it. Keep reading.
A little more research showed that KB3148812 has since been withdrawn and replaced by KB3159706.
The KB3159706 article describes what is going on and contains manual steps to follow! The first step solved the console not being able to connect. The second step is for HTTP Activation. And if you have SSL turned on, there are a few more steps to follow.
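As I read the KB3159706 article, its first manual step is running `wsusutil.exe postinstall /servicing` from an elevated prompt on the WSUS server. Below is a minimal sketch of that call, assuming WSUS is installed in the default `C:\Program Files\Update Services` location. Check the article itself for the HTTP Activation and SSL steps, which this does not cover.

```python
import subprocess
import sys

# Assumption: WSUS is installed in its default location.
WSUSUTIL = r"C:\Program Files\Update Services\Tools\wsusutil.exe"


def servicing_command(wsusutil: str = WSUSUTIL) -> list[str]:
    """Build the 'wsusutil postinstall /servicing' call (KB3159706, step 1)."""
    return [wsusutil, "postinstall", "/servicing"]


if __name__ == "__main__" and sys.platform == "win32":
    # Run from an elevated session on the WSUS server itself.
    subprocess.run(servicing_command(), check=True)
```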
In my previous post, which introduced SCOM 2016 Features - Network Monitoring MP Generator, I showed you the command syntax of the tool and why it was created. Now it is time for an example.
Have fun monitoring a network device and seeing how the principles of the input XML file work.
Also, because I have been doing a few presentations with a SCOMosaur theme, we combine a little SCOM with a little dinosaur madness. You will see a few references to that here and there.
Mind you, I am using a simulated device which may not be a perfect fit for this purpose. The reason is that the default simulated devices in the Jalasoft SNMP Device Simulator are all CERTIFIED, and we are of course creating monitoring for NOT CERTIFIED devices. The OIDs in the example below are from an APC UPS device, but for now they work clearly enough as an example.
- First of all I am using SCOM 2016 TP5 here, which is the first version to include this feature.
- I am using Jalasoft SNMP Device Simulator on another machine to simulate a few network devices of different types.
- Of course, make sure both sides can reach each other with ping (ICMP) and SNMP.
- I am using iReasoning MIB Browser to browse the SNMP tree on the selected device, to confirm there is actually data there and that we have the right OIDs.
Next on the list is to discover the devices in SCOM by creating a Device Discovery and adding the device IP addresses and SNMP community string to it and letting SCOM discover the devices.
The XML input file
The idea here is much the same as a simple management pack setup:
- A manifest with management pack name and version
- A Device definition
- A Device discovery
- Device Components
- Device Component Discovery
- Rules (these are collection rules)
Starting the Manifest
First we define the start of the input file with the Root tag.
Next we define the Display Name and Version for the management pack.
Name and Version are mandatory and an optional tag is KeyToken.
Device Definition and Discovery
The next thing to do is create an entry for each type of device, plus a device discovery for it.
First we define a name for the device.
Next we jump into a discovery for it.
The discovery covers the SysObjId tag which points to the unique device identifier for the device type.
Next we have to specify a device type. The following types are supported for now: Switch, Router, Firewall, LoadBalancer.
Next fill out the Vendor and Model.
Components and Discovery
Now it is time to look into the components of the device, for example Processors or Fans. After we discover those, we can target monitors and rules at those components in order to monitor them.
We are opening the Components tag here, and it will be closed all the way at the end of the story.
Next we define our first component.
There are a few component types supported at this moment: Processor, Memory, Fan, Voltage Sensor, Power Supply, Temperature Sensor.
And we give it a name of course.
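The device and component types are a fixed set for now, so a quick pre-flight check can save a round trip through the generator. This is a minimal sketch. The type spellings are copied from this post, and the generator's own schema may spell them differently (for example without spaces). The tool also validates its input itself, so treat this as a convenience only.

```python
SUPPORTED_DEVICE_TYPES = {"Switch", "Router", "Firewall", "LoadBalancer"}
SUPPORTED_COMPONENT_TYPES = {
    "Processor", "Memory", "Fan",
    "Voltage Sensor", "Power Supply", "Temperature Sensor",
}


def check_types(device_type: str, component_types: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every type is supported."""
    problems = []
    if device_type not in SUPPORTED_DEVICE_TYPES:
        problems.append(f"unsupported device type: {device_type!r}")
    for component_type in component_types:
        if component_type not in SUPPORTED_COMPONENT_TYPES:
            problems.append(f"unsupported component type: {component_type!r}")
    return problems
```

This is also why the UPS in this example ends up described with a Processor component: a "Battery" type would simply be rejected.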
Now we define the OIDs we are interested in. These OIDs have to exist for each instance of the component we define. One of them will be used in the discovery of the component, and the same one and/or others can be used for rules and monitors. In any case, we define all of them here and give each one a unique name.
We do not have to enter the index number of each component instance. For example...
fan1 = 1.3.6.1
fan2 = 1.3.6.2
fan3 = 1.3.6.3
In the very short OID example above you can see the last number is the index number for each fan. So we only need to specify 1.3.6 in this case and the discoveries will find each instance for you.
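The base OID plus index idea can be shown in a few lines. This sketch uses the three fan OIDs from the example. It splits off the last number as the instance index and groups the walked OIDs by what is left. The helper names are made up. This is only a model of the idea, not how the generated discovery is implemented.

```python
from collections import defaultdict


def split_index(oid: str) -> tuple[str, int]:
    """Split '1.3.6.2' into the base OID '1.3.6' and the instance index 2."""
    base, _, index = oid.rpartition(".")
    return base, int(index)


def group_instances(oids: list[str]) -> dict[str, list[int]]:
    """Group walked OIDs by base OID, listing the instance indexes found."""
    groups: dict[str, list[int]] = defaultdict(list)
    for oid in oids:
        base, index = split_index(oid)
        groups[base].append(index)
    return {base: sorted(indexes) for base, indexes in groups.items()}


print(group_instances(["1.3.6.1", "1.3.6.2", "1.3.6.3"]))
```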
In this case I named the component the Tricera Environment and gave it a Processor type, just because it needs to conform to the default types at this moment.
The three OIDs used are a Temperature OID, a Usage OID (which happens to be the percentage of battery left in the UPS), and an overall state indicator OID for this component.
For the next step, this means we have two performance counters we can collect (though I will collect all three in the example), and we can also create state monitors based on the values.
Lastly the ComponentDiscovery is a pointer to which of the already defined OIDs is a component indicator. In this case I use the state indicator OID. If that one is there (with an index number behind it) an instance of the component will be created or as many as needed.
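To make the ComponentDiscovery idea concrete, here is a small model of it. One instance is created for every index found directly under the discovery OID, and the other defined OIDs are read for that same index. The OIDs and values are made up. Of the names, only TriEnvUsage comes from this post, and `discover_instances` is a hypothetical helper, not part of the generated management pack.

```python
from typing import Optional


def discover_instances(walk: dict[str, int], discovery_oid: str,
                       oids: dict[str, str]) -> dict[int, dict[str, Optional[int]]]:
    """Create one instance per index found directly under discovery_oid.

    walk maps full OIDs (base + '.' + index) to their values, like the
    output of an SNMP walk; oids maps friendly names to base OIDs.
    """
    prefix = discovery_oid + "."
    instances: dict[int, dict[str, Optional[int]]] = {}
    for full_oid in walk:
        suffix = full_oid[len(prefix):] if full_oid.startswith(prefix) else ""
        if suffix.isdigit():
            index = int(suffix)
            # Pick up every defined OID for this same index (None if absent).
            instances[index] = {name: walk.get(f"{base}.{index}")
                                for name, base in oids.items()}
    return dict(sorted(instances.items()))


# Made-up OIDs and values for two component instances.
walk = {
    "1.3.6.300.1": 2, "1.3.6.300.2": 3,      # state
    "1.3.6.100.1": 25, "1.3.6.100.2": 31,    # temperature
    "1.3.6.200.1": 80, "1.3.6.200.2": 100,   # usage
}
oids = {"TriEnvTemp": "1.3.6.100", "TriEnvUsage": "1.3.6.200",
        "TriEnvState": "1.3.6.300"}
print(discover_instances(walk, "1.3.6.300", oids))
```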
Monitoring and Rules
Alright now the monitoring needs to start for the component we are still at.
For starters we set the Monitoring tag. We will close that tag later after we have defined all rules and monitors.
Next we start with the rules:
We open the Rules tag and next define the performance collection rules as you see here. I used short names for it and pointed each rule to the name of the OID we defined already. See how easy that part is?
Let's go to the monitors now...
First again we start it off with the Monitors tag which we will close off after the last monitor we add.
Alright, first UnitMonitor. We give it a name. In this case Triceratops Environment Status.
It is a two state monitor so we define two expressions.
Both of them point (in black letters in the middle here) to the name of the OID containing the state indication.
The first expression is for success (green state) and uses 2 or less. And the second expression uses anything higher than 2 to set it to an error state.
So I repeated that two more times. For the Temperature I set 30 degrees as the maximum acceptable value, otherwise our dino gets a sunburn.
And the third monitor is using the TriEnvUsage OID to determine if it is at 100 or below.
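The three two-state monitors boil down to "at or below a threshold is healthy, above it is an error". This sketch models that logic only. It is not the expression syntax of the input XML. The thresholds, "Triceratops Environment Status" and TriEnvUsage come from this post. The other monitor and OID names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStateMonitor:
    """A value at or below healthy_max is Success; anything higher is Error."""
    name: str
    oid_name: str
    healthy_max: float

    def evaluate(self, value: float) -> str:
        return "Success" if value <= self.healthy_max else "Error"


MONITORS = [
    TwoStateMonitor("Triceratops Environment Status", "TriEnvState", 2),
    TwoStateMonitor("Triceratops Temperature", "TriEnvTemp", 30),  # made-up names
    TwoStateMonitor("Triceratops Usage", "TriEnvUsage", 100),
]

sample = {"TriEnvState": 3, "TriEnvTemp": 28, "TriEnvUsage": 100}
for monitor in MONITORS:
    print(monitor.name, monitor.evaluate(sample[monitor.oid_name]))
```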
And now, as promised, we close the whole load of tags:
The conversion process
Alright we now have an XML input file with all the stuff we need. Now we need to use the Network Monitoring MP Generator tool to convert the input file to a management pack XML file.
Open a command prompt and go to
%ProgramFiles%\Microsoft System Center 2016\Operations Manager\Server\
I placed my input file in the folder C:\SCOMosaur with file name dinos.xml, and I will let the output file be written to that folder as well.
I run the command:
NetMonMPGenerator.exe -InputFile "C:\SCOMosaur\dinos.xml" -OutputDir "C:\SCOMosaur"
The program will let you know if there are any errors and it will confirm if it finished creating the management pack file.
From here you simply import the management pack and as usual wait a little bit.
Well it is a lot easier to create this input file with the basics we need to be monitoring the custom device. The total input XML file was about 60 lines if we take away the empty lines. The resulting management pack was 690 lines long.
There will be a complete example coming from the product team very soon now, including comments in the file and such. This is just a quick starter to help you play with this feature.
This is meant to bring NOT CERTIFIED devices to a more complete monitoring state, as if they were CERTIFIED. As you have seen, the device types and component types are a limited set for the moment.
My idea around this feature is that the possibilities might expand in due time and become more and more flexible. It would also be nice to see a graphical interface to build up the input XML, which of course would immediately build the management pack as well. However, that kind of thing takes a lot of time to build. I consider the current solution a nice go-between.
Back to the SCOM 2016 Features - Overview post!
Hope you all have fun!
Obviously the product team has received feedback in the past on the performance of the SCOM console. It is no secret that this is not the fastest tool out there when opening it, changing views or even refreshing. This is most apparent in larger environments, of course. We can name several good reasons for this, which we will not dive into now, but there was room for improvement even when taking those reasons into account. Now they have started work to increase the speed of certain views within the SCOM console and expand from there.
In SCOM 2016 TP5 first the Alert views were looked at and worked on.
- Alert view is optimized to load efficiently
- Alert tasks and alert details in the alert view are optimized to load efficiently
- Context menus of an alert in the alert view are optimized to load efficiently
Alert views are among the most used in SCOM, so this is where they started. Meanwhile, work is being done on other types of views as well, such as State and Performance views. These improvements will arrive after TP5.
Of course these changes are likely most apparent in larger views and busy environments.
I do not have numbers or percentages of improvement for you yet. We might really start to notice a change in RTM production environments of a certain size later. Still I am very happy this bit of feedback was picked up and worked on.
Back to the SCOM 2016 Features - Overview post!
Wishing you speedy monitoring!
In SCOM 2012 there was a difference between Certified devices and generic devices. When you added a network device to SCOM, it would show up as one of the two. The certified devices had additional monitoring applied to them, such as Processor and Memory monitoring, while the generic devices were much more basic in their monitoring possibilities. Getting around that, for instance by creating additional monitoring for a device's components with their own monitors and rules, was quite difficult. I know I spent a week creating a custom management pack for a customer with a few classes, discoveries, monitors and rules, partly because the available information was very limited, but also because it is such a hard process to get through. Plus, to be honest, I am not really much of a developer. Let's say that during that week a lot of words were used, and thankfully I got great tips from my MVP friend Daniele Grandini.
Now however we are getting some help from SCOM 2016!
What is the process?
What you do is create a custom formatted XML file. This contains some basic information you are used to from creating management packs, such as a name and version number. Next you define Discoveries for devices and components, and the SNMP OIDs to look for. You create Rules which look at the defined OIDs and collect their data, and you create Monitors which also look at predefined OIDs and have expressions attached to them to determine the state of the components. Those expressions look easier than the ones you used to write in custom packs.
The tool we are talking about converts this structured XML file into a management pack XML file which can be used by SCOM. It is a simple command line executable with very few options and it will check for mistakes in the input XML and notify you.
The first thing that needs to happen is that you discover the targeted device as an SNMP network device in SCOM through the usual method. The management pack created using this tool only works on discovered and monitored network devices. We are just expanding the default monitoring set to include more specific monitoring.
Where is it found:
%ProgramFiles%\Microsoft System Center 2016\Operations Manager\Server\NetMonMpGenerator.exe
The command line options:
-InputFile or -I is used to pass the filename of the XML file you created (can add a path to that within quotes).
-OutputDir or -O is the directory where the output of this tool will be written (can use a full path between quotes). The tool will write the management pack file to this directory.
-Overwrite or -W will overwrite an existing MP with the same name if found in the output directory.
-Help or -H can be used to display short usage help for the executable.
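If you end up regenerating the pack a lot, a small wrapper keeps the options straight. This is a minimal sketch that only uses the options listed above. It assumes the default install path from this post, and `generator_command` is a made-up helper name.

```python
import subprocess
import sys

# Assumption: default SCOM 2016 install path, as shown in this post.
GENERATOR = (r"C:\Program Files\Microsoft System Center 2016"
             r"\Operations Manager\Server\NetMonMpGenerator.exe")


def generator_command(input_file: str, output_dir: str,
                      overwrite: bool = False) -> list[str]:
    """Build the argument list for NetMonMpGenerator.exe."""
    cmd = [GENERATOR, "-InputFile", input_file, "-OutputDir", output_dir]
    if overwrite:
        cmd.append("-Overwrite")  # replace an existing MP with the same name
    return cmd


if __name__ == "__main__" and sys.platform == "win32":
    # subprocess quotes the paths containing spaces when given a list.
    subprocess.run(generator_command(r"C:\SCOMosaur\dinos.xml", r"C:\SCOMosaur"),
                   check=True)
```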
Example of command line tool usage:
I opened up a command prompt and went to the following directory
C:\Program Files\Microsoft System Center 2016\Operations Manager\Server
Next I ran this command (and the directories already existed)
NetMonMPGenerator.exe -InputFile "C:\SCOMosaur\dinos.xml" -OutputDir "C:\SCOMosaur"
And a few seconds later I got this message:
Management pack created: C:\SCOMosaur\System.NetworkManagement.SCOMosaursNetworkPack.xml
This file can be imported into your SCOM environment to start monitoring.
Now I know you are going to ask me for a full example where I create the input XML as well.
Example of the SCOM 2016 Network Monitoring MP Generator where I will be attempting to monitor a Triceratops somehow.
This of course relates to me being one of the SCOMosaurs and staying on the Theme.
Back to the SCOM 2016 Features - Overview post!
With this post I am giving you an overview of the new features that have been added to SCOM 2016 so far. I bet you thought not much was happening with SCOM for the 2016 version, right? Well, I can tell you there is still a lot going on. Below you will find some of the things which have been worked on.
A number of features were added in earlier Technical Preview releases (TP3 and TP4), such as Scheduled Maintenance Mode and the Nano Server Agent. I will cover those in the series below as well, but first I will focus on the items added in TP5.
The following features and items were added in Technical Preview 5 of SCOM 2016 (early May 2016), and we want YOU to know about them. Use the links for each feature to dive more deeply into these features and improvements:
Now there are also other SCOM 2016 improvements on the list:
Give feedback on SCOM features
By the way, feel free to interact with the product team by giving them feedback:
The SCOM User Voice site
For example, to get the Scheduled Maintenance Mode feature moved from the Admin pane to the Monitoring pane, so that Operator-level SCOM users can use the feature as well and not only SCOM admins. That assumes, of course, that most Operators and Service Desk staff are not heavy PowerShell users (yet).
This and more is going on in SCOM 2016. I will be writing more about these subjects soon on my blog and in a future book and elsewhere probably.
Also be sure to watch for my presentations on SCOM 2016 at conferences (MMS 2016 Minneapolis on 17 May) and user group meetings (WMUG NL in May). I will be recording one and posting it up soon.
Enjoy being in control of your network infrastructure!
In SCOM 2012 R2 we were able to monitor up to 500 Unix/Linux agents per management server or about 100 through a gateway. To be honest I think that was already stretching it, unless the amount of workflows was kept to a minimum.
In SCOM 2016 work has been done to be able to scale up to higher numbers for this. Up to twice as much actually IF you use another monitoring method for cross platform monitoring. I will show you what I mean below.
In SCOM 2012 we were using the WSMan Sync APIs to connect to the Linux agents and pull data from them. This is also the default setting for SCOM 2016.
However, if you have a large Linux/Unix deployment that you wish to monitor using SCOM 2016, there is a registry key you can set on the management server which changes the monitoring behavior to use the Async MI APIs. MI in this case stands for Windows Management Infrastructure, which is based on CIM standards (as is the SCOM OMI agent).
In order to get the SCOM management servers to use the new method (and thus scale up more!) you add a registry key to the management server which is monitoring the cross-platform agents.
Create this registry key:
HKLM:\Software\Microsoft\Microsoft Operations Manager\3.0\Setup\UseMIAPI
After you do this I suggest you restart the Microsoft Monitoring Agent Service (also called the Healthservice) to be sure this goes into effect. Make sure all your management servers used for this purpose use the same method.
If you are monitoring a significant number of Linux/Unix agents in your environment (hundreds), I suggest you change this setting on your SCOM 2016 management servers.
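To put the numbers side by side, here is a back-of-the-envelope sketch. It uses 500 agents per management server with the default Sync API, and "up to twice as much" (taken here as 1000) with the Async MI API. These are upper limits, not sizing guidance. Real capacity depends on your workflows, and failover headroom in the resource pool is not considered. The 1800-agent count is just an example.

```python
import math

SYNC_AGENTS_PER_MS = 500  # per management server, figure quoted in this post
ASYNC_FACTOR = 2          # "up to twice as much" with the Async MI API


def management_servers_needed(agents: int, agents_per_server: int) -> int:
    """Minimum number of management servers to stay under a per-server limit."""
    if agents < 0 or agents_per_server <= 0:
        raise ValueError("agents must be >= 0 and agents_per_server > 0")
    return math.ceil(agents / agents_per_server)


agents = 1800
print("Sync API :", management_servers_needed(agents, SYNC_AGENTS_PER_MS))
print("Async MI :", management_servers_needed(agents, SYNC_AGENTS_PER_MS * ASYNC_FACTOR))
```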
Back to the SCOM 2016 Features - Overview post!
Happy crossplat monitoring!