Category: "SCOM Tricks"
One of my customers had a problem deploying SCOM agents to Linux servers through a script. On their Red Hat 6 servers all went well. On the Red Hat 7 servers, however, the agent refused to install, and a push of the agent through the console failed as well. It seemed to stop around the file copy stage, where the rpm file gets copied to the server and is then run for installation.
It turned out that a feature called "root squash" was causing the issue. It restricts rights on NFS shared volumes by mapping the root user of the client to an unprivileged account, so root can not simply access files or run commands from any directory on such a volume, for instance under /home. When they turned off this feature the agent installed immediately.
Just writing this down because I am sure I will run into this again somewhere.
Happy agent deployment!
Today I was doing a quick installation of the Savision 8.2 Live Maps Unity Portal. Downloaded the self-extracting executable from the website and of course arranged a license key. While running the installer I selected the Express setup which just pushes the web portal onto the machine and not the other components available in the Advanced installation option. The installation ran in 2 minutes on a slow machine, and this is including the extracting of the files and running checks.
After installation the web page automatically opens up and I was greeted with the following error:
HTTP Error 500.19 - Internal Server Error
In the error description there is talk of a configuration section being locked at parent level.
Screenshot of the error:
What happened is that at the server level Windows Authentication is turned off and that configuration section is locked for the whole machine. The Live Maps Portal tries to read authentication settings from its own configuration file, and because that section is locked at a higher level, IIS throws an error.
How to fix it:
Open IIS Manager
In the left menu select your server name
In the middle of the screen select Configuration Editor
Near the top of the Configuration Editor is a selection box for which section you want to see and edit.
Go to system.webServer/security/authentication/windowsAuthentication
In the right hand menu you will find a link to Unlock Section. Click it to unlock this configuration item.
Now any lower level (Sites or Applications within a site) can have their own configuration for Windows Authentication.
Refresh the error page and the Live Maps Unity Portal comes up fine!
Ran into a customer issue today with a nice clean SCOM 2012 R2 installation with URs applied. Certificates were arranged and momcertimport had been run. On the agent machines in the DMZ we had the agent installed with the UR on it, the root certificate imported, and the certificate meant for the computer imported. momcertimport had been run to select the correct certificate. Yet there was no communication at all between agent and server. This is what I found:
So first checks are:
- does the agent machine have the certificate for the name of the server (which in workgroup can be the short name and in a dmz domain a fully qualified name)? Yes
- does the agent machine trust the CA which issued the certificate? (in this case a customer own CA, so the root chain cert was imported). Yes
- can the agent resolve the SCOM server name you used while configuring the agent? Yes
- Is the management group name we used in configuring the agent correct (case sensitive!)? Yes
- Is there a firewall blocking TCP 5723 from agent to SCOM server? Yes! OK this was fixed quickly, and verified with telnet. Still no communication! Moving on.
- On the SCOM server did we import the CA root chain as trusted and did momcertimport run on the correct machine certificate with the correct FQDN for that server? Yes
- restart healthservice on both sides... Yes. No effect
Man, usually it's name resolving, firewall and routing, a certificate with the wrong name, no certificate, or an untrusted certificate. Pffff.
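The name resolving and firewall checks are quick to script if you have to repeat them on many machines. Below is a minimal Python sketch (I used telnet myself) which simply tries to open a TCP connection to port 5723. The function name can_connect and the timeout value are my own choices for illustration.

```python
import socket


def can_connect(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time.

    A failure covers both "name does not resolve" and "port is blocked",
    since both surface as an OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the agent machine, can_connect("scomservername") should return True once the firewall is open. It says nothing about certificates or trust, only that the port is reachable.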
Something must be wrong with the SCOM server, I'm sure of it.
Next step, let's check whether all our SPNs are correct.
setspn -L scomservername
Hey, wait a second, I see an entry like this:
Now this SCOM server is installed with the setting that the SDK service is running using a domain account. So this SPN should not be registered to the server itself but to the service account in the domain.
setspn -L domain\sdkserviceaccount
Sure enough, the MSOMSdkSvc entry for the mentioned server is not registered on this service account.
Alright, we can not register the correct SPN until we remove the wrong one, so we first delete the wrong ones.
setspn -d MSOMSdkSvc/scomservername scomservername
setspn -d MSOMSdkSvc/scomservername.domain.com scomservername
Next we enter the SPNs on the service account:
setspn -s MSOMSdkSvc/scomservername domain\sdkserviceaccount
setspn -s MSOMSdkSvc/scomservername.domain.com domain\sdkserviceaccount
And we check our results again with the setspn -L command.
Looks fine now.
It must be the certificate somehow.
Open MMC Certificates, check the computer certificate. Is it valid, is it trusted, is it for the right purposes, does it have the correct name... Yes.
Run momcertimport again... only 1 certificate to choose from and it's the same one. Restart the Microsoft Monitoring Agent service afterwards.
Wait a second. Let me check in the registry for this certificate. What Momcertimport does is not that difficult. It grabs two properties of the certificate and creates two registry keys for it for SCOM to use.
Aha! NO registry values!
Looking in this key there must be two entries relating to the certificate:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings
Alright, so I will create them manually!
What you do is open the properties of the certificate. You need the Thumbprint and the SerialNumber.
Create a New -> String Value
Name it: ChannelCertificateHash
Copy and paste the Thumbprint contents into it and remove the spaces in between
Create a New -> Binary Value
Name it: ChannelCertificateSerialNumber
Now go to the properties of the certificate and click the Serial Number. It is again a string of numbers and letters in pairs of 2. What you need to do is fill in the pairs of 2 in the registry Binary value IN REVERSE.
Original serial number in certificate = 68 00 AB CD 69 00 23
What you enter in Binary field = 23 00 69 CD AB 00 68
So the pair of 2 characters stays the same, but the order of the pairs in the total string is reversed.
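To avoid mistakes when typing this over by hand, the two transformations can be sketched in a few lines of Python. This is only an illustration of the string handling described above, with function names I made up; it does not read the certificate or touch the registry.

```python
def thumbprint_to_hash(thumbprint: str) -> str:
    """Remove all whitespace from a thumbprint copied from the certificate."""
    return "".join(thumbprint.split())


def reverse_serial(serial: str) -> str:
    """Reverse the order of the two-character pairs of a serial number.

    Each pair stays intact; only the order of the pairs is reversed.
    """
    compact = "".join(serial.split())
    if len(compact) % 2 != 0:
        raise ValueError("serial number must consist of whole pairs")
    pairs = [compact[i:i + 2] for i in range(0, len(compact), 2)]
    return " ".join(reversed(pairs))
```

For the example above, reverse_serial("68 00 AB CD 69 00 23") gives 23 00 69 CD AB 00 68, which is what goes into the Binary value.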
Next I restarted the SCOM services.
Within the minute it started saying that: A device which is not part of this management group has attempted to access this Health Service.
Those were the DMZ machines which just keep trying again and again!
In the end it will have been the certificate rather than the SPN record which messed it up, but at least I could show which things I checked. When the SPN issue came up I just fixed it as well. In the end it WAS the certificate, even though I felt that it was alright. So when in doubt, and ALL untrusted agents refuse to talk to this machine while all trusted ones have no issue... triple-check the certificate and whether SCOM is actually using it!
Have fun monitoring!
I thought I would take a different approach to thinking about how to make a SCOM monitoring project a success. It is not about technical details or designs this time, but about a way to bring business and IT together into monitoring business related services and being in control of those processes. In a short blog post below I am touching upon some of those items.
In SCOM 2012 R2 we were able to monitor up to 500 Unix/Linux agents per management server or about 100 through a gateway. To be honest I think that was already stretching it, unless the number of workflows was kept to a minimum.
In SCOM 2016 work has been done to be able to scale up to higher numbers for this. Up to twice as much actually IF you use another monitoring method for cross platform monitoring. I will show you what I mean below.
In SCOM 2012 we were using WSMAN Sync APIs to connect to the Linux agents and pull data from them. This is also the default setting for SCOM 2016.
However, if you have a large Linux/Unix deployment that you wish to monitor using SCOM 2016, there is a registry key you can set on the management server which changes the monitoring behavior to use Async MI APIs. MI in this case stands for Windows Management Infrastructure, which is based on CIM standards (the SCOM OMI agent is as well).
In order to get the SCOM management servers to use the new method (and thus scale up more!) you add a registry key to the management server which is monitoring the cross-platform agents.
Create this entry:
HKLM:\Software\Microsoft\Microsoft Operations Manager\3.0\Setup\UseMIAPI
After you do this I suggest you restart the Microsoft Monitoring Agent Service (also called the Healthservice) to be sure this goes into effect. Make sure all your management servers used for this purpose use the same method.
I think that if you are monitoring a significant number of Linux/Unix agents in your environment (hundreds) you should change this setting on your SCOM 2016 management servers.
Back to the SCOM 2016 Features - Overview post!
Happy crossplat monitoring!
This blog post will introduce the new SCOM 2016 feature of Management Pack Tuning. It is meant to use alert data from SCOM to determine where tuning may be beneficial. The screenshots are based on the TP5 release of SCOM 2016 and could be changed in a few months as work continues to be done to several features of SCOM.
We often used to tune alerts and management packs by a few methods. The first method is to import the management packs, sit back, watch the alerts flow in and take them on one at a time.
The second method was by using reporting:
The two Data Volume reports are actually very useful for finding which management packs cause the most data volume (number of performance counter entries collected, number of alerts, number of events...). They also let you drill down to see which workflows are the busy ones. After this you could go into SCOM, find the rules and monitors and tune them to your liking.
There are also reports in the SCC Health Check Reports library created by Oskar Landman and Pete Zerger which we can use for this. It is called SCOM Health Check Reports V3 now and can be found in the Technet Gallery.
A new solution
Now, to make alert tuning easier, the product team has worked on a solution which helps you analyze the alerts, see which machines cause most of them, and tune the workflows directly from there.
Starting with SCOM 2016 TP5 you can go into the SCOM Administration pane, and in the Management Packs folder you will find “Tune Management Packs”.
To the right hand side in the tasks pane you will find "Identify management packs to tune" where you can set a time range for analysis. Otherwise just wait 2 days and things will surface.
Now in the middle we see I currently have one management pack which may need tuning and it has given us 32 alerts in a limited amount of time. So we press the "Tune Alerts" task now!
From here we can see which alert(s) came up during this period. To the right of what is in this screenshot there is also the name of the Rule or Monitor which caused this alert.
Now which possibilities do we have from here? If we right-click we get the following options:
The Copy function will give you the possibility to have a clear text copy of the selected fields so you can put them in a notepad or Excel sheet.
The Overrides option gives you the usual overrides options where you can override the monitor for all objects of this class or a group or single objects.
Of course we can directly open the properties for the monitor right from here.
And lastly there is the option to "View or overrides sources", which will open up a popup where you can see which instances of the targeted class (here Logical Disk) have caused the alerts.
From here we can tune the selected monitor for the specific objects which caused the alerts.
As I said at the start of the article, these are screenshots on TP5 preview and there may be changes to come to the interface and possibilities presented here.
The idea is however very clear, and I like that this will help a lot of SCOM admins move into the tuning of alerts more easily and quickly. Some people know how to do this using available reports, either the default reports or third party report packs, but this new feature opens this up for more regular use by more SCOM admins.
One more remark here: I tried to fool around with another monitor to force it to give lots of alerts and what happens? Another monitor causes alerts and the one I set to very low thresholds never even fired an alert. ha ha ha ha ha ha.
This blog post discusses one of the new features in SCOM 2016, Management Pack Updates and Recommendations. I think this feature was already introduced in the SCOM 2016 TP4 preview version, but I will discuss it now anyway.
All SCOM admins know that we can get management packs from either the Microsoft websites (and of course community and third party pages for their management packs), or we could use the Import Management Packs option and point it to the Catalog.
In there we have the options of looking for specific management packs, or to look for recently released management packs, or look for updates to already installed management packs.
Thing is that it was easy to forget to look for new management pack updates, and also it often happened that SCOM admins forgot to download management packs for new products they did install on servers in their environment (or new versions like a new SQL version).
A new solution
In SCOM 2016 we can see in the Administration pane an entry under Management Packs called Updates and Recommendations:
From here we can select one management pack and download and install that management pack. There is also the possibility to do that with all of them. This will take you to the management pack download interface we were used to already.
As you can see from the above screenshot there are a few management packs where we get an update recommendation, and two management packs which this solution found to be missing, even if you thought you were already monitoring all roles.
What happens really is that this is a mini management pack which runs on all your agents and has very basic discoveries in it. It runs a discovery to see if you have for instance IIS or SQL installed or a number of other roles. These are looking for Microsoft management packs and not custom ones. When it finds certain software/roles installed this feature will check if you have the applicable management pack installed. More discoveries will be added over time for additional software/features/roles.
Also of course there is a pack version comparison done with the catalog to check if you have the latest version of already installed management packs.
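I do not know how the feature does this comparison internally, but the idea of comparing an installed version with a catalog version is easy to sketch. The Python snippet below is only an illustration with made-up function names; it assumes dotted numeric version strings and compares them number by number rather than as text.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '7.0.15.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def needs_update(installed: str, catalog: str) -> bool:
    """Return True if the catalog version is newer than the installed one."""
    return parse_version(catalog) > parse_version(installed)
```

Comparing the numbers instead of the strings matters: as text "7.0.2.0" sorts after "7.0.15.0", while numerically 15 is of course newer than 2.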
Another interesting addition to the tasks pane in that view above is the possibility to go to the management pack guide. This option will take you right to the download of the management pack guide in a web browser.
The second option there is to go to the DLC page. This is the Microsoft download center page where you can find the description of the management pack, its downloads and guides, and installation instructions. Not all management packs have this link enabled, but a lot of them will have.
The last task is called More Information. Now this is also a nice one. It will open a popup and show you which agents are running a workload relating to this management pack recommendation.
In this case it is my freshly installed SCOM TP5 machine needing the SQL 2014 management pack.
This is going to help us manage our management packs and check for updates to currently loaded management packs and also to check for forgotten management packs to get as much monitoring coverage as we can.
Good luck monitoring!
Came across a SCOM 2012 R2 instance whose evaluation period had expired. The license key was not entered in time, so SCOM did not work anymore and the SDK refused connections. Look in the event log and you will see that your evaluation version has expired and that you need to enter your key. The catch is that you have to connect to SCOM through the Shell to activate it, and it refuses the connection at that point.
The trick is to restart the SDK service and quickly enter the production key.
Just open a normal PowerShell in administrator mode on the SCOM server and throw these three commands in there:
restart-service -name omsdk
set-scomlicense -productid XYZXX-XYZXX-XYZXX-XYZXX-XYZXX -confirm:$false
Of course use the real product key in there where the X's are!
Have fun and good luck!
While chatting with some MVP friends of mine about a specific scenario where data from e-mails needed to be read and monitored, it became clear there are multiple ways to do it. I proposed one possibility which I implemented at a customer a while ago and got asked to blog about the solution, so here it is. Because SCOM is not built to natively read from a mailbox, one has to come up with a workaround, and in my case I used System Center Orchestrator to do part of the job.
The situation is as follows. A number of servers are monitored by another company using another monitoring product. That product monitors servers from several of their customers, so we could not access or query it directly, whether through scripts, commands or database queries. In the end the other company would send e-mails from their several monitoring systems to one of our mailboxes, resulting in 3 e-mails every 15 minutes. The e-mails contained an XML formatted body containing a list of servers and their state.
- So, we have to read 3 e-mails from a mailbox every 15 minutes. Pull out the body of the e-mails. Next merge the content to make it 1 XML file placed on a server with a SCOM agent on it. These steps are not native to SCOM, but a combination of Orchestrator and PowerShell.
- After that we can use one of several methods to monitor a text based file on a server to create the monitoring part. For this we can use SCOM.
So let us start with the first part
Using Orchestrator to get our e-mails into an XML file
I bet there are also other methods of doing this, but this was the method I selected, and because Orchestrator has some flexibility and some built-in actions in the integration packs this is very versatile.
Let us check out the email for a second:
We see the XML body there. In this case there are two servers mentioned in the email, however with longer names than how we know them so we need to play around with that too. Also with XML there is a header (first line) and a wrapper (second line start and end of last line), with the two actual content lines in the middle of it. Notice there are carriage returns and also spaces and potential tabs in there, which make it “nice” to filter those out while pulling the XML apart and creating a new XML file from that!
- A destination File share where the final XML file will be placed for being monitored.
- A mailbox where those messages arrive and we can read them from
- We created an automatic rule to place those e-mails in a specific named folder in the mailbox.
- We created a second folder where we can move the already read messages to.
- An account able to read in that mailbox.
- Orchestrator to create a runbook and bring it all together.
- An integration pack (IP) for Orchestrator which can read from a mailbox. I used the “SCORCH Dev - Exchange Email” IP for this which can be found at https://scorch.codeplex.com/
First import the Orchestrator IP needed to read the email and distribute it to the runbook servers as usual. Next start a fresh runbook and name it appropriately and place it in a folder where you can actually find it within Orchestrator. Advice is to use a clear folder structure within Orchestrator to place your runbooks in. This is not for the benefit of Orchestrator, but for yours!
Now we create the runbook. I will put the picture of the finished runbook here first before going through the activities:
Let’s now cut up the pieces:
Well this one simply says to check every 15 minutes
This one takes the current time from the first activity and at the bottom there subtracts 15 minutes from it. The story behind this is that we want to read all emails which came in between now and 15 minutes ago. So this gives us that point in time.
We wanted our monitored xml file to always have a fixed name. So when we are about to create a new version of that file we first go out to that file share and take the current XML file and rename it by adding a date-time format in the name to make it unique. We wanted to be able to look back in history here, else we would have chosen to just delete it. This makes the folder look like this:
Read mail from folder
Now this is a custom activity coming from the Exchange Email IP we imported earlier.
From the top we see we have to define a configuration. We will get back to that in a second. Next you can see that we are looking for Unread emails in a certain folder (keep in mind the folder name must be unique in that mailbox, else it may pick the other folder, which is not what you want). Now on the left hand side we see Filters:
We also want those emails to have a certain subject line. And we want those emails to be received after the time from the Format Date/Time activity above. Meaning the email was received after 15 minutes ago. So in the last 15 minutes.
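As a rough illustration of these two filters outside Orchestrator, here is a small Python sketch. The message dictionaries with "subject" and "received" keys are hypothetical stand-ins of my own; the real activity works on the mailbox through the Exchange Email IP.

```python
from datetime import datetime, timedelta


def select_messages(messages, subject, now, minutes=15):
    """Return the messages with the given subject received in the last `minutes`.

    `messages` is a list of dicts with "subject" (str) and "received" (datetime).
    """
    # Same idea as the Format Date/Time activity: now minus 15 minutes.
    cutoff = now - timedelta(minutes=minutes)
    return [
        message for message in messages
        if message["subject"] == subject and message["received"] > cutoff
    ]
```

With three mails arriving per cycle this should normally return three messages, which is what the rest of the runbook counts on.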
Now to get back to the Configuration part. Many IPs in Orchestrator have a place where you can centrally set some parameters, for instance a login account, a server connection, and so on. This can be found on the top menu bar of the Orchestrator Runbook Designer under the Options menu. Find the item with the same name as the IP you are trying to configure. In this case it needs us to set up a connection to an email server. Type is Exchange Server, then enter a username, password, domain, and a ServiceURL. For an Exchange server this could be https://webmail.domain.com/EWS/Exchange.asmx for example, but check this for your own environment.
Retry Read mail from folder
This one will only run if the first Read mail from folder activity fails. You can set properties on the connecting arrows between the activities to make it go here if the first one fails. I made the line color red and set a delay on the line of 20 seconds. Otherwise it will follow the other line and go to the script. This activity does exactly the same as the previous one. We had some time-outs during certain times so this extra loop slipped in there.
So those Read mail from folder activities should contain 3 e-mails received in the last 15 minutes from that folder, unread, with a subject line, and Orchestrator now knows what the body of those emails contains. This also means that the next activity (the script) will run three times.
Run .net script
At the top we define this to be a PowerShell script. So first we pull in the variable, which is the body of the email from the previous step. Next thing we do in the script is remove all excess stuff that we do not need: empty spaces before and after several lines and entries, and also the header and the surrounding wrapper entries. We can add those ourselves to a clean file, right? So this should give us a new string which only contains the XML entries for those servers with their state.
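The actual script in the runbook is PowerShell. Purely to illustrate the cleanup step, here is a short Python sketch of the same idea. The wrapper element name Servers and the function name are assumptions of mine, since the real tag names depend on what the other monitoring product sends.

```python
import re


def extract_entries(body: str, wrapper: str = "Servers") -> str:
    """Strip the XML header, the wrapper tags and stray whitespace from a mail body.

    Returns only the content lines, one entry per line.
    Assumes the wrapper tags carry no attributes.
    """
    # Remove the XML declaration (the first line of the body).
    text = re.sub(r"<\?xml.*?\?>", "", body)
    # Remove the opening and closing wrapper tag, wherever they sit on a line.
    text = re.sub(rf"</?{re.escape(wrapper)}>", "", text)
    # Drop leading/trailing spaces, tabs and carriage returns, and empty lines.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

What comes out is a clean block of server entries which can be written into a new file with our own header and wrapper around it.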
Next thing we needed to do is build in some tricks into this script. We know it is going to run three times and we need to stitch the contents together into one file.
Line of thought:
If there is no xml file there to write to this means this is the first time we run the script after the old file got renamed. So we need to create the xml file right now and add the headers to it. Next we add the body to it (server names with state).
If there is a file there with the correct name it means we are either in the second or third run. So what we do is simply write down the body (servers and state) and add the trailing end tag to it. This can be done on the second and third run. However, if this happens to be the third run, we will first check if that trailing tag is there and remove it. And next dump the body again and add the end tag.
So that part takes care of dumping the contents into the file following the above thought process (with the first thought coming at the end as the Else statement). Sorry for the Dutch comments, but you get the idea.
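For those who would rather read the logic than a screenshot, here is a simplified Python sketch of the same stitching idea. It is not the original PowerShell: the header, the wrapper element Servers and the function name are assumptions, and unlike the original it also writes the end tag on the first run, so the file is well-formed after every run.

```python
from pathlib import Path

XML_HEADER = '<?xml version="1.0" encoding="utf-8"?>'
ROOT_OPEN = "<Servers>"    # hypothetical wrapper element
ROOT_CLOSE = "</Servers>"


def append_fragment(path: Path, body: str) -> None:
    """Add the entries in `body` to the XML file, creating it when needed."""
    if path.exists():
        # Second or third run: remove the end tag written by the previous run.
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines and lines[-1].strip() == ROOT_CLOSE:
            lines.pop()
    else:
        # First run after the old file got renamed: start with the headers.
        lines = [XML_HEADER, ROOT_OPEN]
    lines.extend(line for line in body.splitlines() if line.strip())
    lines.append(ROOT_CLOSE)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling append_fragment three times on the same path, once per e-mail, leaves one file with the header, the opening tag, all entries and a single end tag.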
Next we take the e-mails found by the Read mail from folder activity and move them to the other folder in the mailbox.
So, that is the whole runbook to get a few emails and merge them together so we can monitor the thing!
There is a separate runbook which cleans old files from that file share and which cleans old emails from that folder in the mailbox by the way. At least we can look a few days back what happened.
The monitoring part in SCOM
Now I am not going into all the details of this part. I had a reason to not link these entries directly to the monitored servers, or to write the xml file to those servers. I opted to create a watcher node (and its discovery from a registry entry on that machine). That watcher node is the server with that file share and the xml file on it.
Next I created watchers in a class, and discovered them through registry as well. Containing the names of the servers we wanted to check for in the XML.
For each watcher it runs a PowerShell monitor which goes into the XML file and finds its corresponding entry (server name). Next it picks up the State (which is a number) and we translate the 12 possible numbers into green/yellow/red type entries and place them into the property bag. That gets evaluated into the three states we know so well.
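The monitor itself is PowerShell and returns a property bag, but the lookup and translation can be illustrated with a short Python sketch. Everything specific in it is an assumption for the example: the Server element with Name and State attributes, the numbering 1 to 12, and which numbers count as green, yellow or red. It also assumes that the "longer" names in the file are our short names followed by a domain suffix.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of the 12 state numbers to the three SCOM health states.
GREEN_STATES = {1, 2, 3, 4}
YELLOW_STATES = {5, 6, 7, 8}


def health_for(xml_text: str, server_name: str) -> str:
    """Return "green", "yellow" or "red" for the given server.

    Raises LookupError when the server is not listed in the XML.
    """
    root = ET.fromstring(xml_text)
    for server in root.iter("Server"):
        # Compare on the short name, ignoring case and any domain suffix.
        short_name = server.get("Name", "").split(".")[0].lower()
        if short_name == server_name.lower():
            state = int(server.get("State"))
            if state in GREEN_STATES:
                return "green"
            if state in YELLOW_STATES:
                return "yellow"
            return "red"
    raise LookupError(f"{server_name} not found in the XML file")
```

Raising an error for a missing server leaves it to the caller to decide whether a missing entry should turn the watcher red or be ignored.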
Next we could throw those watcher entries for each server and also some other entries onto a dashboard. We could see the state the other party saw from their monitoring system and the state we see from SCOM side on one dashboard for those servers and monitored entries. We have the hardware/OS layer with a few extras, and they have an OS layer and application layers which we could not pick up.
As you can see, sometimes we run into situations where there is no other way to get monitoring data than through workarounds and the long way. This is not ideal. As you can understand there are dependencies left and right for this whole chain to work. If there is no other way then that is the way it has to be. Direct monitoring or direct connecting is preferred.
But this shows how you can get monitoring data from e-mails into SCOM, in this case through the use of Orchestrator and watchers because that was what we needed.
Shout-out to amongst others Cameron Fuller for making me write this post!
Bridgeways has been working very hard this last year on getting back up to speed in creating management packs, updating existing ones, making them more intuitive and useful to work with, adding support staff and also hiring a very competent CTO (a good friend of mine and fellow MVP Simon Skinner) who takes management pack quality very seriously. It is exciting to see the progress being made.
Today the new version of their VMWare monitoring management pack was released. It contains updated views and dashboards. The new style dashboards also were augmented with a few core dashboards:
- Host Performance: includes CPU Usage, Memory Usage, Swap Memory Usage, Balloon Usage, Network Usage and Storage Usage data.
- CPU Performance: includes CPU Usage, Average CPU System Time, Average CPU Ready and CPU Wait data.
- Memory Performance: includes Active Memory, Balloon Memory, Shared Memory and Swapped Memory data.
Also there is an expanded set of reports. The combination gives better and quicker insights into the VMWare environment in order to troubleshoot issues and to proactively find upcoming issues.
Now as some of you probably know I have always been a big fan of a competing vendor's management pack for VMWare (Veeam) and I still am. However it is good that a few other vendors have been looking seriously at creating a good management pack which can cover this monitoring scenario. I know a few have been working on this and are becoming serious contenders when it comes to product selection for this purpose. The Bridgeways pack will now be one of them in every selection process. Especially when there is a price difference between solutions, our customers (and you) will be looking at price/quality points and whether it covers what you are initially looking for. I know all my customers always look at price very closely and I do not think it's a purely Dutch thing to do that.
I will be examining functionality and pros and cons of these packs closely in the future.
So for now congratulations to Bridgeways for taking the step forward and we are watching you closely.
For any questions regarding this article or any management packs feel free to contact me.