SCOM 31552 Data Warehouse configuration synchronization process failed to write data

SCOM, System Center, SCOM Tricks, SCOM 2012

Today I investigated a case where SCOM had an alert with the following name and contents:

Data Warehouse configuration synchronization process failed to write data

Data Warehouse configuration synchronization process failed to write data to the Data Warehouse database. Failed to save data to the data warehouse.
Exception SqlException: Sql execution failed. Error 2627, Level 14, State 1, Procedure ManagementPackInstall, Line 2879, Message: Violation of UNIQUE KEY constraint 'UN_ManagementGroupManagementPackVersion_ManagementGroupRowIdManagementPackVersionRowId'. Cannot insert duplicate key in object 'dbo.ManagementGroupManagementPackVersion'. The duplicate key value is (1, 2020, Jun 18 2014 3:15PM).

There are entries in the Operations Manager event log with event ID 31552 and the same kind of content.

Now I want to give a big shout out to the guys in the SCOM Support team, who write Support Tip entries on their blog for common issues and solutions. I found their Support Tip quickly, and it is very clear about how this probably happened, what not to do next time, and how to solve the issue at hand.

Support Tip: Data Warehouse synchronization failures following restore of the OperationsManager DB

And yes, in my case we had decided during some earlier problems not to restore the DW database, because it was huge and so on :> And yes, that probably was the cause. It happened during an upgrade of SCOM: the upgrade wizard failed, killed the management server and touched the operational database, so we decided to restore only the OpsDB, because surely they wouldn't have touched the data warehouse database yet, right? Wrong, they do :roll: So lesson learned for sure: when restoring one database, restore the other one to the same point in time as well.

So I took the SQL script provided, pasted the correct key value pair into it, and ran it. Sure enough, it was the Notifications Internal Library. I exported the pack, increased the version number and imported it again. A few minutes later the 31554 event popped up in the event log.
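In case it helps, this is roughly what the lookup boils down to. The first query uses only the table and key columns named in the error text; the second is a sketch of how to map that row to a pack name, where the dbo.ManagementPackVersion and dbo.ManagementPack table and column names are my assumption about the warehouse schema, so verify them against the script in the Support Tip first.

USE OperationsManagerDW;
GO
-- The duplicate key from the alert: ManagementGroupRowId = 1, ManagementPackVersionRowId = 2020
SELECT *
FROM dbo.ManagementGroupManagementPackVersion
WHERE ManagementGroupRowId = 1
  AND ManagementPackVersionRowId = 2020;

-- Assumed lookup of the management pack behind that version row (verify table/column names first)
SELECT mp.ManagementPackSystemName, mpv.*
FROM dbo.ManagementPackVersion AS mpv
JOIN dbo.ManagementPack AS mp
  ON mp.ManagementPackRowId = mpv.ManagementPackRowId
WHERE mpv.ManagementPackVersionRowId = 2020;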

Thanks again to the SCOM Engineering Blog and the escalation engineers behind it for publishing these kinds of support tips.

Bob Cornelissen

Microsoft Ignite

SCOM, SQL, Hyper-V, Exchange, System Center, Active Directory, Windows 2012

Microsoft just announced the wave of main events for next year, among them the long-expected event that brings several tech conferences (TechEd, the Management Summit and the Exchange/Lync/SharePoint/Project conferences) together into one big one. Well, here it is, and it is called Microsoft Ignite, scheduled for May 4 to May 8 in Chicago. Read more about it on this page:

http://blogs.microsoft.com/blog/2014/10/16/introducing-microsoft-ignite-lineup-top-conferences-2015/

This page also lists some of the other conferences in the year like Convergence, Build and WPC.

Enjoy!
Bob Cornelissen

Case of the fast growing SCOM datawarehouse db and logs

SCOM, SQL, System Center, SCOM Tricks, SCOM 2012

This post continues my previous post about upgrading SCOM 2012 SP1 to SCOM 2012 R2, which went wrong, and how I fixed it. After fixing the SCOM instance everything seemed alright for half a day. The very next morning, however, we saw something was wrong with the SCOM data warehouse database. Below are some of my lessons learned and some SQL stuff B) It is all a long story, sorry. I put it all down because it was a learning experience and it just happened to unfold this way. Also, there are no screenshots available anymore, so it will have to be a load of text.

So what happened?

The first message we got was that the log file belonging to the data warehouse database had grown to the size of the disk, and both the disk and the log file were full. The second message was that the database was currently in recovery mode.

Investigate

So I went into SQL Management Studio on the machine, and right there it said that the database was in recovery mode. Nothing else could be done with it: no reports to be run (I like the Disk Space report in SQL), no database properties to be opened. Looking at the Event Viewer Application log I saw event 9002, "The transaction log for database 'OperationsManagerDW' is full due to 'ACTIVE_TRANSACTION'.", and event 3619, "Could not write a checkpoint record in database OperationsManagerDW because the log is out of space". So next on the agenda was to increase the size of the disk the log file was sitting on by 5 GB.
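With hindsight, two quick server-level checks show both how full each transaction log is and why it cannot be truncated (the 9002 event already names ACTIVE_TRANSACTION as the reason for OperationsManagerDW). Just a small sketch to run from a query window:

DBCC SQLPERF(LOGSPACE);

SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'OperationsManagerDW';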

Log File

This prompted the log file to start growing again in its set increments, and it filled the added space right up. Hmmm :( Alright, let's do that again with 10 GB more space. I even found an old 5 GB file sitting on the disk which didn't belong there, so it had 15 GB to work with.

Log file growing again. Database was out of recovery mode I thought. Great. But wait. Log file growing still. End of disk. Recovery mode turned on again! Ahhhhh.

Database in Recovery mode

When the database is recovering it will log Application log entry 3450 with a description like this:
"Recovery of database 'OperationsManagerDW' (8) is 0% complete (approximately 71975 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required."

It needs to get through all three phases of the recovery, and in my case the 20-hour estimate ended up being close enough. So the best thing was to let this recovery job finish and then clear my log file! I gave the disk 150 GB more space to work with. Recovery was underway for the next 20 hours, meanwhile the log kept growing nicely and I could not touch the data warehouse database in any way. So let's wait.
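One way to keep an eye on the recovery from the server side, without touching the database itself, is to watch it in sys.dm_exec_requests; crash recovery usually shows up there as a DB STARTUP command (an assumption on my part about how it appears on your instance), which gives a rough progress indicator:

-- Rough recovery progress; estimated_completion_time is in milliseconds
SELECT session_id, command, percent_complete,
       estimated_completion_time / 1000 / 60 AS estimated_minutes_left
FROM sys.dm_exec_requests
WHERE command LIKE 'DB STARTUP%';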

Recovery model

Now, the thing you need to know about log files in SQL databases is that there are three kinds of recovery model for a database: Full, Simple or Bulk-logged. Open up SQL Management Studio, go to a database, right-click Properties, go to the Options tab, and on the second line it states the Recovery Model. When it is set to Full, the log file fills with all transactions happening on the database, and they stay there until you make a backup of the database. The backup flushes the log file at the end and you have an empty log file again. If it is set to Simple, the basic idea is that a transaction happens on the database, it is written to the database, and at the end of the transaction the log file is cleared. This is the simple explanation mind you; it is a bit more complicated than that, as I discovered. I will come back to that later.
The SCOM databases are always set to Simple mode. But because the log file was growing so fast, I was starting to believe something had gone wrong with that. Right-click Properties on the database. Error. Sigh. Now it started to become clear to me that I am a clicking kind of guy. I am used to clicking my way through options and settings in most applications, and now I needed some SQL queries.

Asking for advice!

At this point I called a friend of mine who knows all about SQL, Jacky van Hogen, for some advice on how to find out what was happening and for some guidance on how to proceed once the recovery finished. I think I actually waited for the recovery to finish first. At that point I still could not open the properties of the database, but as we found out, we could simply run queries against it :D Thank you so much for handing me some queries :idea: :>> :!:

Running checks

First a check on the recovery mode and on the status of the databases:

select * from sys.databases

This gives you a list of the databases; one of the columns tells you whether each one is online and another one tells you the recovery model. The database was now online and the recovery model was Simple.
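A slightly narrower version of the same check, pulling just the columns that matter here:

SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name = 'OperationsManagerDW';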

Next we want to see what is going on. So I directed this query against the OperationsManagerDW database:

select * from sys.dm_exec_requests

Check the sessions with an ID above 50. You will find your own query requesting this data as well, under your own session ID. I did not know this, but the title bar of each query window in SQL Management Studio shows that window's session ID between brackets. Nice, so ignore that session and focus on the rest with numbers above 50.
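For example, a filtered version that skips the system sessions and your own query window looks like this:

SELECT session_id, status, command, wait_type, blocking_session_id
FROM sys.dm_exec_requests
WHERE session_id > 50
  AND session_id <> @@SPID;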

Sure enough, there were a few of those. A few were waiting on session 117, which in my case was running an INSERT statement. Over the hours that followed I saw it go from Running to Suspended to Runnable and back to Running all the time, and meanwhile the log file was still growing.

Now let's look at the oldest open transaction:

dbcc opentran()

Sure enough, there was session ID 117, and it had been running for a number of hours by that time. Actually, after the recovery was successful I had restarted the SQL services, hoping that would make it stop its transactions and flush the log file. It doesn't quite work that easily :-/ But at least we could see what was happening.

A few things I learned during a short discussion with Jacky were:

- The oldest open transaction needs to finish its work first; then it will clean the log file. Any jobs running alongside also write to the log file, and those entries will not be cleaned out either, because the currently running transaction might still need that data in case we have to stop it and it does a rollback. Aha, so there is slightly more to the Simple recovery model than I thought.
- We can kill the transaction simply with the command Kill 117. It will roll back all of its actions and in the end clear the log file. Or it should. That takes a while for something that had filled up over 200 GB of log space by now. However, chances are the job will just start again from the beginning and take the same amount of log space again, or more.
- The best thing in this case would be to give it the space it needs and let it finish its work. After that, shrink the log file and clean up.

So we decided to give it some time and meanwhile keep an eye on it.

Checking the log file

She asked me to check out the log file contents.

dbcc loginfo

Now I might be saying this wrong or using the wrong terms, but this basically shows you the virtual log files within the log file (or log files) of that database. I think of them as the pages in a book. You can have one or more log files (books) for a database, and each log file has a number of pages to fill up with data. Normally the log file is written sequentially (like a book, from beginning to end), but when bits and pieces get cleared out it can happen that most of the log file is empty while some pages at the end are still in use. This is usually the reason why sometimes you cannot shrink a log file on the first try: it cleans out pages but finds one at the end it cannot clean, so it cannot make the file smaller. Repeating it a few times makes the last write wrap around to the front of the log file again, the end gets cleaned up, and we can shrink the log file. By the way, we can write a checkpoint in the log by simply issuing the Checkpoint command in a query.
Well, in my case we were first looking at whether the pages were all in use. Check the Status column: if it says 0 the virtual log file is empty, and if it says 2 it is full of data. In my case most of the file was full. So there was not much to gain by trying to move data around inside the log file and shrinking it, because the transaction clearly had it all in use.

Also we found there were way too many virtual log files (pages in my example) in the log file. Probably caused by the many auto-grow events. An interesting article forwarded to me by Jacky is http://blogs.msdn.com/b/saponsqlserver/archive/2012/02/22/too-many-virtual-log-files-vlfs-can-cause-slow-database-recovery.aspx

Watching the transaction do its work

It was also interesting to see how transaction 117 went through the whole Running - Suspended - Runnable - Running cycle of status changes while running the "select * from sys.dm_exec_requests" command. This was due to, among other things, the autogrow of the log file each time: it waits for the file extension to be created, waiting on the disk (that is while it shows as Suspended), then moves to the Runnable status, waits for a free thread to get processor time, and jumps back to Running. Again, this is the short and simple way of saying it, I guess.

Jacky also sent me a query to check whether this transaction really was the one using so much of the log space:

select * from sys.dm_exec_requests r
join
sys.dm_tran_database_transactions t
on t.transaction_id = r.transaction_id
and t.database_id = r.database_id

And check the field database_transaction_log_bytes_used.
Sure enough, it was session 117 using a few hundred GB of log file space.
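A narrowed-down version of the same join, sorted so the biggest log consumer comes out on top, would look something like this:

SELECT r.session_id,
       t.database_transaction_log_bytes_used / 1024 / 1024 AS log_mb_used
FROM sys.dm_exec_requests AS r
JOIN sys.dm_tran_database_transactions AS t
  ON t.transaction_id = r.transaction_id
 AND t.database_id = r.database_id
ORDER BY t.database_transaction_log_bytes_used DESC;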

Creating additional log files

Another thing which worried me was whether I could keep expanding the log disk like that. There will come an end to the storage LUN at some point, right? So the alternative would be to create additional log files for this database on other disks. Go to the database, right-click to open the properties and add a log file, right? Wrong, I still could not open the properties of the database at this point, so I had to use T-SQL for it again. I had done this once before for another customer. http://technet.microsoft.com/en-us/library/bb522469(v=sql.105).aspx
An example perhaps:

USE master;
GO
ALTER DATABASE OperationsManagerDW
ADD LOG FILE
(
Name = OpsDW,
FILENAME = 'D:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\opsdwlog2.ldf',
SIZE = 100MB,
MAXSIZE = 10000MB,
FILEGROWTH = 100MB
);
GO

And yes, in hindsight I should have ignored the autogrow setting and just made it a fixed size; it would be a temporary file anyway. In the end I was able to keep adding space to the disk where the big log file resided.

Give up your secret mister transaction

All of this was really bugging me and I was trying to figure things out as they came along. So I went out and tried to find out more about the query which was running. Our elusive number 117. What are you doing, mister 117?

I found a query somewhere on the internet. Sorry, I did not record where I found it. It is an extension of the command I used before to check what it was doing. I will paste it below:

USE master
GO
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT
er.session_Id AS [Spid]
, sp.ecid
, er.start_time
, DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]
, sp.nt_username
, er.status
, er.wait_type
, SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,
((CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS [Individual Query]
, qt.text AS [Parent Query]
, sp.program_name
, sp.Hostname
, sp.nt_domain

FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50
AND session_Id NOT IN (@@SPID)
ORDER BY session_Id, ecid

Alright, so this one uses the same kinds of commands, filters out only the sessions above 50 and shows some info on both the individual query that is running and the parent query. In both of those I saw Alert Staging.
Now let's get back to SCOM again, because that sounds familiar!

Alert Staging

The basic way SCOM works with this stuff is that the agents send data to the management server, and the management server inserts that data into the data warehouse database. There it first lands in a number of staging tables: there are staging tables for alerts, performance, events and state. Inside the SCOM workflows there are rules belonging to SCOM itself which kick off stored procedures in the data warehouse. These process and aggregate the data from the staging tables, put it into the historical tables, and then clean the staging area up again.
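If you want to see those staging tables and roughly how much is sitting in them without scanning them row by row, a metadata sketch like this works (it assumes the staging table names all end in Stage, which is true for the four tables mentioned above):

SELECT s.name + '.' + t.name AS StagingTable,
       SUM(p.rows) AS ApproxRows   -- approximate row count from metadata, no table scan needed
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE t.name LIKE '%Stage'
GROUP BY s.name, t.name;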

So what this seemed to be telling me is that one of the management servers had kicked off a rule, which kicked off the stored procedure to handle the alerts in the alert staging table. By the way, I could also see which of the management servers this was. But these kinds of jobs run often and normally only have a few rows of data to work through! Let's have a look.

SELECT count(*) from Alert.AlertStage

Uhmmm, 300 million rows ?!?!?!

Now that's an alert storm of some kind, probably generated programmatically, as there is no way we had that many real alerts in such a short amount of time. I now knew this transaction would never get through that amount of data, and what would be the point? These are not normal alerts and there is no value in retaining them. Somebody wrote a nice blog post about another issue, where the data from the alert staging table was not written to the normal tables: http://bsuresh1.wordpress.com/2014/03/18/alert-data-is-not-being-inserted-into-scom-data-warehouse/. In my case I was not interested in the alerts, so I did not go for temporarily moving them to another table and running through them manually with that stored procedure.

Cleaning

I opted to clean out the table.

TRUNCATE TABLE Alert.AlertStage

Sit back and wait for 300 million rows to be removed. I had hoped that once transaction 117 realized there was no more data to process, it would finish, clean up after itself and thus clear the log file. Guess I was not that lucky, because 2 hours later it was still running. So I was done with this and killed session 117 (Kill 117). Of course this caused a rollback, and a few hours later...

Next run a check on the four staging tables:
SELECT count(*) from Alert.AlertStage
SELECT count(*) from Event.EventStage
SELECT count(*) from Perf.PerformanceStage
SELECT count(*) from State.StateStage

All normal with low numbers.

It also cleaned up a few hundred GB of space in the database itself :roll:
And from this point on I could finally open the properties of the database too!

So I ran the task of shrinking the file. Right-click the database - Tasks - Shrink - Files.
Make sure you select the log file of course! It shows the amount of available empty space inside (which was a lot). So I went ahead and shrunk it down. If it doesn't work, open a query window, type the command "checkpoint" and try it again. This can happen when something is still writing around the end of the file. As soon as it wraps and starts writing at the start of the file, the end will be clear and shrinking will work.
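For the query-window equivalent of that GUI task, a sketch like this works; the logical log file name below is a placeholder, so look up the real name first with the sys.database_files query:

USE OperationsManagerDW;
GO
SELECT name, type_desc, size / 128 AS size_mb
FROM sys.database_files;                             -- find the logical name of the log file

CHECKPOINT;                                          -- helps when the active part of the log sits at the end of the file
DBCC SHRINKFILE (N'OperationsManagerDW_log', 10240); -- placeholder logical name; target size in MB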

Concluding

So what was the cause of all this mess? I can't say for sure, but we saw it happening within hours after the botched update of SCOM from 2012 SP1 to 2012 R2 and UR3. And yes, I did run those SQL scripts belonging to UR3 as well. Is it possible that the upgrade wizard which killed my first management server and the operational database also touched the data warehouse before it even reached that step in the wizard? I do not know. Next time, just play it safe and restore not only the operational database but also the data warehouse database, even if it is over a TB in size. All the stuff that happened in the story described above took a load of time as well.
But it was a nice learning experience on some points as well.

Thanks

Just happy that the whole system is running well again.
A big thanks to Jacky van Hogen for her advice on the SQL pieces over the phone on her lunch break! Just a few minutes of good advice and some pointers in the right direction from an expert in her field were such a big time saver and took away a lot of stress.

It also sparked an idea, which I will get back to later.

Have fun!
Bob Cornelissen

SCOM 2012 SP1 to R2 failed and how to fix failed management servers

SCOM, System Center, SCOM Tricks, SCOM 2012

A few weeks ago I did an upgrade of SCOM 2012 SP1 UR6 to SCOM 2012 R2 (and later to UR3) for a customer. Of course they have a test environment with SCOM in it and I went through the whole process and everything looked fine there. Next I was allowed to do it on the production environment. This has 6 management servers and some bits and pieces in it.
The story below contains a recovery command for a dead management server when you still do have the SCOM database, so if that is your case just read on through as well.

This post does not contain pictures, as I can't bring those back anymore. It also does not contain any sound, which is good because some of the things I had to say about the upgrade that day were not that positive. By the way, the most important thing about the day was to remain calm. Whatever happens during the upgrade or when you see stuff breaking, keep calm and take it step by step. I managed to fix it in the end, have a good upgrade and bring dead SCOM servers back to life. So can you.

First of all read the following story for the procedure to upgrade from 2012 SP1 to 2012 R2:
http://technet.microsoft.com/en-us/library/dn249704.aspx

This is the page where you do some checks and run a script and such things and the rest of the steps for the upgrade are listed in the other pages on the left-hand side of that page.

Now, the step you will want to do is to back up your databases! Please do this. I did, and I was happy for it, as you will soon find out. Keep in mind to back up all databases involved!

The first lesson learned here is that you need to install prerequisite software which changed version between SCOM 2012 SP1 and 2012 R2. This is ReportViewer 2012 (instead of 2010), which in turn has a prerequisite on the SQL CLR Types. Go to this page and find the prerequisites for the Operations Console; it lists the ReportViewer redistributable and, in a box below it, the CLR Types prerequisite:
http://technet.microsoft.com/en-us/library/dn249696.aspx

First run the CLR Types prerequisite installer and, if it asks for a reboot, please do so! Next install ReportViewer 2012. The SCOM setup wizard will check for the prerequisite software and for any pending reboots :idea:

One of the lessons learned much earlier with these SCOM upgrades is that you run the SCOM setup on only ONE management server at a time. Do not start up and run through the first half of the wizard on each box because you think you can win 30 seconds. It WILL break your SCOM if you do that (the very short explanation: the first thing the wizard does is check whether the SCOM database has been upgraded, and if more than one server concludes it has not and runs the upgrade anyway, you break things). So do not touch the SCOM setup on anything except one management server at first.

So, on a management server, with the correct rights (all of them...), you start the setup from the SCOM 2012 R2 install media. The setup should quickly see that you are doing an upgrade. If you do not see that, something is wrong; if you do see it, that's fine. Walk through the wizard and enter the service accounts to be used. Pasting passwords into these boxes can be troublesome; sometimes you need to copy a second time, and paste into Notepad or the search box on the machine, to confirm you are pasting the right string. Just saying, I ran into this a few times and wondered why it would not validate the accounts I typed.

If everything has gone right, you should now be ready to run the upgrade wizard, and it should upgrade your management server + databases + management packs + local console and, if you have more functions installed on that machine, those as well.

In my case the wizard ran for a few minutes and next gave me an error on the second step (the database upgrade phase). So click OK and try again? Well...

The problem with this upgrade wizard is that if it runs into a problem, it will NOT roll back anything it has already done.
The first step looks like a preparation step in this wizard, but what it actually does is remove the bits of your management server! Then it goes on to the second step, and the first thing that does is touch the database and mark it as being in upgrade mode.
So... my management server was dead and gone!

I thought, well, perhaps it is just that management server, I have a few more. Let's try the second one and I will fix the first MS later. Nice try, but the upgrade wizard immediately sees in the database that an upgrade is already going on, so it cancels out.
I searched and found the log file of the upgrade wizard, but found no real clue about what had gone wrong :-/

The next thing I did was restore the SCOM database to the state right before the upgrade of the first machine started. That way the database does not know any upgrade business has been going on, and it thinks the first management server should be there and working (which it wasn't). In hindsight I should have restored the data warehouse database as well, so if you run into something like this, restore all components!

So now I started the upgrade on the second management server, acting as if nothing had happened. I ran through the upgrade wizard and pressed the Upgrade button. First step done, database step... taking a long while... and I could see it doing things like importing management packs for a long time. This is expected; the management server which does the database upgrade takes a while to go through all its steps. The rest of them later only need to upgrade their executables and such, so they will be much faster! And then I saw it step through the whole wizard until the end. Upgrade done! Yeah!!! B)

So next up... 4 more living management servers to go. I upgraded them one by one as well now. Only the last one also failed somewhere on the second step and got killed, because there is no rollback.

Alright, so two management servers needed to be restored. But first, quickly, let's do the web/report server. Start the upgrade wizard... Error :> This time it had checked that the management server it was pointing to had not been upgraded yet (yes, you guessed it, it was pointing to the first dead server). So to keep things moving I went on to fixing the two dead management servers first. I will write that down below in a minute.

I can tell you that the rest of the components upgraded smoothly after I fixed those two boxes.

The funniest thing, of course, was that at the end I also upgraded the Linux/Unix agents. You need to know that between the 2012 SP1 version and the 2012 R2 version the cross-platform agent was completely rebuilt from the ground up; it no longer uses OpenPegasus, but OMI instead. To my surprise this was a matter of select all, upgrade agent, yes use the default account (that one had the rights in my case) and go go go. Two minutes later all the Linux SCOM agents were upgraded and functioning without any error. I guess I will have to hear from the Linux admins for some time that of course that part of the upgrade went fine and fast.

Repairing the management servers

So what can we do for a dead SCOM management server when we know the database is still working fine and it still thinks that management server is there? In that case we can run the SCOM setup using a recovery option.

First thing to check:
Add/Remove Programs. Yes, SCOM is not there anymore. Next, in case you have Savision, I would say remove the console extension parts; we will install those again after the SCOM console is reinstalled. Remember also that in my case the server died when it was a SCOM 2012 SP1 box, and in 2012 R2 the file paths change.

In this case I used the SCOM 2012 R2 setup media directly. I know the box died when it was 2012 SP1, but by now all other management servers and the database had already been upgraded to R2. So make sure you can access the SCOM setup media, in my case the 2012 R2 version.

I used the command line to run this recovery. As you will see, it is largely the same as a clean install, except that it has the /recover switch in the command.

By the way, open up Task Manager so you can see setup.exe running, and the second installation process as well. It takes a few minutes, then they disappear and the installation should be finished. Of course we assume all the prerequisite software mentioned before was already installed.

The recovery command below will put the management server component back. It will not install the SCOM console; you can install that separately through the normal setup wizard after this is done. Then the Savision console extensions, in my case. And the Update Rollup stuff comes well after this story, of course.

Here is the command I used, with the accounts and passwords changed. Due to the way this blog displays it, I had to enter line breaks to show it correctly. Keep in mind that this is ONE command on ONE line. When you copy and paste it, please first paste it into Notepad and make sure it becomes one line again with only a space between the parameter switches!!


setup.exe /silent /AcceptEndUserLicenseAgreement /recover /EnableErrorReporting:Always
/SendCEIPReports:1 /UseMicrosoftUpdate:1 /DatabaseName:OperationsManager
/SQLServerInstance:THESQLSERVER1 /DWDatabaseName:OperationsManagerDW
/DWSQLServerInstance:THESQLSERVER1 /DASAccountUser:CONTOSO\scomsdk
/DASAccountPassword:TooDiff1cult /DataReaderUser:CONTOSO\scomdra
/DataReaderPassword:TooDiff1cult /DataWriterUser:CONTOSO\scomdwa
/DataWriterPassword:TooDiff1cult /ActionAccountUser:CONTOSO\scommsa
/ActionAccountPassword:TooDiff1cult

So change the values according to your environment and use it.

Just to continue the story of the upgrade process for this specific environment: after fixing all servers and upgrading the web server as well, we continued with the Update Rollup 3 upgrade, including all its steps (keep in mind there is also a step in there with SQL scripts to be run against the databases). Also the UNIX/Linux agent upgrades were downloaded and run, and management packs imported. Give all of this time, as there is a LOT to synchronize.

Upgrade all the agents. Windows agents, cross platform agents and a number of left-over consoles.

So was this it? ;)
Well, no. :> 88|

First of all we ran into the changed code signing certificate for the web console components when run from desktops with users without rights (see http://www.bictt.com/blogs/bictt.php/2014/10/02/scom-2012-web-console-configuration ).

The second thing is that the day after, we discovered that the SCOM data warehouse database was going nuts! I will write very soon about what appeared to have happened, how to diagnose it, what to look for and how we eventually fixed it. One of the coming days.

Good luck!
Bob Cornelissen

SCOM 2012 Web Console Configuration Required

SCOM, System Center, SCOM Tricks, SCOM 2012

Often after updates of SCOM 2012 / 2012 R2 or after a clean install people go to the SCOM Web Console and they get the following notification:

Web Console Configuration Required.

A user can click the Configure button, run the executable and refresh the browser window, and you should see a web console login page next.

Strange thing is that I would have sworn that I wrote a blog post about this last year, but I really can not find it online.

There are two reasons why I am writing a post about this.

  1. Some users do not have rights to run this executable on their desktop
  2. The occasional Windows XP user

In my case I ran into both of these last year, and currently I am in a situation where there is a VDI solution, users do not have rights to run this executable on their desktop, and even if they did, it would be the same prompt every time they log in.

So the easiest approach is to figure out what this executable does in the first place, and then apply the same changes to those machines through another method.

The reason you can get this more often after upgrades is that portions of the code behind this web console are signed with a code signing certificate from Microsoft. This certificate is valid for a year or a bit longer. After a few Update Rollups the code signing certificate they used suddenly changed, and now when you go to the web console it gives you that configuration task to do. In my case this happened while applying SCOM 2012 R2 UR3.

What the executable does is adjust some rights for Silverlight (it sets the AllowElevatedTrustAppsInBrowser entry to true for both the 32-bit and 64-bit Silverlight keys) and import a code signing certificate into the Trusted Publishers store. All of these actions can actually be replicated through the registry, so we are in luck. The rights settings stay the same and only the certificate part changes. Well, because it's a new certificate again :p

I will provide a registry file below. There is one portion where the certificate is defined, and that is the part that needs to be changed according to your situation. Basically it is the same for everybody using the same version and update rollup level of their SCOM installation.

Alright, here we go:

On a desktop where you do have full rights, go to the SCOM web console and run the configuration tool. Refresh your browser window and go to the same website and you should end up at a login page now.

Open up an MMC, add the Certificates Snapin and select Local Computer.

Go to the Trusted Publishers - Certificates folder and find the code signing certificate. In my case there were two of them: one valid until April 2014 and one valid until July 2015. I was first using the older one, and after upgrading SCOM to a new Update Rollup level the other one got added. So open up the properties of that certificate:

Now in the details tab find the Thumbprint entry. We need that string. You can also keep this open and compare it to what you are seeing in the registry next.

Open up the registry with regedit:

Go to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\TrustedPublisher\Certificates
Next find the key with the same string as the Thumbprint of the certificate you were looking at. In my case there were only 8 keys or so, so it was easy to find the correct one.
Now right-click that specific folder (Key) and export it to a .reg file on your machine. Open up that reg file with Notepad.

What we are looking for is the part where it lists where the Key (folder) is located and that long blob entry. In my case it looks like this:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\TrustedPublisher\Certificates\67B1757863E3EFF760EA9EBB02849AF07D3A8080]
"Blob"=hex:18,00,00,00,01,00,00,00,10,00,00,00,fe,24,f2,ea,00,13,0a,30,ca,fa,\
and so on and so on.

Take that line and the whole Blob entry and paste them into the reg file below, in place of my entry for the same. Save it. If you happen to have the same version of SCOM (SCOM 2012 R2 UR3) you might not need to do this, but it is a good check to see whether the correct certificate is in fact loaded (compare the Thumbprint of your certificate with the name of the key in my reg file below).

Try to import it into a second workstation where you have not run that configuration tool from the SCOM Web Console. Import the reg file. Next open up Internet Explorer and go to the SCOM Web Console site and see if you get the prompt to run the configurator or if you immediately get transferred to the login prompt. If you see the login prompt it has worked.

Now here is the attached txt file. Rename it to a .reg file.

om2012r2ur3webconsolefix.txt

Good luck and enjoy!
Bob Cornelissen


I am back

Uncategorized, SCOM, System Center, SCOM 2012

Just wanted to say I am back. For the last few months I have been working mostly behind the scenes, for several reasons. I am still doing the System Center stuff and working on some things you will be seeing in the near future though. But now I am back and will be a bit more visible here and elsewhere in the community again.

Also, again a shout out to the MVPs who were renewed yesterday and to those who joined the club! Just to name a few of the new joiners: Travis Wright, Dan Kregor, Telmo Sampaio and Tao Yang. It is great to hear you are part of the System Center MVP crew now! All names we have known for years, and well deserved.

Orchestrator 2012 PowerShell script fails to run

Windows 2008, SCOM, System Center, Windows 2012, SCORCH 2012

This week I added an Orchestrator 2012 Runbook server to an existing one for scale-out and high availability reasons. Very soon it was ready to go and I was making some additional runbooks to use together with SCOM. These runbooks contained Run .Net Script activities with PowerShell scripts in them, and I noticed the script activities refused to run, except when I ran them separately as a normal PowerShell script. So I went into the history and checked what had happened:

File C:\Program Files\System Center Operations Manager 2012\Powershell\OperationsManager\OperationsManager.psm1 cannot be loaded because the execution of scripts is disabled on this system. Please see "get-help about_signing" for more details.

Oh right! So I opened up a PowerShell prompt using Run As Administrator and typed "Set-ExecutionPolicy Unrestricted".

And the script failed to run again! Wait a second, perhaps it is because I forgot to run the same command on the other runbook server. Oops. OK, running that command there as well and....
Fail!
What?

I went searching and saw a comment in a thread somewhere saying that the same command also needs to be run in the other PowerShell executable, the one under SysWOW64, as well!

Open:
C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
and type:
Set-ExecutionPolicy Unrestricted
Did this on all Runbook servers this time as well B)

And try again. Working fine!

Many big files in System Volume Information

Windows 2008

I was monitoring an Exchange machine today and got a message that the log disk had less than x percent of free space. The first thing to check is whether backups have run on the machine, because if those fail, the log disk tends to fill up quickly. But the backups were OK and there were only a few hours of log data on the disk. After a short investigation it turned out the System Volume Information directory was causing this: it contained a lot of big files of several GB each, and most of them were older than a year. So hereby a few quick commands:

First one to list the shadow copy space reserved and used for each disk:
vssadmin list shadowstorage

This got me some output; here is the relevant part for the log disk:

Used Shadow Copy Storage space: 52.481 GB (52%)
Allocated Shadow Copy Storage space: 52.983 GB (52%)
Maximum Shadow Copy Storage space: UNBOUNDED (100%)

Aha, so it had no limit and was already occupying 52 GB, over half the disk space. By the way, the data disk was a factor of 10 bigger in size and also had well over half its space used by only this stuff.

You can also open Explorer, take the properties of the disk and look at the Shadow Copies tab, also to make alterations.

In this case I used the available commands though. First I set the maximum amount of disk space to use to only a few GB, which causes the old files to be deleted. Next I used the same command to raise the maximum back to a more suitable number.

vssadmin resize shadowstorage /on=E: /For=E: /Maxsize=19GB

So the above command sets the maximum size on the E drive to 19GB.

That sure got rid of the problem of the Exchange management pack telling us every day that the log disk was already below 50% free space.

Very good, on to the next issue to be solved!
Bob Cornelissen

Getting SQL information from SCOM discovered inventory

SCOM, SQL, System Center, SCOM Tricks, SCOM 2012

I often get questions about pulling SQL info together, such as names, instances, versions, editions and so on, for all kinds of purposes: sometimes as inventory, sometimes to find instances that are no longer supported or rogue instances, sometimes for licensing info, and so on.

The first thing to understand is that SCOM is not a CMDB; there are tools like SCCM and SCSM for those kinds of things. However, if a SCOM agent is installed and the SQL management packs are imported, they will discover the SQL components and put some info in the discovered inventory for you.

So the first thing I usually do for this (and other reasons) is to go to the monitoring pane, all the way to the top of the left-hand menu, and find Discovered Inventory. Next, on the right-hand Actions menu, go for Change Target Type, then find the SQL DB Engine and select it. Now you should get a list of all SQL database engines with their versions, names and lots of other information. In the case of this management pack it is also possible to go to the Microsoft SQL Server management pack folder on the left-hand side, expand the Server Roles folder and select a state view, such as the one for the database engine. It has the same information (you could use the Personalize View action to add columns you are interested in). Keep in mind that the SQL DB Engine is not the only SQL component that can be installed; there is also Reporting Services, for instance, which is very common. The state views here are a nice and fast way to find your instances of those as well.

Now, let's pull this info into a CSV file using the Operations Manager Shell (these are two lines, enter them as separate commands, and note these are SCOM 2012 commands):

$MyDevices = get-scomclass -Displayname "SQL DB Engine" | get-scomclassinstance

$MyDevices | select @{Label="Computer";Expression= {$_.'[Microsoft.Windows.Computer].PrincipalName'}}, @{Label="Instance";Expression= {$_.'[Microsoft.SQLServer.ServerRole].InstanceName'}}, @{Label="ConnectionString";Expression= {$_.'[Microsoft.SQLServer.DBEngine].ConnectionString'}}, @{Label="Version";Expression= {$_.'[Microsoft.SQLServer.DBEngine].Version'}}, @{Label="Edition";Expression= {$_.'[Microsoft.SQLServer.DBEngine].Edition'}} | Export-CSV -notype C:\sqlinstances.txt

And voila, you have a text file with the required info. What happens is that we look up the class called SQL DB Engine and pull in all instances of that class. Next, for each DB engine we select the computer name (you could have used Path there as well), instance name, connection string, SQL version (as a number) and SQL edition (Standard/Enterprise/Express). Throw the CSV file into Excel and you will have the data in a clear format.

This basically works the same way as in a post I did earlier about how to get devices (network device, windows agents, unix/linux agents) out of SCOM through PowerShell.

You can go deeper, for instance by filtering for only instances of a certain version or edition and sorting the output. It is very versatile.

Enjoy!
Bob Cornelissen

SCOM 2012 Linux agent update fails with no tty present and no askpass program specified

SCOM, System Center, SCOM 2012

While I was upgrading a bunch of SCOM 2012 Unix/Linux agents to a higher rollup level the other day, I noticed an error on one of them. I need to quickly say that upgrading the agents was otherwise a breeze: just select a few of them, use the update agent option with stored credentials, and wait about 15 seconds. It was a great experience. However, one of them was resisting and threw the following error:

Failed to update the cross platform agent. Exit code: 1
Standard Output: Sudo path: /etc/opt/microsoft/scx/conf/sudodir/
Standard Error: sudo: no tty present and no askpass program specified
Exception Message:

That is strange, because an agent was already installed on that machine so something must have changed somehow. It needs the same rights and settings to upgrade the agent.

So we checked the /etc/sudoers file on the machine.

First we check if the requiretty line is commented out:
#Defaults requiretty

Next we check whether the account we are using for monitoring and updating has passwordless sudo configured (I am using a different account name here, of course):
scom-mon ALL=(ALL) NOPASSWD: ALL

Hmmm, that is set correctly as well. Alright, let's test these settings.

Login with this user through ssh. Give the command sudo bash. If it asks for a password something is wrong.
And it did ask for a password in our case.

As it turns out, this settings file is read top to bottom, and unlike a firewall, for instance, it does not stop at the first match: the last match wins. Scrolling down, there was another line in this config file where the wheel group got sudo rights with the following setting:
%wheel ALL=(ALL) ALL
Aha, so the NOPASSWD setting was missing there, and because our monitoring/management account was also a member of the wheel group and this line was further down the sudoers file, it got evaluated last and won.

Simply move the line with your monitoring account to below the wheel group line in this example and it will work, as we confirmed by testing again.

The update of the agent went fine after this.

Happy monitoring!
Bob Cornelissen
