Category: "SQL"

WMUG NL first webinar of the year. Monitoring SQL with SCOM and Operational Insights

SCOM, SQL, System Center, SCOM 2012, Windows 2012

Want to know about Monitoring SQL with SCOM and Operational Insights?
Then you should check out this webinar, hosted by the WMUG NL user group and presented by fellow MVP and friend Simon Skinner. Mark your calendars for 28 January 2015 at 20:00 CET (GMT+1). For more information and to register for this webinar please follow this link, and do not be shy about spreading the word:

http://wmug.nl/2015/01/14/wmug-nl-webinar-1-2015-deep-dive-into-monitoring-sql-server-with-system-center-and-operational-insights-hosted-by-simon-skinner/

All are welcome to join!
Bob Cornelissen

Microsoft Ignite

SCOM, SQL, Hyper-V, Exchange, System Center, Active Directory, Windows 2012

Microsoft just announced the wave of main events for next year. The long-expected event that brings together several tech events (TechEd, the Management Summit, and the Exchange/Lync/SharePoint/Project conferences) into one big conference is here, and it is called Microsoft Ignite. It is scheduled for May 4 to May 8 in Chicago. Read more about it on this page:

http://blogs.microsoft.com/blog/2014/10/16/introducing-microsoft-ignite-lineup-top-conferences-2015/

This page also lists some of the other conferences in the year like Convergence, Build and WPC.

Enjoy!
Bob Cornelissen

Case of the fast growing SCOM datawarehouse db and logs

SCOM, SQL, System Center, SCOM Tricks, SCOM 2012

This post continues my previous post about upgrading SCOM 2012 SP1 to SCOM 2012 R2, which went wrong, and how I fixed it. After fixing the SCOM instance everything seemed alright for half a day. The very next morning, however, we saw something was wrong with the SCOM datawarehouse database. Below are some of my lessons learned and some SQL stuff B) It is a long story, sorry. I wrote it all down because it was a learning experience and it just happened to unfold this way. Also, there are no screenshots available anymore for this one, so it will have to be a load of text.

So what happened?

The first message we got was that the log file belonging to the datawarehouse database had grown to the size of the disk, and both the disk and the log file were full. The second message was that the database was now in recovery mode.

Investigate

So I went into SQL Management Studio on the machine, and right there it said that the database was in recovery mode. Nothing else could be done with it. No reports could be run (I like the Disk Space report in SQL), and no properties of the database could be opened. Looking at the Event Viewer Application log I saw event 9002 "The transaction log for database 'OperationsManagerDW' is full due to 'ACTIVE_TRANSACTION'." and event 3619 "Could not write a checkpoint record in database OperationsManagerDW because the log is out of space". So next on the agenda was to increase the size of the disk the log file was sitting on by 5 GB.
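
As an aside: two quick checks can show this log situation right away. A minimal sketch, assuming the default database name:

SELECT name, log_reuse_wait_desc FROM sys.databases WHERE name = 'OperationsManagerDW';
-- For the case above this reports ACTIVE_TRANSACTION, matching the 9002 event.

DBCC SQLPERF(LOGSPACE);
-- Shows the log file size and the percentage in use for every database on the instance.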

Log File

This prompted the log file to start growing again in its set increments, and it filled the added space right up. Hmmm :( Alright, let's do that again with 10 GB more space. I even found an old 5 GB file sitting on the disk which didn't belong there. So it had 15 GB to work with.

Log file growing again. Database was out of recovery mode I thought. Great. But wait. Log file growing still. End of disk. Recovery mode turned on again! Ahhhhh.

Database in Recovery mode

When the database is recovering it will log Application log entry 3450 with a description like this:
"Recovery of database 'OperationsManagerDW' (8) is 0% complete (approximately 71975 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required."

It needs to get through all three phases of the recovery, and in my case the 20-hour estimate ended up being close enough. So the best option was to let this recovery job finish and then clear my log file. I gave the disk 150 GB more space to work with. Recovery was underway for the next 20 hours, and meanwhile the log kept growing nicely and I could not touch the datawarehouse database in any way. So let's wait.
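
Side note: the recovery progress can also be followed from a query window, because recovery runs as a DB STARTUP task. A small sketch:

SELECT session_id, command, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE command = 'DB STARTUP';
-- percent_complete and the estimate (in milliseconds) mirror the 3450 event messages.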

Recovery model

Now the thing you need to know about log files in SQL databases is that there are three kinds of recovery model for a database: Full, Simple, or Bulk-Logged. Open up SQL Management Studio, go to a database, right click, Properties, go to the Options tab, and on the second line it states the Recovery Model. When it is set to Full, the log file fills with all transactions happening on the database, and they stay there until you make a backup of the database. The backup flushes the log file at the end and you have an empty log file again. If it is set to Simple, the basic idea is that a transaction happens on the database, it is written to the database, and at the end of the transaction the log file is cleaned. That is the simple explanation, mind you; it is a bit more complicated, as I discovered. I will come back to that later.
The SCOM databases are always set to Simple mode. But because the log file was growing so fast I was starting to believe something had gone wrong with that. Right click, Properties on the database. Error. Sigh. Now it started to become clear to me that I am a clicking kind of guy. I am used to clicking my way through options and settings in most applications. And now I needed some SQL queries.

Asking for advice!

At this point I called a friend of mine who knows all about SQL, Jacky van Hogen, for some advice on how to find out what was happening and for some guidance on how to proceed once the recovery was finished. I think I actually waited for the recovery to finish first. At that point I still could not open the properties of the database, but as we found out, we could simply run queries against it :D Thank you so much for handing me some queries :idea: :>> :!:

Running checks

First a check on the recovery mode and on the status of the databases:

select * from sys.databases

This gives you a list of the databases; one of the columns tells you whether a database is online and another one tells you its recovery model. The database was now online and the recovery model was Simple.
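
If you do not want to scan through all those columns, a trimmed version gives just the two answers we were after:

SELECT name, state_desc, recovery_model_desc
FROM sys.databases;
-- state_desc shows ONLINE, RECOVERING and so on; recovery_model_desc shows SIMPLE or FULL.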

Next we want to see what is going on, so we directed this query against the OperationsManagerDW database:

select * from sys.dm_exec_requests

Check the sessions with an ID above 50.
You will find your own query requesting this data as well, under your own session ID. I did not know this, but the top bar of your SQL Management Studio shows your query session ID in brackets for each query session. Nice; so ignore that session and focus on the rest with numbers above 50.

Sure enough there were a few of those. A few were waiting on one session, 117 in my case, which was an INSERT statement. Over the hours that followed I saw it go from Running to Suspended to Runnable and back to Running all the time. And meanwhile the log file was still growing.

Now let's look at the oldest open transaction:

dbcc opentran()

Sure enough, there was session ID 117, and by that time it had been running for a number of hours. Actually, after the recovery was successful I had restarted the SQL services, hoping that would make it stop its transactions and flush the log file. It doesn't quite work that easily :-/ But at least we could see what was happening.

A few things I learned during a short discussion with Jacky were:

- The oldest open transaction needs to finish its work first; after that it will clean the log file. Any jobs running alongside will also write to the log file, and those entries will not be cleaned out. This is because the currently running transaction might still need that data in case we need to stop it and it has to do a rollback. Aha, so there is slightly more to the Simple recovery model than I thought.
- We can kill the transaction simply with the command Kill 117 (see the sketch after this list). It will roll back all its actions and in the end clear the log file. Or it should; that takes a while for something that had filled up over 200 GB in log space by now. However, the biggest chance is that the job will simply start again from the beginning and chew through the same amount of data again, and more.
- The best thing in this case would be to give it the space it needs and let it finish its work. After that, shrink the log file and clean up.
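
For completeness, killing the transaction and then watching the rollback would look something like this (just a sketch; 117 was the session ID in my case):

KILL 117;
-- Later on, check how far the rollback has come:
KILL 117 WITH STATUSONLY;
-- This reports an estimated completion percentage and time remaining for the rollback.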

So we decided to give it some time and meanwhile keep an eye on it.

Checking the log file

She asked me to check out the log file contents.

dbcc loginfo

Now I might be saying this wrong or using the wrong terms, but this basically gives you the virtual log files within the log file (or log files) of that database. I think of them as the pages in a book. You can have one or more log files (books) for a database, and each log file has a number of pages to fill up with data. Normally the log file writes sequentially (like writing in a book from beginning to end), but when bits and pieces get cleared out, it can happen that most of the log file is empty while some pages at the end are still written. This is usually the reason why sometimes you cannot shrink a log file on the first try: it cleans out pages, then finds it cannot clean one at the end, so it cannot make the file smaller. Repeating the shrink a few times makes the last change jump to the front of the log file again, the end gets cleaned up, and the shrink works. By the way, we can write a checkpoint in the log by simply issuing the command Checkpoint in a query.
Well, in my case we were looking first at whether the pages were all in use. Check the Status column: if it says 0 the page is empty, and if it says 2 it is full of data. In my case most of the file was full. So there was not much to gain by trying to move data around inside the log file and shrinking it, because the transaction clearly had it all in use.

We also found there were way too many virtual log files (pages, in my example) in the log file, probably caused by the many auto-grow events. An interesting article forwarded to me by Jacky is http://blogs.msdn.com/b/saponsqlserver/archive/2012/02/22/too-many-virtual-log-files-vlfs-can-cause-slow-database-recovery.aspx
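
To put a number on "way too many", the DBCC LOGINFO output can be captured and counted. A sketch; the column layout below matches SQL 2012, older versions do not have the RecoveryUnitId column:

CREATE TABLE #vlf (RecoveryUnitId int, FileId int, FileSize bigint, StartOffset bigint,
    FSeqNo bigint, [Status] int, Parity int, CreateLSN numeric(38));
INSERT INTO #vlf EXEC ('DBCC LOGINFO');
SELECT COUNT(*) AS VlfCount,
    SUM(CASE WHEN [Status] = 2 THEN 1 ELSE 0 END) AS VlfsInUse
FROM #vlf;
DROP TABLE #vlf;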

Watching the transaction do its work

It was also interesting to see how transaction 117 went through the whole Running - Suspended - Runnable - Running cycle of status changes while running the "select * from sys.dm_exec_requests" command. This was due to, among other things, the autogrow of the log file each time: waiting for the file extension to be created means waiting for the disk (that is when it was Suspended), and next it goes to the Runnable status, where it waits for an open thread to get processor time before jumping back to Running. Again, this is the short and simple way of saying it, I guess.

Jacky also sent me a query to check whether this transaction was really the one using so much of the log space:

select * from sys.dm_exec_requests r
join
sys.dm_tran_database_transactions t
on t.transaction_id = r.transaction_id
and t.database_id = r.database_id

Check the field database_transaction_log_bytes_used.
Sure enough, it was transaction 117 using a few hundred GB of log file space.
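
A trimmed-down version of that query, keeping only the interesting columns:

SELECT r.session_id, t.database_transaction_log_bytes_used, t.database_transaction_log_bytes_reserved
FROM sys.dm_exec_requests r
JOIN sys.dm_tran_database_transactions t
    ON t.transaction_id = r.transaction_id
    AND t.database_id = r.database_id
ORDER BY t.database_transaction_log_bytes_used DESC;
-- The top row in my case was session 117 with a few hundred GB of log in use.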

Creating additional log files

Another thing that worried me was whether I could keep expanding the log disk like that. There will come an end to the storage LUN at some point, right? The alternative would be to create additional log files for this database on other disks. Go to the database, right click to open the properties, and add a log file, right? Wrong; at this point I still could not open the properties of the database, so I had to use T-SQL for this as well. I had done this once before for another customer. http://technet.microsoft.com/en-us/library/bb522469(v=sql.105).aspx
An example perhaps:

USE master;
GO
ALTER DATABASE OperationsManagerDW
ADD LOG FILE
(
    NAME = OpsDW,
    FILENAME = 'D:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\opsdwlog2.ldf',
    SIZE = 100MB,
    MAXSIZE = 10000MB,
    FILEGROWTH = 100MB
);
GO

And yes, in hindsight I should have ignored the autogrow setting and just made it a fixed size; it would be a temporary file anyway. In the end I could add space to the disk where the big log file resided anyway.
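
For the record, once such a temporary log file is no longer holding active log records it can be dropped again. A sketch, using the logical name from the example above:

USE master;
GO
ALTER DATABASE OperationsManagerDW REMOVE FILE OpsDW;
GO
-- If this errors because the file still holds active log records, issue a CHECKPOINT
-- in the database and try again once the active part of the log has wrapped back.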

Give up your secret mister transaction

All of this was really bugging me, and I was trying to figure things out as they came along. So I went out and tried to find out more about the query that was running. Our elusive number 117. What are you doing, mister 117?

I found a query somewhere on the internet (sorry, I did not record where). It is an extension of the command I used before to check what it was doing. I will paste it below:

USE master
GO
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT
er.session_Id AS [Spid]
, sp.ecid
, er.start_time
, DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]
, sp.nt_username
, er.status
, er.wait_type
, SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,
((CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS [Individual Query]
, qt.text AS [Parent Query]
, sp.program_name
, sp.Hostname
, sp.nt_domain

FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50
AND session_Id NOT IN (@@SPID)
ORDER BY session_Id, ecid

Alright, so this one uses the same kinds of commands, filters out only sessions above 50, and gives some info on the running query and the parent query. What I saw in both cases was Alert Staging.
Now let's get back to SCOM, because that sounds familiar!

Alert Staging

The basic way SCOM works with this stuff is that the agents send data to the management server, and the management server inserts the data into the datawarehouse database. Next the data must be aggregated. This happens through a number of staging tables: there are staging tables for alert, performance, event and state data. Inside the SCOM workflows there are rules, belonging to SCOM itself, which kick off stored procedures in the datawarehouse. These process the data from the staging tables, put it in the historical tables, do some aggregation with it, and then clean up the staging area again.

So what this seemed to be telling me is that one of the management servers kicked off a rule, which kicked off the stored procedure to handle the alerts in or through the AlertStaging table. (By the way, I could see which of the management servers this was as well.) But these kinds of jobs run often and normally only have a few rows of data to work through! Let's have a look.

SELECT count(*) from Alert.AlertStage

Uhmmm, 300 million rows?!?!?!

Now that's an alert storm of some kind, and probably programmatic, as there is no way we had that many real alerts in such a short amount of time. I knew this transaction would never get through that amount of data, and what would be the point? These were not normal alerts and there was no value in retaining them. Somebody wrote a nice blog post about another issue where the data from the alert staging table was not written to the normal tables: http://bsuresh1.wordpress.com/2014/03/18/alert-data-is-not-being-inserted-into-scom-data-warehouse/. In my case I was not interested in the alerts, so I did not go for temporarily moving them to another table and running through them manually with that stored procedure.

Cleaning

I opted to clean out the table.

TRUNCATE TABLE Alert.AlertStage

Sit back and wait for 300 million rows to be removed. I had hoped that once transaction 117 realized there was no more data to process, it would finish and clean up after itself, and thus the log file. Guess I was not that lucky, because two hours afterwards it was still running. So I was done with this and killed session 117 (Kill 117). Of course this caused a rollback, and a few hours later...

Next run a check on the four staging tables:
SELECT count(*) from Alert.AlertStage
SELECT count(*) from Event.EventStage
SELECT count(*) from Perf.PerformanceStage
SELECT count(*) from State.StateStage

All normal with low numbers.

It also cleaned up a few hundred GB of space in the database itself :roll:
And from this point I could finally open the properties of the database too!

So I ran the task of shrinking the file: right click the database - Tasks - Shrink - Files.
Make sure you select the log file of course! It shows the amount of available empty space inside (which was a lot). So I went ahead and shrunk it down. If it doesn't work, open a query window, type the command "checkpoint", and try again. This can be the case when something is still writing around the end of the file; as soon as it wraps and starts writing at the start of the file, the end will be clear and shrinking will work.
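
The same shrink can be done from a query window when the GUI refuses to cooperate. A sketch; look up your own logical log file name first, the name and target size below are just examples:

USE OperationsManagerDW;
GO
SELECT name, type_desc FROM sys.database_files;  -- find the logical name of the LOG file
CHECKPOINT;                                      -- helps wrap the active log back to the start
DBCC SHRINKFILE (OperationsManagerDW_log, 10240); -- example logical name, target size in MB
GO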

Concluding

So what was the cause of all this mess? I can't say for sure, but we saw this happening within hours after the bodged update of SCOM from 2012 SP1 to 2012 R2 and UR3. And yes, I did run the SQL scripts belonging to UR3 as well. It is possible that the upgrade wizard which killed my first management server and the operational database touched the datawarehouse as well, before it even reached that step in the wizard. I do not know. Next time, to be safe, restore not only the operational database but also the datawarehouse database, even if it is over a TB in size. All the stuff that happened in the story described above took a load of time as well.
But it was a nice learning experience on some points as well.

Thanks

Just happy that the whole system is running well again.
A big thanks to Jacky van Hogen for her advice on the SQL pieces over the phone during her lunch break! Just a few minutes of good advice and some pointers in the right direction from an expert in her field was such a big time saver, and it reduced the stress.

It also sparked an idea, which I will get back to later.

Have fun!
Bob Cornelissen

Getting SQL information from SCOM discovered inventory

SCOM, SQL, System Center, SCOM Tricks, SCOM 2012

I often get questions about pulling SQL info together, such as names, instances, versions and editions, for all kinds of purposes: sometimes as inventory, sometimes to find instances that are no longer supported, or rogue instances, or input for licensing, and so on.

The first thing to understand is that SCOM is not a CMDB. There are tools like SCCM and SCSM for those kinds of things. However, if a SCOM agent is installed and the SQL management packs are imported, SCOM will discover the SQL components and put some info in the discovered inventory for you.

So the first thing I usually do, for this and other reasons, is to go in the Monitoring pane all the way to the top of the left-hand menu and find Discovered Inventory. Next, in the right-hand Actions menu, go for Change Target Type, then find the SQL DB Engine class and select it. Now you should get a list of all SQL database engines with their versions, names and lots of other information. In the case of this management pack it is also possible to go to the Microsoft SQL Server management pack folder on the left-hand side, expand the Server Roles folder, and select a state view, such as the one for the database engine. It shows the same information (you may want to use the Personalize View action to add the columns you are interested in). Keep in mind that the SQL DB Engine is not the only SQL component that can be installed; there is also Reporting Services, for instance, which is very common. The state views here are nice and fast for finding your instances of those as well.

Now, let's pull this info into a CSV file using the Operations Manager Shell (these are two lines, enter as separate commands, and note these are SCOM 2012 commands):

$MyDevices = get-scomclass -Displayname "SQL DB Engine" | get-scomclassinstance

$MyDevices | select @{Label="Computer";Expression= {$_.'[Microsoft.Windows.Computer].PrincipalName'}}, @{Label="Instance";Expression= {$_.'[Microsoft.SQLServer.ServerRole].InstanceName'}}, @{Label="ConnectionString";Expression= {$_.'[Microsoft.SQLServer.DBEngine].ConnectionString'}}, @{Label="Version";Expression= {$_.'[Microsoft.SQLServer.DBEngine].Version'}}, @{Label="Edition";Expression= {$_.'[Microsoft.SQLServer.DBEngine].Edition'}} | Export-CSV -notype C:\sqlinstances.txt

And voilà, you have a text file with the required info. What happened is that we looked for a class called SQL DB Engine and pulled in all instances of that class. Next, for each DB engine we selected the computer name (you could have used Path there as well), instance name, connection string, SQL version (as a number) and SQL edition (Standard/Enterprise/Express). Throw the CSV file into Excel and you will have the data in a clear format.

This basically works the same way as in a post I did earlier about how to get devices (network device, windows agents, unix/linux agents) out of SCOM through PowerShell.

You can go deeper for instance by trying to find only instances of a certain version or edition and to sort the output. It is very versatile.

Enjoy!
Bob Cornelissen

SQL File Disk Performance Management Pack for SCOM

SCOM, SQL, System Center, SCOM 2012

Purpose

This management pack brings together I/O statistics for SQL data files and log files directly from the SQL virtual file stats view into SCOM where you can report on these for any period of time. It can collect data from SQL 2005, 2008 and 2012 data files.

Introduction

Starting with SQL 2005, SQL DBAs can use Dynamic Management Views and Functions to monitor the health of a server instance, diagnose problems, and tune performance. This management pack is specifically geared towards the sys.dm_io_virtual_file_stats view, which returns I/O statistics for data and log files. DBAs use the view to look at the current counter values for each database file (data or log); these are point-in-time values, cumulative since the last restart of the server. This management pack collects these values for use in viewing and reporting. There are no health state changes or alerts connected to these values. There are short-term performance views and two reports with a double Y-axis to display longer-term data. For the long-term reports the cumulative data is translated into, among other things, IOPS for each point in time.
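
To give an idea of the raw material, this is roughly what the view returns when queried directly (a plain query against the DMV, not the management pack's own collection rule):

SELECT DB_NAME(vfs.database_id) AS DatabaseName,
    mf.name AS LogicalFileName,
    vfs.num_of_reads, vfs.num_of_writes,
    vfs.num_of_bytes_read, vfs.num_of_bytes_written,
    vfs.io_stall_read_ms, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
    AND mf.file_id = vfs.file_id;
-- All counters are cumulative since the last restart of the instance.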

Files
You can find all management pack files, optional override management packs and the management pack guide explaining it all in the zip file attached to the TechNet Gallery page. The current version of this management pack is 1.0.2.48, released on the second of June 2013.

General

This Management Pack is provided free of charge by the authors to the System Center and SQL community.

Written by Bob Cornelissen, David Scheltens, and Pavel Dzemyantsau

The Management pack can be found here:
SQL File Disk Performance Management Pack for SCOM.

Enjoy!

New SQL management pack released

SCOM, SQL, System Center, Active Directory, SCOM 2012

Yesterday a new version of the SQL management pack for SCOM was released. Yes, we are now on a roll with releasing new stuff! The fresh version of the SQL MP is 6.3.173.0.
The new version also supports monitoring of SQL 2012 and will work with both SCOM 2007 and 2012.

For an excerpt of some of the new additions, see the list of SQL MP version changes on the MP download page.

Looks like a good update to the MP B)

You can download the new SQL management pack version 6.3.173.0 here:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=10631

More new downloads to come!
Bob Cornelissen

ReportServer application pool stopped with unspecified error and keeps crashing giving 1057 and 1059 errors

SCOM, SQL, Active Directory

Today I ran into a SCOM Report Server that would not load correctly, on Windows 2003 with SQL 2005.
This also gives an error in the SCOM console as follows:
Data Warehouse failed to request a list of management packs from SQL RS server.
And in the description a reference to:
The request failed with HTTP status 503: Service Unavailable.
When connecting to http://reportserver/reports it just displays a Service Unavailable message.

We quickly saw that the ReportServer application pool on the report server was not running. When starting the application pool from the IIS manager it started correctly. However, when the first connection was made to http://reportserver/reports or /reportserver we got an error, and the application pool had stopped again, showing an unspecified error in the IIS manager.

We got two errors in the System log of the Event Viewer on the report server: events 1057 and 1059.

We tried to reset the password on the application pool. That did not work. We tried to set the account back to Network Service in the Report Server Configuration tool and, after a restart of the Report Server, set it back again. We confirmed we were using the right password.

When looking through the local rights I noticed the account was not in the IIS_WPG group on that machine.
Added the account we were running the application pool as to that group.
Got error again.
Did an IISRESET from the command line.
Success! B)

So this solved my issue and we could move on to the next step. The SCOM alert auto closed right after this. Good times!

By the way:
I did find a link to a solution for the case where you are running many application pools on one web server (about 60 or more). It tells you to create a registry key UseSharedWPDesktop and set it as stated in this document:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/d779ee4e-5cd1-4159-b098-66c10c5a3314.mspx?mfr=true
This did not apply to our environment as it was a dedicated machine.

Enjoy!
Bob Cornelissen

SCOM Web Console Error 404 and Unexpected error

SCOM, SQL

This is an old one as well, but we still see it happening (including in my case). Actually, this covers two cases.

First I had a 404 error when I opened the SCOM Web Console for the first time. This links back directly to my previous post, where the right ASP.NET version was not linked to the website. See this post: SQL 2005 Reporting Services http error 404

Alright, the next error I got was the following, with a nice red cross:
Unexpected error
There was an error displaying the page you requested.
Try the following:
Restart the web browser
Refresh the page

I had seen this one before as well and forgot to blog about it. I found it in my notes, without a reference, but that doesn't matter.

Edit the Web.Config file in "C:\Program Files\System Center Operations Manager 2007\Web Console\Web.config"

Find and change the following entry:

Code

<authentication mode="Forms">
  <forms requireSSL="true" />
</authentication>

to

Code

<authentication mode="Forms">
  <forms requireSSL="false" />
</authentication>

So after changing it and restarting the Web Console website we got a working console again.
The end result is the login screen again.

Good luck if you run into one of these errors. Hope this helps.
Bob Cornelissen

SQL 2005 Reporting Services http error 404

SCOM, SQL, Active Directory

I had a small issue with SQL 2005 Reporting Services on Windows 2003 R2 in a quick test environment I was setting up for SCOM. When trying to open http://servername/reports I got the error "HTTP Error 404 - File or Directory not found". I knew I had seen this one before, but could not remember where, so I will blog it here as a reminder. In my case the solution was simple and it worked.

After installing SQL Reporting Services I went to http://servername/reportserver and this worked. Then I went to http://servername/reports and got the "HTTP Error 404 - File or Directory not found" error. Restarted the box: didn't work. Checked the config of SSRS: seemed fine. After some scrolling around I found it again.

  • Open IIS manager.
  • Go to the properties of the website.
  • Go to the ASP.NET tab.
  • Select the correct version of ASP.NET in the first dropdown box (in this case it was 2.0.50727).

In my case that did the trick!

Good luck!
Bob Cornelissen

Remotely testing SQL connections

SQL, System Center

One nice gem from Steve Rachui's blog is a trick for simply troubleshooting remote SQL connections.

Just create an empty txt file anywhere on the disk, give it a UDL extension, and double-click it. You will see a tool pop up with several tabs. On the first tab you will find a provider for SQL Server, for instance, but also other types of data connections. On the second tab you can enter info like server name, database name and authentication needs.
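
After filling in the tabs and saving, the UDL file itself is just a small text file holding the resulting connection string. Opened in Notepad it looks something like this (provider, server and database names below are made-up examples):

[oledb]
; Everything after this line is an OLE DB initstring
Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=OperationsManagerDW;Data Source=MYSQLSERVER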

Nice! Very very nice!

Bob Cornelissen
