This is a continuation of my notes on the fifth SCOM 2012 CEP meeting on Application Performance Monitoring. This is a two-part post. You can find the first post over here.
Discovery is done through the IIS discovery first: the IIS 7 MP discovers the ASP.NET inventory.
It is possible to do bulk configuration of applications with similar settings.
Slow requests raise performance events. Code exceptions raise exception events.
Synthetic transactions are pre-recorded paths that let you know about availability and validate code success paths. With real transactions (APM) you are looking at why a transaction fails on a site that is running. So to get the best picture you need both components, as APM only works when the site is working. The server-side monitoring from the IIS MP together with synthetic transactions will tell you about the availability of your site.
The DinnerNow demo app is available from CodePlex.
We were shown an example of an error. I am going to post some screenshots here, although they don't show it as well as having it full screen.
When clicking that link you get a link through to the APM part.
This gives a lot of information.
For instance the next picture shows a breakdown of one specific call.
You can see that it might be the SQL query behind this call that took a longer time, and what the query was. Performance counters of the system are included as well: packets of 15 minutes of performance counters are linked to an event.
There is a link you can use to email the application guys to show them the event and contents as a URL.
If a developer has Visual Studio installed, SCOM will give a way to open Visual Studio, go to the code (if you have access) and highlight the part where something happened.
There will be more documentation and glossary of controls and such.
We got a nice example of stats in the advisor and we saw the application diagnostics view. Developers like this view apparently.
How to get started
Getting started takes only a few steps:
- Install IIS 7 mp.
- Run .Net APM template and use defaults.
- Select applications to monitor
- Set target computer group (optional)
- Set SLA threshold
- Restart IIS when the alert to do that comes into SCOM.
It is possible to monitor Java-based applications. This does not use the interceptor but uses the Java extension.
Alright this was the end of my notes on the meeting.
For the next meeting we will have to have some patience. About 4 weeks of it actually. Next meeting: 1 November and it is on Network Monitoring.
One of the great new features integrated into the product, so worth watching!
Today was the fifth SCOM 2012 CEP meeting, talking about Application Performance Management with Michael Guthrie presenting.
I will split this post into two parts as well.
This is part 1 and you can find part two over here.
Framework for .net monitoring.
This is a screen from the slides on the subject:
This is different from an MP based module.
The goal is to bridge the gap between Operations and Development, and to give more information that the application team can take action on. There will be no noticeable impact on the throughput of the application itself. There is a maximum 5% CPU tax on the managed server. There is support for ASP.NET and WCF hosted on IIS 7.
Advantage will be that there is no need to adjust the code for the .Net application. The code is attached by APM to the IIS.
This is essentially AviCode, but integrated in the new product.
Avicode 5.7 templates will continue to work after upgrading to SCOM 2012. It does not upgrade the bits.
All the consoles are installed at once. So also diagnostics and advisor console.
When doing a push install of the agent the bits will be deployed as well, but the agent will be disabled until you run your first application monitoring for that server (same as the ACS agent).
The Avicode database has been merged into the operations manager database. What used to be in the Avicode Advisor is now in the datawarehouse.
Server- and Client-side monitoring.
The server side is doing the monitoring from the system itself. The client side monitoring is monitoring the performance from a browser client side perspective. This is the last mile. This will see browser side errors which can not be seen from the server side. It also checks for performance counters such as how long it took to download and render a picture. Together this gives an end-to-end visibility on application performance and reliability.
This is the flow of the server side monitoring.
From OM side the configuration gets pushed to a configuration file. This gets loaded by the APM Service. APM service turns itself on and loads the interceptors in IIS (this needs a recycle of IIS). The interceptor times requests and watches for exceptions. Processing of the data coming from IIS is offloaded into the APM service. The APM service passes things on to the Health Service.
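To make the interceptor idea a bit more concrete, here is a minimal sketch in Python of what "times requests and watches for exceptions" could look like. This is not how the APM interceptor is actually implemented; the names `make_interceptor`, `threshold_seconds` and `events` are made up for illustration, and the event list simply stands in for the data that gets handed off to the APM service.

```python
import time


def make_interceptor(handler, threshold_seconds, events, clock=time.perf_counter):
    """Wrap a request handler so every call is timed and exceptions are recorded.

    Hypothetical illustration only: 'events' stands in for whatever the real
    interceptor passes on to the APM service.
    """
    def intercepted(*args, **kwargs):
        start = clock()
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # A code exception raises an exception event, then continues as normal.
            events.append(("exception", type(exc).__name__))
            raise
        finally:
            elapsed = clock() - start
            # A request slower than the threshold raises a performance event.
            if elapsed > threshold_seconds:
                events.append(("performance", elapsed))
    return intercepted
```

The wrapped application code stays untouched, which is the point made earlier about not having to adjust the .Net application itself.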
This was about halfway through the session. Read more on the second half in part 2 of this post. You can find it here.
This morning I talked to my good friend and SCOM guru Marnix Wolf about the new Base OS management pack version 6.0.6956.0 that was released just a few days ago. Same as in his case, I was writing up a blog post about the MP when I started noticing some issues with it, and when looking over at Kevin Holman's blog about the subject we could see the issues streaming in. Seems we were not the only ones!
Some of these issues are very easily fixed and should be fixed with lightning speed. One of those is a spelling mistake in the MP on a monitor that is very noisy and everybody has an override on it. And now because of the spelling mistake everybody needs to dive into the XML and try to manually fix things. And possibly at the next update to the MP do the same again!
So first of all we want to call in your help to get a quick fix to the issues in this MP. The basis of the updates to the MP are great and the thoughts behind it are great! However these issues need to be fixed, because a lot of people will be hurt by it.
So friends from the product team, please work together and bring out a quick new version of this MP!
Please join the community in voting on the bug that was opened on Connect over here.
Find the blog posts relating to the subject here:
Post from Marnix Wolf
More Than One: Let’s ask for a new improved Server OS MP! Yes We Can!
And again please help by voting on the bug report on Connect
Again, this is not a flame or something like that, just an initiative to get a fixed and working MP very soon with all the great new stuff that was added and without at least most of the issues! This is to make the acceptance and workings of the product and the benefits people have from it maximized!
Yes we can!!
Edit 5 October:
I am writing an addition here, because some people seem to have misinterpreted my comments. Our posts are not about bashing a product team. Certainly not. This product team has actually made an updated management pack (which is great), has done a lot of good work on it and has taken into account a lot of community feedback. However, there are a number of issues we are seeing now that should be fixed on short notice, so as not to discredit the great work they did. That is why we identify the issues and bring them to attention, with the strong request to make a minor bug fix update to the management pack and release that one to the public.
So some of the things that have been seen and my remarks:
- Due to a spelling mistake in a disk monitor there are many problems with the import due to people having previous overrides on it (I am betting a high percentage for this specific monitor). This is a very simple edit to change back to the right value.
- Report execution might fail because of lacking proper security settings on a Stored Procedure. This should also be fixed as having people edit rights inside of the databases is not a good thing.
- Knowledge is out of date for the new default values in the free space monitors. A few simple edits and this is done. Not a biggie, but just a finishing touch. Don't forget this is one of the monitors that at least I get the most customer questions about on settings and default values.
- The BPA monitors can be noisy for Server 2008 R2 systems. This has two reasons in my opinion, with only one actionable by this management pack. First of all, I would request these to be disabled by default, with the MP guide describing how to enable them if needed, as it feels like the disk fragmentation monitor all over again (which caused a lot of noise and angry comments from customers). Second, there is an external cause (which is not the fault of this product team, they are just reporting on issues seen by BPA), and that seems to come at least from the WSUS BPA. This WSUS BPA, and possibly a few others, has a bug in it that causes it to wish you had a WSUS server installed on every server you have, and it complains if you don't. This is something to be adjusted by the BPA or WSUS team and others involved.
- The ‘performance by utilization’ report section dealing with Logical Disk % Idle time is upside down: the least utilized disks (100% idle time) are on top and the busiest disks (idle time close to 10% or even less) are at the bottom. This is easy for the product team to see, evaluate and change in the report, and to include in the next version of the MP.
- Heard some issues with the reports failing to run or failing to load into the reportserver. I guess this is a matter of getting it replicated in a test environment and finding the possible issues and seeing if this needs to be fixed in the mp or with a workaround somehow. However there are a good number of people seeming to have issues with this and even after doing the rights change on the stored procedures (workaround of one of the other issues above) they still have the issue.
- Some other things I have heard I deem not yet confirmed. Perhaps these belong in the MP guide later, or perhaps these are one-offs.
As you see a few items that are easy to fix and a few things that require a bit of SQL (the rights on the SP and deploying report). I would strongly suggest that it would not be needed to wait for 6 to 12 months for a next version to fix the issues (and probably give need to workaround issue number 1 in the list above again!).
This way everybody will be able to enjoy the new and great features of this MP: added CSV monitoring, added BPA support (if needed!), and added very nice, fancy and above all very useful reports. And a lot less noise and performance collection (while having the option to enable those again if you need it). Now that's great!
So it is meant as a help from the community to the MS product group(s) to get the maximum value out of these MP's and to help and support frequent releases of updates to management packs for the products that give more value, without making customers complain. And this is especially the case for product groups who are actually bringing (new) versions of management packs for their products with a certain frequency.
Hope this clears up my point.
Just to inform you that the next SCOM 2012 CEP meeting is tomorrow and is on the subject of Application monitoring. Application monitoring is a monitoring layer that is scary to dive into, because for many non-Microsoft products there are no management packs and certainly not for custom developed applications. In many cases however a bit of searching and a few custom monitors are enough to catch those and when using Distributed Applications to capture the bits and pieces into an application view usually makes things a lot more clear. Some nice dashboards also will do the trick. For .Net applications which would normally be sitting on front-end web servers and connecting to back-end SQL servers there is a great solution called AviCode. This product has been bought by Microsoft and is now being integrated into SCOM 2012 and that is great news! It can do some fancy stuff that dives into the application layer where most system admins who are on the network and infrastructure layer never go. You will enjoy this if you need to monitor certain .Net apps and you need to point out the pain points to the application managers.
Here is an excerpt from the announcement of this session:
Applications are at the heart of all companies and IT is increasingly being tasked with not only keeping the servers running, but ensuring the applications being hosted on those servers are healthy. Virtualization and VM density has made it critical for IT to understand the impact of changes in the environment to the applications being hosted and to quickly isolate issues and address them before the small incidents grow into major problems. In this session we will look at how Operations Manager, the tool you use to manage your infrastructure, can also help you manage your applications.
CEP Theme 5 Virtual Chalk Talk
Tuesday, October 4, 2011
8:00 – 9:30 AM PDT
Hope to see you there!
SCOM - Find New alerts which have been open for a certain amount of time and change Resolution State
For a script I was creating in order to classify new alerts in SCOM, I found myself wanting to find alerts that had been in a "New" state for over 30 minutes and give it a specific Resolution State.
So I started out creating a search string in PowerShell until I got the logic of the command, but I was still missing some specifics, which I went out to find as I expected some time issues (how to tell it that an alert is older than 30 minutes). I don't know enough PowerShell to do the fancy stuff yet, although I will be working on it.
After some searching I found an entry by Scott Garret at BlackOps, which I could adjust to what I was looking for. Happy he found the route through all the brackets.
Most importantly I changed the part where it doesn't look for LastModified, but for the last time the resolution state of the alert was changed. And in my case it needed to be more than a certain number of minutes. Also had to play with the single and double quotes as copy/paste always gives some headache with those things.
So in the end this is a short one that loops through all New alerts in SCOM, finds the alerts that have been in the New state for over 30 minutes and sets the resolution state to something else (I just took a random number which I could use later in the follow-up).
So the important part is this part of the command. This will output the alerts when you run it in the OpsMgr Shell.
This gets the alerts, with two criteria. First is that it needs to be a New alert (resolutionstate="0") and the last change in the resolution state must be more than 30 minutes ago (this is the created time by the way or if somebody really wanted to manually place an alert back into New state).
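For those who want to see the selection logic on its own, here is a minimal sketch in Python. It is not the OpsMgr Shell command from this post and it does not talk to SCOM; the alerts are plain dictionaries, the key names are made up for illustration, and the target state 15 is just a placeholder for whatever custom resolution state you use.

```python
from datetime import datetime, timedelta, timezone

NEW_STATE = 0  # resolution state 0 is "New" in SCOM


def stale_new_alerts(alerts, now, max_age_minutes=30):
    """Return alerts that are New and whose resolution state last changed
    more than max_age_minutes ago. Timestamps are expected in UTC."""
    cutoff = now - timedelta(minutes=max_age_minutes)
    return [
        alert for alert in alerts
        if alert["resolution_state"] == NEW_STATE
        and alert["time_resolution_state_last_modified"] < cutoff
    ]


def reclassify(alerts, new_state):
    """Give every selected alert the new resolution state (placeholder value)."""
    for alert in alerts:
        alert["resolution_state"] = new_state
    return alerts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"name": "Disk full", "resolution_state": 0,
         "time_resolution_state_last_modified": now - timedelta(minutes=45)},
        {"name": "Service stopped", "resolution_state": 0,
         "time_resolution_state_last_modified": now - timedelta(minutes=5)},
    ]
    for alert in reclassify(stale_new_alerts(sample, now), 15):
        print(alert["name"], alert["resolution_state"])
```

Both criteria are combined with a logical AND, and comparing against a cutoff computed once keeps the "older than 30 minutes" part easy to read.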
Want to see the alerts and just have it output the last time the resolution state changed?
Keep in mind you are seeing UTC time here and not your local time.
By the way, there is a way to do this in a one-liner, but there was another need in the script to catch other things for the classification, so I just used this method. If you wanted to close those alerts based on this get-alert search string, you can just pipe it into | resolve-alert and you are done with it.
Perhaps somebody else will find a use for this as well sometime. If so, good luck!
This is the second time I see this happening in a few weeks, so time to post it out again.
What happened is that we approved a new SCOM gateway server and installed the gateway. All good and without issues. However we did not see any health state change for this gateway and no state on the Windows component and so on for this machine. No state at all actually. Additional info with this is that the gateway was in the same domain and no certificates were used (or needed). The SCOM version was 2007 R2 CU4.
First of all I went to the gateway and checked the event logs. There were a lot of error notifications saying that the management server on the other side refuses the connection. I first did a restart of the System Center Management service in order to see a clean start of the service and what happens. Same issue.
Last time I could not really find out what had happened that we saw this behavior, but I did find the workaround.
And here it is:
- Stop the System Center Management service on the gateway machine.
- Open the registry editor and go to the following path (using your own managementgroupID of course):
HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Server Management Groups\MymgmtgroupID\Parent Health Services\0
- Change the values of AuthenticationName and NetworkName from the current management server to another management server.
- Restart the System Center Management Service on the gateway machine
- Check the operations manager eventlog on the gateway server if connection is established.
- Let this run for some time and the state will appear in SCOM and it will be monitoring
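If you have to do this on more than one gateway, it can help to write the change down before touching the registry. Below is a small Python sketch that only builds the key path and the two value changes as a plan; it does not read or write the registry. The function names and the example management group ID and server name are made up, and the value name AuthenticationName is my assumption of the correct spelling.

```python
BASE = r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Server Management Groups"


def parent_health_service_key(management_group_id, index=0):
    """Build the registry path of a parent health service entry."""
    return "\\".join([BASE, management_group_id, "Parent Health Services", str(index)])


def repoint_plan(management_group_id, new_management_server):
    """List the (key, value name, new data) changes for the workaround.

    Nothing is written; this is only a checklist to apply by hand while the
    System Center Management service on the gateway is stopped.
    """
    key = parent_health_service_key(management_group_id)
    return [
        (key, "AuthenticationName", new_management_server),  # assumed spelling
        (key, "NetworkName", new_management_server),
    ]


if __name__ == "__main__":
    # Hypothetical management group ID and server name.
    for key, name, data in repoint_plan("MymgmtgroupID", "ms02.contoso.local"):
        print(f"{key}\n  {name} = {data}")
```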
If you like you can use the same steps to change the entries back to the management server you wanted it to talk to. In my case this worked without complaining and it kept monitoring and communicating through the management server that originally refused connection.
This worked for me two times now. Hope it helps you too if this happens to you. I know that the first time I have been looking for quite some time and the second time when remembering the workaround it was just minutes.
ReportServer application pool stopped with unspecified error and keeps crashing giving 1057 and 1059 errors
Today I ran into a SCOM Report Server that would not load correctly on a Windows 2003 server with SQL 2005.
This also gives an error in SCOM console as follows:
Data Warehouse failed to request a list of management packs from SQL RS server..
And in the description a reference to:
The request failed with HTTP status 503: Service Unavailable.
When connecting to http://reportserver/reports it just displays a Service Unavailable message.
We saw quickly enough that the ReportServer application pool on the report server was not running. When starting the application pool from the IIS manager it starts correctly. However, when the first connection was made to http://reportserver/reports or /reportserver we got an error and the application pool had been stopped. The application pool showed as Stopped with an unspecified error when looking at it from the IIS manager.
We got the following two errors in the event viewer on the report server in the System log.
We tried to reset the password on the application pool. Did not work. We tried to set the account back to network service in the Report Server configuration tool and after a restart of the reportserver set it back. We confirmed we were using the right password.
When looking through the local rights I noticed the account was not in the IIS_WPG group on that machine.
Added the account we were running the application pool as to that group.
Got error again.
Did an IISRESET from the command line.
So this solved my issue and we could move on to the next step. The SCOM alert auto closed right after this. Good times!
By the way:
I did find a link to a solution in the case you are running many application pools on one web server (about 60 or more). It tells you to create a regkey UseSharedWPDesktop and set it as stated in this document:
This did not apply to our environment as it was a dedicated machine.
I have seen this one in the forums as well once and have had this happen to myself a few weeks ago, but forgot to post about it as I would get back to the topic later. So now is the time.
What is the situation?
I was creating SCOM reports from the SCOM Reporting pane in a development environment. These were simple reports based on the generic reports included. Because I had to move the reports to two other environments later, I decided to use the option to save the report in a Management Pack right from the console. I used a separate clean management pack for this, so we can update and upgrade this MP easily. Works great! Until I moved it into another environment, tried to run the report and found out it did not want to run. What happens is that the report does not want to run because some fields have not been defined (like aggregation for hourly or daily in this case), but those fields are greyed out! So there is no way to change those fields in the console.
What does the error look like?
We can see that certain fields are grey in the report parameters and that it blocks you from continuing to run the report. You see the two grey areas pointed out in the picture by blue arrows?
The error we get when trying the Run button is:
A valid value for parameter ‘Data Aggregation’ must be specified.
Where did this come from?
It was time to investigate. As at first I used a dedicated management pack and had one report in it, it was best to open the XML itself and check what was there.
Soon I saw there is a reference to the management group there. This reference will not work in another environment! The reference to the group I was targeting did not seem to be a problem, because I assumed (as a prerequisite) that those groups would exist in the other management group as well. This is because I used groups that come from sealed MPs from Microsoft (like All Windows Computers and such groups).
Alright so in the XML I see the report and there are a few parameters included there. One of those was the following:
In this case it was as easy as removing this specific parameter from the XML and loading it into SCOM again.
Once this was done, and after refreshing the reporting pane in SCOM a few minutes later, the report just ran without input when opened (those two fields that were greyed out were actually already defined, but didn’t get picked up due to the problem with the reference to the ManagementGroupId).
If you ever run into this, just check the XML to see if there is anything specific blocking your run. The reference to the management group ID was the most likely cause and turned out to be the one that caused the issue.
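If you would rather script the edit than do it by hand, here is a minimal Python sketch using the standard library XML parser. The XML layout in the example (a Parameters element holding Parameter elements with a Name attribute) is a simplified assumption for illustration and not a copy of a real management pack, so check it against your own exported XML first.

```python
import xml.etree.ElementTree as ET


def remove_parameter(xml_text, name="ManagementGroupId"):
    """Remove every <Parameter Name="..."> element with the given name.

    Returns the new XML text and the number of elements removed.
    The element and attribute names are assumptions about the layout.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Take a snapshot of the elements so removing children is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Parameter" and child.get("Name") == name:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    # Hypothetical, simplified report definition.
    sample = """
    <LinkedReport ID="My.Custom.Report">
      <Parameters>
        <Parameter Name="ManagementGroupId"><Value>placeholder</Value></Parameter>
        <Parameter Name="DataAggregation"><Value>1</Value></Parameter>
      </Parameters>
    </LinkedReport>
    """
    cleaned, count = remove_parameter(sample)
    print(count, "parameter(s) removed")
    print(cleaned)
```

Keep a copy of the original XML before importing the edited management pack, so you can compare if the report still refuses to run.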
This is a continuation of my notes on the fourth SCOM 2012 CEP meeting on Dashboards. The first post on this can be found here.
Ake Pettersson followed with demonstrations of out of the box dashboards that are upcoming or already there.
Three main out of the box dashboards:
- network monitoring dashboards
- management group health dashboard
- service level dashboard
The SCOM team is working with 16 internal product groups to extend the existing MP’s with dashboards. They will be aiming for consistency in UI Design and Styling best practices, also by using templates. Those dashboards will come with management packs (through the catalog).
There will be a top-n widget. Will be very useful in many views.
In a concept picture for a SQL dashboard there was a sample of this. Actually I will post screens of these samples right here. Keep in mind these are just samples and not the real thing yet. Things could look a lot different after these are actually created. Nice to see work is being done in that area.
Next was a demo of a Network summary dashboard. This shows the devices with highest load on top, so it can sort and the ones that need the most attention float to the top of the lists.
The vicinity view is shown for the network node dashboard. Vicinity view is able to also show computers connected to the network devices. This is a long requested view.
The average availability for last day/week/month is shown for a device and some performance graphs. Graphically there is a lot more to see now which is good.
Ake shows the SCOM Management Group Health dashboard, pointing to Resource Pools, SCOM Infrastructure and some active alerts that relate to SCOM itself, Agent configuration and agent versions. Here is another picture to illustrate part of that view:
There will be things added and changed on this dashboard as well in the coming weeks.
Management group health trend dashboard. Shows trends in for instance active alerts over time and agent health states over time. Good to see alert storms during the past few hours or days and also to see how many grey agents you are having at the moment and keep track of those trends.
Some points from the questions session:
- For the SharePoint web parts, read through the documentation very well when installing.
- It is verified that the SLA dashboard does not have the former limitation of 6 objects per SLA and/or SLO.
The next meeting will be on 4 October, same time same place. This one will be about Application Monitoring (so Avicode integration in SCOM 2012). Looking forward to it!
These are some of my notes from the fourth SCOM 2012 CEP meeting on Dashboards. As these are quick notes it is not a beautifully formatted story, but it highlights some things that are worth mentioning. So let’s go with the one-liners. I threw in some screenshots as well, hope you don't mind.
This time the presenters were Dale Koetke and Ake Pettersson.
To start off Dale Koetke explained what has happened on the dashboarding front.
Dashboards will be more efficient through a dashboard framework. This can be extended for custom management packs.
Dashboards are available directly through SharePoint to make it easier accessible to non-operator users.
The UI architecture and console architecture can be best shown through pictures I think, so here are screens of those two slides that explain how the consoles are set up:
Three consoles - same dashboard. So Windows GUI + Web console + SharePoint dashboard will show the same experience. Of course this is for the monitoring pane. Administration is still done from the Windows GUI.
So dashboards framework consists of a few layers. I think again a picture would better explain:
So the dashboard host holds a dashboard template. The template hosts a number of widgets. For instance you can have a three-part template to show three widgets. A widget is a UI control (chart controls for instance) plus a data provider (to get data from the database).
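Since the slide is hard to reproduce here, this is a rough sketch of those layers in Python. It only mirrors the host, template and widget relationship from my notes; the class and field names are made up and have nothing to do with the real SCOM dashboard framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Widget:
    """A widget pairs a data provider with a UI control."""
    name: str
    data_provider: Callable[[], list]   # fetches the data (from the database)
    ui_control: Callable[[list], str]   # turns the data into something visible

    def render(self) -> str:
        return self.ui_control(self.data_provider())


@dataclass
class DashboardTemplate:
    """A template defines the layout: how many widgets it can hold."""
    slots: int
    widgets: List[Widget] = field(default_factory=list)

    def add(self, widget: Widget) -> None:
        if len(self.widgets) >= self.slots:
            raise ValueError("template has no free slot left")
        self.widgets.append(widget)


@dataclass
class DashboardHost:
    """The host shows whatever template it is given."""
    template: DashboardTemplate

    def render(self) -> List[str]:
        return [widget.render() for widget in self.template.widgets]


if __name__ == "__main__":
    template = DashboardTemplate(slots=3)
    template.add(Widget("alerts", lambda: [4, 7, 2], lambda d: f"alerts chart: {d}"))
    template.add(Widget("agents", lambda: [120], lambda d: f"agent count: {d[0]}"))
    print(DashboardHost(template).render())
```

Because the host only knows about the template, the same dashboard can be shown by different hosts, which matches the "three consoles, same dashboard" point above.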
Out of the box dashboards will come from the authors of management packs. Some will be coming soon, such as dashboards for SQL, Hyper-V etc.
Custom dashboards can be created by yourself. Select KPIs, specify the scope (an application for instance) and tune the visualization. Target different dashboards to sets of users. Choose where you want to store the dashboard, select a template (which specifies the layout) and add widgets.
The demos were shown in a build of the upcoming RC. Good to get used to the layout and colors.
In alert views and state views it is now possible to select objects to scope the alert view to. This is new functionality. No need to create groups anymore just for scoping views. This is a very good thing actually.
It is possible to create dashboards and add widgets from the web console now. This is a good thing to see that the web client is getting more functional.
Possible to personalize views through the web console as well.
There was a question if personalization would be carried to the dashboard in SharePoint as well. It depends on how the SharePoint web part is configured if those follow through to SharePoint. If it uses the current user credentials option it will carry them forward. If it is configured for shared credentials the personalizations are not following through.
In this release there is no support for extensibility in widgets for third parties. A few of the participants would have loved to see the possibility to use maps for instance, or if some third party would create them for us (for instance like Savision Live Maps, google-like maps and so on). However creating widgets by others is not supported.
This post will be continued in a second part. You can find it here.