Alright -> I have my own test environment in another part of the customer's network and I also have a few Red Hat machines in there, with everything working! So let's try the same kind of command there. Guess what? Error! But a different error!
Apparently the winrm command cannot handle an ampersand (&) in the password. Getting impatient, we changed the password of that account on the Linux box (and in the winrm command of course) and tried again. Bingo. The output, by the way, looks a bit like this (it is longer, I just left out a lot of additional lines with info and I changed the server name):
Back to the other test environment again. Time to ask the man who manages the ISA firewall to check if he was seeing traffic passing through. Yes, he did see some traffic on one of the firewalls, where it did not belong. Remember this is a network with several possible exits (gateways, firewalls). He told us the traffic was not going to the default gateway, but towards the wrong ISA server.
Alright, add the IP addresses and FQDNs of the remote Red Hat machines to the proxy configuration in Internet Explorer and run the NetSH import command again.
Now the command got a connection, and we saw the traffic in tcpdump as well.
Networking networking networking. Told you so!
In the meantime we had already manually removed the installed agent from the Red Hat box and removed the signed certificate. We were actually installing on two machines at this point – an RH4 and an RH5 box. On the RH5 box we had just made the changes to the scoma account to enable it to su and sudo. At this point there was a discussion whether the Red Hat box needed to have a full DNS name. Of course it does, as the discovery wizard insists on using it and checks the certificates on that basis. But to prove it we just went ahead and tried to push to both machines.
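For reference, removing the agent by hand came down to something like this on the Red Hat box (the package name and paths are from memory for the SCX agent, so double check them on your own system before deleting anything):

# remove the cross platform (SCX) agent package
rpm -e scx
# clean up leftover configuration and data, including the old certificate
rm -rf /etc/opt/microsoft/scx /var/opt/microsoft/scx /opt/microsoft/scx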
Now the discovery wizard run did progress. Of course the agent was not installed (anymore), so this time it was right in wanting to continue installing the agent. It installed the agent and validated it. The next step did say it wanted to sign the certificate for the RH4 server, but it did not want to talk about the RH5 server, as that one was not using an FQDN in its self-signed certificate.
If you want to check this yourself, you can do the following: copy the self-signed .pem certificate file from the cross platform machine to the management server and rename it to a .cer file; you can then open it like any other certificate and inspect it. In this case it was obvious that the RH5 machine did not use an FQDN for itself, but a short name. By the way, after cross signing between the servers you can also use this trick to see what this double signing is about.
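To make that concrete: the agent certificate usually lives under /etc/opt/microsoft/scx/ssl/ on the cross platform machine (the exact file name can differ per version). After copying it to the management server with WinSCP or pscp you can do something like:

rem rename the copied certificate and dump its contents
ren scx.pem scx.cer
certutil -dump scx.cer

The Subject/CN in the output shows you whether an FQDN or a short name was used.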
So we needed to give the RH5 box a full DNS name. We went to “/etc/sysconfig/network-scripts/ifcfg-eth0”, entered a domain name there and restarted the server.
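I do not have the exact lines anymore, but it was something along these lines (the domain name here is made up, and depending on the Red Hat version you may set the FQDN in /etc/sysconfig/network and /etc/hosts instead):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DOMAIN="customer.local"
# /etc/sysconfig/network
HOSTNAME=rh5box.customer.local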
We also deleted the rpm again along with the remaining files (including the wrong certificate file) and restarted the discovery wizard. This worked and we had the server in SCOM! So this confirmed that an FQDN is needed.
- Networking is very important. Routing, firewalls, perhaps proxy settings or the lack thereof – whatever it takes, it has to work. We have to be able to connect first.
- DNS is important. Make sure name resolution works and that the cross plat machine also has an FQDN. Either the machines cannot find each other or the certificate process will break if this is not right.
- Use the tools if something is going wrong: DebugView and EnableOpsmgrModuleLogging.
- Use PuTTY, for instance, to connect in order to check the logon and elevation process.
- Prerequisite software on cross platform machines. Make sure you have it covered.
- Use the latest update (especially cumulative update) for the cross platform components and the latest management packs and make sure they are consistent for all management servers where the components are used.
- The winrm command can also help you troubleshoot, although it can create new issues of its own, like formatting of the command after copy/paste actions and characters in the password that it does not like.
So, I think we have touched every part of the diagram so graciously provided by Robert Hearn in some way. Please make sure to check out Robert's troubleshooting series as well (already linked a few times in here).
Happy cross platform monitoring!
This is a continuation of part 1 of this blog post.
So on to the troubleshooting! First enable the logging as discussed in a previous post: http://www.bictt.com/blogs/bictt.php/2010/02/22/scom-discovery-wizard-error-while-deploying-redhat-agent by creating the EnableOpsmgrModuleLogging file in C:\Windows\Temp and downloading DebugView (a tool from Sysinternals). Also check out this page for tooling to use for troubleshooting: http://blogs.msdn.com/b/scxplat/archive/2010/06/10/troubleshooting-cross-platform-discovery-and-agent-installation-part-1.aspx ; Rob Hearn does a very good job of explaining things there as well! Unfortunately DebugView and the additional logging did not always give enough information, and we had to use some additional tricks and the logic provided in the next section. But if you run into any issues, please use these first, as they always show you what is happening in the background so you can follow the steps and errors.
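Creating that logging switch is nothing more than an empty file without an extension; from a command prompt on the management server something like this will do (assuming C:\Windows\Temp is indeed the temp directory used by the account running the discovery):

rem create the empty flag file that switches on the extra module logging
type nul > C:\Windows\Temp\EnableOpsmgrModuleLogging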
So at this point it is necessary to use a diagram in order to understand the discovery process better. I will use the diagram posted by my friend Robert Hearn on his blog http://blogs.msdn.com/b/scxplat/archive/2010/06/10/troubleshooting-cross-platform-discovery-and-agent-installation-part-1.aspx. I got permission to use it on this blog for explanations.
So moving through the flow chart:
- Get list of supported agents and agent packages -> check; we had run the cross plat CU2, checked the files were there, and imported the latest versions of the management packs for the operating systems we wanted to monitor.
- IP DNS resolving –> check; forced through hosts files on both sides
- Connect to CIM provider -> Aha, so there was our first red flag. Remember it found that the agent needed to be installed? That means this step had a problem. We will test this using the winrm command. But let's continue through the diagram the way we were following the procedure until now.
- Next it will try to push the getosversion.sh file to the other server and run it. If this goes wrong you would see errors as mentioned in a previous post (for instance if sftp is not enabled in the ssh config): http://www.bictt.com/blogs/bictt.php/2010/02/22/scom-discovery-wizard-error-while-deploying-redhat-agent . If this has run successfully you will see the version of the discovered operating system mentioned in the discovery wizard, and if it knows it has an appropriate agent installer for it, it will tell you it can install and discover for you. We had this message.
- The rpm file will be deployed, installed and validated. At the end of this step we seemed to get an error. It is hard to say if it was the end of this step or the beginning of the next step of course, but we expected it to be the end of this step. So what does it do to validate the installation? Perhaps it runs a query? If so, that is a clue, and we suspected it was linked to the connect to CIM provider step before it, which went wrong as well.
- After this step it will check for a certificate, whether it is signed and whether the correct names are used, and if not -> it will try to sign it. Well, we did not get to that stage yet. We ran into this one as well later. Hold on, we will get to that. We did get the error that the name on the certificate does not match the name of the machine. The FQDN is important here, as we will see later.
- Next it would try to establish the agent version installed on the remote machine. If it is OK the discovery wizard will accept the agent.
Alright, let's get back to the connect to CIM provider step. There is a way to use winrm to launch a query at the remote machine and check what the answer is.
So I went out and found some examples of queries to run. This one asks for the operating system version and sends the answer back in XML format (remember to use an elevated command prompt in Windows 2008):
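I no longer have the exact command we used, but it was along these lines (server name, account and password are placeholders; the class comes from the SCX namespace on the agent):

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:scoma -password:YourPasswordHere -r:https://rh4box.customer.local:1270/wsman -auth:basic -skipCACheck -skipCNCheck -encoding:utf-8 -format:pretty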
So what this seems to do is connect to schemas.microsoft.com to get the definition of the query. It executes this against the Red Hat machine across a secured connection (https) on port 1270 and uses the username and password combination of the scoma account in this case. It also skips the certificate check (as we know we do not have a counter-signed and trusted certificate yet), and we force it into UTF-8 encoding and pretty XML formatting (easier for us humans to read).
At first we had some problems with this command and almost gave up. It complained about the format of the winrm command and said it was not correct. In the end we decided to type the command manually instead of copy/pasting it, and that worked! Now we got the following error:
So the start of the error is what we already got before in the validating step of the agent install: Unable to parse XML.
At first I did not see it, but looking more closely, this is just an HTML error page with formatting. So we were running a command that seems to connect to two locations (Microsoft and the remote agent). One of the two was going wrong.
So we went to the Linux box to check what was talking to port 1270 on that box. In our case: go into privileged mode, run “tcpdump port 1270” and see what happens.
Ran the command again -> nothing in the tcpdump!
So it is not talking to the Microsoft site? Strange, we can surf in Internet Explorer when using proxy settings. Wait a second – the command prompt might not use these IE settings. Alright, let's pick up these settings with proxycfg. Hmmm, it doesn't know that command. Ahh right, Windows 2008 box -> “NetSH WinHTTP import Proxy ie” (for more info, see Rob's blog post at http://wmug.co.uk/blogs/r0b/archive/2010/01/08/proxycfg-on-vista-and-win2008.aspx ).
Try again -> failed again.
Last week I was working with a customer who was doing a proof of concept for cross platform monitoring and some other SCOM functions. They were using an existing SCOM test environment and wanted to add a few machines for cross platform monitoring. However, these machines were not in the same network as the SCOM machines. Because of this setup we already suspected that we might be testing networking more than testing SCOM. And this turned out to be the case. We got some strange errors that we had not seen before. I will try to cover some of the things we found. The story is not completely in chronological order. Also, as it has turned into a long story, I will split it into three parts.
So from experience we know a few things are important when looking at cross platform monitoring:
- DNS name resolving, both directions
- Certificates (mostly in relationship to DNS)
- Accounts with rights on the cross platform system; make sure they can log on and do the same things the discovery and monitoring wizards do (see the quick check after this list)
- Use the latest SCOM cross plat CU update and management packs and make sure you are using the right installers when manually deploying agents
- Make sure SSH and sftp work
- Pre-requisite software on cross platform machines
- Define runas accounts and place them in the runas profiles
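The quick check for the account point above is simply to log on as the action account from any ssh client and try to elevate, for example (server name made up):

# log on as the privileged action account and try to become root
ssh scoma@rh4box.customer.local
su - root        # or, depending on how the account is set up: sudo -i

If that fails interactively, the discovery and monitoring wizards will not do any better.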
So the first thing we had to get right was networking. In our case the SCOM test environment was separated from other networks by several firewall/routing devices, and the cross platform machines we were to get access to were located in networks at least a few hops away. So routing and firewall ports were important. We were promised a few Red Hat machines, an AIX box and an HP-UX machine, all located in different networks. Actually this did reflect reality for this company, as it is a service provider monitoring several customers without direct network contact and trusts.
First of all, of course, make sure your routing works the right way. Second, firewall ports need to be opened between the machines. In our case TCP 1270 and TCP 22 had to be opened. This was difficult at first, as port 22 was initially refused by the security team.
From the SCOM management server(s) we tried to do a telnet to both ports to check if they were accessible. In Windows 2008 you would first need to install the feature “Telnet Client” if you want to use telnet to troubleshoot connections. We will go into further connection testing later in this article.
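On Windows 2008 that boils down to something like this (server name made up); a blank screen means the port is open, a time-out or refusal means it is not:

rem install the Telnet Client feature, then test both ports
servermanagercmd -install Telnet-Client
telnet rh4box.customer.local 1270
telnet rh4box.customer.local 22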
Because it took some time to get port 22 open, we started out on one of the Red Hat machines by manually installing the cross platform agent for Red Hat (check that you are using the right one: version of the OS, version of the agent and type of architecture). At first the wrong version of the agent installer file was used, as the latest SCOM cross plat CU2 was not installed on the SCOM management servers yet. After installing that update on the SCOM servers we could pick up that version of the agent (258 in this case) and move it to the Red Hat machine. Always fun if there is no direct connectivity and port 22 is still closed.
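The manual installation itself is just an rpm install on the Red Hat box; with the CU2 bits it was something like this (the exact file and service names depend on the OS version, architecture and agent build, so check yours):

# install the cross platform agent package copied over earlier
rpm -ivh scx-1.0.4-258.rhel.4.x86.rpm
# check that the agent service came up
service scx-cimd status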
For purposes of the POC we requested two accounts to be set up on all cross plat machines:
- Scoma -> privileged account
- Scomv -> normal account
We also requested these to have the same password for both accounts on all of the test machines. In normal circumstances this will probably not be the case.
On the SCOM side of things, make sure you define these accounts as RunAs Accounts. Their type is “Basic Authentication”. For Distribution choose More Secure and enter the SCOM Management Servers that you require to talk to the cross platform agents.
Next in the RunAs Profiles you can find the Unix Action Account and Unix Privileged Account and link the previously defined accounts to the target objects you want to use. In this case as all targets use the same accounts we could just leave the default of “All targeted Objects”.
So, we were ready to start with the manual installation on the Red Hat box. The system admin installed the rpm again with the latest version and checked that the new service was running. The next thing was to counter-sign the certificate. As we did not have SSH opened on the firewalls yet, we opted to do a manual signing of the certificate. This procedure is in the documentation and was also discussed before on this blog: http://www.bictt.com/blogs/bictt.php/2009/09/30/scom-agent-on-sun-solaris . Counter signing was easy; the certificate file was brought back to the Red Hat machine, it replaced the existing self-signed certificate, and the agent was restarted. If the installation of the agent does not work, please re-check the prerequisites: http://technet.microsoft.com/en-us/library/dd789030.aspx . Also check if Linux Standard Base is installed; check out a post from David Allen here: http://wmug.co.uk/blogs/aquilaweb/archive/2009/09/02/more-opsmgr-x-plat-notes.aspx . We got an error relating to this on one of the machines as well, and as it stated something about a directory or file not found with /lsb/ in the path, I remembered David's post and we fixed that one.
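The manual signing boils down to roughly these steps (file names here are placeholders; the exact paths and syntax are in the documentation and the post linked above):

rem on the management server, from the Operations Manager install directory
scxcertconfig -sign scx-host-rh4box.pem scx-host-rh4box-signed.pem

# back on the Red Hat box: replace the certificate and restart the agent
cp scx-host-rh4box-signed.pem /etc/opt/microsoft/scx/ssl/scx.pem
scxadmin -restart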
So at this point we did a telnet to 1270 from the SCOM server to the Red Hat machine. This worked (we got an empty screen, so good enough as an answer in this case).
Name resolution is also an important point with cross platform monitoring. There are a few reasons that might be obvious, but one of the important things is that the certificate and certificate signing (in combination with the discovery wizard) use the fully qualified domain name! In this case we had to manually point the machines towards each other in the hosts file (Windows) and the /etc/hosts file (Linux).
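So both hosts files got entries along these lines (IP addresses and names are made up):

# C:\Windows\System32\drivers\etc\hosts on the management server
10.0.1.20    rh4box.customer.local    rh4box
# /etc/hosts on the Red Hat box
10.0.2.10    scomms01.customer.local  scomms01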
So now we could run the Discovery wizard. While running it we tried several options (SSH discovery only works when SSH is open on the firewalls) and only got error messages back.
Right after this we got access to the SSH port to the Red Hat boxes.
So we ran the discovery wizard again.
This time we got a bit further and the wizard told us that it needed to install and discover the agent (and that it found it to be a Red Hat 4 machine, which was correct). This actually raised a bit of a red flag, as the agent was already installed and the certificate was already cross signed as well! We will come back to this error in a minute.
But since we got this option we thought: just install the agent through the wizard and see what happens. We got errors again that led to Access Denied. So we checked again on the Red Hat box and sure enough, although the scoma account was privileged, it had trouble doing a “su - root”. To get this working the scoma account (privileged) had to be added to the wheel group to get admin rights. This enabled the account to use sudo. Also, we had to run “chmod +s /bin/su” to make sure the users in the wheel group can execute su (look at this page for further information on setting permissions on files: http://www.comptechdoc.org/os/linux/usersguide/linux_ugfilesp.html ).
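For reference, the changes on the Red Hat box came down to something like this (run as root; the sudoers change is done with visudo):

# add the privileged action account to the wheel group
gpasswd -a scoma wheel
# in visudo, uncomment the wheel line so the group may use sudo:
#   %wheel  ALL=(ALL)  ALL
# and the setuid bit on su, as mentioned above
chmod +s /bin/su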
Run the discovery wizard again! It found a Red Hat 4 box again and wanted to install the agent. Why didn't it find the already installed agent? Press on anyway! After pressing the deploy button we saw in the status field that it went from Deploying (sending the rpm file to the machine) to Installing (running the rpm file) to Validating (running checks and moving on to the certificate checking part). Somewhere in the Validating phase, or before getting to the next step, we got an error:
WinRM cannot process the request because the input XML contains an invalid attribute or element name.
Trying to find out what that meant did not give back any usable results unfortunately. Also, on the forums I could not find anything pointing in that direction. This brought us back to our initial thought -> we are testing networking here, not agent deployment. We just had to find out what was going on.
On one of the SCOM management servers we ran Windows Update and checked for anything that could help. Interesting -> PowerShell 2 with WinRM 2 was available as an update. So we installed that one on one of the machines and guess what? The discovery wizard gave another error -> simply that the Microsoft.Unix.DiscoveryScript.Discovery.Task went wrong. It seems we were getting less information now. But as the step took some time it could also have been a time-out, as this task has a time-out of 20 seconds.
Microsoft has made some changes to the management pack catalog on Pinpoint. It was a bit difficult to navigate and to find certain management packs. It has improved a lot now.
Check it out:
Enjoy managing everything!
Just wanted to share a few links to new KB articles that have been released in the last few days (weeks actually).
The System Center Operations Manager 2007 web console fails to open with "Error: Could not load file or assembly 'Microsoft.ReportViewer.WebForms, Version=18.104.22.168" - http://support.microsoft.com/kb/2010168
The monitoring of SNMP devices may stop intermittently in System Center Operations Manager or in System Center Essentials - http://support.microsoft.com/kb/982501
Attempts to open the System Center Operations Manager 2007 console fail with "The client has been disconnected from the server" exception - http://support.microsoft.com/kb/2262476
How to monitor for Opalis Integration Server Platform Events - http://support.microsoft.com/kb/2269622
From the Microsoft Deployment Toolkit Team, good news on an update to MDT 2010:
Deploy Windows 7 and Office 2010 quickly and reliably—while boosting user satisfaction
Microsoft® Deployment Toolkit (MDT) 2010 Update 1 is now available! Download MDT 2010 Update 1 at: http://go.microsoft.com/fwlink/?LinkId=159061
As you prepare to deploy Windows® 7, Office 2010, and Windows Server® 2008 R2, get a jump start with MDT 2010 Update 1. Use this Solution Accelerator to achieve efficient, cost-effective deployment of Windows 7, Office 2010, and Windows Server 2008 R2.
This latest release offers something for everyone. Benefits include:
For System Center Configuration Manager 2007 customers:
New “User Driven Installation” deployment method. An easy-to-use UDI Wizard allows users to initiate and customize operating system and application deployments to their PCs that are tailored to their individual needs.
Support for Configuration Manager R3 “Prestaged Media.” For those deploying Windows 7 and Office 2010 along with new PCs, a custom operating system image can easily be preloaded and then customized once deployed.
For Lite Touch Installation:
Support for Office 2010. Easily configure Office 2010 installation and deployment settings through the Deployment Workbench and integration with the Office Customization Tool.
Improved driver import process. All drivers are inspected during the import process to accurately determine what platforms they really support, avoiding common inaccuracies that can cause deployment issues.
For all existing customers:
A smooth and simple upgrade process. Installing MDT 2010 Update 1 will preserve your existing MDT configuration, with simple wizards to upgrade existing deployment shares and Configuration Manager installations.
Many small enhancements and bug fixes. Made in direct response to feedback received from customers and partners all around the world, MDT 2010 Update 1 is an indispensable upgrade for those currently using MDT (as well as a great starting point for those just starting).
Continued support for older products. MDT 2010 Update 1 still supports deployment of Windows XP, Windows Server 2003, Windows Vista®, Windows Server 2008, and Office 2007, for those customers who need to be able to support these products during the deployment of Windows 7 and Office 2010.
Download Microsoft Deployment Toolkit 2010: http://go.microsoft.com/fwlink/?LinkId=159061.
Learn more by visiting the MDT site on Microsoft TechNet: www.microsoft.com/mdt.
Get the latest news by visiting the Microsoft Deployment Toolkit Team blog: http://blogs.technet.com/msdeployment/default.aspx.
Provide us with feedback at firstname.lastname@example.org.
If you have used a Solution Accelerator within your organization, please share your experience with us by completing this short survey: http://go.microsoft.com/fwlink/?LinkID=132579.
Microsoft Deployment Toolkit Team
Saw a post from Cory Delamarter from a few days ago about the release of a new version of the OpsMgr 2007 R2 MP. They released version 6.1.7672.0.
Please refer to that post for a number of changes implemented in this version of the MP.
Thanks to Cory and the team for releasing quarterly updates to pick up issues and add features/views/monitors/reports that are very useful!
My good friend Walter Eikenboom just wrote a post about sending SCOM alerts to Twitter.
There are of course some advantages and disadvantages, but it is just very cool and quite easy to set up! You could say it is an alternative to SMS subscriptions.
Nice going Wally!!
So a few days ago the TMG team released Forefront TMG Service Pack 1. You can read about it here:
Finally got around to upgrading my DPM 2010 RC to the RTM version, so I thought I would quickly run through it on the blog as well. I guess I can do this one without screenshots this time, as the steps should be clear enough.
Started out with the DPM 2010 upgrade advisor you can find at http://download.microsoft.com/download/F/F/3/FF3347F5-C076-400C-A77A-B6FFA0EA56A4/DPM%20Upgrade%20Advisor.xls
My situation is DPM 2010 RC to DPM 2010 RTM on Windows 2008 R2 with a local database and no localization or other special options. The upgrade advisor gave me the following todo list:
1. Close DPM administrator console and DPM management shell if opened.
2. Launch DPM 2010 RTM Retail setup and proceed by clicking on Install DPM
3. Complete the installation wizard and restart the computer to complete the upgrade (if prompted).
4. Upgrade agents on production servers
5. Run consistency check for all the protected Datasources
6. Uninstall the DPM 2010 RC SQL instance (optional) if there are no issues after the upgrade. However, if you want to be able to go back to DPM 2010 RC, its DPMDB is required.
Looks like the plan I had in my head, so that is good.
When running the installer it could already see that this was an upgrade. I walked through the steps of the wizard; simple enough. It runs some prerequisite checks, asks where to install the SQL database and whether you want to use another SQL instance. It will ask for a strong password to run some services. It will install SQL 2008 SP1 and DPM 2010, so you will have to wait a little for that to finish.
The upgrade was successful. I got new icons on the desktop and in the program itself, so that is nice.
So the next step is to upgrade the agents. Opened the DPM console and went to agent management. Upgraded most agents from there. I know the few workstations are turned off at the moment and the Forefront TMG server is blocking the upgrade because of its firewall. No big issues there, I can fix that part. Most agents now have version 3.0.7696.0.
So next is running the consistency checks on the protection groups. That takes a while.
Meanwhile I tried to uninstall the old SQL instances. Hmmm, an error! It seemed to have a problem accessing one file, and because of that it did not want to uninstall the Report Server feature of either old SQL instance (yeah, I also still had the DPM Beta instance there). It did remove the database engine for both, though. The security settings on both files it had a problem with seemed to be fine, so I tried the old troubleshooting technique of the impatient and successful system admin … Reboot!
After the reboot I was able to remove both old SQL Reporting Services instances. Told you!
Now continue to let it check anything that is not consistent and fix where needed.
Looks like a good update!