Tuesday, January 17, 2012

SCOM Zombie PSA#30: Event ID 21042

 

A few weeks ago someone (aka zombie) thought they were manually uninstalling a SCOM agent from a managed server. What they didn't know was that they were NOT on the server they thought they were on; they were on the Root Management Server. What came next? They uninstalled the SCOM software from the Root Management Server. I knew this because I was on that server troubleshooting bad agents when everything closed out on me. I immediately checked my event logs to see why, and the Operations Manager event log was GONE. I know, RIGHT! (aka Zombie Apocalypse)

But luck was on our side: we were able to reinstall SCOM and apply the backup keys. Everything was going well, or so we thought.

About 24 hours later, the Operations Manager event log on the Root Management Server was filling up with the following event. I mean nothing but this event; it was logging several of them every second.

We could not find much information out there on this event ID.


Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21042
Computer: RMS.FQDN
Description:
Operations Manager has discarded 1 item in management group <Management Group Name>, which came from $$ROOT$$. These items have been discarded because no valid route exists at this time. This can happen when devices are added to the topology but the completed topology has not been distributed yet. The discarded items will be regenerated.


Funny thing was, each of the management servers that had agents reporting to them was being flooded with these events:


Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20000
Computer: RMS.FQDN
Description:
A device which is not part of this management group has attempted to access this Health Service. Requesting Device Name: ServerName.FQDN


The quick fix was to "re-enter" each of the Run-As accounts manually. Once this was done, the 21042 events went away. However, many of the event ID 20000 events did not clear up. For some of the agents, it took a manual process of stopping the agent service, deleting the Health Service State folder, and restarting the service before they would communicate again.

In my lab I was able to re-create the issue and correct it this way. I don't fully understand all the details just yet, but I'm working on that as I find more time for it.
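The manual cleanup on the stuck agents (stop the service, delete Health Service State, start the service) can be scripted. This is a minimal Python sketch, not the exact procedure from the incident: it assumes the agent runs as the Windows service `HealthService`, and `DEFAULT_STATE_DIR` is a hypothetical path you should check against your own agent install. The `run` parameter is there only so the service commands can be swapped out for a dry run.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List

# Assumption: the SCOM agent runs as the Windows service "HealthService"
# (display name "System Center Management" on SCOM 2007 R2 agents).
SERVICE_NAME = "HealthService"

# Hypothetical default location; the real folder depends on the agent
# version and install path, so confirm it on the server first.
DEFAULT_STATE_DIR = Path(
    r"C:\Program Files\System Center Operations Manager 2007\Health Service State"
)


def run_command(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def reset_health_service_state(
    state_dir: Path = DEFAULT_STATE_DIR,
    service: str = SERVICE_NAME,
    run: Callable[[List[str]], None] = run_command,
) -> None:
    """Stop the agent service, delete its cached state, and start it again."""
    # 1. Stop the agent so nothing in the state folder is held open.
    run(["net", "stop", service])
    try:
        # 2. Delete the cached Health Service State folder; the agent
        #    recreates it when the service starts again.
        if state_dir.exists():
            shutil.rmtree(state_dir)
    finally:
        # 3. Always try to bring the service back up, even if the delete failed.
        run(["net", "start", service])
```

On the affected agent, from an elevated prompt, call `reset_health_service_state()` (or pass the real folder as `state_dir`). Deleting the folder throws away the agent's cached configuration, so give it some time to pull its configuration back down from its management server before judging whether it is healthy.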

SCOM Zombie PSA#29: Alerts in console not showing up

 

This caught me off guard just now. When I opened my web console, there were a couple of alerts, but when I opened my Operations Console, no alerts were displayed. Funny, since it stated 8 active alerts.


The quick fix: my Alert Details pane had been dragged all the way up, so I just dragged it back down to expose my alerts.
