You need (at least) two Enterprise Manager installations

So last time round I ended with a postscript that this article would be coming, and here it is… a lot later than I intended. I was waiting for EM13c to ship so I could introduce something it’s delivering into the discussion and, well, I work for a development organisation. Time can be a relative thing…

In my previous post I put forward several strong reasons for “One EM to rule them all”. It’s a general design principle that I use to design large-scale EM implementations, but I’m going to expand on the definition of ‘one’ a little… bear with me, it’ll all make sense at the end (maybe).

When I talk about “one” EM here, what I actually mean is one “Operational, Production EM”. Alongside this I think there’s a case to be made for potentially two additional EM installations: one that provides high availability (and optionally disaster protection) against a failure of the Production EM configuration, and one that’s used as a sandpit to provide, amongst other things, a place to test patches and upgrades, trial new features, etc.

I want to establish up front the difference between having one EM and multiple installations of EM. It’s possible to have one configuration of EM which consists of several installations of the same software configuration, where one is used for production and one is a testbed environment. It’s equally possible to have several installations of the software that provide more than one EM. The case I’m putting forward here is for the former. The key difference between these two configurations is that one of them has a single repository (that happens to be survivable through a combination of RAC and, optionally but normally, Data Guard); the other is actually two separate EM repositories. As anyone who’s been using EM for a long time knows, there’s not much integration that can be done between two separate repositories.

In my view there are at least two quite specific cases where you should consider having at least two EM installations in addition to your primary, production EM configuration. EM is an Enterprise Application, and as such it should be treated like one.

You can’t live without it

The first of these is that old chestnut: High Availability. One of the most common things I hear when meeting customers who ask for our guidance, or when conducting an architectural review of their EM installations, is the statement and question: “We never believed we’d come to be so reliant on EM, what can we do to make sure it’s always up?” I have lots of experience with customers who began with a small EM installation, a simple playground or test setup, that over time has evolved into being, effectively, the Production EM. It’s only when this EM environment is unexpectedly unavailable, and administrators have come to rely on it for notifications, monitoring, etc., that suddenly there’s a bit of activity around making sure it can remain available in the face of various failures (hardware, software, etc.).

It can be a challenge to get a customer to transition from a simple EM implementation topology into a more appropriate one that offers them a higher level of availability. Fortunately the tech stack that EM is built on, and the functionality that EM itself offers, have built-in capabilities for enabling this. Indeed, a lot of steps can be taken inside the EM product itself to add resilience: EM can be used to transform itself from a single Oracle Management Service & Repository configuration into something more resilient. Some years ago, Jim Viscusi* and I worked on creating a formal design pattern for deploying Enterprise Manager in various ‘levels’ of availability. This began quite informally when working with customers and gradually evolved into what is now a fully documented High Availability design pattern that customers are using all over the world, as it’s got official product adoption and support. It began with a series of constantly updated presentations and white papers, but thankfully it’s now largely been incorporated into the excellent EM Documentation set. The detail of this process is pretty significant, and that will wait for another day.

To be clear, this additional EM installation (or installations, if you have many of them) essentially serves the purpose of providing resilience for the single Production EM configuration. If you ‘switchover’ or ‘failover’ from the Production EM installation to one of these other ones, you’re simply activating a clone of the original, which will perform the role of the Production EM when activated. If you’re eager to begin exploring this now, I’d recommend you take a look at Chapters 18-22 in the EM Advanced Configuration Guide; it’s a great starting point. There’s a full description in this document of the configuration ‘levels’ of EM and the protection they provide.

The Sandpit

The second use case for having an additional EM installation is also pretty key in the large installations and global roll-outs that I’ve worked on. This second installation performs the role of a sandpit environment and it serves two critical functions.

Its first and primary function is to provide a safe environment to validate patches, updates or configuration changes to the EM configuration without fear of impacting the Production EM service. It’s really not a sensible idea to test changes to EM in the production environment as a first port of call… this is going to end badly. The other role this second EM can perform, which is very beneficial in large global deployments, is to be a place to test out new features and functionality that you plan to roll out in the production EM. Testing these new features in a safe environment means administrators can validate their configuration and functional roll-out plans ahead of time, and users can be trained on using the new features effectively, without any risk to the production environment. I’m a big fan of these sandpit environments, where new functionality can make its way up through promotion into the production EM. It’s a great way to incrementally test something when you’re not sure of its impact. Those who know me will often hear me say “If you’re not sure what you’re doing, don’t do it in a big way”. Testing new functionality in a pre-production sandpit is a best practice I’d highly recommend for all EM implementations.

Enterprise Manager 13c Always On Monitoring

There is a third use case that often comes up for having an additional EM installation, and I really don’t recommend it. Some administrators have created an additional EM installation to provide monitoring telemetry of targets while the production EM is being upgraded or patched. This design pattern typically involves having a completely separate Oracle Management Service and Repository, along with a second set of Agents deployed on each and every machine. This means incredible amounts of duplication and consumption of resources. Whilst I understand the problem it’s trying to address, it’s an enormous overhead in attempting to solve the thorny problem of EM downtime during planned maintenance (typically patching). I really, really don’t advocate this approach because of the extra workload in both compute terms and human terms.

Thankfully Enterprise Manager 13c has gone a little way to addressing this by providing a new feature titled “Always on Monitoring”. When discussing with customers which services were critical to their enterprise monitoring needs, the same thing came up time and time again. They could live for a small period of time without some functionality (for example, provisioning or patching targets) whilst EM was being patched, but monitoring… continuous monitoring… was essential to the business. EM13c introduces the first implementation of “Always on Monitoring” by allowing administrators to continue to receive notifications when the Production EM is in some state of reduced availability (for example, when it’s being patched). The solution involves having a separate infrastructure that’s configured specifically for this purpose, entirely separate from the EM repository but synchronised with it. The implementation of this feature is so substantial that it merits its own post one day, so I’ll come around to that. If you’re keen to delve into this just now, however, the documentation on EM13c Always on Monitoring can be found here in Chapter 12 of the Administrator’s Guide. This first step into providing continuity of monitoring through planned maintenance has been on many customers’ request lists for many versions, so I have a feeling its introduction will be welcomed by users as a key feature driving EM13c adoption.

* Whilst the original model was devised by Jim and myself, it’s undergone scrutiny and enhancement by a lot of our colleagues over the years, partly driven through challenging the validity and currency of the model, partly through continued customer proving/engagement and partly as the product’s implementation has evolved. At the risk of missing people out unfairly I won’t name names, but suffice it to say many people, smarter than me, have contributed to making sure this model has stood the test of time and remained fit for purpose for many customer implementations of Enterprise Manager globally.