IT Operating Environments Best Practices - Define availability and performance SLAs for every environment tier - not only Production
Overview
Service Level Agreements (SLAs) for environment availability and performance are routinely defined for Production and left undefined for every other environment tier. When lower environment SLAs are undefined, teams have no basis for escalating environment unavailability as a governance failure, environment owners have no performance standard against which to be held accountable, and the organizational impact of lower environment downtime is consistently underestimated because it is not measured against any expectation. An SIT environment that is unavailable for three days per sprint cycle blocks the integration testing of every team that depends on it. That is a significant organizational productivity loss, and a defined SLA would create the accountability needed to prevent it.
Best Practice
Define explicit availability and performance SLAs for every environment tier in the enterprise pipeline, calibrated to the purpose and user population of each tier rather than applied uniformly.
Production and PSTG SLAs should reflect the organizational impact of their unavailability. Production availability commitments are typically defined in terms of annual uptime percentages and measured incident by incident, with defined response and resolution time commitments for each incident severity level. PSTG availability should approach Production availability standards because PSTG downtime delays final validation activities, with direct implications for the Production deployment schedule.
UAT and SIT availability SLAs should reflect the impact of their unavailability on team delivery velocity. They are typically defined as a minimum number of available hours per business day or per sprint cycle, with a defined response time for environment restoration when availability falls below the minimum.
DEV and RSC SLAs should be modest but defined: a best-effort availability commitment with a defined escalation path when availability failures persist beyond a threshold that affects team productivity.
TRN and PEN availability SLAs should be defined relative to the training or testing programs they support: the environment should be available throughout any scheduled training or testing engagement.
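The tiered approach described above can be captured as a small SLA catalog. The following Python sketch is illustrative only: the tier names come from the text, but the `TierSla` type, the `SLA_CATALOG` values, the `PROD` key, and both helper functions are hypothetical placeholders, not figures or tooling prescribed by this practice. It shows one way to express a different availability basis per tier and to check a measurement against it.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # non-leap year; an assumption for the arithmetic


@dataclass(frozen=True)
class TierSla:
    """One SLA definition per environment tier (all values are placeholders)."""
    tier: str
    basis: str            # how availability is expressed for this tier
    target: float         # meaning depends on basis (see catalog comments)
    restore_hours: float  # restoration or escalation threshold, in hours


# Hypothetical catalog: every number below is an assumed example value.
SLA_CATALOG = {
    # target = annual uptime percentage
    "PROD": TierSla("PROD", "annual_uptime_pct", 99.9, 4.0),
    "PSTG": TierSla("PSTG", "annual_uptime_pct", 99.5, 8.0),
    # target = minimum available hours per business day
    "UAT": TierSla("UAT", "hours_per_business_day", 7.0, 8.0),
    "SIT": TierSla("SIT", "hours_per_business_day", 7.0, 8.0),
    # best effort: no availability target, only an escalation threshold
    "DEV": TierSla("DEV", "best_effort", 0.0, 24.0),
    "RSC": TierSla("RSC", "best_effort", 0.0, 24.0),
    # target = percentage of scheduled engagement hours available
    "TRN": TierSla("TRN", "scheduled_engagement_pct", 100.0, 2.0),
    "PEN": TierSla("PEN", "scheduled_engagement_pct", 100.0, 2.0),
}


def annual_downtime_budget_hours(uptime_pct: float) -> float:
    """Convert an annual uptime percentage into allowed downtime hours."""
    return HOURS_PER_YEAR * (100.0 - uptime_pct) / 100.0


def is_breached(sla: TierSla, measured: float, outage_hours: float = 0.0) -> bool:
    """Return True when a measurement falls below the tier's defined standard.

    For best-effort tiers only the outage duration matters: the SLA is
    breached (and should be escalated) once an outage outlasts restore_hours.
    For all other tiers, `measured` is compared with `target` in the unit
    implied by `basis`.
    """
    if sla.basis == "best_effort":
        return outage_hours > sla.restore_hours
    return measured < sla.target


if __name__ == "__main__":
    budget = annual_downtime_budget_hours(SLA_CATALOG["PROD"].target)
    print(f"PROD annual downtime budget: {budget:.2f} hours")
    print("SIT breached at 5 available hours:",
          is_breached(SLA_CATALOG["SIT"], measured=5.0))
```

Keeping a separate `basis` per tier mirrors the point that SLAs should be calibrated to each tier's purpose rather than applied uniformly. Under the assumed 99.9% figure, the annual downtime budget works out to roughly 8.76 hours.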
Benefit(s)
Defined availability and performance SLAs for every environment tier create the accountability framework that motivates proactive environment reliability management rather than reactive incident response. Environment owners have defined standards against which their environments are measured and governed. Teams that depend on environments for delivery activities have defined availability expectations that allow them to plan their work and escalate environment unavailability as a governance issue when it falls below the defined standard. The organization develops a culture in which environment reliability is treated as an organizational obligation at every tier, not only in Production where the consequences of downtime are most immediately visible.
Copyright for the International Foundation for Information Technology (IF4IT): 2008 - Present