Non-Functional Requirements (NFRs) Framework for Software Systems - Best Practice: Consider Reliability Non-Functional Requirements (NFRs)
Chapter 12. Best Practice: Consider Reliability Non-Functional Requirements (NFRs)
Overview
Reliability Non-Functional Requirements (NFRs) define how consistently and correctly a software system performs under expected, peak, degraded, and abnormal operating conditions. Reliability includes successful transaction processing, predictable error handling, correct retry and timeout behavior, dependency failure behavior, and the ability to avoid repeated or silent failures.
Reliability should be expressed through measurable outcomes such as transaction success rate, error rate, failure frequency, recovery behavior, and trend reporting. Reliable systems make failures visible, bounded, diagnosable, and manageable rather than surprising or uncontrolled.
Best Practice: Define transaction success and error-rate non-functional requirements
Description
Transaction success and error-rate NFRs define the expected success rate for important operations and the acceptable level of failed, rejected, timed-out, duplicated, or incomplete processing. These requirements should identify critical transactions, error categories, measurement windows, and whether user, integration, or batch transactions are in scope.
Benefits
Defining success and error-rate expectations helps teams focus reliability engineering on the transactions that matter most. It also enables meaningful monitoring, alerting, root-cause analysis, and service-level review.
Example non-functional requirements
- The payment authorization workflow shall successfully complete at least 99.5% of valid requests during each monthly measurement window, excluding failures caused by approved external provider outages.
Validation method: Compare transaction logs, external provider incident records, monitoring dashboards, and monthly service reports against the target.
Example validation evidence: Transaction success dashboard, monthly reliability report, provider outage log, incident records, and service review approval.
- Critical API endpoints shall maintain an application error rate below 0.5% during normal operating conditions and below 1.0% during approved peak-load events.
Validation method: Review API gateway logs, application logs, error-rate dashboards, and peak-event monitoring records.
Example validation evidence: API error-rate dashboard, log query results, peak-event report, incident summary, and corrective-action backlog.
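As an illustration of how the payment authorization target might be checked against transaction logs, here is a minimal Python sketch. The `LoggedTransaction` record, its field names, and the `during_provider_outage` flag are assumptions made for this example. A real implementation would derive them from the transaction logs and the provider outage log named in the validation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedTransaction:
    """Hypothetical log record; field names are assumptions for this sketch."""
    succeeded: bool
    valid_request: bool = True            # invalid requests are out of scope
    during_provider_outage: bool = False  # would come from the provider outage log


def success_rate(transactions):
    """Return the success rate of in-scope transactions, or None if there are none.

    Failures that occurred during a provider outage are excluded from the
    calculation; successes during an outage still count.
    """
    in_scope = [
        t for t in transactions
        if t.valid_request and not (t.during_provider_outage and not t.succeeded)
    ]
    if not in_scope:
        return None
    return sum(t.succeeded for t in in_scope) / len(in_scope)


def meets_target(transactions, target=0.995):
    """Check one measurement window against the success target (99.5% by default)."""
    rate = success_rate(transactions)
    return rate is not None and rate >= target
```

Returning `None` for an empty window keeps "no data" distinct from "0% success", so a window with no in-scope transactions is never reported as passing.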
Related stakeholders
Typical stakeholders include product owners, service owners, developers, QA teams, SRE teams, integration teams, operations teams, and business process owners.
Related lifecycle phases
These NFRs are defined during requirements and design, and are validated during unit testing, integration testing, performance testing, production monitoring, incident review, and service-level review.
Best Practice: Define retry, timeout, and exception-handling non-functional requirements
Description
Retry, timeout, and exception-handling NFRs define how the software system behaves when operations are slow, unavailable, partially complete, invalid, duplicated, or failed. These requirements should specify retry limits, backoff behavior, timeout thresholds, idempotency expectations, error classification, user messages, logging, and escalation behavior.
Benefits
Clear retry and exception-handling requirements reduce cascading failures, duplicate processing, hidden errors, and poor user experience. They also help developers implement consistent failure behavior across services, APIs, integrations, jobs, and user workflows.
Example non-functional requirements
- The software system shall use bounded retry logic with exponential backoff for transient integration failures and shall not retry indefinitely.
Validation method: Review code/configuration, run integration failure tests, and verify retry count, backoff behavior, and final failure handling.
Example validation evidence: Code review record, configuration export, integration failure test report, logs showing retry behavior, and monitoring alert evidence.
- User-facing transactions that fail after approved retry attempts shall display a clear error message, record a traceable error event, and avoid duplicate transaction submission.
Validation method: Execute user workflow tests for failed transactions and inspect user message, audit trail, application logs, and duplicate-prevention behavior.
Example validation evidence: Test case result, screen capture, log entry, audit record, duplicate-check evidence, and defect closure record if applicable.
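The bounded-retry requirement above can be sketched in Python as follows. This is one possible shape, not a prescribed implementation: `TransientError`, `call_with_retry`, and the default attempt count and delays are hypothetical names and values chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call operation(); retry on TransientError with capped exponential backoff.

    Any other exception propagates immediately. After max_attempts failed
    attempts the last TransientError is re-raised, so retries are never unbounded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # bounded: surface the final failure to the caller
            # Delay doubles on each retry (0.5, 1, 2, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

The `sleep` parameter is injectable so that tests can verify the retry count and backoff sequence without waiting. Random jitter and idempotency keys, which would normally accompany retries of non-idempotent operations, are left out of this sketch.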
Related stakeholders
Typical stakeholders include software engineers, integration architects, QA teams, UX teams, SRE teams, operations teams, and product owners.
Related lifecycle phases
These requirements are defined during design and implementation planning, and are validated during code review, unit testing, integration testing, UX testing, failure simulation, and production incident analysis.
Best Practice: Define dependency failure behavior non-functional requirements
Description
Dependency failure behavior NFRs define how a software system responds when dependent services, databases, APIs, message queues, identity providers, files, data feeds, networks, platforms, or vendor services become unavailable, slow, incorrect, or degraded. These requirements should distinguish critical dependencies from optional dependencies and specify fallback, fail-fast, queueing, or degraded-mode behavior.
Benefits
Dependency failure NFRs help prevent localized dependency problems from becoming widespread outages. They also support clearer operational response because teams know which failures should cause alerts, degraded service, delayed processing, manual intervention, or business escalation.
Example non-functional requirements
- If the external address validation service is unavailable, the software system shall allow authorized users to save the transaction in pending validation status and resume validation when the service is restored.
Validation method: Simulate external service outage and verify pending-state behavior, user notification, logging, alerting, and recovery processing.
Example validation evidence: Outage simulation test report, pending transaction record, user notification evidence, log sample, alert record, and recovery test result.
- The software system shall identify all critical runtime dependencies and define expected behavior for timeout, failure, degraded response, and recovery for each dependency.
Validation method: Review dependency inventory, architecture diagrams, failure-mode documentation, and test coverage for representative failure scenarios.
Example validation evidence: Dependency inventory, failure-mode matrix, architecture review, integration test results, and runbook updates.
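The pending-validation behavior in the first example requirement might look like the following Python sketch. All names (`Status`, `ServiceUnavailable`, `SavedTransaction`, `submit`, `resume_pending`) are hypothetical, the validation service is represented by a plain callable, and the authorization check, user notification, logging, and alerting are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    VALIDATED = "validated"
    REJECTED = "rejected"
    PENDING_VALIDATION = "pending_validation"


class ServiceUnavailable(Exception):
    """Hypothetical error raised when the validation service cannot be reached."""


@dataclass
class SavedTransaction:
    address: str
    status: Status = Status.PENDING_VALIDATION


def submit(txn, validate_address, pending_queue):
    """Validate now if possible; otherwise save the transaction as pending."""
    try:
        ok = validate_address(txn.address)
    except ServiceUnavailable:
        txn.status = Status.PENDING_VALIDATION
        pending_queue.append(txn)  # kept for later recovery processing
        return txn
    txn.status = Status.VALIDATED if ok else Status.REJECTED
    return txn


def resume_pending(validate_address, pending_queue):
    """Re-validate queued transactions; keep those that still cannot be checked."""
    still_pending = []
    for txn in pending_queue:
        try:
            ok = validate_address(txn.address)
        except ServiceUnavailable:
            still_pending.append(txn)
            continue
        txn.status = Status.VALIDATED if ok else Status.REJECTED
    pending_queue[:] = still_pending
```

An outage simulation test can pass a callable that always raises `ServiceUnavailable`, confirm the transaction lands in the pending queue, then swap in a working callable and confirm that `resume_pending` drains the queue.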
Related stakeholders
Typical stakeholders include solution architects, integration architects, software engineers, platform teams, operations teams, SRE teams, vendor managers, and business owners.
Related lifecycle phases
Dependency behavior NFRs are defined during architecture and integration design, and are validated during integration testing, resiliency testing, failure injection, operational readiness, and incident retrospectives.
Best Practice: Define graceful degradation non-functional requirements
Description
Graceful degradation NFRs define how a software system continues to provide limited, prioritized, or safe functionality when some capabilities, dependencies, data sources, or infrastructure resources are degraded. These requirements should identify which capabilities must remain available, which may be disabled, what users should see, and how degraded mode is exited.
Benefits
Graceful degradation improves user trust and business continuity by avoiding all-or-nothing failures. It also supports resilience, safety, and operational control when systems encounter overload, dependency failures, partial outages, or external provider problems.
Example non-functional requirements
- If the recommendation service is unavailable, the customer portal shall continue to support browsing, search, cart, and checkout while hiding personalized recommendations and recording a degraded-mode event.
Validation method: Simulate recommendation service outage and verify critical user journeys, UI behavior, event logging, alerting, and restoration behavior.
Example validation evidence: Degraded-mode test report, user journey test results, UI screenshots, event log, alert record, and monitoring dashboard.
- During high-load events, non-critical background processing shall be throttled or deferred before critical user-facing transactions are affected.
Validation method: Run load tests that exceed normal operating thresholds and verify prioritization, throttling, queueing, and alert behavior.
Example validation evidence: Load test report, queue metrics, throttling logs, critical transaction latency report, and capacity review record.
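To make the recommendation-service example concrete, the following Python sketch shows one way a page builder could keep the critical sections available while hiding recommendations. The function name, the page dictionary layout, and the `fetch_recommendations` and `record_event` callables are assumptions for illustration only.

```python
def build_home_page(customer_id, fetch_recommendations, record_event):
    """Build the page model; hide recommendations if their service fails."""
    page = {
        "sections": ["browse", "search", "cart", "checkout"],  # always available
        "degraded": False,
    }
    try:
        items = fetch_recommendations(customer_id)
    except Exception as exc:  # broad on purpose: any failure here is non-critical
        record_event({
            "type": "degraded_mode",
            "dependency": "recommendations",
            "reason": repr(exc),
        })
        page["degraded"] = True
        return page
    if items:
        page["sections"].append("recommendations")
        page["recommendations"] = items
    return page
```

The optional dependency is called inside its own `try` block, so its failure cannot prevent the critical sections from rendering, and each failure produces a degraded-mode event that monitoring can alert on.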
Related stakeholders
Typical stakeholders include product owners, UX teams, software engineers, SRE teams, platform teams, operations teams, and business continuity stakeholders.
Related lifecycle phases
Graceful degradation NFRs are defined during architecture and user experience design, and are validated during integration testing, load testing, resilience testing, production monitoring, and incident simulations.
Best Practice: Define reliability evidence and trend reporting non-functional requirements
Description
Reliability evidence and trend reporting NFRs define which reliability measures are collected, how they are reported, how long evidence is retained, and who reviews trends. Evidence may include transaction success rates, error rates, failure patterns, retry behavior, incident records, defect trends, monitoring alerts, and customer-impact analysis.
Benefits
Trend reporting helps teams identify reliability degradation before it becomes a major incident. It also supports continuous improvement by connecting reliability NFRs to defects, incidents, capacity changes, release changes, dependency issues, and architecture decisions.
Example non-functional requirements
- The software system shall produce monthly reliability trend reporting that includes transaction success rate, error rate, incident count, recurring failure patterns, and corrective actions.
Validation method: Review the monthly reliability report and reconcile reported values against monitoring dashboards, logs, and incident records.
Example validation evidence: Reliability trend report, monitoring export, incident records, defect trends, and corrective-action backlog.
- Critical reliability indicators shall have alert thresholds and named owners responsible for triage and corrective action.
Validation method: Inspect alert configuration, ownership assignments, escalation procedures, and incident response records.
Example validation evidence: Alert policy configuration, ownership matrix, escalation runbook, incident response evidence, and service review minutes.
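The requirement that critical indicators have alert thresholds and named owners can be illustrated with a small Python sketch. The `Indicator` record, the `breaches` function, and the example indicator and owner names in the usage note are hypothetical. In practice this logic would normally live in the alerting configuration of a monitoring platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    threshold: float
    owner: str              # named owner responsible for triage
    higher_is_better: bool  # True for success rate, False for error rate


def breaches(indicators, observed):
    """Return (indicator name, owner, observed value) for each breached threshold.

    A missing observation is reported as a breach with value None, because an
    absent metric is itself a reliability signal rather than a pass.
    """
    found = []
    for ind in indicators:
        if not ind.owner:
            raise ValueError(f"indicator {ind.name!r} has no named owner")
        value = observed.get(ind.name)
        if value is None:
            found.append((ind.name, ind.owner, None))
            continue
        if ind.higher_is_better:
            bad = value < ind.threshold
        else:
            bad = value > ind.threshold
        if bad:
            found.append((ind.name, ind.owner, value))
    return found
```

For example, with a `success_rate` indicator (threshold 0.995, owner `payments-sre`) and an `error_rate` indicator (threshold 0.005, owner `api-team`), an observation of `{"success_rate": 0.999, "error_rate": 0.02}` would report only the error-rate breach, routed to `api-team`.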
Related stakeholders
Typical stakeholders include service owners, SRE teams, operations teams, product owners, QA teams, engineering leads, and governance stakeholders.
Related lifecycle phases
Reliability evidence requirements are defined during operational planning, and are validated during monitoring setup, release readiness, production operation, incident review, trend review, and governance reporting.
Best Practice: Define reliability validation and evidence non-functional requirements
Description
Reliability validation NFRs define how teams prove that reliability requirements are measurable, implemented, tested, monitored, evidenced, and governed. Validation should cover normal processing, failure scenarios, dependency failures, retry behavior, data consistency, degraded operation, and production trend monitoring.
Benefits
Explicit reliability validation improves confidence that the system behaves consistently under real operating conditions. It also makes reliability expectations auditable and actionable for engineering, operations, and business stakeholders.
Example non-functional requirements
- Reliability NFRs shall define the critical transaction or process, expected success target, allowed error categories, measurement window, validation method, evidence source, and responsible owner.
Validation method: Review reliability requirements for completeness and confirm stakeholder approval before release readiness signoff.
Example validation evidence: Approved reliability requirement, SLI/SLO mapping, owner assignment, monitoring design, test plan, and release-readiness approval.
- Reliability validation shall include representative normal, peak, failure, retry, timeout, and dependency-degradation scenarios before production release.
Validation method: Review test scenarios and execution results against the approved reliability requirement set.
Example validation evidence: Reliability test plan, executed test results, failure scenario report, retry/timeout logs, defect closure records, and approval signoff.
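The completeness review in the first example requirement lends itself to a simple automated check. The Python sketch below assumes each reliability NFR is stored as a dictionary. The field names mirror the elements listed in that requirement but are otherwise hypothetical, as are the values in the usage example.

```python
REQUIRED_FIELDS = (
    "transaction",
    "success_target",
    "allowed_error_categories",
    "measurement_window",
    "validation_method",
    "evidence_source",
    "owner",
)


def missing_fields(requirement):
    """Return the required fields that are absent or empty in a requirement record."""
    return [
        field for field in REQUIRED_FIELDS
        if requirement.get(field) in (None, "", [], ())
    ]
```

A usage example with illustrative values:

```python
nfr = {
    "transaction": "payment authorization",
    "success_target": 0.995,
    "allowed_error_categories": ["provider outage"],
    "measurement_window": "monthly",
    "validation_method": "log and dashboard review",
    "evidence_source": "transaction success dashboard",
}
print(missing_fields(nfr))  # prints ['owner']
```

A check like this could run before release-readiness signoff, so that requirements without an owner or evidence source are flagged early.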
Related stakeholders
Typical stakeholders include product owners, engineering teams, QA teams, SRE teams, operations teams, architects, and governance stakeholders.
Related lifecycle phases
Reliability validation occurs during requirements review, architecture review, implementation, testing, production readiness, production monitoring, incident review, and continuous improvement.
How to cite this page
When referencing this page in academic work, internal standards, or external publications, include the page title, IF4IT as publisher, the URL, and your access date.
Example (informal web citation):
International Foundation for Information Technology (IF4IT). Best Practice: Consider Reliability Non-Functional Requirements (NFRs) | Non-Functional Requirements (NFRs) Framework for Software Systems. https://if4it.org/best-practices/non-functional-requirements-nfrs-framework-for-software-systems/best-practice-consider-reliability-non-functional-requirements-nfrs/ (accessed 2026-06-24).
Copyright for the International Foundation for Information Technology (IF4IT): 2008 - Present