Senior Observability Engineer

Al Kharj, Riyadh
Permanent
Full-time


We are looking for a high-impact Observability Engineer with proven experience in fintech, banking, or other regulated environments to design and scale enterprise-grade observability systems. This role is critical to ensuring high availability, low latency, and full-stack visibility across mission-critical financial platforms, while supporting compliance, auditability, and incident response readiness.

Key Responsibilities:

Own and evolve the end-to-end observability architecture across applications, infrastructure, and cloud environments
Centralize metrics, logs, traces, and events with high reliability and scalability
Design and enforce SLOs, SLIs, and error budgets for critical financial systems
Build advanced real-time dashboards and business-aligned KPIs for engineering and leadership
Develop intelligent alerting frameworks to minimize noise and enable faster incident resolution
Ensure observability pipelines are resilient, scalable, and cost-optimized
Collaborate with DevOps and engineering teams to implement instrumentation, distributed tracing, and logging standards
Integrate observability systems with incident management, on-call, and escalation workflows
Support compliance, audit, and forensic analysis through structured logging and traceability
Drive root cause analysis (RCA) and continuous improvement of system reliability
Automate monitoring, alerting, and data enrichment workflows

Requirements

6 to 10 years of experience in Observability, SRE, or Monitoring Engineering roles
Mandatory experience in fintech, banking, or highly regulated environments
Strong hands-on expertise with:

Monitoring: Dynatrace, Prometheus, Grafana
Logging: Elastic Stack (ELK), Splunk, Fluent Bit, Logstash
Alerting & Correlation: Dynatrace, ELK, Splunk, Prometheus Alertmanager
Proficiency in PromQL, SPL, and KQL for advanced log and metric analysis
Experience developing high-performance, scalable dashboards in Grafana and Kibana, integrating application, infrastructure, and business KPIs for end-to-end observability.
Deep understanding of distributed systems observability and performance monitoring
Experience with high-throughput, low-latency systems
Experience with enterprise monitoring tools such as Riverbed and SolarWinds for network performance monitoring (NPM), application visibility, traffic analysis, and infrastructure health tracking across distributed systems.

Core Expertise:

Observability pillars: metrics, logs, traces, events
Golden signals: latency, traffic, errors, saturation
SLO/SLI-driven reliability engineering
Alert design with high signal-to-noise ratio
Telemetry standardization and instrumentation strategies
Mapping technical metrics to financial/business KPIs

Preferred Qualifications and Fintech Alignment:

Proven experience supporting audit, compliance, and regulatory requirements within fintech, banking, or other regulated environments
Strong familiarity with industry frameworks such as:

PCI DSS
ISO 27001
SAMA / NCA
Solid understanding of data sensitivity, traceability, and audit logging standards for financial systems
Experience working on large-scale fintech or digital banking platforms
Exposure to CI/CD-integrated observability and DevSecOps practices
Proficiency in scripting and automation (Python, Bash)
Hands-on experience with incident management and on-call frameworks (e.g., PagerDuty, Opsgenie)

What We’re Looking For:

A proactive engineer with a strong reliability and performance mindset
Ability to translate observability data into actionable insights
Experience working cross-functionally with SRE, DevOps, and product teams
An ownership-driven individual focused on continuous improvement of monitoring systems

Devsinc