Senior Observability Engineer
Devsinc
- Al Kharj, Riyadh
- Permanent
- Full-time
- Own and evolve the end-to-end observability architecture across applications, infrastructure, and cloud environments
- Centralize metrics, logs, traces, and events with high reliability and scalability
- Design and enforce SLOs, SLIs, and error budgets for critical financial systems
- Build advanced real-time dashboards and business-aligned KPIs for engineering and leadership
- Develop intelligent alerting frameworks to minimize noise and enable faster incident resolution
- Ensure observability pipelines are resilient, scalable, and cost-optimized
- Collaborate with DevOps and engineering teams to implement instrumentation, distributed tracing, and logging standards
- Integrate observability systems with incident management, on-call, and escalation workflows
- Support compliance, audit, and forensic analysis through structured logging and traceability
- Drive root cause analysis (RCA) and continuous improvement of system reliability
- Automate monitoring, alerting, and data enrichment workflows
- 6 to 10 years of experience in Observability, SRE, or Monitoring Engineering roles
- Mandatory experience in fintech, banking, or highly regulated environments
- Strong hands-on expertise with:
  - Monitoring: Dynatrace, Prometheus, Grafana
  - Logging: Elastic Stack (ELK), Splunk, Fluent Bit, Logstash
  - Alerting & Correlation: Dynatrace, ELK, Splunk, Alertmanager
- Proficiency in PromQL, SPL, and KQL for advanced log and metric analysis
- Experience building high-performance, scalable Grafana and Kibana dashboards that integrate application, infrastructure, and business KPIs for end-to-end observability
- Deep understanding of distributed systems observability and performance monitoring
- Experience with high-throughput, low-latency systems
- Experience with enterprise monitoring tools such as Riverbed and SolarWinds for network performance monitoring (NPM), application visibility, traffic analysis, and infrastructure health tracking across distributed systems
- Observability pillars: metrics, logs, traces, events
- Golden signals: latency, traffic, errors, saturation
- SLO/SLI-driven reliability engineering
- Alert design with high signal-to-noise ratio
- Telemetry standardization and instrumentation strategies
- Mapping technical metrics to financial/business KPIs
- Proven experience supporting audit, compliance, and regulatory requirements within fintech, banking, or other regulated environments
- Strong familiarity with industry frameworks such as:
  - PCI DSS
  - ISO 27001
  - SAMA / NCA
- Solid understanding of data sensitivity, traceability, and audit logging standards for financial systems
- Experience working on large-scale fintech or digital banking platforms
- Exposure to CI/CD-integrated observability and DevSecOps practices
- Proficiency in scripting and automation (Python, Bash)
- Hands-on experience with incident management and on-call frameworks (e.g., PagerDuty, Opsgenie)
- A proactive engineer with a strong reliability and performance mindset
- Ability to translate observability data into actionable insights
- Experience working cross-functionally with SRE, DevOps, and product teams
- Ownership-driven individual focused on continuous improvement of monitoring systems