Data Engineer
Master Works
- Riyadh
- Permanent
- Full-time
- Design, develop, and maintain real-time and batch data pipelines leveraging Kafka, Spark, and Hadoop components.
- Must have a solid understanding of Teradata CLDM and know how to create a new data model or modify an existing one based on business requirements.
- Collaborate with business analysts and data architects to translate business requirements into robust data models and ETL frameworks.
- Apply Relational and Dimensional modeling techniques to design databases and ensure data is organized effectively for both operational and analytical purposes.
- Write, debug, and optimize SQL and Stored Procedures to ensure efficient data processing.
- Work closely with BI, Data Science, and Campaign teams to ensure seamless data availability for analytics.
- Work closely with Data Architects, Analysts, and Business Stakeholders to translate business requirements into database solutions.
- Ensure all database design and code is well-documented and follows best practices for performance and maintainability.
- Design fact and dimension tables for reporting and analytics purposes, typically in a star or snowflake schema.
- Develop and maintain technical documentation (data flow diagrams, source-to-target mappings, architecture documents).
- Ensure that the Data Dictionary is always up to date, capturing all changes to the database schema, including newly created or modified tables, columns, and views.
- Perform data quality checks, validation, and ensure end-to-end data accuracy and lineage.
- Support and troubleshoot real-time streaming jobs and ensure high availability of data pipelines (a minimal sketch follows this list).
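Below is a minimal sketch of the kind of real-time pipeline this role covers: a PySpark Structured Streaming job that consumes CDR-style events from Kafka and lands them for downstream analytics. The topic name, event schema, broker address, and output paths are illustrative assumptions, not specifics of this position.

```python
# Sketch: Kafka -> Spark Structured Streaming -> HDFS (Parquet).
# All names (topic, schema fields, paths) are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("cdr-stream-sketch").getOrCreate()

# Assumed shape of a CDR event; a real model would follow the agreed
# CLDM/FSLDM definitions.
cdr_schema = StructType([
    StructField("msisdn", StringType()),
    StructField("event_type", StringType()),
    StructField("duration_sec", LongType()),
    StructField("event_ts", StringType()),
])

# Read the raw Kafka stream and parse the JSON payload into columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed brokers
    .option("subscribe", "cdr_events")                 # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), cdr_schema).alias("e"))
    .select("e.*")
)

# Write micro-batches as Parquet with a checkpoint; the checkpoint is what
# lets the job resume cleanly after failure, supporting high availability.
query = (
    events.writeStream.format("parquet")
    .option("path", "/data/raw/cdr_events")            # assumed HDFS path
    .option("checkpointLocation", "/chk/cdr_events")   # assumed location
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```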
- Strong expertise in real-time data integration using Kafka, Spark Streaming, or Dataflow.
- Hands-on experience with Hadoop ecosystem components (HDFS, Hive, Sqoop, Spark, etc.).
- Strong Data Modeling concepts including FSLDM, CLDM, and Dimensional / Data Vault modeling.
- Deep understanding of the Telecom domain (BSS/OSS, CDR, usage, revenue, and campaign data).
- Experience building and optimizing ETL pipelines and data ingestion frameworks for structured and unstructured data.
- Proficiency in SQL and distributed data processing using Hive, Spark SQL, or PySpark.
- Good understanding of data governance, data quality, and lineage frameworks.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration skills with cross-functional teams.
- Experience working with data virtualization tools (e.g., TIBCO, Trino).
- Familiarity with Data Catalogs, Metadata Management, and NDMO data governance standards.
- Experience with data catalog tools.
- Familiarity with CI/CD pipelines and Git.
- Knowledge of ETL orchestration tools such as Airflow or NiFi (a minimal DAG sketch follows).
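As a sketch of the orchestration piece, here is a minimal Airflow DAG that chains ingestion, transformation, and data quality checks. The task commands, script names, and schedule are illustrative assumptions only.

```python
# Sketch: a daily batch ETL DAG (Airflow 2.4+ style `schedule` argument).
# Commands and script names below are placeholders, not real project assets.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_usage_etl",         # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_usage",
        bash_command="sqoop import ...",  # placeholder source-to-HDFS load
    )
    transform = BashOperator(
        task_id="transform_usage",
        bash_command="spark-submit transform_usage.py",  # assumed Spark job
    )
    dq_checks = BashOperator(
        task_id="data_quality_checks",
        bash_command="spark-submit dq_checks.py",  # assumed validation job
    )

    # Quality checks run last so bad data never reaches downstream consumers.
    ingest >> transform >> dq_checks
```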