Open to Junior Data Engineering Roles

Data Engineer building pipelines that scale.

I design batch and real-time data workflows using Spark, Flink, Kafka, Airflow, Hive, and Python — focused on reliability, performance, and business-ready analytics.

Records / Batch

5M+

ETL processing scale

Events / Minute

10K+

Kafka streaming pipeline

Runtime Improved

35%

Spark optimization

Airflow Workflows

6+

Automated data jobs

Ahmed Samy Abdelrahim

Junior Data Engineer

Building reliable data pipelines, streaming workflows, and analytics-ready platforms.

Live Portfolio Dashboard

Big Data Control Room

Active

Core Tools

Spark, Kafka, Flink, Airflow

System Health

Reliable workflows

Batch ETL: 92%
Streaming: 86%
Automation: 80%
Warehousing: 84%

About Me

I turn raw data into scalable, automated, and analytics-ready workflows.

Junior Data Engineer with hands-on internship experience in ETL development and distributed processing. I focus on performance optimization, workflow automation, real-time pipelines, and clean data architecture that supports decision-making.

Skills

Tech stack

Tools used to build batch pipelines, streaming systems, warehouses, and automated workflows.

Python, SQL, Apache Spark, Apache Flink, Kafka, Airflow, Hadoop, Hive, HDFS, Spark SQL, ETL Pipelines, Data Warehousing, Docker, Linux, PostgreSQL, AWS S3, AWS EC2

Projects

Proof of work

Portfolio projects written to show business value, architecture thinking, and technical delivery.

Flagship Project

Credit Card Fraud Detection at Scale

End-to-end big data system combining Kafka ingestion, Flink real-time processing, Spark batch analytics, and dashboard monitoring for suspicious transaction detection.

Outcome

Real-time fraud analytics architecture
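As an illustration of the kind of stateful, per-card check a real-time layer can run, here is a minimal plain-Python sketch of a sliding-window velocity rule. It is not the project's code: the project lists ML, while this swaps in a simple rule, and the class name, window length, and threshold are all hypothetical. It also assumes timestamps arrive in non-decreasing order for each card.

```python
from collections import defaultdict, deque

# Hypothetical thresholds, chosen only for illustration.
WINDOW_SECONDS = 60
MAX_TXNS_PER_WINDOW = 3


class VelocityChecker:
    """Flags a card that makes too many transactions inside a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, max_txns=MAX_TXNS_PER_WINDOW):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        # card_id -> timestamps (in seconds) of recent transactions, oldest first
        self._recent = defaultdict(deque)

    def is_suspicious(self, card_id, timestamp):
        """Record one transaction and report whether the card exceeds the limit."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_txns
```

In a streaming engine the same logic would live in keyed state, with one window per card, rather than in an in-memory dictionary.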

Kafka, Flink, Spark, Python, ML
View Repository

Batch Engineering

Spark ETL Pipeline

Optimized PySpark pipeline processing 3M+ rows with transformations, aggregations, partition tuning, and caching to reduce execution time.

Outcome

~30% faster runtime
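Partition tuning usually starts with choosing a sensible partition count. The sketch below shows one common rule of thumb in plain Python: aim for partitions of roughly 128 MiB, and never use fewer partitions than available cores. The function name, the 128 MiB target, and the inputs are assumptions for illustration, not figures from this project.

```python
import math

# Rule-of-thumb target size per partition (an assumption, not a project figure).
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_partitions(total_bytes, cores, target_bytes=TARGET_PARTITION_BYTES):
    """Suggest a partition count: enough partitions to keep each one near the
    target size, and never fewer than the number of available cores."""
    if total_bytes < 0 or cores < 1 or target_bytes < 1:
        raise ValueError("total_bytes must be >= 0; cores and target_bytes must be >= 1")
    by_size = math.ceil(total_bytes / target_bytes)
    return max(by_size, cores)
```

In PySpark, a value like this could be passed to `DataFrame.repartition(n)`, and a DataFrame reused across several aggregations could be kept in memory with `DataFrame.cache()`.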

PySpark, Spark SQL, ETL, Optimization
View Repository

Streaming

Kafka Streaming Pipeline

Streaming ingestion pipeline handling high-volume events with validation and transformation logic for analytical consumption.

Outcome

10K+ events per minute
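The validation and transformation step can be sketched in plain Python, independent of any Kafka client. The event schema here (`event_id`, `user_id`, `amount`, `ts`) and all function names are hypothetical, since the page does not describe the real message format. Invalid messages are counted rather than raising, so one bad record does not stop the stream.

```python
import json
import math
from datetime import datetime, timezone

# Hypothetical schema: field name -> required Python type(s).
REQUIRED_FIELDS = {"event_id": str, "user_id": str, "amount": (int, float), "ts": (int, float)}


def validate_event(raw):
    """Parse one raw message (str or bytes); return the dict if valid, else None."""
    try:
        event = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(event, dict):
        return None
    for field, expected in REQUIRED_FIELDS.items():
        value = event.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return None
        # Reject NaN and infinity, which json.loads accepts.
        if isinstance(value, float) and not math.isfinite(value):
            return None
    if event["amount"] < 0:
        return None
    return event


def transform_event(event):
    """Reshape a validated event for analytical use."""
    return {
        "event_id": event["event_id"],
        "user_id": event["user_id"],
        "amount_cents": round(event["amount"] * 100),
        "event_time": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
    }


def process(raw_messages):
    """Validate and transform a batch, counting rejects instead of failing."""
    clean, rejected = [], 0
    for raw in raw_messages:
        event = validate_event(raw)
        if event is None:
            rejected += 1
        else:
            clean.append(transform_event(event))
    return clean, rejected
```

In a real consumer loop, `process` would be called on each polled batch, and the reject count could feed a monitoring metric or a dead-letter topic.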

Kafka, Streaming, Validation
View Repository

Warehouse

Hive Data Warehouse

Designed a layered Hive warehouse model supporting analytical queries across multiple datasets with clean structure and reporting readiness.

Outcome

10+ datasets supported

Hive, HDFS, Data Warehouse
View Repository

Architecture

Data pipeline blueprint

Step 01

Sources

Raw data and transactions

Step 02

Kafka

Event ingestion layer

Step 03

Flink

Real-time processing

Step 04

Spark

Batch transformations

Step 05

Hive

Analytical storage

Step 06

Dashboard

Business insights
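The six steps above can be written down as a dependency graph and ordered with Python's standard `graphlib` module, which is the same idea a workflow scheduler applies to its tasks. The strictly linear chain is an assumption read from the step numbering; a real deployment could let Spark and Flink read from Kafka in parallel.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it reads from. The linear chain is an
# assumption taken from the step numbering in the blueprint.
PIPELINE = {
    "Sources": set(),
    "Kafka": {"Sources"},
    "Flink": {"Kafka"},
    "Spark": {"Flink"},
    "Hive": {"Spark"},
    "Dashboard": {"Hive"},
}


def execution_order(pipeline):
    """Return stages so that every stage comes after the stages it depends on.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(pipeline).static_order())
```

Calling `execution_order(PIPELINE)` yields the stages from Sources through Dashboard in blueprint order.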

pipeline-terminal

spark.readStream.format('kafka').load()
processing: 10K+ events/min
batch_etl.optimize(partitioning=True)
airflow.dag.status = 'automated'
✓ pipeline_status: ready_for_demo

Experience

Practical background

NTI & Huawei Big Data Program

Junior Data Engineer Intern

  • Built ETL pipelines processing 5M+ records per batch.
  • Improved execution performance by 35% using tuning and caching.
  • Designed Hive warehouse layers supporting 10+ datasets.
  • Automated 6+ workflows using Apache Airflow.
  • Engineered Kafka ingestion handling 10K+ events per minute.

Education

Computer Engineering Graduate

Bachelor of Engineering in Computer Engineering — Mansoura University. Strong foundation in programming, systems, databases, data processing, and problem solving.

Career Direction

Data Engineering • Big Data • Streaming Systems

Contact

Let’s build something valuable with data.

Open to internships, junior data engineering roles, freelance data tasks, and collaboration on real-world big data projects.