Key Responsibilities
- Design, build, deploy, and maintain scalable, robust data pipelines.
- Build and maintain reliable ETL processes to collect data from multiple sources.
- Optimize and tune data pipelines and data storage solutions for performance and cost-efficiency.
- Ensure data integrity, consistency, and security.
- Work with cross-functional teams to understand data needs and deliver high-quality data solutions.
- Develop and maintain data models, schemas, and documentation.
- Automate data processes and workflows using scripting and orchestration tools (see the orchestration sketch after this list).
- Monitor and troubleshoot data pipeline issues and ensure uptime of data services.
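To illustrate the kind of workflow automation referenced above, here is a minimal sketch of a daily ETL DAG, assuming Apache Airflow 2.4+ is the orchestrator; the DAG id, schedule, and extract/load callables are hypothetical placeholders rather than an existing pipeline.

```python
# Minimal sketch of a daily ETL DAG (assumes Apache Airflow 2.4+).
# The DAG id, schedule, and callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull records from a source system (API, database, files).
    print("extracting source data")


def load():
    # Placeholder: load transformed records into the warehouse.
    print("loading into warehouse")


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load; Airflow handles scheduling and run monitoring,
    # and retries can be configured per task.
    extract_task >> load_task
```

The `extract_task >> load_task` line expresses the task dependency; the scheduler then triggers and tracks each daily run.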
Required Skills and Qualifications
- Strong programming skills in Python, Java, or Scala.
- Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL).
- Experience with big data tools such as Apache Spark, Hadoop, Hive, or Presto (see the Spark sketch after this list).
- Familiarity with cloud data platforms (e.g., Amazon Redshift, Google BigQuery, Azure Data Lake).
- Hands-on experience with ETL and workflow orchestration tools such as Apache Airflow, AWS Glue, or Talend.
- Knowledge of data warehouse concepts and data modeling techniques.
- Experience with version control systems like Git.
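As a rough illustration of the Spark and data-warehousing experience listed above, here is a small batch-rollup sketch, assuming PySpark is installed; the input path, column names, and output location are hypothetical.

```python
# Small batch rollup sketch (assumes PySpark is installed).
# The dataset, column names, and paths are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_order_rollup").getOrCreate()

# Hypothetical raw order events landed as CSV by an upstream process.
orders = spark.read.csv("raw/orders.csv", header=True, inferSchema=True)

# Aggregate order totals per customer per day, a typical warehouse rollup.
daily_totals = (
    orders
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Write Parquet partitioned by date for downstream warehouse or BI loads.
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "curated/daily_order_totals"
)

spark.stop()
```

Partitioning the output by date is a common pattern for incremental downstream loads into a warehouse.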