Senior Data Engineer

Remote, USA | Full-time | Posted 2025-07-06

Location: 100% Remote
Years' Experience: 10 years
Education: Bachelor's in an IT-related field
Work Authorization: Applicant must show they are legally permitted to work in the United States.
Clearance: Applicants must be able to meet the requirements to obtain a Public Trust security clearance. NOTE: United States citizenship is required to be eligible for this security clearance.

Key Skills
• 10 years of IT experience focusing on enterprise data architecture and management
• Experience with Databricks required
• 8 years of experience in Conceptual/Logical/Physical Data Modeling and expertise in Relational and Dimensional Data Modeling
• Experience with Great Expectations or other data quality validation frameworks
• Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services
• Advanced-level SQL experience (joins, aggregation, windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)
• Experience with the AWS environment, CI/CD pipelines, and Python (Python 3) a bonus

Responsibilities
• Plan, create, and maintain data architectures, ensuring alignment with business requirements
• Obtain data, formulate dataset processes, and store optimized data
• Identify problems and inefficiencies and apply solutions
• Determine tasks where manual participation can be eliminated with automation
• Identify and optimize data bottlenecks, leveraging automation where possible
• Create and manage data lifecycle policies (retention, backups/restore, etc.)
• Apply in-depth knowledge of creating, maintaining, and managing ETL/ELT pipelines
• Create, maintain, and manage data transformations
• Maintain and update documentation
• Create, maintain, and manage data pipeline schedules
• Monitor data pipelines
• Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality (see the sketch after this list)
• Support AI/ML teams with optimizing feature engineering code
• Apply expertise in Spark/Python/Databricks, Data Lake, and SQL
• Create, maintain, and manage Spark Structured Streaming jobs, including the newer Delta Live Tables and/or DBT
• Research existing data in the data lake to determine the best sources for data
• Create, manage, and maintain ksqlDB and Kafka Streams queries/code
• Perform data-driven testing for data quality
• Maintain and update Python-based data processing scripts executed on AWS Lambdas
• Write unit tests for all Spark, Python data processing, and Lambda code
• Maintain the PCIS Reporting Database data lake with optimizations and maintenance (performance tuning, etc.)
• Streamline data processing, including formalizing concepts of how to handle late data, defining windows, and how window definitions impact data freshness
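As a rough illustration of the kind of data quality gate referenced above, the sketch below uses Great Expectations' legacy pandas-backed (0.x-style) API; the column names, thresholds, and failure handling are hypothetical, and a production gate on Databricks would more likely validate a Spark DataFrame against a checkpointed expectation suite.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical batch of records to validate before publishing it downstream.
batch = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [10.5, 22.0, 5.75, 40.0],
    "status": ["open", "closed", "open", "closed"],
})

# Wrap the DataFrame so expectation methods become available (legacy 0.x API).
ge_batch = ge.from_pandas(batch)

# Declare the expectations that make up the quality gate.
ge_batch.expect_column_values_to_not_be_null("order_id")
ge_batch.expect_column_values_to_be_between("amount", min_value=0, max_value=10000)
ge_batch.expect_column_values_to_be_in_set("status", ["open", "closed"])

# Validate and fail the pipeline step if any expectation is not met.
results = ge_batch.validate()
if not results["success"]:
    raise ValueError(f"Data quality gate failed: {results}")
```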
Qualifications
• 10 years of IT experience focusing on enterprise data architecture and management
• Experience in Conceptual/Logical/Physical Data Modeling and expertise in Relational and Dimensional Data Modeling
• Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
• Additional experience with Spark, Spark SQL, Spark DataFrames and Datasets, and PySpark
• Data Lake concepts such as time travel, schema evolution, and optimization
• Structured Streaming and Delta Live Tables with Databricks a bonus
• Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation/support
• Advanced-level understanding of streaming data pipelines and how they differ from batch systems
• Ability to formalize concepts of how to handle late data, defining windows, and data freshness (see the streaming sketch after this list)
• Advanced understanding of ETL and ELT and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
• Understanding of concepts and implementation strategies for different incremental data loads such as tumbling window, sliding window, high watermark, etc. (see the high-watermark sketch after this list)
• Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus
• Understanding of streaming data pipelines and batch systems
• Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness
• Advanced-level SQL experience (joins, aggregation, windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)
• Indexing and partitioning strategy experience
• Ability to debug, troubleshoot, design, and implement solutions to complex technical issues
• Experience with large-scale, high-performance enterprise big data application deployment and solutions
• Understanding of how to create DAGs to define workflows
• Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc. a bonus but not required
• Architecture experience in an AWS environment a bonus
• Familiarity with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, a bonus
• Experience with Docker, Jenkins, and CloudWatch
• Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines
• Experience working with AWS Lambdas for configuration and optimization
• Experience working with DynamoDB to query and write data
• Experience with S3
• Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus
• Familiarity with Pytest and Unittest a bonus
• Experience working with JSON and defining JSON Schemas a bonus
• Experience setting up and managing Confluent/Kafka topics and ensuring performance using Kafka a bonus
• Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
• Understanding of how to manage ksqlDB SQL files and migrations and Kafka Streams
• Ability to thrive in a team-based environment
• Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior levels of management
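To make the late-data and windowing concepts above concrete, here is a minimal PySpark Structured Streaming sketch. It uses the built-in rate source purely for illustration; a real Databricks job would read from Kafka or a Delta table, and the 10-minute watermark and 5-minute tumbling window are arbitrary values chosen for the example. The watermark bounds how late an event may arrive and still be counted, which in turn governs how fresh the emitted results can be.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("windowed-counts").getOrCreate()

# The rate source emits (timestamp, value) rows; it stands in for a real event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

windowed_counts = (
    events
    # Events more than 10 minutes behind the max observed timestamp are treated as late and dropped.
    .withWatermark("timestamp", "10 minutes")
    # Tumbling 5-minute windows; a sliding window would add a slide duration argument.
    .groupBy(F.window("timestamp", "5 minutes"))
    .count()
)

query = (
    windowed_counts.writeStream
    .outputMode("update")   # emit updated window counts as they change
    .format("console")
    .start()
)
query.awaitTermination()
```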
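For the high-watermark incremental load strategy mentioned above, the sketch below shows one common approach with Spark. The table names, the `updated_at` column, and the JDBC connection details are all hypothetical; the idea is simply to remember the largest timestamp already loaded and pull only newer rows on the next run.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Hypothetical target table that already holds previously loaded rows.
hwm_row = spark.table("analytics.orders").agg(F.max("updated_at").alias("hwm")).first()
high_watermark = hwm_row["hwm"] or "1970-01-01 00:00:00"  # first run: load everything

# Pull only rows newer than the stored high watermark from the source system.
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://source-host:5432/sales")   # hypothetical source
    .option("dbtable",
            f"(SELECT * FROM orders WHERE updated_at > '{high_watermark}') AS src")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Append the new slice; the next run's MAX(updated_at) becomes the new watermark.
incremental.write.mode("append").saveAsTable("analytics.orders")
```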
