Data Engineer with Data Science Experience
Role : Data Engineer with Data Science Experience
Location : Scottsdale AZ (Onsite)
Rate : $75/hr.
Indent : SF_OP_(phone number removed)-1
We are looking for a skilled Data Engineer with strong PySpark experience to work on large-scale data processing and analytics initiatives. The ideal candidate will have hands-on experience working with large datasets, complex joins, and performance optimization, along with the ability to apply basic analytical thinking and deliver clear, stakeholder-ready outputs.
Key Responsibilities
Data Engineering & Development
- Design, develop, and maintain scalable data pipelines using PySpark.
- Write efficient and optimized PySpark code to process and transform large-scale datasets.
- Handle joins across multiple large databases, ensuring performance, accuracy, and scalability.
- Optimize Spark jobs to minimize runtime, memory usage, and compute cost.
- Work with structured and semi-structured data from multiple sources.
- Build and curate training and analytical datasets by joining and transforming multiple data sources.
- Apply basic analytical skills to understand data patterns, anomalies, and business relevance.
- Perform data validation and quality checks, including:
  - Record counts and reconciliation
  - Duplicate detection
  - Null and outlier checks
  - Schema and data-type validation
- Ensure datasets are analysis-ready and trustworthy.
- Understand business objectives and translate them into data requirements.
- Ask the right questions to determine:
  - Level of aggregation required
  - Metric definitions
  - Data freshness and accuracy expectations
  - Preferred output and reporting formats
- Present results and insights clearly to stakeholders.
- Create reports and summaries using Excel for business users and leadership.
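For candidates who want a concrete picture of the validation checks listed above (record counts, duplicates, nulls, outliers, schema and data types), here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. The function name `validate`, the sample records, the column names, and the outlier rule (more than 5 median absolute deviations from the median) are all illustrative assumptions, not part of the role's actual codebase.

```python
from collections import Counter
from statistics import median


def validate(rows, key, expected_count, schema, numeric_col, mad_factor=5):
    """Run basic quality checks on a list of dict records (hypothetical helper)."""
    report = {}

    # Record count reconciliation against the count reported by the source.
    report["count_matches"] = len(rows) == expected_count

    # Duplicate detection on the business key.
    key_counts = Counter(r[key] for r in rows)
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)

    # Null check: number of missing values per column.
    report["null_counts"] = {
        col: sum(1 for r in rows if r.get(col) is None) for col in schema
    }

    # Schema and data-type validation: (row index, column) of each mismatch.
    report["type_errors"] = [
        (i, col)
        for i, r in enumerate(rows)
        for col, expected_type in schema.items()
        if r.get(col) is not None and not isinstance(r[col], expected_type)
    ]

    # Outlier check (assumed rule): distance from the median, scaled by the
    # median absolute deviation (MAD).
    values = [r[numeric_col] for r in rows if isinstance(r.get(numeric_col), (int, float))]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    report["outliers"] = [v for v in values if mad > 0 and abs(v - med) > mad_factor * mad]

    return report


# Illustrative sample data (made up for this sketch).
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 2, "amount": 11.0},   # duplicate key
    {"id": 3, "amount": None},   # null value
    {"id": 4, "amount": 9.0},
    {"id": 5, "amount": 500.0},  # outlier
]
report = validate(
    rows, key="id", expected_count=6,
    schema={"id": int, "amount": float}, numeric_col="amount",
)
print(report)
```

On a real pipeline the same checks would typically be expressed as DataFrame aggregations so they run distributed; the logic of each check stays the same.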
Expected Technical Approach (Problem-Solving Mindset)
Candidates are expected to demonstrate the ability to:
- Approach complex data projects methodically, starting with:
  - Understanding business objectives
  - Reviewing source data structure and volume
  - Designing efficient join strategies
- Choose the right join types, partitioning strategies, and caching techniques.
- Validate data at every stage of the pipeline.
- Balance technical accuracy with business usability when presenting results.
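As one illustration of what "designing efficient join strategies" can mean in practice, the sketch below shows the idea behind a hash join in which the smaller table is indexed once and the larger table is streamed past it. This is the same principle Spark applies when it broadcasts a small table to every executor. The code is plain Python so it runs anywhere; the function `hash_join` and the sample tables are hypothetical and exist only for this example.

```python
from collections import defaultdict


def hash_join(large, small, key):
    """Inner join that indexes the small side once and streams the large side."""
    # Build phase: one pass over the small table.
    index = defaultdict(list)
    for row in small:
        index[row[key]].append(row)

    # Probe phase: one pass over the large table, with no sorting or shuffling.
    for row in large:
        for match in index.get(row[key], ()):
            # On a column name collision, the value from the large side wins.
            yield {**match, **row}


# Illustrative sample data (made up for this sketch).
customers = [
    {"cust_id": 1, "segment": "retail"},
    {"cust_id": 2, "segment": "corporate"},
]
transactions = [
    {"txn_id": 100, "cust_id": 1, "amount": 25.0},
    {"txn_id": 101, "cust_id": 2, "amount": 40.0},
    {"txn_id": 102, "cust_id": 1, "amount": 5.0},
    {"txn_id": 103, "cust_id": 9, "amount": 7.0},  # no matching customer
]

joined = list(hash_join(transactions, customers, "cust_id"))
for row in joined:
    print(row)
```

The design choice to discuss in an interview setting is which side to index: the approach only pays off when one table fits comfortably in memory. When both sides are large, partitioning both tables on the join key is usually the better starting point.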
Core Skill Sets (Must-Have)
- Strong hands-on experience with PySpark
- Extensive experience working with large datasets
- Proven expertise in joining large databases efficiently
- Ability to write high-performance, optimized code
- Basic analytical skills to interpret and validate data
- Reporting skills using Excel
Good to Have Skills
- Experience in model development or supporting analytics/modeling teams
- SAS experience
- Exposure to Cloudera or similar big data platforms
- Understanding of data warehousing and analytics workflows
Soft Skills & Competencies
- Strong problem-solving and logical thinking
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
