Databricks Data Engineer with DevOps Skills

Arlington, VA
Permanent
Job Title: Databricks Data Engineer with DevOps Skills
Location: Los Angeles, CA (Hybrid)
Hire type: FTE / CTH
Rate: $75/hr
Salary: $130K
Indent : SF_OP_(phone number removed)-1

Job Summary
We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale pipelines on the Databricks Lakehouse Platform on AWS, while driving automated CI/CD and deployment practices. This role requires strong skills in PySpark, SQL, AWS cloud services, and modern DevOps tooling. You will collaborate closely with cross-functional teams to deliver scalable, secure, and high-performance data solutions.

Must Demonstrate (Critical Skills & Architectural Competencies)
  • Designing and implementing Databricks-based Lakehouse architectures on AWS
  • Clear separation of compute vs. serving layers
  • Ability to design low-latency data/API access strategies (beyond Spark-only patterns)
  • Strong understanding of caching strategies for performance and cost optimization
  • Data partitioning, storage optimization, and file layout strategy
  • Ability to handle multi-terabyte structured or time-series datasets
  • Skill in probing requirements to identify what matters architecturally
  • A player-coach mindset: hands-on engineering + technical leadership

Key Responsibilities
1. Data Pipeline Development
  • Design, build, and maintain scalable ETL/ELT pipelines using Databricks on AWS.
  • Develop high-performance data processing workflows using PySpark/Spark and SQL.
  • Integrate data from Amazon S3, relational databases, and semi-structured or unstructured sources.
  • Implement Delta Lake best practices including schema evolution, ACID transactions, OPTIMIZE, ZORDER, partitioning, and file-size tuning.
  • Ensure architectures support high-volume, multi-terabyte workloads.
2. DevOps & CI/CD
  • Implement CI/CD pipelines for Databricks using Git, GitLab, GitHub Actions, or AWS-native tools.
  • Build and manage automated deployments using Databricks Asset Bundles.
  • Manage version control for notebooks, workflows, libraries, and environment configuration.
  • Automate cluster policies, job creation, environment provisioning, and configuration management.
  • Support infrastructure-as-code via Terraform (preferred) or CloudFormation.
3. Collaboration & Business Support
  • Work with data analysts and BI teams to prepare curated datasets for reporting and analytics.
  • Collaborate closely with product owners, engineering teams, and business partners to translate requirements into scalable implementations.
  • Document data flows, technical architecture, and DevOps/deployment workflows.
4. Performance & Optimization
  • Tune Spark clusters, workflows, and queries for cost efficiency and compute performance.
  • Monitor pipelines, troubleshoot failures, and maintain high reliability.
  • Implement logging, monitoring, and observability across workflows and jobs.
  • Apply caching strategies and workload optimization techniques to support low-latency consumption patterns.
5. Governance & Security
  • Implement and maintain data governance using Unity Catalog.
  • Enforce access controls, security policies, and data compliance requirements.
  • Ensure lineage, quality checks, and auditability across data flows.
Technical Skills
  • Strong hands-on experience with Databricks, including:
    • Delta Lake
    • Unity Catalog
    • Lakehouse Architecture
    • Delta Live Tables pipelines
    • Databricks Runtime
    • Table Triggers
    • Databricks Workflows
  • Proficiency in PySpark, Spark, and advanced SQL.
  • Expertise with AWS cloud services, including:
    • S3
    • IAM
    • Glue / Glue Catalog
    • Lambda
    • Kinesis (optional but beneficial)
    • Secrets Manager
  • Strong understanding of DevOps tools:
    • Git / GitLab
    • CI/CD pipelines
    • Databricks Asset Bundles
  • Familiarity with Terraform is a plus.
  • Experience with relational databases and data warehouse concepts.
Preferred Experience
  • Knowledge of streaming technologies like Structured Streaming/Spark Streaming.
  • Experience building real-time or near real-time pipelines.
  • Exposure to advanced Databricks runtime configurations and performance tuning.
Certifications (Optional)
  • Databricks Certified Data Engineer Associate / Professional
  • AWS Data Engineer or AWS Solutions Architect certification

Job Type: Permanent

Job ID: 254265696