
Databricks Architect

Lawrence Township, NJ
Permanent

At Staffworxs, we don't just connect talent; we power transformation. Headquartered in Frisco, TX, with teams in Bengaluru and Hyderabad, we combine global reach with deep expertise. Our Digital & Data Analytics practice drives growth and innovation for some of the world's top brands, who continue to retain us as their trusted partner. If you're ready to make an impact, you're in the right place.

Job Details:

Title: Databricks Architect
Location: Princeton, NJ (Hybrid)
Duration: Long-term contract

Job Summary:
Key Responsibilities:
1. Strategic Planning & Discovery
• Conduct stakeholder discovery sessions to capture business priorities, SLAs, and success metrics across multiple domains.
• Build a comprehensive inventory of data sources, including Delta Sharing, MongoDB, PostgreSQL, Kafka, Snowflake, and file-based feeds.
• Assess the current-state architecture (EMR, custom JARs, Python workloads, orchestration tools).
• Classify and analyze 1,000+ data pipelines by complexity, risk, and business impact.
• Define a strategic, wave-based migration roadmap covering the pilot phase and the target-state architecture.
2. Pilot Implementation & Ingestion Design
• Lead pilot implementations using representative pipelines (simple, medium, and complex).
• Establish ingestion patterns using:
  ◦ Delta Sharing with Change Data Feed (CDF)
  ◦ File-based ingestion (Auto Loader, schema evolution)
• Implement the Medallion architecture (Bronze, Silver, and Gold layers).
• Replace legacy orchestration (AWS Step Functions and custom schedulers) with Databricks Workflows.
3. Target Architecture & Governance
• Design a scalable Lakehouse architecture with full CDF enablement.
• Define standardized ingestion patterns across all data sources.
• Implement Unity Catalog governance (naming conventions, access control, environment strategy).
• Establish enterprise data governance, security, and compliance standards, including controls for material non-public information (MNPI).
• Define data quality, lineage, and observability frameworks (logging, metrics, alerting).
• Design business continuity and disaster recovery strategies.
4. Migration Execution & Engineering Excellence
• Lead the migration of ~1,000 EMR jobs to Databricks, prioritizing high-volume and critical pipelines.
• Map EMR workloads to optimized Databricks compute configurations (job clusters, serverless, shared compute).
• Refactor pipelines to:
  ◦ Use Unity Catalog namespaces
  ◦ Standardize configurations and secrets
  ◦ Validate or replace custom JARs and Python dependencies
• Develop scalable, configuration-driven pipeline frameworks to accelerate migration.
5. CI/CD, DevOps & Platform Standards
• Establish CI/CD pipelines using Databricks Asset Bundles.
• Define infrastructure-as-code standards for clusters, jobs, and permissions.
• Implement branching strategies, code review processes, and deployment governance.
• Enable federated development with strong quality gates and repository governance.
6. Collaboration & Continuous Improvement
• Provide architectural guidance and unblock engineering teams during migration waves.
• Review code, validate adherence to standards, and resolve technical challenges.
• Continuously refine migration patterns, templates, and best practices.
• Advise on compute optimization (serverless vs. classic clusters) based on workload insights.
Required Skills & Experience:
• Strong experience with the Databricks Lakehouse platform (Delta Lake, Unity Catalog, Workflows)
• Hands-on expertise in big data ecosystems (Spark, EMR, Kafka, Snowflake, MongoDB, RDBMS)
• Proven experience in large-scale data migration projects
• Knowledge of data architecture patterns (Medallion architecture, CDC/CDF pipelines)
• Experience with CI/CD, DevOps, and infrastructure-as-code
• Strong understanding of data governance, security, and compliance frameworks
• Proficiency in Python, SQL, and/or Scala/Java
• Experience with orchestration tools and workflow modernization
Preferred Qualifications:
• Experience in financial services or regulated environments (MNPI handling)
• Exposure to GenAI / Mosaic AI architectures
• Experience with enterprise-scale data quality and lineage tools
• Familiarity with multi-environment (dev/staging/prod) deployment strategies
Staffworxs is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive workplace for all employees, regardless of race, color, religion, gender, sexual orientation, national origin, age, disability, or veteran status.

Job Type: Permanent

Job ID: 254680191