Senior Staff Engineer
Job Title: Senior Staff Engineer
Duration: 6 months
Location: Redwood City, CA 94065
Job Description:
The Senior Staff Engineer for NPE Observability is the preeminent technical strategist for our global telemetry fabric. In this senior contract role, you will bridge the gap between high-scale distributed software and global network hardware, driving the architectural standards for our most complex data-intensive initiatives. You will own the technical integrity of our streaming pipelines, ensuring that telemetry from the global fleet is ingested, normalized, and processed with sub-second latency. As a master of our tech stack (Java, Kafka, Postgres, Grafana), you will define the "Gold Standard" for technical excellence within the Network Platform Engineering (NPE) group.
Responsibilities:
Architectural Strategy & Technical Vision
Core Stack Evolution: Architect and optimize our primary ingestion and storage engines utilizing Java and PostgreSQL, ensuring high availability and performance at scale.
Real-Time Data Orchestration: Lead the design of high-throughput messaging systems using Apache Kafka to handle trillions of telemetry points with sub-second latency.
Unified Visibility: Define the global standard for observability visualization in Grafana, building complex, high-performance dashboards that aggregate data from diverse telemetry sources.
High-Scale Engineering & Innovation
Stream Processing Mastery: Architect massively parallel processing pipelines and stateful stream processing frameworks (utilizing tools like Apache Flink) to enable real-time anomaly detection.
Advanced R&D: Evaluate and prototype emerging technologies such as Model-Driven Telemetry (MDT) and ClickHouse/Thanos for long-term metric storage and high-cardinality data analysis.
Technical Roadmap Ownership: Drive the engineering team toward key milestones, ensuring that the code we ship aligns with the long-term (3-5 year) NPE vision.
Reliability & Systemic Leadership
Service Standards: Define and monitor critical SLI/SLO metrics (e.g., P95 response times) to ensure the platform maintains world-class performance and global ITIL compliance.
Incident Authority: Serve as the senior point of contact for complex root-cause analysis, identifying architectural weaknesses in the Java/Kafka/Postgres stack to prevent future outages.
Stakeholder Synthesis: Translate complex product requirements into deep technical specifications, managing relationships with both internal software teams and external network vendors.
Required Qualifications & Experience:
Tenure: 10 years of professional experience in software engineering and distributed systems.
Domain Expertise: 5 years of experience specifically in large-scale network engineering, telemetry, or observability platforms.
Java Expert: Mastery of Java for building high-performance, scalable backend services.
Data & Messaging: Deep expertise in PostgreSQL (schema design and tuning) and Apache Kafka (cluster architecture and stream management).
Visualization: Expert-level proficiency in Grafana for creating enterprise-level observability dashboards.
Large-Scale Systems: Proven experience with Prometheus, Thanos, or ClickHouse, and with working in a structured Agile/Scrum environment.
Education: Bachelor's or Master's degree in Computer Science or a related technical field.
