Site Reliability Engineer

2 weeks ago


Taguig, Philippines Socium - Teams Done Differently Full time

Recruitment Consultant @ Socium - Teams Done Differently

Job Title: Site Reliability Engineering (SRE) Subject Matter Expert (SME)

Overview

We’re looking for an experienced SRE Subject Matter Expert (SME) to lead our reliability, performance, and automation initiatives. This role will design and drive best-in-class observability, performance engineering, AIOps, and reliability practices to ensure our systems are stable, scalable, and efficient. The ideal candidate is both hands-on and strategic: able to solve technical problems, mentor teams, and influence company-wide engineering decisions.

Key Responsibilities

1. Observability & Monitoring
• Build and manage observability frameworks across logs, metrics, traces, and events.
• Design and maintain monitoring tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog, Dynatrace, New Relic) for better system insights.
• Define and track SLOs, SLIs, and error budgets with product and engineering teams.
• Enable proactive incident detection and root cause analysis.

2. Performance Engineering
• Lead load, stress, and scalability testing for applications and infrastructure.
• Create performance models and capacity plans for critical systems.
• Work closely with developers to find and fix performance bottlenecks.

3. Reliability Engineering
• Automate incident response, disaster recovery, and self-healing systems.
• Lead Chaos Engineering and resilience testing.
• Promote a blameless postmortem culture and drive reliability reviews.
• Ensure all systems follow best practices for fault tolerance and high availability.

4. AIOps & Automation
• Define and implement the AIOps strategy using ML/AI to improve observability and response.
• Use anomaly detection, event correlation, and predictive analytics for proactive issue resolution.
• Integrate AIOps tools with ITSM systems for smarter alerting and automated remediation.
• Act as a thought leader and mentor for SRE practices across teams.
• Collaborate with engineering, infrastructure, and business units to embed SRE principles company-wide.
• Champion a continuous improvement culture focused on availability, scalability, and operational excellence.

Required Qualifications
• 10+ years in IT Operations, Reliability, or Performance Engineering.
• Deep expertise in observability and monitoring tools (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, etc.).
• Strong experience with performance testing tools (JMeter, LoadRunner, Gatling, k6, etc.) and capacity planning.
• Hands-on experience with AWS, Azure, or GCP and container platforms (Kubernetes, Docker, OpenShift).
• Skilled in automation (Terraform, Ansible, Python, Go, Shell scripting).
• Familiar with AIOps tools (Moogsoft, BigPanda, Dynatrace Davis AI, ServiceNow AIOps).
• Strong understanding of distributed systems, networking, CI/CD, and DevOps.

Preferred Qualifications
• Experience leading enterprise-wide SRE or observability transformations.
• Knowledge of Chaos Engineering tools (Gremlin, Chaos Mesh, Litmus).
• Familiarity with ITSM/ITIL and modern incident management.
• Excellent communication and stakeholder management, including executive-level influence.
• Certifications in Google SRE, AWS DevOps, Azure SRE, or Datadog/Dynatrace (a plus).

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Business Consulting and Services



  • Taguig, Philippines Philtech Inc. Full time

    Senior Technical Recruiter | Technology & Engineering. About the Role: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with a strong focus on front-end application performance and reliability. In this role, you will ensure the scalability, availability, and responsiveness of our web and mobile user-facing platforms. You will...


  • Taguig, National Capital Region, Philippines Pan Asia Resources PH Inc. Full time ₱1,440,000 - ₱2,160,000 per year

    About the Role: We are seeking a skilled and motivated Site Reliability Engineer (SRE) with expertise in supporting and managing MQ and Kafka systems. The ideal candidate will have a strong background in Unix systems administration, experience with Kubernetes (preferred), and a passion for maintaining high availability, performance, and reliability in...


  • Taguig, Philippines Nasdaq Full time

    Site Reliability Engineer. Why Nasdaq: When you work at Nasdaq, you’re working for more open and transparent markets so that more people can access opportunities. Connections can be made, jobs can be created, and communities can thrive. We want...


  • Taguig, Philippines Nasdaq, Inc. Full time

    Site Reliability Engineer. Locations: Philippines - Taguig City - National Capital Region. Time type: Full time. Posted on: Posted Today. Job requisition ID: R. Why Nasdaq: When you work at Nasdaq, you’re working for more open and transparent markets so that more people can access opportunities. Connections can be made,...


  • Taguig, Philippines Ingram Micro Full time

    Sr. Engineer, Software Development (Site Reliability Engineer). Remote type: Hybrid. Locations: Taguig City, Philippines. Time type: Full time. Posted on: Posted Today. Job requisition ID: R-. It's fun to work in a company where people truly BELIEVE in what they're doing! Job...


  • Taguig, Philippines Procter & Gamble Full time

    As an SRE in the Consumer Data Platform, you will play a vital role in enabling and sustaining superior brand communication with our consumers. The Consumer Data Platform is integral to enhancing consumer journeys, and achieving our reliability targets is essential for delivering this excellence. It supports more than 250 brand-country combinations and...


  • Taguig, Philippines Amadeus Full time

    Senior Service Reliability Engineer. Purpose of the role: The Senior Site Reliability Engineer for Stratos will be responsible for ensuring the reliability, performance and scalability of our mission-critical platforms. In this role, you will safeguard...


  • Taguig, Philippines Amadeus Full time

    Principal Service Reliability Engineer. Purpose of the role: The Principal Site Reliability Engineer will be responsible for ensuring the reliability, performance and scalability of our mission-critical platforms. In this role, you will be safeguarding operational excellence in the assigned product, influencing reliability strategies, integral in production...


  • Philippines - A.T. Yuchengco Centre - Taguig City TP ICAP Full time $100,000 - $120,000 per year

    Group Overview: The TP ICAP Group is a world-leading provider of market infrastructure. Our purpose is to provide clients with access to global financial and commodities markets, improving price discovery, liquidity, and distribution of data, through responsible and innovative solutions. Through our people and technology, we connect clients to superior liquidity...

  • Site Engineer

    2 weeks ago


    Taguig, Philippines SmoothMoves, Inc. Full time ₱35,000 - ₱50,000

    Responsibilities: Supervise dynamic compaction works and ensure compliance with project specs. Oversee field testing (CPT, SPT, plate load tests) carried out by subcontractor. Monitor settlement, vibration, and ground response during compaction. Prepare technical reports and maintain site records. Enforce health, safety, and environmental...