Site Reliability Engineer
2 days ago
Job Title: Site Reliability Engineering (SRE) Subject Matter Expert (SME)
Overview
We're looking for an experienced SRE Subject Matter Expert (SME) to lead our reliability, performance, and automation initiatives. This role will design and drive best-in-class observability, performance engineering, AIOps, and reliability practices to ensure our systems are stable, scalable, and efficient.
The ideal candidate is both hands-on and strategic: able to solve technical problems, mentor teams, and influence company-wide engineering decisions.
Key Responsibilities
1. Observability & Monitoring
- Build and manage observability frameworks across logs, metrics, traces, and events.
- Design and maintain monitoring tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog, Dynatrace, New Relic) for better system insights.
- Define and track SLOs, SLIs, and error budgets with product and engineering teams.
- Enable proactive incident detection and root cause analysis.
2. Performance Engineering
- Lead load, stress, and scalability testing for applications and infrastructure.
- Create performance models and capacity plans for critical systems.
- Work closely with developers to find and fix performance bottlenecks.
3. Reliability Engineering
- Automate incident response, disaster recovery, and self-healing systems.
- Lead Chaos Engineering and resilience testing.
- Promote a blameless postmortem culture and drive reliability reviews.
- Ensure all systems follow best practices for fault tolerance and high availability.
4. AIOps & Automation
- Define and implement the AIOps strategy using ML/AI to improve observability and response.
- Use anomaly detection, event correlation, and predictive analytics for proactive issue resolution.
- Integrate AIOps tools with ITSM systems for smarter alerting and automated remediation.
5. Leadership & Enablement
- Act as a thought leader and mentor for SRE practices across teams.
- Collaborate with engineering, infrastructure, and business units to embed SRE principles company-wide.
- Champion a continuous improvement culture focused on availability, scalability, and operational excellence.
Required Qualifications
- 10+ years in IT Operations, Reliability, or Performance Engineering.
- Deep expertise in observability and monitoring tools (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, etc.).
- Strong experience with performance testing tools (JMeter, LoadRunner, Gatling, k6, etc.) and capacity planning.
- Hands-on experience with AWS, Azure, or GCP and container platforms (Kubernetes, Docker, OpenShift).
- Skilled in automation (Terraform, Ansible, Python, Go, Shell scripting).
- Familiar with AIOps tools (Moogsoft, BigPanda, Dynatrace Davis AI, ServiceNow AIOps).
- Strong understanding of distributed systems, networking, CI/CD, and DevOps.
Preferred Qualifications
- Experience leading enterprise-wide SRE or observability transformations.
- Knowledge of Chaos Engineering tools (Gremlin, Chaos Mesh, Litmus).
- Familiarity with ITSM/ITIL and modern incident management.
- Excellent communication and stakeholder management, including executive-level influence.
- Certifications in Google SRE, AWS DevOps, Azure SRE, or Datadog/Dynatrace (a plus).