Site Reliability Engineer
7 days ago
Job Title: Site Reliability Engineering (SRE) Subject Matter Expert (SME)
Overview
We're looking for an experienced SRE Subject Matter Expert (SME) to lead our reliability, performance, and automation initiatives. This role will design and drive best-in-class observability, performance engineering, AIOps, and reliability practices to ensure our systems are stable, scalable, and efficient.
The ideal candidate is both hands-on and strategic: able to solve technical problems, mentor teams, and influence company-wide engineering decisions.
Key Responsibilities
1. Observability & Monitoring
- Build and manage observability frameworks across logs, metrics, traces, and events.
- Design and maintain monitoring tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog, Dynatrace, New Relic) for better system insights.
- Define and track SLOs, SLIs, and error budgets with product and engineering teams.
- Enable proactive incident detection and root cause analysis.
2. Performance Engineering
- Lead load, stress, and scalability testing for applications and infrastructure.
- Create performance models and capacity plans for critical systems.
- Work closely with developers to find and fix performance bottlenecks.
3. Reliability Engineering
- Automate incident response, disaster recovery, and self-healing systems.
- Lead Chaos Engineering and resilience testing.
- Promote a blameless postmortem culture and drive reliability reviews.
- Ensure all systems follow best practices for fault tolerance and high availability.
4. AIOps & Automation
- Define and implement the AIOps strategy using ML/AI to improve observability and response.
- Use anomaly detection, event correlation, and predictive analytics for proactive issue resolution.
- Integrate AIOps tools with ITSM systems for smarter alerting and automated remediation.
5. Leadership & Enablement
- Act as a thought leader and mentor for SRE practices across teams.
- Collaborate with engineering, infrastructure, and business units to embed SRE principles company-wide.
- Champion a continuous improvement culture focused on availability, scalability, and operational excellence.
Required Qualifications
- 10+ years in IT Operations, Reliability, or Performance Engineering.
- Deep expertise in observability and monitoring tools (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, etc.).
- Strong experience with performance testing tools (JMeter, LoadRunner, Gatling, k6, etc.) and capacity planning.
- Hands-on experience with AWS, Azure, or GCP and container platforms (Kubernetes, Docker, OpenShift).
- Skilled in automation (Terraform, Ansible, Python, Go, Shell scripting).
- Familiar with AIOps tools (Moogsoft, BigPanda, Dynatrace Davis AI, ServiceNow AIOps).
- Strong understanding of distributed systems, networking, CI/CD, and DevOps.
Preferred Qualifications
- Experience leading enterprise-wide SRE or observability transformations.
- Knowledge of Chaos Engineering tools (Gremlin, Chaos Mesh, Litmus).
- Familiarity with ITSM/ITIL and modern incident management.
- Excellent communication and stakeholder management, including executive-level influence.
- Certifications in Google SRE, AWS DevOps, Azure SRE, or Datadog/Dynatrace (a plus).