Site Reliability Engineering
3 hours ago
Job Description: Site Reliability Engineering (SRE) SME
Position Overview
We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing modern SRE capabilities that improve system reliability, scalability, and efficiency across our IT ecosystem. This role requires deep technical expertise, hands-on problem-solving skills, and the ability to influence cross-functional teams.
Key Responsibilities
- Observability & Monitoring
  - Define and implement observability frameworks across logs, metrics, traces, and events.
  - Architect monitoring platforms (e.g., Prometheus, Grafana, ELK, Splunk, Datadog, Dynatrace, New Relic) to deliver actionable insights.
  - Establish SLOs, SLIs, and error budgets in collaboration with product and engineering teams.
  - Drive proactive incident detection and root cause analysis.
- Performance Engineering
  - Lead performance benchmarking, load/stress testing, and scalability assessments of applications and infrastructure.
  - Build performance models and capacity planning strategies for critical business systems.
  - Partner with development teams to identify performance bottlenecks and optimize application/infrastructure efficiency.
- Reliability Engineering
  - Design and implement automation for incident response, disaster recovery, and self-healing systems.
  - Lead chaos engineering and resilience testing initiatives.
  - Drive reliability reviews, postmortems, and a blameless root cause analysis (RCA) culture.
  - Ensure best practices for fault tolerance, availability, and resilience are embedded in system design.
- AIOps & Intelligent Automation
  - Define AIOps strategy and deploy ML/AI-driven observability and incident response capabilities.
  - Leverage anomaly detection, event correlation, and predictive analytics for proactive IT operations.
  - Integrate AIOps platforms with ITSM tools for intelligent ticketing, alert suppression, and automated remediation.
- Leadership & Evangelism
  - Act as a thought leader in SRE practices, mentoring engineers and influencing leadership decisions.
  - Partner with development, infrastructure, and business teams to embed SRE principles across the enterprise.
  - Drive a culture of continuous improvement in availability, scalability, and operational excellence.
Required Qualifications
- 10+ years of experience in IT Operations, Reliability Engineering, or Performance Engineering.
- Deep expertise in observability and monitoring platforms (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, AppDynamics, etc.).
- Strong background in performance testing tools (JMeter, LoadRunner, Gatling, k6, etc.) and capacity planning.
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker, OpenShift).
- Expertise in automation tools and languages (Terraform, Ansible, Python, Go, Shell scripting).
- Experience with AIOps platforms (Moogsoft, BigPanda, Dynatrace Davis AI, ServiceNow AIOps, etc.) and ML-driven IT operations.
- Strong understanding of distributed systems, networking, CI/CD, and DevOps practices.
Preferred Qualifications
- Prior experience leading enterprise-wide SRE/Observability transformations.
- Knowledge of Chaos Engineering platforms (Gremlin, Chaos Mesh, Litmus).
- Exposure to ITSM/ITIL processes and modern incident management practices.
- Strong communication skills with the ability to influence CxO-level stakeholders.
- Certifications: Google SRE, AWS DevOps Engineer, Azure SRE Expert, Dynatrace/Datadog certifications (preferred).
Key Competencies
- Strategic and analytical thinker with a problem-solving mindset.
- Strong leadership, mentorship, and stakeholder engagement skills.
- Passionate about automation, scalability, and resilience engineering.
- Ability to balance reliability with velocity in fast-paced environments.
- Site Reliability Engineer
  1 week ago
  Taguig, National Capital Region, Philippines | Pan Asia Resources PH Inc. | Full time | ₱1,440,000 - ₱2,160,000 per year
  About the Role: We are seeking a skilled and motivated Site Reliability Engineer (SRE) with expertise in supporting and managing MQ and Kafka systems. The ideal candidate will have a strong background in Unix systems administration, experience with Kubernetes (preferred), and a passion for maintaining high availability, performance, and reliability in...
- Site Reliability Engineer
  4 hours ago
  Taguig, National Capital Region, Philippines | Philtech | Full time | ₱1,200,000 - ₱2,400,000 per year
  About the Role: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with a strong focus on front-end application performance and reliability. In this role, you will ensure the scalability, availability, and responsiveness of our web and mobile user-facing platforms. You will collaborate closely with engineering, product, and design...
- Site Reliability Engineering
  4 hours ago
  Taguig, National Capital Region, Philippines | Tata Consultancy Services | Full time | ₱2,000,000 - ₱2,500,000 per year
  Required Qualifications
  - 10+ years of experience in IT Operations, Reliability Engineering, or Performance Engineering.
  - Deep expertise in observability and monitoring platforms (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, AppDynamics, etc.).
  - Strong background in performance testing tools (JMeter, LoadRunner, Gatling, k6, etc.) and capacity...
- Service Reliability Engineer
  4 hours ago
  Taguig, National Capital Region, Philippines | YONDU INC. | Full time | ₱900,000 - ₱1,200,000 per year
  About the role: As a Service Reliability Engineer at YONDU INC., you will be responsible for ensuring the smooth and reliable operation of the company's critical IT systems and infrastructure. This full-time position is based in Taguig City, Metro Manila, and is a key role in supporting the company's overall business objectives.
  What you'll be...
- Site Reliability Engineer
  4 hours ago
  Taguig, National Capital Region, Philippines | NASDAQ | Full time | ₱1,200,000 - ₱2,400,000 per year
  Why Nasdaq
  When you work at Nasdaq, you're working for more open and transparent markets so that more people can access opportunities. Connections can be made, jobs can be created, and communities can thrive. We want all our employees to have access to opportunity, too. That means planning for career growth, ensuring you have the tools you need, and promoting...
- Tech Lead, Site Reliability Engineering
  4 hours ago
  Taguig, National Capital Region, Philippines | LSEG (London Stock Exchange Group) | Full time | ₱2,500,000 - ₱5,000,000 per year
  Role Profile
  We are seeking a highly motivated and experienced Tech Lead to manage the Shared Site Reliability Engineering (SRE) team supporting Risk Intelligence Services within the Markets and Risk Intelligence division. This role is critical to ensuring operational continuity across multiple applications by maintaining a pool of cross-functional, adaptable...
- Tech Lead, Site Reliability Engineering
  3 hours ago
  Taguig, National Capital Region, Philippines | LSEG | Full time | $100,000 - $150,000 per year
  Role Profile
  We are seeking a highly motivated and experienced Tech Lead to manage the Shared Site Reliability Engineering (SRE) team supporting Risk Intelligence Services within the Markets and Risk Intelligence division. This role is critical to ensuring operational continuity across multiple applications by maintaining a pool of cross-functional, adaptable team...
- Senior Specialist, Site Reliability Engineering
  4 hours ago
  Taguig, National Capital Region, Philippines | LSEG (London Stock Exchange Group) | Full time | $1,000,000 - $1,250,000 per year
  Role Profile
  We are seeking a highly motivated and experienced Senior Associate to join the Shared Site Reliability Engineering (SRE) team supporting Risk Intelligence Services within the Markets and Risk Intelligence division. This role is essential to maintaining uninterrupted business operations across multiple applications while adhering to defined...
- Senior Specialist, Site Reliability Engineering
  3 hours ago
  Taguig, National Capital Region, Philippines | LSEG | Full time | ₱120,000 - ₱240,000 per year
  Role Profile
  We are seeking a highly motivated and experienced Senior Associate to join the Shared Site Reliability Engineering (SRE) team supporting Risk Intelligence Services within the Markets and Risk Intelligence division. This role is essential to maintaining uninterrupted business operations across multiple applications while adhering to defined...
- Site Reliability Engineer
  2 weeks ago
  Taguig, National Capital Region, Philippines | YONDU INC. | Full time | ₱900,000 - ₱1,200,000 per year
  Non-negotiable:
  - 5+ years' experience (Senior)
  - Will handle stakeholders such as BancNet, InstaPay, and other external and internal stakeholders
  - Banking experience, or hybrid experience (Telco etc.) as long as it includes banking
  - Will handle 3 FTEs
  - JIRA as ticketing tool
  Nice to have:
  - Slack – communication tool
  - Office 365, MS Teams – knowledge
  - Any certifications relating to Mobile...