Site Reliability Engineer
2 days ago
Job Description:
• Handle service monitoring and incident response, and drive technical support efficiency.
• Responsible for managing and maintaining network monitoring tools, systems, and
processes that ensure the availability, scalability, and performance of our production
environments.
• Responsible for incident handling, service monitoring, and technical support efficiency.
• Work closely with developers, DevOps, infrastructure teams, and other stakeholders
to achieve proactive incident prevention, issue resolution, and incident documentation.
Key Responsibilities:
• Ensure that all tickets are updated and handled according to set KPIs and SLAs.
• Manage monitoring, alerting, and logging tools to ensure system health and service
uptime.
• Ensure early detection, triage, and escalation of service degradation based on defined
service level agreements.
• Trigger L2 ticket handling and on-call rotations for critical incidents.
• Execute triage, diagnosis, and resolution of incidents that require L3 escalation to
internal or third-party support teams.
• Support major incident response, contribute to root cause analysis (RCA), and help
document postmortems.
• Track, analyze, and act on incident trends and recurring technical issues.
• Use data from ticketing systems (Jira, ServiceNow, etc.) to improve team responsiveness
and resolution quality.
• Update and maintain SOPs, runbooks, and knowledge base articles, including
documentation of known issues, fixes, and playbooks, to improve mean time to resolution.
• Collaborate with development and QA teams to improve deployment readiness and
reliability.
• Participate in technical competency mapping to ensure coverage and reduce unnecessary
escalations.
Skills and Competencies:
• Hands-on experience with ITSM platforms (e.g., ServiceNow, Jira Service Management).
• Familiarity with ITIL principles and ITSM process areas (incident, problem, request,
change, asset, and service catalog management).
• Basic knowledge of IT infrastructure components (networks, servers, applications) and
how they support IT services.
• Experience in monitoring system performance and escalating outages or performance
degradation.
• Ability to troubleshoot and document IT issues effectively for escalation and closure.
• Strong attention to detail in documentation, ticket updates, and asset records.
• Familiarity with regulatory and compliance frameworks (e.g., BSP, PDIC, ISO 27001,
COBIT) is a plus.
• Clear written and verbal communication skills for ticket handling and team collaboration.
• Proactive, detail-oriented, and able to manage multiple tasks in a structured IT operations
environment.
Qualifications and Experience:
• Bachelor's degree in Electronics Engineering, Information Technology, Computer
Science, Management Information Systems, or equivalent.
• 2–5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
• Hands-on experience with monitoring tools (e.g., Prometheus, Grafana, ELK, or
Datadog).
• Familiarity with incident response and troubleshooting in production systems.
• Experience with at least one cloud platform (AWS, GCP, or Azure).
• Knowledge of scripting (e.g., Python, Bash) and Linux systems.
• Exposure to ITIL-based processes, especially Incident and Problem Management.
• Experience working in fintech, banking, or SaaS with high availability SLAs.
• Familiarity with DevOps practices, CI/CD pipelines, and cloud-based monitoring tools.
• Experience with automation platforms.
• Knowledge of BSP regulatory frameworks, policies, and guidelines.
-
Site Reliability Engineer
7 days ago
Manila, Philippines Tata Consultancy Services Full time Job Description: Site Reliability Engineering (SRE) SME. Position Overview: We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for...
-
Site Reliability Engineer
3 weeks ago
Metro Manila, Philippines Broadridge Full time Site Reliability Engineer (Hybrid) role at Broadridge.
-
Site Reliability Engineer
6 days ago
Manila, National Capital Region, Philippines HGS Offshore Staffing Solutions Full time ₱2,000,000 - ₱2,500,000 per year Senior Site Reliability Engineer. Position Overview: We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident...
-
Site Reliability Engineer
1 week ago
Manila, Philippines Russell Tobin Full time We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing...
-
Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines CDOps Tech Full time ₱120,000 - ₱180,000 per year About the Opportunity: We are seeking a seasoned and passionate Site Reliability Engineer for a high-impact contract engagement with one of our key clients, a leader in the marketing-tech sector. This is not just a typical SRE role; you will be the foundational expert responsible for spearheading the adoption of SRE culture and practices within the client's...
-
Site Reliability Engineer
1 week ago
Metro Manila, Philippines Michael Page Full time Join a growing team. Enjoy market-aligned salaries and benefits. About Our Client: The hiring company is a large organization in the healthcare industry, focused on delivering innovative solutions to improve patient care and operational efficiency. The company is committed to leveraging cutting-edge technology to support its services. Job Description: Oversee...
-
Site Reliability Engineer
2 weeks ago
Manila, National Capital Region, Philippines Russell Tobin Full time ₱120,000 - ₱180,000 per year We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing modern SRE capabilities that improve system reliability, scalability, and efficiency across our...
-
SRE - Site Reliability Engineer
3 weeks ago
Metro Manila, Philippines GCash Full time Here in GCash we want to stay at the forefront of the FinTech industry by creating innovative, meaningful, and convenient financial solutions for the nation! G ka ba? Join the G Nation today! Roles and Responsibilities: Responsible for the identification and assessment of potential risks of...
-
Engineer, Site Reliability
2 weeks ago
Southern Manila District, Philippines Royal Caribbean International Full time Overview. Position Summary: The Site Reliability Engineer (Senior SRE) reports to the SRE Manager in support of the Royal Caribbean website by utilizing application and user performance data to guide informed decision-making. The SRE uses performance metrics from various sources and tools to support tasks such as initial triage of critical production...
-
Cloud Site Reliability Engineer
3 weeks ago
Manila, Philippines Tyler Technologies Full time Responsibilities: Implement tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability. Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale....