Site Reliability Engineer

17 hours ago


Metro Manila Philippines Buscojobs Full time

Site Reliability Engineer jobs in the Philippines

47 Site Reliability Engineer jobs in the Philippines

Site Reliability Engineer

Posted today

Job Description

Responsibilities:

  • Develop, maintain, and optimize SAP landscapes on GCP for our clients, ensuring optimal performance, reliability, and efficiency.
  • Use industry-leading tools such as Git, Azure DevOps, Terraform, and Ansible to manage and deploy infrastructure as code (IaC) solutions.
  • Adhere to best practices for managing systems and services across various cloud environments.
  • Monitor production environments using Google Cloud Operations Suite to track metrics, logs, and system performance, ensuring high levels of system and infrastructure availability.
  • Work with development teams to identify and implement automation in appropriate areas.
  • Handle code deployments in all environments.
  • Maintain system standards and security in line with operating system and cloud best practices.
  • Work with other teams, especially SAP Basis, on Incident and Problem Resolution and fulfillment of Service and Change Requests.
  • Adhere to Enterprise-level Operational Procedures, such as Change Control.

Qualifications

  • Minimum of 2 years of experience working in Operations environments that utilize DevOps tooling.
  • Hands-on experience with GCP services, including Compute Engine, Cloud Storage, VPC, IAM, Cloud Monitoring, and Cloud Logging.
  • Strong Linux (SUSE or RHEL experience preferred) and Windows System Administration skills.
  • Proficiency in leveraging DevOps tools such as Terraform, Ansible, Azure DevOps, and Git.
  • Experience managing SAP workloads on GCP, including understanding of SAP HANA, NetWeaver, and S/4HANA cloud deployments.
  • Excellent problem-solving skills and attention to detail.
  • Proficiency in ITIL procedures such as Incident Management, Problem Management and Change Management.
  • Strong troubleshooting, documentation, and communication skills.
  • Ability to automate complex tasks with tools such as Terraform / Ansible / Bash / PowerShell
  • Knowledge of Agile methodologies
  • Work experience across Development and Support
  • Relevant certification in Cloud technologies, SAP, or DevOps would be advantageous
  • ITIL certification is advantageous

Job Type: Full-time

Pay: Php49, Php132,202.38 per year

Application Question(s):

  • What is your salary expectation?

Experience:

  • DevOps: 1 year (Preferred)
  • SAP: 1 year (Preferred)
  • ITIL: 1 year (Preferred)

Site Reliability Engineer

Makati City, National Capital Region

Drake International Philippines

Posted 1 day ago

Job Description

Drake International Philippines is actively hiring an IT Observability Engineer / Site Reliability Engineer who is eager to advance their growing career.

ABOUT THE ROLE:

Job Title: IT Observability Engineer / Site Reliability Engineer

Employment Type: 6-month contract (Renewable)

Work Set-up and location: Onsite, Makati

Work Schedule: Mondays to Fridays

Here's what we're looking for in an IT Observability Engineer / Site Reliability Engineer:

  • Must have 4+ years of experience in IT operations or a similar role, with a solid foundation in monitoring and observability principles
  • Must be proficient in Prometheus, Grafana, Splunk, Jaeger, or similar industry-leading tools
  • Must have solid experience with cloud-based observability platforms like AWS CloudWatch or Azure Monitor
  • Must have knowledge of security monitoring tools and incident response best practices and experience with incident response methodologies
  • Must have experience in scripting and automation, with proficiency in languages like Python, Bash, or Go for data manipulation and automation tasks

Site Reliability Engineer

Makati City, National Capital Region

Cambridge University Press & Assessment | Manila

Posted 1 day ago

Job Description

NOTE: When you click the apply button, you will be redirected to Cambridge University Press & Assessment's website where you will be required to create a profile and upload a copy of your CV to complete your application.

Work setup: Hybrid (open to 2x a week in the office)

Work schedule: 10AM to 6PM Manila time

Employment type: Permanent

Pay range: Php 60,000 to Php 81,000

We value transparency and encourage applicants comfortable with this range to apply.

Discover a world of endless possibilities with Cambridge University Press & Assessment, a distinguished global academic publisher and assessment organization proudly affiliated with the prestigious University of Cambridge.

We are recruiting Site Reliability Engineers who will be part of our SRE function within the Platform Operations Team. This is a new team of engineers who will work alongside English Technologies' existing Platform Support and Engineering teams.

Why Cambridge?

Cambridge University Press & Assessment is a world-renowned not-for-profit academic publisher and assessment organisation, proudly part of the prestigious University of Cambridge. With a legacy rooted in over 800 years of educational excellence, we are dedicated to unlocking the potential of learners and educators across the globe.

What can you get from Cambridge?

At Cambridge, you'll become a part of a vibrant and forward-thinking community that transcends tradition, fostering a culture of continuous growth and personal development. Here, we provide the right environment for you to thrive, supporting your professional journey and empowering you to reach your highest potential. That is why our pay philosophy is tied to your skills and competencies, ensuring that your compensation aligns with the unique value you bring to the role you are applying for.

The organization offers a wide range of benefits and opportunities including:

  • Regular Employment on Day 1
  • HMO Coverage and Life Insurance on Day 1
  • Paid Annual Leaves (Vacation, Well-being, Flexible, Holiday, and Volunteering leaves)
  • Opportunities for career growth and development
  • Access to well-being programs
  • Flexible schedule, hybrid work arrangement and work-life balance
  • Opportunity to collaborate with colleagues from diverse branches that will expand your horizons

What will you do as a Site Reliability Engineer?

  • The Site Reliability Engineer will join a new SRE function within the Platform Operations Team working alongside existing Platform Support and Engineering teams.
  • The role will be responsible for support and design aspects of the English Engineers ecosystem (Platforms, Applications, Services and Websites).
  • Responsible for creating and maintaining software and processes that ensure the reliability and availability of the English digital platforms/websites and their software delivery pipelines.

Please review the attached job description for further details on the role.

What makes you the ideal candidate for this role?

  • Education & Experience: Degree or equivalent experience with at least 3 years in AWS Cloud Engineering, Architecture, or Infrastructure, combined with 3+ years in a Systems Admin or DevOps role.
  • DevOps & Delivery Model: Experience with DevOps delivery for infrastructure, applications, and configuration, including Infrastructure as Code (Terraform, CDK), CI/CD (GitHub Actions, Bitbucket Pipelines), and containerization/orchestration (Docker, Kubernetes).
  • Monitoring & Logging: Expertise with central logging systems (ELK/EFK stack), monitoring tools (New Relic, Datadog, Grafana, Alert Manager, PagerDuty, site24x7), and troubleshooting production issues in cloud environments.
  • Cloud Infrastructure: Deep knowledge of AWS services such as Fargate, Route53, CloudWatch, API Gateway, Lambda, CodePipeline, CloudFormation, DynamoDB, and networking.
  • Application & Database: Breadth of experience across Elasticsearch, MySQL, PostgreSQL, Java, Git/GitHub, and Confluent/Kafka.
  • Technical Skills: Strong troubleshooting, debugging, documentation, and communication abilities.
  • Ways of Working: Experience working in Agile product development environments and collaborating with global teams across cultures.

Are you driven by a desire to be part of a globally renowned institution that celebrates innovation, embraces inclusion, and empowers learners? Then we invite you to Pursue your Potential with us.

Applications received through the system will be reviewed on a rolling basis, and we may close the vacancy once sufficient applications are received. If you are interested, tailor your CV (submitting it with a cover letter is advantageous) and apply as early as possible.

Site Reliability Engineer

Posted 1 day ago

Job Description

Shift Schedule: Day Shift

Work Setup: Hybrid (3-4x a week onsite)

Job Description:

  • Handle service monitoring, incident response, and drive technical support efficiency
  • Responsible for managing and maintaining network monitoring tools, systems, and processes that ensure the availability, scalability, and performance of our production environments.
  • Responsible for incident handling, service monitoring, and technical support efficiency.
  • Work closely with developers, DevOps, infrastructure teams, and other stakeholders to achieve proactive incident prevention, issue resolution, and incident documentation.

Key Responsibilities:

  • Ensure that all tickets are updated and handled based on set KPIs and SLAs.
  • Manage monitoring, alerting, and logging tools to ensure system health and service uptime.
  • Ensure early detection, triage, and escalation of service degradation based on the defined service level agreement.
  • Trigger L2 ticket handling and on-call rotations for critical incidents.
  • Execute triage, diagnosis, and resolution of incidents that require L3 escalation, working with both internal and third-party support teams.
  • Support major incident response, contribute to root cause analysis (RCA), and help document postmortems.
  • Track, analyze, and act on incident trends and recurring technical issues.
  • Use data from ticketing systems (Jira, ServiceNow, etc.) to improve team responsiveness and resolution quality.
  • Update and maintain SOPs, runbooks, and knowledge base articles including the documentation of known issues, fixes, and playbooks to improve mean time to resolution.
  • Collaborate with development and QA teams to improve deployment readiness and reliability
  • Participate in technical competency mapping to ensure coverage and reduce unnecessary escalations.

Skills and Competencies:

  • Hands-on experience with ITSM platforms (e.g., ServiceNow, Jira Service Management).
  • Familiarity with ITIL principles and ITSM process areas (incident, problem, request, change, asset, and service catalog management).
  • Basic knowledge of IT infrastructure components (networks, servers, applications) and how they support IT services.
  • Experience in monitoring system performance and escalating outages or performance degradation.
  • Ability to troubleshoot and document IT issues effectively for escalation and closure.
  • Strong attention to detail in documentation, ticket updates, and asset records.
  • Familiarity with regulatory and compliance frameworks (e.g., BSP, PDIC, ISO 27001, COBIT) is a plus.
  • Clear written and verbal communication skills for ticket handling and team collaboration.
  • Proactive, detail-oriented, and able to manage multiple tasks in a structured IT operations environment.

Qualifications and Experience

  • Bachelor's degree in Electronics Engineering, Information Technology, Computer Science, Management Information Systems, or equivalent.
  • 2–5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Hands-on experience with monitoring tools (e.g., Prometheus, Grafana, ELK, or Datadog).
  • Familiarity with incident response and troubleshooting in production systems.
  • Experience with at least one cloud platform (AWS, GCP, or Azure).
  • Knowledgeable in scripting (e.g., Python, Bash) and Linux systems.
  • Exposure to ITIL-based processes, especially Incident and Problem Management.
  • Experience working in fintech, banking, or SaaS with high availability SLAs.
  • Familiarity with DevOps practices, CI/CD pipelines, and cloud-based monitoring tools.
  • Experience with automation platforms
  • Knowledge of BSP regulatory frameworks, policies, and guidelines

Site Reliability Engineer

Posted 1 day ago

Job Description

Responsibilities:

Software Maintenance and Support:

  • Monitor software performance and suggest improvements.
  • Ensure software systems are secure and compliant with industry standards.
  • Conduct code reviews and provide feedback to developers.
  • Develop and maintain automated testing scripts.
  • Respond promptly to alerts and incidents, troubleshoot issues, and restore services during production emergencies.
  • Collaborate with stakeholders to resolve complex problems.

Automation:

  • Identify opportunities for automation and implement solutions.
  • Streamline repetitive tasks to improve efficiency.

Collaboration and Documentation:

  • Work closely with the development team to address issues and implement enhancements.
  • Participate in code reviews to improve coding skills.
  • Write detailed technical and user documentation.

Testing:

  • Conduct unit and regression tests to validate application functionality.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Proficiency in multiple programming languages such as Java and/or Python

Knowledge of the following technologies:

  • Agile methodology and tools.
  • Different operating systems (Windows, Unix/Linux).
  • Databases and SQL such as MongoDB and/or Snowflake.
  • Monitoring platforms such as Grafana, Microsoft Log Analytics, and others.
  • Microservices and containers.
  • Databricks.
  • Google Cloud Platform or Microsoft Azure.
  • Open-source frameworks (e.g., Struts, Spring, Hibernate) are a plus.

Site Reliability Engineer

Posted 1 day ago

Job Description

Compensation range varies by level of experience:

Jr SRE: $12k-$8k/yr, Intermediate: 20k-30k/yr, Senior: 35k-50k/yr

Some travel may be required.

Card payment domain knowledge/experience is key:

Our client, a global Business Process Outsourcing (BPO) business, is looking for Site Reliability Engineers (SRE) to support their client, a global payment technology company that provides platforms to consumers, businesses and organizations to make electronic payments. The successful candidate will be responsible for ensuring site reliability & performance, monitoring & alerting, and supporting emergency response situations. This requires working closely with software engineers, DevOps and product teams to maintain robust infrastructure and automation that supports mission-critical applications.

The ideal candidate creates a bridge between development and operations by applying a software engineering mindset to service management. We are seeking an individual who is highly motivated, intellectually curious, and seeks out opportunities for improvement.

The Role:

This role involves working with a team of talented SREs/DevOps Engineers to support highly scalable services. Responsibilities include:

  • Build and maintain pipelines in accordance with the client's tooling and conventions.
  • Participate in the software development lifecycle, working closely with the development team to ensure that designed solutions meet non-functional requirements such as availability, performance, security and maintainability standards.
  • Maintain services through monitoring of metrics, system health, and analysis of reports.
  • Provide support for production and in-house systems. Participate in the on-call production support rota.
  • Handle incident management, on-call support, and root cause analysis, conducting post-incident reviews and 5-Whys analysis.
  • Remediate system vulnerabilities and strengthen security and resiliency measures.
  • Improve processes and systems within the Program.
  • Lead incident management efforts by proactively monitoring and analyzing ISO 8583 financial transaction messages across the 4-party payment model (Cardholder, Merchant, Acquirer, Issuer).

Skills & requirements: minimum 2 years of experience

  • Card payment domain knowledge (mandatory).
  • Experience with CI/CD and build pipelines using Jenkins.
  • Experience in public and private cloud offerings (PCF, Azure, AWS, etc.).
  • Knowledge of NoSQL & SQL databases such as Mongo / Oracle.
  • Experience and knowledge of managing distributed systems and working with microservices.
  • Familiarity with Unix tooling, with strong scripting skills.
  • Exposure to working with monitoring and alerting tools such as Splunk and Dynatrace.
  • Proficiency in one of the following: Python, Java, Go, or equivalent.
  • Familiarity with defining SLOs and SLAs.
  • Prior experience of working in an SRE/DevOps team and excellent understanding of SRE/DevOps principles.
  • High degree of initiative and self-motivation, with a willingness to take on challenging opportunities.
  • Excellent communication and relationship building/collaboration skills.

Site Reliability Engineer

Posted 1 day ago

Job Description

DFI Team Brief

As a Site Reliability Engineer (SRE) at DFI Retail Group, you will be the bridge between development and operations, ensuring our systems are designed, implemented, and maintained for maximum reliability, scalability, and performance. You will leverage your software engineering expertise to automate operations, optimize system performance, and develop solutions that prevent recurring issues. Your work will be essential in guaranteeing a seamless experience for our users by maintaining the high availability and efficiency of our services.

Responsibilities:

  • Design and Implement Solutions for Reliability and Scalability: Develop and implement highly scalable and available system architectures to meet growing user demands without compromising performance.
  • Automate Operations: Design, build, and integrate software tools to automate operational processes, including system monitoring, incident response, and deployment procedures.
  • Optimize System Performance: Proactively monitor system performance, identify bottlenecks, and implement optimization strategies to ensure efficient resource utilization and service delivery.
  • Implement and Manage Monitoring and Observability: Establish comprehensive service metrics and implement robust monitoring systems (such as New Relic, Azure Monitor, and Google Cloud Monitoring) to track, analyze, and report on system reliability, performance, and efficiency. Use observability tools to gain deeper insights into system behavior and identify potential issues proactively.
  • Incident Response and Resolution: Develop and implement strategies for rapid incident detection and response. Troubleshoot and resolve complex system issues, minimizing downtime and mitigating service disruptions.
  • Capacity Planning and Performance Tuning: Conduct capacity planning analyses to anticipate future resource needs and ensure system scalability. Proactively tune system performance to optimize resource utilization and maintain SLAs.
  • Collaboration with Development Teams: Work closely with software development teams to integrate reliability considerations throughout the software development lifecycle. Participate in code reviews, design discussions, and post-incident reviews to enhance system reliability and prevent recurring issues.
  • Drive Continuous Improvement: Continuously evaluate existing processes and tools, identifying areas for improvement and automation. Research and implement new technologies and best practices to enhance system reliability and operational efficiency.
  • Documentation and Knowledge Sharing: Create and maintain comprehensive documentation for systems, processes, and incident responses. Share knowledge and best practices with the team.
  • Administer Atlassian Product Suite: Jira, Confluence, and Bitbucket management and support.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Proven experience in SRE, DevOps, or related roles (5+ years preferred).
  • Hands-on experience with the Atlassian product suite (Jira, Confluence, Bitbucket).
  • Cloud platform experience (AWS, Azure, or GCP).
  • Scripting and automation with Terraform or Ansible.
  • Familiarity with monitoring and log systems (Prometheus, Zabbix, Grafana, ELK, Azure Monitor, Google Monitoring).
  • Containerization and orchestration (Docker, Kubernetes).
  • Networking fundamentals.
  • CI/CD pipelines and automation tooling.
  • Security best practices in cloud environments.
  • Strong analytical and communication skills.
  • Heavy emphasis on collaboration and problem solving.

Site Reliability Engineer

Posted 1 day ago

Job Description

Benefits:

  • Flexible hybrid work setup
  • IT Equipment provided
  • HMO coverage from Day 1 for you and dependents
  • Retirement package with company matching
  • Life and Accident Insurance from Day 1
  • Annual PTO with growth opportunities
  • Competitive benefits with merit increases
  • Learning and development opportunities

Accountabilities:

  • Design and maintain scalable, secure, high-availability AWS infrastructure
  • Infrastructure as Code with Terraform, CloudFormation, Ansible
  • Manage ECS/ECR container platforms
  • Automation & CI/CD pipelines
  • Automate routine tasks and ensure security in pipelines (DevSecOps)
  • Monitoring, observability, SLIs/SLOs/SLAs
  • Incident management and on-call rotations
  • Postmortems and root cause analysis
  • Self-healing and chaos engineering concepts
  • Security, IAM, secrets management and compliance alignment (SOC 2, ISO)
  • Collaborate with software engineers, QA, and product teams
  • Mentor junior engineers and contribute to team knowledge sharing

Qualifications:

  • Bachelor's degree or equivalent
  • Proven SRE/DevOps experience
  • Linux, networking, and cloud proficiency (AWS preferred)
  • CI/CD and container orchestration experience (ECS, Kubernetes)
  • Terraform, Ansible proficiency
  • AWS Certifications are a plus

Site Reliability Engineer

Posted 1 day ago

Job Description

  • Handle service monitoring, incident response, and drive technical support efficiency.
  • Manage network monitoring tools and ensure availability, scalability, and performance.
  • Incident handling and technical support efficiency.
  • Collaborate with developers, DevOps and stakeholders to prevent incidents and document resolutions.

Key Responsibilities:

  • Update tickets per KPI/SLA
  • Manage monitoring, alerting, and logging
  • Early detection and escalation
  • On-call rotations for critical incidents
  • Triage and resolve incidents for L3 escalations
  • RCAs and postmortems
  • Data-driven improvements from tickets
  • Maintain SOPs and runbooks
  • Assist deployment readiness and reliability
  • Knowledge sharing and competency mapping

Qualifications:

  • Bachelor's degree or equivalent
  • Java and/or Python proficiency
  • Experience with Agile, various OS, databases (MongoDB, Snowflake), monitoring tools
  • GCP/AWS/Azure experience

Site Reliability Engineer

HGS Offshore Staffing Solutions (HGS OSS)

Posted 1 day ago

Job Description

POSITION OVERVIEW

We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident response and system reliability. In this role, you will be instrumental in leading incident response, maintaining, optimising and scaling our cloud infrastructure while ensuring exceptional system reliability and performance.

KEY RESPONSIBILITIES

• Lead incident response from initial detection through real-time mitigation, root cause analysis, post-mortem documentation, and implementation of lessons learned, with a focus on continuous improvement.

• Develop and execute comprehensive incident response strategies to minimise downtime and business impact

• Participate in a 24/7 on-call rotation to ensure continuous system availability

• Implement and maintain comprehensive observability solutions using CloudWatch, Datadog, or similar monitoring platforms

• Maintain, improve, and optimise AWS infrastructure using Terraform while ensuring scalability, reliability, and cost efficiency.

• Continuously assess and enhance AWS infrastructure to optimise performance and cost

• Monitor and optimise serverless technologies including AWS Lambda and API Gateway for peak performance and cost efficiency

• Monitor and maintain ECS Fargate deployments for containerised applications

• Collect and analyse metrics to identify resource consumption, abnormal behavior, and potential performance bottlenecks

• Configure and manage alerting, dashboards, and automated monitoring across distributed systems

• Foster improved collaboration between development and operations teams by implementing SRE practices

REQUIRED QUALIFICATIONS

• Previous experience in a DevOps or SRE role

• Exceptional written and verbal communication skills

• Proven experience in incident response and 24/7 on-call responsibilities

• Expert-level knowledge of Infrastructure as Code, primarily Terraform

• Expert-level knowledge of AWS compute infrastructure

• Proficiency in automation tools and scripting languages

• Strong understanding of monitoring, metrics collection, and performance analysis

• Expert knowledge of observability and monitoring platforms such as Datadog, New Relic, or Prometheus

• Experience with log aggregation, APM, and distributed tracing

• Excellent collaboration abilities and ability to work across teams

• Strong analytical and problem-solving skills

• Demonstrated ability to work autonomously and take ownership

PREFERRED QUALIFICATIONS

• Payments domain experience and PCI compliance familiarity

• Experience with container orchestration and microservices

• Cloud security best practices knowledge


Site Reliability Engineer

Posted 1 day ago


Job Description

The DFI Retail Group SRE overview and responsibilities noted above also apply here, including incident response, observability, automation, security, and collaboration with development teams. The candidate will design scalable, reliable systems and participate in on-call rotations, postmortems, and continuous improvement.

Qualifications:

  • Bachelor's degree or equivalent
  • Experience in SRE/DevOps
  • Linux, networking, and cloud proficiency (AWS preferred)
  • CI/CD and container orchestration (ECS, Kubernetes)
  • Terraform/Ansible proficiency
  • AWS certifications advantageous



  • Metro Manila, Philippines ABC Worldwide (AKA BRIP Careers Worldwide) Full time

    Overview Our client, a global Business Process Outsourcing (BPO) business, is looking for Site Reliability Engineers (SRE) to support their global payment technology company that provides platforms to consumers, businesses and organizations to make electronic payments. The successful candidate will be responsible for ensuring site reliability & performance,...


  • Metro Manila, Philippines Broadridge Full time

    Site Reliability Engineer (Hybrid) role at Broadridge. Posted 1 week ago.


  • Metro Manila, Philippines Michael Page Full time

    Join a growing team. Enjoy market-aligned salaries & benefits. About Our Client The hiring company is a large organization in the healthcare industry, focused on delivering innovative solutions to improve patient care and operational efficiency. The company is committed to leveraging cutting-edge technology to support its services. Job Description Oversee...


  • Metro Manila, Philippines GCash Full time

    Join to apply for the SRE - Site Reliability Engineer role at GCash. Here in GCash we want to stay at the forefront of the FinTech industry by creating innovative, meaningful, and convenient financial solutions for the nation! G ka ba? Join the G Nation today! Roles And Responsibilities Responsible for the identification and assessment of potential...


  • Manila, National Capital Region, Philippines Michael Page Full time

    Join a growing team. Enjoy market-aligned salaries & benefits. About Our Client The hiring company is a large organization in the healthcare industry, focused on delivering innovative solutions to improve patient care and operational efficiency. The company is committed to leveraging cutting-edge technology to support its services. Job Description Oversee the...


  • Metro Manila, Philippines Broadridge Full time

    Senior Site Reliability Engineer (Hybrid) role at Broadridge. Responsibilities: You will manage applications running on Windows and Unix/Linux servers, perform application installations, modify configurations, and carry out server maintenance. Create documentation, diagrams, procedures,...


  • Manila, National Capital Region, Philippines HGS Offshore Staffing Solutions Full time ₱2,000,000 - ₱2,500,000 per year

    SENIOR SITE RELIABILITY ENGINEER POSITION OVERVIEW We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident...


  • Metro Manila, Philippines Buscojobs Full time

    Lead Site Reliability Engineer. Posted today. Job Description: What Makes Us, Us. Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn and pursue outcomes with our prestigious financial clients, say Hello to...


  • Metro Manila, Philippines Satori Full time

    Senior Site Reliability Engineer (Hybrid). Our client, a multinational leader in fleet performance management, is establishing its operations in the Philippines and is currently hiring members for the pioneer team. Job Summary: You will be part of an autonomous team, responsible for maintaining and developing the Client’s global SaaS platforms. Your efforts...


  • Metro Manila, Philippines Buscojobs Full time

    Overview Work setup: Hybrid (open to x a week in the office) Work schedule: AM to PM Manila time Employment type: Permanent Location: Makati City, Metro Manila Pay range: Php , to Php , We value transparency and encourage applicants comfortable with this range to apply. Discover a world of endless possibilities with Cambridge University Press &...