Engineer, Site Reliability

6 days ago


Southern Manila District, Philippines Royal Caribbean International Full time

Overview

Position Summary:

The Site Reliability Engineer (Senior SRE) reports to the SRE Manager and supports the Royal Caribbean website, using application and user performance data to guide informed decision-making. The SRE uses performance metrics from various sources and tools to support tasks such as initial triage of critical production incidents, bug analysis, implementation of site reliability engineering best practices, infrastructure optimization, and collaboration between internal teams and external service providers. The ideal candidate has a deep understanding of, and a proven track record in, an IT support role and proactively implements preventative measures to avoid technical incidents. The role requires working with multiple product and project teams in a fast-paced, dynamic environment and connecting threads across disparate teams.

Essential Duties and Responsibilities

At a high level, responsibilities for this role include:

  • Product Health: Responsible for incident management, application performance, configuration management, and operational readiness of the products within ownership. Partners with stakeholders from IT to ensure performance, configuration, and monitoring tools meet product needs.
  • Incident Management: Responsible for initial response, triage, and communication of production incidents that impact customers. Restore systems and applications to normal service operation quickly, analyze incident impact using performance data, and document incidents with postmortems and next steps. Support product team initiatives and releases; communicate details to production teams and stakeholders, including executives.
  • Application Performance Management (APM): Proactively monitor and manage the performance and availability of the applications. Detect and diagnose complex performance problems, provide insights into metrics (errors, baseline violations, etc.), and understand the business value of bug fixes and enhancements.
  • Configuration Management: Maintain a high-level view of website operations to identify performance trends across business processes; perform daily governance of application monitoring software.
  • Change Control Governance: Ensure production changes are planned, authorized, tested, and validated from a monitoring perspective, following change control policies and procedures.
  • Production Operations Readiness: Ensure all product implementations undergo operational readiness reviews. Establish clear communication channels with relevant teams and keep stakeholders informed about updates and changes affecting the website.
Qualifications
  • 3-6 years in Site Reliability Engineering (SRE), DevOps, QA, or a related IT operations role.
  • Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or other relevant advanced degree preferred.
Knowledge and Skills
  • Technical Expertise:
    • Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk.
    • Understanding of API design principles: REST, SOAP, GraphQL.
    • Advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.).
    • Familiarity with Adobe AEM Cloud is preferred to enhance system performance and reliability.
  • AI & Automation Expertise:
    • Working knowledge of scripting languages (Python, Bash, PowerShell) to automate alert routing, incident response, and infrastructure tasks; proactive mindset to explore and adopt new automation approaches.
    • Hands-on exposure to AI Ops platforms for improving anomaly detection, root cause analysis, and incident management; interest in staying ahead of industry trends.
    • Understanding of AI/ML and Generative AI techniques to reduce alert noise, predict incidents, and develop automation workflows; interest in piloting innovative solutions.
    • Familiarity with autonomous AI agents or intelligent automation systems in operational environments; enthusiasm to experiment with AI-driven tools in SRE.
  • Problem-Solving Skills:
    • Strong analytical and troubleshooting abilities to diagnose and resolve complex production issues swiftly.
    • Ability to develop and implement effective incident response plans.
  • Communication and Collaboration:
    • Excellent written and verbal communication for effective interaction with cross-functional teams and documentation.
    • Ability to collaborate with Development, QA, IT, and external managed service providers to ensure seamless operations.
Work Environment
  • The SRE may be required to participate in an on-call rotation to handle urgent incidents and ensure 24x7 system reliability. On-call duties may include evenings, weekends, and holidays as needed.

  • Southern Manila District, Philippines Royal Caribbean International Full time

    Overview Position Summary: The Lead Site Reliability Engineer (Lead SRE) will report to the SRE Manager in support of the Royal Caribbean website by utilizing application and user performance data to guide informed decision-making. The Lead SRE will use application and user performance metrics collected from various sources and tools to support tasks such...


  • Southern Manila District, Philippines Vestas Wind Systems AS Full time

    Overview Are you ready to guide the development of innovative infrastructure solutions for a technology-focused entity in the renewable energy sector? We are seeking a Senior Systems Engineer committed to automation, monitoring, and asset management—someone who takes charge of what happens next and promotes continuous improvement in our digital landscape....


  • Eastern Manila District, Philippines CC.Talent Full time

    Senior Site Reliability Engineer (SRE) Senior Site Reliability Engineer (SRE) to join our global infrastructure team. You will be a guardian of our production environment, responsible for its health, performance, and scalability. Your mission is to apply software engineering principles to solve operational problems, automate everything, and ensure our...


  • Manila, National Capital Region, Philippines Michael Page Full time

    Join a growing team. Enjoy market-aligned salaries & benefits. About Our Client: The hiring company is a large organization in the healthcare industry, focused on delivering innovative solutions to improve patient care and operational efficiency. The company is committed to leveraging cutting-edge technology to support its services. Job Description: Oversee the...


  • Manila, National Capital Region, Philippines HGS Offshore Staffing Solutions Full time ₱2,000,000 - ₱2,500,000 per year

    SENIOR SITE RELIABILITY ENGINEER. POSITION OVERVIEW: We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident...


  • Metro Manila, Philippines Buscojobs Full time

    47 Site Reliability Engineer jobs in the Philippines. Site Reliability Engineer (posted today). Job Description. Responsibilities: Develop, maintain, and optimize SAP landscapes on GCP for our clients, ensuring optimal performance, reliability, and efficiency. Utilize industry-leading...


  • Manila, National Capital Region, Philippines Cambridge University Press & Assessment Full time ₱60,000 - ₱81,000 per year

    Salary: Php 60,000 to Php 81,000 - Location: Manila - Country: Philippines - Business Unit: Technology - Vacancy Type: Permanent - Closing Date: 9 October 2025. Meet the recruiter: Imee Santos. Work setup: Hybrid (open to 2x a week in the office). Work schedule: 10AM to 6PM Manila time. Employment type: Permanent. Location: Makati City, Metro Manila. Pay range: Php 60,000 to Php...


  • Manila, National Capital Region, Philippines Canonical Full time

    Overview: Canonical is hiring a Site Reliability Engineer to work on open source infrastructure and cloud engineering. Location: Globally remote role. Responsibilities: Deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices. Identify and...


  • Manila, National Capital Region, Philippines Braintrust Full time ₱30,000 - ₱150,000 per year

    Job Description: Compensation range varies by level of experience: Jr SRE: $12k-$18k/yr, Intermediate: $20k-$30k/yr, Senior: $35k-$50k/yr. Some travel may be required. Card payment domain knowledge/experience is key. Our client, a global Business Process Outsourcing (BPO) business, is looking for Site Reliability Engineers (SRE) to support their client, a...


  • Manila, National Capital Region, Philippines Broadridge Financial Solutions Full time

    Senior Site Reliability Engineer (Hybrid). Locations: Manila - 6805 Ayala Ave. Time type: Full time. Posted on: Posted Today. Job requisition ID: JR1075784. At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your...