
Site Reliability Engineer
1 day ago
The Site Reliability Engineer (Senior SRE) will report to the SRE Manager and support the Royal Caribbean website, using application and user performance data to guide decision-making. The SRE will use application and user performance metrics collected from various sources and tools to support tasks such as initial triage of critical production incidents, bug analysis, implementation of site reliability engineering best practices, infrastructure optimization, and seamless collaboration between internal teams and external service providers, among other operational initiatives.
Essential Duties and Responsibilities:
At a high level, responsibilities for this role will include:
Product Health
- Responsible for the Incident Management, Application Performance, Configuration Management, and Operational Readiness of the products within her/his ownership. Partners and collaborates closely with stakeholders from the various IT teams to ensure that performance, configuration, and monitoring tools meet the needs of her/his products.
Incident Management
- Responsible for the initial response, triage, and communication of key customer-impacting production incidents on the site, with the goal of restoring systems and applications to normal service operation as quickly as possible and minimizing the impact on guest/crew experience and business operations, thereby maintaining the best possible service levels and availability.
- Analyzes the impact of incidents on the site to determine root cause by reviewing performance data, including end-user experience, application metrics, and infrastructure metrics. Supports product team initiatives and releases.
- Synthesizes and communicates incident details to the production team and other stakeholders, including executive-level stakeholders. Documents incidents, conducts postmortems, and defines next steps as needed.
Application Performance Management (APM)
- Ensures proactive monitoring and management of the performance and availability of the software applications within the products s/he is responsible for. Strives to detect and diagnose complex application performance problems to maintain the expected level of service. Provides insight into application performance metrics (errors, exceptions, baseline violations, etc.) to identify the technical impact of bugs and enhancements. Understands key performance metrics (traffic volumes, booking volumes, response times, etc.) to identify the business value of bug fixes and enhancements.
Configuration Management
- Understands website operations at a high level in order to identify performance trends across business processes. Performs daily governance of application monitoring software.
Change Control Governance
- Ensures that all production changes required by the product teams are carried out in a planned and authorized manner, within established change control policies and procedures, and that all changes are thoroughly tested and validated from a monitoring perspective.
Production Operations Readiness
- Ensures all product implementations go through an operational readiness review. Establishes and maintains clear communication channels (e.g., Slack, Teams) with the scrum and marketing teams. Ensures all team members are informed of relevant updates and changes that may affect the website.
Qualifications:
- 3-6 years in Site Reliability Engineering (SRE), DevOps, QA, or a related IT operations role.
- Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related field; a relevant advanced degree is preferred.
Knowledge and Skills:
Technical Expertise:
- Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk.
- Understanding of API design principles: REST, SOAP, GraphQL.
- Advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.).
- Familiarity with Adobe AEM Cloud is preferred, to help enhance system performance and reliability.
AI & Automation Expertise:
- Working knowledge of scripting languages (Python, Bash, PowerShell) applied to automate alert routing, incident response, and infrastructure tasks, combined with a proactive mindset to explore and adopt new automation approaches.
- Hands-on exposure to AIOps platforms for enhancing anomaly detection, root cause analysis, and incident management, demonstrating a passion for staying ahead of industry trends.
- Solid understanding of AI/ML and Generative AI techniques aimed at reducing alert noise, predicting incidents, and developing automation workflows, with active interest in piloting innovative solutions.
- Familiarity with autonomous AI agents (agentic AI) or intelligent automation systems in operational environments, coupled with enthusiasm for experimenting with emerging AI-driven tools in SRE.
Problem-Solving Skills:
- Strong analytical and troubleshooting skills to diagnose and resolve complex production issues swiftly.
- Ability to develop and implement effective incident response plans.
Communication and Collaboration:
- Excellent written and verbal communication skills for effective interaction with cross-functional teams and for clear documentation.
- Ability to collaborate with Development, QA, IT, and external managed service providers to ensure seamless operations.