Engineer, Site Reliability
2 weeks ago
Overview

Position Summary: The Site Reliability Engineer (Senior SRE) reports to the SRE Manager in support of the Royal Caribbean website by utilizing application and user performance data to guide informed decision-making. The SRE uses performance metrics from various sources and tools to support tasks such as initial triage of critical production incidents, bug analysis, implementation of site reliability engineering best practices, infrastructure optimization, and collaboration between internal teams and external service providers. The ideal candidate has a deep understanding and proven track record in an IT support role and proactively implements preventative measures to avoid technical incidents. The role requires working with multiple product and project teams in a fast-paced, dynamic environment and connecting threads across disparate teams.

Essential Duties and Responsibilities

At a high level, responsibilities for this role include:

Product Health: Responsible for incident management, application performance, configuration management, and operational readiness of the products within ownership. Partners with stakeholders from IT to ensure performance, configuration, and monitoring tools meet product needs.

Incident Management: Responsible for initial response, triage, and communication of production incidents that impact customers. Restore systems and applications to normal service operation quickly, analyze incident impact using performance data, and document incidents with postmortems and next steps. Support product team initiatives and releases; communicate details to production teams and stakeholders, including executives.

Application Performance Management (APM): Proactive monitoring and management of performance and availability for the applications. Detect and diagnose complex performance problems, provide insights into metrics (errors, baseline violations, etc.), and understand business value of bug fixes and enhancements.
Configuration Management: Maintain a high-level view of website operations to identify performance trends between business processes; perform daily governance of application monitoring software.

Change Control Governance: Ensure production changes are planned, authorized, tested, and validated from a monitoring perspective, following change control policies and procedures.

Production Operations Readiness: Ensure all product implementations undergo operational readiness reviews. Establish clear communication channels with relevant teams and keep stakeholders informed about updates and changes affecting the website.

Qualifications

3-6 years in Site Reliability Engineering (SRE), DevOps, QA, or a related IT operations role.

Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or other relevant advanced degree preferred.

Knowledge and Skills

Technical Expertise: Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk. Understanding of API design principles: REST, SOAP, GraphQL. Advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.). Familiarity with Adobe AEM Cloud is preferred to enhance system performance and reliability.

AI & Automation Expertise: Working knowledge of scripting languages (Python, Bash, PowerShell) to automate alert routing, incident response, and infrastructure tasks; proactive mindset to explore and adopt new automation approaches. Hands-on exposure to AI Ops platforms for improving anomaly detection, root cause analysis, and incident management; interest in staying ahead of industry trends. Understanding of AI/ML and Generative AI techniques to reduce alert noise, predict incidents, and develop automation workflows; interest in piloting innovative solutions. Familiarity with autonomous AI agents or intelligent automation systems in operational environments; enthusiasm to experiment with AI-driven tools in SRE.
Problem-Solving Skills: Strong analytical and troubleshooting abilities to diagnose and resolve complex production issues swiftly. Ability to develop and implement effective incident response plans.

Communication and Collaboration: Excellent written and verbal communication for effective interaction with cross-functional teams and documentation. Ability to collaborate with Development, QA, IT, and external managed service providers to ensure seamless operations.

Work Environment

The SRE may be required to participate in an on-call rotation to handle urgent incidents and ensure 24x7 system reliability. On-call duties may include evenings, weekends, and holidays as needed.
-
Lead, Site Reliability Engineer
7 days ago
Southern Manila District, Philippines Royal Caribbean International Full time Overview Position Summary: The Lead Site Reliability Engineer (Lead SRE) will report to the SRE Manager in support of the Royal Caribbean website by utilizing application and user performance data to guide informed decision-making. The Lead SRE will use application and user performance metrics collected from various sources and tools to support tasks such as...
-
Site Reliability Engineer
7 days ago
Southern Manila District, Philippines Vestas Wind Systems AS Full time Overview Are you ready to guide the development of innovative infrastructure solutions for a technology-focused entity in the renewable energy sector? We are seeking a Senior Systems Engineer committed to automation, monitoring, and asset management, someone who takes charge of what happens next and promotes continuous improvement in our digital landscape....
-
Senior Site Reliability Engineer
3 weeks ago
Eastern Manila District, Philippines CC.Talent Full time Senior Site Reliability Engineer (SRE) to join our global infrastructure team. You will be a guardian of our production environment, responsible for its health, performance, and scalability. Your mission is to apply software engineering principles to solve operational problems, automate everything, and ensure our...
-
Site Reliability Engineer
7 days ago
Manila, Philippines Tata Consultancy Services Full time Human Resources Executive at Tata Consultancy Services Job Description: Site Reliability Engineering (SRE) SME Position Overview We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for...
-
Site Reliability Engineer
6 days ago
Manila, National Capital Region, Philippines HGS Offshore Staffing Solutions Full time ₱2,000,000 - ₱2,500,000 per year SENIOR SITE RELIABILITY ENGINEER POSITION OVERVIEW We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident...
-
Site Reliability Engineer
1 week ago
Manila, Philippines Russell Tobin Full time Senior Associate - Talent Acquisition - Corporate Strategy Hiring | Specialized in APAC We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing...
-
Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines CDOps Tech Full time ₱120,000 - ₱180,000 per year About the Opportunity We are seeking a seasoned and passionate Site Reliability Engineer for a high-impact contract engagement with one of our key clients, a leader in the marketing-tech sector. This is not just a typical SRE role; you will be the foundational expert responsible for spearheading the adoption of SRE culture and practices within the client's...
-
Site Reliability Engineer
2 weeks ago
Manila, National Capital Region, Philippines Russell Tobin Full time ₱120,000 - ₱180,000 per year We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing modern SRE capabilities that improve system reliability, scalability, and efficiency across our...
-
Cloud Site Reliability Engineer
3 weeks ago
Manila, Philippines Tyler Technologies Full time Overview Responsibilities Implement tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability. Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale....
-
Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines Broadridge Full time ₱1,200,000 - ₱2,400,000 per year At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career, while helping others along the way, come join the Broadridge team. Role Overview At Broadridge Trading & Connectivity Solutions, we foster a culture of empowerment, innovation, and collaboration, where...