SRE Engineer
1 week ago
JOB PURPOSE:
Reporting to the Sr Manager, DevSecOps & SRE, the Site Reliability Engineer (SRE) is responsible for improving system reliability and resilience so that new software capabilities can be developed and deployed faster and more easily. SREs focus especially on building automation to reduce manual effort and prevent operations incidents.
JOB RESPONSIBILITIES:
- Work with stakeholders such as product owners and Engineering to define service level objectives (SLOs) for system operations.
- Track performance against SLOs in partnership with monitoring teams or other stakeholders, and ensure systems continue to meet SLOs over time.
- Create dashboards and reports to communicate key metrics.
- Create software to improve performance, scalability, and stability of systems.
- Collaborate with development teams to promote the concept of reliability engineering during all phases of the software development lifecycle to detect and correct performance issues and meet availability goals.
- Design, code, test, and deliver infrastructure software to automate manual operational work (i.e., toil).
- Participate in operational support and on-call rotation shifts for supported systems and products.
- Conduct blameless postmortems to troubleshoot priority incidents.
- Perform analytics on previous incidents to understand root causes and better predict and prevent future issues.
- Use automation to reduce the probability and/or impact of problem recurrence.
- Identify, evaluate, and recommend monitoring tools and diagnostic techniques to improve system observability.
- Participate in system design consulting, platform management, capacity planning and launch reviews.
- Collaborate and share lessons learned regarding performance and reliability issues with all stakeholders including developers, other SREs, operations teams, and project management teams.
- Participate in communities of practice to share knowledge and foster continuous improvement.
- Remain current on site reliability engineering methods and trends such as observability-driven development and chaos engineering.
- Drive continuous improvement in software quality and infrastructure reliability and resilience.
- Oversee, design, implement, and manage DevOps capabilities using continuous integration/continuous delivery toolsets and automation.
- Focus on Application Performance Monitoring (APM), including design, solutioning, proofs of concept (POCs), and profiling and tuning of application compute and data nodes and resources.
- Assist in defining SRE and observability architecture and design.
- Analyze and implement new features of the SRE and observability platform.
- Provide full-stack monitoring across all layers (infrastructure, network, database, application, services, third party).
- Provide hands-on technical leadership in the selection and implementation of open-source and commercial monitoring tools.
- Implement an SRE-driven incident workflow: automated incident detection -> automated engagement -> triage/mitigation -> RCA/postmortems -> problem task remediation.
- Apply AI-driven correlation, de-duplication, noise reduction, and auto-remediation.
- Provide weekly monitoring and alert analysis and continuous improvement.
- Create a model of the run-time environment (discovery).
- Profile the performance and behavior of user-defined transactions.
- Establish performance metrics for each technical component of the applications/systems (web server, app server, database, etc.).
- Application performance management database.
- APM tool administration and support.
- Monitoring tool design and implementation.
- APM setup/usage policies and guidelines.
- Capacity planning and monitoring.
- Monitor selected application performance.
- Report vital statistics of application performance in production.
- Make recommendations for improvements with Service Desk.
- Make recommendations for adjustments to runtime resources to improve overall performance profile.
KEY QUALIFICATION & EXPERIENCES:
- Strong problem solving and analytical skills.
- Strong interpersonal, written, and verbal communication skills.
- Highly adaptable to changing circumstances. Interest in continuously learning new skills and technologies.
- Experience with programming and scripting languages (e.g. Java, C#, C++, Python, Bash, PowerShell).
- Experience with incident and response management.
- Experience with Agile and DevOps development methodologies.
- Experience with container technologies and supporting tools (e.g. Docker Swarm, Podman, Kubernetes, Mesos).
- Experience with working in cloud ecosystems (Microsoft Azure, AWS, Google Cloud Platform).
- Experience with monitoring and observability tools (e.g. Splunk, CloudWatch, AppDynamics, New Relic, ELK, Prometheus, OpenTelemetry).
- Experience with configuration management systems (e.g. Puppet, Ansible, Chef, Salt, Terraform).
- Experience working with continuous integration/continuous deployment tools (e.g. Git, TeamCity, Jenkins, Artifactory).
- Experience with GitOps-based automation is a plus.
- Bachelor's degree (or equivalent years of experience).
- 5+ years of relevant work experience. SRE experience preferred.
- Background in manufacturing or platform/tech companies is preferred.
- Must have public cloud provider certifications (Azure, GCP, or AWS).
- CNCF certification is a plus.