
Site Reliability Engineer
2 weeks ago
Overview
We are seeking a skilled and motivated Site Reliability Engineer (SRE) with expertise in supporting and managing MQ and Kafka systems. The ideal candidate will have a strong background in Unix systems administration, experience with Kubernetes (preferred), and a passion for maintaining high availability, performance, and reliability in distributed systems.
Key Responsibilities
- Provide technical guidance and assist application teams with the adoption of MQ Encryption in Transit.
- Manage and Support MQ & Kafka Systems: Monitor, maintain, and troubleshoot IBM MQ and Kafka clusters to ensure optimal performance and reliability. Handle incident management, including root cause analysis and post-mortem reviews.
- Automation and Scripting: Develop and maintain automation scripts to streamline operational processes, deployment pipelines, and monitoring solutions, using tools such as Ansible, Python, or shell scripts.
- Monitoring and Alerting: Implement and manage monitoring tools (e.g., Prometheus, Grafana) to track the health and performance of MQ, Kafka, and related systems. Create and manage alerting mechanisms to proactively identify and resolve issues.
- Performance Tuning and Optimization: Continuously monitor system performance, identifying and resolving bottlenecks. Implement best practices for scaling Kafka and MQ clusters.
- Collaboration and Support: Work closely with application development, DevOps, and other engineering teams to support new and existing applications.
- Documentation: Maintain clear and comprehensive documentation for system configurations, procedures, and troubleshooting guides.
Experience: 5+ years of experience as a Site Reliability Engineer or in a similar role, with hands-on experience in supporting MQ (e.g., IBM MQ) and Confluent Kafka.
Technical Skills: Experience in managing and supporting a large IBM MQ and/or Kafka plant. At least one of the two skill sets below is mandatory.
IBM MQ Administration
- Installation & Configuration: Proficiency in installing and configuring IBM MQ.
- Queue Management: Creating, configuring, and managing queues, channels, listeners, and other MQ objects.
- Security Configuration: Implementing SSL/TLS, access control lists (ACLs), and MQ object security.
- Troubleshooting: Diagnosing and resolving MQ performance-related issues.
Kafka Administration
- Installation & Configuration: Proficiency in installing and configuring Kafka brokers, ZooKeeper, Kafka Connect, and Schema Registry on various platforms.
- Cluster Management: Skills in managing Kafka clusters, including adding/removing brokers, partitioning, and replication strategies.
- Kafka Broker Configuration: Deep understanding of broker configurations such as log retention, segment sizes, and in-sync replica (ISR) management.
- Experience managing Kafka on Kubernetes is preferred.
- Familiarity with CI/CD pipelines and related tools (e.g., Jenkins, Git).
- Experience with configuration management tools (e.g., Ansible).
- Proficiency in Unix/Linux administration.
- Familiarity with networking concepts and utilities.
- Proficiency with the Python programming language.
- High Availability (HA) & Disaster Recovery (DR): Setting up and maintaining HA clusters and disaster recovery solutions, including replication and failover mechanisms.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.