
Reliability Engineer
Job Description
Posted on: January 31, 2026
🗂 We’re Hiring: Reliability Engineer
🕒 Employment Type: Full-Time
💼 Level: Mid-Level / Senior
We are seeking a skilled and detail-oriented Reliability Engineer to join our team. As a Reliability Engineer, you will be responsible for ensuring the continuous performance and reliability of our systems, applications, and infrastructure. You will work closely with cross-functional teams to identify areas for improvement, automate processes, and resolve issues before they impact customers. Your role will be critical in driving long-term system stability, improving uptime, and enhancing operational efficiency. If you have a strong technical background, a proactive mindset, and a passion for optimizing system performance, we’d love to hear from you!
🎯 Key Responsibilities:
- Design, implement, and maintain monitoring systems to ensure the reliability and performance of infrastructure, applications, and services.
- Develop and deploy automated solutions to detect, troubleshoot, and resolve system and application issues.
- Identify and address potential failure points to proactively improve system reliability and uptime.
- Collaborate with DevOps, engineering, and product teams to build scalable, highly available, and fault-tolerant systems.
- Conduct root cause analysis of incidents and problems, developing long-term solutions to prevent recurrence.
- Develop and maintain disaster recovery plans, ensuring systems can recover quickly and efficiently from failures.
- Perform capacity planning, ensuring systems have the necessary resources to scale with user demand.
- Analyze system logs, metrics, and trends to identify opportunities for optimization and performance tuning.
- Participate in on-call rotations to provide timely responses to system outages or performance degradation.
- Drive continuous improvement initiatives through data-driven analysis and performance feedback.
- Create and maintain documentation for reliability-related processes, systems, and incidents.
- Contribute to creating a culture of reliability within the engineering and operations teams, promoting best practices and a focus on system health.
✅ Requirements:
- Proven experience as a Reliability Engineer, Site Reliability Engineer (SRE), DevOps Engineer, or in a similar role.
- Strong knowledge of system architecture, cloud computing, and high-availability systems.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic).
- Proficiency with scripting and automation tools (e.g., Python, Bash, Terraform, Ansible).
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with infrastructure-as-code (IaC) practices and tools (e.g., AWS CloudFormation, Terraform).
- Solid understanding of databases, caching systems, and load balancing.
- Experience with incident response, root cause analysis, and post-mortem processes.
- Ability to analyze and optimize performance at both the system and application levels.
- Knowledge of cloud platforms (AWS, Google Cloud, Azure) and distributed systems.
- Strong problem-solving skills and the ability to handle complex technical issues.
- Excellent communication skills, both written and verbal, with the ability to collaborate across teams.
- Ability to manage multiple priorities and meet deadlines in a fast-paced environment.
- A degree in Computer Science, Engineering, or a related field is preferred.
- Relevant certifications (e.g., AWS Certified Solutions Architect, Google Professional Cloud Architect) are a plus.
🌟 What We Offer:
- A dynamic, collaborative, and innovative work environment.
- Opportunities for career growth and development in system reliability and DevOps practices.
- Access to cutting-edge technologies and tools in a growing tech company.
- Competitive compensation and benefits package.
- A team culture that values continuous learning, reliability, and automation.
- Flexible work arrangements, including remote work options.
- Ongoing training opportunities to enhance your technical expertise and certifications.

