Reliability Engineer Vacancy (Remote) – Odixcity Consulting

Job Summary

  • We are looking for a proactive, hands-on Reliability Engineer to join our team. You will play a key role in keeping our core services stable, scalable, and efficient.

Responsibilities

  • Closely monitor system health, performance, and availability using tools like Grafana, Prometheus, Datadog, or New Relic. Respond to and resolve incidents.
  • Lead and document post-incident reviews to identify root causes and preventive actions.
  • Write scripts (Python, Bash) and use configuration management tools to automate operational tasks, deployments, and recovery procedures.
  • Build the internal platforms and tools that make reliability the default for every engineering team: self-healing systems, automated canary analysis, and performance tracing at scale.
  • Work with software teams to define Service Level Objectives (SLOs) and Error Budgets. Implement improvements to reduce manual toil, improve system resilience, and prevent recurring issues.
  • Manage and optimize cloud resources (AWS, Google Cloud, or Azure) to ensure cost-effectiveness and performance. Apply Infrastructure as Code (IaC) principles.
  • Lead the design and implementation of chaos engineering practices, disaster recovery automation, and capacity planning.

Requirements

  • 3-5 years of experience in a DevOps, SRE, Linux System Administration, or Backend Engineering role.
  • Proficiency in a scripting language such as Python or Go.
  • Solid experience with cloud platforms (Azure, Google Cloud, AWS, etc.).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Practical knowledge of monitoring/observability tools.
  • Familiarity with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions).

Core Skills

  • Excellent problem-solving and troubleshooting skills under pressure.
  • Strong understanding of network fundamentals (TCP/IP, DNS, HTTP/S).
  • Knowledge of database performance and reliability (PostgreSQL, MySQL, MongoDB).
  • A systematic approach to automation and a desire to eliminate manual work.
  • Good communication skills to collaborate with both technical and non-technical teams.
  • Understanding of security best practices in infrastructure.