Full Time

Site Reliability Engineer

UST
Remote!
$90,000 - $150,000* / year

Job Description

Overview


The Site Reliability Engineer (SRE) at UST will play a critical role in ensuring the stability, reliability, and performance of our cloud infrastructure and services. Collaborating closely with development and operations teams, the SRE will implement best practices in system automation, incident management, and optimization of our cloud services. This position requires a balance of software engineering skills and systems administration expertise to support and improve our production environment while adopting a mindset of continual improvement and innovation.



Job Responsibilities

  • Design, maintain, and optimize scalable and resilient cloud infrastructures.
  • Implement monitoring, alerting, and incident response processes to maintain uptime and performance.
  • Collaborate with development teams to improve product reliability and performance through automation.
  • Troubleshoot and resolve production issues in a timely manner.
  • Develop and maintain configuration management and deployment processes.
  • Engage in capacity planning and performance tuning for production systems.
  • Document procedures, processes, and incident reports for continuous improvement.


Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Minimum of 3 years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • Proficiency in cloud platforms such as AWS, Google Cloud, or Microsoft Azure.
  • Strong programming skills in languages such as Python, Go, or Java.
  • Experience with containers and container orchestration, such as Docker and Kubernetes.
  • Solid understanding of Linux systems and server administration.
  • Familiarity with CI/CD pipelines, configuration management tools like Ansible, and infrastructure-as-code tools like Terraform.


Benefits

  • Competitive salary and performance-based incentives.
  • Comprehensive health, dental, and vision insurance.
  • Flexible work arrangements and remote work options.
  • Generous paid time off and vacation policies.
  • 401(k) retirement plan with company match.
  • Employee development and professional training opportunities.
  • Wellness programs and employee support resources.


Technologies & Tools


In this role, the Site Reliability Engineer will work extensively with a range of tools and technologies, including cloud platforms such as Amazon Web Services (AWS) and Google Cloud. Proficiency in Kubernetes for container orchestration and in monitoring tools like Prometheus and Grafana will be essential. Familiarity with infrastructure-as-code tools such as Terraform and configuration management systems like Ansible will also be critical for automating processes and ensuring system reliability.



Ideal Candidates


Ideal candidates for the Site Reliability Engineer position at UST will exhibit a combination of technical expertise and a proactive attitude towards problem-solving. They should possess strong analytical skills, the ability to work collaboratively in a team setting, and a passion for implementing automated solutions. Candidates must be adaptable, willing to learn new technologies, and committed to maintaining high service availability while optimizing performance and infrastructure costs.
