Site Reliability Engineer Job at Brooksource, San Antonio, TX


Job Description

Site Reliability Engineer (SRE)

Job Summary

Seeking a highly skilled Site Reliability Engineer to work closely with engineering teams to ensure applications are highly available, meet performance standards, and meet the reliability expectations of business stakeholders. As a Site Reliability Engineer, you will work to identify and deliver automation solutions designed to ensure high availability and resiliency using your expertise in software development, complexity analysis, and scalable system design.

Duties and Responsibilities

  • Monitor system performance, identify areas for improvement, and implement solutions to enhance reliability and availability
  • Guide architecture and development teams on how to make applications highly available, reliable, and performant at a global scale
  • Collaborate with product owners to implement and monitor key metrics to meet SLOs and SLAs
  • Collaborate with development team members to troubleshoot and resolve problems
  • Drive the Root Cause Analysis of production issues and other failures within the supported application software stack
  • Design, build, and champion automated solutions and tasks to optimize application/service/platform uptime with minimal human intervention
  • Develop tools and processes to monitor cloud resources and applications
  • Use Kubernetes to deploy platform services
  • Create and implement standards and best practices, driving adoption across development teams and external vendors as applicable

Requirements and Qualifications

Expertise and/or relevant experience in the following areas is mandatory:

  • Bachelor's degree or higher in Computer Science or a related technical discipline
  • 4+ years of experience in the deployment, administration, and troubleshooting of large-scale distributed systems
  • 4+ years of experience in automation programming in one or more of the following languages: Python, Go, Rust, or JavaScript (Python and Go are preferred but not required). Bash is not considered a programming language for this requirement.
  • 4+ years of experience working with Linux terminal tools and writing shell scripts within a Linux environment
  • Strong understanding of SLAs, SLOs, and SLIs
  • Strong understanding of public cloud service concepts
  • Strong understanding of Unix/Linux operating system internals and administration (Debian is preferred but not required)
  • Strong understanding of networking (e.g. TCP/IP, routing, network topologies, and hardware)
  • Strong experience in debugging and optimizing code and automating routine tasks
  • Strong skills in problem-solving and communication
  • SRE experience, including:
      • Monitoring
      • Alert creation and tuning
  • Willing to work and provide support during West Coast hours (9 AM – 6 PM PST)
  • Willing to join an on-call rotation and participate in troubleshooting and communication efforts outside of normal business hours

Expertise and/or relevant experience in the following areas is preferred:

  • Experience with the following or equivalent technologies: Kubernetes, Docker, OpenStack, Relational Databases, NoSQL Databases
  • Strong communication and presentation skills
  • Exhibits a determination or willingness to take action and achieve results
  • Excellent command of the English language (written and spoken)
  • Excellent organizational skills in planning and prioritizing own workload and initiatives
