Site Reliability Engineer at OnShift, Inc
DO YOU HAVE WHAT IT TAKES?
- An associate's or bachelor's degree with a technical major, such as engineering or computer science, is preferred
- 5-7 or more years of experience in a Software Development, DevOps, Site Reliability Engineering, or Infrastructure position or team
- Proficiency in one or more general-purpose programming languages (e.g., Java, C#) and scripting languages (e.g., Python, PowerShell)
- Experience with automation tools (e.g., Terraform, Ansible) is required
- Thorough understanding of the software development lifecycle (SDLC)
- Openness and responsiveness to change, and a commitment to continuous improvement, including identifying and responding actively and sensitively to the needs of all employees
- Availability outside normal business hours to assist with recovery when a critical system fails or experiences an outage
DESCRIPTION
The Site Reliability Engineer is responsible for monitoring systems, ensuring responsiveness to alerts, and improving application stability and supportability. This role will focus on automation, scalability, and performance of systems, including our core applications. The Site Reliability Engineer will research and help implement new site reliability technologies, analyze current architecture and processes, optimize automation schedules, and expand and evangelize the automation vision throughout OnShift's Engineering department.
WHO WE ARE
OnShift is a B2B software company headquartered in the theater district of Cleveland, Ohio. We are a remote-first organization that uses the office as a hub for innovation, collaboration, meetings, and social interaction. Along the way, we've earned several notable awards, including 2021 Top Workplaces in Northeast Ohio and the 2021 Weatherhead 100. We're growing fast, we love what we do, and we do it with passion.
Since our inception, OnShift has been uniquely dedicated to the needs of the long-term post-acute healthcare and senior living workforce. We recognize that employees are the greatest asset, and today it's more critical than ever to attract and retain staff members. Our HCM software solutions are designed to deliver operational excellence while improving the daily lives of healthcare employees, so they can focus on what matters most: delivering quality care to patients and residents.
As an organization, we value people who are dedicated and innovative, and we reward them with challenging work, competitive pay, solid benefits, equity participation, career growth, and personal development.
OUR COMMITMENT TO DIVERSITY
OnShift understands that diverse teams who feel a sense of belonging create a better employee experience, have more fun at work, drive better business results, and create more innovative products. OnShift is committed to continuously evaluating and improving our Diversity, Inclusion, and Belonging efforts through our internal Diversity, Inclusion, and Belonging (DIBs) initiative.
A DAY IN THE LIFE
- Monitor applications and performance reporting to ensure adherence to established service levels
- Apply strong knowledge of cloud-based systems or virtual environments (Azure, AWS, Google Cloud)
- Use site reliability expertise to collaboratively work with other technical teams to provide guidance and ensure systems and solutions are scalable and reliable
- Research and recommend new technologies for compatibility, expandability, ease of use, and supportability
- Review, evolve, and assist with disaster recovery plans for production environments
- Create and maintain excellent documentation, including workflows, procedures, and troubleshooting guides
- Provide on-call support on a rotating schedule or as needed for emergency situations, including outside of normal business hours
- Develop scripts for batch job scheduling, and help manage job schedules to ensure stability and efficient use of machine resources
- Ensure sufficient logging, monitoring, and alerting strategies for availability, latency, and overall system health
- Support day-to-day maintenance of the applications, focusing on reducing toil and technical debt while maintaining a high degree of system availability
- Keep management advised of any on-going problems not being resolved satisfactorily or that prevent the fulfillment of responsibilities
- Perform other duties and assist other employees, as assigned
- Apply a keen understanding of security risks, and evangelize the importance of following security procedures
EEO STATEMENT
We believe in equal employment and advancement opportunities for all people, based on ability, potential, and record of accomplishment.