Site Reliability Engineer

Our Benefits

City location

Private healthcare

27 days’ holiday

Buy or sell holidays

Bonus scheme

Pension scheme

Regular social events

£350 Christmas vouchers

Loyalty awards

Birthday off work

What to expect

About the role 

A key member of our Technology Operations team, the Site Reliability Engineer (SRE) is passionate about our products and platforms running at peak performance. They understand how our software runs on top of the infrastructure (both physical and Cloud) and how the individual components interact to provide the brilliant messaging solutions our customers expect.

The SRE takes pride in the performance and availability of the solutions they look after, and works hard to keep the manual overhead of running a 24×7 internet solution to a minimum.

The SRE will work collaboratively with Infrastructure and Engineering teams to ensure issues with existing software are understood and responded to appropriately and that new solutions are built with reliability at their heart.

About Commify

We make business communication brilliant! We work with more than 45,000 companies, helping them to transform their mobile communication with their customers and employees. Our success is the result of hundreds of talented people pulling together to achieve a common goal. Join our team and be part of our success story. 

You will thrive in an environment of passion, integrity, ownership and innovation, where development and progression are a real focus. We’d like to think we have everything you’d expect from a benefits package, from 27 days’ holiday and your birthday off work, to private medical cover, dental cover and bi-monthly social events! On top of this you can expect £350 of Christmas vouchers and added extras like beer o’clock and an amazing Christmas party.

Principal duties and responsibilities

The role holder will be responsible for:

Developing the data platform, testing, improving and maintaining new and existing data technologies and data feeds.

Your duties will include:

  • Ensuring high levels of system performance through monitoring, analysis and performance tuning
  • Troubleshooting system hardware, software, networks, operating systems and system management systems
  • Working with the Security team to identify and protect against threats to the Products we offer
  • Liaising with Developers, Product Owners and other Engineering teams to deliver engineering roadmaps showing key items such as upgrades, technical refreshes and new versions
  • Contributing to reviews and audits of projects from an engineering perspective, including identifying risks and mitigation options
  • Providing on-call support including out-of-hours incident support on a rota basis to help deliver a high quality of service around the clock
  • Building knowledge and skills within Engineering teams to ensure successful running of their software in a high throughput production environment
  • Building systems to enable safe, secure and rapid software deployment
  • Implementing scalability and fault tolerance
  • Using data and feedback loops to make better decisions about managing systems
  • Improving processes through automation or other efficiencies

Skills / Experience

Essential:

  • Passion for reliability
  • Previous experience of working in an Operations role (ideally a Site Reliability role)
  • Ability to work collaboratively across multiple teams, to take ownership of, prioritise and be accountable for your work
  • Excellent communication skills and a desire to continue to learn
  • Experience with centralised monitoring solutions (New Relic, Application Insights, Log Analytics, ELK or similar)
  • Experience with configuration management tools (Ansible, Chef or similar)
  • Experience with scripting/programming languages to assist in automating solutions, e.g. PowerShell (preferred), Bash, C#, Ruby, Python
  • Experience supporting web-based applications – with understanding of firewall configuration, load balancing and availability checks
  • Experience of working with Linux and Microsoft Server Operating Systems

Desirable:

  • Experience of defining service level objectives/operational requirements for a Cloud-based solution
  • Understanding and working knowledge of Microsoft Azure Cloud offerings, especially in the Platform as a Service category (Web Apps, Storage, Functions)
  • A good understanding or working knowledge of the following tools: Terraform, Ansible, VSTS, ARM, Puppet, Chef, Jenkins, ELK, Grafana
  • A good understanding or working knowledge of DNS, Load Balancer configuration, Active Directory and Cloud-based network infrastructure
  • Experience of working in an agile environment and experience with agile methodologies such as TDD, Scrum, Kanban
  • Understanding and experience of implementing a monitoring and alerting system for a micro-service architecture
  • Applied understanding of cloud security best practice

Please note: This is intended as a guide to the range of duties involved. The post holder will need to be flexible and adaptable in order to respond to changes and developments in business priorities.

What to do next

To apply please send your CV to recruitment@esendex.com.

Diversity

We’re committed to building a team with a variety of backgrounds, views and skills, embracing our key values. The more diverse and inclusive we are, the stronger we are as a team. We encourage applications from all candidates with the relevant skills and experience.

The legal stuff

Esendex is committed to protecting the privacy and security of your information. Personal information submitted as part of the recruitment and selection process will only be used for these purposes. We will retain information for up to 12 months, after which it will be deleted or destroyed. For full information about your rights in relation to your data, please see our Recruitment Privacy Policy.