Job Information

DaVita Sr. Engineer, Operations System (IT) in Denver, Colorado

2000 16th Street, Denver, Colorado 80202, United States of America

Job Description:

Position Overview

DaVita IT Operations is seeking an experienced and innovative Continuity Engineer - Infrastructure to enhance our IT resilience and operational readiness. This role is critical in ensuring that our infrastructure aligns with best practices in continuity and cyber resiliency. The Continuity Engineer will design, implement, and maintain advanced infrastructure solutions with a focus on fault domain isolation, multi-region reliability, cloud tiering, and elastic compute capabilities to support seamless business continuity.

Key Responsibilities

1. Resilience Design and Fault Domain Best Practices: Architect and implement infrastructure solutions using fault domain best practices, ensuring redundancy across multiple fault zones, including two storage arrays to support diverse and resilient storage configurations and two compute platforms designed for Multi-AZ, Multi-Region, and Multi-Cluster (blue-green/A & B) strategies. Design cloud tier storage and backup solutions, including off-premises options and tiering to AWS AZs (e.g., S3 targets).

2. Infrastructure Monitoring and Maintenance: Deploy and maintain advanced monitoring solutions within protected zones, ensuring seamless management functionality. Implement auto-healing elastic compute strategies to maintain high levels of maturity in fault tolerance and recovery. Regularly test failover and recovery processes, with comprehensive quarterly testing and documentation for continuous improvement.

3. Incident Management and Recovery: Develop and maintain playbooks for critical applications that operate across multi-data center and multi-region deployments. Act as the lead engineer during infrastructure recovery scenarios, ensuring quick and efficient resolution. Document lessons learned post-incident to refine failover and recovery plans.

4. Cross-Functional Collaboration: Partner with IT Security to ensure continuity strategies align with zero-trust principles while optimizing processes such as port request approvals. Collaborate with cloud engineering teams to integrate resilient cloud tier storage and backup solutions into hybrid and multi-cloud environments. Work closely with other IT Operations teams to align infrastructure strategies with organizational resilience objectives.

5. Optimization and Automation: Drive automation initiatives to support elastic compute/auto-healing capabilities, enabling dynamic scaling and resource optimization. Continuously evaluate new technologies and practices to enhance multi-region, multi-cluster, and hybrid infrastructure resilience. Support cost-effective tiering strategies for storage and compute resources, leveraging cloud-native tools for efficiency.

6. Documentation and Testing: Maintain detailed and up-to-date documentation of infrastructure configurations, including fault domain architecture and backup tiering strategies. Facilitate and oversee failover and recovery testing, ensuring that systems and teams are prepared for operational disruptions. Provide training and guidelines for stakeholders on managing critical application continuity across regions and data centers.

Qualifications

Education: Bachelor's degree in Information Technology, Computer Science, or a related field.

Experience: 7+ years in infrastructure engineering with a focus on business continuity, disaster recovery, or cyber resiliency. Proven experience with Multi-AZ, Multi-Region, and Multi-Cluster architectures. Expertise in designing and managing cloud tier storage and backup solutions for off-premises and multi-cloud environments.

Technical Skills: Proficiency in cloud platforms (AWS, Azure, Google Cloud), including tiering and backup solutions. Deep knowledge of fault domain isolation and strategies for distributed systems. Familiarity with elastic compute platforms and tools for auto-healing and dynamic scaling. Experience with management GUIs and monitoring tools in protected zones.

Soft Skills: Strong analytical and problem-solving skills to anticipate and mitigate risks. Effective communication and collaboration skills for cross-functional teams. Ability to prioritize and manage multiple complex projects simultaneously.

Preferred Certifications: Google Professional Cloud Architect or other relevant cloud certifications. Certified Business Continuity Professional (CBCP) or equivalent. ITIL Foundation Certification.

Here is what you can expect when you join our Village:
