NCR Atleos DataOps and SRE Engineering Manager in ATLANTA, Georgia

About NCR Atleos

NCR Atleos, headquartered in Atlanta, is a leader in expanding financial access. Our 20,000 dedicated employees optimize the branch, improve operational efficiency, and maximize self-service availability for financial institutions and retailers across the globe.

DataOps and SRE Engineering Manager

Location: Atlanta (Midtown)

Work style: hybrid (in office M/T/W/Th)

We are seeking a skilled DataOps & Site Reliability Engineering (SRE) Manager to lead and manage a team of 5-10 Data Engineering, SRE, and DataOps professionals. The ideal candidate will be responsible for the reliability, availability, and performance of data platforms, whether traditional, cloud-based, or data lakes in Azure.

Your responsibilities in this role will include:

  • Team Leadership: You will be responsible for building and managing a high-performing team of SREs & DataOps professionals. This includes hiring, training, mentoring, and guiding team members.

  • Strategy and Planning: You will work closely with cross-functional teams to develop and implement strategies for improving system reliability, scalability, and performance. This may involve capacity planning, incident management, and disaster recovery planning.

  • Data Operations Management: You will oversee the day-to-day operations of data pipelines, ensuring data is collected, processed, and stored accurately and efficiently. This includes monitoring data quality, troubleshooting issues, and optimizing data workflows.

  • Data Integration: You will work closely with cross-functional teams, such as data engineers, data scientists, and business analysts, to ensure seamless integration of data from various sources into the organization's data infrastructure. This may involve designing and implementing data integration processes and tools.

  • System Reliability: You will oversee the monitoring, alerting, and incident response processes to ensure timely identification and resolution of system issues. You will also drive the implementation of automation and self-healing mechanisms to minimize manual intervention.

  • Collaboration: You will collaborate with development teams, infrastructure teams, and other stakeholders to ensure that reliability and performance requirements are considered throughout the software development lifecycle. This includes participating in design reviews, providing feedback on architecture, and promoting best practices.

  • Continuous Improvement: You will drive a culture of continuous improvement by identifying areas for optimization, implementing process enhancements, and leveraging new technologies and tools to enhance system reliability and data management processes.

  • Documentation, Communication and Knowledge Sharing: You will ensure that data operations processes, runbooks, and incident postmortems are created and maintained to facilitate knowledge sharing, support learning from past incidents, and provide guidance on data-related matters.

Required Skills & Experience

  • Bachelor's degree in computer science or a related field or equivalent work experience

  • 8+ years of relevant experience in DevOps, Data Engineering, or SRE

  • A strong understanding of system architecture, cloud services and infrastructure is essential. This includes knowledge of networking, operating systems, databases, and distributed systems. Familiarity with tools and technologies such as Kubernetes, Docker, monitoring systems, and automation frameworks is also important.

  • Experience in incident response and management is crucial. This includes the ability to lead and coordinate incident response efforts, perform root cause analysis, and implement preventive measures. Strong problem-solving and troubleshooting skills are necessary to handle critical incidents effectively.

  • Proficiency in data management concepts, including data integration and data security. Experience with data modeling, data warehousing, and ETL (Extract, Transform, Load) processes is also important.

  • Knowledge of programming languages like Python, R, or SQL is valuable for data manipulation, analysis, and automation tasks. Scripting skills, particularly with PowerShell, can be beneficial for managing Azure resources.

  • Experience in designing and implementing data pipelines and ETL processes using tools like Azure Data Factory/Airflow and Azure Databricks/Azure Synapse. Understanding data ingestion, transformation, and data integration techniques is crucial.

  • Proficiency in data visualization tools like Power BI or Azure Synapse Analytics for creating interactive dashboards and reports. Ability to communicate insights effectively to stakeholders is important.

  • Understanding of performance optimization techniques and scalability principles. Ability to analyze system performance metrics, identify bottlenecks, and implement optimizations to ensure high availability and reliability.

  • Familiarity with Agile methodologies and project management tools can be beneficial.

  • Strong analytical and problem-solving skills to identify and resolve complex technical issues. Ability to make data-driven decisions and prioritize tasks based on business impact.

  • Prior experience managing a team of local and offshore resources.

  • Excellent communication skills including the ability to present to a diverse audience (business, technical, and leadership) and clearly communicate across diverse cultures and time zones.

Preferred Skills:

  • Experience with open-source DataOps tools such as Apache NiFi or Luigi.

  • Experience with tools like Prometheus, Grafana, and Datadog, which are widely used for monitoring the health and performance of systems, collecting metrics, and setting up alerts.

  • Familiarity with tools like the ELK Stack (Elasticsearch, Logstash, Kibana) and/or Azure Log Analytics, used for centralized logging, log analysis, and troubleshooting.

  • Knowledge of the Azure Data Platform, including Azure SQL Database, Azure Synapse Analytics, Azure Databricks, Azure HDInsight, Azure Cosmos DB, Azure Data Lake Storage, Azure Stream Analytics, and Azure Data Factory.

  • Knowledge of applying ML/AI in DataOps and SRE practices.

Offers of employment are conditional upon passage of screening criteria applicable to the job.

Full-time employee benefits include:

  • Medical Insurance

  • Dental Insurance

  • Life Insurance

  • Vision Insurance

  • Short/Long Term Disability

  • Paid Vacation

  • 401k

EEO Statement

NCR Atleos is an equal-opportunity employer. It is NCR Atleos policy to hire, train, promote, and pay associates based on their job-related qualifications, ability, and performance, without regard to race, color, creed, religion, national origin, citizenship status, sex, sexual orientation, gender identity/expression, pregnancy, marital status, age, mental or physical disability, genetic information, medical condition, military or veteran status, or any other factor protected by law.

Statement to Third Party Agencies

To ALL recruitment agencies: NCR Atleos only accepts resumes from agencies on the NCR Atleos preferred supplier list. Please do not forward resumes to our applicant tracking system, NCR Atleos employees, or any NCR Atleos facility. NCR Atleos is not responsible for any fees or charges associated with unsolicited resumes.
