Job Information

Meta Production Engineer, Network in Menlo Park, California

Summary:

Meta is seeking a Production Engineer with in-depth understanding of networking, systems, automation, and tooling to join the PE Network team. This team is responsible for deploying and managing one of the world’s largest and most complex networks. Meta’s network is a foundational component in achieving the company's AI goals, and this role would be key to supporting it. Given the scale and demands of our infrastructure, automation plays a critical role. In this position, you will design, develop, and implement automation and tooling to streamline network operations while ensuring the scalability and reliability of Meta’s global network. You’ll collaborate with top engineers in the industry to build and maintain the systems that power one of the largest networks in the world, supporting billions of users across our applications.

Make the global Datacenter network fleet reliable and available for all Infrastructure services to use. Contribute to the company's mission by operating and bringing to production all the new network products that enable networking for AI training and inference.

A Network Production Engineer in this role would help lead Meta’s server fleet connections to the network, specifically from the Network Interface Card layer to the top of rack and beyond. They would operate at a unique intersection of low-level systems engineering and the challenge of operating a massively distributed fleet, an opportunity that is uniquely available at Meta. Network Production Engineers are exposed to bleeding-edge technology being developed internally at Meta to optimize our server network communication stack.

Required Skills:

Production Engineer, Network Responsibilities:

  1. Conceptualize, build, and maintain automation and tools to support the next generation of network products, network deployment, release engineering and operations.

  2. Develop operational process improvements and implement them in scalable, automated workflows to enhance operational efficiency.

  3. Design and develop solutions that scale across a variety of network platforms.

  4. Lead enhancements of automation for continuous integration, validations, testing infrastructure, release, and configuration management across our global data center network fleet.

  5. Conduct thorough investigations into complex technical issues across networks, ranging from automated tooling to hardware failures and network issues.

  6. Participate in a weekly on-call rotation with the team and be an escalation contact for your service.

  7. Proactively find operational gaps that impact the efficiency of your team, come up with the execution plan, and drive the project directly and through influence of other team members.

  8. Contribute to team growth and development through peer mentorship.

Minimum Qualifications:

  1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.

  2. Experience developing software to automate operations.

  3. 5+ years of experience developing and understanding network device configurations for at least one network vendor (e.g. Arista, Juniper, Cisco, Brocade, Ciena, Infinera, Nokia, etc.).

  4. 5+ years of coding experience in at least one programming language (e.g. Python, Go, C++).

  5. Demonstrated knowledge of TCP, IPv4/6, Routing Protocols (one or more of BGP, MPLS, ISIS, or similar), or related network services (e.g. DHCP and DNS).

Preferred Qualifications:

  1. Master's degree or graduate work experience in Computer Science, Computer Engineering, or a related technical field.

  2. 6+ years of experience building software solutions for managing network infrastructure, with a focus on scalability and reliability.

  3. In-depth knowledge of software and network debugging, profiling, and instrumentation techniques to ensure optimal system performance.

  4. Proven experience designing, developing, and operating distributed systems at scale, with an in-depth understanding of the challenges and opportunities in this space.

  5. Experience designing and maintaining automated testing infrastructure to ensure the quality and reliability of our systems.

  6. Knowledge of IB/RDMA/RoCE networks, including RDMA congestion control mechanisms, AI training workloads, and the demands they exert on networks.

Public Compensation:

$147,000/year to $208,000/year + bonus + equity + benefits

Industry: Internet

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
