Lenovo Large Model Distributed Training Researcher in Beijing, China

Job Information

Large Model Distributed Training Researcher

General Information

Req #

100015311

Career area:

Artificial Intelligence

Country/Region:

China

State:

Beijing

City:

Beijing

Date:

Thursday, July 4, 2024

Additional Locations:

China

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers.

Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).

To find out more, visit www.lenovo.com and read about the latest news via our StoryHub (https://news.lenovo.com/).

Description and Requirements

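Responsibilities:

1. Design and develop the architecture of distributed training systems for large deep learning models, optimizing training efficiency and resource utilization;

2. Research and implement parallel training algorithms for large-scale deep learning models on GPUs and other high-performance computing platforms;

3. Deeply customize and extend existing deep learning and distributed training frameworks (e.g., PyTorch, TensorFlow, DeepSpeed, Colossal-AI, Megatron) to meet the needs of large-scale model training;

4. Work closely with the algorithm team to resolve performance bottlenecks in model training on very large datasets;

5. Design and implement a model training monitoring system covering, among other things, training progress, resource usage, and visualization of training results;

6. Continuously track the latest developments in distributed training technology and apply cutting-edge research results to real projects.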

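Qualifications:

1. Master's degree or above in computer science or a related field, with at least 3 years of work experience in deep learning; experience at a large internet company or AI lab is preferred;

2. Proficiency in at least one deep learning framework and distributed training framework (e.g., PyTorch, TensorFlow), with extensive model development and training experience;

3. Strong command of distributed systems principles and familiarity with common distributed computing frameworks (e.g., MPI, DeepSpeed, Colossal-AI, OneFlow), with experience developing large-scale parallel computing and distributed training systems;

4. Solid algorithmic foundation, with deep understanding of and hands-on experience in optimizing deep learning model training, including but not limited to gradient compression, communication optimization, and asynchronous training;

5. Theoretical and practical experience in distributed training of large models, and familiarity with mainstream foundation models from China and abroad;

6. Excellent analytical and problem-solving skills, with the ability to independently locate and resolve complex issues;

7. Working knowledge of computer architecture, operating systems, network programming, and related topics;

8. Strong English reading and writing skills, with the ability to quickly read English papers and technical documentation and to follow the latest international research and technology trends.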

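Preferred qualifications:

1. Papers on distributed training or deep learning published in top conferences or journals (e.g., NeurIPS (formerly NIPS), ICML, ICLR, JMLR);

2. Participation in open-source distributed training projects, with significant contributions.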

