Headline: “NVIDIA AI Foundry: Custom Deployment of Llama 3.1 Model to Accelerate Enterprise Generative AI Innovation”
Keywords: AI Foundry, Llama Model, NVIDIA Services
News Content: [Reported by 智东西 on July 24] Global AI computing leader NVIDIA has announced its new NVIDIA AI Foundry service and NVIDIA NIM inference microservices, aimed at giving enterprises worldwide more powerful, efficient, and customizable generative AI solutions. NVIDIA's integration of Meta's Llama 3.1 open-source models marks a comprehensive upgrade of NVIDIA AI Foundry, its enterprise-grade AI foundry, and opens a new chapter for enterprise generative AI applications.
Llama 3.1 is a family of large language models available in 8B, 70B, and 405B parameter sizes, all trained on more than 16,000 NVIDIA Tensor Core GPUs and optimized for NVIDIA's accelerated computing and software. The models are designed to run efficiently in data centers and cloud environments, as well as on local workstations equipped with NVIDIA RTX GPUs and personal computers with GeForce RTX GPUs.
NVIDIA AI Foundry, as an enterprise-grade AI foundry, integrates the Llama 3.1 model to provide robust support for enterprises in building and deploying custom Llama supermodels. NVIDIA’s Founder and CEO, Jensen Huang, stated, “Meta’s Llama 3.1 open-source model is a pivotal moment for global enterprises to adopt generative AI. NVIDIA AI Foundry will enable enterprises to build advanced generative AI applications, driving advancements across industries.”
Relying on the NVIDIA DGX Cloud AI platform, NVIDIA AI Foundry collaborates with leading public clouds worldwide to provide end-to-end services for building custom supermodels. Enterprises can easily create and customize AI services, with efficient deployment through NVIDIA NIM. Leveraging NVIDIA AI Foundry, enterprises can train custom models using their own data and synthetic data generated by the Llama 3.1 405B and NVIDIA Nemotron Reward models, enhancing accuracy. The use of domain-adaptive pre-training (DAPT) further optimizes model performance.
NVIDIA and Meta also provide a distillation method for Llama 3.1 that lets developers create smaller custom Llama models able to run on a wider range of accelerated infrastructure, including AI workstations and laptops. After creating a custom model, enterprises can package it as an NVIDIA NIM inference microservice and deploy it to their preferred cloud platform or to NVIDIA-certified systems from global server manufacturers, running it in production with MLOps and AIOps platforms.
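NIM microservices expose an OpenAI-compatible HTTP API, so a deployed model is queried with a standard chat-completions request. The sketch below builds such a request payload; the base URL, port, and model identifier (`meta/llama-3.1-8b-instruct`) are placeholder assumptions for illustration and depend on the actual deployment.

```python
import json

# Hypothetical endpoint of a self-hosted NIM container; the real host,
# port, and model name depend on your deployment.
NIM_BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "meta/llama-3.1-8b-instruct"  # assumed model identifier

def build_chat_request(prompt: str, max_tokens: int = 256,
                       temperature: float = 0.2) -> dict:
    """Build an OpenAI-style chat-completions payload for a NIM endpoint."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize our Q3 support tickets.")
print(json.dumps(payload, indent=2))
```

In practice this payload would be POSTed to `{NIM_BASE_URL}/chat/completions` with any HTTP client; because the API follows the OpenAI schema, existing client libraries can usually be pointed at the NIM endpoint unchanged.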
With NVIDIA NIM inference microservices, enterprises can deploy Llama 3.1 models into production with up to 2.5x higher throughput than running inference without NIM. Pairing the NVIDIA NeMo Retriever NIM microservices with Llama 3.1 models adds advanced retrieval workflows that improve response accuracy, particularly for AI copilots, assistants, and digital human avatars.
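The idea behind such a retrieval workflow can be sketched in a few lines: embed the documents, pick the one closest to the query, and prepend it as context for the model. Production systems such as NeMo Retriever use learned embedding models and vector databases; the bag-of-words "embedding" and toy documents below are stand-ins for illustration only.

```python
import math
import re

def embed(text: str) -> dict:
    """Crude bag-of-words vector: word -> count (illustration only)."""
    vec: dict = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = [
    "Refund requests must be filed within 30 days of purchase.",
    "Our office is closed on public holidays.",
]
question = "How do I get a refund?"
context = retrieve(question, docs)
# The grounded prompt would then be sent to the deployed Llama 3.1 model.
prompt = f"Context: {context}\nQuestion: {question}"
```

Grounding the prompt in retrieved text is what improves response accuracy: the model answers from the supplied context rather than from its parameters alone.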
The launch of NVIDIA AI Foundry combines NVIDIA's software, infrastructure, and expertise with open community models, technologies, and support from the NVIDIA AI ecosystem, aiming to accelerate the entire path from AI development to deployment. Global system integrators such as Accenture will work closely with AI Foundry customers to drive broad adoption of AI technologies.
The combination of NVIDIA AI Foundry and Llama 3.1 not only gives global enterprises powerful generative AI tools but also points to broad prospects for AI across industries. Through customization, efficient deployment, and performance optimization, NVIDIA AI Foundry is positioned to help enterprises accelerate innovation, bring AI into real business workflows, and inject new momentum into global digital transformation.
[Source] https://zhidx.com/p/435084.html