How is generative AI affecting data center operators' infrastructure?
2024-03-08 CError102 3865
How is generative AI impacting your infrastructure?

In the rapidly evolving realm of AI technology, new developments surface nearly every day. AI clearly has significant capacity to transform our lives: the technology already spans chatbots, facial recognition, self-driving vehicles, and early disease detection.

The global AI market was valued at $142.3 billion in 2023, with finance, healthcare, and the high-tech/telco markets taking the lead in AI adoption.

AI is already being used to monitor data center assets, proactively detect faults, and improve energy efficiency by driving better power usage effectiveness (PUE). And it's not just being used by hyperscalers, but also by many large enterprises.
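For readers unfamiliar with the metric, PUE is commonly defined as total facility energy divided by the energy delivered to IT equipment, so a value closer to 1.0 means less overhead. The sketch below only illustrates that ratio; the function name and the meter readings are hypothetical and do not come from the article.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly meter readings, for illustration only.
before = pue(total_facility_kwh=1_800_000, it_equipment_kwh=1_000_000)
after = pue(total_facility_kwh=1_500_000, it_equipment_kwh=1_000_000)

print(f"PUE before: {before:.2f}, after: {after:.2f}")
```

In this made-up example the IT load is unchanged, so the lower PUE reflects reduced cooling and power-distribution overhead only.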

InfiniBand versus Ethernet

Ethernet remains the prevailing global standard in most data centers. But an increasing number of today’s AI networks now use InfiniBand technology, although InfiniBand holds a mere fraction of the market share at present, primarily for HPC networks.

Competition is emerging between InfiniBand market leaders and prominent Ethernet switch and chip manufacturers, whose next-generation chips have been designed to construct AI clusters using Ethernet instead of InfiniBand. Regardless of the protocol chosen, both InfiniBand and Ethernet share requirements for high bandwidth and low latency, necessitating top-tier optical cabling solutions for optimal performance to support large language model (LLM) training and inferencing.

Exponential demands for power and bandwidth

Two of the key challenges that data centers are facing relate to extreme power needs and associated cooling requirements for the equipment, and the exorbitant bandwidth needs of the GPUs.

Supercomputers with GPUs running AI applications demand vast power and multiple high-bandwidth connections. These GPUs demand from 6.5kW to over 11kW per 6U unit. Compared with densely packed data center cabinets, which average 7-8kW and max out at 15-20kW per cabinet, the extent of AI's power appetite becomes clear. Many of the leading server OEMs also offer servers with these GPUs.
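To make the comparison concrete, the following sketch estimates how many 6U GPU units fit inside a cabinet's power budget, using only the figures quoted above. The function name is hypothetical, and the calculation ignores cooling limits, rack space, and power headroom, so treat it as a rough illustration.

```python
import math

# Figures quoted in the text (kW). The upper GPU figure is "over 11kW",
# so 11.0 is a lower bound for the most power-hungry units.
GPU_UNIT_KW = {"low": 6.5, "high": 11.0}
CABINET_KW = {"average (upper)": 8.0, "max (lower)": 15.0, "max (upper)": 20.0}


def units_per_cabinet(cabinet_kw: float, unit_kw: float) -> int:
    """Whole 6U GPU units that fit within a cabinet's power budget."""
    return math.floor(cabinet_kw / unit_kw)


for cabinet_label, cabinet_kw in CABINET_KW.items():
    for unit_label, unit_kw in GPU_UNIT_KW.items():
        count = units_per_cabinet(cabinet_kw, unit_kw)
        print(f"{cabinet_label} cabinet at {cabinet_kw}kW, "
              f"{unit_label} unit at {unit_kw}kW: {count} unit(s)")
```

Under these numbers, an average cabinet budget cannot power even one of the higher-draw units, which is the point the paragraph makes.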

These GPUs typically need connections with bandwidth of up to 8x100Gb/s (EDR), 8x200Gb/s (HDR), or 8x400Gb/s (NDR). Every node commonly has eight connections, which equates to as much as 8x400Gb/s, or 3.2Tb/s, per node.
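The per-node arithmetic can be checked with a few lines of code. This is a minimal sketch using the link rates and the eight-connection count given above; the names are illustrative.

```python
# InfiniBand per-link rates in Gb/s, as given in the text.
LINK_RATES_GBPS = {"EDR": 100, "HDR": 200, "NDR": 400}
LINKS_PER_NODE = 8  # "Every node commonly has eight connections"


def node_bandwidth_tbps(generation: str, links: int = LINKS_PER_NODE) -> float:
    """Aggregate node bandwidth in Tb/s for a given InfiniBand generation."""
    return LINK_RATES_GBPS[generation] * links / 1000


for generation in LINK_RATES_GBPS:
    print(f"{generation}: {node_bandwidth_tbps(generation):.1f} Tb/s per node")
```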

How will IT infrastructure cope with these requirements?

Data center power and cooling demands are pushing network managers to reconsider their infrastructure. This often involves altering network blueprints and spacing out GPU cabinets further, potentially adopting end-of-row (EoR) configurations to better handle escalating temperatures.

However, this means an increased physical gap between switches and GPUs. To accommodate this, data center operators might need to incorporate more fiber cabling of the kind used for switch-to-switch connections. Given these extended spans, direct attach cables (DACs) are unlikely to be suitable, as they are confined to five meters at most at such speeds.
Original translation: 龙腾网 https://www.ltaaa.cn (please credit the source when reposting)


Active optical cables (AOCs) are also a feasible choice thanks to their capacity to cover greater distances than DACs. AOCs offer the added advantages of significantly lower power consumption than transceivers, as well as lower latency.
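The distance-based reasoning about DACs, AOCs, and fiber can be sketched as a simple lookup. The five-meter DAC limit comes from the text; the AOC threshold below is an assumption chosen purely for illustration, since the article gives no maximum AOC reach, and the function name is hypothetical.

```python
DAC_MAX_M = 5.0    # from the text: DACs are confined to about five meters at these speeds
AOC_MAX_M = 100.0  # ASSUMPTION: illustrative threshold, not stated in the article


def suggest_link_media(distance_m: float) -> str:
    """Suggest a cabling option for a switch-to-GPU link of the given length."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= DAC_MAX_M:
        return "DAC"
    if distance_m <= AOC_MAX_M:
        return "AOC"
    return "transceivers + structured fiber"


for distance in (3, 30, 150):
    print(f"{distance} m -> {suggest_link_media(distance)}")
```

A real design would also weigh port speed, power budget, and vendor reach specifications rather than distance alone.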


Data center backbone interconnections between switches will need to transition to parallel optic technology to sustain increasing bandwidth demands. Several existing parallel fiber optic options use eight fibers with multi-fiber push-on connectivity (MPO/MTP fiber connectors). These MPO Base-8 solutions permit the adoption of either singlemode or multimode fiber and facilitate smooth migration to higher speeds. For enterprise data centers, a Base-8 MPO OM4 cabling solution is advisable when upgrading to 100Gb/s and 400Gb/s. Cloud data centers, by contrast, should select a Base-8 MPO singlemode cabling solution when transitioning to 400Gb/s and 800Gb/s speeds.

Innovative new fiber enclosure systems on the market can flexibly support different fiber modules, including Base-8 and Base-12 with shuttered LC, MTP pass-thru modules, and splicing modules. They allow for easy access and improved cable management.

In AI applications, where latency holds immense significance, Siemon suggests opting for “AI-Ready” solutions that combine ultra-low loss (ULL) performance with MTP/APC connectors. Ultra-low-loss fiber connectivity becomes pivotal for emerging short-reach singlemode applications (supporting 100, 200, and 400 Gb/s speeds over distances exceeding 100 meters). This ULL connectivity meets the more stringent insertion loss requirements of AI applications, thereby improving overall network performance.

Additionally, experts advise adopting APC (angled physical contact) fiber connectors, including the MTP/APC variant, for specific multimode cabling applications as well as for traditional singlemode use. The angle-polished end face of APC connectors (in contrast to UPC connectors) reduces reflectance, thus improving fiber performance.

AI is a disruptive technology with the capacity to transform not just our professional lives but the very fabric of our existence, and data center operators need to prepare for it. Measures that ease the shift to higher data speeds and improve the energy efficiency of data centers should be a particular focus. Data center operators who prepare well for AI's demands will be well placed to take advantage of the opportunities that come with its evolution and widespread adoption.

Trends in New AI Use Cases

While generative AI and related applications are rapidly evolving, several emerging areas already show significant potential in the near term. Some leading new AI applications to watch for include:

Generative AI: Generative AI involves the creation of new data, such as images, text, music, or video, by AI models. This technology has applications in various fields, including content creation, design, gaming, and virtual reality. Generative adversarial networks (GANs), which are designed for iterative self-correction learning, have shown remarkable progress in generating realistic, high-quality content.

AI in Natural Language Processing (NLP): NLP continues to evolve, with advancements in language understanding, sentiment analysis, and language generation. OpenAI's GPT models, for instance, have demonstrated impressive language generation capabilities. Future applications include more natural and conversational virtual assistants, improved language translation, and enhanced content creation.

Edge AI: Edge computing combined with AI is gaining traction. By deploying AI algorithms and models directly on edge devices, such as smartphones, IoT devices, and autonomous vehicles, real-time decision-making and local data processing can be achieved. This enables faster response times, reduced latency, and improved privacy.

Explainable AI (XAI): Explainable AI focuses on making AI models and their decisions transparent and interpretable to humans. It aims to address the "black box" nature of deep learning models and provides insights into why certain decisions or predictions are made. XAI is crucial for building trust in AI systems, especially in domains like healthcare, finance, and law.

AI in Robotics and Automation: The integration of AI with robotics is advancing automation capabilities across industries. Collaborative robots, or cobots, equipped with AI can work alongside humans in manufacturing and assembly tasks. AI also enables robots to learn and adapt to new environments, enhancing their autonomy and versatility.

AI for Cybersecurity: As cyber threats become more sophisticated, AI is being employed to strengthen cybersecurity measures. AI algorithms can detect anomalies in network traffic, identify patterns of malicious activity, and prevent cyber-attacks. AI-driven cybersecurity systems can respond and adapt to evolving threats in real-time, providing enhanced protection.

AI in Personalized Medicine: AI is revolutionizing healthcare by enabling personalized medicine approaches. Machine learning models can analyze large-scale patient data, genetic information, and medical records to identify patterns and correlations. This can aid in disease diagnosis, treatment selection, and predicting patient outcomes, leading to more effective healthcare interventions.

AI for Climate Change and Sustainability: AI is being explored to address pressing environmental challenges. It can help analyze climate data, optimize energy consumption, predict weather patterns, and develop sustainable solutions. AI-powered systems have the potential to optimize resource utilization, reduce emissions, and contribute to environmental conservation.
