If you think about it, Meta's apps, Facebook, Instagram, WhatsApp, Messenger, Oculus, are some of the busiest online destinations on the planet. They are how half of humanity connects with their loved ones, how people make a living, and where people turn to coordinate services when there are emergencies. And the magic in all of this, if you can call it magic, because I think of it as hard work, is that it just works. And when it works, which hopefully is all of the time, people don't even think about it.
Hey folks, my name is Santosh Janardhan, and I head up infrastructure here at Meta. I've been here a long time, 13 years to be exact, and I've been very lucky to be part of this journey. We're facing another inflection point in our infrastructure journey, and the reality is this: AI is no longer a minor workload, it is the workload. We are now AI first, and AI is central to every single thing we do in infrastructure at Meta.
It enables better personalization, it allows for safer and fairer products, and it delivers much richer experiences for people, while helping businesses reach the audiences they care about the most. AI is also different. It's a game changer, because it comes with great promise, but also with great challenges. The hardware and software needed to support and develop AI are profoundly different from the basic compute technologies that we have been familiar with for a couple of decades at this point.
So, we design and custom-build our data centers, we design and custom-build our hardware, we own the kernel, the silicon, and the software stack, and we know our workloads. We have PyTorch, which is the glue between all of this. We can vertically integrate a stack like few others can. This has inspired us to completely rethink what we are doing for AI, and to do it at an immense scale. And as we build, we look for ways to open source our breakthroughs so that the world can benefit, and people can innovate alongside us.
We'll share today how we are transforming everything in our infrastructure. Across the six presentations you'll hear today, we'll explore some of our latest innovations, including the latest research coming from the fully built-out RSC, which is one of the fastest AI supercomputers in the world. We'll share progress on our in-house, custom-built silicon investments for inference and video encoding, and, as I mentioned earlier, the data center designs for AI, which are absolutely critical.
So, you'll hear about how we are designing our facilities to support the future. We need liquid cooling for AI hardware, and we're doing this with shorter construction timelines and much lower cost. Now, we have goals to use AI internally as well as in our consumer products. So, we'll share details about Meta's generative AI-based coding assistant, and the longer-term vision to help developers across the software development life cycle, the SDLC, if you will. Finally, we'll talk about PyTorch, and the impact it has on our overall AI infrastructure vision.
To wrap up the day, we'll bring you the leaders across our infra organization in a panel. I'm super excited about this, to share their perspectives on the future of AI infrastructure. Moderating the discussion is Irene Kaufman. She's Meta's PM leader responsible for all our AI efforts that span many teams across the company. She and other infra leaders will be exploring the challenges and opportunities that lie ahead, and how Meta plans to focus on delivering the long-term value and impact that guide our vision here.
I want to thank all of you again for joining us. Throughout the event, if you have questions you'd like answered, or topics you'd like us to cover in future @Scale events or in our Meta technical blogs, simply visit our website at scaleconference.com, or scan the QR code right on your screen.
We'll kick off the presentations today with RSC, our AI supercomputer. Ensuring that our platforms work across many diverse cultures, languages, and perspectives is a significant challenge at our scale, as you can imagine, and it requires pretty intensive, large-scale AI models. The complexity of virtual reality and the metaverse further increases the challenge space. This requires much larger models, with a greater number of modalities and parameters, as you can imagine.
We anticipated these challenges and built out the Research SuperCluster (RSC) AI supercomputer, a dedicated, high-performance, state-of-the-art cluster to accelerate AI research. Technical program manager Scott Jeschonek and software engineer Kalyan Saladi will join us now. They will present the architectural choices that went into building the cluster. Enjoy, you're going to have a fun ride.
Hi everyone, and thank you for joining us today. We're excited to share an update about our Research SuperCluster, which we recently completed this past fall. We started the effort on it back in 2021, completed the first phase in 2022, and finalized everything in the October-November time frame. For now, I'm going to set the groundwork for why we built the Research SuperCluster in the first place.
Meta has been using AI in a lot of different ways for many years. Whether you're talking about flagging harmful, toxic, or biased content in our applications, or about translating a language instantaneously, if you've been in the Facebook app and you've clicked the Translate button, you know that it's machine learning doing that machine translation for you. This is a significant area of investment, and later in the presentation I'll talk about one of the projects in this space, No Language Left Behind. And then there's the advent of AR and VR; whether you're using an Oculus today or thinking about what it will be like in the future, the use of AI is essential to these platforms. Whether you're talking about the placement of you within your room, or how your hands are moving, these are all things driven by AI. And as these platforms grow in importance and scope, AI is going to play an even more important role. Being able to do this across billions of users in many different countries on all of our platforms is a very ambitious scope of work, and it requires research to underpin it.
In addition to that, we have to be mindful of our use of data to train our research models: being able to track that data, make sure it's logged and stored in an appropriate way, and that it's encrypted. All these things are important. But it's also important to make sure that access to our research platform is a controlled environment, and that unauthorized access is prevented by default. My peer Kalyan will be talking a little bit about some of the technology that we've put into RSC that helps in this regard.
In terms of how we approach research and how we continue to improve our AI functions at Meta, our research community is constantly looking at ways of increasing the amount of data, or the quality of the data, or the sources of the data used to improve our training. They also explore different modalities of data: whether you're training a language model with just text, or perhaps with text and images to enhance the richness of the output, that's one avenue. In addition, you may want to increase the complexity of the model, whether by adding parameters or by adding pre- and post-processing into the model workflow; those are areas of focus too. But to do all of these things, you have to learn from what you did, make improvements, tune, and iterate. That requires rapid iteration and rapid innovation, which takes time. And time has a direct correlation to how much resource you're able to run on for your investigation, or for your final output.
An example of how this has become important to us is in the large language model space. If you look at the chart here, this is the number of parameters used in large language models over the last five years. The advent of the transformer architecture allowed for a lot more parallelization and scale, which meant that you could add many, many more parameters. You can see the growth of parameter counts has been exponential since 2018; we're now approaching a trillion. The more parameters and the more data you add, the longer it's going to take to process if you don't have sufficient scale. We knew this because we've been using AI for a good while now, and we decided to make an investment in a large cluster.
You can't build these overnight, you have to plan. There have been some challenges in the past few years, as I'm sure we're all aware; COVID has impacted supply chains, and we had to factor all these things in. So this was a multi-year, very forward-looking effort, and we're very happy that we've come out on the other side with a fully functional cluster. And on that note, I'd like to hand off to my peer, Kalyan, who's going to talk about the details of the cluster and the lessons that we learned.
Thank you, Scott, for the introduction. Hello, everyone. My name is Kalyan. I'm a software engineer on the AI Research Infrastructure team at Meta. I worked on production ML training and large-scale distributed systems at Meta and VMware before joining the current team. Let's jump into the question: why build a custom super cluster instead of using the existing data center technology that Meta has deployed all over the globe? It really comes down to understanding and realizing the unique demands large-scale AI training places on the infrastructure. This translates into our need to control the physical parameters.
What are the physical parameters? I'll highlight three of them. Number one is cooling technology. Airflow-based cooling was not meeting the mark for large-scale AI training, so we had to go with liquid cooling, which was a departure from Meta's production data centers. Given the rack density and the number of GPUs we wanted to pack into the data center building, the power requirements also deviated significantly from the production setup. But there's one more important aspect: the specialized, flat backend network. This is a low-latency, high-bandwidth network with constraints on cable length, again constraining the physical parameters of how far you can spread these GPUs. When you put these three together, we had to make a choice: we needed a custom cluster.
Let's look at what the Research SuperCluster is at a high level. I want to quote one number: the aggregate compute power of RSC is up to five exaflops. That is one billion billion operations per second, times five. So this is an enormous scale. How do you get this scale? What does it take to support it? Let's drill into the building blocks of the cluster over the next few minutes.
First and foremost is the fast and flat network. What's fast about it? Each server has eight InfiniBand links; this is 2x more than the prior generation. Each link is 200 gigabits per second, again 2x faster. And most importantly, there is no oversubscription in the network. When you zoom out from the server level to the entire fabric, we believe this is one of the largest known flat InfiniBand fabrics in the world. To quote some numbers, the fabric has 48,000 links and approximately 2,000 switches, including leaf and spine. This is a lot of entities, right? I believe 20K nodes is what is reported by the InfiniBand network there.
We talked about the speed and the scale, but there is a very important qualitative aspect to the way we designed the network. We repeatedly emphasize its flat nature. What are the benefits of a flat network? The scheduler sees a homogeneous set of resources, which, from the researchers' perspective, means they are free from having to worry about what performance they get if they land on XYZ nodes versus ABC nodes. This is a degree of freedom that our researchers really appreciate. Workloads are free from topology awareness. As a result, there is no resource fragmentation in the cluster, and we get to train more jobs, and at a larger scale.
Let's put these design aspects together and see what bandwidth we get from this network. As an example, if we run a 4,096-GPU (that is, 4K GPU) NCCL AllReduce benchmark, we get 157 GB/s in-place bus bandwidth. That's already pretty good, right? And this is without optimization. As recently as a couple of weeks ago, we managed to optimize the network further with SHARP, and we are seeing close to 211 GB/s NCCL AllReduce bandwidth. That is tremendous performance that we are able to extract out of the network.
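To make the benchmark concrete, here is a minimal sketch of an all-reduce bandwidth probe using torch.distributed with the NCCL backend. It is illustrative only, not the exact benchmark run on RSC; the buffer size, iteration counts, and launch command are assumptions.

```python
import time
import torch
import torch.distributed as dist

# Launch with, e.g.: torchrun --nproc_per_node=8 allreduce_bench.py
def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world = dist.get_world_size()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    numel = 1024 * 1024 * 1024 // 4            # a 1 GiB fp32 buffer
    x = torch.ones(numel, dtype=torch.float32, device="cuda")

    for _ in range(5):                         # warm-up
        dist.all_reduce(x)
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters

    size_bytes = numel * 4
    algbw = size_bytes / elapsed / 1e9         # algorithm bandwidth, GB/s
    busbw = algbw * 2 * (world - 1) / world    # NCCL's bus-bandwidth convention
    if rank == 0:
        print(f"algbw={algbw:.1f} GB/s  busbw={busbw:.1f} GB/s")

if __name__ == "__main__":
    main()
```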
Let's move on to the next building block, which is the powerful compute. Even if you have the fast and flat network, if the compute nodes are not good, we are not going to get the five exaflops that we promised, right? So what do we have as the compute nodes? We have 2,000 NVIDIA DGX A100 systems, which was the latest generation available at the time of the cluster build-out. Each server has eight A100 GPUs. Each GPU has 80 GB of memory, totaling 640 GB per server. We also have a front-end Ethernet network at 200 Gbps throughput. The goal was to fit as many of these GPUs as possible, which ties back to the need for a custom cluster.
Now that we have talked about the compute, let's move on to the next building block of the cluster: storage and data. AI training at this scale is nothing if we cannot supply data fast enough to the GPUs, right? With that in mind, let's see why we had to build a custom storage engine.
Early on, we realized that the special properties of training data, and the demands it places on storage systems, meant that we had to create a special-purpose storage service called AIRStore, the AI Research Store. AIRStore improves data loading performance and scalability. What happens in the background is that we have dozens of flash arrays and hundreds of cache nodes. We pre-process the training data and distribute the bundles across the flash arrays.
The cache nodes fetch the data and keep it ready for the GPUs to pull when they need a particular sample. AIRStore is a complex storage service that really requires a dedicated presentation of its own; I'm only highlighting the important aspects. When you orchestrate it this way, you are able to achieve a 10-200x reduction in disk seeks and RPCs. This is very important to keeping the hungry training nodes busy with data.
Let's look at the other aspect of storage, which is NFS. We have 10 petabytes of flash storage mounted and visible to every device in the cluster. This storage is used for intermediate checkpointing, code, logs, and other transient data that jobs produce. Why is this such an important aspect of a job's lifecycle? Jobs can start training, be interrupted, and resume on a different node from a previously taken checkpoint. This makes it easy to handle both failures of a job and stop/suspend/resume of a job at a different point in time, because the checkpoint is available across the cluster.
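Here is a minimal sketch of that checkpoint-and-resume pattern against a shared NFS mount. The path is hypothetical, and real jobs at this scale typically use sharded or distributed checkpoints; this only illustrates why a cluster-wide mount makes resuming on a different node straightforward.

```python
import os
import torch

CKPT_PATH = "/mnt/shared_nfs/my_job/checkpoint.pt"   # hypothetical path on the shared flash/NFS tier

def save_checkpoint(model, optimizer, step):
    # Write to a temp file and rename, so a crash never leaves a half-written checkpoint behind.
    tmp = CKPT_PATH + ".tmp"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, tmp)
    os.replace(tmp, CKPT_PATH)

def maybe_resume(model, optimizer):
    # Every node sees the same path, so a rescheduled job picks up wherever the last one stopped.
    if os.path.exists(CKPT_PATH):
        state = torch.load(CKPT_PATH, map_location="cpu")
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        return state["step"]
    return 0
```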
I want to take the opportunity to re-emphasize our commitment to privacy and security in the cluster. We encrypt data at rest and in flight. Samples are only decrypted when the GPU needs to consume them, and the data has a TTL, beyond which it is deleted and wiped from the cluster.
Now we have the compute, the network, and the storage systems. This makes for a pretty powerful compute infrastructure. How do you offer it to our researchers? Researchers need to be able to consume the resources in an easy manner. For that we built a purpose-built control plane, and we use an open source scheduler called Slurm. This is an HPC scheduler that makes job management fairly easy for researchers as well as for the cluster administrators.
Let me talk through the flow. A user simply submits a batch job, and it enters the job queue based on its priority, the resources available, and the resources required. Slurm picks up the job and places it on a set of nodes. Remember, our cluster is homogeneous, which means that Slurm is able to pick any subset of nodes and place the job. And as jobs finish and slots become available, new jobs can enter and land on those nodes.
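As an illustration of that flow from a researcher's point of view, here is a sketch using submitit, the open source Python library for submitting work to Slurm. The partition name, resource shapes, and training entry point are all hypothetical; it simply shows the submit-and-let-the-scheduler-place-it model described above.

```python
import submitit

def train(config_path: str):
    # Placeholder for the real training entry point.
    print(f"training with {config_path}")

# Folder where submitit writes Slurm logs and job state.
executor = submitit.AutoExecutor(folder="slurm_logs")

# Hypothetical request: 16 nodes x 8 GPUs on a partition named "research", one week max.
executor.update_parameters(
    nodes=16,
    gpus_per_node=8,
    tasks_per_node=8,
    cpus_per_task=12,
    timeout_min=7 * 24 * 60,
    slurm_partition="research",
)

job = executor.submit(train, "configs/llm_65b.yaml")
print(job.job_id)   # Slurm queues the job and places it on any free, homogeneous set of nodes
```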
Now that I have described the building blocks and the highlights of the various systems we built, let's move on to some of the lessons we learned while building and operating the cluster. I'll start with one significant aspect: this entire cluster of 16,000 GPUs was not built in a day, nor as a one-shot exercise. In fact, we did this in two phases, during COVID, and remotely.
More importantly, this phased build-out was done in a non-disruptive manner, where 40% of the capacity was made available a year and a half ago, and we continued to build and expand the cluster. This required some good planning upfront so that we had the ability to extend the cluster without disrupting workloads. Here are a few more lessons that we learned and incorporated into the second phase of the cluster. Number one is failure rates.
The hardware failure rates were higher than we anticipated, and this required us to build better detection and remediation mechanisms to ensure a stable and available cluster and offer a seamless experience to our researchers. Next up, as I mentioned before, our fabric is one of the largest flat InfiniBand fabrics in the world, especially at this scale. This forced us to do pioneering, groundbreaking work to find the performance bottlenecks, tune them, and sustain the performance of the cluster over a long period of time. These lessons were incorporated into how we brought up the rest of the cluster in phase two.
The third aspect: when you have a high-performance compute infrastructure, a lot of projects want to run their jobs on the cluster. How do you offer these resources in a manner that is controllable, that can prioritize, and that can implement the business priorities? We worked closely with Slurm's scheduling and priority primitives to incorporate resource quotas and priorities, so that the right jobs consume resources at the right time.
I want to cover a few more lessons that we picked up during the operational phase of the cluster, across both phase one and phase two. As mentioned before, GPUs fail in multiple ways, with both hard and soft failures. This requires different detection and remediation strategies. For soft failures, you can get away with tooling that can fix the state of the GPU, but sometimes you have to go toward a parts replacement and a longer remediation cycle.
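For a flavor of what a soft-failure sweep can look like, here is a hedged sketch using NVIDIA's NVML Python bindings. The thresholds and the remediation policy are made up for illustration; this is not Meta's internal tooling.

```python
import pynvml   # NVML bindings (pip install nvidia-ml-py)

def scan_gpus():
    # Flag GPUs reporting uncorrected ECC errors, running hot, or not responding to NVML at all.
    pynvml.nvmlInit()
    suspects = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        try:
            ecc = pynvml.nvmlDeviceGetTotalEccErrors(
                handle,
                pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED,
                pynvml.NVML_VOLATILE_ECC,
            )
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            if ecc > 0 or temp > 90:           # hypothetical thresholds
                suspects.append((i, f"ecc={ecc}, temp={temp}C"))
        except pynvml.NVMLError as err:
            # A device that no longer answers NVML queries is a candidate for hard remediation.
            suspects.append((i, f"nvml error: {err}"))
    pynvml.nvmlShutdown()
    return suspects

if __name__ == "__main__":
    for idx, reason in scan_gpus():
        print(f"GPU {idx} flagged: {reason}")
```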
The second part is network stabilization, which at this scale, when you're talking about 48,000 links, is a pretty hard problem. Every component becomes a suspect: the NIC attached to the server, the cables linking nodes to switches, the leaf and spine switches themselves, and the inter-switch links (ISLs). When you have these thousands of fiber cables, a lot of physical factors play a role, including the bend radius and the temperature; any degradation can affect both performance and functionality. Network stabilization is a hard problem, both during the build-out phase and in keeping the cluster stable afterwards.
One other key takeaway I want to emphasize is that in such a large, high-performance setup, when your workloads misbehave or fail, the symptoms are several layers removed from the root cause. This means you have to make very good observability and debuggability infrastructure investments, because bad nodes, bad memory, or faulty cables can all result in a completely opaque symptom experienced by the job.
Now that I've talked about the why and the what of the cluster, I want to hand it back to Scott, who will go over case studies of models trained on RSC. Thanks, Kalyan.
Now I'd like to talk to you about a couple of the projects, and then we'll wrap up. I'd like to start with the LLaMA project we recently released. The research team was trying to tackle a few different problems. One question was: let's say you're a researcher who wants to understand how large language models work, and you're in the general community or at a college, and you can't really run something like GPT because you don't have the scale of facility, the scale of infrastructure, to do that.
So one of the drivers was to release a set of foundation models with much smaller parameter counts that would be available to the research community, to help them better understand LLMs and potentially innovate beyond the foundation models themselves. That was one of the agenda items.
The other goal was to see whether a model trained for a longer period of time, with perhaps more data but smaller parameter counts, could yield results similar to some of the state-of-the-art language models in terms of quality of responses and accuracy. I encourage you to take a look at the research paper, which will give you all of the findings. They had some interesting results.
One of the keys is that they were able to use RSC to do rapid training. They sped up their training significantly compared with prior efforts in other environments, which helped them hit their deadline much faster. In addition, the stability of the cluster during their training epochs allowed them to run without interruption for multiple weeks at a time, which got them to their results quicker as well. So this is an excellent example of a project that was able to leverage the scale.
The largest model, the 65 billion parameter model, was trained on 2,048 GPUs, and they were able to dramatically decrease the amount of time it took to complete that. In addition, there's the No Language Left Behind project, NLLB, which I mentioned a little earlier in the presentation. It was designed to explore whether we could do machine translation for languages that don't have a lot of data on the internet.
And the team has done a wonderful job. They released an excellent research paper, which I encourage you to look at. The Research SuperCluster enabled this team to dramatically speed up their training times from where they were on the previous-generation cluster, which was around a month per epoch. Running on RSC, they were able to drop their training times down to around a week, which allowed them to improve the output and get to their research results in time.
So in conclusion, the Research SuperCluster offers two key ways to scale. One is the ability to scale the state of the art: can we take a single model and go big, with thousands of GPUs for a single model? That is a key aspect of RSC. The second is the ability to scale out and have many projects running. Because of the nature of the network and the architecture, our research community can run multiple simultaneous projects without interruption and without impinging on each other. This means we'll be able to take on and offboard projects quickly and scale our research efforts here at Meta.
In terms of infrastructure innovation, we will continue to look at how we build RSC, how we operationalize it, better engineering practices, how we can improve it, and what we can learn from our efforts that we could apply to any future endeavors. We'll also share our learnings, thoughts, and processes with our peers in production as we drive toward the future. And with that, I thank you all for attending our presentation. Thanks, Scott and Kalyan.
Having enough physical AI capacity is essential to the future of our company, and supporting all our AI workloads at scale requires a very different approach than what we do to scale our regular web services, right? Our new data center design will support the next generation of AI systems. We are building an increased level of flexibility into this design, which allows us to pivot in response to the shifts and changes we see in the AI space. Please welcome engineering director Alan Duong to take us inside the vision for Meta's next-generation, AI-optimized data center design. Take it away, Alan.
AI is the foundation of our discovery engine and our ads business, while also improving our user experience. AI-based ranking, computer vision, and language translation have drastically changed how we build and deploy services across all of our apps. So advancing AI, building it into every one of our products, and enabling many new products and services is critical to our future. And it's still evolving, so it's pretty complex for data centers.
Hi, I'm Alan. People call me AD. I'm a director of engineering for data centers. I've been with Meta for nine years, but I've dedicated my entire career to designing and constructing data centers. There's something about creating an idea, drawing it on paper, conceptualizing it, creating blueprints, and actually seeing it physically constructed and coming to life. And along the way, I get to work with thousands of amazing people in our industry to make it happen. Nothing better than that. So we're going to continue to make AI happen in our data centers. Newer AI technology calls for new hardware, and our new hardware needs a new home. That new home is our next-gen data center design. Our next-gen design will enable AI technology for today and for future generations. But more importantly, we need to plan for roughly 4x scale. So how do we do that? How do we plan for that scale? Scaling is not new for us; we've done it before.
As you can see in this graphic, since 2010 we've scaled our infrastructure by over 10x. It all started as a growth journey: we experienced exponential growth in users and engagement, and we had to rethink our approach to our infrastructure stack, meaning, how do we control our own destiny? We believe that innovation requires a full technical stack approach. What that means is we develop our own products and our own software, and we went on to develop our own hardware and network and build our own data centers. Resilient, portable software sitting on open, modular, disaggregated hardware was the approach that led to a super-efficient data center design.
The image you see on the left is a napkin sketch. This was an example of engineers from the hardware team and the data center team getting together to design a fully integrated power distribution system that led to a world-class, efficient data center design. In 2009, we completed our very first data center building in Prineville, Oregon. That build was 38% more efficient to build and 24% less expensive to run than any of our previous facilities.
In 2014, we quickly surpassed a billion users. This was right around the time I joined the company. Needless to say, we had to scale our infrastructure exponentially, by a factor of 10x; we called that the wall. Our foundation was strong: our design was simple and repeatable, with world-class energy efficiency. Basically, we were asked to just do it again; we were really good at it, just a lot more of it.
In 2018, a slight wrench was thrown into the mix. It was a new design, a new network design specifically to support disaggregated flash, that, to put it simply, broke our first-gen design. So we had to redesign the data center, but with jobs that were in flight in the middle of construction in mind. We had to rethink the design for our existing fleet and figure out how we could reconfigure it for future projects without breaking it all apart.
In the same theme of change, in 2020 we set some new goals, and those goals were to reach net zero emissions across our entire value chain by 2030. So again, more change that drove change into our current design. In 2022, additional change hit us, and that change was to be water positive, which means our company will restore more water than we consume for our global operations. All that change, and all that growth, resulted in 12-plus years of building world-class data centers.
We landed on scale and, more importantly, a homogeneous infrastructure. That uniformity gave us the ability to rapidly deploy and efficiently maintain our global fleet. So here we are today with a different set of requirements but, in my opinion, similar challenges. We're experiencing growth in AI tech, and we need to make sure that our data center can adapt to something that's still evolving. All that change, and we need to scale it by roughly 4x.
For the rest of the talk, I'm going to walk us through how we enable that 4x through innovation and efficiency, while meeting the goals and commitments we've made on sustainability. It all starts with innovation. AI is relatively new and still evolving, as I've said a couple of times now, and that makes innovating in data center design really complex.
I often ask myself: how do we balance our design around what we know today versus how much we should plan for, or how much we should future-proof? For example, over the next four years we may see 1.5 to 2 times growth in power consumption per accelerator, and roughly 1.5 times for high-bandwidth memory. This evolution is all happening as we're planning to break ground on our first next-gen data center this year. And if we're just starting construction on that data center now, by the time we're done, it might be obsolete.
In addition to that, depending on what our services and products need, we could see anything from smaller clusters of 1,000 accelerators to potentially 30,000-plus for much larger jobs. Each of these configurations, as well as the accelerator we utilize, will require a slightly different approach to hardware and network system design. So the data center will need to accommodate all of this.
So here's how we're thinking about it. In data center design innovation, we need to focus on flexibility for long-term compatibility and scale in deployment. It starts with the building design, as well as some power distribution components. We have to enable co-location of server and network hardware.
In the case of AI training, the servers, which are built around accelerators, and the network system operate as one when we scale up or down. So there's a dependency in the co-location of this equipment, and depending on our products and services, that could change. Sharing physical infrastructure for these two types of hardware is going to be really important for us if we're going to plan for flexibility, or fungibility.
This also enables efficiency in our fiber deployment, because we need a significant amount of fiber to interconnect the servers; co-locating them closer together allows us to gain some efficiencies there. And adding this flexibility within the white space itself still enables a homogeneous approach to data center deployment and operations, so it gives us flexibility from that perspective as well.
Next, server type flexibility. These servers are going to require different types of cooling. That means that as we think about our new design, we're developing cooling systems that will support 100% air cooling as well as a large percentage of liquid cooling. This allows us to continue to support traditional services today, like compute and storage.
It also allows us to support the first generation of Meta's AI-enabled hardware, as well as any future hardware. Delivering power infrastructure closer to the server rack will be simpler and more efficient with our new design; we're eliminating as much equipment as possible throughout our power distribution chain.
As you can see from the graphic, we're eliminating the low-voltage switchgear that creates what you might call a bottleneck of capacity. Eliminating that allows the server rack to grow in density in the future with only minor modifications to our infrastructure, and it continues to allow for greater power utilization.
We pride ourselves on world-class power utilization. Today, we utilize roughly 70-plus percent of the power that we deploy. What does that mean? It means we strand less power, and it eventually means we build fewer data centers, which is all good and makes us more efficient.
So, innovation to enable flexibility and scale. That's important, but efficiency in our design has always been core to our business. Ultra-flexibility and future-proofing requirements for both air and water cooling don't do us any favors in power efficiency, in reducing costs, or in deploying data centers faster. So we had to make some tradeoffs as we progressed through the design.
Over the last year we've had to make some tradeoffs, and there are a couple of examples I'll share with you. When you think about liquid-to-chip cooling, it's important for enabling future generations of AI, but deploying too much too early is inefficient. In our design, we've already made a wholesale shift to facility water and created these AI scaling units, as I previously shared. But for efficiency, we're only going to deploy a small percentage of liquid-to-chip cooling on day one, and we'll scale it up as needed. This means more complex upfront rack placement and planning.
We haven't had to do that in the past, so this is a much more complicated process for us, but it allows us to save capital and deploy faster; less equipment means we can build faster, and it limits unnecessary maintenance of equipment that would sit unused. We're also going to continue to lean into software resiliency, plus some hardware buffer, versus relying too heavily on physical resiliency like much of the industry does. This allows us to right-size our physical backup infrastructure, for example by using fewer diesel generators, saving time in deployment; less equipment and less time, and it reduces emissions and continues to make our operations more efficient.
The risk is that we're taking on some unknowns associated with relying on software resiliency for our AI workloads. We're still learning about that as we deploy at scale, and as we learn more, we might adjust our strategy. Another tradeoff is increased power usage to enable water neutrality in liquid cooling. Liquid cooling doesn't come for free: we can't simply open our windows and rely on free air cooling anymore, and we can't keep leveraging evaporation to reject heat, because that will continue to be a challenge as we go into regions that are water-constrained and as we continue to scale out our operations.
This means we'll be using a little bit more power to cool our equipment, but on the flip side we'll reduce our water consumption. So with all these tradeoffs, and there are many more that I don't have time to share, where do we land? We anticipate that our next-gen data center will be 31% more cost effective, and we'll be able to build it two times faster for a complete region, compared to our current-generation data center.
These are all tough tradeoffs that we've had to make, or continue to make, on a case-by-case basis given the constraints we have. This leads me to our continued commitment to sustainability. We've already committed to reaching net zero across our value chain by 2030, we continue to support 100% of our operations with renewable energy, and we've committed to reach water positivity by 2030. To date, we've restored 2.2 million cubic meters of water.
So how does our next-gen design contribute to this? Number one is to just use less material. Less is more; that's the easy button, and that's why we colored it green. Just press the easy button: use less, design smaller. Think about a region that is significantly smaller than what we have today. That means less equipment and less underground infrastructure. Number two is deeper supplier engagement.
That means driving for greater supply chain transparency and developing shared goals and shared emission targets with our suppliers. We've committed to net zero by 2030 across our entire supply chain; that means everything, every component that goes into our data center. And last but not least, switching to low-carbon alternative materials. For example, we see a ton of opportunity in concrete. Prior to our new design, our emissions footprint, measured in metric tons of carbon dioxide equivalent, was projected to grow by roughly 3x by 2030; just follow that dotted line to the right. Just by using less as a first step, our next-gen design is tracking roughly 75% less carbon intensive. So as we continue to progress our design, explore alternative materials, and engage more deeply with our suppliers, we're confident that the data center will do its part in helping us reach our goals.
So in closing: AI tech continues to evolve at a rapid pace. Flexibility in design is key for long-term success, and so is balancing tradeoffs between efficiency and compatibility with a continued commitment to our sustainability goals. Lastly, the journey is only 1% finished. We will continue to innovate and evolve our design and drive for greater efficiency while enabling future generations of AI technology. Thank you.
Thanks, Alan. After almost five years, we finally created the technologies that made it possible to compile any PyTorch model, resulting in a step-function change in PyTorch's approach to execution efficiency. We call it PyTorch 2.0. PyTorch 2.0 delivers significant performance improvements over a wide variety of models, often with a simple one-line change. Engineering manager Peng Wu joins us now to explore two important technologies that underlie PyTorch 2.0: TorchDynamo and TorchInductor. Take it away, Peng.
Welcome. My name is Peng Wu. I support the PyTorch compiler team. On March 15, 2023, we announced PyTorch 2.0, a step-function change to PyTorch performance via a new mode called graph mode. The most remarkable aspect of PyTorch 2.0 is that we are able to offer graph mode without sacrificing the ease-of-use UX that made PyTorch successful in the first place. I want you to bookmark two phrases: graph mode and ease of use.
It was long believed in the industry that machine learning frameworks cannot have both graph mode and ease of use. PyTorch 2.0 challenged that conventional wisdom. So in today's talk, I'm going to tell the story of PT2's unique graph mode, and essentially how we could have our cake and eat it too. But before we jump into 2.0, let's talk about PyTorch 1.0 first.
1.0 was announced about five years ago, and to give a little bit of historic context, at the time the whole industry of machine learning frameworks was mostly embracing and designing around graph mode. It was believed that graph mode allows compiler optimizations and so could potentially provide better performance. But the catch with graph mode is that it requires the developer to think in graphs, and this is really counterintuitive, hard to express, and even harder to debug. So 1.0 made a bold bet at the time: we decided to value ease of use above everything else, including graph mode.
So PyTorch 1.0 boldly chose to embrace non-graph mode, which we call eager mode, with the intention of quickly drawing adoption from researchers. This bet paid off. Two and a half years after the 1.0 release, PyTorch reached 50% adoption, making it the number one machine learning framework used by researchers. And since then, we have still seen healthy year-over-year growth. Today, PyTorch is the de facto training engine for most of the most advanced ML models out there.
If the story of 1.0 is about making a strategic bet, using ease of use to attract research adoption, then 2.0 came into being through pure technical innovation. In PyTorch 2.0, we introduced the torch.compile API as the primary graph-mode API. It's very easy to use: you program as if in eager mode, and you just add this one-line torch.compile call to your code, and then the graph engine kicks in behind the scenes. And this graph engine is able to offer an out-of-the-box performance boost of 30% to 70% on average over a wide range of OSS models.
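Here is a minimal sketch of that one-line change. The toy model and sizes are placeholders; actual speedups vary by model and hardware.

```python
import torch
import torch.nn as nn

# A toy model standing in for any eager-mode PyTorch model.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).cuda()

compiled_model = torch.compile(model)    # the one-line opt-in to graph mode

x = torch.randn(64, 1024, device="cuda")
y = compiled_model(x)                    # first call triggers graph capture and compilation
loss = y.sum()
loss.backward()                          # training works as usual; the backward pass benefits too
```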
So essentially, we did have our cake and eat it too. You may wonder why we make such a fuss about PyTorch graph mode. It's because what made PyTorch so loved, its flexibility and dynamism, is exactly what made it hard to compile. So the moment we figured out how to have graph mode while maintaining the ease-of-use API, we knew a step-function change was happening in PyTorch. And that moment came with TorchDynamo.
TorchDynamo solved a long-standing problem in PyTorch: graph capture. In fact, this was not our first attempt; we had offered several generations of graph capture techniques for PyTorch, all of which required significant manual effort. The solutions ranged from, on one end of the spectrum, capturing correct graphs but requiring human intervention to make the graphs capturable, to, on the other end, always capturing a graph that might not be correct. Dynamo solved both issues.
To give you some intuition, there were two key insights. To make graphs always capturable, we let go of the requirement of always capturing the whole graph. Instead, we capture partial graphs: we stop graph capture if we encounter something Dynamo does not recognize, fall back to eager, and then resume capturing when we reach a region we do recognize. To solve the second problem, capturing a graph that is not correct for the execution, Dynamo introduces guards that are validated at runtime. And of course, if a guard fails, we have the ability to recapture graphs just in time. Ultimately, these three key designs, partial graphs, guarded graphs, and just-in-time recompilation, are what made TorchDynamo both sound and out-of-the-box.
Just to give you an example from the previous code fragment we showed: in this example, there is a deliberately introduced graph break in the if statement. On the right-hand side, we're printing out the graphs; there are actually three graphs, as highlighted in the color bars. This is by design, and it's exactly what makes TorchDynamo operate completely transparently for end users.
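The slide's code isn't reproduced in this transcript, but here is a hedged reconstruction of the kind of fragment being described: a data-dependent if statement forces a graph break, so Dynamo captures the code before the branch, runs the branch in eager, and resumes capture afterwards, with no changes needed from the user.

```python
import torch

def toy_fn(x):
    y = torch.sin(x) + torch.cos(x)   # captured in the first graph
    if y.sum() > 0:                   # depends on tensor data -> deliberate graph break
        y = y * 2                     # captured separately
    else:
        y = y - 1                     # or this branch, depending on the runtime value
    return torch.relu(y)              # capture resumes here

compiled = torch.compile(toy_fn)
out = compiled(torch.randn(8))        # runs correctly despite the break, with no user changes
```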
I just want to give you a glimpse of the magic behind TorchDynamo. On the left-hand side is the normal CPython interpreter; this is what happens when you execute in eager mode. On the right-hand side is the contraption built by TorchDynamo. A few things I want to highlight: number one, the Dynamo solution is built on top of standard Python, using a standard feature called PEP 523. And the second part I want to highlight is that the added boxes transparently handle the things we talked about before, such as graph capture, graph breaks, validation of guards and recapture, compiling code at runtime and executing it, and falling back to eager. A lot of this complexity is handled completely and seamlessly by the TorchDynamo execution engine.
So Dynamo solved the graph capture problem, but keep in mind that capturing graphs does not, by itself, improve performance. This is where TorchInductor comes into play. TorchInductor is a PyTorch-native optimizing compiler, and it's the magic behind the 2.0 performance. It is also one of the few training compilers out there for PyTorch, and by far the one that offers the best out-of-the-box performance and covers the most models.
I don't have enough time to go into the details of Inductor, but if there is only one thing I'm allowed to highlight, it is the unique IR design of TorchInductor. Inductor is designed to handle real models, which meant that from the very beginning we designed the IR to handle the very tricky cases of PyTorch semantics, including the large op surface, mutation semantics, and dynamic shapes. All of these contributed to Inductor being the best-performing training compiler for PyTorch, with the best coverage of PyTorch models as well.
To wrap up, this picture summarizes the journey from 1.0 to 2.0. Five years ago, 1.0 surprised the industry by fully embracing non-graph-mode execution, which we call eager, with the intention of quickly attracting adoption from researchers, and that led us to become the number one machine learning framework. Today, 2.0 surprises the industry again by introducing this special graph mode under the hood without sacrificing the ease-of-use UX that made 1.0 successful. Behind the 2.0 technology are two really cool innovations.
The first is TorchDynamo. This out-of-the-box graph capture opened the pathway from eager mode to graph mode, so that most PyTorch models can seamlessly transition to graph mode without any human effort. Once we opened that pathway, the second technology is the optimizing compiler, TorchInductor. Today, Inductor is the best-performing training compiler for PyTorch; it covers the most PyTorch models and handles all the tricky semantics of PyTorch.
Today, you can already use PyTorch 2.0 as it has been released, and since March we have seen countless user testimonies of 2.0: by adding this simple one-liner, torch.compile, they improve their performance by anywhere from 30% to 2x. We have also seen our partners and vendors embracing the PT2 stack by integrating their backend compilers into it. Going forward, the short-term focus of 2.0 is to continue to improve performance. We believe that the 2.0 release, and the impressive numbers we just showed, are really the starting point of the 2.0 journey in terms of PyTorch entering graph mode; there is still a lot on the table. The second thing we want to improve is interoperability between 2.0 and other core features of PyTorch. There are still some features that we haven't yet had the chance to make work with 2.0, and we don't want users to have to choose between those core features and 2.0. And keep in mind that graph mode opens up a lot of possibilities. So in terms of our longer-term goals, roughly at the one-year mark or beyond, we see two huge avenues we need to invest in.
Number one is a distributed compiler. Today's training workloads are increasingly large, and distributed execution is an indispensable aspect of training as well as inference. With a distributed compiler, we would be able to optimize both compute and communication. The second major feature is PyTorch export, which would speed up the transition from research to production and allow 2.0 to be used in many production use cases, in both training and inference. So if you are excited about the 2.0 story, I invite you to try out torch.compile and give us feedback. And if you are a PyTorch developer or a compiler developer, I invite you to participate in the community and help keep PyTorch the number one machine learning framework in the world. Thank you.
We have traditionally relied on CPU-based servers for running AI workloads, but the increasing compute and memory requirements of these huge AI models have pushed us toward specialized solutions such as GPUs and other specialty hardware accelerators. Please welcome engineering director Roman Levenstein, AI infra research scientist Amin Firoozshahian, software engineer Joel Coburn, and ASIC engineer Olivia Wu to share a first look at MTIA, the Meta Training and Inference Accelerator. MTIA is the very first silicon designed in-house specifically for our internal AI workloads and systems.
Roman, Amin, Joel, and Olivia are going to share details on the design and talk about the challenges and opportunities of developing custom silicon. Take it away, folks. We are very excited today to announce MTIA, Meta's first in-house accelerator for AI. With MTIA, we own the entire system design, from the silicon to the platform to the software stack to the application, and it allows us to customize for our unique recommendation workloads and really control our destiny in providing cutting-edge AI for our users. So let's talk about why this is such an important and exciting step for us.
At Meta, deep learning recommendation models, or DLRMs, are a key part of the company's business. They're at the heart of our family of applications, such as Facebook, Instagram, and WhatsApp. In this graph, we're looking at an important trend we've seen for models serving in production in the data center: significant growth over time in model size, that is, memory footprint, both for the embeddings stored on the device (the yellow line) and for the model as a whole (the blue line), and in complexity, the number of computations required per sample (the pink line). Keeping up with this model growth in AI requires that we deliver ML platform solutions that provide the expected ROI for our business.
I'm Joel Coburn, and I work on AI hardware-software co-design at Meta. This means I work on designing systems across the hardware-software boundary to help deliver platform solutions that will address these model demands. So how do we do this? Traditionally, CPUs were used for serving inference models in production in the data center, but they're not a cost-effective solution to keep up with this growth. Hardware acceleration can address the power and performance issues: it provides a much more efficient way to serve inference requests, and it also provides the compute headroom to scale to future models.
Take a look at this graph showing our server capacity increase over a two-year deployment period. You can see that the initial demand for increased capacity was met with the NNPI accelerator, so we're switching from the CPU, in blue, to NNPI, in pink. But the requirements for inference quickly outpaced NNPI's capabilities, and Meta pivoted to GPUs because they provided greater compute power to meet the growing demand.
But it turns out that while GPUs provide a lot of memory bandwidth and compute throughput, they were not designed with inference in mind. Their efficiency is low for real models, despite significant software optimizations, and this makes them challenging and expensive to deploy in practice. This is why we need MTIA. With our in-house accelerator design, we can directly address the requirements of DLRM workloads and adapt to model trends over time.
Let me give a brief overview of our approach with MTIA and describe what makes it successful. The goal of MTIA is to improve the user experience in Meta applications. That is, we want to provide more accurate and interesting predictions, increased watch time, higher click-through rates, all things that improve the user experience and are driven by better AI capabilities. We do this by providing better developer efficiency and better performance per TCO than existing solutions.
On developer efficiency: this means we can lower the effort to enable new models, write new kernels, and optimize performance, so we can get models into production quickly and with high efficiency. We do this by providing a development ecosystem built on popular and familiar libraries and infrastructure. We integrate with PyTorch for building models, we innovate in the area of DSLs for kernel authoring, and we integrate with emerging technologies like Triton and MLIR.
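As a flavor of the Python-embedded DSL style of kernel authoring being referred to, here is a minimal, generic Triton kernel. This is not an MTIA kernel (open source Triton targets GPUs); it only illustrates how a short, high-level description of an operator can replace hand-written low-level code.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

a = torch.randn(10_000, device="cuda")
b = torch.randn(10_000, device="cuda")
assert torch.allclose(add(a, b), a + b)
```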
For efficiency, performance per TCO, and time to production, we focus on doing the chip and system design with open source components and leveraging vendor partnerships. With this, we can take advantage of the RISC-V ecosystem, leveraging external IP, the open source ISA, and the LLVM compiler. All of this allows us to focus on the really critical part for our business, which is designing the custom ML acceleration logic that makes sparse and dense operations more efficient.
We'll now go into more detail on the MTIA design. Amin will present the architecture, Roman will follow with the software stack, and Olivia will describe trends in the design and the challenges of scaling to future models. And now I welcome Amin to talk about the architecture.
Thank you, Joel, for the great introduction and motivation. At this point in the presentation, I would like to review with you the architecture of the accelerator and the design of the systems used to deploy these accelerators in the data centers. But before we dive into that topic, let's briefly recap what the idea of acceleration means.
We typically run our workloads on the CPUs inside the servers, but CPUs are not equipped to handle high-demand workloads such as AI. Therefore, these workloads are offloaded and run on adjacent systems that are coupled with the CPU, called accelerators. Accelerators either provide a lot more compute power or specialize in performing specific forms of compute, such as processing graphics in GPUs. They are typically tightly coupled with the CPUs in the servers and are controlled and managed by the CPUs.
My name is Amin. I'm a research scientist in the infrastructure organization, and I have been active in the field of computer architecture for 15 years. With that brief overview, let's take a look at the first in-house silicon that Meta has built for its own workloads. In this photo, you can see the silicon die of the MTIA chip, which is fabricated in 7nm technology from TSMC.
It runs at 800 MHz and is about 317 mm². It has a tight power budget of 25 watts, and within that budget it provides 102 TOPS of INT8 computation or 51.2 TFLOPS of FP16 computation. The accelerator uses both on-chip and off-chip memories and can provide up to 800 GB per second of on-chip memory bandwidth, or 176 GB per second of off-chip DRAM bandwidth.
Now that you have seen the die photo, let's take a look at the high-level accelerator architecture shown in this slide. As you can see, the accelerator is organized as an 8x8 grid of processing elements (PEs) that are connected to each other via a mesh network. There are memory resources on the sides of the mesh that are connected to the PEs and can be used by them.
These on-chip memory resources, which total 128 MB, can either be used as addressable memory or be configured as a memory-side cache, in which case they are backed by 16 LPDDR5 channels that provide connectivity to the off-chip DRAM chips. There is a dedicated control subsystem, and a dedicated host interface unit, which you can see on the bottom right, that connects the accelerator to the CPU on the server.
Now let's zoom in and dive into the internals of a PE. This diagram shows the internal organization of a PE. As you can see, the PE is equipped with two processor cores, which are based on the RISC-V open instruction set architecture and are heavily customized to perform their tasks. One of the processor cores is also equipped with the RISC-V vector extension and can handle any form of general-purpose vector compute. On the right-hand side of the diagram, you can see fixed-function units that specialize in performing dedicated forms of compute, such as matrix multiplication, calculation of nonlinear functions, or specialized data movement within the PE and between the PE and external memory.
The PE has 128 KB of local memory that can be used by the processor cores or the fixed-function units. There is a central command processor connecting the processor cores to the fixed-function units; it receives the stream of commands from the processors and distributes and orchestrates their execution on the fixed-function units. On the left-hand side, you can see general-purpose components such as timers, interrupt controllers, and a very elaborate debug subsystem, which are required for the proper functionality of the PEs.
Now, after reviewing the architecture of the accelerator, let's take a look at the design of the systems used to deploy these accelerators. In this slide, you can see a picture of the test board for the MTIA accelerator chip, with the chip sitting right in the middle. It uses a dual M.2 form factor and has a power budget of 35 watts. It is connected to the host using eight lanes of PCIe Gen 4, for a total of 12.8 GB per second of bandwidth to the host. The small form factor and power budget allow us to deploy multiple of these accelerator cards within a given system.
In this slide, you can see the topology of the systems used to deploy the accelerators in the data center. Up to 12 accelerator cards can be housed inside a single system, and they are connected to the host CPU and to each other using a hierarchy of PCIe switches. This particular topology allows the accelerators to talk to the host CPU as well as to each other in a peer-to-peer manner that does not involve or interrupt the host CPU. The parameters of the system, which is based on the Yosemite V3 server specification from the Open Compute Project, are carefully chosen.
The amount of host CPU processing power, host DRAM, storage, network bandwidth, and accelerator compute power are all balanced so that they are optimal for our current and future workloads. When fully populated, the system consumes around 780 watts of power. But I should note that hardware is only half the story. For a successful deployment, you also need a very powerful and flexible software stack that can map the resources of the hardware to the needs of the application. And with that, I would like to turn it over to Roman to talk about our software stack.
Thank you, Amin, for the intro. I'm Roman Levenstein. I've been with Meta for over five years, and I'm leading the development of the MTIA software stack, which I'm going to talk about in my presentation. The MTIA software stack aims to provide developer efficiency and high performance. It is fully integrated with PyTorch to provide a familiar developer experience: using PyTorch with MTIA is as easy as using PyTorch with CPUs or GPUs.
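To make the "as easy as CPUs or GPUs" claim concrete, here is a minimal PyTorch sketch of device-based dispatch. The `mtia` device string and availability check are assumptions for illustration; on a machine without such a backend, the code simply falls back to GPU or CPU.

```python
# A minimal sketch of what "as easy as CPU or GPU" means in PyTorch terms.
# The "mtia" device probe is illustrative; without that backend we fall back.

import torch

def pick_device() -> torch.device:
    # Hypothetical backend probe; torch.cuda.is_available() is the real API
    # used for the GPU case, the MTIA check is illustrative only.
    if getattr(torch, "mtia", None) and torch.mtia.is_available():
        return torch.device("mtia")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
).to(device)

x = torch.randn(32, 64, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape, "on", device)
```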
The MTIA software stack benefits a lot from the flourishing PyTorch developer ecosystem and tooling. On the slide, you can see that the MTIA software stack consists of multiple logical layers. On the top, you can see the application layer, which represents, for example, the serving stack of a recommendation system. It operates on top of PyTorch and is mostly hardware-agnostic, supporting backend targets such as CPUs, GPUs, and MTIA. Below that is the PyTorch layer, which includes compilers and runtime.
Let's talk about compilers first. Compilers are responsible for converting PyTorch models into efficiently executable MTIA code. First, we have the model compiler, which uses the PyTorch FX intermediate representation for model-level graph transformations and optimizations. It's responsible for making sure that the compute and data are distributed across the processing element grid, and that the fixed-function units accelerating the compute are always kept busy. It gradually converts the PyTorch graph into lower-level representations, which are finally converted into LLVM intermediate representation.
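Since the model compiler is said to start from the PyTorch FX intermediate representation, here is a small, generic example of capturing a model as an FX graph and walking its nodes the way a compiler pass would. The "pass" shown, collecting candidate modules to offload, is a toy stand-in, not Meta's actual MTIA lowering.

```python
# Generic PyTorch FX example: capture a graph and walk its nodes.
# The tagging "pass" is a toy stand-in for real backend partitioning.

import torch
import torch.fx as fx

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.fc(x)) + 1.0

gm = fx.symbolic_trace(TinyModel())

# Walk the graph the way a compiler pass would, inspecting each node.
for node in gm.graph.nodes:
    print(node.op, node.target)

# A trivial "transformation": collect the call_module nodes a backend
# might offload to a fixed-function matmul engine.
offload_candidates = [n for n in gm.graph.nodes if n.op == "call_module"]
print("candidate ops to offload:", [str(n.target) for n in offload_candidates])
```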
Next, we have KNYFE, our domain-specific language. This is our own development, and it's responsible for automatic generation of efficient MTIA kernels from short, high-level descriptions of ML operators. The library of ML kernels is mostly developed using this domain-specific language, but some of the most performance-critical operators, like fully connected layers or EmbeddingBag, are developed by human experts using low-level C++ and hardware APIs to make full use of the available hardware resources. At the bottom of the compiler stack, we have the LLVM backend, which is based on the open-source LLVM compiler toolchain with MTIA extensions. It is responsible for the last level of optimizations, such as inlining, register allocation, and emission of RISC-V executable code for the device.
Below that, we have the PyTorch runtime. The PyTorch runtime for MTIA is responsible for multiple things. It provides abstractions such as MTIA tensors, memory allocation, and, most of all, CUDA-like streaming APIs, which are needed for streaming and scheduling operators on the device. It's important to mention that the PyTorch runtime for MTIA supports different modes of model execution, including eager mode and graph mode, which is full-model compilation to maximize performance on the device. It also supports running models partitioned across multiple cards, providing the necessary synchronization and communication channels between them.
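The MTIA runtime's streaming APIs are described as CUDA-like but are not public, so the sketch below shows the analogous pattern with stock PyTorch CUDA streams, which is the programming model the comparison points to.

```python
# Analogy only: stock PyTorch CUDA streams illustrate the "CUDA-like"
# stream-and-schedule model the MTIA runtime is compared to.

import torch

if torch.cuda.is_available():
    s1 = torch.cuda.Stream()
    s2 = torch.cuda.Stream()
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    with torch.cuda.stream(s1):
        c = a @ a            # enqueued on stream 1
    with torch.cuda.stream(s2):
        d = b @ b            # enqueued on stream 2, may overlap with stream 1

    torch.cuda.synchronize() # wait for both streams before using results
    print(c.shape, d.shape)
else:
    print("No GPU available; the point is only to illustrate stream scheduling.")
```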
Below the PyTorch runtime is the host-side device driver, which is responsible for communication between the host and MTIA devices. And finally, at the bottom, we have the firmware running on the MTIA device, which accepts commands from the host-side runtime and driver and manages the execution of models on the device. It's worth mentioning that the MTIA software stack is still evolving, and that we are working on making the compilers and runtime even more powerful by integrating them with the recently released PyTorch 2.0, which was presented earlier today. We are also working on integrating the MTIA software stack with emerging technologies such as TorchDynamo and TorchInductor, and on extending the Triton domain-specific language to support MTIA ML accelerators. We are also looking into using the MLIR intermediate representation for more advanced compiler optimizations.
In the next slide, we are going to look at the performance and efficiency evaluation of MTIA. We have evaluated MTIA against NNPI and GPUs using a set of DLRM models representative of what we run in production. They are shown in the following table, covering low-complexity, medium-complexity, and high-complexity models. The models vary widely in model size, by up to 160 times, and in model complexity, by up to 32 times. MTIA must perform well across this whole range of models.
The pie chart on the right shows a typical breakdown of where the time is spent in a typical DLRM model. We can see that the majority of the time is actually spent on fully connected layers, followed by EmbeddingBag layers, and then trailed by long-tail operations like concat, transpose, quantize, and dequantize, among others. The breakdown also gives us insight into where and how MTIA is more efficient: MTIA can reach up to two times better perf per watt on fully connected layers compared to GPUs.
Now, let's look at MTIA efficiency across the set of models. Just as a note, the MTIA software stack is still evolving. It is a production software stack that must adapt to the latest environment and changes, like the move to PyTorch 2.0, but at the same time it must operate well across a range of models to provide stability, accuracy, performance, and usability. We can see on the slide that MTIA achieves near perf-per-watt parity with GPUs and exceeds the perf per watt of NNPI in all cases. Roofline modeling indicates that there is still much room for improvement.
MTIA achieves impressive gains of up to three times better perf per watt on low-complexity models, and trails behind GPUs on high-complexity models, which is an area we have not yet focused on optimizing in the software stack, but we are looking into it in the upcoming halves. More details about these results can be found in our upcoming paper in the ISCA conference industry track later this year.
Meta has been deploying off-the-shelf CPUs and GPUs in our data centers. In this slide, we have a graph that shows the scaling trend of compute, memory, and network bandwidth for CPUs and GPUs over the past 20 years. In this graph, compute is in orange, memory bandwidth in green, and interconnect bandwidth in blue. As you can observe, compute capability has been scaling at twice the pace of memory and interconnect bandwidth across multiple generations of CPUs and GPUs.
As we scale our systems to support much larger and more complex workloads, this imbalance has manifested itself as a bottleneck in our data centers. You can see in the lower-right graph that some of our workloads spend as much as 58% of their time on networking and data transfer.
By designing the AI silicon in-house, we are finally able to gain control over the full stack, from application to software, system, and silicon. This enables us to close the gap, optimize the full stack for our workloads, and control our own destiny. MTIA is our first ML accelerator developed in-house, and we learned a lot throughout this process.
As we develop our next generation of ML silicon, we will continue to optimize every aspect of our architecture, striving for a balance between compute capability, memory bandwidth, and network bandwidth to achieve optimal performance for our workloads.
One of the key advantages of designing in-house silicon is the ability to co-design the architecture with our software team. As Roman covered earlier, our software stack is fully integrated into our PyTorch ecosystem. With feedback from our co-design team, we are able to introduce new custom instructions and compute primitives for model innovation, create constructs that enable faster operator launch, memory allocation, and easy prototyping, and incorporate features that allow us to future-proof the silicon design and scale with future workloads.
The advancement in AI is going to provide a tremendous opportunity for us to innovate and push the boundaries of technology. Our in-house accelerator will allow us to optimize all the components of the silicon, system, and tooling to improve the cost efficiency of our infrastructure. It enables our software developers to create AI models that provide more relevant content and recommendations, and elevate the user experience to the next level.
Thanks, Amin, Roman, Joel, and Olivia. We will now take a short break. When we come back, we will share another look at our in-house silicon efforts, this time focusing specifically on video processing. See you shortly.
Infrastructure is fundamentally what I call the engine behind the growth of this company. We have been building data centers since 2010. We are talking about serving maybe half of humanity. Now, when we are talking about AI, you need a similar engine behind AI so that AI can achieve the potential that we are dreaming of.
AI workloads are growing at a pace of 1,000X every two years. I mean, just contemplate what that means for our systems, our silicon, our software stack. We're in the middle of a pivot to the next age of information. The models themselves are becoming hundreds or thousands of times larger and more complex, and what this is going to require is infrastructure at the exaflop level.
We are creating new hardware, including our own silicon. We're building out new kinds of network architectures. We're reimagining the software stack, like PyTorch. Thousands of engineers are innovating on this large-scale infrastructure that's built specifically for AI. The Meta Training and Inference Accelerator, MTIA, is Meta's first in-house silicon. It was designed with recommendation models in mind, and it is a piece that fits with the rest of the system, like the software ecosystem for writing applications and deploying ML models. And it's built in-house. By having it in-house, we are able to optimize every single nanometer of the chip, so we don't have any part of the architecture that is wasted, and that helps bring down the power.
The fundamental target in designing MTIA is to provide the highest performance at the lowest power, and in the process we achieved twice the efficiency compared to today's GPUs.
Another silicon product built by Meta is MSVP, the Meta Scalable Video Processor. People spend more and more time producing and sharing videos, which means more and more pixels will hit our data centers. MSVP processes these videos nine times faster than traditional software encoders, maintaining video quality on par with the software encoders at half the energy. MSVP can also be the final engine for generative AI: the content that people create eventually needs to be encoded, because it can never traverse the internet in its raw format. All of these requirements put together necessitated the design and manufacturing of MSVP.
So our new silicon hardware is going to need a new home. We're working on the next-generation data center design. We're moving to AI machines that can leverage GPUs or the custom silicon we're developing ourselves, and that's going to require a denser data center design. We're going to be leveraging higher-density racks. The servers themselves will be liquid-cooled to the chip. And with flexibility in mind, we're going to co-locate servers and network together to enable future generations of AI.
We believe our Research SuperCluster is one of the fastest AI supercomputers in the world. It is one of the unique places where you can run truly large-scale jobs. We have 16,000 GPUs interconnected with 1.6 terabits per second of InfiniBand network, producing approximately 5 exaflops of compute power. We also have almost half an exabyte of storage backing the compute and network.
The advancements in AI are coming at the right time, so RSC can be put to use to take advantage of the infrastructure and the data, and then move fast. We design, create, run, and operate all of our infrastructure. So it's a sweet spot of being able to work at scale, work at the cutting edge, and also do work that literally reaches billions.
Infrastructure at scale is what our long-term research requires, and innovation without it is impossible. I feel like I'm in the middle of a revolution. It's just an incredibly exciting time to be at Meta.
We also have a lot of work to do.
Welcome back. Our next presenters will introduce you to MSVP, Meta's Scalable Video Processor. It's the first-generation, server-grade video processing hardware accelerator of its kind that we have developed here at Meta. Technical lead manager Harikrishna and Video Infra research scientist Yiannis Katsavounidis will describe why we needed to build it, the architecture behind it, and some of the novel algorithms we developed to achieve high-quality video transcoding. They will also describe how the hardware accelerators are used in Meta's data centers to support processing and transcoding billions of videos every single day, and to provide premium video quality to end users, all while saving us power. Take it away, guys.
Hello and welcome. I'll be talking to you about how we process videos at Meta, and especially how we do that while maintaining the best quality and being very energy efficient, using MSVP, Meta's Scalable Video Processor. My name is Yiannis Katsavounidis, and I'm part of Video Infra.
Everybody is familiar with Meta's family of apps, Facebook, Messenger, Instagram, and WhatsApp, and our hardware products such as the Oculus Quest. There are more than 3.7 billion monthly active users and more than 2.9 billion daily active users. But you probably didn't know that video overall makes up more than 50% of the time spent on Facebook. Video is king, and that shows up in many of our products, such as our short-form video product, Reels, premium music videos, the unique social experience of watching together, and of course live video.
What makes video processing at Meta unique is the wide variety of content. We have everything, including video on demand, live and real-time processing, and that includes both user-generated and professional content. Here is how and why we process videos at Meta.
Everything starts with a video on your mobile phone that first gets uploaded to our data centers. There it gets transcoded into different formats and different resolutions. For example, one may need to deliver it to a mobile phone at 700 kilobits per second, to a tablet connected over Wi-Fi at 2 megabits per second, or to the browser on your computer at 20 megabits per second.
There are four basic processing steps when transcoding videos. After the video is uploaded, the first step is to decode it into frames, or pixels; then we resize it into smaller resolutions; the next step is to encode it into a more sophisticated codec, such as AV1; and last but not least, we calculate the quality of that transcode using standard quality metrics.
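As a point of reference for those four steps, here is what a purely software pipeline looks like when driven from Python with the ffmpeg CLI. This is a generic illustration, not MSVP's firmware; the filenames, target heights, and bitrates are placeholders echoing the phone/tablet/browser example above.

```python
# Decode, resize, re-encode, and measure quality with the ffmpeg CLI.
# Generic software pipeline for illustration; paths and bitrates are placeholders.

import subprocess

def transcode(src: str, dst: str, height: int, bitrate: str) -> None:
    # Decode, resize to the target height (keeping aspect ratio), re-encode.
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-b:v", bitrate,
        dst,
    ], check=True)

def measure_ssim(distorted: str, reference: str) -> None:
    # Full-reference metric; scale the distorted rendition back to the
    # reference's resolution first (SSIM requires matching dimensions).
    subprocess.run([
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", "[0:v][1:v]scale2ref[d][r];[d][r]ssim",
        "-f", "null", "-",
    ], check=True)

# One upload, several delivery renditions (phone, tablet, desktop browser).
for height, bitrate in [(360, "700k"), (720, "2M"), (1080, "20M")]:
    transcode("upload.mp4", f"out_{height}p.mp4", height, bitrate)
    measure_ssim(f"out_{height}p.mp4", "upload.mp4")
```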
Now, there is a two-way trade-off that everybody is familiar with, and that's the trade-off between quality and bits spent. But at Meta we have a third component, and that's the amount of compute we spend to do all this processing. That trade-off means you cannot improve all three at the same time. For example, in order to keep quality constant while spending less compute, you pay for it by using more bits.
Now it's time for my friend Hari to explain how we do it using MSVP. Thank you, Yiannis, for the great introduction. Hello, I'm Harikrishna. Before we go through the MSVP architecture, let's look at the motivation behind building it. Meta's billion-scale video needs require an energy-efficient and low-latency video transcoding solution. We mostly process pre-encoded videos, which means the video quality is already degraded from the source, so we need an encoder that is on par with best-in-class software encoders to preserve the video quality. These stringent requirements led us to build MSVP.
As Yiannis mentioned, these are the key components of the transcoder pipeline. Every uploaded video first needs to be decoded to produce the original pixels. We support the H.264, H.265, VP9, and AV1 codec formats. These pixels are then sent through overlay composition, cropping, and rotation as required, before being resized to produce the various resolutions we need for encoding. The resized frames are then encoded into H.264 and VP9 formats. We also have a quality metric module to compute the similarity metrics for every encoded video.
In this pipeline, the pixels are mostly exchanged between these modules directly. If not, we also have a large on-chip cache to exchange pixels that need a slightly longer lifetime. And for those pixels that need much more than a frame's worth of lifetime, we send them through off-chip memory; before we do, we also compress the pixels to save energy. This pipeline is programmable to support various quality presets through the firmware running on the RISC-V controller.
Apart from this, we also have a JPEG image transcoder in this pipeline. The pipeline can be programmed either to operate as single-pass encoding, for applications that need very low latency, or as multi-pass encoding to produce high-quality videos. At its peak, this pipeline can support single-input, multiple-output transcoding at 1 billion pixels per second in less than 10 watts. Compared to software encoding, this is 9 times faster at half the energy. The pipeline can also support transcoding from 4K down to QCIF, with the supported frame rate varying with the video resolution.
Within the pre-processor, apart from overlay composition, the scaler is the key component. In our scaler, we use 25-tap 2D filters. These offer very high-precision filtering and far superior quality compared to conventional scalers used in the industry. We also need to support arbitrary frame sizes in our use cases, and we support scaling from 4K down to QCIF in a single step.
Within the transcoding pipeline, the encoder is the most compute-intensive module, and the architecture choices we make largely dictate the video quality of all the videos coming out of the transcoder. We use a three-stage motion search that is programmable and supports a very wide search range: plus or minus 512 pixels in the horizontal direction and plus or minus 160 pixels in the vertical direction, across multiple reference frames. We also support near-exhaustive mode decision using rate-distortion optimization in every decision. Rate-distortion optimization, or RDO, is one of the best-known practices in video compression for determining the optimal mode decision. The distortion calculation itself is very compute-intensive but parallelizable; the rate estimation, however, is very serial in nature. We use a novel rate estimation model in MSVP that allows us to use multiple of these RDO engines in parallel to get the speed we need. We also use many smart quantization techniques and other proprietary algorithms in our video pipe. And finally, we use three of these encoder pipes to process three consecutive macroblock or super-block rows in a wavefront-parallel manner.
Here we show the quality metric module. The way we implement and use quality metrics in our video traffic is very unique to MSVP. We support SSIM, multi-scale SSIM, VIF, PSNR, and no-reference metrics like blur in our quality metric module. The quality metric module is also quite compute-intensive: in a typical case, for every uploaded video we need to produce five different encoding resolutions, and for each of those encoding resolutions we need to compute similarity metrics at five different viewport resolutions, which means about 25 quality metric computations in total for every uploaded video.
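The bookkeeping behind that 5 x 5 grid can be sketched in a few lines. The example below uses scikit-image's SSIM on single frames, with placeholder resolutions and synthetic data, whereas the real module runs these metrics in hardware over entire videos.

```python
# Sketch of the 5 encodings x 5 viewports bookkeeping described above.
# Resolutions are placeholders; a real pipeline runs per frame over whole videos.

import numpy as np
from skimage.metrics import structural_similarity
from skimage.transform import resize

ENCODING_RESOLUTIONS = [(1080, 1920), (720, 1280), (480, 854), (360, 640), (240, 426)]
VIEWPORT_RESOLUTIONS = [(1080, 1920), (720, 1280), (480, 854), (360, 640), (240, 426)]

def frame_at(resolution, frame):
    h, w = resolution
    return resize(frame, (h, w), anti_aliasing=True)

reference = np.random.rand(1080, 1920)          # stand-in for a decoded source frame
scores = {}
for enc_res in ENCODING_RESOLUTIONS:
    encoded = frame_at(enc_res, reference)      # stand-in for a decoded rendition
    for view_res in VIEWPORT_RESOLUTIONS:
        ref_v = frame_at(view_res, reference)
        enc_v = frame_at(view_res, encoded)
        scores[(enc_res, view_res)] = structural_similarity(ref_v, enc_v, data_range=1.0)

print(len(scores), "quality computations")      # 5 x 5 = 25, as in the talk
```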
So much goodness here. So now it's my turn to show you how we use it in our data centers. The wide variety of videos and the distribution of popularity dictate different treatment for different videos. Videos with fewer views get the so-called basic treatment, while the more popular videos get our advanced family of encodings. This is all enabled by MSVP. In this example, you see how, by doing multiple encodings at different resolutions, we can find the optimal settings that give us the best quality at a given bitrate. But the best part is that after you do all these multiple encodings using MSVP, you can take the corresponding settings and do another pass using a software encoder to get even better quality. In this example, you see how the convex hull, which is the set of best quality settings, can translate from AVC to AV1, giving us an additional 65% bitrate savings. So in summary, we use MSVP because we have a wide variety of content, both premium and user-generated, VOD, live, and real time. We use those best practices, and that gives us the best end-to-end quality. MSVP allows us to do everything from basic encoding with the lowest amount of latency all the way to the advanced encodings that push quality to the max.
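The convex-hull selection can be illustrated with a small helper that keeps only the encodings on the upper convex hull of (bitrate, quality) points. The sample points below are made up; a production ladder would use real measured quality scores.

```python
# Keep only the (bitrate, quality) points on the upper convex hull,
# i.e. the best quality achievable at each bitrate. Sample data is synthetic.

def upper_convex_hull(points):
    """points: list of (bitrate_kbps, quality). Returns the hull, left to right."""
    pts = sorted(points)
    hull = []
    for p in pts:
        # Pop points that would break concavity of the upper hull (left turns).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            x3, y3 = p
            cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

trial_encodings = [
    (400, 78.0), (700, 84.5), (700, 81.0),   # e.g. 360p at two settings
    (1500, 89.0), (1500, 86.0),              # e.g. 720p
    (4000, 93.5), (4000, 91.0),              # e.g. 1080p
]
print(upper_convex_hull(trial_encodings))
# -> [(400, 78.0), (700, 84.5), (1500, 89.0), (4000, 93.5)]
```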
Finally, here we show how all the transcode IPs we discussed earlier are packaged with the PCIe controller to connect to the host. We also have a memory controller to connect to the off-chip memory, where we store all the intermediate pixels, a secure boot processor to authenticate the firmware running on this ASIC, and many peripherals to help us with debug and diagnostics. On the right, we show the die shot. This is a 100 mm² chip in 7 nm, and the encoder takes more than 50% of the area. Finally, this SoC, along with LPDDR modules, is packaged onto an M.2 connector. We have two of these connectors, with heat sinks, going into a GPv3; the GPv3 is coupled with a host server into a sled, two of these sleds go into a cubby, and several of these cubbies go into the data center. I'll now hand it off to Yiannis, who will talk about how MSVP is used in our data centers. Thank you very much.
It also gives us many other potential opportunities, such as using it in live and broadcast scenarios, where even lower latency matters a lot, and for advanced pre-processing, both to improve quality and for video understanding. Our north star is to offload the majority of our stable and mature video processing, and to use software only as a thin layer to further boost quality wherever that matters.
So are we done yet? Well, we'd like to do many more of these by adding more video codecs, such as AV1. We're going to support more and better video quality metrics. We're going to further improve video and audio quality by doing pre- and post-processing, such as denoising, de-blocking, and artifact removal. We're going to push even more pixels through the same die, and we'd like to stay focused on Meta's most important use cases, such as reducing the compute and storage footprint, improving quality for our short-form videos, and enabling immersive 360, XR, and VR videos and other metaverse content. Thank you, and we welcome your collaboration.
Thanks, Hari and Yiannis. With thousands of developers across Meta working in multiple programming languages and writing billions of lines of code, software development is truly the core of our business. As we look to help our developers continue to be as productive as they possibly can be, we developed CodeCompose, a generative AI coding assistant. CodeCompose suggests code that a developer is likely to type next. It's intended to be quick and unobtrusive, and it helps accelerate the process of code authoring. Meta software engineer Michael Bolin joins us now with an inside look at CodeCompose, and at the longer-term vision to help developers across the whole SDLC, the software development lifecycle.
Take it away, Michael. Hi, my name is Michael Bolin, and I'm a software engineer at Meta. Today, I'm going to show you how we leverage generative AI for code authoring using an in-house tool we have developed called CodeCompose. CodeCompose is a code completion service that suggests code as you type in an editor such as VS Code. The underlying model is built on top of public research from FAIR that we have tuned for our internal use cases and code bases. On the product side, we are able to integrate CodeCompose into any surface where our developers or data scientists work with code.
As we will show, by taking ownership of the product from end to end, we've been able to build a compelling code completion service for developers at Meta, deployed at scale, which contributes to a meaningful portion of code authored at the company.
Today, we'll start by examining the generative AI model that powers CodeCompose. Next, we'll provide a brief look at the product architecture, followed by a demo where you can see it in action. Finally, we'll discuss the impact that CodeCompose has had at Meta.
Let's start by digging into the model. About a year ago, FAIR released InCoder, a generative AI model trained on code. This model comes in two variants: a smaller model with 1.3 billion parameters and a larger model with 6.7 billion parameters. For the CodeCompose service, we use both variants of InCoder, which, based on context, lets us optimize the trade-off between quality and latency. A key difference between InCoder and other large language models currently used for code generation is its ability to perform infilling. Infilling lets the model suggest text within existing text, which is an action that is important for code authoring and editing. While the public version of InCoder has been an invaluable starting point for CodeCompose, we have made a number of notable improvements along the way.
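For readers who want to try the public model, here is a minimal infilling sketch against the InCoder checkpoint on Hugging Face. The sentinel-token layout (`<|mask:0|>`, `<|endofmask|>`) follows the public release's documented usage and should be treated as an assumption to verify against the model card; the fine-tuned model CodeCompose uses internally is not the same.

```python
# Minimal infilling sketch with the public InCoder checkpoint.
# Sentinel-token layout is assumed from the public release; verify on the model card.

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "facebook/incoder-1B"   # the smaller of the two public variants

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

before = "def read_config(path):\n    "
after = "\n    return config\n"

# Left-to-right models only condition on `before`; infilling also sees `after`.
prompt = before + "<|mask:0|>" + after + "<|mask:0|>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)

# Decode only the newly generated tokens and cut at the end-of-mask sentinel.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
infill = tokenizer.decode(new_tokens, skip_special_tokens=False)
infill = infill.split("<|endofmask|>")[0]

print(before + infill + after)
```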
For starters, we fine-tune the model on first-party code, exposing it to our internal libraries and frameworks so CodeCompose can incorporate them into its code suggestions. In particular, at Meta we are heavy users of Hack and Flow, which are programming languages that were not well represented when the original InCoder model was trained. Fine-tuning on our first-party code helps close that gap.
Further, in curating the code used for fine-tuning, we exclude files that contain patterns we want to discourage, such as deprecated APIs like React.createClass in Flow, or code containing errors suppressed via an HH_FIXME annotation in Hack. We also supplement our training data with code that does not live in source control, such as Jupyter notebooks. The net result of these investments in training data has been impressive.
To assess the impact of fine-tuning, we ran an offline evaluation of different versions of the model to measure its ability to infill an exact match on first-party code across various languages. For the experiment, we prepared three versions of the model. As a baseline, we took the original 1.3 billion parameter InCoder model and used it to infill a random sampling of masked snippets of first-party Python, Hack, and Flow code. As expected, the model performed best on Python, reproducing the original code over 20% of the time. Surprisingly, the model was also able to yield an exact match over 10% of the time for Hack and Flow, despite those languages not being well represented in the original training data.
Next, we took the original 1.3 billion parameter InCoder model, fine-tuned it exclusively on first-party Python code, and re-ran the infilling analysis. As expected, this improved the exact match rate for Python in our experiment, jumping from 23% to 36%. An unexpected result was that fine-tuning on Python also improved the scores for Hack and Flow, illustrating the potential benefits of transfer learning in large language models.
Finally, we did an additional round of fine-tuning that included first-party Hack and Flow code and re-ran the analysis once more. As expected, this further improved the model's ability to infill exact matches for Hack and Flow, though the score for Python saw a small decrease of less than 1%. As you can see, being able to fine-tune on first-party code is a significant advantage of building our own code completion service.
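The exact-match evaluation itself is simple to sketch: mask a span, ask the model to infill it, and count how often the infill reproduces the original span. The whitespace normalization and the placeholder predict() callable below are illustrative choices, not the exact evaluation harness.

```python
# Sketch of an exact-match infilling evaluation. The predict() callable is a
# placeholder for a real model call; stripping whitespace is an assumed choice.

def exact_match_rate(samples, predict):
    """samples: iterable of (before, target, after) triples."""
    hits = 0
    total = 0
    for before, target, after in samples:
        prediction = predict(before, after)
        hits += int(prediction.strip() == target.strip())
        total += 1
    return hits / total if total else 0.0

# Toy usage with a dummy "model" that always suggests `return None`.
samples = [
    ("def f(x):\n    ", "return x + 1", "\n"),
    ("def g():\n    ", "return None", "\n"),
]
print(exact_match_rate(samples, lambda before, after: "return None"))  # 0.5
```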
In addition to experimenting with the training data, we also made some changes to the model architecture itself. In our current CodeCompose integration in VS Code, we only request completions when the user's cursor is in an empty block or when it is at the end of a line after a hard-coded set of trigger characters, such as a period or parenthesis.
In practice, this means we provide completions not at arbitrary offsets within a document, but only in a limited set of circumstances. However, the original InCoder model was trained using its own causal masking objective, in which source code is tokenized using byte-pair encoding before the region to be masked is selected. Because mask boundaries were not guaranteed to align with trigger character boundaries in the original source code, we discovered that this led to a mismatch between training and inference that resulted in poor mid-word predictions.
To address this, we refined the training objective to something we call language causal masking, in which the code is partitioned on trigger character boundaries and the mask is selected from the resulting segments. Only after this is done do we tokenize the three segments individually, the code before, the code after, and the target code that was masked. This resulted in gains of up to 61% in offline evaluation of exact match.
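A rough sketch of the data-preparation idea behind language causal masking is shown below: partition source code on trigger-character boundaries, pick one resulting segment as the masked target, and keep the (before, target, after) triple for tokenization. The trigger set and the random selection policy are illustrative assumptions, not the exact CodeCompose recipe.

```python
# Sketch of building a "language causal masking" training example.
# The trigger-character set and selection policy are illustrative assumptions.

import random
import re

TRIGGER_CHARS = ".(,\n"   # e.g. period, parenthesis, plus comma/newline (assumed)

def split_on_triggers(code: str):
    # Keep the trigger character attached to the segment it ends.
    pattern = "([" + re.escape(TRIGGER_CHARS) + "])"
    pieces = re.split(pattern, code)
    segments, buf = [], ""
    for piece in pieces:
        buf += piece
        if piece and piece in TRIGGER_CHARS:
            segments.append(buf)
            buf = ""
    if buf:
        segments.append(buf)
    return segments

def make_training_example(code: str, rng: random.Random):
    segments = split_on_triggers(code)
    i = rng.randrange(len(segments))
    before = "".join(segments[:i])
    target = segments[i]
    after = "".join(segments[i + 1:])
    return before, target, after   # tokenized separately downstream

rng = random.Random(0)
code = "import os\nprint(os.getcwd())\n"
print(make_training_example(code, rng))
```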
So now we've talked a lot about the model, but let's see how we can leverage it to provide code suggestions to end users. At Meta, we have a tier of machines equipped with powerful GPUs, each with sufficient memory to host both the 1.3 billion and 6.7 billion parameter variants of the CodeCompose model. Clients can make requests to this tier via Thrift.
The caller specifies the code before and after the cursor, as well as the file path, language, and which model to use to process the request. The caller also decides whether to request a single line or multi-line code completion. In practice, clients request single line suggestions from the smaller model and request multi-line suggestions from the larger model.
To mediate requests between the client and server, we implemented a language server in Rust that we reuse across our various editor integrations. For editors such as VS Code that support LSP natively, we require relatively little glue code to create the CodeCompose extension. For editors such as Android Studio that do not have native LSP support, we built a small adapter to proxy requests to our LSP.
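The routing rule described above, single-line requests to the smaller model and multi-line requests to the larger one, can be sketched as follows. The field names and model identifiers are hypothetical; the real service exposes this over Thrift behind the language server rather than as local Python.

```python
# Hypothetical sketch of the request shape and model-routing rule described
# in the talk; names and identifiers are placeholders.

from dataclasses import dataclass

@dataclass
class CompletionRequest:
    code_before_cursor: str
    code_after_cursor: str
    file_path: str
    language: str
    multiline: bool

def choose_model(req: CompletionRequest) -> str:
    # Smaller model keeps latency low for inline, single-line suggestions;
    # the larger model is reserved for whole-block, multi-line suggestions.
    return "incoder-6.7b-finetuned" if req.multiline else "incoder-1.3b-finetuned"

req = CompletionRequest(
    code_before_cursor="def load_user(user_id):\n    ",
    code_after_cursor="",
    file_path="www/users.py",
    language="python",
    multiline=True,
)
print(choose_model(req))   # -> the larger model for a block completion
```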
Further, this architecture makes it straightforward to integrate CodeCompose with our in-house developer tools, such as Bento, a web-based Jupyter notebook UI created internally at Meta. Because Bento also supports LSP, it was easy to provide our data scientists with the same AI-powered code suggestions as developers working in VS Code. This ability to plug CodeCompose into any code editing surface internally is an advantage of owning the entire stack. Finally, in addition to reducing the amount of integration work required to support a new surface, making the LSP responsible for the bulk of the client-side logic ensures that metrics are recorded consistently across these surfaces. Implementing fine-grained telemetry to accurately capture the impact of CodeCompose is important for improving both the model and the product. This tight feedback loop has helped us make fast progress in this space.
We've talked a lot about CodeCompose, but now it's time to see it in action. We'll start in VS Code. CodeCompose makes suggestions as the user types, which can be accepted by pressing Tab. It shows single-line suggestions to complete a line, or multi-line ones to fill in an entire block of code. CodeCompose can take advantage of the surrounding code to provide better suggestions. Here we see that adding an import statement and refining the function signature helps CodeCompose produce a more specific implementation of the function. CodeCompose also uses prose comments as a signal when generating code: as shown here, updating the docstring to mention the use of ps causes CodeCompose to suggest a new implementation that does just that, and a comment specifying the use of ps aux refines things further. Because CodeCompose takes the code before and after the cursor into account, it can suggest things like annotations and import statements.
Now let's look at CodeCompose in Bento. Just like in VS Code, you can use comments to help CodeCompose generate suggestions. Given the interactive nature of notebooks, this results in a tight feedback loop for exploring data. As you can see, CodeCompose considers the code from the surrounding cells, allowing each step to build upon the previous steps.
Now that you've seen CodeCompose in the wild, let's see what sort of impact it has had at Meta. Anecdotally, we have received a lot of positive feedback about how CodeCompose has helped people write code better and faster. But let's look at some numbers as well. Python was the first language supported by CodeCompose, and usage has grown steadily as we continue to improve the service. Today, thousands of employees at Meta accept suggestions from CodeCompose every week. For suggestions visible to the user for at least 750 milliseconds, our acceptance rate is over 20% and climbing.
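That acceptance-rate figure can be reproduced from telemetry with very simple bookkeeping: count only suggestions that stayed visible for at least 750 milliseconds as shown, then divide accepted by shown. The event format below is hypothetical.

```python
# Sketch of the acceptance-rate bookkeeping; the telemetry event shape is made up.

MIN_VISIBLE_MS = 750

def acceptance_rate(events):
    """events: iterable of dicts like {"visible_ms": 900, "accepted": True}."""
    shown = [e for e in events if e["visible_ms"] >= MIN_VISIBLE_MS]
    if not shown:
        return 0.0
    accepted = sum(1 for e in shown if e["accepted"])
    return accepted / len(shown)

events = [
    {"visible_ms": 120, "accepted": False},   # flashed by too quickly; excluded
    {"visible_ms": 900, "accepted": True},
    {"visible_ms": 1500, "accepted": False},
    {"visible_ms": 800, "accepted": True},
    {"visible_ms": 760, "accepted": False},
]
print(f"{acceptance_rate(events):.0%}")       # 50% in this toy sample
```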
We are extremely excited about our progress on CodeCompose, and we believe that our developers are best served by bringing this work in-house. First, as we have shown today, having the flexibility to experiment with the model has made it possible to support languages like Hack and Flow, avoid undesirable code patterns, and perform all sorts of analyses and explorations to improve the service that would not have been possible if the model were a black box. Second, because we control the product from end to end, we can integrate CodeCompose anywhere across our vast array of internal tools, and our LSP architecture makes it economical to do so. Third, privacy and security are fundamental to all the work we do here at Meta. Building CodeCompose in-house ensures we do not share our code with third parties and that the telemetry we collect is subject to access controls. And there you have it.
We covered a lot today, but if you still want to learn more, we are publishing a blog post and a paper with more details about CodeCompose. That said, we are still early in our work in this space, so we look forward to sharing more about our progress going forward. Thanks for listening, and I hope you enjoyed it.
I also took on the role of head of product for our newly formed generative AI team. Today I'm going to dive into some of our AI infrastructure projects with this great group of people. I'll kick it off with some intros. Rachel, first.
Great, thanks, Irene. Hi, I'm Rachel Peterson, and I lead the data center strategy team, which looks at our data center capacity needs of today but also ensures that we're prepared for the future. Since 2010, Meta has been designing, building, and operating its own data centers, and we've always done this with a mindset of innovation, efficiency, and reliability. Over at least the last 18 or so months, our data center organization has been essentially focused on AI, the future of AI, and preparing us for it. So while our current data center designs have been supporting all of our AI workloads, we're really looking toward the future, and our teams have been focused on designing and deploying a next-generation data center that can support not only our current needs around AI and our current products and services, but also our needs of the future.
I'm going to pass it off to Alexis next. My name is Alexis Björlin, and I support our AI systems and accelerated platforms teams here at Meta. Our teams are responsible for designing, delivering, and supporting our compute and storage platforms at scale, in addition to building and delivering our compute, storage, and networking hardware all the way down to the silicon itself. One of the biggest missions and visions of our team is ultimately to deliver software-defined hardware: taking into account the needs of our evolving workloads, developer efficiency, and our entire software stack, so that we can deliver hardware that is optimized for the needs of our users. We're in a unique position here at Meta to do that because we've got the end-to-end ecosystem in-house. Kim, I know we work together a bunch.

Absolutely, yeah. Thanks so much, Irene. Hi, everybody. My name's Kim Hazelwood. Broadly, I lead the organization that provides the AI research infrastructure solutions for the cutting-edge AI research happening within our research organizations. We develop things like the AI Research SuperCluster, as well as the entire software stack that is used by the researchers in our organization. We're also doing some cutting-edge systems research, right at the intersection between systems and machine learning, and all of this is being used to unlock some of the latest breakthroughs in AI. And next, my partner in crime on the AI team?

Hi, my name is Aparna Ramani, and I am responsible for data, AI, and developer infrastructure. Collectively, my teams are responsible for all the components across the data and software lifecycles. We start with the teams that build the systems for prepping data and having it be ready for training. We have teams that build the training and inference stacks. And we also have a team of specialists, just incredible engineers, working on all of the developer experiences, everything from enabling machine learning engineers to software engineers who build the applications that serve the three billion users that we have. It's an exciting time to be in AI, both for us at Meta, because we have this incredibly ambitious roadmap ahead of us for AI, and also just looking at the transformative power of AI across our lives on a daily basis.
So, AI: it's a great time to be here. Many of you mentioned this transformation, the impact the evolution of AI has had. How has it impacted Meta's infrastructure as you see it today? Also, as the company evolves, we rely on infrastructure more and more, from our family of apps to the metaverse to the devices we produce. How are you able to adjust your infrastructure strategy as you do that? Rachel, do you want to kick us off?
Our current generation of data center designs is world-class in energy and power use efficiency. It has actually supported us through multiple generations of server, storage, and network hardware, and it serves our current AI workloads really well. But as we look toward the future, it's always been about planning for the future of AI hardware and systems, and how we can have the most performant systems in our fleet.
And to do this, we really had to redesign and rethink our data centers, to support the future all the way from how we look at the building design, to power and cooling, to how the network fits into the data center. For example, as we look toward the future, we see a future in which AI chips are expected to consume more than 5X the power of our typical CPU servers.
This has really caused us to rethink the cooling of the data center and to provide liquid cooling to the chips in order to support this level of power. And for that we had, of course, to redesign the system. But I'm really excited about what the future holds in our new data center designs. And as we do these things, we're always looking forward and adjusting, which comes right back to that question of how we adjust for the change before the change happens.
And Alexis, I know you've done a bunch of that with the team you lead. Yeah, absolutely. Building off of everything Rachel just shared, I think it should be very clear that we're designing end-to-end systems that start from the physical side, from the data center through the network that connects our global fleet, all the way into the data center to our AI training clusters and our distributed inference serving platforms.
Meta has a tremendous amount of experience in delivering distributed compute at scale. One of the pivots we had to make when we started building out our AI infrastructure a number of years ago was shifting some of our thinking to deliver more highly integrated, tightly coupled systems to train our workloads, and then to do the inference serving. So not only do we design end to end for our AI platforms on the physical side, but also all the way up the software stack, working with AI Infra, with next-generation research workloads, and with our product group teams, to make sure we achieve software-defined hardware systems, that our fleet can handle more heterogeneity, and that we build in the flexibility and capabilities that will service future workloads we don't even need to envision today.
Yeah, and you mentioned networking, hardware, software. And Aparna, we talk a lot about these shifts and being able to be nimble, but at the same time plan ahead. So how do you manage that, being able to plan ahead but still adjust as the market changes?
One of the things I want to say is that we've invested in AI for years at Meta. If you really think about it, News Feed launched in 2005, and News Feed is really a ranking algorithm. Then, as relevance improved and News Feed evolved, all of that was powered by AI. We've invested in helping our users stay safe with spam detection and with removing hate speech, for example; all of that is powered by AI. And then we have, I think, one of the world's best ads platforms, and all of that is powered by AI. So this is really an evolution for us in infra that goes back a very long time.
And what's shifting now, I think, is that the pace of innovation is rapidly increasing. Model architectures are changing, model sizes are changing, we're seeing really complex models evolving, and data volumes are growing. So, thinking about the software evolution we've had, we've done a few different things. Organizationally, we've now consolidated all of our software into the AI Infra organization, so we're really well positioned to handle all of these new changes and shifts. We've created whole new software stacks: PyTorch, the leading framework of choice for machine learning development, was built at Meta and open-sourced, and is now part of a foundation. We've invested in entirely new training stacks, and we've invested in inference, which is our ability to serve these models at scale.
And Kim, I know your work was mentioned by Alexis, and you're always looking many, many years ahead of the rest of us. How do you manage that from a research and foundational perspective? Yeah, in addition to all of the amazing product use cases that we just heard about, we're also evolving our infrastructure to deal with the research use case. What's great about this is that it gives us a preview of what's potentially to come, and we can evolve the hardware infrastructure, the compute, the networking, and the storage in anticipation of what's coming.
As we've evolved from recommender systems into, more recently, generative AI, we were able to anticipate: what does this mean from an infrastructure perspective? How are the demands different? What bottlenecks might we run into? And how can we evolve our designs, not just at the hardware level, but at the software level as well?
Because we also get a preview of the usability pain points that people will have. Aparna mentioned PyTorch; PyTorch originated out of the research organization, generally just to solve a problem. The problem was: I want to be able to focus on the model itself; I don't want to have to worry about the infrastructure.
So let me just quickly build some infrastructure that lets me focus on what I want to be focusing on. Basically, how do I hide the complexity and not have to spend too much time and energy to get what I need?
And you all mentioned AI transformations, and more recently GenAI. I think nothing has captured the hearts and minds of people as much as GenAI has; it's been a transformation. As we think about generative AI, and you mentioned the research side, we couldn't have done the product use cases without that research.
And how is infrastructure impacted when we make this massive change to generative AI? Alexis, do you want to talk a little bit about generative AI and how it's impacted your team? Absolutely. The first thing I would say is scale, pure scale. We're on our third purpose-built AI system with our internal hardware designs today.
And as we look to the future, the generative AI workloads and needs require much more: the models are much more complex, and they require much larger scales of GPUs. As an example, whereas traditional AI workloads may run on tens or hundreds of GPUs at a time, generative AI workloads are being run on thousands, if not more. So the actual requirements have changed as well.
Our recommender models were traditionally memory- or network-bound. Generative AI is incredibly compute-intensive, so the compute density needs to increase. What does this mean for us when we build? We are building much larger clusters. We need to be able to shard our workloads across many different GPUs. We're thinking about how to optimize these systems, and power consumption is also rising.
So we've done a lot of advance work figuring out how we cool these systems, and how we keep cooling them into the future. These are also incredibly capital-intensive systems. When we think about that, it's one of the main reasons we have the opportunity to innovate end to end, and it's one of the reasons we kicked off our internal custom silicon development many years ago.
That way we can optimize our specific clusters and our specific infrastructure to meet the performance, power, and capital efficiency required for the evolution of AI workloads. And Aparna, we've been talking about these product experiences since way before generative AI picked up.
So how do you manage planning across all these products, because it's not one size fits all? I mean, generative AI has been an interesting evolution, and I know Alexis talked a bunch about the systems impact and the data center impact. I'll maybe mention a couple of areas on the software side.
First, for training, these jobs run for weeks and months, not hours. So fundamentally, the way we think about checkpointing, scheduling, fault tolerance, and being able to find capacity that's contiguous, all of these, I think, are new challenges that surface because of the nature of generative AI.
Fortunately, in infra, we have an incredible core infra team that's been working on large-scale problems for a long time, so we're able to pull from that expertise and build on it to support these really large jobs. And it's a totally different ballgame running this in production versus smaller versions of it in research, and doing it predictably, I think, is really hard.
The other thing I'll say is on the inference side: we're finding that GenAI requests end up being about a thousand times more expensive than a ranking or recommendation request, just by the nature of the models themselves. And so there's just an incredible opportunity.
We have, I think, one of the best teams in the industry working on all of the performance optimizations here, everything from new runtimes to changing the languages of our runtimes to be more performant. We're working very closely with Alexis's team and your team, Irene, co-designing these models so that we can build more performance into the models themselves. And we're working on really innovative, creative ways of serving these GenAI models, with different tiers for different context windows and things like that. So it's very exciting all around.
One of our differentiators is our open science philosophy. When you think about open source, a lot of that comes from your team; how do you think about it? Yeah, I'm really proud of the work that we're doing with open source.
I mean, the research teams have open-sourced LLaMA, our GenAI models, and not just the models, but also the weights, and that's just unprecedented in the industry. I think people are finding that really valuable in the community. We've open-sourced PyTorch, which is the leading machine learning framework for all model development, and recently we also handed PyTorch over to the PyTorch Foundation, under the Linux Foundation, to enable an ecosystem.
And I think it's really important, right? As research continues to evolve at this crazy pace, you want all of these investments to be applicable to many people in the community so we can further the research together. We've also developed PyTorch 2.0, recently released, with just about double the performance. If you caught Peng's talk earlier today, she talked about TorchDynamo and TorchInductor, which are part of the PyTorch 2.0 release, really exciting technology. So overall, at Meta, we're committed to open source, we're committed to enabling research across the communities, and I feel really proud about it.
And Kim, I know your team was very good at predicting the generative AI trend before some of these launches. What other areas of AI do you see coming in the future?
Yeah, for sure the big boom in generative AI has been amazing for the AI research community. It has opened a lot of doors, but it's also exposed a lot of investments that we need to be making as we look forward. First of all, the models themselves are really only as good as the data used to train them, and therefore we're going to need fundamental investments in data and data infrastructure, in understanding how much data we need, and in pruning that data. That's a real key investment that we need to make and should make.
Second, when we look at generative AI today, there are a lot of single-modality efforts; basically, we have large language models. Going forward, we're really going to need to understand different modalities besides language, and then also multiple modalities at once: images with text, speech with video. So that's an investment we also need to make.
And when we look at generative AI from a use case perspective, it's rarely a one-shot deal. Usually there's some iterative process of generating and regenerating, prompting and tuning. That's going to open the door for a lot of tools we're going to need to build to facilitate that process. And finally, the sky's the limit in terms of the applications we're going to see coming out of generative AI, so I'm really excited to see what happens there.
So with AI, it's really easy to create silos, right? Whether you're working on networking, data center development, or software. How do we avoid that? Because the company is big, and we're doing a lot of different things.
Yeah, I've actually seen quite the opposite in practical experience here. Our teams have always worked closely together, but going down this AI path over the last many years has required us to work very, very tightly coupled across teams across the entire company.
Anywhere from software to hardware to data centers to networking, you name it. We're all moving super fast, and in order for us to deliver a very integrated stack that is really reliable, it requires us to work very, very closely.
And success here is very dependent on having a common vision together and being able to execute seamlessly across this space. So it's really been all about that.
Each and every day, I'm really privileged to work alongside these industry experts, as well as many others at the company, which has been great, and it's really enabled us to move very quickly together.
So I think, as we look forward, our opportunity, and our challenge as well, is to continue to execute in a very seamless fashion as we've been doing, but in the face of rapidly changing technology and rapidly changing use cases, both for what we're doing today and into the future.
Can I add to what Rachel said? This group is so amazing. I get to work with them every day. So I just wanted to shout out to all these incredible partners that I get to work with every day.
AI is one of those interesting things: we have to work together across the entire stack to make AI successful. But when you talk about silos, it occurred to me that it's not just about silos within the company or within infrastructure. We also really need to work hard to remove silos across the industry. So this group is building the future.
So as we wrap up this discussion, I want to hear from all of you what you envision for the next 10 years of infrastructure.
Rachel?
Sure. Well, from a data center perspective, I'm confident that the data center design that we're talking about, our next generation data center, will be able to support the business for years to come.
This is because we've really optimized for flexibility across power, cooling, hardware, and network, as well as fungibility to support various AI workloads, products, and services.
So I'm confident about that. But that said, this is a really rapidly evolving space. So we're going to continue to innovate on our design and really continue to think about how we can support the business as needs dictate.
The other thing I want to call out as we look towards the future: of course, sustainability is really important. Climate change is really important. As an industry, we're all working on how to bend that curve. And so that is also a very core focus for us here at Meta and our infrastructure.
And so we have our net-zero target: getting to net-zero emissions by 2030, across our entire value chain, is something we are focused on. That impacts the way we're looking at designing our data centers to reduce our emissions, as well as our hardware, our network, and so on.
So that's going to be a really big focus for our infrastructure as we move forward, as well as, of course, the future of AI.
And Alexis?
Yeah, well, I certainly hope that in 10 years we'll have AI systems that will be generating, building, and designing our next-generation silicon and systems. But before that arrives, I believe the next 10 years will see an incredible amount of customization: application-specific and purpose-built compute platforms.
We'll see tremendous evolution in the network and in high-performance networking at scale. We'll have to innovate and deliver breakthroughs in every area that's a bottleneck today in order to enable the compute systems of the future. So if that's not exciting, I'm not really sure what is. Kim?
Sure. So first of all, I'll just say that AI is definitely here to stay. So far, we've already seen two waves of AI.
Right. We saw the recommender system era, and more recently we've seen generative AI. This is not going to be the last wave; I envision probably at least two more waves in the next 10 years. So it's going to be really, really important that we stay ahead of that, understand when the trends are shifting, and respond accordingly with our infrastructure so that we're positioned to take full advantage of each of those waves.
So I think it's going to be a really exciting time.
And Aparna, how do you feel about these waves?
I mean, I think about AI for infra. I see infra evolving with AI over the next 10 years; we've already seen this. I don't know if you caught Michael Bolin's talk today, where he talked about CodeCompose. We're working on AI-augmented productivity within Meta, where we're actually helping engineers write code, augmenting their work, and giving them knowledge that would otherwise take a lot of effort to find. So I expect to see AI transforming various parts of infra. Alexis alluded to silicon designs being generated by AI, and I see scheduling systems having really smart algorithms that are powered by AI. I see lots of infra evolving with AI. It's going to be exciting.
And as we wrap up, I know this is the group that keeps our services and products stable for tons of users, and I'm also so inspired that you're building the future of what AI and its infrastructure will be. So I'm really honored to share this stage with all of you, and I know you'll be leading that future in building AI, not just at Meta, but across the industry at large. So thank you all for joining us. Thank you so much, Alexis, Aparna, Rachel, Kim, and of course, Irene. We're so grateful that you could join us today. That's a powerful group of women.
As you heard throughout the day today, folks, we're translating our AI and infra investments into new experiences across our whole family of apps, while ensuring that we have the ability to drive long-term research and technical innovation. Over the next decade, we can expect the needs for AI training and inference to ramp up pretty dramatically, and we will need to scale with it. We'll probably see increased specialization and customization in chip design. We will see purpose-built and workload-specific AI infra, and we'll see new systems and probably new tooling for deployment at scale, along with improved efficiency in product and design support. All of this will deliver increasingly sophisticated models, probably built on the latest research, which is moving pretty fast these days, and products that give people around the whole world access to new and much more relevant experiences in their day-to-day lives.
In the days, the months, and the years ahead, we'll continue to update you on our AI infra journey. Again, I want to thank all of you for joining us today. If you have any questions you'd like us to answer, or topics you'd like us to cover in future @Scale events or on Meta's technical blogs, just visit our website at scaleconference.com or scan the QR code you have right on your screen. Also, please make sure to check out our next upcoming events: Systems @Scale and Networking @Scale, both taking place in July. Registration is open right now, I believe, so sign up, and please join our mailing list for the latest news.
I've spent 13 years now here in infra, and I've seen how we are not only able to react to the present but also look forward towards the future and make that future a reality. We're always focused on the long term, and when I think about the next 5, 10, maybe 15 years, I'm pretty sure that when we look back at this point in time, it will be a pivotal moment for us as we continue to build on our position as a leading AI company in the next decade and beyond. And in infrastructure, the team we're building today, at scale, will bring a lot of this vision to life. Thanks again for joining us, and have a great day. Thank you for joining us.