Developer Preview – EC2 Instances (F1) with Programmable Hardware Have you ever had to decide between a general purpose tool and one built for a very specific purpose? The general purpose tools can be used to solve many different problems, but may not be the best choice for any particular one. Purpose-built tools excel at one task, but you may need to do that particular task infrequently. Computer engineers face this problem when designing architectures and instruction sets, almost always pursuing an approach that delivers good performance across a very wide range of workloads. From time to time, new types of workloads and working conditions emerge that are best addressed by custom hardware. This requires another balancing act: trading off the potential for incredible performance vs. a development life cycle often measured in quarters or years. Enter the FPGA
One of the more interesting routes to a custom, hardware-based solution is known as a Field Programmable Gate Array, or FPGA. In contrast to a purpose-built chip which is designed with a single function in mind and then hard-wired to implement it, an FPGA is more flexible. It can be programmed in the field, after it has been plugged in to a socket on a PC board. Each FPGA includes a fixed, finite number of simple logic gates. Programming an FPGA is “simply” a matter of connecting them up to create the desired logical functions (AND, OR, XOR, and so forth) or storage elements (flip-flops and shift registers). Unlike a CPU which is essentially serial (with a few parallel elements) and has fixed-size instructions and data paths (typically 32 or 64 bit), the FPGA can be programmed to perform many operations in parallel, and the operations themselves can be of almost any width, large or small.
This highly parallelized model is ideal for building custom accelerators to process compute-intensive problems. Properly programmed, an FPGA has the potential to provide a 30x speedup to many types of genomics, seismic analysis, financial risk analysis, big data search, and encryption algorithms and applications. I hope that this sounds awesome and that you are chomping at the bit to use FPGAs to speed up your own applications! There are a few interesting challenges along the way. First, FPGAs have traditionally been a component of a larger, purpose-built system. You cannot simply buy one and plug it in to your desktop. Instead, the route to FPGA-powered solutions has included hardware prototyping, construction of a hardware appliance, mass production, and a lengthy sales & deployment cycle. The lead time can limit the applicability of FPGAs, and also means that Moore’s Law has time to make CPU-based solutions more cost-effective. We think we can do better here! The New F1 Instance
Today we are launching a developer preview of the new F1 instance. In addition to building applications and services for your own use, you will be able to package them up for sale and reuse in AWS Marketplace. Putting it all together, you will be able to avoid all of the capital-intensive and time-consuming steps that were once a prerequisite to the use of FPGA-powered applications, using a business model that is more akin to that used for every other type of software. We are giving you the ability to design your own logic, simulate and verify it using cloud-based tools, and then get it to market in a matter of days. Equipped with Intel Broadwell E5 2686 v4 processors (2.3 GHz base speed, 2.7 GHz Turbo mode on all cores, and 3.0 GHz Turbo mode on one core), up to 976 GiB of memory, up to 4 TB of NVMe SSD storage, and one to eight FPGAs, the F1 instances provide you with plenty of resources to complement your core, FPGA-based logic. The FPGAs are dedicated to the instance and are isolated for use in multi-tenant environments. Here are the specs on the FPGA (remember that there are up to eight of these in a single F1 instance): - Xilinx UltraScale+ VU9P fabricated using a 16 nm process.
- 64 GiB of ECC-protected memory on a 288-bit wide bus (four DDR4 channels).
- Dedicated PCIe x16 interface to the CPU.
- Approximately 2.5 million logic elements.
- Approximately 6,800 Digital Signal Processing (DSP) engines.
- Virtual JTAG interface for debugging.
In instances with more than one FPGA, dedicated PCIe fabric allows the FPGAs to share the same memory address space and to communicate with each other across a PCIe Fabric at up to 12 Gbps in each direction. The FPGAs within an instance share access to a 400 Gbps bidirectional ring for low-latency, high bandwidth communication (you’ll need to define your own protocol in order to make use of this advanced feature). The FPGA Development Process As part of the developer preview we are also making an FPGA developer AMI available. You can launch this AMI on a memory-optimized or compute-optimized instance for development and simulation, and then use an F1 instance for final debugging and testing.
|