This week, Hot Chips 33 takes place, a major conference at which the world's leading semiconductor companies reveal some of what they are preparing for the coming months. AMD was among the companies presenting new developments, unveiling more details of 3D V-Cache, a technology that stacks cache on top of processors to deliver performance gains of up to 15%.

Samsung is also among the presenters, having revealed more information about its 512GB DDR5 RAM module, which offers up to 40% higher performance than DDR4 modules. This Wednesday (25), the South Korean giant brought new details of yet another weighty technology: Processing-in-Memory, or PIM, which promises to speed up AI processing and reduce power consumption significantly.

PIM, Aquabolt XL and the goal of replacing HBM2 memory

Samsung announced PIM technology in February 2021 with the goal of revolutionizing the way artificial intelligence workloads are processed. To do this, it places a kind of miniprocessor, the Programmable Computing Unit (PCU), in each of the chip's memory banks, for a total of 32 PCUs per die.
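To see why putting a small processor next to each bank helps, consider a toy model in Python. It is a simplified sketch under our own assumptions, not Samsung's design or programming interface: the `Bank` class and `pim_dot` function are hypothetical, and each bank's PCU reduces its own slice of a dot product so that only one partial sum per bank, rather than every stored value, has to travel to the host.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bank:
    """Hypothetical DRAM bank holding one slice of a weight vector."""
    weights: list[float]

    def pcu_dot(self, activations: list[float]) -> float:
        # The bank-local PCU multiplies and accumulates in place, so its
        # weights never cross the memory bus.
        return sum(w * a for w, a in zip(self.weights, activations))


def pim_dot(banks: list[Bank], activations: list[float]) -> tuple[float, int]:
    """Return the dot product and how many values were sent to the host."""
    partials = []
    offset = 0
    for bank in banks:
        n = len(bank.weights)
        # Each bank receives only the activations matching its weight slice.
        partials.append(bank.pcu_dot(activations[offset:offset + n]))
        offset += n
    # Only one partial sum per bank travels to the host for the final add.
    return sum(partials), len(partials)
```

With 32 banks of 64 weights each, `pim_dot(banks, [0.5] * 2048)` returns `(1024.0, 32)`: the same result a conventional processor would compute, but with 32 values crossing the bus instead of 2,048.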

Running at 300 MHz, these PCUs perform FP16 calculations and deliver up to 1.2 TFLOPS of processing power, increasing system performance by up to 2x while reducing energy consumption by up to 70% compared with the HBM2 Aquabolt modules Samsung already sells.
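The 1.2 TFLOPS figure can be reproduced with simple arithmetic, but only under assumptions the article does not state: a stack with four PIM-enabled dies of 32 PCUs each, and PCUs that are 16-lane FP16 SIMD units completing one fused multiply-add (two floating-point operations) per lane per cycle. Treat the configuration below as one plausible reading of Samsung's numbers rather than an official breakdown.

```python
def peak_tflops(dies: int, pcus_per_die: int, clock_hz: float,
                simd_lanes: int, flops_per_lane_per_cycle: int) -> float:
    """Peak throughput: compute units x clock x lanes x FLOPs per lane per cycle."""
    flops = dies * pcus_per_die * clock_hz * simd_lanes * flops_per_lane_per_cycle
    return flops / 1e12


# Assumed configuration (not stated in the article): 4 PIM dies per stack,
# 32 PCUs per die at 300 MHz, 16 FP16 lanes per PCU, and one multiply-add
# (2 FLOPs) per lane per cycle.
print(f"{peak_tflops(4, 32, 300e6, 16, 2):.2f} TFLOPS")
```

The result, about 1.23 TFLOPS, rounds to the 1.2 TFLOPS Samsung quotes.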

Samsung's PIM technology uses PCUs to process data directly in memory (Image: Press Release/Samsung)

One of the most interesting points is that servers and computers would need no modification to work with PIM modules, since these components use industry-standard commands and can communicate with any traditional JEDEC-compliant memory controller. JEDEC is the body responsible for standardizing microelectronics technologies.

In today's presentation (25), Samsung demonstrated in practice the benefits of Aquabolt XL, the commercial name of the company's HBM2 memory with PIM technology. The company replaced the traditional HBM2 memory in a Xilinx Alveo FPGA accelerator card with Aquabolt XL; an FPGA is an integrated circuit that the user can configure to perform custom functions.

Simply by replacing HBM2 memory with HBM2-PIM in an FPGA, Samsung increased performance by 2.5x and reduced power consumption by 62% (Image: Press Release/Samsung)

Without any changes to the board other than the new memory, the South Korean giant reported a 2.5x performance gain and a 62% reduction in power consumption. Samsung expects future optimizations by CPU manufacturers to improve these results even further; in fact, one of the major players is already working on this, although it has not yet been announced.
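Performance and power figures like these can be combined into energy per task, since energy equals power multiplied by time. The sketch below assumes that the 62% refers to average power draw; if Samsung's figure instead describes total energy, the energy per task would simply be 38% of the baseline. The function name is ours, for illustration only.

```python
def relative_energy_per_task(speedup: float, power_reduction: float) -> float:
    """Energy per task relative to baseline, assuming the reduction is in average power.

    energy = power x time, and time per task scales as 1 / speedup.
    """
    relative_power = 1.0 - power_reduction
    return relative_power / speedup


# Figures reported for the Alveo FPGA demo: 2.5x faster, 62% lower consumption.
ratio = relative_energy_per_task(speedup=2.5, power_reduction=0.62)
print(f"{ratio:.3f} of baseline energy per task")
```

Under the power interpretation, each task would need about 15% of the original energy (0.38 / 2.5 = 0.152).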

As far as we know, potential partners already working on optimizations for Aquabolt XL could include Intel with its Sapphire Rapids processors, AMD with the EPYC Genoa family, or even Arm with its Neoverse platform, considering that all of them use HBM2 memory for processing acceleration.

Aquabolt XL, the commercial name of HBM2-PIM memory, is compatible with current systems, which should ease its adoption (Image: Press Release/Samsung)

Unfortunately, one downside of adding PCUs to memory chips is that total capacity is cut in half: an 8GB die, for example, would be reduced to just 4GB with PIM technology. However, Samsung has confirmed that it is working on this limitation and intends to eliminate it with HBM3 modules.

AXDIMM and PIM technology in memory for GPUs, notebooks and cell phones

When it announced PIM technology in February, Samsung said it had no plans to bring it to ordinary consumers anytime soon, since the goal was to accelerate data processing in servers and data centers. That changed with today's announcement that PIM can also be applied to more conventional memory types such as DDR4, GDDR6 and LPDDR5X.

These memory types are common in desktops, graphics cards, and notebooks and smartphones, respectively, and the company discussed PIM versions of each. The first came in the form of AXDIMM, a memory module that can replace DDR4 memory in the LRDIMM and UDIMM formats in any common server.

The AXDIMM module is equivalent to traditional DDR4 modules, but delivers the benefits of PIM technology (Image: Press Release/Samsung)

Like Aquabolt XL, AXDIMM also supports FP16 processing through TensorFlow and Python code, and Samsung promises support for more languages in the future.

To showcase the new module's capabilities, Samsung ran Facebook AI workloads, achieving 1.8x higher performance, a 42.6% reduction in power consumption and a 70% reduction in latency. The numbers are impressive considering that no changes were made to the tested system, and there is still room for improvement with future optimizations.

Samsung is also working on LPDDR5X-PIM memory, which should accelerate AI on cell phones and notebooks in the future (Image: Press Release/Samsung)

In another test, the company simulated LPDDR5X modules with PIM, reinforcing the possibility that the technology could reach ordinary consumers. At a data rate of 6400 MT/s, the simulated memory promises 2.3x the performance in speech recognition, up to 2.4x improvements in text generation with the GPT-2 AI model and more, while consuming up to 4 times less energy.

Finally, the South Korean giant discussed plans to bring processing to GDDR6 memory, although it has not yet demonstrated such an application. With the rise of AI features in graphics cards, such as Nvidia's DLSS and Intel's recent XeSS, there is plenty of room to significantly improve GPU performance in games and other programs that take advantage of artificial intelligence.

HBM3 under development without loss of capacity

Samsung's last major announcement concerns HBM3, the next generation of high-speed, high-bandwidth memory designed for heavy data processing. Specifications for these modules, which are still under development, have not yet been released, but the manufacturer has confirmed two important new features for the format.

The South Korean giant is also working on HBM3-PIM memory, which will eliminate current limitations and improve power consumption and speed (Image: Press Release/Samsung)

The first is that, with the new generation, the capacity limitation of PIM technology will be eliminated: adding processing to the memory will no longer cut capacity in half. The second concerns the number format used for calculations in HBM3 modules: FP16 SIMD is out, making way for FP64, which offers greater precision.
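The practical difference between the two formats is easy to demonstrate in Python, which can round values to IEEE 754 half precision (FP16) through the standard `struct` module's `e` format, while ordinary Python floats are FP64. The example is a generic illustration of numeric precision, not a model of how Samsung's PCUs accumulate results: it adds 0.1 ten thousand times in each format, rounding the running total after every step.

```python
import struct
from typing import Callable, Iterable


def to_fp16(x: float) -> float:
    """Round an FP64 Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(values: Iterable[float], rounding: Callable[[float], float]) -> float:
    """Sum values, rounding each input and the running total with `rounding`."""
    total = 0.0
    for v in values:
        total = rounding(total + rounding(v))
    return total


values = [0.1] * 10_000
fp64_total = accumulate(values, float)    # Python floats are already FP64
fp16_total = accumulate(values, to_fp16)  # round to FP16 after every step
print(fp64_total, fp16_total)
```

The FP64 total lands within a tiny fraction of the exact 1,000, while the FP16 total stops growing at 256.0: above 256, consecutive FP16 values are 0.25 apart, so adding roughly 0.1 always rounds back to the same number.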

Source: Samsung, WCCFTech, Tom’s Hardware (1, 2)