New User-Friendly System Empowers Developers to Build Smarter Simulations and AI Models

Neural network AI models used in applications such as medical imaging and speech recognition perform operations on highly complex data structures that require enormous amounts of computation to process. This is one reason deep-learning models consume so much energy.

To improve the efficiency of AI models, MIT researchers have developed an automated system that enables developers of deep learning algorithms to simultaneously leverage two types of redundant data, reducing the amount of computation, bandwidth, and memory storage required for machine learning operations.

Current techniques for optimizing algorithms can be complex and often only allow developers to exploit sparsity or symmetry, two different types of redundancy that exist in deep learning data structures.

By allowing developers to build algorithms from scratch that exploit both types of redundancy simultaneously, the MIT researchers’ method sped up calculations by about 30 times in some experiments.

The system uses a user-friendly programming language, so machine-learning algorithms can be optimized for a wide range of applications. It could also help scientists who are not deep-learning experts but want to improve the efficiency of the AI algorithms they use to process data, and it may have applications in scientific computing as well.

“For a long time, exploiting these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what it should compute in a more abstract way, without telling it exactly how to compute it,” says Willow Ahrens, a postdoctoral researcher at MIT and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.

Ahrens is joined on the paper by lead author Radha Patel (Class of ’23, ’24) and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Cutting out computation

In machine learning, data is often represented and manipulated as multidimensional arrays called tensors. A tensor is like a matrix, a rectangular array of values arranged along two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, which makes tensors more difficult to manipulate.
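To make the distinction concrete, here is a minimal sketch (using NumPy, which the article does not mention) of a two-axis matrix versus a three-axis tensor:

```python
import numpy as np

# A matrix: a rectangular array with 2 axes (rows and columns).
matrix = np.arange(6).reshape(2, 3)

# A tensor generalizes this to more axes. This one has 3 axes,
# e.g., users x products x time, with shape (2, 3, 4).
tensor = np.arange(24).reshape(2, 3, 4)

print(matrix.ndim)   # 2
print(tensor.ndim)   # 3
print(tensor.shape)  # (2, 3, 4)
```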

Deep learning models perform tensor operations through repeated matrix multiplication and addition. This process allows neural networks to learn complex patterns in data. The large number of calculations that must be performed on these multidimensional data structures requires enormous amounts of computation and energy.
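As an illustration of "repeated matrix multiplication and addition," here is a toy fully connected layer; the layer sizes and ReLU activation are assumptions for the example, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# One fully connected layer: output = activation(x @ W + b).
# A deep network repeats this pattern many times, layer after layer.
x = rng.standard_normal((8, 16))   # batch of 8 inputs, 16 features each
W = rng.standard_normal((16, 4))   # weight matrix
b = np.zeros(4)                    # bias vector

out = np.maximum(x @ W + b, 0.0)   # ReLU activation
print(out.shape)  # (8, 4)
```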

But the way data is arranged in tensors often allows engineers to cut down on redundant calculations and speed up neural networks.

For example, if a tensor represents user review data for an e-commerce website, most of the values in that tensor are likely to be zero, because not every user reviews every product. This type of data redundancy is called sparsity. A model can save time and computation by storing and operating on only the nonzero values.
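A minimal sketch of this idea (not SySTeC's actual implementation): store only the nonzero entries of a sparse ratings matrix and touch only those entries during a matrix-vector product.

```python
# Store only the nonzero entries of a 4-user x 3-product ratings
# matrix as a dict mapping (row, col) -> value.
ratings = {(0, 2): 5.0, (1, 0): 3.0, (3, 2): 4.0}
n_rows, n_cols = 4, 3

def sparse_matvec(entries, x, n_rows):
    """Compute y = A @ x using only the stored nonzero values of A."""
    y = [0.0] * n_rows
    for (i, j), v in entries.items():
        y[i] += v * x[j]  # one multiply-add per stored entry
    return y

x = [1.0, 2.0, 3.0]
print(sparse_matvec(ratings, x, n_rows))  # [15.0, 3.0, 0.0, 12.0]
```

With 3 stored values instead of 12, the loop does a quarter of the work a dense multiply would.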

Tensors can also be symmetric, meaning the upper and lower halves of the data structure mirror each other across the diagonal. In that case, the model only needs to operate on one half, cutting the amount of computation. This type of data redundancy is called symmetry.
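The same idea in a minimal sketch (again an illustration, not the paper's code): store only the upper triangle of a symmetric matrix and let each stored entry stand in for its mirror image.

```python
# A 2x2 symmetric matrix A = [[2, 1], [1, 3]], stored as only its
# upper triangle: entries (i, j) with j >= i.
upper = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 3.0}

def sym_matvec(upper, x, n):
    """Compute y = A @ x reading each stored entry of A just once."""
    y = [0.0] * n
    for (i, j), v in upper.items():
        y[i] += v * x[j]
        if i != j:            # off-diagonal entries act on both halves
            y[j] += v * x[i]
    return y

x = [1.0, 2.0]
print(sym_matvec(upper, x, 2))  # [4.0, 7.0]
```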

“But if you try to optimize both of those things, things get very complicated,” Ahrens says.

To simplify this process, she and her colleagues built a new compiler — a computer program that translates complex code into a simpler language that machines can process. Their compiler, called SySTeC, can automatically exploit both the sparsity and symmetry of tensors to optimize computations.

They began the process of building SySTeC by identifying three key optimizations that could be achieved by exploiting symmetry.

First, if the output tensor of an algorithm is symmetric, the algorithm only needs to compute half of that tensor. Second, if the input tensor is symmetric, the algorithm only needs to read half of that tensor. Finally, if the intermediate results of a tensor operation are symmetric, the algorithm can skip redundant operations.
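The first optimization, a symmetric output, can be sketched as follows (a hand-written illustration of the principle, not code SySTeC generates): the product C = A Aᵀ is always symmetric, so only its upper triangle needs to be computed.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# C = A @ A.T is symmetric, so compute only entries with j >= i
# and mirror them, roughly halving the multiply count.
n = A.shape[0]
C = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):        # upper triangle only
        C[i, j] = A[i] @ A[j]
        C[j, i] = C[i, j]        # mirror into the lower triangle

assert np.allclose(C, A @ A.T)   # matches the full computation
```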

Simultaneous Optimization

To use SySTeC, a developer inputs their program and the system automatically optimizes the code for all three kinds of symmetry. Then the second phase of SySTeC performs additional transformations to store only nonzero data values, optimizing the program for sparsity.

Finally, SySTeC generates code that is ready to use.

“This way, we get the benefits of both optimizations,” Ahrens says. “And the interesting thing about symmetry is that as your tensors gain more dimensions, you can get even more savings in computation.”

The researchers demonstrated a speedup of about 30x using code automatically generated by SySTeC.

Because the system is automated, it is particularly useful if scientists want to process data using algorithms they have written themselves.

In the future, the researchers hope to integrate SySTeC into existing sparse tensor compilation systems to create a seamless user interface, and also use it to optimize code for more complex programs.

The research was funded in part by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.
