AI compilers act as a bridge between frameworks and hardware, so that code can be developed once and reused across a variety of compute chips. Sophgo has recently open-sourced its self-developed TPU compilation tool, TPU-MLIR (MLIR stands for Multi-Level Intermediate Representation). TPU-MLIR is an open-source compiler project for AI chips with a TPU. It provides a complete toolchain that converts pre-trained neural networks from various frameworks into binary files (bmodel) that run efficiently on the TPU, enabling faster inference. This course is driven by practical exercises and aims to help learners understand, practice with, and master the TPU compiler framework for SOPHON AI chips.
TPU-MLIR is already used with the BM1684X, the latest-generation AI chip developed by Sophon. Together with the chip's high-performance ARM core and the corresponding SDK, it enables rapid deployment of deep learning algorithms. The course covers the basic syntax of MLIR and the implementation details of the compiler's optimization passes, such as graph optimization, INT8 quantization, operator splitting, and address allocation.
Compared to other compilation tools, TPU-MLIR has the following advantages:
1. Simple and convenient
Users can get started quickly by reading the development manual and the included examples, which explain the model conversion process and its principles. TPU-MLIR is built on MLIR, the current mainstream compiler infrastructure, so users can also use it to learn how MLIR is applied in practice. The project provides a complete toolchain, and users can convert models directly through the existing interfaces without having to adapt each network themselves.
2. General-purpose
TPU-MLIR currently supports the TFLite and ONNX formats; models in either format can be converted directly into bmodels that the TPU can run. For models in other formats, the ONNX ecosystem provides conversion tools that export models from the mainstream deep learning frameworks to ONNX, after which they can be converted to bmodel.
3. Precision and efficiency coexist
Model conversion can introduce precision loss. TPU-MLIR supports both symmetric and asymmetric INT8 quantization, which greatly improves performance, and combines it with the vendor's own Calibration and Tune techniques to keep model accuracy high. In addition, TPU-MLIR applies many graph optimization and operator splitting techniques so that the model runs efficiently.
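To make the difference between symmetric and asymmetric INT8 quantization concrete, here is a minimal sketch in plain Python. It uses the generic textbook min/max formulation and is not taken from TPU-MLIR's implementation; all function names are hypothetical, and a real calibration step would normally choose the quantization range from the data distribution rather than from the raw minimum and maximum.

```python
def symmetric_params(values):
    """Symmetric INT8: real 0.0 maps to integer 0, range is [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_params(values):
    """Asymmetric INT8: [lo, hi] is mapped onto the full range [-128, 127]."""
    lo = min(min(values), 0.0)  # widen the range so 0.0 stays representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Round to the nearest integer step, shift by zero_point, then clip."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate real values."""
    return [(q - zero_point) * scale for q in q_values]


def max_error(values, scale, zero_point):
    """Largest absolute round-trip error over the given values."""
    restored = dequantize(quantize(values, scale, zero_point), scale, zero_point)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical non-negative tensor, for example activations after a ReLU.
activations = [0.0, 0.1, 0.5, 1.2, 2.0, 3.1, 4.0, 6.0]

sym_scale, sym_zp = symmetric_params(activations)
asym_scale, asym_zp = asymmetric_params(activations)
sym_err = max_error(activations, sym_scale, sym_zp)
asym_err = max_error(activations, asym_scale, asym_zp)

print(f"symmetric : scale={sym_scale:.5f} zero_point={sym_zp} max_err={sym_err:.5f}")
print(f"asymmetric: scale={asym_scale:.5f} zero_point={asym_zp} max_err={asym_err:.5f}")
```

For a one-sided distribution like this one, the symmetric scheme leaves the negative half of the integer range unused, so its step size is roughly twice that of the asymmetric scheme. The asymmetric scheme pays for the finer step with a non-zero zero point that has to be carried through the arithmetic.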
4. Achieving ultimate cost-effectiveness and creating the next generation of AI compilers
To support GPU computing, a GPU version of every operator in a neural network model has to be developed; to target a TPU, every operator likewise needs a TPU version. In addition, some scenarios require adapting to different product models of the same compute chip, and compiling manually for each one is very time-consuming. AI compilers aim to solve these problems. TPU-MLIR's automatic optimization tools save a great deal of manual optimization time, so that models developed on the CPU can be ported to the TPU smoothly and at no cost, achieving the best price-performance ratio.
5. Comprehensive information
The course includes video lessons in Chinese and English, documentation, and code scripts, with abundant video material, detailed usage guidance, and clear code. TPU-MLIR stands on the shoulders of a giant, MLIR, and all of the project's code is now open source and available to every user for free.
Code Download Link: https://github.com/sophgo/tpu-mlir
TPU-MLIR Development Reference Manual: https://tpumlir.org/docs/developer_manual/01_introduction.html
Paper on the Overall Design: https://arxiv.org/abs/2210.15016