SOPHON SDK is SOPHGO's proprietary SDK for the SOPHGO deep learning processor. With its powerful tools, you can deploy applications in the runtime environment and deliver maximum inference throughput and efficiency.

Full-Stack solution for Deep Learning Development & Deployment

SOPHON SDK is composed of a compiler and a runtime.

The compiler is responsible for optimizing and converting various deep neural network models (such as caffemodel). It balances EU operation time against memory access time, improves the parallelism of operations, and finally converts the network to the bmodel format supported by the SOPHGO TPU.
The runtime is responsible for driving the TPU processor and providing a unified programmable interface to upper-layer applications, so that a program can perform neural network inference through the bmodel and the user does not need to care about the implementation details of the underlying hardware.
Full-Stack solution

Two device driver modes are available, PCIe and SoC, which gives developers more choices.

Combined with the deep learning processor independently developed by SOPHGO, the SDK delivers maximum inference throughput and the simplest application deployment environment.

A runtime library programming interface is provided for manipulating the underlying computing resources, so users can carry out in-depth development.

The runtime library provides concurrent processing capabilities and supports multi-process and multi-thread modes.
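
As a rough illustration of the multi-thread mode, the sketch below fans several input batches out to worker threads. It is a minimal sketch, not SOPHON SDK code: `run_inference` is a hypothetical stand-in for whatever runtime inference call your application uses, and `infer_concurrently` and `max_workers` are names chosen here for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_inference(batch):
    """Hypothetical placeholder for a runtime inference call (NOT a SOPHON SDK API).

    It just doubles each value so the example runs without any hardware.
    """
    return [x * 2 for x in batch]


def infer_concurrently(batches, max_workers=4):
    """Run run_inference on every batch using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        return list(pool.map(run_inference, batches))


if __name__ == "__main__":
    print(infer_concurrently([[1, 2], [3, 4], [5, 6]]))
```

For a multi-process variant, the same structure should work with `ProcessPoolExecutor` from the same module, assuming the inference call and its inputs can be pickled.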

Product Function


SOPHON SDK supports two kinds of compilation. For layers that the TPU supports, you can use MLIR to compile and deploy. For layers that the TPU does not currently support, you can extend the compiler through the MLIR programming interface, or use the TPU Kernel programming interface or RISC-V instructions to add custom network layers, which lets users compile non-public networks.


We provide developers with a Docker image for development that integrates the tools and libraries required by SOPHON SDK. Developers can use it to build deep learning applications.


After integration, the compiled network and the application can be deployed through the Runtime. During deployment, you can program against the Runtime inference engine API.