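Input format: an RGB-channel image tensor of size 1 X 1
Output format: a BGR-channel image tensor of size 1 X 1
The struct used for API communication between the host side and the device side of this operator is: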
typedef struct {
    unsigned int size;
    unsigned long long output_addr;
    unsigned long long input_addr;
} __attribute__((packed)) param_t;
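The packed attribute removes padding between the fields, so the host and the device agree on the byte layout. A minimal sketch that checks this at compile time, assuming a GCC- or Clang-compatible compiler with a 4-byte unsigned int and an 8-byte unsigned long long. The helper name param_wire_size is hypothetical and not part of the contest code:

```cpp
#include <cassert>
#include <cstddef>

// Same fields as param_t in the text; packed so no padding is inserted.
typedef struct {
    unsigned int size;
    unsigned long long output_addr;
    unsigned long long input_addr;
} __attribute__((packed)) param_t;

// With packing, the struct is exactly the sum of its field sizes.
static_assert(sizeof(param_t) ==
                  sizeof(unsigned int) + 2 * sizeof(unsigned long long),
              "param_t must not contain padding");

// output_addr starts immediately after size.
static_assert(offsetof(param_t, output_addr) == sizeof(unsigned int),
              "output_addr must directly follow size");

// Hypothetical helper: number of bytes passed from host to device.
inline std::size_t param_wire_size() {
    return sizeof(param_t);
}
```

Without the attribute, a typical 64-bit compiler would insert 4 bytes of padding after size so that output_addr is 8-byte aligned, and the two sides could then disagree on where the addresses are.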
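Judging from the host-side code, size is the number of images, and the original channel order of each image is RGB. input_addr and output_addr are the addresses in Global Memory where the data is stored. The images are stored contiguously, and the three RGB channels of a single image are also stored contiguously.
See the rgb2bgr_reference function in host/rgb2bgr.cpp for a reference implementation: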
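static inline void rgb2bgr_reference(float *output, const float *input, const param_t &param) {
    for (unsigned int i = 0; i < param.size; i++) { // process each image in turn
        output[i * 3 + 2] = input[i * 3];     // R channel of the source goes to the third channel of the destination
        output[i * 3 + 1] = input[i * 3 + 1]; // G channel of the source goes to the second channel of the destination
        output[i * 3]     = input[i * 3 + 2]; // B channel of the source goes to the first channel of the destination
    }
}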
Here the input data is treated as having shape [1, 1, 3, 1].
Taking a single image as an example:
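Setting aside all special cases, this operator can be implemented with a single NPU plus GDMA operations. Contestants only need to move the data of the three R/G/B channels from input_addr into Local Memory, and then from Local Memory to output_addr.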
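The two-step data movement can be sketched on the host as follows. This is only a simulation under stated assumptions: LocalBuffer, load_to_local and store_reversed are hypothetical stand-ins for Local Memory and GDMA transfers, not functions from the BM1684 SDK, and the channel swap is placed in the store step purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Local Memory of one NPU.
struct LocalBuffer {
    std::vector<float> data;
};

// Stand-in for a GDMA transfer from Global Memory to Local Memory.
void load_to_local(LocalBuffer &local, const float *global_src, std::size_t count) {
    assert(global_src != nullptr);
    local.data.assign(global_src, global_src + count);
}

// Stand-in for a GDMA transfer from Local Memory back to Global Memory
// that writes the elements in reverse order.
void store_reversed(float *global_dst, const LocalBuffer &local) {
    assert(global_dst != nullptr);
    const std::size_t n = local.data.size();
    for (std::size_t c = 0; c < n; ++c) {
        global_dst[n - 1 - c] = local.data[c];
    }
}

// One image of shape [1, 1, 3, 1]: load R, G, B, then store them as B, G, R.
void rgb2bgr_single(float *output, const float *input) {
    LocalBuffer local;
    load_to_local(local, input, 3);
    store_reversed(output, local);
}
```

Calling rgb2bgr_single once per image reproduces the behavior of rgb2bgr_reference for one image, at the cost of two transfers per image.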
Here the input data is treated as having shape [2, 1, 3, 1].
Extending to two images, the following approaches are possible:
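Contestants can of course handle each image with the single-image logic, but that is inefficient. Processing both images in one pass may be somewhat more efficient. As shown in the figure, if the GDMA stride and shape parameters are designed carefully, the two images [2, 1, 3, 1] can be treated as a single image [1, 1, 3, 2] laid out according to a fixed rule, so that custom stride and shape values move both images in a single transfer. This approach also lowers GDMA read efficiency, however, so contestants are encouraged to experiment and measure the actual performance.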
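The reinterpretation of [2, 1, 3, 1] as [1, 1, 3, 2] can be illustrated with a plain strided copy on the host. This is a sketch under assumptions, not the GDMA interface: strided_copy and rgb2bgr_batched are hypothetical names, and the negative destination stride is a convenience of the simulation that real hardware may not support (in which case the channels could be written by separate transfers instead).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-D strided copy: element (i, j) of an h x w view is read at
// src[i * src_h_stride + j * src_w_stride] and written to the matching
// position of the destination view.
void strided_copy(float *dst, std::ptrdiff_t dst_h_stride, std::ptrdiff_t dst_w_stride,
                  const float *src, std::ptrdiff_t src_h_stride, std::ptrdiff_t src_w_stride,
                  std::ptrdiff_t h, std::ptrdiff_t w) {
    assert(h >= 0 && w >= 0);
    for (std::ptrdiff_t i = 0; i < h; ++i) {
        for (std::ptrdiff_t j = 0; j < w; ++j) {
            dst[i * dst_h_stride + j * dst_w_stride] =
                src[i * src_h_stride + j * src_w_stride];
        }
    }
}

// Treat n contiguous RGB images as one view with h = 3 (channel) and
// w = n (image index). In memory, channels are 1 element apart and
// images are 3 elements apart.
void rgb2bgr_batched(float *output, const float *input, std::ptrdiff_t n) {
    assert(n > 0);
    // The destination view starts at the last channel of image 0 and walks
    // the channels backwards, which reverses RGB to BGR for every image.
    strided_copy(output + 2, -1, 3, input, 1, 3, 3, n);
}
```

For two images stored as R0 G0 B0 R1 G1 B1, one call with n = 2 writes B0 G0 R0 B1 G1 R1. Consecutive elements along the w dimension are 3 floats apart, which is the kind of non-contiguous access the text warns may reduce read efficiency.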
Both of the approaches above work for any number of images.
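It is easy to see that the first two approaches both assume the [N, 1, 3, 1] data layout. According to the documentation, however, BM1684 data transfers split the data along dimension C and distribute it across different NPUs, so a shape designed this way wastes a great deal of NPU space and performance. With the [1, N, 3, 1] layout instead, all 64 NPUs can be fully used for the data transfer. No further detail is given here. A contestant who has finished the code for approach 1 should find it easy to adapt it to approach 3.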
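The difference between the two layouts can be made concrete with a small calculation. This sketch assumes that the channels of a tensor are spread across the 64 NPUs one channel per NPU, wrapping around when C exceeds 64. That mapping is an assumption made for illustration and should be checked against the BM1684 documentation. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of NPUs, as stated in the text.
constexpr std::size_t kNpuNum = 64;

// Number of NPUs that receive data for a tensor with c channels,
// assuming one channel per NPU with wrap-around.
std::size_t npus_used(std::size_t c) {
    assert(c > 0);
    return std::min(c, kNpuNum);
}

// Largest number of channels any single NPU has to hold.
std::size_t max_channels_per_npu(std::size_t c) {
    assert(c > 0);
    return (c + kNpuNum - 1) / kNpuNum; // ceiling division
}
```

Under this assumption, a tensor of shape [N, 1, 3, 1] has C = 1 and occupies a single NPU regardless of N, while a tensor of shape [1, N, 3, 1] with N = 128 occupies all 64 NPUs with two channels on each.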
After completing the basic operator implementation, contestants can consider the following points to improve the RGB2BGR operator: