Huawei has introduced the CloudMatrix 384 AI chip cluster, a platform for large-scale AI model training. It uses a high-density network of Ascend 910C processors connected via optical interconnects, and the company says it offers improvements to energy efficiency and the speed of training. Huawei claims the design lets the CloudMatrix 384 outperform GPU-based clusters, although taken individually, Ascend chips don’t match the performance of top-tier GPUs from the Chinese giant’s Western competitors.
Huawei’s AI hardware-software stack positions the company as a challenger to NVIDIA’s market dominance, especially in domestic and allied markets. Despite, or perhaps because of, the sanctions that limit access to American technologies, Huawei is expanding its ecosystem with tools that circumvent dependence on foreign hardware and software, producing its own versions of AI workflows and finished products that compete with the established AI players.
To take advantage of Huawei’s AI infrastructure, data engineers need to modify their workflows, using tools specifically designed for use with Ascend processors. Chief among them is MindSpore, Huawei’s deep learning framework. While any transition will mean the introduction of new tooling, underlying workflows – data ingestion, transformation, and model iteration – remain largely transferable.
Framework transition: from PyTorch/TensorFlow to MindSpore
Instead of leveraging CUDA through frameworks like PyTorch and TensorFlow, Huawei’s Ascend processors rely on MindSpore, a framework tightly integrated with Ascend hardware that supports both dynamic and static graph execution.
Models built in PyTorch or TensorFlow can be converted or retrained using MindSpore, for which Huawei provides MindConverter. This helps convert model definitions, although engineers should recognise that it can require manual adjustment and fine-tuning. Feature parity is not, unfortunately, one-to-one.
MindSpore differs in syntax, operator behaviour, and training pipeline architecture. Default settings for padding in convolutional and pooling layers and weight initialisation methods, for example, might behave quite differently. As ever, it’s attention to these nuances between the Huawei and CUDA tools and methods that keeps training reproducible.
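As an illustrative sketch, based on MindSpore’s documented nn.Conv2d parameters, spelling out those settings explicitly is one way to guard against silent behavioural drift when porting a layer:

```python
# A minimal sketch: porting a PyTorch Conv2d layer to MindSpore with the
# differing defaults made explicit rather than relied upon.
import mindspore.nn as nn

# PyTorch equivalent: torch.nn.Conv2d(3, 16, 3, padding=1)
conv = nn.Conv2d(
    in_channels=3,
    out_channels=16,
    kernel_size=3,
    pad_mode="pad",           # MindSpore defaults to 'same'; PyTorch pads 0
    padding=1,
    has_bias=True,            # MindSpore defaults to False; PyTorch to True
    weight_init="HeUniform",  # align with PyTorch's Kaiming-uniform default
)
```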
Using MindIR for model deployment
MindSpore uses MindIR (MindSpore Intermediate Representation) as its model export format, a static graph representation used for cross-platform deployment that’s optimised for Ascend NPUs.
Models trained in MindSpore can be exported via the mindspore.export function, which serialises the trained network, producing output in MindIR format. This makes the model ready for inference.
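A minimal sketch of that export step, using a trivial stand-in network (any trained nn.Cell is handled the same way):

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

# Stand-in for a trained network.
net = nn.Dense(784, 10)

# A dummy input fixes the exported graph's input shape and dtype.
dummy_input = ms.Tensor(np.zeros((1, 784), dtype=np.float32))

# Serialise the network; writes 'classifier.mindir' to the working directory.
ms.export(net, dummy_input, file_name="classifier", file_format="MINDIR")
```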
According to MindSpore’s official documentation, deployment involves loading the MindIR model and invoking inference using APIs which manage model de-serialisation, memory allocation, compute execution, and so on.
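Following the documented load/GraphCell pattern, a corresponding sketch of running inference on the exported file:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

graph = ms.load("classifier.mindir")  # de-serialise the MindIR graph
net = nn.GraphCell(graph)             # wrap it as a callable inference cell

sample = ms.Tensor(np.zeros((1, 784), dtype=np.float32))
logits = net(sample)                  # execute on the configured device
```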
It’s worth noting that MindSpore separates training and inference logic, unlike PyTorch or TensorFlow, which blur the two. As a result, all preprocessing steps at inference time should match those used during training. Tools like GraphKernel, AOE (Auto Optimizing Engine), MindSpore Lite, and the Ascend Model Zoo offer additional optimisation for inference deployment.
Adapting to CANN
Huawei’s CANN (Compute Architecture for Neural Networks) is a foundational component of the Ascend AI software stack, and can be considered an analogue to NVIDIA’s CUDA. It consists of:
- The Ascend Compute Library (ACL)
- Ascend Runtime for graph execution
- Development tools for operator tuning, profiling, and debugging
Tools provided within CANN, such as Profiler, MindStudio, and Operator Tuner, can fine-tune model performance at runtime and provide metrics on memory use, kernel-level bottlenecks, and execution flow.
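For code that sits below MindSpore, CANN exposes Python bindings (pyACL). A rough sketch of the typical initialise-load-release flow, assuming the acl package shipped with the CANN toolkit; return-value conventions vary between CANN versions:

```python
import acl

DEVICE_ID = 0

ret = acl.init()                    # initialise the ACL runtime
ret = acl.rt.set_device(DEVICE_ID)  # bind this process to an Ascend NPU
context, ret = acl.rt.create_context(DEVICE_ID)

# Load a compiled offline model (.om) produced by Huawei's ATC converter.
model_id, ret = acl.mdl.load_from_file("classifier.om")

# ... run inference, then release resources in reverse order ...
ret = acl.mdl.unload(model_id)
ret = acl.rt.destroy_context(context)
ret = acl.rt.reset_device(DEVICE_ID)
ret = acl.finalize()
```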
Execution modes: GRAPH_MODE and PYNATIVE_MODE
MindSpore offers two execution modes:
- GRAPH_MODE: Compiles the computation graph in advance, for better performance and hardware utilisation.
- PYNATIVE_MODE: Executes operations immediately, similar to PyTorch’s eager mode, for model prototyping and debugging.
To support both modes, code should avoid complex Python-native control flow methods and use MindSpore’s built-in control operators wherever possible.
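Switching between the two is a one-line context change; a minimal sketch using mindspore.set_context:

```python
import mindspore as ms

# Compile the whole graph ahead of time for training and production runs.
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

# Or execute operations eagerly, PyTorch-style, while prototyping.
ms.set_context(mode=ms.PYNATIVE_MODE, device_target="Ascend")
```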
Deployment environment: Huawei AI ModelArts
ModelArts is Huawei’s cloud-native AI development and deployment platform for Ascend hardware and MindSpore. It provides a full AI pipeline, similar to AWS’s SageMaker and Google’s Vertex AI.
ModelArts supports:
- Data ingestion, labelling, and preprocessing
- Training job distribution
- Automated model deployment and monitoring
- CI/CD workflows through RESTful APIs or web-based GUI
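For the CI/CD route, jobs can be triggered over HTTP. A hypothetical sketch follows, in which the endpoint path, region, and payload fields are illustrative placeholders rather than the confirmed API schema; Huawei Cloud’s ModelArts API reference has the authoritative details:

```python
import requests

REGION = "cn-north-4"            # illustrative region
PROJECT_ID = "your-project-id"   # placeholder
TOKEN = "your-iam-token"         # obtained from Huawei Cloud IAM

# Endpoint path is an assumption for illustration only.
url = (f"https://modelarts.{REGION}.myhuaweicloud.com"
       f"/v2/{PROJECT_ID}/training-jobs")

payload = {
    "metadata": {"name": "mindspore-train-demo"},           # hypothetical fields
    "algorithm": {"engine": {"engine_name": "MindSpore"}},  # hypothetical fields
}

resp = requests.post(url, json=payload, headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()
print(resp.json())
```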
Transitioning to Huawei’s MindSpore and CANN ecosystem will require some re-skilling for those well-entrenched in CUDA-based tooling.
While Huawei’s ecosystem is maturing, it lacks the open-source community and third-party support enjoyed by PyTorch and TensorFlow, so developers can expect gaps in documentation and some limits on library compatibility.
Hardware access is another consideration. Ascend processors are powerful and highly efficient for AI workloads, but their availability is limited outside the regions where Huawei has invested or is investing. Teams may need to rely on remote access via cloud platforms like ModelArts to run large-scale experiments.
Huawei does, however, provide a comprehensive migration guide and conversion and tuning tools to assist the transition. For teams targeting regions where Huawei infrastructure is readily available, the performance and efficiency gains can be considerable.