The `mlx.core.cuda` module provides access to CUDA-specific functionality for running MLX on NVIDIA GPUs, enabling MLX to run on Linux and Windows systems with NVIDIA hardware.
Overview
The CUDA backend allows MLX to leverage NVIDIA GPUs for computation. When available, it provides:
- High-performance computation on NVIDIA GPUs
- Multi-GPU support via the NCCL distributed backend
- Cross-platform compatibility (Linux, Windows)
Functions
is_available
True if MLX was compiled with CUDA support and a compatible NVIDIA GPU is detected, False otherwise.
Returns:
True if CUDA is available, False otherwise
Usage
Basic CUDA Check
Conditional Code Paths
Distributed Training Setup
Installation
Building MLX with CUDA Support
To use the CUDA backend, MLX must be built with CUDA support.
Requirements
- NVIDIA GPU with compute capability 7.0 or higher
- CUDA Toolkit 11.0 or later
- Compatible NVIDIA drivers
- Linux or Windows operating system
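With the requirements above in place, a source build might look like the following sketch; the `MLX_BUILD_CUDA` CMake flag name is an assumption:

```shell
# Clone MLX and configure a build with the CUDA backend enabled
# (the MLX_BUILD_CUDA flag name is an assumption).
git clone https://github.com/ml-explore/mlx.git
cd mlx
cmake -B build -DMLX_BUILD_CUDA=ON
cmake --build build -j
```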
Verifying Installation
Environment Variables
CUDA_VISIBLE_DEVICES
Control which GPUs are visible to MLX, e.g. `CUDA_VISIBLE_DEVICES=0,1 python train.py` to expose only the first two GPUs.
MLX_DISABLE_CUDA
Disable CUDA even if it is available, e.g. `MLX_DISABLE_CUDA=1 python train.py` to force a non-CUDA backend.
Distributed Training with NCCL
When CUDA is available, MLX can use NCCL for efficient multi-GPU communication.
Comparing Metal and CUDA
| Feature | Metal | CUDA |
|---|---|---|
| Platform | macOS only | Linux, Windows |
| Hardware | Apple Silicon | NVIDIA GPUs |
| Unified Memory | Yes (Apple Silicon) | No |
| Multi-GPU | JACCL (Thunderbolt) | NCCL (NVLink, PCIe) |
| Debugging | Xcode Metal Debugger | NVIDIA Nsight |
| Performance | Optimized for Apple | Optimized for NVIDIA |
Troubleshooting
CUDA Not Detected
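If `is_available()` returns False, checking each layer of the stack in turn usually isolates the problem; this is a diagnostic sketch, and the `mx.cuda` module path follows the documentation above:

```shell
# Driver level: does the system see the GPU at all?
nvidia-smi

# Toolkit level: is a CUDA toolkit on the PATH?
nvcc --version

# MLX level: was this build compiled with CUDA support?
python -c "import mlx.core as mx; print(mx.cuda.is_available())"

# Environment level: make sure CUDA was not disabled or hidden.
echo "MLX_DISABLE_CUDA=$MLX_DISABLE_CUDA CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```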
Performance Issues
Performance Tips
- Use batch processing: Larger batches better utilize GPU parallelism
- Enable cuDNN: Ensure cuDNN is installed for optimized convolutions
- Monitor GPU utilization: Use nvidia-smi to check whether the GPU is fully utilized
- Use NCCL for multi-GPU: Much faster than other distributed backends for NVIDIA GPUs
- Pin memory: Reduce CPU-GPU transfer overhead (future MLX feature)
See Also
- Metal Backend - Metal-specific functionality for Apple devices
- Distributed Communication - Multi-GPU and multi-node training
- NCCL Backend - Distributed backend for CUDA
- Installation Guide - Building MLX with CUDA