Artificial intelligence is no longer limited to servers or the cloud. With ONNX Runtime on Android, you can bring high-performance AI inference directly to mobile devices. Whether you’re building smart camera apps, real-time translation tools, or health monitoring software, ONNX Runtime helps you run models fast and efficiently on Android.
In this guide, we’ll break down everything you need to know about ONNX Runtime on Android — what it is, why it matters, and how to get started with practical code examples.
What is ONNX Runtime?
ONNX Runtime is a cross-platform, high-performance engine for running machine learning models in the Open Neural Network Exchange (ONNX) format. It’s optimized for speed and efficiency, supporting models trained in frameworks like PyTorch, TensorFlow, and scikit-learn.
Why Use ONNX Runtime on Android?
- Speed: Optimized inference using hardware accelerators (like NNAPI).
- Portability: Train your model once, run it anywhere — desktop, cloud, or mobile.
- Flexibility: Supports multiple execution providers, including CPU, GPU, and NNAPI.
- Open Source: ONNX Runtime is backed by Microsoft and a large open-source community.
Setting Up ONNX Runtime on Android
Getting started with ONNX Runtime on Android is simple. Here’s how to set it up step by step.
1. Add ONNX Runtime to Your Android Project
First, update your project’s build.gradle file to include ONNX Runtime dependencies.
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.17.0'
}

Replace 1.17.0 with the latest version available on Maven Central.
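If your project uses the Kotlin DSL (build.gradle.kts) instead, the equivalent declaration is:

dependencies {
    implementation("com.microsoft.onnxruntime:onnxruntime-android:1.17.0")
}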
2. Add the ONNX Model to Assets
Place your .onnx model file in the src/main/assets directory of your Android project. This allows your app to load it at runtime.
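If the model fails to load later, a quick way to confirm it was actually packaged is to list the assets directory at runtime. This is a small sketch that assumes your file is named model.onnx:

import android.util.Log

// Logs whether model.onnx was bundled into the APK's assets.
val bundled = context.assets.list("")?.contains("model.onnx") == true
Log.d("OnnxSetup", "model.onnx bundled: $bundled")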
3. Android Permissions
No special permissions are required just to run inference with ONNX Runtime on Android, unless your app needs access to the camera, storage, or other hardware.
Loading and Running an ONNX Model on Android
Here’s a minimal but complete example of how to load a model and run inference.
import ai.onnxruntime.*
import android.content.Context
import java.nio.FloatBuffer

fun runInference(context: Context, inputData: FloatArray): FloatArray {
    // The environment is a process-wide singleton; fetch it, but don't close it here.
    val ortEnv = OrtEnvironment.getEnvironment()
    val modelBytes = context.assets.open("model.onnx").readBytes()
    // In production, create the session once and reuse it across calls.
    return ortEnv.createSession(modelBytes).use { session ->
        // Shape [1, N]: a single batch of N input values.
        val shape = longArrayOf(1, inputData.size.toLong())
        // A flat FloatArray must be wrapped in a FloatBuffer when an explicit shape is given.
        OnnxTensor.createTensor(ortEnv, FloatBuffer.wrap(inputData), shape).use { inputTensor ->
            val inputName = session.inputNames.iterator().next()
            session.run(mapOf(inputName to inputTensor)).use { results ->
                // Assumes the model's first output is a [1, M] float tensor.
                @Suppress("UNCHECKED_CAST")
                val output = results[0].value as Array<FloatArray>
                output[0]
            }
        }
    }
}

- Create Environment: Initialize the ONNX Runtime environment.
- Load Model: Read the .onnx file from assets.
- Create Session: Set up an inference session.
- Prepare Input Tensor: Wrap input data into an ONNX tensor.
- Run Inference: Call the model with input data and fetch the output.
This is all done locally on the device — no internet connection required.
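To keep the UI responsive, call this function from a background thread. Here's a minimal usage sketch from inside an Activity, using coroutines (it assumes the lifecycle-runtime-ktx dependency and a hypothetical model that takes four float features; adjust the input to match your model):

import android.util.Log
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

// Run inference off the main thread with hypothetical input values.
lifecycleScope.launch(Dispatchers.Default) {
    val output = runInference(applicationContext, floatArrayOf(0.1f, 0.2f, 0.3f, 0.4f))
    Log.d("OnnxDemo", "Model output: ${output.joinToString()}")
}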
Optimizing Performance with NNAPI
ONNX Runtime on Android supports Android’s Neural Networks API (NNAPI), which can accelerate inference using hardware like DSPs, GPUs, or NPUs.
To enable NNAPI:
val sessionOptions = OrtSession.SessionOptions()
// Operators NNAPI can't handle fall back to the default CPU provider.
sessionOptions.addNnapi()
val session = ortEnv.createSession(modelBytes, sessionOptions)

This simple addition can significantly reduce inference time, especially on modern Android devices with dedicated AI hardware.
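The API also accepts NNAPI flags that tune how the provider runs. As a sketch, allowing FP16 precision can speed things up further on supported hardware (verify the available flags against your ONNX Runtime version):

import ai.onnxruntime.providers.NNAPIFlags
import java.util.EnumSet

val sessionOptions = OrtSession.SessionOptions()
// USE_FP16 trades a little numeric precision for faster execution on supported accelerators.
sessionOptions.addNnapi(EnumSet.of(NNAPIFlags.USE_FP16))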
Best Practices for ONNX Runtime on Android
- Quantize Models: Use quantization (e.g., int8) to reduce model size and improve speed.
- Use Async Threads: Run inference off the main thread to keep your UI responsive.
- Profile Performance: Measure inference time using
SystemClock.elapsedRealtime(). - Update Regularly: Keep ONNX Runtime updated for the latest performance improvements.
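A rough profiling sketch looks like this; for steadier numbers, run a few warm-up inferences first and average over several calls:

import android.os.SystemClock
import android.util.Log

// Time a single inference call in milliseconds.
val start = SystemClock.elapsedRealtime()
val output = runInference(context, inputData)
val elapsedMs = SystemClock.elapsedRealtime() - start
Log.d("OnnxPerf", "Inference took $elapsedMs ms")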
Common Use Cases
Here are some practical examples of where ONNX Runtime on Android shines:
- Real-Time Object Detection: Fast image recognition in camera apps.
- Voice Commands: Low-latency speech recognition on-device.
- Health Monitoring: Analyze sensor data in real-time.
- Smart Assistants: Natural language processing without cloud dependency.
Conclusion
ONNX Runtime on Android offers developers a straightforward way to integrate AI inference into mobile apps without sacrificing speed or battery life. With cross-platform compatibility, hardware acceleration, and a simple API, it’s a top choice for running machine learning models on Android.
If you’re serious about building AI-powered apps, ONNX Runtime on Android is your best bet for fast, efficient, and reliable inference.
