I’m really glad to announce the release of the MXNet Scala Package, which brings this flexible and efficient deep learning framework to the JVM.

With the Scala API, you can now integrate MXNet into your JVM stacks. Think about constructing state-of-the-art deep learning models in Scala, Java, and other languages built on the JVM, and applying them to tasks such as image classification and data science challenges. Moreover, writing Scala/Java code for tensor/matrix computation on multiple GPUs becomes seamless and really easy.
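
To give a quick taste of that, here is a minimal sketch of NDArray computation in Scala. The method names are assumed to mirror the Python package, and the commented-out GPU context only works on a CUDA-enabled build:

import ml.dmlc.mxnet._

// create a 2x3 matrix of ones on the CPU (try Context.gpu(0) on a GPU build)
val a = NDArray.ones(2, 3)
// elementwise arithmetic runs through the MXNet engine
val b = a * 2f + 1f
// copy the result back into a plain Scala Array[Float] for inspection
println(b.toArray.mkString(", "))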

This is not Zootopia, but try everything you like!

Build

Check out the Installation Guide for instructions on installing MXNet. Then you can compile the Scala Package with

make scalapkg

Run the unit/integration tests with

make scalatest

Easy, huh? If everything goes well, you will find a jar file named something like mxnet_2.10-osx-x86_64-0.1-SNAPSHOT-full.jar under assembly/target. You can use this jar in your own project.
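
For example, a hypothetical sbt project could pick it up as an unmanaged jar (the path below is simply where the build above puts the artifact; adjust the file name to the one your build produced):

// build.sbt (illustrative only)
unmanagedJars in Compile += file("assembly/target/mxnet_2.10-osx-x86_64-0.1-SNAPSHOT-full.jar")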

Currently we support Linux and OS X with Java 1.6+. We will soon deploy the jars to a Maven repository and work on making the package available on Windows.

Usage

Here is an example of training a 3-layer MLP on MNIST in Scala.

Model Definition

The model definition is straightforward, almost the same as in the Python/R/Julia packages:

import ml.dmlc.mxnet._

// model definition
val data = Symbol.Variable("data")
val fc1 = Symbol.FullyConnected(name = "fc1")(Map("data" -> data, "num_hidden" -> 128))
val act1 = Symbol.Activation(name = "relu1")(Map("data" -> fc1, "act_type" -> "relu"))
val fc2 = Symbol.FullyConnected(name = "fc2")(Map("data" -> act1, "num_hidden" -> 64))
val act2 = Symbol.Activation(name = "relu2")(Map("data" -> fc2, "act_type" -> "relu"))
val fc3 = Symbol.FullyConnected(name = "fc3")(Map("data" -> act2, "num_hidden" -> 10))
val mlp = Symbol.SoftmaxOutput(name = "sm")(Map("data" -> fc3))
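
If you want to double-check the symbol you just built, you can list its learnable arguments (a small sketch; listArguments is assumed to mirror the Python API):

// prints argument names such as fc1_weight, fc1_bias, ..., sm_label
println(mlp.listArguments().mkString(", "))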

You can construct other models in Scala as well. For example, LeNet, a deeper network:

val data = Symbol.Variable("data")
// first conv
val conv1 = Symbol.Convolution()(Map("data" -> data, "kernel" -> "(5, 5)", "num_filter" -> 20))
val tanh1 = Symbol.Activation()(Map("data" -> conv1, "act_type" -> "tanh"))
val pool1 = Symbol.Pooling()(Map("data" -> tanh1, "pool_type" -> "max",
                                 "kernel" -> "(2, 2)", "stride" -> "(2, 2)"))
// second conv
val conv2 = Symbol.Convolution()(Map("data" -> pool1, "kernel" -> "(5, 5)", "num_filter" -> 50))
val tanh2 = Symbol.Activation()(Map("data" -> conv2, "act_type" -> "tanh"))
val pool2 = Symbol.Pooling()(Map("data" -> tanh2, "pool_type" -> "max",
                                 "kernel" -> "(2, 2)", "stride" -> "(2, 2)"))
// first fullc
val flatten = Symbol.Flatten()(Map("data" -> pool2))
val fc1 = Symbol.FullyConnected()(Map("data" -> flatten, "num_hidden" -> 500))
val tanh3 = Symbol.Activation()(Map("data" -> fc1, "act_type" -> "tanh"))
// second fullc
val fc2 = Symbol.FullyConnected()(Map("data" -> tanh3, "num_hidden" -> 10))
// loss
val lenet = Symbol.SoftmaxOutput(name = "softmax")(Map("data" -> fc2))
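
As a quick sanity check before training, you can infer the argument and output shapes of the network from a given input shape. This is a sketch that assumes inferShape accepts a name-to-Shape map, as in the Python package:

// infer shapes for a batch of 100 grayscale 28x28 images
val (argShapes, outShapes, auxShapes) =
  lenet.inferShape(Map("data" -> Shape(100, 1, 28, 28)))
// the softmax output should come out as (100, 10)
println(outShapes)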

Of course, deeper models take more time to train and test.

Dataset

Now load the training data through the IO module. Most of the time you need a training set and a validation set. Suppose you have already downloaded and unpacked the MNIST dataset:

// an example batch size shared by both iterators; tune as you like
val batchSize = 100

val trainDataIter = IO.MNISTIter(Map(
  "image" -> "data/train-images-idx3-ubyte",
  "label" -> "data/train-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0",
  "silent" -> "0",
  "seed" -> "10"))

val valDataIter = IO.MNISTIter(Map(
  "image" -> "data/t10k-images-idx3-ubyte",
  "label" -> "data/t10k-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0", "silent" -> "0"))

Training and Prediction

Here we go! Choose proper hyperparameters and use the builder API to construct and train a model:

import ml.dmlc.mxnet.optimizer.SGD
// setup model and fit the training set
val model = FeedForward.newBuilder(mlp)
      .setContext(Context.cpu())
      .setNumEpoch(10)
      .setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))
      .setTrainData(trainDataIter)
      .setEvalData(valDataIter)
      .build()

Training shouldn’t take too long for ‘shallow’ models like this MLP. You can then run prediction with the trained model:

val probArrays = model.predict(valDataIter)
// in this case, we do not have multiple outputs
require(probArrays.length == 1)
val prob = probArrays(0)
// get predicted labels
val py = NDArray.argmaxChannel(prob)
// deal with predicted labels py
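
For example, you could compare py against the ground-truth labels to get a validation accuracy. This sketch re-reads the labels from the validation iterator; the zip simply truncates if the prediction and label counts differ slightly due to batching:

// collect ground-truth labels by walking the validation iterator again
valDataIter.reset()
val labels = scala.collection.mutable.ArrayBuffer.empty[Float]
while (valDataIter.hasNext) {
  labels ++= valDataIter.next().label(0).toArray
}
// compare predictions with labels element by element
val acc = py.toArray.zip(labels).count { case (p, l) => p == l }.toFloat / labels.length
println(s"validation accuracy: $acc")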

Here is a set of MNIST training benchmarks we ran on an EC2 g2.8xlarge instance with the Scala package:

Model | Devices | KVStore         | Epochs | Train Acc | Test Acc | Avg samples/sec
MLP   | 1 cpu   | local           | 10     | 0.9906    | 0.9740   | 7419.98
MLP   | 1 gpu   | local           | 10     | 0.9925    | 0.9770   | 18572.50
LeNet | 1 cpu   | local           | 3      | 0.9916    | 0.9870   | 717.38
LeNet | 1 gpu   | local           | 3      | 0.9912    | 0.9867   | 1457.29
LeNet | 4 cpus  | local           | 10     | 0.9997    | 0.9920   | 1338.40
LeNet | 4 gpus  | local           | 10     | 0.9998    | 0.9909   | 7882.14
LeNet | 4 gpus  | local_allreduce | 10     | 0.9998    | 0.9914   | 13696.00
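
The multi-device rows correspond to handing the builder several contexts instead of one. A hedged sketch of a 4-GPU run (this assumes setContext also accepts an array of contexts; the KVStore type is left at its default here):

val multiGpuModel = FeedForward.newBuilder(lenet)
      .setContext(Array(Context.gpu(0), Context.gpu(1), Context.gpu(2), Context.gpu(3)))
      .setNumEpoch(10)
      .setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))
      .setTrainData(trainDataIter)
      .setEvalData(valDataIter)
      .build()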

If you run into any problems, feel free to open an issue. We are also looking for contributors to help us further improve the MXNet Scala Package. Any pull request will be highly appreciated!