
Using custom Docker containers

By default, algorithms in a datakit run in the python-run-base Docker container, which provides commonly used libraries such as numpy and pandas. If you need a library not included in this default container, you can easily create a custom Docker container to run your algorithm.

Creating a custom Docker container

Let’s say you need to use the scikit-learn library, which isn’t included in python-run-base. To do this, create a custom container definition inside the helloworld algorithm configuration directory:

Terminal window
mkdir helloworld/container
touch helloworld/container/Dockerfile

Inside the Dockerfile, write the following container definition:

helloworld/container/Dockerfile
FROM opendatastudio/python-run-base:latest
RUN pip install scikit-learn

This container definition uses python-run-base as the base image and installs scikit-learn on top.

Building your container

Now you can build the custom Docker container by running:

Terminal window
docker build -t mycontainer helloworld/container

This command builds your new container image and tags it mycontainer.
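
To confirm the build before wiring it into the datakit, one option is to import the new library inside the image. The sketch below is illustrative and not part of the datakit tooling: verify_command and verify_image are hypothetical helpers, and it assumes the image exposes a python executable and has no entrypoint that overrides the command passed to docker run.

```python
import subprocess


def verify_command(image, module):
    """Build a `docker run` command that imports `module` inside `image`."""
    # --rm removes the temporary container once the check has finished.
    return [
        "docker", "run", "--rm", image,
        "python", "-c", f"import {module}; print({module}.__version__)",
    ]


def verify_image(image="mycontainer", module="sklearn"):
    """Run the check and return the version string printed by the container.

    Raises subprocess.CalledProcessError if the import fails in the image.
    Assumption: `python` is on the PATH inside the image.
    """
    result = subprocess.run(
        verify_command(image, module),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Calling verify_image() on a machine with Docker available should return the scikit-learn version if the build succeeded.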

Updating the algorithm configuration

To use this custom container, edit your algorithm’s algorithm.json configuration file so that its container field points to the new container:

helloworld/algorithm.json
{
  "name": "helloworld",
  ...
  "container": "mycontainer",
  "signature": {
    "inputs": [
      ...
    ],
    "outputs": [
      ...
    ]
  }
}

This tells the datakit to use mycontainer when executing the helloworld algorithm.
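
A quick way to double-check the edit is to read the field back from the file. This is a minimal sketch, assuming algorithm.json is plain JSON with a top-level container key as shown above; configured_container is a hypothetical helper, not a datakit function.

```python
import json
from pathlib import Path


def configured_container(config_path):
    """Return the value of the "container" field, or None if it is absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("container")
```

For example, configured_container("helloworld/algorithm.json") should return "mycontainer" after the edit above.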

Importing our new dependency

Next, confirm that the custom container is working by importing scikit-learn in your algorithm. Open helloworld/algorithm.py and add the import statement:

helloworld/algorithm.py
import sklearn


def main(data, function):
    """Take the average along the long axis with the specified function"""
    print(sklearn.__version__)
    return {
        "result": getattr(data, function)(axis=1).to_frame(name="result"),
    }
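
If an algorithm depends on several extra libraries, it can help to report which ones are missing from the container instead of failing on the first import. The following is a sketch using only the Python standard library; missing_modules is a hypothetical helper name and is not part of the datakit.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(["sklearn"]) inside the custom container should return an empty list once scikit-learn is installed.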

Running the algorithm

Finally, run your algorithm using the following commands:

Terminal window
dk reset # Reset to ensure we clear any old run configurations
dk init
dk load data data/tabulardata.csv # Our algorithm will fail if we forget to load some input data
dk run

If everything is set up correctly, your algorithm should print the version of scikit-learn that you just installed in your custom container. Congrats! You’ve successfully executed your algorithm in a custom environment.