Using custom Docker containers
By default, algorithms in a datakit run in the python-run-base Docker container,
which provides commonly used libraries such as numpy
and pandas
. If you need
a library not included in this default container, you can easily create a custom
Docker container to run your algorithm.
Creating a custom Docker container
Let’s say you need to use the scikit-learn library, which isn’t included in
python-run-base. To do this, you can create a custom container within the
helloworld
algorithm configuration directory:
mkdir helloworld/containertouch helloworld/container/Dockerfile
Inside the Dockerfile, write the following container definition:
FROM opendatastudio/python-run-base:latest
RUN pip install scikit-learn
This container definition uses python-run-base
as the base image and installs
scikit-learn
on top.
Building your container
Now you can build the custom Docker container by running:
docker build -t mycontainer.
This command creates your new container with the tag mycontainer
.
Updating the algorithm configuration
To use this custom container, you’ll need to specify it in your algorithm’s configuration file:
Edit the algorithm.json file to point to your new container:
{ "name": "helloworld", ... "container": "mycontainer", "signature": { "inputs": [ ... ], "outputs": [ ... ] }}
This tells the datakit to use mycontainer
when executing the helloworld
algorithm.
Importing our new dependency
Next, confirm that the custom container is working by importing scikit-learn
in your algorithm. Open helloworld/algorithm.py
and add the import statement:
import sklearn
def main(data, function): """Take the average along the long axis with the specified function""" print(sklearn.__version__) return { "result": getattr(data, function)(axis=1).to_frame(name="result"), }
Running the algorithm
Finally, run your algorithm using the following commands:
dk reset # Reset to ensure we clear any old run configurationsdk initdk load data data/tabulardata.csv # Our algorithm will fail if we forget to load some input datadk run
If everything is set up correctly, your algorithm should print the version of
scikit-learn
that you just installed in your custom container. Congrats -
you’ve successfully executed your algorithm in a custom environment.