Tracking with Git

This guide explains how to track datakit runs in a version control repository using Git.

One of the key advantages of using datakits is the ability to version control your entire analysis process. By tracking changes to your datakit with Git, every step of your analysis becomes reproducible and shareable, so anyone with access to your repository can replicate your work.

This is a brief introduction to tracking datakit changes, aimed at readers who are new to Git.

Installing Git

Before getting started, you’ll need to install Git. Follow the official Git installation instructions for your operating system.

Initialising your repository

Once Git is installed, you can initialise your datakit as a Git repository:

Terminal window
cd helloworld-datakit # Navigate to your datakit root folder
git init # Initialise a Git repository

Now check the status of your repository:

Terminal window
git status

You should see a list of “untracked files” - these are files Git is not yet monitoring for changes.

Let’s add and commit all files to start tracking them:

Terminal window
git add --all
git commit -m "Initial commit"

Now, all files in your datakit are tracked. You can revert to this state at any time if needed.
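
To illustrate what returning to the initial commit can look like, here is a minimal sketch that runs in a throwaway directory, so it does not touch your real datakit. The contents of `datakit.json`, the commit messages, and the demo identity are placeholders for illustration only and do not reflect the real datakit file format.

```shell
# Work in a throwaway directory so the real datakit is untouched
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Placeholder identity, set for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# Stand-in for a datakit file (contents are illustrative, not the real format)
echo '{"runs": []}' > datakit.json
git add --all
git commit -q -m "Initial commit"
initial_commit=$(git rev-parse HEAD)

# Simulate a run modifying the file, then commit the change
echo '{"runs": ["example"]}' > datakit.json
git commit -q -am "Example run"

# Bring the file back to its state in the initial commit
git checkout "$initial_commit" -- datakit.json
restored=$(cat datakit.json)
echo "$restored"
```

The restore step only changes the working copy and staging area. The later commit stays in the history, so nothing is lost by looking at an earlier state.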

Tracking changes

After running an analysis in your datakit, some files will be modified.

For example, if you run the following commands:

Terminal window
dk init
dk load data data/tabulardata.csv
dk run

Git will detect changes in your repository. You can check this by running:

Terminal window
git status

You might see output like this:

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   datakit.json

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        multipleruns.run/

no changes added to commit (use "git add" and/or "git commit -a")

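Git has noticed the changes, but they haven’t been committed yet. To save them, stage everything, including the new untracked run directory, and then commit:

Terminal window
git add --all # Stage modified files and new, untracked files
git commit -m "Put a description of your run here"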

It’s important to commit after each run you want to preserve. Results that are not committed may be overwritten by subsequent runs.
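
One way to make committing after each run a habit is to wrap the two Git steps in a small shell function. The sketch below is an assumption rather than part of datakit: `commit_run` is a hypothetical helper name, and the demo runs in a throwaway directory with placeholder files. It uses `git add --all` so that newly created run directories are included. The `-a` flag of `git commit` only stages files Git already tracks, so it would leave a new run directory uncommitted.

```shell
# Hypothetical helper (not part of datakit): stage everything, then commit
commit_run() {
    git add --all && git commit -q -m "$1"
}

# Demo in a throwaway repository with placeholder files
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Placeholder identity, set for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"
echo '{"runs": []}' > datakit.json
commit_run "Initial commit"

# Simulate a run: one tracked file changes and one new directory appears
echo '{"runs": ["example"]}' > datakit.json
mkdir example.run
echo "placeholder output" > example.run/output.txt
commit_run "Example run with placeholder data"

# Nothing should be left uncommitted, and the new directory is now tracked
remaining=$(git status --porcelain)
tracked=$(git ls-files)
echo "$tracked"
```

In a real datakit you would define the function once, for example in your shell profile, and call it from the datakit root folder after each run you want to keep.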

Publishing to GitHub

To share your datakit or make it available publicly, you can upload your repository to GitHub.

First, create a new repository on GitHub and copy its URL.

Now, link your local repository to the remote GitHub repository:

Terminal window
git remote add origin https://github.com/your-account/your-repository.git

Push your changes to GitHub:

Terminal window
git push origin main

Your datakit is now published to GitHub and can be accessed by others if your repository is set to public.
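
If the push fails because your local branch is not called `main` (older Git versions may name the first branch `master`), checking and renaming the branch first can help. The following is a minimal sketch under stated assumptions: a local bare repository stands in for GitHub so the commands can be tried offline, and the directory names, file contents, and identity are placeholders. With a real GitHub repository, the URL passed to `git remote add origin` would be the one you copied from GitHub.

```shell
# A local bare repository stands in for the GitHub remote in this demo
work_dir=$(mktemp -d)
git init -q --bare "$work_dir/remote.git"

# Placeholder datakit folder with a single committed file
mkdir "$work_dir/datakit"
cd "$work_dir/datakit"
git init -q
git config user.name "Demo User"          # placeholder identity, demo only
git config user.email "demo@example.com"
echo "placeholder" > datakit.json
git add --all
git commit -q -m "Initial commit"

# Show the current branch name, then rename it to main
# (renaming has no effect if the branch is already called main)
git rev-parse --abbrev-ref HEAD
git branch -M main

# Link the remote and push; -u records origin/main as the upstream branch
git remote add origin "$work_dir/remote.git"
git push -q -u origin main

# Confirm the remote now has a main branch
pushed=$(git ls-remote --heads origin main)
echo "$pushed"
```

Because `-u` sets the upstream branch, later pushes from the same branch can use a plain `git push`.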