Tracking with Git
This guide explains how to track datakit runs in a version control repository using Git.
One of the key advantages of using datakits is that you can version control your entire analysis process. When you track a datakit's changes with Git, every step of your analysis becomes reproducible and shareable, so anyone with access to your repository can replicate your work.
This is a brief introduction for readers who are new to Git and want to start tracking datakit changes.
Installing Git
Before getting started, you’ll need to install Git. Follow the instructions here for your operating system.
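Once Git is installed, it is worth confirming that it runs and telling it who you are, because Git labels every commit with a name and email and may refuse to commit until they are set. The following is a minimal sketch; the name and email are placeholders to replace with your own.

```shell
# Check that Git is installed and on your PATH
git --version

# Set the identity used to label your commits.
# These are placeholder values: replace them with your own.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Print the identity Git will now use
git config --global --get user.name
git config --global --get user.email
```

Drop --global if you only want to set the identity for the current repository.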
Initialising your repository
Once Git is installed, you can initialise your datakit as a Git repository:
cd helloworld-datakit   # Navigate to your datakit root folder
git init                # Initialise a Git repository
Now check the status of your repository:
git status
You should see a list of “untracked files”: files that Git is not yet monitoring for changes.
Let’s add and commit all files to start tracking them:
git add --all
git commit -m "Initial commit"
Now, all files in your datakit are tracked. You can revert to this state at any time if needed.
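To see what "reverting" looks like in practice, here is a small sketch that runs in a throwaway temporary folder so your real datakit is untouched. The identity and the contents of datakit.json are placeholders for illustration; it lists the commit history and then discards an uncommitted change.

```shell
# Throwaway demo folder; identity and file contents are placeholders
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit an initial state, as in the steps above
echo '{"runs": 0}' > datakit.json
git add --all
git commit -q -m "Initial commit"

# Simulate a run that overwrites a tracked file
echo '{"runs": 1}' > datakit.json

# List commits, newest first, one per line
git log --oneline

# Discard uncommitted changes, returning tracked files to the last commit
git restore .
cat datakit.json   # prints {"runs": 0}
```

git restore only resets files Git already tracks; new untracked files or folders stay until you remove them (git clean -n previews what git clean -fd would delete). To return to an older commit instead of the latest one, pass its hash from git log, as in git restore --source=<hash> .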
Tracking changes
After running an analysis in your datakit, some files will be modified.
For example, if you run the following commands:
dk init
dk load data data/tabulardata.csv
dk run
Git will detect changes in your repository. You can check this by running:
git status
You might see output like this:
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   datakit.json

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        multipleruns.run/

no changes added to commit (use "git add" and/or "git commit -a")
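Before committing, you may want to see exactly what a run changed. The sketch below recreates a similar state in a throwaway folder. The file contents and multipleruns.run/results.csv are placeholders, not what dk actually writes. git status --short gives a compact summary, and git diff shows line-by-line changes to tracked files.

```shell
# Throwaway demo folder; identity and file contents are placeholders
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo '{"runs": 0}' > datakit.json
git add --all
git commit -q -m "Initial commit"

# Simulate a run: modify a tracked file and create a new folder
echo '{"runs": 1}' > datakit.json
mkdir multipleruns.run
echo "x,y" > multipleruns.run/results.csv

# Compact summary: " M" marks modified files, "??" marks untracked ones
git status --short

# Line-by-line changes to a tracked file
git diff datakit.json
```

git diff does not show untracked files, so use git status to spot new folders a run has created.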
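Git has noticed the changes, but they have not been committed yet. To save everything the run produced, including new untracked folders such as multipleruns.run/, stage all changes and then commit (git commit -a on its own only picks up files Git already tracks):
git add --all
git commit -m "Put a description of your run here"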
Commit after each significant run you want to preserve. Changes that have not been committed are not recorded in Git's history, so a later run that overwrites the same files will replace them and Git cannot bring them back.
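If you commit after every run, a small shell function can save some typing. The sketch below defines commit_run, a hypothetical helper (not part of dk or Git) that stages everything, including new folders, and commits with your description, skipping the commit when nothing has changed. It is demonstrated in a throwaway folder with placeholder identity and file contents.

```shell
# commit_run is a hypothetical helper, not part of dk or Git
commit_run() {
  # Nothing changed since the last commit: skip it
  if [ -z "$(git status --porcelain)" ]; then
    echo "Nothing to commit"
    return 0
  fi
  # Stage modified, deleted and new untracked files, then commit
  git add --all
  git commit -q -m "$1"
}

# Throwaway demo folder; identity and file contents are placeholders
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo '{"runs": 1}' > datakit.json
commit_run "Run 1: baseline"
commit_run "Run 1 again"   # prints "Nothing to commit"
git log --oneline
```

In your own datakit folder you might then run dk run followed by commit_run "Describe this run".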
Publishing to GitHub
To share your datakit or make it available publicly, you can upload your repository to GitHub.
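First, create a new, empty repository on GitHub and copy its URL. Leave the options to add a README, .gitignore or licence unticked, otherwise GitHub creates a first commit and your initial push will be rejected.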
Now, link your local repository to the remote GitHub repository:
git remote add origin https://github.com/your-account/your-repository.git
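Before pushing, it can help to confirm that the remote was added and that your branch is actually called main. Depending on your Git version and settings, git init may have named it master, and pushing main would then fail. This sketch runs in a throwaway folder and reuses the placeholder URL from the step above.

```shell
# Throwaway demo folder; identity, file and URL are placeholders
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo '{}' > datakit.json
git add --all
git commit -q -m "Initial commit"
git remote add origin https://github.com/your-account/your-repository.git

# List remotes with their fetch and push URLs
git remote -v

# Show the current branch name (may be "master" on some setups)
git branch --show-current

# Rename the current branch to main, the branch name this guide pushes
git branch -M main
```

If git remote -v shows the wrong URL, git remote set-url origin <new-url> replaces it.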
Push your commits to GitHub. The -u option records origin/main as the upstream branch, so later pushes only need git push:
git push -u origin main
Your datakit is now published to GitHub and can be accessed by others if your repository is set to public.