I have been struggling to learn how to make effective submissions to Kaggle. One thing holding me back was that I had to completely rebuild my Python environment whenever I moved between my laptop and my desktop at home. Another problem was limited RAM and generally low-powered hardware. I often thought about loading the Kaggle data into a database, but I would struggle with installing Postgres or MySQL, and then discover that the SQL dialects of the two had little in common. I was also looking for a project that would teach me more about Docker, so why not use Docker to set up a consistent environment? After many trials and missteps, and much searching, I found a Docker image that seemed a good starting point: wiseio data science docker. After quite a number of rebuilds and tests, I came up with this Dockerfile:
Next, I needed to make the whole environment reproducible and choose a container for my database experiments. I settled on the alpine version of the standard Postgres image, both for the excellent open-source reputation of Postgres and for its unparalleled ability to frustrate me whenever I attempt to write SQL for it. To tie the pieces together I wrote a docker-compose.yml:
Park the docker-compose.yml file in a directory on a disk with Kaggle-sized amounts of free space, create the directories ./data and ./pgdata next to it, and you should be good to go.
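The setup steps above can be sketched as a few shell commands. This is a minimal sketch and not a required procedure: it assumes you run it from the directory that holds (or will hold) docker-compose.yml, and that the older standalone `docker-compose` command is installed (newer Docker installs spell it `docker compose`).

```shell
# Create the two directories the compose file is expected to mount.
# -p makes this safe to re-run if they already exist.
mkdir -p data pgdata

# Show how much free space this disk has before downloading competition data.
df -h .

# Start the containers in the background, but only once the compose file
# is actually in place (assumption: it sits in the current directory).
if [ -f docker-compose.yml ]; then
    docker-compose up -d
else
    echo "docker-compose.yml not found here; copy it in, then run: docker-compose up -d"
fi
```

Keeping ./pgdata on the host means the database files survive when the container is removed and recreated.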
A handy bit to know for creating databases in the Postgres image, and checking that data has actually been saved:
$ docker exec -it yourdir_db_1 sh
/ # psql -U postgres
postgres=# CREATE DATABASE demo;
.... and so on for all your favorite SQL
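To check that data has actually been saved without opening an interactive session, something like the following may help. It is only a sketch: the container name `yourdir_db_1` is the placeholder used above (newer Compose versions name it `yourdir-db-1`, so confirm the real name with `docker ps`), and the `demo` database is the one created in the session above.

```shell
# Placeholder container name; override with CONTAINER=... or check `docker ps`.
CONTAINER="${CONTAINER:-yourdir_db_1}"

# Only query the database if Docker is available and the container is running.
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$CONTAINER"; then
    # List all databases; "demo" should appear if CREATE DATABASE succeeded.
    docker exec "$CONTAINER" psql -U postgres -c '\l'
    # List the tables in the demo database (empty until you create some).
    docker exec "$CONTAINER" psql -U postgres -d demo -c '\dt'
else
    echo "container $CONTAINER is not running; check the name with: docker ps"
fi
```

Running the same check again after restarting the containers is a quick way to confirm that the database really lives in the mounted directory rather than inside the container.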