
Creating dev environments in 20 seconds with some Fabio and Mongo Atlas
August 01, 2021
In a previous post, we wrote about our way of providing full-featured, isolated development environments in just several minutes. The gist was that we provided a CloudFormation template to provision an environment via Travis on commit, and then ran an Ansible playbook to provision the instance. ALB host-based routing was our way of providing an endpoint for those environments.
That was.. nice, but flawed. There's no reason to go into detail, since the solution was specific to our system. Suffice it to say, we wanted to achieve even faster, more dynamic provisioning of development environments, scheduled by a single provisioning mechanism, to reduce complexity.
Since the workflow itself is what makes our new solution flexible, fast, and very, very simple to maintain, we will focus on it, though some examples are provided along the way.
Several changes we made to our infrastructure enabled the improvement:
- We moved from our own, self-managed Mongo cluster to Mongo Atlas.
- We implemented full service-discovery (Consul), container scheduling (Nomad) and dynamic load-balancing (Fabio).
- We wrote a well-structured CLI-based workflow to aid us.
- We started building our container images on CircleCI and pushing them to ECR.
The Workflow
The Build Process
We build our container images on CircleCI and push them directly to ECR. We have a semi-generic `.circleci/config.yml` for all services:

```yaml
version: 2
jobs:
  build:
    docker:
      - image: circleci/python:latest
    environment:
      - AWS_DEFAULT_REGION: us-something-1
      - SERVICE_NAME: my-service
      - SERVICE_REPO: XXXX.dkr.ecr.eu-west-1.amazonaws.com/my-service
    steps:
      - checkout
      - setup_remote_docker
      - run: docker build -t ${SERVICE_NAME} .
      - deploy:
          command: |
            sudo pip install awscli
            `aws ecr get-login --no-include-email`
            # Extract the proper docker tag (branch or git tag)
            DOCKER_TAG="tag-missing"
            if [ -z "$CIRCLE_TAG" ]; then
              echo "Taking tag from CIRCLE_BRANCH: ${CIRCLE_BRANCH}"
              DOCKER_TAG="${CIRCLE_BRANCH}"
            else
              echo "Taking tag from CIRCLE_TAG: ${CIRCLE_TAG}"
              DOCKER_TAG="${CIRCLE_TAG}"
            fi
            DOCKER_FULL_TAG=${SERVICE_REPO}:ref-${DOCKER_TAG}
            echo Tagging image ${DOCKER_FULL_TAG}
            docker tag ${SERVICE_NAME} ${DOCKER_FULL_TAG}
            docker push ${DOCKER_FULL_TAG}
...
```
Every push creates a Docker image tag (or updates it). If a branch is pushed, a tag with the name of the branch is created, so branches can also be deployed by providing them as `SERVICE_VERSION` parameters. (The `ref-` prefix isn't that interesting, except that it's required to apply retention policies on ECR images.)
The deployment-description workflow we now use is as follows:

Deploying a new environment
- We run `strigo dev deploy ENV_NAME SERVICE1_NAME=SERVICE1_VERSION ...` (defaulting to master for all services if versions are not provided, which is how we create our master dev env).
- The CLI then ssh-copies a consolidated, templated job to a random Nomad server and runs the job (a minimal sketch of this step follows the list). The `ENV_NAME` is used both as the job's name and to create a database under the same name in a dev-specific Mongo Atlas cluster. (We have a simple ssh-tunneling context manager implementation, and we've thought about using it to deploy the job directly via Nomad's API, which is something we might do in the future. That would allow us to make blocking CLI calls to the Nomad API and also verify that the job succeeded.)
- The `ENV_NAME` is also used as a variable in the job template so that the different applications can use it. Versions are also propagated as variables into the jobs, and correspond to container tags kept in ECR.
- The job runs, and the different containers are spread across a pre-created application-server cluster. Since utilization is very low, we can run many jobs in a relatively small cluster (~10-20 concurrent envs on ~4 medium instances).
- Nomad registers the services in Consul, and adds a tag that Fabio can later read. Nomad will add a Consul tag for each service answering at the required endpoint, e.g. `urlprefix-ENV_NAME-SERVICE_NAME.dev-env.strigo.io`.
- Since all services in Consul now contain that tag, Fabio will read them and start load-balancing traffic between them.
- The ALB, instead of load-balancing between application servers, can now load-balance between Fabio servers based on the host header. The only thing that changes in the endpoint URLs is the `ENV_NAME`.
- The env is then available at `https://ENV_NAME.dev-env.strigo.io`.
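To make the ssh-copy step concrete, here's a minimal sketch of what it might look like; the server list, user, and paths are placeholders, and this is an illustration rather than our actual CLI code:

```python
# Hypothetical sketch of the deploy flow, not our actual CLI code:
# render the consolidated job template, copy it to a random Nomad
# server over SSH, and run it there.
import random
import subprocess

from jinja2 import Environment, FileSystemLoader

# In practice the server list would come from service discovery; placeholders here.
NOMAD_SERVERS = ['172.16.39.129', '172.16.39.130']

def deploy(env_name, versions):
    """Render dev-env.nomad with ENV_NAME and service versions, then run it."""
    template = Environment(loader=FileSystemLoader('templates')).get_template('dev-env.nomad')
    rendered = template.render(ENV_NAME=env_name, **versions)

    server = random.choice(NOMAD_SERVERS)
    job_path = '/tmp/{0}.nomad'.format(env_name)

    with open(job_path, 'w') as f:
        f.write(rendered)

    # ssh-copy the rendered job and run it; assumes key-based access to the server.
    subprocess.check_call(['scp', job_path, 'ubuntu@{0}:{1}'.format(server, job_path)])
    subprocess.check_call(['ssh', 'ubuntu@{0}'.format(server), 'nomad run {0}'.format(job_path)])
```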
Our `dev-env.nomad` job template looks something like this:

```hcl
job "{{ ENV_NAME }}" {
  datacenters = ["dc1"]
  type = "service"

  {% include 'dev-SERVICE1_NAME.nomad' %}
  {% include 'dev-SERVICE2_NAME.nomad' %}
  ...
}
```
Each service group is `include`d in the template, so it's very easy to add a new service. Ideally, we'd include our production service jobs, but.. you know, reality. Also, if for some reason you want to run a partial environment, all you have to do is remove the relevant `include`, and that service will not be deployed. This allows us to create dynamic deployment logic and lets the CLI abstract the way services are chosen for specific environments. So, for instance, `strigo dev deploy ENV_NAME --core` can deploy only core services as part of the env, and so on.
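One hedged way to implement that selection (not necessarily how our CLI does it) is to pass the chosen service list into the template and loop over it:

```python
# Hypothetical sketch: pick which per-service include files get rendered,
# so `strigo dev deploy ENV_NAME --core` deploys only core services.
from jinja2 import Environment, FileSystemLoader

CORE_SERVICES = ['api', 'worker']        # assumed service names
EXTRA_SERVICES = ['reports', 'emails']   # assumed service names

def render_job(env_name, core_only=False):
    services = CORE_SERVICES if core_only else CORE_SERVICES + EXTRA_SERVICES
    env = Environment(loader=FileSystemLoader('templates'))
    # dev-env.nomad would then loop instead of hardcoding the includes:
    #   {% for service in services %}{% include 'dev-' ~ service ~ '.nomad' %}{% endfor %}
    return env.get_template('dev-env.nomad').render(ENV_NAME=env_name, services=services)
```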
Listing Environments
Since environments are Nomad jobs, we can easily list them by returning the output of `nomad status` on our Nomad dev cluster.
```
$ strigo dev list
Choosing nomad server...
Using 172.16.39.129
Listing environments...
Retrieving job status...
run: nomad status
out: ID      Type     Status   Submit Date
out: feat1   service  running  06/17/18 14:34:43 UTC
out: master  service  running  06/18/18 11:34:05 UTC
out: feat2   service  running  06/17/18 20:34:55 UTC
out: hotfix  service  running  06/03/18 17:53:33 UTC
...
```
Updating Environments
All we need to do to update an environment is run `strigo dev deploy ENV_NAME ...` again. Database creation is idempotent, so the env will use the database created when we first created the environment.
Tip: We add an environment variable called `DUMMY_VAR` to each Nomad service job, with a current-timestamp value generated by the CLI and passed into it. This is done so that Nomad will identify that there's a change to the job, and will re-download the image and redeploy it.
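In the template, that boils down to something like this (a sketch; the timestamp variable name is hypothetical):

```hcl
# Hypothetical task env stanza: the CLI renders a fresh timestamp on every
# deploy, so the job spec always differs from the running one and Nomad
# redeploys even when nothing else changed.
env {
  ENV_NAME  = "{{ ENV_NAME }}"
  DUMMY_VAR = "{{ DEPLOY_TIMESTAMP }}"
}
```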
Destroying Environments
Since a development environment is, in its entirety, a Nomad job and a database, all we do is run `strigo dev stop ENV_NAME`. This runs a `nomad stop ENV_NAME` and drops the database from Mongo Atlas, obliterating the environment completely! Fun.
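A minimal sketch of that step, assuming a placeholder connection string and ssh access to a Nomad server:

```python
# Hypothetical sketch of `strigo dev stop`: stop the Nomad job, then drop
# the env's database. The URI and user are placeholders, not real values.
import subprocess

from pymongo import MongoClient

ATLAS_DEV_URI = 'mongodb+srv://user:pass@dev-cluster.example.mongodb.net'

def stop(env_name, nomad_server):
    # The job and the database share the env's name, so both are one-liners.
    subprocess.check_call(['ssh', 'ubuntu@{0}'.format(nomad_server),
                           'nomad stop {0}'.format(env_name)])
    MongoClient(ATLAS_DEV_URI).drop_database(env_name)
```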
Working with a DBaaS
Working with Mongo Atlas provides us with the ability to duplicate our main dev database using a simple API call. Since Mongo Atlas also provides a great cluster-monitoring interface (built into its web interface), we can quite easily monitor how our cluster behaves in dev (obviously, `dev != prod`, but at least we can see on some level how relative changes to our code change the database's behavior).
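The duplication itself depends on your tooling; as one hedged example, plain pymongo (rather than a DBaaS-specific API) can copy a small dev database, though it loads each collection into memory, so it's only fine for modest datasets:

```python
# Sketch: copy every collection of the master dev database into a new
# env database on the same cluster. Database names are illustrative.
from pymongo import MongoClient

def duplicate_database(client, source='master', target='feat1'):
    src, dst = client[source], client[target]
    for name in src.list_collection_names():
        docs = list(src[name].find())
        if docs:
            dst[name].insert_many(docs)
```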
Win
This setup is a ~replica of our production environment. Grafana and Prometheus, Telegraf, Redash, etc. all run in the dev environment and are as easily upgradable as any other service in the system. While it takes time to set this entire thing up, the benefits we reap are immeasurable. Running any development environment takes more or less 20s (thanks in part to the fact that downloading even large Docker images from ECR to an EC2 instance in the same region takes very little time; you can also optimize with Alpine-based images to reduce image size by a large margin, but please, only do that if you know what you're doing).
Providing variables via a CLI allows us to be as dynamic as we need in templating our job, so even specific service versions can be passed when we wish to create an environment containing any set of services. We can also pass a container instance count via the CLI for specific services, so running a very-large-cluster test for a specific service (or a set of services) requires little to no work.
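In the template, that count can simply default to one when the CLI doesn't pass it (a sketch; the variable name is hypothetical):

```hcl
# Hypothetical: SERVICE1_COUNT is passed by the CLI for large-cluster
# tests; otherwise the group runs a single instance.
group "my-service" {
  count = {{ SERVICE1_COUNT | default(1) }}
  ...
}
```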
An additional helper in scaling and managing these envs (not directly related to this flow) is the fact that all that's required to add more instances to our Nomad app-server cluster is changing an `instance_count` variable in Terraform and `tf apply`-ing. More on that in a future post :)
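A minimal Terraform sketch of that knob (resource names, AMI, and instance type are placeholders):

```hcl
# Hypothetical fragment: scale the Nomad app-server cluster by bumping
# instance_count and running `terraform apply`.
variable "instance_count" {
  default = 4
}

resource "aws_instance" "nomad_app_server" {
  count         = "${var.instance_count}"
  ami           = "ami-xxxxxxxx"
  instance_type = "t2.medium"

  tags = {
    Name = "nomad-app-server-${count.index}"
  }
}
```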
If you have any feedback on this workflow, we’d appreciate comments. Additionally, it would be great to hear about the workflows you use to manage your development environments, for the benefit of all.