DevOps: Sprint 5 (January 27-February 9, 2021)

Note: We did not record a formal sprint review because the work accomplished does not translate well to a demo.

What was the goal of Sprint 5?
Sprint 5 was scheduled as a discrete sprint while we were off work cycle. We wanted to bring the dev and ops folks together across the campuses for knowledge sharing and high-level discussion of project-wide operational and process issues.

We were aware of some looming production stability issues in the campuses’ Kubernetes deployments, and wanted a shared understanding of how to tackle them as a project. This was less about fixing any one issue and more about building some background for cross-campus handling of operational issues.

What is the milestone that this sprint is supporting?
Since this sprint was scheduled outside the normal work cycles, we didn’t attach it to any high-level milestone or project. We wanted this to be a free-standing work period. The GitLab milestone that tracked the work is https://gitlab.com/surfliner/surfliner/-/milestones/58

Accomplishments of Sprint 5:

  • Resolved Kubernetes node affinity and Persistent Volume Claim issues in the Starlight Helm chart by:
    - Moving sitemap hosting to an S3-compatible bucket for the Starlight Helm chart
    - Moving image hosting to an S3-compatible bucket for the Starlight Helm chart
    - This involved getting some Surfliner-level object storage infrastructure in place. At least for now, this is targeted at development (docker-compose) and review/test/staging applications (helm). We’re unsure whether this will ever advance to a production-ready service (probably not), but there’s a lot of value in some shared tooling around object storage.
  • Reviewed and upgraded our GitLab Runner deployments as a group, and agreed to maintain and monitor them going forward.
  • UCSD and UCSB admins met to review UCSD’s Kubernetes cluster and management process. We hope this will be the first of ongoing meetings like this to learn from each other and find opportunities to push work into upstream Surfliner whenever possible.
  • Identified and documented issues with our dependency tooling (RenovateBot), which was quietly disabled on GitLab in the late fall/early winter.
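
As a rough illustration of the object storage move in the first accomplishment: once sitemaps and images live in an S3-compatible bucket, they are addressed by URL rather than by a path on a mounted volume. The sketch below is not taken from the Starlight chart; the endpoint, bucket name, object keys, and the `object_url` helper are all hypothetical, and it assumes path-style bucket addressing.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for an object in an S3-compatible bucket.

    Hypothetical helper for illustration only; not part of Surfliner.
    """
    # Path-style addressing has the form <endpoint>/<bucket>/<key>.
    base = endpoint.rstrip("/")
    # quote() leaves "/" unescaped by default, so nested keys keep their structure.
    return f"{base}/{quote(bucket)}/{quote(key.lstrip('/'))}"


if __name__ == "__main__":
    # Assumed development endpoint and bucket name.
    print(object_url("http://object-store.example:9000", "starlight-dev", "sitemaps/sitemap.xml"))
```

Because the resulting URL does not depend on which node a pod runs on, this kind of addressing needs no volume mounted into the application pod for these assets.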

Did we do everything we set out to accomplish?
No, we did not. We knew during Sprint Planning that we had more DevOps work than we likely had time to accomplish during the Sprint. We also discovered work during the Sprint that we wanted to try to accomplish, such as addressing the fact that our RenovateBot dependency management service was recently disabled. Even so, we had several positive meetings with knowledge sharing and relationship building among our DevOps team members that we believe will serve us well heading into future Surfliner Sprints.

What’s next?
The team still needs to determine what shared monitoring solution(s) to support for Surfliner Products deployed into Kubernetes environments. We expect discussion and work on this to carry forward. We will also need to address the RenovateBot dependency issue soon, and UCSD will continue work on this during the remainder of the off-cycle.

GitLab link: https://gitlab.com/surfliner/surfliner/-/milestones/58