Overview of Qualifications
- Great organization, communication and multitasking skills
- Excellent troubleshooting and technical support abilities
- Team-oriented, and a strong independent worker
- Very fast learner who keeps up to date; experienced and self-taught
- Extremely fluent with Python, Golang and shell; comfortable with Node.js and Ruby
- Linux as a hobby as well as a career
- Meticulous documentation
Software & Skills
- EC2 / RDS
- ECS / EKS
- Route53
- SQS
- VPC
- etc.
- Berkshelf
- Custom Cookbooks
- Test-Kitchen
- Consul
- MongoDB
- MySQL
- Redis
- Zookeeper
- git
- nvm
- rvm
- Vagrant
- Virtualenv
- Markdown
- Confluence
- Wikitext
- Graphviz Dot
- Sensu
- Riemann
- Seyren
- Datadog
- Nagios
- ag / grep
- GNU Coreutils
- Make
- tmux / zsh
- vim
Hardware
- Server Provisioning
- Maintenance
- Upgrades
- Monitoring
- Troubleshooting
Systems
- Alpine
- Arch
- Ubuntu / Debian
- CentOS / Red Hat
- sysvinit (including custom runlevel jobs)
- systemd (including custom units)
- Windows Server 2008, 2012
- Windows Desktop: 7 & 10
Professional Experience
2021 - Present: Senior DevOps Engineer - Proofpoint, Inc.
- Migrated encrypted secrets from an Ansible git repository to AWS Parameter Store
- Maintained AWS EKS clusters across multiple environments
- Authored reusable, composable Terraform modules and published them to the company's internal registry
Previous
2017 - 2021: Senior DevOps Engineer - Nomis Solutions
- Wrote Names API, a Golang microservice that automatically generates standardized tags, DNS records, and Terraform resource names
- Automated deployment of applications using Terraform, Python, Fabric and Docker
- Automated infrastructure provisioning with Terraform, including fully operational, replicated MongoDB clusters as part of a MEAN stack
- Migrated the manually maintained Nagios XI monitoring system to Datadog
- Wrote services-scaler, a Python microservice that controls the number of auto scaling workers based on RabbitMQ queue length
- Wrote NomisPy, a generic, reusable Python library for shared use across the organization
2015 - 2017: Senior Infrastructure Engineer - Forever, Inc.
- Wrote datamon - a Python daemon that monitors the production Postgres instance, system metrics, and log entries to spot potential database issues.
- Wrote qwatch, a Golang daemon that monitors AWS SQS queues and ships high resolution metrics to Librato.
- Wrote Provisioner, a cross-platform bootstrapping program that installs packages on Windows and Linux machines and executes the initial configuration management run
- Quickly tracked down root causes of production issues. Vigilant about triaging, fixing, and escalating off-hours alerts
- Performed database migrations and upgrades during the lowest-traffic periods to keep customer-impacting downtime to little or none
- Solely responsible for designing, maintaining, and monitoring the infrastructure around the clock
2013 - 2015: DevOps Engineer - BrandingBrand
- Wrote MISHAP, a Golang daemon that reads application routing information from Redis and writes HAProxy configuration. It provides real-time (1-second resolution) routing for moving, containerized applications on cloud hosts with no dropped connections.
- Wrote Dockermon - a Python monitoring / stat-collecting daemon for Docker that ships per-container metrics to Riemann and Graphite.
- Architected and built the fault-tolerant Marathon / Mesos / Zookeeper infrastructure. Maintained and monitored it, and performed many zero-downtime upgrades of the production environment using blue / green deployments.
- Wrote commands and plugins for the internal developer CLI tool to simplify the development process. One such tool, “stage-me”, lets a developer spin up an entire environment for an application with a single command (similar to a Heroku push).
- Rewrote all of the existing infrastructure cookbooks, stripping out cruft and cutting total system provisioning time from 40 minutes to 5.
- Well-versed in configuration management software and source control (Chef, Puppet, git, etc.).
- Responsible for the uptime, maintenance, management, and upgrades of hundreds of servers. Dedicated to meeting the company’s 99.9% uptime SLA.
2012 - 2013: Site Reliability Engineer - Livestream
- Developed scripts, services, and tools to control, monitor, configure, and manage a hybrid physical / cloud infrastructure.
- Visualized data in Graphite to track down hard-to-find issues and correlate trends across the network.
- Experienced in building physical and virtual machines and provisioning them with custom images via PXE.
- Extremely comfortable managing and maintaining a wide variety of servers, equipment, and systems in the on-site datacenter.