Overview of Qualifications
- Great organization, communication and multitasking skills
- Excellent troubleshooting and technical support abilities
- Team-oriented and strong independent worker
- Very fast learner, keeps up-to-date. Experienced and self-taught.
- Extremely fluent with Python, Golang and shell; comfortable with Node.js and Ruby
- 10+ years of professional DevOps experience. Linux as a hobby as well as a career.
- Meticulous documentation
Software & Skills
- EC2 / RDS
- ECS / EKS
- Route53
- SQS
- VPC
- ...etc
- Berkshelf
- Custom Cookbooks
- Test-Kitchen
- Consul
- MongoDB
- MySQL
- Redis
- Zookeeper
- git
- nvm / rvm
- Vagrant
- virtualenv
- Confluence
- Graphviz Dot
- Markdown
- Mermaid
- Wikitext
- collectd
- node-exporter
- Statsd
- SNMP
- Datadog
- Nagios
- Prometheus
- Riemann
- Sensu
- Seyren
- dnsmasq
- iptables
- netcat
- tcpdump
- wireguard
- ag / grep
- GNU Coreutils
- Make
- tmux / zsh
- vim / nvim / lvim
- Graphite
- Grafana
- InfluxDB
- Prometheus
Hardware
- Server Provisioning
- Maintenance
- Upgrades
- Monitoring
- Troubleshooting
Systems
Professional Experience
2021 - Present: Senior DevOps Engineer - Proofpoint, Inc
- Authored vaultcli, a Python CLI tool that migrates encrypted secret files from an Ansible git repository to AWS Parameter Store.
- Responsible for AWS EKS cluster performance, uptime and maintenance.
- Authored a Lambda lifecycle hook that drains target groups on node scale-down, resulting in a 100x reduction in 5XX HTTP errors during scaling events.
- Authored reusable and composable Terraform modules and published them to the company's internal registry. These modules are widely used across teams.
- Responsible for multiple teams’ infrastructure and deployment needs.
Previous
2017 - 2021: Senior DevOps Engineer - Nomis Solutions
- Wrote Names API, a Golang microservice designed to automatically generate standardized tags, DNS records, and Terraform resource names.
- Automated deployment of applications using Terraform, Python, Fabric and Docker
- Automated infrastructure provisioning with Terraform, including fully operational replicating MongoDB clusters, as part of a MEAN stack.
- Overhauled monitoring by migrating the manually maintained Nagios XI system to Datadog.
- Wrote services-scaler, a Python microservice designed to control the size of auto scaling workers based on RabbitMQ queue length.
- Wrote NomisPy, a generic and reusable Python library for internal organization purposes.
2015 - 2017: Senior Infrastructure Engineer - Forever, Inc
- Wrote datamon - a Python daemon that monitors the production Postgres instance, system metrics, and log entries to spot potential database issues.
- Wrote qwatch, a Golang daemon that monitors AWS SQS queues and ships high resolution metrics to Librato.
- Wrote Provisioner - a cross-platform bootstrapping program designed to provision Windows / Linux machines with packages, and execute the initial configuration management run.
- Expeditiously tracked down root causes for production issues. Vigilant about triaging, fixing, and escalating off-hour alerts.
- Performed database migrations and upgrades at the lowest-traffic times to ensure little to no customer-impacting downtime.
- Was solely responsible for designing, maintaining, and monitoring the infrastructure at all times.
2013 - 2015: DevOps Engineer - BrandingBrand
- Wrote MISHAP - a Golang daemon that reads application routing information from Redis and writes HAProxy configuration. It provides real-time (1s resolution) routing of moving, containerized applications on cloud hosts with no dropped connections.
- Wrote Dockermon - a Python monitoring / stat-collecting daemon for Docker that ships per-container metrics to Riemann and Graphite.
- Built and architected the Marathon / Mesos / Zookeeper fault-tolerant infrastructure. Maintained and monitored the infrastructure and performed many zero-downtime upgrades on the production environment using the blue / green deployment approach.
- Wrote commands and plugins for the internal developer CLI tool to simplify the development process. One such tool, “stage-me”, lets a developer spin up an entire environment for an application with a single command (similar to Heroku push).
- Rewrote all of the existing infrastructure cookbooks, stripping out cruft and decreasing the total system provisioning time from 40 minutes down to 5.
- Well-versed in configuration management software and source control (Chef, Puppet, git, etc.).
- Responsible for the uptime, maintenance, management and upgrades of hundreds of servers. Dedicated to keeping the company’s 99.9% uptime SLA.
2012 - 2013: Site Reliability Engineer - Livestream
- Developed scripts, services, and tools to control, monitor, configure, and manage a hybrid physical / cloud infrastructure.
- Visualized data through Graphite to track down hard-to-find issues and correlate trends across the network.
- Experienced in building (physical and virtual) boxes and provisioning them with custom images via PXE.
- Extremely comfortable managing and maintaining a wide variety of servers, equipment, and systems in the on-site datacenter.