NGUYỄN VIỆT HƯNG
SRE/DEVOPS ENGINEER
Mobile: +84982090290
Email: hvn@familug.org
GitHub: https://github.com/hvnsweeting/
Blog (Vietnamese): https://familug.github.io/
LinkedIn: https://www.linkedin.com/in/hvnsweeting
I’m a Cloud Software Engineer/DevOps Engineer. I’ve been programming in Python for 10+ years, and I use programming to solve dev/ops problems at scale. My expertise is AWS cost optimization: I helped Grab save more than $100M in 1.5 years. I have written a technical blog since 2010, and since 2015 I have taught Python to people with no tech background and helped them become developers. I organize SaltStack/Python meetups in Vietnam. I have worked remotely since 2013.
MAIN TECHNICAL SKILLS
Language: Python, Go, SQL, Bash, Elixir, Rust
Framework: Flask+SQLAlchemy
Database: PostgreSQL, MySQL, Redis, Elasticsearch, DynamoDB
Data analysis: pandas, matplotlib, airflow, plotly
Cloud: AWS, Azure, DigitalOcean, OpenStack
Infrastructure as Code: Terraform & TFE
Container: Docker, Kubernetes (EKS)
Configuration management: SaltStack, Ansible
CI: Jenkins, GitLabCI, CircleCI, TravisCI, ConcourseCI
OS: Ubuntu/Debian, Amazon Linux, macOS, OpenBSD, Arch Linux, Windows
SCM: git, GitLab, GitHub
Editor: vim
Data Messaging/Streaming: Kafka, AWS SQS
Monitoring: VictoriaMetrics, Prometheus, DataDog, Graphite & Diamond, Grafana & InfluxDB, PagerDuty, Shinken
Logging: ELK, Scalyr, Graylog2
Security: OWASP
EDUCATION/LANGUAGES
- 2013 | Engineer, Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi, Vietnam
- English: Business Level (speaking, reading, writing)
CERTIFICATES
- AWS Certified Solutions Architect - Professional - 2024/07
- AWS Certified DevOps Engineer - Professional - 2024/07
- Certified Kubernetes Administrator (CKA) - 2021/11/27 - LF 2sozdhk1mg
- Azure Solutions Architect Expert
- Azure DevOps Engineer Expert https://www.credly.com/users/hvn/badges
EXPERIENCE
PayPay Japan | SRE
May 2024 | Remote from Japan
- Optimized AWS cloud costs, reducing spend by $$$ thousand dollars per year.
Binance.com | Senior DevOps Engineer - Observability team
Apr 2021 - Apr 2024 | Remote from Vietnam
- Operated PB-scale ELK logging clusters on AWS EC2 and EKS.
- Optimized the cost, performance, stability, and usability of the ELK logging system.
- Migrated the log-shipping/transform system off AWS Lambda, cutting cost 10x and significantly improving stability (no more AWS Lambda throttling). Optimized regex and Python code, increasing speed 10x. Tech stack: Python+regex, AWS EKS, SQS, EC2 spot.
- Developed a custom log-masking plugin for Vector (vectordotdev) to redact sensitive data from logs. Tech stack: Rust.
- Developed a custom Kubernetes Operator and CRD to manage Kafka resources from Kubernetes.
- Designed and carried out the migration of the largest log volume off Elasticsearch, saving $$$/month. Developed a log search API for searching logs in Databend, retrofitted into the same custom UI used for Elasticsearch queries. Tech stack: Go+gin, Vector, Databend, S3.
- Managed the most-used internal chatbot and created a handy command that helped seamlessly migrate chat groups from the old system (WebEx) to the new internal chat system, saving the company thousands of hours of tedious manual export-to-CSV and reimport tasks. Wrote a custom Errbot backend for the internal chat system. Tech stack: Python, Errbot.
Grab | Lead Engineer - Cost Optimization team
Apr 2020 - Apr 2021 | HoChiMinh Office, Singapore based company
- Led the planning, purchasing, and management of EC2/RDS/ElastiCache Reserved Instances (RIs) and Savings Plans; worked directly with AWS TAMs and support to exchange under-utilized RIs, which helped save Grab millions of dollars on AWS cloud resources.
- Provided expertise to help teams understand and decipher their AWS bills.
- Drove increased adoption of AMD EC2 instances (and other new EC2 generations), which are more cost-effective than older-generation instances, via Terraform linters and a default Terraform template. E.g., t3a usage hours increased more than 20 times.
- Designed and developed GrabTrustedAdvisor, a tool that delivers enhanced AWS Trusted Advisor "advice" about underutilized resources (EC2, RDS, ELB, DynamoDB, ElastiCache, ...) to service owners, with additional information to support decisions on downsizing or shutting down resources.
- Developed tools and alerts that help teams detect abnormal cost changes.
Grab | Senior SysOps Engineer / SRE
Nov 2017 - Mar 2020 | HoChiMinh Office, Singapore based company
- Developed a system for managing EC2 Capacity Reservations, ensuring 75% of EC2 instances have a Capacity Reservation across all GrabTaxi AWS accounts. This helps reduce the number of failures when scaling out services, ensuring business continuity.
- Led the AWS resource tagging process, which helps teams view their costs by service, tech family, or particular resource.
- Developed a script to automate the Azure subscription creation process.
- Greenfield: co-designed and implemented, with the software team, a 1-click solution that creates a new service with everything needed in 1 hour.
- Released the Grab cost dashboard, which helps engineers view and understand their services' infrastructure costs (including AWS, Azure, DataDog and ELK log clusters).
- Created the NIA service, the backend for the Engineer Onboarding tool, which automates the permission-granting process for engineers.
- Led automation and linters for the Terraform code base.
- Led AWS Lambda pipeline design, tooling, training, and consulting; the pipeline is integrated with Terraform and helps engineers create new Lambda functions via one script.
- Created the GrabViz service, which visualizes Grab's microservices architecture based on VPC Flow Logs and Security Groups; it helps detect stale Security Groups and misconfigurations and shows traffic ratios.
- Developed a Slack chatbot, integrated with AWS, JumpCloud, Scalyr, DataDog APIs …, to automate onboarding tasks for new engineers.
Unblockapp (the company that acquired Robotinfra) | Senior DevOps Engineer
Jan 2016 - Apr 2017 | Remote, Malaysia based company
- Led SaltStack formula development; developed formulas for deploying Golang microservices.
- Designed and built the CI system and code quality assurance tools for Golang microservices.
- Designed and built the CI system for the Django frontend app.
- Set up CI to build apps for multiple platforms (Windows, Linux, Android).
- Deployed staging and production environments (on DigitalOcean) for Unblockapp.
- Prototyped a PoC for Docker-based deployment of Golang microservices with Kubernetes, Concourse CI, Docker registry, InfluxDB, and the EFK stack on DigitalOcean and Google Cloud (GKE).
- Set up a PoC Kubernetes cluster manually on DigitalOcean with CoreOS and etcd, and deployed another one, the dev cluster, on DigitalOcean using Kargo (Ansible-based).
Robotinfra | Development director, Vietnam
Jan 2015 - Dec 2015 | Remote, Hong Kong based company
- Coordinated the Robotinfra.com development process.
- Researched and developed a Go application for a new Robotinfra product.
- Co-designed and developed a testing system for SaltStack formulas on multiple Jenkins slaves + SaltStack cloud on DigitalOcean.
- Automated deployment and daily maintenance of the whole infrastructure, namely email (Postfix+Dovecot+Amavis+OpenDKIM+OpenLDAP+RoundCube...), logging (Graylog2+Elasticsearch), graphing (graphite+diamond, influxdb + grafana), monitoring (Shinken+NRPE), Git (GitLab/GitHub), CI (Jenkins, Concourse), task management (Youtrack, OpenERP), internal chat (Ejabberd, Mattermost), authoritative and caching DNS (Bind9), error monitoring (Sentry)...
Robotinfra | Senior DevOps
Apr 2013 - Dec 2014 | Remote, Hong Kong based company
- Wrote SaltStack formulas to automate the deployment of a wide range of software (both open and closed source).
- Set up, extended, and improved the continuous integration system for testing 100+ software deployments on all supported Ubuntu LTS versions. In charge of the whole testing process for the Robotinfra.com product.
- Reviewed and merged code (Python, Bash) and Salt formulas for all changes made by all developers (on GitLab).
- Contributed bug fixes to SaltStack, Diamond, and many other Python open source projects.
- Developed Python libraries and tools for internal use.
- Set up and maintained the email system (postfix + dovecot + openLDAP + amavis + spamassassin, RoundCube), metric system (Diamond + graphite), centralized logging system (Graylog2, rsyslog), and CI system (Jenkins) for internal use and for clients.
- Troubleshot complex system problems.
VCCloud, VCCorp | System Engineer, Python developer
June 2012 - April 2013 | Hanoi, Vietnam
- Developed Linux network gateway software that supports high availability and load balancing and serves all offices of VCCorp. (Core Python + iptables/Linux utilities)
- Developed and managed the backend for the VPN system. (web.py + OpenVPN)
- Automated OpenStack and infrastructure installation with the configuration management software SaltStack.
- Set up and managed the monitoring system (graphite), logging system (graylog2), and CI system (Jenkins), which serve both public and private clouds.
- Guided developers in writing custom Diamond modules for internal use.
- Deployed private and public clouds with SaltStack on 30+ high-end servers.
- Trained developers and sysadmins on using git and SaltStack.
- Troubleshot and tuned cloud component configurations and wrote functional tests for the cloud system.
APSTech | Android developer
Jan 2012 - June 2012 | Hanoi, Vietnam
- Developed features for QuickSettings, an application for quickly accessing phone settings and switching phone features on/off with just one click.
- Developed modules in CallBilling, an application for calculating mobile costs.
- Developed an API (PHP) and an Android application for Euro 2012 for local users.
NON-WORK ACTIVITIES