1 of 34

Debugging with OpenTelemetry

August 18th 2023

Tracing - like Logging, just better

2 of 34

A few words about myself

Chief Architect at Torq.io

Love early stage startups 🫶 at my 4th one atm

Love espressos - 17g in 34g out, 25 seconds.

I love reducing CI build times 😅

Windows Internals and Low Level background

Check out my blog at kostyay.com

3 of 34

Torq.io

Trusted by the world’s best security teams

Our Background

  • Comprehensive low code/no code security automation platform that unifies and automates the entire security stack to deliver unparalleled protection and productivity

About Torq

  • Founded in 2020
  • Offices in US, Israel, UK, and Spain
  • Millions of Daily Security Automations

4 of 34

Tech Stack

  • 100% Go
  • Microservice architecture running on GKE
    • 20-30 services
    • Tens of deployments per day to production
  • gRPC protocol internally, Frontend uses gRPC-Web (moving to connect.build)
  • Frontend VueJS
  • Databases:
    • Postgres
    • Redis
    • Cube for reporting
  • Heavy PubSub users
  • Light Cloud Functions users
  • Observability:
    • Metrics: Prometheus/Thanos
    • Logging: only errors (using zap)
    • Tracing – OpenTelemetry ❤️: Errors + Slow Requests
  • Feature Flags: LaunchDarkly

5 of 34

How do you debug applications?

6 of 34

We can debug..

  • Locally, with an interactive debugger
  • Remotely, using built-in observability signals

In a properly instrumented application you should be able to understand a problem without having to add more instrumentation.

7 of 34

Logs are great, but..

8 of 34

Shortcomings of logs

  • Never there when you need them 😅
  • Not ideal for understanding end to end processes

9 of 34

What is Distributed Tracing

10 of 34

Distributed Calculator

[Diagram: numbered steps 1-6 across Service 1 and Service 2]

11 of 34

span

12 of 34

Distributed trace

A view on the lifespan of a request

A collection of spans

An excellent tool to debug production issues

13 of 34

OpenTelemetry

14 of 34

What is OpenTelemetry?

  • CNCF project - 2nd most popular and actively maintained
  • Observability framework - Tracing, Metrics, Logging
  • Current standard for Tracing
  • SDKs + Tools
  • Cloud centric

OpenTelemetry reference architecture

(source: Documentation)

15 of 34

Tracing basics

  • Trace: End-to-End process in your application. Contains spans.
  • Span: a single “call” within a trace
    • Attributes: key/value pairs; tags; metadata
    • Events: named, timestamped strings
    • Parent: the enclosing span that this span was started in
  • Sampler: always, probabilistic, errors only, etc.
  • Exporter: OTLP, Jaeger, Prometheus, Cloud Provider.
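To make the vocabulary above concrete, here is a minimal sketch of the data model in plain Go. The `Span`, `Event` and `Trace` types and the `StartSpan` and `Depth` helpers are hypothetical names invented for illustration; they are not the OpenTelemetry SDK types, which carry more fields (IDs, status, links and so on).

```go
package main

import (
	"fmt"
	"time"
)

// Event is a named, timestamped annotation on a span.
type Event struct {
	Name string
	At   time.Time
}

// Span is a simplified, hypothetical stand-in for a tracing span.
type Span struct {
	Name       string
	Parent     *Span             // nil for the root span of a trace
	Attributes map[string]string // key/value metadata
	Events     []Event
	Start, End time.Time
}

// Trace is the collection of spans that belong to one end-to-end request.
type Trace struct {
	Spans []*Span
}

// StartSpan creates a span under parent (nil means root) and adds it to the trace.
func (t *Trace) StartSpan(name string, parent *Span) *Span {
	s := &Span{Name: name, Parent: parent, Attributes: map[string]string{}, Start: time.Now()}
	t.Spans = append(t.Spans, s)
	return s
}

// Depth counts how many ancestors a span has.
func (s *Span) Depth() int {
	d := 0
	for p := s.Parent; p != nil; p = p.Parent {
		d++
	}
	return d
}

// buildExample assembles a two-span trace: a root request and one child call.
func buildExample() *Trace {
	tr := &Trace{}
	root := tr.StartSpan("POST /calculate", nil)
	root.Attributes["http.method"] = "POST"

	child := tr.StartSpan("add", root)
	child.Events = append(child.Events, Event{Name: "cache miss", At: time.Now()})
	child.End = time.Now()

	root.End = time.Now()
	return tr
}

func main() {
	// Print the span tree, indenting each span by its depth.
	for _, s := range buildExample().Spans {
		fmt.Printf("%*s%s\n", s.Depth()*2, "", s.Name)
	}
}
```

The parent pointer is what turns a flat list of spans into the tree that trace viewers render.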

16 of 34

Why should you trace?

  • Improve the observability of your software
  • End to end visibility across distributed systems
  • Understand service to service dependencies
  • Identify performance bottlenecks

17 of 34

1 - Initialize a Tracer in Go

otel.Tracer("MyTracer")

Rate

Service Metadata

Destination
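The "Rate" label presumably refers to the sampling rate of the tracer configuration. As a rough illustration of how a probabilistic sampler can make a consistent decision for every span of the same trace, here is a sketch that derives the decision from the trace ID itself. The function name `shouldSample` is hypothetical, and this is an assumption-laden sketch of the idea, not the OpenTelemetry SDK's implementation.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// shouldSample decides whether a trace is kept, given its 32-character hex
// trace ID and a ratio between 0 and 1. Because the decision depends only on
// the ID, every service that sees the same trace makes the same choice.
func shouldSample(traceIDHex string, ratio float64) (bool, error) {
	id, err := hex.DecodeString(traceIDHex)
	if err != nil || len(id) != 16 {
		return false, fmt.Errorf("trace ID must be 32 hex characters, got %q", traceIDHex)
	}
	if ratio >= 1 {
		return true, nil
	}
	if ratio <= 0 {
		return false, nil
	}
	// Treat the low 8 bytes as a 63-bit number and keep the trace
	// if it falls below ratio * 2^63.
	x := binary.BigEndian.Uint64(id[8:]) >> 1
	threshold := uint64(ratio * (1 << 63))
	return x < threshold, nil
}

func main() {
	for _, id := range []string{
		"4bf92f3577b34da60000000000000000",
		"4bf92f3577b34da6ffffffffffffffff",
	} {
		keep, err := shouldSample(id, 0.5)
		fmt.Println(id, keep, err)
	}
}
```

A ratio like this trades completeness for cost; the "Errors + Slow Requests" policy mentioned earlier in the deck is a different strategy that decides after the outcome of the request is known.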

18 of 34

2 - Create a Span

Create a span

End a span

Record Errors
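The three steps above usually appear together as "start, defer end, record the error on failure". The sketch below shows that shape using a tiny hypothetical `span` type built on the standard library, so it runs without any dependencies. With the OpenTelemetry Go API the equivalent calls are `tracer.Start(ctx, name)`, `span.End()` and `span.RecordError(err)`; the `divide` and `run` functions here are made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// span is a hypothetical, minimal stand-in for an SDK span.
type span struct {
	name     string
	start    time.Time
	duration time.Duration
	err      error
	ended    bool
}

type spanKey struct{}

// startSpan creates a span and stores it in the returned context so that
// callees could find their parent.
func startSpan(ctx context.Context, name string) (context.Context, *span) {
	s := &span{name: name, start: time.Now()}
	return context.WithValue(ctx, spanKey{}, s), s
}

// RecordError attaches the error to the span.
func (s *span) RecordError(err error) { s.err = err }

// End marks the span as finished and records how long it took.
func (s *span) End() {
	s.duration = time.Since(s.start)
	s.ended = true
}

var errDivideByZero = errors.New("divide by zero")

// divide shows the usual shape: start, defer End, record errors on failure.
// It returns the span only so the caller can inspect it in this demo.
func divide(ctx context.Context, a, b float64) (float64, *span, error) {
	_, sp := startSpan(ctx, "divide")
	defer sp.End()

	if b == 0 {
		sp.RecordError(errDivideByZero)
		return 0, sp, errDivideByZero
	}
	return a / b, sp, nil
}

// run is a small wrapper that supplies a root context.
func run(a, b float64) (float64, *span, error) {
	return divide(context.Background(), a, b)
}

func main() {
	_, sp, err := run(1, 0)
	fmt.Println(sp.name, sp.ended, sp.err, err)
}
```

Using `defer` for the end call means the span is closed on every return path, including the error path.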

19 of 34

Trace Visualization

20 of 34

Trace Viewer - Google Cloud

Traces

Spans

21 of 34

Span Attributes

Span Tree

Attributes

22 of 34

Linked Logs and Events

23 of 34

CloudSQL – Query Insights Integration

24 of 34

25 of 34

Adopting Traces

26 of 34

Adopting traces in your team

  • Enable auto-instrumentation – free tracing
  • Add trace-id to structured logs
  • Add middleware to your common libraries (clients and servers)
  • Experiment with the SDK
  • Write the code: Start with service boundaries and 3rd party calls
  • Be an advocate in your team - Tracing makes solving production issues much easier
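As one way to approach the "add trace-id to structured logs" step, here is a sketch using the standard library's `log/slog` (the deck's own stack uses zap, so treat this as an illustration of the idea rather than the author's setup). The `traceHandler` type, the `withTraceID` helper and the `trace_id` key are hypothetical; with the OpenTelemetry SDK the ID would come from the active span in the context.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

type traceIDKey struct{}

// withTraceID stores a trace ID in the context (hypothetical helper).
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps another slog.Handler and adds trace_id to every record
// whose context carries one.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so derived loggers
// keep adding the trace ID.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// logWithTrace logs one error message and returns the rendered JSON line.
func logWithTrace(traceID, msg string) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		// Drop the timestamp so the output is stable for this demo.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	}
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, opts)})
	logger.ErrorContext(withTraceID(context.Background(), traceID), msg)
	return strings.TrimSpace(buf.String())
}

func main() {
	fmt.Println(logWithTrace("4bf92f3577b34da6a3ce929d0e0e4736", "payment failed"))
}
```

With the ID on every log line, an error log can be used to look up the full trace, and a trace can be used to find its logs.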

27 of 34

Using Middleware - The key to adoption

OpenTelemetry has a big community.

There is tracing support for many popular Go packages.

28 of 34

Example: Tracing in mux

> go get go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux
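The package above provides `otelmux.Middleware`, which is registered on a router with `Use`. To show roughly what a tracing middleware does on each request, here is a dependency-free sketch built on `net/http`. It only extracts or generates a trace ID and passes it along in the request context; the names `tracing`, `traceIDFrom` and `call` are hypothetical, and the real middleware does considerably more (span creation, attributes, status codes).

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type traceIDKey struct{}

// traceIDFrom reads the trace ID from a W3C traceparent header
// (version-traceid-parentid-flags) or generates a new random one.
func traceIDFrom(r *http.Request) string {
	parts := strings.Split(r.Header.Get("traceparent"), "-")
	if len(parts) == 4 && len(parts[1]) == 32 {
		return parts[1]
	}
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return ""
	}
	return hex.EncodeToString(b)
}

// tracing wraps a handler so every request carries a trace ID in its context.
func tracing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), traceIDKey{}, traceIDFrom(r))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// call sends one request through the middleware and returns the trace ID
// that the wrapped handler observed.
func call(traceparent string) string {
	var seen string
	h := tracing(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = r.Context().Value(traceIDKey{}).(string)
	}))
	req := httptest.NewRequest(http.MethodGet, "/calculate", nil)
	if traceparent != "" {
		req.Header.Set("traceparent", traceparent)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return seen
}

func main() {
	fmt.Println(call("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
	fmt.Println(len(call("")))
}
```

Because the wrapping happens once in shared server code, every handler gets tracing without each team having to write it, which is why middleware is described as the key to adoption.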

29 of 34

Use Traces in Tests
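One way to read this idea: a test can assert on the spans that the code emitted, not only on its return value. The sketch below uses a hypothetical in-memory `recorder` in place of a real exporter; the OpenTelemetry Go SDK ships a `tracetest` package for the same purpose, which is not used here to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical in-memory span sink for tests.
type recorder struct {
	mu    sync.Mutex
	ended []string // names of finished spans, in the order they ended
}

// Start begins a span and returns the function that ends it.
func (r *recorder) Start(name string) (end func()) {
	return func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		r.ended = append(r.ended, name)
	}
}

// calculate is the code under test: it wraps its work in spans.
func calculate(r *recorder, a, b int) int {
	end := r.Start("calculate")
	defer end()

	endAdd := r.Start("add")
	sum := a + b
	endAdd()

	return sum
}

func main() {
	rec := &recorder{}
	fmt.Println(calculate(rec, 2, 3), rec.ended)
}
```

A test can then check both the result and the recorded span names, which catches regressions where instrumentation is silently removed.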

30 of 34

Trace based E2E tests – tracetest.io

31 of 34

32 of 34

Async Calculator

REST API

33 of 34

Presentation + Code Samples

34 of 34

Thank you
