Resilient Services with Clojure and Temporal

January 19, 2023

About Me – Greg Haskins

  • MSEE and BSECE from Worcester Polytechnic Institute
  • 25+ years working in the software industry, primarily in telecom and financial services.
    • Engineer, Architect, and a hands-on technology executive.
  • Co-founder and CTO of Manetu (https://manetu.com)
    • Startup founded in 2019 to focus on solving privacy and cybersecurity issues
    • Cloud-native platform based largely on Clojure/ClojureScript running on Kubernetes

Agenda

  • Part 1 - Introduction to Temporal
  • Part 2 - Temporal Clojure SDK
  • Part 3 - Live Demo

Part 1: Introduction to Temporal

What is Temporal?

A platform and framework for the reliable execution of your code, recognizing that:

  • Hardware may fail.
  • Networks may disconnect.
  • Natural disasters may strike.
  • Software may crash, have bugs, or be taken down for maintenance.
  • Etc.

Many applications need to remain available and correct despite these situations. Temporal helps you achieve this.

https://temporal.io

How does Temporal help?

  • Temporal provides a framework that abstracts concepts such as durable checkpoints and replay in a way that is largely transparent to your business logic.
  • You write business logic in programming languages you already use, right alongside non-workflow logic, rather than in a separate orchestration language such as BPMN.
  • All of the elements of Temporal including the framework and backend are designed for horizontal scalability and continuous, reliable, multi-AZ operation, helping your application achieve the same.

What does this look like?

  • You define Workflow and Activity functions.
    • Workflows:
      • Invoke Activities and other Workflows.
      • Send and Receive Signals.
      • Respond to Queries.
    • Activities encapsulate operations that may fail.
      • E.g., network or database I/O.
  • Workers are processes running on your infrastructure that host Workflows and Activities
    • Workers listen on Task Queues.
  • Clients interact with Workflows.
    • Invoke, Query, Signal, Cancel Workflows.
    • Wait for Workflow results RPC-style.
  • Temporal Backend
    • Manages Workflow/Activity execution.
    • Persists interactions in a durable Event Log.
    • Can transparently replay interactions in the event of failure to resume correct behavior.
  • An orchestrator such as Kubernetes
    • Keeps your Workers, Temporal, and its persistence store (e.g., Cassandra) available and spread across AZs.

What does this look like? (continued)

Fault Remediation

Workflows

  • Workflows are resilient programs that will continue execution despite faults.
    • They orchestrate tasks, such as invoking Activities and other Workflows.
    • They respond to external events, such as Signals, Queries, and Cancellations.
    • They sometimes handle time-based events, such as deadlines and timeouts.
  • The Temporal Platform will:
    • Invoke Workflow programs at key points.
      • In response to a client explicitly starting a Workflow.
      • In response to faults in the midst of Workflow execution.
        • E.g. a worker process crashes or an availability zone loses power.
    • Automatically record a durable history of the Workflow without explicit programmer action.
      • You do need to be aware that these inputs and outputs represent a contract with a potentially long-lived entity, including:
        • Future or past versions of Clients and Activities.
        • Future or past versions of the Workflow itself.

Workflows (continued)

  • A Workflow needn't be concerned with replay.
    • On replay, Workflow code is re-executed from the beginning.
    • The platform will supply the same inputs and responses for any durable operations, allowing your program to recover its state and continue where it left off.
  • Implication: Inputs and outputs within Workflow processing must be deterministic.
    • Avoid any sources of non-determinism such as global atoms, non-seeded random number generators, accessing local process configuration, or functions involving system time.
    • All non-deterministic operations must occur within Activities or Side Effects.

Workflows are functional, and a natural match to Clojure.
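The following toy model in plain Clojure makes the replay idea concrete. It is not the Temporal SDK API. make-env, execute!, say-hello, and hello-workflow are hypothetical names, and an atom-backed vector stands in for the durable Event History. On the first run, each Activity result is recorded. On replay, the same deterministic Workflow code runs again from the top, but it receives the recorded results instead of re-invoking the Activities.

```clojure
(ns example.replay
  (:require [clojure.string :as str]))

;; Toy model of durable replay (hypothetical, NOT the Temporal SDK API).
;; An env holds recorded results (the "history") and a cursor for the next step.
(defn make-env [history]
  {:history (atom history) :cursor (atom 0)})

(defn execute!
  "Runs activity-fn unless a result for this step is already recorded,
  in which case the recorded result is returned instead (replay)."
  [{:keys [history cursor]} activity-fn & args]
  (let [step (dec (swap! cursor inc))]
    (if (< step (count @history))
      (nth @history step)                       ; replay: reuse recorded result
      (let [result (apply activity-fn args)]    ; first run: execute and record
        (swap! history conj result)
        result))))

;; Counts real Activity invocations so replay behaviour is observable.
(def calls (atom 0))

(defn say-hello [person]
  (swap! calls inc)
  (str "Hi, " person))

(defn hello-workflow
  "Deterministic Workflow body: every effect goes through execute!."
  [env person]
  (let [greeting (execute! env say-hello person)]
    (execute! env str/upper-case greeting)))
```

hello-workflow reaches the outside world only through execute!. As a result, re-running it against the saved history returns the same result without calling say-hello a second time.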

Workflows (continued)

  • Workflows are like Lightweight Processes (LWPs).
    • LWPs such as Clojure core.async go blocks generally share similar properties:
      • High instance-to-CPU ratios: potentially thousands of Workflows per core.
      • Specific IO constructs (e.g., core.async channels) designed to cheaply “park” LWPs that are waiting on IO.
  • Things to be aware of within Workflow context:
    • Do not use native threading primitives such as core.async go blocks or Java thread pools. All Workflow processing must run on the thread provided by the framework.
    • Do not block the underlying thread with native blocking calls.
      • core.async go blocks share the same sensitivity.
      • Use primitives provided by the SDK, such as sleep and await, or deref on SDK-provided promises.
      • An async API is on the roadmap, but is generally unnecessary.

Workflows (continued)

Workflows have the concept of an ID.

  • The ID can be assigned by the caller or by the platform.
  • The ID is used to target a specific instance of the Workflow.
    • Signal, Query, Cancel, or to obtain status/results.
  • The Temporal platform guarantees that only one Workflow instance may have a given ID cluster-wide at any given time.
    • This will be an important point later in the talk.

Activities

Activities share a few similar traits with Workflows:

  • They are defined, registered, and executed within Workers via the Temporal Platform.

They are also distinct in several ways:

  • Activities are generally unhindered by determinism or threading constraints.
    • They are implemented using standard, idiomatic practices of your chosen language.
    • They can block with conventional primitives and run complex operations over long periods of time.
  • Activities are where non-deterministic or impure operations occur, such as updating a database or calling out to an external service.

Signals

Signals are a way for Clients or Workflows to send a payload to a running Workflow using a Channel metaphor.

  • Any signals sent become part of the durable Event Log, and will be resent in the same order should the Workflow be replayed.
  • Workflows may choose to listen to zero or more of their Channels, with or without a timeout, and respond however they choose.
    • E.g., update or cancel an expiration, send a message upon expiration, or coordinate a cluster resource on receipt, such as a mutex lock/release operation.
  • A powerful signal-with-start operation combines sending a Signal with starting the Workflow if it is not already running.
    • Many important cluster coordination primitives can be built from this one mechanism when used in conjunction with Workflow IDs.
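The following toy model in plain Clojure illustrates why signal-with-start and ID uniqueness work well together. An atom keyed by Workflow ID stands in for the cluster. running and signal-with-start! are hypothetical names, not the SDK's API. swap-vals! makes "start if absent, then append the signal" a single atomic step, so two callers can never both start an instance for the same ID.

```clojure
(ns example.signal-with-start)

;; Toy stand-in for the Temporal cluster (hypothetical, not the SDK API):
;; workflow-id -> state of the single running instance with that ID.
;; Keying the map by ID models the one-instance-per-ID guarantee.
(def running (atom {}))

(defn signal-with-start!
  "Delivers signal to the Workflow with the given id, starting it first
  if no instance with that id is running. Returns whether this call
  started the Workflow, plus the signals it has received so far."
  [id signal]
  (let [[before after]
        (swap-vals! running update id
                    (fn [wf] (update (or wf {:signals []}) :signals conj signal)))]
    {:started? (not (contains? before id))
     :signals  (get-in after [id :signals])}))
```

Calling (signal-with-start! "mutex/printer" :lock) starts the instance. A later :release for the same ID is delivered to that running instance, and the signals arrive in the order they were sent.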

Part 2: Temporal Clojure SDK

Temporal Clojure SDK

  • Unofficial Clojure-native SDK for Temporal developed by Manetu and open sourced in late 2022.
  • Built upon the official Java SDK provided by the Temporal community.
  • Small but growing community.
  • Designed for Clojure to Clojure use.
    • Strives for a Clojure idiomatic experience while exposing powerful Temporal concepts with low friction.
    • Opinionated: assumes all direct parties (Clients, Workflows, Activities) are implemented in Clojure with this SDK.
      • Uses nippy for transparent value/state persistence
      • Some SDKs provide more language neutral constructs, like json encoding, allowing polyglot integration.

If you need heterogeneity between Clients and Workflows, this SDK is not for you.

Workflow Definition

Hi, Bob

Chaining, Parallelization, and Error Channeling

  • Many functions in the Clojure SDK return promises to facilitate chaining, parallelization, and error channeling.
  • Promises are integrated with the funcool/promesa library.
  • Support is included in the temporal.promises namespace.

For more information, visit Safe Blocking within Workflows.

Signals

  • One gotcha:
    • temporal.signals has functions intended for use in Workflow context.
    • Use temporal.client.core/>! to send from client context.

Context

  • You may optionally attach an opaque context that is conveniently passed as the first parameter to all defworkflow and defactivity functions.
  • Useful for maintaining worker state such as database connections.
  • Use it with great care within defworkflow, since it can introduce non-determinism.
    • It may be removed from defworkflow in a future release.

Side Effects

Non-deterministic operations such as generating random numbers can be implemented as a Side Effect via the temporal.side-effect namespace.

  • A few common functions for UUID generation and current time are provided by the library.
  • Arbitrary functions can be implemented with temporal.side-effect/invoke.

Values returned by a Side Effect become part of the durable Event History, and the Side Effect function is not re-invoked during replay.
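The following toy model in plain Clojure shows why a random value must go through a Side Effect. It does not use temporal.side-effect. new-env, side-effect!, naive-order-workflow, and order-workflow are hypothetical names. The naive version generates a new UUID on every execution, so a replay would diverge. The recorded version generates the UUID once and replays it from history.

```clojure
(ns example.side-effect)

;; Toy replay environment (hypothetical, not temporal.side-effect itself):
;; recorded values plus a cursor pointing at the next one to reuse.
(defn new-env [history]
  {:history (atom history) :cursor (atom 0)})

(defn side-effect!
  "Calls f on first execution and records its value; on replay returns
  the recorded value without calling f again."
  [{:keys [history cursor]} f]
  (let [step (dec (swap! cursor inc))]
    (if (< step (count @history))
      (nth @history step)
      (let [v (f)]
        (swap! history conj v)
        v))))

(defn naive-order-workflow []
  ;; Anti-pattern: a fresh UUID on every (re)execution breaks determinism.
  {:order-id (str (java.util.UUID/randomUUID))})

(defn order-workflow [env]
  ;; The UUID is generated once, then replayed from the recorded history.
  {:order-id (side-effect! env #(str (java.util.UUID/randomUUID)))})
```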

Idempotency

  • Activities only guarantee at-least-once execution, with the inputs and outputs becoming part of the Event History.
  • If a failure occurs after an Activity starts but before its completion is durably recorded, the Activity may be retried even though its effects were already partially or fully visible to the outside world.

Activities must either be Idempotent or tolerate at-least-once behavior.

Idempotency (continued)

Avoid this

Do this instead

Following this read-modify-write (RMW) pattern allows:

  • The balance to become part of the Event History.
  • The account-write! Activity to become Idempotent.
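The following toy sketch in plain Clojure contrasts a relative update with the RMW approach. An atom stands in for the external database, and run-at-least-once simulates a retry after an attempt that actually succeeded but whose completion was lost. Only account-write! comes from the slide. db, account-add!, account-read, run-at-least-once, and deposit-workflow are hypothetical. In a real Workflow, the value returned by the read Activity would be recorded in the Event History, so a replay would reuse it rather than read again. The sketch also assumes a single writer per account.

```clojure
(ns example.idempotency)

;; Toy external database (hypothetical).
(def db (atom {"acct-1" 100}))

(defn account-add!
  "Avoid: a relative update. Every retry applies the delta again."
  [id delta]
  (swap! db update id + delta))

(defn account-read
  "Read Activity; in Temporal its result would be recorded in the Event History."
  [id]
  (get @db id))

(defn account-write!
  "Idempotent write Activity: stores an absolute balance, so retries are harmless."
  [id balance]
  (swap! db assoc id balance))

(defn run-at-least-once
  "Simulates at-least-once delivery: the first attempt succeeds, but its
  completion is lost, so the Activity runs a second time."
  [activity & args]
  (apply activity args)
  (apply activity args))

(defn deposit-workflow
  "RMW: read via an Activity, compute in the Workflow, write the absolute result."
  [id amount]
  (let [balance (account-read id)]
    (run-at-least-once account-write! id (+ balance amount))))
```

With a starting balance of 100, a retried account-add! of 50 leaves 200. A deposit-workflow of 50 leaves 150 even though its write also ran twice.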

Cluster Coordination

The Workflow ID uniqueness guarantees and the signal-with-start concept provide the primitives for powerful cluster coordination patterns without using external tools.

  1. Mutexes
    1. See https://github.com/manetu/temporal-clojure-sdk/tree/master/samples/mutex
  2. Granular queuing
    • Workflow ID derived from the resource ID
    • Clients always call signal-with-start to queue more work
    • The Workflow drains the queue until it is empty, then exits
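The following toy model in plain Clojure sketches the granular queuing pattern. It does not use the SDK. queues, signal-with-start!, pop-or-exit, and drain! are hypothetical names, and an atom keyed by resource ID stands in for the set of running Workflows. In this model, "take the next item, or exit when empty" is one atomic step, so work queued after the Workflow exits starts a fresh instance.

```clojure
(ns example.granular-queue)

;; Toy model (hypothetical, not the SDK API): resource-id -> pending work for
;; the single Workflow whose ID is derived from that resource-id.
(def queues (atom {}))

(defn signal-with-start!
  "Client side: queue work for resource-id. Returns true when this call
  started a new Workflow, false when one was already running."
  [resource-id work]
  (let [[before _] (swap-vals! queues update resource-id (fnil conj []) work)]
    (not (contains? before resource-id))))

(defn- pop-or-exit
  "Removes the head of the queue, or removes the Workflow entry entirely
  when its queue is empty (the Workflow exits)."
  [m resource-id]
  (let [q (get m resource-id)]
    (if (seq q)
      (assoc m resource-id (subvec q 1))
      (dissoc m resource-id))))

(defn drain!
  "Workflow side: process queued work in order until the queue is empty, then exit."
  [resource-id process-fn]
  (loop [results []]
    (let [[before _] (swap-vals! queues pop-or-exit resource-id)
          q (get before resource-id)]
      (if (seq q)
        (recur (conj results (process-fn (first q))))
        results))))
```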

Fatal Errors and Retries

  • Uncaught exceptions trigger Temporal Retry Policies.
  • Therefore, “fatal” errors (ones a retry cannot fix) should be caught and gracefully returned as an error value.
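The following toy sketch in plain Clojure shows why fatal errors should be returned rather than thrown. with-retry-policy is a hypothetical stand-in for a Temporal Retry Policy, reduced to a fixed attempt limit with no backoff. call-payment-gateway, charge-naive, and charge are also hypothetical. A declined card will never succeed. Throwing the error therefore wastes every retry, while catching it and returning an error value completes in a single attempt.

```clojure
(ns example.retries)

(defn with-retry-policy
  "Toy retry policy (hypothetical, not Temporal's): re-invokes f after any
  uncaught exception, up to max-attempts, then rethrows."
  [max-attempts f]
  (loop [attempt 1]
    (let [outcome (try
                    {:value (f)}
                    (catch Exception e
                      (if (< attempt max-attempts) ::retry (throw e))))]
      (if (= ::retry outcome)
        (recur (inc attempt))
        (:value outcome)))))

;; Counts Activity attempts so the retry behaviour is observable.
(def attempts (atom 0))

(defn- call-payment-gateway
  "Hypothetical gateway stub that always declines: a failure retries cannot fix."
  [card]
  (throw (ex-info "card declined" {:card card :fatal? true})))

(defn charge-naive
  "Lets the fatal error escape, so the retry policy keeps retrying."
  [card]
  (swap! attempts inc)
  (call-payment-gateway card))

(defn charge
  "Catches the fatal error and returns it as a value instead."
  [card]
  (swap! attempts inc)
  (try
    {:ok (call-payment-gateway card)}
    (catch clojure.lang.ExceptionInfo e
      (if (:fatal? (ex-data e))
        {:error (ex-message e)}
        (throw e)))))
```

With a limit of three attempts, charge-naive runs three times and still fails, while charge runs once and returns {:error "card declined"} for the Workflow to handle.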

Roadmap

  • Versioning
  • Queries
  • Child Workflows

Unit Testing

The SDK provides a test framework to facilitate Workflow unit-testing including an in-memory implementation of the Temporal service.

  • You can use the provided environment with a Clojure unit testing framework of your choice, such as clojure.test.

Part 3: Demo

Demo: ReminderBot

A Slack bot that uses Temporal to add a /remindme command, which schedules a reminder to be posted back to you at a specific future time.

https://github.com/ghaskins/reminderbot

Thank you!