Background

In the current Golang version of the agent, we integrate our projects with go2sky and implement distributed tracing by manually embedding instrumentation points into third-party frameworks. However, this approach has a few drawbacks:

  1. Projects must be instrumented manually. If no points are embedded in the code (such as passing context.Context), it is impossible to monitor or propagate the tracing context between services.
  2. A framework cannot be intercepted if it does not actively provide an API (e.g. a middleware or an inspector). Wrapping is sometimes complicated when no interface is provided, and even with an interface you may not be able to access internal details.

Considering the two issues mentioned above, I have been pondering whether there is a better way to implement an automatic instrumentation agent similar to Java Agent for Golang.

Technology Selection

Through my technical research, I have found that there are currently two technologies that can implement a similar functionality: eBPF or hybrid compilation.

eBPF

eBPF has become a popular technology in recent years. In Apache SkyWalking, we have our own eBPF Agent, Apache SkyWalking Rover, which allows non-intrusive monitoring of applications. In simple terms, with eBPF the program can be monitored without any modifications; all that is needed is to deploy the eBPF Agent on the machine where the service is running.

For Golang automatic instrumentation agents, eBPF can monitor the execution process of each method in a Golang program. All that is required is to intercept, acquire, and expand the data during method execution.

Ideally, eBPF seems like a great solution, but upon further research I found that it has a few issues. I also found a similar conclusion in an issue on Datadog's GitHub repository:

  1. Performance problems caused by context switching: When eBPF executes a user-mode program, it first needs to switch the current user mode to kernel mode execution, and then switch back to user mode after kernel mode execution is completed.
  2. Golang version compatibility and data manipulation: Different Golang versions may have completely different internal data structures, making it difficult to discern. Furthermore, when reading or writing data, using eBPF to operate Golang's memory space may be extremely challenging. Even if we can create a separate Tracing Context within the current service, the inability to pass the Tracing Context prevents it from combining with other services, which goes against the original purpose of Tracing Context.

Due to the two key issues mentioned above, we have decided to abandon this approach.

Hybrid compilation

Hybrid compilation takes advantage of Golang's “-toolexec” parameter during program building, dynamically executing compilation operations through the provided program, and ultimately generating the final binary file. With this feature, we can intercept any required Go file and make modifications during method execution, thus implementing dynamic interception. I found an article that provides a more detailed explanation of the implementation principles of dynamic interception: https://blog.sqreen.com/dynamic-instrumentation-go/

Hybrid compilation typically offers better performance, because the enhancement code is compiled together with the program into a single binary file. Moreover, it can intercept any method.

The hybrid compilation also has a few drawbacks:

  1. Unable to import libraries dynamically: Since it is based on intercepting Golang compilation to achieve enhancement, its extensibility is limited to existing libraries. This can lead to the inability to actively import some foundational libraries, such as go2sky or gRPC framework, which need to be actively introduced within the application to trigger compilation.
  2. toolexec only supports a single program: It is not possible to import multiple toolexec programs.

Result

Based on the two approaches mentioned above, we found that hybrid compilation is more suitable for our needs. It can embed custom programs into any code within the service while offering better performance.

To verify the feasibility of hybrid compilation, I have specifically written a demo program to validate its basic functionalities, including method interception, cross-goroutine data transmission, enhanced entities, and more. You can access the project and try it out: https://github.com/mrproliu/go-agent-instrumentation

Design Goal

Based on the hybrid compilation implementation approach, we hope to achieve the following results.

For User

Users only need to import the agent module in their program.

import _ "github.com/apache/skywalking-go"

And add it to the compilation parameters using “-toolexec” when building the project. The configuration of the Agent can be passed in via configuration files.

$ go install github.com/xxx/go-agent-instrument

$ export GO_AGENT_CONFIG=/path/to/config.yaml
$ go build -toolexec /path/to/go-agent-instrument ./

During the execution of the user's program, there is no need to worry about whether the context is passed or not. The system will automatically associate the tracing context with the current goroutine or any goroutine generated by the current goroutine.[a][b]

For Framework Plugin Developer

For developers who want to create plugins, we hope the process is as simple as possible, similar to the current Java Agent. You need to follow these steps.

Import agent core

When developing an agent, import the core library.

Plugin module import

When writing plugins, only the agent core library and the required enhancing libraries for the plugin can be imported. The only thing to be cautious about is not importing other libraries, as hybrid compilation cannot import additional libraries, and doing so may cause errors during compilation.

If the module you currently depend on contains the necessary sub-modules you need to use, you can use them directly.

But if you want to introduce a new library, it can only be imported and used through the Agent Core in a unified manner. This approach ensures consistency and compatibility across the various components of the agent.

Define Instrument Info

A plugin can define Instrumentation information, which includes the following key information:

  1. Package Name: The package to be enhanced, for example, "github.com/gin-gonic/gin".
  2. Version Checker: Indicates which versions of the current framework are supported.
  3. File System: The exported directory where the current framework is located; it is used for copying and modifying the interceptor files during compilation. You can directly use “go:embed *”.
  4. Interception Points: Specify which methods can be intercepted, which classes can be enhanced, etc. More details will be introduced in the next section.

Define Instrument Points

An Instrumentation can define multiple interception points, each containing the following information:

  1. Relative package path: In the Instrumentation, we define the package path, but here we have the relative path within the current package. For example, if the path of the framework files you intercept is "github.com/gin-gonic/gin/render", then its path in the Instrumentation is "github.com/gin-gonic/gin", and the relative path for the current interception point is "render". Since the “go build” compiles at each package level during compilation, we still need to comply with this rule.
  2. File name: Specifies in which file the current interception point needs to be processed. This parameter avoids traversing all files, which speeds up compilation.
  3. Interception configuration: Supports method interception and enhancement of specified struct structures.
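The two lists above can be pictured as plain data. The following is only a sketch under assumptions: the type names `Instrumentation` and `Point`, their fields, the `PointsFor` helper, and the file names are hypothetical and are not the agent's final API. The file system and the interception configuration are left out to keep it short.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Point is a hypothetical description of one interception point.
type Point struct {
	RelativePath string // package path relative to the instrumentation, "" for the root
	FileName     string // file that contains the target method or struct
}

// Instrumentation is a hypothetical description of one plugin.
type Instrumentation struct {
	PackageName    string                    // package to be enhanced
	VersionChecker func(version string) bool // reports whether a framework version is supported
	Points         []Point
}

// PackagePath returns the full import path a point applies to.
func (i Instrumentation) PackagePath(p Point) string {
	return path.Join(i.PackageName, p.RelativePath)
}

// PointsFor returns the points that apply to the package currently being
// compiled, or nil when the framework version is not supported.
func (i Instrumentation) PointsFor(pkg, version string) []Point {
	if !i.VersionChecker(version) {
		return nil
	}
	var out []Point
	for _, p := range i.Points {
		if i.PackagePath(p) == pkg {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	gin := Instrumentation{
		PackageName:    "github.com/gin-gonic/gin",
		VersionChecker: func(v string) bool { return strings.HasPrefix(v, "v1.") },
		Points: []Point{
			{RelativePath: "", FileName: "gin.go"},          // illustrative file name
			{RelativePath: "render", FileName: "json.go"}, // illustrative file name
		},
	}
	for _, p := range gin.PointsFor("github.com/gin-gonic/gin/render", "v1.9.0") {
		fmt.Println(gin.PackagePath(p), p.FileName)
	}
}
```

Because "go build" compiles one package at a time, matching on the joined package path lets the enhancement program decide quickly whether the current compile command is relevant at all.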

Intercept method

Provide interception for static methods (e.g., “func NewXXX()”) or class methods (e.g., “func (i *Instance) InstanceMethod()”). At the same time, it supports the validation of parameter information, and the method can be intercepted only when the validation rules are met.

The following two fields of information are required:

  1. Validation: Since method interception is based on AST syntax validation, I will provide a dedicated API to help users complete the validation work more easily, reducing the cost of use. This is as simple as using the ElementMatcher in the ByteBuddy framework in JavaAgent.
  2. Interceptor name: You also need to specify which interceptor to use to handle the method execution flow when the specified validation rules are met.
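To make the idea of AST-based validation concrete, here is a minimal sketch that uses the standard go/parser and go/ast packages to check whether a source file declares a given function or method. The helper names `receiverName` and `hasMethod` and the sample source are assumptions made for illustration; the dedicated matching API mentioned above would be richer (for example, matching on parameter types).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// demoSrc is a stand-in for a framework file handed to the compiler.
const demoSrc = `package demo

type Receiver struct{}

func NewReceiver() *Receiver { return &Receiver{} }

func (r *Receiver) TestMethod(arg1 int) error { return nil }
`

// receiverName returns the receiver type name of fn, or "" for a plain function.
func receiverName(fn *ast.FuncDecl) string {
	if fn.Recv == nil || len(fn.Recv.List) == 0 {
		return ""
	}
	expr := fn.Recv.List[0].Type
	if star, ok := expr.(*ast.StarExpr); ok {
		expr = star.X // unwrap pointer receivers such as *Receiver
	}
	if ident, ok := expr.(*ast.Ident); ok {
		return ident.Name
	}
	return ""
}

// hasMethod reports whether src declares a function with the given name and
// receiver type. Pass recv == "" to match a function without a receiver.
func hasMethod(src, recv, name string) (bool, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	if err != nil {
		return false, err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if ok && fn.Name.Name == name && receiverName(fn) == recv {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	a, _ := hasMethod(demoSrc, "Receiver", "TestMethod")
	b, _ := hasMethod(demoSrc, "", "NewReceiver")
	c, _ := hasMethod(demoSrc, "Receiver", "NewReceiver")
	fmt.Println(a, b, c) // prints "true true false"
}
```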

Interceptors need to implement the following two methods. Please refer to the code:

// Invocation define the method invoke context
type Invocation struct {
        CallerInstance interface{}   // Caller from, nil if static method
        Args           []interface{} // Function arguments

        Continue bool          // Is continue to invoke method
        Return   []interface{} // The method return values when continue to invoke

        Context interface{} // Propagating context between BeforeInvoke or AfterInvoke
}

type Interceptor interface {
        // BeforeInvoke method before invoking
        BeforeInvoke(invocation *Invocation) error
        // AfterInvoke after invoke finished
        AfterInvoke(invocation *Invocation, result ...interface{}) error
}
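As an illustration of how a plugin might use these two methods, the sketch below implements a hypothetical `timingInterceptor` that stores a start time in `Invocation.Context` during BeforeInvoke and reads it back in AfterInvoke. The `Invocation` and `Interceptor` definitions are repeated only so the example compiles on its own; the interceptor itself is an assumption, not part of the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// Invocation and Interceptor repeat the definitions from this section.
type Invocation struct {
	CallerInstance interface{}
	Args           []interface{}
	Continue       bool
	Return         []interface{}
	Context        interface{}
}

type Interceptor interface {
	BeforeInvoke(invocation *Invocation) error
	AfterInvoke(invocation *Invocation, result ...interface{}) error
}

// timingInterceptor is a hypothetical interceptor that measures how long the
// intercepted method took.
type timingInterceptor struct {
	last time.Duration
}

// BeforeInvoke stores the start time so AfterInvoke can read it back.
func (t *timingInterceptor) BeforeInvoke(inv *Invocation) error {
	inv.Context = time.Now()
	return nil
}

// AfterInvoke computes the elapsed time from the value put into Context.
func (t *timingInterceptor) AfterInvoke(inv *Invocation, result ...interface{}) error {
	start, ok := inv.Context.(time.Time)
	if !ok {
		return fmt.Errorf("missing start time in invocation context")
	}
	t.last = time.Since(start)
	return nil
}

// Compile-time check that the interface is satisfied.
var _ Interceptor = (*timingInterceptor)(nil)

func main() {
	var i Interceptor = &timingInterceptor{}
	inv := &Invocation{Args: []interface{}{42}}
	_ = i.BeforeInvoke(inv)
	// ... the real method would run here ...
	err := i.AfterInvoke(inv)
	fmt.Println("after invoke error:", err)
}
```

Keeping per-call state in the Invocation rather than in the interceptor matters because, as described later, one interceptor instance is shared by all calls.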

Enhance Instance

The purpose of enhancing instance is to embed custom fields in an object to assist during plugin usage. For example, in Java Agent, there is an “EnhancedInstance”.

Similar to method interception, you still need to define a validation configuration method. This informs which type needs to be enhanced.

We will define this interface in the plugin core module. Please refer to the following code:

type EnhancedInstance interface {
        GetSkyWalkingDynamicField() interface{}
        SetSkyWalkingDynamicField(interface{})
}

// cast to the enhanced instance
instance := invocation.CallerInstance.(core.EnhancedInstance)
// setting the customized value
instance.SetSkyWalkingDynamicField("test")
// get the customized value from instance
val := instance.GetSkyWalkingDynamicField()

Thread Local Usage

In Golang, we can also implement a function similar to "Context#getRuntimeContext" in Java, and it will be passed between Goroutines, as shown in the following code:

core.GetRuntimeContext().Put("test", "value1")
go func() {
        core.GetRuntimeContext().Get("test") // the result is "value1"
}()

Tracing Span Operation API

In the original go2sky, to operate a span, you need to know the tracer and context objects. In the automatic enhancement plugin, you don't need to pass in this information.

Therefore, it's necessary to introduce new APIs in the Agent Core module to operate the Span, making it the same as Java's ContextManager. For example, when creating an Entry, you only need to specify the operation name (Endpoint Name) and Injector (Carrier) by default.

Import Plugin

Once we have completed the plugin development, we need to actively introduce the plugin into the code for use. The reason is that there is no dynamic loading mechanism like Java's SPI in Golang. You only need to import it in one Golang file.

Plugin workflow

The following sequence diagram explains the execution process of how the enhanced program integrates the plugin with the original file. If you want to know more about the detailed implementation principles, please refer to the next section.

Implementation mechanism

When the toolexec program is specified during "go build", the Go toolchain collects the files from each package-level directory (including Go's own package directories, such as runtime, text, etc.) and sends the "compile" or "asm" command to the toolexec program. The toolexec program then actually executes the command and generates the target file. You can think of the toolexec program as a proxy. Note that "go build" only sends the Go files of one package level at a time; directories at different levels result in multiple invocations.

So when the Go agent instrumentation program executes, it can obtain the file paths in each package, allowing it to access all the Go files required for the target program's compilation. At this point, the enhancement program can use AST technology to parse and modify the file content. After the modifications or additions are completed, the parameters required for compilation can be modified, and the compilation command can be executed eventually. This is the principle of hybrid compilation.
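The proxy role described above can be sketched in a few lines. With "-toolexec", the go command invokes the given program with the real tool path as the first argument, followed by that tool's own arguments. The sketch below only forwards the command and counts the Go files passed to "compile"; the helper names `toolName` and `goFiles` are assumptions, and a real enhancement program would rewrite files and arguments at the marked place.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// toolName returns the base name of the Go tool being proxied,
// for example "compile", "asm" or "link".
func toolName(toolPath string) string {
	return strings.TrimSuffix(filepath.Base(toolPath), ".exe")
}

// goFiles returns the arguments that look like Go source files.
func goFiles(args []string) []string {
	var files []string
	for _, a := range args {
		if strings.HasSuffix(a, ".go") {
			files = append(files, a)
		}
	}
	return files
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go build -toolexec /path/to/this-program ./...")
		return
	}
	tool, args := os.Args[1], os.Args[2:]
	if files := goFiles(args); toolName(tool) == "compile" && len(files) > 0 {
		// An enhancement program would parse and rewrite these files here,
		// then replace their paths in args before running the compiler.
		fmt.Fprintf(os.Stderr, "compile: %d Go files\n", len(files))
	}
	// Run the real tool unchanged and pass its output through.
	cmd := exec.Command(tool, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```

Passing stdout through untouched is deliberate: the go command also runs tools through the proxy to query their version, and it reads that answer from the tool's standard output.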

Method interception

When the enhancement program detects that the go file of the current package contains the specified method, the program will make the following changes.

Insert code before method execution

This is used to intercept the execution flow before and after the method is executed.

func (x *Receiver) TestMethod(arg1 int) (result error) {
        if skip_result1, _sw_invocation, _sw_keep := _skywalking_enhance_receiver_TestMethod(&x, &arg1); !_sw_keep {
                return skip_result1
        } else {
                defer _skywalking_enhance_receiver_TestMethod_ret(_sw_invocation, &result)
        }

        // real method invoke
}

In the code above, we can see that we want to intercept the TestMethod of the Receiver, and the aforementioned code is inserted before the actual code execution of the method.

  1. _skywalking_enhance_receiver_TestMethod: This function is executed before the method interception, and the Receiver and the method's parameters are passed into it. The function returns the current invocation object and whether to continue execution. If further execution is not needed, a custom result is returned. Otherwise, the subsequent code is executed.
  2. _skywalking_enhance_receiver_TestMethod_ret: This function is executed after the intercepted method has been completed, and the return value of the current method is passed into it. It is worth noting that if the method to be intercepted does not have parameter names, the enhancement program will automatically set names for them.

Copy interceptor code to the package

The enhancement program copies the files from the plugin into the framework's code package, allowing the interceptor to be combined with method execution.

It's important to note that the copying process strictly follows the hierarchy of the method interception. If it's in the root directory, only files in the root directory will be copied. If it's in a certain level under a package, the files in that level will be copied. The reason for this approach is that only files of the same package are compiled together during hybrid compilation.

Create bridge methods

The bridge method is created to integrate the logic before and after method execution with the code in the interceptor.

// the interceptor is only created once
var testMethodInterceptorInstance = &ServerHTTPInterceptor{}

func _skywalking_enhance_receiver_TestMethod(recv_0 **Receiver, param_0 *int) (result1 error, inv *Invocation, keep bool) {
        // fallback for the plugin execute failure
        defer func() {
                if r := recover(); r != nil {
                        // log the error, then keep the real method running
                        result1, keep = nil, true
                }
        }()

        // create a new invocation instance
        invocation := &Invocation{}
        // for caller if exist
        invocation.CallerInstance = *recv_0
        // for parameters
        invocation.Args = make([]interface{}, 1)
        invocation.Args[0] = *param_0

        // before method invoke
        if err := testMethodInterceptorInstance.BeforeInvoke(invocation); err != nil {
                // if the interceptor returns an error, keep the real method running
                return nil, invocation, true
        }
        // if the interceptor asks to skip the method, return the customized result
        if invocation.Continue {
                result, _ := invocation.Return[0].(error)
                return result, invocation, false
        }
        // otherwise, keep the method running
        return nil, invocation, true
}

func _skywalking_enhance_receiver_TestMethod_ret(invocation *Invocation, result_1 *error) {
        // fallback for the plugin execute failure
        defer func() {
                if r := recover(); r != nil {
                        // log the error
                }
        }()
        testMethodInterceptorInstance.AfterInvoke(invocation, *result_1)
}

This part of the code serves to combine the code from the first and second sections together.

  1. Interceptor instance: The instance will be constructed within the current package, rather than being created each time it is executed, in order to reduce memory overhead.
  2. The first method: In the method interceptor, the parameters and Receiver information will be constructed into the Invocation object, and the "BeforeInvoke" method of the interceptor will be executed. If it doesn't need to continue, it will return a custom return value, otherwise, it will continue running. If any problems occur during code execution (recover or error return value from BeforeInvoke), the program will continue executing as well.
  3. The second method: This will be executed after the intercepted method is completed, with the return value of the intercepted method also being passed in. Similarly, there is a "recover" method to prevent issues caused by the plugin code.
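The pieces described above (inserted code, interceptor, and bridge) can be put together in one small runnable program. This is a sketch under assumptions: the interceptor `blockNegative`, the error `errBlocked`, and the shortened bridge names `enhanceTestMethod` and `enhanceTestMethodRet` are hypothetical, and the code is written by hand here rather than generated by an enhancement program.

```go
package main

import (
	"errors"
	"fmt"
)

type Invocation struct {
	CallerInstance interface{}
	Args           []interface{}
	Continue       bool
	Return         []interface{}
}

type Receiver struct{ calls int }

var errBlocked = errors.New("blocked by interceptor")

// blockNegative is a hypothetical interceptor that skips the real method
// for negative arguments and counts AfterInvoke calls.
type blockNegative struct{ after int }

func (b *blockNegative) BeforeInvoke(inv *Invocation) error {
	if inv.Args[0].(int) < 0 {
		inv.Continue = true
		inv.Return = []interface{}{errBlocked}
	}
	return nil
}

func (b *blockNegative) AfterInvoke(inv *Invocation, results ...interface{}) error {
	b.after++
	return nil
}

// One shared interceptor instance per package.
var interceptor = &blockNegative{}

// Bridge executed before the method body.
func enhanceTestMethod(recv **Receiver, arg *int) (skip error, inv *Invocation, keep bool) {
	defer func() {
		if r := recover(); r != nil {
			skip, keep = nil, true // a plugin failure must not break the real method
		}
	}()
	inv = &Invocation{CallerInstance: *recv, Args: []interface{}{*arg}}
	if err := interceptor.BeforeInvoke(inv); err != nil {
		return nil, inv, true
	}
	if inv.Continue {
		res, _ := inv.Return[0].(error)
		return res, inv, false
	}
	return nil, inv, true
}

// Bridge executed after the method body has returned.
func enhanceTestMethodRet(inv *Invocation, result *error) {
	defer func() { _ = recover() }()
	_ = interceptor.AfterInvoke(inv, *result)
}

// TestMethod shows the method after the enhancement code was inserted.
func (x *Receiver) TestMethod(arg1 int) (result error) {
	if skip, inv, keep := enhanceTestMethod(&x, &arg1); !keep {
		return skip
	} else {
		defer enhanceTestMethodRet(inv, &result)
	}
	x.calls++ // the original method body
	return nil
}

func main() {
	r := &Receiver{}
	fmt.Println(r.TestMethod(1), r.calls)  // prints "<nil> 1"
	fmt.Println(r.TestMethod(-1), r.calls) // prints "blocked by interceptor 1"
}
```

In this sketch the second call never reaches the original body, so the call counter stays at 1 and AfterInvoke is not run for it.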

Result

Upon completion of these three steps, the following modifications can be made to the files in the intercepted package:

  1. Modify the file containing the enhanced method: Create code before the intercepted method is executed.
  2. Add interceptor file: Copy the file where the interceptor is located.
  3. Add bridge file: Write a separate file for the bridging part.

Code Location Change Issue

Based on the first step, if we add code to the intercepted method, it may cause the code line position to be shifted when a problem occurs in the framework. The solution I came up with is to create a new method and name it as the real method, and rename the intercepted method to a temporary name. For example, the following code:

// before
func (x *Receiver) TestMethod(arg1 int) (result error) {
        // real method invoke
}

// after
func (x *Receiver) skywalking_enhanced_TestMethod(arg1 int) (result error) {
        // real method invoke
}

func (x *Receiver) TestMethod(arg1 int) (result error) {
        if skip_result1, _sw_invocation, _sw_keep := _skywalking_enhance_receiver_TestMethod(&x, &arg1); !_sw_keep {
                return skip_result1
        } else {
                defer _skywalking_enhance_receiver_TestMethod_ret(_sw_invocation, &result)
        }
        return x.skywalking_enhanced_TestMethod(arg1)
}

Using this approach, the only downside I can think of is that when a problem occurs, the method name in the stack trace might not be correct, but the code line would be completely accurate.

Instance Enhance

For instance enhancement, the process is relatively straightforward. We just need to embed a new field of any type (interface{}) into the current struct. Additionally, in the Bridge's Go file, we need to add the implementation for the EnhancedInstance methods (mentioned in the "For framework plugin developer" section) specific to that struct. The following code demonstrates this:

type Test struct {
        // existing fields
        skywalking_enhance_field interface{} // adding the field into the structure
}

// adding these two methods into bridge go file
func (receiver *Test) SetSkyWalkingDynamicField(val interface{}) {
        receiver.skywalking_enhance_field = val
}

func (receiver *Test) GetSkyWalkingDynamicField() interface{} {
        return receiver.skywalking_enhance_field
}

Goroutine with Tracing Context

In Golang, there is no functionality similar to Java's ThreadLocal. However, we can achieve this by adding properties to the "struct g". The "struct g" represents a goroutine object in Golang, and in the "runtime" package, you can obtain the currently running goroutine object through the "getg() *g" function.

 Adding fields in the struct g

type g struct {
        // real fields

        // thread-local fields
        skywalking_enhance_obj interface{}
}

// create a new file in the "runtime" package to export the thread-local value
import (
        _ "unsafe"
)

//go:linkname _skywalking_tls_get _skywalking_tls_get
var _skywalking_tls_get = _skywalking_tls_get_impl

//go:linkname _skywalking_tls_set _skywalking_tls_set
var _skywalking_tls_set = _skywalking_tls_set_impl

//go:nosplit
func _skywalking_tls_get_impl() interface{} {
        return getg().m.curg.skywalking_enhance_obj
}

//go:nosplit
func _skywalking_tls_set_impl(v interface{}) {
        getg().m.curg.skywalking_enhance_obj = v
}

In the code snippet above, we added a new attribute "skywalking_enhance_obj" to the g structure, and in the runtime package, we added a new file to export this field. The reason is that the "getg()" method is not exposed externally.

Next, we need to import the exported methods in the plugin core and go2sky (which will be introduced in more detail in the next section).

import _ "unsafe"

var (
        GetGLS = func() interface{} { return nil }
        SetGLS = func(interface{}) {}
)

//go:linkname _skywalking_tls_get _skywalking_tls_get
var _skywalking_tls_get func() interface{}

//go:linkname _skywalking_tls_set _skywalking_tls_set
var _skywalking_tls_set func(interface{})

func init() {
        if _skywalking_tls_get != nil && _skywalking_tls_set != nil {
                GetGLS = _skywalking_tls_get
                SetGLS = _skywalking_tls_set
        }
}

Now, in the plugin, you can obtain the data information of the current goroutine without a context object.

Obtain data in different goroutine

In Golang, we can start new goroutines to accomplish tasks at any time. This leads to a problem: there can be many goroutines, but we only set custom data in the first one. When an RPC request is executed in a newly created goroutine, we are unable to access the data from the first goroutine, as demonstrated in the following code:

// goroutine A
core.SetGLS("test data")
go func() {
        // goroutine B
        core.GetGLS() // returns nil
}()

After studying the Golang runtime code, we can see that creating a new goroutine actually calls the "runtime.newproc1" method. Its second parameter, callergp, indicates which goroutine created the new one. The method signature is shown below:

// Create a new g in state _Grunnable, starting at fn. callerpc is the
// address of the go statement that created this. The caller is responsible
// for adding the new g to the scheduler.
func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {}

In the example code above, when executed, the second parameter represents "goroutine A", and the returned instance is "goroutine B".

Since we have already added custom attributes to the struct g in the previous section, we can modify the "runtime.newproc1" method. When it returns, we set the custom attributes on the returned instance, so that the new goroutine inherits the data of its creator.

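func newproc1(fn *funcval, callergp *g, callerpc uintptr) (result *g) {
        defer func() {
                // hand a snapshot of the creator's tracing context to the new goroutine;
                // "span" stands for the active span read from callergp.skywalking_enhance_obj
                result.skywalking_enhance_obj = &tracingContext{span: snapshot(span)}
        }()

        // code
}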

Span Operation with GLS(Goroutine Thread Local)

When using methods like "core.CreateLocalSpan(opName string)", we need to obtain the tracing context and global tracer information from GLS to create the current span information.

Based on the current implementation in "go2sky", we can put the "segmentSpan" into GLS. Whenever a new span is needed, simply treat the Span from GLS as the parent span and store the current span back into GLS. As shown in the following logical code example:

// create new local span
func CreateLocalSpan(opName string) span {
        parentSpan := GetGLS()
        var result span
        if parentSpan == nil {
                result = newRootSpan(GetGlobalTracer())
        } else {
                result = newSpan(opName, parentSpan)
        }
        SetGLS(result)
        return result
}

// find the current active span
func ActiveSpan() span {
        return GetGLS()
}

Runtime Context with GLS

Currently, Java Agent has the functionality of "Context.getRuntimeContext().put("test", "value")", which can also be used in the Golang Agent. The current idea is to store the segment span and runtime context in a single instance within GLS. For example, create the following structure and store it in GLS:

type tracingContext struct {
        activeSpan     span
        runtimeContext map[string]interface{}
}
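When such a structure is handed to a newly created goroutine, copying it avoids two goroutines writing to the same map. The following sketch shows one possible `snapshot` method; the `span` type, the use of a pointer for the active span, and the copy strategy are assumptions for illustration, not the decided design.

```go
package main

import "fmt"

// span is a placeholder for the real span type.
type span struct{ name string }

type tracingContext struct {
	activeSpan     *span
	runtimeContext map[string]interface{}
}

// snapshot returns a copy intended for a child goroutine. The active span is
// shared so the child can use it as a parent reference, while the runtime
// context map is copied so parent and child never write to the same map.
func (c *tracingContext) snapshot() *tracingContext {
	if c == nil {
		return nil
	}
	cp := &tracingContext{
		activeSpan:     c.activeSpan,
		runtimeContext: make(map[string]interface{}, len(c.runtimeContext)),
	}
	for k, v := range c.runtimeContext {
		cp.runtimeContext[k] = v
	}
	return cp
}

func main() {
	parent := &tracingContext{
		activeSpan:     &span{name: "/entry"},
		runtimeContext: map[string]interface{}{"test": "value1"},
	}
	child := parent.snapshot()
	child.runtimeContext["test"] = "changed in child"
	// prints "value1 changed in child /entry"
	fmt.Println(parent.runtimeContext["test"], child.runtimeContext["test"], child.activeSpan.name)
}
```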

Auto-instrument Agent with Dependencies

Since the enhanced agent is based on the compilation process, it lacks the ability to import other modules into the application. To achieve communication between the agent and the OAP server, it is necessary to introduce key modules, such as the Tracing data model and the gRPC communication protocol.

go2sky mechanism

Therefore, we need to use "go2sky" as a base module to integrate into the target application, making the integration of the automatic Agent and go2sky crucial.

First, we need to understand how go2sky works when building spans in the tracing context.

As shown in the illustration above, it relies on the "context.Context" instance in Golang. Since Golang has no Thread Local mechanism, data sharing within and across goroutines can only be accomplished through Context. Additionally, different frameworks need the context for invoking, and by using the Context Propagation mechanism, information from the current and parent Contexts can be traced.

In go2sky, there is no concept of a parent Span. Therefore, if an inherited context is used at the same level, such as after completing a Local Span in the Context and using the returned Context from the Local Span, the parent of the Local Span would be the Exit Span instead of the upper-level Span.

Customized Agent

Referring to the "go2sky" project, when we want to create a base library, we will copy the key code from "go2sky" and make the following modifications.

For the modules introduced in the user's project, the following functionalities will be provided:

  1. Introduce the gRPC and goapi modules.
  2. Provide the code related to gRPC and OAP communication.
  3. Provide the basic API originally offered by go2sky, such as Correlation, TraceID, and other functionalities.

Enhancing the code during the hybrid compilation phase, including the following parts:

  1. Introduce Agent Core-related features(Copy files into agent).
  2. Communicate with the customized Agent's protocol and send Segment data.
  3. Export the global tracer in the main package.
  4. Enhance the implementation of the provided external APIs.

Tracing Context with GLS

In go2sky, span information is passed through the Context object. We need to replace this with the GLS mechanism, as described in "Goroutine with Tracing Context". Also, when saving Span information, we need to save a reference to the parent Span with it; otherwise, we may fail to find the parent Span and bind the "parentSpanID" incorrectly. For example:

func (s *span) End() {
        core.SetGLS(s.parentSpan())
}
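The effect of keeping a parent reference can be tried out without a modified runtime. In the sketch below the goroutine-local slot is simulated with a package variable, which is an assumption that only holds for a single goroutine; the `span` struct and its fields are likewise simplified stand-ins for the real types.

```go
package main

import "fmt"

// gls simulates the goroutine-local slot. A package variable is only a
// stand-in: it is correct for one goroutine and is not the real mechanism.
var gls interface{}

func GetGLS() interface{}  { return gls }
func SetGLS(v interface{}) { gls = v }

type span struct {
	name   string
	parent *span
}

// CreateLocalSpan makes the active span the parent of the new span and
// stores the new span as the active one.
func CreateLocalSpan(opName string) *span {
	parent, _ := GetGLS().(*span) // nil when there is no active span
	s := &span{name: opName, parent: parent}
	SetGLS(s)
	return s
}

// ActiveSpan returns the span currently stored in the slot, or nil.
func ActiveSpan() *span {
	s, _ := GetGLS().(*span)
	return s
}

// End restores the parent as the active span, so that the next span created
// at the same level binds to the correct parent.
func (s *span) End() {
	if s.parent == nil {
		SetGLS(nil)
		return
	}
	SetGLS(s.parent)
}

func main() {
	root := CreateLocalSpan("/root")
	child := CreateLocalSpan("/child")
	fmt.Println(child.parent.name) // prints "/root"
	child.End()
	sibling := CreateLocalSpan("/sibling")
	fmt.Println(sibling.parent.name) // prints "/root"
	sibling.End()
	root.End()
	fmt.Println(ActiveSpan() == nil) // prints "true"
}
```

Without the restore step in End, the sibling span in this sketch would be attached to the already finished child.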

Project Structure

The new agent project should organize all modules within a single repository to make it easier to manage dependencies and maintain consistency across modules. We can use Go modules to manage dependencies and import the necessary modules within user code. Here's a suggested directory structure for the agent project:

  1. docs: This directory manages all documentation information for the current project. It can be synchronized with the Apache SkyWalking Website to keep the documentation up-to-date and consistent.
  2. plugins: This directory stores the base code for all plugins and the implementation of each individual plugin. The core subdirectory contains the base code, and other subdirectories contain plugin implementations. Plugins need to import the core module.
  3. reporter: Responsible for reporting the Tracing data collected from plugins to the OAP backend.
  4. test/e2e: Contains end-to-end tests for each different plugin to ensure their proper functioning. We can still use skywalking-infra-e2e for the testing of plugins. Using the SkyWalking Agent Test Tool as the server-side component and validator will help ensure compatibility and consistency with the Apache SkyWalking ecosystem.
  5. toolkit: Provides users with a set of plugins for common use cases, such as integrating the tracing context into logging systems.
  6. tools/go-agent-enhance: Used for enhancing user programs. To use it during a project build, users need to install this package and specify it in the build command. For example: "go install github.com/apache/skywalking-go/tools/go-agent-enhance && go build -toolexec=/path/to/go-agent-enhance"
  7. api.go: Users can use this API to obtain current Tracing data content, such as the current TraceID, Correlation information, etc.

Steps

  1. Resolve the issues in the highlighted parts:
     1. Method location when intercepting the method.
     2. Are there any features that I haven't mentioned or missed?
  2. Create a new project to develop the Agent program. Should it belong to Apache or SkyAPM, and what should the project be named?
  3. Develop the Go Agent, supporting the following:
     1. The plugins currently supported in "go2sky-plugins".
     2. The current API in "go2sky".
TODO:

spanid lock

timer root span close

logger interface

plugin config

[a]Curious about this. how would you achieve things like this? https://github.com/openzipkin/zipkin-go/blob/master/middleware/http/server.go#L145 where handler can also access the tracing context?

[b]Do You mean to access the tracing context in the `ctx.Context`? yeah, if the context passes the arguments when the method invokes, you can read it in the `Invocation`. It should decide by which information in the context, and should we use it to build its own tracing or not?