With the current Go agent, we integrate our projects with go2sky and implement distributed tracing by manually adding instrumentation points to third-party frameworks. However, this approach has a few drawbacks:
Considering the two issues mentioned above, I have been looking for a better way to build an automatic instrumentation agent for Go, similar to the Java Agent.
Through my technical research, I found two technologies that can currently provide similar functionality: eBPF and hybrid compilation.
eBPF has become a popular technology in recent years. Apache SkyWalking has its own eBPF agent, Apache SkyWalking Rover, which monitors applications non-intrusively. In simple terms, with eBPF a program can be monitored without any modification; all that is needed is to deploy the eBPF agent on the machine where the service runs.
For a Go automatic instrumentation agent, eBPF can monitor the execution of each method in a Go program. All that is required is to intercept, capture, and enrich the data during method execution.
In theory, eBPF seems like a great solution, but upon further research I found that it has a few issues. I also found a similar answer in an issue on Datadog's GitHub repository:
Due to the two key issues mentioned above, we have decided to abandon this approach.
Hybrid compilation takes advantage of the “-toolexec” build flag: the go command runs every compilation step through the provided program, which ultimately produces the final binary. With this feature, we can intercept any Go file we need and modify it before it is compiled, inserting code that runs before and after a method executes; this is how dynamic interception is implemented. The following article explains the principles of dynamic interception in more detail: https://blog.sqreen.com/dynamic-instrumentation-go/
Because the agent code is compiled into the same binary as the program, hybrid compilation typically performs better. It can also intercept any method.
Hybrid compilation also has a few drawbacks:
Comparing the two approaches, we found that hybrid compilation better suits our needs: it can embed custom code anywhere in the service while offering better performance.
To verify the feasibility of hybrid compilation, I wrote a demo program that validates its basic functionality, including method interception, cross-goroutine data propagation, enhanced instances, and more. You can try the project here: https://github.com/mrproliu/go-agent-instrumentation
Based on the hybrid compilation implementation approach, we hope to achieve the following results.
Users only need to import the agent module in their program.
import _ "github.com/apache/skywalking-go"
Then pass the agent program to the build via “-toolexec” when building the project. The agent configuration can be provided through a configuration file.
$ go install github.com/xxx/go-agent-instrument
$ export GO_AGENT_CONFIG=/path/to/config.yaml
While the user's program runs, there is no need to worry about whether the context is passed along. The agent automatically associates the tracing context with the current goroutine and with any goroutine created by it.[a][b]
For developers who want to create plugins, we hope the process is as simple as possible, similar to the current Java Agent. It consists of the following steps.
When developing a plugin, import the agent core library.
When writing a plugin, you may import only the agent core library and the libraries the plugin enhances. Be careful not to import any other library: hybrid compilation cannot add new dependencies to the target program, so doing so may cause compilation errors.
If a module you already depend on contains the sub-packages you need, you can use them directly.
However, if you want to introduce a new library, it can only be imported and used through the Agent Core. This keeps the agent's components consistent and compatible.
A plugin defines its Instrumentation, which includes the following key information:
An Instrumentation can define multiple interception points, each containing the following information:
Interception is supported for static functions (e.g., “func NewXXX()”) and methods with a receiver (e.g., “func (i *Instance) InstanceMethod()”). Parameter information can also be validated: the method is intercepted only when the validation rules are met.
The following two fields of information are required:
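To make the validation rule concrete, here is a minimal sketch of how a parsed function could be matched against an interception point, using the standard go/parser, go/ast and go/types packages. The `Matcher` type, its fields (`Receiver`, `Name`, `ParamTypes`) and `FindMatches` are hypothetical names for illustration, not the agent's actual API; types are compared as source text only, without type checking.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// Matcher is a hypothetical description of one interception point.
type Matcher struct {
	Receiver   string   // receiver type such as "*Instance"; empty for plain functions
	Name       string   // function or method name
	ParamTypes []string // expected parameter types, in order, as written in source
}

// Match reports whether fn has the expected receiver, name and parameter types.
func (m Matcher) Match(fn *ast.FuncDecl) bool {
	if fn.Name.Name != m.Name {
		return false
	}
	recv := ""
	if fn.Recv != nil && len(fn.Recv.List) == 1 {
		recv = types.ExprString(fn.Recv.List[0].Type)
	}
	if recv != m.Receiver {
		return false
	}
	// Expand grouped parameters such as "a, b int" into one entry per name.
	var params []string
	for _, field := range fn.Type.Params.List {
		n := len(field.Names)
		if n == 0 {
			n = 1 // unnamed parameter
		}
		for i := 0; i < n; i++ {
			params = append(params, types.ExprString(field.Type))
		}
	}
	if len(params) != len(m.ParamTypes) {
		return false
	}
	for i := range params {
		if params[i] != m.ParamTypes[i] {
			return false
		}
	}
	return true
}

// FindMatches parses one Go source file and returns the names of matching functions.
func FindMatches(src string, m Matcher) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && m.Match(fn) {
			found = append(found, fn.Name.Name)
		}
	}
	return found, nil
}

func main() {
	src := "package demo\ntype Instance struct{}\nfunc NewXXX() *Instance { return nil }\n" +
		"func (i *Instance) InstanceMethod(name string, n int) error { return nil }\n"
	m := Matcher{Receiver: "*Instance", Name: "InstanceMethod", ParamTypes: []string{"string", "int"}}
	found, err := FindMatches(src, m)
	if err != nil {
		panic(err)
	}
	fmt.Println(found)
}
```

Matching on source text keeps the check cheap enough to run inside every compile step; a stricter rule could resolve types with go/types at the cost of loading dependencies.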
Interceptors need to implement the following two methods. Please refer to the code:
// Invocation defines the method invocation context
Instance enhancement adds custom fields to an object so that plugins can store data on it, similar to “EnhancedInstance” in the Java Agent.
As with method interception, you need to define a matching rule that specifies which type should be enhanced.
We will define this interface in the plugin core module. Please refer to the following code:
type EnhancedInstance interface {
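A minimal sketch of how a plugin could use such an interface, assuming two hypothetical methods, `GetSkyWalkingDynamicField` and `SetSkyWalkingDynamicField`, and a hypothetical `Client` struct standing in for an enhanced framework type. One interceptor stores data on the instance and a later one reads it back through a type assertion.

```go
package main

import "fmt"

// EnhancedInstance is implemented by every enhanced struct (method names assumed).
type EnhancedInstance interface {
	GetSkyWalkingDynamicField() interface{}
	SetSkyWalkingDynamicField(interface{})
}

// Client stands in for a framework struct after enhancement.
type Client struct {
	addr                   string
	skywalkingDynamicField interface{}
}

func (c *Client) GetSkyWalkingDynamicField() interface{}  { return c.skywalkingDynamicField }
func (c *Client) SetSkyWalkingDynamicField(v interface{}) { c.skywalkingDynamicField = v }

// rememberPeer is what a constructor interceptor might do: store data on the instance.
func rememberPeer(instance interface{}, peer string) {
	if e, ok := instance.(EnhancedInstance); ok {
		e.SetSkyWalkingDynamicField(peer)
	}
}

// peerOf is what a later method interceptor might do: read the data back.
func peerOf(instance interface{}) string {
	if e, ok := instance.(EnhancedInstance); ok {
		if peer, ok := e.GetSkyWalkingDynamicField().(string); ok {
			return peer
		}
	}
	return ""
}

func main() {
	c := &Client{addr: "localhost:6379"}
	rememberPeer(c, c.addr)
	fmt.Println(peerOf(c))
}
```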
In Go, we can also implement a function similar to Java's "Context#getRuntimeContext", whose data is passed between goroutines, as shown in the following code:
core.GetRuntimeContext().Put("test", "value1")
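A minimal sketch of the runtime context itself, assuming it is a small concurrency-safe map. The `RuntimeContext` type and its `Put`, `Get` and `Snapshot` methods are hypothetical. The snapshot is one possible answer to "passed between goroutines": a new goroutine would receive a copy, so later writes in the parent and child do not leak into each other.

```go
package main

import (
	"fmt"
	"sync"
)

// RuntimeContext holds plugin data attached to the current goroutine (hypothetical type).
type RuntimeContext struct {
	mu   sync.RWMutex
	data map[string]interface{}
}

func NewRuntimeContext() *RuntimeContext {
	return &RuntimeContext{data: make(map[string]interface{})}
}

func (r *RuntimeContext) Put(key string, value interface{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.data[key] = value
}

func (r *RuntimeContext) Get(key string) interface{} {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.data[key]
}

// Snapshot copies the current entries, e.g. for a newly created goroutine.
func (r *RuntimeContext) Snapshot() *RuntimeContext {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c := NewRuntimeContext()
	for k, v := range r.data {
		c.data[k] = v
	}
	return c
}

func main() {
	parent := NewRuntimeContext()
	parent.Put("test", "value1")
	child := parent.Snapshot()
	child.Put("test", "value2")
	fmt.Println(parent.Get("test"), child.Get("test"))
}
```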
In go2sky, operating on a span requires the tracer and context objects. In an automatic enhancement plugin, you should not need to pass in this information.
Therefore, the Agent Core module needs new APIs for operating on spans, similar to Java's ContextManager. For example, creating an Entry span requires only the operation name (endpoint name) and, by default, an extractor for the carrier.
Once a plugin is developed, it must be explicitly imported into the code, because Go has no dynamic loading mechanism like Java's SPI. Importing it in a single Go file is enough.
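Go runs the init functions of every imported package, even with a blank import, so a plugin can register itself the way database/sql drivers do. The sketch below compresses that pattern into one file; the `Plugin` interface, `Register`, `Registered` and `redisPlugin` are hypothetical, and in a real layout the init function would live in the plugin package that the user imports with `import _`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Plugin is a hypothetical minimal plugin description.
type Plugin interface {
	Name() string
}

var (
	registryMu sync.Mutex
	registry   = map[string]Plugin{}
)

// Register is called from a plugin package's init function.
func Register(p Plugin) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[p.Name()]; dup {
		panic("plugin registered twice: " + p.Name())
	}
	registry[p.Name()] = p
}

// Registered returns the names of all registered plugins in sorted order.
func Registered() []string {
	registryMu.Lock()
	defer registryMu.Unlock()
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

type redisPlugin struct{}

func (redisPlugin) Name() string { return "go-redis" }

// In a real project this init lives in the plugin package and runs as soon
// as that package is imported, even through a blank import.
func init() { Register(redisPlugin{}) }

func main() {
	fmt.Println(Registered())
}
```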
The following sequence diagram explains the execution process of how the enhanced program integrates the plugin with the original file. If you want to know more about the detailed implementation principles, please refer to the next section.
When a toolexec program is specified for "go build", the go command collects the files of each package directory (including standard library packages such as runtime and text) and sends the "compile" or "asm" command to the toolexec program, which actually runs the command and generates the object file. You can think of the toolexec program as a proxy. Note that each invocation only receives the Go files of a single directory; packages in different directories result in separate invocations.
So when the Go agent instrumentation program runs, it receives the file paths of each package and can therefore access every Go file needed to compile the target program. At this point, the enhancement program can parse the files into an abstract syntax tree (AST) and modify them. After the modifications or additions are complete, it adjusts the compilation arguments accordingly and finally executes the compilation command. This is the principle of hybrid compilation.
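The proxy role can be sketched as a small program. With `go build -toolexec=/path/to/proxy`, the go command runs the proxy with the real tool's path as the first argument, followed by that tool's own arguments (the compiler receives the package import path through its `-p` flag). The helpers `toolName`, `packagePath` and `goFiles` are hypothetical; `run` forwards every command unchanged, which also covers the version queries the go command issues, and marks where an agent would swap in rewritten files.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// toolName returns the base name of the real tool, e.g. "compile" or "asm".
func toolName(args []string) string {
	if len(args) == 0 {
		return ""
	}
	return strings.TrimSuffix(filepath.Base(args[0]), ".exe")
}

// packagePath returns the value of the compiler's -p flag (the import path).
func packagePath(args []string) string {
	for i := 1; i+1 < len(args); i++ {
		if args[i] == "-p" {
			return args[i+1]
		}
	}
	return ""
}

// goFiles returns the .go source files passed to the tool.
func goFiles(args []string) []string {
	var files []string
	for i, a := range args {
		if i > 0 && strings.HasSuffix(a, ".go") {
			files = append(files, a)
		}
	}
	return files
}

// run executes the real tool with the (possibly rewritten) arguments.
func run(args []string) error {
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go build -toolexec=/path/to/proxy ./...")
		return
	}
	args := os.Args[1:]
	if toolName(args) == "compile" && packagePath(args) != "" {
		// An agent would rewrite goFiles(args) here and swap the new paths into args.
		_ = goFiles(args)
	}
	if err := run(args); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```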
When the enhancement program detects that the go file of the current package contains the specified method, the program will make the following changes.
This is used to intercept the execution flow before and after the method is executed.
func (x *Receiver) TestMethod(arg1 int) (result error) {
In the code above, we want to intercept the TestMethod of Receiver, so the interception code is inserted before the method's original code executes.
The enhancement program copies the plugin's files into the framework's package, so the interceptor is compiled together with the intercepted method.
Note that copying strictly follows the package hierarchy of the intercepted method: if the method is in the root directory, only the plugin files in the root directory are copied; if it is in a sub-package, only the files at that level are copied. This is because each compilation step in hybrid compilation only compiles the files of a single package.
The bridge method is created to integrate the logic before and after method execution with the code in the interceptor.
// the interceptor is only created once
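A sketch of what a generated bridge file could contain for `TestMethod`, assuming the plugin's interceptor types have been copied into the same package. The package-level variable is initialized once, when the package is initialized, which is one way to satisfy the "created once" comment. Every identifier here (`Invocation`, `Interceptor`, `countingInterceptor`, the `_skywalking_*` names) is hypothetical.

```go
package main

import "fmt"

// Copied from the plugin (hypothetical API).
type Invocation struct {
	CallerInstance interface{}
	Args           []interface{}
	Skip           bool
	Return         []interface{}
	Context        interface{}
}

type Interceptor interface {
	BeforeInvoke(inv *Invocation) error
	AfterInvoke(inv *Invocation, results ...interface{}) error
}

// countingInterceptor stands in for real tracing logic.
type countingInterceptor struct{ before, after int }

func (c *countingInterceptor) BeforeInvoke(inv *Invocation) error { c.before++; return nil }

func (c *countingInterceptor) AfterInvoke(inv *Invocation, results ...interface{}) error {
	c.after++
	return nil
}

// the interceptor is only created once, during package initialization
var _skywalking_interceptor_TestMethod Interceptor = &countingInterceptor{}

type Receiver struct{}

// Bridge: builds the Invocation and runs BeforeInvoke.
func _skywalking_before_TestMethod(recv *Receiver, arg1 *int) (*Invocation, bool, error) {
	inv := &Invocation{CallerInstance: recv, Args: []interface{}{arg1}}
	if err := _skywalking_interceptor_TestMethod.BeforeInvoke(inv); err != nil {
		// An interceptor failure must not break the user's call; a real agent would log it.
		return inv, false, nil
	}
	if inv.Skip && len(inv.Return) == 1 {
		r, _ := inv.Return[0].(error)
		return inv, true, r
	}
	return inv, false, nil
}

// Bridge: runs AfterInvoke with the method's results.
func _skywalking_after_TestMethod(inv *Invocation, result *error) {
	_ = _skywalking_interceptor_TestMethod.AfterInvoke(inv, *result)
}

func (x *Receiver) TestMethod(arg1 int) (result error) {
	inv, skip, r := _skywalking_before_TestMethod(x, &arg1)
	if skip {
		return r
	}
	defer _skywalking_after_TestMethod(inv, &result)
	return nil // original body
}

func main() {
	_ = (&Receiver{}).TestMethod(1)
	c := _skywalking_interceptor_TestMethod.(*countingInterceptor)
	fmt.Println(c.before, c.after)
}
```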
This part of the code serves to combine the code from the first and second sections together.
Upon completion of these three steps, the following modifications can be made to the files in the intercepted package:
In the first step, adding code to the intercepted method shifts its line numbers, so the line positions reported when a problem occurs in the framework would be wrong. My solution is to create a new method with the original method's name and rename the intercepted method to a temporary name, for example:
// before
Using this approach, the only downside I can think of is that when a problem occurs, the method name in the stack trace might not be correct, but the code line would be completely accurate.
Instance enhancement is relatively straightforward. We only need to add a new field of type interface{} to the target struct. In addition, the bridge Go file implements the EnhancedInstance methods (mentioned in the "For framework plugin developer" section) for that struct, as the following code demonstrates:
type Test struct {
// adding these two methods into bridge go file
Go has no equivalent of Java's ThreadLocal. However, we can achieve the same effect by adding a field to the runtime's "g" struct, which represents a goroutine. Inside the "runtime" package, the currently running goroutine can be obtained through the "getg() *g" function.
type g struct {
In the code snippet above, we added a new field, "skywalking_enhance_obj", to the g struct, and added a new file to the runtime package that exports access to this field, because "getg()" is not exported from the runtime package.
Next, we need to import the exported methods in the plugin core and go2sky (which will be introduced in more detail in the next section).
import _ "unsafe"
Plugins can now read the data attached to the current goroutine without a context object.
In Golang, we can start new goroutines to accomplish tasks at any time. This leads to a problem: there can be many goroutines, but we only set custom data in the first one. When an RPC request is executed in a newly created goroutine, we are unable to access the data from the first goroutine, as demonstrated in the following code:
// goroutine A
core.SetGLS("key", "test data")
// goroutine B
Reading the Go runtime code shows that creating a new goroutine calls the "runtime.newproc1" function. Its second parameter, callergp, is the goroutine that created the new one. The function signature is shown below:
// Create a new g in state _Grunnable, starting at fn. callerpc is the
In the example code above, when executed, the second parameter represents "goroutine A", and the returned instance is "goroutine B".
Since we have already added a custom field to the g struct in the previous section, we can modify "runtime.newproc1" so that, before it returns, it sets the custom field of the new g based on the caller's value.
func newproc1(fn *funcval, callergp *g, callerpc uintptr) (result *g) {
// code
Methods such as "core.CreateLocalSpan(opName string)" need to obtain the tracing context and the global tracer from GLS to create the new span.
Based on the current implementation in "go2sky", we can put the "segmentSpan" into GLS. Whenever a new span is needed, simply treat the Span from GLS as the parent span and store the current span back into GLS. As shown in the following logical code example:
// create new local span
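A minimal sketch of that logic. A package-level variable stands in for the GLS slot of the current goroutine, and the `span` type, `CreateLocalSpan` and the ID counter are hypothetical simplifications of go2sky's segment span. Each new span takes the active span from GLS as its parent, following SkyWalking's convention that span IDs start at 0 within a segment and the root span's parent ID is -1, and is then stored back as the active span.

```go
package main

import "fmt"

type span struct {
	id       int32
	parentID int32 // -1 for the root span of a segment
	opName   string
	parent   *span
}

// gls stands in for the per-goroutine slot added to the runtime g struct.
var gls *span

var nextSpanID int32

// CreateLocalSpan creates a span whose parent is the active span in GLS.
func CreateLocalSpan(opName string) *span {
	s := &span{id: nextSpanID, parentID: -1, opName: opName, parent: gls}
	nextSpanID++
	if gls != nil {
		s.parentID = gls.id
	}
	gls = s // the new span becomes the active one
	return s
}

func main() {
	root := CreateLocalSpan("root")
	child := CreateLocalSpan("child")
	fmt.Println(root.parentID, child.parentID, gls == child)
}
```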
The Java Agent provides "Context.getRuntimeContext().put("test", "value")", and the Go agent can offer the same. The current idea is to store the segment span and the runtime context together in a single instance in GLS. For example, create the following structure and store it in GLS:
type tracingContext struct {
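A sketch of such a structure, assuming it holds the active span and the runtime context side by side; the `Span` type, field names and the `forChild` method are hypothetical. Keeping both behind one GLS value means a single pointer has to be handled when a new goroutine is created: here the child shares the active span as its parent reference and receives its own copy of the runtime data.

```go
package main

import "fmt"

// Span is a stand-in for the agent's span type.
type Span struct{ OperationName string }

// tracingContext is the single value stored in GLS (hypothetical fields).
type tracingContext struct {
	activeSpan *Span                  // current span; parent of the next one created
	runtime    map[string]interface{} // backs core.GetRuntimeContext()
}

func newTracingContext() *tracingContext {
	return &tracingContext{runtime: make(map[string]interface{})}
}

// forChild builds the value a new goroutine would receive: it shares the
// active span as a parent reference and gets its own copy of the runtime data.
func (t *tracingContext) forChild() *tracingContext {
	c := newTracingContext()
	c.activeSpan = t.activeSpan
	for k, v := range t.runtime {
		c.runtime[k] = v
	}
	return c
}

func main() {
	ctx := newTracingContext()
	ctx.activeSpan = &Span{OperationName: "GET /users"}
	ctx.runtime["test"] = "value"
	child := ctx.forChild()
	child.runtime["test"] = "changed"
	fmt.Println(child.activeSpan.OperationName, ctx.runtime["test"])
}
```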
Since the enhanced agent is based on the compilation process, it lacks the ability to import other modules into the application. To achieve communication between the agent and the OAP server, it is necessary to introduce key modules, such as the Tracing data model and the gRPC communication protocol.
Therefore, we need to use "go2sky" as a base module to integrate into the target application, making the integration of the automatic Agent and go2sky crucial.
First, we need to understand how go2sky works when building spans in the tracing context.
As shown in the illustration above, go2sky relies on the "context.Context" instance. Since Go has no ThreadLocal mechanism, data can only be shared within and across goroutines through the Context. In addition, frameworks need the context when making calls, and through context propagation, information from the current and parent contexts can be traced.
In go2sky, there is no explicit reference to a parent span; the parent is whichever span the Context holds. Therefore, if a derived context is reused at the same level, for example by continuing to use the Context returned from a Local Span after that span has completed, the next span (such as an Exit Span) would get the finished Local Span as its parent instead of the upper-level span.
To build the base library, we will copy the key code from "go2sky" and make the following modifications.
For the modules introduced in the user's project, the following functionalities will be provided:
Enhancing the code during the hybrid compilation phase, including the following parts:
In go2sky, span information is passed through the Context object. We need to replace this with the GLS mechanism, as described in "GoRoutine with Tracing Context". In addition, each saved span must keep a reference to its parent span; otherwise, the parent cannot be found when the span ends, and later spans may be bound to the wrong "parentSpanID". For example:
func (s *span) End() {
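A minimal sketch of why the parent reference matters, using a package-level variable as a stand-in for GLS and hypothetical `span`, `createSpan` and `End` definitions. When a span ends, its saved parent is restored as the active span, so the next sibling gets the correct `parentSpanID` instead of pointing at the span that just finished.

```go
package main

import "fmt"

type span struct {
	id, parentID int32
	parent       *span // saved so End can restore it
}

var (
	active     *span // stand-in for the span stored in GLS
	nextSpanID int32
)

func createSpan() *span {
	s := &span{id: nextSpanID, parentID: -1, parent: active}
	nextSpanID++
	if active != nil {
		s.parentID = active.id
	}
	active = s
	return s
}

// End restores the parent as the active span. In this sketch it only does so
// when s is still the active span; out-of-order End calls are ignored.
func (s *span) End() {
	if active == s {
		active = s.parent
	}
}

func main() {
	root := createSpan()   // id 0
	first := createSpan()  // id 1, parent 0
	first.End()            // root is active again
	second := createSpan() // id 2, parent 0 rather than 1
	second.End()
	root.End()
	fmt.Println(first.parentID, second.parentID, active == nil)
}
```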
The new agent project should organize all modules within a single repository to make dependencies easier to manage and keep the modules consistent. We can use Go modules to manage dependencies and import the necessary modules in user code. Here's a suggested directory structure for the agent project:
TODO:
spanid lock
timer root span close
logger interface
plugin config
[a]Curious about this. how would you achieve things like this? https://github.com/openzipkin/zipkin-go/blob/master/middleware/http/server.go#L145 where handler can also access the tracing context?
[b]Do you mean accessing the tracing context in `ctx.Context`? Yes, if the context is passed as an argument when the method is invoked, you can read it from the `Invocation`. It depends on which information is in the context, and whether we should use it to build our own tracing or not?