Tracing

Resources

Telemetry data is indexed by service. In OpenTelemetry, services are described by resources, which are set when the OpenTelemetry SDK is initialized during program startup. We want our data to be normalized, so we can compare apples to apples. OpenTelemetry defines a schema for the keys and values which describe common service resources such as hostname, region, version, etc. These standards are called Semantic Conventions, and are defined in the OpenTelemetry Specification.

We recommend that, at minimum, the following resources be applied to every service:

| Attribute | Description | Example | Required? |
|---|---|---|---|
| service.name | Logical name of the service. MUST be the same for all instances of horizontally scaled services. | shoppingcart | Yes |
| service.version | The version string of the service API or implementation as defined in Version Attributes. | semver:2.0.0 | No |
| host.hostname | Contains what the hostname command would return on the host machine. | server1.mydomain.com | No |

Go Configuration

In Go, resources are set on the TracerProvider at program startup.

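   // Describe this service using the standard resource attribute keys.
   host, _ := os.Hostname() // error ignored for brevity
   attributes := []kv.KeyValue{
       kv.String(conventions.AttributeServiceName, "service123"),
       kv.String(conventions.AttributeServiceVersion, "1.2.3"),
       kv.String(conventions.AttributeHostName, host),
   }
   // Attach the resource to the TracerProvider, then register it globally.
   tp, err := sdktrace.NewProvider(
       sdktrace.WithResource(resource.New(attributes...)),
   )
   if err != nil {
       log.Fatal(err)
   }
   global.SetTraceProvider(tp)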

Semantic Conventions

Standardizing the format of your data is a critical part of using OpenTelemetry. OpenTelemetry provides a schema for describing common resources, so that backends can easily parse and identify relevant information.

It is important to understand these conventions when writing instrumentation, in order to normalize your data and increase its utility. The semantic conventions for resources can be found in the specification.

The specification lists the types of resources that are currently defined.

Spans

OpenTelemetry comes with many instrumentation plugins for libraries and frameworks. This should be enough detail to get started with tracing in production.

Even so, you will still want to add spans to your own application code, in order to break down larger operations and gain more detailed insight into where your application is spending its time.

When you create a new span to measure a subcomponent, that span is added to the current trace as the child of the current span, and then becomes the current span itself.

Tracer API

Accessing the tracer

In order to interact with traces, you must first acquire a handle to a Tracer.

By convention, Tracers are named after the component they are instrumenting; usually a library, a package, or a class.

import "go.opentelemetry.io/otel/api/global"

tracer := global.Tracer("package-name")

Note that there is no need to "set" a tracer by name before getting it. The name you provide is to help identify which component generated which spans, and to potentially disable tracing for individual components.

We recommend creating a Tracer once per component during initialization and retaining a handle to the tracer, rather than calling global.Tracer() repeatedly.

Accessing the current span

Ideally, when tracing application code, spans are created and managed in the application framework.

Assuming that your application framework is supported, a trace will automatically be created for each request, and your application code will already have a span available inside the current context, which can be used for adding application-specific attributes and events.

func printSpan(ctx context.Context) {
   span := trace.SpanFromContext(ctx)
   fmt.Printf("current span: %v\n", span)
}

You can access the current active span via the context object.

Setting a new current span

Let’s demonstrate creating a new span by example. Imagine you have an automated kitchen, and you want to time how long the robot chef takes to bake a cake. Create a new Bake-Cake span, and propagate it via a context object.

func BakeCake(ctx context.Context, tracer trace.Tracer) {
   // the new span will automatically become the child of the span in the current context, and be set as the current span in the new context.
   childCtx, childSpan := tracer.Start(ctx, "bake-cake")
   defer childSpan.End()
   
   PreheatOven(childCtx)
   MixBatter(childCtx)
   HeatCake(childCtx)
}

Attributes

When performing root cause analysis, span attributes are an important tool for pinpointing the source of performance issues.

Setting Attributes

Note that it is only possible to set attributes, not to get them.

Much like how resources are used to describe your services, attributes are used to describe your spans. Here is an example of setting attributes to correctly define an HTTP client request:

import(
  "go.opentelemetry.io/otel/api/trace"
  "go.opentelemetry.io/otel/api/kv"
  . "go.opentelemetry.io/otel/api/standard"
)

ctx, span := tracer.Start(
  ctx,
  "/project/:project-id/list",
  trace.WithSpanKind(trace.SpanKindClient),
  trace.WithAttributes(
    kv.String(HTTPMethodKey, "GET"),
    kv.String(HTTPFlavorKey, "1.1"),
    kv.String(HTTPUrlKey, "https://example.com:8080/project/123/list/?page=2"),
    kv.String(NetPeerIPKey, "192.0.2.5"),
    kv.Int(HTTPStatusCodeKey, 200),
    kv.String(HTTPStatusTextKey, "OK"),
  ),
)

// In addition to the standard attributes, custom attributes can be added as well.
span.SetAttribute("list.page_number", 2)

// To avoid collisions, always namespace your attribute keys using dot notation.
span.SetAttribute("project.id", 2)

// attributes can be added to a span at any time before the span is finished.
span.End()

Conventions

Spans represent specific operations in and between systems. Many operations represent well-known protocols like HTTP or database calls. As with resources, OpenTelemetry defines a schema for the attributes which describe these common operations. These standards are called Semantic Conventions, and are defined in the OpenTelemetry Specification.

OpenTelemetry provides a schema for describing common attributes so that backends can easily parse and identify relevant information. It is important to understand these conventions when writing instrumentation, in order to normalize your data and increase its utility.

The following semantic conventions are defined for tracing:

  • General: General semantic attributes that may be used in describing different kinds of operations.
  • HTTP: Spans for HTTP client and server.
  • Database: Spans for SQL and NoSQL client calls.
  • RPC/RMI: Spans for remote procedure calls (e.g., gRPC).
  • Messaging: Spans for interaction with messaging systems (queues, publish/subscribe, etc.).
  • FaaS: Spans for Function as a Service (e.g., AWS Lambda).

Events

The finest-grained tracing tool is the event system.

Span events are a form of structured logging. Each event has a name, a timestamp, and a set of attributes. When events are added to a span, they inherit the span's context. This additional context allows events to be searched, filtered, and grouped by trace ID and other span attributes.

Span context is one of the key differences between distributed tracing and traditional logging.

Adding events

Events are automatically timestamped when they are added to a span. Timestamps can also be set manually if the events are being added after the fact.

For example, enqueuing an item might be recorded as an event.

import "go.opentelemetry.io/otel/label"

// Get the span
span := trace.SpanFromContext(ctx)

// Perform the action
queue.Enqueue(myItem)

// Record the action, describing it with namespaced attributes.
span.AddEvent(ctx, "enqueued item",
  label.String("item.id", myItem.ID()),
  label.String("queue.id", queue.ID()),
  label.Int("queue.length", queue.Length()), // assumes Length() returns an int
)

Spans should be created for recording coarse-grained operations, and events should be created for recording fine-grained operations.

Recording exceptions

Many of the tracing conventions can apply to event attributes as well as span attributes. The most important event-specific convention is recording exceptions.

import "go.opentelemetry.io/otel/codes"

span := trace.SpanFromContext(ctx)

// RecordError converts an error into a span event.
span.RecordError(ctx, err)

// If the error indicates that the entire operation is in an
// errored state, update the span status.
span.SetStatus(codes.Internal, "critical error")

Marking the span as an error is independent of recording exceptions. To mark the entire span as an error, and have it count against error rates, set the SpanStatus to any value other than OK.

StatusCode definitions can be found in the OpenTelemetry specification. If no status code directly maps to the type of error you are recording, set the status code to UNKNOWN for common errors, and INTERNAL for serious errors.