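Building an Istio Service Mesh Observability Platform: Distributed Tracing
1. Background and Introduction
Why is distributed tracing needed?
Distributed tracing records and correlates the details of each stage of a distributed call, helping operators quickly locate where a failure occurred or find the cause of a performance degradation.
Istio can use Jaeger as its tracing backend, which is the setup used in this article. Jaeger is an open-source distributed tracing system from Uber that includes components for storing, visualizing, and tracing calls.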
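2. Notes and Constraints
Istio has always described itself as a non-intrusive service mesh, meaning users do not need to modify business code. For distributed tracing there is a caveat, however: although no instrumentation is needed in the business code, the application must still capture certain request headers and forward them.
The official documentation explains why: https://istio.io/latest/zh/docs/tasks/observability/distributed-tracing/overview/
Although Istio proxies can send spans automatically, extra information is needed to join those spans into a single trace. Applications must therefore propagate the appropriate HTTP request headers, so that when the proxies send span information the spans can be joined into one trace. To do this, each application must collect the headers from every incoming request and forward them to all outgoing requests triggered by that incoming request. Which headers to forward depends on the configured tracing backend; the set of headers is described on each backend's task page.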
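About the request headers:
- x-request-id: an Envoy-specific header that uniquely identifies a request and is used to make consistent sampling decisions for logs and distributed traces
- x-b3-traceid: the trace ID. It is generated when the first span is created and is then passed along unchanged for the whole request. This field is what links the spans of one request together
- x-b3-spanid: the ID assigned to a span when it is created
- x-b3-parentspanid: the parent span ID, i.e. the span one level above the current call
- x-b3-sampled: the sampling decision. 1 means the span is reported, 0 means it is not. The decision is normally made at the root span and then passed to every downstream callee, so that all spans of a trace are either reported together or dropped together
- x-b3-flags
With this information, once the spans have been pushed to the backend, the backend can reassemble them and display the result in the UI.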
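The reassembly step can be pictured with a small sketch. It is not Jaeger's implementation; the field names (`trace_id`, `span_id`, `parent_span_id`) and the span IDs are hypothetical stand-ins for the values carried by the x-b3-* headers above.

```python
from collections import defaultdict


def build_trace_tree(spans, trace_id):
    """Link the spans of one trace into a parent -> children mapping."""
    in_trace = [s for s in spans if s["trace_id"] == trace_id]
    known_ids = {s["span_id"] for s in in_trace}
    roots = []
    children = defaultdict(list)
    for s in in_trace:
        parent = s.get("parent_span_id")
        if parent in known_ids:
            children[parent].append(s["span_id"])
        else:
            # No parent, or a parent that was never received: treat as a root.
            roots.append(s["span_id"])
    return roots, dict(children)


def depth(span_id, children):
    """Number of levels in the subtree rooted at span_id."""
    return 1 + max((depth(c, children) for c in children.get(span_id, [])), default=0)


# Hypothetical spans; the IDs are made up for illustration.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "b"},
    {"trace_id": "t2", "span_id": "x", "parent_span_id": None},
]
roots, children = build_trace_tree(spans, "t1")
```

The shared trace ID selects the spans that belong together, and the parent span ID gives the nesting that a tracing UI draws as a tree.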
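How does istio-proxy generate span information?
- For inbound traffic (traffic that flows through the sidecar into the application): if the headers carry no tracing information, Envoy creates a root span, builds the trace ID from that span's ID, and then passes the request to the application container. If the headers already contain trace information, the sidecar extracts the trace context from them and passes it on to the application container.
- For outbound traffic (traffic that leaves through the sidecar): if the headers carry no tracing information, Envoy creates a root span and puts that span's context into the request headers for the next service in the call. If trace information is present, the sidecar extracts the span information from the headers, creates a child span based on it, and adds the new span's information to the request headers.
Note: the description above keeps referring to reading information from request headers or writing information into them. This means calls between services must go through RESTful APIs, that is, an HTTP-based protocol, for tracing to work.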
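The root-or-child decision described above can be sketched as follows. This is a simplified model, not Envoy's code: the function name `start_span` is hypothetical, and building the trace ID by appending the span ID to a random prefix is an assumption.

```python
import secrets

B3_TRACE = "x-b3-traceid"
B3_SPAN = "x-b3-spanid"
B3_PARENT = "x-b3-parentspanid"
B3_SAMPLED = "x-b3-sampled"


def start_span(headers, sampled="1"):
    """Return the headers to pass on after creating a span (lowercase keys)."""
    out = dict(headers)
    span_id = secrets.token_hex(8)  # 16 hex characters
    if B3_TRACE not in headers:
        # No trace context: create a root span.
        # Assumption: the trace ID is a random prefix plus the span ID.
        out[B3_TRACE] = secrets.token_hex(8) + span_id
        out[B3_SAMPLED] = sampled
    else:
        # Existing trace context: the incoming span becomes the parent.
        parent = headers.get(B3_SPAN)
        if parent:
            out[B3_PARENT] = parent
    out[B3_SPAN] = span_id
    return out


root = start_span({})
child = start_span(root)
```

Here `child` keeps the trace ID and sampling decision of `root` and records the root's span ID as its parent, which is what lets the backend chain the two spans together.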
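3. Deploying Jaeger and the Test Demo
A Kubernetes and Istio environment must be prepared in advance; see the official guides for setup.
I used Kubernetes 1.25 and Istio 1.19.
- Istio does not start Jaeger by default; it has to be created manually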
Deployed successfully, using the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: istio-system
  labels:
    app: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
        sidecar.istio.io/inject: "false"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "14269"
    spec:
      containers:
        - name: jaeger
          image: "docker.io/jaegertracing/all-in-one:1.46"
          env:
            - name: BADGER_EPHEMERAL
              value: "false"
            - name: SPAN_STORAGE_TYPE
              value: "badger"
            - name: BADGER_DIRECTORY_VALUE
              value: "/badger/data"
            - name: BADGER_DIRECTORY_KEY
              value: "/badger/key"
            - name: COLLECTOR_ZIPKIN_HOST_PORT
              value: ":9411"
            - name: MEMORY_MAX_TRACES
              value: "50000"
            - name: QUERY_BASE_PATH
              value: /jaeger
          livenessProbe:
            httpGet:
              path: /
              port: 14269
          readinessProbe:
            httpGet:
              path: /
              port: 14269
          volumeMounts:
            - name: data
              mountPath: /badger
          resources:
            requests:
              cpu: 10m
      volumes:
        - name: data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: tracing
  namespace: istio-system
  labels:
    app: jaeger
spec:
  type: ClusterIP
  ports:
    - name: http-query
      port: 80
      protocol: TCP
      targetPort: 16686
    # Note: Change port name if you add '--query.grpc.tls.enabled=true'
    - name: grpc-query
      port: 16685
      protocol: TCP
      targetPort: 16685
  selector:
    app: jaeger
---
# Jaeger implements the Zipkin API. To support swapping out the tracing backend, we use a Service named Zipkin.
apiVersion: v1
kind: Service
metadata:
  labels:
    name: zipkin
  name: zipkin
  namespace: istio-system
spec:
  ports:
    - port: 9411
      targetPort: 9411
      name: http-query
  selector:
    app: jaeger
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
  namespace: istio-system
  labels:
    app: jaeger
spec:
  type: ClusterIP
  ports:
    - name: jaeger-collector-http
      port: 14268
      targetPort: 14268
      protocol: TCP
    - name: jaeger-collector-grpc
      port: 14250
      targetPort: 14250
      protocol: TCP
    - port: 9411
      targetPort: 9411
      name: http-zipkin
    - port: 4317
      name: grpc-otel
    - port: 4318
      name: http-otel
  selector:
    app: jaeger
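- Istio's default sampling rate is only 1%. To make traces easier to capture during testing, raise it to 100%
To adjust it: kubectl edit iop installed-state -n istio-system
spec:
  values:
    pilot:
      traceSampling: 100 # default is 1, changed to 100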
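To see what the percentage means in practice, here is a toy model of a per-request sampling decision. It is not Envoy's algorithm: the function `should_sample` and the CRC32 bucketing are assumptions chosen only to make the decision repeatable for a given request ID.

```python
import zlib


def should_sample(request_id: str, sampling_percent: float) -> bool:
    """Map the request ID to one of 10000 buckets and sample the lowest ones."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10000
    return bucket < sampling_percent * 100


# At 100% every request is traced; at 0% none is.
always = all(should_sample(f"req-{i}", 100) for i in range(1000))
never = any(should_sample(f"req-{i}", 0) for i in range(1000))
```

With a rate of 1, only buckets 0 to 99 out of 10000 are sampled, so a single test request is unlikely to produce a trace. That is why the rate is raised to 100 for this walkthrough.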
- Deploy the bookinfo test application
This demo is Istio's official sample microservice application
Deployed successfully, using the following manifest:
# Copyright Istio Authors
apiVersion: v1
kind: Service
metadata:
  name: details
  labels:
    app: details
    service: details
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: details
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bookinfo-details
  labels:
    account: details
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: details-v1
  labels:
    app: details
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: details
      version: v1
  template:
    metadata:
      labels:
        app: details
        version: v1
    spec:
      serviceAccountName: bookinfo-details
      containers:
      - name: details
        image: docker.io/istio/examples-bookinfo-details-v1:1.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
---
##################################################################################################
# Ratings service
##################################################################################################
apiVersion: v1
kind: Service
metadata:
  name: ratings
  labels:
    app: ratings
    service: ratings
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: ratings
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bookinfo-ratings
  labels:
    account: ratings
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratings-v1
  labels:
    app: ratings
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ratings
      version: v1
  template:
    metadata:
      labels:
        app: ratings
        version: v1
    spec:
      serviceAccountName: bookinfo-ratings
      containers:
      - name: ratings
        image: docker.io/istio/examples-bookinfo-ratings-v1:1.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
---
##################################################################################################
# Reviews service
##################################################################################################
apiVersion: v1
kind: Service
metadata:
  name: reviews
  labels:
    app: reviews
    service: reviews
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: reviews
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bookinfo-reviews
  labels:
    account: reviews
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v1
  labels:
    app: reviews
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reviews
      version: v1
  template:
    metadata:
      labels:
        app: reviews
        version: v1
    spec:
      serviceAccountName: bookinfo-reviews
      containers:
      - name: reviews
        image: docker.io/istio/examples-bookinfo-reviews-v1:1.18.0
        imagePullPolicy: IfNotPresent
        env:
        - name: LOG_DIR
          value: "/tmp/logs"
        ports:
        - containerPort: 9080
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: wlp-output
          mountPath: /opt/ibm/wlp/output
      volumes:
      - name: wlp-output
        emptyDir: {}
      - name: tmp
        emptyDir: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v2
  labels:
    app: reviews
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reviews
      version: v2
  template:
    metadata:
      labels:
        app: reviews
        version: v2
    spec:
      serviceAccountName: bookinfo-reviews
      containers:
      - name: reviews
        image: docker.io/istio/examples-bookinfo-reviews-v2:1.18.0
        imagePullPolicy: IfNotPresent
        env:
        - name: LOG_DIR
          value: "/tmp/logs"
        ports:
        - containerPort: 9080
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: wlp-output
          mountPath: /opt/ibm/wlp/output
      volumes:
      - name: wlp-output
        emptyDir: {}
      - name: tmp
        emptyDir: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v3
  labels:
    app: reviews
    version: v3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reviews
      version: v3
  template:
    metadata:
      labels:
        app: reviews
        version: v3
    spec:
      serviceAccountName: bookinfo-reviews
      containers:
      - name: reviews
        image: docker.io/istio/examples-bookinfo-reviews-v3:1.18.0
        imagePullPolicy: IfNotPresent
        env:
        - name: LOG_DIR
          value: "/tmp/logs"
        ports:
        - containerPort: 9080
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: wlp-output
          mountPath: /opt/ibm/wlp/output
      volumes:
      - name: wlp-output
        emptyDir: {}
      - name: tmp
        emptyDir: {}
---
##################################################################################################
# Productpage services
##################################################################################################
apiVersion: v1
kind: Service
metadata:
  name: productpage
  labels:
    app: productpage
    service: productpage
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: productpage
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bookinfo-productpage
  labels:
    account: productpage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: productpage-v1
  labels:
    app: productpage
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: productpage
      version: v1
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9080"
        prometheus.io/path: "/metrics"
      labels:
        app: productpage
        version: v1
    spec:
      serviceAccountName: bookinfo-productpage
      containers:
      - name: productpage
        image: docker.io/istio/examples-bookinfo-productpage-v1:1.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}
---
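4. Access Test and Observations
For convenience, access the productpage microservice directly from inside the cluster; it calls the other microservices. The dependencies are: productpage calls details and reviews, and reviews calls ratings.
- Get the ClusterIP of the entry service and trigger a request with curl: the request succeeds
- Open the Jaeger UI
Span information has been generated:
Span details: 7 spans in total, 4 services involved, a depth of 5
Call-chain topology: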
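5. Analyzing the Trace
This section explores how the sidecar propagates span information and how the spans are assembled into one complete trace.
- Set the log level of the bookinfo workloads to trace
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/logLevel: trace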
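- Analyze the span information
productpage has three spans: one for inbound traffic and two for outbound traffic.
reviews has two spans: one inbound and one outbound.
ratings has one inbound span.
details has one inbound span.
That is 7 spans in total, which matches the count shown in the Jaeger UI.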
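The count of 7 follows directly from the call graph: every service-to-service call yields one outbound span on the caller's sidecar and one inbound span on the callee's sidecar, plus one inbound span for the request entering productpage. The sketch below reproduces that arithmetic; the `calls` dictionary is written by hand and assumes the request was served by a reviews version that calls ratings, as in the trace described here.

```python
# Caller -> callees for the traced request (hand-written, see assumption above).
calls = {
    "productpage": ["details", "reviews"],
    "reviews": ["ratings"],
    "details": [],
    "ratings": [],
}


def count_spans(calls, entry):
    """Return per-service (inbound, outbound) span counts."""
    inbound = {svc: 0 for svc in calls}
    outbound = {svc: 0 for svc in calls}
    inbound[entry] += 1  # the request that enters the mesh
    for caller, callees in calls.items():
        for callee in callees:
            outbound[caller] += 1  # span from the caller's sidecar
            inbound[callee] += 1   # span from the callee's sidecar
    return inbound, outbound


inbound, outbound = count_spans(calls, "productpage")
total = sum(inbound.values()) + sum(outbound.values())
```

In general the total is 1 + 2 × (number of calls), which here is 1 + 2 × 3 = 7.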
Because the inbound request to productpage carries no trace headers, Envoy intercepts the traffic and generates X-B3-TraceId, X-B3-SpanId, and the related headers itself. They can be found in the Envoy logs.
The logs show the trace information for the productpage span:
2023-11-09T01:55:02.075102Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-TraceId value=235aa67f7b07cc91bbdd05f23bcd7bab thread=23
2023-11-09T01:55:02.075106Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-SpanId value=bbdd05f23bcd7bab thread=23
2023-11-09T01:55:02.075109Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-Sampled value=1 thread=23
2023-11-09T01:55:02.075114Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=x-request-id value=8bb99618-1793-9fbd-a84b-9118c7f82e92 thread=23
'x-b3-traceid': '235aa67f7b07cc91bbdd05f23bcd7bab' (this trace ID stays the same across the whole trace)
'x-b3-spanid': 'bbdd05f23bcd7bab'
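Pulling these values out of a trace-level log by hand is tedious. A small helper along the following lines can do it; the regular expression is an assumption based only on the four log lines shown above, and `extract_headers` is a hypothetical name.

```python
import re

# Matches the "completed header: key=... value=..." part of each log line.
HEADER_RE = re.compile(r"completed header: key=(\S+) value=(\S+)")


def extract_headers(log_text):
    """Return a dict of header name (lowercased) -> value found in the log."""
    return {key.lower(): value for key, value in HEADER_RE.findall(log_text)}


log_text = """\
2023-11-09T01:55:02.075102Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-TraceId value=235aa67f7b07cc91bbdd05f23bcd7bab thread=23
2023-11-09T01:55:02.075106Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-SpanId value=bbdd05f23bcd7bab thread=23
2023-11-09T01:55:02.075109Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=X-B3-Sampled value=1 thread=23
2023-11-09T01:55:02.075114Z trace envoy http external/envoy/source/common/http/http1/codec_impl.cc:547 [Tags: "ConnectionId":"101"] completed header: key=x-request-id value=8bb99618-1793-9fbd-a84b-9118c7f82e92 thread=23
"""

headers = extract_headers(log_text)
# For this root span, the span ID is the last 16 hex characters of the trace ID.
span_is_trace_suffix = headers["x-b3-traceid"].endswith(headers["x-b3-spanid"])
```

In this log the root span's ID is the tail of the trace ID, which is consistent with the earlier remark that the trace ID is built from the root span's ID.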
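- Collect the span information
The trace information yields the following relationships:
Jaeger, the visualization backend, also shows how long each span took: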
6. How the Business Code Handles It
The tracing instrumentation is done in the sidecar proxy, so the application does not need to handle complex instrumentation logic. It does, however, need to cooperate by passing the generated trace information along in the request headers. The service code below confirms this and shows exactly what the business code has to do.
Take productpage.py as an example:
- The handler for the productpage REST endpoint extracts the trace-related headers from the Request
@app.route('/productpage')
@trace()
def front():
    product_id = 0  # TODO: replace default value
    headers = getForwardHeaders(request)
    ......
- getForwardHeaders collects the trace information and fills in the headers
def getForwardHeaders(request):
    headers = {}

    # x-b3-*** headers can be populated using the opentracing span
    span = get_current_span()
    carrier = {}
    tracer.inject(
        span_context=span.context,
        format=Format.HTTP_HEADERS,
        carrier=carrier)

    headers.update(carrier)

    # We handle other (non x-b3-***) headers manually
    if 'user' in session:
        headers['end-user'] = session['user']
    ......
    return headers
- Finally, a request is built and sent to the /details service endpoint. The request carries the headers collected above.
def getProductDetails(product_id, headers):
    try:
        url = details['name'] + "/" + details['endpoint'] + "/" + str(product_id)
        res = requests.get(url, headers=headers, timeout=3.0)
    ......
Now look at the details microservice
- Read the header information from the incoming request
server.mount_proc '/details' do |req, res|
    pathParts = req.path.split('/')
    headers = get_forward_headers(req)
- Copy the trace headers into headers
def get_forward_headers(request)
  headers = {}

  # Keep this in sync with the headers in productpage and reviews.
  incoming_headers = [
      # All applications should propagate x-request-id. This header is
      # included in access log statements and is used for consistent trace
      # sampling and log sampling decisions in Istio.
      'x-request-id',

      # Lightstep tracing header. Propagate this if you use lightstep tracing
      # in Istio (see
      # https://istio.io/latest/docs/tasks/observability/distributed-tracing/lightstep/)
      # Note: this should probably be changed to use B3 or W3C TRACE_CONTEXT.
      # Lightstep recommends using B3 or TRACE_CONTEXT and most application
      # libraries from lightstep do not support x-ot-span-context.
      'x-ot-span-context',

      # Datadog tracing header. Propagate these headers if you use Datadog
      # tracing.
      'x-datadog-trace-id',
      'x-datadog-parent-id',
      'x-datadog-sampling-priority',

      # b3 trace headers. Compatible with Zipkin, OpenCensusAgent, and
      # Stackdriver Istio configurations.
      'x-b3-traceid',
      'x-b3-spanid',
      'x-b3-parentspanid',
      'x-b3-sampled',
      'x-b3-flags',

      # Application-specific headers to forward.
      'end-user',
      'user-agent',

      # Context and session specific headers
      'cookie',
      'authorization',
      'jwt'
  ]

  request.each do |header, value|
    if incoming_headers.include? header then
      headers[header] = value
    end
  end

  return headers
end
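The same allow-list approach works in any language. Below is a minimal, framework-independent Python sketch of it. It is not the bookinfo code: `TRACE_HEADERS` covers only the x-request-id and B3 headers discussed in this article, and the incoming headers are assumed to arrive as a plain dict.

```python
# Headers to propagate; a subset of the Ruby list above (assumption).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def get_forward_headers(incoming):
    """Copy only the tracing headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


# Hypothetical incoming request headers.
incoming = {
    "X-B3-TraceId": "235aa67f7b07cc91bbdd05f23bcd7bab",
    "X-B3-SpanId": "bbdd05f23bcd7bab",
    "X-B3-Sampled": "1",
    "Accept": "text/html",
}
forwarded = get_forward_headers(incoming)
```

The returned dict can be passed as the headers of every outgoing call made while handling the request, which is the one piece of cooperation the sidecar needs from the application.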