Tracing with LightStep and Nginx

Tracing your webservice/microservice helps you find bottlenecks by showing which component takes the most time to respond. It also helps in pinpointing the location/log/function where an error occurs and what causes poor performance.

In this blog post, we cover how to leverage OpenTracing with Nginx, using LightStep as the tracing vendor for our web app.

OpenTracing provides vendor-neutral APIs and instrumentation for distributed tracing.

LightStep provides unlimited cardinality, dynamic service maps, and immediate root cause correlation across traces, metrics, and logs anywhere in your system. Put simply, LightStep is one of several vendors that provide a connector for configuring distributed tracing based on OpenTracing standards.

We need to configure Nginx to use the nginx-opentracing module and provide a vendor tracer; here we are going to use the LightStep tracer.

This GitHub Gist should help you get started very easily.

Install.txt describes the steps to follow for downloading and configuring the required libraries.

Note these two lines –

load_module modules/ngx_http_opentracing_module.so;
Instructs Nginx to load the OpenTracing module.

opentracing_load_tracer /usr/local/lib/liblightstep_tracer_plugin.so /etc/nginx/lightstep-config.json;
This directive points to the LightStep vendor library location and to the vendor library configuration file.
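
To see where these directives sit in a full configuration, here is a minimal nginx.conf sketch; the server name and upstream address are placeholder assumptions, and tracing is enabled and propagated per location:

load_module modules/ngx_http_opentracing_module.so;

http {
    opentracing on;
    opentracing_load_tracer /usr/local/lib/liblightstep_tracer_plugin.so /etc/nginx/lightstep-config.json;

    server {
        listen 80;
        server_name example.com;                # placeholder

        location / {
            opentracing_propagate_context;      # forward trace headers to the upstream
            proxy_pass http://127.0.0.1:8080;   # placeholder upstream
        }
    }
}

And a minimal lightstep-config.json sketch (the access token is a placeholder; check the LightStep tracer documentation for the full set of supported keys):

{
  "component_name": "nginx",
  "access_token": "<your-lightstep-access-token>"
}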

Note: in Step 3, when you run the command
strings /lib64/libstdc++.so.6 | grep GLIBCXX
make sure the following versions are available:
GLIBCXX_3.4.25
GLIBCXX_3.4.26

Once you have configured all of the above, restart Nginx and you should be able to see traces for all the requests your web app is serving.

(Screenshot: traces for incoming requests as shown in the LightStep UI.)

Deploying Python Rest Service with uWSGI

This article describes how we can leverage uWSGI, one of the first implementations of WSGI (the Web Server Gateway Interface, which forwards requests to web applications), for a Python Flask application. I assume the reader has a fair understanding of Flask, as the article won't detail how to write Flask applications.

REST applications in Python are very easy to write, but the problem arises when we have to choose a production-grade application server for hosting our webservice. Flask, one of the most popular frameworks for writing Python-based web applications, does not ship with a production-ready server. Moreover, the problems with Python threading are well known, and we should try to avoid relying on it. I recently faced this situation when we were required to deploy a Python-based webservice handling millions of records per second from users spread across the globe. After doing some research we chose uWSGI as our application server in conjunction with Nginx.

uWSGI supports launching multiple processes of your application; you just need to specify the number of workers you want to launch. uWSGI can listen on an HTTP port, but as we are using Nginx as our web server we will start uWSGI listening on a Unix socket. Client requests will be received by Nginx, which will route them to the configured socket endpoint of uWSGI.

So, what are all the configurations that we need to set to get uWSGI working?

  • The location of our Python REST service, obviously
  • Which module to call as the entry point
  • The location of the socket where uWSGI will listen for requests routed to our REST service
  • The number of processes we want uWSGI to launch
  • The user ID that uWSGI will run as
  • Logfile paths for the application logs
  • The plugins that uWSGI should load for running our Python application

Sample Config

[uwsgi]
module = app:app # the entry point
for-readline = /home/centos/pythonservice/env.txt # iterate over the env variables used by the application
env = %(_)
endfor =
master = true # Should have a master process
processes = 16 # Number of processes you want to launch
plugins = python36, logfile # plugins that uWSGI should load for running our application
uid = centos # userID 
socket = /run/uwsgi/pythonservice.sock # Socket file, where Nginx will route the requests
chown-socket = centos:nginx # as we are running as a centos user, making it the owner of the socket file
chmod-socket = 666
vacuum = true # remove the socket file when the process stops
single-interpreter = true
reload-mercy = 30000 # how long uWSGI should wait before killing the workers when you restart your application
worker-reload-mercy = 30000
die-on-term = true
chdir = /home/centos/pythonservice/ # location of our codebase
logger = file:/var/log/pythonservice/app.log # path to log files
req-logger = file:/var/log/pythonservice/apprequest.log

Sample env.txt file

APP_CONFIG=/home/centos/pythonservice/config/prod.properties
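
With the config saved as, say, /etc/uwsgi.d/pythonservice.ini (a hypothetical path), uWSGI can be started directly from the command line; in production you would typically wrap this in a systemd unit or the uWSGI Emperor:

uwsgi --ini /etc/uwsgi.d/pythonservice.ini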

The beauty of using uWSGI is that you write your Python application with Flask without caring about how to handle requests at scale; you can increase the number of processes running your application with a simple configuration change. uWSGI forwards each request to the Flask application, which routes it to the matching endpoint.
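
For completeness, here is a minimal sketch of the app.py module that the module = app:app entry point refers to; the route and the APP_CONFIG handling are illustrative assumptions, not the exact service described above.

# app.py -- minimal Flask module matching "module = app:app"
import os
from flask import Flask, jsonify

app = Flask(__name__)

# read the config path injected through env.txt (APP_CONFIG)
config_path = os.environ.get("APP_CONFIG", "config/dev.properties")

@app.route("/api/v1/health")
def health():
    # simple health endpoint, matched by the Nginx location block below
    return jsonify(status="ok", config=config_path)

if __name__ == "__main__":
    # local development only; in production uWSGI imports "app" directly
    app.run(host="0.0.0.0", port=5000)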

The only thing left is configuring Nginx to pass requests to the uWSGI socket defined above. The bare minimum configuration would be

server {
    listen 80;
    server_name your_url;

    location /api/v1/health {
       include               uwsgi_params;
       uwsgi_pass            unix:/run/uwsgi/pythonservice.sock;
   }
}

 

Get Rapid with RAPIDS

RAPIDS is a collection of open source libraries for writing, deploying, and managing data pipelines end-to-end on GPUs. It uses NVIDIA CUDA® for optimizing compute resources, but exposes parallelism through well-known Python interfaces.

The focus of this post is not to share the details of RAPIDS but to detail the steps to get started with it without much difficulty. The RAPIDS team has done a great job in compiling the Startup Guide. But if you are someone like me, very new to the world of GPUs but with decent experience in designing data pipelines, then this post will help you get up and running on the AWS platform.

These are the prerequisites mentioned on the Startup Guide.

Container Host Prerequisites

  • NVIDIA Pascal™ GPU architecture or better
  • CUDA 9.2 or 10.0 compatible nvidia driver
  • Ubuntu 16.04 or 18.04
  • Docker CE v18+
  • nvidia-docker v2+

Well, I was not aware of most of these requirements and what they meant when I first started working with RAPIDS. To make it easy, I have compiled the steps below.
Note: you need an AWS account in place for this. GCP and other cloud providers also offer the machine types required for RAPIDS, but for the sake of this post I have selected AWS.

Step 1) In the AWS console, launch an instance with the AMI Deep Learning Base AMI (Amazon Linux) Version 16.2 (ami-038f5aa6f8673b785).

Step 2) Once you have selected the AMI, make sure to choose GPU instances in the filter-by option; otherwise you won't be able to run the RAPIDS Docker image, as the NVIDIA CUDA® framework requires a GPU to be present.

(Screenshot: choosing a GPU instance type in the AWS console.)
Step 3) Once the instance is up and running:

      •  Install Docker
         yum install docker
      •  Once Docker is installed, download the RAPIDS image from the Docker registry. There are other places where you can find the image; follow the Startup Guide for more details. Run the command below to download the RAPIDS image:
        $ docker pull nvcr.io/nvidia/rapidsai/rapidsai:latest

      • Once the image is downloaded, run the command below to start the RAPIDS container:
        $ docker run --runtime=nvidia \
                        --rm -it \
                        -p 8888:8888 \
                        -p 8787:8787 \
                        -p 8786:8786 \
                        -p 8889:8889 \
                        nvcr.io/nvidia/rapidsai/rapidsai:latest

        Note the ports mentioned in this command; Jupyter must be started on one of these ports, otherwise you won't be able to access the notebook and will get an error like:

         File "/conda/envs/gdf/lib/python3.5/site-packages/tornado/netutil.py", line 168, in bind_sockets
            sock.bind(sockaddr)
        OSError: [Errno 99] Cannot assign requested address
        

        The Startup Guide mentions that the above command will start Jupyter, but that was not the case for me; I had to start Jupyter separately. If this happens to you too, use the command below to start Jupyter:

        jupyter notebook --ip=0.0.0.0  --port=8889 --allow-root
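
        Assuming port 8889 is open in the instance's security group (an assumption about your setup), the notebook is then reachable from your browser at:

        http://<ec2-public-ip>:8889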

         

And YAY!! You've got a Jupyter notebook running that uses RAPIDS to perform ETL and many other transformations. Follow the Startup Guide for the ETL operations, as the intent of this post was just to get RAPIDS up and running using the Docker image.
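
As a quick sanity check inside the notebook, a small cell like the following (a minimal sketch; the column names and values are made-up examples) confirms that cuDF is working on the GPU:

import cudf

# build a small GPU DataFrame in memory
gdf = cudf.DataFrame({"user_id": [1, 2, 3, 4], "amount": [10.5, 3.2, 7.8, 1.1]})

# a couple of typical ETL-style operations, executed on the GPU
gdf["amount_x2"] = gdf["amount"] * 2
print(gdf[gdf["amount"] > 5.0])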

For reference, here is the cheat sheet for RAPIDS.

Logistic Regression – PART 1

Logistic regression is one of the most commonly used binary classification techniques: given m training examples, you want to classify them into 2 groups.
Let's consider a problem statement where, given a set of images, we want to know whether an image is of a dog (probability 1) or not (probability 0).

In a computer, these images are represented by pixel intensities based on the color pattern. This pixel vector is the feature vector (a vector that represents the important characteristics of an object) for the image.

A given image I will have a feature vector of dimension:

rows = number of pixels across the Red, Green, and Blue bands
columns = 1

(Figure: the pixel matrix for each of the Red, Green, and Blue channels of an image.)

Each image in our data set will have corresponding matrices for the three colors Red, Green, and Blue; we can stack them together to produce one big column vector with all the values.
(Figure: the three channel matrices stacked into one column vector.)

Each value in this vector is independent of the others but is part of a single observation, i.e. in our case part of one single image. In this way we will have as many vectors as images, each of dimension [number of pixels x 1]; if we represent the number of pixels as nx, then the dimension of each vector is nx x 1.
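
As a concrete illustration, here is a minimal Python sketch, assuming a 64x64 RGB image (the size is an arbitrary example), that flattens the three channel matrices into one column vector:

import numpy as np

# a fake 64x64 RGB image: three 64x64 channel matrices (Red, Green, Blue)
image = np.random.randint(0, 256, size=(64, 64, 3))

# stack all pixel values into a single column vector of shape (nx, 1)
x = image.reshape(-1, 1)

print(x.shape)  # (12288, 1), since nx = 64 * 64 * 3 = 12288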

To determine whether a given image is of a dog, the only thing we can look at is the characteristics of every input image in our training set. Since these images are nothing but vectors of pixels, this means we need to determine from those pixels whether the given image is of a dog!

 

Imagining Logistic regression as the simplest Neural Network

We can imagine logistic regression as a one-layer neural network, also known as a shallow neural network.
We will have inputs (x1, x2, ..., xm), where m is the number of training examples; each input is, in our case, an image, or more precisely the feature vector corresponding to that image. With every feature of that vector we will have an associated weight.
(Figure: logistic regression drawn as a shallow, one-layer neural network.)

where ŷ is the probability that the feature vector comes from a dog picture, i.e. the probability of the picture being of a dog.

ŷ = P(Y = 1 | X)

Some function = b0 + w1*x1 + w2*x2 + ... + wnx*xnx

Need for the constant b0 in logistic regression

With regression we want to approximate a function that defines a relationship between X and Y (input to output). To get this working we need a bias term so that we can predict accurate values: without it, whenever the input values are zero the predicted value would also have to be zero. Adding a bias weight that does not depend on any of the features allows the hyperplane described by your learned weights to more easily fit data that doesn't pass through the origin.

w ∈ R^nx, where R^nx denotes a real-valued vector of dimension nx

b ∈ R, where R denotes a real number

The question is how to predict ŷ using w and b, given inputs x1, x2, ..., xm.

Since the output of logistic regression should be a probability between 0 and 1, the output function is the sigmoid function, defined as

(Figure: the sigmoid curve.)
ŷ = σ(wᵀx + b)
z = wᵀx + b
ŷ = σ(z)

σ(z) = 1 / (1 + e^-z)

Case 1: when z is a large negative number, e^-z becomes a very big number, giving σ(z) ≈ 0.
Case 2: when z is a large positive number, e^-z becomes ≈ 0, giving σ(z) ≈ 1.
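
A minimal Python sketch (the sample z values are arbitrary) shows this behavior of the sigmoid:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

# large negative z -> ~0, z = 0 -> 0.5, large positive z -> ~1
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.000045, 0.5, 0.999955]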

So, in logistic regression, the task is to learn w,b so that ŷ becomes a good estimate of y.

In the next blog post, we will discuss how to learn the values for w and b.

Handling a dangling Elasticsearch watcher index

A few weeks back, our Elasticsearch cluster stopped executing any watchers. Initial analysis suggested a problem with the AWS SMTP service, which we use for sending mail alerts to our LDAP accounts. After going through more logs and spending some time understanding the sent-mail statistics on AWS (thanks to AWS for providing an intuitive UI that shows which emails are getting rejected), we were sure there was no problem with sending email and that something was wrong on the current master. Analyzing the log line below made it clear that there was an issue with the .watches index.

[2018-05-10T07:05:16,969][WARN ][o.e.g.DanglingIndicesState] [es-master-1] [[.watches/23nm9NSrSkeZaK4Dtyughg]] cannot be imported as a dangling index, as index with same name already exists in cluster metadata

Resolution

  • Delete the local directory: the log line tells us the node holding a stale copy of the index, along with the directory name. In our case it was the es-master-1 node, with the directory 23nm9NSrSkeZaK4Dtyughg under the master's data folder.
  • Restart the watcher service: once the stale index directory is deleted, restart the watcher service

    POST _xpack/watcher/_restart
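
To confirm the restart worked, the watcher stats API (under the same _xpack endpoint in this Elasticsearch version) should report the watcher state as "started":

    GET _xpack/watcher/stats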