Visualizing data effectively is the first step in understanding what kind of distribution (e.g. a normal distribution) is present in the available data set. It also helps reveal skewness, outliers, and many other properties of the data, so that we can normalize or clean it before running any analytics on top of it.
Below are a few of the charts most commonly used in data science.
A histogram shows the underlying frequency distribution of a set of continuous data, divided into intervals called bins. The x-axis represents the values present in the data, while the y-axis (and thus the height of each bar) represents the frequency.
Each bin holds the number of data values that fall within its interval. The bin size should be chosen carefully so that the resulting graph depicts the underlying frequency distribution: too few bins hide the shape, while too many make it noisy.
“Use a histogram when you have numerical data and want to understand the data distribution, including its shape and central tendency”
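To see why bin size matters in a histogram, here is a minimal Python sketch. The helper names (bin_counts, plot_histogram) and the sample data are made up for illustration, and the plotting part assumes matplotlib is installed; the counting itself uses only the standard library.

```python
import random


def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample: 1000 draws from a roughly normal distribution.
random.seed(0)
sample = [random.gauss(50, 10) for _ in range(1000)]

few_bins = bin_counts(sample, 5)    # coarse: shape is barely visible
many_bins = bin_counts(sample, 30)  # finer: the bell shape shows up


def plot_histogram(values, n_bins):
    """Draw the histogram (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt

    plt.hist(values, bins=n_bins, edgecolor="black")
    plt.xlabel("value")
    plt.ylabel("frequency")
    plt.show()
```

Calling plot_histogram(sample, 5) and then plot_histogram(sample, 30) on the same data is a quick way to compare how the choice of bins changes the picture.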
A scatter plot is typically used with large datasets when we want to find out whether there is any relationship between two variables, provided both are numeric. If there is a strong relationship between the variables plotted on the x and y axes, the points cluster as if along an invisible line. If the relationship is weaker, the dots are arranged more loosely but still show a tendency for the y variable to either increase or decrease as the x variable increases. If no relationship exists between the variables, the points are scattered randomly.
“Use this type of graph when you have two numerical variables and are interested in the relationship between them”
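The strong, weak, and absent relationships described for scatter plots can be sketched in Python as follows. The data is synthetic, the helper names are hypothetical, and the Pearson correlation coefficient is my own addition here, used only as a rough numeric stand-in for how closely the points follow a line. Plotting assumes matplotlib is installed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation: near +1 or -1 for a tight line, near 0 for none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic data: same x values, three different y variables.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(500)]
strong = [2 * xi + random.gauss(0, 1) for xi in x]   # little noise
weak = [2 * xi + random.gauss(0, 15) for xi in x]    # a lot of noise
unrelated = [random.gauss(0, 1) for _ in x]          # ignores x entirely

r_strong = pearson_r(x, strong)
r_weak = pearson_r(x, weak)
r_unrelated = pearson_r(x, unrelated)


def plot_scatter(xs, ys, title):
    """Draw a scatter plot (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt

    plt.scatter(xs, ys, s=8)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title(title)
    plt.show()
```

Plotting the three pairs with plot_scatter shows a tight band, a loose upward cloud, and a shapeless cloud respectively, which matches the ordering of the three coefficients.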
Box plots are useful when you are comparing numerical data across multiple groups or categories. With a box plot you can quickly get information about the median of the data (and the mean, if it is marked), the overall distribution and degree of variation, and the existence of outliers.
“It is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared”
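The numbers a box plot draws can be computed by hand, as in this rough Python sketch. It assumes the common convention that points more than 1.5 times the interquartile range beyond the quartiles count as outliers; the text above does not specify a rule, so treat that as an assumption. The function names and the data are made up for illustration, and plotting assumes matplotlib is installed.

```python
import statistics


def box_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value that is not an outlier
        "whisker_high": max(inside),   # highest value that is not an outlier
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical skewed data: one value sits far above the rest.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = box_summary(scores)
mean_score = statistics.mean(scores)


def plot_boxes(groups, names):
    """Draw one box per group side by side (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt

    plt.boxplot(groups)
    plt.xticks(range(1, len(groups) + 1), names)
    plt.ylabel("value")
    plt.show()
```

For this sample the single value 100 is flagged as an outlier, and it pulls the mean up to 14.5 while the median stays at 5.5, which is the kind of skew a box plot makes visible at a glance.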
Happy reading .. ☺