# Class 12 Informatics Practices - Scatter Plots Using Matplotlib

A scatter plot, also called a scatter chart, uses dots to represent values of two different numeric variables. The position of each dot along the horizontal and vertical axes gives the values of one individual data point.

A scatter plot is commonly used to compare the distributions of two variables and to check visually for any correlation between them. If the data contain distinct clusters or segments, they show up clearly in the scatter plot.

Let us try to plot a few scatter graphs using matplotlib.

A vendor sells mango shakes during the summer. The daily sales (in Rs) and the maximum temperature of each day were recorded for 10 days; the values appear in the lists x and y in the code below.

Plot a scatter graph of this data using matplotlib.

Let x represent the maximum temperature of the day.

Let y represent the sales of the day.

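import matplotlib.pyplot as plt

# x: maximum temperature of the day, y: sales of the day
x = [30, 31, 33, 35, 37, 36, 40, 35, 30, 32]
y = [400, 450, 500, 700, 800, 780, 1100, 600, 320, 380]

# draw one dot per day and display the figure
plt.scatter(x, y)
plt.show()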
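The plot suggests that sales rise with temperature. As a minimal sketch that is not part of the original exercise, the strength of this relationship can be expressed as a number with NumPy's np.corrcoef, which returns a matrix of Pearson correlation coefficients. The variable name r is an illustrative choice.

```python
import numpy as np

# Same data as in the scatter plot above
x = [30, 31, 33, 35, 37, 36, 40, 35, 30, 32]             # maximum temperature
y = [400, 450, 500, 700, 800, 780, 1100, 600, 320, 380]  # sales of the day

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print("correlation =", round(r, 2))
```

A value close to +1 means the dots lie near a rising straight line, a value close to -1 means they lie near a falling line, and a value near 0 means there is no clear linear pattern. For this data the value comes out close to +1.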

Let us calculate the average sales and show it as a red horizontal line. This helps us compare each day's sales with the average value.

Here is the code snippet and the figure.

import numpy as np
import matplotlib.pyplot as plt

x = [30, 31, 33, 35, 37, 36, 40, 35, 30, 32]
y = [400, 450, 500, 700, 800, 780, 1100, 600, 320, 380]

plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Temperature")
plt.grid(True)
plt.scatter(x, y)

# calculate average of sales
avg = np.average(y)
# print("average=" + str(avg))

# draw horizontal line showing average; its label is shown in the legend
plt.axhline(avg, color='r', linestyle='-', label='average=' + str(round(avg, 1)))
plt.legend()
plt.show()
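To make the comparison with the average easier to read, the points can also be coloured by whether they lie above the average line. The following is a minimal sketch under that assumption; the mask name above and the colours green and orange are illustrative choices, not part of the original exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([30, 31, 33, 35, 37, 36, 40, 35, 30, 32])
y = np.array([400, 450, 500, 700, 800, 780, 1100, 600, 320, 380])

avg = np.average(y)
above = y > avg  # boolean mask: True where sales exceed the average

# plot the two groups separately so that each gets its own colour and label
plt.scatter(x[above], y[above], color='green', label='above average')
plt.scatter(x[~above], y[~above], color='orange', label='not above average')
plt.axhline(avg, color='r', linestyle='-', label='average=' + str(round(avg, 1)))

plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Temperature")
plt.legend()
plt.show()
```

Indexing a NumPy array with a boolean mask keeps only the elements where the mask is True, which is why x and y are converted to arrays here.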

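We can use the size parameter s of plt.scatter to vary the size of each marker. Here each marker's area is set from that day's sales, so days with higher sales appear as larger bubbles.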

import numpy as np
import matplotlib.pyplot as plt

x = [30, 31, 33, 35, 37, 36, 40, 35, 30, 32]
y = [400, 450, 500, 700, 800, 780, 1100, 600, 320, 380]

# compute size of bubble of each data point
x_arr = np.array(x)
y_arr = np.array(y)
size = y_arr

plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Temperature")
plt.grid(False)
plt.scatter(x_arr, y_arr, s = size, color = 'cyan', linewidth = 3, marker = 'o', edgecolor = 'blue')

# calculate average of sales
avg = np.average(y)
# draw horizontal line showing average; its label is shown in the legend
plt.axhline(avg, color='r', linestyle='-', label='average=' + str(round(avg, 1)))
plt.legend()
plt.show()
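The s parameter is interpreted as the marker area in points squared, so passing the raw sales values can produce very large, overlapping bubbles. One possible approach, sketched here with the hypothetical limits min_size and max_size, is to rescale the sales linearly to a chosen range of areas before plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([30, 31, 33, 35, 37, 36, 40, 35, 30, 32])
y = np.array([400, 450, 500, 700, 800, 780, 1100, 600, 320, 380])

# Rescale sales linearly to marker areas between min_size and max_size
# (both limits are illustrative choices, in points squared)
min_size, max_size = 50, 500
size = min_size + (y - y.min()) / (y.max() - y.min()) * (max_size - min_size)

plt.scatter(x, y, s=size, color='cyan', edgecolor='blue')
plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Temperature")
plt.show()
```

With this scaling the day with the lowest sales gets the smallest bubble and the day with the highest sales gets the largest, whatever the actual rupee amounts are.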

We can compare two data sets by drawing two plots on the same figure.

The coffee sales for the same 10 days are given in the list coffee_y below. Let us draw a scatter plot comparing mango shake sales with coffee sales at each day's temperature.

import numpy as np
import matplotlib.pyplot as plt

x = [30, 31, 33, 35, 37, 36, 40, 35, 30, 32]
shake_y = [400, 450, 500, 700, 800, 780, 1100, 600, 320, 380]
coffee_y = [400, 350, 300, 280, 250, 200, 205, 300, 360, 300]

# convert the lists to NumPy arrays
x_arr = np.array(x)
y_shakearray = np.array(shake_y)
y_coffeearray = np.array(coffee_y)

plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Coffee Sales")
plt.grid(False)

# draw both data sets on the same axes, with different colours and markers
plt.scatter(x_arr, y_shakearray, color = 'blue', marker = 'o', label = 'shake sales')
plt.scatter(x_arr, y_coffeearray, color = 'red', marker = 'x', label = 'coffee sales')
plt.legend(loc="upper left")
plt.show()

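In the resulting chart, shake sales appear as blue circles and coffee sales as red crosses. Shake sales rise with temperature, while coffee sales tend to fall.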
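The opposite trends of the two data sets can be made more visible by drawing a straight trend line through each one. This is a minimal sketch that goes slightly beyond the original exercise: it assumes a straight-line fit is a reasonable summary of the data and uses np.polyfit with degree 1 to obtain a slope and an intercept. The names shake_slope, coffee_slope and x_line are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([30, 31, 33, 35, 37, 36, 40, 35, 30, 32])
shake_y = np.array([400, 450, 500, 700, 800, 780, 1100, 600, 320, 380])
coffee_y = np.array([400, 350, 300, 280, 250, 200, 205, 300, 360, 300])

# Fit a straight line y = slope * x + intercept to each data set
shake_slope, shake_intercept = np.polyfit(x, shake_y, 1)
coffee_slope, coffee_intercept = np.polyfit(x, coffee_y, 1)

# Two end points are enough to draw each straight line
x_line = np.array([x.min(), x.max()])

plt.scatter(x, shake_y, color='blue', marker='o', label='shake sales')
plt.scatter(x, coffee_y, color='red', marker='x', label='coffee sales')
plt.plot(x_line, shake_slope * x_line + shake_intercept, color='blue', linestyle='--')
plt.plot(x_line, coffee_slope * x_line + coffee_intercept, color='red', linestyle='--')

plt.xlabel("Temperature")
plt.ylabel("Sales in Rs")
plt.title("Mango Shakes Sales vs Coffee Sales")
plt.legend(loc="upper left")
plt.show()
```

A positive slope means sales increase as the temperature goes up, and a negative slope means they decrease. For this data the shake line rises and the coffee line falls.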