Histograms

Understanding and using Histograms

What is a Histogram?

Histograms are a specific variation of bar charts that show how data is distributed. Statistician Karl Pearson coined the term histogram in his lectures in 1892. The “Philosophical Transactions of the Royal Society of London” records that the word originated “as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base.”

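A histogram groups the values found in one set of data into consecutive ranges and draws each range as a bar, with the bars touching one another. Statisticians, scientists, and analysts refer to these ranges as bins or classes; the width of a bar shows the range its bin covers, and its height shows how many values fall inside it.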
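The binning step can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular charting tool does it: the function name `bin_counts`, its parameters, and the sample prices are all made up for the example, and it assumes equal-width bins that include their lower edge and exclude their upper edge.

```python
from collections import Counter


def bin_counts(values, bin_width, start=0):
    """Count how many values fall into each equal-width bin.

    Returns (bin_lower_edge, count) pairs from the lowest occupied bin to
    the highest, including empty bins in between so the bars stay connected.
    Each bin covers lower_edge <= value < lower_edge + bin_width.
    """
    if not values:
        return []
    # Floor division maps each value to the index of the bin that holds it.
    counts = Counter(int((v - start) // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [(start + i * bin_width, counts.get(i, 0)) for i in range(first, last + 1)]


# Hypothetical nightly prices in dollars, invented for illustration.
prices = [35, 60, 72, 80, 95, 110, 120, 145, 210, 480]

# Print a rough text histogram: one '#' per value in the bin.
for lower, count in bin_counts(prices, bin_width=25):
    print(f"${lower:>3} to under ${lower + 25:<3} {'#' * count}")
```

Empty bins are kept on purpose: dropping them would hide gaps in the distribution and make distant values look adjacent.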

Histogram shows player counts by base salary ranges, from $0 to over $34 million

You should use a histogram if:

  • You would like to explore how members within a category in a dataset are distributed, e.g., the breakdown of salaries within an organization to see how balanced your pay scale is, or the count of bank members who have a given amount of dollars in their accounts
  • You have one continuous, numerical value that can be split into multiple bins
  • You are looking to understand the distribution of values within a single category

Avoid using a histogram if:

  • You need to analyze multiple dimensions simultaneously
  • Your dataset isn’t scaled correctly
  • You want to compare the specific values of individual data points

Histogram of Austin Airbnb prices up to $500, shown in $25 bins

Great example of a Histogram

This histogram looks at Airbnb rentals in Austin, Texas, showing price per night in $25 bins. The chart has a right-skewed distribution, and the average price for an Airbnb appears to fall between $50 and $150 a night.

  • This histogram uses only one color
  • It looks at one measure
  • It has an easily estimated average
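When estimating an average from a right-skewed histogram like this one, it helps to remember that the long tail of expensive listings pulls the mean above the median, while the tallest bar marks the most common price range. The sketch below shows this with a small set of hypothetical prices (invented for illustration, not the Austin data); the `summarize` helper and its return keys are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, median

BIN_WIDTH = 25  # dollars, matching the $25 bins used in the chart

# Hypothetical nightly prices with a long right tail, invented for illustration.
prices = [40, 55, 60, 65, 70, 75, 80, 85, 90, 95,
          100, 110, 120, 135, 150, 180, 220, 300, 410, 495]


def summarize(values, bin_width=BIN_WIDTH):
    """Return the mean, the median, and the edges of the most populated bin."""
    # Floor division assigns each price to a bin index (0 for $0 to under $25, and so on).
    counts = Counter(int(v // bin_width) for v in values)
    # The modal bin is the one with the highest count (the tallest bar).
    modal_index, _ = max(counts.items(), key=lambda item: item[1])
    lower = modal_index * bin_width
    return {
        "mean": mean(values),
        "median": median(values),
        "modal_bin": (lower, lower + bin_width),
    }


summary = summarize(prices)
print(summary)
```

In this made-up sample the mean lands well above the median because a few high prices stretch the tail, which is the pattern a right-skewed histogram signals at a glance.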