Histograms
Understanding and using Histograms
Histograms are a specific variation of bar charts that show how data is distributed. Statistician Karl Pearson introduced the term histogram in his lectures in 1892. The “Philosophical Transactions of the Royal Society of London” records that the word was introduced “as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base.”
A histogram tracks the different values found in one set of data as a series of connected bars. Statisticians, scientists, and analysts refer to the interval of values that each bar covers as a bin or class.
Histograms split a single continuous measure into bins, or groups, that represent a specific range of values. Data points are then grouped into these equally sized bins. The bins are then displayed visually as bars stacked next to each other. To measure the scores of Olympic divers, one bin could contain scores between 2 and 4, the next between 4 and 6 and another the scores between 6 and 8.
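The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bin_values`, its parameters, and the sample scores are all invented for the example. It also assumes that a value sitting exactly on a shared edge (such as 4) belongs to the higher bin, which the text does not specify.

```python
def bin_values(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bins are half-open, [low, high), except the last bin, which also
    includes its upper edge so the maximum value is not dropped.
    """
    counts = [0] * n_bins
    end = start + width * n_bins
    for v in values:
        if v < start or v > end:
            continue  # outside the binned range; ignored in this sketch
        index = min(int((v - start) // width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical dive scores, invented for illustration.
scores = [2.5, 3.0, 4.0, 4.5, 5.5, 5.5, 6.0, 7.5, 8.0]

# Three bins of width 2: 2-4, 4-6, and 6-8.
counts = bin_values(scores, start=2, width=2, n_bins=3)
for i, count in enumerate(counts):
    low = 2 + 2 * i
    print(f"{low}-{low + 2}: {count}")
```

Each count becomes the height of one bar. Choosing a different bin width changes the counts, and so the shape of the chart, without changing the underlying data.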
Each bin is measured by the number of occurrences within its range of values. These counts shape the appearance of the view, depending on where the values in the data are concentrated. When the values are concentrated on one side of the middle, the distribution is said to be skewed. Other shapes are possible too, such as symmetric (bell-shaped) or bimodal distributions.
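One rough way to check for skew numerically is to compare the mean with the median: a long tail on the right tends to pull the mean above the median, and a long tail on the left tends to pull it below. The sketch below assumes that heuristic, which can fail for unusual distributions. The function name `skew_direction` and the sample values are invented for illustration.

```python
import statistics


def skew_direction(values):
    """Rough heuristic: compare mean and median to guess the skew direction."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical values, invented for illustration: most are small,
# with a few large ones forming a tail on the right.
mostly_low = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]
print(skew_direction(mostly_low))
```

A check like this complements the chart rather than replacing it, since the histogram shows where the values pile up and how long the tail is.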
On a histogram, the x-axis carries the range of values, so the width of each bar spans one bin, while the y-axis carries the frequency, which sets the height of each bar. Unless a particular bin has no values at all, there should be no spaces between bars.
There are two primary ways of displaying data in a histogram: Count of values within bins, and Density of values (% of Total).
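The two displays differ only in how the bar heights are scaled. As a minimal sketch, assuming "% of Total" means each bin's count divided by the total number of values, the conversion looks like this. The function name `percent_of_total` and the sample counts are invented for illustration.

```python
def percent_of_total(counts):
    """Convert per-bin counts to each bin's share of all values, in percent."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # no data: every share is zero
    return [100 * c / total for c in counts]


# Hypothetical bin counts, invented for illustration.
bin_counts = [2, 4, 3, 1]
shares = percent_of_total(bin_counts)
for count, share in zip(bin_counts, shares):
    print(f"count={count}  share={share:.1f}%")
```

The shape of the histogram is the same either way. The percentage view makes it easier to compare data sets of different sizes. Some statistical libraries use "density" in a stricter sense that also divides by the bin width, so it is worth checking which definition a given tool applies.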
Histograms work best when displaying continuous, numerical data. When you want to understand what is typical in a group of measurements, a histogram gives the viewer a sense of what to expect from a process or system. A restaurant that wants to display its busiest hours online might use a histogram. It would split the day into hourly bins and show the average number of customers who visit during each hour.
Unlike bar charts, histograms are not meant for comparing two or more categories. They make it easy to see how the numbers are distributed and where the largest frequencies fall. This helps with statistical analysis, especially for numeric population data such as age.
Some other measurements that fit into a histogram might include depth of water or difference in temperature. A country’s census might also work in histogram form if the goal was to show the number of people who were born in a certain decade.
For data sets that impact customers, consumers, or clients, histograms can be used to measure satisfaction. Some customers may feel their needs aren’t being met and might score the process with a low number, while others score it with a high number. A histogram can help you find the average and determine if the process itself needs improvement.
To plot a histogram, you need a continuous measure and a count axis that starts at zero, so that the number of values within each bin is displayed properly. A count can be zero, but it can never be negative.
This histogram looks at Airbnb rentals in Austin, Texas, showing price per day in $25 bins. The chart has a right-skewed distribution, and the average price for an Airbnb seems to be between $50 a night and $150 a night.
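To show how fixed $25 bins produce a chart like this, the sketch below prints a text histogram. The prices are invented for illustration and are not the Airbnb data described above. The function name `text_histogram` and its `bin_width` parameter are also assumptions made for the example.

```python
def text_histogram(prices, bin_width=25):
    """Return one line per bin: the price range and a bar of '#' marks."""
    if not prices:
        return []
    counts = {}
    for price in prices:
        low = int(price // bin_width) * bin_width  # lower edge of the bin
        counts[low] = counts.get(low, 0) + 1
    lines = []
    # Walk every bin from zero up to the highest occupied one,
    # so empty bins appear as gaps instead of being skipped.
    for low in range(0, max(counts) + bin_width, bin_width):
        bar = "#" * counts.get(low, 0)
        lines.append(f"${low:>3}-{low + bin_width:<3} {bar}".rstrip())
    return lines


# Hypothetical nightly prices in dollars, invented for illustration.
prices = [40, 55, 60, 70, 80, 85, 90, 95, 110, 120, 130, 160, 210, 340]
for line in text_histogram(prices):
    print(line)
```

With these sample values, most bars cluster at the low end and a few isolated prices stretch far to the right, which is the pattern a right-skewed distribution produces.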