Trends of Internet Usage in the United States
FNAR337: INFORMATION DESIGN AND VISUALIZATION, Penn Design, Spring 2019
Original Article on Medium
Introduction
The Internet has brought these significant changes to the structure of society:
Faster, larger-scale communication and global connection
Fast and open dissemination of information on current events
Access to publicly available information in nearly every realm (academia, industry, news)
This access to constant, up-to-date information is driving a lasting and sizable change in how society thinks, learns, and grows.
Furthermore, in the United States specifically, internet access and usage are nearly ubiquitous in major cities and middle-class suburbs. Over time, technology and the internet are becoming critical tools in how and what children are taught.
As a result, regions of the United States with low internet usage are an interesting subject of study. Starting with a Kaggle data set called "People Without Internet," I explore the possible causes and demographics of these low-usage areas.
Understanding and Manipulating the Original Data Set
The data set breaks the country into counties and includes only counties with a population greater than 65,000. This is important to keep in mind when mapping the data or analyzing it geographically: many sparsely populated counties in the Midwest are not included, likely because the original data set records raw counts rather than percentages of each county's population. Another point to consider is that this analysis tracks the percentage of people who do not use the internet, rather than the percentage of people who do.
Basic Statistics on Internet Non-Usage per County (percent of population):
Ranges from 2.66 to 54.011
Median is 14.711 and Mean is 15.26
Standard Deviation is 0.39
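Summary statistics like these can be computed with Python's standard `statistics` module. This is a minimal sketch. It assumes the per-county non-use percentages have already been loaded into a plain list. The function name `summarize` and the sample values are hypothetical and are not taken from the Kaggle data. The original analysis does not say whether a sample or a population standard deviation was used, so the sketch uses the sample version (`statistics.stdev`). Swap in `statistics.pstdev` for the population version.

```python
import statistics


def summarize(non_use_pct):
    """Return the range, median, mean, and sample standard deviation of a
    list of per-county internet non-use percentages (hypothetical helper)."""
    return {
        "min": min(non_use_pct),
        "max": max(non_use_pct),
        "median": statistics.median(non_use_pct),
        "mean": statistics.mean(non_use_pct),
        "stdev": statistics.stdev(non_use_pct),  # sample standard deviation
    }


# Hypothetical values for illustration only; not the real county data.
sample = [2.66, 10.0, 14.711, 18.5, 54.011]
print(summarize(sample))
```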
Using these measures to inform my next steps, I set a 20 percent threshold for an abnormally high rate of internet non-use. The following map shows only the counties where more than 20 percent of the population does not use the Internet.
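The filtering behind the map can be expressed as a simple threshold test. This is a sketch that assumes the data is a mapping from a county name to its non-use percentage. The name `counties_above` and the example counties are hypothetical. "Over twenty percent" is read as strictly greater than 20, so a county at exactly 20.0 is excluded.

```python
from typing import Dict, List


def counties_above(non_use_by_county: Dict[str, float], threshold: float = 20.0) -> List[str]:
    """Return the counties whose internet non-use percentage is strictly
    above `threshold`, ordered from highest to lowest non-use."""
    flagged = [(county, pct) for county, pct in non_use_by_county.items() if pct > threshold]
    # Sort by the non-use percentage, largest first.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [county for county, _ in flagged]


# Hypothetical counties and percentages for illustration.
example = {"County A": 25.0, "County B": 19.9, "County C": 40.0, "County D": 20.0}
print(counties_above(example))  # → ['County C', 'County A']
```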
Trends in Presented Data
The data set includes some basic demographic information on the racial and educational background of each county. Because the original data set gives these figures as raw counts, I converted them into percentages so that counties of different sizes could be compared with each other.
Here are some trends I found by comparing the percentage of people who do not use the internet with other features of the data set.
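The count-to-percentage conversion can be sketched as dividing each demographic count by the county's population. The sketch assumes the county's total population is the denominator, since the original analysis does not name one. For educational attainment, a narrower base such as adults aged 25 and older may be more appropriate. The function name and the group keys are hypothetical.

```python
from typing import Dict


def to_percentages(counts: Dict[str, int], population: int) -> Dict[str, float]:
    """Convert raw demographic counts for one county into percentages of
    that county's population (assumed denominator), so that counties of
    different sizes can be compared directly."""
    if population <= 0:
        raise ValueError("population must be positive")
    return {group: 100.0 * count / population for group, count in counts.items()}


# Hypothetical group names and counts for a county of 80,000 people.
print(to_percentages({"group_a": 40000, "group_b": 20000}, 80000))
# → {'group_a': 50.0, 'group_b': 25.0}
```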
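One way to put a number on a trend between non-use and another feature is the Pearson correlation coefficient. The original figures may not have been based on it. This is a hedged sketch in plain Python. The function name `pearson_r` and the pairing of columns are assumptions. A value near +1 or -1 indicates a strong linear relationship, but on its own it does not establish cause.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences, e.g. per-county
    non-use percentage vs. per-county bachelor's-degree share."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all unnormalized (the n terms cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # → 1.0
```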
Making Sense of the Outliers
Although I had defined 20 percent of the population as the benchmark for high internet non-usage, many counties fell into that category. Raising the benchmark to 35 percent highlights just six outliers, where a very high percentage of the population does not use the internet. After graphing trends against economic and educational backgrounds, I looked at data on the racial demographics of these counties.
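The effect of raising the benchmark can be checked by counting how many counties clear each threshold. This is a minimal sketch. The helper name `count_above_thresholds` and the sample percentages are hypothetical. On the real data, the text reports six counties above 35 percent.

```python
from typing import Dict, Iterable, List


def count_above_thresholds(non_use_pct: List[float], thresholds: Iterable[float]) -> Dict[float, int]:
    """For each threshold, count the counties whose non-use percentage is
    strictly greater than that threshold."""
    return {t: sum(1 for pct in non_use_pct if pct > t) for t in thresholds}


# Hypothetical percentages for five counties.
print(count_above_thresholds([10.0, 22.0, 36.0, 50.0, 21.0], [20, 35]))  # → {20: 4, 35: 2}
```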
I quickly found that three of the six counties have large Native American populations. After some brief research, I confirmed that Apache, Navajo, and McKinley counties all have large Native American communities that follow a more traditional lifestyle with little exposure to technology. I later found that other counties in the area, such as San Juan County, also had low internet usage because of how common traditional Native American lifestyles are there.
At this point, I stepped back and re-examined the features of counties where 30 to 40 percent of the population does not use the internet.
A common trend in this range was that the share of the population holding at least a bachelor's degree was usually in the single digits and always under 13 percent.
Next, I graphed the highest level of education attained by the populations of these counties. In many of these areas, a high school diploma was the highest level of education for a large portion of the population.
I also observed that the number of people who leave a bachelor's degree unfinished is usually much higher than the number who finish their undergraduate degree. In many of these counties, the most common highest level of education is an unfinished undergraduate degree. Connecting this back to Figure 4, which compared higher education with internet usage, a large portion of the people who do not use the internet later in life fall into the category of unfinished bachelor's degrees. Perhaps higher college retention rates in these areas would lead to higher internet usage in the future.
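The check described here can be sketched by selecting counties in the non-use band and taking the highest bachelor's-degree share among them. The sketch assumes each county is a `(non_use_pct, bachelors_pct)` pair and treats the 30 to 40 percent band as inclusive at both ends. The function name and sample rows are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def max_bachelors_in_band(
    rows: Iterable[Tuple[float, float]], low: float = 30.0, high: float = 40.0
) -> Optional[float]:
    """Return the highest bachelor's-degree share among counties whose
    non-use percentage lies in [low, high], or None if none fall in it."""
    in_band = [bach for non_use, bach in rows if low <= non_use <= high]
    return max(in_band) if in_band else None


# Hypothetical (non_use_pct, bachelors_pct) pairs.
rows = [(32.0, 8.5), (38.0, 12.4), (25.0, 30.0), (41.0, 20.0)]
print(max_bachelors_in_band(rows))  # → 12.4
```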
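The two comparisons in this passage can be sketched per county: find the most common highest-education category, and check whether the unfinished-degree share exceeds the finished-degree share. The category keys (`some_college_no_degree`, `bachelors_or_higher`, and so on) are hypothetical labels. They are not the data set's actual column names. The input is assumed to be percentage shares for one county.

```python
from typing import Dict, Tuple


def education_profile(levels: Dict[str, float]) -> Tuple[str, bool]:
    """Given each highest-education category's share of one county's
    population, return (most common category, whether the unfinished-degree
    share exceeds the bachelor's-or-higher share)."""
    most_common = max(levels, key=levels.get)  # first key wins on ties
    unfinished_exceeds = levels.get("some_college_no_degree", 0.0) > levels.get(
        "bachelors_or_higher", 0.0
    )
    return most_common, unfinished_exceeds


# Hypothetical shares for one county.
county = {
    "less_than_high_school": 20.0,
    "high_school": 35.0,
    "some_college_no_degree": 30.0,
    "bachelors_or_higher": 15.0,
}
print(education_profile(county))  # → ('high_school', True)
```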
Conclusion
By analyzing the demographic information presented alongside the internet usage data, I was able to form meaningful hypotheses about the correlations and trends embedded in the data. While correlation alone does not prove causation, brief qualitative research can support and inform the analysis. Here, I attributed the major outliers to traditional Native American communities and to areas with lower education levels. Potential remedies may also be hidden in these trends, but they are much harder to analyze and to put into practice.
I have attached an Interactive Heat Map of Internet Non-Use Data, hosted on Tableau.