Trends of Internet Usage in the United States

FNAR337: INFORMATION DESIGN AND VISUALIZATION, Penn Design - spring 2019

Original Article on Medium

Introduction

The Internet has brought significant changes to the structure of society. It:

  1. Improves the speed and scale of communication and global connection

  2. Allows fast and open dissemination of information on current events

  3. Gives users access to all publicly available information in nearly every realm (academia, industry, news)

Constant access to current information encourages a lasting and sizable change in how society thinks, learns, and grows.

In the United States specifically, internet access and usage are nearly ubiquitous in major cities and middle-class suburbs. Over time, technology and the internet are becoming critical tools in how and what children are taught.

As a result, an interesting point of study is regions of the United States that have low internet usage. Starting with a Kaggle data set called “People Without Internet”, I explore the possible causes and demographics of these low internet usage areas.

Understanding and Manipulating the Original Data Set

Figure 1: Visualization of All Counties that Are Considered

The data set breaks regions up into counties and only includes counties with a population greater than 65,000. This is important to keep in mind when visualizing the data on a map or analyzing it geographically. Many counties in the Midwest with low population densities are not included, likely because the original data set reports raw counts instead of percentages of each area's population. Note also that the main measure discussed in this analysis is the percentage of people who do not use the internet, rather than the percentage of people who do.

Basic Statistics on Internet Non-Usage per County:

  • Ranges from 2.66 to 54.011

  • Median is 14.711 and Mean is 15.26

  • Standard Deviation is 0.39
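
Summary measures like the ones above can be computed with Python's standard library. The following is a minimal sketch, not the code used for this article: the list `non_use_pct` holds made-up toy percentages for illustration and does not come from the Kaggle data set.

```python
import statistics

# Toy non-usage percentages, one per county.
# Illustrative values only; NOT taken from the original data set.
non_use_pct = [4.0, 10.0, 14.0, 16.0, 36.0]

summary = {
    "min": min(non_use_pct),
    "max": max(non_use_pct),
    "median": statistics.median(non_use_pct),
    "mean": statistics.mean(non_use_pct),
    # Sample standard deviation (divides by n - 1).
    "stdev": statistics.stdev(non_use_pct),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Comparing the median with the mean gives a quick hint of skew: a mean above the median suggests a tail of counties with unusually high non-usage.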

Utilizing these measures to inform my next steps, I set a twenty percent threshold for determining an abnormally high rate of internet non-use. The following map shows only regions that have over twenty percent of their population not using the Internet.
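
A threshold filter of this kind might look like the sketch below. The function name `high_non_use`, the county labels, and the percentages are all hypothetical and chosen only to show the mechanics.

```python
def high_non_use(pct_by_county, threshold=20.0):
    """Return names of counties whose non-usage percentage is strictly
    above the threshold, ordered from highest to lowest percentage."""
    flagged = [(name, pct) for name, pct in pct_by_county.items() if pct > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in flagged]


# Hypothetical counties and percentages, for illustration only.
pct_by_county = {"A": 12.5, "B": 21.0, "C": 20.0, "D": 38.2}

flagged = high_non_use(pct_by_county)
print(flagged)
```

Keeping the threshold as a parameter makes it easy to rerun the same filter with a stricter cutoff.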

Figure 2: Counties where over 20% of the population does not use the internet

Trends in Presented Data

The data set included some basic demographic information on the racial and educational background of each region. While the original data set presented these figures as raw counts, I converted them into percentages of each county's population so that counties could be compared with one another.
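
The count-to-percentage conversion can be sketched as follows. The field names (`population`, `no_internet`, `bachelors`) and the two county records are assumptions made for this example; the actual column names in the Kaggle file may differ.

```python
def to_percent(count, population):
    """Express a raw count as a percentage of the population (2 decimals)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(100.0 * count / population, 2)


# Hypothetical county records; names and values are illustrative only.
counties = [
    {"county": "A", "population": 80000, "no_internet": 8000, "bachelors": 20000},
    {"county": "B", "population": 200000, "no_internet": 50000, "bachelors": 16000},
]

for row in counties:
    row["no_internet_pct"] = to_percent(row["no_internet"], row["population"])
    row["bachelors_pct"] = to_percent(row["bachelors"], row["population"])
    print(row["county"], row["no_internet_pct"], row["bachelors_pct"])
```

In this toy example, county B has far more people without internet in absolute terms, but it is the percentage that makes counties of different sizes comparable.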

Here are some trends that I found by comparing the percentage of people who do not use the internet with other features of the data set.

Figure 3: (Left) Shows a positive correlation between lack of internet usage and poverty. (Right) Shows a negative correlation between lack of internet usage and median household income. In essence, both trends point to the same relationship: the one between household money and internet usage.
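
One way to put a number on the direction and strength of such trends is the Pearson correlation coefficient. The sketch below implements it by hand with the standard library; the three lists are invented toy values, not figures from the data set, and this is not necessarily how the charts above were produced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-county values, for illustration only.
non_use_pct = [8.0, 12.0, 20.0, 30.0]
poverty_pct = [6.0, 10.0, 15.0, 24.0]
median_income = [82000, 64000, 48000, 35000]

r_poverty = pearson_r(non_use_pct, poverty_pct)   # positive in this toy data
r_income = pearson_r(non_use_pct, median_income)  # negative in this toy data
print(f"poverty: {r_poverty:.2f}, income: {r_income:.2f}")
```

A coefficient near +1 or -1 indicates a strong linear trend, while a value near 0 indicates little linear relationship. Neither says anything about cause.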

Figure 4: Interestingly, while there is a correlation between finishing an undergraduate degree and using the internet, there is no correlation between having some college experience and using the internet.

Figure 5: Three of the six regions with the lowest internet usage have large Native American populations.

Making Sense of the Outliers

I previously used 20 percent of the population as the benchmark for high internet non-usage, but many counties fell into that category. Raising the benchmark to 35 percent highlights just six outliers where a very high percentage of the population does not use the internet. Having already graphed trends in economic and educational background, I next looked at data on the racial demographics of these regions.

Figure 6: A cluster of the remaining data points, with internet non-usage levels between 30 and 40 percent. Many of these counties have low rates of higher education.

I quickly found that three of the six regions have large Native American populations. After some brief research, I confirmed that Apache, Navajo, and McKinley counties are all home to many Native American groups that follow a more traditional lifestyle without much exposure to technology. I later found that additional counties in the area, like San Juan County, also had low internet usage because of how common traditional Native American lifestyles were there.

At this point, I stepped back to take another look at the features of counties where 30 to 40 percent of the population does not use the internet.

A common trend for this data range was that the percent of the population holding at least a bachelor’s degree was usually in the single digits and always under 13 percent.
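
A check of this kind could be sketched as below: select the counties in the 30 to 40 percent non-usage band, then test their bachelor's degree share. The records and field names are hypothetical and exist only to illustrate the filter.

```python
# Hypothetical county records; values are illustrative only.
records = [
    {"county": "A", "no_internet_pct": 31.0, "bachelors_pct": 9.0},
    {"county": "B", "no_internet_pct": 36.5, "bachelors_pct": 12.0},
    {"county": "C", "no_internet_pct": 18.0, "bachelors_pct": 30.0},
    {"county": "D", "no_internet_pct": 45.0, "bachelors_pct": 7.0},
]

# Keep only counties in the 30 to 40 percent non-usage band (inclusive).
in_band = [r for r in records if 30.0 <= r["no_internet_pct"] <= 40.0]

# Does every county in the band have under 13 percent bachelor's degrees?
all_under_13 = all(r["bachelors_pct"] < 13.0 for r in in_band)

# How many counties in the band are in the single digits?
single_digit = sum(1 for r in in_band if r["bachelors_pct"] < 10.0)

print([r["county"] for r in in_band], all_under_13, single_digit)
```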

Next, I graphed the highest level of education attained by the populations of these counties. In many of these areas, a large portion of the population has a high school diploma as their highest level of education.

Figure 7: In most of these regions, a large percentage of the population has no experience with higher education. Interestingly, in many of these counties, the number of people who start and do not finish a bachelor’s degree is significantly higher than the number of people who end up finishing their undergraduate studies.

One observation I made was that the number of people who leave a bachelor’s degree unfinished is usually significantly higher than the number who finish their undergraduate degree. In many of these counties, the highest level of education is most commonly an unfinished undergraduate degree. Connecting this back to Figure 4, which compared higher education and internet usage, a large portion of the people who do not use the internet later in life fall into the category of unfinished bachelor’s degrees. Perhaps higher college retention rates in these areas would lead to higher rates of internet usage in the future.

Conclusion

By analyzing the demographic information presented alongside the internet usage data, I was able to draw meaningful hypotheses about correlations and trends in the data. While correlation alone does not prove a causal relationship, brief qualitative research can support and inform the analysis. Here, I attributed the major outliers to traditional Native American settlements and to areas with lower education levels. Potential remedies may also be interwoven in these trends, but they are much more complex to analyze and then execute.

I have attached an Interactive Heat Map of Internet Non-Use Data, hosted on Tableau.