A. Data Collection
a. Dataset I: This dataset is collected from the Smithsonian Institution National Museum of Natural History Global Volcanism Program website. It records 1,343 volcanic eruptions worldwide. Because it includes geographic data, it can be used to generate a geographic map that highlights areas with frequent volcanic activity.
b. Dataset II: This dataset is collected from the same website as Dataset I. It records the number of volcanoes in 77 different countries. This provides geospatial information and gives a global view of volcanic activity.
c. Dataset III: This dataset is collected from the same website as Dataset I. It lists 48 volcanoes that were in continuing eruption as of 27 January 2022. Its columns include ‘Eruption Start Date’, ‘Eruption Stop Date’, ‘country names’, and ‘WVAR (rollover for report)’.
d. Dataset IV: This dataset is collected from Kaggle. It contains 36 columns describing the properties of each volcano. It also covers the economic and human impacts of volcanic eruptions, including the number of deaths, missing persons, and injuries.
e. Dataset V: This earthquake dataset is collected from Kaggle. It contains 23,412 earthquake observations from 1965 to 2016. The columns used in this project include Date, Latitude, and Longitude.
B. Cleaning & Design:
Figure1:
Fig1 is a Leaflet map generated in R with the “mapview” library. We wanted to show the worldwide distribution of volcanoes, so the plot was first generated as a map with one point per volcano. However, to make the plot more readable, a standard news-style layout was adopted. To make the map clearer, we use ‘layersControlOptions’ to show the total number of volcanoes in each country directly. Clicking a point on the map shows the corresponding volcano's location. The map is easy to follow, and readers can explore it interactively by selecting different basemaps in the top right corner.
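The per-country totals and the click-to-show location can be sketched as a simple aggregation over point records. This is a minimal Python sketch, not the R/mapview code used for Fig1. The record layout `(name, country, latitude, longitude)`, the sample values, and the helper names `count_by_country` and `popup_text` are hypothetical.

```python
from collections import Counter

# Hypothetical sample records: (volcano name, country, latitude, longitude).
# Names and coordinates are placeholders, not values from the dataset.
volcanoes = [
    ("Volcano A", "Japan", 35.0, 138.0),
    ("Volcano B", "Japan", 34.5, 139.0),
    ("Volcano C", "Iceland", 64.0, -19.0),
]

def count_by_country(records):
    """Return a Counter mapping each country to its number of volcanoes."""
    return Counter(country for _, country, _, _ in records)

def popup_text(name, country, lat, lon):
    """Build the text a map point could show when it is clicked."""
    return f"{name} ({country}): {lat:.2f}, {lon:.2f}"

counts = count_by_country(volcanoes)
for country, n in counts.most_common():
    print(f"{country}: {n}")
print(popup_text(*volcanoes[0]))
```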
Figure2:
Analyzing the trends and patterns of volcanic behavior plays an important role in predicting future eruptions and helping people prepare for such natural disasters. We therefore made a bubble plot with Python Plotly using Dataset II. The plot includes information about the eruption era, the country, and the number of volcanoes, which makes it easy to draw general conclusions about volcanic eruption trends.
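A bubble plot like this needs one count per (era, country) pair and a marker scale so the largest bubble stays readable. The sketch below shows both steps in plain Python. The era labels, rows, and helper names are hypothetical. The `sizeref` formula follows the Plotly bubble-chart documentation for `sizemode='area'`, but whether Fig2 used it is an assumption.

```python
from collections import Counter

# Hypothetical rows: (eruption era, country). Values are placeholders.
rows = [
    ("Holocene", "Indonesia"),
    ("Holocene", "Indonesia"),
    ("Holocene", "Japan"),
    ("Pleistocene", "Japan"),
]

def bubble_table(rows):
    """Count volcanoes per (era, country); each count becomes one bubble."""
    return Counter(rows)

def area_sizeref(sizes, max_marker_px=40):
    """sizeref for Plotly markers with sizemode='area', so the largest
    bubble is drawn at about max_marker_px pixels across."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)

table = bubble_table(rows)
sizes = list(table.values())
print(table)
print(area_sizeref(sizes))
```

Scaling by area rather than diameter keeps a country with twice as many volcanoes from looking four times as large.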
Figure3:
Fig3.1 and Fig3.2 are two scatterplots generated with the ggplot2 and plotly libraries in R. The color of each point marks the Volcanic Explosivity Index (VEI), a relative measure of the explosiveness of volcanic eruptions that ranges from non-explosive to very large eruptions. We therefore use the relative explosivity as the color label for the scatter points. The “start date” and “end date” columns give the exact start and end times of each eruption. From these, a “duration days” value is calculated so that only eruptions lasting between one and two years are kept, since eruption durations commonly fall in this range. Finally, the volcano name would not be informative on the x-axis, so the country is used instead to represent the area of activity.
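The duration filter can be sketched as follows. This is not the R code used for Fig3. The records and helper names are hypothetical, and treating "between 1 and 2 years" as 365 to 730 days inclusive is an assumption.

```python
from datetime import date

# Hypothetical eruption records: (volcano, country, start date, stop date).
eruptions = [
    ("Volcano A", "Indonesia", date(2019, 1, 1), date(2020, 6, 1)),
    ("Volcano B", "Japan", date(2021, 3, 1), date(2021, 5, 1)),
    ("Volcano C", "Chile", date(2018, 1, 1), date(2021, 1, 1)),
]

def duration_days(start, stop):
    """Number of days between the eruption start and stop dates."""
    return (stop - start).days

def one_to_two_years(records, low=365, high=730):
    """Keep eruptions lasting between `low` and `high` days (inclusive).
    Mapping 1-2 years to 365-730 days is an assumption."""
    return [r for r in records if low <= duration_days(r[2], r[3]) <= high]

kept = one_to_two_years(eruptions)
print([(name, country) for name, country, *_ in kept])
```

Volcano A lasts 517 days and is kept. Volcano B (61 days) and Volcano C (1,096 days) fall outside the window.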
Figure4:
Fig4 is a geographic map plot created in Python with the Plotly package. Because the dataset provides longitude and latitude, we draw a world map as the base layer and plot data points whose color and size represent the Volcanic Explosivity Index. The historical data span a very long time range, so we keep only eruptions after 1800. Labeling the data points by relative explosivity lets us draw further conclusions about the relationship between explosiveness and where volcanic activity occurs.
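The subsetting and size encoding behind Fig4 can be sketched as below. The rows are hypothetical placeholders. The linear VEI-to-pixel mapping and its constants are illustrative assumptions rather than the project's settings. The only fact relied on is that VEI runs from 0 to 8.

```python
# Hypothetical rows: (name, latitude, longitude, eruption year, VEI).
rows = [
    ("Volcano A", -8.3, 115.5, 1815, 7),
    ("Volcano B", 46.2, -122.2, 1980, 5),
    ("Volcano C", 40.8, 14.4, 79, 5),
]

def after_year(rows, year=1800):
    """Keep eruptions that occurred after `year`."""
    return [r for r in rows if r[3] > year]

def vei_to_size(vei, min_px=4, px_per_level=3):
    """Linear marker size in pixels for a VEI value between 0 and 8.
    The constants are illustrative assumptions."""
    return min_px + px_per_level * vei

recent = after_year(rows)
sizes = [vei_to_size(r[4]) for r in recent]
print([r[0] for r in recent], sizes)
```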
Figure5:
Fig5 compares volcano eruption evidence categories. This density plot was generated in R with the ggplot2 and plotly libraries. A density plot shows the distribution of the “last eruption year” variable, and its peak shows when eruptions are concentrated over the interval. Some volcanoes are still erupting and therefore have no end year, so we use the last eruption year rather than the end year as the x-axis.
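A density curve of this kind is a kernel density estimate, and its peak is the year with the highest estimated density. The sketch below implements a Gaussian kernel density estimate in plain Python. ggplot2's density stat also uses a Gaussian kernel by default, but the fixed bandwidth here is an assumption, and the sample years are hypothetical.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return f(x), a Gaussian kernel density estimate of `data`."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f(x):
        # Sum one Gaussian bump centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return f

# Hypothetical "last eruption year" values for one evidence category.
last_years = [1900, 1950, 2000]
density = gaussian_kde(last_years, bandwidth=20)
grid = range(1850, 2051)
peak = max(grid, key=density)  # year where eruptions are most concentrated
print(peak)
```

Running the same estimate once per evidence category and overlaying the curves gives a comparison like the one in Fig5.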
Figure6:
The bar plot (Fig6), made with the plotly and ggplot2 libraries in R, presents the 40 most active volcanoes. Activity levels vary widely between volcanoes, so the bar plot shows the number of eruptions for the most active ones. Each volcano's geographic location is also shown next to its name to indicate where it is located.
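Selecting the most active volcanoes and building "name (location)" labels reduces to counting eruptions per volcano. The sketch below is a plain-Python illustration, not the R code behind Fig6. The eruption log and the helper `top_active` are hypothetical.

```python
from collections import Counter

# Hypothetical eruption log: one (volcano name, location) entry per eruption.
eruption_log = [
    ("Volcano A", "Italy"), ("Volcano A", "Italy"), ("Volcano A", "Italy"),
    ("Volcano B", "Japan"), ("Volcano B", "Japan"),
    ("Volcano C", "United States"),
]

def top_active(log, k=40):
    """Return the k volcanoes with the most eruptions as ('name (location)', count)."""
    counts = Counter(log)
    return [(f"{name} ({loc})", n) for (name, loc), n in counts.most_common(k)]

for label, n in top_active(eruption_log):
    print(label, n)
```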
Figure7:
Figure 7 shows how volcanic eruptions relate to earthquakes. It was created with the Python Plotly library. We used an animated geographic scatter plot because it clearly shows how volcanic eruptions and earthquakes are distributed over a given time period. Moving the slider below the plot shows the distribution of events in different months. Hovering over a dot shows the event type, month, longitude, and latitude. The plot also supports panning, zooming, and box selection. It combines the earthquake and volcanic eruption datasets. Time periods without any events are filled with the label ‘no event’ so that the time ticks on the slider are evenly spaced.
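The gap-filling step can be sketched as adding a placeholder row for every empty period. This sketch assumes monthly periods, following the slider description. The event tuple layout and helper names are hypothetical. The filled rows could then feed an animated geographic scatter, for example `plotly.express.scatter_geo` with its `animation_frame` argument, though that is not confirmed as the project's call.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    y, m = start
    while (y, m) <= end:
        yield (y, m)
        m += 1
        if m > 12:
            y, m = y + 1, 1

def fill_empty_months(events, start, end):
    """events: list of (event type, (year, month), lat, lon).
    Add a 'no event' row for every month without events so that
    animation frames are evenly spaced in time."""
    seen = {e[1] for e in events}
    filled = list(events)
    for ym in month_range(start, end):
        if ym not in seen:
            filled.append(("no event", ym, None, None))
    return sorted(filled, key=lambda e: e[1])

# Hypothetical combined events from the two datasets.
events = [
    ("earthquake", (2016, 1), 38.3, 142.4),
    ("volcano", (2016, 3), -8.3, 115.5),
]
filled = fill_empty_months(events, (2016, 1), (2016, 4))
print([e[0] for e in filled])
```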
Figure8:
Figure 8-1 was created with the Python Plotly library from the volcano dataset and shows the number of deaths from volcanic eruptions in each country from 2010 to 2018. The volcano dataset has many missing values, which we filled with the median; this explains why many countries have the same total number of deaths. A box plot would have been an alternative to the scatter plot, but we felt it was not a good choice. Some countries have only a few recorded eruptions, and many of those values were originally missing and filled with the same median. Their boxes would therefore collapse into lines and show little of the distribution. Moreover, Indonesia has a much larger value of around 367, which compresses the scale for all other countries. To compare total deaths across countries more clearly, we removed that row and plotted a new visualization, Figure 8-2, which gives a better comparison of how total deaths vary by country.
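The median imputation and outlier removal can be sketched in a few lines. The rows and helper names are hypothetical. Apart from the Indonesia value mentioned in the text, all values are placeholders. Computing the median over all rows before the year filter is an assumption. The example shows why imputed countries end up with identical totals.

```python
import statistics

# Hypothetical rows: (country, year, deaths); None marks a missing value.
rows = [
    ("Indonesia", 2010, 367),
    ("Country A", 2011, 12),
    ("Country B", 2012, None),
    ("Country C", 2013, None),
    ("Country D", 2014, 2),
]

def fill_missing_with_median(rows):
    """Replace missing death counts with the median of the observed ones."""
    med = statistics.median([d for _, _, d in rows if d is not None])
    return [(c, y, med if d is None else d) for c, y, d in rows]

def totals_by_country(rows, first=2010, last=2018, exclude=()):
    """Sum deaths per country for years in [first, last], skipping excluded countries."""
    totals = {}
    for c, y, d in rows:
        if first <= y <= last and c not in exclude:
            totals[c] = totals.get(c, 0) + d
    return totals

filled = fill_missing_with_median(rows)
fig_8_1 = totals_by_country(filled)
fig_8_2 = totals_by_country(filled, exclude=("Indonesia",))  # outlier row removed
print(fig_8_1)
print(fig_8_2)
```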
Figure9:
Recognizing the associations among volcanic eruption attributes could help advance volcanic geology, so association rule mining (ARM) and network analysis were performed. ARM is a technique for uncovering how items are associated with each other. Support, the proportion of transactions in which an itemset appears, was selected as the rule metric. First, the columns “Volcano Country”, “Volcano Type”, “Volcano Status”, and “Volcano Time” were selected and converted into transaction data in a text file. The top 10 rules were then imported to generate an interactive network visualization. From it, readers can draw conclusions about how the values in the four columns are related.
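The support metric can be illustrated by counting how often each pair of items co-occurs across transactions. Each pair could then become a network edge weighted by its support. This is a minimal sketch, not the project's ARM tool. It covers only pairwise itemsets, whereas a full implementation such as Apriori also handles larger itemsets and rule metrics like confidence. The transactions use three of the four columns for brevity, and their values are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: one per volcano, items written as "column=value".
transactions = [
    {"Country=Japan", "Type=Stratovolcano", "Status=Historical"},
    {"Country=Japan", "Type=Stratovolcano", "Status=Holocene"},
    {"Country=Chile", "Type=Stratovolcano", "Status=Historical"},
    {"Country=Japan", "Type=Caldera", "Status=Historical"},
]

def pair_support(transactions):
    """Support of each item pair: fraction of transactions containing both items."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items()}

def top_k(support, k=10):
    """Highest-support pairs first; ties broken alphabetically."""
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

support = pair_support(transactions)
for (a, b), s in top_k(support):
    print(f"{a} -- {b}: {s:.2f}")
```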