Integrating web scraping, pivot longer, and spatial visualization |
Hello everyone, in this post I'll show you how to create spatial visualization using R. As we know that spatial analysis is important one, however I thought that not all of us can visualize spatial data as well. I felt it as I ever did it before.
In this section, we will scrape the data first then process it well using a bit of Natural Language Processing (NLP) method. After that, we link and match the data into single data. So, we need raster data specifically Jawa Timur (shape file, *shp) and latitude-longitude of Jawa Timur's region. The main data derived from BPS Jawa Timur, you can access it on this url: jatim.bps.go.id.
At last stage, we will learn and practice how to animate the spatial data using a special package often I used named gganimate(). This package is powerfull to change your general either graph or chart into an interactive visualization.
Noted: latitude-longitude data of Jawa Timur can be found in this site either shape file of Jawa Timur
How to practice web scraping, pivot longer, and spatial visualization using R? we can follow some stages belowed:
# packages activation to support the project
library(rvest)
library(xml2)
library(stringr)
library(dplyr)
# web scraping activity to get rice production data from BPS Jawa Timur in jatim.bps.go.id
# after that we integrate it into a data frame named prodberas
# first, we should declare the url and then read it
# second, we get some nodes from pages then scrape and extract it using a bit NLP process
url <- "https://jatim.bps.go.id/indicator/53/580/1/produksi-beras-menurut-kabupaten-kota.html"
laman <- read_html(url)
kako <- laman %>% html_nodes('#tablex td:nth-child(1)') %>% html_text()
kako <- kako[1:38] %>% str_replace_all("Kabupaten ", "")
jan <- laman %>% html_nodes('#tablex td:nth-child(3)') %>% html_text() %>% str_replace_all(" ", "")
jan <- jan[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
feb <- laman %>% html_nodes('#tablex td:nth-child(4)') %>% html_text() %>% str_replace_all(" ", "")
feb <- feb[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
mar <- laman %>% html_nodes('#tablex td:nth-child(5)') %>% html_text() %>% str_replace_all(" ", "")
mar <- mar[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
apr <- laman %>% html_nodes('#tablex td:nth-child(6)') %>% html_text() %>% str_replace_all(" ", "")
apr <- apr[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
may <- laman %>% html_nodes('#tablex td:nth-child(7)') %>% html_text() %>% str_replace_all(" ", "")
may <- may[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
jun <- laman %>% html_nodes('#tablex td:nth-child(8)') %>% html_text() %>% str_replace_all(" ", "")
jun <- jun[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
jul <- laman %>% html_nodes('#tablex td:nth-child(9)') %>% html_text() %>% str_replace_all(" ", "")
jul <- jul[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
ags <- laman %>% html_nodes('#tablex td:nth-child(10)') %>% html_text() %>% str_replace_all(" ", "")
ags <- ags[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
sep <- laman %>% html_nodes('#tablex td:nth-child(11)') %>% html_text() %>% str_replace_all(" ", "")
sep <- sep[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
okt <- laman %>% html_nodes('#tablex td:nth-child(12)') %>% html_text() %>% str_replace_all(" ", "")
okt <- okt[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
nov <- laman %>% html_nodes('#tablex td:nth-child(13)') %>% html_text() %>% str_replace_all(" ", "")
nov <- nov[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
des <- laman %>% html_nodes('#tablex td:nth-child(14)') %>% html_text() %>% str_replace_all(" ", "")
des <- des[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
as.numeric()
# the result
prodberas <- data.frame(kako, jan, feb, mar, apr, may, jun, jul, ags, sep, okt, nov, des)
prodberas
## kako jan feb mar apr may jun jul ags sep okt
## 1 Pacitan 74.30 7709.22 16664.44 6732.61 2924.93 5087.10 6623.02 1620.68 1232.37 1027.38
## 2 Ponorogo 4424.66 392.08 43211.19 42660.14 7572.77 8267.44 43204.56 20959.75 5310.70 237.95
## 3 Trenggalek 1652.96 419.30 16189.83 10576.06 2963.06 1588.58 13626.96 6630.16 1996.29 2534.68
## 4 Tulungagung 6812.89 2643.75 13299.86 33108.09 8882.33 4938.71 8323.36 21894.89 6265.12 6503.48
## 5 Blitar 3404.15 3482.39 16254.15 44325.64 18977.01 4482.54 5292.56 5915.28 8545.73 7723.40
## 6 Kediri 2299.77 3721.03 29636.05 23413.95 2567.84 2419.87 18711.00 4924.18 2788.64 522.82
## 7 Malang 15132.43 5992.60 18043.45 24490.44 17434.28 12156.26 12150.65 11444.80 10094.83 10731.35
## 8 Lumajang 14922.39 13933.55 19235.73 22531.27 15099.56 17643.17 15272.37 10363.57 8701.12 12770.96
## 9 Jember 20443.23 12864.52 46680.31 95342.61 27488.63 16158.38 29038.47 49950.82 20473.92 13698.28
## 10 Banyuwangi 18766.17 8434.83 34346.28 48585.98 23469.67 14631.95 15495.68 14525.81 21508.13 36279.30
## 11 Bondowoso 8311.20 8065.34 21660.89 21202.50 18287.10 12569.63 13348.98 7798.97 6177.20 8070.38
## 12 Situbondo 3546.16 6735.77 20347.51 15265.94 4353.95 4016.70 8990.11 6144.96 2032.09 3809.36
## 13 Probolinggo 4198.92 3996.79 20497.91 29060.97 14575.22 7594.48 5409.95 6532.54 5957.53 3913.96
## 14 Pasuruan 5891.52 5711.86 21935.32 21769.23 16768.56 6611.29 18064.41 9047.51 10650.58 8608.98
## 15 Sidoarjo 991.26 501.74 4514.21 21389.91 22029.76 4940.07 574.48 4428.10 18293.93 18186.59
## 16 Mojokerto 3152.40 9898.22 26483.37 40201.48 15314.56 13043.03 13875.79 15726.87 8891.03 7746.17
## 17 Jombang 3803.64 733.65 30062.37 61866.63 17515.80 1723.15 17593.24 28494.53 20942.02 2476.25
## 18 Nganjuk 3781.75 10409.78 63505.28 37186.04 4822.87 22270.83 37777.23 9210.10 6362.32 6165.72
## 19 Madiun 5671.56 509.65 45787.74 35963.82 8664.44 3114.32 49575.30 20405.76 6573.18 2792.35
## 20 Magetan 551.37 8171.18 41572.90 14198.04 2587.40 21116.14 26640.45 4842.41 2392.88 8024.92
## 21 Ngawi 2993.39 21771.86 93477.70 42048.82 3722.63 30360.45 85766.32 24278.50 3434.18 20954.32
## 22 Bojonegoro 5406.18 42889.97 163434.99 24443.56 6339.59 60792.22 41905.41 6817.31 10586.96 14665.17
## 23 Tuban 18410.00 21536.45 88058.10 39949.99 8148.49 7861.99 24870.00 10389.18 27278.44 27106.91
## 24 Lamongan 2909.94 67690.04 145597.68 17154.44 19765.76 78837.29 52327.94 43720.09 35996.41 33517.36
## 25 Gresik 357.21 59009.17 52374.15 8063.86 8186.15 62102.27 15976.65 8462.17 3947.69 13201.76
## 26 Bangkalan 888.87 14721.98 49103.81 7209.85 2656.17 15149.51 13442.62 5616.20 2152.18 2978.37
## 27 Sampang 0.00 15458.23 56465.20 1082.92 463.28 23476.00 2264.13 0.00 0.00 391.96
## 28 Pamekasan 0.00 3091.90 52094.19 2961.37 43.74 811.53 644.54 250.28 2260.99 0.00
## 29 Sumenep 1965.03 11746.06 63974.55 18788.22 1449.88 9445.65 10243.17 6536.89 2886.58 3770.35
## 30 Kota Kediri 359.72 514.83 589.74 1382.34 357.34 48.35 360.12 1053.16 257.72 89.53
## 31 Kota Blitar 27.28 0.00 132.75 1914.09 760.00 0.00 22.50 90.07 62.90 0.00
## 32 Kota Malang 262.10 787.72 852.10 113.45 451.00 515.84 478.32 548.66 578.63 206.73
## 33 Kota Probolinggo 68.61 0.00 564.23 2070.79 1635.11 171.68 0.00 0.00 41.14 127.80
## 34 Kota Pasuruan 118.18 300.78 333.17 291.51 399.23 654.25 283.33 266.36 329.79 495.03
## 35 Kota Mojokerto 0.00 104.28 0.00 383.90 513.84 32.62 185.39 333.78 490.10 413.67
## 36 Kota Madiun 0.00 0.00 1693.11 943.40 0.00 0.00 1761.78 185.72 0.00 47.49
## 37 Kota Surabaya 56.78 567.17 1972.79 500.10 120.30 85.30 332.95 428.29 300.33 58.52
## 38 Kota Batu 396.19 420.31 178.78 207.70 318.42 510.89 362.58 401.72 205.17 252.11
## nov des
## 1 2300.56 218.45
## 2 5958.71 25332.77
## 3 4982.36 3680.76
## 4 4420.27 2558.67
## 5 4012.82 2008.42
## 6 3619.84 2874.46
## 7 9773.96 9386.15
## 8 12485.35 10745.68
## 9 8424.42 10144.76
## 10 16977.50 13865.72
## 11 5599.75 6725.36
## 12 3498.58 3037.76
## 13 2937.44 2385.20
## 14 9814.76 7896.41
## 15 15476.82 1004.71
## 16 2906.61 5494.43
## 17 3747.07 5731.63
## 18 9028.34 6864.81
## 19 34340.69 18477.91
## 20 18018.97 5881.24
## 21 76797.07 30889.11
## 22 17796.48 11592.78
## 23 11361.31 3126.33
## 24 21212.26 3190.63
## 25 4795.04 983.41
## 26 480.49 0.00
## 27 0.00 0.00
## 28 0.00 0.00
## 29 2052.39 283.53
## 30 284.63 590.76
## 31 0.00 0.00
## 32 640.97 783.95
## 33 0.00 0.00
## 34 583.64 156.24
## 35 112.13 336.42
## 36 1633.42 348.81
## 37 117.41 147.16
## 38 111.42 539.55
# in this step, we active two packages that important to import the map and visualize it
library(sf)
library(ggspatial)
library(readxl)
# this step aims to import the map from computer and so latlong data to be integrated
nc <- st_read(dsn = "E:/R/petajatim", layer = "jatim")
## Reading layer `jatim' from data source `E:\R\petajatim' using driver `ESRI Shapefile'
## Simple feature collection with 38 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 110.8984 ymin: -8.780764 xmax: 116.2634 ymax: -5.527337
## CRS: NA
latlonjatim <- read_excel("E:/R/petajatim/latlonjatim.xlsx")
# this code aims to change the column's name of latlong data from "Kako" to "NAMA_KAB"
names(latlonjatim)[names(latlonjatim) == "Kako"] <- "NAMA_KAB"
# change the region's name in nc data to be proper char
nc$NAMA_KAB <- str_to_title(nc$NAMA_KAB)
# join the nc data and latlong data
joinpeta <- left_join(nc, latlonjatim, by = "NAMA_KAB")
# Deleting "Kabupaten"
prodberas$kako <- gsub("Kabupaten ", "", prodberas$kako)
# change the pattern "Kota X" into "X (Kota)"
prodberas$kako <- gsub("Kota (\\w+)", "\\1 (Kota)", prodberas$kako)
# change column's name from "Kako" to "NAMA_KAB"
names(prodberas)[names(prodberas) == "kako"] <- "NAMA_KAB"
# join the joinpeta and prodberas
hasil <- left_join(joinpeta, prodberas, by = "NAMA_KAB")
# selecting the main column for spatial visualization
hasil <- hasil[,-c(3:17)]
# change wide table to be longer using pivot_longer()
library(tidyr)
data_pivot <- hasil %>%
pivot_longer(cols = -c(KODE_KAB, NAMA_KAB, geometry, lat, long),
names_to = "bulan",
values_to = "nilai")
head(data_pivot)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 110.8984 ymin: -8.285555 xmax: 111.4228 ymax: -7.916109
## CRS: NA
## # A tibble: 6 x 7
## KODE_KAB NAMA_KAB lat long geometry bulan nilai
## <int> <chr> <dbl> <dbl> <MULTIPOLYGON> <chr> <dbl>
## 1 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ jan 7.43e1
## 2 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ feb 7.71e3
## 3 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ mar 1.67e4
## 4 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ apr 6.73e3
## 5 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ may 2.92e3
## 6 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ jun 5.09e3
# change monthly format into "2022-01-01" sampai "2022-12-01"
data_pivot$bulan <- gsub("(jan|feb|mar|apr|may|jun|jul|ags|sep|okt|nov|des)",
"2022-\\1-01", data_pivot$bulan)
# change the monthly name into numeric respectively
data_pivot$bulan <- gsub("jan", "01", data_pivot$bulan)
data_pivot$bulan <- gsub("feb", "02", data_pivot$bulan)
data_pivot$bulan <- gsub("mar", "03", data_pivot$bulan)
data_pivot$bulan <- gsub("apr", "04", data_pivot$bulan)
data_pivot$bulan <- gsub("may", "05", data_pivot$bulan)
data_pivot$bulan <- gsub("jun", "06", data_pivot$bulan)
data_pivot$bulan <- gsub("jul", "07", data_pivot$bulan)
data_pivot$bulan <- gsub("ags", "08", data_pivot$bulan)
data_pivot$bulan <- gsub("sep", "09", data_pivot$bulan)
data_pivot$bulan <- gsub("okt", "10", data_pivot$bulan)
data_pivot$bulan <- gsub("nov", "11", data_pivot$bulan)
data_pivot$bulan <- gsub("des", "12", data_pivot$bulan)
data_pivot$bulan <- as.POSIXct(data_pivot$bulan)
data_pivot
## Simple feature collection with 456 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 110.8984 ymin: -8.780764 xmax: 116.2634 ymax: -5.527337
## CRS: NA
## # A tibble: 456 x 7
## KODE_KAB NAMA_KAB lat long geometry bulan nilai
## * <int> <chr> <dbl> <dbl> <MULTIPOLYGON> <dttm> <dbl>
## 1 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-01-01 00:00:00 7.43e1
## 2 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-02-01 00:00:00 7.71e3
## 3 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-03-01 00:00:00 1.67e4
## 4 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-04-01 00:00:00 6.73e3
## 5 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-05-01 00:00:00 2.92e3
## 6 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-06-01 00:00:00 5.09e3
## 7 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-07-01 00:00:00 6.62e3
## 8 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-08-01 00:00:00 1.62e3
## 9 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-09-01 00:00:00 1.23e3
## 10 3501 Pacitan -8.18 111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-10-01 00:00:00 1.03e3
## # ... with 446 more rows
# Last step, the spatial visualization is ready to be served
library(gganimate)
x <- ggplot(data_pivot, aes(x = long, y = lat, fill = nilai)) +
geom_sf() + scale_fill_continuous(low = "#b7ffc2", high = "#00812a") +
ggtitle('Rice Production in Jawa Timur during January to December 2022',
subtitle = 'source: jatim.bps.go.id | visualization: jokoding.com') +
transition_states(states = bulan,
transition_length = 2, state_length = 1)
animate(x, height = 7,
width = 9, units = "in", res = 150)
##
Rendering [===================================================================================] at 1.1 fps ~ eta: 0s
I think enough for this sharing, feel free to command and share this post as your support this site. Welcome to understanding and practicing!