Landslide susceptibility assessment using frequency ratio model in Bogor, West Java, Indonesia

Landslides are common natural disasters in Bogor, Indonesia, triggered by a combination of factors including slope aspect, soil type and bedrock lithology, land cover and land use, and hydrologic conditions. In the Bogor area, slopes with volcanic lithologies are more susceptible to failure. GIS mapping and analysis using a Frequency Ratio Model was implemented in this study to assess the contribution of conditioning factors to landslides, and to produce a landslide susceptibility map of the study area. A landslide inventory map was prepared from a database of historic landslides events. In addition, thematic maps (soil, rainfall, land cover, and geology map) and Digital Elevation Model (DEM) were prepared to examine landslide conditioning factors. A total of 173 landslides points were mapped in the area and randomly subdivided into a training set (70%) with 116 points and test set with 57 points (30%). The relationship between landslides and conditioning factors was statistically evaluated with FR analysis. The result shows that lithology, soil, and land cover are the most important factors generating landslides. FR values were used to produce the Landslide Susceptibility Index (LSI) and the study area was divided into five zones of relative landslide susceptibility. The result of landslide susceptibility from the mid-region area of Bogor to the southern part was categorized as moderate to high landslide susceptibility zones. The results of the analysis have been validated by calculating the Area Under a Curve (AUC), which shows an accuracy of success rate of 90.10% and an accuracy of prediction rate curve of 87.30%, which indicates a high-quality susceptibility map obtained from the FR model.


Introduction
Landslides are defined as an event or series of events where a mass of rocks, soil, or debris moves down a slope. Landslide mechanisms include sliding, falling, or flowing of material down a slope due to gravitational pull (Das 2011;Motamedi 2013). Landslides are one of the most common natural disasters experienced in Bogor, Jawa Barat, Indonesia. A landslide can be influenced by various factors such as slope conditions and slope angle, lithology, soil type, and hydrologic or meteorological conditions. Another potential factor is induced by human activities such as deforestation, changes caused by construction of structures on the slope, undercutting the toe of the slope for road construction, etc. Human changes to the slope can make the slope become less stable. The negative impact of landslides includes damage to infrastructure (houses, buildings, roads, bridges, irrigation, canals, etc.), geological and environmental damage (fractures, creeping, and slumping), and serious injuries and loss of human life due to the landslide events.
From 2014 to 2018, there has been a significant increase in the number of landslide events in Bogor, with an increase from approximately 20 events in a year to over 60 events in a year. The increase in frequency of landslide events has also led to an increase in casualties, with 39 deaths and 48 injuries recorded since 2014.
Deaths and injuries can result from the collapse of structures impacted by the landslide or from being swept up in the material being moved downslope (Center for Volcanology and Geological Hazard Mitigation Geological Agency 2018).
Identifying areas with higher risk of landslides requires evaluation of the distribution and frequency of historical landslides. Historical landslides can be mapped efficiently with Geographic Information System (GIS) tools and the application of remote sensing (Audisio et al. 2009;Mandal and Mondal 2019;Yalcin 2008;Yalcin et al. 2011;Yilmaz and Keskin 2009). Quantitative spatial analysis can be used for mapping the landslide susceptibility of a region and providing the scientific information relevant for mitigation and prevention of future landslides in that region (Yilmaz 2009). Quantitative methods are used to assess the landslide events as it increases the ability to predict where, when and how frequent the occurrence of landslides in an area will be. It presents a complete and comprehensive assessment of landslide susceptibility, which includes analysis of model performance, prediction skills evaluation, uncertainty and estimation of errors (Guzzetti et al. 1999;Motamedi 2013).
Landslide Susceptibility (LS) is an assessment to quantify the volume or area and the spatial probability of a landslide event, by providing a relative estimation of the spatial events of landslides in a mapping unit based on the conditions of local terrain, and it may also include the information related to the temporal probability of the expected landslide event, the intensity and velocity rates of the existing or potential landslide events (Fell et al. 2008;Guzzetti et al. 1999;Lepore et al. 2011;Rossi and Reichenbach 2016). This method has various approaches. In this study, the statistical model was chosen because it has been largely used to assess LS and widely used by combining and integrating statistical models with the geographical data and open source of GIS applications. Many studies have tried to assess landslide susceptibility by increasing GIS applications using different models. Many of those studies have applied probabilistic models such as the frequency ratio (Audisio et al. 2009;Choi et al. 2012;Ehret et al. 2010;Lee et al. 2004;Lee and Pradhan 2006;Lepore et al. 2011;Mandal and Mondal 2019;Mezughi et al. 2011;Mohammady et al. 2012;Oh et al. 2009;Pal and Chowdhuri 2019;Pradhan and Youssef 2010;Rossi and Reichenbach 2016;Yalcin 2008;Yalcin et al. 2011;Yilmaz 2009;Yilmaz and Keskin 2009).
The basic idea is to use the information in combination with geo-environmental conditioning factors to extract the level of detail offered by the landslide data itself for determining landslide susceptibility in the study area. In this study, a bivariate statistical method called the Frequency Ratio (FR) was applied to derive a landslide susceptibility map for Bogor, West Java, Indonesia. FR was chosen for this research as a basic analysis for a preliminary probabilistic assessment, the mathematical simplicity and data extraction in a limited time period (rapid assessment).

Data and methodology
Bogor is one of the administration areas in the West Java Province, Indonesia. Bogor is divided into Bogor (Kota Bogor) and Bogor Regency (Kabupaten Bogor) with total area approximately 2782 km 2 . Geographically, Bogor is located between 6°19′ North latitude and 6°47′ South latitude, and between 106°01′ − 107°103′ East longitude. The geomorphology of the Bogor region includes relatively low plains in the north to highlands in the south. The area ranges in altitude from 15 m above sea level to over 2500 m above sea level. Bogor is geographically significant because its location is adjacent to Jakarta, the capital city of the Republic of Indonesia. The population density of Bogor Regency in 2016 was 2146 people/km 2 , while the population density of Bogor was 8985 people/km 2 (Statistics of Bogor and Bogor Regency, 2017). The dots on Fig. 1 represent the locations of historic landslides. The color-coded blue was used in the training dataset and red was used in the test set. A red box in the north-central western West Java is where Bogor is located within West Java.
The aim of this study is to assess and evaluate the susceptibility for landslides within the Bogor area in a GIS environment. Existing landslide data recorded from 1981 to 2018 and the conditioning factors (geology, slope, soil type, rainfall intensity, and land use) were used to analyze landslide susceptibility across the region. The relationship between a landslide and the conditioning factors was determined using the frequency ratio analysis. Our hypothesis is that there will be a strong relationship between landslide occurrence and both natural and human-influenced conditioning factors in the study area.
The study began with the compilation of a landslide inventory map based on previously recorded historic landslide events, documentation of field sites, and satellite imagery interpretation. A total of 173 observed historic landslide points were mapped in the area. The landslides in the landslide inventory were randomly divided into a training set area (70%) with 116 points, and a test set area with 57 points (30%) using the Subset Features Tools in ArcGIS. There is no specific limit in determining the distribution of training set and test set, but the greater the percentage of datasets for analysis, the higher the validation value (AUC) that will be obtained. In this study, the data set uses a ratio of 70%: 30% because the 70% data set is considered sufficient to represent analysis and 30% is considered sufficient to validate the model (Wang et al. 2016;Meena et al. 2019;Rossi and Reichenbach 2016).
The data are presented as geo-referenced points because the available landslide documentation uses point representation. The historic landslide inventory was obtained based on field observation through a ground check process (Fig. 2)   collected through visual identification on aerial photographs and satellite imagery using remote sensing by verifying the conformity of data with the main data inventory in order to create a homogeneous database. Landslides are described in terms of their presence (1) and absence (0) in a GIS database. Landslide points are discrete points as a representation of the source position near the scar of the landslide body (Neuhäuser et al. 2012).
The historic landslide events used in this database exhibit a variety of conditioning factors that drove the landslide events. Evaluating the study region for the various conditioning factors is critical in developing a landslide susceptibility map (Lee 2014;Yalcin et al. 2011;Yilmaz 2009). In preparation process, multi-resource spatial data were processed and the landslide inventory created (Table 1). Seven factors were considered that may contribute to slope failure, including slope gradient, slope aspect, elevation (with an interval of 250 ms), lithology, soil type, land cover (and land use), and rainfall intensity over a 10 year period from 2009 to 2018. These factors were determined for each historic landslide (and across the study area), and were collected and processed into a spatial database using GIS (Fig. 3).
In this study, a landslide location map was prepared based on interpretation of the satellite imagery and previous studies conducted in the area. To assess landslide susceptibility for an area, it is necessary to identify and remap both the conditioning factors and landslide locations. A remap table defines how the values will be reclassified in the new spatial database that is supported by integer values. In this classification, quantile was used and the input raster reclassified into unique values.
Because the availability of DEM data is limited, a Digital Surface Model (DSM) was used from TerraSAR-X data with resolution up to 5 m or equivalent to 1:50,000 of scale. DSM represents the elevation associated with topography and all natural or human-made features on the earth surface. DSM covers the West Java Region and was acquired in 2011 from the Geospatial Information  Agency of Indonesia. TerraSAR-X is a German Earthobservation satellite which provides value-added SAR (Synthetic Aperture Radar) data in the X-band, with a range of different modes of operation, allowing it to record images with different swath widths, resolutions and polarisations for research and development purposes. The accuracy of TerraSAR-X is shown in Table 2, evaluated by the Geospatial Information Agency of Indonesia. On Table 2, Data_ID is the code of reference samples, The Root Mean Squared Error (RMSE) is the square root of the average of the set of squared differences between collected coordinates and coordinates from an independent source of higher accuracy for identical locations, and linear error at 90% confidence (LE90) is a vertical (height) accuracy calculated based on the distance value that indicates 90% of the error or the difference in the height of the object on the map with the actual height value is not greater than the distance value (LE90 = 1.6499 × RMSEz (vertical)). The slope gradient and slope aspect are obtained from DEM data extraction using the 3D Analyst tool extension in ArcGIS 10.6. The frequency ratio (FR) is a bivariate statistical method that is simple to implement with accurate results. The FR is widely used in landslide susceptibility mapping (Choi et al. 2012;Ehret et al. 2010;Lee 2014;Lee and Pradhan 2006;Mezughi et al. 2011;Mohammady et al. 2012;Torizin 2011;Yalcin et al. 2011;Yilmaz 2009), and it is highly compatible with GIS technology (Lee 2014;Yilmaz and Keskin 2009;Yalcin et al. 2011). FR was used to calculate the ratio of the cell with landslide occurrence in each class for a reclassified factor or categorical factor (i.e., geology and land cover), and the ratio was assigned to each factor class again. The FR is the ratio of landslides in a desired class as a percentage of all landslides to the area of the class as a percentage of the entire map. So, a FR of 1 is the average value from the ratio of the area where landslides occurred to the total area (Ehret et al. 2010;Mezughi et al. 2011). Finally, the landslide susceptibility by FR was created using the overlay function in GIS, which is used to merge different factors assigned to the ratio. If the probability is high (the value > 1), there is a greater susceptibility for landslides. A lower value indicates a lower degree of landslide susceptibility in the region (the value < 1). The formula is as follows (Ehret et al. 2010): where Mi is the number of pixels with landslides for each subclass conditioning factor, Mi is the total number of landslides in the study area, Ni is the number of pixels in the subclass area of each factor and N is the number of total pixels in the study area (Ehret et al. 2010).
All of the selected conditioning factors are then evaluated with an Area under Curve (AUC). AUC is one type of accuracy in statistics for prediction models (probabilities) in the assessment or analysis of natural disaster events. The AUC value defines conditioning factors that would be used on landslide susceptibility mapping using FR method. The higher the value of AUC (if threshold definition gets the maximum value of 1), the higher the statistic accuracy of the model, which describes the prediction threshold independently (Lepore et al. 2011;Mandal and Mondal 2019;Meena et al. 2019;Mohammady et al. 2012;Pimiento 2010;Rossi and Reichenbach 2016;Yilmaz 2009). AUC of conditioning factors that is considered to be processed in this study must exceed the minimum limit of 0.6, which indicates that performance is higher classification by a chance.
To evaluate the influence of conditioning parameters on landslide activity, a threshold-independent method is used on Receiver Operating Characteristic (ROC) by presenting the results of the accuracy values obtained against a defined threshold value. Receiver Operating Characteristic (ROC) curves and area under ROC curves (AUC) are usually used to assess the binary response model such as a logistics model (Chung and Fabbri 2003;Pimiento 2010;Rossi and Reichenbach 2016;Wahono 2010). This curve depicts the use of validation and crossvalidation data sets to generate the ROC curve and estimate its area. The matrix value of each parameter is used to test the ROC curve on SPSS statistics. The calculated results are presented in percentage of study area classified as susceptible (x-axis) versus the cumulative percent of landslide occurrence (y-axis), with the area under curve (AUC) calculation formula, i.e., (Pimiento 2010): (1) where xi is the percentage of area and yi is the area of the landslide.
AUC is one type of accuracy statistics for prediction models (probabilities) in the assessment or analysis of natural disasters (Pimiento 2010). AUC is a graph of varying index numbers usually between a maximum value of 1 or equal to 100% and 0.5 or equal to 50%. From the AUC, value can be classified that is 0.9 as a very good model classification, 0.8-0.9 as a good model classification, 0.7-0.8 as a medium or reasonable model classification, and < 0.6 poor model classification, so the minimum AUC parameters are recommended with a minimum limit of 0.6. The higher the AUC value of a parameter, the higher the influence of the landslide event.

Lithology
Landslide occurrences and geology were mapped in the field to establish a landslide inventory and a geological map. The geological data of Bogor (Table 3) are provided in polygons at scales of 1:50,000 and were previously published by Geological Agency of Indonesia. Lithology plays an important role in the occurrence of landslides because it affects the strength and permeability properties of the bedrock associated with slope. The infiltration rates, soil drainage, and variability in the materials were considered as the factors which increase the pore pressure and decrease the stability of slope. This is because the lithology unit can have different characteristics such as composition, structure, and cohesion of soil which produce different resistance to the motion (Das 2011;Fell et al. 2008;Guzzetti et al. 1999;Lee 2014;Motamedi 2013;Torizin 2011;Wahono 2010).
From the second edition of the geological map of the Bogor, West Java, quadrangle in 1998, structural features like faults, folds, lineaments and joints are observed within the Oligocene to Quaternary rocks. Generally, the faults include N-S-, SW-NE-, and NW-SE-oriented strike-slip and normal faults. Folds are present as SW-NE, W-E, and NW-SE anticlines and synclines. Joints are commonly found and well developed in andesitic rocks of Quaternary age. Tectonic events at the end of Late Miocene resulted in two different structural patterns. Those tectonic events include a phase of uplift, followed by the intrusion of rhyolite rocks (Center for Volcanology and Geological Hazard Mitigation Geological Agency 2018).
The highest frequency of landslides based on Fig. 4 occurred where the slope consisted of older deposits, including lahar and lava flows, andesitic basalt, with oligoclase-andesine, labradorite, olivine, pyroxene, and hornblende (QVPO type with 22 historic events). The next highest frequency of landslides was hosted in QVK-type [21 events] materials (Breccia and lava of Gunung, Kencana, and Limo Mountains: Blocks of andesitic tuff and andesitic breccia with abundant pyroxene phenocrysts and basaltic lava); and QPV-type [13 events] materials (Endut volcanics: volcanic breccia, lava, and tuff ). The results of this study indicate that landslides in the Bogor District mostly occur on slopes hosted in volcanic or volcanic-derived lithologies.

Slope gradient and slope aspect
Landslides on the slope can occur due to the interaction of several conditions including morphological conditions, geology, geological structure, hydrology, and land use (Chung and Fabbri 2003). Slopes will be vulnerable if there are trigger factors such as intense rainfall events, vibration, or human activities (excavation, loading, and others) so that the fundamental thing is an inventory of historical data on landslide events to determine areas that will be affected by future landslides caused by both natural and artificial factors.
One of the geomorphologic parameters that is relevant for a landslide susceptibility study is the slope gradient because it has an important role in gravitational movements. Slope gradient represents elevation points which are greatly affected by the resolution of the DEM. The slope angle is typically considered to be one of the influential factors for landslide modeling because it controls the shear forces acting on hill slopes (Chung and Fabbri 2003;Lee 2014;Lepore et al. 2011;Mezughi et al. 2011;Mohammady et al. 2012;Pimiento 2010;Torizin 2011). Slope gradient was extracted from the DEM of TerraSAR-X satellite data as the first derivative of slope. The study areas are dominated by flat to moderately sloping slopes.
Soil moisture and soil thickness are related to differential solar insolation as a function of slope aspect. Slope aspect is considered an important factor in landslide studies and plays a fundamental role in slope stability due to variance in temperature and vegetation     (Guzzetti et al. 1999;Lepore et al. 2011;Lee 2014;Mezughi et al. 2011;Mohammady et al. 2012;Pimiento 2010;Yilmaz 2009). Slope aspect is divided into ten classes, i.e., Flat, North, Northeast, East, Southeast, South, Southwest, West, Northwest, and back to North (Fig. 6). The most observed landslide deposits with 18 observations were in south-facing slopes. The next highest observations were Northeast-facing, East-facing, and Southwest-facing slopes with 17, 15, and 14 historic landslide events, respectively (Table 3).

Land cover
Land cover is one of the main parameters controlling slope stability. Variability in vegetation cover influences the susceptibility of a slope to fail. Vegetation increases soil strength and cohesion through its roots which bonds soil sediments. The slope is more susceptible to slope failure if the soil surface lacks vegetation cover, especially during heavy rainfall events which produce surface runoff. The probability of the slope failure events in the future will likely occur the same as past or present events due to the same unstable conditions (Fell et al. 2008;Lepore et al. 2011). Based on a published land cover map of the Ministry of Forestry's Republic of Indonesia (2011), land cover in the study area was classified into 12 classes ( Table 3). The most observed landslide events [45 events] occurred on slopes with Dryland Agriculture land cover.
Other land use categories with higher records of historic slope failures (Fig. 7) include Settlement (25 events), Mixed Dryland with Shrubs (18 events), and Shrubs (8 events).

Soil
Particle size, shape, and pore distribution of the soil matrix influence slope stability and certain soil characteristics may be useful observations for assessing landslide frequency (Das 2011;Fell et al. 2008;Mezughi et al. 2011;Rossi and Reichenbach 2016). The soil properties influence infiltration of water, the velocity and the rate of interflow and baseflow of water movement, and the capacity of the soil to hold water. Soils with smaller (finertextured) particles such as clay and silt have a larger surface area than the coarse-textured soils, and tend to hold large volumes of water, especially under unsaturated conditions (Lepore et al. 2011). In both the bedrock and the soil cover-cohesion, permeability, etc. are important.
Increased pore pressures will weaken both rock and soil. A soil map from the regional planning agency of Bogor was used to classify soil types in the study area soil in the study area (Fig. 8). Soil in the study area was classified into 17 types, which were further lumped into associations based on soil properties (

Distance to lineament
Distance to lineament plays a role in influencing slope stability (Lee 2014). The geological structure weakens rock strength so that rocks are more weathered and eroded. Especially when it rains, water will enter the fracture so that it triggers soil movements. The lineaments are closely associated with the higher elevation highlands in the south as well. Based on maps obtained from the Geological Agency in 2013, the distance to lineament from the landslide location of the study area was classified into 11 classes with 250 m intervals (Table 3). The most landslide occurred first in class range 250-500 m and 500-750 m with total 36 landslide events, and the third is in class range 750-1000 m with total 19 events ( Fig. 9).

Rainfall intensity
Rainfall becomes one of the parameters that affect ground motion because it increases the pore pressure and increases soil moisture conditions on the slope, which then causes the resisting forces keeping the slope stable to decrease. The impact of rainfall intensity is magnified by the slope gradient and land cover (Fell et al. 2008;Lepore et al. 2011;Motamedi 2013). Rainfall data describe the amount of water that falls to the ground surface during a certain period of time and is measured in units of millimeters (mm). Rainfall data from the regional planning agency of Bogor in 2015 were used to classify rainfall amounts in the study area into six classes from a 10-year period ( Table 3). The highest frequency of landslide events (Fig. 10) occurred where the total rainfall amounts were between 3500 and 4000 mm/year (49 historic events) and lesser frequency were in 4000-5000 mm/year (30 events) and 3000-3500 mm/year (18 events).

Discussion
Landslide susceptibility maps plot the potential for landslides to occur in a region as a function of the environmental conditions of the area. These susceptibility maps also assist in classifying the size (volume of material displaced and aerial distribution of the debris lobe) and spatial distribution of landslide events in the region and as such are predictive models that can inform the regional planning for hazard mitigation and relief (Fell et al. 2008;Mandal and Mondal 2019;Mezughi et al. 2011;Oh et al. 2009;Yilmaz and Keskin 2009). The quantile classification method is used in this study so that each data class will have the same amount of data. The landslide susceptibility map classifies the region into four classes: very low susceptibility, low susceptibility, moderate susceptibility, and high susceptibility of landslide events occurring. The next step is the validation of the landslide susceptibility zone through the determination of the success and prediction rates. The success rate is a calculation of the success of a model that shows how well the model matches the prior events (Chung and Fabbri 2003;Wahono 2010). To generate the success rate curve, the calculated index values of all cells in the study area were sorted in descending order and were divided into 256 equal classes ranging from highly susceptible classes to non-susceptible classes. After that, the success rate curve was built by plotting the susceptible classes starting from the highest values to the lowest values on the X-axis and the cumulative percentage of landslides occurrence on the Y-axis.
Meanwhile, prediction rate is the validation of calculations on predictive assessments that show how well the model can predict unknown upcoming events or posterior events (Mezughi et al. 2011;Pimiento 2010;Wahono 2010). The prediction rate curve was prepared using the same data integration and representation procedures of preparing the success rate curve as described above but in this case the validation landslides group was used instead of the estimation landslides group (Mezughi et al. 2011). From the validation process, the AUC value for the success rate is 0.901 and the AUC value for the prediction rate is 0.873. This result show that the model is a good and reasonable model because the AUC value obtained exceeds minimum limit of 0.6 (Fig. 11).
The obtained ratio values using FR were assigned as weight values to the classes of each factor map to produce weighted conditioning factor using the raster calculator tool in ArcGIS to produce the Landslide Susceptibility Index (LSI) map. The landslide susceptibility index (LSI) for each pixel, is the summation of total overlapped pixels using equation (Mezughi et al. 2011): where LSI = Landslide susceptibility index; Wm = Weighted thematic maps of conditioning factors.
From the calculation, the study area was divided into five zones of relative landslide susceptibility, i.e., very low susceptibility, low susceptibility, moderate susceptibility, high and very high susceptibility of landslides (Fig. 12). The high and very high landslide susceptibility regions are where the steeper slope gradients are, which is dominated by dryland agriculture and settlement, latosols type of soil, and are closer to lineaments in which there are higher total rainfall amounts. In addition, Bogor's higher population density (8985 people/km 2 ) and the reactional area of Bogor regency are located mostly near or on slopes hosted in volcanic-derived lithologies (near Mount Salak, Mount Pangrango, and Mount Gede, located at Sukabumi and Cianjur regency). This shows the area is divided into very low to moderate landslide susceptibility in the northern part of this region (low plain landscape type) and moderate to high susceptibility of landslides in the southern part of this region (highland landscape type).
(3) LSI = Wm 1 + Wm 2 + Wm 3 + · · · · · · + Wm n , Conclusion Seven conditioning factors were collected and processed into a spatial database using GIS. The conditioning factors include slope gradient, slope aspect, elevation (with interval 250 m), lithology, soil type, land cover (and land use), and rainfall intensity over a 10-year period from 2009 to 2018, using Frequency Ratio (FR) model. The results have been validated by comparing each factor with the landslide validation set, and have been tested by calculating the prediction rate curve. The result from FR shows that lithology, soil, and land cover are the most important factors affecting landslides. The calculated success rate for FR model is 90.10% and the prediction rate of 87.30% is the accuracy. The curve provides a basis to distinguish different susceptibility levels which were classified into five susceptibility zones. The southern uplands in the Bogor region are more susceptible to landslides because the slopes are steeper and consist of older deposits, the lithology is dominated by volcanic or volcanic-derived lithologies, there is more intense rainfall, and it is dominated by dryland agriculture and settlement.