The seismicity data used in this study were taken from the 2017 PuSGeN earthquake catalog with its moment magnitude conversion. PuSGeN is an Indonesian research consortium specializing in geohazards that consists of experts from government institutions, universities, and the private sector. The catalog is a compilation of several catalogs, namely those of the United States Geological Survey (USGS), the International Seismological Centre Global Earthquake Model (ISC-GEM), and Engdahl, van der Hilst, and Buland (EHB), together with data from the Indonesian Meteorology, Climatology, and Geophysical Agency (BMKG). A double-difference relocation technique has also been applied, using regional BMKG networks, to improve the accuracy of the hypocenter locations.

The GPS data used around the Sumatran islands in this study are taken mainly from Bradley et al. (2017), Chlieh et al. (2007), Prawirodirdjo et al. (2010), and Shearer and Burgmann (2010). For the Andaman and Nicobar Islands, the pre-seismic crustal velocity model was built by forward modeling (Okada 1985, 1992) with reference to the co-seismic data of Subarya et al. (2006) and Chlieh et al. (2007), using a plate convergence rate of 14 mm/year (Bilham et al. 2005).

Earthquakes with magnitude *M*_{w} ≥ 4.6 and depth *H* ≤ 50 km that occurred around the island of Sumatra from 1963 to 2016 were selected. We found that the regional *b*-value is ~ 1.0. The earthquake catalog was then declustered using ZMAP software (Wiemer 2001) to obtain the independent earthquake events. The surface strain rates were estimated by least-squares collocation (LSC) from existing GPS data around Sumatra and its surroundings (Bilham et al. 2005; Bradley et al. 2017; Chlieh et al. 2007; Subarya et al. 2006). Seismicity smoothing was applied to the declustered catalog with correlation distances of 25, 50 and 150 km, using events with *M*_{w} ≥ 5.0 and *H* ≤ 50 km. Figure 1 shows (a) the earthquake catalog data and the *b*-value and (b) the GPS data used for surface strain rate estimation.
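The text does not state how the regional *b*-value was estimated. As an illustration only, the sketch below uses the Aki-Utsu maximum-likelihood estimator, which is a common choice but is an assumption here and not necessarily the procedure followed in this study. The function name `b_value_aki`, the completeness magnitude argument `m_c`, and the `bin_width` default are hypothetical.

```python
import numpy as np


def b_value_aki(magnitudes, m_c, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b-value (assumed estimator, not from the paper).

    magnitudes : catalog magnitudes
    m_c        : completeness (cut-off) magnitude
    bin_width  : magnitude binning of the catalog; use 0.0 for unbinned values
    Returns the b-value and Aki's first-order standard error b / sqrt(N).
    """
    mags = np.asarray(magnitudes, dtype=float)
    mags = mags[mags >= m_c]  # keep only events above the cut-off
    if mags.size < 2:
        raise ValueError("need at least two events above m_c")
    # Utsu's correction shifts the cut-off by half a magnitude bin.
    b = np.log10(np.e) / (mags.mean() - (m_c - 0.5 * bin_width))
    sigma = b / np.sqrt(mags.size)
    return b, sigma
```

With a synthetic Gutenberg-Richter sample generated for *b* = 1, the estimator should return a value close to 1.0, which is a quick sanity check before applying it to a real catalog.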

### Seismicity smoothing

In this study, the seismicity smoothing algorithm of Frankel (1995) is applied to the earthquake data to determine the *a*-value with minimum subjective judgment. Seismicity smoothing is needed because a future earthquake will not necessarily occur at the same place as a previous one. Therefore, a factor that accounts for the uncertainty of future earthquake locations was used.

First, the study area is divided into a grid; the number (*n*_{i}) of earthquake events with magnitudes greater than the reference (*M*_{ref}) is then counted in each cell. The count *n*_{i} represents the maximum likelihood estimate of 10^{a} for earthquakes above *M*_{ref} in the cell (Bender 1983). The grid of *n*_{i} values is then smoothed spatially by using a Gaussian function with correlation distance *c*. For each cell *i*, the smoothed value is obtained from:

$$\tilde{n}_{i} = \frac{\sum\nolimits_{j} n_{j} e^{-\Delta_{ij}^{2}/c^{2}}}{\sum\nolimits_{j} e^{-\Delta_{ij}^{2}/c^{2}}},$$

(1)

in which \(\tilde{n}_{i}\) is normalized to preserve the total number of events, \(\Delta_{ij}\) is the distance between the *i*th and *j*th cells, and *c* is the correlation distance. In Eq. (1), the sum is taken over all cells *j* within a distance of 3*c* from cell *i*.
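The smoothing step can be sketched as follows. This is a minimal illustration, assuming a regular flat grid with square cells and Euclidean distances between cell centres; the function name `smooth_counts` and its arguments are hypothetical. Each smoothed value is the Gaussian-weighted average of the counts *n*_{j} of all cells *j* within 3*c* of cell *i*.

```python
import numpy as np


def smooth_counts(counts, cell_size_km, c_km):
    """Gaussian smoothing of a 2-D grid of event counts, in the spirit of Eq. (1).

    counts       : 2-D array, number of events >= M_ref in each cell
    cell_size_km : cell edge length in km (flat regular grid is an assumption)
    c_km         : correlation distance c in km
    """
    counts = np.asarray(counts, dtype=float)
    ny, nx = counts.shape
    # Cell-centre coordinates in km.
    yy, xx = np.meshgrid(np.arange(ny) * cell_size_km,
                         np.arange(nx) * cell_size_km, indexing="ij")
    x, y, n = xx.ravel(), yy.ravel(), counts.ravel()
    smoothed = np.empty_like(n)
    for i in range(n.size):
        d2 = (x - x[i]) ** 2 + (y - y[i]) ** 2   # squared distance to cell i
        near = d2 <= (3.0 * c_km) ** 2           # only cells within 3c
        w = np.exp(-d2[near] / c_km ** 2)        # Gaussian weights
        smoothed[i] = np.sum(n[near] * w) / np.sum(w)
    return smoothed.reshape(counts.shape)
```

The explicit loop keeps the correspondence with the formula obvious; for a large grid, a convolution with a truncated Gaussian kernel would be faster, at the cost of handling the grid edges separately.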

### Occurrence rate function

The theoretical earthquake occurrence rate function for a particular cell, *v*_{i} (≥ *M*_{ref}) is given by

$$v_{i} \left( { \ge M_{\text{ref}} } \right) \approx \frac{{N_{i} }}{T},$$

(2)

in which *N*_{i} is the number of earthquakes with magnitude ≥ *M*_{ref} in cell *i* and *T* is the length of the record. *v*_{i} essentially represents the 10^{a} of earthquakes with magnitudes equal to or greater than *M*_{ref}. The magnitude *M*_{ref} can be chosen on the basis of magnitude completeness. Thus, smoothing the seismicity with the Gaussian function amounts to smoothing the 10^{a} given by Eq. (2). Furthermore, substituting the 10^{a} of Eq. (2) into Eq. (1), the rate for an arbitrary magnitude *m* can be written as:

$$v_{i} \left( \ge m \right) \approx \frac{\tilde{n}_{i} \left( \ge M_{\text{ref}} \right)}{T\, b \ln \left( 10 \right)}\, 10^{-bm} \left( 1 - 10^{b\left( m - M_{\max} \right)} \right),$$

(3)

in which \(\tilde{n}_{i} \left( { \ge M_{\text{ref}} } \right)\) is the smoothed value for cell *i* of the number of earthquakes above the reference magnitude during the time interval *T*, *b* is the uniform *b*-value, and *M*_{max} is the maximum magnitude.
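For readers who want to evaluate Eq. (3) numerically, the sketch below transcribes it as written, reading the denominator as *T*·*b*·ln(10). The function name `occurrence_rate` and its argument names are hypothetical, and the choice to return zero at or above *M*_{max} is an assumption made to keep the rate non-negative.

```python
import math


def occurrence_rate(n_smooth, record_years, b, m, m_max):
    """Rate of events with magnitude >= m in one cell, transcribing Eq. (3).

    n_smooth     : smoothed count of events >= M_ref in the cell
    record_years : catalog length T in years
    b            : uniform b-value
    m            : magnitude threshold
    m_max        : maximum magnitude
    """
    if m >= m_max:
        # Assumption: no events at or above the maximum magnitude.
        return 0.0
    truncation = 1.0 - 10.0 ** (b * (m - m_max))
    return (n_smooth / (record_years * b * math.log(10.0))
            * 10.0 ** (-b * m) * truncation)
```

The rate scales linearly with the smoothed count and decreases as the magnitude threshold rises, so a single smoothed grid can be reused for every magnitude of interest.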

### Least-squares collocation

Least-squares collocation (LSC) is a generalized estimation method that combines adjustment, filtering and prediction (Mikhail and Ackermann 1976). This method is particularly appropriate for determining the terrestrial gravity field from arbitrary data, but it can also be applied to interpolation and transformation problems that arise in geodesy. Following a systematic and fairly comprehensive elementary presentation of the theory and its application to the Japanese Islands (El-Fiky 1998; Oware 1998), LSC was applied to define the seismic moment rate around the Sumatran Subduction Zone. In this study, we mainly adopt the approximation of Ward (1994) and others (Molnar 1979; Savage and Simpson 1997; Field et al. 1999). Applied to the LSC-based surface strain rate model, it gives the scalar moment rate as:

$$\dot{M}_{o} = 2\mu H A \max \left( \left| e_{1} \right|, \left| e_{2} \right| \right),$$

(4)

in which *µ* is the rigidity, *H* is the seismogenic depth, *A* is the unit area, and *e*_{1} and *e*_{2} are the principal strain rates.
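A small numerical sketch of Eq. (4) is given below. It assumes that the surface strain rate in a cell is available as the three components of a 2-D symmetric tensor (`exx`, `eyy`, `exy`), from which the principal strain rates follow as eigenvalues. The function names and the default rigidity, seismogenic depth and cell area are illustrative placeholders, not the values used in this study.

```python
import math


def principal_strain_rates(exx, eyy, exy):
    """Principal values e1 >= e2 of a 2-D symmetric strain rate tensor."""
    mean = 0.5 * (exx + eyy)
    radius = math.hypot(0.5 * (exx - eyy), exy)
    return mean + radius, mean - radius


def scalar_moment_rate(exx, eyy, exy, mu=3.0e10, depth_m=15.0e3, area_m2=1.0e8):
    """Scalar moment rate from Eq. (4), in N m per unit time of the strain rate.

    mu       : rigidity in Pa (placeholder default)
    depth_m  : seismogenic depth H in m (placeholder default)
    area_m2  : cell area A in m^2 (placeholder default, a 10 km x 10 km cell)
    """
    e1, e2 = principal_strain_rates(exx, eyy, exy)
    return 2.0 * mu * depth_m * area_m2 * max(abs(e1), abs(e2))
```

For example, `scalar_moment_rate(1e-7, -5e-8, 0.0)` with strain rates per year uses the larger principal strain rate, 1e-7 per year, and returns a moment rate in N m per year.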

### Hazard calculation: probability of exceedance

The annual exceedance probability of peak horizontal ground acceleration or velocity (PGA or PGV) *u* at a site due to events in a particular cell *k*, under a Poisson distribution, is given by:

$$P\left( u \ge u_{\text{o}} \right) = P_{\text{k}} \left( m \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right) = 1 - e^{ - v_{\text{k}} \left( \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right)},$$

(5)

where *P*_{k} (*m* ≥ *m* (*u*_{o}, *D*_{k})) is the annual exceedance probability of earthquakes in the *k*th cell, *m* (*u*_{o}, *D*_{k}) is the magnitude in the *k*th source cell that would produce a PGA or PGV of *u*_{o} or larger at the site, and *D*_{k} is the distance between the site and the source cell. The rate in the exponent is the occurrence rate of Eq. (3) evaluated for the *k*th cell. In this study, *D*_{k} is calculated from the source-to-site distance and the depth of the top of the locked zone, taken as 3 km (PuSGeN 2017). The function *m* (*u*_{o}, *D*_{k}) is obtained from the Ground Motion Prediction Equation (GMPE), which is discussed in the next section. The probability distribution of PGA or PGV at the site was determined by combining the contributions of the surrounding source cells, as in:

$$P\left( u \ge u_{\text{o}} \right) = 1 - \prod_{\text{k}} \left[ 1 - P_{\text{k}} \left( m \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right) \right].$$

(6)

Substituting Eq. (5) for each source cell, we obtain

$$P\left( u \ge u_{\text{o}} \right) = 1 - \prod_{\text{k}} e^{ - v_{\text{k}} \left( \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right)} = 1 - e^{ - \sum_{\text{k}} v_{\text{k}} \left( \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right)},$$

(7)

which gives the annual exceedance probability of a particular PGA or PGV. For a specific time duration *T* (here an exposure time, not the catalog length of Eq. (2)), the probability of exceedance is given by:

$$P\left( u \ge u_{\text{o}} \right) = 1 - \prod_{\text{k}} e^{ - T v_{\text{k}} \left( \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right)} = 1 - e^{ - \sum_{\text{k}} T v_{\text{k}} \left( \ge m\left( u_{\text{o}}, D_{\text{k}} \right) \right)}.$$

(8)

The annual probability of exceeding specified ground motions is calculated by applying Eq. (7) at each grid cell. For the specified time duration *T*, the probability of exceeding specified ground motions is calculated using Eq. (8).
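The combination over source cells in Eqs. (7) and (8) can be sketched as below. The sketch assumes that the rate of events able to produce *u* ≥ *u*_{o} at the site has already been evaluated for each source cell (that is, the GMPE inversion for *m* (*u*_{o}, *D*_{k}) is done elsewhere). The function names and the `duration_years` argument are hypothetical; a duration of 1 year corresponds to Eq. (7).

```python
import numpy as np


def exceedance_probability(rates_per_year, duration_years=1.0):
    """Summation form: 1 - exp(-T * sum of per-cell rates)."""
    total_rate = float(np.sum(rates_per_year))
    return 1.0 - float(np.exp(-duration_years * total_rate))


def exceedance_probability_product(rates_per_year, duration_years=1.0):
    """Product form: 1 - product over cells of exp(-T * rate)."""
    rates = np.asarray(rates_per_year, dtype=float)
    return 1.0 - float(np.prod(np.exp(-duration_years * rates)))
```

Both functions return the same value up to rounding, which is the identity stated in Eqs. (7) and (8). The summation form is preferable in practice because it avoids multiplying many factors close to 1. Under this Poisson model, a 10% probability of exceedance in 50 years corresponds to a total annual rate of −ln(0.9)/50.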

### Ground Motion Prediction Equation (GMPE)

To construct the seismic hazard map expressed by PGA or PGV, we need an attenuation relationship (Ground Motion Prediction Equation) that gives PGA or PGV as a function of magnitude and distance. Unfortunately, no GMPE has been derived specifically for the Indonesian region. Therefore, in this study, we used GMPEs derived for other regions, or from worldwide data, with similar geological and tectonic conditions and with a focus on megathrust sources, i.e., the GMPEs of Fukushima and Tanaka (1992), Youngs et al. (1997), Zhao et al. (1997), and Atkinson and Boore (2006). Hereafter, these four GMPEs are referred to as FT92, YG97, ZH97 and AT06, respectively.

To select the appropriate GMPE, the SHF of each seismic cluster (presented in Fig. 2) was evaluated with each of the four GMPEs. The SHFs around the cities of Padang and Bengkulu showed a similar pattern. The SHFs based on the four GMPEs around Bengkulu city are shown in Fig. 3. This graph shows that the SHFs based on ZH97 and AT06 are more appropriate and reasonable than those based on FT92 and YG97, for which the PGA saturates in the low probability of exceedance range. This saturation might significantly affect the PGA calculated for longer return periods, e.g., 500 and 2500 years. Therefore, we used ZH97 and AT06 for the PGA calculation.

This selection is also consistent with the databases used to derive the GMPEs. ZH97 was developed from recorded data mainly from crustal, subduction interface and in-slab earthquakes in Japan, with supplementary data from the western United States and the 1978 Tabas, Iran, earthquake, while AT06 was developed from recorded ground motion data from interface and in-slab earthquakes in the subduction zones of Alaska, Chile, Cascadia, Japan, Mexico, Peru and the Solomon Islands.