Processing of volcano infrasound using film sound audio post-production techniques to improve signal detection via array processing

The use of infrasound for the early detection of volcanic events has been shown to be effective over large distances, and unlike visual methods, is not weather dependent. Signals recorded via an infrasound array often have a poor signal to noise ratio, as other sources of infrasound are detected and recorded along with the volcano infrasound. Array processing software does not always detect known volcanic events, in part because of the amount of noise in the infrasound signal (Taisne et al., in: Pichon, Blanc, Hauchecorne (eds) Infrasound monitoring for atmospheric studies: challenges in middle atmosphere dynamics and societal benefits. Springer International Publishing, Cham, 2019). Resampling the infrasound into the audible range and then applying the acoustic noise reduction techniques of spectral subtraction prior to array processing is shown to improve signal detection of volcanic events. The discussed technique is applicable to any infrasound signal such as infrasound from anthropogenic sources like nuclear testing.


Introduction
Ash from volcanic plumes poses a serious threat to air traffic, and the early detection and identification of volcanic events are vital to provide early warning to allow rerouting of aircraft. In addition, the majority of global volcanic risk is in SE Asia, representing 70% of the global volcanic threat (Auker et al. 2015;Brown et al. 2017). Remote detection via satellite and other visual methods remains susceptible to cloud cover and other weather events, as some areas of SE Asian can experience up to 80% cloud coverage 80% of the time (Taisne et al. 2019). Volcanic activity produces infrasound or sound below 20 Hz, which can travel thousands of kilometers with relatively little attenuation (Dabrowa et al. 2011;Fee and Matoza 2013;Matoza et al. 2018;Perttu et al. 2020). It can be used to detect volcanic events at regional to global distances (Dabrowa et al. 2011;Matoza et al. 2013Matoza et al. , 2017 and with the exception of wind turbulence affecting some sensor locations (Dabrowa et al. 2011;Matoza et al. 2018), is not weather dependent (Taisne et al. 2019). Arrays of infrasound sensors have been installed globally to form the International Monitoring System (IMS) by the Comprehensive Nuclear Test-Ban Treaty Organization (CTBTO), as well as on specific volcanoes and smaller regional arrays for research and monitoring. An array is composed of multiple sensors spaced and located in a way that they could be processed together typically using the Progressive Multi-channel Correlation (PMCC) software (Cansi 1995). Such analysis allows for detection of coherent signals that are recorded across the array, and extract several parameters, including the back azimuth (direction from where the signal is coming from), and apparent velocity across the array. Known events observed with other methods such as waveform analysis are not always detected by array processing. This presents a limitation when relying solely on array processing for detection, classification and association for localization of infrasound events. The method presented in this paper can improve event detection via array processing and help to mitigate the limitation associated with relying only infrasound detection of volcanic events. Regional monitoring of volcanoes from a regional to global distance has been put forth as cloud cover-independent remote sensing method specifically for the identification of large volcanic plumes' hazards for the aviation industry (Dabrowa et al. 2011;Taisne et al. 2019). Infrasound is becoming a popular method for monitoring volcanoes in remote areas with limited or no local monitoring network such as in Alaska.
Volcanic infrasound signals are often contaminated with infrasound from non-volcanic sources such as wind noise, the ever-present microbarom and other natural and anthropogenic sources (Caudron et al. 2016;Fee and Matoza 2013;Garcés et al. 2010;Le Pichon et al. 2002). Wind produces pressure fluctuations, caused by air turbulence, that represent a significant source of infrasound noise (Marty 2019). Methods to reduce wind noise are designed and installed as a physical apparatus at the infrasound sensor location, and include pipe arrays, porous hoses, and wind screening devices such as wind fences and fabric domes (Marty 2019). These physical barriers seek to prevent or reduce the amount of noise recorded by the sensor and have been shown to be effective in reducing wind noise while maintaining waveform coherence across the array (Raspet et al. 2019). These are some of the common filters employed to attenuate noncoherent noise present in the infrasound signals prior to recording which can occur across the frequency band of interest. However, there are no noise reduction techniques generally adopted and applied post-recording for the coherent noise sources like the microbarom. The microbarom is generated by nonlinear interaction of the ocean and atmosphere and produces a coherent signal within the frequency band of interest for volcanoes (Taisne et al. 2019), and are therefore not filtered out by the physical noise reduction systems and are a constant source of noise within recorded infrasound data. Different infrasound recording system designs are dominated by different types of noise. While this paper focuses on the use of IMS stations for regional volcano monitoring, the system is designed to detect clandestine nuclear atmospheric testing. For this mission the microbarom which occurs around 0.1-0.4 Hz (Bowman et al. 2005) is within the passband of interest. Infrasound is also used to monitor volcanoes with locally installed infrasound sensors. For these sensors installed on or close to a volcano the dominant noise source is wind (Johnson et al. 2003).
This paper presents a new method for processing the infrasound signal (resampled into the audible domain) using available acoustic noise reduction software, prior to array processing. Using this method, we have improved the signal to noise ratio and shown that previously undetected events could be extracted and detected even from a noisy waveform with a significant coherent noise component.
Infrasound is defined as an acoustic or pressure wave with a frequency below the base of human hearing at around 20 Hz (Johnson 2003). It can be made audible by time compressing the signal (increasing playback speed) so that the spectral content is in the audible spectrum (20 Hz-20 kHz). Once in the audible range the spectral characteristics of the signal can be analyzed and compared through audition via headphones or through loudspeakers in addition to viewing the waveform or spectrogram. It quickly becomes apparent that there is often significant and persistent broadband ambient noise as well as less frequent tonal noise detected by the infrasound array. The broadband noise, noise containing spectral content across a wide range of frequencies, is often predominately microbarom. Microbarom is a common noise source across infrasound stations worldwide (Bowman et al. 2005). The tonal noise, noise containing spectral content in a narrow frequency band (or bands), has multiple potential manmade or natural sources. Our original research question was to determine if volcanic events could be detected via an operator listening to the time-compressed (resampled) infrasound signal. Due to a poor signal to noise ratio, this proved to be impossible in most cases, with the persistent noise masking the relatively short signal. In an attempt to solve this problem, acoustic noise reduction techniques commonly employed in film sound audio post-production were implemented.
Acoustic noise is a well-known and constant problem in film sound production. Location film recording environments are chosen for their visual characteristics and so are often suboptimal for sound recording purposes. This leads to varying amounts and types of acoustic noise being recorded along with the dialog, the desired signal in location film sound recording. Reducing or removing the acoustic noise present in the signal is important, as "clean", or noise free dialog, is needed to allow for maximum speech intelligibility, and for the widest range of sound design possibilities in the creation of a film's soundtrack. The need for acoustic noise reduction is common across many other fields such as telecommunications and it is from this field that many of acoustic noise reduction methods originate.
Hardware-and software-based digital signal processing allows for a range of tools that are effective in reducing acoustic noise, including wide frequency/broadband noise. Software to reduce broadband noise will typically employ noise reduction algorithms that can react dynamically to time variable broadband noise. These have proven to be effective in removing broadband acoustic noise from dialog and other sounds in post-production so it was decided to investigate their use in removing noise from the infrasound data.

Data
For this study, we will focus on the signal produced by a single known volcanic event to highlight the difficulties of relying on one method of signal detection (e.g., array processing), and improvements made to this processing. The selected infrasound signal was produced by an eruption at Sangeang Api Volcano, Indonesia, recorded by the IMS network in the Southeast Asia region (I07AU at ~ 2100 km, I39PW at ~ 2400 km, I06AU at ~ 2500 km, and I04AU at ~ 3000 km away from the volcano). The signal was identified by the International Data Center (IDC) detection system (Table 2) through associating the localizing the signal from the IMS network.
On May 30, 2014 at around 07:55 UTC the volcano Sangeang Api, in the Sunda Islands, Indonesia, erupted and produced a plume that was reported by the Darwin Volcanic Ash Advisory Center (VAAC) to be up to 15.2 km (Global Volcanism Program 2014;Zidikheri et al. 2017). This eruption was recorded clearly on the IMS network producing a clear signal at the station I07AU. Like most remote infrasound detections from large volcanic eruptions, the signal is low frequency and identified by time and azimuth being consistent with the known eruption (Matoza et al. 2017). While a clear detection was observed at both I07AU and I39PW, at least in the array processing, there was no obvious signal at I06AU. However, visually using beamforming it appears that the signal was recorded. There is a similar pattern at I04AU where initially the detection is not clear (see Appendix). In both I04AU and I06AU, during the eruption there was a strong noise signal from the ocean microbarom (and other sources) that was also recorded by the infrasound array. These four stations represent the closest IMS stations to the event, and with only two stations detecting the event, association and localization of the event would not be possible using automated processes. However, the fact that I07AU and I39PW had a strong detection indicates that a large coherent infrasound signal was in fact produced by the eruption. The process has been described using the results from I06AU; however, the analysis was also completed for I04AU and I07AU (Appendix).

Acoustic noise reduction
The audio files from the eight sensors part of Cocos Islands infrasound array, I06AU, were resampled to 44 kHz (time compressed), from their original 20 Hz sampling rate, to place them in the audible range that is optimal for RX audio-editing software (the software will not process infrasound at the original sample rate). RX Advanced Audio-Editing software, created by iZotope Inc, contains multiple modules designed to remove many types of audible acoustic noise. Introduced in 2007, it has become a popular tool for use in music production, audio restoration and film audio post-production. The Spectral De-Noise module is designed to remove broadband and tonal noise that is relatively unchanging (stationary) in the time domain. It is based on an improved version Todd 2006, 2007) of the standard spectral subtraction algorithm (Boll 1979) developed for acoustic noise reduction of speech.
Spectral subtraction begins with the assumption that the acoustic noise is relatively stationary in amplitude and frequency as compared to the signal of interest, and that the noise is not correlated to the signal (Thiemann 2001). The overall signal is the sum of the desired signal and noise, so in simple terms, the noise spectrum is estimated and then subtracted from original (noisy) signal spectrum to give an estimated clean signal spectrum.
The noise spectrum may be estimated from periods when the desired signal is absent, with the estimate changing if the noise spectrum changes. In the Spectral De-Noise module the user has the option to define the section of noisy signal that is used to calculate the estimated noise spectrum (noise profile); in this case, the noise spectrum estimate remains unchanged throughout spectral subtraction processing.
As a general overview, the noisy signal is divided into short time sections known as frames, which are typically no longer than 25 ms (i.e., 1100 samples at 44 kHz, the equivalent of 55 s at 20 Hz, sampling frequency of the IMS infrasound stations) and each frame is windowed (Vaseghi 2008). Discrete Fourier transform is utilized to compute, for each frame, spectral content from which the magnitude and phase spectrum are extracted. The estimated noise spectrum is subtracted from each frame to derive an estimated clean spectrum with suppression gains adaptively calculated for each frame. Spectral subtraction (Boll 1979) is one of the earliest and most widely used speech enhancement (noise reduction) algorithms and a detailed explanation of the process is beyond the scope of this paper.
Each audio file was loaded into iZotope RX Audio Editor (version 6.00.1210) and the files were listened to through high-quality audio monitors while viewing the spectrogram generated by the software. All files exhibited significant ambient noise and a poor signal to noise ratio, making it very difficult to discern an event occurrence through listening to files or observing the spectrogram. This is in contrast to signal from the I07AU station (which had a strong detection) where the event was clearly audible and observable on the spectrogram. Note that the operator (first author) of the software was not aware of the presence or absence of an event in the infrasound sample selected for processing.
Volcanic events occur for a relatively short amount of time (e.g., a 30-min event, recorded at 20 Hz, will last for 0.8 s when resampled to 44 kHz) and have time variable spectral content, so it is assumed that any spectral content that is relatively constant in amplitude and frequency in the time domain would constitute noise. After repeated listening, and viewing the spectrogram, a section from 1.114 to 1.569 s of the resampled waveform was identified as being only noise and a noise profile, consisting of the complete frequency spectrum in that time window, was captured (learnt) from the file IO6H1BDF, for use by the spectral de-noise module of RX6.
The spectral de-noise module was configured for maximum noise reduction with the reduction amount set to − 40 dB. Noise reduction algorithm "C" was chosen as it allows for multi-resolution processing and results in fewer noise reduction artifacts. Multi-resolution changes the size of the fast Fourier transform (FFT) based on real-time signal analysis so an optimal size is used, helping also to reduce transient smearing (Lukin and Todd 2006). Noise floor adjustments affect the signal after the noise reduction process and include synthesis, which synthesizes signal harmonics; masking, which adjusts noise processing based on a psychoacoustic model and reduces the amount of reduction where the noise is inaudible; enhancement, which enhances signal harmonics and whitening, which modifies the amount of noise reduction at different frequencies, affecting the spectrum of any residual noise. These parameters were set to 0 to avoid the possible introduction of new spectral content (Spectral De-noise [STD & ADV] 2017). This configuration was saved as a preset and remained consistent in all processing approaches.
The noise present in the files from each sensor was spectrally very similar, so it was decided to batch process the files from all sensors in the array using the saved preset. All files were processed with a single noise profile extracted from one sensor (IO6H1BDF). The resulting spectrogram is labeled "Batch process NR" in Fig. 1.
For comparison, the files were processed individually with a specific noise profile extracted for each sensor and then used to process the file from that sensor. This would account for any changes in the spectral content of the noise from the individual sensors. For processing, the same time window was used to capture each noise profile (1.114 s to 1.569 s). The resulting spectrogram is labeled "Individual NR" in Fig. 1.
Finally, the files were processed using the adaptive mode of the spectral de-noise module. In this mode, the noise profile is not user-defined, but instead changes as the incoming audio is analyzed by the software (Thiemann 2001). The "learning time" is the time duration the software looks ahead while calculating the changing noise profile. A maximum learning time of 5.0 s, corresponding to 3 h of infrasound data, was used and the module configured the same as with the previous preset. A new preset named Adaptive NR was created and saved, and the files were batch processed using this preset. The resulting spectrogram is labeled "Batch Adaptive NR" in Fig. 1 (see Appendix for the corresponding waveforms).
In summary, infrasound data recorded on May 30th, 2014 at around 10:00 UTC at the IO6H sensor array were processed with the spectral De-Noise module of iZotope RX6 Audio Editor software. The data were processed three separate times, the first with a noise profile captured from the recording of a single sensor and then applied to the recordings of all sensors (batch). Second, recordings from each sensor were processed with a noise profile captured from it (individual), and the third with an adaptive noise profile (batch adaptive). The settings for the De-Noise module remained the same throughout (with the exception of the addition of the "learning time" parameter for the adaptive mode).

Array processing
After the data were processed and the noise reduced it was resampled to the original 20 Hz, and was analyzed using array processing. Array processing for this project was completed using PMCC (Cansi 1995). This is the array processing program that is used by the International Data Center of the CTBTO (Mialle et al. 2019). The configuration of PMCC used for this study is based on the filter sets developed by Garces (2013), known as INFERNO. INFERNO uses nth octave bands based on ISO 3 acoustic filter standards, extended into the infrasound range. Filters were designed using 3rd octaves, and 8 Gabor window spacing. IMS arrays are designed to have an approximate array aperture of 2 km so the filter sets were identical for each IMS station. Data were filtered in 21 bands between 0.04 and 6 Hz (see Table 1) Individual detections (pixels) were grouped (families) with a threshold set to a minimum of 6 pixels. These settings were used before and after the noise reduction processing, and compared to a more standard filter set (Matoza et al. 2017) as well as the defaults within the program. Due to the audio post-processing steps, all amplitude information is no longer absolute and becomes relative.

Results and discussion
The acoustic noise reduction of the upsampled infrasound files from IO6AU showed significant reduction in the predominantly microbarom background noise. Batch processing with a single noise profile across all files, batch processing with the adaptive mode or individual processing with noise profiles specific to each sensor; all showed similar amounts of noise reduction in the spectral domain (see Fig. 1).
In the initial array processing, the coherent microbarom dominates the results in terms of families and associated pixels (Fig. 2). The detection is consistent in time and back azimuth while the eruption is not clearly differentiated, and the time period is characterized by a slight loss in coherence of the microbarom and a wide spread in back azimuths. However, after the noise reduction is performed there is a decrease in the number of families and pixels associated with the microbarom and an increase in the number of pixels and families at the time and expected back azimuth of the signal of interest (Figs. 3 and 4). The largest improvements came from the runs with the designed filters both for each site (individual) and for the whole station (batch); however, there was also a significant improvement using the adaptive filter which could be implemented into an automated processing schema. The improvement in the number of detected pixels is important, especially if the number of pixels parameter is being used as a proxy for signal strength at a station as in Matoza et al. (2018). The signal from I04AU, that also had significant noise component, was processed using the described method and similar results were achieved to those described for I06AU (see Fig. 4 and Appendix). In addition, applying the method to the infrasound signal from I07AU, an inland station less prone to microbarom and other noise, did not yield a significant difference between the RAW and processed data (see Fig. 4 and Appendix). This is expected since I07AU had a strong signal detected via PMCC processing prior to noise reduction. It does suggest that noise reduction of an infrasound signal, that does not have a strong noise component, will not adversely affect signal detection via PMCC processing.

Conclusions
Acoustic noise reduction applied to upsampled infrasound data prior to PMCC processing has improved the detection of a known signal that was dominated by noise associated with microbarom. At the time, compared to I06AU location, the microbarom had an average back azimuth of 220° giving good separation from Sangeang Api back azimuth of 82°. The normal data processing, with no pre-processing, resulted in 152 pixels assigned to five families within two 5° bins of the calculated back azimuth, and the batch noise reduction pre-processing resulted in 1965 pixels in 21 families. Prior to the application of acoustic noise reduction, the PMCC results for the event were not apparent, and post-application the results are more obvious as the histogram of back azimuths becomes more bimodal, one pick corresponding Fig. 2 Array processing results for all versions of the data using dtkPMCC. PMCC results shown are using the INFERNO filters. Each pixel is plotted and colored by calculated back azimuth, the known back azimuth of the source, 82°, is yellow in color and the average back azimuth of the microbarom, 220°, is blue. The frequency axis is plotted on a logarithmic scale. The array processing results are dominated by the microbarom but while the event is not overly obvious using the raw data, it became clear on the three types of pre-processed data. To evaluate the differences between the different results, we grouped the output into 5° azimuth bins and then we consider pixels within the bins on either side of the expected back azimuth. For the raw data processing there are 152 pixels in 5 families, while that number increases to 1369 pixels in 16 families, 1967 pixels in 20 families, and 1051 pixels in 11 families, for the individual noise reduction, batch noise reduction and batch adaptive runs, respectively to the noise, and one to the signal of interest. Detections from multiple arrays are used to triangulate to a common source, and associated to the same event (Matoza et al. 2017). CTBTO automated detection system was unable to identify the arrival of the studied event (Table 2). In our study the array processing results were analyzed manually, based on pixel and family statistics, and the event was clearly identifiable. An automated system including the type of acoustic noise reduction employed in this study would have an enhanced detection capability. This is especially true for stations dominated by coherent microbarom. While physical wind noise reduction systems exist for non-coherent noise, it is important to remember that the microbarom is a coherent noise signal that can dominate a station and make detection of other signals more difficult. While the application of acoustic noise reduction is not scalable in its current form, the application of adaptive mode for acoustic noise reduction (which does not require user input) has the potential to be automated and thus scalable. The effectiveness of acoustic noise reduction in the adaptive mode is similar to that of reduction when the noise profile is manually identified in the current example, and future research will investigate if it remains consistent across multiple events and types of events. However, the fact that this Fig. 3 Histograms of pixels and families per 5° bins from 0 to 360, back azimuth for the volcano is 82° (dashed line). In the left column are the results for number of pixels within families per 5° bin for no noise reduction (top) and a batch noise reduction by station (bottom), and the bins highlighted in red are around the known back azimuth (75°-85°). In the right column are the number of families for both no noise reduction (top) and batch noise reduction (bottom), again the bins around the known back azimuth are highlighted in red method did not have a negative impact on the clear signal at I07AU indicates that this process could be applied as an automated processing step without sacrificing clear signals.
While a commercially available noise reduction software, designed for processing sound in the audible range, was used for this paper, acoustic noise reduction algorithms, specifically spectral subtraction, can be coded and utilized in software such as Matlab. Future research will investigate optimizing acoustic noise reduction and building an automated system for applying acoustic noise reduction to recorded infrasound in an operational setting. The use of the cleaned signal compared to the raw Fig. 4 Map, I07AU, I06AU and I04AU raw on top. Batch station level noise reduction is shown below (performed for all stations), both normalized by station (I07AU has more pixels in total than I06AU). After noise reduction, the back azimuth (shown in red) is now clearly identifiable within the rose diagram and allows the signals to be associated  6 Array processing results for I04AU. In this example a noise profile from a single sensor at the station was used for noise reduction processing of all channels (Batch process NR). Adaptive noise reduction, which does not require a preselected noise profile, was also used on all channels (Batch adaptive NR). The noise reduction software was configured with the same settings as was used for I06AU. Prior to processing, the signal is present only at frequencies lower than the microbarom whereas in the noise-reduced data the broader band characteristics of the signal are visible Fig. 7 Histograms of pixels and families per 5° bins from 0 to 360, for the station I04AU. The back azimuth for the volcano is 6° (dashed line). In the left column are the results for number of pixels within families per 5° bin for no noise reduction (top) and a batch noise reduction by station (bottom), the bins highlighted in red are around the known back azimuth (0°-10°). In the right column are the number of families for both no noise reduction (top) and batch noise reduction (bottom), again the bins around the known back azimuth are highlighted in red