Counting 3,146 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1


FAST tutorial

The following instructions were only tested on Linux clusters; we do not currently support other operating systems. To efficiently process inputs spanning a long duration, we suggest running the pipeline on a server with multiple processes and sufficient memory.


The pipeline is implemented in Python and C++, with the following dependencies:

c++ dependency: boost
python dependencies: obspy, pywt, scipy, numpy, skimage, sklearn


Copy the zip file to your home diretory, unzip and install the Python dependencies:

~/$ unzip
~/$ cd FAST
~/FAST$ pip install -r requirements.txt

Install the C++ dependencies:

~/FAST$ sudo apt-get install cmake, build-essential, libboost-all-dev 


Raw SAC files for each station are stored under data/waveforms${STATION}. Station "HEC" has 3 components so it should have 3 time series data files; the other stations have only 1 component.


Parameters for each station is under parameters/fingerprint/. To fingerprint all stations and generate the global index, you can call the wrapper script (Python):

~/FAST$ python -c config.json

Another option for the fingerprint wrapper script (bash):

~/FAST$ cd fingerprint/
~/FAST/fingerprint$ ../parameters/fingerprint/

The fingerprinting step takes less than 1 minute per waveform file on a 2.60GHz CPU. The generated fingerprints can be found at data/waveforms${STATION}/fingerprints/${STATION}${CHANNEL}.fp. The json file data/waveforms${STATION}/${STATION}_${CHANNEL}.json contains information about the fingerprint file, including number of fingerprints (nfp) and dimension of each fingerprint (ndim).

Alternatively, to fingerprint a specific stations, call the fingerprint script with the corresponding fingerprint parameter file:

~/FAST$ cd fingerprint/
~/FAST/fingerprint$ python ../parameters/fingerprint/fp_input_CI_CDY_EHZ.json

In addition to generating fingerprints, the wrapper script calls the global index generation script automatically. The global index (as opposed to index with a single component) is a consistent way to refer to fingerprint times at different components and stations. Global index generation should only be performed after you've generated fingerprints for every component and station that is used in the detection:

~/FAST/fingerprint$ python  ../parameters/fingerprint/global_indices.json

The resulting global index mapping for each component is stored at data/global_indices/${STATION}_${CHANNEL}_idx_mapping.txt, where line i in the file represents the global index for fingerprint i-1 in this component.

Similarity Search

Compile and build the code for similarity search:

~/FAST$ cd simsearch
~/FAST/simsearch$ cmake .
~/FAST/simsearch$ make

Call the wrapper script to run similarity search for all stations:

~/FAST/simsearch$ cd ..
~/FAST$ python -c config.json

Another option for the similarity search wrapper script (bash):

~/FAST$ cd simsearch/
~/FAST/simsearch$ ../parameters/simsearch/

Alternatively, to run the similarity search for each station individually:

~/FAST$ cd simsearch
~/FAST/simsearch$ ../parameters/simsearch/ CDY EHZ


The following scripts parse the binary output from similarity search to text files, and combine the three channel results for Station HEC to a single output. Finally, it copies the parsed outputs to directory ../data/input_network/.

~/FAST$ cd postprocessing/
~/FAST/postprocessing$ ../parameters/postprocess/
~/FAST/postprocessing$ ../parameters/postprocess/

Run network detection:

~/FAST/postprocessing$ python ../parameters/postprocess/7sta_2stathresh_network_params.json

Results from the network detection are under data/network_detection/7sta_2stathresh_network_detlist*. The file contains a list of potential detections including information about starting fingerprint index (global index, or time) at each station, number of stations where we found other events similar to this event (nsta), total number of similar fingerprint pairs mapped to the event (tot_ndets), total sum of the similarity values (tot_vol). Detailed format of the output can be found in the user guide.

Optionally, to clean up the results from network detection (need to modify inputs within each script file):

~/FAST$ cd utils/network/
~/FAST/utils/network$ python
~/FAST/utils/network$ ./
~/FAST/utils/network$ python
~/FAST/utils/network$ ./

The results from the above scripts can be found at data/network_detection/7sta_2stathresh_FinalUniqueNetworkDetectionTimes.txt


To plot the waveforms from network detection:

~/FAST$ cd utils/events/ 
~/FAST/utils/events$ python 0 50

The above script plots the first 50 waveforms from the output. The plot file names are sorted in descending order by: num_sta (number of stations that detected this event), peaksum (peak total similarity) You can view the images at data/network_detection/7sta_2stathresh_NetworkWaveformPlots/ Inspect the waveforms in order to set detection thresholds.


You can find more details about the pipeline and guidelines for setting parameters in our extended user guide. You may also check out the following papers: