cross_spectrum_multiコマンドマニュアル

(The documentation of cross_spectrum_multi command)

Last Update: 2023/10/23

◆機能・用途(Purpose)

多数の観測点間および長期間のクロススペクトルをまとめて計算する。
Compute the cross spectra among many stations and for a long time period all at once.

◆概要(Overview)

地震波干渉法は地震波を用いて地下構造を推定する代表的な手法の一つであり、観測点間の波形の相互相関関数がグリーン関数を表すという性質を利用する。観測点ペアの数だけ相互相関関数を計算する必要があるので、観測点数を$N$とすると計算時間や出力ファイルのサイズは$N^2$に比例し、観測点数の増加とともにこれらは著しく増大する。
Seismic interferometry is widely used to investigate the subsurface structure from seismic waves. It computes the Green's function between two stations using the cross correlation function (CCF) of the waveforms at the stations. Because the CCF must be computed for each station pair, the computation time and output file size are proportional to $N^2$, where $N$ is the number of stations. Therefore, the computational costs increase drastically with increasing number of stations.

有意なシグナル/ノイズ比を持つグリーン関数を得るには様々な時間帯の波形から得られた相互相関関数を足し合わせる必要がある。これによってグリーン関数に含まれるノイズはキャンセルされ、シグナルは強められる。地震波干渉法は自然地震を用いる方法と雑微動を用いる方法に大別されるが、自然地震を用いる場合には多数の地震から求めた相互相関関数を足し合わせ、雑微動を用いる場合には様々な日時の波形データから求めた相互相関関数を足し合わせることになる。特に雑微動を用いる場合、信号が弱いので膨大な期間の相互相関関数の足し合わせが必要になる。典型的には1年分、あるいはそれ以上の期間のデータの解析が必要である。その結果、計算量は膨大になる。
Stacking of the CCFs from the waveforms in various time periods is needed to obtain the Green's functions with significant signal-to-noise ratios. The stacking cancels the noise in the Green's function while emphasizes the signal. There are mainly two classes of the seismic interferometry; one uses natural earthquakes and the other does ambient noise. The stacking uses the CCFs from various earthquakes in case of the natural earthquake based interferometry, and from various time periods in case of the ambient noise based interferometry. Especially in case of the noise interferometry, the CCFs from long time periods, typically 1 year or more, need to be stacked. Therefore the analysis is computationally expensive.

このプログラムは主に雑微動を用いる地震波干渉法を念頭に置き、計算量と出力ファイルサイズの削減を主目標に開発したものである。以下の点が特徴である。
This program was aimed at reducing the quantity of computation and the output file size, especially for the noise based interferometry. The algorithm is characterized as below.

時間領域での相互相関関数ではなく周波数領域でのクロススペクトルを求める。これにより、計算時間が$N^2$に比例する処理はスペクトルの掛け算のみに限定され、処理に要する時間が短くなる。
The program computes the cross spectra in the frequency domain instead of the CCFs in the time domain. This reduces the computation time because the spectral multipication is the only processing proportional to $N^2$.
観測点ペア毎ではなく全観測点の波形を一斉に読み込んで処理する。これは計算時間を節約する上で重要である。クロススペクトルを求めるにはスペクトルの掛け算の前に個々の波形のフーリエ変換を行わなければならない。フーリエ変換は本来は波形の数だけ(計$N$回)行えば良いが、観測点ペア毎に独立にプログラムを走らせる方式だとフーリエ変換も$N^2$回行わざるを得なくなる。これを避けるため、1回の計算の中で全ての観測点の波形をまとめて読み込む方式にしている。
The waveforms from all stations are used at once instead of pair-by-pair operations. This is essential to reduce the computation time. The computation of cross spectra needs Fourier transforms of individual waveforms before the spectral multipication. In principle, only one Fourier transform is needed for each station ($N$ times in total). In practice, the number of Fourier transforms would be $N^2$ if each program run treated each pair independently. To avoid this duplication, the program reads all waveforms in a single run.
クロススペクトルはスタッキング後のもののみを出力する。これにより、出力ファイルサイズを抑制する。
Only the stacked cross spectra are included in the output to save the file size.

このプログラムでは Seismic Analysis Code (SAC)形式の波形データファイルを入力とする。 1時間毎あるいは1日毎など、一定の時間間隔で区切られた複数のSACファイルに分割されていても良い。波形の前処理を行う機能は無いので、前処理はこのプログラムの前に行うこと。また、全てのファイルでサンプリングレートとデータ長を揃えておくこと。ファイル名やディレクトリパスにはデータの先頭日時と観測点・成分コードを任意の命名規則に基づいて含めることができる。使用する観測点・成分コードを列挙したテキストファイルを別途用意して与える。
The inputs of this program are the waveform data files in Seismic Analysis Code (SAC) format. The SAC files can be divided at a constant time interval, for example every 1 hour or every 1 day. The program does not support preprocessings, which thus need to have been completed before executing this program. The sampling rates and data lengths of all files must be same. The beginning date and time and the station and component codes can be included in the file names and directory paths with an arbitrary naming rule. Prepare a separate text file that lists the station and component codes to be used.

このプログラムの出力は全ての観測点・成分組合せに対するクロススペクトルであり、全期間についてスタッキングしたスペクトルのみを出力する。 ymaeda_opentoolsのスペクトルファイル形式 (独自のファイル形式参照) で出力される。スタッキングでは単純にスペクトルの足し算のみを行い、スタック数による割り算は行わない。これは後から新しいデータを追加して解析するときの都合を考慮したものである。観測点・成分組合せごとのスタック数の情報が別途、テキストファイルに出力される。
The outputs of this program are the cross spectra for all combinations of stations and components. Only the spectra stacked for all time periods are included in the output. One of the file formats of ymaeda_opentools for spectra is used; see special file formats. The stacking operation includes the addition of the individual spectra but excludes the division by the number of stacks for convenience of future addition of new data. A separate text file is created that records the number of stacks for each combination of station and component.

◆ソースコード(Source code)

$YMAEDA_OPENTOOL_DIR/structural_survey/src/cross_spectrum_multi.c

◆使用方法(Usage)

コマンドライン引数でパラメータを指定する。パラメータの一覧を下表に示す。
Specify parameters by command-line arguments. The table below shows a list of parameters.

●「-」から始まらない引数 (Arguments not beginning with “-”)

このコマンドでは「-」から始まらない引数は存在しない。
This command does not have arguments not beginning with “-”.

●1つの「-」から始まる引数 (Arguments beginning with a single “-”)

このコマンドでは1つの「-」から始まる引数は存在しない。
This command does not have arguments beginning with a single “-”.

●「--パラメータ名=パラメータ値」の形式の引数 (Arguments of a form “--Parameter name=Parameter Value”)

「--パラメータ名=パラメータ値」の形式の引数は自由な順番で指定できる。「-」から始まらない引数の間に挿入しても良い。相反する指定がなされた場合には後の指定が優先される。デフォルト値を持つパラメータは省略できる。
Arguments of a form “--Parameter name=Parameter Value” can be placed in an arbitrary order. They can even be inserted between arguments not beginning with “-”. In case of conflicting options being specified, the latter option has a higher priority. Parameters that have default values can be omitted.

パラメータ名 Parameter name	意味 Meaning	可能なパラメータ値 Allowed parameter values	デフォルト値 Default value
inputfiles	入力として用いる波形データファイル名。ファイルはSAC形式の時系列データファイルとする。通常、複数のファイルであるが、それらを列挙するのではなく、日時や観測点・成分コードをパターンとして含む形で命名規則で与える。 The name of the waveform data files used as the inputs. Each file must be a time series data file of SAC format. Although multiple files are usually used, do not list them but instead give the naming rule of the file names that includes the patterns corresponding to the date and time, and the station and component codes.	ファイル名を表す文字列。ディレクトリパスを含んでいても良い。以下のパターン文字列を含めることができる。 A string that represents a file name, possibly including the directory path. The patterns listed below can be used. %YYYY 年(西暦、4桁) Year (A.D., 4-digits) %YY 年(西暦、下2桁) Year (A.D., lowest 2-digits) %MM 月(2桁) Month (2-digits) %DD 日(2桁) Day (2-digits) %hh 時(2桁) Hour (2-digits) %mm 分(2桁) Minute (2-digits) %ss 秒(2桁) Second (2-digits) %STATION 観測点コード Station code %COMPONENT 成分コード Component code	省略不可 Cannot be omitted
trace_list_file	解析に使用する観測点・成分のリストファイル名。第1列に観測点コード、第2列に成分コードを書き並べる。列の区切りにはタブを用いる。 The name of a file that lists the stations and components to use in the analysis. Write the station codes in the 1st column and component codes in the 2nd column. Use tabs to separate the columns. オプションで第3列以降に追加の情報を含めることができる。例えば観測点座標を書いておくと別の解析に同じファイルを用いることができるので便利である。なおcross_spectrum_multiコマンドでは第3列以降は無視される。 Optionally, additional information can be written in 3rd and later columns. This is convenient, for example, to write the station coordinates that may be used in other analyses. The 3rd and later columns are ignored in cross_spectrum_multi command. 空行、各行の#から後の部分はコメントとして読み飛ばされるので自由に挿入できる。 Empty lines and parts after # in each line are ignored as comments.	ファイル名を表す文字列。ディレクトリパスを含んでいても良い。 A string that represents a file name, possibly including the directory path.	省略不可 Cannot be omitted
start	解析期間の先頭日時。 The beginning date and time of the analysis period.	「YYYY-MM-DD.hh-mm-ss」の形式の文字列。ここで YYYY: 年(西暦4桁) MM: 月(2桁) DD: 日(2桁) hh: 時(2桁) mm: 分(2桁) ss: 秒(2桁) である。 A string of “YYYY-MM-DD.hh-mm-ss” format, where YYYY is the year (A.D., 4-digits), MM is the month (2-digits), DD is the day (2-digits), hh is the hour (2-digits), mm is the minute (2-digits), and ss is the second (2-digits).	省略不可 Cannot be omitted
end	解析期間の末尾日時。 The end date and time of the analysis period.	「YYYY-MM-DD.hh-mm-ss」の形式の文字列。ここで YYYY: 年(西暦4桁) MM: 月(2桁) DD: 日(2桁) hh: 時(2桁) mm: 分(2桁) ss: 秒(2桁) である。 A string of “YYYY-MM-DD.hh-mm-ss” format, where YYYY is the year (A.D., 4-digits), MM is the month (2-digits), DD is the day (2-digits), hh is the hour (2-digits), mm is the minute (2-digits), and ss is the second (2-digits).	省略不可 Cannot be omitted
file_interval	入力ファイルの時間区切りの長さ(s)。例えば1時間毎に分割されたファイルを用いる場合は3600となる。 The time length (s) of individual input files. For example, the value is 3600 if the files are divided every 1 hour.	正の整数。 A positive integer.	パラメータinputfileに登場するパターン文字列の最小時間単位。例えばパラメータinputfileに%hhまで登場するなら1時間刻み(3600)、 %mmまで登場するなら1分刻み(60)となる。 The minimum time unit of the pattern strings that appear in parameter inputfile. For example, if the parameter inputfile includes down to %hh then the interval is 1 hour (3600); if %mm is included then 1 min (60).
outputdir	計算したクロススペクトルの出力先ディレクトリ名。 The name of the output directory to output the computed cross spectra.	ディレクトリ名を表す文字列。ディレクトリパスを含んでいても良い。 A string that represents a directory name name, possibly including the directory path.	省略不可 Cannot be omitted
extension	出力ファイルの書式(拡張子)。 The extension for the output file format.	ymaeda_opentoolsのスペクトルファイル形式のいずれかに対応する拡張子 (独自のファイル形式参照)。 One of the extensions for the spectral file formats of ymaeda_opentools; see special file formats.	bimseq
prevdir	以前の解析結果が保存されているディレクトリ名。既に計算結果が存在し、新しい期間のデータを追加して解析する場合にこのオプションを使用する。 The name of the directory where previous analysis results are stored. Use this option to add new period of data to existing results.	ディレクトリ名を表す文字列。ディレクトリパスを含んでいても良い。 A string that represents a directory name name, possibly including the directory path.	省略時は新規の計算となる。 The computation is regarded to be new when this parameter is omitted.
normalize	クロススペクトルを規格化するか否かの選択。 Choice of whether to normalize each cross spectrum or not.	yes 規格化する。この場合、下記(\ref{eq.C.normalize})(\ref{eq.C_sum})式に基づいて計算した$C_{sum}(\omega)$が出力される。 Normalize. In this case, the outputs are $C_{sum}(\omega)$ computed from Eqs. (\ref{eq.C.normalize}) and (\ref{eq.C_sum}) below. no 規格化しない。この場合、下記(\ref{eq.C.no_normalize})(\ref{eq.C_sum})式に基づいて計算した$C_{sum}(\omega)$が出力される。 Do not normalize. In this case, the outputs are $C_{sum}(\omega)$ computed from Eqs. (\ref{eq.C.no_normalize}) and (\ref{eq.C_sum}) below.	yes
correct_size	1つの入力ファイル(波形データ)の正しい時刻サンプル数。読み込んだ波形の時刻サンプル数がこの値と一致しない場合、入力データを誤ったデータとして処理する。 The correct number of time samples in each input file (waveform data). If the number of time samples of a waveform read is not equal to this value, the program treats the waveform as an erroneous data.	正の整数。 A positive integer.	最初に読み込んだ波形データの時刻サンプル数。 The number of time samples of the first waveform data read.
correct_t0	1つの入力ファイル(波形データ)の正しい先頭時刻(s)。読み込んだ波形の先頭時刻がこの値と一致しない場合、入力データを誤ったデータとして処理する。 The correct beginning time (s) in each input file (waveform data). If the beginning time of a waveform read is not equal to this value, the program treats the waveform as an erroneous data.	実数。 A real number.	最初に読み込んだ波形データの先頭時刻。 The beginning time of the first waveform data read.
correct_dt	1つの入力ファイル(波形データ)の正しいサンプリング間隔(s)。読み込んだ波形のサンプリング間隔がこの値と一致しない場合、入力データを誤ったデータとして処理する。 The correct sampling interval (s) in each input file (waveform data). If the sampling interval of a waveform read is not equal to this value, the program treats the waveform as an erroneous data.	正の実数。 A positive real number.	最初に読み込んだ波形データのサンプリング間隔。 The sampling interval of the first waveform data read.
irregular_data	誤った入力データに遭遇した場合の処理。使用する全ての波形データの時刻サンプル数、先頭時刻、サンプリング間隔が等しくなければならないが、等しくないデータ(欠損を含むデータ等)に遭遇した場合の処理をこのパラメータで指定する。 Treatment when the program encountered an erroneous input data. This program assumes that the number of time samples, beginning time, and sampling interval are equal among all input waveforms; if this requirement is not satisfied, the program behaves based on this parameter.	error プログラムをエラー終了する。 Finish the program as an error. skip その波形を処理から外して続行する。 Continue the program run skipping the erroneous waveform.	error
df_Ndecimate	正の整数であり、周波数の何サンプルに1つを解析に用いるかを指定する。このパラメータの値を$N_d$、波形のフーリエスペクトルの周波数刻みを$\Delta f$として、 $N_d\Delta f$の周波数間隔でクロススペクトルを計算・出力する。 $N_d$を小さくすれば細かい数波数刻みでクロススペクトルが得られる。 $N_d$を大きくすれば計算に必要なメモリと処理時間を節約できる。 A positive integer $N_d$ defined such that the cross spectra are computed for every $N_d$ frequency samples; the frequency interval of the output cross spectra is $N_d\Delta f$, where $\Delta f$ is the frequency interval of the original Fourier spectra of the input waveforms. Using a smaller $N_d$ value results in a finer cross spectra, while a larger $N_d$ results in saving the memory and processing time requirements.	自然数。 A natural number.	1

◆動作(Behaviour)

パラメータtrace_list_fileで指定したファイル内に列挙されている全ての観測点・成分のペアについてクロススペクトルを計算し、それらを日付・時刻に関してスタッキングした上でパラメータoutputdirで指定したディレクトリ内に出力する。
Compute the cross spectra for all pairs of stations and components that are listed in the file specified by parameter trace_list_file, stack them along the date and time, and output the results into the directory specified by parameter outputdir.

$k$番目($k=1,2,\cdots,N$)の日付・時刻における 1つ目の観測点・成分の波形を$f_k(t)$、そのフーリエスペクトルを$F_k(\omega)$、 2つ目の観測点・成分の波形を$g_k(t)$、そのフーリエスペクトルを$G_k(\omega)$ とする。この日付・時刻におけるクロススペクトルは --normalize=yesの場合には \[\begin{equation} C_k(\omega)\equiv \frac{F_k(\omega)^{∗}G_k(\omega)} {\sqrt{\left[\int_{-\infty}^{\infty} f_k(t)^2 dt \right] \left[\int_{-\infty}^{\infty} g_k(t)^2 dt \right]}} =\frac{F_k(\omega)^{∗}} {\sqrt{\int_{-\infty}^{\infty} f_k(t)^2 dt}} \frac{G_k(\omega)} {\sqrt{\int_{-\infty}^{\infty} g_k(t)^2 dt}} \label{eq.C.normalize} \end{equation}\] で定義され、これは相互相関関数 \[\begin{equation} c(\tau)\equiv \frac{\int_{-\infty}^{\infty}f_k(t)g_k(t+\tau)dt} {\sqrt{\left[\int_{-\infty}^{\infty} f_k(t)^2 dt \right] \left[\int_{-\infty}^{\infty} g_k(t)^2 dt \right]}} \label{eq.c.normalize} \end{equation}\] のフーリエ変換である。一方、--normalize=noの場合には \[\begin{equation} C_k(\omega)\equiv F_k(\omega)^{∗}G_k(\omega) \label{eq.C.no_normalize} \end{equation}\] で定義され、これは規格化しない相互相関関数 \[\begin{equation} c(\tau)\equiv \int_{-\infty}^{\infty}f_k(t)g_k(t+\tau)dt \label{eq.c.no_normalize} \end{equation}\] のフーリエ変換である。プログラムでは各$k$に対する$C_k(\omega)$は出力せず、それらを$k$について足し合わせた \[\begin{equation} C_{sum}(\omega)=\sum_{k=1}^{N} C_k(\omega) \label{eq.C_sum} \end{equation}\] を出力する。平均値$C_{sum}(\omega)/N$ではない点に留意。 (\ref{eq.C_sum})式はパラメータstartで指定した日時からendで指定した日時までの期間における、存在する全てのデータに基づくクロススペクトルの和である。但しパラメータprevdirを指定した場合には、そのディレクトリに保存されているクロススペクトルを更に足したものになる。
Let $f_k(t)$ and $g_k(t)$ be the time series data of the 1st and 2nd traces, respectively, for $k$th date and time, where $k=1,2,\cdots,N$. Le $F_k(\omega)$ and $G_k(\omega)$ be their Fourier spectra. The cross spectrum for this date and time is defined by Eq. (\ref{eq.C.normalize}) in case of --normalize=yes, which is the Fourier transformation of the CCF defined by Eq. (\ref{eq.c.normalize}). In case of --normalize=no, $C_k(\omega)$ is defined by Eq. (\ref{eq.C.no_normalize}), which is the Fourier transformation of a non-normalized CCF defined by Eq. (\ref{eq.c.no_normalize}). They program does not output $C_k(\omega)$ for individual $k$ but outputs only their stacks (Eq. \ref{eq.C_sum}) over all $k$. Note that the output quantity is not the average $C_{sum}(\omega)/N$ but the summation $C_{sum}(\omega)$. The summation is taken over all existing data in the time period from the date and time specified by parameter start to those specified by end. If parameter prevdir is specified, this summed cross spectrum is further summed with the previous result stored in that directory.

出力ファイル名は

観測点コード1.成分コード1_観測点コード2.成分コード2.拡張子

となる。観測点・成分コードはパラメータtrace_list_fileで指定したファイル内の表記に基づき、同ファイル内で先に登場する観測点・成分が「観測点コード1」「成分コード1」となる。拡張子はパラメータextensionの値に基づく。例えばSTN1.U_STN2.U.bimseqなどとなる。
The output file names are as follows.

Station code 1.component code 1_station code 2.component code 2.extension

The station and component codes are based on those in the file specified by parameter trace_list_file, and the stations and components that appear earlier in this file are used as Station code 1 and component code 1. The extension is based on parameter extension. For example, STN1.U_STN2.U.bimseq.

これとは別にそれぞれのペアに対するスタッキング数$N$が Nstack.datというファイル名で出力される。このファイルの第1列は「観測点コード1」、第2列は「成分コード1」、第3列は「観測点コード2」、第4列は「成分コード2」、第5列はスタッキング数である。列の区切りにはタブが用いられる。
In addition, the number of stacks $N$ for each pair is recorded in a file Nstack.dat. The 1st to 5th columns of this file are Station code 1, component code 1, Station code 2, component code 2, and the number of stacks, respectively. Tabs are used to separate the columns.

◆使用例(Example)

cross_spectrum_multi --inputfiles=structural_survey/data/%YYYY/%YY%MM%DD%hh/%STATION.%COMPONENT.sac --trace_list_file=structural_survey/etc/traces.dat --start=2021-01-01.00-00-00 --end=2021-12-31.23-59-59 --outputdir=structural_survey/result2021

cross_spectrum_multi --inputfiles=structural_survey/data/%YYYY/%YY%MM%DD%hh/%STATION.%COMPONENT.sac --trace_list_file=structural_survey/etc/traces.dat --start=2022-01-01.00-00-00 --end=2022-12-31.23-59-59 --outputdir=structural_survey/result2021-2022 --prevdir=structural_survey/result2021

この例では1時間毎の連続波形データが

structural_survey/data/YYYY/YYMMDDhh/観測点コード.成分コード.sac

の名前で存在するものとする。 1つ目の例では2021年の1年分のクロススペクトルを計算し、結果をstructural_survey/result2021ディレクトリに出力する。 2つ目の例では2021年の結果に 2022年の1年分のクロススペクトルを追加(スタッキング)して、計2年分の結果をstructural_survey/result2021-2022ディレクトリに出力する。
This example assumes that the continuous waveform data for every 1 hour are stored in files with the following names.

structural_survey/data/YYYY/YYMMDDhh/station code.component code.sac

The first example computes the annual cross spectra from 2021 and outputs the results into structural_survey/result2021 directory. The second example adds (stacks) the cross spectra from 2022 to the result from 2021 and outputs the two-year results into structural_survey/result2021-2022 directory.

また、この例に登場するstructural_survey/etc/traces.datは例えば次のようなテキストファイルである。ここで[TAB]はタブを表す。
The file structural_survey/etc/traces.dat in this example is a text file. Below shows an example of this file, where [TAB] indicates a tab.

STN1[TAB]U
STN2[TAB]U
STN3[TAB]wU
STN4[TAB]Z
STN5[TAB]BHZ

cross_spectrum_multiコマンド マニュアル