Introduction
This is a short tutorial designed to show the basics of SOD using a raw XML strategy file. For an easier introduction that uses a GUI editor to hide the complexities of XML, see the Strategy via Editor document. If you've never seen XML before, check out our crash course. This tutorial does not exploit all the features of SOD, but is rather simple. Once the basics are understood, it is not difficult to add a little here and there and come up with a sophisticated processing system.
Both this tutorial and SOD in general should be considered a work in progress. Please check the SOD web site, http://www.seis.sc.edu/SOD, for updates and new information. Also, please send comments, criticism, compliments, and changes to SOD at seis.sc.edu.
On to the tutorial! First, you will need to get and install SOD. Please see the installation instructions for details.
The Strategy File
SOD gets all of the information about what data to retrieve from an XML strategy file. So in order to use SOD, you need to create or modify one. For this example, we will get all data from long period channels from all magnitude 6 or larger earthquakes in January of 2003. We will use the IRIS-Ida network, code II, and pick some stations that have a distance of 30 to 60 degrees from the event. We will save the data in SAC format in directories by event.
The main structure of our XML strategy file will look like this. We will fill in each section with the details later. There should be a completed copy of this called tutorial.xml within the examples directory of the SOD distribution.
<?xml version="1.0" encoding="UTF-8"?>
<sod>
  <properties></properties>
  <eventArm></eventArm>
  <networkArm></networkArm>
  <waveformArm></waveformArm>
</sod>
There are several very important items here, so you should make sure that these are correct. The <?xml> line is not really part of the SOD strategy proper, but it does signify that this is an XML file and which version of the XML specification is used. The <sod> tag surrounds the SOD strategy items and should be the first tag.
The next section is the properties section. This is actually optional, but useful items can be set here. Probably the most important are the <removeDatabase> and <runName> properties. removeDatabase tells SOD whether to discard the database created by a previous SOD run. If it is false, SOD keeps that database and will only get and process event channel pairs that weren't already done by the previous run. If it is true, SOD removes the database first. It can be set like this:
<removeDatabase>TRUE</removeDatabase>
This would tell SOD to remove the database and start like a fresh run.
runName controls the name used by all of the status pages that SOD generates. Setting this can be useful to tell several runs apart.
<runName>Tutorial Run</runName>
Adding this tells SOD that this run is called "Tutorial Run".
Both of these items should be added to the file inside the properties tags.
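Put together, the properties section might look like the sketch below. It only combines the two elements already shown; the indentation and the comments are formatting choices, not something SOD requires.

```xml
<properties>
  <!-- remove the database from any previous run and start fresh -->
  <removeDatabase>TRUE</removeDatabase>
  <!-- name used on the status pages that SOD generates -->
  <runName>Tutorial Run</runName>
</properties>
```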
Earthquakes!
Now to the fun part: finding earthquakes. We need to get a selection of earthquakes from the server. This is specified within the <eventArm> section. We will take earthquakes from anywhere in the world, but we want the magnitude to be at least 6. Here is the eventFinder section to accomplish this.
<eventFinder>
  <name>IRIS_EventDC</name>
  <dns>edu/iris/dmc</dns>
  <boxArea>
    <latitudeRange>
      <unit>DEGREE</unit>
      <min>-90</min>
      <max>90</max>
    </latitudeRange>
    <longitudeRange>
      <unit>DEGREE</unit>
      <min>-180</min>
      <max>180</max>
    </longitudeRange>
  </boxArea>
  <originDepthRange>
    <unit>KILOMETER</unit>
    <min>0</min>
    <max>1000</max>
  </originDepthRange>
  <originTimeRange>
    <startTime>20030101T00:00:00.001Z</startTime>
    <endTime>20030131T23:59:59.999Z</endTime>
  </originTimeRange>
  <magnitudeRange>
    <min>6.0</min>
    <max>10.0</max>
  </magnitudeRange>
  <catalog>PREF</catalog>
  <contributor>IRIS</contributor>
</eventFinder>
This eventFinder sequence should be inserted in between the two eventArm tags.
You will notice that there are quite a lot of extra items here that aren't directly related to our query. They will not eliminate any events and will make modification easier later. Basically, we are connecting to the IRIS_EventDC server, which is registered under the edu/iris/dmc domain. We are asking for events in January of 2003 with a magnitude of 6 or greater.
One subtle point is that a single real earthquake will be in multiple catalogs within the DMC's database. The DMC doesn't have enough information to group these origins into a single event. As such, they appear as separate events. It is usually wise to pick a particular catalog to avoid duplicate processing of events. The default here is the PREF catalog from the DMC. PREF is not a true catalog, but something that the DMC Event server understands as the "best" catalog depending on the time of the request. The weeklies, or WHDF, catalog from the NEIC is another common one for historical data. If you are doing a SOD run and want near real time data, then you probably want to use the FINGER catalog, because it has a short delay between the earthquake's occurrence and the location being included in the catalog. However, since this tutorial run is historical, PREF is a good choice.
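As a sketch of how a different catalog choice would appear in the strategy file, a near real time run might swap the catalog element of the eventFinder as shown here. This is an illustration only. It assumes the catalog name is written in the element exactly as it is named above, and the contributor element may also need to change to match, which this tutorial does not cover.

```xml
<!-- hypothetical variation for a near real time run; the tutorial itself keeps PREF -->
<catalog>FINGER</catalog>
```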
Now that we have our events locally, we could do some subsetting based on the origin, but we will assume all of these events are fine. Just to make running SOD a little more interesting, we will add an event subsetter and an event processor. The event subsetter, <removeEventDuplicate/>, makes sure that we do not get events that are very close in both location and time. It is often the case that there are duplicates even in a single catalog, and this removes them. The processor, <printLineEventProcess/>, just prints out a line every time an event arrives. The last item generates the HTML status pages for events. This finishes the event arm.
<removeEventDuplicate/>
<printLineEventProcess/>
<eventStatusTemplate>
  <template>jar:edu/sc/seis/sod/data/templates/eventArm/eventStatus.xml</template>
</eventStatusTemplate>
These items should go after the end of the event finder and before the second event arm tag in the file.
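The resulting order inside the event arm can be outlined as below. This is a placement sketch rather than a complete fragment: the children of eventFinder and eventStatusTemplate are abbreviated as comments and need to be filled in from the listings above.

```xml
<eventArm>
  <eventFinder> <!-- server, area, depth, time, magnitude, catalog --> </eventFinder>
  <removeEventDuplicate/>
  <printLineEventProcess/>
  <eventStatusTemplate> <!-- template --> </eventStatusTemplate>
</eventArm>
```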
Networks and Channels
Before getting any data, we must choose some channels within the network arm. Here is the network arm that gets long period channels from stations within the II network.
<networkFinder>
  <name>IRIS_NetworkDC</name>
  <dns>edu/iris/dmc</dns>
  <refreshInterval>
    <unit>DAY</unit>
    <value>2</value>
  </refreshInterval>
</networkFinder>
Insert this XML inside the networkArm tags.
Here we connect to the IRIS_NetworkDC server within the edu/iris/dmc domain. We will also go back and look for new channels every 2 days. In our current example this will not matter because the entire run will complete in a shorter time. However, for a longer running instance of SOD, and in particular for stations within USArray, there may very well be new stations that come online during a SOD run. Setting this refreshInterval tells SOD to add new channels to its run as it goes.
Next we will select the networks that we are interested in. This will select just the II network.
<networkCode>II</networkCode>
This piece of XML should immediately follow the networkFinder we just added and immediately precede the closing networkArm tag.
If we wished to select more than one network, we could have wrapped several networkCode elements within a <networkOR> to allow them to all match. Care should be taken as you may say, "I want II and IU networks," and be tempted to use a <networkAND>, but SOD needs to be told, "The network code can be II or IU," so <networkOR> is correct. It is important that there be at most one subsetter of a given type. If more than one is needed, then they should be wrapped in a logical, such as AND, OR, NOT, or XOR.
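A minimal sketch of that multi-network case, using the II and IU codes mentioned above, is shown here. It is not part of this tutorial's strategy file. If used, it would take the place of the single networkCode element, since only one network subsetter is allowed.

```xml
<!-- alternative to the single networkCode above; not used in this tutorial -->
<networkOR>
  <networkCode>II</networkCode>
  <networkCode>IU</networkCode>
</networkOR>
```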
We will accept all stations at this point. However, we may be able to speed up the process by eliminating stations that were not active during January. The station subsetter below will make sure that the station's effective time overlaps the time range for our event selection.
<stationEffectiveTimeOverlap>
  <startTime>20030101T00:00:00.001Z</startTime>
  <endTime>20030131T23:59:59.999Z</endTime>
</stationEffectiveTimeOverlap>
Insert this after the networkCode subsetter.
The next level of subsetting is the Site level. Sites in the Fissures/DHI model roughly correspond to location IDs in SEED. In order to avoid getting data from more than one site per station, we will specify that the site must be active during January and that its code must be either space-space or 00. We use both a logical site ID subsetter as well as a site subsetter.
<siteAND>
  <siteOR>
    <siteCode> </siteCode>
    <siteCode>00</siteCode>
  </siteOR>
  <siteEffectiveTimeOverlap>
    <startTime>20030101T00:00:00.001Z</startTime>
    <endTime>20030131T23:59:59.999Z</endTime>
  </siteEffectiveTimeOverlap>
</siteAND>
Insert this after the effective time subsetter.
Finally, we will actually select channels. At this point we can eliminate all but the long period channels. We also want to eliminate LOG channels and include an effective time overlap to lessen the amount of data that must be processed. Lastly, we will use a printlineChannelProcessor to print out the name of each successful channel.
<channelAND>
  <bandCode>L</bandCode>
  <channelNOT>
    <gainCode>O</gainCode>
  </channelNOT>
  <channelEffectiveTimeOverlap>
    <startTime>20030101T00:00:00.001Z</startTime>
    <endTime>20030131T23:59:59.999Z</endTime>
  </channelEffectiveTimeOverlap>
</channelAND>
<printlineChannelProcessor/>
These are the final items in the networkArm, so they should be inserted right before the closing networkArm tag.
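Following the placement instructions in this section, the network arm ends up in the order outlined below. This is only a skeleton to check ordering against: the children of each element are abbreviated as comments and must be taken from the listings above.

```xml
<networkArm>
  <networkFinder> <!-- name, dns, refreshInterval --> </networkFinder>
  <networkCode>II</networkCode>
  <stationEffectiveTimeOverlap> <!-- startTime, endTime --> </stationEffectiveTimeOverlap>
  <siteAND> <!-- siteOR of site codes, siteEffectiveTimeOverlap --> </siteAND>
  <channelAND> <!-- bandCode, channelNOT, channelEffectiveTimeOverlap --> </channelAND>
  <printlineChannelProcessor/>
</networkArm>
```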
Waveforms
The final section involves the combination of event and network/channel information to produce waveforms. Here we can make use of both the event and station location to do distance calculations. Insert the following in the waveform arm to use an event station subsetter to limit the distance from the event to the station to between 30 and 60 degrees.
<distanceRange>
  <unit>DEGREE</unit>
  <min>30</min>
  <max>60</max>
</distanceRange>
The next item is the time window to ask for data. The usual method is to use predicted phase arrival times to calculate the request. This uses the TauP Toolkit to calculate the times, so any model and phase from it are acceptable. See the TauP Toolkit for more information. We will use 1 minute before the first P arrival to 20 minutes after the first S arrival within the prem model.
<phaseRequest>
  <model>prem</model>
  <beginPhase>ttp</beginPhase>
  <beginOffset>
    <unit>SECOND</unit>
    <value>-60</value>
  </beginOffset>
  <endPhase>tts</endPhase>
  <endOffset>
    <unit>MINUTE</unit>
    <value>20</value>
  </endOffset>
</phaseRequest>
We then choose the particular data center that we wish to get data from, the IRIS_PondDataCenter, which is in the edu/iris/dmc domain.
<fixedDataCenter>
  <name>IRIS_PondDataCenter</name>
  <dns>edu/iris/dmc</dns>
</fixedDataCenter>
Next, the available data subsetter allows checking on the availability of data before asking for the actual data. In this case, we will simply say that some data must exist by using the someCoverage element.
<someCoverage/>
At this point SOD will request and retrieve the actual data. The last items will print a line saying how many seismograms were received for each request, then save the data in SAC format in a directory. Because of gaps in the original recording, it is possible to get more than one seismogram from a single request, but usually for a small time window this does not happen.
The saveSeismogramToFile element allows you to customize the directory naming structure. First, the dataDirectory element gives the parent directory into which SOD will build the actual event directories. SOD will create this directory if it does not exist. Next, the eventDirLabel creates the name of each event's subdirectory. We will name the directories with Event_, followed by the time of the event as year_Jday_hour_minute_second. For example, an event with an origin time of 12:34:56 on 15 January 2003 would be expected to land in a directory named Event_2003_015_12_34_56.
<printlineSeismogramProcess/>
<saveSeismogramToFile>
  <fileType>sac</fileType>
  <dataDirectory>POND_II</dataDirectory>
  <eventDirLabel>Event_<originTime>yyyy_DDD_HH_mm_ss</originTime></eventDirLabel>
</saveSeismogramToFile>
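As a final check, the waveform arm can be outlined as below. This skeleton assumes the elements go in the order they were introduced in this section. The children of each element are abbreviated as comments and must be filled in from the listings above.

```xml
<waveformArm>
  <distanceRange> <!-- unit, min, max --> </distanceRange>
  <phaseRequest> <!-- model, begin and end phases with offsets --> </phaseRequest>
  <fixedDataCenter> <!-- name, dns --> </fixedDataCenter>
  <someCoverage/>
  <printlineSeismogramProcess/>
  <saveSeismogramToFile> <!-- fileType, dataDirectory, eventDirLabel --> </saveSeismogramToFile>
</waveformArm>
```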
If you've inserted all the pieces in the right places, you should have a working SOD strategy file now. Enjoy!