Thursday, May 22, 2008

Data storage and plotting for real-time RDAHMM

Because the present modifications to the real-time RDAHMM service are only an initial version that is not yet fully correct, we need to store all received data so that we can rebuild and re-evaluate models in the future. We now save one day's input data for one station in one file, and the files for all stations are stored in a directory structure like this: "histParentDir/stationName/yyyy/mm/stationName_yyyy-mm-dd.dat".
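As an illustration of the directory layout above, here is a minimal Python sketch that builds the file path for one station and one day and appends a record to it. The function names, the station name "abcd" and the idea of appending one text line per record are assumptions made for this example; they are not taken from the actual service code.

```python
from datetime import date
from pathlib import Path


def daily_data_path(hist_parent_dir, station_name, day):
    """Build histParentDir/stationName/yyyy/mm/stationName_yyyy-mm-dd.dat."""
    return (
        Path(hist_parent_dir)
        / station_name
        / f"{day:%Y}"
        / f"{day:%m}"
        / f"{station_name}_{day:%Y-%m-%d}.dat"
    )


def append_record(hist_parent_dir, station_name, day, line):
    """Append one input line to the station's file for the given day."""
    path = daily_data_path(hist_parent_dir, station_name, day)
    # Create the station/year/month directories on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    return path


if __name__ == "__main__":
    # "abcd" is a hypothetical station name.
    print(daily_data_path("histParentDir", "abcd", date(2008, 5, 22)).as_posix())
```

Opening the file in append mode means each new batch of real-time input for the same day goes to the end of the same file, and a new file starts automatically when the date changes.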

For plotting, we'll first run tests with the existing plotting script, which draws lines instead of points. We'll switch to points later.

Sunday, May 11, 2008

Modifications to real-time RDAHMM

The present real-time RDAHMM service simply runs RDAHMM in training mode periodically on the stations' real-time input, which is not the right way to do it.
We'll make the following modifications, which are not entirely right either, but are a first step towards the right way:
For each station, collect its input data for a whole day, and build an RDAHMM model for it by running RDAHMM in training mode on that day's input data;
Use this model to periodically run evaluations for the station from then on; the evaluation period could vary from 10 minutes to 1 hour.

Right now we use the model built from one day's data for all later evaluations. This is obviously not completely right, and the model may need to be rebuilt from time to time. We'll add an argument that specifies the period for rebuilding models, to keep the new implementation as general as possible, and discuss the proper period at a later time.
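The train-once, evaluate-periodically scheme with an optional rebuild period can be sketched as a small decision function. This is only an illustration in Python under assumed names (next_action, eval_period, rebuild_period); it does not call RDAHMM itself and is not the service's actual code.

```python
from datetime import datetime, timedelta


def next_action(now, model_built_at, last_eval_at, eval_period, rebuild_period=None):
    """Decide what to do for one station at time `now`.

    Returns "train", "evaluate" or "wait". With rebuild_period=None the
    model built from the first day's data is used for all later evaluations.
    """
    if model_built_at is None:
        return "train"  # no model yet: build one from a full day's input
    if rebuild_period is not None and now - model_built_at >= rebuild_period:
        return "train"  # the model is older than the rebuild period
    if last_eval_at is None or now - last_eval_at >= eval_period:
        return "evaluate"  # run RDAHMM in evaluation mode with the model
    return "wait"


if __name__ == "__main__":
    built = datetime(2008, 5, 11, 0, 0)
    last_eval = datetime(2008, 5, 12, 9, 0)
    now = datetime(2008, 5, 12, 9, 30)
    print(next_action(now, built, last_eval, timedelta(minutes=10)))
```

Leaving rebuild_period unset reproduces the current behaviour, so the proper rebuild period can be chosen later without changing the logic.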

Thread problem with GRWS queries

We came across some problems when issuing GRWS queries with multiple threads. Some threads get no input for the stations they query, even though the input for these stations is available when only one thread is used. Paul explained that this is caused by current problems with threading support in the GRWS services. When one thread queries shortly after another, so that the two query times are nearly the same, the later thread may get no input.

To work around this problem we temporarily use only one thread. It takes less than an hour to do evaluations for all stations, which is acceptable. Paul will try to improve the threading support of their services. He also mentioned that we can query input for all stations at one time and get the result in one single file. This would greatly improve the performance of the daily RDAHMM service, and we'll try it later.
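The single-thread workaround can be sketched in Python with a thread pool whose size is fixed at one, so that queries never overlap but the worker count can be raised once the services handle concurrent queries. The names evaluate_all, query_input and evaluate are placeholders for this example; the real GRWS query call is not shown here and must be supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(stations, query_input, evaluate, max_workers=1):
    """Query and evaluate every station, one at a time by default.

    query_input(station) and evaluate(station, data) are caller-supplied
    callables (placeholders, not real GRWS or RDAHMM APIs).
    """
    def run_one(station):
        data = query_input(station)
        if not data:
            return station, None  # no input received for this station
        return station, evaluate(station, data)

    # With max_workers=1 the queries are issued strictly one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, stations))


if __name__ == "__main__":
    # Fake input and evaluation, for demonstration only.
    fake_input = {"abcd": [1.0, 2.0], "efgh": []}
    results = evaluate_all(
        ["abcd", "efgh"],
        query_input=lambda s: fake_input[s],
        evaluate=lambda s, data: len(data),
    )
    print(results)
```

Stations that return no input are recorded with None rather than dropped, which makes it easy to spot the missing-input symptom described above if it ever reappears.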