Sunday, June 15, 2008

Parallel RDAHMM

I ran a test comparing the performance of two different ways of building an RDAHMM model:
A. Train the model with 10 tries in one process;
B. Train the model with 10 processes, each carrying out 1 try with a different random seed, and then select the model with the largest L value.
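Method B can be sketched with Python's `multiprocessing` module. The `train_one_try` function below is a hypothetical stand-in (the real RDAHMM trainer and its interface are not shown in this post); the point is the pattern: fan out one process per random seed, then keep the result with the largest log-likelihood L.

```python
import random
from multiprocessing import Pool

def train_one_try(seed):
    # Hypothetical placeholder for one RDAHMM training run.  A real
    # implementation would invoke the RDAHMM trainer with this random
    # seed and return the trained model's log-likelihood L.
    rng = random.Random(seed)
    log_likelihood = -1000.0 + rng.random() * 10.0  # placeholder L value
    return seed, log_likelihood

def train_parallel(seeds):
    # Method B: one worker process per try; select the best model by L.
    with Pool(processes=len(seeds)) as pool:
        results = pool.map(train_one_try, seeds)
    return max(results, key=lambda r: r[1])

if __name__ == "__main__":
    best_seed, best_L = train_parallel(range(10))
    print(f"best seed: {best_seed}, L = {best_L:.3f}")
```

Method A would simply call `train_one_try` ten times in a loop within a single process and take the same `max`.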

Performance Comparison:
Station Name | Line Count of Input File | Time for A (sec) | Time for B (sec)
CVHS | 3921 | 23 | 20
TABL | 48138 | 277 | 212
TABL | 72622 | 457 | 341
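As a quick sanity check on the claim below, the speedup of B over A for each row of the table works out as follows (row labels are just for illustration):

```python
# Percentage improvement of method B over method A, per table row.
rows = [
    ("CVHS (3921 lines)",  23,  20),
    ("TABL (48138 lines)", 277, 212),
    ("TABL (72622 lines)", 457, 341),
]
for name, t_a, t_b in rows:
    improvement = (t_a - t_b) / t_a * 100
    print(f"{name}: {improvement:.1f}% faster")
```

The two large TABL runs come out at roughly 23% and 25% faster, matching the 20%-25% range stated below, while the small CVHS run improves only about 13%.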

When the input is small, the performance of A and B is similar, because method B pays the overhead of creating new processes. When the input file has many lines, using 10 processes improves performance by 20%-25%.

If we start the real-time service for all stations of one network at the same time, we might end up creating too many concurrent processes: with 7-8 stations per network and 10 processes per station, that is 70-80 processes at once, which is very costly. So I think we can temporarily just use one process with 10 tries for building the models. 8-10 minutes is not a very long time, anyway.
