Finding patterns in sequences: Comparison of Motif Extraction, Dynamic Time Warping and Hidden Markov Model approaches, with applications to the TIMSS 1999 Video Study
See more in Publications and Presentations, Education Publications and Presentations, Technology Publications and Presentations
Thesis submitted for the MS Educational Psychology degree at the University of Illinois at Urbana-Champaign, 2004.
Title: Finding patterns in sequences: Comparison of Motif Extraction, Dynamic Time Warping and Hidden Markov Model approaches, with applications to the TIMSS 1999 Video Study
Abstract: This paper will present three different approaches to finding patterns in sequences—motif extraction, dynamic time warping distances, and hidden Markov models. These methods have been used in fields as diverse as bio-informatics and speech recognition, but are equally useful in analyzing categorical sequential data. Motif extraction is the process of looking for recurring patterns of codes in sequences. Dynamic time warping is an algorithm that finds the distance between pairs of sequences and can be used to find clusters in a set of sequences. Hidden Markov models use a probabilistic approach to studying the common underlying structure of a group of sequences. In the first part of this paper, I provide an overview of each technique and compare their advantages and limitations. In the second part, I apply these three methods to the TIMSS 1999 video study—a rich data set with time-coded information about the events taking place in 638 mathematics lessons from seven countries. The results provide new insights into the data that would not have been possible with traditional methods of prevalence analysis.
References:
Allison, P.D. (1984). Event History Analysis: Regression for Longitudinal Event Data. Beverley Hills: Sage.
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. (1997). Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402.
Bakeman, R., & Gottman, J. M. (1997). Observing interaction: An introduction to sequential analysis. Cambridge: Cambridge University Press.
Cambridge University Engineering Department (2003). Hidden Markov model toolkit. Retrieved March 31, 2004 from http://htk.eng.cam.ac.uk/
Clarke, D.J. & Mesiti, C. (2003). Addressing the challenge of legitimate international comparisons: Lesson structure in Australia and the USA. In L. Bragg, C. Campbell, G. Herbert, & J. Mousley (Eds.), Mathematics education research: Innovation, networking,
opportunity, Proceedings of the 26th Annual Conference of the Mathematics Education Research Group of Australasia, Vol. 1, 230-237.
Givvin, K. B., Jacobs, J. K., Hollingsworth, H. (2003, April). “Lesson signatures”: A visual display of country differences in lesson structure. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Griffin, W. A. (2003). Affect pattern recognition: Using discrete hidden Markov models to discriminate distressed from nondistressed couples. Marriage & Family Review, 34(1-2), 139-163.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge University Press.
Hausler, S. (2004). Hidden Markov models: A tutorial for the course Computational Intelligence. Retrieved March 31, 2004 from http://www.igi.tugraz.at/lehre/CI/tutorials/HMM/index.html
HCIL (2002). Visual exploration of time-series data. Retrieved March 31, 2004 from http://www.cs.umd.edu/hcil/timesearcher/
Hiebert, J., Gallimore, R., Garnier, H., Givvin, K. B., Hollingsworth, H., Jacobs, J., Chui, A. M. Y., Wearne, D., Smith, M., Kersting, N., Manaster, A., Tseng, E., Etterbeek, W., Manaster, C., Gonzales, P., & Stigler, J. W. (2003). Teaching mathematics in seven countries: Results from the TIMSS 1999 video study (NCES 2003-013). Washington, DC: U.S. Department of Education.
IBM Bioinformatics Group (2003). Teiresias-based Text Pattern Discovery Using Individual Symbols. Retrieved March 27, 2004 from http://cbcsrv.watson.ibm.com/Ttspd.html
Kruskal, J. B., & Liberman, M. (1983). The symmetric time warping algorithm: From continuous to discrete. In J. B. Kruskal & D. Sankoff (Eds.), Time Warps, String Edits and Macromolecules. (pp. 125-162). Reading, MA: Addison-Wesley
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverley Hills: Sage.
LessonLab (2003). TIMSS-R video math coding manual. Retrieved February 19, 2004 from http://www.lessonlab.com/timss1999/download/TIMSS%201999%20Video%20Coding%20Manual.pdf
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8), 707-710.
Magnusson, M. S. (2000). Discovering hidden time patterns in behavior: T-patterns and their detection. Behavior Research Methods, Instruments & Computers, 32, 93-110.
Murphy, K. (2003). Hidden Markov Model (HMM) toolbox for Matlab. Retrieved March 31, 2004 from http://www.ai.mit.edu/~murphyk/Software/HMM/hmm.html
Rigoutsos, I., Floratos, A., Parida, L., Gao, Y., & Platt, D. (2000). The emergence of pattern discovery techniques in computational biology. Metabolic Engineering, 2(3), 159-177.
Schliep, A., Rungsarityotin, W., Georgi. B. (2003). GHMM: A LGPL-licensed hidden Markov model library. Retrieved March 31, 2004 from http://ghmm.sourceforge.net/
Shimizu, Y. (2003). Capturing the structure of Japanese mathematics lessons as embedded in the teaching unit. Paper presented as part of the symposium “Mathematics Lessons in Germany, Japan, the USA and Australia: Structure in Diversity and Diversity in
Structure” at the Annual Meeting of the American Educational Research Association, Chicago, April 21-25, 2003.
Stigler, J.W., & Hiebert, J. (1999). The teaching gap: Best ideas from the world’s teachers for improving education in the classroom. New York: Free Press.
Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.