Segmentation of binary sequence via minimizing least square error with total variation regularization
Communications for Statistical Applications and Methods 2024;31:487-496
Published online September 30, 2024
© 2024 Korean Statistical Society.

Jeungju Kim (a), Johan Lim (1, a)

(a) Department of Statistics, Seoul National University, Korea
(1) Correspondence to: Department of Statistics, Seoul National University, Kwanak Ro 1, Seoul 08826, Korea. E-mail: johanlim@snu.ac.kr
Received December 24, 2023; Revised May 2, 2024; Accepted May 20, 2024.
Abstract
In this paper, we propose a data-driven procedure for segmenting a binary sequence as an alternative to the popular procedure based on the hidden Markov model (HMM). Unlike the HMM, our procedure makes no distributional or model assumptions about the data. To segment the sequence, we propose minimizing the least squares distance between the observations and the solution under total variation regularization, and we develop a polynomial-time algorithm for this problem. Finally, we illustrate the algorithm on a toy example and apply it to the Gemini boat race data between Oxford and Cambridge Universities. Through these examples, we also numerically compare the performance of our procedure with that of HMM-based segmentation.
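The optimization step described in the abstract can be illustrated with a small sketch. It assumes the objective takes the common form 0.5 * sum_i (y_i - theta_i)^2 + lambda * sum_i |theta_{i+1} - theta_i|, where y_i is the 0/1 observation, theta is the fitted sequence and lambda is the regularization weight; the exact scaling, and any constraints the paper places on theta, are not stated here. This is not the authors' polynomial-time algorithm: it swaps in a generic solver (projected gradient descent on the dual problem), and the function names tv_denoise and segments are hypothetical. Segment boundaries are taken, again as an assumption, to be the positions where the fitted values jump.

```python
from typing import List, Sequence, Tuple


def tv_denoise(y: Sequence[float], lam: float,
               max_iter: int = 20000, tol: float = 1e-12) -> List[float]:
    """Approximately minimise 0.5*sum_i (y_i - t_i)^2 + lam*sum_i |t_{i+1} - t_i|.

    Assumed objective (scaling is an assumption, not taken from the paper).
    Solver: projected gradient on the dual problem
        min_u 0.5*||y - D^T u||^2  subject to |u_i| <= lam,
    where (D t)_i = t_{i+1} - t_i; the primal solution is t = y - D^T u.
    """
    n = len(y)
    theta = [float(v) for v in y]
    if n < 2 or lam <= 0:
        return theta  # no penalty (or nothing to smooth): the fit is the data
    u = [0.0] * (n - 1)  # one dual variable per adjacent pair
    step = 0.25          # 1/L, since the largest eigenvalue of D D^T is below 4
    for _ in range(max_iter):
        # Gradient step on u along D theta, then clip back into [-lam, lam].
        new_u = [min(lam, max(-lam, u[i] + step * (theta[i + 1] - theta[i])))
                 for i in range(n - 1)]
        # Recover theta = y - D^T u, with (D^T u)_j = u_{j-1} - u_j (zero off the ends).
        new_theta = [y[j] - ((new_u[j - 1] if j > 0 else 0.0)
                             - (new_u[j] if j < n - 1 else 0.0))
                     for j in range(n)]
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        u, theta = new_u, new_theta
        if change < tol:
            break
    return theta


def segments(theta: Sequence[float],
             jump_tol: float = 1e-6) -> List[Tuple[int, int, float]]:
    """Split a fitted sequence into maximal runs of (numerically) equal values.

    Returns (start, end_exclusive, level) triples; a boundary is placed
    wherever adjacent fitted values differ by more than jump_tol.
    """
    out: List[Tuple[int, int, float]] = []
    start = 0
    for i in range(1, len(theta) + 1):
        if i == len(theta) or abs(theta[i] - theta[i - 1]) > jump_tol:
            seg = theta[start:i]
            out.append((start, i, sum(seg) / len(seg)))
            start = i
    return out
```

For y = [0, 0, 1, 1] and lambda = 0.25, the fit is close to [0.125, 0.125, 0.875, 0.875], which gives two segments; with lambda = 2 the fit collapses to the mean 0.5 and a single segment. Turning segment levels into 0/1 labels (for example, by thresholding at 0.5) would be a further, hypothetical step. The dual iteration can converge slowly on long sequences, which is one reason a dedicated algorithm is preferable for real data.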
Keywords: binary sequence, Gemini boat race data, least square error, run length code, segmentation, total variation regularization