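Author (Chinese): 童旻浩
Author (English): Tung, Ming-Hao
Thesis title (Chinese): 快速且精準之時間序列分群演算法-透過精準群心選擇
Thesis title (English): A Fast and Accurate Time Series Clustering Algorithm via Precise Center Selection
Advisor (Chinese): 廖崇碩
Advisor (English): Liao, Chung-Shou
Committee members (Chinese): 黃文良, 韓永楷, 呂俊賢
Committee members (English): Hwang, Wen-Liang; Hon, Wing Kai; Lu, Chun-Shien
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 104034524
Year of publication (ROC calendar): 106 (2017)
Academic year of graduation: 105
Language: English
Number of pages: 36
Keywords (Chinese): 集群問題, 時間序列, 動態時間校正, 群心選擇
Keywords (English): Clustering, Time Series, Dynamic Time Warping, Center Selection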
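Abstract (Chinese): Time series clustering algorithms have been studied extensively in many scientific fields, particularly for the traffic flow prediction problem over the past decade. In 2015, a density-based time series clustering algorithm called TADPole was proposed, and it outperforms all other methods. Although TADPole performs well on most test cases, there is still room for improvement in the precision of its output clusters and in the quality of the centers it selects.
In this study, we propose a fast and accurate time series clustering algorithm, Density Peak via Center Selection (DPCS), which uses more precise center selection so that the centers retain the characteristics of their clusters. In addition, we construct the output clusters according to the required properties of each data distribution. Experimental results show that DPCS is highly effective for large numbers of clusters, even when the clusters have markedly different densities. In particular, we show that DPCS is more accurate than TADPole in both its output clusters and its cluster centers, while keeping a running time similar to that of TADPole. From a practical standpoint, the proposed DPCS algorithm handles applications of time series forecasting well.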
Abstract (English): Time series clustering algorithms have been widely studied in many scientific areas, especially for the traffic flow prediction problem over the past decade. In 2015, a density-based time series clustering algorithm called TADPole was proposed, which outperforms all other approaches. Although TADPole performs well in most test cases, there is still room for improvement in the precision of its output clusters and the quality of its selected centers.
In this study, we propose a fast and more accurate time series clustering algorithm, Density Peak via Center Selection (DPCS), which selects centers that retain the features of the cluster data. Moreover, we construct the output clusters according to the required properties of each data distribution. Experimental results demonstrate the effectiveness of DPCS for a large number of clusters, even when the clusters have significantly different densities. In particular, we show that DPCS is more accurate than TADPole in both output clusters and cluster centers while maintaining a running time similar to that of TADPole. From a practical perspective, the proposed DPCS algorithm is well suited to applications in time series forecasting.
Contents
Abstract (Chinese) I
Abstract II
Acknowledgements III
Contents IV
List of Figures and Tables V
1 Introduction 1
1.1 Motivation 1
1.2 Prior work 2
1.3 Our contribution 3
2 DTW and TADPole Revisited 6
2.1 Multi-dimensional DTW 6
2.2 Lower bound of multi-dimensional DTW 7
2.3 Procedures of TADPole 8
3 DPCS: Density Peak via Center Selection 12
3.1 Algorithm 12
3.2 Comparison with the TADPole algorithm 14
3.3 Assignment strategies 17
4 Experimental Evaluation 19
4.1 Evaluation of the algorithm 19
4.2 Cluster centers evaluation 25
4.3 Parameter sensitivity of DPCS 26
4.4 Discussion 27
5 Real-world Case Study: Traffic Network 28
5.1 Similar time series patterns 28
5.2 Correlation between pattern and traffic situation 31
6 Conclusion 33
Reference 33
[1] Nurjahan Begum, Liudmila Ulanova, Jun Wang, and Eamonn Keogh. 2015.
Accelerating Dynamic Time Warping Clustering with a Novel Admissible
Pruning Strategy. In Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New
York, NY, USA, 49-58. DOI: http://dx.doi.org/10.1145/2783258.2783286
[2] S. Brecheisen, H. P. Kriegel, and M. Pfeifle. 2004. Efficient
density-based clustering of complex objects. In Data Mining, 2004.
ICDM '04. Fourth IEEE International Conference on. 43-50. DOI:
http://dx.doi.org/10.1109/ICDM.2004.10082
[3] Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall,
Abdullah Mueen, and Gustavo Batista. 2015. The UCR Time Series
Classification Archive. (July 2015).
[4] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn
Keogh. 2008. Querying and Mining of Time Series Data: Experimental Comparison
of Representations and Distance Measures. Proc. VLDB Endow. 1, 2
(Aug. 2008), 1542-1552. DOI: http://dx.doi.org/10.14778/1454159.1454226
[5] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A
density-based algorithm for discovering clusters in large spatial databases
with noise. AAAI Press, 226-231.
[6] Jyh-Shing Roger Jang. 2016. Machine Learning Toolbox. Available at
http://mirlab.org/jang/matlab/toolbox/machineLearning. Accessed on Dec
10, 2016.
[7] Leonard Kaufman and Peter J. Rousseeuw. 2009. Finding Groups in Data:
An Introduction to Cluster Analysis. Wiley.
[8] Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact Indexing of
Dynamic Time Warping. Knowl. Inf. Syst. 7, 3 (March 2005), 358-386. DOI:
http://dx.doi.org/10.1007/s10115-004-0154-9
[9] Stephen Kokoska and Daniel Zwillinger. 2000. CRC Standard Probability and
Statistics Tables and Formulae. Chapman & Hall / CRC.
[10] P. D. Kovesi. 2000. MATLAB and Octave Functions for Computer Vision
and Image Processing. (2000). http://www.peterkovesi.com/matlabfns/.
[11] J. MacQueen. 1967. Some methods for classification and analysis
of multivariate observations. In Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability, Volume
1: Statistics. University of California Press, Berkeley, Calif., 281-297.
http://projecteuclid.org/euclid.bsmsp/1200512992
[12] Son T. Mai, Ira Assent, and Martin Storgaard. 2016. AnyDBC: An Efficient
Anytime Density-based Clustering Algorithm for Very Large Complex
Datasets. In Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD '16). ACM, New York,
NY, USA, 1025-1034. DOI: http://dx.doi.org/10.1145/2939672.2939750
[13] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo
Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn
Keogh. 2012. Searching and Mining Trillions of Time Series Subsequences
Under Dynamic Time Warping. In Proceedings of the 18th
ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD '12). ACM, New York, NY, USA, 262-270. DOI:
http://dx.doi.org/10.1145/2339530.2339576
[14] William M. Rand. 1971. Objective Criteria for the Evaluation
of Clustering Methods. J. Amer. Statist. Assoc. 66, 336 (1971),
846-850. DOI: http://dx.doi.org/10.1080/01621459.1971.10482356 arXiv:
http://www.tandfonline.com/doi/pdf/10.1080/01621459.1971.10482356
[15] Alex Rodriguez and Alessandro Laio. 2014. Clustering by fast search and
find of density peaks. Science 344, 6191 (Jun 2014), 1492-1496. DOI:
http://dx.doi.org/10.1126/science.1242072
[16] Jin Shieh and Eamonn Keogh. 2008. iSAX: Indexing and Mining Terabyte
Sized Time Series. In Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD '08). ACM, New
York, NY, USA, 623-631. DOI: http://dx.doi.org/10.1145/1401890.1401966
[17] Mohammad Shokoohi-Yekta, Jun Wang, and Eamonn
Keogh. On the Non-Trivial Generalization of Dynamic
Time Warping to the Multi-Dimensional Case.
289-297. DOI: http://dx.doi.org/10.1137/1.9781611974010.33 arXiv:
http://epubs.siam.org/doi/pdf/10.1137/1.9781611974010.33
[18] Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, and Eamonn
Keogh. 2003. Indexing Multi-dimensional Time-series with Support for Multiple
Distance Measures. In Proceedings of the Ninth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD '03). ACM,
New York, NY, USA, 216-225. DOI: http://dx.doi.org/10.1145/956750.956777
[19] Yuan Yuan, Yi-Ping Phoebe Chen, Shengyu Ni, Augix Guohua Xu, Lin Tang,
Martin Vingron, Mehmet Somel, and Philipp Khaitovich. 2011. Development
and application of a modified dynamic time warping algorithm (DTW-S) to
analyses of primate brain expression time series. BMC Bioinformatics 12, 1
(2011), 347. DOI: http://dx.doi.org/10.1186/1471-2105-12-347
[20] KDD CUP 2017 website. https://tianchi.aliyun.com/competition/information.htm?spm=5176.100067.5678.2.8CnCPt&raceId=231597
 
 
 
 