ABCDEFGHIJKLMNOPQRST
1
報告日期
報告人作者 標題 出處
報告者簡評老師或學長簡評
2
例:T.F. Smith and M.S. Waterman. "Identification of Common Molecular Subsequences". Journal of Molecular Biology vol. 147 pp. 195-197 (1981)a
3
請照報告順序填寫忘記順序的請自行到實驗室網站查詢
4
20131126許哲睿D.S. Hirschberg. "A Linear Space Algorithm for Computing Maximal Common Subsequences". Communications of the ACM, Vol. 18, No. 6, pp. 341–343, 1975本文提出三種演算法
A algorithm 即為一般常見的 LCS
B algorithm 為 A 的改良,主要改良在節省記憶體空間
C algorithm 使用 divide and conquer 的方式,一邊從頭,另一邊從結尾找尋 LCS,最後再將找到的結果 merge 起來
5
20131126許竣傑D. S. Hirschberg, “Algorithms for the longest common subsequence problem,” Journal of ACM, Vol. 24, pp. 664–675, 1977.作者提出兩個演算法,主要用(k-1) candidate 與 k candidate的區間找出minimal k candidate,其中minimal k candidate即是LCS的字
6
20131126袁信全James W. Hunt and Thomas G. Szymanski. "A Fast Algorithm for Computing Longest Common Subsequences". Communications of the ACM vol. 20 pp. 350-353 (1977)Be known as Hunt-Szymanski LCS Algorithm. 方法延伸自Robinson-Schensted-Knuth Algorithm for LIS.
time complexity: O(nlogn)
利用兩序列的所排列出的matrix和其中match的位置,利用此algo算出LCS.
7
20131204郭冠呈Weizhong Li and Adam Godzik. "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences". BIOINFORMATICS APPLICATIONS NOTE, Vol. 22, No. 13, 2006, pp. 1658–1659這篇paper提供了一組套件,可以提供給使用者進行蛋白質的比對,這個套件可以快速地篩選可能蛋白質已進行比對,詳細的使用方法有寫在CD-Hit user guide的網頁上。
8
20131204陳慶耀Lloyd ALLISON and Trevor I. DIX . "A BIT-STRING LONGEST-COMMON-SUBSEQUENCE ALGORITHM". Information Processing Letters, Volume 23, Issue 5, 24 November 1986, Pages 305–310主要是利用bit及Hunt-Szymanski的方式進行LCS
9
20131211詹德勝William J. Masek and Michales. Pateson"A faster algorithm computing string edit distances". Journal of Computer and System Sciences 20,18-31 1980作者提出一種分割再合併的 Faster algorithm 來解決 Edit Distance 的問題 所花費時間為O(n * max(1 , m/logn ))
10
20131211戴君潔Robert A. Wagner & Michael J. Fischer, "The String-to-String Correction Problem". Journal of the Association for Computing Machinery, vol. 21, No. 1, p.p. 168-173, January 1974用edit distance解Solve string-to-string correction problem.
X algorithm 即為edit distance
Y algorithm trace並印出兩字串中取代之pairs
在文末提到若edit distance的插入, 刪除, 取代之cost為1, 1, 2時, LCS與edit distance之結果有以下關係: LCS(A,B)=(|A|+|B|-ED(A,B))/2
11
20131218袁信全T.F. Smith and M.S. Waterman. "Identification of Common Molecular Subsequences". Journal of Molecular Biology vol. 147 pp. 195-197 (1981)與一般LCS做法相同,差別在於計算兩序列的距離時,加入負分,因此traceback的起點為matrix中最大的值。(match時加1,dismatch時扣4/3,連續dismatch的情況時,第一個dismatch扣4/3,隨後的扣1/3)
12
20131218黃鼎耀David Sankoff. "Minimal Mutation Trees of Sequences". SIAM Journal on Applied Mathematics, Vol. 28, No. 1 (Jan., 1975), pp. 35-42給定一棵樹,初始條件是我們知道某些點的資訊(每個已知點都是一個sequence),透過這些sequence以及作者的方法(切成多的個Ω、著色法),建立出新的sequence填入不知道資訊的點上,在計算點之間的edge length,讓total edge length最小。edge length就是兩個sequence做1 1 1 edit distance
13
20131224許哲睿William R. Person and David J. Lipman. "Improved tools for biological sequence comparison". Proceedings of the National Academy of Sciences of the United States of America Vol. 85, pp. 2444-2448, April 1988提出 FASTA 演算法,主要用在處理如 DNA 或蛋白質這種大量資料的部分
先找出連續 match 的片段,然後將這些片段進行評分
接著利用這個分數找出較好的片段,而不好或不能用的片段就排除
最後將所有片段連接起來,便能形成一組解
需要注意的是找出來的解只是近似解,並非最佳解
14
20131230周展億Vaclav Chvatal and David Sankoff . "Longest Common Subsequences of Two Random Sequences". Journal of Applied Probability, Vol. 12, No. 2 (Jun., 1975), pp. 306-315當我們有個隨機元素集合 k 形成的兩個k-ary時,他們可能的LCS長度會是多少,利用機率的觀點來探討這件事情
15
20140107方鈞毅 DAVID MAIER "The complexity of some problems on subsequences and supersequences" (1978)
Journal of the ACM (JACM)Volume 25 Issue 2, April 1978 Pages 322-336
傳說中的LCS是NP-complete的證明在這裡,這裡專注於字串的字符集,這裡證明了字符集大於等於二以上的LCS證明,還有字符集大於等於五的Super common sequence,而字串的數量設定上是任意數量的,也就是說不限於兩條字串的LCS;而方法大約是巧妙的利用字串得decode與分隔,在LCS與mode cover問題之間做轉換。
16
20140114許竣傑Chang-Biau Yang(楊昌彪), R.C.T. Lee(李家同). "Systolic Algorithms for the Longest Common Subsequence Problem". Journal of the Chinese Institute of Engineers Vol. 10, NO 6, pp.691-699(1987)作者提出兩個systolic algorithm來解LCS,基於dynamic programming的方法利用systolic array實作,所花時間只需O(n),而time processor product和dynamic programming是一樣的
17
20140114許哲睿PETER H. SELLERS. "An Algorithm for the Distance between Two Finite Sequences". JOURNAL OF COMBINATORIAL THEORY (A) 16, 253-258 (1974)完成 S. M. Ulam 提出(?)的 edit distence O((m^2)n) 的演算法
並簡單描述如何將其改進到 O(mn)
18
20140114戴君潔Alfred V. Aho, Margaret J. Corasick. "Efficient String Matching: An Aid to Bibliographic Search". Communications of the Association for Computing Machinery, vol. 18, issue 6, p.p. 333-340, June 1975作者提出以有限狀態的字樣比對機(finite state pattern matching machine)來作字串比對的動作,其以goto function、failure function以及output function來判斷字串比對的結果。
19
20140127袁信全Michael L. Fredman. "On computing the length of longest increasing subsequences". Discrete Mathematics vol. 11 pp. 29–35 (1975) 由於此篇為LIS,所以不詳加討論
20
20140127許竣傑C. Schensted. "Longest increasing and decreasing subsequences". Canad. J. Math. 13(1961), 179-191這篇論文用建構楊式陣列的方式來找到LIS以及LDS,陣列row的個數等於LDS,陣列column的個數等於LIS,並證明其正確性
21
20140127陳慶耀Maxime Crochemore, Costas S. Iliopoulos, Yoan J. Pinzon3, and James F. Reid. "A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem". Information Processing Letters, Volume 80, Issue 6, 31 December 2001, Pages 279–285主要是利用Delta L表的方式,將五次的bit操作變成四次,產生大幅降低時間花費的效果。
22
20140210黃鼎耀SAUL B. NEEDLEMAN AND CHRISTIAN D. WUNSCH. "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins". J. Mol. Biol. (1970) 48, 443-453這篇論文提出了一個global alignment的方法應用在看兩個蛋白質裡的胺基酸相似程度,這個方法是先建立一個score matrix,然後再給定penalty值
(範例是gap penalty=0),用dynamic programming方式依序完成表格,從表中就可以找到maximum match的值以及trace序列對應的方式
23
20140210郭冠呈J.F.Collins, A.F.W.Coulson and A.Lyall. "The significance of protein sequence similarities". Computer applications in the biosciences : CABIOS Vol.4, no.1. 1988 PP. 67-71這篇論文提出一個蛋白質的機率分布模型,假設相關的蛋白質,經過得分矩陣的判定後,有可能相關的蛋白質,Alignment數量與得分高低會形成指數衰減,並利用現有的資料來驗證正確性。此篇無實用性,不深入討論
24
20140210許哲睿Robert A. Wagner. "On the Complexity of the Extended String-to-String Correction Problem". Proceedings of seventh annual ACM symposium on Theory of computing, Pages 218-223, 1975說明含有 insert, delete, change, swap 四種動作的 edit distence
並提出各種不同的限制之下如何解決問題
(本文以 swap = INF, swap < INF, swap = 0 三種 case 為基礎進行衍伸)
最後證明此種演算法為 NP-complete
但在部分限制之下可在 polynomial time 之內解決此問題
25
20140218周展億J. W. Hunt, M. D. McIlroy. "An Algorithm for Differential File Comparison". Bell Laboratories Computing Science Technical Report #41(July 1976)這篇論文就是在UNIX工作站中的diff指令實作方式,將兩篇文章的每一行hash成一個字元,再利用diff algorithm進行LCS的處理,最後便可找出兩篇文章的不同之處
26
20140218詹德勝D.S. Hirschberg "A Linear Space Algorithm for Computing Maximal Common Subsequences" Communications of the ACM
June 1975 Volume 18 Number 6
用divide and conquer˙的概念設計出使用O(N)空間的LCS演算法
27
20140218戴君潔Roy Lowrance and Robert A. Wagner. "An Extension of the String-to-String Correction Problem". Journal of the Association for Computing Machinery, vol.22, No.2, pp. 177-183, April 1975
28
20140225方鈞毅WALTER M. FITCH TEMPLE F. SMITH. "Optimal sequence alignments". Proc. Nati Acad. Sci. USA Vol. 80, pp. 1382-1386, March 1983Base on a presented algorithm, we mainly focus on its parameters. Those parameters are using end gap or not, gap penalty value, and weight per element in a gap. Finally, it turn out in some issues it should concern to use a end gap to find the best solutions, however, some are not.
29
20140225許竣傑TEMPLE F. SMITH, MICHAEL S. WATERMAN. "Comparison of Biosequences". Advances in Applied Mathematics, December 1981, p482-489此篇論文主要做algiment,提出一個新演算法,找到最大相似的片段(local maximum algiment),而相似度的算法是引用Needleman and Wunsch作者所提論文。
30
20140225袁信全Daniel S. Hirschberg. "An information theoretic lower bound for the longest common subsequence problem" Information Processing Letters. vol. 7 pp. 40–41 (1978) 證明LCS的lower bound為nlogn,此方法為特例,在sequence A 有排序的條件下,就要花nlogn的時間;而當B只給一個字,找出LCS長度是0/1,要花logn的時間。方法類似2-way merge (ABAABBA)。
31
20140304郭冠呈Narao Nakatsu, Yahiko Kambayashi and Shuzo Yajima "A Longest Common Subsequence Algorithm Suitable for Similar Text Strings". Acta Informatica 18, 1982 PP.171 - 179這篇論文提出一個方法,找出兩序列之LCS,當兩序列所預期之LCS與比較的序列短的序列相似時,這邊提出的方法有較好的時間複雜度,時間複雜度為O(n(m-p)),其中m,n為兩序列之長度(m<n),p為LCS長度。
32
20140304陳慶耀Alberto Apostolico. "IMPROVING THE WORST-CASE PERFORMANCE OF THE HUNT-SZYMANSKI STRATEGY FOR THE LOGEST COMMON SUBSEQUENCE OF TWO STRINGS". Information Processing Letters 23 (1986) 63-69
33
20140310黃鼎耀P. Hogeweg, B. Hesper. "The alignment of sets of sequences and the construction of phylogenetic trees:An integrated method". Journal of Molecular Evolution 1984, Volume 20,Issue 2, pp. 175-186作者認為好的alignment,sequence alignment 和 construction of phyletic tree必須一起考慮進去,所以提出了一個整合的方法。先建一個putative tree 用來align sequences再將alignment對tree做調整,反覆的做,最後應用在5s rRNA上進行分析,利用mutation數量當做來判結果好壞
34
20140317羅崇軒MICHAEL S. WATERMAN. "Sequence alignments in the neighborhood of the optimum with general application to dynamic programming". Proc. NatL. Acad. Sci. USA Applied Mathematical Sciences Vol. 80, pp. 3123-3124, May 1983
這篇論文是在做alignment,主要是在已知optimal solution的時候,可能optimal的解不是我們所需要的,所以再繼續延伸下去找near optimal solution。
35
20140324詹德勝W. J. Hsu and M. W. Du. "New Algorithms for the LCS Problem". JOURNAL OF COMPUTER AND SYSTEM SCIENCES 29, 1984, pp. 133-152
36
20140408詹皇廷M. S. WATERMAN, T. F. SMITH, W. A. BEYER "Some Biological Sequence Metrics"Advances in Mathematics
Volume 20, Issue 3, June 1976, Pages 367–387
這篇主要在提出一個能夠解決任意變形的gap penalty,所提出的解決方式,然而以前的方式只能單一動作分開看,然而此作法能夠解決多個insertion和多個deletion有不同gap penalty的問題
37
20140408詹皇廷OSAMU GOTOH" An Improved Algorithm for Matching Biological Sequences"
Journal of molecular biology, Vol. 162, No. 3. (15 December 1982),
Pages 705-708
由於上一篇所提出的方法時間複雜度過高,所以這篇提出了一個新的方法來加速此演算法,然而其演算法需要有特別的條件,當penalty=u+kv時,才能利用此方法來降低時間複雜度
38
20140408詹皇廷STEPHAN F. Altschul, Bruce W. Erickson
"OPTIMAL SEQUENCE ALIGNMENT USING AFFINE GAP COSTS"
Bulletin of Mathematical Biology
1986, Volume 48, Issue 5-6
page 603-616

當上一篇在解出最佳解時,其判斷的direction matrix再找出多組解的過程中,會因為空間不夠導致有一些資訊沒辦法被正確的儲存,故這篇提出一個新的儲存direction的方法,正確地找出多組解
39
20140408周展億Chin Francis Y. L. and Poon C. K. "A Fast Algorithm for Computing Longest Common Subsequences of Small Alphabet Size". Journal of information processing 13(4), 463-469 (Feb, 1991)
40
20140422羅崇軒Arthur M. Lesk, Michael Levitt and Cyrus Chothia. "Alignment of the Amino Acid Sequences of Distantly Related Proteins Using Variable Gap Penalties". Protein Engineering, Vol. 1 No.1 pp.77-78, 1986在近親的蛋白質裡,當要預測一個未知的蛋白質結構,可以利用一個同系物的已知結構來做一個正確的residue 配對。然而因為螺旋結構的穩定性很重要,所以不會發現insertion和deletion在螺旋區域,但可以在螺旋區域的尾端做insertion和deletion
41
20140422黃鼎耀 D.F. Feng, M.S. Johnson, and R.F. Doolittle. "Aligning Amino Acid Sequences: Comparison of Commonly Used Methods". Journal of Molecular Evolution(1985) 21:112-125比較常見的四個方法都是based on Needleman &Wunsch,藉由對weight上的調整來看到底哪個比較好。
平均來說是dayhoff(1978)的方法比較好,並且可透過經驗法則來評估相關序列之間的顯著性
42
20140422郭冠呈Michael J. Wise. "YAP3: Improve Detection of Similarities in Computer Program and Other Texts". SIGCSE Proceedings of the twenty-seventh SIGCSE technical symposium on Computer science education 1996 pp. 130-134 這篇論文主要在描述YAP3這個系統,用於檢查兩段程式碼是否有抄襲,用於檢查相似性的演算法為Running-Karp-Rabin Greedy-String-Tiling當作系統的核心演算法
43
20140429詹德勝W. J. Hsu and M. W. Du. "New Algorithms for the LCS Problem". JOURNAL OF COMPUTER AND SYSTEM SCIENCES 29, 1984, pp. 133-152這篇論文改良了D. S. HIRSCHBERG, "Algorithms for the longest common subsequence problem",改良後,時間複雜度為:O(pm*max(1, log(n/m))),但"Remark on the Hsu-Du new algorithm for the longest common subsequence problem (1987)" 這篇論文指出時間複雜度有錯誤
44
20140506周展億J.Y. Guo and F.K. Hwang. "An almost-linear time and linear space algorithm for the longest common subsequence problem". Information Processing Letters 94(2005), pp. 131-135
45
20140506方鈞毅Temple F.Smith, Michael S.Waterman and Christian Burks. "The statistical distribution of nucleic acid similarities". Nucleic Acids Research Volume 13 Number 2 1985, pp. 645–656
46
20140514郭冠呈Peter H. Sellers "ON THE THEORY AND COMPUTATION OF EVOLUTIONARY DISTANCES".SIAM J. APPL. MATH. Vol. 26, No. 4, June 1974, pp. 787-793此篇論文的方法是改變Needleman and Wunsch的方法,主要是利用兩序列的edit distance來找出兩序列的alignment長度
47
20140514郭冠呈T.F. Smith, M.S. Waterman, and W.M. Fitch. "Comparative Biosequence Metrics". Journal of Molecular Evolution 1981, pp.38-46這篇比較Needleman and Wunsch的方法(1970)以及Sellers的方法(1974)推導出一條式子:w'k = k/2 + wk,w'k為Sellers方法的penalty,wk為Needleman方法的penalty,當兩個方法所找到的alignment相同,會符合式子,藉此證明兩方法相同
48
20140519黃鼎耀D.F. Feng, M.S. Johnson, and R.F. DoolittleGeoffrey J. Barton and Michael J. E. Sternberg "A Strategy for the Rapid Multiple Alignment of Protein Sequences Confidence Levels from Tertiary Structure Comparisons". J. Mol. Biol. (1987)198,327-337這篇paper主要是提出一個能快速又精確的做出multiple alignment的策略,從三級結構序列資料比較做分析。
那方法首先是使用最基本的dp方法Needle-Wunsch algorithm (dayhoff的score table)做pairwise的alignment,兩條序列alignment 會產一個新的alignment,然後在跟下一條做alignment,當中也會realign alignment依此類推做下去完成multiple alignment。
multiple alignment的好壞跟選擇序列的順序有很大的關係,利用機率統計分析可以選擇到一個比較好的順序,從n!種選擇-->1種選擇,這樣可以讓multiple alignment 更快速的作出來。
49
20140526周展億A. Apostolico. "Remark on Hsu-Du New Algorithm for the LCS Problem". Information Processing Letters, Volume 25, Issue 4, 17 June 1987, pp. 235–236
50
20140610黃鼎耀吳哲賢、王志倫. "生物序列比對問題上考慮凸形間隔處罰函數的一些O(nm)演算法". Proc.of the 19th Workshop on Combinatorial Mathematics and Computation Theory這篇論文是針對凸形間隔處罰函數(convex gap penalty function),首先推導出擁有O(nm)演算法的凸形間隔處罰函數條件,並提出一些符合條件的建議函數
51
20140616詹德勝S. Kiran Kumar and C. Pandu Rangan. "A Linear Space Algorithm for the LCS Problem". Acta Informatica 24, pp. 353-362 (1987)使用divide and conquer的方法來時做LCS,時間複雜度為O(L(n-L)),L為LCS(A,B的長度),n為A字串的長度
52
20140616郭冠呈Reed A. Cartwright. "Ngila: global pairwise alignments with logarithmic and affine gap costs". Bioinformatics Applications Note Vol. 23 no. 11 2007, pp. 1427–1428介紹Ngila這個application,Ngila將統計模型的概念引入alignment中,提升預測的準確性
53
20140701詹皇廷C. K. Wong AND Ashok K. Chandra. "Bounds for the String Editing Problem". Journal of the ACM (JACM) Volume 23 Issue 1, Jan. 1976 pp. 13-16
54
20140708周展億Maxime Crochemore, Costas S. Iliopoulos, and Yoan J. Pinzon. "Speeding-up Hirschberg and Hunt-Szymanski LCS Algorithms". Fundamental Informaticae, Vol. 56, No. 1-2, pp. 89-103, 2003
55
20140708詹德勝Jiaoyun Yang, Yun Xu, Yi Shang. "An Efficient Parallel Algorithm for Longest Common Subsequence Problem on GPUs". Proceedings of the World Congress on Engineering 2010 Vol I WCE 2010, June 30 - July 2, 2010, London, U.K. pp. 499-504
56
20140715黃鼎耀Xiaoqiu Huang, Ross C.Hardison and Webb Miller. "A space-efficient algorithm for local similarities". CABIOS Vol 6. no.4. 1990 Pages 373 – 381這篇論文是結合 Myers and Miller(1988)和 Waterman and Eggert (1987)的方法,前者就是運用Hirschberg的divide and conquer的方法在Gotoh's algorithm上,來減少sapce的使用,後者是一個尋找k個local alignment的algorithm。
因此這篇的重點就是將尋找k個local alignment的algorithm結合上Hirschberg的divide and conquer的方法讓這個local similarities algorithm達到linear space
57
20140715郭冠呈Wen-Jing Hsu. "New Algorithms for Comparing Symbol Sequences". Proceeding ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow pp. 381-389
58
20140722周展億Francis Y.L. Chin, Alfredo De Santis, Anna Lisa Ferrara, N.L. Hoa, S.K. Kim. "A Simple Algorithm for the Constrained Sequence Problems". Information Processing Letters 90 pp.175-179(2004)
59
20140722詹皇廷Sebastian Deorowicz, Szymon Grabowski. "Efficient algorithms for the longest common subsequence in k-length substrings". Information Processing Letters, Volume 114, Issue 11, November 2013, Pages 634–638
60
20140729黃鼎耀Göbel U, Sander C, Schneider R, Valencia A.. "Correlated Mutations and Residue Contacts in Proteins". Proteins. 1994 Apr;18(4): pp.309-17這篇論文是去分析multiple sequence alignment上不同位置間的相關性,進一步的去預測他的contact map,再來判斷說到底準不準。
然後是用比較好的L/5的資料來分析,最後提到accuracy 從37%到68%不等,improvement ratio 從1.4到5.1
61
20140729詹德勝Heikki Hyyro. "Bit-Parallel LCS-length Computation Revisited". PR In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA) 2004
62
20140805郭冠呈Wen-Chen Hu, Gerhard X. Ritter, Mark S. Schmalz. "Approximating the Longest Approximate Common Subsequence Problem". Proceeding ACM-SE 36 Proceedings of the 36th annual Southeast regional conference pp. 166-172 (1998)
63
20140819詹皇廷Mauro Castelli, Stefano Beretta, Leonardo Vanneschi. "A hybrid genetic algorithm for the repetition free longest common subsequence problem". Operations Research Letters Volume 41, Issue 6, November 2013, Pages 644–649
64
20140819周展億Lasse Bergroth, Harri Hakonen, and Timo Raita. "New Approximation Algorithms for Longest Common Subsequences". String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings pp. 32-40
65
20140826黃鼎耀Arindam Banerjee and Joydeep Ghosh. "Clickstream Clustering using Weighted Longest Common Subsequences". In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining這篇論文是針對網站上的使用者瀏覽資料進行分析,作者提出一套新的且有效率的演算法。首先是Path intersection 也就是找出兩個使用者瀏覽過相同的網頁,再來計算相似度考慮路徑以及待在網站上的時間,再來是針對不同使用者的瀏覽資訊進行分群,最後再利用概念式的分群將我們認知上會是差不多的路徑歸類成同一種,方便我們判斷是不是同一群。
66
20140826詹德勝Sebastian deorowicz and Agnieszka Danek. "Bit-Parallel Algorithms for the Merged Longest Common Subsequence Problem". International Journal of Foundations of Computer Science Vol. 24, No. 8 (2013) pp. 1281-1298
67
68
69
70
71
20141001徐子強A. Apostolico and C. Guerra. "The Longest Common Subsequence Problem Revisited". Algorithmica,Volume 2,Issue 1-4,pp. 315-336 (1987)
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100