![]() |
|
||||||||||||||
| | 网站首页 | 数据库教程 | web编程 | 服务器 | 程序设计 | | ||
|
||
|
||||||
| Boyer-Moore String Searching Algorithm and others | ||||||
作者:佚名 文章来源:不详 点击数: 更新时间:2007-9-12 ![]() |
||||||
|
(题外话:无知是最大的罪恶。
Simply said, KMP creates the lookup table from left to right, while BM creates the table from right to the left. The following example is from the website of the inventor (m http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html 1. next['s']==7 // no 's' in the pattern, skip the whole string EXAMPLE 2. 'E' != 'P' (sorry, letter misaligned); next['P']==2, then go 2 letters ahead SEXAMLE 3. This time, 'A' != 'I', next['I']=7, then skip the whole string again. However, we can do better because we already found the pattern "MPLE". So we can use this information to align the discovered occurance with its last occurance in the pattern. This means we need to create another table for each post fix substring ("E", "LE", "PLE", "MPLE", "AMPLE" ...), so table["E"]==1, table["LE"]==1, table["PLE"]==1, etc. EXAMPL 4. 'E' != 'P', keep going. next['P']==2, then go 2 steps forward. MPLEXAMLE 5. eventually we got it: EXAMPL
The following is the implementing in "Algorithms in C". The table initiation is skipped since it's not that difficult anyway. int mischarsearch(char *p, char *a) return i;
The BM algorithm doesn't fit for binary string very well, but we can group bits to make "characters". However, there are some consideration: if we take b bits, then we need a table of 2b entries, thus b should be small so the tale is not too large. Then there are M-b+1 different b-bit sections in the pattern, so we want M-b+1 to be significantly less than 2b. For example, if we take b about lg(4M), then the table will be more than 3/4 filled with M entries (?). Also b must be less than M/2, otherwise the pattern could be missed if we split between two b-bit sections.
On pp.289 in "Algorithms in C", hash based string searching is mentioned (Rabin-Karp). Acutally, it is very similar to the solution on my 1st phone interview with Google. The initial idea is to create a hash value for each word (or each M-characters) in the text. While this is not feasible if the text is too huge, we just need to compute the hash value of the pattern and compare it with each M-characters in the text (computation is done in real-time) [ damn, i invented this within less than 10 mins in the google phone interveiw although I never read this algorithm before. still got rejected......maybe i didn't make my idea clear) Detailed algorithms is on pp.289 KMP and BM takes O(N+M). For RK, the worst case would be O(NM), but usually in practice it relies on N+M steps.
Other solutions could be binary tree, Patricia (B-tree). It's probably worthy to read Markov chain. 本文来源:http://blog.csdn.net/likexx/archive/2007/08/26/1759271.aspx
|
||||||
| 文章录入:admin 责任编辑:admin | ||||||
| 【发表评论】【加入收藏】【告诉好友】【打印此文】【关闭窗口】 | ||||||
| 网友评论:(只显示最新10条。评论内容只代表网友观点,与本站立场无关!) |
| | 设为首页 | 加入收藏 | 联系站长 | 友情链接 | 版权申明 | 网站公告 | 网站地图 | 管理登录 | | |||
|