网站公告列表

  没有公告

加入收藏
设为首页
联系站长
您现在的位置: 网络学院 >> 程序设计 >> VC编程 >> 文章正文
  Boyer-Moore String Searching Algorithm and others            【字体:
Boyer-Moore String Searching Algorithm and others
作者:佚名    文章来源:不详    点击数:    更新时间:2007-9-12    

(题外话:无知是最大的罪恶。
正在装载数据……
以前虽然知道UT Austin/Texas A&M排名不错,但是总觉得仅仅是排名罢了,并不觉得这两个学校如何了不起,总觉得超级大牛都是在MIT/Stanford/berkley里面----例如Knuth......现在认真重新复习C++和算法后,才懂得自己的无知。C++的发明者BJ是在Texas A&M,下面说的Boyer-Moore中的Moore是在UT Austin----唉,早知道,当年死也要申请这两个学校呀!!前次在Austin某公司interview的时候,让我写两个C程序的那位须发皆白的就说自己以前是Texas A&M 物理系的professor,欧心理还想你一个物理系出身的对c有多少了解哟-----自己真是不知道天高地厚) 

Simply said, KMP creates the lookup table from left to right, while BM creates the table from right to the left.

The following example is from the website of the inventor (m

http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html

1. next['s']==7   // no 's' in the pattern, skip the whole string

EXAMPLE
HERE I
S A SIMPLE EXAMPLE

 2. 'E' != 'P' (sorry, letter misaligned);  next['P']==2, then go 2 letters ahead

        SEXAMPLE
HERE IS A SIM
PLE EXAMPLE

3. This time, 'A' != 'I', next['I']=7, then skip the whole string again. However, we can do better because we already found the pattern "MPLE". So we can use this information to align the discovered occurance with its last occurance in the pattern. This means we need to create another table for each post fix substring ("E", "LE", "PLE", "MPLE", "AMPLE" ...), so table["E"]==1, table["LE"]==1, table["PLE"]==1, etc.

                 EXAMPLE
HERE IS A SIMPL
E EXAMPLE

4. 'E' != 'P', keep going. next['P']==2, then go 2 steps forward.

                       MPLEXAMPLE
HERE IS A SIMPLE EXAM
PLE

5. eventually we got it:

                                 EXAMPLE
HERE IS A SIMPLE EXAMPL
E

 


 

The following is the implementing in "Algorithms in C". The table initiation is skipped since it's not that difficult anyway.

int mischarsearch(char *p, char *a)
{
 int i,j,t, M=strlen(p), N=strlen(a);
 initskip(p);
 for (i=M-1, j=M-1; j>0; i--, j--)
 // both starts from the right most letter
 {
  while (a[i] != p[j])
  {
   t=skip[index(a[i])];
   i+= (M-j>t)?M-j:t;   //use the rightmost one; this acutally doesn't matter since it depends on how you design your implementing
   if (i>=N)  return N;
   j=M-1;                  // reset the pointer to the right most
  }
 }

 return i;
}

 


 

 

The BM algorithm doesn't fit for binary string very well, but we can group bits to make "characters". However, there are some consideration:

if we take b bits, then we need a table of 2b entries, thus b should be small so the tale is not too large. Then there are M-b+1 different b-bit sections in the pattern, so we want M-b+1 to be significantly less than 2b. For example, if we take b about lg(4M), then the table will be more than 3/4 filled with M entries (?). Also b must be less than M/2, otherwise the pattern could be missed if we split between two b-bit sections.

 


 

 

On pp.289 in "Algorithms in C", hash based string searching is mentioned (Rabin-Karp).

Acutally, it is very similar to the solution on my 1st phone interview with Google. The initial idea is to create a hash value for each word (or each M-characters) in the text. While this is not feasible if the text is too huge, we just need to compute the hash value of the pattern and compare it with each M-characters in the text (computation is done in real-time) [ damn, i invented this within less than 10 mins in the google phone interveiw although I never read this algorithm before. still got rejected......maybe i didn't make my idea clear)

Detailed algorithms is on pp.289

KMP and BM takes O(N+M). For RK, the worst case would be O(NM), but usually in practice it relies on N+M steps.

 


 

 Other solutions could be binary tree, Patricia (B-tree). It's probably worthy to read Markov chain.




本文来源:http://blog.csdn.net/likexx/archive/2007/08/26/1759271.aspx
站内文章搜索 高级搜索
文章录入:admin    责任编辑:admin 
  • 上一篇文章:

  • 下一篇文章:
  • 发表评论】【加入收藏】【告诉好友】【打印此文】【关闭窗口
    最新热点 最新推荐 相关文章
     directx 图形接口指南(…
     win2k下的api函数的拦截
     用crypto  api  实现公钥…
     根据别人的md5源码封装的…
     vc中使用gdi+合并jpg图片
     document/view的交互 --…
     windows下的函数hook技术
     windows api函数大全一
     用vc 6.0实现串行通信的…
     vc++技术内幕(第四版)…
  • 从头开始认识jboss

  • 浅析Spring框架下PropertyPl…

  • SPRING+STRUTS+HIBERNATE登录…

  • SIP简介,第2部分:SIP SERV…

  • JavaWeb中的Session、Sessio…

  • tomcat下配置jspservletbean…

  • chapter one

  • 看完了第二遍C++Primer,学习…

  • MVP——Model-Viewer-Presen…

  • boost.any源码重列

  •   网友评论:(只显示最新10条。评论内容只代表网友观点,与本站立场无关!)
    网络学院©2007 www.23book.net
    为您提供web编程,vb编程,vc编程,服务器架设管理,数据库设计等方面的知识 站长:David