教程集 www.jiaochengji.com
教程集 >  Golang编程  >  golang教程  >  正文 三种字符串查找算法的Go实现

三种字符串查找算法的Go实现

发布时间:2021-12-21   编辑:jiaochengji.com
教程集为您提供三种字符串查找算法的Go实现等资源,欢迎您收藏本站,我们将为您提供最新的三种字符串查找算法的Go实现资源
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><path stroke-linecap="round" d="M5,0 0,2.5 5,5z" id="raphael-marker-block" style="-webkit-tap-highlight-color: rgba(0, 0, 0, 0);"/></svg><blockquote>

原文转载自我的个人网站 zhangmingkai.cn ,欢迎大家访问。

</blockquote>

字符串查找就是给定一段文字,查找所有包含特定单词的方式,当我们使用网站浏览信息的时候,用Ctr F搜索网页, 或在linux上使用grep查询日志文件中的特殊字符串都是属于这类模式。所以算是一种比较常见的算法,本文将总结一些字符串查找算法常见的实现方式,以及如何用Go语言实现该算法。

主要的算法分为三种:

<ul><li>暴力遍历算法</li><li>KMP算法</li><li>BM算法</li></ul><h4>1. 暴力遍历算法</h4>

暴力遍历算法,简单来说就是通过一个字符接一个字符的移动比对。比如下面的实例图, 原始的字符文本(长度为N)放在第一行,需要匹配的字符文本(长度为M)在第二行,不断位移第二行待匹配的字符串来观察是否相同,这里每一次观察都要进行最多M次的比较。

在很多程序语言实现中,都直接利用暴力遍历来获取匹配结果。 Go语言中对于AMD64的机器,设置的最长的位置是64字节,也就是如果原始文本和匹配内容低于64字节则直接利用暴力遍历,对于大于64字节的情况下,使用快速查找算法(另一种遍历算法),但会随计算输出调整算法模式

<ul><li>如果原始文本大于64字节, 匹配文本小于64字节,则调整为暴力遍历</li><li>如果匹配和原始文本都大于64字节,则调整为Rabin Karp算法,一种指纹搜索算法(O(1)的空间使用以及线性的搜索时间)。</li></ul>

关于暴力遍历的具体的Go代码实现如下, 顺序位移匹配,查询是否完成所有匹配,如果完成则代表匹配成功,否则代表匹配失败(中途break):

<pre><code class="lang-go hljs"><span class="token keyword">func</span> <span class="token function">bruteForceStringMatching</span><span class="token punctuation">(</span>src <span class="token builtin">string</span><span class="token punctuation">,</span> match <span class="token builtin">string</span><span class="token punctuation">)</span> <span class="token builtin">bool</span> <span class="token punctuation">{</span> m <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>match<span class="token punctuation">)</span> n <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>src<span class="token punctuation">)</span> <span class="token keyword">for</span> i <span class="token operator">:=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator"><</span> n<span class="token operator">-</span>m<span class="token punctuation">;</span> i<span class="token operator"> </span> <span class="token punctuation">{</span> j <span class="token operator">:=</span> <span class="token number">0</span> <span class="token keyword">for</span> j <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> j <span class="token operator"><</span> m<span class="token punctuation">;</span> j<span class="token operator"> </span> <span class="token punctuation">{</span> <span class="token keyword">if</span> src<span class="token punctuation">[</span>i<span class="token operator"> </span>j<span class="token punctuation">]</span> <span class="token operator">!=</span> match<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token punctuation">{</span> <span class="token keyword">break</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token keyword">if</span> j <span class="token operator">==</span> m <span class="token punctuation">{</span> <span class="token keyword">return</span> <span class="token boolean">true</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token keyword">return</span> <span class="token boolean">false</span> <span class="token punctuation">}</span> <span class="token keyword">func</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token function">println</span><span class="token punctuation">(</span><span class="token function">bruteForceStringMatching</span><span class="token punctuation">(</span><span class="token string">"this is a simple test text"</span><span class="token punctuation">,</span> <span class="token string">"hello"</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token comment">// false</span> <span class="token function">println</span><span class="token punctuation">(</span><span class="token function">bruteForceStringMatching</span><span class="token punctuation">(</span><span class="token string">"this is a hello message"</span><span class="token punctuation">,</span> <span class="token string">"hello"</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token comment">// true</span> <span class="token punctuation">}</span> </code></pre>

KMP算法是在1976年由Knuth, Morris和Pratt提出并发表的一种字符查找算法。KMP算法的主要原理是通过计算最长前缀的方式来获得匹配表对应的跳跃表, 后面的匹配过程中利用前缀匹配来计算需要跳跃的位置。实例如下:

<h4>
2. KMP算法</h4>

暴力遍历的算法简单,平均需要遍历的时间基本上与N * M成正比,但是实际运行中我们可以看到,大部分开头第一个单词不匹配就退出了这一轮的检查,因此实际基本与N成正比。当然如果查询的原始文本类似AAAAAAAAAAAA, 查询的是AAAAB, 那么依旧需要遍历所有的字符串,查询时间变为N * M。

<pre><code class="lang-go hljs">S<span class="token punctuation">:</span> ABCFABCDABFABCDABCDABDE W<span class="token punctuation">:</span> ABCDABD </code></pre>

我们按照暴力遍历的方式开始比较,发现搜索字段W的第四个字符”D”和上面的文本S中的“F”不匹配, 这时候如果按照暴力遍历,我们需要将W位移一位,但是我们知道前面的三个字符也和”F”不一致,其实可以直接跳过去,开始新的遍历, 如下面所示:

<pre><code class="lang-go hljs">S<span class="token punctuation">:</span> ABCFABCDABFABCDABCDABDE W<span class="token punctuation">:</span> ABCDABD </code></pre>

第二次匹配过程中发现F和D不一致,但是同时我们发现后面出现了的AB和字符前面的AB可以对齐,因此我们直接位移到前一个AB的位置就可以开始第三次匹配了,

<pre><code class="lang-go hljs">S<span class="token punctuation">:</span> ABCFABCDABFABCDABCDABDE W<span class="token punctuation">:</span> ABCDABD </code></pre>

这就是KMP算法的原理。为了计算跳跃位置比如上面的搜索词语ABCDABD,我们需要依次查看每个单词出现的位置和前缀组合, 下面是通过拆分获取的组合信息,形成的最长前缀组合数组为[0,0,0,0,1,2,0]

<pre><code class="lang-go hljs"><span class="token number">0</span> A <span class="token number">0</span> AB <span class="token number">0</span> ABC <span class="token number">0</span> ABCD <span class="token number">1</span> ABCDA <span class="token number">2</span> ABCDAB <span class="token number">0</span> ABCDABD </code></pre>

为了后面更好的计算我们需要对于该前缀表进行移位操作, 计算出实际需要的Next查询表,这个表代表的是除了当前字符以外的最长前缀表,因此需要将数组向右移位,第一位用-1进行补齐。完成移位操作后的数组为[-1, 0,0,0,0,1,2 ]。

实际计算过程中通过查询跳跃表获取新的匹配位置进行位移即可

<pre><code class="lang-go hljs"><span class="token keyword">func</span> <span class="token function">getPrefixTable</span><span class="token punctuation">(</span>search <span class="token builtin">string</span><span class="token punctuation">)</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">int</span> <span class="token punctuation">{</span> ptLength <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>search<span class="token punctuation">)</span> pt <span class="token operator">:=</span> <span class="token function">make</span><span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">int</span><span class="token punctuation">,</span> ptLength<span class="token punctuation">)</span> pt<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">0</span> <span class="token builtin">len</span> <span class="token operator">:=</span> <span class="token number">0</span> i <span class="token operator">:=</span> <span class="token number">1</span> <span class="token keyword">for</span> i <span class="token operator"><</span> ptLength <span class="token punctuation">{</span> <span class="token keyword">if</span> search<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">==</span> search<span class="token punctuation">[</span><span class="token builtin">len</span><span class="token punctuation">]</span> <span class="token punctuation">{</span> <span class="token builtin">len</span><span class="token operator"> </span> pt<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token builtin">len</span> i<span class="token operator"> </span> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{</span> <span class="token keyword">if</span> <span class="token builtin">len</span> <span class="token operator">></span> <span class="token number">1</span> <span class="token punctuation">{</span> <span class="token builtin">len</span> <span class="token operator">=</span> pt<span class="token punctuation">[</span><span class="token builtin">len</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{</span> pt<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token builtin">len</span> i<span class="token operator"> </span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token keyword">return</span> pt <span class="token punctuation">}</span> <span class="token keyword">func</span> <span class="token function">shiftPT</span><span class="token punctuation">(</span>pt <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">int</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token builtin">len</span> <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>pt<span class="token punctuation">)</span> <span class="token keyword">for</span> i <span class="token operator">:=</span> <span class="token builtin">len</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">;</span> i <span class="token operator">></span> <span class="token number">0</span><span class="token punctuation">;</span> i<span class="token operator">--</span> <span class="token punctuation">{</span> pt<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> pt<span class="token punctuation">[</span>i<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token punctuation">}</span> pt<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token operator">-</span><span class="token number">1</span> <span class="token punctuation">}</span> <span class="token keyword">func</span> <span class="token function">searchWithPT</span><span class="token punctuation">(</span>text <span class="token builtin">string</span><span class="token punctuation">,</span> pattern <span class="token builtin">string</span><span class="token punctuation">)</span> <span class="token builtin">bool</span> <span class="token punctuation">{</span> pt <span class="token operator">:=</span> <span class="token function">getPrefixTable</span><span class="token punctuation">(</span>pattern<span class="token punctuation">)</span> <span class="token function">shiftPT</span><span class="token punctuation">(</span>pt<span class="token punctuation">)</span> fmt<span class="token punctuation">.</span><span class="token function">Printf</span><span class="token punctuation">(</span><span class="token string">"pt=%v"</span><span class="token punctuation">,</span> pt<span class="token punctuation">)</span> M <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>text<span class="token punctuation">)</span> N <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>pattern<span class="token punctuation">)</span> <span class="token keyword">if</span> N <span class="token operator">></span> M <span class="token punctuation">{</span> <span class="token keyword">return</span> <span class="token boolean">false</span> <span class="token punctuation">}</span> <span class="token comment">// i 追踪text的位置 , j 追踪pattern的位置</span> i<span class="token punctuation">,</span> j <span class="token operator">:=</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span> found <span class="token operator">:=</span> <span class="token number">0</span> <span class="token keyword">for</span> i <span class="token operator"><</span> M <span class="token punctuation">{</span> <span class="token keyword">if</span> j <span class="token operator">==</span> N<span class="token operator">-</span><span class="token number">1</span> <span class="token operator">&&</span> text<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">==</span> pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token punctuation">{</span> fmt<span class="token punctuation">.</span><span class="token function">Printf</span><span class="token punctuation">(</span><span class="token string">"found pattern \"%s\" at index %d\n"</span><span class="token punctuation">,</span> pattern<span class="token punctuation">,</span> i<span class="token operator">-</span>j<span class="token punctuation">)</span> found<span class="token operator"> </span> j <span class="token operator">=</span> pt<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token punctuation">}</span> <span class="token keyword">if</span> text<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">==</span> pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token punctuation">{</span> i<span class="token operator"> </span> j<span class="token operator"> </span> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{</span> j <span class="token operator">=</span> pt<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token keyword">if</span> j <span class="token operator">==</span> <span class="token operator">-</span><span class="token number">1</span> <span class="token punctuation">{</span> i<span class="token operator"> </span> j<span class="token operator"> </span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> fmt<span class="token punctuation">.</span><span class="token function">Printf</span><span class="token punctuation">(</span><span class="token string">"find %d pattern in the text \n"</span><span class="token punctuation">,</span> found<span class="token punctuation">)</span> <span class="token keyword">return</span> found <span class="token operator">></span> <span class="token number">0</span> <span class="token punctuation">}</span> </code></pre>

KMP算法实现起来没有前面的暴力算法那么直接,但是可以保证现行级别的性能,且不需要再查询中进行回退操作。

<h4>
3. Boyer & Moore算法</h4>

BM算法有R.S.Boyer和J.S.Moore发明,被用在许多的文本编辑类的应用程序。B&M算法在Go源码的stringFinder结构体中实现,用来高效的实现字符串的查找。B&M算法的原理如下,下面的例子是在基因序列中查找特定的基因组合。比如TATGT组合,算法从右向左开始匹配,第一个匹配的是C和G,我们可以看到C并没有存在匹配序列中,因此我们可以直接将匹配序列向前移动整个序列长度的范围, 跳到第二个图例的位置上。

比如比对发现的不是C和G,而是A和G, 那么我们就将序列向前移动两个位置,使得A和匹配位置A对齐。

因此B&M算法需要预先计算一个查询表,用于查询跳跃的位置数。这个表格可以通过下面的代码生成, 对于不存在在匹配表的所有字符,全部设置为-1, 对于其他存在在匹配表的字符,存放实际的最右端的位置。比如有两个A,则仅存储最右侧的A, 用于实际移动距离计算使用。

<pre><code class="lang-go hljs"><span class="token keyword">const</span> NumberOfChars <span class="token operator">=</span> <span class="token number">256</span> <span class="token keyword">func</span> <span class="token function">genBadCharTable</span><span class="token punctuation">(</span>pattern <span class="token builtin">string</span><span class="token punctuation">,</span> size <span class="token builtin">int</span><span class="token punctuation">)</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">int</span> <span class="token punctuation">{</span> badCharTable <span class="token operator">:=</span> <span class="token function">make</span><span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">int</span><span class="token punctuation">,</span> NumberOfChars<span class="token punctuation">)</span> <span class="token keyword">for</span> i <span class="token operator">:=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator"><</span> NumberOfChars<span class="token punctuation">;</span> i<span class="token operator"> </span> <span class="token punctuation">{</span> badCharTable<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token operator">-</span><span class="token number">1</span> <span class="token punctuation">}</span> <span class="token keyword">for</span> i <span class="token operator">:=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator"><</span> size<span class="token punctuation">;</span> i<span class="token operator"> </span> <span class="token punctuation">{</span> badCharTable<span class="token punctuation">[</span><span class="token function">int</span><span class="token punctuation">(</span>pattern<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">]</span> <span class="token operator">=</span> i <span class="token punctuation">}</span> <span class="token keyword">return</span> badCharTable <span class="token punctuation">}</span> </code></pre>

利用上面的表格进行实际搜索的代码如下面所示,skip变量存放跳跃的个数,该个数通过当前比较不一致的时候,查询跳跃表计算获得,如果内循环结束,且Skip为0则代表全部匹配成功,并找到查询的内容否则则根据实际情况进行跳跃。

<pre><code class="lang-go hljs"><span class="token keyword">func</span> <span class="token function">searchWithBM</span><span class="token punctuation">(</span>text <span class="token builtin">string</span><span class="token punctuation">,</span> pattern <span class="token builtin">string</span><span class="token punctuation">)</span> <span class="token builtin">int</span> <span class="token punctuation">{</span> M <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>pattern<span class="token punctuation">)</span> N <span class="token operator">:=</span> <span class="token function">len</span><span class="token punctuation">(</span>text<span class="token punctuation">)</span> badCharTable <span class="token operator">:=</span> <span class="token function">genBadCharTable</span><span class="token punctuation">(</span>pattern<span class="token punctuation">,</span> M<span class="token punctuation">)</span> skip <span class="token operator">:=</span> <span class="token number">0</span> <span class="token keyword">for</span> i <span class="token operator">:=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator">&</span>lt<span class="token punctuation">;</span><span class="token operator">=</span> N<span class="token operator">-</span>M<span class="token punctuation">;</span> i <span class="token operator"> =</span> skip <span class="token punctuation">{</span> <span class="token keyword">for</span> j <span class="token operator">:=</span> M <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">;</span> j <span class="token operator">>=</span> <span class="token number">0</span><span class="token punctuation">;</span> j<span class="token operator">--</span> <span class="token punctuation">{</span> <span class="token function">println</span><span class="token punctuation">(</span>i<span class="token punctuation">,</span> j<span class="token punctuation">)</span> <span class="token keyword">if</span> text<span class="token punctuation">[</span>i<span class="token operator"> </span>j<span class="token punctuation">]</span> <span class="token operator">!=</span> pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token punctuation">{</span> skip <span class="token operator">=</span> j <span class="token operator">-</span> badCharTable<span class="token punctuation">[</span>text<span class="token punctuation">[</span>i<span class="token operator"> </span>j<span class="token punctuation">]</span><span class="token punctuation">]</span> <span class="token keyword">if</span> skip <span class="token operator">&</span>lt<span class="token punctuation">;</span> <span class="token number">1</span> <span class="token punctuation">{</span> skip <span class="token operator">=</span> <span class="token number">1</span> <span class="token punctuation">}</span> <span class="token keyword">break</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token keyword">if</span> skip <span class="token operator">==</span> <span class="token number">0</span> <span class="token punctuation">{</span> fmt<span class="token punctuation">.</span><span class="token function">Printf</span><span class="token punctuation">(</span><span class="token string">"found pattern %s\n"</span><span class="token punctuation">,</span> pattern<span class="token punctuation">)</span> <span class="token keyword">return</span> i <span class="token punctuation">}</span> <span class="token punctuation">}</span> fmt<span class="token punctuation">.</span><span class="token function">Println</span><span class="token punctuation">(</span><span class="token string">"no pattern been found"</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span> <span class="token punctuation">}</span> </code></pre>

实际的算法中还可能结合一个正向匹配的表格一并进行计算,每次移动都进行两次移动,算出最远的匹配。但是两次计算的效率是否更高效,需要实际测试才能知道。

<h4>
总结</h4>

关于三种常见的字符串比较算法,应用场景其实存在比较大的区别,对于暴力遍历,比较易于理解,且对于较短的字符处理高效直接,不需要额外的数组空间存储其他的数据,但是对于较大量的数据搜索需要的时间可能相对较长。

KMP算法能够保证线行级别性能,BM算法也算是一种线行级别性能的比较算法,但是BM算法在一般情况下跳跃的字符会更多一些,执行搜索的效率也会更好,另外KMP和BM都需要额外的内存空间存储数据,对于较大的比较数据,需要综合考虑实际的情况进行选择。

PS: 另外如果需要执行多个模式的同时匹配,可利用Aho-Corasick算法实现比较好的搜索性能

到此这篇关于“三种字符串查找算法的Go实现”的文章就介绍到这了,更多文章或继续浏览下面的相关文章,希望大家以后多多支持JQ教程网!

您可能感兴趣的文章:
想系统学习GO语言(Golang
php字符串函数有哪些
php字符串查找函数(strrpos与strchr)
php 查找字符串常用函数说明
Javascript正则表达式详解(一)
php5 字符串处理函数汇总
php查找字符串学习笔记
php 字符串函数教程与实例代码
php字符串比较与查找方法详解
php字符串查找 查找字符最后一次出现位置

[关闭]
~ ~