
Handling the invisible Unicode character 0x08 in Go

Published: 2022-03-21   Editor: jiaochengji.com
<h2>Time Is an Illusion</h2>

A call to the Evernote (印象笔记) sync API failed. The error looked roughly like this:
<mark>golang An invalid XML character (Unicode: 0x8) was found</mark>

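The cause is that the text contains invisible Unicode characters, in this case U+0008 (backspace). So the goal is clear: filter the invisible characters out.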
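To find out which character triggers the error, it can help to check the text against the `Char` production of XML 1.0, which allows tab, line feed, and carriage return but rejects the other control characters below U+0020. The following is only a sketch and not the code from this post; `isXMLChar` and `firstInvalidXMLChar` are hypothetical helper names.

```go
package main

import "fmt"

// isXMLChar reports whether r is allowed by the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
// Hypothetical helper, written for illustration only.
func isXMLChar(r rune) bool {
	switch {
	case r == 0x09, r == 0x0A, r == 0x0D:
		return true
	case r >= 0x20 && r <= 0xD7FF:
		return true
	case r >= 0xE000 && r <= 0xFFFD:
		return true
	case r >= 0x10000 && r <= 0x10FFFF:
		return true
	}
	return false
}

// firstInvalidXMLChar returns the byte offset and value of the first rune in s
// that XML 1.0 rejects, or (-1, 0) if every rune is acceptable.
func firstInvalidXMLChar(s string) (int, rune) {
	for i, r := range s {
		if !isXMLChar(r) {
			return i, r
		}
	}
	return -1, 0
}

func main() {
	note := "hello\bworld" // \b is U+0008, the character named in the error
	if i, r := firstInvalidXMLChar(note); i >= 0 {
		fmt.Printf("invalid XML character U+%04X at byte offset %d\n", r, i)
	}
}
```

Note that under this rule tab and newline are legal, so a filter aimed only at satisfying the XML parser could keep them.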

I turned to "unicode/utf8" for this, since the default character set of our database is also UTF-8. Here is the package description:

<blockquote>

Package utf8 implements functions and constants to support text encoded in UTF-8. It includes functions to translate between runes and UTF-8 byte sequences. See https://en.wikipedia.org/wiki/UTF-8

</blockquote>

So the input has to be UTF-8 text; I suspect GBK is not supported. If you are interested, leave a comment and let me know.
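The GBK question can be checked directly. The sketch below assumes that the GBK encoding of 中 is the byte pair 0xD6 0xD0, hard-coded here to avoid a third-party decoder; `describe` is a hypothetical helper. `utf8.ValidString` reports those bytes as invalid UTF-8, and converting them to `[]rune` yields U+FFFD replacement characters instead of the original character, so GBK text would have to be decoded to UTF-8 before filtering.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports whether s is valid UTF-8 and which runes it decodes to.
// Hypothetical helper, written for illustration only.
func describe(s string) (bool, []rune) {
	return utf8.ValidString(s), []rune(s)
}

func main() {
	utf8Text := "中"        // UTF-8 encoded, 3 bytes
	gbkText := "\xd6\xd0" // assumed GBK bytes for the same character

	for _, s := range []string{utf8Text, gbkText} {
		valid, runes := describe(s)
		fmt.Printf("bytes=% x valid=%v runes=%U\n", s, valid, runes)
	}
}
```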

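For a Unicode reference table, I found a well-formatted CSDN article that is worth a look: "ASCII and Unicode Code Tables: ASCII Control Characters, Unicode Encoding, and the History of Character Encodings". The relevant part is its table of ASCII control characters (code points 0 to 31, plus 127).

These are all invisible characters, and all of them need to be stripped out.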

<pre><code class="lang-go hljs">// Typed by hand, so there may be mistakes; fix them yourself if you hit one.
package main

import "fmt"

func main() {
	originStr := "the string to process"
	// Convert the string to a slice of runes.
	srcRunes := []rune(originStr)
	// Create a new rune slice to hold the filtered data.
	dstRunes := make([]rune, 0, len(srcRunes))
	// Drop invisible characters: per the table above, 0-31 and 127.
	for _, c := range srcRunes {
		if c <= 31 || c == 127 {
			continue
		}
		dstRunes = append(dstRunes, c)
	}
	result := string(dstRunes)
	fmt.Println(result)
}
</code></pre>

I have not benchmarked this code, so I don't know how it performs, but my gut feeling is that it is probably not great.
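One way to avoid the two intermediate rune slices is `strings.Map` from the standard library, which drops a character whenever the mapping function returns a negative value. The sketch below applies the same rule as the loop above (code points 0 to 31 and 127); `stripControl` is a hypothetical name, and this variant has not been benchmarked here either.

```go
package main

import (
	"fmt"
	"strings"
)

// stripControl removes the ASCII control characters U+0000..U+001F and U+007F.
// Hypothetical helper, written for illustration only.
func stripControl(s string) string {
	return strings.Map(func(r rune) rune {
		if r <= 0x1F || r == 0x7F {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", stripControl("ab\bc\x7fd"))
}
```

Like the original loop, this also removes tabs and newlines, which may or may not be what the note content needs.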

<h2>There Is No Present</h2>

So what type is rune? Here is its definition:

<pre><code class="lang-go hljs">// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32
</code></pre>
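Because `rune` is just `int32`, a rune can be compared with integer literals directly, which is what the filter relies on. A small sketch follows; the sample string and the `countControl` helper are made up for illustration.

```go
package main

import "fmt"

// countControl counts the runes in s that are in 0..31 or equal to 127.
// Hypothetical helper, written for illustration only.
func countControl(s string) int {
	n := 0
	for _, c := range s { // ranging over a string decodes UTF-8 into runes
		if c <= 31 || c == 127 {
			n++
		}
	}
	return n
}

func main() {
	var r rune = '\b'
	var i int32 = r // no conversion needed: rune is an alias for int32
	fmt.Println(i, countControl("a\bb\tc"))
}
```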
