How can a Bash script control the terminal's character encoding?

sanz · #1

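I wrote a script that fetches a web page and extracts the useful data from it, and that part works fine. The problem is that my terminal's default character set is UTF-8, while the extracted text is GB2312. Every time I run the script I have to set the display character set by hand first and then set it back afterwards, which is tedious.
How can I have this set automatically?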

sanz · #2

Does nobody know?

eexpress · #3

Just convert the encoding directly in bash.
iconv -f gb18030 -t UTF-8 "$j" > tmp
Work out the rest yourself.
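
A minimal sketch of how that suggestion might be applied to the original problem: instead of switching the terminal, the script converts its GB output to whatever encoding the terminal already uses. The helper name `gb_to_term` is hypothetical, and it assumes the terminal's encoding matches the current locale as reported by `locale charmap`.

```shell
# Sketch: convert GB-encoded text on stdin to the terminal's encoding.
# gb_to_term is a hypothetical helper name, not part of any tool.
gb_to_term() {
    # Assumption: the terminal uses the current locale's charset.
    # Pass an explicit encoding name as $1 to override that default.
    local target="${1:-$(locale charmap)}"
    # GB18030 is a superset of GB2312, so it also decodes GB2312 pages.
    iconv -f GB18030 -t "$target"
}
```

Piping the extraction step through it, for example `your_extract_command | gb_to_term` (where `your_extract_command` stands for whatever produces the GB2312 text), would leave the terminal settings untouched.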

sanz · #4

Thanks to the poster above. That was a real eye-opener...

roylez · #5

eexpress wrote: Just convert the encoding directly in bash.
iconv -f gb18030 -t UTF-8 "$j" > tmp
Work out the rest yourself.

You even have an answer for this... I admire you more and more.

roylez · #6

From: samhui (RC), Board: Linux
Subject: FreeBSD Chinese HOWTO [repost] (10)
Posted from: 逸仙时空 Yat-sen Channel (Sat Dec 14 10:39:16 2002)

Chapter 10. Chinese Encoding Conversion Software

_________________________________________________________________

10.1. iconv

Installing iconv:

# cd /usr/ports/converters/iconv

# make install

Basic usage:

% iconv -f gb2312 -t big5 file.gb > file.big5
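
When the conversion is scripted, one concern is that a failed run leaves a truncated output file behind. The following is a rough sketch of one way to guard against that; `convert_file` is a hypothetical helper name, and the temporary-file naming is an assumption made for illustration.

```shell
# Sketch: convert one file and keep the result only if iconv succeeds.
# convert_file is a hypothetical helper; arguments: SRC FROM TO DEST.
convert_file() {
    local src=$1 from=$2 to=$3 dest=$4
    local tmp="$dest.tmp.$$"   # scratch file next to the destination
    if iconv -f "$from" -t "$to" "$src" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        # iconv reported a conversion error; discard the partial output.
        rm -f "$tmp"
        return 1
    fi
}
```

With this in place, the command above would correspond to `convert_file file.gb gb2312 big5 file.big5`.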

WWW: http://www.dante.net/staff/konstantin/FreeBSD/iconv/

_________________________________________________________________

10.2. cn2jp - a library for converting between Chinese and Japanese encodings

A program that converts between Chinese {GB,Big5,HZ} and Japanese (EUC-Jis/Shift-Jis/Jis) encodings.

Installing cn2jp:

# cd /usr/ports/converters/cn2jp

# make install

Basic usage:

% b2j < file.big5 > file.jis

% g2j < file.gb > file.jis

% j2b < file.jis > file.big5

% g2b < file.gb > file.big5

% j2g < file.jis > file.gb

% b2g < file.big5 > file.gb

Several library functions are also available:

char *lang_big5_to_eucjis(istr)

;translate Big5 in istr to EUC-Jis in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory big2jis

char *lang_gb_to_eucjis();

;translate GB in istr to EUC-Jis in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory gb2jis

char *lang_eucjis_to_big5(istr)

;translate EUC-Jis in istr to Big5 in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory jis2big

char *lang_gb_to_big5(istr)

;translate GB in istr to Big5 in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory gb2big

char *lang_eucjis_to_gb(istr)

;translate EUC-Jis in istr to GB in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory jis2gb

char *lang_big5_to_gb(istr)

;translate Big5 in istr to GB in allocated buffer

;the allocated buffer is returned and valid until next call

;refer to subdirectory big2gb

int lang_uzpj

;uses the uzpj system for unmappable words

int lang_debug

;turns on the debug info in translation

_________________________________________________________________

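10.3. autoconvert - intelligent Chinese encoding conversion

autoconvert's distinguishing feature is that it detects the input encoding automatically; it is suited to converting among GB <=> Big5 <=> HZ.

Installing autoconvert: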

# cd /usr/ports/chinese/autoconvert

# make install

If you use procmail, /usr/local/share/autoconvert/procmailrc.example contains an example of using autoconvert as a procmail filter.

Using autoconvert:

% autob5 -i utf8 -o big5 < shed.utf8 > shed.utf8.big5-ac

WWW: http://banyan.dlut.edu.cn/~ygh/

_________________________________________________________________

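10.4. c2t - transliterate GB/Big5-encoded text into Pinyin

Converts GB (mainland China) or Big5 (Taiwan) Chinese text to Pinyin, that is, Chinese characters to their romanized spelling.

Installing c2t: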

# cd /usr/ports/chinese/c2t

# make install

% echo "您好" | b2g | c2t

nin2 hao3

_________________________________________________________________

10.5. hc - Hanzi Converter, converts between GB and Big5 encodings

Hanzi Converter is a program that converts between the BIG5 and GB encodings.

Installing hc:

# cd /usr/ports/chinese/hc

# make install

To convert GB to BIG5:

% hc -m g2b -t /usr/local/share/chinese/hc.tab < INPUT_FILE > OUTPUT_FILE

To convert BIG5 to GB:

% hc -m b2g -t /usr/local/share/chinese/hc.tab < INPUT_FILE > OUTPUT_FILE

You can write your own shell script to simplify the commands above, or simply use the ready-made shell scripts g2b and b2g.
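
As an illustration of what such a wrapper might look like, here is a small sketch built only from the two hc commands shown above. The function names `hc_g2b` and `hc_b2g` are hypothetical (chosen so they do not clash with the shipped g2b and b2g scripts), and the `HC_TAB` override variable is an assumption, not something hc itself defines.

```shell
# Sketch of wrapper functions around the two hc commands shown above.
# hc_g2b / hc_b2g are hypothetical names; HC_TAB is an assumed override
# variable whose default is the table path used in the commands above.
HC_TAB="${HC_TAB:-/usr/local/share/chinese/hc.tab}"

# Both wrappers read from stdin and write to stdout, like hc itself.
hc_g2b() { hc -m g2b -t "$HC_TAB"; }   # GB -> BIG5
hc_b2g() { hc -m b2g -t "$HC_TAB"; }   # BIG5 -> GB
```

Usage would then shorten to `hc_g2b < INPUT_FILE > OUTPUT_FILE`.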

_________________________________________________________________

10.6. gb2jis - convert GB Hanzi to JIS Kanji

# cd /usr/ports/chinese/gb2jis

# make install

Supported input encodings:

GB2312-80 + GB8565-88

GB2312-80

Chinese EUC (8-bit GuoBiao)

HZ-encoding

Supported output encodings:

JISX0208-1983 + JISX0212-1990

JISX0208-1983 + JISX0212-1990 + UZPJ

JISX0208-1983

JISX0208-1983 + UZPJ

Japanese EUC

Japanese EUC + UZPJ

Shift-JIS

Shift-JIS + UZPJ

For details such as the UZPJ (shuangpin, 双拼) rules, see the manual page: man 1 gb2jis.

_________________________________________________________________

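10.7. hztty - convert between GB, Big5, and HZ on a tty

hztty converts between different Chinese encoding formats.

It can convert among Guobiao (GB), Big5, and HZ. GB is the Simplified Chinese encoding used in mainland China, Big5 is the Traditional Chinese encoding used in Taiwan, and HZ is a 7-bit encoding of GB2312 text used mainly in email and Usenet. Because different regions use different Chinese encodings, this program makes communication in Chinese across regions smoother.

Installing hztty: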

# cd /usr/ports/chinese/hztty

# make install

Usage is simple. To connect to a GB BBS from a Big5 environment, just follow these steps:

% hztty -I big2gb -O gb2big

[hztty started] [using /dev/ttyp3]

% telnet bbs.tsinghua.edu.cn

% exit

exit

[hztty exited]

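This way output is automatically converted from Simplified to Traditional, and input is automatically converted from Traditional to Simplified, so you can communicate easily with people who use Simplified Chinese.

bbs.tsinghua.edu.cn (水木清华) has a FreeBSD discussion board.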

_________________________________________________________________

10.8. jis2gb - convert JIS Kanji to GB Hanzi

# cd /usr/ports/chinese/jis2gb

# make install

Supported input encodings:

JISX0208-1983 (JISC6226-1978)

JISX0212-1990

Japanese EUC

Shift-JIS

Supported output encodings:

GB2312-80 + GB8565-88

GB2312-80

Chinese EUC (8-bit GuoBiao)

HZ-encoding

For details, see the manual page: man 1 jis2gb.

_________________________________________________________________

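10.9. pycodec - Chinese encoding / Unicode conversion program

This package provides both Python and C interfaces and converts between Chinese encodings and Unicode.

If you only write Python programs, use the Python interface.

If you prefer C, however, you can try the C interface. The C interface is written with the Python/C API and is intended to give better performance.

At present the Python interface works on Linux and Windows, but the C interface in this release works only on Linux.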

# cd /usr/ports/chinese/pycodec

# make install

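In the demo/ subdirectory you will find test_*.py; these scripts demonstrate how to convert Chinese encodings to Unicode and Unicode back to Chinese encodings.

The chinesetw/ subdirectory contains four mapping-table files: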

o big52utf1.py

o big52utf2.py

o utf2big51.py

o utf2big52.py

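The last digit of the base filename indicates the BIG5 level: big52utf1.py covers level-1 BIG5 codes, and big52utf2.py covers level-2 BIG5 codes.

Because level 1 of BIG5 defines the most frequently used Chinese characters, keeping levels 1 and 2 separate helps speed up dictionary lookups somewhat.

Note that the ETen variant of Big5 and Big5 Plus are not guaranteed to work correctly.

C interface: each shared module contains only two functions, decode() and encode().

You can convert a BIG5 string to a Unicode string, or directly to UTF-8 or UTF-16.

For the most basic usage, see the examples.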

WWW: http://sourceforge.net/projects/python-codecs/

roylez · #7

This should be the original source; it is pretty much unbeatable.

http://netlab.cse.yzu.edu.tw/~statue/freebsd/zh-tut/