亚洲免费在线-亚洲免费在线播放-亚洲免费在线观看-亚洲免费在线观看视频-亚洲免费在线看-亚洲免费在线视频

關(guān)于java生成UTF-8編碼格式文件的詭異問題

系統(tǒng) 2099 0
★★★ 本篇為原創(chuàng),需要引用轉(zhuǎn)載的朋友請注明:《 http://stephen830.iteye.com/blog/259350 》 謝謝支持!★★★

用java生成一個UTF-8文件:

如果文件內(nèi)容中沒有中文內(nèi)容,則生成的文件為ANSI編碼格式;
如果文件內(nèi)容中有中文內(nèi)容,則生成的文件為UTF-8編碼格式。

也就是說,如果你的文件內(nèi)容沒有中文內(nèi)容的話,你生成的文件是ANSI編碼的。

    
/**
     * 生成UTF-8文件.
     * 如果文件內(nèi)容中沒有中文內(nèi)容,則生成的文件為ANSI編碼格式;
     * 如果文件內(nèi)容中有中文內(nèi)容,則生成的文件為UTF-8編碼格式。
     * @param fileName 待生成的文件名(含完整路徑)
     * @param fileBody 文件內(nèi)容
     * @return
     */
    public static boolean writeUTFFile(String fileName,String fileBody){
        FileOutputStream fos = null;
        OutputStreamWriter osw = null;
        try {
            fos = new FileOutputStream(fileName);
            osw = new OutputStreamWriter(fos, "UTF-8");
            osw.write(fileBody);
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }finally{
            if(osw!=null){
                try {
                    osw.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
            if(fos!=null){
                try {
                    fos.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }

//main()
public static void main(String[] argc){
    	writeUTFFile("C:\\test1.txt","aaa");//test1.txt為ANSI格式文件
    	writeUTFFile("C:\\test2.txt","中文aaa");//test2.txt為UTF-8格式文件
}

  


感謝朋友[Liteos]關(guān)于(utf-8 bom)的建議,貼出這2個文件的Hex內(nèi)容圖,大家參考。

用UltraEdit( UltraEdit版本10.0 )的Hex模式查看(見下圖):

(test1.txt ANSI格式 a:61)


(test2.txt UTF-8格式 2D 4E:中,87 65:文,61 00:a)

如果在你的UltraEdit也看到上面的2個圖,那么請馬上升級你的UltraEdit軟件吧,低版本的UltraEdit對utf-8文件的Hex模式查看有問題。

感謝Liteos朋友的指出,將我原來的UltraEdit10卸了裝上了最新的14.20版本,對test2.txt按Hex模式得到下面的圖:
關(guān)于java生成UTF-8編碼格式文件的詭異問題
UltraEdit版本14.20 看test2.txt文件 E4B8AD表示“中”,E69687表示“文”,61表示“a”)

至此可以發(fā)現(xiàn),UTF-8對中文的處理很夸張,需要3個字節(jié)才能表示一個中文字!!這讓我想起在另外一篇文章中看到的關(guān)于utf-8的一個批評(UTF-8對亞洲語言的一種歧視),具體的文章地址:《為什么用Utf-8編碼?》 http://stephen830.iteye.com/blog/258929



還是回到本文的問題,為啥沒有生成UTF-8文件?估計是JAVA內(nèi)部I/O處理時如果遇到都是單字節(jié)字符,則只生成ANSI格式文件( 但程序中已經(jīng)設(shè)定了要UTF-8,為什么不給我生成UTF-8,一個bug嗎? ),只有遇到多字節(jié)的字符時才根據(jù)設(shè)定的編碼(例如UTF-8)來生成文件。

下面引用一段w3c組織關(guān)于utf-8的bom描述:(原文地址:http://www.w3.org/International/questions/qa-utf8-bom)

引用

FAQ: Display problems caused by the UTF-8 BOM

on this page:? Question - Background - Answer - By the way - Further reading

Intended audience: XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), CSS coders, XSLT developers, Web project managers, and anyone who is trying to diagnose why blank lines or other strange items are displayed on their UTF-8 page.
Question

When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them?
Answer

If you are dealing with a file encoded in UTF-8, your display problems may be caused by the presence of a UTF-8 signature (BOM) that the user agent doesn't recognize.

The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However, you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.

We have a set of test pages and a summary of results for various recent browser versions that explore this behaviour.

This article will help you determine whether the UTF-8 is causing the problem. If there is no evidence of a UTF-8 signature at the beginning of the file, then you will have to look elsewhere for a solution.
What is a UTF-8 signature (BOM)?

Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). Some applications - such as a text editor or a browser - will display the BOM as an extra line in the file, others will display unexpected characters, such as ???.

See the side panel for more detailed information about the BOM.

The BOM is the Unicode codepoint U+FEFF, corresponding to the Unicode character 'ZERO WIDTH NON-BREAKING SPACE' (ZWNBSP).

In UTF-16 and UTF-32 encodings, unless there is some alternative indicator, the BOM is essential to ensure correct interpretation of the file's contents. Each character in the file is represented by 2 or 4 bytes of data and the order in which these bytes are stored in the file is significant; the BOM indicates this order.

In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 or UTF-32 encodings, there is no alternative sequence of bytes in a character. The BOM may still occur in UTF-8 encoding text, however, either as a by-product of an encoding conversion or because it was added by an editor.
Detecting the BOM

First, we need to check whether there is indeed a BOM at the beginning of the file.

You can try looking for a BOM in your content, but if your editor handles the UTF-8 signature correctly you probably won't be able to see it. An editor which does not handle the UTF-8 signature correctly displays the bytes that compose that signature according to its own character encoding setting. (With the Latin 1 (ISO 8859-1) character encoding, the signature displays as characters ???.) With a binary editor capable of displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.

Alternatively, your editor may tell you in a status bar or a menu what encoding your file is in, including information about the presence or not of the UTF-8 signature.

If not, some kind of script-based test (see below) may help. Alternatively, you could try this small web-based utility. (Note, if it’s a file included by PHP or some other mechanism that you think is causing the problem, type in the URI of the included file.)
Removing the BOM

If you have an editor which shows the characters that make up the UTF-8 signature you may be able to delete them by hand. Chances are, however, that the BOM is there in the first place because you didn't see it.

Check whether your editor allows you to specify whether a UTF-8 signature is added or kept during a save. Such an editor provides a way of removing the signature by simply reading the file in then saving it out again. For example, if Dreamweaver detects a BOM the Save As dialogue box will have a check mark alongside the text "Include Unicode Signature (BOM)". Just uncheck the box and save.

One of the benefits of using a script is that you can remove the signature quickly, and from multiple files. In fact the script could be run automatically as part of your process. If you use Perl, you could use a simple script created by Martin Dürst.

Note: You should check the process impact of removing the signature. It may be that some part of your content development process relies on the use of the signature to indicate that a file is in UTF-8. Bear in mind also that pages with a high proportion of Latin characters may look correct superficially but that occasional characters outside the ASCII range (U+0000 to U+007F) may be incorrectly encoded.
By the way

You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.

A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.

In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character encoding declarations to the contrary.





在上面的方法中用到了[類 sun.nio.cs.StreamEncoder],下面貼出類的內(nèi)容,供大家參考:
    
/*
 * Copyright 2001-2005 Sun Microsystems, Inc.  All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.
 */

/*
 */

package sun.nio.cs;

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;

public class StreamEncoder extends Writer
{

    private static final int DEFAULT_BYTE_BUFFER_SIZE = 8192;

    private volatile boolean isOpen = true;

    private void ensureOpen() throws IOException {
        if (!isOpen)
            throw new IOException("Stream closed");
    }

    // Factories for java.io.OutputStreamWriter
    public static StreamEncoder forOutputStreamWriter(OutputStream out,
                                                      Object lock,
                                                      String charsetName)
        throws UnsupportedEncodingException
    {
        String csn = charsetName;
        if (csn == null)
            csn = Charset.defaultCharset().name();
        try {
            if (Charset.isSupported(csn))
                return new StreamEncoder(out, lock, Charset.forName(csn));
        } catch (IllegalCharsetNameException x) { }
        throw new UnsupportedEncodingException (csn);
    }

    public static StreamEncoder forOutputStreamWriter(OutputStream out,
                                                      Object lock,
                                                      Charset cs)
    {
        return new StreamEncoder(out, lock, cs);
    }

    public static StreamEncoder forOutputStreamWriter(OutputStream out,
                                                      Object lock,
                                                      CharsetEncoder enc)
    {
        return new StreamEncoder(out, lock, enc);
    }


    // Factory for java.nio.channels.Channels.newWriter

    public static StreamEncoder forEncoder(WritableByteChannel ch,
                                           CharsetEncoder enc,
                                           int minBufferCap)
    {
        return new StreamEncoder(ch, enc, minBufferCap);
    }


    // -- Public methods corresponding to those in OutputStreamWriter --

    // All synchronization and state/argument checking is done in these public
    // methods; the concrete stream-encoder subclasses defined below need not
    // do any such checking.

    public String getEncoding() {
        if (isOpen())
            return encodingName();
        return null;
    }

    public void flushBuffer() throws IOException {
        synchronized (lock) {
            if (isOpen())
                implFlushBuffer();
            else
                throw new IOException("Stream closed");
        }
    }

    public void write(int c) throws IOException {
        char cbuf[] = new char[1];
        cbuf[0] = (char) c;
        write(cbuf, 0, 1);
    }

    public void write(char cbuf[], int off, int len) throws IOException {
        synchronized (lock) {
            ensureOpen();
            if ((off < 0) || (off > cbuf.length) || (len < 0) ||
                ((off + len) > cbuf.length) || ((off + len) < 0)) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return;
            }
            implWrite(cbuf, off, len);
        }
    }

    public void write(String str, int off, int len) throws IOException {
        /* Check the len before creating a char buffer */
        if (len < 0)
            throw new IndexOutOfBoundsException();
        char cbuf[] = new char[len];
        str.getChars(off, off + len, cbuf, 0);
        write(cbuf, 0, len);
    }

    public void flush() throws IOException {
        synchronized (lock) {
            ensureOpen();
            implFlush();
        }
    }

    public void close() throws IOException {
        synchronized (lock) {
            if (!isOpen)
                return;
            implClose();
            isOpen = false;
        }
    }

    private boolean isOpen() {
        return isOpen;
    }


    // -- Charset-based stream encoder impl --

    private Charset cs;
    private CharsetEncoder encoder;
    private ByteBuffer bb;

    // Exactly one of these is non-null
    private final OutputStream out;
    private WritableByteChannel ch;

    // Leftover first char in a surrogate pair
    private boolean haveLeftoverChar = false;
    private char leftoverChar;
    private CharBuffer lcb = null;

    private StreamEncoder(OutputStream out, Object lock, Charset cs) {
        this(out, lock,
         cs.newEncoder()
         .onMalformedInput(CodingErrorAction.REPLACE)
         .onUnmappableCharacter(CodingErrorAction.REPLACE));
    }

    private StreamEncoder(OutputStream out, Object lock, CharsetEncoder enc) {
        super(lock);
        this.out = out;
        this.ch = null;
        this.cs = enc.charset();
        this.encoder = enc;

        // This path disabled until direct buffers are faster
        if (false && out instanceof FileOutputStream) {
                ch = ((FileOutputStream)out).getChannel();
        if (ch != null)
                    bb = ByteBuffer.allocateDirect(DEFAULT_BYTE_BUFFER_SIZE);
        }
            if (ch == null) {
        bb = ByteBuffer.allocate(DEFAULT_BYTE_BUFFER_SIZE);
        }
    }

    private StreamEncoder(WritableByteChannel ch, CharsetEncoder enc, int mbc) {
        this.out = null;
        this.ch = ch;
        this.cs = enc.charset();
        this.encoder = enc;
        this.bb = ByteBuffer.allocate(mbc < 0
                                  ? DEFAULT_BYTE_BUFFER_SIZE
                                  : mbc);
    }

    private void writeBytes() throws IOException {
        bb.flip();
        int lim = bb.limit();
        int pos = bb.position();
        assert (pos <= lim);
        int rem = (pos <= lim ? lim - pos : 0);

            if (rem > 0) {
        if (ch != null) {
            if (ch.write(bb) != rem)
                assert false : rem;
        } else {
            out.write(bb.array(), bb.arrayOffset() + pos, rem);
        }
        }
        bb.clear();
        }

    private void flushLeftoverChar(CharBuffer cb, boolean endOfInput)
        throws IOException
    {
        if (!haveLeftoverChar && !endOfInput)
            return;
        if (lcb == null)
            lcb = CharBuffer.allocate(2);
        else
            lcb.clear();
        if (haveLeftoverChar)
            lcb.put(leftoverChar);
        if ((cb != null) && cb.hasRemaining())
            lcb.put(cb.get());
        lcb.flip();
        while (lcb.hasRemaining() || endOfInput) {
            CoderResult cr = encoder.encode(lcb, bb, endOfInput);
            if (cr.isUnderflow()) {
                if (lcb.hasRemaining()) {
                    leftoverChar = lcb.get();
                    if (cb != null && cb.hasRemaining())
                        flushLeftoverChar(cb, endOfInput);
                    return;
                }
                break;
            }
            if (cr.isOverflow()) {
                assert bb.position() > 0;
                writeBytes();
                continue;
            }
            cr.throwException();
        }
        haveLeftoverChar = false;
    }

    void implWrite(char cbuf[], int off, int len)
        throws IOException
    {
        CharBuffer cb = CharBuffer.wrap(cbuf, off, len);

        if (haveLeftoverChar)
        flushLeftoverChar(cb, false);

        while (cb.hasRemaining()) {
        CoderResult cr = encoder.encode(cb, bb, false);
        if (cr.isUnderflow()) {
           assert (cb.remaining() <= 1) : cb.remaining();
           if (cb.remaining() == 1) {
                haveLeftoverChar = true;
                leftoverChar = cb.get();
            }
            break;
        }
        if (cr.isOverflow()) {
            assert bb.position() > 0;
            writeBytes();
            continue;
        }
        cr.throwException();
        }
    }

    void implFlushBuffer() throws IOException {
        if (bb.position() > 0)
        writeBytes();
    }

    void implFlush() throws IOException {
        implFlushBuffer();
        if (out != null)
        out.flush();
    }

    void implClose() throws IOException {
        flushLeftoverChar(null, true);
        try {
            for (;;) {
                CoderResult cr = encoder.flush(bb);
                if (cr.isUnderflow())
                    break;
                if (cr.isOverflow()) {
                    assert bb.position() > 0;
                    writeBytes();
                    continue;
                }
                cr.throwException();
            }

            if (bb.position() > 0)
                writeBytes();
            if (ch != null)
                ch.close();
            else
                out.close();
        } catch (IOException x) {
            encoder.reset();
            throw x;
        }
    }

    String encodingName() {
        return ((cs instanceof HistoricallyNamedCharset)
            ? ((HistoricallyNamedCharset)cs).historicalName()
            : cs.name());
    }
}


  


哪位朋友如果能發(fā)現(xiàn)原因,請留下您的答案哦!

-------------------------------------------------------------
分享知識,分享快樂,希望文章能給需要的朋友帶來小小的幫助。

關(guān)于java生成UTF-8編碼格式文件的詭異問題


更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主

微信掃碼或搜索:z360901061

微信掃一掃加我為好友

QQ號聯(lián)系: 360901061

您的支持是博主寫作最大的動力,如果您喜歡我的文章,感覺我的文章對您有幫助,請用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點(diǎn)擊下面給點(diǎn)支持吧,站長非常感激您!手機(jī)微信長按不能支付解決辦法:請將微信支付二維碼保存到相冊,切換到微信,然后點(diǎn)擊微信右上角掃一掃功能,選擇支付二維碼完成支付。

【本文對您有幫助就好】

您的支持是博主寫作最大的動力,如果您喜歡我的文章,感覺我的文章對您有幫助,請用微信掃描上面二維碼支持博主2元、5元、10元、自定義金額等您想捐的金額吧,站長會非常 感謝您的哦!!!

發(fā)表我的評論
最新評論 總共0條評論
主站蜘蛛池模板: 九九久久精品 | 久久99精品一区二区三区 | 我不卡老子影院午夜伦我不卡四虎 | 欧美中文字幕 | 日本一级免费 | 国产欧美久久久另类精品 | 爆操日本美女 | 精品在线一区二区三区 | 久操综合| 四虎最新紧急更新地址 | 日本夜爽爽一区二区三区 | 久久国产精品-国产精品 | 久久久不卡国产精品一区二区 | 久久久久久九 | 天天操夜夜骑 | 久久精品国内偷自一区 | 四虎免费看片 | 亚洲精品国产第一区二区图片 | 在线观看中文字幕第一页 | 欧美国产激情二区三区 | 高清不卡日本v在线二区 | 中文字幕一区二区三区永久 | 亚洲在线观看一区二区 | 亚洲国产综合人成综合网站00 | 97精品国产综合久久 | 中文字幕专区高清在线观看 | 久久澳门| 天堂成人精品视频在线观 | 狠狠操天天操夜夜操 | 九九国产在线视频 | 久草在现 | 亚洲精品久久久中文字 | 国产一区二区三区久久精品小说 | 特级毛片aaa免费版 特级毛片a级毛免费播放 | 日韩欧美成末人一区二区三区 | 午夜在线播放 | 日本人的色道www免费一区 | 国产一区二区亚洲精品天堂 | 男人的天堂在线精品视频 | 大香伊人久久 | 91精品91久久久 |