[Java] Converting a string from one charset to another [Archive]

ndakota

20-04-2010, 14:16

Hi guys. As the title says, I need to convert a string from the us-ascii format to any extended ASCII one. Basically my string doesn't handle accented letters (è shows up as something like E=09). Is it possible to fix this string by switching to another charset (for example iso-8859-1)? Thanks :D

DanieleC88

20-04-2010, 17:12

Can you explain in a bit more detail where and how the problem shows up?
Unless my memory is playing tricks on me, I thought Java handled strings natively in Unicode.
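A small illustration of what "natively Unicode" means in practice (a sketch, not from the thread; the class name `Utf16Demo` is just for illustration): a Java `String` stores UTF-16 `char` code units, so a character outside the Basic Multilingual Plane occupies two positions as a surrogate pair.

```java
public class Utf16Demo {
    public static void main(String[] args) {
        String accented = "\u00e8";      // "è", a single BMP character
        String emoji = "\uD83D\uDE00";   // U+1F600, a supplementary character (surrogate pair)

        // length() counts UTF-16 char code units, not characters
        System.out.println(accented.length());                          // 1
        System.out.println(emoji.length());                             // 2
        // codePointCount() counts actual Unicode code points
        System.out.println(emoji.codePointCount(0, emoji.length()));    // 1
        System.out.println(Integer.toHexString(emoji.codePointAt(0)));  // 1f600
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```

The point for this thread is that, once text is inside a `String`, there is no "us-ascii string" or "iso-8859-1 string": charsets only come into play when converting to or from bytes.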

banryu79

20-04-2010, 18:06

...
A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
...

+

...
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
...
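To see which default charset a given JVM picked, a minimal sketch (the class name is illustrative) uses `Charset.defaultCharset()`. Note that from Java 18 on the default is UTF-8 (JEP 400), while older versions derive it from the OS locale, so `getBytes()` without an argument can behave differently across machines.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Charset chosen at JVM startup (locale/OS dependent before Java 18, UTF-8 by default since Java 18)
        Charset def = Charset.defaultCharset();
        System.out.println("Default charset: " + def.name());

        String s = "\u00e8"; // "è"
        // getBytes() with no argument silently uses the default charset...
        byte[] implicit = s.getBytes();
        // ...which is the same as passing it explicitly; naming the charset makes the code portable
        byte[] explicit = s.getBytes(def);
        System.out.println(Arrays.equals(implicit, explicit)); // true
    }
}
```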

The standard charsets it refers to are these:

Standard charsets

Every implementation of the Java platform is required to support the following standard charsets.
Consult the release documentation for your implementation to see if any other charsets are supported.
The behavior of such optional charsets may differ between implementations.

US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1 ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8 Eight-bit UCS Transformation Format
UTF-16BE Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16 Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
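A quick way to check the list above on a given JVM, as a sketch (class and field names are illustrative): `Charset.isSupported(String)` looks a charset up by name, and since Java 7 the six required charsets are also exposed as constants in `java.nio.charset.StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StandardCharsetsDemo {
    // Names of the six charsets every Java implementation must support
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            // Lookup by name; prints true for every required charset
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        // The constant and the name-based lookup refer to the same charset (equal canonical names)
        System.out.println(StandardCharsets.ISO_8859_1.equals(Charset.forName("ISO-8859-1"))); // true
    }
}
```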

I didn't understand your situation, but it should be quick to run a test: starting from your String and using the getBytes(Charset) method, you get the array of bytes encoded in whatever Charset you want; then you pass that array as an argument to the constructor of a new String and print it to see what comes out: the scientific method :D
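The experiment described above can be sketched like this (the class name, helper name and sample word are illustrative). It encodes the same string with three standard charsets and decodes each byte array with the same charset via `new String(byte[], Charset)`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Encode with cs, then build a new String from those bytes using the same cs
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "perch\u00e8"; // "perchè"
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // The bytes depend on the charset: è is 0xE8 (-24) in ISO-8859-1, 0xC3 0xA8 in UTF-8,
            // and is replaced by '?' (63) in US-ASCII because it cannot be represented there
            System.out.println(cs.name() + " " + Arrays.toString(original.getBytes(cs))
                    + " -> " + roundTrip(original, cs));
        }
    }
}
```

Only US-ASCII changes the text (è becomes `?`); with ISO-8859-1 and UTF-8 the round trip gives back exactly the starting string.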

ndakota

20-04-2010, 21:34

Quote: Can you explain in a bit more detail where and how the problem shows up?
Unless my memory is playing tricks on me, I thought Java handled strings natively in Unicode.

Basically I receive an email in plain-text format, and its attributes say the charset is "us-ascii".

Quote: I didn't understand your situation, but it should be quick to run a test: starting from your String and using the getBytes(Charset) method, you get the array of bytes encoded in whatever Charset you want; then you pass that array as an argument to the constructor of a new String and print it to see what comes out: the scientific method :D

I already tried, but it always prints the starting string, even when the charset changes. I also checked by printing the bytes.
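A possible explanation for this result, as a sketch that is not from the thread (class name and sample word are illustrative): encoding and then decoding with the same charset is lossless for any character that charset can represent, so the printed string cannot change. Text gets corrupted only when bytes are decoded with a different charset from the one that produced them, so the charset that matters is the one used when the raw bytes are first turned into a `String`.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    public static void main(String[] args) {
        String original = "perch\u00e8"; // "perchè"

        // Bytes as they might arrive from outside the JVM (here produced as ISO-8859-1)
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset that actually produced the bytes gives the text back
        String right = new String(latin1Bytes, StandardCharsets.ISO_8859_1);

        // US-ASCII has no mapping for byte 0xE8, so the decoder substitutes U+FFFD
        String wrongAscii = new String(latin1Bytes, StandardCharsets.US_ASCII);

        // UTF-8 bytes read as ISO-8859-1: "è" (0xC3 0xA8) turns into the two characters "Ã¨"
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String wrongLatin1 = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original));      // true
        System.out.println(wrongAscii.equals(original)); // false
        System.out.println(wrongLatin1.length());        // 7
    }
}
```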