    Fixed bug #896 · 74c8c77e
    Sam Lantinga authored
     John Popplewell      2009-12-08 23:05:50 PST
    
    Originally reported by AKFoerster on the mailing list.
    
    Error converting UTF-8 Russian text to UTF-16LE on Windows, or more precisely
    on platforms without iconv support (the default on Windows).
    
    Valid UTF-8 characters are flagged as overlong and then replaced with the
    UNKNOWN_UNICODE character.
    
    After studying the testiconv.c example program, reading the RFCs and putting
    some printf statements in SDL_iconv.c, I traced the problem to the test for
    'Maximum overlong sequences' (specifically 4.2.1), which is carried out by the
    following code:
    
          } else if ( p[0] >= 0xC0 ) {
            if ( (p[0] & 0xE0) != 0xC0 ) {
              /* Skip illegal sequences
                return SDL_ICONV_EILSEQ;
              */
              ch = UNKNOWN_UNICODE;
            } else {
              if ( (p[0] & 0xCE) == 0xC0 ) {    <<<<<<<< here
                overlong = SDL_TRUE;
              }
              ch = (Uint32)(p[0] & 0x1F);
              left = 1;
            }
          } else {
    
    Here is the 2-byte encoding of a character in range 00000080 - 000007FF
        110xxxxx 10xxxxxx
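
To make the bit layout concrete, here is a minimal sketch of the 2-byte encoding. The function name utf8_encode2 is hypothetical and is not part of SDL; it only illustrates how the 11 payload bits are split across the two bytes.

```c
#include <stdint.h>

/* Hypothetical helper (not from SDL): encode a code point in the range
   U+0080..U+07FF as two UTF-8 bytes.
   Returns 0 on success, -1 if cp is outside that range. */
static int utf8_encode2(uint32_t cp, uint8_t out[2])
{
    if (cp < 0x80 || cp > 0x7FF) {
        return -1;
    }
    out[0] = (uint8_t)(0xC0 | (cp >> 6));   /* 110xxxxx: top 5 payload bits */
    out[1] = (uint8_t)(0x80 | (cp & 0x3F)); /* 10xxxxxx: low 6 payload bits */
    return 0;
}
```

For example, the Cyrillic letter U+0416 encodes as the bytes 0xD0 0x96, so its lead byte has the top-most payload bit (0x10) set.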
    
    The line in question is supposed to detect an overlong sequence, i.e. a
    2-byte sequence no greater than
        11000001 10111111
    
    which decodes to U+007F and should be represented as a single byte.
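
As an illustration of what "overlong" means here, the following sketch decodes a 2-byte sequence and checks whether the result would have fit in a single byte. Both function names are hypothetical, and the decoder deliberately skips validation of the lead and continuation bits; it is not the SDL implementation.

```c
#include <stdint.h>

/* Hypothetical helper: extract the 11 payload bits of a 2-byte sequence.
   No validation of the 110xxxxx / 10xxxxxx prefixes is done here. */
static uint32_t utf8_decode2(uint8_t b0, uint8_t b1)
{
    return ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);
}

/* A 2-byte sequence is overlong when the decoded value is below 0x80,
   because such a value fits in a single byte. */
static int utf8_is_overlong2(uint8_t b0, uint8_t b1)
{
    return utf8_decode2(b0, b1) < 0x80;
}
```

Under these assumptions, 0xC1 0xBF decodes to 0x7F and counts as overlong, while 0xC2 0x80 (U+0080) and 0xD0 0x96 (U+0416) do not.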
    
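    BUT the mask value (0xCE) is wrong: it doesn't check the top-most of the five
    payload bits (0x10):
        11000001     value
        11001110     mask (incorrect)
           ^
    Lead bytes 0xD0 and 0xD1, which start most Cyrillic characters, therefore
    match as well. The mask should be (0xDE):
        11000001     value
        11011110     mask (correct)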
    
    making the above code:
    
          } else if ( p[0] >= 0xC0 ) {
            if ( (p[0] & 0xE0) != 0xC0 ) {
              /* Skip illegal sequences
                return SDL_ICONV_EILSEQ;
              */
              ch = UNKNOWN_UNICODE;
            } else {
              if ( (p[0] & 0xDE) == 0xC0 ) {    <<<<<<<< here
                overlong = SDL_TRUE;
              }
              ch = (Uint32)(p[0] & 0x1F);
              left = 1;
            }
          } else {
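
The effect of the two masks can be compared with a small sketch that applies the same test to every possible 2-byte lead byte (0xC0..0xDF). The function names are hypothetical; only the comparison against 0xC0 is taken from the code above.

```c
#include <stdint.h>

/* Same comparison as in the decoder, with the mask made a parameter. */
static int flagged_overlong(uint8_t lead, uint8_t mask)
{
    return (lead & mask) == 0xC0;
}

/* Count how many 2-byte lead bytes (0xC0..0xDF) a given mask flags. */
static int count_flagged(uint8_t mask)
{
    int count = 0;
    unsigned lead;
    for (lead = 0xC0; lead <= 0xDF; ++lead) {
        if (flagged_overlong((uint8_t)lead, mask)) {
            ++count;
        }
    }
    return count;
}
```

With 0xCE the test flags four lead bytes (0xC0, 0xC1, 0xD0 and 0xD1); with 0xDE it flags only 0xC0 and 0xC1, which are the only lead bytes that can start an overlong 2-byte sequence. Since 0xD0 and 0xD1 are the lead bytes for U+0400..U+047F, this is consistent with the Russian text being rejected.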
    
    I can supply a test program and/or a patch if required,
    
    best regards,
    John Popplewell
    
    --HG--
    extra : convert_revision : svn%3Ac70aab31-4412-0410-b14c-859654838e24/trunk%404283