unicode – Python 3 UnicodeDecodeError: ‘charmap’ codec can’t decode byte 0x9d – Stack Overflow

CSV read raises "UnicodeDecodeError: ‘charmap’ codec can’t decode…"

In Python 3, the csv module processes the file as Unicode strings, so it first has to decode the input file. You can use the exact encoding if you know it, or just use Latin-1, because it maps every byte to the Unicode character with the same code point, so that decoding and then re-encoding keeps the byte values unchanged. Your code could become:

...
with open(input_file, "r", encoding='Latin1') as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='', encoding='Latin1') as result:
        ...
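
To see why Latin-1 never raises a decode error, here is a small sketch. The byte values are chosen for illustration and do not come from the asker's file. It checks that decoding and re-encoding returns the original bytes, and that cp1252 rejects a byte such as 0x9d:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps byte N to code point N, so decoding never fails...
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]

# ...and encoding the result gives back the original bytes.
round_tripped = text.encode("latin-1")

# cp1252 (the usual Windows default) has no mapping for a few bytes, e.g. 0x9d.
try:
    b"\x9d".decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```

Note that Latin-1 only preserves the bytes. If the file is really UTF-8 or cp1252, non-ASCII characters will show up as the wrong glyphs inside the program, even though the round trip is lossless.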

UnicodeDecodeError: ‘charmap’ codec can’t decode byte 0x8f in position xxx: char

I am trying to read a log file from a Python script. My program works fine on Linux, but I get an error on Windows. After reading some lines, at a particular line number I get the following error:

  File "C:\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 311: character maps to <undefined>

The following is the code I am using to read the file:

with open(log_file, 'r') as log_file_fh:
    for line in log_file_fh:
        print(line)

I have tried to fix it by using different encodings such as ascii, utf8, utf-8, ISO-8859-1, cp1252, and cp850, but I still face the same issue.
Is there any way to fix this issue?
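
Byte 0x8f has no mapping in cp1252, which is why the default Windows codec fails on it. One possible approach, sketched here under the assumption that a few substituted characters are acceptable in the output, is to pass an explicit `encoding` together with `errors="replace"` to `open()`. The helper name `read_lines_lossy` and the sample bytes are made up for illustration:

```python
import os
import tempfile


def read_lines_lossy(path, encoding="cp1252"):
    """Return the file's lines, replacing undecodable bytes with U+FFFD."""
    with open(path, "r", encoding=encoding, errors="replace") as fh:
        return fh.read().splitlines()


# Demonstration with a throwaway file containing byte 0x8f,
# which cp1252 cannot decode.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(b"first line\nbad byte: \x8f\nlast line\n")
    demo_path = tmp.name

lines = read_lines_lossy(demo_path)
os.remove(demo_path)
```

If the real encoding of the log is known (for example UTF-8 written on Linux), passing that name as `encoding` should decode the text correctly, and the `errors` argument then only matters for genuinely corrupt bytes.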

UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to <undefined>, print function

I am writing a Python (Python 3.3) program to send some data to a webpage using the POST method. Mostly for debugging, I am getting the page result and displaying it on the screen using the print() function.

The code is like this:

conn.request("POST", resource, params, headers)
response = conn.getresponse()
print(response.status, response.reason)
data = response.read()
print(data.decode('utf-8'))

The HTTPResponse.read() method returns a bytes object encoding the page (which is a well-formatted UTF-8 document). It seemed okay until I stopped using the IDLE GUI for Windows and used the Windows console instead. The returned page has a U+2021 character (em-dash), which the print function handles well in the Windows GUI (I presume code page 1252) but not in the Windows console (code page 850). Given the strict default behavior, I get the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2021' in position 10248: character maps to <undefined>

I could fix it using this quite ugly code:

print(data.decode('utf-8').encode('cp850','replace').decode('cp850'))

Now it replaces the offending character “—” with a ?. Not the ideal case (a hyphen would be a better replacement) but good enough for my purpose.


There are several things I do not like from my solution.

  1. The code is ugly with all that decoding, encoding, and decoding.
  2. It solves the problem for just this case. If I port the program to a system using some other encoding (latin-1, cp437, back to cp1252, etc.), it should recognize the target encoding. It does not. (For instance, when using the IDLE GUI again, the em-dash is also lost, which didn’t happen before.)
  3. It would be nicer if the em-dash translated to a hyphen instead of a question mark.

The problem is not the em-dash (I can think of several ways to solve that particular problem), but I need to write robust code. I am feeding the page with data from a database, and that data can come back. I can anticipate many other conflicting cases: an ‘Á’ U+00C1 (which is possible in my database) could translate into CP-850 (DOS/Windows console encoding for Western European languages) but not into CP-437 (encoding for US English, which is the default in many Windows installations).
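
The code page differences described above can be checked directly. This is a minimal sketch; the helper name `can_encode` is made up for illustration and is not part of the standard library:

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Build a table of (character, codec) -> encodable?
checks = {
    (char, codec): can_encode(char, codec)
    for char in ("\u2014", "\u00c1")          # em dash, capital A with acute
    for codec in ("cp1252", "cp850", "cp437")
}

for (char, codec), ok in sorted(checks.items()):
    print("U+%04X in %s: %s" % (ord(char), codec, ok))
```

Under Python's bundled codecs, the em dash is available only in cp1252 of these three, while Á is available in cp1252 and cp850 but not in cp437.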

So, the question:

Is there a nicer solution that makes my code agnostic from the output interface encoding?
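
One possible direction, offered as a sketch and not as the accepted answer: look up the encoding of `sys.stdout` at run time and substitute only the characters that encoding lacks. The function name `console_safe` and the `FALLBACKS` table are hypothetical, and the substitutions in the table are just examples:

```python
import sys

# Hypothetical substitutions, used only when the target codec lacks the character.
FALLBACKS = {"\u2014": "-", "\u2013": "-", "\u2026": "..."}


def console_safe(text, encoding=None):
    """Return text reduced to what the given (or stdout's) encoding can represent."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)
            pieces.append(ch)          # the codec has this character, keep it
        except UnicodeEncodeError:
            pieces.append(FALLBACKS.get(ch, "?"))
    # Guard against a fallback string that the codec also lacks.
    return "".join(pieces).encode(encoding, "replace").decode(encoding)


print(console_safe("caf\u00e9 \u2014 done"))
```

Because the encoding is read from `sys.stdout`, the same call keeps the em dash where the output can show it and falls back to a hyphen where it cannot. On Python 3.7 and later, `sys.stdout.reconfigure(errors="replace")` is a shorter alternative when a plain `?` is acceptable, but it is not available in Python 3.3.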
