Today I used chardet.detect in the REPL and it returned Windows-1252
(incorrect, because decoding with it later raised a UnicodeDecodeError).
When I ran chardet as a script (which uses UniversalDetector), it returned
MacRoman. Isn't chardet.detect the correct way? I've used this method many
times.
# Interpreter
>>> import chardet
>>> content = open(FILENAME, "rb").read()
>>> chardet.detect(content)
{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483
Thanks!
Albert-Jan
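
Since a detector only returns a guess with a confidence score, one cheap sanity check is to decode strictly with each candidate and keep the first one that does not raise. A minimal sketch using only the standard library; first_strict_decode and the sample bytes are made up for this example and are not part of chardet:

```python
def first_strict_decode(data, candidates):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            # errors="strict" is the default, so undefined bytes raise.
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample: byte 0x81 is undefined in Python's cp1252 codec,
# but it maps to a character in mac_roman.
sample = b"caf\x8e \x81"
encoding, text = first_strict_decode(sample, ["cp1252", "mac_roman"])
print(encoding, repr(text))
```

A clean decode does not prove the guess is right (mac_roman accepts every byte value), so this only rules candidates out.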
>>> from chardet.cli import chardetect
>>> chardetect.description_of(open('/tmp/DATE', 'rb'), 'some file')
'some file: ascii with confidence 1.0'
>>> from chardet import detect
>>> detect(open('/tmp/DATE', 'rb').read())
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
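
One difference that might explain the mismatch: as far as I can tell from the chardet source, chardet.detect feeds the whole byte string to a detector in a single call, while the command-line script feeds the file line by line and stops as soon as the detector reports that it is done. The sketch below mimics the line-by-line behaviour. detect_by_lines is a name made up for this example, and it accepts any object that offers the feed/done/close/result interface of UniversalDetector:

```python
def detect_by_lines(path, detector):
    """Feed a file to a chardet-style detector one line at a time.

    `detector` is assumed to provide feed(), done, close() and result,
    as chardet's UniversalDetector does.
    """
    with open(path, "rb") as handle:
        for line in handle:
            detector.feed(line)
            # Stop early once the detector is confident, so later
            # lines never influence the result.
            if detector.done:
                break
    detector.close()
    return detector.result
```

With chardet installed, this would be called as detect_by_lines(FILENAME, UniversalDetector()) after "from chardet.universaldetector import UniversalDetector". Comparing its result with chardet.detect on the same file should show whether the feeding strategy accounts for the two different answers.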