Monday, 30 September 2013

Null bytes when using xlwt to make a valid Excel file

I'm using Python 2.7 and trying to automate downloading an Excel file
from a website with the mechanize library. I used chardet to detect the
file's original encoding, which it reports as "iso-8859-2". To split the
data read by mechanize into columns properly, I first store it in a text
file as an intermediate step:
fileobj = open("data.txt", 'wb')
fileobj.write(response.read())
fileobj.close()
To create the Excel file, I am using the xlwt module:
import xlwt
book = xlwt.Workbook(encoding="utf-8")
sheet = book.add_sheet('sheet1')
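After this, I read through the text file, decoding each line from
iso-8859-2 and re-encoding it as UTF-8:
fileobj = open("data.txt", 'rb')  # reopen for reading; the handle above was write-only and is closed
for line in fileobj:
    line = line.decode("iso-8859-2").encode("utf-8", "ignore")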
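
The decode and re-encode step can also be done with `io.open`, which handles the encodings at the file level. This is only a minimal sketch, not the code from the post: the function name `transcode_lines` and the output path are hypothetical, and it assumes the saved file really is iso-8859-2 text.

```python
import io


def transcode_lines(src_path, dst_path, src_encoding="iso-8859-2"):
    """Copy src_path to dst_path, converting text from src_encoding to UTF-8.

    Returns the number of lines copied. newline="" keeps the original
    line endings untouched in both directions.
    """
    count = 0
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)  # line is already decoded text
                count += 1
    return count
```

For example, `transcode_lines("data.txt", "data_utf8.txt")` would write a UTF-8 copy of the downloaded file. If the source is not actually iso-8859-2 text, this does not raise an error; the bytes are simply mapped to the wrong characters.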
The problem is that iterating over the file with Python's default csv
reader raises an error saying the input contains null bytes. Writing the
encoded text to a .txt file shows no null bytes in the lines themselves,
so I am not sure where they are coming from.
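
A quick look at the raw bytes may help narrow down where the nulls come from. The sketch below is not part of the original post: `describe_payload` and `OLE2_MAGIC` are hypothetical names. It assumes two common causes worth ruling out, namely that the download is a binary .xls file rather than text, or that it is UTF-16 text. Neither is confirmed for this particular file.

```python
# Signature at the start of legacy binary .xls files (OLE2 compound document).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def describe_payload(data):
    """Return a rough guess about what a downloaded byte string contains."""
    if data.startswith(OLE2_MAGIC):
        return "binary .xls (OLE2 container)"
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "UTF-16 text with BOM"
    if b"\x00" in data:
        return "contains null bytes (%d found)" % data.count(b"\x00")
    return "no null bytes"
```

Calling `describe_payload(open("data.txt", "rb").read())` on the saved download would show whether the nulls are already present in the bytes mechanize returned, before any decoding or csv parsing takes place.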
