Null bytes when using xlwt to create a valid Excel file
I'm using Python 2.7 and trying to automate downloading an Excel file
from a website with the mechanize library. I used chardet to detect the
file's original encoding, which it reported as "iso-8859-2". To split the
data read by mechanize into columns, I first store it in a text file as an
intermediate step.
fileobj = open("data.txt", 'wb')
fileobj.write(response.read())
fileobj.close()
To create the Excel file, I am using the xlwt module.
book = xlwt.Workbook(encoding = "utf-8")
sheet = book.add_sheet('sheet1')
After this, I read through the text file, decoding each line and
re-encoding it as UTF-8:
fileobj = open("data.txt", 'rb')  # reopen for reading; the handle above was closed
for line in fileobj:
    line = line.decode("iso-8859-2").encode("utf-8", "ignore")
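One thing worth checking about this step, shown as a minimal sketch (the helper name `transcode` is hypothetical and not part of the original code): ISO-8859-2 is a single-byte encoding, so Python decodes any byte sequence with it without raising an error, and byte 0x00 becomes U+0000, which UTF-8 encodes back to 0x00. If the downloaded data contains null bytes, the decode/encode step would therefore carry them through unchanged, and the "ignore" error handler would not drop them.

```python
def transcode(raw, source_encoding="iso-8859-2"):
    """Decode raw bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper mirroring the decode/encode step in the question.
    """
    return raw.decode(source_encoding).encode("utf-8")


# An accented character is converted as expected.
l_stroke = u"\u0142".encode("iso-8859-2")  # Polish l-with-stroke
assert transcode(l_stroke) == u"\u0142".encode("utf-8")

# A NUL byte survives the round trip untouched.
with_nul = b"a\x00b"
print(transcode(with_nul) == with_nul)  # prints True
```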
The problem is that iterating over the file with Python's default csv
reader raises an error saying the input contains null bytes. Writing the
encoded text to a .txt file shows no null bytes in the lines themselves,
so I am not sure where the problem is coming from.
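A possible way to narrow this down is to inspect the raw downloaded bytes before any decoding. The sketch below counts null bytes and looks for a byte order mark. It assumes the null bytes come from the download itself, for example because the file is really UTF-16 text rather than ISO-8859-2; that is only a guess, and the function name `inspect_raw_bytes` and the sample data are made up for illustration.

```python
import codecs


def inspect_raw_bytes(data):
    """Return (number_of_null_bytes, bom_name_or_None) for a byte string."""
    known_boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    bom_name = None
    for bom, name in known_boms:
        if data.startswith(bom):
            bom_name = name
            break
    return data.count(b"\x00"), bom_name


# Made-up sample: one tab-separated row stored as UTF-16 (little endian).
sample = codecs.BOM_UTF16_LE + u"a\tb\r\n".encode("utf-16-le")
print(inspect_raw_bytes(sample))

# In the question's setup the input would be the saved download, e.g.:
# with open("data.txt", "rb") as fileobj:
#     print(inspect_raw_bytes(fileobj.read()))
```

If the count is non-zero or a UTF-16 byte order mark is reported, the data handed to the csv reader is probably not single-byte text, and the chardet result may deserve a second look.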