Python Web Scraping Cookbook
Getting ready
We will read a file named unicode.html from our local web server, located at http://localhost:8080/unicode.html. This file is UTF-8 encoded and contains characters from several different ranges of the Unicode code space. The page looks as follows in your browser:
[Figure: The Page in the Browser]
Using an editor that supports UTF-8, we can see how the Cyrillic characters are rendered in the editor:
[Figure: The HTML in an Editor]
Code for the sample is in 02/06_unicode.py.
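As a rough illustration of why the encoding matters for a page like this, the following sketch encodes a short Cyrillic string as UTF-8 and then decodes the bytes twice, once correctly and once with the wrong codec. The sample string is an assumption made for illustration; it is not the actual content of unicode.html, and this is not the code from 02/06_unicode.py.

```python
# Hypothetical sample text; the real contents of unicode.html are not reproduced here.
sample = "Привет, мир"  # Cyrillic for "Hello, world"

# Encode the text the way a UTF-8 file stores it on disk or sends it over HTTP.
raw = sample.encode("utf-8")

# Cyrillic letters take two bytes each in UTF-8, while ASCII characters take one,
# so the byte count is larger than the character count.
print(len(sample), len(raw))

# Decoding with the matching codec recovers the original text.
decoded_ok = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 does not fail, but every byte becomes
# its own character, so the result is unreadable.
decoded_wrong = raw.decode("latin-1")

print(decoded_ok)
print(decoded_wrong)
```

The second decode is the kind of result you may see when a client guesses the wrong encoding for a response, which is why it helps to know that the file being served is UTF-8.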