The Python urllib module allows us to access URL data programmatically.
Let's start with a simple example that reads the content of the Wikipedia home page.
import urllib.request

response = urllib.request.urlopen('https://www.wikipedia.org')
print(response.read())
The response's read() method returns the body as a bytes object. The code above prints the HTML data returned by the Wikipedia home page. Raw HTML is not easy for a human to read, but we can use an HTML parser to extract useful information from it.
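As a rough illustration of that last point, the sketch below decodes a bytes body and pulls out the page title with the standard library's html.parser. It runs on a small hard-coded sample instead of a live response; the `TitleParser` class, the `extract_title` helper, and the sample markup are illustrative names and data, not part of the original program.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(raw_bytes, encoding='utf-8'):
    """Decode the bytes returned by read() and return the page title."""
    html = raw_bytes.decode(encoding, errors='replace')
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Hypothetical sample body standing in for response.read()
sample = b'<html><head><title>Wikipedia</title></head><body><p>Hello</p></body></html>'
print(extract_title(sample))  # prints: Wikipedia
```

With a live response, the same helper could be called as `extract_title(response.read())`, assuming the page is UTF-8 encoded.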
Let’s see what happens when we try to run the above program for JournalDev.
import urllib.request

response = urllib.request.urlopen('https://www.journaldev.com')
print(response.read())
We will get the following error message.
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py", line 3, in <module>
    response = urllib.request.urlopen('https://www.journaldev.com')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
This happens because my server doesn't allow programmatic access to the website data; the site is meant for browsers that can parse HTML. We can usually overcome this error by sending a User-Agent header in the request. Let's look at the modified program for this.
import urllib.request

# Request with header data to send the User-Agent header
url = 'https://www.journaldev.com'
headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'

request = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(request)
print(resp.read())
We create the request headers as a dictionary and pass it to the Request object. The program above prints the HTML data received from the JournalDev home page.
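A server can still refuse a request, so it may help to catch urllib.error.HTTPError instead of letting the traceback end the program. The following is a minimal sketch under a few assumptions: `build_request`, `fetch_status`, and `always_forbidden` are hypothetical helper names, the User-Agent string is shortened, and the 403 answer is simulated offline so the example runs without network access.

```python
import io
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Return a Request object that carries a User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_status(request, opener=urllib.request.urlopen):
    """Return (status_code, reason); HTTP error responses are reported, not raised."""
    try:
        with opener(request) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        return err.code, err.reason


req = build_request('https://www.journaldev.com', 'Mozilla/5.0 (X11; Linux i686)')

# urllib stores header names via str.capitalize(), hence 'User-agent' here.
print(req.get_header('User-agent'))  # prints: Mozilla/5.0 (X11; Linux i686)


def always_forbidden(request):
    """Offline stand-in for urlopen that always answers 403 (demonstration only)."""
    raise urllib.error.HTTPError(request.full_url, 403, 'Forbidden', None, io.BytesIO(b''))


print(fetch_status(req, opener=always_forbidden))  # prints: (403, 'Forbidden')
```

Calling `fetch_status(req)` without the `opener` argument would send the real request through urllib.request.urlopen.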
REST web services are accessed over HTTP, so we can easily call them using the urllib module. I have a simple JSON-based demo REST web service running on my local machine, created using JSON Server. It's a great Node module for running dummy JSON REST web services for testing purposes.
import urllib.request

response = urllib.request.urlopen('https://localhost:3000/employees')
print(response.read())
Notice the console output is printing JSON data.
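Since read() returns bytes, the JSON text usually needs decoding before it can be used as Python data. Here is a small sketch using the standard json module; the `parse_json_body` helper and the sample payload (including the `id` field) are assumptions for illustration, not the actual output of the demo service.

```python
import json


def parse_json_body(raw_bytes, charset='utf-8'):
    """Decode a bytes response body and parse it as JSON."""
    return json.loads(raw_bytes.decode(charset))


# Hypothetical body standing in for response.read()
sample_body = b'[{"id": 1, "name": "David", "salary": "9988"}]'

employees = parse_json_body(sample_body)
for employee in employees:
    print(employee['id'], employee['name'], employee['salary'])  # prints: 1 David 9988
```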
We can get the response headers by calling the info() function on the response object. It returns a dictionary-like http.client.HTTPMessage object, so we can also extract a specific header from the response.
import urllib.request

response = urllib.request.urlopen('https://localhost:3000/employees')
print(response.info())
print('Response Content Type is = ', response.info()["content-type"])
Output:
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
X-Content-Type-Options: nosniff
Content-Type: application/json; charset=utf-8
Content-Length: 260
ETag: W/"104-LQla2Z3Cx7OedNGjbuVMiKaVNXk"
Date: Wed, 09 May 2018 19:26:20 GMT
Connection: close

Response Content Type is = application/json; charset=utf-8
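The headers object offers more than plain key lookup. The sketch below builds an http.client.HTTPMessage offline from a few of the header lines shown above (parsed with email.parser.Parser, an assumption made so the example runs without the demo service) and shows case-insensitive lookup together with the content-type helpers inherited from email.message.Message.

```python
import http.client
from email.parser import Parser

# A few header lines copied from the output above, used as offline sample data
raw_headers = (
    'Content-Type: application/json; charset=utf-8\r\n'
    'Content-Length: 260\r\n'
    'Cache-Control: no-cache\r\n'
    '\r\n'
)

# Build the same kind of object that response.info() returns for HTTP responses
headers = Parser(_class=http.client.HTTPMessage).parsestr(raw_headers)

print(headers['content-type'])          # lookup is case-insensitive
print(headers.get_content_type())       # media type without parameters
print(headers.get_content_charset())    # charset parameter, if any
print(headers.get('X-Missing', 'n/a'))  # default for absent headers
```

With a live response, `headers = response.info()` would take the place of the parsing step.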
Let's look at an example of a POST method call.
import urllib.request
import urllib.parse

post_url = 'https://localhost:3000/employees'

# Note: this headers dict is never passed to urlopen below, so the request is
# sent with urllib's default Content-Type, application/x-www-form-urlencoded,
# which matches the urlencoded body.
headers = {}
headers['Content-Type'] = 'application/json'

# POST request encoded data
post_data = urllib.parse.urlencode({'name': 'David', 'salary': '9988'}).encode('ascii')

# Automatically calls POST method because request has data
post_response = urllib.request.urlopen(url=post_url, data=post_data)
print(post_response.read())
When we call the urlopen function with data in the request, it automatically uses the POST HTTP method. The image below shows the output of the above POST call for my demo service.
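The method selection can be checked without sending anything, because a Request object reports the method it would use. The sketch below compares a request without data, a form-encoded request, and a JSON request; the variable names and the JSON variant are illustrative assumptions, and nothing here contacts the demo service.

```python
import json
import urllib.parse
import urllib.request

post_url = 'https://localhost:3000/employees'
employee = {'name': 'David', 'salary': '9988'}

# No data: urllib would issue a GET
plain_request = urllib.request.Request(post_url)

# Form-encoded body, as in the example above: urllib would issue a POST
form_body = urllib.parse.urlencode(employee).encode('ascii')
form_request = urllib.request.Request(post_url, data=form_body)

# JSON body with an explicit Content-Type header (an alternative encoding)
json_body = json.dumps(employee).encode('utf-8')
json_request = urllib.request.Request(
    post_url, data=json_body, headers={'Content-Type': 'application/json'}
)

print(form_body)  # prints: b'name=David&salary=9988'
print(plain_request.get_method(), form_request.get_method(), json_request.get_method())
# prints: GET POST POST
```

Passing `form_request` or `json_request` to urllib.request.urlopen would then send the POST; whether the server accepts a JSON body depends on the service.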
Reference: API Doc
Original article: https://www.journaldev.com/20795/python-urllib-python-3-urllib