Python urllib – Python 3 urllib

2023-11-19

The Python urllib module allows us to access URL data programmatically.

Python urllib

We can use Python urllib to get website content in a Python program.
We can also use it to call REST web services.
We can make GET and POST HTTP requests.
The module supports both HTTP and HTTPS requests.
We can send request headers and read the response headers.

Python urllib GET example

Let’s start with a simple example that reads the content of the Wikipedia home page.

import urllib.request

# Open the URL and print the raw response body
response = urllib.request.urlopen('https://www.wikipedia.org')
print(response.read())

The response’s read() method returns a bytes object. The code above prints the raw HTML returned by the Wikipedia home page. The output is not easy for a human to read, but we can use an HTML parser to extract useful information from it.
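As a hedged illustration of that last point, the sketch below decodes the bytes returned by read() and pulls the page title out with the standard library's html.parser module. The TitleParser class, the extract_title helper, and the sample bytes are illustrative and not part of the original example. UTF-8 is assumed as the page encoding.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(raw_html, encoding="utf-8"):
    """Decode the bytes from response.read() and return the page title."""
    text = raw_html.decode(encoding, errors="replace")
    parser = TitleParser()
    parser.feed(text)
    parser.close()
    return parser.title.strip()


# Hypothetical sample standing in for the bytes returned by response.read()
sample = b"<html><head><title>Wikipedia</title></head><body><p>Hi</p></body></html>"
print(extract_title(sample))
```

With a live response, the same helper could be called as extract_title(response.read()).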

Python urllib request with header

Let’s see what happens when we run the same program against JournalDev.

import urllib.request

response = urllib.request.urlopen('https://www.journaldev.com')
print(response.read())

We get the following error message.

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py", line 3, in <module>
    response = urllib.request.urlopen('https://www.journaldev.com')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The 403 error occurs because my server doesn’t allow programmatic access to the website data; the site is meant for browsers that can parse HTML. We can usually overcome this error by sending a User-Agent header in the request. Let’s look at the modified program.

import urllib.request

# Send a browser-like User-Agent header with the request
url = 'https://www.journaldev.com'
headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'
request = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(request)
print(resp.read())

We build the request headers as a dictionary and pass them to the Request object. The program above prints the HTML data received from the JournalDev home page.
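The following is a minimal sketch, not the author's code, of how the header could be verified on the Request object and how an HTTP error such as the 403 could be caught instead of ending the program. The names build_browser_request, fetch_status, and always_forbidden are hypothetical. The opener parameter is an assumption added so the error path can be exercised without a network connection; it defaults to urllib.request.urlopen.

```python
import io
import urllib.error
import urllib.request

# User-Agent string taken from the example above
BROWSER_UA = ('Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 '
              '(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17')


def build_browser_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_status(request, opener=urllib.request.urlopen):
    """Open the request and return (status_code, reason) instead of raising."""
    try:
        with opener(request) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and reason sent by the server
        return err.code, err.reason


req = build_browser_request('https://www.journaldev.com')
# Request stores header names capitalized, so the lookup key is 'User-agent'
print(req.get_header('User-agent'))


def always_forbidden(request):
    """Hypothetical stand-in for urlopen that simulates a 403 response."""
    raise urllib.error.HTTPError(
        request.full_url, 403, 'Forbidden', {}, io.BytesIO(b''))


print(fetch_status(req, opener=always_forbidden))
```

Calling fetch_status(req) without the opener argument would send the real request through urlopen.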

Python urllib REST Example

REST web services are accessed over HTTP, so we can easily call them using the urllib module. I have a simple JSON-based demo REST web service running on my local machine, created using JSON Server. It’s a great Node module for running dummy JSON REST web services for testing purposes.

import urllib.request

response = urllib.request.urlopen('https://localhost:3000/employees')
print(response.read())

Notice that the console output is JSON data.
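Because read() returns bytes, the JSON usually needs to be decoded before it can be used as Python data. The sketch below shows one way to do that with the json module. The parse_json_body helper is a hypothetical name, and the sample payload is made up for illustration, reusing the field names from the POST example in this article. The records returned by the real demo service may look different.

```python
import json


def parse_json_body(raw_body, encoding='utf-8'):
    """Decode the bytes returned by response.read() and parse them as JSON."""
    return json.loads(raw_body.decode(encoding))


# Hypothetical payload standing in for response.read()
sample_body = b'[{"id": 1, "name": "David", "salary": "9988"}]'

employees = parse_json_body(sample_body)
for employee in employees:
    # Each record is now an ordinary Python dictionary
    print(employee['name'], employee['salary'])
```

With a live response the call would be parse_json_body(response.read()).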

Python urllib response headers

We can get the response headers by calling the info() method on the response object. For HTTP responses this returns an http.client.HTTPMessage object, which supports dictionary-style lookups, so we can also extract a specific header from the response.

import urllib.request

response = urllib.request.urlopen('https://localhost:3000/employees')
# Print all response headers, then a single header by name
print(response.info())
print('Response Content Type is = ', response.info()["content-type"])

Output:

X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
X-Content-Type-Options: nosniff
Content-Type: application/json; charset=utf-8
Content-Length: 260
ETag: W/"104-LQla2Z3Cx7OedNGjbuVMiKaVNXk"
Date: Wed, 09 May 2018 19:26:20 GMT
Connection: close
Response Content Type is =  application/json; charset=utf-8
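The object returned by info() for an HTTP response is an http.client.HTTPMessage, which offers more than plain key lookups. As a rough sketch, the snippet below builds such an object from a few of the header lines shown in the output above, so it runs without the demo service, and then reads values from it. Building the message from a string is only a stand-in for a real response.

```python
import email
import http.client

# Header text copied from part of the output above
raw_headers = (
    "Content-Type: application/json; charset=utf-8\n"
    "Content-Length: 260\n"
    "Connection: close\n"
    "\n"
)

# Stand-in for response.info(): parse the text into an HTTPMessage
headers = email.message_from_string(raw_headers, _class=http.client.HTTPMessage)

print(headers["content-type"])              # header names are case-insensitive
print(headers.get_content_type())           # media type without parameters
print(headers.get_content_charset())        # value of the charset parameter
print(headers.get("X-Missing", "not set"))  # default for an absent header
```

The charset reported by get_content_charset() can then be passed to bytes.decode() when decoding the response body.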

Python urllib POST

Let’s look at an example of a POST method call.

import urllib.request
import urllib.parse

post_url = 'https://localhost:3000/employees'

# URL-encode the form fields and convert them to bytes for the request body.
# When no Content-Type header is set, urllib sends
# application/x-www-form-urlencoded by default.
post_data = urllib.parse.urlencode({'name': 'David', 'salary': '9988'}).encode('ascii')

# urlopen uses the POST method automatically because the request has data
post_response = urllib.request.urlopen(url=post_url, data=post_data)
print(post_response.read())

When we call the urlopen function with data, it automatically uses the POST HTTP method. The image below shows the output of the above POST call for my demo service.
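The POST example above sends URL-encoded form data. If a service expects a JSON body instead, one possible approach is to serialize the payload with json.dumps and set the Content-Type header on a Request object. This is a sketch under that assumption, and build_json_post is a hypothetical helper name. The snippet only builds and inspects the request, so it runs without the demo service.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Return a Request that POSTs `payload` as a JSON document."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_json_post('https://localhost:3000/employees',
                      {'name': 'David', 'salary': '9988'})

print(req.get_method())
# Request stores header names capitalized, so the lookup key is 'Content-type'
print(req.get_header('Content-type'))
print(req.data)

# Sending the request would follow the same pattern as the earlier examples:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```

Passing method='POST' is optional here, since a Request with data defaults to POST, but it makes the intent explicit.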

The code can be downloaded from the GitHub Repository.

Reference: API Doc