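Requirement: a machine has several network interfaces. How can requests to a given URL be sent through one specific interface?
$ curl --interface eth0 www.baidu.com  # curl's --interface option selects the network interface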
Reading the source of urllib.py, the call chain can be traced as open_http -> httplib.HTTP -> httplib.HTTP._connection_class = HTTPConnection.
HTTPConnection accepts a source_address argument when it is created.
HTTPConnection.connect then calls HTTPConnection._create_connection, which is socket.create_connection.
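The source_address hook in this chain can also be used directly, without touching urllib. A minimal sketch, assuming Python 3 (where httplib is named http.client); the helper name make_bound_connection is hypothetical, and 192.168.20.2 is only an example address:

```python
import http.client


def make_bound_connection(host, local_ip, port=80, timeout=10):
    """Build (without opening) an HTTP connection that will bind to local_ip."""
    # Port 0 in source_address lets the OS pick a free local port.
    return http.client.HTTPConnection(
        host, port, timeout=timeout, source_address=(local_ip, 0)
    )


conn = make_bound_connection("ip.haschek.at", "192.168.20.2")
# The (ip, port) pair that connect() later hands to socket.create_connection.
print(conn.source_address)
```

Nothing is sent until conn.request(...) is called, so the object can be built and inspected safely.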
    # First, look at the local network interfaces
    $ ifconfig
    lo0: flags=8049 mtu 16384
        options=3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        nd6 options=1
    en0: flags=8863 mtu 1500
        ether c8:e0:eb:17:3a:73
        inet6 fe80::cae0:ebff:fe17:3a73%en0 prefixlen 64 scopeid 0x4
        inet 192.168.20.2 netmask 0xffffff00 broadcast 192.168.20.255
        nd6 options=1
        media: autoselect
        status: active
    en1: flags=8863 mtu 1500
        options=4
        ether 0c:5b:8f:27:9a:64
        inet6 fe80::e5b:8fff:fe27:9a64%en8 prefixlen 64 scopeid 0xa
        inet 192.168.8.100 netmask 0xffffff00 broadcast 192.168.8.255
        nd6 options=1
        media: autoselect (100baseTX )
        status: active
Both en0 and en1 can reach the public internet; lo0 is the local loopback.
For the test, modify socket.py directly.
    def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                          source_address=None):
        """If *source_address* is set it must be a tuple of (host, port)
        for the socket to bind as a source address before making the connection.
        An host of '' or port 0 tells the OS to use the default.

        If source_address is set, it must be passed as a (host, port) tuple;
        the default is ("", 0)
        """
        host, port = address
        err = None
        for res in getaddrinfo(host, port, 0, SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            sock = None
            try:
                sock = socket(af, socktype, proto)
                # Uncomment one of these per test run:
                # sock.bind(("192.168.20.2", 0))   # en0
                # sock.bind(("192.168.8.100", 0))  # en1
                # sock.bind(("127.0.0.1", 0))      # lo0
                if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                    sock.settimeout(timeout)
                if source_address:
                    # Wrap the tuple so that % formats it as a single value
                    print "socket bind source_address: %s" % (source_address,)
                    sock.bind(source_address)
                sock.connect(sa)
                return sock
            except error as _:
                err = _
                if sock is not None:
                    sock.close()
        if err is not None:
            raise err
        else:
            raise error("getaddrinfo returns an empty list")
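Following the docstring, bind the socket to the IP address of each interface in three separate runs (uncommenting one sock.bind line per run), with the port set to 0.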
    # Test en0
    $ python -c 'import urllib as u;print u.urlopen("http://ip.haschek.at").read()'
    .148.245.16
    # Test en1
    $ python -c 'import urllib as u;print u.urlopen("http://ip.haschek.at").read()'
    .94.115.227
    # Test lo0
    $ python -c 'import urllib as u;print u.urlopen("http://ip.haschek.at").read()'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http
        h.endheaders(data)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
        self._send_output(message_body)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
        self.send(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
        self.connect()
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
        self.timeout, self.source_address)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 578, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 49] Can't assign requested address
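The test passes: en0 and en1 each report a different public IP, and lo0 fails as expected because the loopback address cannot be the source of a connection to a public host. So on a machine with several interfaces it is enough to bind the socket to the IP of the chosen interface when it is created, with the port set to 0. If the port is not 0, the second request raises an exception because the port is already in use.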
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http
        h.endheaders(data)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
        self._send_output(message_body)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
        self.send(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
        self.connect()
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
        self.timeout, self.source_address)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 577, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 48] Address already in use
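The port conflict can be reproduced without any HTTP traffic. A minimal sketch, assuming Python 3 and using only the loopback address; second_bind_errno is a hypothetical helper, and it simplifies the situation (two sockets bound at the same time rather than two consecutive requests):

```python
import errno
import socket


def second_bind_errno(reuse_port_number):
    """Bind two TCP sockets on 127.0.0.1; return the errno of the second bind, or None."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))         # port 0: the OS picks a free port
        taken_port = first.getsockname()[1]  # the port it actually picked
        # Either ask for the port that is already taken, or let the OS choose again.
        wanted_port = taken_port if reuse_port_number else 0
        try:
            second.bind(("127.0.0.1", wanted_port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


print(second_bind_errno(True) == errno.EADDRINUSE)
print(second_bind_errno(False) is None)
```

Asking for a fixed port collides with the socket that already holds it, while port 0 always gets a fresh one, which is why the source address is written as (IP, 0).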
In a project, it is enough to set the source_address parameter of socket.create_connection to (IP, 0) for the desired interface.
    # test-interface_urllib.py
    import socket
    import urllib, urllib2

    _create_socket = socket.create_connection

    SOURCE_ADDRESS = ("127.0.0.1", 0)
    #SOURCE_ADDRESS = ("172.28.153.121", 0)
    #SOURCE_ADDRESS = ("172.16.30.41", 0)

    def create_connection(*args, **kwargs):
        in_args = False
        if len(args) >= 3:
            args = list(args)
            args[2] = SOURCE_ADDRESS
            args = tuple(args)
            in_args = True
        if not in_args:
            kwargs["source_address"] = SOURCE_ADDRESS
        print "args", args
        print "kwargs", str(kwargs)
        return _create_socket(*args, **kwargs)

    socket.create_connection = create_connection

    print urllib.urlopen("http://ip.haschek.at").read()
Testing confirms that data is now sent through the specified interface, and the reported IP matches the one assigned to that interface.
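Patching socket.create_connection affects every connection in the process. A narrower alternative, which is not the approach used in this article, is a urllib handler that passes source_address to HTTPConnection. A sketch assuming Python 3 (urllib.request and http.client); SourceAddressHTTPHandler and build_bound_opener are hypothetical names, and it assumes CPython's do_open forwards extra keyword arguments to the connection class. It covers plain http only.

```python
import http.client
import urllib.request


class SourceAddressHTTPHandler(urllib.request.HTTPHandler):
    """HTTP handler whose connections bind to one fixed local IP."""

    def __init__(self, local_ip):
        super().__init__()
        self.source_address = (local_ip, 0)  # port 0: let the OS choose

    def http_open(self, req):
        # Extra keyword arguments are passed on to HTTPConnection(...).
        return self.do_open(
            http.client.HTTPConnection, req, source_address=self.source_address
        )


def build_bound_opener(local_ip):
    """Return an opener that sends http:// requests from local_ip."""
    # build_opener drops its default HTTPHandler when given a subclass instance.
    return urllib.request.build_opener(SourceAddressHTTPHandler(local_ip))


opener = build_bound_opener("192.168.20.2")  # example address, nothing is sent yet
```

With this, opener.open("http://ip.haschek.at").read() would use the bound address, while other code in the same process keeps the default routing.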
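Next question: crawlers often use requests, so does the same trick work there? Testing shows that requests does not go through the standard library's socket.create_connection.
How requests creates its socket connection can be found by reading the source, in the same way as for urllib, so the details are omitted here.
Because I am using Python 2.7, the trail leads to the create_connection in urllib3.util.connection, which is what requests uses.
The patch is much the same as for urllib.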
    import urllib3.connection
    _create_socket = urllib3.connection.connection.create_connection
    # pass
    urllib3.connection.connection.create_connection = create_connection
    # pass
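After running it, an exception may be raised:
    requests.exceptions.ConnectionError: Max retries exceeded with .. Invalid argument
This exception does not occur every time; it depends on the IP range, and I attribute it to too many levels of redirect recursion. Removing socket_options from kwargs is enough to avoid it. 127.0.0.1 will always trigger the exception.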
    import socket
    import urllib
    import urllib2
    import urllib3.connection
    import requests as req

    _default_create_socket = socket.create_connection
    _urllib3_create_socket = urllib3.connection.connection.create_connection

    SOURCE_ADDRESS = ("127.0.0.1", 0)
    #SOURCE_ADDRESS = ("172.28.153.121", 0)
    #SOURCE_ADDRESS = ("172.16.30.41", 0)

    def default_create_connection(*args, **kwargs):
        # socket.create_connection has no socket_options parameter, so drop it
        kwargs.pop("socket_options", None)
        in_args = False
        if len(args) >= 3:
            args = list(args)
            args[2] = SOURCE_ADDRESS
            args = tuple(args)
            in_args = True
        if not in_args:
            kwargs["source_address"] = SOURCE_ADDRESS
        print "args", args
        print "kwargs", str(kwargs)
        return _default_create_socket(*args, **kwargs)

    def urllib3_create_connection(*args, **kwargs):
        in_args = False
        if len(args) >= 3:
            args = list(args)
            args[2] = SOURCE_ADDRESS
            in_args = True
            args = tuple(args)
        if not in_args:
            kwargs["source_address"] = SOURCE_ADDRESS
        print "args", args
        print "kwargs", str(kwargs)
        return _urllib3_create_socket(*args, **kwargs)

    socket.create_connection = default_create_connection
    # The urllib3 version occasionally fails, so use the default socket.create_connection
    # urllib3.connection.connection.create_connection = urllib3_create_connection
    urllib3.connection.connection.create_connection = default_create_connection

    print " *** test requests: " + req.get("http://ip.haschek.at").content
    print " *** test urllib: " + urllib.urlopen("http://ip.haschek.at").read()
    print " *** test urllib2: " + urllib2.urlopen("http://ip.haschek.at").read()
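Note: patching through the name urllib3.utils.connection did not seem to take effect for me. The module in urllib3 is spelled urllib3.util.connection, and urllib3.connection.connection refers to that module.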
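Dropping socket_options is needed whenever a urllib3 call is routed to the standard-library function, because socket.create_connection has no such parameter. A sketch, assuming Python 3, that filters keyword arguments by the target function's signature instead of naming them one by one; keep_supported_kwargs is a hypothetical helper:

```python
import inspect
import socket


def keep_supported_kwargs(func, kwargs):
    """Return only the keyword arguments that func declares as parameters."""
    allowed = set(inspect.signature(func).parameters)
    return {name: value for name, value in kwargs.items() if name in allowed}


# Keyword arguments in the style urllib3 passes to its own create_connection.
incoming = {"source_address": ("192.168.20.2", 0), "socket_options": []}
filtered = keep_supported_kwargs(socket.create_connection, incoming)
print(filtered)
```

A wrapper built this way keeps working if the caller starts passing further keywords that the standard-library function does not know.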
One more refinement: look up the IP automatically from the interface name.
    import random
    import subprocess

    def get_all_net_devices():
        sub = subprocess.Popen("ls /sys/class/net", shell=True, stdout=subprocess.PIPE)
        out, _ = sub.communicate()
        net_devices = out.strip().splitlines()  # ['eth0', 'eth1', 'lo']
        # Simple filter on the interface name; adjust it to your needs
        net_devices = [i for i in net_devices if "ppp" in i]
        return net_devices

    ALL_DEVICES = get_all_net_devices()

    def get_local_ip(device_name):
        # Query the named interface; assumes ifconfig prints "inet <ip> ..."
        sub = subprocess.Popen("/sbin/ifconfig %s | grep 'inet ' | awk '{print $2}'" % device_name, shell=True, stdout=subprocess.PIPE)
        out, _ = sub.communicate()
        return out.strip()

    def random_local_ip():
        return get_local_ip(random.choice(ALL_DEVICES))

    # code ...
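Then replace SOURCE_ADDRESS in args[2] = SOURCE_ADDRESS and kwargs["source_address"] = SOURCE_ADDRESS with (random_local_ip(), 0) or (get_local_ip("eth0"), 0). The value must stay an (IP, port) tuple, not a bare IP string.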
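The address lookup can also be done by parsing the ifconfig text in Python rather than with grep and awk. A sketch assuming Python 3; parse_inet and source_address_for are hypothetical helpers, and the pattern assumes the address is printed as "inet <ip>" or "inet addr:<ip>":

```python
import re
import subprocess

INET_PATTERN = re.compile(r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def parse_inet(ifconfig_text):
    """Return the first IPv4 address in one interface's ifconfig output, or None."""
    match = INET_PATTERN.search(ifconfig_text)
    return match.group(1) if match else None


def source_address_for(device_name):
    """Return (ip, 0) for device_name, ready to use as source_address."""
    text = subprocess.check_output(["/sbin/ifconfig", device_name]).decode()
    ip = parse_inet(text)
    if ip is None:
        raise ValueError("no IPv4 address found for %s" % device_name)
    return (ip, 0)


# Sample text in the shape of the en0 entry shown earlier.
sample = (
    "en0: flags=8863 mtu 1500\n"
    "\tether c8:e0:eb:17:3a:73\n"
    "\tinet6 fe80::cae0:ebff:fe17:3a73%en0 prefixlen 64 scopeid 0x4\n"
    "\tinet 192.168.20.2 netmask 0xffffff00 broadcast 192.168.20.255\n"
)
print(parse_inet(sample))
```

Keeping the parsing in a pure function makes it testable without a real interface; only source_address_for touches the system.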
As for what this is useful for, that is left to your imagination.
