Crawling has become a familiar term on today's internet, and it relies on scripts: developers write code that follows predetermined rules to collect information from the World Wide Web.
A web crawler uses scripts to visit a large number of web pages in a short period of time, following its specified targets and extracting information. However, websites limit how often a single IP address may send requests within a fixed window; this restriction protects the server from the errors that excessive load can cause. To work around the limit and obtain data quickly, proxy IPs have become the tool of choice for web crawlers. ISPKEY's overseas proxy service offers a large pool of dynamic residential IPs spread across the world, providing solid technical support for web crawling.
IP proxies give web crawlers flexible IP addresses: by constantly switching IPs, a crawler avoids triggering the server's anti-crawling mechanisms. The details are as follows.
First, obtain the IP address and port number, which means requesting the proxy addresses from the API link:
import json
import requests

def get_ip_list():
    # API link provided by the proxy service (replace XXX with the real URL)
    url = "XXX"
    resp = requests.get(url)
    # Extract the page data as text
    resp_json = resp.text
    # Convert the JSON string into a dictionary
    resp_dict = json.loads(resp_json)
    # Pull the proxy list out of the "data" field
    ip_dict_list = resp_dict.get('data')
    return ip_dict_list
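For reference, an extraction API of this kind typically returns JSON shaped roughly like the example below; the exact key names ("data", "ip", "port") depend on the provider and are assumptions here:

{
    "data": [
        {"ip": "203.0.113.10", "port": 8080},
        {"ip": "203.0.113.11", "port": 8080}
    ]
}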
Proxy IPs that are not covered by an IP whitelist require username and password verification, and the credentials sent with the API link must be encoded; if necessary, this encoding is done in code.
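The base_code helper used in the snippet below is not defined in the original; a minimal sketch, assuming the proxy expects standard HTTP Basic credentials (a Base64-encoded "username:password" string), could be:

import base64

def base_code(username, password):
    # Encode "username:password" as Base64 for the Proxy-Authorization header
    credentials = '%s:%s' % (username, password)
    return base64.b64encode(credentials.encode('utf-8')).decode('utf-8')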
Next, send a request to the target website through the proxy to obtain the data. If the request succeeds, read the response; if it fails, report that the proxy is invalid:
def spider_ip(ip_port, url):
    # url is the actual address to be requested;
    # username and password are assumed to hold your proxy credentials
    headers = {
        # Browser information
        'User-Agent': 'XXX',
        # Username + password for proxy authentication
        'Proxy-Authorization': 'Basic %s' % base_code(username, password)
    }
    # Place the proxy IP address in the proxies parameter
    proxy = {
        'http': 'http://{}'.format(ip_port)
    }
    # Send the network request
    try:
        resp = requests.get(url, proxies=proxy, headers=headers)
        # Request succeeded: parse the response data
        result = resp.text
    except requests.exceptions.RequestException:
        # Request failed: this proxy is invalid
        result = 'This proxy is invalid'
    return result
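Putting the two functions together, a minimal rotation loop might look like the following; the credential values, the target URL, and the "ip"/"port" field names are illustrative assumptions that depend on your provider:

# Hypothetical credentials issued by the proxy service
username = 'your_username'
password = 'your_password'

# Rotate through the proxy pool, switching to a new IP for each request
for ip_dict in get_ip_list():
    ip_port = '{}:{}'.format(ip_dict.get('ip'), ip_dict.get('port'))
    print(spider_ip(ip_port, 'http://example.com'))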
That's all for this article's introduction. For more information about proxy IPs, stay tuned for future articles.