If you've been blocked from websites and can't figure out why, this article can help you. Today we'll go over the most common reasons why web crawlers get blocked.
I. Check JavaScript
If the page comes back blank or missing information, the most likely cause is that the site renders its content with JavaScript, which a plain HTTP request never executes.
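A quick way to confirm this is to compare what a plain HTTP client receives with what a browser shows. A minimal sketch using the `requests` library (the URL and the size threshold are placeholder assumptions, not values from this article):

```python
import requests

# Hypothetical target URL -- replace with the page you are crawling.
url = "https://example.com/some-page"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

# A JS-rendered page often returns an almost empty <body> to plain HTTP
# clients; the real content is filled in by scripts running in a browser.
body = resp.text
if len(body) < 2000 or "<noscript>" in body.lower():
    print("Page looks empty or JS-dependent; consider a headless browser"
          " such as Selenium or Playwright to execute the JavaScript.")
else:
    print(f"Got {len(body)} bytes of HTML; static scraping may be enough.")
```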
II. Check cookies
If you cannot log in, or cannot stay logged in between requests, check that your crawler is storing and resending cookies.
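With the `requests` library, the usual fix is a `Session`, which keeps the cookies set at login and replays them on every later request. A minimal sketch, assuming a hypothetical login endpoint and form fields:

```python
import requests

session = requests.Session()  # persists cookies across requests

# Hypothetical login URL and credentials -- adjust for the real site.
login_url = "https://example.com/login"
session.post(login_url, data={"username": "me", "password": "secret"})

# The session automatically sends back the cookies the server set above,
# so this request is made as a logged-in user.
resp = session.get("https://example.com/account")
print(resp.status_code, session.cookies.get_dict())
```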
III. The IP address is blocked
If the page cannot be opened and a 403 Forbidden error appears, the website has most likely blacklisted your IP address and will no longer accept any of your requests. You can wait for the IP address to drop off the website's blacklist, or use a proxy IP service such as Small Elephant Proxy: once an IP is blocked, you simply switch to a fresh one and keep going.
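With `requests`, routing through a proxy is a one-line change. The sketch below rotates through a small pool whenever a 403 comes back; the proxy addresses are placeholders for whatever your proxy provider issues:

```python
import requests

url = "https://example.com/data"

# Placeholder proxy pool -- substitute addresses from your proxy provider.
proxy_pool = [
    "http://111.111.111.111:8080",
    "http://222.222.222.222:8080",
]

for proxy in proxy_pool:
    resp = requests.get(url, proxies={"http": proxy, "https": proxy},
                        timeout=10)
    if resp.status_code == 403:
        print(f"{proxy} is blocked too, trying the next one...")
        continue
    print(f"Success via {proxy}: {resp.status_code}")
    break
else:
    print("All proxies in the pool were rejected.")
```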
In addition to the above three points, a Python crawler should also slow down when collecting page data. Crawling too fast not only makes it easier for anti-crawler systems to detect and block you, it also puts a heavy load on the target website. Add delays to your crawler and, where possible, run it in the dead of night; this is basic network courtesy.
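Adding a delay is as simple as sleeping between requests; randomizing the interval also makes the traffic look less mechanical. A minimal sketch (the URL list and the 2-5 second range are illustrative assumptions):

```python
import random
import time

import requests

# Hypothetical list of pages to crawl.
urls = [f"https://example.com/page/{i}" for i in range(1, 6)]

for url in urls:
    resp = requests.get(url, timeout=10)
    print(url, resp.status_code)

    # Sleep 2-5 seconds between requests to avoid hammering the server
    # and to look less like an automated crawler.
    time.sleep(random.uniform(2, 5))
```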