Anyone who is new to crawlers tends to ask the same question: which sites can a crawler crawl? A crawler is a powerful tool, but some sites can be crawled and some cannot. Today we look at which sites can be crawled.


1、News websites


On news websites, everything that is visible on the site can be collected.


Collectable fields include: title; author; release time; news source; subtitle; summary; content; video links; image links; language; news type; release status; deletion status; website name; page source code.
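The field list above can be sketched as a simple record type. This is only an illustration: the class name `NewsItem`, the field names, and the helper `missing_fields` are assumptions made for the example, not a fixed schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class NewsItem:
    """One collected news article (hypothetical field names)."""
    title: str
    content: str
    author: Optional[str] = None
    release_time: Optional[str] = None
    source: Optional[str] = None
    subtitle: Optional[str] = None
    summary: Optional[str] = None
    video_link: Optional[str] = None
    image_links: tuple = ()
    language: Optional[str] = None
    news_type: Optional[str] = None
    release_status: Optional[str] = None
    delete_status: Optional[str] = None
    website_name: Optional[str] = None
    page_source: Optional[str] = None


def missing_fields(item: NewsItem) -> list:
    """Return the names of fields the crawler did not fill in."""
    return [name for name, value in asdict(item).items()
            if value in (None, "", ())]
```

A helper like `missing_fields` makes it easy to see which parts of a page the crawler failed to extract before the record is stored.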


2、Recruitment websites


For recruitment websites, it must be stressed that resumes that can only be viewed after payment cannot be collected, and neither can the resumes of applicants who have not made them public.


Collectable fields include: company name; job postings; web links; job category; work location; professional requirements; company profile; application address; industry; job content; job requirements; other information.


3、Forum websites


On forum websites, the following can be collected: posts; posters; posting time; number of posts; number of followers a poster has; post content, reply content, and so on.


4、E-commerce websites


What can be collected from an e-commerce website should be discussed with a technical consultant in advance. The cell phone numbers of users who browsed a product on the site cannot be collected.


Collectable content includes: price; name; keywords; picture links; number of payments; link address, and so on.
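One way to respect this boundary is an allow-list: only the fields named above are kept, and everything else (such as a user's phone number) is dropped before storage. The following is a minimal sketch; the key names in `ALLOWED_FIELDS` and the function `keep_allowed_fields` are hypothetical.

```python
# Hypothetical key names for the collectable fields listed above.
ALLOWED_FIELDS = {
    "price", "name", "keywords", "picture_links", "payment_count", "link",
}


def keep_allowed_fields(raw_record: dict) -> dict:
    """Return a copy of raw_record containing only allow-listed keys."""
    return {key: value for key, value in raw_record.items()
            if key in ALLOWED_FIELDS}
```

An allow-list is safer than a block-list here, because a field that nobody anticipated is discarded by default instead of being stored by accident.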


5、Search engines


For search engines, the user provides a login account and keywords. The configuration is very simple, but a larger share of the collected data will be invalid. Whatever is collected is content that can be seen on the page.
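Because keyword-based collection returns more invalid data, a filtering pass after collection is usually worthwhile. The sketch below assumes each result is a dictionary with `title` and `snippet` keys; both the keys and the function name `filter_results` are assumptions for illustration.

```python
def filter_results(results, keyword):
    """Keep results whose title or snippet mentions the keyword.

    The comparison is case-insensitive. Results are assumed to be
    dictionaries with optional 'title' and 'snippet' keys.
    """
    needle = keyword.casefold()
    kept = []
    for result in results:
        text = f"{result.get('title', '')} {result.get('snippet', '')}"
        if needle in text.casefold():
            kept.append(result)
    return kept
```

A plain substring match is the simplest possible rule; a real project would likely need stricter checks, such as removing duplicates or advertisements.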


The above are the kinds of websites a crawler can crawl. With the help of crawler technology, we can collect the data we want in a short time. Combining a crawler with proxy IPs is also a good choice.
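As a rough illustration of combining a crawler with a proxy IP, the standard library's `urllib.request` can route requests through a proxy. The function names `build_proxy_opener` and `fetch`, the `User-Agent` string, and any proxy address passed in are placeholders, not values from a real service.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url, timeout=10):
    """Download url through the given proxy and return the raw bytes."""
    opener = build_proxy_opener(proxy_url)
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-crawler/0.1"}  # placeholder
    )
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch("https://example.com/", "http://127.0.0.1:8080")` would only work if a proxy is actually listening at that address; in practice the proxy URL comes from whichever proxy provider is used.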


(Recommended operating system: Windows 7, Python 3.9.1, DELL G3 computer.)
