ROBOTS.TXT

If, for whatever reason, you do not want web crawlers (whether from search engines or other kinds of web robots) to access all or part of your website, you can use robots.txt for this purpose.

The robots.txt file is placed at the root of the website (for example, yourdomain.com/robots.txt). It follows the Robots Exclusion Standard, which dates back to 1994, when web indexing became popular. The standard does not guarantee that a web crawler will obey it; that depends entirely on the crawler's willingness to honor it. To instruct robots not to access any part of your website, simply write the following in robots.txt:

User-agent: *
Disallow: /

User-agent: * means the instructions apply to all web robots. To apply the instructions only to a particular web robot, replace * with that robot's name.
Disallow: / means the root directory and everything in it may not be accessed by web robots.
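One way to check how a crawler that honors the standard would read these rules is Python's standard-library urllib.robotparser module. The sketch below parses the rules from a string instead of fetching them from a live site; the robot names and the /about.html path passed to can_fetch() are only examples. It also shows the effect of replacing * with a single robot's name.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The rule above: every robot is barred from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# The same rule aimed at a single robot: "*" is replaced by its name.
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"

all_bots = build_parser(BLOCK_ALL)
print(all_bots.can_fetch("Googlebot", "/"))            # False
print(all_bots.can_fetch("HTTrack", "/about.html"))    # False

one_bot = build_parser(BLOCK_GOOGLEBOT)
print(one_bot.can_fetch("Googlebot", "/about.html"))   # False
print(one_bot.can_fetch("HTTrack", "/about.html"))     # True
```

With the second file, robots other than Googlebot find no group that applies to them, so they are allowed everywhere.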

If you want to block only some directories or specific files, write the rules as follows:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /download/browse.php

These instructions tell web robots not to access the cgi-bin and images directories (or anything inside them), nor the file /download/browse.php. Robots may still access files in the /download directory other than browse.php.
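The same standard-library module can confirm which paths these rules block. This is a minimal sketch: "AnyBot" is a made-up user-agent name and the sample paths are hypothetical. Each Disallow value is matched as a path prefix, which is why /download/setup.zip stays reachable while everything under /cgi-bin/ does not.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, verbatim.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /download/browse.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths on the site; "AnyBot" is a made-up robot name.
SAMPLE_PATHS = [
    "/cgi-bin/search.cgi",    # inside a blocked directory
    "/images/logo.png",       # inside a blocked directory
    "/download/browse.php",   # the single blocked file
    "/download/setup.zip",    # another file in /download: allowed
    "/index.html",            # not covered by any rule: allowed
]

for path in SAMPLE_PATHS:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path:<24} {verdict}")
```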

 

List of Web Crawlers

Some examples of web crawlers:

1. Teleport Pro

Teleport Pro is web crawler software for offline browsing. It has been popular for a long time, especially back when Internet connections were not as accessible or as fast as they are now. It is paid software, available at http://www.tenmax.com.

2. HTTrack

Like Teleport Pro, HTTrack is software, written in C, that downloads a website's content into a mirror on your hard drive so it can be viewed offline. Notably, this software is free and can be downloaded from its official website, http://www.httrack.com.

3. Googlebot

Googlebot is the web crawler that builds the search index used by the Google search engine. If people find your website through Google, that is likely thanks to Googlebot. The trade-off is that the crawling process consumes some of your bandwidth.

4. Yahoo! Slurp

If Googlebot is the web crawler behind Google's flagship search engine, Yahoo! relies on Yahoo! Slurp, a technology developed by Inktomi Corporation, which Yahoo! acquired.

5. YaCy

Slightly different from the other web crawlers, YaCy is built on peer-to-peer (P2P) networking principles, is developed in Java, and runs distributed across several hundred machines (called YaCy peers). Following the P2P principle, the peers share their indexes with one another, so no central server is required. One example of a search engine that uses YaCy is Sciencenet (http://sciencenet.fzk.de), for searching documents in the field of science.

Network Troubleshooting

Determining a Troubleshooting Method

Network troubleshooting is the process of resolving network trouble by identifying and fixing its causes. For example, suppose a server shares directories with a client. The power goes out, and both the server and the client go down. When the power comes back, you reboot both devices. After logging in on the client, you need to access a directory on the server, but cannot. What happened? There are several methods you can apply:

I. OSI Model

Every troubleshooting method here is based on the OSI reference model. If you are not familiar with it, in a nutshell the OSI model is a network model consisting of seven layers, listed here from the uppermost layer down:

  1. Application
  2. Presentation
  3. Session
  4. Transport
  5. Network
  6. Data Link
  7. Physical

On the sender's side, data passes down from the Application layer to the Physical layer, then travels over a physical medium (such as an Ethernet cable) through the intervening network to the receiver's Physical layer. From there, the data moves up the receiver's layers to its Application layer.

Once the data has arrived, the roles reverse: the receiver becomes the sender, and the original sender becomes the receiver. The reply travels the same path in the opposite direction back to the original sender. Consequently, if even one layer is not working, the data cannot get through. For example, if the receiver's Transport layer is not functioning, the data cannot pass from the Network layer up to the Session layer.
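The layered path described above can be modeled as a toy simulation. This is a simplified sketch, not a real protocol stack: each layer is only a name, the function deliver() is hypothetical, and a layer that is "down" simply stops the data where it is.

```python
from typing import Iterable, List, Tuple

# The seven OSI layers, uppermost first (same order as the list above).
OSI_LAYERS = ["Application", "Presentation", "Session", "Transport",
              "Network", "Data Link", "Physical"]


def deliver(sender_down: Iterable[str] = (),
            receiver_down: Iterable[str] = ()) -> Tuple[bool, List[str]]:
    """Toy model of one message crossing two OSI stacks.

    Data goes down the sender's layers (Application -> Physical), crosses the
    physical medium, then goes up the receiver's layers (Physical ->
    Application). Returns (delivered, trace); the trace lists every layer the
    data actually passed, and the walk stops at the first layer that is down.
    """
    sender_down, receiver_down = set(sender_down), set(receiver_down)
    trace: List[str] = []
    for layer in OSI_LAYERS:                 # sender: top to bottom
        if layer in sender_down:
            return False, trace
        trace.append(f"sender {layer}")
    for layer in reversed(OSI_LAYERS):       # receiver: bottom to top
        if layer in receiver_down:
            return False, trace
        trace.append(f"receiver {layer}")
    return True, trace


ok, trace = deliver()
print(ok, len(trace))     # True 14

# The receiver's Transport layer is down: data stops after its Network layer.
ok, trace = deliver(receiver_down={"Transport"})
print(ok, trace[-1])      # False receiver Network
```

The reply can be modeled with the same call and the two arguments swapped, since the roles of sender and receiver reverse.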

II. Bottom-Up

This method starts from the bottom layer, the Physical layer, and then works upward toward the Application layer. The Physical layer includes the network cable and the network card. So if a network cable is disconnected, do not continue troubleshooting at higher layers; fix that problem first before moving beyond the Physical layer. Once it is solved, check whether the problem persists. If it does, continue troubleshooting at the Data Link layer. For example, if the switch's MAC address table contains a duplicate MAC address entry, fix that first before checking the Network layer (e.g., IP addressing or routing).
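The bottom-up order can be expressed as a loop that walks the layers from Physical upward and stops at the first failing check. The check functions here are hypothetical placeholders (a real check might test a cable, inspect a switch's MAC address table, or ping a gateway); the sketch only illustrates the order in which layers are examined.

```python
from typing import Callable, Dict, Optional

# OSI layers, uppermost first (same order as the list above).
OSI_LAYERS = ["Application", "Presentation", "Session", "Transport",
              "Network", "Data Link", "Physical"]

Check = Callable[[], bool]  # hypothetical: returns True if the layer looks healthy


def bottom_up(checks: Dict[str, Check]) -> Optional[str]:
    """Walk from Physical up to Application; return the first failing layer.

    Layers without a check are skipped. Returns None if every check passes.
    """
    for layer in reversed(OSI_LAYERS):      # Physical first
        check = checks.get(layer)
        if check is not None and not check():
            return layer                    # fix this before going higher
    return None


# Hypothetical results matching the example in the text: the cable is fine,
# but the switch has a duplicate MAC entry. The Network check fails too,
# yet it is never reached because Data Link is examined first.
checks = {
    "Physical": lambda: True,
    "Data Link": lambda: False,
    "Network": lambda: False,
}
print(bottom_up(checks))   # Data Link
```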

III. Top-Down

This works like the bottom-up method, except that troubleshooting starts from the uppermost layer, the Application layer, and then works down toward the Physical layer.
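The top-down method is the same loop run in the opposite direction. As with any sketch of this kind, the check functions are hypothetical stand-ins for real tests such as verifying a browser's settings.

```python
from typing import Callable, Dict, Optional

# OSI layers, uppermost first (same order as the list above).
OSI_LAYERS = ["Application", "Presentation", "Session", "Transport",
              "Network", "Data Link", "Physical"]

Check = Callable[[], bool]  # hypothetical: returns True if the layer looks healthy


def top_down(checks: Dict[str, Check]) -> Optional[str]:
    """Walk from Application down to Physical; return the first failing layer.

    Layers without a check are skipped. Returns None if every check passes.
    """
    for layer in OSI_LAYERS:                # Application first
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None


# Hypothetical: the user's browser has a wrong setting, the rest is healthy.
checks = {
    "Application": lambda: False,
    "Network": lambda: True,
    "Physical": lambda: True,
}
print(top_down(checks))   # Application
```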

IV. Divide and Conquer

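This method takes a bit of instinct. You can start at any layer, based on where you suspect the cause of the problem lies. From there, you go up or down: if the layer you start at works, the layers below it are probably fine, so you move up; if it does not, you move down.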
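Divide and conquer can be sketched as follows: test a starting layer of your choice; if it works, assume the layers below are fine and move up; if it fails, keep moving down while the lower layers also fail, to find the lowest broken layer. The function name, the choice of the Network layer as the starting point in the usage lines, and the check results are all assumptions for illustration.

```python
from typing import Callable, Dict, Optional

# OSI layers, uppermost first (same order as the list above).
OSI_LAYERS = ["Application", "Presentation", "Session", "Transport",
              "Network", "Data Link", "Physical"]

Check = Callable[[], bool]  # hypothetical: returns True if the layer looks healthy


def divide_and_conquer(checks: Dict[str, Check], start: str) -> Optional[str]:
    """Start at `start`, then move up or down depending on its result.

    A layer without a check is treated as healthy (an assumption of this toy).
    """
    def healthy(layer: str) -> bool:
        check = checks.get(layer)
        return check is None or check()

    i = OSI_LAYERS.index(start)             # index 0 is Application (top)
    if healthy(start):
        # Layers below are assumed fine: move up, nearest layer first.
        for layer in reversed(OSI_LAYERS[:i]):
            if not healthy(layer):
                return layer
        return None
    # The start layer fails: move down while the lower layers fail as well,
    # so the lowest broken layer (the likely root cause) is reported.
    culprit = start
    for layer in OSI_LAYERS[i + 1:]:
        if healthy(layer):
            break
        culprit = layer
    return culprit


up = divide_and_conquer({"Transport": lambda: False}, start="Network")
print(up)     # Transport (Network works, so the search moved up)

down = divide_and_conquer(
    {"Network": lambda: False, "Data Link": lambda: False}, start="Network")
print(down)   # Data Link (Network failed, and so did the layer below it)
```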

 

So which method should you choose? Follow your intuition about where the problem is likely to be. For example, if a user cannot browse the Internet and you suspect a browser setting, use the top-down method. In contrast, if the user has just connected a notebook to the network and cannot browse the Internet, use the bottom-up method, because the most likely cause is a damaged network cable or a similar physical problem.