Lots of Linux

Google knows Linux will crunch its data


4,000 Linux Servers Used as Backbone of Google Search Engine

by Dave Murphy
ISSN 1535-3613

The high-end search engine Google has set up 4,000 PC servers running Red Hat Linux, and it plans to upgrade the system to a total of 6,000 servers later this year. I think this is the largest Linux installation in the world. Red Hat Linux is practically free, compared with approximately $1,000 for the software to run a Windows NT server and even more for Windows 2000. The cost of hardware is also lower because Linux has modest hardware requirements.

The choice to use Linux rather than Windows NT/2000 will save Google over $6 million this year in software cost alone. Overall, I estimate the savings will be more than double that, because Linux is cheaper to buy, quicker to install, and requires less hands-on periodic system maintenance.
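The software figure can be checked with simple arithmetic. The sketch below uses the article's numbers (6,000 planned servers, roughly $1,000 per Windows NT server license) and assumes, for illustration only, that "practically free" means a Linux cost of zero. The function name `software_savings` is hypothetical.

```python
# Figures from the article; the zero Linux cost is an assumption
# standing in for "practically free".
PLANNED_SERVERS = 6_000
WINDOWS_NT_SOFTWARE_USD = 1_000   # approximate per-server cost
LINUX_SOFTWARE_USD = 0            # assumption for illustration


def software_savings(servers, windows_cost, linux_cost):
    """Return the total software cost avoided across all servers."""
    return servers * (windows_cost - linux_cost)


savings = software_savings(
    PLANNED_SERVERS, WINDOWS_NT_SOFTWARE_USD, LINUX_SOFTWARE_USD
)
print(f"Estimated software savings: ${savings:,}")
```

With these inputs the estimate comes to $6 million. Because Windows 2000 costs more than Windows NT, the actual figure would be somewhat higher, which is consistent with "over $6 million."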

"The hypertext analysis is computationally expensive," said Sergey Brin, founder and president of Google.com. "We need to have an efficient system for doing that. That's why we use a lot of cheap PCs. It's a cheaper platform. The dollar per MIPS is better for PCs."

The Linux systems will be used to rank the importance of submitted webpages by counting how many other pages link to each page and weighing the importance of those linking pages. The system will also conduct a hypertext analysis to determine where keywords are located on submitted pages.
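To make the ranking idea concrete, here is a minimal sketch of link-based ranking in which a page's score depends on how many pages link to it and on the scores of those linking pages. It is a simplified iterative scheme, not a description of Google's production system. The function name `rank_pages`, the damping value of 0.85, the iteration count, and the four-page toy web are all assumptions made for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighted by the linkers' own scores.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with equal scores that sum to 1.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to c.html, and c.html links to a.html.
toy_web = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

ranks = rank_pages(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, c.html scores highest because three pages link to it, and a.html comes second because its single inbound link comes from the highest-scoring page. That illustrates why the importance of the linking page matters as well as the raw link count.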

This work is computationally intensive: the search equation used to index the web has 500 million variables and 2 million terms and is solved about once a month, producing about 1 TB of data to index 300 million webpages. One terabyte (TB) is the equivalent of 1,024 gigabytes (2^40 bytes).
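The unit conversion, and what it implies per page, can be worked out directly. The sketch below uses only the article's figures (about 1 TB of index data for 300 million pages). Since both figures are approximate, the per-page result is only a rough average.

```python
# Binary units, as used in the article: 1 TB = 1,024 GB = 2^40 bytes.
BYTES_PER_GB = 2**30
BYTES_PER_TB = 1024 * BYTES_PER_GB

INDEXED_PAGES = 300_000_000  # from the article

# Rough average amount of index data per page.
bytes_per_page = BYTES_PER_TB / INDEXED_PAGES

print(f"1 TB = {BYTES_PER_TB:,} bytes")
print(f"Average index data per page: about {bytes_per_page / 1024:.1f} KB")
```

Under these figures the index averages only a few kilobytes per page.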

Google has in-house talent to maintain the Linux servers, and it values the ability to look at the source code of the operating system and applications to correct problems as they appear. Linux allows the Google staff to be less reliant on external vendors.

Call for Comments

What do you think? Leave your comments on the message center.



Previous issues are on our website at http://itrain.org/itinfo/.

International Association of Information Technology Trainers
PMB 616
6030-M Marshalee Dr
Elkridge, MD 21075-5987

410.567.5366
1.888.290.6200
fax: 801.650.0423
Membership Director: member@itrain.org


Copyright © 2000 International Association of Information Technology Trainers, Ltd., All Rights Reserved

http://itrain.org/itinfo/2000/it000531.html
updated May 31, 2000