IBM's BigSheets Text-mining the UK Web Archive

Tuesday Mar 9th 2010 by DatabaseJournal.com Staff

The recently announced UK Web Archive project, with the help of IBM's BigSheets software and the company's decades of experience in text mining, is going to store and make accessible every site in the .uk top-level domain. The goal is to support dynamic research with capabilities such as classifying pages into categories, extracting entities as metadata, and offering several approaches to querying and visualizing the data.

Hadoop, the core technology used within BigSheets, is a data storage system that can scale to billions of items while requiring less structure and space than a relational database. It handles large amounts of traffic through parallel processing, and it supports adding new servers, replication, fail-over, and load balancing.
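
To make the parallel-processing idea concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop is known for, written in plain Python rather than against Hadoop or BigSheets themselves. The sample URLs, the function names (second_level, map_chunk, reduce_counts, count_by_domain), and the use of local threads in place of cluster servers are all assumptions made for illustration; none of them come from the UK Web Archive project.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical sample of archived URLs; a real archive would hold far more.
PAGES = [
    "http://www.example.co.uk/news/1",
    "http://shop.example.co.uk/item/2",
    "http://www.example.org.uk/about",
    "http://www.example.ac.uk/research",
    "http://www.example.co.uk/news/3",
]


def second_level(url):
    """Return the last two host labels, e.g. 'co.uk' for www.example.co.uk."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def map_chunk(chunk):
    """Map step: count the pages in one chunk by second-level domain."""
    return Counter(second_level(url) for url in chunk)


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_by_domain(urls, workers=2):
    """Split the URLs into chunks, count each chunk in parallel, then merge."""
    chunks = [urls[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(count_by_domain(PAGES))
```

Because each chunk is counted independently, adding workers (or, in a real cluster, servers) does not change the result; it only spreads the work across more machines.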
