Analyzing Text Stored on Amazon’s Simple Storage Service

Amazon’s Simple Storage Service, or S3, was a revelation when it was introduced years ago, and it remains an important tool for many. Compared to most other cloud-based storage options, and like other AWS services, S3 is, as the name suggests, quite a bit simpler to set up and use. Instead of worrying about creating sensible, efficiency-enhancing directory structures and the like, S3 users simply put their data into buckets. Given the right key and the appropriate access rights, data hosted on S3 can be retrieved or modified in much the same way as with the document-oriented, SQL-spurning databases that have become so popular in recent years.
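To make the key-based access model a little more concrete, here is a minimal Python sketch. It assumes a client object that exposes put_object and get_object in the same shape as the S3 client from the boto3 library. The helper functions, the bucket name, and the in-memory stand-in client are all hypothetical and exist only so that the example runs without an AWS account.

```python
import io


def save_text(client, bucket, key, text):
    """Store a string under the given key, overwriting any existing object."""
    client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))


def load_text(client, bucket, key):
    """Fetch the object stored under the given key and decode it as UTF-8."""
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")


class InMemoryClient:
    """Hypothetical stand-in for an S3 client so the sketch runs offline."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        # Objects are addressed only by bucket name and key.
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        # Mimic the real response shape: a dict whose "Body" can be read.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = InMemoryClient()
save_text(client, "example-bucket", "notes/hello.txt", "Hello from a bucket")
print(load_text(client, "example-bucket", "notes/hello.txt"))
```

With boto3 installed and credentials configured, a real client created with boto3.client("s3") could presumably be passed in place of the stand-in, provided the bucket already exists and the caller has the appropriate access rights.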

That accounts for a great deal of the continuing popularity of S3, but there are real drawbacks. Compared to hosting data in a traditional, SQL-equipped database, for example, searching through a store kept on S3 has traditionally been quite a bit more difficult. While an Oracle database will come ready to plow through terabytes of data in order to find the piece that a user or developer might want, S3 on its own does not offer any such functionality.

For some users, that is acceptable, but many more would like to be able to index, search, and analyze the data they upload to S3. S3 does not enable even basic text mining by default, but that is not to say it is impossible.
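As a rough illustration of what basic text search over such data involves, the Python sketch below builds a small inverted index, mapping each word to the object keys whose text contains it. It is not how any particular search product works. It assumes the text of each object has already been downloaded into a dictionary keyed by object key, and the function names and sample keys are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of object keys whose text contains it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index, query):
    """Return the keys of the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data standing in for text objects fetched from a bucket.
documents = {
    "reports/q1.txt": "Sales grew in the first quarter",
    "reports/q2.txt": "Sales fell in the second quarter",
}
index = build_index(documents)
print(sorted(search(index, "sales quarter")))
print(sorted(search(index, "second")))
```

An index like this lives outside S3 and has to be rebuilt or updated whenever objects change, which is part of why dedicated tools are usually the more practical choice for stores of any real size.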

In fact, there are a number of excellent tools today for delving into data kept on S3 and analyzing it in many different ways. Just as NoSQL databases like MongoDB rose to prominence on a promise of making it easier to analyze documents of all kinds, S3 is now starting to be recognized for delivering the same thing.

The difference is that, in the case of S3, most users will want to rely on third-party tools to accomplish this. Thanks to the great popularity of the service, a number of the top enterprise search and text analysis tools now integrate effectively with S3, making it easy for their users to index, search, and analyze the data they keep there.
