Analyzing Text Stored on Amazon’s Simple Storage Service

Amazon’s Simple Storage Service, or S3, was a revelation when it was introduced in 2006, and it remains an important tool for many. Like other AWS services, and compared to most other cloud-based storage options, S3 is, as the name suggests, quite a bit simpler to set up and use. Instead of needing to worry about creating sensible, efficiency-enhancing directory structures and the like, S3 users simply dump their data into buckets. Given the right key and the appropriate access rights, data hosted on S3 can be retrieved or modified in much the same way as with the document-oriented, SQL-spurning databases that have become so popular in recent years.
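To make the key-based access concrete, here is a minimal sketch of storing and fetching a piece of text by key. It assumes a boto3-style client object that exposes `put_object` and `get_object`; the helper names `save_text` and `load_text`, and the bucket and key used with them, are hypothetical and chosen only for illustration.

```python
def save_text(s3, bucket, key, text):
    """Store `text` as a UTF-8 encoded object under `key` in `bucket`."""
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))


def load_text(s3, bucket, key):
    """Fetch the object stored under `key` and decode it as UTF-8 text."""
    response = s3.get_object(Bucket=bucket, Key=key)
    # The response body is a stream; read it fully before decoding.
    return response["Body"].read().decode("utf-8")
```

With credentials configured, something like `save_text(boto3.client("s3"), "example-bucket", "notes/hello.txt", "hello")` would be the typical call. Note that there is no query here at all: the caller has to know the key in advance.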

That accounts for a great deal of the continuing popularity of S3, but there are real drawbacks. Compared to hosting data in a traditional, SQL-equipped database, for example, searching through a store kept on S3 has traditionally been quite a bit more difficult. While an Oracle database will come ready to plow through terabytes of data in order to find the piece that a user or developer might want, S3 on its own does not offer any such functionality.

For some users, that is acceptable, but many more would like to be able to index, search, and analyze the stores of data they upload to S3. S3 does not enable even basic text mining by default, but that does not make it impossible.
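As a rough illustration of what even a basic text search over a bucket involves, the sketch below lists every object under a prefix and scans each one for a term. It assumes a boto3-style client (the `list_objects_v2` paginator and `get_object`), objects that contain UTF-8 text, and a hypothetical bucket name. `FakeS3` is only an in-memory stand-in so the example runs without credentials.

```python
import io


def find_objects_containing(s3, bucket, term, prefix=""):
    """Return keys of objects whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    matches = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            text = body.decode("utf-8", errors="replace")
            if needle in text.lower():
                matches.append(obj["Key"])
    return matches


class FakeS3:
    """Hypothetical in-memory stand-in for the two client calls used above."""

    def __init__(self, objects):
        self._objects = objects  # maps key -> bytes

    def get_paginator(self, name):
        assert name == "list_objects_v2"
        return self

    def paginate(self, Bucket, Prefix=""):
        keys = sorted(k for k in self._objects if k.startswith(Prefix))
        yield {"Contents": [{"Key": k} for k in keys]}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[Key])}


if __name__ == "__main__":
    fake = FakeS3({
        "notes/a.txt": b"Quarterly revenue rose.",
        "notes/b.txt": b"Nothing to see here.",
    })
    print(find_objects_containing(fake, "example-bucket", "revenue", prefix="notes/"))
```

Passing a real `boto3.client("s3")` in place of `FakeS3` should behave the same way, but every query downloads every object under the prefix. That cost is what a proper index is meant to avoid.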

In fact, there are a number of excellent tools today for delving into data kept on S3 and analyzing it in any number of different ways. Just as NoSQL databases like MongoDB rose to prominence with a promise of making it easier to analyze documents of all kinds, S3 is now starting to be recognized for delivering the same thing.

The difference is that, in the case of S3, most will want to make use of some third-party tools in order to accomplish this task. Thanks to the great popularity of the service, a number of the top enterprise search and text analysis tools now integrate directly with S3, making it easy for their users to index, search, and analyze the data they keep there.