[Reel Two]

Reel Two offers a range of powerful tools for text search and analysis. SureChem enables users to quickly identify all the chemical compounds in text documents, such as patents or journal abstracts. Classification System is a versatile application for rapid, custom categorization of text files or documents in either desktop or enterprise configurations. The Entity Extractor is a software development kit that lets users implement their own systems for identifying, tagging and marking up specific types of terms in documents, such as people, place names and company names. The Entity Extractor can be combined with Classification System to form a powerful search and analysis system.

Data Sets

Reel Two is constantly engaged in developing new datasets that test the performance and robustness of the Classification System technology. The sample datasets included here are from a variety of applications in different industries, and demonstrate the Classification System's ability to work with both text and nominal data formats.

The table lists statistics for each dataset, including "Build Time" and "F-Measure". Build Time is the time to load, model and evaluate (using Leave-One-Out evaluation) a dataset on a WinXP/1GHz Celeron/256MB computer. F-Measure is the micro-averaged F-Measure across all categories in the dataset.
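As an illustration of the two terms above, the following is a minimal sketch of Leave-One-Out evaluation and a micro-averaged F-Measure. It is not Reel Two's implementation: the function names `leave_one_out` and `micro_f_measure`, the `train` and `classify` callbacks, and the representation of each document's categories as a Python set are all assumptions made for this example. Micro-averaging pools true positives, false positives and false negatives over every category before computing precision and recall, so large categories weigh more than small ones.

```python
def leave_one_out(instances, labels, train, classify):
    """Predict each instance with a model trained on all the others.

    `train(instances, labels)` and `classify(model, instance)` are
    hypothetical callbacks supplied by the caller.
    """
    predictions = []
    for i in range(len(instances)):
        # Hold out instance i and train on the remainder.
        train_x = instances[:i] + instances[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(classify(model, instances[i]))
    return predictions


def micro_f_measure(gold, predicted):
    """Micro-averaged F-Measure over lists of category sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # categories correctly assigned
        fp += len(p - g)   # categories assigned but wrong
        fn += len(g - p)   # categories missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up labels (not taken from any dataset listed here).
gold = [{"car"}, {"cat"}, {"car"}, {"cat"}]
predicted = [{"car"}, {"car"}, {"car"}, {"cat"}]
score = micro_f_measure(gold, predicted)
```

When every document has exactly one gold category and one predicted category, the pooled precision and recall are equal, so the micro-averaged F-Measure reduces to plain accuracy.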

Statistics listed for each dataset: Categories, Instances, Build Time, F-Measure
Reuters-21578 (Top 10)

The Reuters News research dataset is a compilation of news stories from Reuters News organized into a number of topics. Identifying the documents that belong to the 10 largest categories is one of the most popular text categorization tests.

Download: Reuters.ratz

License: Restricted

Original Dataset: Maintained by David Lewis of AT&T.

Categories: 10; Instances: 2,535; Build Time: 15 seconds; F-Measure: 0.9121
Gene Ontology (GO) MEDLINE Abstracts

The GO dataset is a collection of MEDLINE research abstracts that have been classified according to the Gene Ontology, a structure that encodes information about gene products and their functions.

Download: Gene Ontology.ratz

License: Restricted

Original Dataset: Maintained by the United States National Library of Medicine.

Categories: 72; Instances: 2,721; Build Time: 45 seconds; F-Measure: 0.7242
Jaguar: Car or Cat

Reel Two created this dataset as a basic demonstration of the categorization task. The dataset consists of documents containing the word "Jaguar", but are they about the car or the cat?

Download: Jaguar.ratz

License: Public Domain

Categories: 2; Instances: 200; Build Time: n/a; F-Measure: n/a
Language Recognition

The Reel Two Classification System supports 25 languages via the built-in facilities of the Java programming language. This dataset was created by Reel Two to demonstrate that capability. News was sampled from a variety of news sources around the world.

Download: Multilingual.ratz

License:

Categories: 25; Instances: 626; Build Time: 10 seconds; F-Measure: 0.9774
Steel Annealing

Download: Anneal.ratz

License: Public Domain

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 6; Instances: 798; Build Time: 5 seconds; F-Measure: 0.9211
Diabetes Detection

Download: Diabetes.ratz

License:

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 2; Instances: 421; Build Time: 2 seconds; F-Measure: 0.7981
Gene Splicing

Download: Splice.ratz

License:

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 3; Instances: 3,190; Build Time: 15 seconds; F-Measure: 0.9128