Powerful plagiarism and collusion detection
Used by the professions, where data security is a prime concern. Uses 'fuzzy-matching' to detect re-writing as well as direct copying.

Sophisticated searching
A completely new way of searching large numbers of documents. No keywords, no proximity settings, no ANDs or ORs. Just use a whole document or set of documents as the search entry. Investigator does the rest by identifying similar sentences in the documents and presenting the results ordered by the strongest sentence links between documents.

Fast
Scalable
GUI or automatic
Multi-threaded, multi-processor capability
Multi-platform - written in Java.

CopyCatch Investigator Search Screen
Purpose

CopyCatch Investigator looks for similarity between sentences in documents without using keywords or any other user-entered search patterns. This makes it different from most search engines and databases in three ways.

The comparisons required are much more detailed than in a keyword search, because every word in the document is compared, not just the words in a query. Despite this, CopyCatch Investigator delivers fully marked-up document pairs for a single document in around one second.

Interface

The interface has been designed in consultation with users, and is extremely simple to operate, with only two main screens.

Two further screens give you information about the current document pair:

Presentation

Sentences identified as similar are shown side by side, in the order of the document being used as a query. Each similar sentence is cross-referenced to the position in the current indexed file that has been found to share material. You also have the option of seeing both files side by side, fully marked up, so that related sentences can be seen in the context of the different or less similar sentences. In the screen shot above, you can see that sentence 17 on the right is a modified cut and paste of sentence 46 on the left (or vice versa), whereas 19 is almost certainly a contraction of 48. You can also see that neither example has long successive runs of words in common. The program does not take account of word order either, so substantial re-writing can still be identified.
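To illustrate how a comparison that ignores word order can still catch re-writing, here is a minimal sketch. It is not CopyCatch Investigator's actual algorithm, which this page does not describe; the function names and the choice of Jaccard overlap (shared words divided by all distinct words) are assumptions made purely for the example.

```python
import re

def word_set(sentence):
    """Return the set of lowercase word tokens; word order is discarded."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def sentence_similarity(a, b):
    """Jaccard overlap of two sentences' word sets (an assumed measure).

    0.0 means no shared words, 1.0 means identical vocabulary.
    """
    words_a, words_b = word_set(a), word_set(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

original = "The committee rejected the proposal after a long debate."
rewritten = "After a long debate, the proposal was rejected by the committee."
unrelated = "Rainfall totals were unusually high this spring."

rewrite_score = sentence_similarity(original, rewritten)
unrelated_score = sentence_similarity(original, unrelated)
print(f"rewritten: {rewrite_score:.2f}, unrelated: {unrelated_score:.2f}")
```

The rewritten sentence has no long run of words in the same order as the original, yet it scores high because almost all of its vocabulary is shared, while the unrelated sentence scores zero.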

Indexing

Investigator is built with the recognition that different users have different requirements.

Investigator allows the user to set the levels of indexing and the words which should be ignored. The user can also choose the number of indexes, and so can index in larger or smaller units, or have different levels of indexing on the same set of documents.

Reporting

Levels of reporting are also chosen by the user.

All the sliders which set the limits are interactive, so you just need to move them up if you have got too much, or down if you have got less than you expected or need.

Search

Both web searchers and database search engines are very fast at delivering answers once you have formulated the questions. What users and the suppliers of the search software tend to overlook is that the total search time involves

The fastest bit by far is the searching. You, as the user, have most of the work to do. CopyCatch Investigator removes the first stage altogether. All you need to do is decide how many results you think you need. It also accelerates stage three considerably, because all the matching is immediately visible, and the results are sorted to help you find the documents most similar to the query document.
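The idea of using a whole document as the query and sorting results by the strongest sentence link can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the naive sentence splitter, the word-overlap score, and the names `rank_documents` and `max_results` are all hypothetical.

```python
import re

def word_set(sentence):
    """Lowercase word tokens of a sentence, with order discarded."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def similarity(a, b):
    """Assumed score: shared words divided by all distinct words."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def rank_documents(query_text, collection, max_results=10):
    """Score each document by its single strongest sentence link to the query.

    collection maps a document name to its text. Returns a list of
    (name, best_score) pairs, strongest first, cut to max_results.
    """
    query_sentences = split_sentences(query_text)
    ranked = []
    for name, text in collection.items():
        best = max(
            (similarity(q, s)
             for q in query_sentences
             for s in split_sentences(text)),
            default=0.0,
        )
        ranked.append((name, best))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:max_results]

query = ("The engine failed during the test flight. "
         "Investigators blamed a faulty valve.")
collection = {
    "report_b": "The budget was approved on Monday.",
    "report_a": "Weather was clear. A faulty valve was blamed by investigators.",
}
results = rank_documents(query, collection)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

The only decision left to the user in this sketch is `max_results`, which mirrors the point above: no query has to be formulated, only the number of results chosen.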

Multilingual

The program uses lists of function words to assist the discrimination process, so if you have such a list in a plain text file you can switch languages with a couple of mouse clicks. We have a number of such lists available on request. Note: it cannot find similarities between documents written in two different languages.
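As a rough sketch of how a plain-text function word list might be applied, the example below loads a list and drops those words before comparison. The one-word-per-line file format, the file names, and the helper names are assumptions for illustration; the page does not specify the format Investigator expects.

```python
import re
import tempfile
from pathlib import Path

def load_function_words(path):
    """Read a plain-text list, assumed to hold one function word per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def content_words(sentence, function_words):
    """Return the sentence's words with the function words removed."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {t for t in tokens if t not in function_words}

# Demo with two tiny, made-up lists written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    english_list = Path(tmp) / "english.txt"
    english_list.write_text("the\na\nof\nwas\nby\n", encoding="utf-8")
    french_list = Path(tmp) / "french.txt"
    french_list.write_text("le\nla\nde\npar\na\nété\n", encoding="utf-8")

    en_words = content_words(
        "The report was written by the committee",
        load_function_words(english_list),
    )
    fr_words = content_words(
        "Le rapport a été rédigé par le comité",
        load_function_words(french_list),
    )

print(sorted(en_words))
print(sorted(fr_words))
```

Switching languages here means nothing more than pointing at a different list file. The two resulting word sets share no words, which is why a word-matching approach of this kind cannot link documents written in different languages.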
