You can browse the software at
http://ims-svn.dei.unipd.it/repos/datacitation/
Username: guest - Password: guest
You can check it out using Subversion
$ svn checkout --username guest --password guest http://ims-svn.dei.unipd.it/repos/datacitation/ datacitation
Documentation
The JavaDoc is available at the URL:
http://www.dei.unipd.it/~silvello/datacitation/learningtocite
We built the experimental collection from the Library of Congress digital finding aids collection, which is encoded in the EAD format and publicly available at the following URL: http://findingaids.loc.gov/.
To build the training and validation set, we selected 25 EAD files at random and, from each of these files, randomly extracted 4 citable units; this gave a set of 100 XPaths, each identifying a different citable unit. For each citable unit (i.e., XML element), we manually created a human-readable citation, used to train the citation system, and a machine-readable citation, used as the ground truth for validation.
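The sampling step for the training and validation set can be sketched as follows. This is a minimal illustration, not the code in the repository: the function names, the fixed seed, and the choice to treat every XML element as a candidate citable unit are assumptions made for the example, and the XPaths are simple positional paths built with the Python standard library.

```python
import random
import xml.etree.ElementTree as ET


def element_xpaths(root):
    """Return a positional XPath for every element under root (root included)."""
    paths = []

    def walk(elem, path):
        paths.append(path)
        counts = {}  # per-tag sibling counter, used for the [n] position
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}")
    return paths


def sample_citable_units(docs, n_files=25, units_per_file=4, seed=0):
    """Pick n_files documents at random, then units_per_file elements from each.

    docs maps a file name to its XML text (hypothetical input format).
    Returns a list of (file name, XPath) pairs.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(docs), n_files)
    sampled = []
    for name in chosen:
        paths = element_xpaths(ET.fromstring(docs[name]))
        # Sampling without replacement keeps the units of a file distinct.
        for xpath in rng.sample(paths, units_per_file):
            sampled.append((name, xpath))
    return sampled
```

With the default arguments this yields 25 x 4 = 100 (file, XPath) pairs. A file with fewer than 4 elements makes `rng.sample` raise `ValueError`, so such files would have to be filtered out beforehand.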
The test set was built with a similar procedure: from the whole EAD collection minus the 25 files selected for the training and validation set, we randomly selected 50 EAD files and, from each one, a single citable unit at random. We then manually created a ground-truth machine-readable citation for each of these sampled citable units. We created the ground-truth citations by following the guidelines provided by the Purdue University archives, which follow the Modern Language Association (MLA) citation style.
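The hold-out step for the test set can be sketched in the same spirit. The function name, the seed, and the input format (a mapping from file name to the list of XPaths of its candidate citable units) are assumptions made for illustration; the point is only that the 25 training and validation files are excluded before the 50 test files are drawn.

```python
import random


def sample_test_units(units_by_file, training_files, n_files=50, seed=1):
    """Draw n_files files not used for training, with one citable unit each.

    units_by_file maps a file name to a list of XPaths (hypothetical format).
    training_files is the collection of file names already used for the
    training and validation set.
    """
    rng = random.Random(seed)
    held_out = set(training_files)
    # Only files outside the training and validation set are eligible.
    remaining = sorted(name for name in units_by_file if name not in held_out)
    chosen = rng.sample(remaining, n_files)
    return [(name, rng.choice(units_by_file[name])) for name in chosen]
```

Excluding the training files first guarantees that no file contributes citable units to both sets.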
You can browse the test collection at
http://ims-svn.dei.unipd.it/repos/datacitation_collections/
Username: guest - Password: guest
You can check it out using Subversion
$ svn checkout --username guest --password guest http://ims-svn.dei.unipd.it/repos/datacitation_collections/ datacitation_collections
Peter Buneman and Gianmaria Silvello. A Rule-Based Citation System for Structured and Evolving Datasets, IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 3, No. 3. IEEE Computer Society, pp. 33-41, September 2010.
The JavaDoc is available at the URL:
http://www.dei.unipd.it/~silvello/datacitation/rulebasedsystem