ArGoT: A Glossary of Terms extracted from the arXiv

Luis Berlioz
(University of Pittsburgh)

We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the articles' source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. Each term can then be organized in a dependency graph by using the term's definitions and the arXiv metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the vector representations of the text and how these representations capture entailment relations between mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing libraries of formally verified statements from Interactive Theorem Provers (ITPs).
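The abstract says each term can be organized in a dependency graph by using its definition. The following is a minimal sketch of one way such a graph could be built, not the paper's actual pipeline. It assumes a hypothetical glossary that maps each term to its definition text and uses a naive heuristic: a term depends on another glossary term if that term's name occurs as a whole word in its definition. The arXiv metadata mentioned in the abstract is not modeled here, and the names `build_dependency_graph`, `definition_order`, and `EXAMPLE_GLOSSARY` are illustrative only.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy glossary: term -> definition text (not taken from ArGoT).
EXAMPLE_GLOSSARY: dict[str, str] = {
    "group": "A set with an associative binary operation, an identity element, and inverses.",
    "abelian group": "A group whose binary operation is commutative.",
    "subgroup": "A subset of a group that is itself a group under the same operation.",
}


def build_dependency_graph(glossary: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of other glossary terms mentioned in its definition.

    Assumed heuristic: whole-word, case-insensitive string matching. The word
    boundaries keep "group" from matching inside "subgroup".
    """
    graph: dict[str, set[str]] = {term: set() for term in glossary}
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in glossary
    }
    for term, definition in glossary.items():
        for other, pattern in patterns.items():
            if other != term and pattern.search(definition):
                graph[term].add(other)
    return graph


def definition_order(graph: dict[str, set[str]]) -> list[str]:
    """Order terms so that every term comes after the terms its definition uses.

    Raises graphlib.CycleError if the definitions are circular.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    deps = build_dependency_graph(EXAMPLE_GLOSSARY)
    for term, uses in deps.items():
        print(f"{term!r} depends on {sorted(uses)}")
    print(definition_order(deps))
```

With this toy glossary, "abelian group" and "subgroup" both depend on "group", so "group" comes first in the ordering. A real extractor would need to handle inflected forms, notation, and ambiguous names, which simple string matching does not.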

In Temur Kutsia: Proceedings of the 9th International Symposium on Symbolic Computation in Software Science (SCSS 2021), Hagenberg, Austria, September 8-10, 2021, Electronic Proceedings in Theoretical Computer Science 342, pp. 14–21.
Published: 6th September 2021.

ArXived at: https://dx.doi.org/10.4204/EPTCS.342.2