This paper proposes a generic method for automatically extracting the apparent semantic document structure from a structural calculation document. The method consists of two processes: extracting subtitles and classifying the depth level of each subtitle. The subtitles become the tree nodes of the apparent semantic structure. A context model of technical documents was built to extract subtitles from plain text. In addition, a formal classification method was developed to determine the depth levels of the subtitles and was used to build a document tree with sequentially ordered subtitles. An application module of the proposed method, which transforms a plain text document into a semistructured XML document, was implemented. The performance of the application module was evaluated with 40 test documents, including structural calculation documents, technical reports, and theses.
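The two processes named in the abstract (subtitle extraction, then depth-level classification) and the final XML output can be illustrated with a minimal sketch. It is not the paper's context model or its formal classification method, neither of which is described here. As a stand-in, it assumes subtitles carry dotted section numbers such as `1.` or `2.3.1` and takes the depth level from the number of numeric components. The names `HEADING`, `extract_subtitles`, and `build_tree`, the 80-character length limit, and the `document`/`section` element names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a subtitle starts with a dotted section number ("1.", "1.1", "2.3.1")
# followed by a title. Real documents need a richer model than this.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def extract_subtitles(lines):
    """Return (depth, title) pairs for lines that look like numbered subtitles."""
    subtitles = []
    for line in lines:
        text = line.strip()
        match = HEADING.match(text)
        # Hypothetical length limit to skip long body sentences that start with a number.
        if match and len(text) <= 80:
            depth = match.group(1).count(".") + 1
            subtitles.append((depth, match.group(2).strip()))
    return subtitles


def build_tree(subtitles):
    """Build an XML tree from sequentially ordered (depth, title) pairs."""
    root = ET.Element("document")
    stack = [(0, root)]  # (depth, element); the root sits at depth 0
    for depth, title in subtitles:
        # Climb back up until the top of the stack is a shallower node.
        while stack[-1][0] >= depth:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "section", title=title, level=str(depth))
        stack.append((depth, node))
    return root


sample = [
    "1. Design Loads",
    "Dead load is taken from the material unit weights.",
    "1.1 Dead Load",
    "1.2 Live Load",
    "2. Member Design",
    "2.1 Beams",
]
tree = build_tree(extract_subtitles(sample))
print(ET.tostring(tree, encoding="unicode"))
```

A numbering-based rule like this misclassifies body lines that begin with a quantity (for example "200 mm thick slab") and misses unnumbered subtitles, which is the kind of case a context model is meant to handle.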
Journal of Computing in Civil Engineering
Published - 2010
All Science Journal Classification (ASJC) codes
- Civil and Structural Engineering
- Computer Science Applications