1 Basic concept
1.1 Non-extractive, document-centric parsing
1.2 Virtual token descriptor
1.3 Location cache
Basic concept
Non-extractive, document-centric parsing
Traditionally, a lexical analyzer represents tokens (the small units of indivisible character values) as discrete string objects. This approach is designated extractive parsing. In contrast, non-extractive tokenization mandates that one keeps the source text intact and uses offsets and lengths to describe the tokens.
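The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a toy whitespace-delimited token grammar, and the function names (`tokenize_extractive`, `tokenize_non_extractive`, `token_text`) are hypothetical, chosen only for this example.

```python
import re
from typing import List, Tuple

# Toy grammar (an assumption for this sketch): a token is a run of non-whitespace.
TOKEN = re.compile(r"\S+")


def tokenize_extractive(text: str) -> List[str]:
    """Extractive: every token is copied out into its own string object."""
    return TOKEN.findall(text)


def tokenize_non_extractive(text: str) -> List[Tuple[int, int]]:
    """Non-extractive: every token is an (offset, length) pair into the intact source."""
    return [(m.start(), m.end() - m.start()) for m in TOKEN.finditer(text)]


def token_text(text: str, token: Tuple[int, int]) -> str:
    """Materialize a token's characters only when they are actually needed."""
    offset, length = token
    return text[offset:offset + length]


source = "alpha beta  gamma"
tokens = tokenize_non_extractive(source)
```

Here `tokens` holds only integer pairs, and `token_text(source, tokens[1])` recovers the second token from the untouched source text on demand.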
Virtual token descriptor
Virtual Token Descriptor (VTD) applies the concept of non-extractive, document-centric parsing to XML processing. A VTD record uses a 64-bit integer to encode the offset, length, token type, and nesting depth of a token in an XML document. Because all VTD records are 64 bits in length, they can be stored efficiently and managed as an array.
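As a rough sketch of how four fields might share one 64-bit integer, the following packs and unpacks a record with bit shifts. The text above does not give the field widths, so the layout used here (4-bit token type, 8-bit depth, 20-bit length, 32-bit offset) is an assumption made for illustration, and `pack_vtd` and `unpack_vtd` are hypothetical names.

```python
from array import array
from typing import Tuple

# Assumed field widths for this sketch only; they sum to 64 bits.
OFFSET_BITS, LENGTH_BITS, DEPTH_BITS, TYPE_BITS = 32, 20, 8, 4

LENGTH_SHIFT = OFFSET_BITS                # 32
DEPTH_SHIFT = LENGTH_SHIFT + LENGTH_BITS  # 52
TYPE_SHIFT = DEPTH_SHIFT + DEPTH_BITS     # 60


def pack_vtd(offset: int, length: int, depth: int, token_type: int) -> int:
    """Pack the four token attributes into a single 64-bit integer."""
    fields = ((offset, OFFSET_BITS), (length, LENGTH_BITS),
              (depth, DEPTH_BITS), (token_type, TYPE_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (token_type << TYPE_SHIFT) | (depth << DEPTH_SHIFT) \
        | (length << LENGTH_SHIFT) | offset


def unpack_vtd(record: int) -> Tuple[int, int, int, int]:
    """Return (offset, length, depth, token_type) from a packed record."""
    offset = record & ((1 << OFFSET_BITS) - 1)
    length = (record >> LENGTH_SHIFT) & ((1 << LENGTH_BITS) - 1)
    depth = (record >> DEPTH_SHIFT) & ((1 << DEPTH_BITS) - 1)
    token_type = (record >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return offset, length, depth, token_type


# Fixed-width records fit a flat array of unsigned 64-bit integers.
records = array("Q", [pack_vtd(17, 5, 2, 1), pack_vtd(40, 3, 3, 0)])
```

Because every record has the same width, the whole token table is one contiguous array, and a record is addressed by its index rather than by an object reference.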
Location cache
Location Caches (LC) build on VTD records to provide efficient random access. Organized as tables, with one table per nesting depth level, LCs contain entries modeling an XML document's element hierarchy. An LC entry is a 64-bit integer encoding a pair of 32-bit values. The upper 32 bits identify the VTD record for the corresponding element. The lower 32 bits identify that element's first child in the LC at the next lower nesting level.
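The entry layout described above can be sketched as follows. The sentinel `NO_CHILD` for childless elements, the helper names, and the hand-built tables for the sample document `<root><a><x/></a><b/></root>` are all assumptions made for this example. The VTD indices assume that only the four element start tags produce records (root = 0, a = 1, x = 2, b = 3).

```python
from array import array
from typing import Optional, Tuple

# Assumed sentinel for "this element has no child"; not taken from the text.
NO_CHILD = 0xFFFFFFFF


def pack_lc(vtd_index: int, first_child: int = NO_CHILD) -> int:
    """Upper 32 bits: VTD record index. Lower 32 bits: first child's LC index."""
    return (vtd_index << 32) | first_child


def unpack_lc(entry: int) -> Tuple[int, int]:
    """Return (vtd_index, first_child) from a packed LC entry."""
    return entry >> 32, entry & 0xFFFFFFFF


def first_child_vtd_index(entry: int, next_level: array) -> Optional[int]:
    """Follow an entry to its first child and return that child's VTD index."""
    _, child = unpack_lc(entry)
    if child == NO_CHILD:
        return None
    vtd_index, _ = unpack_lc(next_level[child])
    return vtd_index


# One table per nesting depth for <root><a><x/></a><b/></root>.
lc_level1 = array("Q", [pack_lc(1, 0), pack_lc(3)])  # <a> (first child at slot 0), <b>
lc_level2 = array("Q", [pack_lc(2)])                 # <x>
```

Looking up the first child of `<a>` is then two array reads: one into `lc_level1` to get the child slot, and one into `lc_level2` to get the child's VTD record index.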