Information extraction on free texts is a difficult application due to frequently omitted or colloquial words and phrases with various ambiguities. We introduce a three-leveled information extraction architecture, which consists of instance extraction, filtering, and ranking rules. The extraction rules find instance candidates using named entities and context-independent lexico-semantic patterns. With context-dependent patterns and slot names produced from the previous step, the filtering rules remove improper candidates, and the ranking rules score the remaining instances. Finally, top-ranked instances of each slot are assigned to multi-targets. Experimental result shows 93.6 F-measure on e-mail texts that have three targets (header, address, and schedule) for personal information management domain.
© 2001-2026 Fundación Dialnet · Todos los derechos reservados