Identification

Author

Stewart A, Denecke K

Title

Using ProMED-Mail and MedWorm blogs for cross-domain pattern analysis in epidemic intelligence.

Year

2010

Publication type

Article

Journal

Stud Health Technol Inform.

Created

2016-03-17 21:48:27.142793+00:00

Modified

2016-05-23 21:42:02.959397+00:00

Details

Volume

160

Number

Pt 1

Pages

437-441

Access

Language

English

URL

http://www.ncbi.nlm.nih.gov/pubmed/20841724

Accessed

2016-05-23

Extended information

Abstract

In this work we motivate the use of user-generated content from medical blogs for gathering facts about disease reporting events to support biosurveillance investigation. Given the characteristics of blogs, noise and data abundance make the extraction of such events more difficult. We address the problem of automatically inferring disease reporting event extraction patterns in this noisier setting. The sublanguage used in outbreak reports is exploited to align with the sequences of disease reporting sentences in blogs. Based on our Cross Domain Pattern Analysis Framework, experimental results show that Phase-Level sequences tend to produce more overlap across the domains than Word-Level sequences. The cross-domain alignment process is effective at filtering noisy sequences from blogs and extracting good candidate sequence patterns from an abundance of text.