A regular search for
a popular topic returns an abundance of information, so it is
impractical to go through such large amounts of data manually to understand the
trends.
This thesis discusses techniques for the intra-topic clustering of such social media
data and examines how social media noise increases the redundancy of the search
results.
Our goal is to reduce the amount of redundant information an end user must
review from a regular social media search. The research proposes clustering
models based on two string similarity measures: Jaccard word token and T-Information distance. Evaluation parameters are introduced and the models are
evaluated by clustering a set of current and historical topics to determine which
techniques are the most effective.
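
As a rough illustration of the first measure, the sketch below computes a Jaccard similarity over word tokens and uses it to group near-duplicate posts. It is not the thesis's model: the tokenizer, the greedy single-pass grouping, the 0.5 threshold, and the names word_tokens, jaccard_similarity and greedy_cluster are all assumptions made for this example. T-Information distance is not sketched here.

```python
import re


def word_tokens(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a, b):
    """Jaccard similarity of two strings: |A & B| / |A | B| over word-token sets."""
    tokens_a, tokens_b = word_tokens(a), word_tokens(b)
    if not tokens_a and not tokens_b:
        return 1.0  # convention: two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def greedy_cluster(posts, threshold=0.5):
    """Hypothetical single-pass grouping: a post joins the first cluster whose
    first member is at least `threshold` similar; otherwise it starts a new one."""
    clusters = []
    for post in posts:
        for cluster in clusters:
            if jaccard_similarity(post, cluster[0]) >= threshold:
                cluster.append(post)
                break
        else:
            clusters.append([post])
    return clusters


if __name__ == "__main__":
    posts = [
        "Big quake hits city",
        "big quake hits city!",
        "New phone released today",
    ]
    for cluster in greedy_cluster(posts):
        print(cluster)
```

With a grouping like this, an end user would review one representative per cluster rather than every near-duplicate post. The threshold controls how aggressively posts are merged and would need tuning on real data.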
Full thesis text here: Uttej's thesis