Mozy Monday

This is by far the weirdest winter I've lived through yet. The weather forecasters say there's some snow on the way, but I'm skeptical. Even if it does come, it's probably only going to stick around for a day or two. On the bright side, I haven't had to deal with icy sidewalks on steep hills, or with walking to campus in deep snow or slush.

In other news, my workload has just cranked up a notch. One homework is due Friday, another Monday, and whenever I have free time I need to begin writing the project report that will be my golden ticket out of here. I am getting more excited about my topic though: text mining. It's a relatively new area, and a pretty difficult one, mostly because statistics usually deals with numbers. We can calculate to our heart's delight with numbers, but with words...well, it's complicated. On top of dealing with words instead of numbers, there are all the intricacies of the English language to take care of. Lucky for me, someone has already written code that attempts to remove the endings of words (this is called stemming), so that when word frequencies are counted, different forms of the same word aren't counted as separate words. For instance, if the words write, writer, writing, and writes all show up in a document, I want to count how many times each one shows up but in the end put all those counts under a single word, e.g. "writ", since that is the root they share. This in itself is (I'm sure) extremely difficult to code up: a computer, which doesn't understand language nearly as well as a human does, has to manipulate language well enough that analysis can be done once the document is processed. We won't even get started on "what if there's a typo?"; I'll let you imagine how much harder that would make it. This is just a small piece of what I'm working on for my project report. It's a little funny because I haven't actually done any of the analysis yet; that comes after I've preprocessed all the documents into a workable form.
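To make the counting idea a bit more concrete, here is a toy sketch in Python. It is not the existing stemming code mentioned above, and real stemmers use far richer rules; the suffix list and the names `crude_stem` and `stem_counts` are made up purely for illustration.

```python
import re
from collections import Counter

# Hypothetical suffix list, checked in this order (longest first).
# Real stemmers handle many more endings and special cases.
SUFFIXES = ("ing", "er", "es", "e", "s")

def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def stem_counts(text):
    """Lowercase the text, pull out alphabetic tokens, and count them by crude stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(token) for token in tokens)

counts = stem_counts("The writer writes; she likes writing and will write more.")
print(counts["writ"])  # write, writer, writing, writes all land under "writ"
```

Even this tiny version shows the typo problem: `crude_stem("writting")` gives "writt", which would be counted as a different word from "writ".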

It's going to be funny when I start writing the paper part. It's so easy to write this blog, but when it comes to something I have to turn in and present, writing somehow gets away from me. I'm not too worried about it, but it sure is more fun to write about anything I want instead of trying to keep the focus on the same topic for 17 pages. Slow and steady though, right?

Something kinda fun-sounding to leave you with; take it for what you will.

  • Thomasina: If there is an equation for a curve like a bell, there must be an equation for one like a bluebell, and if a bluebell, why not a rose? Do we believe nature is written in numbers?
    Septimus: We do.
    Thomasina: Then why do your shapes describe only the shapes of manufacture?
    Septimus: I do not know.
    Thomasina: Armed thus, God could only make a cabinet.
    Tom Stoppard, Arcadia (1993)
