Where Are My Facebook Commenters? Part 1

If your website or blog uses Facebook comments as its discussion platform, you may be unknowingly sitting on a goldmine of marketing data. Setting aside spammers, the people who take the time to comment on your website are arguably some of your most engaged readers. Or enraged readers, depending on how objectionable they find your content :). Given Facebook’s aggressive “real name” requirements, having your most engaged readers’ valid identities and information can be useful for marketing strategy.

What is one digestible use of this data? A simple exercise is plotting the known locations of your commenters. Not every commenter publicly broadcasts their location, so the input dataset is not perfect, but incomplete is much better than nonexistent. FiveThirtyEight.com is a leading data journalism website, founded by Nate Silver. In addition to Nate, one of the most prolific authors on the site is the lead writer for FiveThirtyEight’s Datalab, Mona Chalabi. Here is a quick comparison of the known locations of commenters from each of their last 50 articles, with comment density by location.

All credit for the beautiful and user-friendly mapping software goes to the great folks at CartoDB.

These maps are visually interesting and possibly fun for assigning bragging rights, but honestly, the full list of commenters (with their corresponding Facebook UIDs) and comments is much more useful. Facebook’s custom audience advertising is very much a walled garden, but it theoretically allows you to do interesting things with highly targeted ads.

If you are a high-quality but niche content producer like FiveThirtyEight, trying to mass-promote your content a la Buzzfeed is probably the equivalent of dumping your advertising budget into the mysterious black holes featured in Interstellar. I haven’t tried this personally, so I admit that I’m just speculating.

A much more fruitful strategy of organically growing readership would be to market popular articles directly (using custom audiences) to your existing commenters, who may or may not be currently sharing your content on social channels. There is possibly no group of people more likely to share your content than the very people who are actively contributing to the discussion on your blog (and potentially interacting with the authors themselves).

Crawling Facebook comments is, surprisingly, not very straightforward. Because Facebook comments are often not fully accessible unless JavaScript interactions (such as clicks) are triggered after page load, I built a Zillabyte component that automates most of this process. Once a browser automation or headless browser tool like Selenium or CasperJS is attached to the Facebook Comment virtual frame, the process is pretty standard between sites.

Here is how to use that component in a Zillabyte app:

import zillabyte

app = zillabyte.app(name = "facebook_comments")
stream = app.source_from_csv("urls.csv", headers=["url"])
stream = stream.call_component("facebook_comment_extractor")

# Save one row per extracted comment.
sink = stream.sink(name="facebook_comments", columns=[
    {"full_name": "string"}, {"facebook_id": "string"},
    {"page_url": "string"}, {"location": "string"}, {"comment": "string"},
    {"author": "string"}, {"page_title": "string"}])

Example row of the resulting CSV:

Eric Prange,male,"Silver Spring, Maryland",erprange,http://fivethirtyeight.com/datalab/which-state-has-the-worst-drivers/,MONA CHALABI,"Dear Mona, Which State Has The Worst Drivers?",Why are loses per driver only 10-20% of cost of premiums? Is there really that much overhead/profit or am I missing something here?
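
The comment-density-by-location comparison mentioned earlier comes down to counting rows per known location. Here is a minimal sketch of that step in plain Python. It assumes the results are exported to a CSV with a header row that uses the sink's column names. That header row is an assumption, as are the function name location_density and the sample rows, which are made up for illustration. Rows with no location are skipped, since not every commenter shares one.

```python
import csv
import io
from collections import Counter


def location_density(csv_text):
    """Count comments per known location; rows without a location are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        location = (row.get("location") or "").strip()
        if location:
            counts[location] += 1
    return counts


# Hypothetical sample data; the header row is assumed, not part of the real export.
sample = '''full_name,facebook_id,page_url,location,comment,author,page_title
Reader One,id1,http://example.com/a,"Silver Spring, Maryland",First comment,Author A,Title A
Reader Two,id2,http://example.com/a,,Second comment,Author A,Title A
Reader Three,id3,http://example.com/b,"Silver Spring, Maryland",Third comment,Author B,Title B
'''

# Print locations from most to least commented.
for place, n in location_density(sample).most_common():
    print(place, n)
```

The resulting counts can then be geocoded and uploaded to whichever mapping tool you prefer.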