sf tweet data

(please be patient while it loads)

In the demo, you'll see 15000 tweets that I logged in Jan 2015, displayed in WebGL with latitude, longitude and elevation. You can toggle a 2D histogram of the density of tweets within each square, and also the census tracts, which show population density for comparison. This was a lot of fun and I want to talk a little bit about the process. All images are screenshots from the demo above.

Data Sources

Tweets

Twitter API and the Tweepy Python library. I made a mistake here and saved the ENTIRE TWEET (retweets, text, user info, etc.) to MongoDB if it was in the SF area. This added up pretty quickly, and my browser/Mongo REST API couldn't handle loading more than 15000 tweets without crashing. Next time I'm going to save just the coordinates, and store them as GeoJSON objects so that I can do spatial queries in Mongo. That shit's dope - I can get that histogram computed fo' free.
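Here's a rough sketch of what "save just the coordinates" could look like. It's not the code behind the demo: `to_geo_doc`, the `loc` field name and the `SF_BBOX` numbers are all made up for illustration (the box is a loose guess at the SF area, not a surveyed boundary). The only thing it assumes about the input is the classic tweet JSON layout, where a geotagged tweet carries `id_str` and a `coordinates` field that is already a GeoJSON Point in [longitude, latitude] order.

```python
# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat).
# Rough illustrative numbers for the SF area, not an official boundary.
SF_BBOX = (-122.52, 37.70, -122.35, 37.83)


def to_geo_doc(tweet, bbox=SF_BBOX):
    """Reduce a raw tweet dict to an id plus a GeoJSON Point.

    Returns None if the tweet has no point coordinates or falls
    outside the bounding box.
    """
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order is [lon, lat]
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
        return None
    return {
        "_id": tweet["id_str"],
        "loc": {"type": "Point", "coordinates": [lon, lat]},
    }
```

Documents shaped like this are what Mongo's `2dsphere` index expects, so a query like `$geoWithin` on `loc` should then be possible without dragging the full tweet around.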

Neighborhoods

SF Data. Nice GeoJSON of the hoods. This place has a mind-blowing amount of data about basically any kind of public service you can think of in SF. I will definitely look around here in the future for inspiration.

Topography

CGIAR - Consortium for Spatial Information. A while ago I wrote a Python script to take data from this website and generate a 3D mesh of any surface in the world (I should do a writeup about that with some sweet renders in the future). Topographical info seemed like a nice addition, if only to get a feel for the coastline. This site is dope because you can download a .tif of anywhere in the world, with the kind of resolution you see in this demo, and the value of each pixel is the elevation. Here's a citation:

Jarvis A., H.I. Reuter, A. Nelson, E. Guevara, 2008, Hole-filled seamless SRTM data V4, International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.

Population

SF Department of Public Health. This was by far the hardest to wrangle into coordinates for THREE.js. I got a zip of about 10 files, with extensions I had never seen like .dbf, .sbn, .shx, .sbx, and a useless .xml. After some failed binary parsing of a few of them, some google-fu converted it to some kind of JSON, and it happened to line up perfectly with the neighborhood data. YUH

Elevation

One feature I think is pretty neat is that the tweets and neighborhood outlines are actually at the same elevation as the topo mesh (I've definitely exaggerated the topography here to make the hills more noticeable). Bilinear interpolation on the elevation image was my first idea, but then I thought about the THREE.js Raycaster object. Starting a ray at each tweet or neighborhood vertex and raycasting straight up to find the intersection with the topo gives you the elevation! After finding the intersection, add a little extra elevation so that everything floats above the ground a bit. This takes too long to compute for every point at load time, so I baked it into my points dataset (you're welcome). Maybe not the most elegant approach, but it was the easiest for me to implement.
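For comparison, here's a minimal sketch of the bilinear interpolation idea I passed on. To be clear, this is NOT what the demo does (the demo uses the raycasting approach above). It assumes the elevation image has already been loaded into a plain list of rows, and that the point has already been converted to fractional pixel coordinates. The name `bilinear` and its arguments are just for illustration.

```python
import math


def bilinear(grid, x, y):
    """Sample an elevation grid at fractional column x and row y.

    grid is a list of rows of numbers and must be at least 2x2.
    """
    rows, cols = len(grid), len(grid[0])
    # Clamp the sample point so it stays inside the grid.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    # Top-left corner of the 2x2 neighbourhood around the sample point.
    x0 = min(int(math.floor(x)), cols - 2)
    y0 = min(int(math.floor(y)), rows - 2)
    tx, ty = x - x0, y - y0
    # Blend along x on the two rows, then blend those results along y.
    top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
    bottom = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
    return top * (1 - ty) + bottom * ty
```

The "float above the ground" trick would then just be adding a small constant to whatever this returns.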

Tweet Histogram

The histogram part is pretty simple. I specify an N x N grid that is overlaid on the tweet data, then I go through all 15k tweets and increment the box that each one lands in. Finally, I normalize the counts by dividing every box by the total, leaving the fraction of tweets in each box. I make a semi-transparent box in THREE.js and place it over the grid cell. I like this technique because it's not always easy to see how densely packed dots are, especially if there is overlap. Being able to see both is a huge help for me, at least.
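The binning itself is only a few lines. Here's a sketch of the idea in Python rather than the demo's actual code. `histogram2d`, `bounds` and `bins` are names I'm making up here, and points are assumed to be (x, y) pairs such as (longitude, latitude).

```python
def histogram2d(points, bounds, bins):
    """Count (x, y) points into a bins x bins grid, normalized to fractions.

    bounds is (min_x, min_y, max_x, max_y); points outside it are skipped.
    """
    min_x, min_y, max_x, max_y = bounds
    counts = [[0] * bins for _ in range(bins)]
    total = 0
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Scale into [0, bins]; clamp so points on the max edge
        # land in the last cell instead of falling off the grid.
        col = min(int((x - min_x) / (max_x - min_x) * bins), bins - 1)
        row = min(int((y - min_y) / (max_y - min_y) * bins), bins - 1)
        counts[row][col] += 1
        total += 1
    if total == 0:
        return counts
    # Divide every cell by the total so the grid sums to 1.
    return [[c / total for c in row] for row in counts]
```

Each cell value can then drive the height or opacity of the box drawn over that grid cell.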

I wanted to be able to dynamically change the number of bins and have the histogram reload, but it's pretty slow re-counting all the tweets. You can get around that by making bin sizes that halve at each level and storing only the finest level. Then for coarser levels you just add up 4/16/64 of the finest bins, and that's pretty fast, but I didn't want to bother with it. If you're a 1337 haxor, take a look at the code and change hist_params.bins
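For the curious, the bin-halving trick could look something like this. It's a sketch of the idea only, since I never built it: `coarsen` and `build_pyramid` are hypothetical names, and it assumes a square grid of raw counts (not normalized values) whose side is a power of two.

```python
def coarsen(counts):
    """Halve the resolution of a square count grid by summing each 2x2 block."""
    n = len(counts)
    if n % 2:
        raise ValueError("grid side must be even")
    half = n // 2
    return [
        [
            counts[2 * r][2 * c] + counts[2 * r][2 * c + 1]
            + counts[2 * r + 1][2 * c] + counts[2 * r + 1][2 * c + 1]
            for c in range(half)
        ]
        for r in range(half)
    ]


def build_pyramid(finest):
    """Return [finest, half-res, quarter-res, ...] down to a 1x1 grid.

    Assumes the side length of finest is a power of two.
    """
    levels = [finest]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels
```

The tweets only get counted once, into the finest grid. Every coarser level is just sums over the level below it, so switching bin counts never touches the raw points again.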

Census Tract Extrusion

Something was missing - like an experiment without a control group. Population density seemed like a good thing to look at, but how to make a histogram out of it? I could think of a few ways, but settled on taking the 2010 census tracts (similar in size to the histogram cells) and extruding them up by the magnitude of the population. THREE had some issues triangulating most of the polygons, but after tweaking the lighting it's not too apparent overall. Still something I want to look into.