Thursday, February 27, 2014

Center Map on Layer Change in Leaflet

In Leaflet, it can be helpful to change the bounds of the map when the user adds or changes the visible map layers.

The Basics

First, we'll start with the initial code, including our map and the circle and polygon layers:

  var circle = L.circle([51.508, -0.11], 500, {
      color: 'red',
      fillColor: '#f03',
      fillOpacity: 0.5
  });

  var polygon = L.polygon([
      [51.509, -0.08],
      [51.503, -0.06],
      [51.51, -0.047]
  ]);

  var map = L.map('map', {
    center: [51.505, -0.09],
    zoom: 13
  });

  var overlayMaps = {
    "Circle": circle,
    "Polygon": polygon
  };

  L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: '&copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>'
  }).addTo(map);

  // Passing the shapes as the first argument (base layers) means the
  // control shows them as radio buttons, so only one is visible at a time.
  L.control.layers(overlayMaps, null, {
    collapsed: false
  }).addTo(map);

So far, most of this should be familiar from the Leaflet Quick Start Guide.

Next, we want to add a listener that zooms and re-centers the map whenever the user switches between the circle and polygon layers.

The map's .on method lets you watch for an event and run a function when it occurs. In this case, we want the 'baselayerchange' event, so the map automatically zooms and re-centers whenever the user changes layers.


  map.on('baselayerchange', function(e) {
    console.log(e);
    // fitBounds expects a LatLngBounds, so ask the layer for its bounds
    map.fitBounds(e.layer.getBounds());
  });


The map automatically zooms to the bounds of the shape when the layer is activated.

Options

There are a few options for exactly how the map is zoomed and/or re-centered, and for which kinds of layers trigger the change.

If you want to do this with overlay layers instead of base layers, you can substitute 'overlayadd' for 'baselayerchange'. Using overlay layers is more common for drawing shapes, but treating your layers as base layers makes it easy to display only one at a time.

  map.on('overlayadd', function(e) {
    console.log(e);
    map.fitBounds(e.layer.getBounds());
  });

fitBounds automatically zooms to the tightest zoom level where the whole shape is visible. If you don't want to use fitBounds (say you're centering on a new overlay layer and don't want to zoom all the way in), you can use setView or panTo instead. panTo animates as the view changes.

  map.on('overlayadd', function(e) {
    console.log(e);
    // panTo expects a LatLng, so use the center of the layer's bounds
    map.panTo(e.layer.getBounds().getCenter());
  });
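If you'd rather keep explicit control of the zoom level, setView takes a center point and a zoom. A minimal sketch (the zoom level of 14 here is an arbitrary choice):

  map.on('overlayadd', function(e) {
    // re-center on the new layer, but stay at a fixed zoom level
    map.setView(e.layer.getBounds().getCenter(), 14);
  });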

And there you have a few different ways to center and/or zoom in when an overlay layer is added or when the baselayer is changed. I'd recommend you check out the documentation for fitBounds, setView, and panTo and play around with the options. The options for L.control.layers are also helpful. For example, you can set collapsed to false to encourage users to change the layers.
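For example, fitBounds accepts an options object; here's a minimal sketch using its padding and maxZoom options (the specific values are arbitrary):

  map.on('baselayerchange', function(e) {
    map.fitBounds(e.layer.getBounds(), {
      padding: [20, 20], // keep some space around the shape
      maxZoom: 15        // never zoom in past this level
    });
  });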

The full code is available in a gist.

Thursday, February 20, 2014

Merge by City and State in R

Often, you'll need to merge two data frames based on multiple variables. For this example, we'll use the common case of needing to merge by city and state.

First, you need to read in both your data sets:

# import city coordinate data:
coords <- read.csv("cities-coords.csv",
  header = TRUE,
  sep = ",")

# import population data:
data <- read.csv("cities-data.csv",
  header = TRUE,
  sep = ",")

Next comes the merge. You can use by.x and by.y to declare which variables the merge will be based on. If the variables have exactly the same name in both data sets, you can use by instead of by.x and by.y.

x and y represent the two data sets you are merging, in that order.

You also want to state whether you want to include all data from either data set, using all or all.x and all.y. In this case, we want to make sure we hold onto all our city data, even data for the cities we do not have coordinates for.

# merge data & coords by city & state:
dataCoords <- merge(coords, data, 
  by.x = c("City", "State"),
  by.y = c("city", "state"),
  all.x = FALSE,
  all.y = TRUE)

Running that code shows what we would expect. Houston is included in the final data set even though there are no coordinates for it, while Dallas is not included since it has coordinates but no data:

            City State Latitude  Longitude year population
1        Chicago    IL 41.85003  -87.65005 2012    2714856
2       Columbus    GA 32.46098  -84.98771 2012     198413
3       Columbus    OH 39.96118  -82.99879 2012     809798
4       Columbus    OH 39.96118  -82.99879 2010     787033
5    Los Angeles    CA 34.05223 -118.24368 2012    3857799
6       New York    NY 40.71427  -74.00597 2012    8336697
7       New York    NY 40.71427  -74.00597 2010    8175133
8  San Francisco    CA 37.77823 -122.44250 2012     825863
9  San Francisco    CA 37.77823 -122.44250 2010     805235
10       Houston    TX       NA         NA 2012    2160821
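As an aside: it's only because the two files capitalize their column names differently that we need by.x and by.y. If the key columns had identical names in both data frames, by alone would do. A hypothetical sketch, assuming both files used "City" and "State":

# shorthand when the key columns share names in both data frames
# (all.x already defaults to FALSE, so only all.y needs setting):
dataCoords <- merge(coords, data,
  by = c("City", "State"),
  all.y = TRUE)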


Bonus

If you'd like to get a list of which cases got merged in but lack coordinate data, there's a simple line of code to do that: keep only the rows that are incomplete in the coordinate columns (columns 3 and 4):

> dataCoords[!complete.cases(dataCoords[,c(3,4)]),]
      City State Latitude Longitude year population
10 Houston    TX       NA        NA 2012    2160821

Also, you might want to tidy up the names of your variables, if they followed different conventions in their respective initial data sets:


> names(dataCoords) <- c("City", "State", "Latitude", "Longitude", "Year", "Population")
> dataCoords
            City State Latitude  Longitude Year Population
1        Chicago    IL 41.85003  -87.65005 2012    2714856
2       Columbus    GA 32.46098  -84.98771 2012     198413
3       Columbus    OH 39.96118  -82.99879 2012     809798
4       Columbus    OH 39.96118  -82.99879 2010     787033
5    Los Angeles    CA 34.05223 -118.24368 2012    3857799
6       New York    NY 40.71427  -74.00597 2012    8336697
7       New York    NY 40.71427  -74.00597 2010    8175133
8  San Francisco    CA 37.77823 -122.44250 2012     825863
9  San Francisco    CA 37.77823 -122.44250 2010     805235
10       Houston    TX       NA         NA 2012    2160821


The full sample code is available as a gist.

Thursday, February 13, 2014

ggplot Fit Line and Lattice Fit Line in R

Let's add a fit line to a scatterplot!

Fit Line in Base Graphics

Here's how to do it in base graphics:

ols <- lm(Temp ~ Solar.R,
  data = airquality)

summary(ols)

plot(Temp ~ Solar.R,
  data = airquality)
abline(ols)

Fit line in base graphics in R



Fit Line in ggplot

And here's how to do it in ggplot:

library(ggplot2)
ggplot(data = airquality,
    aes(Solar.R, Temp)) + 
  geom_point(pch = 19) + 
  geom_abline(intercept = ols$coefficients[1],
    slope = ols$coefficients[2])

You can access the estimated coefficients from your regression results through ols$coefficients.
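The coef() extractor returns the same named vector as ols$coefficients, so an equivalent way to pull out the two values is:

# coef(ols) is equivalent to ols$coefficients:
coef(ols)["(Intercept)"]  # the intercept
coef(ols)["Solar.R"]      # the slope on Solar.R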

Edit: Thanks to an anonymous commenter, I have learned that you can simplify this by using geom_smooth. This way you don't have to specify the intercept and slope of the fit line yourself.


ggplot(data = airquality,
    aes(Solar.R, Temp)) + 
  geom_point(pch = 19) + 
  geom_smooth(method = lm,
    se = FALSE)

Fit line in ggplot in R

Fit Line in Lattice

In lattice, it's even easier. You don't even need to run a regression; you can just add "r" (a regression line) to the type option.


library(lattice)

xyplot(Temp ~ Solar.R,
  data = airquality,
  type = c("p", "r"))

Fit Line in Lattice in R

The code is available in a gist.

Thursday, February 6, 2014

Compare Regression Results to a Specific Factor Level in R

Including a series of dummy variables in a regression in R is very simple. For example,

ols <- lm(weight ~ Time + Diet,
  data = ChickWeight)
summary(ols)

The above regression automatically includes a dummy variable for every level of the Diet factor except the first.

Call:
lm(formula = weight ~ Time + Diet, data = ChickWeight)

Residuals:
     Min       1Q   Median       3Q      Max 
-136.851  -17.151   -2.595   15.033  141.816 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.9244     3.3607   3.251  0.00122 ** 
Time          8.7505     0.2218  39.451  < 2e-16 ***
Diet2        16.1661     4.0858   3.957 8.56e-05 ***
Diet3        36.4994     4.0858   8.933  < 2e-16 ***
Diet4        30.2335     4.1075   7.361 6.39e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.99 on 573 degrees of freedom
Multiple R-squared:  0.7453,  Adjusted R-squared:  0.7435 
F-statistic: 419.2 on 4 and 573 DF,  p-value: < 2.2e-16

This is great, and it's often what you want. But in this case, it's comparing each of the diets to Diet1. In some cases, you might want to compare to a specific diet that isn't the first in the factor list.

How can we choose which dummy to compare to? Fortunately, it's simple to compare to a specific dummy in R. We can just relevel the factor so the dummy we want to compare to is first.

ChickWeight$Diet <- relevel(ChickWeight$Diet,
  ref = 4)


The "ref" argument allows us to change the reference level of the factor variable. This means that when we perform regression analysis

You can use table or str to find the factor levels, if you don't already know them.
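For example, any of these base R functions will show you the levels of Diet:

levels(ChickWeight$Diet)  # just the level names
table(ChickWeight$Diet)   # the levels with their observation counts
str(ChickWeight$Diet)     # a compact summary of the factor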

After releveling the factor variable, we can simply perform the same regression again, and this time it will compare the results to the new reference level:

olsRelevel <- lm(weight ~ Time + Diet,
  data = ChickWeight)
summary(olsRelevel)


Call:
lm(formula = weight ~ Time + Diet, data = ChickWeight)

Residuals:
     Min       1Q   Median       3Q      Max 
-136.851  -17.151   -2.595   15.033  141.816 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  41.1578     4.0828  10.081  < 2e-16 ***
Time          8.7505     0.2218  39.451  < 2e-16 ***
Diet1       -30.2335     4.1075  -7.361 6.39e-13 ***
Diet2       -14.0674     4.6665  -3.015  0.00269 ** 
Diet3         6.2660     4.6665   1.343  0.17989    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.99 on 573 degrees of freedom
Multiple R-squared:  0.7453,  Adjusted R-squared:  0.7435 
F-statistic: 419.2 on 4 and 573 DF,  p-value: < 2.2e-16
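
To compare against yet another diet, just repeat the relevel call. For example, to restore Diet 1 as the reference (a quick sketch):

ChickWeight$Diet <- relevel(ChickWeight$Diet, ref = "1")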

Now we can choose any factor level as the reference for the series of dummies in the regression analysis. The code is available in a gist.

Sunday, February 2, 2014

LWIMW3: Trail Magic

I just finished my submission for Look What I Made Weekend 3. Look What I Made Weekend (LWIMW) is a chance for people to create something over the course of 48 hours. The concept is based on Ludum Dare and other game jams, but for LWIMW you don't have to make a game. Instead, you are free to pursue any creative endeavor and show off your results at the end.

NB: The content below is mostly a reprint of my submission at LWIMW.

My project is an interactive website that will be part of the companion site to a book my friend Scott Thigpen is writing.

I didn't quite start from scratch on this project. This image shows the progress I had made before the weekend. You can also view it on the web.

pre-LWIMW

The other images show the current status after the weekend.

I made a lot of progress this weekend. I added GPS routes to the map, added a table of contents, added marker clustering, improved the graphic design (CSS and basemap), and added about 25% of the final content.
post-LWIMW

post-LWIMW

The only essential things I have left to do are adding the rest of the content, working with my friend on palette and graphics, and tweaking some small things.

You can view the current state of the Trail Magic site here.