Tracking Passenger Trains with Trains.FYI
While sitting on a delayed Via Rail train stuck behind other trains, I started to wonder about train traffic and went looking for a map of trains in North America. The ones I found were complicated or hard to understand as a layperson, so I decided to build my own: trains.fyi.
The Concept
Trains.FYI is a website that maps the real-time locations of passenger trains in North America. It's built to show trains (and the definition of "train" here is loose) from many different networks on a single map.
Finding Trains
Passenger rail in North America is a bit of a mess, and so is its public data. A big part of the behind-the-scenes magic of trains.fyi is taking train data from a huge variety of sources, normalizing it, categorizing it, and plotting it all on one big map.
What is a "train"?
It turns out, this is actually a bit complicated. For the purposes of trains.fyi, I've decided that anything a 5-year-old would call a "train" gets to make the website.
Which trains make the cut?
A train only appears on trains.fyi if its network publishes real-time GPS data. Not every network makes this data publicly available: some only estimate their trains' positions and provide "5 minutes"-style arrival times, and those don't make the cut.
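A minimal Python sketch of that rule, assuming each record has already been converted to a dict with `latitude` and `longitude` keys like the normalized snippet later in this post. The function names are hypothetical, not the site's actual code; an ETA-only record simply has no coordinates, so it gets dropped.

```python
import math
from typing import Any, Iterable


def has_gps_position(record: dict[str, Any]) -> bool:
    """True only if the record has a usable latitude/longitude pair."""
    lat = record.get("latitude")
    lon = record.get("longitude")
    for value in (lat, lon):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not math.isfinite(value):
            return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def keep_gps_trains(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop every record the map can't place from real GPS data.

    ETA-only records ("arriving in 5 minutes") have no coordinates, so they
    fail has_gps_position and are filtered out here.
    """
    return [r for r in records if has_gps_position(r)]
```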
Technical Specs
The site has gone through two major rounds of iteration so far. Here's the story.
I first built the site as a combination of GitHub Pages for the front-end and Google Cloud Functions for the back-end. It worked something like this:
- GitHub Pages hosted a basic HTML site with some JavaScript that fetched a JSON file from a Google Cloud Storage bucket. I chose GitHub Pages because it's free with my GitHub account.
- A Google Cloud Function ran every minute to retrieve train data from various sources across the internet. A pipeline then normalized the data, categorized it, and wrote it to the JSON file in Google Cloud Storage, where the front-end could fetch it.
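A rough Python sketch of the fetch-normalize-write loop described above, under a few assumptions: each provider is a hypothetical zero-argument function that returns already-normalized records, and the output goes to a local file as a stand-in for the Cloud Storage upload. Catching errors per provider is one way to get the fault tolerance this design was aiming for: a broken feed drops out of that run instead of blanking the whole map.

```python
import json
import logging
import os
import tempfile
from typing import Any, Callable

# Hypothetical: each provider fetches one network's feed and returns records
# already normalized to the shared schema.
Provider = Callable[[], list[dict[str, Any]]]


def run_pipeline(providers: dict[str, Provider], output_path: str) -> int:
    """Fetch every provider, skip the ones that fail, write one JSON file.

    Returns the number of train records written.
    """
    trains: list[dict[str, Any]] = []
    for name, fetch in providers.items():
        try:
            trains.extend(fetch())
        except Exception:
            # One broken feed shouldn't take every other network down with it.
            logging.exception("provider %s failed; skipping this run", name)

    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(trains, f)
        os.replace(tmp_path, output_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return len(trains)
```

Writing to a temporary file and swapping it in with `os.replace` means a reader never sees a half-written JSON file; on POSIX systems the rename is atomic when both paths are on the same filesystem.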
Here's a snippet of the normalized data produced by the Cloud Function:
{
"name": "1 (11-26)",
"company": "Via Rail",
"latitude": 46.5832,
"longitude": -80.9412,
"origin": "Toronto",
"destination": "Vancouver",
"speed": 65,
"direction": 323
}
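Each network's feed needs its own mapping onto that shared schema. Here's a hedged Python sketch for one imaginary feed: the raw keys (`trainNumber`, `lat`, `lng`, `from`, `to`, `velocity`, `heading`) and the function name are invented for illustration, and since the post doesn't say what units `speed` uses, the sketch passes it through unchanged.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class TrainPosition:
    """One train in the shared schema shown above (same field order)."""
    name: str
    company: str
    latitude: float
    longitude: float
    origin: Optional[str]
    destination: Optional[str]
    speed: Optional[float]     # units as reported by the provider
    direction: Optional[int]   # compass heading in degrees, 0-359


def normalize_example_feed(raw: dict[str, Any], company: str) -> TrainPosition:
    """Map one record from a hypothetical provider feed onto the schema.

    The raw key names are made up; every real feed needs its own mapping.
    """
    heading = raw.get("heading")
    return TrainPosition(
        name=str(raw["trainNumber"]),
        company=company,
        latitude=float(raw["lat"]),
        longitude=float(raw["lng"]),
        origin=raw.get("from"),
        destination=raw.get("to"),
        speed=raw.get("velocity"),
        # Some feeds report headings like 360 or -37; fold them into 0-359.
        direction=int(heading) % 360 if heading is not None else None,
    )
```

With a raw record of `{"trainNumber": "1 (11-26)", "lat": 46.5832, "lng": -80.9412, "from": "Toronto", "to": "Vancouver", "velocity": 65, "heading": -37}`, calling `asdict(normalize_example_feed(raw, "Via Rail"))` produces exactly the object above, with the negative heading folded to 323.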
I built the site this way first because I wanted it to be two things above all:
- Cheap
- Fault-tolerant
While it was very fault-tolerant, it turned out not to be that cheap. The Cloud Function ran every minute, and I was adding more train networks each day (each needing more compute), so I quickly exceeded the Google Cloud Functions free tier. Since this is less a monetizable product and more a niche website I built to make the internet more fun, it needs to have a net-zero cost to maintain.
Enter Phase Two: DigitalOcean & Django
About a month into the project, I traded fault tolerance for cost savings. I converted the site into a Django project, with the front-end and back-end running on the same $6 USD DigitalOcean droplet. Not counting my own time, but including the domain and compute, the project now costs about $9 USD a month to run. Fortunately, ad revenue and donations let it break even most months.
Of course, these cost savings come with trade-offs, the first being fault tolerance. Running the project on a virtual machine makes me personally responsible for much more of the stack, which means more things can break and more of them are my job to fix. With Google Cloud Functions, the only thing I needed to worry about was my code; now I also need to worry about the underlying operating system and its security, networking, memory, disk usage, and storage. I'm still working out a few bugs that occasionally take the site down, but for the most part this is a hands-off project that maintains itself.
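The post doesn't say how the every-minute job is scheduled on the droplet; cron and a long-running process are both common choices. As one illustration, here's a stdlib-only Python loop (the `run_every` name and its `max_runs` parameter are hypothetical) that keeps a fixed cadence and survives a failing run, two things that become your problem once you own the whole machine.

```python
import logging
import time
from typing import Callable, Optional


def run_every(interval_s: float, job: Callable[[], None],
              max_runs: Optional[int] = None) -> int:
    """Call job() every interval_s seconds on a fixed cadence.

    Exceptions from job() are logged and swallowed so one bad run doesn't
    stop the loop. max_runs=None means run forever (it exists for testing).
    Returns how many runs happened.
    """
    next_run = time.monotonic()
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception:
            logging.exception("scheduled job failed; will retry next interval")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Advance from the planned start time, not from when the job ended,
        # so the schedule doesn't drift later by the job's runtime each cycle.
        next_run += interval_s
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # The job overran its slot: start the next run now instead of
            # firing a burst of back-to-back catch-up runs.
            next_run = time.monotonic()
    return runs
```

In production you'd call something like `run_every(60.0, refresh_trains)`, where `refresh_trains` stands in for whatever fetch-and-write job the site actually runs.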
Frequently Asked Questions (FAQ)
I get a lot of emails about this project; it seems to be a favourite among train aficionados (I'm unfortunately not one; I just hate Canadian train travel, and that frustration spiraled into this project). Here are some of the questions I'm commonly asked.
Can you add freight trains too?
I wish. A little while ago I went down a deep rabbit hole to see if this is possible, and it generally isn't. While some rail-fan websites track freight trains across signal blocks by listening in on radio signals, that isn't something this project can realistically handle. Even if I could capture all the signal data, I still wouldn't have GPS locations for freight trains, only approximations of where they are, which goes against the site's rule that every train must report real-time GPS data.
What is a train?
Heavy rail, light rail, streetcars, and subways are all fair game, as long as they provide real-time GPS data on their positions.
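The post doesn't say which feed formats trains.fyi consumes, but many North American transit agencies publish GTFS, where each route in `routes.txt` carries a numeric `route_type`. Assuming a GTFS source, a categorizer for the kinds of train listed above could look like this in Python. Treating codes 0, 1, and 2 as "trains" is an editorial assumption, the function names are hypothetical, and GTFS's extended route types aren't handled.

```python
import csv
import io
from typing import Optional

# Basic GTFS route_type codes that this sketch counts as "trains".
# Which codes qualify is an editorial choice, not part of GTFS.
TRAIN_CATEGORIES = {
    0: "light rail / streetcar",  # GTFS: Tram, Streetcar, Light rail
    1: "subway",                  # GTFS: Subway, Metro
    2: "heavy rail",              # GTFS: Rail (intercity or long-distance)
}


def categorize(route_type: int) -> Optional[str]:
    """Return a display category, or None for buses, ferries, etc."""
    return TRAIN_CATEGORIES.get(route_type)


def train_routes(routes_txt: str) -> dict[str, str]:
    """Map route_id -> category for every train route in a GTFS routes.txt."""
    result: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(routes_txt)):
        category = categorize(int(row["route_type"]))
        if category is not None:  # non-train routes are dropped
            result[row["route_id"]] = category
    return result
```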
Why don't you have New Jersey Transit?
I'm trying. They won't get back to me with API credentials.
What's the purpose of this site?
It's more of an art project than an actually useful application.
How often does the data update?
The site fetches new data from each provider every minute.
I'm on a train, and the map shows the train's position as 3 minutes behind where it actually is. Why?
This is a combination of delay from trains.fyi and delay from the data provider itself. Each network does things a bit differently, but on most, an on-board GPS unit reports the train's position over the internet to the provider's central servers every 10 seconds or so. From there, providers generally publish the latest data to their APIs on a one-minute interval. Some even deliberately delay their data by a few minutes for security purposes.
Because trains.fyi also fetches data on a one-minute interval, the position you see can be a few minutes out of date. A worst-case timeline looks something like this:
- 0:00 - The train is at a particular position and sends its GPS location to the provider's central server over the internet
- 0:01 - The provider's server receives the updated location
- 0:59 - The provider's server publishes the latest data (received 58 seconds earlier) to its API
- 1:58 - trains.fyi fetches the latest data on its one-minute sync interval
- 2:30 - Your map refreshes, showing where the train was 2 minutes and 30 seconds ago
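The worst case is just the sum of every hop's wait. Here's a small Python check of the timeline's arithmetic, where the 32-second map wait is simply the gap between 1:58 and 2:30 (the post doesn't state the front-end's refresh interval) and the 2-minute deliberate delay is a hypothetical example.

```python
def format_mss(seconds: int) -> str:
    """Format a duration in whole seconds as M:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Worst-case wait at each hop, in seconds, taken from the timeline above.
STAGES = [
    ("GPS report reaches the provider's server", 1),
    ("provider waits to publish", 58),
    ("trains.fyi waits for its next fetch", 59),
    ("map waits for its next refresh", 32),  # implied by 1:58 -> 2:30
]


def staleness(stages: list[tuple[str, int]]) -> int:
    """Age of the position on screen: the sum of every hop's wait."""
    return sum(wait for _, wait in stages)


total = staleness(STAGES)
print(format_mss(total))  # 2:30

# Hypothetical provider that deliberately holds data back for 2 minutes.
held = staleness(STAGES + [("deliberate provider delay", 120)])
print(format_mss(held))  # 4:30
```

A provider that deliberately holds data back, as some do, adds its delay straight onto the total, which is how the map can end up 3 or more minutes behind.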
Do you cover any countries besides Canada and the USA?
Not yet; I haven't been able to find a provider in other North American countries that gives live train GPS positions.
Are you planning to expand this project beyond North America?
Maybe! If this is something that would interest you, please let me know.
I have an idea for a feature, can I email you?
For sure! Use the Contact link at the top of this page.
Questions
Any questions, comments, concerns, suggestions? Send me an email!
Interested?
Check out the project for yourself!