
ORR Origin-Destination Matrix 2021-22

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
It appears to have largely escaped people's notice here, but the ORR now publishes the full origin-destination matrix (ODM) on the Rail Data Marketplace (RDM). This dataset estimates the passenger count for every station pair.

Downloading it directly from the RDM requires account signup, including a (relatively quick) human approval step. The ODM data itself is licensed under the OGL, however, so I've uploaded it to my Google Drive for the convenience of anyone wanting to take a look. It's a 16MB zip file that decompresses into a 100MB csv file.

Edit: Some spreadsheet software, including Excel, has a limit of around a million rows; the ODM has around 1.3 million. Here's a version I've edited to be sorted by passenger count, so any rows you lose will have 1 or 2 passengers/year.
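For anyone who'd rather produce a sorted copy themselves, here's a minimal Python sketch using only the standard library. The column name `journeys` is an assumption (check the header row of the actual CSV), and the file names in the usage line are placeholders. The important step is converting the count to a number before sorting; sorting the raw text gives alphabetical order instead.

```python
import csv

# Hypothetical column name: replace with whatever the ODM header actually uses.
JOURNEYS_COL = "journeys"


def sort_by_journeys(rows, col=JOURNEYS_COL):
    """Return rows sorted by passenger count, largest first.

    float() rather than int() in case some estimates have decimals (an assumption).
    Sorting the raw strings would give alphabetical order, not numeric.
    """
    return sorted(rows, key=lambda r: float(r[col]), reverse=True)


def write_sorted(src_path, dst_path, col=JOURNEYS_COL):
    """Write a copy of the CSV with the busiest flows first.

    If a spreadsheet then truncates the file, only the smallest flows are lost.
    """
    with open(src_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = sort_by_journeys(reader, col)  # loads the whole file into memory
    with open(dst_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be something like `write_sorted("odm.csv", "odm_sorted.csv")`. Loading roughly 100MB into memory is fine on most machines; a file many times larger would call for an external sort instead.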

The top 10 passenger flows are all in London, or London commuter routes:
From | To | Passengers/year
Barking | West Ham | 2,137,040
London Victoria | East Croydon | 1,445,091
Stratford (London) | Highbury and Islington | 1,223,839
London Euston | Milton Keynes Central | 1,222,244
East Croydon | London Bridge | 1,179,265
London Waterloo | Clapham Junction | 1,138,866
Stratford (London) | Ilford | 1,092,914
London Victoria | Bromley South | 1,091,081
Stratford (London) | Romford | 1,061,800
London Waterloo | Surbiton | 999,021
Note that all flows have the same numbers in each direction; I'm not sure offhand whether the numbers count single or return trips.
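Whether every flow really matches its reverse direction is easy to check in code. A minimal sketch, assuming each row is a dict (as produced by `csv.DictReader`) with hypothetical `origin`, `destination` and `journeys` keys standing in for the real column names. It reports any pair whose count differs from the opposite direction, or whose reverse row is missing.

```python
def asymmetric_flows(rows, origin="origin", dest="destination", col="journeys"):
    """Return (origin, destination, forward, reverse) for every flow whose
    count differs from the opposite direction; reverse is None if that
    row is missing altogether."""
    counts = {(r[origin], r[dest]): float(r[col]) for r in rows}
    mismatches = []
    for (o, d), forward in counts.items():
        reverse = counts.get((d, o))
        if reverse != forward:
            mismatches.append((o, d, forward, reverse))
    return mismatches
```

An empty list means every flow is mirrored exactly. Note that a truncated copy of the file will show spurious mismatches, since some reverse rows will simply be absent.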

Also potentially of interest are two PDFs detailing the methodology:
 

PGAT

Established Member
Joined
13 Apr 2022
Messages
1,483
Location
Selhurst
What's the busiest passenger flow between 2 stations that do not have a direct train?
 

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,161
Location
Surrey
Excellent find, thanks for posting and downloading the data.

Surprising that Victoria to Bromley South is so high, given it must have the worst service of all the above. I wonder how the industry uses the data to balance capacity against utilisation across service groups when managing service provision?
 

A S Leib

Member
Joined
9 Sep 2018
Messages
787
What's the busiest passenger flow between 2 stations that do not have a direct train?
I think I might have done something slightly wrong in downloading the data from Gaelan's Google Drive as Barking – West Ham isn't first for me, but London King's Cross – Sheffield (188,481) is the highest one I can see without direct services, probably via Doncaster.

2. Windsor & Eton Central – London Paddington (127,331)
3. Nottingham – London King's Cross (124,855) (via Grantham)
4. London Charing Cross – East Croydon (106,571)
5. London Euston – Derby (68,389) (presumably via Tamworth, maybe a small number via Stoke or Crewe)
6. London Paddington – Henley-on-Thames (65,040)
7. London Waterloo – East Croydon (55,465)
8. London Cannon Street – Chatham (53,433) (I may have missed some more common journeys which now need a change at London Bridge)
9. Highbury & Islington – Romford (47,189)
10. London Bridge – Woking (45,781)

The busiest without a direct service with both origin and destination outside London, as far as I can find, is Henley-on-Thames – Reading (39,484), followed by Oxford – Swindon (32,819); outside the South East, I think it's Sheffield – Manchester Airport (25,525). If you want to exclude that for having had a direct service recently, it's Penzance – St. Ives (24,834), unless you're massively bothered by the one direct train per day.
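The ODM itself doesn't say which pairs have a direct train, so a ranking like the one above needs a second input. A minimal sketch, assuming you've built a set of station pairs with at least one direct train (for example from timetable data); `direct_pairs` and the `origin`/`destination`/`journeys` keys are hypothetical. Because each flow appears in both directions, each pair is counted once.

```python
def busiest_without_direct(rows, direct_pairs, top=10, origin="origin",
                           dest="destination", col="journeys"):
    """Rank the busiest flows whose station pair has no direct train.

    direct_pairs: hypothetical set of frozenset({a, b}) pairs with at least one
    direct train; the ODM does not include this, so it must come from elsewhere.
    Each pair is kept once (first direction seen), since counts are symmetric.
    """
    seen = set()
    ranked = []
    for r in rows:
        pair = frozenset((r[origin], r[dest]))
        # Skip self-flows, directly served pairs and the reverse direction
        if len(pair) < 2 or pair in direct_pairs or pair in seen:
            continue
        seen.add(pair)
        ranked.append((r[origin], r[dest], float(r[col])))
    ranked.sort(key=lambda t: t[2], reverse=True)
    return ranked[:top]
```

The result is only as good as `direct_pairs`: edge cases like one direct train a day (Penzance – St. Ives) depend entirely on how that set is built.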

Somehow there are 127,183 journeys from London Paddington to... London Paddington, and 111,405 for Finsbury Park, with similar figures for Luton Airport Parkway and others. All of the stations I've noticed this for so far are within the contactless PAYG area. There are also cases like 40,896 for Bedford – London King's Cross, so there's probably an error in deciding which journeys are to or from which London terminal.
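A quick way to list all of these is to filter for rows whose origin and destination are the same station. A minimal sketch; the `origin`, `destination` and `journeys` keys are hypothetical stand-ins for the real column names:

```python
def self_flows(rows, origin="origin", dest="destination", col="journeys"):
    """Return (station, journeys) for rows where origin equals destination,
    busiest first."""
    hits = [(r[origin], float(r[col])) for r in rows if r[origin] == r[dest]]
    return sorted(hits, key=lambda t: t[1], reverse=True)
```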
 

JordR

Member
Joined
31 Aug 2014
Messages
25
I think I might have done something slightly wrong in downloading the data from Gaelan's Google Drive as Barking – West Ham isn't first for me, but London King's Cross – Sheffield (188,481) is the highest one I can see without direct services
That surely must be an incorrect allocation of London Terminals tickets or another reason, as no one would intentionally make that journey when it can be done direct over the road from St Pancras?
 

A S Leib

Member
Joined
9 Sep 2018
Messages
787
That surely must be an incorrect allocation of London Terminals tickets or another reason, as no one would intentionally make that journey when it can be done direct over the road from St Pancras?
Via Doncaster's often not much slower and quite a bit cheaper (especially with Grand Central / Hull Trains), and the same via Grantham from Nottingham.
 

thejuggler

Member
Joined
8 Jan 2016
Messages
1,186
That surely must be an incorrect allocation of London Terminals tickets or another reason, as no one would intentionally make that journey when it can be done direct over the road from St Pancras?
Cheaper to go LNER via Doncaster rather than a direct service from St Pancras to Sheffield. I have family just outside Sheffield and this is their preferred option.
 

CapabilityB

Member
Joined
27 Feb 2022
Messages
31
Location
York
I'm guessing split tickets would not show up the origin and ultimate end destination in this dataset.

Would be interesting to know how the potential impact of this on future demand forecasting and capacity allocation / investment decisions is being managed.
 

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,161
Location
Surrey
Cheaper to go LNER via Doncaster rather than a direct service from St Pancras to Sheffield. I have family just outside Sheffield and this is their preferred option.
There's something wrong with our railway when we drive up demand on an already stretched route at the expense of a more direct one.
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
That surely must be an incorrect allocation of London Terminals tickets or another reason, as no one would intentionally make that journey when it can be done direct over the road from St Pancras?
I noticed that one too, and asked ORR; they said it's because (as the pdf notes) they use a 2002 "London Area Transport Survey" to allocate London Terminals tickets. (Did 2002 have a direct Kings Cross to Sheffield service, or sufficiently poor MML service that the majority of passengers would take a connection at an ECML station?)

In any case, they say they have a new methodology they're using for the 2022-23 data, which hasn't been published yet.
Cheaper to go LNER via Doncaster rather than a direct service from St Pancras to Sheffield. I have family just outside Sheffield and this is their preferred option.
Certainly passengers will do this, but the data suggests 80% of Sheffield-London passengers opt for the indirect route, which strains credulity.

I'm guessing split tickets would not show up the origin and ultimate end destination in this dataset.
That's correct, yes. The methodology document notes this as an issue but they don't seem to be making any effort to correct for it.

(They do correct for certain similar issues, for example by looking at what ticket office sold a season ticket in areas where season tickets are routinely issued naming a further-away station with the same price. That wouldn't work with split tickets, though, as they're largely sold online. Maybe Trainline/Trainsplit would be willing to share anonymized data?)

I think I might have done something slightly wrong in downloading the data from Gaelan's Google Drive as Barking – West Ham isn't first for me
Hm, entirely possible I'm doing something wrong as well.

Do you have Waterloo - Surbiton first by any chance? If so you're sorting alphabetically instead of numerically - note the top nine flows are the only ones with 7 figures.
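To illustrate: sorting the counts as text, largest first, compares them character by character, so 999021 (starting with a 9) lands above every seven-figure number. A short Python example using four of the counts from the OP's top-ten list, held as strings the way a CSV reader returns them:

```python
counts = ["2137040", "1445091", "999021", "1061800"]

# Text sort: compares character by character, so "9..." outranks "2..."
text_order = sorted(counts, reverse=True)
# Numeric sort: convert to int first, giving the intended order
numeric_order = sorted(counts, key=int, reverse=True)

print(text_order)     # ['999021', '2137040', '1445091', '1061800']
print(numeric_order)  # ['2137040', '1445091', '1061800', '999021']
```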
 

A S Leib

Member
Joined
9 Sep 2018
Messages
787
Do you have Waterloo - Surbiton first by any chance? If so you're sorting alphabetically instead of numerically - note the top nine flows are the only ones with 7 figures.
My top ones are East Croydon – Victoria (both directions), Highbury & Islington – Stratford (that direction only), Milton Keynes Central – Euston and back, London Bridge – East Croydon and back and Clapham Junction – Waterloo and back.
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
My top ones are East Croydon – Victoria (both directions), Highbury & Islington – Stratford (that direction only), Milton Keynes Central – Euston and back, London Bridge – East Croydon and back and Clapham Junction – Waterloo and back.
Weird; absolutely no clue then. Just tried downloading the csv back from Google Drive and it matches the original on my computer exactly.
 

deltic

Established Member
Joined
8 Feb 2010
Messages
3,233
Thanks for this. It is fascinating that the ORR, DfT and the rail industry, having refused to release this information for years, have quietly released it into the public domain. The cynic in me wonders if this is a way of justifying service cuts when people see how few people are actually travelling on some routes.
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
Weird; absolutely no clue then. Just tried downloading the csv back from Google Drive and it matches the original on my computer exactly.
Ah! Are you using Excel by any chance? The file has 1,348,219 rows, which is over Excel's limit of 1,048,576 - which would explain why you're randomly losing some entries. I'd think Excel would warn you about this, but I'm not actually sure.

Specifically, you'd lose any flow where the first station has an NLC over 6870 (and some of the flows from 6869).
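The arithmetic, as a rough sketch: 1,348,219 rows minus Excel's 1,048,576 leaves 299,643 rows that don't fit. This assumes the worksheet limit includes the header row and that the file is ordered by origin NLC, so everything past the cut-off comes from the highest-numbered origins. `first_dropped_row` is a hypothetical helper for finding where truncation begins.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per Excel worksheet, header row included


def rows_lost(total_rows, limit=EXCEL_MAX_ROWS):
    """Number of rows that don't fit on one worksheet."""
    return max(0, total_rows - limit)


def first_dropped_row(data_rows, limit=EXCEL_MAX_ROWS):
    """Return the first data row Excel would drop, or None if all fit.

    data_rows excludes the header; the header takes worksheet row 1, so only
    limit - 1 data rows fit.
    """
    capacity = limit - 1
    return data_rows[capacity] if len(data_rows) > capacity else None


print(rows_lost(1_348_219))  # 299643
```

Reading the origin NLC from the row returned by `first_dropped_row` would show exactly where the cut falls.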
 

Peterthegreat

Established Member
Joined
22 Feb 2021
Messages
1,338
Location
South Yorkshire
Ah! Are you using Excel by any chance? The file has 1,348,219 rows, which is over Excel's limit of 1,048,576 - which would explain why you're randomly losing some entries. I'd think Excel would warn you about this, but I'm not actually sure.
I downloaded to Excel and it warned me.
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
Here's a version sorted by passenger count, so any rows you lose will be flows with 1 or 2 passengers a year.

Somehow there's 127,183 journeys from London Paddington to... London Paddington and 111,405 for Finsbury Park, similar for Luton Airport Parkway and others. All of the stations I've noticed that for so far are within the contactless PAYG area.
Oh, I also asked ORR about these; apparently most commonly caused by incomplete contactless journeys, but also something related to refunds - not entirely sure on the details there. There are a few outside Oysterland but it's much rarer.
 

PGAT

Established Member
Joined
13 Apr 2022
Messages
1,483
Location
Selhurst
Why are there 182,057 journeys from Clapham Junction to Clapham Junction?
 

A S Leib

Member
Joined
9 Sep 2018
Messages
787
Ah! Are you using Excel by any chance? The file has 1,348,219 rows, which is over Excel's limit of 1,048,576 - which would explain why you're randomly losing some entries. I'd think Excel would warn you about this, but I'm not actually sure.

Specifically, you'd lose any flow where the first station has an NLC over 6870 (and some of the flows from 6869).
Yes, it did warn me. At least there's a simple explanation.
 

b0b

Established Member
Joined
25 Jan 2010
Messages
1,331
Fascinating data. I (and I'm sure others too) have bought a ticket on a flow with a count of 1.
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
Is the data compatible with Google Sheets?
Looks like Google Sheets has a limit of 10 million cells; the spreadsheet as provided has 10 columns and over a million records, so you'd lose some. If you use the sorted version linked in the OP, the ones you lose will be very rare journeys, assuming Google Sheets removes rows from the end like Excel does.

It'd be possible with a little bit of work to produce a version with some non-essential columns removed to make everything fit.
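A sketch of both halves of that: how many rows fit for a given number of columns under the 10-million-cell limit, and a standard-library script that copies the CSV keeping only chosen columns. Which columns are non-essential is left to the user; no real column names are assumed here, and the file paths are placeholders.

```python
import csv

SHEETS_MAX_CELLS = 10_000_000  # Google Sheets cell limit per spreadsheet


def max_rows(n_columns, cell_limit=SHEETS_MAX_CELLS):
    """Largest number of rows (header included) that fits for a column count."""
    return cell_limit // n_columns


def keep_columns(src_path, dst_path, columns):
    """Copy src_path to dst_path keeping only the named columns, in that order."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" silently drops every column not listed
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

With all 10 columns, at most 1,000,000 rows fit; keeping 7 raises that to 1,428,571, enough for all 1,348,219 rows.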
 

A S Leib

Member
Joined
9 Sep 2018
Messages
787
fascinating data - I (am sure others too) have bought a "1" count ticket.
The fact that I've excluded stations with NLCs above 6869 doesn't help (e.g. I can't see how many people have bought a Buckenham – Berney Arms ticket; I'd suspect more than one due to the novelty), but I can still find a lot of interesting information. For example, there are apparently 199 stations to which only one ticket was bought from Hemel Hempstead, including Aylesbury Vale Parkway (almost anybody would do that journey by bus), Sudbury Hill Harrow, and 23 Scottish stations. Rickmansworth's main destination by National Rail is apparently Harrow-on-the-Hill, not Marylebone, although admittedly I'd guess that 50%+ of passengers stay on east of Baker Street, which would push things in favour of the Met. And Burneside to Newcastle had only one journey, which seems extremely low considering their relative proximity.
 

greatkingrat

Established Member
Joined
20 Jan 2011
Messages
2,784
There are 103 stations with no journeys to any London Terminal recorded. As well as the usual suspects such as Teesside Airport, Reddish South and Altnabreac, it also includes a lot of Glasgow suburban stations, which seems to be a bug in the data, as I find it hard to believe that of the 103k people who used e.g. Hillington West, not a single one of them was travelling to London.

There are 21 stations with just one journey to London recorded

Alness, Ardgay, Bank Hall, Barrhill, Braystones, Causeland, Cynghordy, Fearn, Golf Street, Hoscar, Kirkhill, Lairg, Llanbedr, Mosspark, Mount Vernon, Penychain, Pilning, Rannoch, St Budeaux Ferry Road, Thorntonhall, Thorpe Culvert.
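Counts like these can be reproduced with a short script, though which stations count as "London terminals" is a choice the post doesn't spell out, so the `terminals` set below is a hypothetical input, as are the column names. A minimal sketch: it totals each origin's journeys to the terminal set, keeping stations with no London journeys at 0 rather than dropping them, then lists the stations with a given total.

```python
def london_totals(rows, terminals, origin="origin", dest="destination", col="journeys"):
    """Total journeys from each origin to any station in `terminals`.

    Every origin seen in the file gets an entry, so stations with no London
    journeys appear with a total of 0 rather than being missing.
    """
    totals = {}
    for r in rows:
        station = r[origin]
        totals.setdefault(station, 0.0)
        if r[dest] in terminals:
            totals[station] += float(r[col])
    return totals


def stations_with_total(totals, n, terminals=frozenset()):
    """Alphabetical list of non-terminal stations whose London total equals n."""
    return sorted(s for s, total in totals.items() if total == n and s not in terminals)
```

`stations_with_total(totals, 0, terminals)` gives the "no London journeys" list and `stations_with_total(totals, 1, terminals)` the "just one" list.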

The stations with more than 2000 different journeys made were
Manchester Piccadilly (2377), Birmingham New Street (2274), Leeds (2224), Manchester Victoria (2182), Liverpool Lime Street (2178), Edinburgh Waverley (2168), Liverpool Central (2151), York (2137), Newcastle (2125), Sheffield (2120), Manchester Oxford Road (2105), Bristol Temple Meads (2089), Nottingham (2079), Glasgow Central (2041), Birmingham Moor Street (2037), Oxford (2001)
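A sketch of how a "different journeys" count like this might be computed: the number of distinct destinations each origin has at least one journey to. The column names are hypothetical, and skipping self-flows is an assumption, since the post doesn't say whether they were included. Using a set per origin means duplicate rows for the same pair are only counted once.

```python
from collections import defaultdict


def distinct_destinations(rows, origin="origin", dest="destination", col="journeys"):
    """Number of different destinations each origin has at least one journey to.

    Self-flows (origin == destination) are skipped (an assumption).
    """
    seen = defaultdict(set)
    for r in rows:
        if r[origin] != r[dest] and float(r[col]) > 0:
            seen[r[origin]].add(r[dest])
    return {station: len(dests) for station, dests in seen.items()}


def stations_over(counts, threshold):
    """(station, count) pairs above threshold, most destinations first."""
    return sorted(((s, n) for s, n in counts.items() if n > threshold),
                  key=lambda t: (-t[1], t[0]))
```

`stations_over(distinct_destinations(rows), 2000)` would give a list in the same shape as the one above.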
 

Gaelan

Member
Joined
3 Apr 2023
Messages
813
Location
St Andrews
As well as the usual suspects such as Teesside Airport, Reddish South and Altnabreac, it also includes a lot of Glasgow suburban stations, which seems to be a bug in the data, as I find it hard to believe that of the 103k people who used e.g. Hillington West, not a single one of them was travelling to London.
Hm, I wonder if Glaswegians tend not to think to buy a through ticket to London, instead buying a Glasgow Central - London in advance then buying a walk-up into Glasgow as they always do? That, combined with season tickets into Glasgow, would explain some of it - but zero still seems low!
 

Mainline421

Member
Joined
7 May 2013
Messages
509
Location
Aberystwyth
I'm not sure off hand if the numbers count single or return trips.
As I'm responsible for rows 922584 and 922767, I can say it appears return tickets count as 1. Really surprising how low many flows are, though; I guess through tickets aren't that popular...

Also a shame ferry and bus links aren't included.
 

NorthOxonian

Established Member
Associate Staff
Buses & Coaches
Joined
5 Jul 2018
Messages
1,490
Location
Oxford/Newcastle
That is quite the treasure trove of data, and it's really fascinating to look at all the patterns. There are a few unusual quirks in the data (for instance very large flows between Lanarkshire and Stow - I find it hard to believe that Airdrie was a major destination with over 1,000 journeys for what is essentially a local Borders halt; third only to Edinburgh Waverley and Galashiels). And you can also quite clearly see the impact of split ticketing; Oxford to Birmingham/Coventry is quite a significant flow but both have rather modest numbers of passengers due to the large numbers splitting at Banbury (which similarly ends up with numbers that seem a little inflated).
 

pokemonsuper9

Established Member
Joined
20 Dec 2022
Messages
1,735
Location
Greater Manchester
I wonder how many people can find journeys here that only they (or their group) made.
I've found 2 within that time period.
I look forward to future versions, where I think I might have some more unique journeys.
 

Killingworth

Established Member
Joined
30 May 2018
Messages
4,914
Location
Sheffield
A big thank you from me to Gaelan for drawing our attention to this very useful resource.

I have reservations about all the obscure single supposed journeys between masses of places like Chathill and Widdrington, however the data for my own local station seems broadly in line with what I've strongly suspected, but have never previously been able to prove.
 

Springs Branch

Established Member
Joined
7 Nov 2013
Messages
1,434
Location
Where my keyboard has no £ key
There are 21 stations with just one journey to London recorded

Alness, Ardgay, Bank Hall, Barrhill, Braystones, Causeland, Cynghordy, Fearn, Golf Street, Hoscar, Kirkhill, Lairg, Llanbedr, Mosspark, Mount Vernon, Penychain, Pilning, Rannoch, St Budeaux Ferry Road, Thorntonhall, Thorpe Culvert.
I was surprised that any of the Merseyrail stations fell into the same category as Braystones, Golf Street or Pilning, even if BAH is one of the quieter shacks on the electric network.

Maybe that's a reflection of the allegedly common conversation at Merseyrail ticket windows:-
Passenger (at Bank Hall): "Return to London please."
Ticket Seller: "Here's a single to Liverpool. Book again at Lime Street, lad"
 
