
Big data and data analytics - does the rail industry make use of this?

Status
Not open for further replies.

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
The following is a slightly nerdy techie subject. Apologies if I've not explained anything in plain English. I did try to.

I went to a conference today on data and how it can be used for the public good. Everything from using data to better target help where it is needed most to transparency and privacy of data.

Do Network Rail and / or the TOCs / RDG use data and data mining to improve their services and processes? For example, to work out where best to target resources, or what issues are likely to lead to delays and how best to mitigate them?

With regards to implementing DOO, what kind of data analysis would they have done on the introduction of such a thing? And would they have also studied the issues that might occur from its introduction? Note I don't want to discuss the merits of DOO, which I personally disagree with, just the data analysis that may or may not have been done.

The conference also talked about how one can capture information and make use of it. How is railway data captured? For example, do drivers still fill out handwritten forms on why their trains were delayed, or is it now a digital form?

The conference looked at the sharing of data and the combining of it to produce better knowledge and outcomes across multiple organisations, as well as trying to stop silos. Obviously it needs to respect GDPR but it is possible to do that.

Is data shared much in the rail industry?

Could artificial intelligence be used at all in the rail industry?

I find the way data and its uses are heading very interesting. I say data and not digital data, as even paper records can play a part.
 

sheff1

Established Member
Joined
24 Dec 2009
Messages
5,496
Location
Sheffield
I can't answer your question, but can say that I went to a similar conference around 10 years ago and the senior managers present were enthused. When I left the organisation a few years later, the way the organisation's data was collected, held and used remained virtually unaltered and the same type of under-informed decisions were being made. This may say more about the organisation than the theory!
 

DarloRich

Veteran Member
Joined
12 Oct 2010
Messages
29,366
Location
Fenny Stratford
Do Network Rail and / or the TOCs / RDG use data and data mining to improve their services and processes? For example, to work out where best to target resources, or what issues are likely to lead to delays and how best to mitigate them?

No idea what big data or data mining is but there is railway data captured on everything and reported on. EVERYTHING.
 

Jonfun

Established Member
Joined
16 Mar 2007
Messages
1,254
Location
North West
Aye, data's used extensively across the industry - a lot is still paper based but that is changing, slowly. Data sharing happens in some areas but others are heavily tied down with "commercial confidentiality" and GDPR.

As an example, safety statistics that you'll see in RSSB reports etc are driven by an industry database which each train operator/NR records every reported safety incident in. The data as a whole can be used to get a more accurate picture of issues nationally as opposed to trying to piece together individual train operator stats.

As an industry the railways don't generally embrace change; in many (most?) parts of the industry there aren't people whose job it is to make things work better and change with the times. It's reliant on the right people being in the right roles with the ability to drive change over and above what they *should* be doing in their normal day-to-day jobs.
 

JB_B

Established Member
Joined
27 Dec 2013
Messages
1,418
No idea what big data or data mining is but there is railway data captured on everything and reported on. EVERYTHING.

Everything (in caps!) is a big claim. That's certainly false unless you're taking a remarkably narrow definition of 'railway data'.
 

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
I can't answer your question, but can say that I went to a similar conference around 10 years ago and the senior managers present were enthused. When I left the organisation a few years later, the way the organisation's data was collected, held and used remained virtually unaltered and the same type of under-informed decisions were being made. This may say more about the organisation than the theory!
Someone made the same comment today. I can't answer if anything will change in the future but I'd hope something might. One mustn't give up hope, or at least I'm trying not to.

Maybe if you've experienced it multiple times, that's when you lose enthusiasm.
 

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
No idea what big data or data mining is but there is railway data captured on everything and reported on. EVERYTHING.
Big data is where you collect masses and masses of data, too much for someone to process by hand, so you use computers to process it. You might write programs to mine the data for information, something you couldn't do so easily by yourself as the amount of data is so huge.
 

JB_B

Established Member
Joined
27 Dec 2013
Messages
1,418
I bow to your expertise. Perhaps you could list some of the data not captured.

I have no particular expertise but innumerable aspects of the passenger's experience of the railway (rightly) go unrecorded by the industry. You were claiming that data is collected on 'everything' - that is a very strong claim.

Just one example: cases where staff use their discretion not to apply the letter of the ticket rules when they encounter a passenger who appears to be vulnerable.

(And I'm absolutely not suggesting that this should be recorded - quite the contrary.)
 

deltic

Established Member
Joined
8 Feb 2010
Messages
3,246
The simple answer is yes.

Most obvious is usage data which is used to determine pricing of advance tickets, revise services, determine best use of rolling stock to meet demand, fraud detection through to forecasting asset failure before it happens. TfL also use mobile phone data to track people through the network and to develop transport models which are used to predict future demand etc.
 

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
The simple answer is yes.

Most obvious is usage data which is used to determine pricing of advance tickets, revise services, determine best use of rolling stock to meet demand, fraud detection through to forecasting asset failure before it happens. TfL also use mobile phone data to track people through the network and to develop transport models which are used to predict future demand etc.
Obviously reasons for delays are recorded, but are these published in any great detail?

I may be wrong about this but I feel trains might run to time more if they didn't need to pick up passengers en route. So are stats such as passengers boarding a train delaying its departure recorded? I suspect they are, and if they are, are they publicised much? You hear about infrastructure faults delaying trains but I don't often hear about trains being delayed due to regular large numbers of passengers boarding.

I guess those delays are usually more minor than infrastructure faults.
 

Skymonster

Established Member
Joined
7 Feb 2012
Messages
1,764
No idea what big data or data mining is but there is railway data captured on everything and reported on. EVERYTHING.
The railway clearly does not capture data on everything. Let me demonstrate this with one very simple example: From my local station, I can travel to Birmingham New Street via two different routes, using two different TOCs. As trains on both routes leave within a few minutes of each other, the railway has no idea which route I actually take to reach Birmingham or which TOC I use (which is a reason why ORCATS exists, to apportion revenue). Similarly, as my departure station is not barriered (and at New Street the barriers are often open), the railway often has no idea of the times of the trains (or in some cases even the day, especially for the return) that I use my anytime / any permitted ticket on. Actually, the railway often doesn't even know who has bought the ticket, when the person last bought a ticket, how frequently they travel, where they travel to regularly, what sort of ticket they typically buy, whether they split or break their journeys, etc, etc.

So where does big data fit into this? As an example, if the railway could capture the use (departure time, route, changes, arrival time, return day / time, etc) of every leg of every journey made by a customer it could do so much to improve analysis of true demand - and plan accordingly. If it could accurately record which tickets were being used on every train between every station, again it could greatly enhance its understanding of loading. And then it could more accurately start to model peak / off peak flows and set fares accordingly. If it could record every ticket inspection made by guards on every train, it could so much better analyse where ticketless travel was likely, and increase targeting of fare evaders. If it could more reliably link every ticket to an individual customer (and it had customer details / demographics), it could use that to target each customer and start to introduce more reliable measures to influence customer behaviour. For example, going back to my journey to Birmingham, if the railway could accurately track the loading on every train, it could advise me (if it knew when I arrived at the station) which route / train I should take if I wanted to ensure I got a seat. What about using mobile data / beacons to track movements, which would enable optimisation of flows - not just across the network but within stations - or better management of queuing times at ticket offices?

Big data is about capturing every interaction, and using that to make decisions on how to run the business (big data isn't just about customer interaction - it can also be about things like maintenance of track and stock, leading to improved predictive maintenance and reduction in failures). The problem is that in many cases - especially those related to customer interaction - the railway currently doesn't have a mechanism to collect the data. To do that, it would need to have a much more closed system such as that enjoyed by airlines. And big data isn't just about printing out reams of paper reports - it's about using analytics to draw senior staff's attention to ongoing issues that need resolution or opportunities that could be exploited.

Sure, the railway already does many of the things I mention. But lacking the true detail, a lot of what it does is based on imprecise or empirical measures, because in many cases the railway does not currently have mechanisms to collect big data, or the data sources are not integrated, or in some cases they are too anonymous (a whole different situation to [say] a utility company which knows what each of its customers consumes and, with the increasing prevalence of smart meters, when they consume it). Or the big picture isn't available because 'rival' TOCs don't share data.

Make no mistake though, in terms of customer interaction smart card pay-as-you-go ticketing will gradually start to change things. Once you can link travel to a card (and thus by definition a customer) you can do much more in terms of collecting data and starting to analyse it. The TOCs may say they want it to cut costs, simplify purchases, or reduce ticketless travel. But it will open up an entire new world of opportunity to start collecting and analysing 'big data'. Already airlines and airports are using beacons and mobile technology to understand and manage bottlenecks - maybe that will come to the railway one day? Some of the results will benefit the consumer, but a lot of it will benefit the TOCs and the network as a whole.
 

DarloRich

Veteran Member
Joined
12 Oct 2010
Messages
29,366
Location
Fenny Stratford
Goodness me. Another person with little grasp of humour.

For the benefit of the chronically dull and for the avoidance of doubt: OF COURSE the railway does not store data on everything. It merely seems that way when trying to interpret reports and performance statistics for many different parts of the railway, as some of us do on a regular basis.

I will also point out that the railway goes well beyond passenger usage. Many of you miss that.
 

ASharpe

Member
Joined
4 Feb 2013
Messages
1,001
Location
West Yorkshire
As someone whose day job is using big data in logistics can I point out that it is not about using every bit of data you can lay your hands on.

The key part is working out which data is useful, a 95% confidence interval can easily lead to one in 20 of your parameters leading you up the wrong path. I'm yet to be convinced that artificial intelligence can do this step, you will end up with the computer suggesting random correlations.

Go back to the basics: come up with a hypothesis and then use big data to prove it (or reject your null hypothesis). Too many people get it the wrong way around and because they have had a computer do the leg work think it must be right.

And when you finally have your theory, let other humans tear it to shreds. If it survives you're on to something.

Saying all that, the fact that deliveries to addresses in the UK mainland take an extra 9.6 seconds for every degree south you go makes so much sense.
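
To put rough numbers on the one-in-20 point above, here is a minimal sketch. It assumes the parameters are tested independently at the 5% level and that none of them has any real effect; the figure of 20 parameters and all the function names are hypothetical, chosen only for illustration.

```python
import random

ALPHA = 0.05        # 5% significance level, the counterpart of a 95% confidence interval
N_PARAMETERS = 20   # hypothetical number of parameters tested, none with a real effect


def expected_false_positives(n_params: int, alpha: float) -> float:
    """Average number of parameters that look significant purely by chance."""
    return n_params * alpha


def prob_at_least_one_false_positive(n_params: int, alpha: float) -> float:
    """Chance that at least one independent test gives a false alarm."""
    return 1.0 - (1.0 - alpha) ** n_params


def simulate(n_params: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the formula above.

    Under the null hypothesis a p-value is uniform on [0, 1], so each test
    is modelled as one uniform draw compared against alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(n_params)):
            hits += 1
    return hits / trials


def bonferroni_alpha(n_params: int, alpha: float) -> float:
    """Stricter per-test threshold (Bonferroni correction)."""
    return alpha / n_params


if __name__ == "__main__":
    print(expected_false_positives(N_PARAMETERS, ALPHA))
    print(prob_at_least_one_false_positive(N_PARAMETERS, ALPHA))
    print(simulate(N_PARAMETERS, ALPHA, trials=20_000))
    print(bonferroni_alpha(N_PARAMETERS, ALPHA))
```

Under those assumptions you expect about one spurious "finding" in 20, and the chance of at least one is roughly 64%. The Bonferroni threshold is one common (and fairly conservative) way of guarding against that; it is shown as an option, not as what anyone in the industry necessarily does.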
 

krus_aragon

Established Member
Joined
10 Jun 2009
Messages
6,051
Location
North Wales
Some examples of data gathering (off the top of my head):

The National Measurement Train (Flying Banana) and its ilk gather data on railhead quality, radio reception, etc for Network Rail. Newer passenger trains measure several of these variables for NR too.

New stock (such as the 80x) have the ability to sense occupied/unoccupied seats and count passengers on and off the train (iirc), and record this for later use.

Train Management systems, which monitor on-board systems, can be downloaded at the depot to identify faults. The Pendolino's TMS phones home in advance to let engineers plan what work to do on the fleet overnight.
 

deltic

Established Member
Joined
8 Feb 2010
Messages
3,246
Obviously reasons for delays are recorded, but are these published in any great detail?

I may be wrong about this but I feel trains might run to time more if they didn't need to pick up passengers en route. So are stats such as passengers boarding a train delaying its departure recorded? I suspect they are, and if they are, are they publicised much? You hear about infrastructure faults delaying trains but I don't often hear about trains being delayed due to regular large numbers of passengers boarding.

I guess those delays are usually more minor than infrastructure faults.

The south western rail re-timetabling was partially based on longer dwell times at stations due to larger volumes of passengers travelling - the issue is a known problem
 

underbank

Established Member
Joined
26 Jan 2013
Messages
1,486
Location
North West England
I bow to your expertise. Perhaps you could list some of the data not captured.

Number of fare evaders that aren't caught/challenged
Passenger numbers on every service/route
Proper/accurate payment of fares to the actual TOC on which the passenger travelled
 

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
The south western rail re-timetabling was partially based on longer dwell times at stations due to larger volumes of passengers travelling - the issue is a known problem
But if journey times are not being sped up and more trains are running, how can they increase dwell times? I'm aware in future they will be sped up with new trains, but my understanding is that's not the timetable that was proposed at this stage. Is it down to the spacing between trains for differing routes having a better spread, which computers can obviously model?

I'm not suggesting you trust the computer findings outright but they give you an answer to at least consider.

Collecting passenger numbers on the new trains is good, especially as it can tell you on the train itself. On the 700 series though I found one of the symbol colours used misleading, perhaps unintentionally. So data is good but you do need to communicate it in a way that users understand.

WiFi connections on trains is another useful stat. Of course it's only based on those who actually bother to connect and on trains that have WiFi.

I wonder how much modelling TOCs do on delay repay refusal rates and numbers of passengers who try again and are then successful. GTR must have a lot of delay repay data they could model.

If they could track ticket gatelines refusals, they could track what kind of tickets are being used at stations where the staff say they are not valid. Some of these will not be valid and others will be. I'm not saying the gathering of such data would be easy, if ever possible but it might provide an interesting insight.

Airlines have it easier in some respects as one's ticket is always scanned. With railways this isn't so, because so many stations are unbarriered and it's not feasible to have staff at every little station.

Obviously modeling has been done on leaf fall and the best date to start a leaf fall timetable. Leaves might fall before or later but you need an average so you can plan ahead.
 

Envoy

Established Member
Joined
29 Aug 2014
Messages
2,497
Well, they surely can’t figure out who is going from a to b due to the large numbers of people who are having to buy split tickets to get their fares down.
 

DarloRich

Veteran Member
Joined
12 Oct 2010
Messages
29,366
Location
Fenny Stratford
As someone whose day job is using big data in logistics can I point out that it is not about using every bit of data you can lay your hands on.

The key part is working out which data is useful, a 95% confidence interval can easily lead to one in 20 of your parameters leading you up the wrong path. I'm yet to be convinced that artificial intelligence can do this step, you will end up with the computer suggesting random correlations.

Most sensible point so far. Missed by many.
 

HowardGWR

Established Member
Joined
30 Jan 2013
Messages
4,983
Origin and destination data are much better gathered for rail than those for road are, or can be. However, train data only tell which station was the origin and which station the destination, and then not always, as not every station has scanners, for instance. Specific surveys were carried out, in order to find out where folk went, after alighting at the main London stations. Such data were clearly important for various TfL analyses.
On road, site surveys are sometimes carried out by interview, but clearly, there are logistical problems with these, especially if motorways are involved (!). Number plate recognition can help here, but whole trip data are still unknown in that latter case. Roadside interviews will only ever provide a snapshot.

Also, reason for travel is interesting, since Cost / Benefit outcomes rely on whether the trip is for business, commuting, or leisure, as different time saving benefits are calculated for each. I imagine that mobile technology will provide the possibility of better transport data coming forward, but there will be discussions about privacy invasion, I'm sure.
 

infobleep

Veteran Member
Joined
27 Feb 2011
Messages
12,724
Most sensible point so far. Missed by many.
I do agree not all data is useful. You also need to accept the fact that you might get an answer that turns out to be wrong and you have to try again.

When I was programming at university, I said to my tutor that I kept finding things I could turn into a separate but reusable program. He said yes, you are likely to get that when building something experimental.

Passenger numbers fascinate me too. In the summer one morning I got on the 7:58 East Croydon to Victoria. It had plenty of space and I got a seat. The next week it was so rammed you could only just about board. Both weeks were in the school holidays.

They can't all have been on holiday the previous week.

I also used to find with trains that some days they would be busier than others. It was as if groups of passengers were collectively delayed elsewhere. Perhaps it was road traffic that delayed them.

Edit: on the last point I may be misremembering and it might be buses I'm thinking of.
 

WatcherZero

Established Member
Joined
25 Feb 2010
Messages
10,272
As already mentioned the Rail industry was one of the first after the Airline industry to get heavily involved in demand price modelling.

Well, they surely can’t figure out who is going from a to b due to the large numbers of people who are having to buy split tickets to get their fares down.
Usually they do know, but alternate routes and shorter sections are priced by different companies to the one pricing the whole journey, and are often designed to reflect more local conditions such as commuter flows or maintaining community links rather than the long distance journeys you're using them for.
 

Ianigsy

Member
Joined
12 May 2015
Messages
1,122
funnily enough I know somebody working on this right now

Could be a big job....

I made the point on another thread that any suppressed demand for travel from the Calder Valley to Huddersfield won't be recorded as somebody who commutes from, say, Todmorden using an MCard will only be registered as exiting the barriers at Huddersfield - their origin and destination on the return journey are missed because there's no touch in/out system.
 

underbank

Established Member
Joined
26 Jan 2013
Messages
1,486
Location
North West England
I made the point on another thread that any suppressed demand for travel from the Calder Valley to Huddersfield won't be recorded as somebody who commutes from, say, Todmorden using an MCard will only be registered as exiting the barriers at Huddersfield - their origin and destination on the return journey are missed because there's no touch in/out system.

Likewise with the station usage statistics, where some unmanned stations show ridiculously low passenger numbers said to be caused by guards issuing tickets as if from an earlier station where they know the price is the same, rather than changing their terminal at every stop.
 

quartile

Member
Joined
17 Oct 2018
Messages
27
Yes, google Hacktrain for a recent example of a railway Hackathon that took place at Innotrans.
TfL has made much of its data free and open to anyone who wants to use it https://tfl.gov.uk/info-for/open-data-users/
Many of the more modern rolling stock and signalling systems contain data loggers that can be mined, for example to analyse real dwell times at all stations across the network.
Smartcard ticketing like Oyster and gated stations help a lot with getting accurate passenger usage data. An alternative is tracking people's mobile phones. Telefonica (O2) has a team that sells anonymous mobile phone data to transport and other sectors.
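
As a rough illustration of the dwell time idea above, here is a minimal sketch. The record layout, station names and timestamps are all made up for the example; real logger output will look different and would need its own parsing.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical logger records: (station, arrival time, departure time).
# Invented purely for illustration, not real data.
RECORDS = [
    ("Station A", "2018-10-17T07:58:10", "2018-10-17T07:59:00"),
    ("Station A", "2018-10-18T07:58:05", "2018-10-18T07:59:35"),
    ("Station B", "2018-10-17T08:10:00", "2018-10-17T08:10:40"),
]


def dwell_seconds(arrival: str, departure: str) -> float:
    """Dwell time in seconds between two ISO 8601 timestamps."""
    arrived = datetime.fromisoformat(arrival)
    departed = datetime.fromisoformat(departure)
    return (departed - arrived).total_seconds()


def mean_dwell_by_station(records):
    """Group the records by station and average the dwell times."""
    dwells = defaultdict(list)
    for station, arrival, departure in records:
        dwells[station].append(dwell_seconds(arrival, departure))
    return {station: mean(values) for station, values in dwells.items()}


if __name__ == "__main__":
    for station, seconds in mean_dwell_by_station(RECORDS).items():
        print(f"{station}: {seconds:.0f} s average dwell")
```

With enough real records, the same grouping could be done by time of day or by service to see where dwell times regularly exceed what the timetable allows.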
 