timetableworld.com

shawmat · 15 May 2020

I'm the developer of timetableworld.com. The website has been ticking away since 2008 – and from 2016 in the background on my home server. It turns out that a lot of people are visiting it – 40,000 per month. Who knew? I didn’t; I didn’t look.

A question for the Forum: Would there be sufficient appetite for a major revamp?

The original aim was to connect historical mapping of railways to historical timetables. My vision for the site changed as reality intruded but I think the site could be repurposed as THE main (free) archive for historical public transport timetables (rail, bus, metro, tram etc). A quick Google search for “train timetable collectors” brings timetableworld.com to the top – despite zero promo effort from me.

Maybe it’d be worth sharing how we got here.

History of timetableworld.com

Google Maps and Google Earth were fairly new in 2008, and I was wowed by the experience of being able to follow former rail-tracks on the ground (via a satellite image). At the same time, exploring timetables gave me the problem of – where is this place? What was the story behind the services in the timetable?

It seemed like a simple indexing problem to connect the two.

I was a very IT-literate analyst for a city bank, part of a small front-office team that provided a portfolio analysis tool to pension funds. I programmed all day. The emerging open-source software movement around 2005-2007 sounded interesting and, when made redundant in the credit crunch of 2007/8, I took the opportunity to learn some new skills.

timetableworld.com was the result. It looks stale today but the underlying technology - developed by volunteers as grad projects – still stands up rather well.

It turned out to be far too much work for one person (more of which later), and once I was back in employment other priorities took over.

What’s involved?

Publishing, indexing and geolocating timetable data involves several separate disciplines. Trying to do all of these myself was over-ambitious, but it could be manageable for a small team.

Scanning whole timetable books. I built my own book scanner for that, using two basic cameras. You can build your own too using https://www.diybookscanner.org/ but, nowadays, a personal high-speed book scanner is emerging as a consumer product

Cleaning the scanned images - to whiten them, straighten them and remove speckles – is a software process, a little slow but improvable
Using OCR (optical character recognition) to read the index pages
Detailed data processing to connect images to an index of stations, and to geolocate the stations for use on a map
Image processing to make image retrieval fast, however far the user zoomed or panned
A website to present the results and take user interactions.
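
The fast zoom-and-pan step in the list above is commonly done with a tile pyramid: each scan is stored at several zoom levels, each level is cut into fixed-size tiles, and the viewer fetches only the tiles in view. This is a rough sketch of the arithmetic, not the site's actual code; the 256-pixel tile size, the halving per level and the function names are all assumptions.

```python
import math

TILE = 256  # assumed tile edge in pixels


def pyramid_levels(width, height, tile=TILE):
    """Return (level, width, height, cols, rows) for each zoom level,
    halving the image until the whole page fits in a single tile."""
    levels = []
    level = 0
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((level, w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
        level += 1
    return levels


def tiles_in_view(x, y, view_w, view_h, tile=TILE):
    """List the (col, row) tiles that overlap a viewport whose
    top-left corner is at pixel (x, y) on the current level."""
    first_col, first_row = x // tile, y // tile
    last_col = (x + view_w - 1) // tile
    last_row = (y + view_h - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

However far the user zooms or pans, the number of tiles fetched depends on the size of the viewport, not on the size of the scan.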

On top of that, I was negotiating with collectors to make their timetables collections available for scanning.

Working alone was unsustainable as the number of books went from dozens to hundreds. I still have dozens of partly complete timetable projects to get over the line.

Update

IT innovation continues to move quickly – very! Since retiring in 2016, I’ve had the time to get back up-to-date and, I think, stay there. A few interesting-ish outcomes for my home town of Maidenhead are:

https://atamuseum.org/collection/ (requires free registration to get the full experience)

https://collection.maidenheadheritage.org.uk/side-by-side-map-maidenhead.php - historical maps

https://maidenheadac.org – a simple WordPress example

https://collection.maidenheadheritage.org.uk/then-and-now-demo.php - images

https://mnf.org.uk/he-listings-maidenhead.php - interactive maps using open data

Relaunching

Is it possible to assemble a small team to try again?

I’m happy to manage the whole work if there are volunteers willing to:

Offer their timetable collections for scanning
Scan timetables page by page with great care
Help with the post-cleaning and OCR steps.

I would do the website and database work. But, even then, if others wish to pitch in, that suits me. I’m happy to mentor youngsters wanting to get into IT careers.

Your thoughts

Is it worth doing?
Can you help, and commit sufficient time and effort?
Would you suggest taking the project in a different direction?

I’m all ears.

hexagon789 · 15 May 2020

If you have more material to add then I would say that a revamp could be very worthwhile in view of the traffic the site gets. That said, I always found the site easy to navigate; I just kept hoping more timetables would be added!

shawmat · 15 May 2020

I need some willing workers to help me. That lesson has been learned! I'm pleased you find a 2008 site still OK to use, but a revamp could make it a lot better.

the sniper · 15 May 2020

Glad you've turned up here so I can thank you for your efforts! While the site may look a little outdated, I've always found it to work really well. I wish the majority of websites associated with magazines worked so well...

shawmat · 15 May 2020

Thank you!


kentuckytony · 16 May 2020

Best of luck - seems very worthwhile.

shawmat · 16 May 2020

kentuckytony said:
Best of luck - seems very worthwhile.

Thank you. For the US, I have Official Guides for 1916 and 1923 that remain to be completed. The two images show:

a raw scan from 1916. The books can be very fragile.
the cleaned result.

The resolution on the working images is much higher than these copies for the forum.

The Official Guide is quite difficult to index. To find the services at a given railroad depot, the user has to:

Use the station index to find which railroad companies operate there
Use the railroad index to find their section in the guide
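
That two-step lookup can be sketched as two dictionaries chained together. The depot names, railroad names and page numbers below are made-up placeholders, not entries from the 1916 or 1923 guides, and the function name is hypothetical.

```python
# Placeholder data for illustration only; not taken from a real Official Guide.
STATION_INDEX = {
    "Exampleville": ["Railroad A", "Railroad B"],
    "Sampleton": ["Railroad B"],
}
RAILROAD_INDEX = {
    "Railroad A": 120,  # first page of that company's section
    "Railroad B": 455,
}


def sections_for_depot(depot):
    """Step 1: station index -> railroads serving the depot.
    Step 2: railroad index -> first page of each railroad's section."""
    railroads = STATION_INDEX.get(depot, [])
    return {name: RAILROAD_INDEX[name] for name in railroads if name in RAILROAD_INDEX}
```

The hard part is not the lookup but filling the two dictionaries, which is where the OCR of the index pages comes in.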

Using 2008 technology, OCR couldn't really cope with slightly blurry print.

deltic · 16 May 2020

Echoing other comments I have found the site very useful so thank you. Knowing nothing about the technology is it possible to produce a journey planner using the scanned information - ie can you replicate modern journey planners for the pre-Beeching era. I can imagine that would generate a lot of traffic and hence revenue.

shawmat · 16 May 2020

deltic said:
Echoing other comments I have found the site very useful so thank you. Knowing nothing about the technology is it possible to produce a journey planner using the scanned information - ie can you replicate modern journey planners for the pre-Beeching era. I can imagine that would generate a lot of traffic and hence revenue.

Realistically, the work to extract the times from the scanned images would be absolutely monumental. Modern journey planners are only possible because all the underlying rostering data is already in digital form.

Secondly, it's important to preserve and develop the skill of reading a printed timetable. Being able to do so is already becoming rarer because of online journey planners. The indexing in https://timetableworld.com helps people to find and jump to a page quickly within a big book, but users still need to be able to read and interpret the table.

splashoutradio · 16 May 2020

As has been mentioned many times already, your site is excellent and very impressive!
I've spent many hours browsing the site and seeing how service patterns have changed over the years.

30907 · 16 May 2020

I'm hesitant to offer help, as my flatbed scanner would break the backs of thick timetable volumes (e.g. SR 50s/60s) - is there a Book-friendly solution?

shawmat · 16 May 2020

30907 said:
I'm hesitant to offer help, as my flatbed scanner would break the backs of thick timetable volumes (e.g. SR 50s/60s) - is there a Book-friendly solution?

You're quite right to be careful with the timetables. It's not appropriate to damage them, and a flatbed scanner won't do. As mentioned above, I'm looking to assemble a small team to do parts of the work. In this case, I can lend you a non-destructive scanner which works face-up - if you're willing to do the scanning work and have a good collection to work through. Typically, it's possible to do 100 pages in about 10 minutes. If that suits, please drop me a note at [email protected], ideally with an idea of what timetables you have.

ccityplanner12 · 19 May 2020

I make several pageloads a day on timetableworld.com, using it as a source of data in writing a (heavily idealistic) timetable for post-war Britain (read "BR had they had sufficient money").
I have next to no problems with the website & can think of very little that could be done to it, & I am a pernickety perfectionist always looking for improvements to be made (one of the reasons I started this project). The only slight problem I have with it is a small number of spelling mistakes, which can cause problems because I use Ctrl+F to find locations, but other than that it's a fine site. It can be very interesting to see how the different regions & operators arranged their timetables, ranging from small two-station branches to the SR's ridiculously big "megatables" as I call them, such as 34 & 41. Another source of interest are the patterns in the numbering: The LNER simply started from 1 & worked their way up, nothing more to it than that other than the triskaidekaphobia that was mutual with the GWR; the LMS & the Western start at 50, lower numbers being reserved for summaries; the most important routes on the Western have a habit of ending in 1. These numbers have embedded themselves into my consciousness so much that I have adopted some of the more distinguished ones into my own parlance & use them to refer to the lines as if they were bus numbers. I call the WCML "the Fifty", the ECML "the One", the Settle to Carlisle "the Two O Nine" & the North Wales Coast "the Ninety-Nine", & so on & so forth. Also the cultural differences: German tables are into symbols & icons while the English express everything through immaculate prose.

Doctor Fegg · 19 May 2020

deltic said:
Echoing other comments I have found the site very useful so thank you. Knowing nothing about the technology is it possible to produce a journey planner using the scanned information - ie can you replicate modern journey planners for the pre-Beeching era. I can imagine that would generate a lot of traffic and hence revenue.

Yes, it would, and I've given some thought to that. (Plus I have a book scanner on order. A CZUR Shine Ultra, currently under development on Indiegogo, if anyone else is similarly tempted.)

The complex bit is extracting the timetables from the PDF. This is hard but not impossible - there's a lot of research going into it right now. https://nanonets.com/blog/table-extraction-deep-learning/ is one example. But you could go a long way with a PDF-parsing library (there are good ones in Ruby and Python) and a few custom scripts.

The journey planner is easy by comparison. You basically need to munge the data into GTFS format and then load it into OpenTripPlanner. Nothing that hasn't been done a thousand times before.
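
To give a feel for the "munging" step, here is a minimal sketch that turns already-extracted calls into the text of a GTFS stop_times.txt file. The trip ID, stop IDs and times are made-up placeholders rather than data from any real timetable, and the function name is hypothetical. A full feed also needs further files such as stops.txt, routes.txt and trips.txt before a planner will load it.

```python
import csv
import io

# Made-up sample: one train calling at three stops. IDs and times are placeholders.
TRIPS = {
    "T1": [("STOP_A", "09:00"), ("STOP_B", "09:04"), ("STOP_C", "09:12")],
}


def to_stop_times(trips):
    """Render trips as the text of a GTFS stop_times.txt file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"])
    for trip_id, calls in trips.items():
        for seq, (stop_id, hhmm) in enumerate(calls, start=1):
            t = hhmm + ":00"  # GTFS times are written HH:MM:SS
            # A printed table usually gives one time per call, so it is
            # used for both arrival and departure here (an assumption).
            writer.writerow([trip_id, t, t, stop_id, seq])
    return out.getvalue()
```

Calling `to_stop_times(TRIPS)` returns a header row followed by one row per call, numbered in running order.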

shawmat · 20 May 2020

Here's the CZUR ET16 overhead scanner in my spare room.

It's a big improvement on the homemade kit I previously used for timetableworld.com. As the image is captured, lasers detect the shape of the book curvature and reprocess it out immediately, whereas it was previously a laborious software post-process.

Speed is not as good as advertised, but impressive enough. I tested with a 460 page timetable (sample scan at the bottom). The book has a really solid binding that couldn't be opened flat, but the results are outstanding. It took 20 minutes to scan the pages + 30 minutes for the software to catch up + 40 minutes to correct a small percentage of mis-scans individually. The software is pretty good at managing replacements without losing the page order. 90 mins overall is great. It is reasonably portable but needs a power supply (and one for the laptop).

The basic output is JPG. You can choose to export as PDF, etc but don't have to.

OCR is provided. It is comparable in quality to https://www.onlineocr.net, which is the best online service I know of, but it still struggles: poor quality print + poor quality paper = errors. I doubt whether a PDF parser will help because the underlying originals are not digital documents, so the usual hooks are not available. I'm exploring deep learning for another project so we'll see what can be done with that.

So, step one of re-energising timetableworld.com is done.

Here's the scan at full resolution. You can choose to whiten the image if you wish.

shawmat · 20 May 2020

ccityplanner12 said:
I make several pageloads a day on timetableworld.com, using it as a source of data in writing a (heavily idealistic) timetable for post-war Britain (read "BR had they had sufficient money").
I have next to no problems with the website & can think of very little that could be done to it, & I am a pernickety perfectionist always looking for improvements to be made (one of the reasons I started this project). The only slight problem I have with it is a small number of spelling mistakes, which can cause problems because I use Ctrl+F to find locations, but other than that it's a fine site. It can be very interesting to see how the different regions & operators arranged their timetables, ranging from small two-station branches to the SR's ridiculously big "megatables" as I call them, such as 34 & 41. Another source of interest are the patterns in the numbering: The LNER simply started from 1 & worked their way up, nothing more to it than that other than the triskaidekaphobia that was mutual with the GWR; the LMS & the Western start at 50, lower numbers being reserved for summaries; the most important routes on the Western have a habit of ending in 1. These numbers have embedded themselves into my consciousness so much that I have adopted some of the more distinguished ones into my own parlance & use them to refer to the lines as if they were bus numbers. I call the WCML "the Fifty", the ECML "the One", the Settle to Carlisle "the Two O Nine" & the North Wales Coast "the Ninety-Nine", & so on & so forth. Also the cultural differences: German tables are into symbols & icons while the English express everything through immaculate prose.

Thank you. I've managed to source the complete German timetable for 1944 - it is incomplete on timetableworld.com. Some of it was fantasy, probably prepared in Berlin by people with no idea where the Eastern Front was.

S&CLER · 20 May 2020

ccityplanner12 said:
I make several pageloads a day on timetableworld.com, using it as a source of data in writing a (heavily idealistic) timetable for post-war Britain (read "BR had they had sufficient money").
I have next to no problems with the website & can think of very little that could be done to it, & I am a pernickety perfectionist always looking for improvements to be made (one of the reasons I started this project). The only slight problem I have with it is a small number of spelling mistakes, which can cause problems because I use Ctrl+F to find locations, but other than that it's a fine site. It can be very interesting to see how the different regions & operators arranged their timetables, ranging from small two-station branches to the SR's ridiculously big "megatables" as I call them, such as 34 & 41. Another source of interest are the patterns in the numbering: The LNER simply started from 1 & worked their way up, nothing more to it than that other than the triskaidekaphobia that was mutual with the GWR; the LMS & the Western start at 50, lower numbers being reserved for summaries; the most important routes on the Western have a habit of ending in 1. These numbers have embedded themselves into my consciousness so much that I have adopted some of the more distinguished ones into my own parlance & use them to refer to the lines as if they were bus numbers. I call the WCML "the Fifty", the ECML "the One", the Settle to Carlisle "the Two O Nine" & the North Wales Coast "the Ninety-Nine", & so on & so forth. Also the cultural differences: German tables are into symbols & icons while the English express everything through immaculate prose.

The UIC had rules for the numbering of tables in official timetable books, but BR never adopted them. I found them so useful on a holiday in Germany that I beguiled the time in the departure lounge of Düsseldorf airport working out how these rules would apply to the BR all-line volume. In fact they fit Britain better than they fit Germany because of the radial nature of our system. I eventually worked out a complete renumbering of the whole BR system, based on sets and subsets.

It was, in outline:

1-9 international timetables from London.
10-99 long distance summary tables, especially for cross-country (small c) journeys
100-199 South Eastern (100 was HS1 domestic services)
200-299 South Central (200 the Brighton main line)
300-399 South Western (300 the Bournemouth line)
400-499 Western
500-599 WCML and related lines (500 being the WCML)
600-699 Midland main line and related lines
700-799 ECML and related lines
800-899 East Anglia
900-999 Scotland.
1000 up North Sea, Channel, Irish Sea and Scottish ferries

Within each set, geographically related subsets were numbered so that principal lines had a round 10, rather less important lines a 5, and least significant branches were numbered within the 10s and 5s. The only set that was a tight fit was 500-599, but eventually I squeezed it all in. Unfortunately I no longer have the computer file in which I worked it all out, only a hard copy (34 pages of A4). It'll never happen now, because the idea of an all-line timetable book is dead. That's my own greatest regret of the last 20 years of railway development in this country.
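
The outline above amounts to a range lookup plus a rule on the last digit. A minimal sketch, with the ranges copied from the list and everything else assumed for illustration: the function names, the "principal / secondary / branch" labels, and the reading of "a round 10" and "a 5" as numbers ending in 0 and 5.

```python
# Ranges copied from the outline; numbers of 1000 and up are ferries.
SETS = [
    (1, 9, "International timetables from London"),
    (10, 99, "Long distance summary tables"),
    (100, 199, "South Eastern"),
    (200, 299, "South Central"),
    (300, 399, "South Western"),
    (400, 499, "Western"),
    (500, 599, "WCML and related lines"),
    (600, 699, "Midland main line and related lines"),
    (700, 799, "ECML and related lines"),
    (800, 899, "East Anglia"),
    (900, 999, "Scotland"),
]


def table_set(number):
    """Return the name of the set a table number belongs to."""
    if number >= 1000:
        return "Ferries"
    for low, high, name in SETS:
        if low <= number <= high:
            return name
    raise ValueError(f"no set for table {number}")


def table_rank(number):
    """Assumed reading of the rule: ends in 0 = principal line,
    ends in 5 = less important line, anything else = branch."""
    if number % 10 == 0:
        return "principal"
    if number % 5 == 0:
        return "secondary"
    return "branch"
```

For example, `table_set(200)` gives "South Central" and `table_rank(200)` gives "principal", matching the Brighton main line in the outline.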

I'm also fascinated to see that Shawmat's copy of the Official Guide of the Railroads is as fragile as mine. I have the February 1944 issue, historic as the all-time peak month for passenger traffic on the US railroads. Printed on cheap wartime pulp paper, it has to be kept in a plastic bag to avoid crumbling. My copy of the final Bradshaw is in much better nick and still has the late supplements.

shawmat · 20 May 2020

I have to keep the Official Guides in plastic because they give off something unpleasant - spores I'm guessing. I became a bit wheezy when scanning them. It's not a problem with other books.

I do this so you don't have to...

ColuGav · 3 Jun 2020

Doctor Fegg said:
Yes, it would, and I've given some thought to that. (Plus I have a book scanner on order. A CZUR Shine Ultra, currently under development on Indiegogo, if anyone else is similarly tempted.)

The complex bit is extracting the timetables from the PDF. This is hard but not impossible - there's a lot of research going into it right now. https://nanonets.com/blog/table-extraction-deep-learning/ is one example. But you could go a long way with a PDF-parsing library (there are good ones in Ruby and Python) and a few custom scripts.

The journey planner is easy by comparison. You basically need to munge the data into GTFS format and then load it into OpenTripPlanner. Nothing that hasn't been done a thousand times before.

This is something that I've been having a little play with as a proof of concept to develop my Python programming skills.

Currently as you mention the difficult thing is extracting the timetable data from the scanned images. Not to mention actually interpreting which trains are through services / splits.

I'm also experimenting with animations of "trains" on routes similar to raildar to show where trains are at this time back in the year the timetable was published. Difficulty here is which year to choose!

Doctor Fegg · 3 Jun 2020

ColuGav said:
Currently as you mention the difficult thing is extracting the timetable data from the scanned images. Not to mention actually interpreting which trains are through services / splits.

I've not had the time to put together any actual code, but my gut feeling is that it would be best done as a crowdsourced, semi-automated process.

In other words, OCR each timetable to get a rough version of the timetable data. Then put together a UI to enable users to do the 20% of the remaining work to get this into services with all the details (e.g. splits).
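
The automated half of that split could look something like the sketch below: accept OCR'd cells that parse as a plausible time and queue everything else for a human. The cell format (hour, optional space, two-digit minutes on a 12-hour clock), the ".." blank marker and the function name are all assumptions, not an agreed design.

```python
import re

# Assumed cell format: hour, optional space, two-digit minutes, e.g. "7 34" or "1235".
TIME_CELL = re.compile(r"^(\d{1,2}) ?(\d{2})$")
BLANKS = {"", ".."}  # assumed markers for "no call here"


def triage(cells):
    """Split OCR'd cells into (accepted, needs_review).

    accepted holds (hour, minute) tuples; needs_review holds the raw
    text of any cell that did not parse as a plausible time."""
    accepted, needs_review = [], []
    for cell in cells:
        text = cell.strip()
        if text in BLANKS:
            continue
        match = TIME_CELL.match(text)
        if match and 1 <= int(match.group(1)) <= 12 and int(match.group(2)) <= 59:
            accepted.append((int(match.group(1)), int(match.group(2))))
        else:
            needs_review.append(cell)
    return accepted, needs_review
```

The review queue is what the crowdsourcing UI would present; details such as splits and through services would still need a person.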

ColuGav · 3 Jun 2020

Doctor Fegg said:
I've not had the time to put together any actual code, but my gut feeling is that it would be best done as a crowdsourced, semi-automated process.

In other words, OCR each timetable to get a rough version of the timetable data. Then put together a UI to enable users to do the 20% of the remaining work to get this into services with all the details (e.g. splits).

I agree. I'm a bit off that stage yet anyhow. I just want to get a single timetable saved (I've picked the Colchester to Clacton/Walton branch from the 1947 LNER timetable) to get other things working.

Taunton · 11 Jun 2020

Let me just add that I've used Timetable World over time for all sorts of queries, and find it a very valuable resource. Thank you very much for providing it.

WesternLancer · 11 Jun 2020

An opportunity to thank you for your superb site, which I found from a link on this forum some months ago.

Sadly work and family commitments (plus limited IT skills) prevent me offering to help in practical terms. Were I retired for example I'd like to think I could help. I have a modest collection of UK timetables (and Thomas Cook continentals) from different eras post 1945 and I often enjoy looking at them but your site opens up more opportunities.

From posts on here from time to time I also suspect authors / novelists / film and TV writers etc make use of such information for plot themes / historical plots involving times / travel / location etc (another recent thread about Downton Abbey makes me think the author of that might have usefully used it).

Once again a great many thanks for your efforts.

Ref the site design - I guess keep it simple in broad terms - graphics that convey what it's about (timetable covers should generally do the trick) and clearly designed easy to navigate 'index' that helped you find what you are looking for to download would be all I'd ask for.

Best wishes

davejbur · 11 Jun 2020

An interesting thread which I've just stumbled across!

In my day job I've been working a lot recently with the open data feed for the GB rail timetable ( https://datafeeds.networkrail.co.uk/ ), in both JSON & CIF format. I've also recently been catching up on my (real, paper) reading, including some past issues of The Southern Way.

It struck me that it would be interesting to try to construct an electronic version of an old SR timetable, and then do journey planning on it to be able to compare "then & now". (Think Ryde-Freshwater, or the Atlantic Coast Express from Waterloo to Padstow.) So, as you do, I started breaking the task down in my head into the various steps necessary:
(1) Get old timetable (well that's easy, I've got SR ones from 1949, 1953 & 1960)
(2) Scan old timetable (not so easy)
(3) OCR the scan into, say a CSV or Excel (probably going to be the hardest step)
(4) Write a script to turn that into CIF/JSON
(5) Play with it

Well, that's how my thinking started, then I realised how badly most tables would scan, and that I'd probably end up typing it all in by hand. Well, that would mean really sticking to just one line or table - no way I'd ever have enough time to do more than that, let alone a whole book.

At this point I discovered GTFS - I'd never needed it before, since my day job isn't about journey planning, it's about matching train passes to the timetable after the event. GTFS should in theory be easier to create from a CSV/Excel file. So maybe that's step (4) made a little easier. Still, the notes in text at 90 degrees (e.g. info about restaurant cars, train names, etc) would probably confuse things, and I'd also spotted the problem referred to by ColuGav in https://www.railforums.co.uk/threads/timetable-demo-for-timetableworld-com.204918/ regarding working out whether a time in a column refers to a through train or a connection. So still plenty of pitfalls.

I then tried to get my head around the scanning problem - step (2). On a sudden whim I googled old railway timetables - and ended up here! So, thank you very much shawmat for the time and dedication you've put into this so far - I certainly don't think the site looks dated, nor do I have a problem with the layout or navigation.

It certainly would be interesting to try and see what could be done with, say, the SR timetable for September 1950: https://timetableworld.com/book_viewer.php?id=6 - in theory no more difficult than the transcription already carried out by armies of volunteers for family history research purposes (another interest of mine!)

Anyone else interested in starting a project in this respect? Maybe a good starting point would be everyone's favourite south coast holiday line, table 37: https://timetableworld.com/image_viewer.php?id=6&section_id=1752

(Having said that, if a summer timetable was preferred, I do have the one for May-Sep 1949!)

davejbur · 12 Jun 2020

I've just realised that the zoomable scans on timetableworld are made up of lots of tiled images. OK, I should have thought of that before, it makes for fast & seamless zooming. But it also means that trying to OCR any given table is going to go awry...
However, just to see what would happen, I used Windows 10 Snipping Tool to grab a screenshot, saved it to a png, then pointed tesseract ( https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download ) at it. I got a rather uninspiring result:

Table 37 HAVANT and HAYLING ISLAND
| ts: — -— . = —-
I m ; I Sun_days
3 l Down ' , ,, ,, \_Ye_e_l;_l)n_yjs Cozgncncgcs hm M:,w._‘1951____
l ‘ ,a.m_n.m gm |.zn*u.m‘un lam p.ml3 0*p.m'p.m|y.m S X S 0 8 X! 30 :3 X 80 3.111 n.m {pan p.m v.m‘n.m‘p.m§p.m p.m
__ _

a.v;nc , , _ , _ _. , de 5 7 348 20,9 7\1Dl9 I119 .. 1235 1 S42 ‘4)!3 344 4215 3.16 an 347 ‘(DJ 34 @884 .. 10351135 .. 12351 352 35 3 355 354? 35,7 35. .
1 1 Langston”, Lb 37823810102211 .. 12391372 2313 371445 36 B 37 93737323837 .. 10351138’ .. l23S\3823S338538‘G38‘738 n
n I 2§North Hlyling . .. 7 4] 279 1610261125 .. 12 1412918 41% 495 40627 4]7L'71418278-11 .. 104TIll42 .. ;1!14‘.‘.1 4 ‘£3 :5 426 ..
a ‘J; ‘ yljx_x_g_Is1a _¢p_7 47 inc}; 2-210 1132 .. 124914-.2 33g_4jg 555 as as 4'. I 331 47333 47 .. 1o4s|11_-ss .. 134351 " gm} ’..
I9 I_ _ _ > _ _Sn.c}n-day: only. 7* _sx Safgrdnyn excepted. I-‘ox-J-Vg_tu___|‘|_]ou1{ney._9§9_]ga;g:
.
Q
Table s'l—mmm HAYLING ISLAND and EAVANT n
In
‘a " i 3: Sundays
_ up week .3)’; commences 63!: Mag._1951
if a.myg,mm,m 3,3) mm: a.m: 5 018! S O p.mrp_m,'p.mlp.m 9.m p.m A.mlA.m ‘p.m‘p.m{p.m p.m§n.m‘p.m!v.|n.
__ nnylhu 1sland.. .dep7 73 013 409 45 105231 on . 1 662 53 2 574 16:5 sgq 57:5 52. 529 52 .. 1055:1155 .. 12551 552 55 4 555 as-is 5517 an: I
2 Norm Hayling. .. 1 ms 455 449 49‘ 105411 59; . 1 W2 59 s 1 4 ms 10-: Is 567 5618 66» .. 1oo9;un9 .. 12591 59.2 5914 5355916 59:7 59‘
3 Lnngston ,_ 7163 91S499b4‘111|12 4,‘ . 2 4:3 53 64255116 0,7 19 19 1 1141124: ‘,1 42 43 45 46 4'1 4‘B on
—(3‘ﬂivgn_t:, .1-/_2o§> ;§‘s_5:} 9__|5_e1_

_g;gA 3‘ _,. V'__..._2_g3 10 3 10 4A2_9«5i9:tLlq57 5 S 549%.’: __., }VI_7§s12 L: 11 82 as s_5_A§6#§7 B-S
so Sgﬁufdgyg gm, 5! Saturdays ucepced. —,

I suspect that maybe OCR needs to be trained to read timetables...

shawmat · 13 Jun 2020

Many thanks for all the positive responses to this thread over the last month or so. I've been building a new timetable viewer for timetableworld that should be ready to demo in the next few days. One reader of this forum has been helping me to re-index a timetable for the new viewer, and with 40 more timetables in the pipeline it'd be great to get a few more helpers. Make yourself known via the new email address [email protected]. On the website, you can see where I'd got to a couple of weeks ago, but now there are a lot more navigation aids.

For the IT techies, the new viewer is based on the Leaflet JavaScript library. It's mainly intended for maps but can be used for non-maps too. It uses 256x256 tiles like the existing PanoJS viewer, but they are served from a PostgreSQL database at runtime. Storing every tile as an actual image file is not scalable, whereas creating them on demand from a BLOB store has few limits. It'll be a shame to say goodbye to PanoJS - it was a great summer project for a student in 2007, but has long since been superseded.
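For anyone curious what serving tiles from a database might look like, here is a small sketch under stated assumptions. It is not the site's actual code or schema: the table layout, the function names and the scan dimensions are all made up, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs anywhere. It shows two things: how many 256x256 tiles a scan needs at each zoom level, and a lookup keyed on zoom/column/row, which is the same `{z}/{x}/{y}` pattern that a Leaflet tile layer requests.

```python
import math
import sqlite3

TILE = 256  # tile edge in pixels, as used by the viewer

def grid_size(width, height, zoom, max_zoom):
    """Columns and rows of tiles covering a scan at a given zoom level.

    At max_zoom the scan is shown at full resolution; each level
    below that halves the width and height.
    """
    scale = 2 ** (max_zoom - zoom)
    cols = math.ceil(width / scale / TILE)
    rows = math.ceil(height / scale / TILE)
    return cols, rows

def open_store():
    """Create an in-memory tile store (stand-in for a PostgreSQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tiles ("
        " scan_id INTEGER, z INTEGER, x INTEGER, y INTEGER, png BLOB,"
        " PRIMARY KEY (scan_id, z, x, y))"
    )
    return conn

def put_tile(conn, scan_id, z, x, y, png_bytes):
    """Store one tile image as a BLOB."""
    conn.execute(
        "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?, ?)",
        (scan_id, z, x, y, png_bytes),
    )

def get_tile(conn, scan_id, z, x, y):
    """Return the tile bytes, or None if the tile does not exist."""
    row = conn.execute(
        "SELECT png FROM tiles WHERE scan_id = ? AND z = ? AND x = ? AND y = ?",
        (scan_id, z, x, y),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    # A hypothetical 4000 x 6000 pixel scan with five zoom levels (0 to 4).
    for zoom in range(5):
        print(zoom, grid_size(4000, 6000, zoom, max_zoom=4))
```

A web handler would call something like `get_tile` for each request and return the bytes with an image content type; whether tiles are stored ready-cut or sliced from a larger BLOB on request is a design choice this sketch does not settle.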

A few people have been discussing OCR. Remember: it is technology, not magic. Timetables are often printed on poor quality paper with a lot of blurring, and the scanning process loses some fidelity too. But humans can fill in the gaps. If adopting OCR, expect to do a lot of post-processing to replicate the part that humans do. I'll discuss artificial intelligence (AI) in a moment...

OCR has always been part of the indexing process for Timetable World, though I haven't previously adopted AI. That would be an interesting project to follow the website relaunch!

Here's a real-world example: It's a snippet from The Official Guide to the Railways Oct 1923 (North America), listing 1,000 railroads in a two-column table. No problem reading it.

And here is what OCR manages:

So, it's a good start, pretty accurate rendering of the text, but not the page numbers. Every single digit needs to be checked. This book has 11 pages in the railroad index, 130 pages in the stations index. ocr.space is an online service that I've just started experimenting with, whose advantage is that every word and element can be downloaded in JSON format for post-manipulation.

AI can be used to help OCR with its guesses over characters, which might help improve accuracy. You can add further rules (not really AI) that tell it to expect data in a sequence e.g. alphabetical, or ascending time. To be honest, I think there's a PhD in there for someone in overcoming the complexities of fully digitising historical timetables.
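The "expect data in a sequence" rule can be sketched in a few lines. The example below is only an illustration, with hypothetical function names and invented sample values: it takes the OCR'd tokens from one column of a timetable, read top to bottom, and flags any entry that either cannot be read as a time or is earlier than the last good time above it. It assumes the times in a column do not wrap past midnight or switch between a.m. and p.m. part-way down, which real timetables sometimes do.

```python
import re

# Accept "7 34", "7.34" or "7:34": hours, a separator, two-digit minutes.
TIME_RE = re.compile(r"^(\d{1,2})[ .:](\d{2})$")

def parse_minutes(token):
    """Return minutes past midnight, or None if the token is not a valid time."""
    match = TIME_RE.match(token.strip())
    if not match:
        return None
    hours, minutes = int(match.group(1)), int(match.group(2))
    if hours > 23 or minutes > 59:
        return None
    return hours * 60 + minutes

def flag_suspect_times(column):
    """Return the indices of entries a human should check.

    An entry is suspect if it does not parse as a time, or if it is
    earlier than the most recent time that did pass the check.
    """
    suspects = []
    last_good = None
    for index, token in enumerate(column):
        minutes = parse_minutes(token)
        if minutes is None or (last_good is not None and minutes < last_good):
            suspects.append(index)
            continue
        last_good = minutes
    return suspects

if __name__ == "__main__":
    # Invented OCR output: a letter O in place of a zero, then a misread digit.
    print(flag_suspect_times(["10 35", "10 38", "1O 42", "10 48"]))
    print(flag_suspect_times(["7 34", "7 38", "7 14", "7 47"]))
```

A rule like this only narrows down where to look. A misread that happens to be too late rather than too early (say 7 58 in place of 7 38) passes the check itself and causes the correct entries after it to be flagged instead, so a person still has to compare the flagged region against the scan.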

(Shameless promo coming up...) So, unless you're up for doing a 4 year PhD, why not come and help timetableworld instead?

timetableworld.com
