Part of Amazon's problem is that you could only use their stores if you had their app.
I don't think that was really the problem. The problem is that analysing images in real time, to the level of detail required to accurately determine what a customer is purchasing, is computationally expensive and, with current technology, impossible to do reliably.
This is a problem already solved by barcodes, which are standardised identifiers intended to make it easy for machines to determine what an item is by "looking" at it...
The Tesco Express trial I mentioned above has overcome that.
In doing so, though, they've removed an enormous advantage of Scan and Go, which is that the handsets or apps keep track of what you've bought as you go.
They're also an excellent advertising opportunity for the supermarkets. Customers now carry around a device that the supermarkets could, in theory, use to target them with adverts based on their purchase history, the items they've already picked up on this trip, which section of the store they're in, the time of day, stock levels and probably a million other things I haven't thought of.
Tesco is already doing a bit of this sort of thing and no doubt we'll see more of it - since, used to its full potential, it brings many of the marketing advantages of e-commerce into the physical store environment as well. I can't imagine the supermarkets wanting to give this up easily.
I imagine the supermarkets will gradually try to push people away from handsets towards apps, so that they need fewer handsets and save money that way. Perhaps they'll also start allowing payment through their apps. But I think the camera technology is a very long way off, if it happens at all.