How can a large dataset be made amenable to distributed data processing in a Big Data solution environment?
How can large datasets be accessed in a way that lends itself to efficient batch-mode processing?