How can a large dataset be made amenable to distributed data processing in a Big Data solution environment?