Imagine that you want to analyze one terabyte (1 TB) of data stored on a single machine with eight input/output channels, where each channel has a read speed of 150 megabytes per second (150 MB/s).
- Calculate how long it takes to read the entire file.
- To speed up the read operation, consider adding more machines and creating a distributed cluster. What is the minimum number of machines you should install in the cluster so that the total read time is less than 10 seconds?
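
One way to check the arithmetic for both parts is the short sketch below. It rests on assumptions the exercise does not state: decimal units (1 TB = 1,000,000 MB), all eight channels reading in parallel, data split evenly across identical machines, and no network or coordination overhead. The function names `read_time_seconds` and `min_machines` are illustrative only.

```python
# Assumptions (not given in the exercise): decimal units, parallel channels,
# evenly partitioned data, identical machines, zero coordination overhead.
TOTAL_MB = 1_000_000            # 1 TB expressed in MB (decimal units)
CHANNELS_PER_MACHINE = 8
MB_PER_SEC_PER_CHANNEL = 150
TIME_LIMIT_SEC = 10


def read_time_seconds(machines: int = 1) -> float:
    """Time to read all the data when every channel on every machine reads in parallel."""
    throughput = machines * CHANNELS_PER_MACHINE * MB_PER_SEC_PER_CHANNEL  # MB/s
    return TOTAL_MB / throughput


def min_machines(limit_sec: int = TIME_LIMIT_SEC) -> int:
    """Smallest machine count whose read time is strictly below limit_sec."""
    per_machine = CHANNELS_PER_MACHINE * MB_PER_SEC_PER_CHANNEL  # MB/s per machine
    # We need machines * per_machine * limit_sec > TOTAL_MB (strict inequality),
    # so take the integer quotient and add one.
    return TOTAL_MB // (per_machine * limit_sec) + 1


if __name__ == "__main__":
    print(f"Single machine: {read_time_seconds(1):.1f} s")
    print(f"Machines needed for < {TIME_LIMIT_SEC} s: {min_machines()}")
```

Under these assumptions a single machine reads at 8 × 150 = 1,200 MB/s, so the full read takes about 833 seconds (just under 14 minutes), and 84 machines are needed to get below 10 seconds. With binary units (1 TB = 1,048,576 MB) the numbers shift slightly upward, so state the convention you use.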