		Hi all,
 I'm working on a small project for university and I have some questions
 about how to implement it. Maybe you could give me some hints.
 
 I have a directory that contains around 1 million HTML files. Basically,
 I just want to read each file entirely into a String and parse it with
 JSoup in a Mapper. Is there an InputFormat that can be used for this
 use case, or do I have to implement my own FileInputFormat for that? :/
 In general: do you think creating InputSplits for the directory will work
 properly with 1 million FileStatus objects?
 
 
 Regards,
 Timo
 