Generic pipeline framework for composing and running sequential data-processing steps, with checkpointing.
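
To make the description concrete, here is a minimal sketch of what a sequential pipeline with checkpointing might look like. It is not the framework's actual API: the `Pipeline` class, its `run` method, the `Step` alias, and the JSON checkpoint layout are all hypothetical names chosen for illustration. It also assumes that the intermediate data is JSON-serializable and that a checkpoint is written after every step.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List, Tuple

# Hypothetical type: a step is a (name, function) pair.
Step = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Illustrative sketch: runs steps in order and checkpoints after each one."""

    def __init__(self, steps: List[Step], checkpoint_path: Path) -> None:
        self.steps = steps
        self.checkpoint_path = checkpoint_path

    def _load(self) -> dict:
        # Resume from the saved state if a checkpoint file exists.
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())
        return {"completed": 0, "data": None}

    def _save(self, completed: int, data: Any) -> None:
        # Write to a temporary file, then rename it over the checkpoint,
        # so an interrupted write does not leave a truncated checkpoint.
        tmp = self.checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"completed": completed, "data": data}))
        tmp.replace(self.checkpoint_path)

    def run(self, data: Any) -> Any:
        state = self._load()
        start = state["completed"]
        if start > 0:
            # Steps before `start` already ran; use their saved output
            # instead of the caller's input.
            data = state["data"]
        for index in range(start, len(self.steps)):
            _name, func = self.steps[index]
            data = func(data)
            self._save(index + 1, data)
        return data
```

Under these assumptions, if a step raises an exception, the checkpoint still records the output of the last successful step, and a later `run` call with the same checkpoint path resumes from the failed step rather than from the beginning. A real framework would likely also need to handle non-JSON data, step identity checks, and checkpoint invalidation, which this sketch leaves out.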