How Different is Different? The Traps and Pitfalls of Applying Statistics on System Performance Evaluation