In this work, a novel binary version of the grey wolf optimization (GWO) is proposed and used to select optimal feature subset for classification purposes. Grey wolf optimizer (GWO) is one of the latest bio-inspired optimization techniques, which simulate the hunting process of grey wolves in nature. The binary version introduced here is performed using two different approaches. In the first approach, individual steps toward the first three best solutions are binarized and then stochastic crossover is performed among the three basic moves to find the updated binary grey wolf position. In the second approach, sigmoidal function is used to squash the continuous updated position, then stochastically threshold these values to find the updated binary grey wolf position. The two approach for binary grey wolf optimization (bGWO) are hired in the feature selection domain for finding feature subset maximizing the classification accuracy while minimizing the number of selected features. The proposed binary versions were compared to two of the common optimizers used in this domain namely particle swarm optimizer and genetic algorithms. A set of assessment indicators are used to evaluate and compared the different methods over 18 different datasets from the UCI repository. Results prove the capability of the proposed binary version of grey wolf optimization (bGWO) to search the feature space for optimal feature combinations regardless of the initialization and the used stochastic operators.