Datasets

Dataset

Here is a brief description of the bankruptcy data that I gathered during my PhD research. If you would like to use it, you can
find the relevant information in the 'Usage' section.

Database description:
The database contains 240 cases (112 from failed companies and 128 from non-failed ones). The data cover a period of 2
consecutive years, so there are observations for 120 companies. The observations come from 2 up to 5 years before bankruptcy
took place.

Research background:
I used this data in experiments during my PhD research. The aim was to construct a dynamical (in terms of
systems theory) classifier able to forecast bankruptcy. The constructed model was to be used as a part of an
expert system. All data were collected from financial statements published by Polish companies. Analysis showed that
the data are noisy, so a noise-aware approach was necessary to calculate the model's parameters. The Unscented Kalman Filter
(a non-linear version of the well-known Kalman filter) was chosen as the estimator. Results of experiments can be found
here. The dynamical classifier was more effective than methods previously used in the field of financial distress prediction
(Multilayer Perceptron, Discriminant Analysis, Nearest Neighbour and Nearest Mean).

Technical details:
There are 33 attributes for each case:

Company ID - numerical symbol used to identify companies
Year - year from which the data come
Status - company status: failed "-1" or non-failed "1"
X1 - cash/current liabilities
X2 - cash/total assets
X3 - current assets/current liabilities
X4 - current assets/total assets
X5 - working capital/total assets
X6 - working capital/sales
X7 - sales/inventory
X8 - sales/receivables
X9 - net profit/total assets
X10 - net profit/current assets
X11 - net profit/sales
X12 - gross profit/sales
X13 - net profit/liabilities
X14 - net profit/equity
X15 - net profit/(equity + long term liabilities)
X16 - sales/receivables
X17 - sales/total assets
X18 - sales/current assets
X19 - (365*receivables)/sales
X20 - sales/total assets
X21 - liabilities/total income
X22 - current liabilities/total income
X23 - receivables/liabilities
X24 - net profit/sales
X25 - liabilities/total assets
X26 - liabilities/equity
X27 - long term liabilities/equity
X28 - current liabilities/equity
X29 - EBIT (Earnings Before Interest and Taxes)/total assets
X30 - current assets/sales
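
The attribute list above can be turned into a small loading routine. The sketch below is only an illustration: the column names (`CompanyID`, `Year`, `Status`, `X1` to `X30`), the comma separator, and the absence of a header row are all assumptions, because the actual file layout is not described here. The two sample rows are made up to show the shape of the data and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed column order: three identifying fields, then the ratios X1..X30.
# These names are hypothetical; check them against the real file.
COLUMNS = ["CompanyID", "Year", "Status"] + [f"X{i}" for i in range(1, 31)]


def load_cases(text):
    """Parse header-less CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    cases = []
    for row in reader:
        case = {
            "CompanyID": int(row["CompanyID"]),
            "Year": int(row["Year"]),
            "Status": int(row["Status"]),  # -1 = failed, 1 = non-failed
        }
        for name in COLUMNS[3:]:
            case[name] = float(row[name])
        cases.append(case)
    return cases


def class_counts(cases):
    """Count cases per status value (-1 failed, 1 non-failed)."""
    return Counter(case["Status"] for case in cases)


# Two made-up rows, only to demonstrate the expected shape.
sample = "\n".join([
    ",".join(["1", "2001", "-1"] + ["0.5"] * 30),
    ",".join(["2", "2001", "1"] + ["1.5"] * 30),
])
cases = load_cases(sample)
counts = class_counts(cases)
```

Run on the full file, `class_counts` should report 112 failed and 128 non-failed cases if the file matches the description above.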

Usage:
If you wish to gain access to this database, please send me an email telling me about yourself and the details of the planned usage.
You will then receive instructions on how to access the dataset. The data are prepared in various formats.
They can be accessed in XML form (for any database system capable of processing XML files), as an ARFF file (for Weka
and other data mining software) and as a CSV file (the plain version that can be converted to the other formats).
Questions and comments are welcome. If you have any cooperation proposals, simply mail me.
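
As an illustration of converting the plain version to another format, here is a minimal sketch that renders CSV rows as ARFF text. It is not the procedure used to prepare the distributed files. The attribute names, the relation name `bankruptcy`, and the choice to declare `Status` as a nominal attribute are assumptions, and the sample row is made up.

```python
import csv
import io

# Hypothetical attribute names, following the attribute list in this page.
NAMES = ["CompanyID", "Year", "Status"] + [f"X{i}" for i in range(1, 31)]


def to_arff(rows, relation="bankruptcy"):
    """Render rows (lists of strings, one per case) as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name in NAMES:
        if name == "Status":
            # Class label as a nominal attribute: -1 failed, 1 non-failed.
            lines.append("@attribute Status {-1,1}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(NAMES):
            raise ValueError(f"expected {len(NAMES)} values, got {len(row)}")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# One made-up, header-less CSV row to show the conversion.
sample_csv = ",".join(["1", "2001", "-1"] + ["0.5"] * 30)
rows = list(csv.reader(io.StringIO(sample_csv)))
arff_text = to_arff(rows)
```

Declaring `Status` as nominal lets tools such as Weka treat it as the class attribute. If the real file has a header row, it must be skipped before calling `to_arff`.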

If you use this dataset in a book or article, or publish it in any form, do not forget to cite the source
(in any suitable form) according to the citation below:

Wieslaw Pietruszkiewicz "Application of Discrete Predicting Structures in an Early Warning Expert
System for Financial Distress",
PhD Thesis, Szczecin Technical University, Szczecin 2004

or in the form of a BibTeX entry:

@PhdThesis{Pietruszkiewicz2004:PhD,
author = "Wieslaw Pietruszkiewicz",
title = "Application of Discrete Predicting Structures in an Early Warning Expert System for Financial Distress",
school = "Faculty of Computer Science and Information Technology, Szczecin University of Technology",
type = "Ph.D. Thesis",
year = "2004",
month = "December",
url = "http://www.pietruszkiewicz.com"
}