# Research

PhD Results

The theme of my PhD dissertation was "Application of Discrete Predicting Structures in an Early Warning Expert System for Financial Distress".

The PhD defence took place in December 2004.

Brief description:

The main idea of the dissertation was to use Kalman filters to estimate the parameters of dynamical classifiers. The problem of financial distress forecasting was used to test the developed classifier. Previous research in this field was limited to ML (Machine Learning) methods that assume static behaviour of the crisis process. Efficiency tests were performed to compare this approach with existing ones. Analysis of the data pointed to strong noise, so a noise-aware method, the UKF (Unscented Kalman Filter), together with discrete dynamic systems, was chosen as the machine learning method. The classification accuracy achieved with this new approach was compared with the results of other methods well known in this domain, i.e. Neural Networks, Discriminant Analysis, Nearest Neighbour and Nearest Mean. The results of various experiments, conducted in different space dimensions (i.e. with different numbers of attributes), showed that taking dynamics into account makes it possible to build more accurate classifiers.

An ancillary problem addressed in the PhD research was the reduction of the attribute space. A feature selection algorithm was proposed in which selection is based not only on the significance of the features but also on their dependencies.

This feature selection algorithm is available in the form of MATLAB code.

Introduction to the proposed method:

Discrete dynamical systems are widely known and used in technical problems; however, they are rarely applied to financial problems. Using a state-space model, we can model a process as:

Where u_k are the inputs, x_k the state variables, y_k the output, and v_k and n_k the process and observation noise, respectively. This is a generic representation of dynamic systems. The first equation is known as the process equation, the second one as the observation equation.
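
For reference, a generic discrete-time state-space model with the symbols listed above is usually written as below. This is the standard textbook form and is assumed here; f and h are placeholder names for the (unknown) process and observation functions.

```latex
\begin{aligned}
x_{k+1} &= f(x_k, u_k, v_k) \\
y_k     &= h(x_k, n_k)
\end{aligned}
```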

The fundamental problem in applying the above models is expressing both equations in a precise functional form. In the problem chosen as the testing environment, these functions cannot be derived from any other dependencies, so a system identification procedure was required.

Identification (parameter estimation) was performed using a UKF (Unscented Kalman Filter). This is a nonlinear version of the Kalman filter, based on the assumption that propagating a set of chosen points through a nonlinear system allows the statistical moments of a random variable to be calculated more precisely than, e.g., the linearization used in the EKF (Extended Kalman Filter). This transformation of the random variable x is known as the Unscented Transformation (UT). More information on the UKF and the UT can be found here.
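
The idea of propagating chosen points through a nonlinear function can be illustrated with a minimal sketch of the Unscented Transformation. This is not the code used in the dissertation; the function name `unscented_transform` and the default scaling parameters `alpha`, `beta` and `kappa` are assumptions made for this example.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, beta=0.0, kappa=1.0):
    """Estimate mean and covariance of f(x) for x ~ (mean, cov).

    Illustrative sketch; the scaling defaults are assumed, not taken
    from the dissertation.
    """
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # Columns of the scaled matrix square root are the sigma-point offsets.
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + sqrt_cov.T, mean - sqrt_cov.T])  # (2n+1, n)

    # Weights for the mean (wm) and the covariance (wc).
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)

    # Push every sigma point through the nonlinear function.
    ys = np.array([f(s) for s in sigma])
    y_mean = wm @ ys
    diff = ys - y_mean
    y_cov = (wc[:, None] * diff).T @ diff
    return y_mean, y_cov

# Example: y = x^2 with x ~ N(1, 0.5^2). The exact mean is 1 + 0.25 = 1.25,
# while a first-order linearization around the mean would give 1.0.
ut_mean, ut_cov = unscented_transform(
    np.array([1.0]), np.array([[0.25]]), lambda x: x**2
)
print(ut_mean)
```

In this one-dimensional example the transformation recovers the exact mean of the squared variable, which a linearization around the mean cannot do.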

A generic form of the proposed dynamic classifier is shown in this diagram:

The developed method was called DDS (Discrete Dynamical System), while the classifier with an open feedback loop was denoted DSS (Discrete Static System). During the learning phase the feedback loop should be opened (feedback ratio equal to 0) and the parameters calculated to minimize the mean square error (MSE classifier). The basic idea of MSE classification is shown in the picture below (the solid line is the classification function; the constant B was set to 1 in this case):

After this first phase of learning, the feedback gain should be calculated to maximize the accuracy on the training set, as shown in the chart below (the charts come from the experiments described later on this page).
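
The two learning phases described above can be sketched roughly as follows. The exact structure of the classifier is given only in the diagram, so the closed-loop form `y_k = w.[u_k, 1] + gain * y_(k-1)`, the least-squares fit standing in for the UKF estimation, the grid search over gains, and all function names are assumptions made for illustration. The data are synthetic.

```python
import numpy as np

def fit_mse_weights(U, labels, B=1.0):
    """Phase 1 (open loop, gain 0): least-squares fit to targets +B / -B."""
    X = np.hstack([U, np.ones((len(U), 1))])      # append the free parameter
    targets = np.where(labels == 1, B, -B)
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

def run_classifier(U, w, gain):
    """Assumed closed-loop form: y_k = w.[u_k, 1] + gain * y_(k-1)."""
    X = np.hstack([U, np.ones((len(U), 1))])
    y = np.zeros(len(U))
    prev = 0.0
    for k, static_out in enumerate(X @ w):
        prev = static_out + gain * prev
        y[k] = prev
    return y

def accuracy(U, labels, w, gain):
    """Fraction of cases whose output sign matches the label."""
    pred = (run_classifier(U, w, gain) > 0).astype(int)
    return float(np.mean(pred == labels))

def fit_gain(U, labels, w, gains=None):
    """Phase 2: keep w fixed, pick the gain with the best training accuracy."""
    if gains is None:
        gains = np.linspace(0.0, 0.9, 10)         # hypothetical search grid
    scores = [accuracy(U, labels, w, g) for g in gains]
    return float(gains[int(np.argmax(scores))]), max(scores)

# Synthetic sequence in which the class persists over time.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 0, 1], 40)
U = (2 * labels - 1)[:, None] * 0.5 + rng.normal(size=(160, 2))

w = fit_mse_weights(U, labels)
open_loop_acc = accuracy(U, labels, w, gain=0.0)
best_gain, best_acc = fit_gain(U, labels, w)
print(open_loop_acc, best_gain, best_acc)
```

Because the search grid includes a gain of 0, the closed-loop training accuracy found in the second phase can never be lower than the open-loop one.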

Experiments:

The problem of financial crisis forecasting was chosen to test the developed dynamical classifier. Many classification methods have been tested on this problem in previous research; however, all of them assumed a static character of the crisis process.

The dataset used in the experiments contained 240 financial statements (112 from bankrupt firms and 128 from existing ones). A set of 30 financial ratios was built as possible crisis indicators. The classification task was to decide to which group a firm should belong: bankrupt or existing. More information about this dataset can be found here.

An ancillary problem addressed in the PhD research was the selection of the attribute space. This selection should take into account not only the significance of the attributes but also their inter-dependencies (which is desirable for most problems). This requirement was especially important for the financial ratios dataset presented here, where many ratios are very similar to one another because of the way they are constructed.

A feature selection algorithm was proposed in which selection is based not only on the significance of the features but also on their dependencies. Experiments were performed to check whether reducing the dimensionality of the space leads to an increase in classification accuracy. The results are shown in the chart below, where 'Significant' means features selected only for their significance, while 'With reduced dependencies' denotes the set of significant variables with reduced dependencies. An increase in accuracy can be seen in both cases, which leads to the conclusion that the proposed algorithm was effective.
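
The general idea of selecting on both significance and dependencies can be illustrated with the sketch below. It is not the MATLAB algorithm from the dissertation: the significance measure (absolute correlation with the class label), the dependency measure (absolute pairwise correlation), the threshold `max_dependency` and the function name are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def select_features(X, y, n_keep, max_dependency=0.8):
    """Greedy selection: take the most significant features first and skip
    any feature that is too strongly correlated with one already chosen."""
    n_features = X.shape[1]
    significance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    dependency = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in np.argsort(-significance):           # most significant first
        if all(dependency[j, s] <= max_dependency for s in selected):
            selected.append(int(j))
        if len(selected) == n_keep:
            break
    return selected

# Synthetic example: feature 1 is a near copy of feature 0, feature 2 is
# weaker but independent, feature 3 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500).astype(float)
x0 = 2.0 * y + rng.normal(size=500)
x1 = x0 + 0.01 * rng.normal(size=500)
x2 = 1.0 * y + rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x0, x1, x2, x3])

chosen = select_features(X, y, n_keep=2)
print(chosen)
```

Selecting on significance alone would keep both near-identical features 0 and 1; the dependency check keeps only one of them and adds the independent feature 2 instead.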

The classification accuracy of the proposed DDS method was compared with the results achieved using methods previously applied to this problem: Nearest Neighbour, Nearest Mean, Artificial Neural Network and Discriminant Analysis.

Parameters were estimated with the UKF over 200 epochs (160 cases in each epoch), while simulated annealing of the noise covariance matrix was used to ensure that the filter converges quickly at the beginning and that the parameters do not change rapidly later on. The chart below shows an example of parameter estimation for one of the experiments (3 inputs + 1 free parameter).
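
The effect of annealing the noise covariance can be sketched with a simplified example. To keep it short, a linear Kalman filter on a model that is linear in its parameters is used here instead of the UKF, and the exponential decay schedule with the values `q0`, `decay` and `q_min` is an assumption; the actual schedule used in the experiments is not given on this page. The data are synthetic, sized to match the 160 cases, 200 epochs and 3 inputs + 1 free parameter mentioned above.

```python
import numpy as np

def estimate_parameters(X, d, epochs=200, q0=1e-2, decay=0.95,
                        q_min=1e-8, r=1.0):
    """Kalman-filter parameter estimation with annealed process noise.

    Parameters follow a random walk w_k = w_(k-1) + noise with covariance
    q * I; q is shrunk after every epoch (assumed exponential schedule).
    """
    n = X.shape[1]
    w = np.zeros(n)
    P = np.eye(n)
    q = q0
    history = []
    for _ in range(epochs):
        for x, target in zip(X, d):
            P = P + q * np.eye(n)                 # time update
            s = x @ P @ x + r                     # innovation variance
            K = P @ x / s                         # Kalman gain
            w = w + K * (target - x @ w)          # measurement update
            P = P - np.outer(K, x @ P)
        q = max(q * decay, q_min)                 # anneal the process noise
        history.append(w.copy())
    return w, np.array(history)

# Synthetic data: 160 cases, 3 inputs plus a constant column for the
# free parameter.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(160, 3)), np.ones((160, 1))])
w_true = np.array([0.5, -1.0, 2.0, 0.3])
d = X @ w_true + 0.1 * rng.normal(size=160)

w_hat, history = estimate_parameters(X, d)
print(w_hat)
```

A large initial process noise lets the estimates move quickly towards a good region, while the shrinking noise later reduces the filter gain so that the parameters settle.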

Classification accuracy was checked on a test set using 3-fold cross-validation. The brief results show that DDS was the most accurate method in all cases.
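
A 3-fold cross-validation loop of this kind can be sketched as follows, using Nearest Mean as a simple stand-in classifier. The data are synthetic (only the class sizes 112 and 128 mirror the dataset described above), and the function names and the random split are assumptions made for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def nearest_mean_predict(X_train, y_train, X_test):
    """Assign each test case to the class with the closest mean vector."""
    means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Synthetic stand-in data: 112 cases of class 1 and 128 of class 0.
rng = np.random.default_rng(1)
y = np.concatenate([np.ones(112, dtype=int), np.zeros(128, dtype=int)])
X = rng.normal(size=(240, 3)) + y[:, None]

sizes, accuracies = [], []
for train, test in k_fold_indices(len(y), k=3):
    pred = nearest_mean_predict(X[train], y[train], X[test])
    sizes.append((len(train), len(test)))
    accuracies.append(float(np.mean(pred == y[test])))

print(sizes)                 # [(160, 80), (160, 80), (160, 80)]
print(np.mean(accuracies))
```

With 240 cases, each fold trains on 160 cases and tests on the remaining 80, which would match the 160 cases per epoch mentioned above if the same split was used for training.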

Overall, the results indicate that dynamic classification may increase accuracy. The chart below compares the average classification accuracy for all methods.

If you have any questions or comments, send me an email.

The dataset used in the experiments is available.