Pathology detection mechanisms through continuous acquisition of biological signals

Author: Jorge Sánchez Casanova
Thesis supervisors: Raúl Sánchez Reillo (supervisor), Judith Liu Jiménez (co-supervisor)
Defended: at the Universidad Carlos III de Madrid (Spain) in 2022
Language: English
Thesis examination committee: Carmen Sánchez Ávila (chair), Mariano López García (secretary), Richard Matthew Guest (member)
Doctoral programme: Doctoral Programme in Electrical, Electronic and Automation Engineering at the Universidad Carlos III de Madrid
Subjects:
- Mathematics
  - Computer science
    - Artificial intelligence
- Medical sciences
  - Clinical sciences
    - Clinical pathology
Links
- Open-access thesis at: e-Archivo
Abstract
- 1. Introduction Today, artificial intelligence is integrated into most of the systems we use. Smartphones, cars, and video or audio players are some examples of how we use these technologies on a daily basis. Nevertheless, in the world of medicine we find practically no self-diagnostic tools. One of the reasons is the false sense of security (in case of a false negative) or alarm (in case of a false positive) that they can generate. Certainly, these algorithms can be configured to, for example, accept a higher false positive rate in order to avoid false negatives. However, used under the control of a specialist or as a pre-diagnostic, these tools can be useful, especially if they allow the patient to make this first diagnosis without restrictions in terms of environment or availability.
  
  Throughout this thesis, different algorithms based on artificial intelligence have been developed with the aim of recognising lower body pathologies. In addition, two systems for acquiring the databases, smartphones and a professional system, have been evaluated. An overview of the contributions made in the thesis is presented below: first the algorithms using different smartphone configurations, then the algorithms using the professional acquisition tool.
  
  2. Pathology detection algorithms using smartphones This experiment aims to create a pathology detection algorithm that uses smartphones to acquire the data. These experiments arise from the possibility of identifying people using a smartphone. To evaluate whether a pathology identification algorithm can be built with smartphones, different configurations are used: 1, 2 and 4 smartphones. Using smartphones to acquire the data offers low cost and high portability, as nearly everybody owns one.
  
  2.1. Pathology Detection Using a Single Smartphone This configuration is the easiest and offers the most realistic scenario, which is why we start with it. On the other hand, it is the configuration that offers the least information about the stride. The first algorithm can be divided into 4 parts: pre-processing, in which the signals are prepared; cycle extraction, in which the gait cycles are extracted; feature extraction, in which the gait cycles are parametrised; and finally classification, in which the features are used to classify the cycles.
  
  2.1.1. Pre-processing When the database is acquired, the smartphone can be placed in a different orientation each time it is used. The sensor axes therefore differ between placements, making it impossible to compare different acquired walks. To solve this problem, the magnitude (module) of the three axes of each sensor is computed. The next step is to ensure that the sampling frequency is constant across all samples and, if not, to correct it. This is needed because the smartphone runs a regular operating system, which cannot guarantee a constant sampling frequency.
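
  The two pre-processing operations can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names `magnitude` and `resample_uniform` are hypothetical, and linear interpolation is an assumed way of correcting a non-constant sampling frequency, since the abstract does not state which method is used.

```python
import math


def magnitude(x, y, z):
    """Orientation-independent magnitude of a three-axis signal."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def resample_uniform(t, v, rate_hz):
    """Linearly interpolate samples (t, v) onto a uniform time grid.

    t holds strictly increasing timestamps in seconds (at least two).
    Linear interpolation is an assumption made for this sketch.
    """
    step = 1.0 / rate_hz
    n = int((t[-1] - t[0]) / step + 1e-9) + 1
    out_t, out_v = [], []
    j = 0
    for i in range(n):
        ti = t[0] + i * step
        # Advance to the segment [t[j], t[j + 1]] that contains ti.
        while j < len(t) - 2 and t[j + 1] < ti:
            j += 1
        w = (ti - t[j]) / (t[j + 1] - t[j])
        out_t.append(ti)
        out_v.append(v[j] + w * (v[j + 1] - v[j]))
    return out_t, out_v
```

  Taking the magnitude first and resampling afterwards means only one signal per sensor has to be interpolated rather than three.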
  
  2.1.2. Cycle Extraction Once the signals have been pre-processed, it is time to extract the gait cycles. Extracting the cycles increases the amount of data, as each walk is composed of about 25-30 cycles. To find the starting points of the cycles, we follow the steps below.
  
  1. First of all, the algorithm looks for all the maximum peaks without any restriction.
  
  2. Then the average of these peaks is obtained, and the peaks below it are discarded.
  
  3. Finally, peaks that lie less than 0.9 seconds apart are eliminated.
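
  The three steps above can be sketched as a short function. The name `cycle_starts` is hypothetical, and the third step is interpreted here as keeping a peak only if it lies at least 0.9 seconds after the previously kept peak, which is an assumption about how the rule is applied.

```python
def cycle_starts(signal, rate_hz, min_gap_s=0.9):
    """Return the sample indices taken as gait-cycle starting points."""
    # Step 1: every local maximum, with no restriction.
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i - 1] < signal[i] >= signal[i + 1]]
    if not peaks:
        return []
    # Step 2: discard peaks below the average peak height.
    mean_height = sum(signal[i] for i in peaks) / len(peaks)
    strong = [i for i in peaks if signal[i] >= mean_height]
    # Step 3 (assumed interpretation): drop any peak that is closer
    # than min_gap_s to the last peak that was kept.
    min_gap = min_gap_s * rate_hz
    starts = []
    for i in strong:
        if not starts or i - starts[-1] >= min_gap:
            starts.append(i)
    return starts
```

  Consecutive indices returned by the function delimit one gait cycle each, so a walk with N detected starts yields N - 1 complete cycles.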
  
  2.1.3. Feature Extraction and Data Preparation The features used can be divided into time and frequency features. The frequency features are the position and value of the first five harmonics. The time features are the average, maximum and minimum values, as well as the rise time, the height-to-width ratio, the time difference between peaks and the zero-crossing range. In the parametrisation process fourteen parameters have been used, yielding 47 different features for each cycle. These features parametrise the gait cycles with the intention that no information is lost.
  
  Once the features have been obtained, they need to be prepared: outliers are removed and the data is arranged in a shape suitable for the classification algorithms. To cleanse the data, we follow these steps:
  
  1. First, the missing and zero values are discarded.
  
  2. After that, incoherent data is removed, such as negative times or ranges, or unusually large amplitudes.
  
  3. The last step is to analyse the data to determine whether data contains outliers and, if so, eliminate them.
  
  The first two steps can be applied directly. To determine whether the dataset contains outliers, however, we use the z-score. After cleansing the dataset, the features are arranged in a matrix.
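
  A z-score filter for the third cleansing step might look like the sketch below. The function name `remove_outliers` is hypothetical, and the threshold of 3 standard deviations is a common convention assumed here; the abstract does not state the threshold actually used.

```python
import statistics


def remove_outliers(rows, threshold=3.0):
    """Drop feature rows with any column whose |z-score| exceeds threshold.

    rows is a list of equal-length feature vectors (one per gait cycle).
    The default threshold of 3.0 is an assumption, not a thesis value.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    kept = []
    for row in rows:
        # A constant column has zero deviation and cannot flag outliers.
        z = [abs(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stdevs)]
        if max(z) <= threshold:
            kept.append(row)
    return kept
```

  The rows that survive can be stacked directly into the feature matrix passed to the classifiers.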
  
  Once all the features have been obtained, the next step is to create a model to classify the pathologies. For this purpose, machine learning algorithms are used, which are able to detect patterns after prior training. The type of machine learning algorithm that works best depends on the dataset and the problem. For this reason, 5 different algorithms (CART, kNN, Logistic Regression, Naïve Bayes, SVM) are first used and compared with each other. For the initial configuration of the algorithms, configurations previously tested in the literature are used.
  
  2.1.4. Results To evaluate the performance of the algorithms, a database is used that consists of 9 users who perform 12 walks each (4 healthy, 4 with a left pathology and 4 with a right pathology).
  
  In order to assess the correct functioning of the algorithms, confusion matrices are used, which provide details of how the algorithm is classifying the walks. To make the study of these easier, 3 metrics are extracted from the confusion matrices, which are detailed below:
  
  • False Positive Rate (FPR): the proportion of healthy walks wrongly identified as pathological. Although a pathology is diagnosed in a healthy patient, this has no serious effect beyond the patient receiving treatment that is not needed. We can therefore consider this metric a low-risk error, since this algorithm is intended to aid in diagnosing gait problems. However, where treatment is detrimental to the patient (e.g., cancer screening), this metric should be considered of vital importance.
  
  • False Negative Rate (FNR): the proportion of pathological walks (both left and right) wrongly classified as healthy. As the algorithm presented here is intended to aid in diagnosing pathologies, this metric must be as low as possible; otherwise, a person with a pathology might not be diagnosed and would therefore receive no treatment for it.
  
  • False Limp Rate (FLR): the proportion of pathological walks that are misclassified as another pathology (e.g., a left limp classified as right and vice versa). This metric combines the worst of both FPR and FNR, as it can result in a patient receiving treatment that is at best harmless. The actual pathology may also be left untreated, for example, when a knee pathology is diagnosed but the problem is actually in the ankle.
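
  The three metrics can be read off a 3x3 confusion matrix as sketched below. The function name `gait_error_rates` and the row/column order are hypothetical, and the denominators (all healthy walks for FPR, all pathological walks for FNR and FLR) are an assumption based on the wording of the definitions above.

```python
LABELS = ("healthy", "left", "right")


def gait_error_rates(cm):
    """Compute FPR, FNR, FLR and accuracy from a confusion matrix.

    cm[true][predicted] holds walk counts, with rows and columns
    ordered as in LABELS (an ordering assumed for this sketch).
    """
    healthy, left, right = cm
    n_healthy = sum(healthy)
    n_patho = sum(left) + sum(right)
    # Healthy walks predicted as either pathology.
    fpr = (healthy[1] + healthy[2]) / n_healthy
    # Pathological walks predicted as healthy.
    fnr = (left[0] + right[0]) / n_patho
    # Pathological walks assigned to the wrong side.
    flr = (left[2] + right[1]) / n_patho
    accuracy = (healthy[0] + left[1] + right[2]) / (n_healthy + n_patho)
    return {"FPR": fpr, "FNR": fnr, "FLR": flr, "accuracy": accuracy}
```

  For instance, with purely invented counts `[[30, 3, 3], [4, 28, 4], [2, 6, 28]]`, 6 of the 36 healthy walks are flagged as pathological, 6 of the 72 pathological walks are missed, and 10 of the 72 are assigned to the wrong side.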
  
  Using these metrics along with accuracy, different algorithms can be easily compared. Analysing the results, we can see that all the algorithms perform poorly. kNN is the best one, with an accuracy of 59 %, but it has an FNR of almost 20 %. In light of the results, we conclude that it is not viable to use the data acquired by one smartphone to perform pathology detection accurately. However, while some algorithms show an accuracy of 30 %, others are around 60 %. Although at first sight these results do not seem very useful, they open the door to future research. Is it the lack of information that limits the results? To clarify this, the next algorithm uses two smartphones instead.
  
  2.2. Pathology Detection Using Two Smartphones The schema of the new algorithm is quite similar to the one presented previously, as the objective of both is the same. The main difference between them is the addition of time alignment and low-pass filtering stages in the current algorithm. The other distinction is the number of features used: as we now have information from both legs, more features are calculated. The remaining stages are unchanged.
  
  In the previous section, we questioned whether the lack of information was limiting the results of the algorithms. By adding another phone to the database (DB) acquisition, we have increased the amount of information in the database and been able to extract new features. As a result, the algorithms have improved by an average of 14 %; the two that have benefited the most are Naïve Bayes and SVM, which have improved by 20 % and 27 % respectively. We have also found that using the most similar gait cycles (GCs) increases the accuracy, so it is not only the quantity of data that matters but also its quality. Although the increase in accuracy is not large enough to consider the algorithms valid, it opens a new door, as it confirms that the information was insufficient. We proceed with the 4-smartphone configuration.
  
  2.3. Pathology Detection Using Four Smartphones As we have seen, increasing the amount of data in the database by using more smartphones increases the accuracy of the algorithm. In this chapter, we employ four smartphones to acquire the database. With this change, we entirely lose the possibility for users to detect pathologies by themselves. However, if good results are obtained with this configuration, we can conclude that the bad results of the previous configurations are due to the lack of information. Conversely, if an accurate result is not obtained even when the amount of information received by the algorithm is increased, we can deduce that the fault lies with the smartphones, as they do not manage to acquire the signals with sufficient quality. The current algorithm structure is the same as the one used above, including discarding the dissimilar cycles.
  
  Analysing the results, it can be seen that the performance of the algorithms increases as the number of devices in the database increases. However, this increase cannot be considered good enough to justify the use of these additional phones. From the poor growth in accuracy that comes with the rise in the number of smartphones, it can be deduced that the poor results of the algorithm are due to the quality of the signals obtained by the phones and not to the amount of information used.
  
  3. Recurrent Neural Network Applied To Professional Motion Capture System This chapter aims to present an algorithm based on neural networks capable of distinguishing lower body pathologies without requiring users to pre-register to use the system. In addition, it is intended to create a system that is as portable and unrestrictive as possible, removing the need to go to a specialised centre to perform the motion capture.
  
  3.1. Capture System The device used to acquire the database is the Technaid Tech-MCS v3, a professional tool to register the movement and the orientation of the human body. The Tech-MCS can be divided into the Tech-HUB and the Tech-IMUs. The Tech-HUB is the device that collects and stores all the data from the IMUs (Inertial Measurement Units). The IMUs are small electronic devices based on MEMS (Micro Electro-Mechanical Systems) technology. Inside each IMU, there is an accelerometer, a gyroscope, and a magnetometer, all working in 3D. This system captures the acceleration (m/s²), angular velocity (rad/s) and magnetic field (µT) for each IMU. The device has a sampling rate ranging from 10 to 500 Hz. Table 9 shows the dynamic range and the sensitivity of each sensor.
  
  With the new acquisition tool, we created a database of 51 users. Of the 51 recruited users, only 32 made the second visit, and only 21 completed all visits, making a total of 104 different visits. As stated above, each visit consists of 16 walks, so taking into account that there are 104 visits, we have 1664 walks.
  
  3.2. Algorithm Instead of parameterising the signals, the algorithm uses them directly to obtain a result. First, the signals are low-pass filtered to remove any possible noise. Then, each signal is divided into 4 fragments, thus increasing the amount of information available to us. The next step is to fit the data and use it to train the neural network. Through this procedure, we have achieved an RNN capable of distinguishing between healthy walks and pathological (both right and left) walks. However, there is still room for improvement.
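
  The filtering and fragmenting steps that precede the network can be sketched as follows. The abstract does not state the filter type or how the signal is cut, so the first-order IIR filter in `low_pass` is a stand-in and the equal-length consecutive fragments in `split_fragments` are an assumption; both function names are hypothetical.

```python
import math


def low_pass(signal, rate_hz, cutoff_hz):
    """First-order IIR low-pass filter (a stand-in for the thesis filter)."""
    dt = 1.0 / rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        # Move a fraction alpha of the way towards the new sample.
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


def split_fragments(signal, parts=4):
    """Cut a walk into consecutive equal-length fragments.

    Samples left over after the last full fragment are dropped
    (an assumed policy for this sketch).
    """
    size = len(signal) // parts
    return [signal[i * size:(i + 1) * size] for i in range(parts)]
```

  Each fragment would then be treated as a separate training sequence, which is how one walk contributes four examples.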
  
  The improvements made throughout the paper are as follows:
  
  1. Evaluating the influence of the data origin and the way of dividing it.
  
  2. Studying the influence of the cut-off frequencies.
  
  3. Testing whether it is possible to train the RNN with some users and test with the others.
  
  4. Evaluating the influence of adding the first and last 3 seconds, and physiological information.
  
  5. Testing whether it is possible to reduce the number of signals without worsening the algorithm.
  
  6. Evaluating whether all the cycles in a walk have the same importance when classifying the walks.
  
  3.3. Results Throughout the chapter, different experiments have been conducted to evaluate the algorithm and try to improve it. At the beginning of the chapter, a rule was presented to eliminate the information related to the first and last seconds of a walk; however, it has been shown that not only does this information not harm the algorithm, but it also improves it. In this case, it can be said that this improvement is due both to the increase in the number of GCs and to the importance of the first cycles of the walk. On the one hand, it has been proven that the magnetometer signals do not contribute to detecting pathological conditions and that the angles between the joints are the most influential signals. On the other hand, it has been observed that by adding physiological information from the users, the algorithm increases the accuracy, as the RNN can better isolate the pathology pattern.
  
  After all the experimentation, it has been possible to create an algorithm that classifies pathologies with an accuracy of 93.7 % and does not need to be retrained to classify new users.
  
  4. Motion Capture System with Feature-Based Algorithm The general structure of the ML algorithm is as follows: a pre-processing phase, in which the signals are low-pass filtered and the magnitude (module) of the accelerometer and gyroscope signals is computed; a cycle extraction phase, in which the signal is divided into GCs; a feature extraction phase, in which new representative features are obtained by processing the signals; a data preparation phase, in which the features are adapted to the needs of the ML algorithm; and lastly, the classification phase, in which the ML model is trained and the features are classified. As done in chapter 4, different ML algorithms are used to find which performs best with the dataset. However, based on what we learnt in that part, we use three of the five algorithms initially used: CART, SVM and kNN.
  
  Throughout this chapter, different experiments are carried out with the aim of increasing the accuracy of the algorithm or reducing the complexity of the dataset.
  
  Like the algorithm presented in the previous section, this algorithm uses kinematic signals as well as joint angles, from which the GCs are extracted. However, this algorithm aims to parameterise these GCs to obtain characteristic features describing the GCs, thereby reducing the amount of data. The algorithm uses 15 different parameters, resulting in 201 features.
  
  From the results of the different experiments, we can state that the best ML algorithm for creating a pathology detection algorithm is kNN with 96.30 % accuracy using only 97 features out of the 201 initially obtained.
  
  The first two experiments aim to optimise both the dataset and the ML algorithms; however, the main contribution of this chapter is the dimensionality reduction experiment, not only because it reduces the proposed dataset by almost half but also because it demonstrates that it is possible to reduce the number of IMUs when collecting the database. This can be used as a basis for further studies with the aim of creating a portable and reduced pathology detection system.
