Below is an essay I wrote for a university requirement. I admit it's pretty rough: I attempted to condense two essays into the word count of one, and I don't go into as much depth as I would have liked.
But it reflects an important transformation in my thinking around mental health, so I've decided to present it.
I grew up in a semi-conservative, middle-class immigrant family. There isn't really much link between this kind of upbringing and mental health, except that my father was a psychiatrist. Was he the kind of psychiatrist that had patients lounge on a sofa and confess their concerns to the ceiling? Nope. He was a self-professed biological psychiatrist; adamant that psychiatric pathologies had some biological root that, once revealed, would be the nucleus of the next great psychopharmacological revolution.
Now, he didn't believe that human connection and experience were irrelevant. His view was born more out of frustration at the lack of progress (and sometimes outright destructive effects) of psychological and Freudian psychodynamic approaches. See, psychiatry never had the same kind of "scientific revolution" that other fields have had (think heart transplantation, dialysis, penicillin). Drugs were discovered to have transformative effects on mental health largely by accident. And even now, our understanding of psychopharmacology is nascent compared to fields like cardiology or oncology.
Enter machine learning. Now, I'm a medical student, not a computer scientist. My only knowledge of machine learning was that it might make radiology obsolete. So when the opportunity arose to write a 3,000-word essay on AI and medicine, it offered a completely different perspective. I'll leave my personal ramblings here, and allow you to read the (below par) work below.
AI’s application in mental health
One can appreciate artificial intelligence's (AI's) application in mental health through the context of two diseases: depression and schizophrenia. The two conditions are unrelated and present two different problems. Depression is one of the most prevalent diseases worldwide. Schizophrenia, on the other hand, is one of the most disabling mental health diseases.
Researchers have used imaging modalities like MRI and EEG to develop algorithms that can diagnose patients with depression and schizophrenia. In research, AI is being applied differently to each. In depression, researchers are using AI to subclassify the disease, discovering new subtypes based on differences in neuroanatomy picked up by the algorithm. In schizophrenia, AI is being used to discover the molecular phenotypes of genes associated with the disease. AI is also being used in the prediction of suicide, the most severe consequence of both depression and schizophrenia [2,6].
Diagnosis of depression has been a major challenge in psychiatry. Despite the disease's prevalence, it remains frequently misdiagnosed, with poor sensitivity and specificity. One explanation is that current diagnostic methods are inadequate: diagnosis rests on symptom criteria rather than any objective clinical measurement, a problem compounded by the heterogeneity discussed later.
Artificial intelligence has been extensively applied to depression and schizophrenia diagnostics. Specifically, there has been interest in using AI to classify neuroimaging to determine whether a patient has depression. This is done by extracting multi-modal features from imaging and training the algorithm to separate a set of patients from controls; its effectiveness can then be tested on a set the algorithm hasn't seen.
One modality being explored for use with AI in depression diagnostics is MRI. In one study, researchers took diffusion tensor MRI scans from 44 healthy controls (HC) and 52 patients with Major Depressive Disorder (MDD). They trained, then tested, a support vector machine (SVM) algorithm to separate patients into either MDD or HC classifications.
They compared connectivity patterns in the whole brain, the right hemisphere and the left hemisphere to find the model with the highest classification accuracy.
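To make the train-then-test workflow concrete, here is a minimal sketch using scikit-learn, with synthetic Gaussian features standing in for real diffusion tensor connectivity measures (the group difference, sample sizes and feature counts are invented purely for illustration):

```python
# Sketch of the train-then-test workflow: synthetic features stand in for
# real connectivity measures; the test set is never seen during training.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n_per_group, n_features = 200, 10

# Hypothetical features: simulated MDD patients differ from healthy
# controls (HC) by a shift in the first three features.
hc = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
mdd = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
mdd[:, :3] += 1.5  # invented group difference
X = np.vstack([hc, mdd])
y = np.array([0] * n_per_group + [1] * n_per_group)  # 1 = MDD

# Train on one portion, test on data the model has never seen.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = SVC(kernel="linear").fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
sensitivity = tp / (tp + fn)  # proportion of true MDD cases detected
specificity = tn / (tn + fp)  # proportion of controls correctly cleared
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Because the held-out test set plays no part in training, the sensitivity and specificity it reports are a fairer estimate of real-world performance than accuracy on the training data would be.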
The mean sensitivity of the algorithm was 82%, whereas the mean specificity was 50%. From this it's clear the SVM was over-diagnosing patients with MDD. This may be because the training and testing data sets came from the same site, resulting in overfitting. As we'll discuss in the Wider Discussion segment, this is an important consideration for AI algorithms.
Little evidence exists on the accuracy of psychiatric diagnosis to compare this to; however, one study looking at GP diagnosis of MDD found a sensitivity of 50.1% but a specificity of 81.3%. The high specificity and low sensitivity suggest GPs may underdiagnose depression.
This suggests that rather than AI being solely responsible for diagnosis, MRI diagnostics may be best used in combination with the judgement of healthcare professionals. To know for sure, studies must evaluate the effectiveness of both unaided and AI-assisted psychiatric diagnosis.
Much like in depression, evidence has long pointed to a role for neuroanatomy in the development of schizophrenia, with some studies suggesting neuroanatomical differences between patients with and without the disease. As we've seen with depression, AI could be used in conjunction with modalities that measure brain structure and function to diagnose schizophrenia.
Similar to the use of MRI imaging in depression, studies have fed MRI scans of patients with schizophrenia into AI algorithms with the purpose of classification. One study took MRI imaging from 941 patients from 5 different sites and used these to train and validate different algorithms.
This paper used multiple sites for training, then sites the algorithm hadn't seen for validation (SETs 4 and 5). This is important for testing whether the algorithm was "overfitting", i.e. relying on non-generalizable, set-specific factors to diagnose patients. Because the testing sets came from different sites, overfitting would have resulted in lower accuracy. From the results table, you can see the AUC was over 0.7 for pooled data from SETs 1–3 when tested against SETs 4 and 5.
However, when individual sites were used for training, accuracy was more variable when tested on external site data. The average AUC for algorithms trained on individual sites (sites 1–3) and tested against data from site 4 was 0.684/0.659, not high enough to indicate a significant classifier. Compared with the algorithm trained on all three sites (0.731/0.738), this demonstrates the importance of multi-site training and validation.
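The value of multi-site training can be illustrated with a toy simulation. In this hypothetical setup, one site carries a scanner artifact that happens to track diagnosis there; a model trained only at that site latches onto the artifact and transfers poorly to an unseen site, while a model pooled across several sites leans on the genuine signal. All features, sites and effect sizes here are invented:

```python
# Toy illustration of site-specific overfitting: a single-site model can
# exploit an artifact that carries no signal at an unseen site.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_site(n, artifact_tracks_label):
    """Simulate one site: a genuine disease signal plus a site artifact."""
    y = rng.integers(0, 2, n)
    true_signal = y * 1.5 + rng.normal(0, 1, n)   # real, generalisable effect
    artifact = ((y if artifact_tracks_label else rng.integers(0, 2, n))
                + rng.normal(0, 0.05, n))          # scanner/site quirk
    return np.column_stack([true_signal, artifact]), y

X_a, y_a = make_site(300, artifact_tracks_label=True)    # confounded site
X_c, y_c = make_site(300, artifact_tracks_label=False)   # clean site
X_d, y_d = make_site(300, artifact_tracks_label=False)   # clean site
X_test, y_test = make_site(1000, artifact_tracks_label=False)  # unseen site

single = LogisticRegression(max_iter=1000).fit(X_a, y_a)
pooled = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_a, X_c, X_d]), np.concatenate([y_a, y_c, y_d]))

auc_single = roc_auc_score(y_test, single.predict_proba(X_test)[:, 1])
auc_pooled = roc_auc_score(y_test, pooled.predict_proba(X_test)[:, 1])
print(f"single-site AUC={auc_single:.2f}, pooled AUC={auc_pooled:.2f}")
```

Pooling forces the model to find predictors that hold across sites, which is exactly what external validation rewards.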
There is another imaging modality that could be used: the electroencephalogram (EEG). The application is similar to MRI imaging: an algorithm is developed from EEGs to diagnose patients with depression or schizophrenia.
There is evidence of EEG data being used to create AI algorithms that can diagnose depression and schizophrenia. In fact, according to one systematic review, 17.1% of papers looking at AI applications in depression used EEG data.
The accuracy achieved with EEGs has been more impressive than with MRI. Since 2012, using a variety of AI techniques including machine learning and neural networks, the accuracy of AI/EEG algorithms has consistently been over 80%, with a mean of 94.4%.
What's interesting is that the average accuracy using the SVM algorithm (the ML algorithm used in the MRI depression paper) is 92.5%, greater than the mean total classification accuracy of MRI, which was 66.7%. Combine this with the fact that EEGs are cheaper and easier to use, and it would be sensible to assume that if AI-assisted diagnosis of depression is adopted, it will use EEG data. That said, the superior localisation and visualisation power of MRI lends itself well to AI-assisted research, a point we explore later.
AI assisted EEG diagnosis shows promise with schizophrenia as well. One study looked at three different types of EEG data: source-level data, sensor-level data and a combination of both. It found accuracies greater than 70% across all three modalities.
What's interesting is that the average accuracy increases when using a combination of EEG data types rather than either alone. We saw something similar with the algorithms trained on MRI data from multiple sites. The important distinction is that this paper shows that combining modalities produces more accurate results, whereas the other paper suggests that more data from multiple sites does the same. Both demonstrate that AI accuracy increases with data of greater complexity and volume.
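Why combining feature types helps can be sketched the same way: in this simulation, two "views" of the data, loosely standing in for sensor-level and source-level EEG features, each carry weak, independent signal, and concatenating them gives a simple classifier more to work with. The signal strengths are invented for illustration:

```python
# Two weakly informative, independent feature "views"; concatenating them
# yields a better classifier than either view alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 2000
y = rng.integers(0, 2, n)
sensor = (y * 1.0 + rng.normal(0, 1, n)).reshape(-1, 1)  # weak signal, view 1
source = (y * 1.0 + rng.normal(0, 1, n)).reshape(-1, 1)  # weak signal, view 2
combined = np.hstack([sensor, source])                   # both views together

def holdout_auc(X):
    # train on the first half, evaluate on the held-out second half
    half = n // 2
    model = LogisticRegression().fit(X[:half], y[:half])
    return roc_auc_score(y[half:], model.predict_proba(X[half:])[:, 1])

auc_sensor = holdout_auc(sensor)
auc_source = holdout_auc(source)
auc_combined = holdout_auc(combined)
print(f"sensor={auc_sensor:.2f}, source={auc_source:.2f}, "
      f"combined={auc_combined:.2f}")
```

Each view's errors are independent, so pooling them cancels some noise; the same intuition underlies the multimodal results above.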
We don't have a neuropathological model of depression. It has long been suspected that there is a neuroanatomical basis for the disease, but the results have not been consistent. One explanation for this inconsistency is that depression is not a uniform disease: it is heterogeneous and varies in its presentation. For instance, two patients can be given the same label of MDD despite having different symptom presentations, as a diagnosis requires only a combination drawn from up to nine symptoms, according to NICE.
One landmark paper used unsupervised machine learning to subclassify patients with depression into 4 groups based not on their symptom profiles but on differences in fMRI imaging. Each subtype was defined by differences in fMRI connectivity patterns in different parts of the brain (see figure 7).
They found that where connectivity patterns were shared between the groups, the regions involved were associated with the most common depression symptoms ("anhedonia", "fatigue" and "mood" symptoms). The more "abnormal" the connection, the more severe the symptoms.
They were also able to come up with unique symptom profiles associated with each subtype (see figure 9).
To test whether the subtypes merely reflected irrelevant variation in the data sample or true neuroanatomical differences in patients with depression, the researchers built SVM classifiers for each subtype and applied them to an external data set.
The results shown in figure 10 indicate that each subtype had an accuracy greater than 80%, and that the algorithm could correctly identify 84% of patients with depression. When data was carefully selected (segmented bars), accuracy increased to 93.3%.
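The unsupervised step can be illustrated with a toy example. The paper's actual pipeline is considerably more involved, but the core idea, grouping patients by imaging features without using any diagnostic labels, can be sketched with k-means on synthetic data:

```python
# Toy sketch of unsupervised subtyping: synthetic "connectivity" features
# drawn from four distinct clusters stand in for real fMRI patterns, and
# k-means recovers the grouping without ever seeing the labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# four well-separated synthetic subtypes, 20-dimensional features
X, true_subtype = make_blobs(n_samples=400, centers=4, n_features=20,
                             cluster_std=1.0, random_state=3)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=3).fit(X)

# adjusted Rand index: 1.0 means the clusters match the true subtypes exactly
ari = adjusted_rand_score(true_subtype, kmeans.labels_)
print(f"adjusted Rand index = {ari:.2f}")
```

The external-validation step in the paper then asks the harder question: do these clusters reappear, and predict diagnosis, in data the clustering never saw?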
This suggests that subclassifying depression helps improve accuracy, when you compare these results to those of the MRI study discussed in the depression section. This is significant because diagnosis was improved through the application of AI-assisted research.
For schizophrenia, there is another research application. Schizophrenia is a polygenic disease with evidence of high heritability. The problem is understanding the relationship between the genes associated with schizophrenia and the phenotypes they are responsible for. Understanding this relationship could elucidate more about the development and pathophysiology of the disease.
If a gene that is part of a biochemical process is associated with a polygenic disease like schizophrenia, it suggests that the process is related to the underlying pathology of the disease. If this association was not previously known, it can become a new focus of study, which could uncover more information about the pathology of the disease. Machine learning is being applied to polygenic diseases like schizophrenia for this purpose.
In one paper, Bern M et al. first used machine learning to assess the importance of candidate schizophrenia genes in causing the disease. They then searched those significant genes for ones with a significant role in cell motility, a phenotype they hypothesised was associated with schizophrenia.
They then identified the 6 genes with the strongest association, with the intention of studying their effect on the cell; the results could be used to further develop our understanding of cell motility's relationship with schizophrenia.
However, this technique has not been universally successful. A paper by Zheutlin et al. compared machine learning with standard linear regression models to see whether they could predict phenotypes from 77 genotypes known to be associated with schizophrenia. They found no meaningful difference, with evidence that the machine learning algorithm was task-specific: it used features unique to the data set to predict phenotype, rather than true differences in genotype. This is significant, as it affects the generalisability of the algorithm in wider research.
However, the two results may not be mutually exclusive. The paper by Bern M et al used a pre-made algorithm called SINKSOURCE, which had been used successfully in a previous study. The paper by Zheutlin et al used random forest modelling. 
The significance of this is that different machine learning algorithms may vary in their reliability, so more research is needed to distinguish algorithms that generalise in a clinical context. We saw this when looking at the paper by Acharya UR et al., which showed that different AI techniques (in that case, neural networks and SVMs) can produce different accuracies.
800,000 people die by suicide every year around the world. According to the Office for National Statistics, there were 6,507 suicides in the UK in 2018, the equivalent of 11.2 deaths per 100,000 people.
It is one of the most severe outcomes of both schizophrenia and depression. According to a systematic review of risk factors for suicidal thoughts, depression was the third most common cause of suicidal ideation. Suicide is also a serious consequence of schizophrenia: around 10% of patients with schizophrenia will go on to take their own lives.
The main application of machine learning in suicide prevention is risk prediction. Most current techniques for evaluating suicide risk look at risk factors in isolation; machine learning can take into account data on multiple risk factors for an individual at once and combine them into a single risk estimate.
To achieve this, some researchers used electronic health records to build machine learning algorithms that could predict whether a patient was at risk of a suicide attempt. Machine learning was applied to the records of 5,167 patients, of whom 3,250 had evidence of a suicide attempt. They assessed the algorithm's ability to predict an attempt using data from different time intervals between 7 and 720 days beforehand.
The results showed the algorithm was able to successfully predict whether a patient would go on to attempt suicide: for all time intervals between 7 days and 2 years before the attempt, its AUC never fell below 0.8, indicating significant classification.
An interesting finding was that, similar to the study that used Facebook data to predict depression diagnoses, the algorithm's AUC was higher the closer the data was to the suicide attempt. This matters for any screening programme. If this technology is implemented, we will also need algorithms that use the same data to predict how close a patient is to an attempt, to run side by side: a more accurate prediction suggests an attempt is likely soon, a less accurate one that it is further away, and the proximity of the event determines the degree of clinician intervention required.
The study then went on to use a different cohort for the control cases. Whereas the initial algorithm's controls were cases that had been classified as attempts but were later found not to be (around 1,917 patients), the new control cohort was a random selection of hospital patients (around 12,695 patients). The area under the curve improved to 0.92 at 7 days before the attempt and 0.86 at 720 days.
The algorithm's performance actually improves when using realistic clinical data, indicating its applicability as a potential tool for suicide screening and showing that it can generalise to the hospital patient population. Note, however, that the decrease in accuracy with time before the attempt is more marked than with the curated data, highlighting the need for a complementary algorithm to assess the likely timing of the event, as mentioned previously.
The paper also attempted standard statistical analysis, which only produced an AUC of 0.66 at 7 days and 0.68 at 2 years before the attempt. This demonstrates machine learning's ability to differentiate patients based on a larger number of predictors and produce a more accurate result than what a human can achieve with standard statistical tools. It shows that machine learning's advantage is significant.
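The kind of advantage described here can be demonstrated with a toy simulation: when risk depends on an interaction between two factors, a standard linear model misses it entirely while a random forest captures it. The "risk factors" below are invented binary variables, not real clinical predictors:

```python
# When outcome depends on an interaction (XOR) between two factors, a
# linear model finds nothing while a tree ensemble learns it easily.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 2000
a = rng.integers(0, 2, n)   # hypothetical risk factor A
b = rng.integers(0, 2, n)   # hypothetical risk factor B
y = a ^ b                   # high risk only when exactly one factor present
X = np.column_stack([a, b]).astype(float)

half = n // 2
lin = LogisticRegression().fit(X[:half], y[:half])
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X[:half], y[:half])

auc_lin = roc_auc_score(y[half:], lin.predict_proba(X[half:])[:, 1])
auc_rf = roc_auc_score(y[half:], rf.predict_proba(X[half:])[:, 1])
print(f"linear model AUC={auc_lin:.2f}, random forest AUC={auc_rf:.2f}")
```

Real EHR data has far messier interactions than this contrived XOR, but the principle is the same: models that can represent combinations of predictors outperform those that weigh each predictor in isolation.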
It's clear the type of data used to train these algorithms affects their accuracy, generalisability and stability. We saw when comparing the papers by Schnyer DM et al. and Rozycki et al. that an algorithm's accuracy increases when multiple sites are used for training. If, as in the Schnyer DM et al. paper, you train and test the algorithm on a single data set with a small sample size, you increase the risk of "overfitting": the phenomenon whereby the algorithm accurately classifies data based not on true differences, but on differences that exist only within that data set. You can spot this by testing the algorithm on data from a separate set. If we want to use algorithms in clinical practice, those algorithms will have to be trained on data representative of the patient population.
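Overfitting itself is easy to demonstrate. In this minimal sketch, a model that memorises its training data scores perfectly on the set it was trained on and falls to chance on a separate set, because the features are pure noise with no real signal at all:

```python
# Overfitting demo: a memorising model (1-nearest-neighbour) is perfect on
# its own training data and at chance on unseen data, since the features
# here are pure noise and the labels are random.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(5)
X_train = rng.normal(size=(40, 20))   # random features, no real signal
y_train = rng.integers(0, 2, 40)      # random labels
X_test = rng.normal(size=(400, 20))
y_test = rng.integers(0, 2, 400)

# 1-nearest-neighbour memorises the training set exactly
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"training accuracy={train_acc:.2f}, unseen-set accuracy={test_acc:.2f}")
```

This is why every study discussed above stands or falls on how it evaluated its algorithm: only performance on data the model never saw says anything about clinical use.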
Unlike in other fields of medicine, no studies in mental health have compared the efficacy of algorithms against consultants, despite extensive evidence of such real-time comparisons with clinicians in other specialities.
As we've seen when discussing depression diagnostics, there is also very little research on the effectiveness of psychiatric diagnosis or prediction itself. To be confident that AI-assisted diagnostics and prediction are safe and effective, future studies should make direct comparisons with psychiatrists.
It's clear that the next focus for AI-assisted psychiatry should be assessing the performance of these algorithms against clinicians in real time. What is clear, based on the effectiveness of the algorithms discussed, is that there is promise in using AI in the diagnosis, research and prediction of mental health conditions.
We've seen its current application in breaking through psychiatry's research barriers, which appears to be the only aspect of psychiatry where AI is already in real use. We may be able to elucidate the pathology of mental health diseases like depression and schizophrenia from gross anatomy, or from molecular biology and genetics.
It's likely we will also see AI-assisted EEG diagnosis of both depression and schizophrenia, whereas the future of MRI diagnostics seems less certain. That said, it's important to highlight that MRI, especially modalities that assess connectivity and function, has been applied successfully to research.
The implications of a suicide prediction algorithm extend beyond the mental health clinic. It could be used in workplaces and schools by staff who may not be trained to spot those at risk. It is likely such an algorithm will prove invaluable not just in healthcare, but also outside of it.
In summary, AI-assisted psychiatry is showing significant promise, but before it is implemented in current practice, more research in the field is needed.
1. World Health Organisation. Suicide prevention. Available at: https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/. Accessed March 8, 2020.
2. Marder SR, Cannon TD. Schizophrenia. New England Journal of Medicine 2019 Oct 31;381(18):1753–1761.
3. Drysdale AT, Grosenick L, Downar J, Dunlop K, Mansouri F, Meng Y, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature Medicine 2017 Jan;23(1):28–38.
4. Bern M, King A, Applewhite DA, Ritz A. Network-based prediction of polygenic disease genes involved in cell motility. BMC Bioinformatics 2019 Jun 20;20(Suppl 12):313.
5. Walsh CG, Ribeiro JD, Franklin JC. Predicting Risk of Suicide Attempts Over Time Through Machine Learning. Clinical Psychological Science 2017 May;5(3):457–469.
6. Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, et al. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin 2017 Feb;143(2):187–232.
7. Schnyer DM, Clasen PC, Gonzalez C, Beevers CG. Evaluating the diagnostic utility of applying a machine learning algorithm to diffusion tensor MRI measures in individuals with major depressive disorder. Psychiatry Research: Neuroimaging 2017;264:1–9.
8. Mitchell AJ, Vaze A, Rao S. Clinical diagnosis of depression in primary care: a meta-analysis. 2009.
9. Rozycki M, Satterthwaite TD, Koutsouleris N, Erus G, Doshi J, Wolf DH, et al. Multisite Machine Learning Analysis Provides a Robust Structural Imaging Signature of Schizophrenia Detectable Across Diverse Patient Populations and Within Individuals. Schizophrenia Bulletin 2018 Aug 20;44(5):1035–1044.
10. Tran BX, McIntyre RS, Latkin CA, Phan HT, Vu GT, Nguyen HLT, et al. The Current Research Landscape on the Artificial Intelligence Application in the Management of Depressive Disorders: A Bibliometric Analysis. International Journal of Environmental Research and Public Health 2019 Jun 18;16(12):2150.
11. Acharya UR, Oh SL, Hagiwara Y, Tan JH, Adeli H, Subha DP. Automated EEG-based screening of depression using deep convolutional neural network. Computer Methods and Programs in Biomedicine 2018 Jul;161:103–113.
12. Shim M, Hwang H, Kim D, Lee S, Im C. Machine-learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features. Schizophrenia Research 2016;176(2–3):314–319.
13. National Institute for Health and Care Excellence. Depression in adults: recognition and management [Internet]. [London]: NICE; 2009 [updated 2016 Apr; cited 2016 Dec 16]. (Clinical guideline [CG90]). Available from: https://www.nice.org.uk/guidance/cg90
14. Zheutlin AB, Chekroud AM, Polimanti R, Gelernter J, Sabb FW, Bilder RM, et al. Multivariate Pattern Analysis of Genotype–Phenotype Relationships in Schizophrenia. Schizophrenia Bulletin 2018 Aug 20;44(5):1045–1052.
15. Office for National Statistics. Suicides in the UK: 2018 registrations. 2019. Available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/suicidesintheunitedkingdom/2018registrations. Accessed March 8, 2020.
16. Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţiuc-Pietro D, et al. Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences of the United States of America 2018 Oct 30;115(44):11203–11208.
17. Steinhubl SR, Muse ED, Topol EJ. The emerging field of mobile health. Science Translational Medicine 2015 Apr 15;7(283):283rv3.