My career has been motivated by a singular curiosity about what it means to learn from evidence arising from measurements, i.e., data. This has driven me to commit myself to the study of statistics and machine learning. From a technical standpoint, I am focused on the analysis of learning from streaming data in highly dynamic environments, a research direction that I established in academia as a Research Fellow at Cambridge University and an Assistant Professor at Imperial College. I focused on real-time neuroimaging and cybersecurity, two domains that share a common data format: multivariate data streams emitted from communication between nodes arranged in a network (computer hosts in one case, brain regions in the other). In search of real-world validation of my ideas, I co-founded Mentat, a startup in machine learning for data streams, which has grown into a highly esteemed cybersecurity consulting company.
- Nationality: British
- Phone: +44(0)790757402[eight]
- Email: canagnos[at]ment[dot]at
Employment - Highlights
- Nov 12 - to date: Co-Founder, Chief Scientist and CTO, Mentat Innovations, UK
- Roadmapped and oversaw the creation of a Centre of Excellence in Anomaly Detection spanning the Security Office and bridging the academic state-of-the-art with an enterprise environment and a large security team, resulting in the group’s work appearing in top academic conferences while several software modules were enthusiastically approved by stakeholders and went into live production on feeds ranging from physical security to internet traffic monitoring.
- Engaged with senior management and C-level executives in a number of large enterprises (Barclays Group, Cisco, Festo, Ordnance Survey, Ministry of Defence, and others) to communicate the potential of state-of-the-art machine learning in addressing their pain points.
- Motivated and trained a team of both junior and senior data scientists and developers to stay on course and navigate difficult technological challenges to deliver proof-of-concept prototypes in extremely short timescales and with limited resources, in areas as diverse as image classification, natural language processing, and streaming anomaly detection.
- Balanced the competing roles of Founder, CTO and Chief Scientist by simultaneously optimising for speed of delivery, code quality and scientific integrity respectively, enabling a small start-up to service large enterprises in regulated and critical areas such as security and quality control.
- Took a leading role in promoting the importance of sound statistical methodology in the broader data science community, via a number of high-profile presentations and panel contributions.
- Introduced a program for MSc industrial thesis supervision and 3-month PhD internships, attracting students from around the world (e.g., Harvard University, Imperial College, University of Athens)
- Oct 11 - Dec 2016: Lecturer (Associate Professor) in Statistics (on unpaid leave from 2014 onwards), Mathematics, Imperial College London, UK
- Established an independent research program in the field of adaptive learning rates for parametric models of streaming data via a combination of journal publications, workshops and invited talks, with applications ranging from neuroimaging to fraud detection. The program remains active.
- Secured total grant funding in excess of 300K GBP in my first three years as lecturer.
- Supervised to successful completion three PhD students and one postdoctoral researcher, with research output published in top journals such as the Annals of Applied Statistics.
- Fostered career growth opportunities for all individuals under my supervision, helping them secure high-profile research jobs (e.g., a postdoc position at UCL, and a Lectureship at Imperial College).
- Designed two new courses for the MSc in Statistics, and redesigned an existing 3rd/4th year undergraduate course in statistical modelling increasing its popularity fourfold.
- Organised the Statistics Section’s weekly seminar and a number of high-profile workshops.
- Publicised the expertise of Imperial College in my field of research worldwide by international speaking engagements in academia and industry (e.g., Google HQ, University of Vancouver).
- Offered consulting services on the deployment of advanced statistical modelling in a number of fields including online advertising, threat intelligence, credit scoring and streaming data analysis. Clients included Advance Media, BAe Systems, and Barclays Group.
- Jan 10 - Aug 2011: Research Fellow, Statistical Laboratory, Cambridge University, UK
- Pursued novel research in the theoretical underpinnings of online learning in dynamic environments
- Contributed to two open-source software packages on CRAN: one implementing a novel metric for classification performance (www.hmeasure.net), and one implementing a dynamic classification and regression method (dynaTree).
- PhD Mathematics, Statistics Section, Imperial College, Nov 06 - Jan 10
- “A statistical framework for Streaming Data Analysis” (Prof. D.J. Hand and N.M. Adams)
- MSc Logic and Algorithms (distinction), Athens University, Greece, Oct 04 - June 06
- “Bertrand Paradoxes and Kolmogorov’s Foundations of Probability” (Prof. Y.N. Moschovakis)
- MSc Machine Learning (distinction), Edinburgh University, UK, Sep 03 - Sep 04
- “Learning Probabilistic Relational Models for Genetic Networks” (Dr. D. Husmeier)
- BAHons Mathematics (2:1), Cambridge University, Pembroke College, UK, Sep 03 - Sep 04
As Chief Scientist and Founder of Mentat
- Awards and Accelerators
- TechFounders 2016: Winner
- Cybersecurity London (CyLon) 2015: Winner
- Cisco Entrepreneurs in Residence Program: Winner
- EY Privacy Challenge 2014: Runner-up
- Consulting and Software Development
- Anomaly Detection and AI-driven Cyber-Intelligence, Barclays Group
- AI-driven Quality Control from photographic evidence, EDF
- Built a prototype demonstrating
- AI-driven Change Detection, Ordnance Survey
- Predictive Maintenance via AI, Festo
- Situational Awareness and Fault Monitoring on POS terminals, Cardlink
- Anomaly Detection on Mobile Virtual Private Networks, Undisclosed Client
- Predictive Optimisation of Fuel Logistics, Sensile
- Quality Control and Fault Monitoring, Cisco
As a freelance consultant
- Construction of credit scorecards, Undisclosed client
- Online advertising optimisation, Advance Media
- Sortino Optimisation, Integral Capital Management
- Price discovery in e-commerce, Variably Technologies
- Modelling user-generated restaurant reviews, www.ask4food.gr
- Network analysis, Barclays Group
- Network analytics and plan detection, BAe Systems
Advisory Board Positions
- Jan 12 - Dec 14: Scientific Advisor, Variably Technologies, Hong Kong
Media and Government
- Aug 2016: Alternative Medals League Table, Google News
- Dec 2013: POST Parliamentary Committee: contribution to Working Group on Big Data
- Jan 2014: The Independent, “The Missing Girls”, contribution to gender inequality analysis
- March 2013: Facts Are Sacred, Guardian e-book, video interview
- Aug 2012: Alternative Medals League Table, Guardian Datablog
- February 2017, Data Science vs Artificial Intelligence: a useful distinction, 1st AI Conference Athens, Athens, Greece
- Demystifying Data Science, Machine Learning and AI, Barclays Headquarters, UK
- July 2014, Panel Speaker on Artificial Intelligence, CISCO Live, San Jose, United States
- July 2013, Why Data Science is a Science, AVIVA HQ, London
- July 2013, Bayesian updating in non-stationary settings, Google HQ, Mountain View, California, USA
- Nov 10, Handling temporal variation of unknown characteristics in streaming data analysis. Microsoft Research Centre, Cambridge, UK
- Sep 14 - Dec 15: Principal Investigator, Data Exploration and Predictive Analytics for Music Publishing
- Total grant value of 330K GBP
- Industrial partner: Sentric Music
- March 13: EPSRC Small Research Equipment Grant
- Postdoc on dynamic modelling of non-stationary data
- PhD project on Principles of Active Learning (completed)
- PhD project on Covariance Selection in Heterogeneous Data for Neuroimaging (completed)
- PhD project on Wavelet Analysis on Data Streams (completed)
- several MSc student projects
- Oct 12 - Dec 14: Official Statistics MSc course, Imperial College
- Oct 12 - Dec 14: Graphical Modelling MSc course, Imperial College
- Oct 12 - Dec 14: Advanced Statistical Modelling final undergraduate course, Imperial College
- June 13, First Year Undergraduate Statistics
- Feb 2014: Big Data – Challenges and Applications, Imperial College London
- Dec 2013: Stochastic Approximation for Big Data, ERCIM 2013, London
- May 2013: Big Data – Bridging the Gap between Theory and Practice, Imperial College London
Books and book chapter contributions
- Poorly supervised learning: how to engineer labels for cybersecurity research. Data Science for Cybersecurity, Contributed Book Chapter, to appear.
- Principles of Data Mining II, to appear.
Refereed Journal Publications - click on [PP] for preprints
- Learning population and subject-specific brain connectivity networks via mixed neighborhood selection. Pio Monti, R., Anagnostopoulos, C., and Montana, G., Annals of Applied Statistics, to appear PP
- Real-time estimation of dynamic functional connectivity networks. Human Brain Mapping, 2017 PP
- Decoding Time-Varying Functional Connectivity Networks via Linear Graph Embedding Methods. Pio Monti, R., Lorenz, R., Hellyer, P.J., Anagnostopoulos, C., and Montana, G., Frontiers in Computational Neuroscience, 2017 PP
- The Automatic Neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI. Lorenz, R., Pio Monti, R., Violante, I.R., Anagnostopoulos, C., and Leech, R., NeuroImage, 2016 PP
- Financial time series modelling using the Hurst exponent. Tzouras, S., Anagnostopoulos, C., and McCoy, E.J., Physica A: Statistical Mechanics and its Applications, 2015
- Targeting Optimal Active learning via Example Quality. Evans, L.G., Adams, N.M. and Anagnostopoulos, C. Journal of Machine Learning Research, 15, 2014
- Estimating time-varying brain connectivity networks from functional MRI time series. Pio Monti, R., Hellyer, P.J., Sharp, D.J., Leech, R., Anagnostopoulos, C., and Montana, G. NeuroImage, 2014 PP
- A better Beta for the \(H\) measure of classification performance. Hand, D.J. and Anagnostopoulos, C. Pattern Recognition Letters, 40, 41–46, 2014
- Information-theoretic data discarding for dynamic trees on data streams. Anagnostopoulos, C., and Gramacy, R.B, Entropy, 15(12), 5510–5535, 2013
- When is the area under the receiver operating characteristic curve an appropriate measure of classifier performance? Hand, D.J., and Anagnostopoulos, C. Pattern Recognition Letters, 34, pp 492–495, 2012.
- Online Linear and Quadratic Discriminant Analysis with adaptive forgetting for streaming classification. Anagnostopoulos, C., Tasoulis, D., Adams, N.M., and Hand, D.J., Statistical Analysis and Data Mining, 5(2), pp 139–166, 2012.
- Streaming covariance selection with applications to adaptive querying in sensor networks, Anagnostopoulos, C., Adams, N.M., and Hand, D.J., The Computer Journal, doi: 10.1093/comjnl/bxp123, 2010.
- Temporally adaptive estimation of logistic classifiers on data streams, Adams, N.M., Anagnostopoulos, C., Hand, D.J., and Tasoulis, D., Journal of Advances in Data Analysis and Classification (ADAC), doi:10.1007/s11634-009-0051-x, 2009.
Refereed Conference Publications
- Anomaly Detection on User Agent Strings, Spyropoulou, E., Noble, J., and Anagnostopoulos, C., In Proceedings of Data Science for Cybersecurity, to appear, 2017
- Text-mining the NeuroSynth corpus using Deep Boltzmann Machines, Pio Monti, R., Lorenz, R., Leech, R., Anagnostopoulos, C., and Montana, G. IEEE 2016 Workshop on Pattern Recognition in Neuroimaging, 2016 PP
- Towards tailoring non-invasive brain stimulation using real-time fMRI and Bayesian optimization, Lorenz, R., Pio Monti, R., Hampshire, A., Anagnostopoulos, C., and Violante, I.R. IEEE 2016 Workshop on Pattern Recognition in Neuroimaging, 2016 PP
- Stopping criteria for boosting automatic experimental design using real-time fMRI with Bayesian optimization. 5th NIPS Workshop on Machine Learning and Interpretation in Neuroimaging, NIPS, 2015
- Graph embeddings of dynamic functional connectivity reveal discriminative patterns of task engagement in HCP data, Pio Monti, R., Lorenz, R., Hellyer, P.J., Anagnostopoulos, C., and Montana, G. IEEE 2015 Workshop on Pattern Recognition in Neuroimaging, 2015 PP
- When does active learning work? Evans, L.P.G., Anagnostopoulos, C., and Adams, N.M., IDA 2013, Lecture Notes in Computer Science 8207, 174–185, 2013
- Temporally-Adaptive Linear Classification for Handling Population Drift in Credit Scoring. Adams, N.M., Tasoulis, D.K., Anagnostopoulos, C., and Hand, D.J., COMPSTAT, 2010
- Temporally adaptive estimation of logistic classifiers on data streams. Advances in Data Analysis and Classification, 2009
- Online optimisation for variable selection in data streams. Anagnostopoulos, C., Adams, N.M., Hand, D.J., and Tasoulis, D., Proc. of the 18th Europ. Conference on Artificial Intelligence, pp 132–136, ECAI 2008
- Simulating dynamic covariance structures for testing the adaptive behaviour of variable selection algorithms. Anagnostopoulos, C., and Adams, N.M., Proc. of the 10th Int. Conference on Computer Modelling and Simulation, pp 52–57, UKSIM/EUROSIM 2008
- Deciding what to observe next: adaptive variable selection for regression in multivariate data streams. Anagnostopoulos, C., Adams, N.M., and Hand, D.J., Proc. of ACM SAC, Vol. 2, pp 961–965, 2008
- Information-theoretic data discarding for Dynamic Trees on data streams, Anagnostopoulos, C., and Gramacy, R., MCMSki IV, 5th IMS-ISBA joint meeting, Chamonix, France, Jan 2014
- Strategies for Handling the Risk of Obsolete Information in Scorecards, Anagnostopoulos, C., Credit Scoring and Credit Control XIII, Edinburgh, UK
- Dynamic trees for massive data contexts, Anagnostopoulos, C., and Gramacy, R., NIPS workshop on Bayesian optimization, experimental design and bandits, Spain, Dec 2011
- Online Expectation-Maximization in the presence of drift. Anagnostopoulos, C., Greek stochastics \(\beta'\), Lefkada, Greece, August 2010
- Temporally adaptive, online Expectation-Maximization. Anagnostopoulos, C., Stochastic Approximation Workshop, University of Bristol, September 2010
- Adaptive Filtering for State-Space Identification and State Estimation. Ehrlich, E., Adams, N.M., Anagnostopoulos, C., and Tasoulis, D.K., RSS Conference, 2010
- Temporally-Adaptive Linear Classification for Handling Population Drift in Credit Scoring. Adams, N.M., Tasoulis, D.K., Anagnostopoulos, C., and Hand, D.J., in Lechevallier, Y. and Saporta, G. (eds), Proceedings of the 19th International Conference on Computational Statistics, Springer, 167–176, COMPSTAT 2010
- Anagnostopoulos, C., Contribution to the Discussion of “Maximum Likelihood Estimation of a multi-dimensional log-concave density” by M. Cule, R. Samworth and M. Stewart, to appear in Journal of the Royal Statistical Society, Series B, 2010
- Anagnostopoulos, C. and Tasoulis, D., Contribution to the Discussion of “Sure Independence Screening for ultrahigh dimensional feature space” by Fan, J. and Lv, J., Journal of the Royal Statistical Society, Series B (Methodological), Vol 70, Issue 5, p 890, 2008
- Anagnostopoulos, C. and Turnbull, M.C., A Note on Learning Linear Gaussian State-Space Models via Expectation-Maximization, Technical Report http://www.ma.imperial.ac.uk/statistics/techreports/, Imperial College, 2007
Invited Talks / Conference Abstracts
- October 2017, Emerging and adaptive systems: theory and practice, Lancaster University, UK
- September 2017, Data Science for Cybersecurity, Poorly supervised learning: how to engineer labels for machine learning in cybersecurity, Imperial College London, UK
- July 2015, Streaming Data Analysis, Demokritos Labs, University of Athens, Greece
- Jan 2014, MCMSki IV, Fifth IMS-ISBA joint meeting, Information-theoretic data discarding for Dynamic Trees on data streams, Chamonix, France
- Aug 2013, Strategies for Handling the Risk of Obsolete Information in Scorecards, Credit Scoring and Credit Control XIII, Edinburgh, UK
- July 2013, Adaptive power priors for Bayesian updating in the presence of drift with applications, Statistics Seminar, University of British Columbia, Canada
- Nov 12, Simultaneous handling of outliers and drift using adaptive learning rates, Statistics Section, University College London
- Oct 12, Learning in the presence of drift. Computer Lab, Cambridge University, UK
- Oct 11, Dynamic trees for streaming and massive data contexts. Imperial College London, UK
- Dec 10, Handling temporal variation of unknown characteristics in streaming data analysis. Imperial College London, UK
- Nov 10, Online, temporally adaptive parameter estimation with applications to streaming data analysis. Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA
- Nov 10, Online, temporally adaptive parameter estimation with applications to streaming data analysis. Engineering Department, Signal Processing Seminar, University of Cambridge
- Sep 10, Temporally adaptive online EM. Stochastic Approximation Workshop, Uni. of Bristol
- Aug 10, Online EM in the presence of drift. Greek stochastics \(\beta'\), Lefkada, Greece
- July 09, Online statistical inference and temporal adaptivity. Business Intelligence Lab seminar, Telecom ParisTech ENST, Paris, France
- Mar 09, Adaptive querying and temporally adaptive estimation of graphical model structure in distributed sensor networks. “Streaming Data Mining for Sensor Networks” Workshop of the International Federation of Classification Societies 2009 Conference, Dresden
- Nov 08, Adaptive forgetting for streaming data. Statistics seminar, Lancaster University
- June 08, Online optimisation for streaming variable selection. Mobile Systems Group seminar, Department of Computer Science, UCL
- Apr 08, Simulating dynamic covariance structures for testing the adaptive behaviour of variable selection algorithms. UKSIM/EUROSIM 2008, Cambridge, UK
- Project Management: I have managed software teams of up to 8 people, delivering software to clients as large as Cisco and Barclays in as many as 6 different programming languages, in all cases ensuring quality by adopting solid design principles and tight processes while preserving agility and speed of delivery (Test-Driven Development, Behaviour-Driven Development, Git workflows, agile workflows, JIRA project management pipelines). I both contribute to and maintain active open-source projects in the field of data science.
- Databases: advanced use of a number of database technologies, including relational stores and beyond, such as indexing document stores (e.g., Elasticsearch, MongoDB, Splunk) and graph databases (Neo4j, OrientDB).
- Frameworks: in-depth, hands-on expertise while delivering software to clients in all major Big and Streaming Data open-source platforms, including Apache Storm, Apache Flink, Apache Spark and Spark Streaming, and IBM InfoSphere Streams, as well as relevant packages that support large-scale analysis in Python and R.
- Data pipelines: several years’ experience building data pipelines using components such as queues (Kafka, RabbitMQ), APIs (Python Flask), streaming connections (socket), distributed in-memory cache stores (Redis), and ingestion engines (Logstash).
- Visualisation: extensive experience in both using and developing visualisation tools (Bokeh, Matplotlib, R ggplot, R Shiny, D3.js, Kibana, Grafana, Splunk Dashboards) for advanced reporting, interactive visualisation and real-time dashboards.
- Open-source: Experience both contributing to and maintaining open-source software.
- The H-measure for measuring classification performance: www.hmeasure.org
- Dynamic Trees for Classification and Regression, CRAN package