SEDAL aims to contribute novel machine learning algorithms along these lines:
- Advanced remote sensing data and EO time series processing and statistical characterization
- Advanced regression methods, involving kernel methods, Gaussian processes, random forests, and deep nets
- Efficient large-scale model implementations
- Uncertainty quantification and propagation
- Physically-based models, emulation of RTMs, and design of physically-meaningful priors in machine learning regression
- Knowledge discovery and structure learning from empirical EO data
- (Conditional) dependence estimation of EO variables and observations
- Graphical models, structure learning, Bayesian networks and causal inference from empirical EO data
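To make the pairing of kernel regression and uncertainty quantification in the list above more concrete, here is a minimal sketch of Gaussian process regression that returns a predictive standard deviation alongside each estimate. It is an illustration only, not SEDAL's implementation: the synthetic sine data, the squared-exponential kernel, the hyperparameter values, and the helper names `rbf_kernel` and `gp_predict` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise_var=0.01, lengthscale=1.0):
    """Return the GP posterior mean and standard deviation at x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale)
    K_ss = rbf_kernel(x_test, x_test, lengthscale)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)     # posterior variance of the latent function
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic stand-in for a retrieval problem (assumed data, not EO measurements).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 25)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)

# One test point inside the training range and one far outside it.
x_test = np.array([2.5, 10.0])
mean, std = gp_predict(x_train, y_train, x_test)
print("mean:", mean)
print("std: ", std)
```

The predictive standard deviation grows for the test point far from the training inputs, which is the kind of per-estimate uncertainty the list refers to. The Cholesky solve scales cubically with the number of training samples, which is one reason efficient large-scale implementations appear as a separate item.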
The target EO applications are:
- Improved retrieval (regression) algorithms at local, regional, and global planetary scales
- Structure inference and relevance determination of essential climate variables and observations
- Detection of climate change, anomalies, and extremes, and causal attribution
Activities are organized in three major tasks: two theoretical tasks guided by an application-oriented one dealing with relevant EO problems.
- Workpackage 1. Improving statistical regression models. We will develop new kernel regression models that address the shortcomings identified earlier, with four goals: improving model accuracy by encoding prior knowledge, quantifying the uncertainty of the estimates, attaining self-explanatory models, and reducing the computational cost. We will encode prior knowledge about the problem by designing kernels and neural structures that are able to: (i) incorporate explicit physical restrictions based on warping functions; (ii) combine heterogeneous information for spatial-spectral, multi-temporal, multi-angular, and multi-sensor data processing; (iii) include the information of unlabeled samples via semisupervised covariances; (iv) predict multiple variables simultaneously in order to constrain predictions to sensible levels; (v) account for signal and noise characteristics; (vi) deploy efficient (sparse and divide-and-conquer) kernel regression models; and (vii) discover knowledge in kernel models.
- Workpackage 2. Learning graphical models and causal inference. We will exploit the results and algorithms of the previous task to develop methods that can learn nonlinear data dependencies and possibly infer causal relations. We will propose (i) new conditional independence estimates, (ii) constraint-based (physically-based) structure learning, (iii) dynamic graphical models, and (iv) causal inference models, mostly based on the detection of PDF asymmetries and on regression-based methods. Models and inferred structures will be tested in purely non-interventional settings, as well as through intervention analyses in controlled situations, which might reveal the presence of hidden causal variables and relationships, and by quantifying the impact of prior (physical) knowledge.
- Workpackage 3. Case study: From local to global scales in EO variable learning. We will focus on the relevant applications of (a) learning statistical predictive models for key biophysical variables, (b) extracting knowledge from the models and the nonlinear hierarchical data representations, and (c) inferring causal variable relations from empirical data, both at local and global scales.
- Modeling biophysical parameters at the local scale, primarily focusing on chlorophyll content, fluorescence, biomass, LAI, and fAPAR. The main scientific questions to be addressed are the study and quantification of uncertainty, the inclusion of prior physical knowledge to constrain model flexibility, and the analysis of dependence and causal relations between variables.
- Generating global flux products by upscaling FLUXNET eddy covariance observations using an array of remote sensing data. We will evaluate the developed regression algorithms and the relative relevance of explanatory variables, and learn graph dependencies between remote sensing variables and carbon (e.g. total ecosystem respiration, net ecosystem exchange), energy (e.g. latent heat and heat radiation), and water (e.g. evapotranspiration) fluxes. We will also study statistical relations between global products of essential climate variables over land (biomass, LAI, and the fraction of absorbed photosynthetically active radiation, fAPAR).
- Both case studies will involve important efforts in open data harmonization (formats, centralized database server, access/sharing protocols, documentation, etc.) and open code generation (toolbox releases, products, models, etc.).
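
As a rough illustration of the regression-based causal inference mentioned in Workpackage 2, the sketch below applies one well-known approach of that family, the additive noise model: regress each variable on the other and prefer the direction whose residuals look more independent of the input. This is a swapped-in textbook technique, not necessarily the method the project will develop. The synthetic data, the kernel ridge regressor, the biased HSIC dependence estimate, the fixed bandwidth and regularization values, and all function names are assumptions made for this example.

```python
import numpy as np

def standardize(v):
    """Center to zero mean and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()

def rbf_gram(v, lengthscale=1.0):
    """RBF Gram matrix of a 1-D sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def kernel_ridge_residuals(inputs, targets, reg=0.1):
    """In-sample residuals of a kernel ridge regression of targets on inputs."""
    K = rbf_gram(inputs)
    weights = np.linalg.solve(K + reg * np.eye(len(inputs)), targets)
    return targets - K @ weights

def hsic(a, b):
    """Biased HSIC estimate; larger values suggest stronger dependence."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n**2

def anm_score(cause, effect):
    """Dependence between the candidate cause and the regression residuals."""
    resid = kernel_ridge_residuals(cause, effect)
    return hsic(cause, standardize(resid))

# Synthetic pair with a known direction x -> y (assumed data, not EO variables).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-2.0, 2.0, n)
y = x**3 + rng.standard_normal(n)
xs, ys = standardize(x), standardize(y)

score_xy = anm_score(xs, ys)   # hypothesis: x causes y
score_yx = anm_score(ys, xs)   # hypothesis: y causes x
direction = "x -> y" if score_xy < score_yx else "y -> x"
print(score_xy, score_yx, direction)
```

The direction with the smaller score is taken as the more plausible one. A practical version would replace the raw score comparison with a significance test, choose kernel bandwidths from the data, and account for possible hidden common causes, which the text above names as an open question.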