Abstract:
NeuroMorpho.Org is a scientific database of digital reconstructions of neurons and glia. It
serves as a large-scale repository of a wide range of morphological information that can be
accessed all over the world, thus encouraging data sharing and communication amongst
the international neuroscience community. Curating such data helps researchers mine and
understand the relationships among dendritic and axonal branching, glial
processes, brain connectivity, and synaptic signaling. Metadata refers to information about
the data. NeuroMorpho.Org specifically provides metadata for each curated cell, including
details on the animal subject, brain region, cell type, and experimental protocol. This
information is extracted from the corresponding peer-reviewed publications that describe
the reconstructed neurons or glia.
Manual metadata extraction and annotation can be labor-intensive, time-consuming,
and error-prone. Machine learning can address these challenges by facilitating, and
eventually automating, the identification of relevant information. To be effective,
machine learning tools must be trained on a corpus of existing annotations. Here we
deployed a two-pronged approach for analyzing
NeuroMorpho.Org metadata to provide a useful training set to aid the ongoing development
of semi-automated annotation. First, we investigated our records of metadata in order to
deduce any systematic patterns that may reflect neurobiological rules or statistical trends
and could be expressed as artificial intelligence heuristics. Specifically, we used a
frequency-based data mining algorithm known as “Apriori”, which computes frequent
itemsets of neuronal and glial metadata and derives association rules from them.
Second, we used machine learning tools to extract key metadata via an approach
known as “named entity recognition”, or NER, so that metadata acquisition can be
automated. This requires several rounds of manual annotation
from which the algorithm can learn, making automated annotation as precise as possible.
Altogether, our investigation can help train algorithms for
robust metadata annotation, which could support the expansion and enhancement of
NeuroMorpho.Org.
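
The frequent-itemset mining described above can be illustrated with a minimal Python sketch of the Apriori procedure. This is not the implementation used in this work. The records, the "field=value" item encoding, the function names, and the thresholds are hypothetical and chosen only to show the join, prune, and rule-derivation steps.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                # Confidence = support(itemset) / support(antecedent).
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical toy records; real NeuroMorpho.Org metadata has many more fields.
records = [
    {"species=rat", "region=hippocampus", "cell=pyramidal"},
    {"species=rat", "region=hippocampus", "cell=pyramidal"},
    {"species=mouse", "region=neocortex", "cell=pyramidal"},
    {"species=mouse", "region=neocortex", "cell=interneuron"},
    {"species=rat", "region=hippocampus", "cell=granule"},
]

frequent = apriori(records, min_support=0.4)
found = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

In this toy data, a rule such as "region=hippocampus implies species=rat" reflects only how the example records were constructed. Rules mined from real metadata would need expert review before being used as annotation heuristics.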