Page 1 of results: 4531 digital items found in 0.022 seconds

Extração de informação de artigos científicos: uma abordagem baseada em indução de regras de etiquetagem; Information extraction from scientific articles: an approach based on induction of tagging rules

Álvarez, Alberto Cáceres
Source: Biblioteca Digital de Teses e Dissertações da USP Publisher: Biblioteca Digital de Teses e Dissertações da USP
Type: Master's thesis Format: application/pdf
Published on 08/05/2007 Portuguese
Search relevance
67.41994%
This work is part of the project of a tool called FIP (Ferramenta Inteligente de Apoio à Pesquisa, an intelligent research-support tool) for the retrieval, organization, and mining of large document collections. In the context of the FIP tool, several techniques from Information Retrieval, Data Mining, and Information Visualization are used, and in particular Information Extraction techniques, the focus of this work. Information Extraction systems operate on unstructured data and aim to locate specific pieces of information in a document or document collection, extract them, and structure them so as to make them easier to use. The specific objective of this dissertation is to automatically induce a set of rules for extracting information from scientific articles. The proposed extraction system first analyzes and extracts information present in the body of the articles (title, authors, affiliation, abstract, keywords) and then focuses on extracting information from their bibliographic references. The proposed approach to automatically extracting information from references is new, based on mapping the information extraction problem onto the part-of-speech tagging problem. As the final product of the extraction process...
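The mapping described above, from reference parsing to a tagging problem, can be pictured with a toy example. The sketch below labels each token of a citation string with a bibliographic field, which is exactly the shape of a part-of-speech tagging task; the hand-written rules are hypothetical illustrations, not the dissertation's automatically induced ones.

```python
# Toy illustration of reference extraction cast as token tagging.
# The rules are hypothetical; the dissertation induces its rules automatically.
import re

def tag_reference(tokens):
    """Assign a bibliographic field label to each token of a reference."""
    tags = []
    for tok in tokens:
        if re.fullmatch(r"\(?(19|20)\d{2}\)?\.?,?", tok):
            tags.append("YEAR")
        elif re.fullmatch(r"[A-Z][a-z]*\.?,?", tok) and "YEAR" not in tags:
            tags.append("AUTHOR")  # capitalized tokens before the year
        else:
            tags.append("TITLE")   # default field after the year
    return list(zip(tokens, tags))

ref = "Alvarez, A. (2007). Information extraction from scientific articles."
print(tag_reference(ref.split()))
```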

Real-time information extraction of an electric vehicle

Ferreira, João C.; Monteiro, Vítor; Afonso, João L.
Source: Universidade do Minho Publisher: Universidade do Minho
Type: Conference paper or conference object
Published on 11/07/2012 Portuguese
Search relevance
66.954565%
This paper presents the development of a project to extract, in real time, information related to an Electric Vehicle (EV). The project was designed to extract data from an EV battery charging device developed at the University of Minho, and from an EV prototype, the VEECO (Veículo Eléctrico ECOlógico – Ecologic Electric Vehicle), developed in a cooperation project between ISEL (Lisbon Superior Institute of Engineering) and the Portuguese company VE. The main goal of the project is to collect and transmit the extracted data in order to inform the EV driver about the performance and real behavior of the EV. To this end, an open interface is created to manage, in real time, the main data related to the EV, such as the batteries' SoC (State-of-Charge), the EV speed, and internal temperatures (of the batteries, motor, and power electronics inverter), as well as to control the start and stop of the battery charging process and to optimize the charging program (defining the best algorithm to preserve the batteries' lifespan). The interface also controls the discharging process of the batteries, making it possible to deliver part of the energy stored in the batteries back to the electrical power grid...
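As a sketch of the kind of record such an open interface might manage, the quantities the abstract lists could be grouped as below; the field names are our assumption, not the project's actual API.

```python
# Hypothetical shape of one real-time telemetry sample; field names are
# illustrative assumptions, not the interface developed in the project.
from dataclasses import dataclass

@dataclass
class EVTelemetry:
    soc_percent: float       # batteries' State-of-Charge (SoC)
    speed_kmh: float         # EV speed
    battery_temp_c: float    # internal temperatures monitored by the interface
    motor_temp_c: float
    inverter_temp_c: float
    charging: bool           # start/stop state of the charging process
```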

Automatic preservation watch using information extraction on the Web : a case study on semantic extraction of natural language for digital preservation

Faria, Luís; Akbik, Alan; Sierman, Barbara; Ras, Marcel; Ferreira, Miguel; Ramalho, José Carlos
Source: Biblioteca Nacional de Portugal Publisher: Biblioteca Nacional de Portugal
Type: Conference paper or conference object
Published in 09/2013 Portuguese
Search relevance
67.20677%
The ability to recognize when digital content is becoming endangered is essential for maintaining long-term, continuous and authentic access to digital assets. To achieve this ability, knowledge about aspects of the world that might hinder the preservation of content is needed. However, the processes of gathering, managing and reasoning on knowledge can become manually infeasible as the volume and heterogeneity of content increase, multiplying the aspects to monitor. Automation of these processes is possible [11,21], but its usefulness is limited by the data it is able to gather. Up to now, automatic digital preservation processes have been restricted to knowledge expressed in a machine-understandable language, ignoring a plethora of data expressed in natural language, such as the DPC Technology Watch Reports, which could greatly contribute to the completeness and freshness of data about aspects of the world related to digital preservation. This paper presents a real case scenario from the National Library of the Netherlands, where the monitoring of publishers and journals is needed. This knowledge is mostly represented in natural language on the Web sites of the publishers and is, therefore, difficult to monitor automatically. In this paper...

Mining biomedical information from scientific literature; Mineração de informação biomédica a partir de literatura científica

Campos, David Emmanuel Marques
Source: Universidade de Aveiro Publisher: Universidade de Aveiro
Type: Doctoral thesis
Portuguese
Search relevance
57.076465%
The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the text mining task that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition, a crucial initial task, with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize the extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. This approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases...

Algorithms for information extraction and signal annotation on long-term biosignals using clustering techniques

Abreu, Rodolfo Telo Martins de
Source: Faculdade de Ciências e Tecnologia Publisher: Faculdade de Ciências e Tecnologia
Type: Master's thesis
Published in 2012 Portuguese
Search relevance
57.097275%
Dissertation presented for the degree of Master in Biomedical Engineering; One of the biggest challenges when analysing data is to extract information from it, especially when dealing with very large data sets, which bring a new set of barriers to overcome. The extracted information can be used to aid physicians in their diagnosis, since biosignals often carry vital information about the subjects. In this research work, we present a signal-independent algorithm with two main goals: to perform event detection in biosignals and, from those events, to extract information using a set of distance measures, which are then used as input to a parallel version of the k-means clustering algorithm. The first goal is achieved using two different approaches. Events can be found either through peak detection, using an adaptive threshold defined as the signal's root mean square (RMS), or through morphological analysis, based on the computation of the signal's meanwave. The final goal is achieved by dividing the distance measures into n parts and performing k-means on each individually. In order to improve speed, parallel computing techniques were applied. For this study, a set of signals of different types was acquired and annotated by our algorithm. By visual inspection...
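A minimal sketch of the first detection approach mentioned above, peaks over an adaptive threshold set to the signal's RMS, could look as follows; this is our illustration under simplifying assumptions, not the authors' implementation.

```python
# Event detection via peaks above an adaptive RMS threshold (illustrative).
import numpy as np

def detect_events(signal: np.ndarray) -> np.ndarray:
    """Return indices of local maxima that exceed the signal's RMS."""
    rms = np.sqrt(np.mean(signal ** 2))              # adaptive threshold
    is_peak = (signal[1:-1] > signal[:-2]) & (signal[1:-1] > signal[2:])
    above = signal[1:-1] > rms
    return np.where(is_peak & above)[0] + 1          # shift back to full-signal indices

# Synthetic test signal: a sinusoid with additive noise
t = np.linspace(0.0, 10.0, 2000)
sig = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)
print(detect_events(sig)[:10])
```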

Automated Information Extraction to Support Biomedical Decision Model Construction: A Preliminary Design

Li, Xiaoli; Leong, Tze Yun
Source: MIT - Massachusetts Institute of Technology Publisher: MIT - Massachusetts Institute of Technology
Type: Journal article Format: 125335 bytes; application/pdf
Portuguese
Search relevance
67.104917%
We propose an information extraction framework to support automated construction of decision models in biomedicine. Our proposed technique classifies text-based documents from a large biomedical literature repository, e.g., MEDLINE, into predefined categories, and identifies important keywords for each category based on their discriminative power. Relevant documents for each category are retrieved based on the keywords, and a classification algorithm is developed based on machine learning techniques to build the final classifier. We apply the HITS algorithm to select the authoritative and typical documents within a category, and construct templates in the form of Bayesian networks. Data mining and information extraction techniques are then applied to extract the necessary semantic knowledge to fill in the templates to construct the final decision models.; Singapore-MIT Alliance (SMA)
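The HITS step mentioned above admits a compact sketch. The power iteration below is the standard algorithm run on a toy citation graph; the adjacency matrix and the fixed iteration count are our assumptions, not the paper's setup.

```python
# Standard HITS power iteration on a small directed graph (illustrative).
import numpy as np

def hits(adj: np.ndarray, iters: int = 50):
    """adj[i, j] = 1 if document i links to (cites) document j."""
    hubs = np.ones(adj.shape[0])
    auth = np.ones(adj.shape[0])
    for _ in range(iters):
        auth = adj.T @ hubs              # good authorities are cited by good hubs
        auth /= np.linalg.norm(auth)
        hubs = adj @ auth                # good hubs cite good authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, auth

toy_graph = np.array([[0, 1, 1],
                      [0, 0, 1],
                      [1, 0, 0]], dtype=float)
print(hits(toy_graph))
```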

Learning the Structure of High-Dimensional Manifolds with Self-Organizing Maps for Accurate Information Extraction

Zhang, Lili
Source: Rice University Publisher: Rice University
Portuguese
Search relevance
67.010825%
This work aims to improve the capability of accurate information extraction from high-dimensional data, with a specific neural learning paradigm, the Self-Organizing Map (SOM). The SOM is an unsupervised learning algorithm that can faithfully sense the manifold structure and support supervised learning of relevant information from the data. Yet open problems regarding SOM learning exist. We focus on the following two issues. 1. Evaluation of topology preservation. Topology preservation is essential for SOMs in faithful representation of manifold structure. However, in reality, topology violations are not unusual, especially when the data have complicated structure. Measures capable of accurately quantifying and informatively expressing topology violations are lacking. One contribution of this work is a new measure, the Weighted Differential Topographic Function (WDTF), which differentiates an existing measure, the Topographic Function (TF), and incorporates detailed data distribution as an importance weighting of violations to distinguish severe violations from insignificant ones. Another contribution is an interactive visual tool, TopoView, which facilitates the visual inspection of violations on the SOM lattice. We show the effectiveness of the combined use of the WDTF and TopoView through a simple two-dimensional data set and two hyperspectral images. 2. Learning multiple latent variables from high-dimensional data. We use an existing two-layer SOM-hybrid supervised architecture...
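For reference, the discussion above presupposes the standard SOM learning rule; in the conventional notation (ours, not necessarily the thesis's), the winner c(x) and the update of each prototype w_i are:

```latex
% Standard SOM learning rule (conventional notation):
c(\mathbf{x}) = \arg\min_{j} \lVert \mathbf{x}(t) - \mathbf{w}_j(t) \rVert, \qquad
\mathbf{w}_i(t+1) = \mathbf{w}_i(t)
  + \alpha(t)\, h_{c(\mathbf{x}),\,i}(t)\,\bigl(\mathbf{x}(t) - \mathbf{w}_i(t)\bigr)
```

Here \alpha(t) is the learning rate and h_{c(\mathbf{x}),i} the lattice neighborhood function; topology preservation concerns how faithfully neighborhoods on the lattice reflect neighborhoods on the data manifold.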

Application of information extraction techniques to pharmacological domain : extracting drug-drug interactions

Segura Bedmar, Isabel
Source: Universidad Carlos III de Madrid Publisher: Universidad Carlos III de Madrid
Type: info:eu-repo/semantics/doctoralThesis Format: application/pdf
Portuguese
Search relevance
57.170557%
A drug interaction occurs when the effects of one drug are modified by the presence of another. The consequences can be harmful if the interaction increases the drug's toxicity or diminishes its effect, and in the worst cases may even cause the patient's death. Drug interactions are not only a serious problem for patient safety; they also entail a significant increase in medical costs. Health professionals currently have at their disposal several interaction databases that help avoid potential interactions when prescribing a given treatment; however, these databases are not complete. For this reason, physicians and pharmacists are forced to review a large number of scientific articles and drug safety reports to keep up with everything published on the subject. Unfortunately, the sheer volume of information leaves these professionals overwhelmed by the avalanche. The development of automatic methods for collecting, maintaining, and interpreting all this information is crucial to achieving a real improvement in the early detection of drug-drug interactions. Therefore...

Using a shallow linguistic kernel for drug-drug interaction extraction

Segura Bedmar, Isabel; Martínez, Paloma; Pablo-Sánchez, César de
Source: Elsevier Publisher: Elsevier
Type: info:eu-repo/semantics/acceptedVersion; info:eu-repo/semantics/article
Published in 10/2011 Portuguese
Search relevance
57.129043%
A drug–drug interaction (DDI) occurs when one drug influences the level or activity of another drug. Information Extraction (IE) techniques can provide health care professionals with an interesting way to reduce the time spent reviewing the literature for potential drug–drug interactions. Nevertheless, no approach had previously been proposed for the problem of extracting DDIs from biomedical texts. In this article, we study whether a machine learning-based method is appropriate for DDI extraction in biomedical texts and whether the results it provides are superior to those obtained from our previously proposed pattern-based approach [1]. The method proposed here for DDI extraction is based on a supervised machine learning technique, more specifically, the shallow linguistic kernel proposed by Giuliano et al. (2006) [2]. Since no benchmark corpus was available to evaluate our approach to DDI extraction, we created the first such corpus, DrugDDI, annotated with 3169 DDIs. We performed several experiments varying the configuration parameters of the shallow linguistic kernel. The model that maximizes the F-measure was evaluated on the test data of the DrugDDI corpus, achieving a precision of 51.03%, a recall of 72.82% and an F-measure of 60.01%. To the best of our knowledge...
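The three reported scores are mutually consistent under the standard balanced F-measure, as a quick check shows:

```latex
F_1 = \frac{2PR}{P+R}
    = \frac{2 \times 0.5103 \times 0.7282}{0.5103 + 0.7282}
    \approx 0.6001 \quad (60.01\%)
```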

SES : sistema de extração semântica de informações; System of semantic extraction of information

Scarinci, Rui Gureghian
Source: Universidade Federal do Rio Grande do Sul Publisher: Universidade Federal do Rio Grande do Sul
Type: Dissertation Format: application/pdf
Portuguese
Search relevance
57.418276%
Among the areas of computing that have developed most in recent years are those related to the growth of the Internet, the network that connects millions of users all over the world. This network gives users access to an enormous variety and quantity of information, mostly data stored in unstructured or semi-structured form. However, this volume and heterogeneity end up making it difficult to handle the data retrieved from the Internet, and this problem motivated the present work. Even with the help of the various Internet search tools, a user researching specific subjects still has to handle a large amount of information on their personal computer, because these tools do not perform a detailed selection process; that is, much of the retrieved data is of no interest to the user. There is also a great diversity of subjects and of standards for transferring and storing information, creating highly heterogeneous environments for searching and querying data. This heterogeneity means that a network user must know a whole set of standards and tools in order to obtain the desired information. The greatest handling difficulty, however, lies in the unstructured or poorly structured storage formats...

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction

Deleger, Louise; Molnar, Katalin; Savova, Guergana; Xia, Fei; Lingren, Todd; Li, Qi; Marsolo, Keith; Jegga, Anil; Kaiser, Megan; Stoutenborough, Laura; Solti, Imre
Source: BMJ Group Publisher: BMJ Group
Type: Journal article
Portuguese
Search relevance
57.255107%
Objective: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents. Material and methods: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated ‘gold standard’. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured. Results: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (the annotators' performance was 92.15% (R)/93.95% (P) and 94.55% (R)/88.45% (P) overall, while the best system obtained 92.91% (R)/95.73% (P) on the same text). The impact of automated de-identification on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction, was minimal. Discussion and conclusion: NLP-based de-identification shows excellent performance that rivals that of human annotators. Furthermore...
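For clarity, the quoted sensitivity (R) and precision (P) follow their standard definitions against the gold standard, with TP, FP and FN the true positives, false positives and false negatives:

```latex
R = \text{sensitivity} = \frac{TP}{TP + FN}, \qquad
P = \text{precision} = \frac{TP}{TP + FP}
```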

Personalized Web Services for Web Information Extraction

Jarir, Zahi; Quafafou, Mohamed; Erradi, Mahammed
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 27/08/2011 Portuguese
Search relevance
57.2528%
The field of information extraction from the Web emerged with the growth of the Web and the multiplication of online data sources. This paper analyzes information extraction methods and presents a service-oriented approach to web information extraction that considers both web data management and extraction services. We then propose an SOA-based architecture to enhance flexibility and allow on-the-fly modification of web extraction services. An implementation of the proposed architecture is provided at the middleware level of Java Enterprise Edition (JEE) servers.

WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction

Chenthamarakshan, Vijil; Desphande, Prasad M; Krishnapuram, Raghu; Varadarajan, Ramakrishna; Stolze, Knut
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 28/06/2015 Portuguese
Search relevance
57.16865%
The visual layout of a webpage can provide valuable clues for certain types of Information Extraction (IE) tasks. In traditional rule-based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages. In contrast, we have developed a framework in which the rules can be specified directly at the layout level. This has many advantages, since the higher level of abstraction leads to simpler extraction rules that are largely independent of the source code of the page and, therefore, more robust. It can also enable the specification of new types of rules that are not otherwise possible. To the best of our knowledge, there is no general framework that allows declarative specification of information extraction rules based on spatial layout. Our framework is complementary to traditional text-based rule frameworks and allows a seamless combination of spatial layout-based rules with traditional text-based rules. We describe the algebra that enables such a system and its efficient implementation using standard relational and text indexing features of a relational database. We demonstrate the simplicity and efficiency of this system on a task involving the extraction of software system requirements from software product pages.

A Fuzzy Approach for Pertinent Information Extraction from Web Resources

Boughamoura, Radhouane; Omri, Mohamed Nazih; Youssef, Habib
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 05/06/2012 Portuguese
Search relevance
57.143027%
Recent work in machine learning for information extraction has focused on two distinct sub-problems: the conventional problem of filling template slots from natural language text, and the problem of wrapper induction, learning simple extraction procedures ("wrappers") for highly structured text such as Web pages. For suitably regular domains, existing wrapper induction algorithms can efficiently learn wrappers that are simple and highly accurate, but the regularity bias of these algorithms makes them unsuitable for most conventional information extraction tasks. This paper describes a new approach to wrapping semi-structured Web pages. The wrapper is capable of learning how to extract relevant information from Web resources on the basis of user-supplied examples. It is based on inductive learning techniques as well as fuzzy logic rules. Experimental results show that our approach achieves noticeably better precision and recall than SoftMealy, one of the most recently reported wrappers capable of wrapping semi-structured Web pages with missing attributes, multiple attributes, variant attribute permutations, exceptions, and typos.; Comment: International Journal of Computational Science - 2008

FrameNet CNL: a Knowledge Representation and Information Extraction Language

Barzdins, Guntis
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 10/06/2014 Portuguese
Search relevance
57.23429%
The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is applied to natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology, from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered to belong to FrameNet-CNL if the information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that FrameNet-CNL could eventually shape the natural language subset used for writing newswire articles.; Comment: CNL-2014 camera-ready version. The final publication is available at link.springer.com

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

Marcheggiani, Diego; Sebastiani, Fabrizio
Source: Cornell University Publisher: Cornell University
Type: Journal article
Portuguese
Search relevance
57.170557%
In the last five years there has been a flurry of work on information extraction from clinical documents, i.e., on algorithms capable of extracting, from the informal and unstructured texts that are generated during everyday clinical practice, mentions of concepts relevant to such practice. Most of this literature is about methods based on supervised learning, i.e., methods for training an information extraction system from manually annotated examples. While a lot of work has been devoted to devising learning methods that generate more and more accurate information extractors, no work has been devoted to investigating the effect of the quality of training data on the learning process. Low quality in training data often derives from the fact that the person who has annotated the data is different from the one against whose judgment the automatically annotated data must be evaluated. In this paper we test the impact of such data quality issues on the accuracy of information extraction systems as applied to the clinical domain. We do this by comparing the accuracy deriving from training data annotated by the authoritative coder (i.e., the one who has also annotated the test data, and by whose judgment we must abide), with the accuracy deriving from training data annotated by a different coder. The results indicate that...

An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Wellner, Ben; McCallum, Andrew; Peng, Fuchun; Hay, Michael
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 11/07/2012 Portuguese
Search relevance
57.04865%
Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show a significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and by using coreference to improve the accuracy of the extracted fields.; Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

Learning with Scope, with Application to Information Extraction and Classification

Blei, David; Bagnell, J Andrew; McCallum, Andrew
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published on 12/12/2012 Portuguese
Search relevance
57.104917%
In probabilistic approaches to classification and information extraction, one typically builds a statistical model of words under the assumption that future data will exhibit the same regularities as the training data. In many data sets, however, there are scope-limited features whose predictive power is only applicable to a certain subset of the data. For example, in information extraction from web pages, word formatting may be indicative of extraction category in different ways on different web pages. The difficulty with using such features is capturing and exploiting the new regularities encountered in previously unseen data. In this paper, we propose a hierarchical probabilistic model that uses both local/scope-limited features, such as word formatting, and global features, such as word content. The local regularities are modeled as an unobserved random parameter which is drawn once for each local data set. This random parameter is estimated during the inference process and then used to perform classification with both the local and global features, a procedure which is akin to automatically retuning the classifier to the local regularities on each newly encountered web page. Exact inference is intractable and we present approximations via point estimates and variational methods. Empirical results on large collections of web data demonstrate that this method significantly improves performance over traditional models of global features alone.; Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)
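Schematically (our notation, not the paper's), the hierarchy described above draws one local parameter per data set, for example per web page, and integrates it out at prediction time:

```latex
\theta_d \sim p(\theta), \qquad
p(y_{1:n_d} \mid x_{1:n_d}, d)
  = \int p(\theta_d)\, \prod_{i=1}^{n_d} p\bigl(y_i \mid x_i, \theta_d\bigr)\, \mathrm{d}\theta_d
```

The point estimates and variational methods mentioned above approximate this integral.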

Information extraction from chemical patents

Jessop, David M
Source: University of Cambridge; Department of Chemistry; Fitzwilliam College Publisher: University of Cambridge; Department of Chemistry; Fitzwilliam College
Type: Thesis; doctoral; PhD
Portuguese
Search relevance
57.112393%
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work focuses on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye, an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML), is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye: 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3...
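The Hearst-pattern step can be pictured with a toy instance of the classic "X such as Y1, Y2 and Y3" template; the regex below is our illustration, not the thesis's grammar.

```python
# Toy Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
import re

PATTERN = re.compile(
    r"(?P<hypernym>\w+(?: \w+)?) such as "
    r"(?P<hyponyms>[\w ,]+?(?: and \w+)?)(?=[.;])"
)

text = "The mixture contains alkali metals such as lithium, sodium and potassium."
match = PATTERN.search(text)
if match:
    hyponyms = re.split(r", | and ", match.group("hyponyms"))
    print(match.group("hypernym"), "->", hyponyms)
    # prints: alkali metals -> ['lithium', 'sodium', 'potassium']
```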

Optimizing Selection of Assessment Solutions for Completing Information Extraction Results

Feilmayr, Christina
Source: Centro de Investigación en Computación, IPN Publisher: Centro de Investigación en Computación, IPN
Type: Journal article Format: text/html
Published on 01/06/2013 Portuguese
Search relevance
67.30114%
Incomplete information has serious consequences in information extraction: it increases costs and leads to problems in downstream processing. This work focuses on improving the completeness of extraction results by applying judiciously selected assessment methods to information extraction, based on the principle of complementarity. Our recommendation model simplifies the selection of the assessment methods that can overcome a specific incompleteness problem. The paper also focuses on the characterization of information extraction and assessment methods, as well as on a rule-based approach that allows estimation of general processability, of profitability within the complementarity approach, and of the performance of an assessment method under evaluation.