Artificial Intelligence and Intelligent Search Techniques


          Włodzisław Duch  


Computational Intelligence Laboratory,
Department of Informatics,
Nicolaus Copernicus University,

Grudziądzka 5, 87-100 Toruń, Poland.

e-mail: id wduch, at the server fizyka.umk.pl.

WWW: https://www.is.umk.pl/~duch

 


Computational Intelligence (CI): a branch of science that tries to solve problems which are effectively non-algorithmic (such as the semantic retrieval problem).

Artificial Intelligence (AI): a branch of CI stressing the importance of knowledge, the representation of knowledge, and rule-based understanding.

Other fields relevant to CI:

Biological inspirations: neural networks, evolutionary programming, genetic algorithms.
Logic: fuzzy logic, rough logic, possibility theory
Mathematics: multivariate statistics, classification theory, clustering, optimization theory
Pattern recognition: computer vision, speech recognition
Engineering: robotics, control theory, biocybernetics
Computer science: theory of formal grammars, automata theory, machine learning

"Soft computing" = {neural networks, evolutionary programming, fuzzy logic}

Useful collections of links:

AI and machine learning: www.is.umk.pl/~duch/ai-ml.html
Statistics, neural networks, neurobiology: www.is.umk.pl/~duch/neural.html
Cognitive Science: www.is.umk.pl/~duch/cognitive.html
Software for statistics, neural networks, machine learning: www.is.umk.pl/~duch/software.html

Natural Language Processing (NLP)

Understanding the meaning of sentences, learning from existing texts, dialogue with humans, machine translation.

Problems with meaning:

"In managing the DoD there are many unexpected communications problems. For instance, when the Marines are ordered to "secure a building," they form a landing party and assault it. On the other hand, the same instructions will lead the Army to occupy the building with a troop of infantry, and the Navy will characteristically respond by sending a yeoman to assure that the building lights are turned out. When the Air Force acts on these instructions, what results is a three year lease with option to purchase."

-- James Schlesinger (former Secretary of Defense, USA).

Basic concepts in NLP:

Syntax, grammar, parsing, and semantics. Meaning refers to background knowledge. What is knowledge?

Knowledge representation, linguistic (verbal) knowledge structures.

Knowledge as rules:

IF good food THEN salivate

Are we using rules? In the army all the time ...
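
A minimal sketch, in Python, of how such IF-THEN rules can be matched by forward chaining; the facts and rule set below are illustrative only, not from any particular system:

# Forward chaining over simple IF-THEN rules (illustrative facts and rules).
rules = [
    ({"good food"}, "salivate"),             # IF good food THEN salivate
    ({"hungry", "salivate"}, "eat"),         # IF hungry AND salivating THEN eat
]

def forward_chain(facts, rules):
    """Fire every rule whose conditions hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"good food", "hungry"}, rules))
# -> {'good food', 'hungry', 'salivate', 'eat'} (set order may vary)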

Knowledge as semantic networks

Each node of the network is a word; the connections (arcs) may signify relations.

[Figure: an example semantic network]
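
In code, a semantic network is just a labelled graph. The nodes and relations below (canary, bird, animal) are hypothetical examples in the classic semantic-network style:

# A semantic network as (node, relation, node) triples.
network = [
    ("canary", "is-a", "bird"),
    ("bird", "is-a", "animal"),
    ("bird", "can", "fly"),
    ("canary", "has-color", "yellow"),
]

def related(node, relation):
    """Follow arcs labelled `relation` leaving `node`."""
    return [b for (a, r, b) in network if a == node and r == relation]

def superclasses(node):
    """Walk the 'is-a' arcs upward - the basis of property inheritance."""
    chain = []
    while related(node, "is-a"):
        node = related(node, "is-a")[0]
        chain.append(node)
    return chain

print(superclasses("canary"))   # ['bird', 'animal']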

 

Knowledge as frames

Generic DOG Frame

Self: an ANIMAL; a PET

Breed: ?

Owner: a PERSON (if-Needed: find a PERSON with pet=myself)

Name: a PROPER NAME (DEFAULT=Rover)

DOG_NEXT_DOOR Frame

Self: a DOG

Breed: mutt

Owner: Jimmy

Name: Fido
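
The DOG frames above translate almost directly into code; this is one plausible encoding (the Frame class and its slot-lookup rules are an assumption for illustration, not a standard API):

# Frames with inheritance, default values, and if-needed procedures.
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        if slot in self.slots:
            value = self.slots[slot]
            return value() if callable(value) else value   # if-needed procedure
        if self.parent:                                    # inherit from parent frame
            return self.parent.get(slot)
        return None

generic_dog = Frame("DOG",
                    self_kind=["ANIMAL", "PET"],
                    name="Rover",                                 # DEFAULT=Rover
                    owner=lambda: "a PERSON with pet=myself")     # if-Needed
dog_next_door = Frame("DOG_NEXT_DOOR", parent=generic_dog,
                      breed="mutt", owner="Jimmy", name="Fido")

print(dog_next_door.get("name"))       # Fido (overrides the default Rover)
print(dog_next_door.get("self_kind"))  # ['ANIMAL', 'PET'] - inherited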

 

Knowledge as scripts 

Stereotypical stories: restaurants, accidents, business. A toy sketch follows below.
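
A script can be coded as an ordered list of scenes plus role slots; this toy restaurant script (roles and scenes invented here) matches the stereotypic-story idea:

# A script: stereotyped scene sequence with role bindings.
restaurant_script = {
    "roles": {"customer": None, "waiter": None, "food": None},
    "scenes": ["enter", "order", "eat", "pay", "leave"],
}

def instantiate(script, **bindings):
    """Fill the role slots of a generic script with a concrete story."""
    story = {"roles": dict(script["roles"]), "scenes": list(script["scenes"])}
    story["roles"].update(bindings)
    return story

story = instantiate(restaurant_script, customer="John", food="lobster")
print(story["roles"])  # {'customer': 'John', 'waiter': None, 'food': 'lobster'}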

There are many other knowledge representation schemes.
A good part of AI is knowledge engineering.


Mind and concept spaces 

How to show similarity relations between words?
Psychologists: semantic distance estimated from word associations or reaction times.

How do we do it with our brains? Neural networks.
A vector description may be used instead of neural activations - perhaps about 300 dimensions are sufficient (as Latent Semantic Analysis indicates).
High similarity of symbols or concepts <=> proximity in the concept space (see the sketch below).
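
A minimal sketch of "close in the concept space": words become vectors and similarity is the cosine of the angle between them. The 4-dimensional vectors below are invented for readability; LSA-style vectors would have a few hundred components:

import numpy as np

# Toy concept vectors; real LSA vectors would have ~300 components.
concepts = {
    "dog": np.array([0.9, 0.8, 0.1, 0.0]),
    "cat": np.array([0.8, 0.9, 0.2, 0.0]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(u, v):
    """Similarity = cosine of the angle between two concept vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(concepts["dog"], concepts["cat"]))  # high: nearby concepts
print(cosine(concepts["dog"], concepts["car"]))  # low: distant concepts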

"Platonic mind" model - a few pictures

Semantic maps

Example: a self-organizing map (SOM) of oils from Italy. [Figure: italy.eps]
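
Such maps are built with Kohonen's self-organizing map algorithm; a bare-bones training loop looks roughly as follows (numpy only; grid size, learning rate, and neighbourhood schedule are arbitrary choices for illustration):

import numpy as np

def train_som(data, grid=(7, 7), epochs=20, lr0=0.5, sigma0=2.0):
    """Bare-bones Kohonen SOM: pull the winning unit and its grid
    neighbours toward each input vector."""
    rng = np.random.default_rng(0)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                 # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5     # shrinking neighbourhood
        for x in rng.permutation(data):
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(dists.argmin(), dists.shape)  # winner
            d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
            nbh = np.exp(-d2 / (2 * sigma ** 2))[..., None]      # Gaussian window
            weights += lr * nbh * (x - weights)
    return weights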

 

[Figure: the DISCERN architecture]

The DISCERN architecture (performance configuration). The model consists of parsing, generating, question answering, and memory subsystems, two modules each. A dark square indicates a memory module, a light square indicates a processing module. The lines indicate pathways carrying distributed word, sentence, and story representations during the performance phase of the system. The modules are trained separately with compatible I/O data.

[Figure: the lexicon]

The lexicon. The lexical input symbol JOHN is translated into the semantic representation of the concept John. The representations are vectors of gray-scale values between 0.0 and 1.0, stored in the weights of the units. The size of a unit on the map indicates how strongly it responds. Only a small part of each map, and only a few of the strongest associative connections of the lexical unit JOHN, are shown in this figure.

[Figure: the FGREP module]

The FGREP-module. At each I/O presentation, the representations at the input layer are modified according to the backpropagation error signal, and replace the old representations in the lexicon. In the case of sequential input or output, the hidden layer pattern is saved after each step in the sequence, and used as input to the hidden layer during the next step, together with the actual input.
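
The FGREP step can be shown in miniature: a single linear layer with squared error, where the error signal is propagated one step further, into the word's representation itself, which is then written back to the lexicon. Dimensions and data below are made up, and the real module is a full backpropagation network:

import numpy as np

# FGREP in miniature: the input representation is itself a trainable pattern.
rng = np.random.default_rng(0)
lexicon = {"dog": rng.random(4)}     # hypothetical 4-component word vector
W = rng.random((4, 4))               # one linear layer, squared-error loss
target = rng.random(4)               # desired output pattern (made up)

lr = 0.1
for _ in range(100):
    rep = lexicon["dog"]
    err = W @ rep - target
    grad_rep = W.T @ err                  # error signal reaching the input layer
    W -= lr * np.outer(err, rep)          # ordinary weight update
    # FGREP step: update the representation and put it back in the lexicon,
    # keeping values in the gray-scale range [0, 1].
    lexicon["dog"] = np.clip(rep - lr * grad_rep, 0.0, 1.0)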


[Figure: hierarchical feature map classification of script-based stories]

The hierarchical feature map classification of script-based stories. Labels indicate the maximally responding unit for the different scripts and tracks. This particular input story representation is classified as an instance of the restaurant script (top level) and fancy-restaurant track (middle level), with role bindings customer=John, food=lobster, restaurant=MaMaison, tip=big (i.e., unit JLMB, bottom level). Before passing on the originally 84-component representation vector to the next level, the REST-unit removes those 22 components that do not vary across the different restaurant stories, and the FANCY-unit removes those 44 components of the remaining vector that do not vary across the different fancy-restaurant stories. At the bottom level, only an 18 component vector representing the role bindings remains to be mapped. Compression is determined automatically based on the variance in the components, and varies slightly depending on the script and the track.
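
The variance-based compression described in this caption can be sketched in a few lines; the threshold and the random data here are arbitrary:

import numpy as np

def compress(vectors, threshold=1e-3):
    """Drop the components that do not vary across the given story vectors."""
    vectors = np.asarray(vectors)
    varying = vectors.var(axis=0) > threshold    # per-component variance
    return vectors[:, varying]

stories = np.random.default_rng(0).random((10, 84))  # 84-component vectors
stories[:, :22] = 0.5                                # 22 components held constant
print(compress(stories).shape)                       # (10, 62)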

[Figure: lexicon propagation]

 

 

Lexicon propagation. The orthographic input symbol DOG is translated into the semantic concept dog in this example. The representations are vectors of gray-scale values between 0 and 1, stored in the weights of the feature map units. The size of a unit on the map indicates how strongly it responds. Only a few of the strongest associative connections of the orthographic input unit DOG (and only that unit) are shown.

[Figure: the training data for the lexicon]

The training data for the lexicon. Orthographic representations are blurred bitmaps of the orthographic words and phonological representations consist of concatenations of phoneme representations. Concept representations were developed by FGREP in the case-role assignment task and stand for distinct meanings. Gray-scale boxes indicate component values between 0 and 1. The connections depict the mapping between the symbols and their meanings. Many concepts map to several synonymous lexical symbols, and the homonymous symbols CHICKEN and BAT map to two distinct concepts each. The orthographic and phonological symbols correspond one-to-one to each other in this data.

[Figure: the orthographic, phonological, and semantic maps]

The orthographic, phonological, and semantic maps. The input and output maps in each modality have the same order, shown here only once in (a) and (b). (a) Orthographic map. Each unit in the 9 × 9 network is represented by a box, and the labels indicate the image unit for each symbol representation. The map is divided into major subareas according to word length. (b) Phonological map. The labels indicate the images for each phonological word representation. Again, word length is the major ordering factor. (c) Semantic map. The labels on this 7 × 7 map indicate the maximally responding unit for each concept representation. The map is organized according to semantic categories (table 5).

Example of application in information management:
WebSOM Project - 1 million documents categorized!

WebSOM root map, and a second-level query


DARPA intelligent search projects.

InterSpace and
MedSpace projects: cancer gene and Medline examples.

Taxonomy and automatic classification projects.


Beyond search:

Automatic summarization of news and general texts - difficult, but ...

- see Inxight: https://www.inxight.com/
- see the AltaVista Discovery project

Data Mining

Find a short logical summary of the database, or show the most important relationships in the data.

Examples: the Iris flowers and mushroom data (a rule-extraction sketch follows below).
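
As a sketch of what such a "short logical summary" can look like, a depth-limited decision tree over the Iris data prints as nested IF-THEN rules. This uses scikit-learn, which is a tooling assumption, not part of the original lecture:

# Assumes scikit-learn is installed; not part of the original lecture.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# The fitted tree, rendered as nested IF-THEN conditions on the four features.
print(export_text(tree, feature_names=list(iris.feature_names)))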