#onenote# ELK

ELK on docker


Integrating Hadoop and Elasticsearch – Part 1 – Loading into and Querying Elasticsearch from Apache Hive

Integrating Hadoop and Elasticsearch – Part 2 – Writing to and Querying Elasticsearch from Apache Spark


A financial big-data metrics system based on Hive / ES






REST URL

Verify installation

curl 'http://localhost:9200/?pretty'

curl 'localhost:9200/_cat/indices?v'

shutdown API (Elasticsearch 1.x only; the endpoint was removed in 2.0):

curl -XPOST 'http://localhost:9200/_shutdown'

A request to Elasticsearch consists of the same parts as any HTTP request:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

The parts marked with < > above are:

VERB

The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.

PROTOCOL

Either http or https (if you have an https proxy in front of Elasticsearch).

HOST

The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.

PORT

The port running the Elasticsearch HTTP service, which defaults to 9200.

PATH

The API endpoint (for example, _count returns the number of documents in the cluster).

QUERY_STRING

Any optional query-string parameters (for example, ?pretty will pretty-print the JSON response to make it easier to read).

BODY

A JSON-encoded request body (if the request needs one).

For instance, to count the number of documents in the cluster, we could use this:

curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}'
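
The same request can be put together from its parts in Python. This is only a sketch using the standard library: build_request is a hypothetical helper name, the Content-Type header is an assumption (newer Elasticsearch releases require it for JSON bodies), and nothing is sent until the request is handed to urlopen.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(verb, host="localhost", port=9200, path="", query=None,
                  body=None, protocol="http"):
    """Assemble (but do not send) an HTTP request to Elasticsearch."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        # Query-string parameters such as pretty=true.
        url += "?" + urlencode(query)
    # The body, when present, is JSON encoded as UTF-8 bytes.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=verb)


# The document-count request from above, expressed with the same parts.
count_request = build_request(
    "GET",
    path="_count",
    query={"pretty": "true"},
    body={"query": {"match_all": {}}},
)
print(count_request.get_method(), count_request.full_url)
```

To actually run it, pass count_request to urllib.request.urlopen, assuming a node is listening on localhost:9200.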

Multi-index, Multitype


/_search

Search all types in all indices

/gb/_search

Search all types in the gb index

/gb,us/_search

Search all types in the gb and us indices

/g*,u*/_search

Search all types in any indices beginning with g or beginning with u

/gb/user/_search

Search type user in the gb index

/gb,us/user,tweet/_search

Search types user and tweet in the gb and us indices

/_all/user,tweet/_search

Search types user and tweet in all indices
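
Each combination above corresponds to a search path of the form /{indices}/{types}/_search, with names joined by commas and _all standing in for "every index". A rough Python sketch of that convention follows; search_path is a hypothetical helper, and it assumes an Elasticsearch version that still has types in the URL.

```python
def search_path(indices=None, types=None):
    """Build a _search path for the given index and type names.

    Either argument may be None or empty, meaning "all".
    """
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # Types can only follow an index segment, so use the _all placeholder.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


for indices, types in [
    (None, None),
    (["gb"], None),
    (["gb", "us"], None),
    (["g*", "u*"], None),
    (["gb"], ["user"]),
    (["gb", "us"], ["user", "tweet"]),
    (None, ["user", "tweet"]),
]:
    print(search_path(indices, types))
```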


GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
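
The three requests above are pages 1, 2 and 3 with five results per page (from defaults to 0 when omitted). A small sketch of the arithmetic, with page_params as a hypothetical helper name and pages counted from 1:

```python
def page_params(page, size=5):
    """Return the size/from query parameters for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # 'from' is the number of results to skip before this page starts.
    return {"size": size, "from": (page - 1) * size}


for page in (1, 2, 3):
    params = page_params(page)
    print(f"GET /_search?size={params['size']}&from={params['from']}")
```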

      1. In Elasticsearch, a document belongs to a type, and those types live inside an index. You can draw some (rough) parallels to a traditional relational database:

Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields

      2. Elasticsearch supports the following simple field types:
      • String: string
      • Whole number: byte, short, integer, long
      • Floating-point: float, double
      • Boolean: boolean
      • Date: date
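
To show where these type names end up, here is a sketch of a mapping body built in Python. The tweet type and every field name are made up for illustration, and the layout assumes the type-era mapping syntax these notes describe (newer releases differ, for example in how strings are typed).

```python
import json

# Hypothetical mapping for a "tweet" type; all field names are made up.
tweet_mapping = {
    "mappings": {
        "tweet": {
            "properties": {
                "text": {"type": "string"},
                "retweets": {"type": "integer"},
                "score": {"type": "double"},
                "verified": {"type": "boolean"},
                "posted": {"type": "date"},
            }
        }
    }
}

# Print the JSON that would be used as the request body.
print(json.dumps(tweet_mapping, indent=2))
```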


What Is Relevance?

We’ve mentioned that, by default, results are returned in descending order of relevance. But what is relevance? How is it calculated?

The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

A query clause generates a _score for each document. How that score is calculated depends on the type of query clause. Different query clauses are used for different purposes: a fuzzy query might determine the _score by calculating how similar the spelling of the found word is to the original search term; a terms query would incorporate the percentage of terms that were found. However, what we usually mean by relevance is the algorithm that we use to calculate how similar the contents of a full-text field are to a full-text query string.

The standard similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF, which takes the following factors into account:

Term frequency

How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.

Inverse document frequency

How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.

Field-length norm

How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.

Individual queries may combine the TF/IDF score with other factors such as the term proximity in phrase queries, or term similarity in fuzzy queries.

Relevance is not just about full-text search, though. It can equally be applied to yes/no clauses, where the more clauses that match, the higher the _score.

When multiple query clauses are combined using a compound query like the bool query, the _score from each of these query clauses is combined to calculate the overall _score for the document.

From <http://techbus.safaribooksonline.com/book/web-development/search/9781449358532/1dot-you-know-for-search/idm12696976_html#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODE0NDkzNTg1MzIlMkZyZWxldmFuY2VfaW50cm9faHRtbCZxdWVyeT0=>
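
The three TF/IDF factors described above can be tried out with a few lines of Python. This is a simplified sketch, assuming the classic Lucene formulas as they are commonly documented (tf = sqrt(frequency), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms)). It ignores boosts, query normalization and the other factors that go into a real _score, and the function names are my own.

```python
import math


def term_frequency(freq):
    """More occurrences of the term in the field give a higher value."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Terms that appear in many documents get a lower value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Longer fields get a lower value."""
    return 1.0 / math.sqrt(num_terms)


def field_weight(freq, num_docs, doc_freq, num_terms):
    """Combine the three factors for one term in one field."""
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))


# Same term, same index: a short title field versus a long content field.
in_title = field_weight(freq=1, num_docs=1000, doc_freq=10, num_terms=4)
in_content = field_weight(freq=1, num_docs=1000, doc_freq=10, num_terms=400)
print(in_title > in_content)
```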

Basic Concepts

Let’s take a look at the main concepts of ElasticSearch:

  • Cluster: A set of Nodes (servers) that holds all the data.
  • Node: A single server that holds some data and participates in the cluster’s indexing and querying.
  • Index: Forget SQL Indexes. Each ES Index is a set of Documents.
  • Shards: A subset of Documents of an Index. An Index can be divided into many shards.
  • Type: A definition of the schema of a Document inside of an Index (an Index can have more than one type assigned).
  • Document: A JSON object with some data. It’s the basic information unit in ES.

From <https://dzone.com/articles/elasticsearch-101>
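
To make the Shards entry more concrete, here is a toy sketch of spreading documents over shards by hashing the document ID and taking the remainder. It only illustrates the idea: crc32 is a stand-in for Elasticsearch's own hash function, and pick_shard and the document IDs are hypothetical.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in for Elasticsearch's own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# Distribute a handful of made-up document IDs over 5 shards.
shards = {}
for doc_id in ("tweet-1", "tweet-2", "tweet-3", "user-1", "user-2"):
    shards.setdefault(pick_shard(doc_id, 5), []).append(doc_id)
print(shards)
```

The same ID always lands on the same shard, which is what lets a node find a document again without searching every shard.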

