An e-book download site focused on experience and quality
Categories: Design, Cloud Computing & Big Data
Overview
Apache Solr 3 Enterprise Search Server: Enhance your search with faceted navigation, result highlighting, relevancy ranked sorting, and more (Douban rating: 7.5)
Resource last updated: 2020-11-17 04:19:09
Author: David Smiley
Publisher: Packt Publishing
Publication date: 2011-01
ISBN: 9781849516068
File format: pdf
Tags: solr, Java, search, lucene, search, computer science, programming, search engine
Description
Enhance your search with faceted navigation, result highlighting, relevancy ranked sorting, and more
Comprehensive information on Apache Solr 3 with examples and tips so you can focus on the important parts
Integration examples with databases, web-crawlers, XSLT, Java & embedded-Solr, PHP & Drupal, JavaScript, Ruby frameworks
Advice on data modeling, deployment considerations t...
Table of Contents
Chapter 1: Quick Starting Solr
An introduction to Solr
Lucene, the underlying engine
Solr, a Lucene-based search server
Comparison to database technology
Getting started
Solr's installation directory structure
Solr's home directory and Solr cores
Running Solr
A quick tour of Solr
Loading sample data
A simple query
Some statistics
The sample browse interface
Configuration files
Resources outside this book
Summary
Chapter 2: Schema and Text Analysis
MusicBrainz.org
One combined index or separate indices
One combined index
Problems with using a single combined index
Separate indices
Schema design
Step 1: Determine which searches are going to be powered by Solr
Step 2: Determine the entities returned from each search
Step 3: Denormalize related data
Denormalizing: 'one-to-one' associated data
Denormalizing: 'one-to-many' associated data
Step 4: (Optional) Omit the inclusion of fields only used in search results
The schema.xml file
Defining field types
Built-in field type classes
Numbers and dates
Geospatial
Field options
Field definitions
Dynamic field definitions
Our MusicBrainz field definitions
Copying fields
The unique key
The default search field and query operator
Text analysis
Configuration
Experimenting with text analysis
Character filters
Tokenization
WordDelimiterFilter
Stemming
Correcting and augmenting stemming
Synonyms
Index-time versus query-time, and to expand or not
Stop words
Phonetic sounds-like analysis
Substring indexing and wildcards
ReversedWildcardFilter
N-grams
N-gram costs
Sorting Text
Miscellaneous token filters
Summary
Chapter 3: Indexing Data
Communicating with Solr
Direct HTTP or a convenient client API
Push data to Solr or have Solr pull it
Data formats
HTTP POSTing options to Solr
Remote streaming
Solr's Update-XML format
Deleting documents
Commit, optimize, and rollback
Sending CSV formatted data to Solr
Configuration options
The Data Import Handler Framework
Setup
The development console
Writing a DIH configuration file
Data Sources
Entity processors
Fields and transformers
Example DIH configurations
Importing from databases
Importing XML from a file with XSLT
Importing multiple rich document files (crawling)
Importing commands
Delta imports
Indexing documents with Solr Cell
Extracting text and metadata from files
Configuring Solr
Solr Cell parameters
Extracting karaoke lyrics
Indexing richer documents
Update request processors
Summary
Chapter 4: Searching
Your first search, a walk-through
Solr's generic XML structured data representation
Solr's XML response format
Parsing the URL
Request handlers
Query parameters
Search criteria related parameters
Result pagination related parameters
Output related parameters
Diagnostic related parameters
Query parsers and local-params
Query syntax (the lucene query parser)
Matching all the documents
Mandatory, prohibited, and optional clauses
Boolean operators
Sub-queries
Limitations of prohibited clauses in sub-queries
Field qualifier
Phrase queries and term proximity
Wildcard queries
Fuzzy queries
Range queries
Date math
Score boosting
Existence (and non-existence) queries
Escaping special characters
The Dismax query parser (part 1)
Searching multiple fields
Limited query syntax
Min-should-match
Basic rules
Multiple rules
What to choose
A default search
Filtering
Sorting
Geospatial search
Indexing locations
Filtering by distance
Sorting by distance
Summary
Chapter 5: Search Relevancy
Scoring
Query-time and index-time boosting
Troubleshooting queries and scoring
Dismax query parser (part 2)
Lucene's DisjunctionMaxQuery
Boosting: Automatic phrase boosting
Configuring automatic phrase boosting
Phrase slop configuration
Partial phrase boosting
Boosting: Boost queries
Boosting: Boost functions
Add or multiply boosts?
Function queries
Field references
Function reference
Mathematical primitives
Other math
ord and rord
Miscellaneous functions
Function query boosting
Formula: Logarithm
Formula: Inverse reciprocal
Formula: Reciprocal
Formula: Linear
How to boost based on an increasing numeric field
Step by step...
External field values
How to boost based on recent dates
Step by step...
Summary
Chapter 6: Faceting
A quick example: Faceting release types
MusicBrainz schema changes
Field requirements
Types of faceting
Faceting field values
Alphabetic range bucketing
Faceting numeric and date ranges
Range facet parameters
Facet queries
Building a filter query from a facet
Field value filter queries
Facet range filter queries
Excluding filters (multi-select faceting)
Hierarchical faceting
Summary
Chapter 7: Search Components
About components
The Highlight component
A highlighting example
Highlighting configuration
The regex fragmenter
The fast vector highlighter with multi-colored highlighting
The SpellCheck component
Schema configuration
Configuration in solrconfig.xml
Configuring spellcheckers (dictionaries)
Processing of the q parameter
Processing of the spellcheck.q parameter
Building the dictionary from its source
Issuing spellcheck requests
Example usage for a misspelled query
Query complete / suggest
Query term completion via facet.prefix
Query term completion via the Suggester
Query term completion via the Terms component
The QueryElevation component
Configuration
The MoreLikeThis component
Configuration parameters
Parameters specific to the MLT search component
Parameters specific to the MLT request handler
Common MLT parameters
MLT results example
The Stats component
Configuring the stats component
Statistics on track durations
The Clustering component
Result grouping/Field collapsing
Configuring result grouping
The TermVector component
Summary
Chapter 8: Deployment
Deployment methodology for Solr
Questions to ask
Installing Solr into a Servlet container
Differences between Servlet containers
Defining solr.home property
Logging
HTTP server request access logs
Solr application logging
Configuring logging output
Logging using Log4j
Jetty startup integration
Managing log levels at runtime
A SearchHandler per search interface?
Leveraging Solr cores
Configuring solr.xml
Property substitution
Include fragments of XML with XInclude
Managing cores
Why use multicore?
Monitoring Solr performance
Stats.jsp
JMX
Starting Solr with JMX
Securing Solr from prying eyes
Limiting server access
Securing public searches
Controlling JMX access
Securing index data
Controlling document access
Other things to look at
Summary
Chapter 9: Integrating Solr
Working with included examples
Inventory of examples
Solritas, the integrated search UI
Pros and Cons of Solritas
SolrJ: Simple Java interface
Using Heritrix to download artist pages
SolrJ-based client for Indexing HTML
SolrJ client API
Embedding Solr
Searching with SolrJ
Indexing
When should I use embedded Solr?
In-process indexing
Standalone desktop applications
Upgrading from legacy Lucene
Using JavaScript with Solr
Wait, what about security?
Building a Solr powered artists autocomplete widget with jQuery and JSONP
AJAX Solr
Using XSLT to expose Solr via OpenSearch
OpenSearch based Browse plugin
Installing the Search MBArtists plugin
Accessing Solr from PHP applications
solr-php-client
Drupal options
Apache Solr Search integration module
Hosted Solr by Acquia
Ruby on Rails integrations
The Ruby query response writer
sunspot_rails gem
Setting up MyFaves project
Populating MyFaves relational database from Solr
Build Solr indexes from a relational database
Complete MyFaves website
Which Rails/Ruby library should I use?
Nutch for crawling web pages
Maintaining document security with ManifoldCF
Connectors
Putting ManifoldCF to use
Summary
Chapter 10: Scaling Solr
Tuning complex systems
Testing Solr performance with SolrMeter
Optimizing a single Solr server (Scale up)
Configuring JVM settings to improve memory usage
MMapDirectoryFactory to leverage additional virtual memory
Enabling downstream HTTP caching
Solr caching
Tuning caches
Indexing performance
Designing the schema
Sending data to Solr in bulk
Don't overlap commits
Disabling unique key checking
Index optimization factors
Enhancing faceting performance
Using term vectors
Improving phrase search performance
Moving to multiple Solr servers (Scale horizontally)
Replication
Starting multiple Solr servers
Configuring replication
Load balancing searches across slaves
Indexing into the master server
Configuring slaves
Configuring load balancing
Sharding indexes
Assigning documents to shards
Searching across shards (distributed search)
Combining replication and sharding (Scale deep)
Near real time search
Where next for scaling Solr?
Summary
Appendix: Search Quick Reference
Quick reference