Introduction to Content Integration
About this document
This paper discusses features that you can use to integrate content from various repositories into the Google Search Appliance (GSA).
The recommendations and information in this document were gathered through our work with a variety of clients and environments in the field. We thank our customers and partners for sharing their experiences and insights.
|What's covered|This paper covers using feeds, connectors, and cloud connect.|
|Primary audience|Project managers, GSA administrators, and connector developers.|
|IT environment|GSA and various external data repositories, such as enterprise content management systems, LDAP, and G Suite.|
|Deployment phases|Initial configuration of the GSA.|
Before content can be searched, it must first be indexed. There are several ways to get content into the index:
- Crawling: The process of following links from a starting point, downloading the content found, and adding it to the index.
- Crawling through a proxy: The same process as crawling, but with a proxy between the GSA and the content source, for example to add metadata (such as microdata or security information) or to modify content.
- Database crawling: Indexes database content by running SQL queries over JDBC. The query maps the data model to the feed format.
- Feeding: Pushes content to the GSA as a feed, an XML document that represents each document or piece of content and lets you control its metadata and security.
- Connector: Software that traverses the documents in a repository, usually a document management system (DMS) or content management system (CMS), and sends them to the GSA for indexing.
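To make the idea of a feed more concrete, the following Python sketch builds a small XML feed with one record and its metadata. It is only an illustration: the element and attribute names (gsafeed, header, datasource, feedtype, group, record, metadata, meta, content) are assumptions to verify against the feeds documentation for your GSA version, and the data source name, URL, and metadata values are hypothetical. Per-document security settings and the step that sends the feed to the appliance are left out.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, records, feedtype="incremental"):
    """Return a feed document as an XML string.

    The element and attribute names used here are assumptions for
    illustration; verify them against the feed format documentation.
    """
    root = ET.Element("gsafeed")

    # Header: names the data source and the type of feed.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype

    # Group: one <record> per document to be indexed.
    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(group, "record", {
            "url": rec["url"],
            "mimetype": rec.get("mimetype", "text/html"),
            "action": "add",
        })
        # Metadata is attached to each record as name/content pairs.
        metadata = ET.SubElement(record, "metadata")
        for name, value in rec.get("metadata", {}).items():
            ET.SubElement(metadata, "meta", {"name": name, "content": value})
        ET.SubElement(record, "content").text = rec["content"]

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, not taken from any real repository.
feed_xml = build_feed(
    "sample_source",
    [{
        "url": "http://intranet.example.com/docs/1",
        "metadata": {"department": "engineering"},
        "content": "<html><body>Hello from the repository</body></html>",
    }],
)
print(feed_xml)
```

Because the feed is built per record, the sending application decides exactly which metadata accompanies each document, which is the main difference from crawling, where the GSA discovers content on its own.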
The following chapters review each indexing option in depth and present the advantages and disadvantages of each.