
Ingesting content

The more complete your content, the better the AI-generated answers will be. Therefore, it's important to ingest all the content you have available. Deflekt.ai lets you ingest the following types of content:

  • Web page (by single URL)
  • Multiple web pages (by multiple URLs)
  • Multiple web pages (by sitemap URL)
  • Documents (by single or multiple URLs)
  • Documents (by upload)

Supported file types

Besides web pages (HTML), the following file types can be ingested by URL or by upload:

  • .pdf
  • .docx
  • .md
  • .txt

Is a file type you need missing? Reach out to make sure it ends up on the roadmap.
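
If you collect files or file URLs in bulk, it can help to filter out unsupported types before adding them. The Python sketch below is a hypothetical local pre-check, not part of Deflekt.ai: it assumes the file type is determined by the file extension alone, and it uses the extensions listed above. The example URLs and file names are made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions listed in this article as supported for ingestion by URL or upload.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}


def is_supported_file(location: str) -> bool:
    """Return True if a file path or URL ends in a supported extension.

    Assumption: the file type is judged by its extension only (case-insensitive).
    Web pages (HTML) are ingested differently and are not checked here.
    """
    # For URLs, look at the path only, so query strings and fragments are ignored.
    path = urlparse(location).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


candidates = [
    "https://example.com/brochure.pdf",
    "https://example.com/guide.DOCX?version=2",
    "notes/changelog.md",
    "slides.pptx",
]
for candidate in candidates:
    print(candidate, is_supported_file(candidate))
```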

Decide what you want to ingest

The first step is to decide what you want to ingest. If the content is online and not protected by a login or another form of authentication, the easiest way is to ingest it by URL. If you want to handpick some pages (e.g. your FAQ page, pricing page or how-to pages), you can add them one by one by selecting Single URL. You can also collect the URLs first and list them comma-separated using Multiple URLs. These URLs can also point to hosted files (like PDF brochures or guides).
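
If your site has a sitemap, you can use it to collect URLs for Multiple URLs instead of copying them by hand. The Python sketch below is a hypothetical helper, not part of Deflekt.ai: it reads a sitemap in the sitemaps.org format, keeps only URLs that contain a given path fragment, and prints them as one comma-separated line. The example.com URLs and the "/help/" filter are made up for illustration, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET

# XML namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str, must_contain: str = "") -> list[str]:
    """Return the <loc> URLs of a sitemap, in order, without duplicates.

    Only URLs containing `must_contain` are kept (an empty string keeps all).
    Sitemap index files (<sitemapindex>) are not handled.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and must_contain in url and url not in urls:
            urls.append(url)
    return urls


# Made-up example sitemap; in practice you would load your own sitemap.xml.
example_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/help/faq</loc></url>
  <url><loc>https://example.com/help/pricing</loc></url>
  <url><loc>https://example.com/blog/launch</loc></url>
</urlset>"""

help_pages = urls_from_sitemap(example_sitemap, must_contain="/help/")
print(",".join(help_pages))
```

The printed line can then be pasted when adding content with Multiple URLs. To work from a live sitemap, you could first download it with Python's urllib.request.urlopen(...).read().decode().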

Only ingest what is valuable

A unique feature of Deflekt.ai is the ability to skip certain standard elements on a page. If you know a little HTML, you can use the HTML elements to exclude option to select elements like <header> or <nav> to be skipped during ingestion. This makes a lot of sense: every page on your website probably has the same navigation element with a bunch of text links. If you don't exclude it, the text in these links becomes part of the content of every page. When that text matches a user's query, basically every page is seen as relevant to the query, and the right content might not surface.

If you have no idea what all of this means, you can simply leave this setting empty. If the results turn out shaky, delete the documents and ingest them again with elements like <nav> excluded.
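
To see why excluding navigation matters, here is a minimal Python sketch of the idea using the standard library's html.parser. It only illustrates the concept of skipping elements such as <header> and <nav>; it is an assumption about how such a filter can work, not Deflekt.ai's actual ingestion code, and the example page is made up.

```python
from html.parser import HTMLParser


class ContentTextExtractor(HTMLParser):
    """Collect page text, skipping any text inside the excluded HTML elements."""

    def __init__(self, excluded_tags=("header", "nav")):
        super().__init__()
        self.excluded_tags = set(excluded_tags)
        self.skip_depth = 0  # number of excluded elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every excluded element.
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str, excluded_tags=("header", "nav")) -> str:
    parser = ContentTextExtractor(excluded_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """
<html><body>
  <header><h1>Acme</h1></header>
  <nav><a href="/">Home</a> <a href="/pricing">Pricing</a> <a href="/faq">FAQ</a></nav>
  <main><h2>Pricing</h2><p>The Pro plan costs $20 per month.</p></main>
</body></html>
"""

print(extract_text(page))                    # Pricing The Pro plan costs $20 per month.
print(extract_text(page, excluded_tags=()))  # Acme Home Pricing FAQ Pricing The Pro plan costs $20 per month.
```

With the exclusions, only the page's own content remains. Without them, words like "Pricing" and "FAQ" from the shared menu end up in the text of every page that uses it.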

Document management

Your plan comes with a maximum number of documents (across your account). The current and maximum number of documents are shown at the top of the content page. At the bottom you'll find a list of your documents, showing each document's title, URL and status. Web pages get their <meta> title as the default title; files get their file name as the title. These titles are shown to your users when a document is used as a source for an answer. To clean them up, click Edit on a document and give it a clean, relevant title.

You can also view the raw text content of a document by clicking View, or delete it by clicking Delete.

Each document also has a status. After you add a URL or file, the document goes through a series of statuses before it reaches the final status, Complete. If it ends up as Failed, something went wrong: delete the document and try again. The most common cause of failure is that the exact same file was already added.
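
Because duplicates are the most common cause of failures, you may want to check a folder for identical files before uploading. The Python sketch below is a hypothetical local helper, not a Deflekt.ai feature. It assumes that "the exact same file" means byte-for-byte identical content (Deflekt.ai may compare documents differently) and groups files by their SHA-256 hash. The file names in the demo are made up.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

# Extensions listed in this article as supported for upload.
SUPPORTED_EXTENSIONS = (".pdf", ".docx", ".md", ".txt")


def find_duplicate_files(folder: str) -> list[list[Path]]:
    """Group supported files under `folder` whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            # Hash the full file contents; identical bytes give identical digests.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Only hashes shared by more than one file indicate duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Demo with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "faq.md").write_text("How do I reset my password?")
    Path(tmp, "faq-copy.md").write_text("How do I reset my password?")
    Path(tmp, "pricing.txt").write_text("The Pro plan costs $20 per month.")
    for group in find_duplicate_files(tmp):
        print("Same content:", [p.name for p in group])
```

Uploading only one file from each reported group avoids adding the same content twice.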

Finally, URL-type documents have a Refetch option. This deletes the document and fetches it again from its original location. If you make big updates to your web content, use Refetch on the affected pages to make sure Deflekt.ai uses the fresh content to construct answers.

© 2024 Deflekt.ai Documentation