In DQL, you can add a condition to your query that excludes deleted items from the report output. The condition is:
isdeleted = 0
The Data Retention period is set at the following location:
Data Insight Console > Settings > Data Retention
Selecting the option "Purge data for deleted shares/site collections automatically" enables the purging of data that pertains to deleted shares or site collections. This option is enabled by default.
If you clear the check box, the index data for the share or site collection remains on the disk until the global archive or purge takes place. Thus, if you decide to decommission a filer or site collection, the data continues to be available for reporting.
The following command executes the DataRetentionJob:
INSTALL_DIR/bin/configcli execute_job DataRetentionJob
The re-confirmation of deleted paths can be turned off via the following steps:
In the Data Insight Console, go to Data Insight Servers > select the associated Indexer node > Advanced Settings > Indexer Settings. Clear the check box "Reconfirm deleted paths when reconciling full scan information", then save and close.
After Data Insight indexes full scan data, it computes the paths that no longer seem to be present on the file system.
Select this check box to have Data Insight re-confirm if those paths are indeed deleted using an incremental scan before removing them from the index.
When the check box is clear, Data Insight readily removes the missing paths from the indexes without carrying out a re-confirmation.
(Re-confirm scan is not supported for SharePoint site collections.)
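The effect of the check box can be illustrated with a small Python sketch. This is only a model of the behaviour described above, not Data Insight code: the function `reconcile`, its parameters, and the `path_exists` callback (a stand-in for the re-confirmation scan) are hypothetical names chosen for illustration.

```python
from typing import Callable, Set


def reconcile(indexed: Set[str], scanned: Set[str], reconfirm: bool,
              path_exists: Callable[[str], bool]) -> Set[str]:
    """Return the paths to treat as deleted after a full scan.

    indexed     -- paths currently held in the index (hypothetical model)
    scanned     -- paths reported by the latest full scan
    reconfirm   -- models the "Reconfirm deleted paths..." check box
    path_exists -- hypothetical stand-in for the re-confirmation scan
    """
    # Paths that are in the index but were not reported by the full scan.
    missing = indexed - scanned
    if not reconfirm:
        # Check box clear: every missing path is removed without a re-check.
        return missing
    # Check box selected: remove only paths that the re-check also fails to find.
    return {p for p in missing if not path_exists(p)}


indexed = {"/share/a.txt", "/share/b.txt", "/share/c.txt"}
scanned = {"/share/a.txt"}          # the full scan missed b.txt and c.txt
still_present = {"/share/b.txt"}    # b.txt actually exists on the file system

print(sorted(reconcile(indexed, scanned, False, still_present.__contains__)))
# ['/share/b.txt', '/share/c.txt']
print(sorted(reconcile(indexed, scanned, True, still_present.__contains__)))
# ['/share/c.txt']
```

With re-confirmation on, a path that the full scan missed but that still exists stays in the index, at the cost of an extra check per missing path.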
Data Insight retains historical data, so indexed data is preserved even after the underlying data has been deleted from the filer. When Data Insight finds that data has been deleted, it flags the data as deleted but keeps it in the index until the Data Retention period is reached.
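The flag-then-purge behaviour, and why a report needs the isdeleted = 0 condition, can be sketched as follows. This is an illustrative model only: the `IndexedPath` class, the `report_rows` and `purge` functions, and the `deleted_on` field are hypothetical, and the sketch assumes the retention period is counted in days from the date a path was flagged as deleted, which the product may measure differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class IndexedPath:
    path: str
    isdeleted: int = 0                 # 0 = present, 1 = flagged as deleted
    deleted_on: Optional[date] = None  # hypothetical: date the flag was set


def report_rows(index: List[IndexedPath]) -> List[IndexedPath]:
    # Equivalent of adding "isdeleted = 0" to a query: skip flagged entries.
    return [row for row in index if row.isdeleted == 0]


def purge(index: List[IndexedPath], today: date,
          retention_days: int) -> List[IndexedPath]:
    # Assumption: an entry is purged once it has been flagged as deleted
    # for at least retention_days; everything else is kept.
    cutoff = today - timedelta(days=retention_days)
    return [
        row for row in index
        if not (row.isdeleted == 1
                and row.deleted_on is not None
                and row.deleted_on <= cutoff)
    ]


index = [
    IndexedPath("/share/a.txt"),
    IndexedPath("/share/old.txt", 1, date(2023, 1, 1)),
    IndexedPath("/share/recent.txt", 1, date(2024, 5, 1)),
]

# Without the filter all three entries would appear in a report.
print([row.path for row in report_rows(index)])
# After a purge with a 365-day retention period, only old.txt is gone.
print([row.path for row in purge(index, date(2024, 6, 1), 365)])
```

Until the purge runs, flagged entries remain in the index, which is why deleted items show up in report output unless the query excludes them.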