Fil (Files) Entity¶
The Fil entity represents file attachments in the Danish parliamentary system, providing direct access to downloadable documents such as PDFs, Word documents, and other file formats. Files are linked to their parent documents and can be downloaded directly without authentication.
Overview¶
- Entity Name: Fil
- Endpoint: https://oda.ft.dk/api/Fil
- Primary Key: id (Int32)
- Purpose: File attachments for parliamentary documents
- File Hosting: Files stored on the www.ft.dk domain
- Access: Direct download without authentication required
Field Reference¶
Core Identification Fields¶
| Field | Type | Description | Example |
|---|---|---|---|
| id | Int32 | Primary key, unique file identifier | 158 |
| dokumentid | Int32 | Foreign key to parent document | 11432 |
| titel | String | File title/name (often includes extension) | "Evaluering af lov om friplejeboliger_endelig vers.pdf" |
File Access Fields¶
| Field | Type | Description | Example |
|---|---|---|---|
| filurl | String | Direct download URL hosted on www.ft.dk | "https://www.ft.dk/samling/20131/almdel/BYB/bilag/89/1396405.pdf" |
| format | String | File format type | "PDF", "DOCX" |
Temporal Fields¶
| Field | Type | Description | Example |
|---|---|---|---|
| opdateringsdato | DateTime | Last update timestamp | "2020-02-25T10:46:59.49" |
File Format Support¶
The API supports multiple file formats commonly used in parliamentary documentation:
| Format | Description | Common Use |
|---|---|---|
| PDF | Portable Document Format | Official documents, reports, legislation |
| DOCX | Microsoft Word Document | Draft documents, working papers |
Format Accuracy
The format field in the API may not always match the actual file type. Some DOCX files may have PDF URLs or vice versa. Always verify the actual file format when processing downloads.
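One way to act on this warning is to inspect the first bytes of a downloaded file instead of trusting the format field. The sketch below is a minimal illustration, not part of the API: the function names are hypothetical, and it relies only on two well-known signatures (PDF files begin with `%PDF-`, and DOCX files are ZIP archives that begin with `PK\x03\x04`). A ZIP signature alone cannot distinguish DOCX from other ZIP-based formats.

```python
def sniff_format(head: bytes) -> str:
    """Guess the real file type from the first bytes of a download."""
    if head.startswith(b"%PDF-"):
        return "PDF"
    # DOCX files are ZIP archives; ZIP local file headers start with PK\x03\x04.
    # Other ZIP-based formats share this signature, hence the cautious label.
    if head.startswith(b"PK\x03\x04"):
        return "ZIP-BASED"
    return "UNKNOWN"


def format_matches(declared: str, head: bytes) -> bool:
    """Return True when the declared format field agrees with the sniffed type."""
    sniffed = sniff_format(head)
    declared = declared.upper()
    if declared == "PDF":
        return sniffed == "PDF"
    if declared == "DOCX":
        return sniffed == "ZIP-BASED"
    return False
```

Reading only the first few bytes of the response body is enough for this check, so it can run before the full file is written to disk.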
Common Query Examples¶
Basic File Queries¶
# Get latest 5 files
curl "https://oda.ft.dk/api/Fil?%24top=5&%24orderby=opdateringsdato%20desc"
# Get specific file by ID
curl "https://oda.ft.dk/api/Fil(158)"
# Count total files
curl "https://oda.ft.dk/api/Fil?%24inlinecount=allpages&%24top=1"
Filter by Format¶
# Get all PDF files
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'PDF'&%24top=10"
# Get all Word documents
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'DOCX'&%24top=10"
# Files updated recently
curl "https://oda.ft.dk/api/Fil?%24filter=opdateringsdato%20gt%20datetime'2020-01-01T00:00:00'&%24top=10"
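The date filter above uses the `datetime'...'` literal form. When the cutoff comes from application code, the literal can be generated rather than typed by hand. This is a small sketch; the helper names are hypothetical and only the literal syntax is taken from the query above.

```python
from datetime import datetime


def odata_datetime(moment: datetime) -> str:
    """Format a datetime as the literal used in the filter above."""
    return "datetime'" + moment.strftime("%Y-%m-%dT%H:%M:%S") + "'"


def updated_after_filter(moment: datetime) -> str:
    """Build a $filter expression for files updated after the given moment."""
    return f"opdateringsdato gt {odata_datetime(moment)}"
```

The returned string still needs URL encoding before it is placed in a request.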
Search by Title¶
# Find files with specific keywords in title
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('budget',titel)&%24top=5"
# Search for evaluation reports
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('evaluering',titel)&%24top=5"
# Find PDF files by title
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('pdf',titel)%20and%20format%20eq%20'PDF'&%24top=5"
Field Selection for Performance¶
# Only essential fields for file listings
curl "https://oda.ft.dk/api/Fil?%24select=id,titel,format,filurl&%24top=20"
# Get just download URLs
curl "https://oda.ft.dk/api/Fil?%24select=filurl&%24top=10"
# File metadata without URLs
curl "https://oda.ft.dk/api/Fil?%24select=titel,format,opdateringsdato&%24top=10"
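The curl examples above all follow the same pattern: a `$`-prefixed option name, percent-encoded as `%24`, with spaces encoded as `%20`. A minimal sketch of assembling such URLs in Python is shown below. The function name `build_query_url` is hypothetical; it uses only the standard library and keeps quotes, parentheses, commas, and slashes unescaped so the output resembles the URLs on this page.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://oda.ft.dk/api"


def build_query_url(entity: str, **options) -> str:
    """Build a query URL; keyword names map to $-prefixed options (top -> $top)."""
    params = [
        (f"${name}", str(value))
        for name, value in options.items()
        if value is not None
    ]
    url = f"{API_ROOT}/{entity}"
    if not params:
        return url
    # quote() encodes "$" as %24 and spaces as %20; the safe set leaves
    # the characters used inside filter expressions readable.
    return url + "?" + urlencode(params, quote_via=quote, safe="'(),/")
```

For example, `build_query_url("Fil", top=5, orderby="opdateringsdato desc")` produces the same URL as the "latest 5 files" query.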
Relationship Expansion¶
Document Integration¶
# Files with parent document information
curl "https://oda.ft.dk/api/Fil?%24expand=Dokument&%24top=3"
# Find files for specific document
curl "https://oda.ft.dk/api/Fil?%24filter=dokumentid%20eq%2011432&%24expand=Dokument"
# Files with document titles and types
curl "https://oda.ft.dk/api/Fil?%24expand=Dokument/Dokumenttype&%24select=titel,format,Dokument/titel,Dokument/Dokumenttype/type&%24top=5"
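Expanded responses nest the parent document inside each file record, which is awkward for tabular processing. The sketch below flattens such a response into simple rows. It assumes the response carries its records under `value` (as the jq examples on this page do) and that the expanded parent appears under a `Dokument` key; the second point is an assumption based on the `$expand` and `$select` paths above, not a confirmed response layout. The function name is hypothetical.

```python
def flatten_expanded(payload: dict) -> list:
    """Turn Fil records with an expanded Dokument into flat dictionaries."""
    rows = []
    for record in payload.get("value", []):
        # Assumed layout: the expanded parent sits under the "Dokument" key.
        parent = record.get("Dokument") or {}
        rows.append(
            {
                "file_title": record.get("titel"),
                "format": record.get("format"),
                "document_title": parent.get("titel"),
            }
        )
    return rows
```

Records without an expanded parent still produce a row, with `document_title` set to `None`.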
Combined Document and File Search¶
# Find documents with file attachments by keyword
curl "https://oda.ft.dk/api/Dokument?%24expand=Fil&%24filter=substringof('budget',titel)&%24select=titel,Fil/filurl,Fil/format&%24top=5"
# Documents with PDF attachments only
curl "https://oda.ft.dk/api/Dokument?%24expand=Fil&%24filter=Fil/format%20eq%20'PDF'&%24top=3"
File Download Examples¶
Direct File Access¶
# Test file accessibility (headers only)
curl -I "https://www.ft.dk/samling/20131/almdel/BYB/bilag/89/1396405.pdf"
# Download file with progress
curl -L -o "document.pdf" "https://www.ft.dk/samling/20131/almdel/BYB/bilag/89/1396405.pdf"
# Get file size and download time
curl -w "%{size_download} bytes in %{time_total}s" -o /dev/null -s "https://www.ft.dk/samling/20131/almdel/BYB/bilag/89/1396405.pdf"
Programmatic File Processing¶
# Extract file URLs for batch processing
curl -s "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'PDF'&%24select=filurl&%24top=10" | jq -r '.value[].filurl'
# Get file metadata for processing pipeline
curl -s "https://oda.ft.dk/api/Fil?%24select=id,titel,format,filurl,opdateringsdato&%24top=5" | jq '.value[]'
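The jq pipeline above has a direct Python equivalent for cases where the response is already being handled in application code. This is a minimal sketch with a hypothetical function name; it assumes only the `value` array that the jq expressions already rely on.

```python
import json


def extract_file_urls(response_text: str) -> list:
    """Return the filurl of every record in a Fil response body."""
    payload = json.loads(response_text)
    return [
        record["filurl"]
        for record in payload.get("value", [])
        if record.get("filurl")  # skip records with a missing or empty URL
    ]
```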
Data Analysis Examples¶
File Format Distribution¶
# Count files by format
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'PDF'&%24inlinecount=allpages&%24top=1"
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'DOCX'&%24inlinecount=allpages&%24top=1"
# Get format statistics
curl "https://oda.ft.dk/api/Fil?%24select=format&%24top=1000" | jq '[.value[].format] | group_by(.) | map({format: .[0], count: length})'
Document Attachment Analysis¶
# Documents with at least one file attachment
curl "https://oda.ft.dk/api/Dokument?%24expand=Fil&%24filter=Fil/id%20ne%20null&%24select=titel,Fil/titel,Fil/format&%24top=10"
# Recent file uploads
curl "https://oda.ft.dk/api/Fil?%24filter=opdateringsdato%20gt%20datetime'2020-01-01T00:00:00'&%24orderby=opdateringsdato%20desc&%24top=20"
Content Analysis Preparation¶
# Get PDF files for text analysis
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'PDF'%20and%20substringof('rapport',titel)&%24select=titel,filurl&%24top=10"
# Find evaluation documents
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('evaluering',titel)&%24expand=Dokument&%24select=titel,filurl,Dokument/titel&%24top=10"
Common Use Cases¶
1. Document Archive System¶
def download_parliamentary_files(document_keywords, file_format="PDF"):
    """Download files related to specific parliamentary topics"""
    # get_files is a helper assumed to wrap the Fil endpoint; it is not defined on this page
    filter_query = f"format eq '{file_format}' and substringof('{document_keywords}',titel)"
    files = get_files(filter_query=filter_query, select="titel,filurl", top=100)
    for file in files['value']:
        download_url = file['filurl']
        filename = file['titel']
        # Download logic here
2. File Format Migration Analysis¶
def analyze_file_formats():
    """Analyze distribution of file formats in the system"""
    all_files = get_files(select="format", top=1000)
    format_counts = {}
    for file in all_files['value']:
        fmt = file.get('format', 'Unknown')
        format_counts[fmt] = format_counts.get(fmt, 0) + 1
    return format_counts
3. Content Extraction Pipeline¶
def setup_content_extraction():
    """Get files ready for text extraction and analysis"""
    pdf_files = get_files(
        filter_query="format eq 'PDF'",
        expand="Dokument",
        select="titel,filurl,Dokument/titel,Dokument/dokumenttypeid",
        top=500
    )
    # Process files for content extraction
    return pdf_files
4. Document Completeness Check¶
def check_document_attachments():
    """Find documents that should have file attachments"""
    documents_with_files = get_documents(
        expand="Fil",
        filter_query="Fil/id ne null",
        select="titel,Fil/format"
    )
    # Analyze attachment patterns
    return documents_with_files
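The use-case sketches above call `get_files` and `get_documents` without defining them. One possible shape for these helpers is shown below. It is an assumption about what the helpers might look like, not the original implementation: the parameter names mirror the calls above, the `fetch` parameter is a hypothetical hook added so the network call can be replaced in tests, and only the standard library is used.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://oda.ft.dk/api"


def _fetch_json(url):
    """Perform the HTTP GET and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def query_entity(entity, filter_query=None, select=None, expand=None,
                 top=None, fetch=_fetch_json):
    """Query one entity set and return the decoded response."""
    options = {"$filter": filter_query, "$select": select,
               "$expand": expand, "$top": top}
    params = [(key, str(value)) for key, value in options.items()
              if value is not None]
    url = f"{API_ROOT}/{entity}"
    if params:
        url += "?" + urlencode(params, quote_via=quote, safe="'(),/")
    return fetch(url)


def get_files(**kwargs):
    """Hypothetical helper matching the get_files calls above."""
    return query_entity("Fil", **kwargs)


def get_documents(**kwargs):
    """Hypothetical helper matching the get_documents calls above."""
    return query_entity("Dokument", **kwargs)
```

Passing a stub as `fetch` makes it possible to check the generated URL without contacting the API.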
Performance Optimization¶
Efficient File Queries¶
# Good: Request only needed fields
curl "https://oda.ft.dk/api/Fil?%24select=filurl,format&%24top=100"
# Good: Filter by format first, then other criteria
curl "https://oda.ft.dk/api/Fil?%24filter=format%20eq%20'PDF'%20and%20substringof('budget',titel)&%24top=20"
# Avoid: Requesting all fields when only URLs needed
curl "https://oda.ft.dk/api/Fil?%24top=100" # Returns unnecessary data
Batch File Processing¶
# Process files in batches for better performance
curl "https://oda.ft.dk/api/Fil?%24select=filurl&%24skip=0&%24top=100"
curl "https://oda.ft.dk/api/Fil?%24select=filurl&%24skip=100&%24top=100"
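The `$skip`/`$top` pattern above generalises to a loop that keeps requesting pages until one comes back empty. The sketch below is one way to write that loop. `fetch_page` is a hypothetical callable supplied by the caller that takes `(skip, top)` and returns a list of records. The offset advances by the number of records actually received, so the loop still works if the server returns fewer records than requested.

```python
def iter_records(fetch_page, page_size=100, max_records=None):
    """Yield records page by page until an empty page is returned."""
    skip = 0
    yielded = 0
    while True:
        records = fetch_page(skip, page_size)
        if not records:
            return  # an empty page marks the end of the result set
        for record in records:
            if max_records is not None and yielded >= max_records:
                return
            yield record
            yielded += 1
        # Advance by what was actually received, not by what was requested.
        skip += len(records)
```

Setting `max_records` is a simple guard against accidentally walking the entire archive.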
Important Notes¶
File Access Characteristics¶
- No Authentication: Files can be downloaded directly without API keys or authentication
- Direct URLs: All file URLs point to the www.ft.dk domain (different from the API domain)
- URL Stability: File URLs appear to be permanent and don't expire
- File Sizes: Parliamentary documents typically range from ~1MB to 1.5MB+
Download Considerations¶
File Size and Bandwidth
Parliamentary documents can be large (1MB+). Consider implementing:
- Concurrent download limits
- Progress tracking for large files
- Retry logic for failed downloads
- Local caching to avoid repeated downloads
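Two of these recommendations, retry logic and local caching, can be sketched together. The function below is illustrative only: its name and parameters are hypothetical, `fetch_bytes` stands for whatever callable performs the actual download, and the exponential backoff schedule is a common default rather than anything prescribed for this service.

```python
import time
from pathlib import Path


def download_with_retry(url, target, fetch_bytes, attempts=3,
                        base_delay=1.0, sleep=time.sleep):
    """Download url to target, retrying on I/O errors and skipping cached files."""
    target = Path(target)
    if target.exists():
        return target  # local cache hit: do not download again
    for attempt in range(1, attempts + 1):
        try:
            data = fetch_bytes(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
            continue
        target.write_bytes(data)
        return target
```

A concurrency limit can be layered on top by running this function through a `concurrent.futures.ThreadPoolExecutor` with a small `max_workers` value.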
Data Quality Notes¶
- Format Field Accuracy: The format field may not always match the actual file type
- Title Consistency: File titles may include version information and Danish characters
- Update Frequency: Files have update dates but are generally static once published
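Because titles can contain arbitrary text, using titel directly as a local filename can fail on some filesystems. The sketch below shows one conservative way to clean a title. The function name and the set of replaced characters are assumptions chosen for illustration; Danish letters are left untouched.

```python
import re


def safe_filename(title: str, fallback: str = "download") -> str:
    """Replace characters that are unsafe in filenames, keeping æ, ø, and å."""
    # Path separators, reserved punctuation, and control characters become "_".
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    cleaned = cleaned.strip(" .")  # avoid names that are empty or end in dots
    return cleaned or fallback
```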
Danish Language Support¶
Full UTF-8 Support: File titles and content support Danish characters (æ, ø, å):
# Search for files with Danish characters (ø percent-encoded as UTF-8: %C3%B8)
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('%C3%B8',titel)&%24top=5"
curl "https://oda.ft.dk/api/Fil?%24filter=substringof('evaluering',titel)&%24top=5"
Integration with Document Workflow¶
The Fil entity is tightly integrated with the parliamentary document workflow:
- Document Creation: Documents are created in the Dokument entity
- File Attachment: Related files are added to the Fil entity with a dokumentid foreign key
- Publication: Files become accessible via direct download URLs
- Updates: Files can be updated (reflected in opdateringsdato)
Example File Record¶
{
"id": 158,
"dokumentid": 11432,
"titel": "Evaluering af lov om friplejeboliger_endelig vers.pdf",
"filurl": "https://www.ft.dk/samling/20131/almdel/BYB/bilag/89/1396405.pdf",
"format": "PDF",
"opdateringsdato": "2020-02-25T10:46:59.49"
}
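A record like the one above can be converted into a typed object for downstream code. The sketch below is one possible mapping; the class and function names are hypothetical. The timestamp in the example has a two-digit fractional part, so the parser uses `strptime` with `%f` (which accepts one to six fractional digits) and falls back to a pattern without fractions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FilRecord:
    id: int
    dokumentid: int
    titel: str
    filurl: str
    format: str
    opdateringsdato: datetime


def parse_timestamp(text: str) -> datetime:
    """Parse timestamps with or without fractional seconds."""
    for pattern in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised timestamp: {text!r}")


def parse_fil(record: dict) -> FilRecord:
    """Convert one decoded JSON record into a FilRecord."""
    return FilRecord(
        id=int(record["id"]),
        dokumentid=int(record["dokumentid"]),
        titel=record["titel"],
        filurl=record["filurl"],
        format=record["format"],
        opdateringsdato=parse_timestamp(record["opdateringsdato"]),
    )
```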
This file entity enables direct access to the rich document archive of the Danish Parliament, supporting everything from automated document analysis to public transparency initiatives.