I’ve opened thousands of XER files over the years. Most schedulers treat them as black boxes — export from one system, import into another. But understanding the format gives you a real edge, especially when things go wrong during import or when you need to analyze schedule data outside of P6.
What Is an XER File?
An XER file is a plain text export from Primavera P6. It contains a complete snapshot of one or more projects — activities, relationships, resources, calendars, WBS, codes, everything.
The format is tab-delimited with a specific structure. You can open any XER file in Notepad, VS Code, or any text editor and read it directly. No proprietary binary format. No encryption. Just structured text.
This simplicity is what makes XER files so useful. They’re the lingua franca of P6 data exchange.
File Structure
Every XER file follows this pattern:
ERMHDR 24.8.0 2026-03-15 Project Admin ...
%T PROJECT
%F proj_id proj_short_name ...
%R 1042 PRJ-2026-001 ...
%T PROJWBS
%F wbs_id proj_id wbs_short_name ...
%R 5001 1042 PHASE-1 ...
%R 5002 1042 PHASE-2 ...
%T TASK
%F task_id proj_id task_code task_name ...
%R 10001 1042 A1010 Mobilize Equipment ...
The markers:
- ERMHDR: Header line. Contains the P6 version, export date, export user, and database info. Always the first line.
- %T: Table marker. Followed by the table name. Everything after this until the next %T belongs to this table.
- %F: Field definition. Lists the column names for the current table, tab-separated.
- %R: Row data. One data record, tab-separated, matching the field order from %F.
- %E: End marker. Appears at the very end of the file.
That’s the entire format. No nesting, no XML tags, no JSON brackets. Just headers, tables, fields, and rows.
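To make the marker logic concrete, here is a minimal sketch that walks a handful of lines and counts the %R rows under each %T table. The function name count_rows_per_table and the sample lines are mine, made up to mirror the excerpt above; the exact tab positions in the ERMHDR line are an assumption.

```python
from collections import Counter


def count_rows_per_table(lines):
    """Count %R rows under each %T table marker (hypothetical helper)."""
    counts = Counter()
    current_table = None
    for line in lines:
        # The marker is everything before the first tab
        marker, _, rest = line.rstrip("\r\n").partition("\t")
        if marker == "%T":
            current_table = rest.split("\t")[0]
            counts[current_table] += 0  # register tables that have no rows
        elif marker == "%R" and current_table is not None:
            counts[current_table] += 1
        elif marker == "%E":
            break  # end-of-file marker
    return dict(counts)


# Illustrative lines only, not a real export
sample = [
    "ERMHDR\t24.8.0\t2026-03-15\tProject\tAdmin",
    "%T\tPROJWBS",
    "%F\twbs_id\tproj_id\twbs_short_name",
    "%R\t5001\t1042\tPHASE-1",
    "%R\t5002\t1042\tPHASE-2",
    "%T\tTASK",
    "%F\ttask_id\tproj_id\ttask_code\ttask_name",
    "%R\t10001\t1042\tA1010\tMobilize Equipment",
    "%E",
]

print(count_rows_per_table(sample))  # {'PROJWBS': 2, 'TASK': 1}
```

A quick row count like this is a cheap sanity check before a full parse: if TASK shows zero rows, the export itself is the problem.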
Key Tables and What They Contain
A typical XER file contains 20-40 tables. Here are the ones that matter most:
| Table | Contents |
|---|---|
| PROJECT | Project-level data: ID, name, planned start/finish, data date, status |
| PROJWBS | WBS structure: hierarchical, with parent references via parent_wbs_id |
| TASK | Activities: the core of the schedule. All activity-level data lives here |
| TASKPRED | Logic links: predecessor/successor relationships between activities |
| RSRC | Resource dictionary: resource names, IDs, types, units |
| TASKRSRC | Resource assignments: which resources are assigned to which activities |
| CALENDAR | Calendar definitions: work hours, holidays, exceptions |
| ACTVTYPE | Activity code type definitions |
| ACTVCODE | Activity code value definitions |
| TASKACTV | Activity code assignments: which codes are assigned to which activities |
| ACCOUNT | Cost account dictionary |
| UDFTYPE | User Defined Field definitions |
| UDFVALUE | UDF values assigned to activities |
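
The parent_wbs_id reference in PROJWBS is what turns a flat list of rows into a tree. As a rough sketch, assuming the rows have already been parsed into dictionaries keyed by field name, a full WBS path can be built by walking up the parent links. The helper name wbs_paths, the dot separator, and the sample rows are my own illustration, not something P6 defines.

```python
def wbs_paths(wbs_rows):
    """Map each wbs_id to a dotted path of wbs_short_name values (sketch)."""
    by_id = {row["wbs_id"]: row for row in wbs_rows}
    paths = {}
    for wbs_id in by_id:
        parts = []
        seen = set()  # guards against a corrupt, circular parent chain
        current = wbs_id
        while current in by_id and current not in seen:
            seen.add(current)
            parts.append(by_id[current]["wbs_short_name"])
            current = by_id[current].get("parent_wbs_id", "")
        paths[wbs_id] = ".".join(reversed(parts))
    return paths


# Hypothetical rows; all values are strings, as they come out of the file
rows = [
    {"wbs_id": "5000", "parent_wbs_id": "", "wbs_short_name": "PRJ-2026-001"},
    {"wbs_id": "5001", "parent_wbs_id": "5000", "wbs_short_name": "PHASE-1"},
    {"wbs_id": "5002", "parent_wbs_id": "5000", "wbs_short_name": "PHASE-2"},
]

paths = wbs_paths(rows)
print(paths["5001"])  # PRJ-2026-001.PHASE-1
```

The walk stops as soon as a parent ID is not found in the table, so a root node whose parent is empty or lives outside the export is handled the same way.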
The TASK Table Deep Dive
This is where the schedule lives. On a 5,000-activity project, the TASK table alone has 5,000 rows, each with dozens of columns.
Key fields I look at first:
- task_id: P6’s internal numeric ID. Not the Activity ID that users see in the interface. This is the primary key.
- task_code: The Activity ID that users see (e.g., A1010, CIV-0350). This is what schedulers reference.
- task_name: Activity description.
- task_type: TT_Task (Task Dependent), TT_Rsrc (Resource Dependent), TT_Mile (Start Milestone), TT_FinMile (Finish Milestone), TT_LOE (Level of Effort), TT_WBS (WBS Summary).
- target_start_date / target_end_date: Planned dates, which double as baseline dates when no separate baseline is assigned.
- act_start_date / act_end_date: Actual dates. Empty if the activity hasn’t started (or, for act_end_date, hasn’t finished).
- restart_date / reend_date: Remaining early start and finish.
- remain_drtn_hr_cnt: Remaining duration in hours. Divide by the calendar’s hours/day to get days.
- total_float_hr_cnt: Total float in hours. Same conversion applies.
- status_code: TK_NotStart, TK_Active, TK_Complete.
- phys_complete_pct: Physical percent complete.
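The hours-to-days conversion is simple arithmetic, but empty values trip people up. A minimal sketch, assuming an 8-hour day as the default; in a real check the hours/day figure should come from the activity's calendar, and the helper name hours_to_days is hypothetical:

```python
def hours_to_days(hours_text, hours_per_day=8.0):
    """Convert an XER hour-count string to days; empty values become None."""
    if hours_text is None or hours_text.strip() == "":
        return None  # field was empty in the export
    return float(hours_text) / hours_per_day


# Hypothetical activity row, values as strings straight from the file
activity = {
    "task_code": "A1010",
    "remain_drtn_hr_cnt": "80",
    "total_float_hr_cnt": "-16",
}

print(hours_to_days(activity["remain_drtn_hr_cnt"]))        # 10.0
print(hours_to_days(activity["total_float_hr_cnt"]))        # -2.0
print(hours_to_days(activity["remain_drtn_hr_cnt"], 10.0))  # 8.0
```

Returning None for blanks, rather than 0, keeps "no value exported" distinct from "zero float".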
The TASKPRED Table
This table defines your schedule logic. Each row is one relationship:
- task_pred_id: Unique ID for this link
- task_id: The successor activity (the one being constrained)
- pred_task_id: The predecessor activity
- pred_type: PR_FS (Finish-to-Start), PR_FF (Finish-to-Finish), PR_SS (Start-to-Start), PR_SF (Start-to-Finish)
- lag_hr_cnt: Lag in hours. Can be negative.
When I’m doing forensic analysis on a problem schedule, this is the second table I check after TASK. Missing or broken logic shows up clearly here.
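One way to surface missing logic is to look for open ends: activities that never appear as a successor or never appear as a predecessor in TASKPRED. The sketch below assumes both tables are already parsed into lists of dictionaries; find_open_ends and the sample rows are illustrative names and data of my own.

```python
def find_open_ends(tasks, links):
    """Return (no_predecessor, no_successor) lists of Activity IDs (sketch)."""
    # In TASKPRED, task_id is the successor and pred_task_id the predecessor
    has_pred = {link["task_id"] for link in links}
    has_succ = {link["pred_task_id"] for link in links}
    no_pred = [t["task_code"] for t in tasks if t["task_id"] not in has_pred]
    no_succ = [t["task_code"] for t in tasks if t["task_id"] not in has_succ]
    return no_pred, no_succ


# Hypothetical data: A1010 -> A1020, while A1030 has no logic at all
tasks = [
    {"task_id": "10001", "task_code": "A1010"},
    {"task_id": "10002", "task_code": "A1020"},
    {"task_id": "10003", "task_code": "A1030"},
]
links = [
    {"task_pred_id": "1", "task_id": "10002", "pred_task_id": "10001",
     "pred_type": "PR_FS", "lag_hr_cnt": "0"},
]

no_pred, no_succ = find_open_ends(tasks, links)
print(no_pred)  # ['A1010', 'A1030']
print(no_succ)  # ['A1020', 'A1030']
```

The first and last activities of a schedule legitimately show up here, so the output is a list to review rather than a list of errors.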
Parsing XER Files with Python
You don’t need a special library. The format is simple enough to parse in 20 lines:
def parse_xer(filepath):
    """Parse an XER file into {table_name: [row_dict, ...]}."""
    tables = {}
    current_table = None
    fields = []
    with open(filepath, 'r', encoding='cp1252') as f:
        for line in f:
            # Strip only the line ending; trailing tabs are empty fields
            line = line.rstrip('\r\n')
            if line.startswith('%T'):
                parts = line.split('\t')
                current_table = parts[1] if len(parts) > 1 else None
                if current_table:
                    tables[current_table] = []
            elif line.startswith('%F'):
                fields = line.split('\t')[1:]
            elif line.startswith('%R') and current_table:
                values = line.split('\t')[1:]
                row = dict(zip(fields, values))
                tables[current_table].append(row)
    return tables


data = parse_xer('project_backup.xer')

# Get all activities
activities = data.get('TASK', [])
print(f"Found {len(activities)} activities")

# Find critical activities (zero or negative float).
# Total float can be empty (e.g. on completed activities), so skip blanks.
critical = [
    a for a in activities
    if a.get('total_float_hr_cnt') and float(a['total_float_hr_cnt']) <= 0
]
print(f"Critical activities: {len(critical)}")

# Get all relationships
relationships = data.get('TASKPRED', [])
print(f"Found {len(relationships)} logic links")
Note the cp1252 encoding. That matters.
From here you can load the data into pandas, push it to Power BI, generate custom reports, or run quality checks that P6’s built-in tools don’t support. I built a schedule health check script that parses the XER and flags missing logic, excessive lags, and constraint overuse. Runs in seconds on a 10,000-activity schedule.
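As a taste of what such a quality check can look like, here is a small sketch that flags negative and large lags in TASKPRED rows. It is not my actual health check script; the function name flag_lags and the 80-hour threshold are arbitrary assumptions you would tune per project.

```python
def flag_lags(links, max_lag_hours=80.0):
    """Return (task_pred_id, reason, lag_hours) for suspicious lags (sketch)."""
    flagged = []
    for link in links:
        text = link.get("lag_hr_cnt", "")
        lag = float(text) if text else 0.0  # treat an empty lag as zero
        if lag < 0:
            flagged.append((link["task_pred_id"], "negative lag", lag))
        elif lag > max_lag_hours:
            flagged.append((link["task_pred_id"], "excessive lag", lag))
    return flagged


# Hypothetical relationship rows
links = [
    {"task_pred_id": "1", "pred_type": "PR_FS", "lag_hr_cnt": "0"},
    {"task_pred_id": "2", "pred_type": "PR_SS", "lag_hr_cnt": "-40"},
    {"task_pred_id": "3", "pred_type": "PR_FS", "lag_hr_cnt": "160"},
]

for item in flag_lags(links):
    print(item)
# ('2', 'negative lag', -40.0)
# ('3', 'excessive lag', 160.0)
```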
Practical Uses
Backup and archive. XER files are the standard way to snapshot a schedule at a point in time. I export XERs at every data date, named with the date: PRJ001_2026-03-15.xer. Cheap insurance.
Transfer between environments. Moving a schedule from a development P6 instance to production. XER export/import is the cleanest method.
Forensic analysis. Compare two XER files from different dates to see exactly what changed. Which activities slipped? Which relationships were deleted? Which resources were reassigned? A diff script that compares TASK tables between two XER files has saved me dozens of hours of manual investigation.
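A stripped-down sketch of that kind of diff, assuming both TASK tables are already parsed into lists of dictionaries. It matches activities on task_code and compares a few fields; diff_tasks, the default field list, and the sample rows are my own illustrative choices. For a multi-project XER you would likely need proj_id in the key as well, since Activity IDs are only unique within a project.

```python
def diff_tasks(old_tasks, new_tasks,
               fields=("reend_date", "remain_drtn_hr_cnt", "status_code")):
    """Compare two TASK tables by task_code (sketch, single project assumed)."""
    old = {t["task_code"]: t for t in old_tasks}
    new = {t["task_code"]: t for t in new_tasks}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = {}
    for code in sorted(old.keys() & new.keys()):
        delta = {
            f: (old[code].get(f, ""), new[code].get(f, ""))
            for f in fields
            if old[code].get(f, "") != new[code].get(f, "")
        }
        if delta:
            changed[code] = delta  # field -> (old value, new value)
    return added, deleted, changed


# Hypothetical snapshots from two data dates
old_tasks = [
    {"task_code": "A1010", "reend_date": "2026-03-20 17:00", "status_code": "TK_Active"},
    {"task_code": "A1020", "reend_date": "2026-04-01 17:00", "status_code": "TK_NotStart"},
]
new_tasks = [
    {"task_code": "A1010", "reend_date": "2026-03-27 17:00", "status_code": "TK_Active"},
    {"task_code": "A1030", "reend_date": "2026-04-10 17:00", "status_code": "TK_NotStart"},
]

added, deleted, changed = diff_tasks(old_tasks, new_tasks)
print(added)    # ['A1030']
print(deleted)  # ['A1020']
print(changed)  # {'A1010': {'reend_date': ('2026-03-20 17:00', '2026-03-27 17:00')}}
```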
Data migration. Moving schedules between P6 instances or versions. XER files are version-aware — the ERMHDR line tells the target system what version exported the file.
External reporting. When P6’s built-in reports aren’t enough (and they often aren’t), parse the XER and build what you need. Power BI dashboards fed from XER data are common on large programs.
Common Gotchas
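Encoding is usually Windows-1252, not UTF-8. This is the most common parsing failure I see. Accented names and other non-ASCII characters will break or corrupt if you read the file as UTF-8. Specify cp1252 as your encoding. latin-1 never raises a decode error, but it maps a few characters (curly quotes, the euro sign) differently. Text outside the Windows-1252 repertoire, such as Arabic or CJK, cannot be stored in cp1252 at all, so files from those environments may use a different code page. If you’re working with a P6 instance that has multilingual data, test with a file that contains non-ASCII characters.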
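The failure is easy to reproduce without an XER file at all. This small demonstration encodes a made-up activity name as Windows-1252 bytes, then decodes those bytes three ways; the sample text is my own.

```python
# "Café “Phase 1”" written with escapes so the snippet is encoding-proof
text = "Caf\u00e9 \u201cPhase 1\u201d"
raw = text.encode("cp1252")  # the bytes as they would sit in the file

as_cp1252 = raw.decode("cp1252")   # round-trips cleanly
as_latin1 = raw.decode("latin-1")  # no error, but curly quotes are mangled

try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True  # 0xE9 is not a valid UTF-8 sequence here

print(as_cp1252 == text)  # True
print(as_latin1 == text)  # False
print(utf8_failed)        # True
```

Note that latin-1 gets the accented letter right and only the curly quotes wrong, which is why that kind of corruption can go unnoticed for a while.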
Fields are tab-delimited. Spaces within field values are data, not delimiters. Only tabs separate fields. If you split on whitespace instead of tabs, your data will be wrong.
Empty fields are consecutive tabs. Two tabs with nothing between them mean an empty value. Don’t strip or collapse whitespace, or you’ll shift all subsequent fields in that row.
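Both tab gotchas show up in a single made-up row. In this sketch the row has a space inside task_name and an empty act_start_date; the field list and values are illustrative.

```python
fields = ["task_id", "proj_id", "task_code", "task_name",
          "act_start_date", "restart_date"]

# Two consecutive tabs after the name mean act_start_date is empty
line = "%R\t10001\t1042\tA1010\tMobilize Equipment\t\t2026-03-15 08:00"

good = dict(zip(fields, line.split("\t")[1:]))  # split on tabs only
bad = dict(zip(fields, line.split()[1:]))       # split on any whitespace

print(good["task_name"], "|", repr(good["act_start_date"]))
# Mobilize Equipment | ''
print(bad["task_name"], "|", repr(bad["act_start_date"]))
# Mobilize | 'Equipment'
```

The whitespace split does not raise an error. It quietly shifts every later field, which is what makes this bug hard to spot.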
Dates are formatted as YYYY-MM-DD HH:MM. They’re stored as the P6 server’s local time, not necessarily UTC. I’ve seen conflicting documentation on this. In practice, verify against the P6 interface when it matters.
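Given that format, Python's standard datetime.strptime handles the conversion. A minimal sketch; the helper name parse_xer_date is hypothetical, and it deliberately returns a naive datetime with no time zone attached, since the zone is exactly the part you cannot trust.

```python
from datetime import datetime

XER_DATE_FORMAT = "%Y-%m-%d %H:%M"


def parse_xer_date(text):
    """Parse an XER date string; empty fields become None (sketch)."""
    if not text or not text.strip():
        return None  # e.g. act_start_date on an unstarted activity
    return datetime.strptime(text.strip(), XER_DATE_FORMAT)


start = parse_xer_date("2026-03-15 08:00")
finish = parse_xer_date("2026-03-20 17:00")

print(start)           # 2026-03-15 08:00:00
print(finish - start)  # 5 days, 9:00:00
print(parse_xer_date(""))  # None
```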
Internal IDs are not Activity IDs. The task_id field is P6’s internal numeric identifier. It is a database key, so it can differ between environments and change when a schedule is re-imported. The task_code field is the stable Activity ID that users see. When matching data across XER files, use task_code, not task_id.
XER files are version-specific. An XER exported from P6 24.12 might not import cleanly into P6 19.12. The table structures evolve between versions. Fields get added. Check the ERMHDR version and test imports before assuming compatibility.
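Checking the version can be automated before an import is even attempted. The sketch below assumes the version is the second tab-separated field of the ERMHDR line and that it is purely numeric with dots, as in the sample header earlier in this post; read_export_version and the sample line are illustrative.

```python
def read_export_version(header_line):
    """Return the ERMHDR version as a tuple of ints, e.g. (24, 8, 0)."""
    parts = header_line.rstrip("\r\n").split("\t")
    if len(parts) < 2 or parts[0] != "ERMHDR":
        raise ValueError("not an ERMHDR line")
    # int() raises ValueError if the version has a non-numeric part
    return tuple(int(p) for p in parts[1].split("."))


# Hypothetical header; the field layout after the version is an assumption
header = "ERMHDR\t24.8.0\t2026-03-15\tProject\tAdmin"

exported = read_export_version(header)
target = (19, 12)  # version of the P6 instance you plan to import into

print(exported)           # (24, 8, 0)
print(exported > target)  # True
```

A newer export going into an older target is the case worth a warning. It doesn't prove the import will fail, only that it deserves a test run first.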