deface package

deface.cli

deface.cli.create_parser() argparse.ArgumentParser[source]

Create the argument parser for the deface command line tool.

deface.cli.main() None[source]

A command line tool to convert Facebook posts from their personal archive format to a simpler, cleaner version. The tool reads in one or more files with possibly overlapping post data, simplifies the structure of the data, eliminates redundant information, reconciles the records into a single timeline, and then exports that timeline of posts as JSON.

deface.error

exception deface.error.DefaceError[source]

Bases: Exception

The base class for errors specific to this package.

exception deface.error.ValidationError[source]

Bases: deface.error.DefaceError

An error indicating that JSON data does not have expected fields or type.

exception deface.error.MergeError[source]

Bases: deface.error.DefaceError

An error indicating that two posts are unrelated and cannot be merged.

deface.ingest

deface.ingest.ingest_into_history(data: deface.validator.Validator[Any], history: deface.model.PostHistory) list[deface.error.DefaceError][source]

Ingest the JSON data value wrapped by the validator as a list of posts into the given history. This function returns a list of ingestion errors.

deface.ingest.ingest_post(data: deface.validator.Validator[Any]) deface.model.Post[source]

Ingest the JSON data value wrapped by the validator as a post.

deface.ingest.ingest_media(data: deface.validator.Validator[Any]) deface.model.Media[source]

Ingest the JSON data value wrapped by the validator as a media descriptor.

deface.ingest.ingest_location(data: deface.validator.Validator[Any]) deface.model.Location[source]

Ingest the JSON data value wrapped by the validator as a location.

deface.ingest.ingest_external_context(data: deface.validator.Validator[Any]) deface.model.ExternalContext[source]

Ingest the JSON data value wrapped by the validator as an external context.

deface.ingest.ingest_event(data: deface.validator.Validator[Any]) deface.model.Event[source]

Ingest the JSON data value wrapped by the validator as an event.

deface.ingest.ingest_comment(data: deface.validator.Validator[Any]) deface.model.Comment[source]

Ingest the JSON data value wrapped by the validator as a comment.

deface.logger

deface.logger.pluralize(count: int, noun: str, suffix: str = 's') str[source]
class deface.logger.Level(value)[source]

Bases: enum.Enum

An enumeration of log levels. The value of each level is the emoji that prefixes messages logged at that level.

ERROR = '🛑 '
WARN = '⚠️ '
INFO = 'ℹ️ '
class deface.logger.Logger(stream: TextIO = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, prefix: str = '', use_color: bool = True, use_emoji: bool = True)[source]

Bases: object

A simple console logger. By default, the logger prefixes messages with the given prefix followed by appropriate emoji. If the underlying stream is a TTY, it also uses ANSI escape codes to style messages. The use of color or emoji can be disabled by setting the corresponding argument to false.
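The TTY check described above can be sketched with the standard library alone. The helper name style_bold is hypothetical, and the escape codes are the standard ANSI SGR sequences, which may differ from the ones this package actually emits.

```python
import io
import sys
from typing import TextIO

# Standard ANSI SGR sequences; assumed here, not taken from the package.
BOLD = "\x1b[1m"
RESET = "\x1b[0m"


def style_bold(text: str, stream: TextIO = sys.stderr, use_color: bool = True) -> str:
    """Wrap text in bold codes only if styling is enabled and the stream is a TTY."""
    if use_color and stream.isatty():
        return f"{BOLD}{text}{RESET}"
    return text


# A StringIO is never a TTY, so the text passes through unstyled.
buffer = io.StringIO()
print(style_bold("3 posts ingested", buffer), file=buffer)
```

Checking isatty() on the target stream keeps escape codes out of redirected output such as log files.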

print(text: str = '') None[source]

Log the given text followed by a newline.

print_json(value: Any, **kwargs: Any) None[source]

Log a nicely indented JSON representation of the given value.

print_bold(text: str) None[source]

Log the text in bold followed by a newline.

print_in_green(text: str) None[source]

Log the text in green followed by a newline.

print_in_red(text: str) None[source]

Log the text in red followed by a newline.

property error_count: int

The number of errors reported with error() so far.

error(err: Union[str, Exception], *extras: Any) None[source]

Log the given error message followed by the JSON representation of any additional exception arguments as well as additional method arguments.

property warning_count: int

The number of warnings reported with warn() so far.

warn(warning: Union[str, Warning], *extras: Any) None[source]

Print a warning message.

info(message: str, *extras: Any) None[source]

Print an informational message.

done(message: str) None[source]

Print a summarizing message at completion of a tool run. If the output stream is a TTY, the message is highlighted in red or green, depending on whether any errors have been reported.

deface.model

The data model for posts. This module defines deface’s own post schema, which captures all Facebook post data in a much simpler fashion. The main type is the Post dataclass. It depends on the Comment, Event, ExternalContext, Location, Media, and MediaMetaData dataclasses as well as the MediaType enumeration. This module also defines the PostHistory and find_simultaneous_posts() helpers for building up a coherent timeline from Facebook post data.

The schema uses Python tuples instead of lists because the former are immutable and thus do not get in the way of all model classes being both equatable and hashable.
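The following sketch illustrates why tuples matter for hashability. It assumes frozen dataclasses, which is one common way to get generated __eq__ and __hash__ methods; whether the package uses exactly this configuration is an assumption, and both class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedWithTuple:
    """Hypothetical record whose only field is an immutable tuple."""
    tags: tuple[str, ...] = ()


@dataclass(frozen=True)
class TaggedWithList:
    """Hypothetical record whose only field is a mutable list."""
    tags: list[str]


# Equal tuple-based records hash alike and collapse in a set.
unique = {TaggedWithTuple(("Alice", "Bob")), TaggedWithTuple(("Alice", "Bob"))}

# A list-valued field makes the generated __hash__ fail when it is called.
try:
    hash(TaggedWithList(["Alice", "Bob"]))
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```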

The model’s JSON serialization follows directly from its definition, with every dataclass instance becoming an object in the JSON text that has the same fields — with one important exception: If an attribute has None or the empty tuple () as its value, deface.serde.prepare() removes it from the JSON representation. Since the schema needs to capture all information contained in Facebook post data, it includes a relatively large number of optional attributes. Including them in the serialized representation seems to have little benefit while cluttering the JSON text.

The model can easily be reinstated from its JSON text post-by-post by passing the deserialized dictionary to Post.from_dict(). The method patches the representation of nested model types and also fills in None and () values. For uniformity of mechanism, all model classes implement from_dict, even if they do not need to patch fields before invoking the constructor.
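A minimal sketch of this pattern follows, using two hypothetical stand-in classes rather than the package's own. It assumes that omitted attributes fall back to dataclass defaults and that nested records arrive as plain dictionaries inside a JSON list.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CommentSketch:
    author: str
    comment: str
    timestamp: int

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CommentSketch":
        # Nothing to patch; the method exists for uniformity of mechanism.
        return cls(**data)


@dataclass(frozen=True)
class MediaSketch:
    uri: str
    description: Optional[str] = None
    comments: tuple[CommentSketch, ...] = ()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MediaSketch":
        patched = dict(data)
        # JSON has no tuples, so nested comments arrive as a list of dicts.
        patched["comments"] = tuple(
            CommentSketch.from_dict(item) for item in data.get("comments", [])
        )
        # Attributes missing from the JSON text take their defaults.
        return cls(**patched)


bare = MediaSketch.from_dict({"uri": "photos/1.jpg"})
commented = MediaSketch.from_dict({
    "uri": "photos/2.jpg",
    "comments": [{"author": "Alice", "comment": "Nice!", "timestamp": 1}],
})
```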

class deface.model.MediaType(value)[source]

Bases: enum.Enum

An enumeration of media types.

PHOTO = 'PHOTO'
VIDEO = 'VIDEO'
class deface.model.Comment(author: str, comment: str, timestamp: int)[source]

Bases: object

A comment on a post, photo, or video.

author: str

The comment’s author.

comment: str

The comment’s text.

timestamp: int

The comment’s timestamp.

classmethod from_dict(data: dict[str, typing.Any]) deface.model.Comment[source]

Create a new comment from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.Event(name: str, start_timestamp: int, end_timestamp: int)[source]

Bases: object

An event.

name: str

The event’s name.

start_timestamp: int

The beginning of the event.

end_timestamp: int

The end of the event or zero for events without a defined duration.

classmethod from_dict(data: dict[str, typing.Any]) deface.model.Event[source]

Create a new event from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.ExternalContext(url: str, name: Optional[str] = None, source: Optional[str] = None)[source]

Bases: object

The external context for a post. In the original Facebook post data, a post’s external context is part of the attachments:

{
  "attachments": [
    {
      "data": [
        {
          "external_context": {
            "name": "Instagram Post by Ro\u00cc\u0081isi\u00cc\u0081n Murphy",
            "source": "instagram.com",
            "url": "https://www.instagram.com/p/B_13ojcD6Fh/"
          }
        }
      ]
    }
  ]
}

Unusually, the example includes a name and source in addition to the url. It also illustrates the mojibake resulting from Facebook erroneously double encoding all text. The name should read Instagram Post by Róisín Murphy.
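The double encoding can be undone on an already parsed string, as the sketch below shows for the name from the example. The helper fix_mojibake is hypothetical and works on text after JSON parsing, whereas deface.serde.restore_utf8() is documented to work on bytes before parsing.

```python
import unicodedata


def fix_mojibake(text: str) -> str:
    """Reinterpret code points 0-255 as the UTF-8 bytes they were meant to be."""
    return text.encode("latin-1").decode("utf-8")


name = fix_mojibake("Instagram Post by Ro\u00cc\u0081isi\u00cc\u0081n Murphy")

# The repaired text uses combining accents; NFC composes them for comparison.
normalized = unicodedata.normalize("NFC", name)
```

The bytes 0xCC 0x81 are the UTF-8 encoding of U+0301, the combining acute accent, which is why the repaired name needs normalization before it compares equal to the precomposed spelling.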

url: str

A URL linking to external content.

name: Optional[str] = None

The name of the website or, if the link points to an article, the article’s title. Not a common attribute.

source: Optional[str] = None

The name of the website or, if the link points to an article, the publication’s name. Not a common attribute.

classmethod from_dict(data: dict[str, typing.Any]) deface.model.ExternalContext[source]

Create a new external context from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.Location(name: str, address: Optional[str] = None, latitude: Optional[float] = None, longitude: Optional[float] = None, url: Optional[str] = None)[source]

Bases: object

A location in the real world. In the original Facebook post data, a post’s place is part of the attachments:

{
  "attachments": [
    {
      "data": [
        {
          "place": {
            "name": "Whitney Museum of American Art",
            "coordinate": {
              "latitude": 40.739541735,
              "longitude": -74.009095020556
            },
            "address": "",
            "url": "https://www.facebook.com/whitneymuseum/"
          }
        }
      ]
    }
  ]
}

The coordinate is stripped during ingestion to hoist latitude and longitude into the location record. In rare cases, the coordinate may be missing from the original Facebook data, hence both the latitude and longitude attributes are optional.
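The hoisting step might look roughly like the sketch below. The function name hoist_coordinate is hypothetical, and the sketch operates on plain dictionaries rather than on the package's validator and model types.

```python
from typing import Any


def hoist_coordinate(place: dict[str, Any]) -> dict[str, Any]:
    """Return a flat copy of a place with latitude and longitude at top level."""
    flat = {key: value for key, value in place.items() if key != "coordinate"}
    # The coordinate may be missing, in which case both values stay None.
    coordinate = place.get("coordinate") or {}
    flat["latitude"] = coordinate.get("latitude")
    flat["longitude"] = coordinate.get("longitude")
    return flat


location = hoist_coordinate({
    "name": "Whitney Museum of American Art",
    "coordinate": {"latitude": 40.739541735, "longitude": -74.009095020556},
    "address": "",
    "url": "https://www.facebook.com/whitneymuseum/",
})
without_coordinate = hoist_coordinate({"name": "Somewhere"})
```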

name: str

The location’s name.

address: Optional[str] = None

The location’s address.

latitude: Optional[float] = None

The location’s latitude. In the original Facebook post data, this attribute is nested inside the coordinate attribute.

longitude: Optional[float] = None

The location’s longitude. In the original Facebook data, this attribute is nested inside the coordinate attribute.

url: Optional[str] = None

The URL for the location on https://www.facebook.com.

is_mergeable_with(other: deface.model.Location) bool[source]

Determine whether this location can be merged with the other location. For two locations to be mergeable, they must have identical name, address, latitude, and longitude attributes. Furthermore, they must either have identical url attributes or one location has a string value while the other location has None.

merge(other: deface.model.Location) deface.model.Location[source]

Merge this location with the given location. In case of identical URLs, this method returns self. In case of divergent URLs, this method returns the instance with the URL value.

Raises

MergeError – indicates that the locations differ in more than their URLs and thus cannot be merged.
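The merge rules for locations can be sketched as follows. LocationSketch and MergeErrorSketch are hypothetical stand-ins for the documented classes, written only from the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


class MergeErrorSketch(Exception):
    """Stand-in for deface.error.MergeError."""


@dataclass(frozen=True)
class LocationSketch:
    name: str
    address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    url: Optional[str] = None

    def is_mergeable_with(self, other: "LocationSketch") -> bool:
        same_place = (
            (self.name, self.address, self.latitude, self.longitude)
            == (other.name, other.address, other.latitude, other.longitude)
        )
        # URLs must match, or exactly one side may lack a URL.
        urls_compatible = (
            self.url == other.url or self.url is None or other.url is None
        )
        return same_place and urls_compatible

    def merge(self, other: "LocationSketch") -> "LocationSketch":
        if not self.is_mergeable_with(other):
            raise MergeErrorSketch("locations differ in more than their URLs")
        # Identical URLs return self; otherwise keep the instance with a URL.
        if self.url == other.url or other.url is None:
            return self
        return other


bare = LocationSketch("Whitney Museum of American Art", "", 40.74, -74.01)
linked = LocationSketch(
    "Whitney Museum of American Art", "", 40.74, -74.01,
    "https://www.facebook.com/whitneymuseum/",
)
merged = bare.merge(linked)
```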

classmethod from_dict(data: dict[str, typing.Any]) deface.model.Location[source]

Create a new location from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.MediaMetaData(camera_make: Optional[str] = None, camera_model: Optional[str] = None, exposure: Optional[str] = None, focal_length: Optional[str] = None, f_stop: Optional[str] = None, iso_speed: Optional[int] = None, latitude: Optional[float] = None, longitude: Optional[float] = None, modified_timestamp: Optional[int] = None, orientation: Optional[int] = None, original_height: Optional[int] = None, original_width: Optional[int] = None, taken_timestamp: Optional[int] = None)[source]

Bases: object

The metadata for a photo or video. In the original Facebook post data, this object also includes the upload_ip and upload_timestamp, but since both attributes describe the use of the photo or video on Facebook and not the photo or video itself, they are hoisted into the Media record. The remaining attributes, even if present in the original Facebook post data, tend to be meaningless, i.e., are either the empty string or zero. Also, while the remaining attributes would be meaningful for both photos and videos, they are found only on photos.

camera_make: Optional[str] = None
camera_model: Optional[str] = None
exposure: Optional[str] = None
focal_length: Optional[str] = None
f_stop: Optional[str] = None
iso_speed: Optional[int] = None
latitude: Optional[float] = None
longitude: Optional[float] = None
modified_timestamp: Optional[int] = None
orientation: Optional[int] = None
original_height: Optional[int] = None
original_width: Optional[int] = None
taken_timestamp: Optional[int] = None
classmethod from_dict(data: dict[str, typing.Any]) deface.model.MediaMetaData[source]

Create new media metadata from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.Media(media_type: deface.model.MediaType, uri: str, description: Optional[str] = None, title: Optional[str] = None, thumbnail: Optional[str] = None, metadata: Optional[deface.model.MediaMetaData] = None, creation_timestamp: Optional[int] = None, upload_timestamp: Optional[int] = None, upload_ip: str = '', comments: tuple[deface.model.Comment, ...] = <factory>)[source]

Bases: object

A posted photo or video.

media_type: deface.model.MediaType

The media type, which is derived from the metadata key in the original data.

uri: str

The path to the photo or video file within the personal data archive. In terms of RFC 3986, the attribute provides a relative-path reference, i.e., it lacks a scheme such as file: and does not start with a slash /. However, it should not be resolved relative to the file containing the field but rather from the root of the personal data archive.
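Resolving such a reference from the archive root can be sketched with pathlib. The function name and both paths below are made up for illustration.

```python
from pathlib import PurePosixPath


def resolve_media_uri(archive_root: str, uri: str) -> PurePosixPath:
    """Resolve a relative-path reference against the archive root."""
    # The base is the archive root, not the directory of the JSON file
    # that contains the uri field.
    return PurePosixPath(archive_root) / uri


resolved = resolve_media_uri(
    "/home/alice/facebook-archive",
    "photos_and_videos/TimelinePhotos/1.jpg",
)
```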

description: Optional[str] = None

A description of the photo or video. In the original Facebook post data, the value for this attribute may be duplicated amongst all of a post’s media objects as well as the post’s body. Wherever safe, such redundancy is resolved in favor of the post’s body. As a result, any remaining description on a media record is unique to that photo or video.

title: Optional[str] = None

The title for the photo or video. This field is filled in automatically and hence generic. Common variations are Mobile Uploads or Timeline Photos for photos and the empty string for videos.

thumbnail: Optional[str] = None

The thumbnail for a photo or video. If present in the original Facebook data, the value is an object with uri as its only field. Just like Media.uri, the thumbnail URI is a relative-path reference that should be resolved from the root of the personal data archive.

metadata: Optional[deface.model.MediaMetaData] = None

The metadata for the photo or video.

creation_timestamp: Optional[int] = None

Seemingly the timestamp for when the media object was created on Facebook. In the original Facebook post data, this timestamp differs from the post’s timestamp by less than 30 seconds.

upload_timestamp: Optional[int] = None

The timestamp at which the photo or video was uploaded. In the original Facebook post data, this field is part of the photo_metadata or video_metadata object nested inside the media object’s media_metadata. However, since it really is part of Facebook’s data on the use of the photo or video, it is hoisted into the media record during ingestion.

upload_ip: str = ''

The IP address from which the photo or video was uploaded. In the original Facebook post data, this attribute is part of the photo_metadata or video_metadata object nested inside the media object’s media_metadata. It also is the only attribute reliably included with that object. However, since upload_ip really is part of Facebook’s data on the use of the photo or video, it is hoisted into the media record during ingestion.

comments: tuple[deface.model.Comment, ...]

Comments specifically on the photo or video.

is_mergeable_with(other: deface.model.Media) bool[source]

Determine whether this media object can be merged with the other media object. That is the case if both media objects have the same field values with the exception of comments, which may be omitted from one of the two media objects.

merge(other: deface.model.Media) deface.model.Media[source]

Merge this media object with the other media object.

Raises

MergeError – indicates that the two media objects are not mergeable.

classmethod from_dict(data: dict[str, typing.Any]) deface.model.Media[source]

Create a new media descriptor from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.Post(timestamp: int, backdated_timestamp: Optional[int] = None, update_timestamp: Optional[int] = None, post: Optional[str] = None, name: Optional[str] = None, title: Optional[str] = None, text: tuple[str, ...] = <factory>, external_context: Optional[deface.model.ExternalContext] = None, event: Optional[deface.model.Event] = None, places: tuple[deface.model.Location, ...] = <factory>, tags: tuple[str, ...] = <factory>, media: tuple[deface.model.Media, ...] = <factory>)[source]

Bases: object

A post on Facebook.

timestamp: int

The time a post was made in seconds since the beginning of the Unix epoch on January 1, 1970 at midnight.
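Such a timestamp converts to a timezone-aware datetime with the standard library. The sketch assumes that midnight refers to UTC, as is conventional for Unix time; the helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp: int) -> datetime:
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


epoch = to_utc(0)
next_day = to_utc(86_400)
```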

backdated_timestamp: Optional[int] = None

A backdated timestamp. Its semantics are unclear.

update_timestamp: Optional[int] = None

Nominally, the time of an update. In practice, if a post includes this field, its value appears to be the same as that of timestamp. In other words, the field has devolved to a flag indicating whether a post was updated.

post: Optional[str] = None

The post’s textual body.

name: Optional[str] = None

The name for a recommendation.

title: Optional[str] = None

The title of a post. This field is filled in automatically and hence generic. Starting with more common ones, variations include:

  • Alice

  • Alice updated her status.

  • Alice shared a memory.

  • Alice wrote on Bob's timeline.

  • Alice is feeling blessed.

  • Alice was with Bob.

text: tuple[str, ...]

The text introducing a shared memory.

external_context: Optional[deface.model.ExternalContext] = None

An external context, typically with URL only.

event: Optional[deface.model.Event] = None

The event this post is about.

places: tuple[deface.model.Location, ...]

The places for this post. Almost all posts have at most one deface.model.Location. Occasionally, a post has two locations that share the same address, latitude, longitude, and name but differ on deface.model.Location.url, with one location having None and the other having some value. In that case, deface.ingest.ingest_post() eliminates the redundant location object while keeping url’s value. Posts with two or more distinct locations seem rare but do occur.

tags: tuple[str, ...]

The tags for a post, including friends and pages.

media: tuple[deface.model.Media, ...]

The photos and videos attached to a post.

is_simultaneous(other: deface.model.Post) bool[source]

Determine whether this post and the other post have the same timestamp.

is_mergeable_with(other: deface.model.Post) bool[source]

Determine whether this post can be merged with the given post. The two posts are mergeable if they differ at most in their media.

merge(other: deface.model.Post) deface.model.Post[source]

Merge this post with the other post. If the two posts differ only in their media, this method returns a new post that combines the media from both posts.

Raises

MergeError – indicates that the two posts differ in more than their media or have different media descriptors for the same photo or video.

classmethod from_dict(data: dict[str, typing.Any]) deface.model.Post[source]

Create a new post from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.

class deface.model.PostHistory[source]

Bases: object

A history of posts. Use add() to add posts one-by-one, as they are ingested. This class organizes them by Post.timestamp. That lets it easily merge posts that only differ in media as well as eliminate duplicate posts. The latter is particularly important when ingesting posts from more than one personal data archive, since archives may overlap in time. Once all posts have been added to the history, timeline() returns a list of all unique posts sorted by timestamp.

add(post: deface.model.Post) None[source]

Add the post to the history of posts. If the history already includes one or more posts with the same timestamp, this method tries merging the given post with each of those posts and replaces the post upon a successful merge. Otherwise, this method adds the post to the history.

timeline() list[deface.model.Post][source]

Get a timeline for the history of posts. The timeline includes all posts from the history in chronological order.
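The interplay of add() and timeline() can be sketched with simplified stand-ins. PostSketch reduces a post to a timestamp, a body, and media given as plain strings, and its merge simply concatenates media while dropping repeats. Both classes are hypothetical, and the real merge logic is likely more involved.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PostSketch:
    timestamp: int
    post: Optional[str] = None
    media: tuple[str, ...] = ()

    def is_mergeable_with(self, other: "PostSketch") -> bool:
        # Everything except media must agree.
        return (self.timestamp, self.post) == (other.timestamp, other.post)

    def merge(self, other: "PostSketch") -> "PostSketch":
        # Combine media, keeping first-seen order and dropping repeats.
        combined = tuple(dict.fromkeys(self.media + other.media))
        return replace(self, media=combined)


class PostHistorySketch:
    def __init__(self) -> None:
        self._posts: dict[int, list[PostSketch]] = {}

    def add(self, post: PostSketch) -> None:
        bucket = self._posts.setdefault(post.timestamp, [])
        for index, existing in enumerate(bucket):
            if existing.is_mergeable_with(post):
                bucket[index] = existing.merge(post)
                return
        bucket.append(post)

    def timeline(self) -> list[PostSketch]:
        return [p for ts in sorted(self._posts) for p in self._posts[ts]]


history = PostHistorySketch()
history.add(PostSketch(20, "hello", ("a.jpg",)))
history.add(PostSketch(10, "first"))
history.add(PostSketch(20, "hello", ("b.jpg",)))
history.add(PostSketch(10, "first"))  # exact duplicate from a second archive
timeline = history.timeline()
```

Keying the history by timestamp means each new post is compared only against posts made at the same second, rather than against the whole history.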

deface.model.find_simultaneous_posts(timeline: list[deface.model.Post]) list[range][source]

Find all simultaneous posts on the given timeline and return the ranges of their indexes.
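One way to compute such ranges is a single pass over a sorted timeline, sketched here on bare timestamps instead of posts. The sketch assumes that only runs of two or more equal timestamps are reported; the function name is hypothetical.

```python
def find_simultaneous(timestamps: list[int]) -> list[range]:
    """Return index ranges of runs of two or more equal, adjacent timestamps."""
    ranges: list[range] = []
    start = 0
    for index in range(1, len(timestamps) + 1):
        # Close the current run at the end of input or on a new timestamp.
        if index == len(timestamps) or timestamps[index] != timestamps[start]:
            if index - start > 1:
                ranges.append(range(start, index))
            start = index
    return ranges


runs = find_simultaneous([10, 20, 20, 30, 30, 30, 40])
```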

deface.serde

deface.serde.restore_utf8(data: bytes) bytes[source]

Restore the UTF-8 encoding for files exported from Facebook. Such files may appear to be valid JSON at first but nonetheless encode all non-ASCII characters incorrectly. Notably, what should just be UTF-8 byte values are Unicode escape sequences of the form \u00xx. This function replaces such sequences with the byte value given by the last two hexadecimal digits. It leaves all other escape sequences in place.

NB: If an arbitrary but odd number of backslashes precedes u00xx, the final backslash together with the u00xx forms a Unicode escape sequence. However, if an even number of backslashes precedes u00xx, there is no Unicode escape sequence but text discussing Unicode escape sequences.

This function should be invoked on the bytes of JSON text, before parsing.
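The backslash-counting rule can be expressed as a regular expression over bytes. The sketch below is not the package's implementation. As a simplification, it rewrites only escapes for byte values 0x80 and above, so that escaped ASCII characters such as quotes and control characters remain valid JSON.

```python
import json
import re

# An optional run of backslash pairs (kept as-is), then \u00 and two hex digits.
# The lookbehind ensures the run of backslashes is counted from its start.
# Restricting the first digit to 8-f is a simplification made for this sketch.
_ESCAPE = re.compile(rb"(?<!\\)((?:\\\\)*)\\u00([89a-fA-F][0-9a-fA-F])")


def restore_utf8_sketch(data: bytes) -> bytes:
    def to_byte(match: "re.Match[bytes]") -> bytes:
        return match.group(1) + bytes([int(match.group(2), 16)])

    return _ESCAPE.sub(to_byte, data)


raw = rb'{"name": "Ro\u00cc\u0081isi\u00cc\u0081n"}'
restored = restore_utf8_sketch(raw)
parsed = json.loads(restored)

# An even number of backslashes is an escaped backslash followed by plain text.
untouched = restore_utf8_sketch(rb'"\\u00cc"')
```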

deface.serde.loads(data: bytes, **kwargs: Any) Union[None, bool, int, float, str, list[typing.Any], dict[str, typing.Any]][source]

Return the result of deserializing a value from the given JSON text. This function simply wraps an invocation of the eponymous function in Python’s json package — after applying restore_utf8() to the given data. It passes the keyword arguments through.

deface.serde.prepare(data: Any) Any[source]

Prepare the given value for serialization to JSON. This function recursively replaces enumeration constants with their names, lists and tuples with equivalent lists, and dataclasses and dictionaries with equivalent dictionaries. While generating equivalent dictionaries, it also filters out entries that are None, the empty list [], or the empty tuple (). All other values remain unchanged.
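A recursive function along these lines could look like the following sketch, written from the description above rather than from the package's source. The names prepare_sketch, MediaTypeSketch, and PhotoSketch are hypothetical.

```python
import dataclasses
import enum
import json
from typing import Any, Optional


def _is_omitted(value: Any) -> bool:
    """True for None, the empty list, and the empty tuple."""
    return value is None or (isinstance(value, (list, tuple)) and len(value) == 0)


def prepare_sketch(data: Any) -> Any:
    if isinstance(data, enum.Enum):
        return data.name
    if isinstance(data, (list, tuple)):
        return [prepare_sketch(item) for item in data]
    if dataclasses.is_dataclass(data) and not isinstance(data, type):
        # Shallow conversion; nested values are handled by the recursion below.
        data = {f.name: getattr(data, f.name) for f in dataclasses.fields(data)}
    if isinstance(data, dict):
        return {
            key: prepare_sketch(value)
            for key, value in data.items()
            if not _is_omitted(value)
        }
    return data


class MediaTypeSketch(enum.Enum):
    PHOTO = "PHOTO"


@dataclasses.dataclass(frozen=True)
class PhotoSketch:
    media_type: MediaTypeSketch
    uri: str
    description: Optional[str] = None
    comments: tuple[str, ...] = ()


text = json.dumps(prepare_sketch(PhotoSketch(MediaTypeSketch.PHOTO, "photos/1.jpg")))
```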

deface.serde.dumps(data: Any, **kwargs: Any) str[source]

Return the result of serializing the given value as JSON text. This function simply wraps an invocation of the eponymous function in Python’s json package — after applying prepare() to the given data. It passes the keyword arguments through.

deface.validator

class deface.validator.Validator(value: deface.validator.T, filename: str = '', key: Optional[Union[int, str]] = None, parent: Optional[deface.validator.Validator[deface.validator.T]] = None)[source]

Bases: Generic[deface.validator.T]

property filename: str

Get the filename for the file with the JSON data.

property only_key: str

Get the only key. If the current value is a singleton object, this method returns the only key. Otherwise, it raises an assertion error.

property keypath: str

Determine the key path for this validator value. The key path is composed from list items, formatted as say [42], and object fields, formatted like .answer for fields named with Python identifiers or like ["42"] otherwise.
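The formatting rules can be sketched as a function over a list of keys. The name format_keypath is hypothetical, and the sketch takes the keys from root to leaf instead of walking parent links as a validator would.

```python
import json
from typing import Union


def format_keypath(keys: list[Union[int, str]]) -> str:
    parts = []
    for key in keys:
        if isinstance(key, int):
            parts.append(f"[{key}]")  # list item
        elif key.isidentifier():
            parts.append(f".{key}")  # field named with a Python identifier
        else:
            parts.append(f"[{json.dumps(key)}]")  # any other field name
    return "".join(parts)


path = format_keypath(["attachments", 0, "data", 42, "answer", "42"])
```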

property value: deface.validator.T

Get the current value.

raise_invalid(message: str) NoReturn[source]

Raise a validation error for the current value. The error message is automatically formatted as the character sequence consisting of filename, keypath, a space, and the given message string.

Raises

ValidationError – indicates a malformed JSON object.

to_integer() deface.validator.Validator[int][source]

Coerce the current value to an integer.

Raises

ValidationError – indicates that the current value is not an integer.

to_float() deface.validator.Validator[float][source]

Coerce the current value to an integral or floating point number.

Raises

ValidationError – indicates that the current value is neither an integer nor a floating point number.

to_string() deface.validator.Validator[str][source]

Coerce the current value to a string.

Raises

ValidationError – indicates that the current value is not a string.

to_list() deface.validator.Validator[list[typing.Any]][source]

Coerce the current value to a list.

Raises

ValidationError – indicates that the current value is not a list.

items() collections.abc.Iterator[deface.validator.Validator[deface.validator.T]][source]

Get an iterator over the current list value’s items. Each item is wrapped in the appropriate validator to continue validating the JSON data. If the current value is not a list, this method raises an assertion error.

to_object(valid_keys: Optional[set[str]] = None, singleton: bool = False) deface.validator.Validator[dict[str, typing.Any]][source]

Coerce the current value to an object. If valid_keys are given, this method validates the object’s fields against the given field names. If singleton is True, the object must have exactly one field.

Raises

ValidationError – indicates that the current value is not an object, not an object with a single key, or has a field with unknown name.

__getitem__(key: Union[int, str]) deface.validator.Validator[Any][source]

Index the current value with the given key to create a new child validator. The given key becomes the new validator’s key and the result of the indexing operation becomes the new validator’s value. This validator becomes the new validator’s parent.

Raises
  • TypeError – indicates that the current value is neither list nor object, that the key is not an integer even though the current value is a list, or that the key is not a string even though the current value is an object.

  • IndexError – indicates that the integer key is out of bounds for the current list value.

  • ValidationError – indicates that the required field named by the given key for the current object value is missing.
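The indexing behavior and its three error cases can be sketched with a pared-down stand-in. ValidatorSketch and ValidationErrorSketch are hypothetical and omit the filename, key path, and coercion methods of the documented class.

```python
from typing import Any, Optional, Union


class ValidationErrorSketch(Exception):
    """Stand-in for deface.error.ValidationError."""


class ValidatorSketch:
    def __init__(
        self,
        value: Any,
        key: Optional[Union[int, str]] = None,
        parent: Optional["ValidatorSketch"] = None,
    ) -> None:
        self.value = value
        self.key = key
        self.parent = parent

    def __getitem__(self, key: Union[int, str]) -> "ValidatorSketch":
        if isinstance(self.value, list):
            if not isinstance(key, int):
                raise TypeError("list index must be an integer")
            # Indexing raises IndexError for an out-of-bounds key.
            return ValidatorSketch(self.value[key], key, self)
        if isinstance(self.value, dict):
            if not isinstance(key, str):
                raise TypeError("object key must be a string")
            if key not in self.value:
                raise ValidationErrorSketch(f"missing required field {key!r}")
            return ValidatorSketch(self.value[key], key, self)
        raise TypeError("value is neither list nor object")


root = ValidatorSketch({"attachments": [{"data": []}]})
child = root["attachments"][0]
```

Because every child keeps a reference to its parent, an error raised deep inside the data can report where in the JSON document the offending value sits.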