"""
Partially inspired by torchtune's flex attention implementation

Citation:
@software{torchtune,
  title = {torchtune: PyTorch's finetuning library},
  author = {torchtune maintainers and contributors},
  url = {https://github.com/pytorch/torchtune},
  license = {BSD-3-Clause},
  month = apr,
  year = {2024}
}
"""

from typing import Optional, Tuple, Union

import torch

from ..utils import is_torch_flex_attn_available


if is_torch_flex_attn_available():
    from torch.nn.attention.flex_attention import BlockMask, flex_attention
    from torch.nn.attention.flex_attention import create_block_mask as create_block_causal_mask_flex


class WrappedFlexAttention:
    """
    We are doing a singleton class so that flex attention is compiled once when it's first called.
    """

    _instance = None
    _is_flex_compiled = False
    _compiled_flex_attention = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # Create a new instance if one doesn't already exist
            cls._instance = super().__new__(cls)
        return cls._instance

    @torch.compiler.disable(recursive=False)
    def __init__(self):
        """
        Initialize or update the singleton instance.
        """
        if self._is_flex_compiled is False:
            self._compiled_flex_attention = torch.compile(flex_attention, dynamic=False)
            self._is_flex_compiled = True

    def __call__(self):
        return self._compiled_flex_attention


def make_flex_block_causal_mask(attention_mask_2d: torch.Tensor) -> "BlockMask":
    """
    Create a block causal document mask for a batch of sequences, both packed and unpacked.
    The block causal logic is passed into :func:`torch.nn.attention.flex_attention.create_block_mask`.
    The resulting BlockMask is a compressed representation of the full block causal
    mask and is essential for performant computation of flex attention.
    See: https://pytorch.org/blog/flexattention/

    Args:
        attention_mask_2d (torch.Tensor): Attention mask for packed and padded sequences
            of shape (batch_size, total_seq_len), e.g.

            for an unpacked sequence:
            [[1, 1, 1, 1, 0, 0, 0],
             [1, 1, 1, 1, 1, 0, 0]]

            for a packed sequence:
            [[1, 1, 1, 2, 2, 2, 0],
             [1, 1, 2, 2, 2, 3, 3]]

    Returns:
        BlockMask
    """
    device = attention_mask_2d.device
    document_ids = attention_mask_2d
    batch_size, total_seq_len = document_ids.shape

    def causal_mask_mod(batch_idx, head_idx, q_idx, kv_idx):
        """
        Defines the logic of a block causal mask by combining both a standard causal mask
        and a block diagonal document mask.

        See :func:`~torchtune.modules.attention_utils.create_block_causal_mask`
        for an illustration.
        """
        causal_mask = q_idx >= kv_idx
        document_mask = document_ids[batch_idx, q_idx] == document_ids[batch_idx, kv_idx]
        padding_mask = document_ids[batch_idx, q_idx] > 0
        return causal_mask & document_mask & padding_mask

    return create_block_causal_mask_flex(
        mask_mod=causal_mask_mod,
        B=batch_size,
        H=None,  # the mask is head-independent, so it is broadcast over heads
        Q_LEN=total_seq_len,
        KV_LEN=total_seq_len,
        device=device,
    )


@torch.compiler.disable(recursive=False)
def compile_friendly_flex_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    **kwargs,
) -> torch.Tensor:
    # The first call instantiates the singleton wrapper (compiling flex attention once);
    # calling the instance returns the compiled kernel.
    flex_attention_compiled = WrappedFlexAttention()()
    return flex_attention_compiled(query, key, value, **kwargs)


def flex_attention_forward(
    module: torch.nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Union[torch.Tensor, "BlockMask"],
    scaling: Optional[float] = None,
    softcap: Optional[float] = None,
    head_mask: Optional[torch.Tensor] = None,
    **kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
    block_mask = None
    causal_mask = None
    if isinstance(attention_mask, BlockMask):
        block_mask = attention_mask
    else:
        causal_mask = attention_mask

    if causal_mask is not None:
        # Trim the 4D mask to the key length.
        causal_mask = causal_mask[:, :, :, : key.shape[-2]]

    def score_mod(score, batch_idx, head_idx, q_idx, kv_idx):
        if softcap is not None:
            score = softcap * torch.tanh(score / softcap)
        if causal_mask is not None:
            score = score + causal_mask[batch_idx][0][q_idx][kv_idx]
        if head_mask is not None:
            score = score + head_mask[batch_idx][head_idx][0][0]
        return score

    attn_output, attention_weights = compile_friendly_flex_attention(
        query,
        key,
        value,
        score_mod=score_mod,
        block_mask=block_mask,
        enable_gqa=True,
        scale=scaling,
        # Flex attention computes the log-sum-exp regardless, so returning it adds no extra cost.
        return_lse=True,
    )
    # The log-sum-exp is returned in float32; cast it back to the value dtype.
    attention_weights = attention_weights.to(value.dtype)
    attn_output = attn_output.transpose(1, 2).contiguous()

    return attn_output, attention_weights
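
# --- Usage sketch (an added illustration, not part of the upstream module) ----------------
# A minimal example of how a caller would combine the helpers above: build a BlockMask for
# two packed/padded rows, then run `flex_attention_forward` on random tensors. The shapes
# (batch=2, heads=4, seq_len=128, head_dim=16), the document boundaries, and passing `None`
# for the unused `module` argument are assumptions made for this sketch only; it presumes a
# PyTorch build with flex attention support (>= 2.5, ideally on a CUDA device).
def _flex_attention_usage_sketch():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    seq_len = 128  # a multiple of the default block size keeps the BlockMask simple

    mask_2d = torch.ones(2, seq_len, dtype=torch.long, device=device)
    mask_2d[0, 64:] = 2  # row 0 packs a second document starting at position 64
    mask_2d[1, 96:] = 0  # row 1 is padded after position 96
    block_mask = make_flex_block_causal_mask(mask_2d)

    query = torch.randn(2, 4, seq_len, 16, device=device)
    key = torch.randn(2, 4, seq_len, 16, device=device)
    value = torch.randn(2, 4, seq_len, 16, device=device)

    # `module` is not referenced inside flex_attention_forward, so None suffices here.
    attn_output, lse = flex_attention_forward(None, query, key, value, attention_mask=block_mask)
    return attn_output.shape, lse.shape  # (2, 128, 4, 16) and (2, 4, 128)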